Thursday 17 July 2014

Spy with Timbre

Peter Taoussanis's Timbre logging library is awesome. It's dead easy to get started with - add a dependency and you're done.

One really useful feature in it (apart from the ordinary "write this to log" functions) is spy.

spy lets you wrap an expression and log both the expression and its result, like so:

(require '[taoensso.timbre :as timbre])

(defn some-function [argh]
    (timbre/spy (reverse argh)))

(defn some-other-function [x]
    (timbre/spy (map rand-int x)))

(some-other-function (some-function [1 2 34]))
2014-Jul-17 18:12:06 +0200 Albatron-5 DEBUG [user] - (reverse argh) (34 2 1)
2014-Jul-17 18:12:06 +0200 Albatron-5 DEBUG [user] - (map rand-int x) (30 0 0)
-> (30 0 0)

 
The results of the two functions above are not apparent, and there is good reason to keep an eye on intermediate results. This can be tedious when not in Light Table. Not so with spy.


Sunday 4 May 2014

The risks with TDD

Lately there has been a lot of debate about Test Driven Development. I started to think about the use of TDD when Rich Hickey formulated it as "you wouldn't drive a car by crashing into the guard rails". I would change the metaphor slightly to "you don't design cars to not crash into the guard rails".

Whether you hit the guard rails or not is not a decent measure of whether a car is good or not. The higher idea of a car is not something that doesn't crash into the guard rails. The higher idea of a car is something more like a tool to transport yourself, some more people, and stuff quickly, safely, cheaply, and joyfully from where you are now to some place farther away than you could comfortably walk. No guard rails involved.

"But test cases are a form of strict requirement specification!" you say. Of course they are. It's necessary to specify requirements in some form, to detect logical inconsistencies and to communicate the design (so that others can criticize it). Writing test cases is one way of doing this, but usually not the best way.

Another problem, orthogonal to getting a logically consistent view of the problem, is avoiding or catching trivial implementation mistakes early. By trivial I mean something like a comparison with > instead of >=. These mistakes usually don't touch the higher idea of the program, but are of course crucial to get right. Handmade test cases, asserts, contract programming and even generative testing are great tools for this. Don't mix up fixing these bugs with detecting higher-level logical inconsistencies in your code. Trivial bugs are a relief to correct, but fixing them never adds significant value to the program. It's not the hard problem.
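To make the "> instead of >=" kind of mistake concrete, here is a minimal sketch using only clojure.test and function preconditions. The predicate adult? and its threshold of 18 are made up for illustration; the point is only that a boundary test and a contract catch this class of bug cheaply.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

;; Hypothetical example function, not from any real code base.
(defn adult?
  "True when age is 18 or more."
  [age]
  ;; Contract: reject input that makes no sense before doing any work.
  {:pre [(integer? age) (not (neg? age))]}
  ;; Writing > here instead of >= is exactly the trivial kind of bug
  ;; that the boundary test below would catch.
  (>= age 18))

(deftest adult-boundary-test
  (is (adult? 18))        ; the boundary value itself
  (is (not (adult? 17)))  ; just below the boundary
  (is (thrown? AssertionError (adult? -1)))) ; contract violation

(comment
  ;; Run all tests in the current namespace from the REPL.
  (run-tests))
```

None of this says anything about whether an age check is the right thing to build in the first place, which is the distinction being made here.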

My baby message-queue example
The risk with test-driven development is that you start to code before you know what to code. It is of course a good way to get yourself out of analysis paralysis, but in my experience, there is too little 'analysis paralysis' compared to 'cowboy coding that gets impossible to manage and extend later on'.

Let's say you are in need of some kind of messaging system. Your system is spread over several computers, and you have already used ad-hoc socket stream formats and various REST APIs in an unsustainable way, just to get the whole thing going. You shudder at the idea of expanding the system. Wouldn't it be nice to solve this problem once and for all?

Let's say we try to formulate some test cases about how this messaging system could work. Obviously you'll need some way to connect the program to the messaging system bus, and some way to send and receive messages.

It could be a test-case like this:

(let [connection (connect "some-message-bus")]
   (is (up? connection)))

and

(let [connection (connect "some-message-bus")]
   (send! connection "hello")
   (is (= (receive! connection) "hello")))

This is great! It is simple! I can see that there's a function connect that needs to be implemented. It needs to return an object for which the function up? returns true.

I also want to be able to send and receive things on each bus.

This is actually highly enlightening. It's a minimalistic interface but it captures a small API. I know what I want and we can even discuss parts of the design given these small lines of code.

Of course there's a ton of various things I need to add - some event loop facility, networking support to be able to connect remote computers, reliability of the message queue, potentially other behaviours - but I could very well code up something that would work quite OK in-process just given this simple test. Almost like magic!
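To show how little it takes to make those two test cases pass in-process, here is a minimal sketch. Everything in it is an assumption made for illustration: the map returned by connect, the atom holding a queue, and the choice that receive! returns nil on an empty queue. It is not a design proposal, and it ignores networking, lost connections and everything else that makes the real problem hard.

```clojure
;; A deliberately naive in-process "message bus".
;; All names and data shapes here are hypothetical.

(defn connect
  "Returns a connection: a map holding the bus name and an in-memory queue."
  [bus-name]
  {:bus   bus-name
   :up    (atom true)
   :queue (atom clojure.lang.PersistentQueue/EMPTY)})

(defn up?
  "True while the connection is considered alive."
  [connection]
  @(:up connection))

(defn send!
  "Appends message to the connection's queue and returns the connection."
  [connection message]
  (swap! (:queue connection) conj message)
  connection)

(defn receive!
  "Removes and returns the oldest message, or nil if the queue is empty."
  [connection]
  ;; swap-vals! (Clojure 1.9+) returns [old new], so the head of the old
  ;; queue is exactly the element that pop just removed.
  (let [[old-queue _] (swap-vals! (:queue connection) pop)]
    (peek old-queue)))
```

With these definitions both test cases above pass, which is precisely the trap: the tests are green long before any of the hard questions have been asked.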

Never underestimate the real problem in distributed computing
It could easily be the case that we actually need quite a different solution than the one we thought we needed - maybe we didn't parallelize the problem enough and did all the socket shuffling in vain (I've seen that more than once). And how do we handle lost connections? Can we unsubscribe from a bus? What happens if we subscribe to many queues? What if servers are down? New servers join the cluster? Authentication? Am I the first person in the Universe facing this particular problem, and is it unique of its kind? (That last one was a rhetorical question.)

Depending on the specific problem the computing cluster should solve, there are many different ways to approach it. My small TDD approach has the insidious side effect that it actually makes me narrow down my view of the problem way too early. It's very hard to kill code you wrote and start over, almost from scratch. Good tests take effort to specify (and rightly so). Refactoring tests probably takes even more effort than refactoring code.

The TDD way of navigating through the large space of solutions can easily get stuck on a local maximum.

I can do nothing but approve of Rich Hickey's idea of "hammock time". You really have to understand the whole problem you are trying to solve and be able to keep it in your head, along with as many of its parameters and quirks as possible. One good way to do this is to really try to solve the problem (preferably on paper), and after that see how others solved similar problems, code up a small prototype, see how others' code would solve the same problem, and make sure you know how the whole stack would work. When this work is done, TDD will work just fine.

TDD is a great scaffold, but most often a really shitty sketch. Test cases actually have too high fidelity and are too slow to write and change, and you can mistake them for "real test cases that should be used to test the final solution on".

Test cases are not a good blueprint either - test cases do not say how the solution should work, only how it should behave. If you really want to code this way (which is really powerful), use a declarative programming language like Datalog instead of wasting your precious brain time on being a manual compiler for your own made-up, non-standard, most likely hard-to-understand declarative programming language. (Those are more common than you might think.)

In summary
The major risk with TDD is that one gets carried away on the wrong track, starts to solve some problem, gets a kick from passing tests, continues in that direction for more passing-test kicks, and never really takes the opportunity to think and reason systematically about the whole problem at hand.

Don't be mean to yourself and your friends: always think hard about your problem and possible solutions AFK before coding. If that is not possible, something is wrong, and your code will likely be as messy as the problem it tries to solve.

When you know exactly what the outlines of the problem are and how you want to solve this problem - then TDD is one of several tools to get your code super duper great. Use it accordingly. Thanks.

Saturday 28 December 2013

∑ 2013

This year has been a very enlightening one. I have peeled the onion and learnt that many things I earlier considered magic were not very magic at all, yet elegant and far from trivial to implement myself. Hopefully I'm less prone to get stuck trying to reimplement these features in the future.

I have finally learnt to see through all the abstract classes of Java, and finally understood the benefits of interfaces and how Clojure generates bytecode. Maybe a bit late, but I always learn things backwards anyway.

I've been reading some really good books and articles. Among them are

Brian Goetz - Java Concurrency in Practice, in which I learnt much more about volatile and the various atomic constructs in Java.

Fred Hébert - Learn You Some Erlang for Great Good! - apart from the marvelous illustrations, it's fun to see what Erlang's strengths really are, among them a very efficient implementation of green threads and selective message receiving, which removes much complexity in parsing-like functionality.

The JVM serialization benchmarking results - told me about the existence of Avro and Kryo, among others. I later found out about Kryonet, which I hope to try out further. I also read up on Fressian.

Pedestal.io entered my life. It will be very hard to start developing web applications in other frameworks after trying out this beast.

I re-read The Joy of Clojure (Fogus, Chouser) and realized I had missed most of it on the first read. The talk by Hugo Duncan on Debugging in Clojure helped me grasp the mind-blowing feature of starting a REPL in the middle of an error.

I entered garbage collection's Eden area when I visited a Lisp meetup in Gothenburg. People discussed implementing their own garbage collector and I was thinking "Impossible!". Afterwards I read up on the subject. It's not impossible at all, and it made me a somewhat better programmer to know it's not magic. I even dare to think I can play a bit better with garbage collection now.

Professionally I've been able to juggle matrices several gigabytes large in memory, which made my $500 laptop as performant as half a rack of ProLiant servers. Fun, and more than a bit scary.

I finally read through all of the Clojure source code. Much to say, yet little. The array-map (as well as the large hash-map) is likely much more conservative about allocating objects than the Java Collections' own implementations.

I learnt that TCP is quite a shitty protocol for throughput because of its strict ordering ACK mechanism. This explains why Storm and a lot of other applications chose to use UDP and implement their own ACK-ing mechanisms. I also learned that the SCTP protocol is quite cool, and that even Java already supports it.

I thought long and hard about compilation and compilers. I read parts of Urban Boquist's PhD thesis on Code Optimization for Lazy Functional Languages, and realized that inlining and register optimization share some similarities, although explicit register optimization will likely produce better results faster.

I wanted to find out whether the JIT supports SSE and other recent x86 instruction sets, and it turns out it does, although it's hard to know exactly when. There's the PrintAssembly option for seeing exactly what the JIT spits out, which I hope to investigate more.

CUDA and OpenCL were getting visits from Clojure-generated guests like Sleipnir. I'm still looking for a suitable problem to squeeze into GPUs.

I also read up on PostgreSQL internals and indexing; the bitmap indexes are a cool thing. The clojure.lang.PersistentHashMap is implemented in a very similar way. Could this be used to optimize clojure.set/union and other set operations somehow?

I finally discovered the ThreadLocal class in Java, which is potentially great for thread-bound processing, like CRC calculations or cryptographic state.
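As a small illustration of the CRC case, here is a sketch of a per-thread java.util.zip.CRC32 instance reached through a ThreadLocal from Clojure. The names thread-crc and crc32-of are made up for this example. Each thread lazily gets its own CRC32 object, so the mutable checksum state is never shared and no locking is needed.

```clojure
(import '(java.util.zip CRC32))

;; One CRC32 instance per thread, created on first use in that thread.
;; proxy overrides ThreadLocal's initialValue method.
(def ^ThreadLocal thread-crc
  (proxy [ThreadLocal] []
    (initialValue [] (CRC32.))))

(defn crc32-of
  "Returns the CRC-32 checksum (as a long) of the given byte array,
  reusing the calling thread's own CRC32 instance."
  [^bytes data]
  (let [^CRC32 crc (.get thread-crc)]
    (.reset crc)        ; clear state left over from the previous call
    (.update crc data)
    (.getValue crc)))

(comment
  ;; Same input gives the same checksum, whichever thread computes it.
  (let [data (.getBytes "hello" "UTF-8")]
    [(crc32-of data) @(future (crc32-of data))]))
```

Whether reusing the object is worth it compared with simply allocating a new CRC32 per call is something to measure rather than assume.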

Thanks to David Nolen for continuously tweeting things about relational programming I didn't even know existed.

Zach Tellman published Narrator, which is yet another stream processing library, full of clever close-to-JVM goodies.

(inc 2013)
Hopefully next year will be the year I visit some Clojure conference. I'm thinking a lot about state machines and network programming in my current job, but also about visualizations, so I really hope I will be able to publish something slightly valuable regarding these issues.


Surprisingly expensive volatiles

The Java keyword volatile declares that a variable must always be written to main memory before some other thread accesses it. It's important to note that this write to main memory takes about 1000 clock cycles on a modern CPU, since the value has to traverse three layers of cache to get there. The use of volatile should be investigated very carefully. The blog Mechanical Sympathy wrote more about the use of volatile variables two and a half years ago.

Thursday 26 September 2013

Need for counterspeed

I just had to try out the speed of different counters. Turns out Clojure atoms are competitive for all but the most specialized applications.
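The original measurements are not reproduced here, but a comparison of this kind can be sketched as follows. The two functions and their names are assumptions for illustration: one counts with a Clojure atom, the other with a java.util.concurrent.atomic.AtomicLong. The sketch is single-threaded and uses the crude time macro, so it says nothing about contention and should not be read as a benchmark result.

```clojure
(import '(java.util.concurrent.atomic AtomicLong))

(defn count-with-atom
  "Increments a Clojure atom n times and returns the final count."
  [n]
  (let [counter (atom 0)]
    (dotimes [_ n]
      (swap! counter inc))
    @counter))

(defn count-with-atomic-long
  "Increments a java.util.concurrent AtomicLong n times and returns the final count."
  [n]
  (let [counter (AtomicLong. 0)]
    (dotimes [_ n]
      (.incrementAndGet counter))
    (.get counter)))

(comment
  ;; Rough timing from the REPL; run each a few times so the JIT warms up.
  (time (count-with-atom 1000000))
  (time (count-with-atomic-long 1000000)))
```

For numbers worth trusting, a proper benchmarking harness and a multi-threaded workload would be needed.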

Friday 6 September 2013

View 2d-arrays in Incanter

There was a question in the Incanter Google group about a way to show the content of 2d-arrays as heatmaps.

I suggested using a function that takes the integer from the given coordinates, and being careful about the axis scaling. Trying it out myself, it turned out to be a terrible suggestion. Sorry.

I looked into the source code of incanter.charts/heat-map*, hijacked the place where the function was called, and replaced it with the previously developed matrix lookup. Simple and ugly.
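The matrix lookup itself is not shown in the post, so the following is only a guess at its general shape, written in plain Clojure without Incanter. The name matrix-lookup, the vector-of-row-vectors representation and the clamping at the edges are all assumptions. It returns a function of two coordinates (x, y) that picks the cell containing that point, which is the kind of function a heat-map style plot evaluates over its grid.

```clojure
(defn matrix-lookup
  "Hypothetical helper: given rows (a vector of equally long row vectors),
  returns a function of x and y that yields the cell containing that point.
  x selects the column and y selects the row; coordinates outside the
  matrix are clamped to the nearest edge."
  [rows]
  (let [height (count rows)
        width  (count (first rows))]
    (fn [x y]
      (let [col (-> (long (Math/floor x)) (max 0) (min (dec width)))
            row (-> (long (Math/floor y)) (max 0) (min (dec height)))]
        (get-in rows [row col])))))

(comment
  ;; Column 1, row 0 of a 2x2 matrix.
  ((matrix-lookup [[1 2] [3 4]]) 1.5 0.2))
```

The axis scaling mentioned above still has to match: the plotted x range should span the number of columns and the y range the number of rows, or the picture will be stretched or cropped.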




Monday 2 September 2013

Touché

Great blog post: The Perils of Future-Coding. Of course I can recognize myself in that one.

Well, the cure is: dumb things down! A lot. I will publish three examples of dumbing things down here in rapid order. See ya!