lördag 28 december 2013

∑ 2013

This year has been a very enlightening one. I have peeled the onion and learnt that many things earlier considered magic, was not very magic at all, yet elegant and far from trivial to implement myself. Hopefully I'm less prone to get stuck in trying to reimplement these features in the future.

I have finally learnt to see through all the abstract classes of java, finally understood the benefits interface and how clojure generates bytecode. Maybe a bit late, but I'm always learn things backwards anyway.

I've been reading some really good books and articles. Among them are

Brian Goetz - Java Concurrency in practice in which I learnt much more about the volatile and various atomic constructs in java.

Fred Hébert - Learn You Some Erlang for great good! - apart from the marvelous illustrations it's fun to see what Erlangs strengths really are, among them a very efficient implementation of green threads and the selective message receiving, removing much complexity in parsing-like functionality.

The JVM serialization benchmarking results - told me about the existence of Avro and Kryo, among others. I later found out about Kryonet, which I hope to try out further. I also read up on Fressian,

Pedestal.io entered my life. It will be very hard to start develop web applications in other frameworks after trying out this beast.

I re-read The Joy of Clojure (Fogus, Chouser) and realized I had missed most of it at the first read. The talk from Hugo Duncan on Debuging in Clojure helped me grasp the mind-blowing feature of starting a repl in the middle of an error.

Entered Garbage Collections Eden area when I visited a Lisp meet up in Gothenburg. People discussed to implement their own Garbage Collector and I was thinking "Impossible!". Afterwards I read up on the subject. It's not impossible at all, and it made me a somewhat better programmer to know it's not magic and I even dare to think I can play a bit better with garbage collecting now.

Professionally I've been able to juggle matrixes several gigabytes large in memory, which made my $500 laptop be as performant as a half rack of Proliant servers. Fun, and more than a bit scary.

I finally read through all of the Clojure source code. Much to say, yet little. The array-map (as well as the large hash-map) is likely much more conservative on allocating objects than Java Collections own implementations.

I learnt that TCP is quite a shitty protocol for throughput because of its strict ordering ACK mechanism. This explains why Storm and a lot of other applications chose to use UDP and implement their own ACK-ing mechanisms. I also learned that the SCTP protocol is quite cool, and that even Java already supports it.

I thought long and hard about compilation and compilers. I read parts of Urban Boquists PhD thesis on Code Optimization for Lazy Functional Languages, and found realized that inlining and register optimization share some similarities, although explicit register optimization likely will produce better results faster.

I wanted to find out if JIT supports the SSE and other late x86 instruction sets, and turns out it does, although it's hard to know exactly when. There's the option PrintAssembly for knowing exactly what the JIT spits out, which I hope to investigate more.

CUDA and OpenCL was getting visits from Clojure-generated guests like Sleipnir, I'm still looking for a suitable problem to squeeze into GPUs.

I also read up on Postgres SQL internals and indexing, the bitmap indexes are a cool thing. The clojure.lang.PersistentHashMap is implemented in very similar way. Could this be used to optimize the clojure.set/union and other set-operations somehow?

I finally discovered the ThreadLocal classes in java, which are potentially great for thread-bound processing, like CRC calculations or cryptographic state.

Thanks to David Nolen for continously tweeting things about relational programming I didn't even know existed.

Zach Tellman published Narrator, which is yet another stream processing library, full of clever close-to-JVM goodies.

(inc 2013)
Hopefully next year will be the year I visit some Clojure conference. I'm thinking a lot on state machines and network programming in my current job, but also visualizations, so I really hope I will be able to publish something slightly valuable regarding these issues.


Suprisingly expensive volatiles

The java keyword volatile defines a variable to always be written to main memory before some other method access it. It's important to notice that this write to main memory takes about 1000 clockcycles on a modern CPU, since the variable has to traverse three layers of cache to get there. The use of volatile should be very carefully investigated. The blog Mechanical Sympathy wrote more about the use of volatile variables two and a half year ago.

torsdag 26 september 2013

Need for counterspeed

I just had to try out the speed of different counters. Turns out clojure atoms is competitive for all but the most specialized applications.

fredag 6 september 2013

View 2d-arrays in Incanter

There were a question the in Incanter google-group for a way to show the content of 2d-arrays as heatmaps.

I suggested him to use a function that takes the integer from the given coordinates and be careful about the axis-scaling. Trying it out my self it turned out to be a terrible suggestion. Sorry.

I looked into the source-code of incanter.chart/heat-map* and highjacked the place where the function was called and replaced that with the previously developed matrix-lookup. Simple and ugly.




måndag 2 september 2013

Touché

Great blog post: The Perils of Future-Coding. Of course I can recognize myself in that one.

Well, the cures are: dumb down things! A lot. I will publish three examples of dumbing down things here in rapid order. See ya!

söndag 18 augusti 2013

Soft Realtime Java

Anyone visiting my home could verify that garbage collection is something best left to some other entity than me. One severe problem with having a computer program looking what your program is doing trying to clean up after it, is that it sometimes gets so confused it has to trigger some divine intervention, stop the time, go through much of the allocated memory and reallocate parts not in use anymore.

This seasonally cleaning can take a while. Usually that's ok, but for more and more applications the world can't wait. One of these are when the computer is busy generating cool synth sounds for a dancefloor in real time.

To solve this issue many programs simply don't use Java, but C++ or something else where you have more control over how your memory is allocated and deallocated. This leads to nicely working software, like the Ableton Live, but also rather much work for the programmers behind it. Ok, Javas somewhat crimped primitives and the JVMs limitied abilities to reach new efficient processor instructions wouldn't make much to help it out either, but anyway.

One new player in the game is IBM's Metronome GC, that can guarantee a maximum pause time for Java GC that is about 2 ms. Of course, someone made a sample-player synth written in java with rather competitive latencies.

onsdag 14 augusti 2013

Garbage Collection visualization


I really like the old disk defragmenter program in Windows. The small blocks where moved all around and I could watch it for hours (well, minutes at least).


The JVM Garbage Collection is something of a black-box which sometimes just makes everything go to halt. To get a better understanding of how it is struggling with my code I wanted something like the disk defragmenter, and, Eureka! Netflix has made their own tool, gc.viz (blogpost), and also logs all the garbage collection closely in their production systems. Interesting indeed!


lördag 10 augusti 2013

Leaderboard, some approaches

I'm looking into the leader board problem. In short, the problem can be expressed as the most efficient way to take the n best out of a large (often semi ordered) set with the size m (the cardinality of the total league).

The most naive version is to keep a list of the n≤m best players seen so far. If we assume no sorting among these n players, the complexity of the algorithm will be n*m, which is often unacceptable, since both m and n can be huge.

Cache the worst one the list
A slightly less naive way is to cache the least good player in the current toplist. This will, at best, give only slightly above m comparisions, given that the best players came first. Beware though, if the set is given in reversed order, the complexity once again soars to n*m.

Keep a binary tree heap
A heap is an often excellent way to select items out of a set. The most general selection, the heap-sort has a well known n log n complexity. The heap could be a candidate because it keeps the players on the top list sorted and could be a good candidate. This is not the case. The binary tree has the sad property that it can be very unbalanced and we are back again at n*m.

Keep a balanced binary tree heap!
Now we're talking! There are several trees that balance themselves in a well-known worst case time available. Actually, there are some specialized heap data structures availiable, a list from Wikipedia on Heap data structures gives the following
  • 2-3 heap
  • Beap 
  • Binary heap
  • Binomial heap
  • Brodal queue
  • d-ary heap
  • Fibonacci heap
  • Leftist heap
  • Pairing heap
  • Skew heap
  • Soft heap
  • Weak heap
  • Leaf heap
  • Radix heap
  • Randomized meldable heap
where the Fibonacci and Brodal heaps seems to have among the best over-all performance (edit: no, the different heaps do of course have different properties depending on the insert/delete proportions). 

I'll give a try to implement the Fibonacci heap. And yes, it's one availiable in clojure already, courtesy of Mary Rose Cook, maryrosecook/fibonacciheap, built with zippers, nicely described a great blogpost. Super elegant.

fredag 9 augusti 2013

Small app for finding BPMs by key-tap

With Giorgio-voice: When I was young, 15-16 I definetly wanted to become a DJ and producer. I left that for my friends in Skillnicker instead. I took the engineering route instead. And here we are.

When you want to mix two songs togheter they needed to be in the same tempo, therefore we marked our records with the number of BPMs a certain song had. We calculated in the beginning by measuring the length of 16 beats with a stop watch and look into a table. After a while there were programmes where one just could tap the space bar and get a decent result. Here is such a program in Clojure.

  The intresting part of the program is the minimal state, it's just the last time we got a key press in the frame. I will add a Kalman-filter to it soon, giving it a state of a hefty two or three variables.

torsdag 11 juli 2013

Core.async for simple user-interaction

Alex Miller gives an excellent example on how to make rock-paper-scissors with core.async which made me want to try it on my mekanikmaskin - a computer-aided learning and examinating tool.

The cool thing about core.async is that it makes sort-of a reactive programming possible. Put something in a channel, and watch the program take care of it. Almost like REPLs waiting inside functions wherever you like.


And yes, it's a terrible mixture between printlns and the async constructs, but the way to create a more advanced interaction is for another day.

lördag 29 juni 2013

Project.clj superpowers with Alembic

Hugo Duncan (Pallet etc) have made it possible to dynamically add new Maven/Leiningen dependencies from the REPL through Alembic.

This opens up from starting a repl with just Clojure and keep adding dependencies all night long.

As a courtesy to you, dear reader, a current project.clj of mine have the following deps so you can try them out for yourself, in the live-running repl.

Some of my common Project.clj dependencies

[org.clojure/clojure "1.5.1"]
[org.clojure/tools.logging "0.2.6"]

[org.clojure/data.csv "0.1.2"]
[org.clojure/data.json "0.2.2"]


The Clojure language (which should already be available in the repl), functions for logging, parsing comma-separated values from exports and parse JSON from REST-apis.

[org.clojure/google-closure-library-third-party "0.0-2029"]
Needed for ClojureScript code compilation. Probably outdated and I should switch to Pedestal instead.

[clj-time "0.5.0"]
A wrapper around Joda-time, the sane way to handle time, date and intervals on the JVM.

[clj-http "0.7.2"]
An http-client based on Apache Commons

[ch.qos.logback/logback-classic "1.0.11"]
The best working logging library IMHO.

[incanter/incanter-core "1.5.1"]
[incanter/incanter-io "1.5.1"]
[incanter/incanter-charts "1.5.1"]

Incanter is a R-like package for Clojure. There are functions for various statistics, matrix calculations, not-super-cool-but-convenient visualization through JFreeChart, readers for various data sets and more.

[enlive "1.1.1"]
HTML Templating library based on transforms.

[domina "1.0.0"]
ClojureScript-library for manipulating the DOM.

[com.datomic/datomic-free "0.8.4020"
 :exclusions [org.slf4j/slf4j-nop          
              org.slf4j/slf4j-log4j12]]

A convenient, scalable, relational database that is built in Clojure keeping many of Clojure's idioms while getting the Durability in Clojures ACI-persistent datastructures. The exclusions is to make sure the logging from Datomic not collides with logback.

[compojure "1.1.5"]
A convenient library to make ring-handlers for specified routes. If you come from PHP-world this is like where you put the "filenames" and/or the rewrite rules, only that it's made right from the beginning. 

[ring "1.1.8"]
The servlet-abstraction library for clojure.

[com.keminglabs/vomnibus "0.3.2"]
Small useful things, like the ColorBrewer colors.

[com.keminglabs/c2 "0.2.2"]
I try to make visualizations with this one instead of D3, but it's less mature and I'm not skilled enough in the toolchain to get as much from it as from D3 right now. I have not given up yet!

[ring-mock "0.1.5"]
Mocking is a way to create a fake instance of something, in this case it allows a ring handler function to be called to verify that the function returns excepted results, all this without starting a web-server, eliminating the possibility to have collisions in port-allocation etc.

[clj-pid "0.1"]
A .pid contains the Process ID (an integer) as a string. This can be used to verify the process is running, and for kill it by a shellscript.

[schejulure "0.1.4"]
A cron-like scheduling library based on the Java concurrent scheduling libraries and some core.logic.

måndag 3 juni 2013

Clj-Liblinear (thanks Kevin)

I had some problems with "typed" ragged arrays when I tried to wrap Lib-linear in Clojure. Kevin Lynagh and Benedikt Waldvogel had already solved them in clj-liblinear, thanks!

Now onto some batching for determining nice parameters for our tests.

tisdag 28 maj 2013

Quick distinct set generator

In one application I needed to create quite many sets of distinct numbers in a limited range.

This is done quickly by doing a knuth-shuffle of an array with all the numbers in the range and do the random swapping as many steps as the size of the requested set.

Here's a typed and thread safe version - no reflections gives a super quick execution. The locking macro makes sure we don't mix everything up by parallel threads.

(let [vektor-length 400
      set-size 20
      shuffle-seed (long-array (shuffle (range vektor-length)))]

  (defn shuffle-n!
    "swaps the n first with any other in vector,
  (see knuth shuffle)
array locking has to be done by the caller, otherwise
we could have leaks -> duplicates"

    [^longs arr n]
    (dotimes [idx n]
      (let [shuffle-with-idx (rand-int vektor-length)
            a (aget arr idx)
            b (aget arr shuffle-with-idx)]
        (aset arr idx b)
        (aset arr shuffle-with-idx a))))

  (defn gen-distinct-set []
    (locking ^longs shuffle-seed
             (shuffle-n! shuffle-seed set-size)
             (let [new-distinct-set (long-array set-size)]
               (dotimes [idx set-size]
                 (aset new-distinct-set idx (aget ^longs shuffle-seed idx)))
               new-distinct-set))))



A test of this algorithm rendered a million sets in less than a second, which was more than sufficient for my application, otherwise my next step would have been to dig into thread-local constructs in flatland/useful thread-local macro, where every thread could have it's own seed vector.

(I don't check for duplicates, since they are unlikely enough for my application).

tisdag 26 mars 2013

Don'ts: Git hooks as CI/deploy tool

GIҬ

I want to get Heroku-like functionality for deploying new version - just git push and you're done. Crazy sexy cool.

I did try with git hooks in various ways, but since I run the server as a nohup jetty-process (I know) and have to kill that in brutal ways, cannot be sure on the merging of the branches since I had the git repo checked out (use a intermediate bare repo, please) I could not get it work. It was also really hard to know what was happening.

The search light is now against lein-war (for running in tomcat7) and some nice pallet solution.\

Git hooks is awesome for other things, but they are no system deployment/ continuous integration tool.

Update: I'm now a happy Jenkins user. Easy to get started with!

fredag 22 mars 2013

merge-with to the rescue

I'm working on coloring a map with different colors, that sometimes overlap.

After deep though I found out that I could have colorings in maps and then merge the maps somehow. This doesn't work perfectly, but it's a good start. After fiddling around with home-brewed solutions I looked in core.clj for merge in the source code and found merge-with just below. Perfect match.

A short example of the power of merge-with:


(def scandinavia (zipmap ["sweden" "norway" "denmark" "aaland" "iceland" "finland" "greenland" "fareoes"] (repeat :yellow)))
=>{"fareoes" :yellow, ...

(def swedish-speaking (zipmap ["sweden" "aeland" "finland" "ukraine" ] (repeat :red)))
=>{"ukraine" :red, ... * see below for Swedish language in Ukraine

(defn blend [a b] :orange)

(merge-with blend scandinavia swedish-speaking)

{"ukraine" :red, 
"fareoes" :yellow, 
"greenland" :yellow, 
"finland" :orange, 
"iceland" :yellow, 
"aaland" :orange, 
"denmark" :yellow, 
"norway" :yellow, 
"sweden" :orange}

As you can I cheated quite extensively with the blend function, but in this case it doesn't matter, since we're always blending yellow and red if we are blendning.

Merge-with uses the function given (blend) if two keys are colliding when merging. If you want to sum to maps, use (merge-with + {:x 1 :y 2} {:x 3}) and get {:x 4 :y 2}.

* And yes, there are some really old ladies speaking Swedish in Ukraine in the Gammalsvenskby.

torsdag 21 mars 2013

Minimal C2 bind! example

 C2 is a very sane vizualisation library for both Clojure and ClojureScript.

I had a hard time grasping bind! and unify, so I wanted to make the smallest example I could think of: generate a blue svg circle.

From O'Reilly's excellent introduction to SVG (available online) I found that the XML for an SVG circle would be something like:

<svg width="400px" height="400px">
 <circle cx="100" cy="100" r="45" style="fill: 'blue';"/>
</svg> 

I did once spend unreasonably long time to try to reach the DOM of an embedded SVG image, so from now on I embed the SVG DOM root in my html document, like this document with an empty SVG tag with id "bubbles":

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>SVG bubble</title>
 </head>
 <body>
 <div>

  <svg id="bubbles"></svg></div>
 </div>
 <script src="/cljscode.js"></script>
 </body>
</html>


C2 transforms tags like Hiccup do, but you can say where you want to add it with the macro c2.util/bind! (works a bit like in enlive).

And some very minimal ClojureScript code:

(ns c2test.core
 (:require  [c2.core :as c2core])
 (:require-macros [c2.util :as c2util]))

(defn ^:export c2bubble [x y r]
  (c2util/bind! "#bubbles"
                [:svg#bubbles {:style  {:display "block" :margin "auto" :height 200 :width 200}}
                 [:circle {:cx x :cy y :r r :style { :fill "blue"}}]]))


By opening a browser console (for javascript) I can set coordinates of a blue bubble with a command like

c2test.core.c2bubble(20,50,30);

Now on to world domination.

onsdag 20 mars 2013

Logging in Clojure

There are several competing logging facilities in Java and some custom made for Clojure.

If you are starting a new project, consider to simply use Logback. There's an bloggpost introduction to integrate logback with Clojure and it's github repo clojure-example-logback-integration.

This took me too long to figure out.