tisdag 12 augusti 2014

Steam-roller for nested Clojure data structures

There was a question on the Clojure Google group about structural sharing in rrb-vectors visavi a zipper-construction.

As we all know, it's not very easy to be certain of how much memory is allocated in Java/Clojure since the pointers are sometimes reused, and the situation gets even more complicated (for the better) when we have structural sharing and immutable objects.

The Clojure data structures are constructed from Object arrays, which creates shallow nested trees of either vector elements or key-value pairs in the case of PersistentMaps. Likely one could use reflection to dig deeper into the the inner parts of these data structures and arrays, Iroh and Clj-wallhack seems both worthy to try out.

I created a small function for getting a pointer to every  Clojure data structure from a nested structure (without taking care of it's aforementioned internals, though).
(steam-roller [:a {1 2 3 4} 1 1 :a :a])
-> ([:a {1 2, 3 4} 1 1 :a :a] :a {1 2, 3 4} 1 3 2 4 1 1 :a :a)
count: 11
;;identityHashCodes:
(map #(System/identityHashCode %) (steam-roller [:a {1 2 3 4} 1 1 :a :a]))
(1460028191
2005509007
1545655515
504432400
1778018115
1172818142
256714182
504432400
504432400
2005509007
2005509007)
count: 11
;;unique objects
(set (map #(System/identityHashCode %) (steam-roller [:a {1 2 3 4} 1 1 :a :a])))
#{2005509007
1172818142
1778018115
163464565
504432400
717359442
256714182}
count: 7
view raw example.clj hosted with ❤ by GitHub
(steam-roller [:a :b {:c 2 3 4}])
([:a :b {:c 2, 3 4}] :a :b {:c 2, 3 4} :c 3 2 4)
(defn steam-roller [l]
"gives a representation of all object pointers in a
nested structure by both adding pointers to everything seqable
as well as its contents
it also takes out keys and vals of maps, as well as the whole map"
(let [l [l]]
(loop [l1 l l2 '()]
(cond
(sequential? (first l1)) (recur (concat (first l1) (rest l1)) (cons (first l1) l2))
(map? (first l1)) (recur (concat (keys (first l1)) (vals (first l1)) (rest l1)) (cons (first l1) l2))
(empty? l1) (reverse l2)
:else (recur (rest l1) (cons (first l1) l2))))))
view raw steamroller.clj hosted with ❤ by GitHub

I think the best way would be to count objects with VisualVM or some other profiler, though.