tisdag 28 maj 2013

Quick distinct set generator

In one application I needed to create quite many sets of distinct numbers in a limited range.

This is done quickly by doing a knuth-shuffle of an array with all the numbers in the range and do the random swapping as many steps as the size of the requested set.

Here's a typed and thread safe version - no reflections gives a super quick execution. The locking macro makes sure we don't mix everything up by parallel threads.

(let [vektor-length 400
      set-size 20
      shuffle-seed (long-array (shuffle (range vektor-length)))]

  (defn shuffle-n!
    "swaps the n first with any other in vector,
  (see knuth shuffle)
array locking has to be done by the caller, otherwise
we could have leaks -> duplicates"

    [^longs arr n]
    (dotimes [idx n]
      (let [shuffle-with-idx (rand-int vektor-length)
            a (aget arr idx)
            b (aget arr shuffle-with-idx)]
        (aset arr idx b)
        (aset arr shuffle-with-idx a))))

  (defn gen-distinct-set []
    (locking ^longs shuffle-seed
             (shuffle-n! shuffle-seed set-size)
             (let [new-distinct-set (long-array set-size)]
               (dotimes [idx set-size]
                 (aset new-distinct-set idx (aget ^longs shuffle-seed idx)))
               new-distinct-set))))



A test of this algorithm rendered a million sets in less than a second, which was more than sufficient for my application, otherwise my next step would have been to dig into thread-local constructs in flatland/useful thread-local macro, where every thread could have it's own seed vector.

(I don't check for duplicates, since they are unlikely enough for my application).