### Optimizations

May. 31st, 2015 11:34 amIntersection is, roughly, O(n^2). To be more precise, it's O(min(n,m)^2), where n and m are the lengths of the input vectors.

This turns out to be hugely important when implementing JOIN in a relational database, because you wind up intersecting n-ways for n tables.

Some empirical analysis of a 3-way intersection:

Intersect from largest to smallest:

CLINK> (time (ref *foo* :row (multi-column-query *foo* (list `(1 ,#'(lambda (s) (find #\1 s))) `(0 ,(lambda (x) (> x 300))) `(3 ,(lambda (x) (> x 900))))))) Evaluation took: 4.015 seconds of real time 4.149000 seconds of total run time (4.149000 user, 0.000000 system) 103.34% CPU 8,676,250,063 processor cycles 1,252,768 bytes consed

And from smallest to largest:

CLINK> (time (ref *foo* :row (multi-column-query *foo* (list `(1 ,#'(lambda (s) (find #\1 s))) `(0 ,(lambda (x) (> x 300))) `(3 ,(lambda (x) (> x 900))))))) Evaluation took: 0.766 seconds of real time 0.879000 seconds of total run time (0.879000 user, 0.000000 system) 114.75% CPU 1,655,372,433 processor cycles 1,074,592 bytes consed

We can clearly see that our runtime dropped by roughly 4x, cycles dropped about 7x, and our allocations by about 0.2x.

One of the most fun things about the Clink project is that it directly uses concepts from undergraduate computer science courses and applies them.