pnathan: elephant bypasses fence to drink from pool (Default)
An amusing anecdote from the Clink development trenches.

Intersection is, roughly, O(n^2). To be more precise, it's O(min(n,m)^2), where n and m are the lengths of the input vectors.

This turns out to be hugely important when implementing JOIN in a relational database, because you wind up intersecting n-ways for n tables.

Some empirical analysis of a 3-way intersection:

Intersect from largest to smallest:


CLINK> (time (ref *foo* :row 
   (multi-column-query *foo*
     (list
       `(1 ,#'(lambda (s) (find #\1 s))) 
       `(0 ,(lambda (x) (> x 300))) 
       `(3 ,(lambda (x) (> x 900)))))))
Evaluation took:
  4.015 seconds of real time
  4.149000 seconds of total run time (4.149000 user, 0.000000 system)
  103.34% CPU
  8,676,250,063 processor cycles
  1,252,768 bytes consed




And from smallest to largest:

CLINK> (time (ref *foo* :row 
   (multi-column-query *foo*
     (list
       `(1 ,#'(lambda (s) (find #\1 s))) 
       `(0 ,(lambda (x) (> x 300))) 
       `(3 ,(lambda (x) (> x 900)))))))

Evaluation took:
  0.766 seconds of real time
  0.879000 seconds of total run time (0.879000 user, 0.000000 system)
  114.75% CPU
  1,655,372,433 processor cycles
  1,074,592 bytes consed




We can clearly see that our runtime dropped by roughly 4x, cycles dropped about 7x, and our allocations by about 0.2x.

One of the most fun things about the Clink project is that it directly uses concepts from undergraduate computer science courses and applies them.

Syndicate

RSS Atom

Most Popular Tags

Page Summary

Expand Cut Tags

No cut tags
Page generated Jul. 27th, 2017 04:39 am
Powered by Dreamwidth Studios