pnathan: elephant bypasses fence to drink from pool (Default)

Common Lisp tooling typically isn't oriented around the continuous integration/build systems that we're accustomed to in 2014.

I don't much like that, particularly since continuous integration is one of the few practices that have been demonstrated to work, and work well, in software engineering.

Anyway, I updated my TOML parser to work with Travis CI (but the principles are the same regardless of which CI host you use). Here's the code, followed by a write-up.

As a YAML:

  - curl -O -L
  - tar xjf sbcl-1.2.6-x86-64-linux-binary.tar.bz2
  - pushd sbcl-1.2.6-x86-64-linux/ && sudo bash && popd
  - curl -O -L
  - sbcl --load quicklisp.lisp --eval '(quicklisp-quickstart:install)' --eval '(quit)'
  - sbcl --script run-sbcl-tests.lisp

Where the run-sbcl-tests.lisp looks as follows:

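(require "sb-posix")
;; Load Quicklisp from the home directory, if it is installed there.
(let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp"
                                       (user-homedir-pathname))))
  (when (probe-file quicklisp-init)
    (load quicklisp-init)))
;; Register the current directory so ASDF can find the system under test.
(defparameter *pwd* (concatenate 'string (sb-posix:getcwd) "/"))
(push *pwd* asdf:*central-registry*)
(ql:quickload :pp-toml-tests)
;; Exit 0 when the tests pass and 1 otherwise, so the CI host sees the result.
(let ((result-status (pp-toml-tests:run-tests)))
  (sb-posix:exit (if result-status 0 1)))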

Under the hood, pp-toml-tests:run-tests drives some FiveAM code to run the existing tests. FiveAM is, as far as I can tell, designed for on-the-fly interactive testing, as most Lisp tooling is, and its behavior was entirely too surprising when run under continuous integration. I've written my own unit testing framework, the "checker" system, designed for running in CI, but it's kind of, well, a bad hack. I should look elsewhere.
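For the curious, here is a minimal sketch of what a CI-friendly run-tests wrapper around FiveAM might look like. It is not the actual pp-toml-tests code: the suite name example-suite and the test addition-works are made up for illustration, and the sketch assumes Quicklisp is already loaded as in the script above. The idea is just to collect the results, print a report, and return a boolean that the caller can turn into an exit code.

```lisp
;; Assumes Quicklisp is already loaded, as in run-sbcl-tests.lisp.
(ql:quickload :fiveam)

;; Hypothetical suite and test names, for illustration only.
(fiveam:def-suite example-suite)
(fiveam:in-suite example-suite)

(fiveam:test addition-works
  (fiveam:is (= 4 (+ 2 2))))

(defun run-tests ()
  "Run the suite, print a report, and return true only if every check passed."
  (let ((results (fiveam:run 'example-suite)))
    (fiveam:explain! results)           ; human-readable report in the CI log
    (fiveam:results-status results)))   ; T when all results are passes
```

Returning a plain boolean keeps the driver script trivial: it only has to map true or false onto 0 or 1.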

A few key highlights of what's going on in this code:

  1. I use old-style ASDF registry manipulation to dynamically set where we should be expecting our systems under test to be.
  2. This code relies on the SB-POSIX SBCL extension - I expect other implementations offer equivalent functionality, but SBCL is what I primarily use.
  3. I hand-load Quicklisp each time. That's not ideal, and should be changed when I update pp-toml to TOML 0.3.
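On point 2, one possible way to avoid the SBCL-specific calls is UIOP, which ships with ASDF 3. This is only a sketch under that assumption, not something I have run on Travis; exit-code is a hypothetical helper name. uiop:getcwd returns the current directory as a pathname, and uiop:quit exits with a given status code.

```lisp
(require "asdf") ; ASDF 3 bundles UIOP

;; Same idea as the SB-POSIX version: put the current directory on the registry.
(push (uiop:getcwd) asdf:*central-registry*)

;; Hypothetical helper, named here for illustration.
(defun exit-code (result-status)
  "Map a test result (generalized boolean) to a process exit status."
  (if result-status 0 1))

;; At the end of a test script one would then write something like:
;; (uiop:quit (exit-code (pp-toml-tests:run-tests)))
```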

Hope this helps your Common Lisp integration testing!


Feb. 16th, 2014 02:00 am
One project that has languished for years is the CUSP Common Lisp plugin for Eclipse. There's a fork, Lispdev, also abandoned.

It's very aggravating, frankly. Lispdev doesn't appear to work - the preferences pane isn't even set up. Booting throws exceptions right and left.

CUSP doesn't really work after install either: it fails to hook into SBCL, and things are generally 'eh'.

Lispdev is about 28 KLoC and CUSP about 19 KLoC, all in Java, of course.

Feh. I want to get this working to the point of releasing a version that works with SBCL. Let's see if interest exists for further work past that.
pnathan: elephant bypasses fence to drink from pool (Default)
The Lisp REPL is an awesome tool, particularly when paired with SLIME or another customized evaluation system for live programming.

The same idea has led R, IPython, Macsyma, MySQL, Postgres, and other systems to provide their own REPLs.

However, a serious problem with the Common Lisp REPL is the inability to sling large amounts of data around easily, perform queries, etc. Nothing is built in to the system for holding millions of rows of data, querying them, and feeding the results into particular functions. Lists are too slow, vectors are too primitive, and hash tables are too restrictive. Further, queries start looking really hairy as lambdas, reduces, and mapcars chain together, whereas SQL has shown a clearly superior succinctness of syntax. Worse, these queries are ridiculously non-optimized out of the gate. I've had to deal with this situation in multiple industry positions, and it is *not* acceptable for getting work done. It is too slow, too incoherent, and too inelegant.
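To make the "hairy" complaint concrete, here is a small made-up example. The row format (plists), the *employees* data, and the function name salary-by-dept are all invented for illustration. The SQL version is one line; the plain Common Lisp version needs a filter, a hash table, and a loop just to group and sum.

```lisp
;; Equivalent SQL:
;;   SELECT dept, SUM(salary) FROM employees WHERE salary > 50 GROUP BY dept;
(defparameter *employees*
  '((:name "ann" :dept "eng" :salary 100)
    (:name "bob" :dept "eng" :salary 40)
    (:name "cat" :dept "ops" :salary 70)))

(defun salary-by-dept (rows)
  "Sum salaries above 50 per department; return an alist sorted by department."
  (let ((totals (make-hash-table :test #'equal)))
    ;; WHERE clause via REMOVE-IF-NOT, GROUP BY + SUM via a hash table.
    (mapc (lambda (row)
            (incf (gethash (getf row :dept) totals 0)
                  (getf row :salary)))
          (remove-if-not (lambda (row) (> (getf row :salary) 50)) rows))
    ;; Flatten the hash table into a sorted alist.
    (sort (loop for dept being the hash-keys of totals
                  using (hash-value total)
                collect (cons dept total))
          #'string< :key #'car)))

(salary-by-dept *employees*) ; => (("eng" . 100) ("ops" . 70))
```

Add a second grouping column or a join and this gets worse quickly, with no optimizer anywhere in sight.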

Hence, I am working on a solution; it started out as CL-LINQ, or, Common Lisp Language INtegrated Queries, a derivative of the C# approach. The initial cut can be found on my GitHub for interested parties. It suffers from a basic design flaw: 100% in-memory storage and using lists for internal representation.

I am proud to note that I've been able to begin work on an improved, entirely redesigned system. This system is built from several key pieces. The first and most important is the data storage system, which is what I've been working on recently.

Data is stored in data frames; each data frame has information about its headers. The data itself is in a 2D Common Lisp array, ensuring nearly constant access time to a known cell. Data frames are loaded by pages, each of which contains a reference to the data table as well as a reference to the backing store. Pages store information about the data in the data frame, and each page has a 1:1 mapping to a data frame. Pages are routed through a caching layer with a configurable caching strategy, allowing only data of interest to be loaded in memory at a given point in time. Finally, a table contains a number of pages, along with methods to access the headers, particular rows in the table, etc.
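A rough sketch of how these pieces could fit together is below. This is not the actual code from the project: the struct and slot names, the frame-cell helper, and the sample data are assumptions made for illustration, and the caching layer is left out entirely.

```lisp
(defstruct data-frame
  headers   ; list of column-name strings
  data)     ; 2D array indexed by (row, column)

(defstruct page
  frame           ; the loaded DATA-FRAME, or NIL while it is not in memory
  backing-store)  ; hypothetical handle, e.g. a pathname to the on-disk data

(defstruct table
  headers   ; column names shared by all pages
  pages)    ; list of PAGE structures

(defun frame-cell (frame row column-name)
  "Return the cell at ROW under COLUMN-NAME; signal an error for an unknown column."
  (let ((column (position column-name (data-frame-headers frame)
                          :test #'string=)))
    (unless column
      (error "Unknown column: ~a" column-name))
    (aref (data-frame-data frame) row column)))

(defparameter *frame*
  (make-data-frame
   :headers '("id" "name")
   :data (make-array '(2 2) :initial-contents '((1 "ann") (2 "bob")))))

(frame-cell *frame* 1 "name") ; => "bob"
```

The 2D array is what gives the near-constant cell access; the page's frame slot being NIL is one simple way a cache could mark data as not currently resident.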

After this system is done (perhaps 80% of the way done now), the index system can be built. By building the indexes separately from the raw storage system, I can tune both for optimal behavior - indexes can be built as a tree over the data, while the data can be stored in an efficiently accessible mechanism.

Finally, as the motivating factor, the query engine will be designed with both prior systems in mind. The query engine's complexity will lie in its interaction with the index system, to ensure high-speed JOINs. A carefully developed query macro system could actually precompile desired queries for optimal layout and speed, for instance.
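As a toy illustration of keeping the index separate from the storage, the sketch below builds an index over one numeric column of a 2D array. Note that it uses a sorted vector with binary search as a simple stand-in for a tree, and the names build-index and index-lookup are invented here; the real index system may look nothing like this.

```lisp
(defun build-index (data column)
  "Return a vector of (value . row-number) pairs for COLUMN of the 2D array DATA,
sorted by value. The raw data is left untouched."
  (let* ((rows (array-dimension data 0))
         (entries (make-array rows)))
    (dotimes (row rows)
      (setf (aref entries row) (cons (aref data row column) row)))
    (sort entries #'< :key #'car)))

(defun index-lookup (index value)
  "Binary-search INDEX for VALUE; return a matching row number, or NIL."
  (let ((low 0)
        (high (1- (length index))))
    (loop while (<= low high)
          do (let* ((mid (floor (+ low high) 2))
                    (key (car (aref index mid))))
               (cond ((= key value) (return (cdr (aref index mid))))
                     ((< key value) (setf low (1+ mid)))
                     (t (setf high (1- mid))))))))

(defparameter *data*
  (make-array '(3 2) :initial-contents '((30 "c") (10 "a") (20 "b"))))

(defparameter *id-index* (build-index *data* 0))

(index-lookup *id-index* 20) ; => 2
(index-lookup *id-index* 99) ; => NIL
```

Because the index only stores row numbers, the storage layout can change (pages, caching) without the lookup code caring.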

Features that will be considered for this project include:

  - integration with Postgres as the storage engine
  - compiled optimization of queries
  - pluggable conversion system for arbitrary objects and their analysis

At the completion of this project, a library will be available for loading large amounts of data into data tables, computing queries and processing upon them, and then storing the transformed data into external sources.


Oct. 13th, 2013 11:15 pm
Building an in-memory query system isn't trivially easy, even before worrying about scale.

