common lisp redux
Dec. 10th, 2022 07:21 pm

Common Lisp is the dynamic language I keep returning to. Garbage collection; a reasonably large number of libraries; not-absurd policies around exceptions (hi, Perl); not-absurd design around power (hi, Python); not-absurd design around formatting & naming (hi, Go).
Scala is, plausibly, at this point, the only other major language I rather like hacking in, and it's JVM-tied (sigh), slow to compile (sigh), and frankly rather verbose.
And, too, the advanced-type-system languages often turn into a game of "let's defeat the type system so it works right". That's not to knock Rust or its less popular peers - Nim, Pony - but it's just, after a while, a drag.
The basic problem with Common Lisp is the lack of effective static assertions - this isn't so bad when you're in the moment, but when the codebase gets larger and you wind up modifying things "far away" - or you come back to the project months later - things don't work quiiite as well.
I will have to keep grinding on this little problem, but I think I will need to cajole up a basic solution:
A dynamic-binding macro system which acts as a type asserter at eval time, checking that the current args seem to be valid.
More sophisticatedly, this would have to be a treewalker that finds, expands, and checks function invocations.
Essentially, if I write
(DEFUN-TYPED foo ((x int) (y funny-object))
(compute-funny y (+ x 1)))
then the -TYPED system should validate that yes, x is doing int-y things and yes, compute-funny takes a funny object in that position - and this should happen at eval time.
If, however, some dynamic binding *no-typed-assert* is turned off, then at eval time this should not occur.
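To make the runtime-check half of this concrete, here's a minimal sketch. The names are assumptions (the dynamic variable is spelled *typed-asserts* here rather than *no-typed-assert*), and the treewalking call-site validation described above is left out entirely:

(defvar *typed-asserts* t
  "When NIL, DEFUN-TYPED expands without emitting any checks.")

(defmacro defun-typed (name typed-args &body body)
  ;; Sketch only: each element of TYPED-ARGS is (arg type). Expand to a
  ;; plain DEFUN whose body is prefixed with CHECK-TYPE assertions when
  ;; *TYPED-ASSERTS* is true at macroexpansion time.
  (let ((arg-names (mapcar #'first typed-args)))
    `(defun ,name ,arg-names
       ,@(when *typed-asserts*
           (loop for (arg type) in typed-args
                 collect `(check-type ,arg ,type)))
       ,@body)))

This only guards the arguments as the function runs; the eval-time checking of callers would still need the treewalker.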
The previous effort I know about for Common Lisp gradual typing is Coalton, a sort of library-in-CL which turned into its own CL-hosted language. Meh. Then there's https://github.com/mmontone/cl-gradual, which seems to be sort of off and on.
Anyway. I might use one of those, or cobble together some 'orrible 'ack myself.
yet another defclass wrapping macro
Dec. 10th, 2022 06:23 pm

(defmacro defstruct* (name &rest slots)
  "Provides a DEFSTRUCT interface to a DEFCLASS"
  (flet ((accessor-symbol (sym)
           (intern (concatenate 'string (string name) "-" (string sym)))))
    (let* ((actual-slots (etypecase (car slots)
                           (string (cdr slots))
                           (symbol slots)
                           (cons slots)))
           (slot-list (loop for s in actual-slots
                            collect (etypecase s
                                      (symbol
                                       `(,s :initarg ,s :accessor ,(accessor-symbol s)))
                                      (cons
                                       `(,(car s) :initarg ,(car s)
                                                  :accessor ,(accessor-symbol (car s))
                                                  :initform ,(cadr s)))))))
      `(defclass ,name ()
         ,slot-list
         (:documentation ,(etypecase (car slots)
                            (string (car slots))
                            (symbol (format nil "The struct defining ~a, containing slots ~{~a~^, ~}"
                                            name slots))))))))
I like defstruct, a lot, in terms of its speed and simplicity of interface. But it gets rather miserable when you want to integrate with CLOS, IME. So, another stab at it (previous stab was https://github.com/pnathan/defobject ). I think this one is considerably cleaner stylistically.
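A quick usage sketch (a hypothetical example, not from the original post) - with a docstring, both bare symbol slots and (slot default) slots work:

(defstruct* point
  "A 2D point."
  (x 0)
  (y 0))

;; Expands to a DEFCLASS named POINT with POINT-X / POINT-Y accessors
;; and the slot symbols themselves as initargs:
(let ((p (make-instance 'point 'x 3 'y 4)))
  (list (point-x p) (point-y p)))   ; => (3 4)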
Language debates: Ocaml
Oct. 5th, 2022 05:04 pm

A few years ago, I went on a "best language for home development" hunt. I sorted through pretty much all I knew and came out the other side with "Ocaml".
Some 1100 lines of Ocaml later, I terminated the project.
Couple reasons.
- Generics without type specialization are painful. At some point, you have to concretize, and different types are, actually, different.
- Similarly, the lack of good generics meant things like printing was painful[1].
- The libraries for web interaction were very much inadequate, and tended to be async. I don't like or want async in my codebases, so that was aggravating.
- Similarly, the libraries for interacting with postgres weren't great.
- The build system was initially horrible; eventually things moved to a more stable point, but my hand was really burnt.
- Modules fried my hair. Powerful and deeply opaque - even after reading most of the docs.
- Docs were bad.
I'd like to believe it's better now. I invested a lot of time into that code. The concision was fantastic. That said, the friction was very high.
I went and did things with Scala after that...
[1] See Go.
Nothing in consumer electronics is so frustrating to me right now as the state of music playing.
- everything (ish) is streaming. But different means of getting audio to speakers exist.
- "Youtube Music" wants to autoplay your life out.
- "Google Play Music" was fine, but it was axed in favor of YT. And it removed "upload your music".
- Amazon music also whacked upload-your-music and went autoplay.
- Reliably instructing an Alexa device with words that aren't some deformed truncation of normal English goes like this: "hahahaha, no".
- Google Home devices are better, but I couldn't turn off autoplay, for even more fun.
MP3 players are a defunct line of hardware, now reserved for crappy, crappy iPod knockoffs and some running devices.
None of the above needed to happen except, maybe, the hardware (due to profit/loss aspects).
Hosting my own music streaming service is doable, I guess, but then I lose access to the streaming libraries for which I (happily) pay. And albums are expensive.
What's extra fantastic is that there doesn't seem to be a standard "wifi speaker" protocol, so I can't reliably send music from my android apps to Alexa or Google Home.
This is all bound up with the choices made about how the software was written and the agency the individual programmers had at these different companies.
So it seems that the right way to do this is to buy a phone (because companies prefer to stream at full quality through apps, not browsers), hook it to wifi, and then have it wired into a speaker system. The phone would also host all my mp3s.
Fantastically overcomplicated....
golang std lib deficiency
Sep. 1st, 2022 01:37 pm

https://go.dev/play/p/po5A8KG6Zpv?v=goprev
package main

import "fmt"

func MapMerge1[K comparable, V any](a map[K]V, b map[K]V) map[K]V {
	m := map[K]V{}
	for k, v := range a {
		m[k] = v
	}
	for k, v := range b {
		m[k] = v
	}
	return m
}

func main() {
	disabledThing := map[string]interface{}{
		"enabled": false,
	}
	other := map[string]interface{}{
		"options": map[string]interface{}{
			"thungus": false,
		},
	}
	merged := MapMerge1(disabledThing, other)
	fmt.Println(merged)
}
There's no reason why a generic Merge and a MergeRecursive function shouldn't be implemented in the standard library.
Note that this essentially requires `interface{}` as the value type.
The actual V type should be the unification of both input types: if that can't be resolved intelligibly, then barf.
Part of the issue here is that the type system _should_ (but doesn't) have tagged union types. That would be a reasonably elegant solution to this.
Optimizations
May. 31st, 2015 11:34 am

Intersection is, roughly, quadratic. To be more precise, it's O(n*m), where n and m are the lengths of the input vectors.
This turns out to be hugely important when implementing JOIN in a relational database, because you wind up intersecting n-ways for n tables.
Some empirical analysis of a 3-way intersection:
Intersect from largest to smallest:
CLINK> (time (ref *foo* :row (multi-column-query *foo*
                               (list `(1 ,#'(lambda (s) (find #\1 s)))
                                     `(0 ,(lambda (x) (> x 300)))
                                     `(3 ,(lambda (x) (> x 900)))))))
Evaluation took:
  4.015 seconds of real time
  4.149000 seconds of total run time (4.149000 user, 0.000000 system)
  103.34% CPU
  8,676,250,063 processor cycles
  1,252,768 bytes consed
And from smallest to largest:
CLINK> (time (ref *foo* :row (multi-column-query *foo*
                               (list `(1 ,#'(lambda (s) (find #\1 s)))
                                     `(0 ,(lambda (x) (> x 300)))
                                     `(3 ,(lambda (x) (> x 900)))))))
Evaluation took:
  0.766 seconds of real time
  0.879000 seconds of total run time (0.879000 user, 0.000000 system)
  114.75% CPU
  1,655,372,433 processor cycles
  1,074,592 bytes consed
We can clearly see that our runtime dropped by roughly 5x, cycles dropped by roughly 5x, and our allocations by about 15%.
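The trick is simply to intersect starting from the smallest candidate set, so every subsequent intersection runs over an already-shrunk input. A hypothetical sketch of the idea (not the actual CLINK code):

(defun intersect-all (sets)
  ;; Sort the candidate row sets smallest-first, then fold INTERSECTION
  ;; over them so intermediate results stay as small as possible.
  (reduce #'intersection
          (sort (copy-list sets) #'< :key #'length)))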
One of the most fun things about the Clink project is that it directly uses concepts from undergraduate computer science courses and applies them.
travis CI with Common Lisp.
Dec. 29th, 2014 02:53 am

Common Lisp tooling typically isn't oriented around the continuous integration/build systems that we're accustomed to in 2014.
I don't terribly like that, particularly since it's one of the few tools that have been demonstrated to work, and work well, in software engineering.
Anyway, I updated my TOML parser (https://github.com/pnathan/pp-toml) to work with Travis CI (but the principles are the same regardless of CI host). Here's the code, followed by a write-up.
As YAML:
before_script:
- curl -O -L http://prdownloads.sourceforge.net/sbcl/sbcl-1.2.6-x86-64-linux-binary.tar.bz2
- tar xjf sbcl-1.2.6-x86-64-linux-binary.tar.bz2
- pushd sbcl-1.2.6-x86-64-linux/ && sudo bash install.sh && popd
- curl -O -L http://beta.quicklisp.org/quicklisp.lisp
- sbcl --load quicklisp.lisp --eval '(quicklisp-quickstart:install)' --eval '(quit)'
script:
- sbcl --script run-sbcl-tests.lisp
Where the run-sbcl-tests.lisp looks as follows:
#!/usr/local/bin/sbcl
(require "sb-posix")
#-quicklisp
(let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp"
(user-homedir-pathname))))
(when (probe-file quicklisp-init)
(load quicklisp-init)))
(defparameter *pwd* (concatenate 'string (sb-posix:getcwd) "/"))
(push *pwd* asdf:*central-registry*)
(ql:quickload :pp-toml-tests)
(let ((result-status (pp-toml-tests:run-tests)))
(sb-posix:exit (if result-status 0 1) ))
Under the hood and in Lisp, pp-toml-tests:run-tests drives some FiveAM code to run the extant tests. FiveAM is, as far as I can tell, designed for on-the-fly interactive testing, as most Lisp tooling is. It turned out to be surprisingly awkward to integrate with continuous integration. I've written my own bad hack of a unit testing framework, the "checker" system, designed for running in CI, but it's kind of, well, a bad hack. I should look elsewhere.
A few key highlights of what's going on in this code:
- I use old-style ASDF registry manipulation to dynamically set where we should be expecting our systems under test to be.
- This code relies on the SB-POSIX SBCL extension - I expect other implementations can provide the same functionality, but SBCL is what I primarily use.
- I hand-load Quicklisp each time. That's not ideal, and should be changed when I update pp-toml to TOML 0.3.
Hope this helps your Common Lisp integration testing!
(no subject)
Oct. 28th, 2014 11:04 pm

So.
In order to move pnathan.com to a DigitalOcean box, I had to:
- diddle DNS with Network Solutions away from the VPS to my DigitalOcean box.
- diddle DNS with Gandi (a classier provider)
- figure out which user can log onto a DigitalOcean CoreOS box.
- upload docker image to dockerhub
- learn that my custom static site generator, Volt, was never in a stable state when I generated pnathan.com. pnathan.com is now only a collection of markdown files.
- figure out systemd enough to get docker running on boot
- fix pnathan.com to have a derpy page but not an apache index
- I bounce Jenkins and find out that Jenkins is *now* crashing the JVM somehow, thus dropping my ability to keep my network sane - i.e., my regularly scheduled Ansible run.
What I want for a scripting language
Mar. 8th, 2014 05:06 pm

In this situation, I have, roughly, several options for programming languages to do my job in. My job will entail devops/tools/scripting sorts of things. Things that I need to be able to do include - shelling out, string futzing, regex work, bundling thingies and sending them to remote servers, and *so forth*.
On the plate of possibilities we have - Ruby, Python, Clojure, and Go. Python is the officially preferred language (as well as the common tooling language).
Roughly, the first three languages are similar - dynamic typing, interpreted(ish), unsound type systems, relatively easy regexes, and not terribly fast. Go is different - it is statically typed, compiled, also has an unsound type system (lots of potshots have been made here), and is actually reasonably fast. Go and Clojure both have a relatively sane multithreading story.
I evaluate languages on several areas:
0. Maturity. This is a bit hard to define, but it roughly translates to the amount of times angry engineers have forced changes in the language because mistakes were made (along with the willingness of maintainers to make this change). A good language has had this happen early on and very few aspects are widely considered to be wholly bad design at this point.
1. Expressivity. Where does it live on the Blub lattice?
2. Speed. Contrary to many popular opinions, speed does matter at times, and you need to be able to break out the optimization guns when the occasion demands.
3. Correctness of written code. How likely is an arbitrary piece of code to have bugs? Something like Agda might rate very low here, and something like hand-written hex might rate very high. Correctness is typically achieved by a combination of tests and type systems. Tests rely on having reasonable "seams" in the code creation process, where test instrumentation can be injected and the module evaluated for correctness.
4. OODA decision rate. How fast can you observe your code's effects and change it? Lisps traditionally rate extremely high on this, with C++ ranking supremely low.
5. Quality of implementation. Separate from maturity, the compiler/interpreter actually defines the language you use - it reifies the language. It should, therefore, be a very solid system.
6. Library breadth and depth. Much like implementation and maturity, libraries that have been around a long time, and have had the bugs ironed out of them provide a better service to the language.
I plan to work through each of the 4 languages and write a simple tool to parse logfiles in each language, summing up my experiences with each language as I go through.
Cusp/lispdev - distractions
Feb. 24th, 2014 02:06 pm

"I've taken the Lispdev code and gotten it sort of more or less working. That is, I can communicate with the REPL within Eclipse. Stack traces sort of work. Highlighting works. And a few other things work. There's a lot more capability in the code that needs to be enabled and winkled out." - from my post on reddit/r/lisp/.
Looking at automating the builds now.
It's very aggravating, frankly. Lispdev doesn't appear to work - the preferences pane isn't even set up. Booting throws exceptions right and left.
CUSP doesn't really work on install, failing to lock onto SBCL, and things are, eh.
Lispdev is about 28KLoC and Cusp about 19KLoC, all in Java, of course.
Feh. I want to get this working to the point of releasing a SBCL-working version. Let's see if interest exists for further work past that.
I'm on record denying the idea of software engineering. We don't work with the physical the way physical engineers do. We don't have the science<->practice chain the way the regular engineers do. Worse, I think the IEEE SWEBOK is horse poop.
But there's still a practice and rigor to software development. I've been in this world for a few years now, in a few different teams. I've drunk from the wisdom of others. I think I can say something not entirely worthless about the matter of writing good software.
The first thing is the goal.
- What is the intended product?
- When must it be done by?
- To what end are we undertaking this effort?
These questions describe the scale of the effort, the kind of people asked to work on the effort, the tools used during the effort, and the process best implemented during the execution. This is a very simple set of questions designed to understand what the problem is and where you want to go. Put simply, these are the strategic points required to put substrategies and tactics into play.
The common ends in business are "makes us money" or "saves us money". The time it must be done by is usually impossible and best described as "yesterday". The actual product is often mutable and is the usual concern of software creators.
The second thing to consider is the famous cost-quality-speed triangle (pick two). Your company mandates the quality; while you, the creator, control it, your company may find you lacking if you mishandle it. This is only partially true of speed, however. Very few products make or break a company by release date, and software projects are notorious for being late - particularly when estimates from the line people are disregarded. Cost is, again, something you don't really control for software projects: it's labor + overhead for your workspace + support staff.
As the creator of software, you can materially affect speed and quality. Let us presume that you are going to work your usual 35-45 hour work week and have a reasonable competence at the particular technology that you're dealing with - same as everyone else. How do you manage - from your level as an individual contributor and perhaps mentor to others - keeping things working in alignment with your company? That is the next blog post.
The third thing to consider is politics, or, more euphemistically, "Social questions". An old rule of thumb is that software architecture reflects the organization it was developed in. Another rule of thumb is that people are promoted to their level of incompetence. Yet another is that most organizations stratify vertically and attempt to build internal empires. Let us not assume that our fine institution is not subject to these pressures. It probably has already succumbed in part.
Several implications result from this.
- Only you and the others tasked with your project are actually incentivized to complete it. Others may be incentivized to support your organization. Result - when you have to work with other groups, ensure that they have an axe to grind, a wagon to pull, some interest in helping you get your job done. You need them to help, but they can either help you right now or maybe later. I can't remember how many emails I've written that have gotten dropped on the floor.
- The software architecture is not per se the best one for the technical task. It did, however, represent a satisficed social architecture for the task & person work division to allow the workers to effectively operate.
- Your software probably duplicates someone else's, and they won't want to merge. Your pressures and your silo have subtly different constraints and needs than other silos. Often, the constraint is as simple as "Bob reports to me and fixes bugs the day I ask, but you, Alice, don't report to me and may find my bug to be rather PEBCAK and not fix it". While this is more than slightly silly, it really does have implications. There's no use having centralized software if only some users are served. The others will decentralize themselves and get their jobs done.
- Incentives usually produce results designed to ensure more incentives. E.g., if you are not rewarded for fixing problems but instead are rewarded for moving fast, then you won't fix bugs, you'll move fast.
None of these things have to do with software engineering; they hold pretty well across any producing endeavor. But they lay the context and foundation for the next blog post.
Critters business model
Feb. 1st, 2014 05:04 pm

It seems clear that IAP enables 'whale' behavior, which substantially increases total & average revenue per user. Since I am someone who likes making money (and likes having a way to keep getting the customer's money), it makes sense to figure out how to 'play' IAP.
By the way, I'm mentioning numbers here, but these numbers are, flatly, provisional and are not final.
Key idea: you get your game and the non-cosmetic content by paying an up-front fee. Cosmetic & 'hard-core' play costs money. Things that increase database burden cost money to cover it. No handwaving.
Set up account and play the game, free, for 24 hours.
Purchase the game for a price point (between 2.99 and 4.99).
This gets you access to N critters. You can play the critters as much as you like, as long as you like, until the service shuts down. Since the game is running online, you will get updates as part of your purchase.
Certain add-ons will cost money. For instance,
- a default emotion set will be available for dogs, cats, and foxes (I have to figure out the picture rights for dogs & foxes, since I don't have either). If you want to upload your own pictures, that will be a charge. Say, $0.99 for a picture set.
- If you want to write a history of your critter, it will cost some $X (not too expensive) per critter. Maybe more if you really want to write a lot. This is directly targeted at role-players and people who want to record their virtual pet's story.
- Another idea might be swag. Say, you can feed your cat - Amos - kibble every night. He is a happy cat. However, you can buy 'Tuna' from the swag menu for $0.25[1]. Amos adores tuna and adores you for giving him tuna (boost in stats, and he behaves better). You feel good, and I get a wee bit of money.
Fundamentally, I am someone who played games as a teen and young adult - you bought them, and that's all. No continual mooching. I played WoW. It seemed reasonable to pay a monthly cut. This worked out, as I knew that they were keeping servers alive and improving the game. I don't want to play a game where I have to 'insert a quarter to keep playing'. Holding your experience hostage to money seems... off. It's not above-board. It's like if a hotel informs you that in order to turn the lights off to go to sleep, you have to pay extra. And then to pay to turn the lights back on. Yech.
Seems much more fair and honest to charge up-front for a fair and reasonable service, with any premium services available and marked as such.
[1] This might actually not be workable due to payment processors wanting a cut. If they want a $0.25 min transaction, it'll have to be more.
Code as Art
Jan. 25th, 2014 10:49 pm

It used to be my personal site, but you know how hard it is to spell out faegernis to people? Too hard. Anyway -
Perhaps about 18 months ago I had the idea of "code as art". In particular, is it possible to consider code as art without reference to extant art forms (poetry, visual design)? Some hold that obfuscated code (IOCCC) is an art, but I'm not looking for crafty work, I'm looking for Art.
Another way to think about it is - what makes Quicksort and Floyd-Warshall's algorithms so beautiful?
Or, what if Quicksort and Floyd-Warshall's algorithm represent a minimalist aesthetic best suited to the Modernist conceptions (e.g., Apple hardware design taste), and other equally viable aesthetics for code exist, such as baroque & rococo aesthetic?
I want to explore these ideas with faegernis. I don't know where I'm going to land or how I'm going to get there, but I think it's something that needs doing.
Coding vacation
Jan. 19th, 2014 11:41 pm

I've taken an at-home coding vacation this week and last week. I've been doing reading on model railroads and sailing, as well as a smattering of other books.
I flipped through the list of the papers from POPL 2014; it's kind of frustrating to me - they all appear to be focused on type systems. I'm not sure why this is such a thing in programming language design (perhaps it is enjoyable to work on the math!). But I don't think the big problem - the software crisis, if you will - is in data types. Data types in most of the developer's world sit in the grungy province of the C family (C, C++, Java, C#, ObjC) or in the more fluid province of the Perl family (Perl, Ruby, Python, Groovy, etc). Neither of these families has the type system problem solved even at the level of the ML languages (which are, AFAICT, sort of old hat today). So from a "cool insights useful in fifteen years" level, I'm depressed! These results probably won't ever get to an Ordinary Practitioner.
For me, the bridge into the Best Possible World is founded in Software Contracts, most commonly associated with Eiffel and occasionally supported in obscure extensions for other languages. Suppose that I claim not only that some variable quux is an int (solvable in C), but in fact that it is between 0 and 10 (solvable in Ada, somewhat in SBCL, and perhaps in Agda & the family of dependently typed languages), and not only that, but that quux will always be passed into frobbing_quux and used to generate a result (similarly quantified). Let me call that a "Code Type System". It may be that complete analysis of the code demands a complete type system for the data. Certainly some of this work has already been done in the model checking & static analysis community, as they can already detect certain concurrency bugs and malloc/free errors. Then, of course, we find ourselves building a model of the program in our "Code Type System" before we actually build the program, and if we find we need to alter the system, we have to rebuild the model framework. This is well understood to be the classic failure mode of fully verified software development.
Let me instead take this thought experiment by way of an example, using Common Lisp.
(def-checked-fun mondo (quux baz)
  "Some docs"
  (check (and (calls-into frobbing_quux quux)
              (type-of quux integer)
              (range quux 0 10))
    ;; do stuff
    (frobbing_quux quux)
    ;; do stuff
    ))
The theoretical def-checked-fun macro will examine the code, both as-is and macro-expanded, verifying that it does indeed appear that quux satisfies the required judgement. Of course, we can't answer that quux is of a particular type or that it falls into a certain range at this point: in order to make that judgement, either run-time checks need to be added (to constrain possibilities), or intraprocedural analysis needs to be performed. However, the calls-into check can be either demonstrated or an "unproven" result returned. This is simple - some CAR needs to be frobbing_quux with quux in the argument list.
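A hypothetical sketch of just that calls-into check - the name and interface are made up, and it walks the raw body rather than the macroexpansion:

(defun calls-into-p (body function variable)
  ;; Walk the body looking for any form whose CAR is FUNCTION and whose
  ;; argument list mentions VARIABLE directly. Returns NIL rather than
  ;; "unproven" when the call is buried under further evaluation.
  (labels ((walk (form)
             (and (consp form)
                  (or (and (eq (car form) function)
                           (member variable (cdr form)))
                      (some #'walk (cdr form))))))
    (some #'walk body)))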
Some of this, in some measure, is already done in SBCL. I have long thought (at least 8 months now) that a compile-time macro system (def-checked-fun is an example) could provide some very interesting insight, particularly if you begin to share information.
The worst possible result is that the checker returns something to the effect of "No possible errors detected; no data found to make judgement". In short, it's useless. The best possible result is that intraprocedural contracts can be built based on not simply types but expected semantics in a succinct system (only modelling what is useful), then when time comes for running, all this information is quietly removed.
I propose that this is a useful idea, and practical, too - for an Ordinary Practitioner. It's like lightweight unit tests that require almost no lines of code and almost no tweaking.
It's important to understand, of course, that I'm thinking about "correct" results, not "complete" results - two quite different things! Dawson Engler's talk/paper "A Few Billion Lines of Code Later" really strikes at the heart of what makes a system useful for this kind of work in practice and the exigencies of real-world work. I won't summarize it here.
What I don't know is whether Coq or other systems (ACL2, etc.) already implement this sort of fluid procedural static type checking. Possibly they do.
Valentines App 2014
Jan. 8th, 2014 11:13 pm

I've written the app and got a landing page developed. I'm thinking that I'll go two ways:
- Gumroad for a non-customizable Valentines Day app.
- PayPal for a customizable app. This will involve a custom build process for each request, and I'm not ready to automate this process yet via some sort of Ruby on Rails job, since I don't have any revenues. It'll be faster and simpler just to do a custom build and .app ship for each user for a while, I think.
Adventures in Rust
Jan. 8th, 2014 11:02 pm

I have a playground of data structures (flaky data structures, get the pun? ha ha). I've roughly kept it maintained for about a year now & updated some of it recently to Rust on master.
Wow. So change compared to Rust 0.6.
* No more @ pointer. Now it's rc::Rc, .borrow(), and .clone(). Really tedious.
* Total confusion on my part on how to build traits for things that wind up being rc::Rc'd. Still have no idea. I'll need to sort this out with #rust at some point.
* match(ref foo, ref bar, ref baz) is new. Argh!
Other than that, there are a few oddities but nothing catastrophically weird. Although it was vaguely amusing writing myself an infinite loop by accident, I was able to get the linked list and circular buffers compiling.
Next time I'm looking for low-stress coding & debugging, I'll fix up the binary tree and start work on a 1-dimensional range tree (Data structure #1 in Samet's multi-dimensional data structures book).
cl-linq information
Dec. 25th, 2013 06:56 pm

The interactive REPL is one of the great strengths of Lisp; this insight has led to R, IPython, Macsyma, MySQL, Postgres, and other systems having their own REPL.
However, a serious problem with the Common Lisp REPL is the inability to sling large sums of data around easily, perform queries, etc. It's simply not built into the system to hold multimillion-row data sets, perform queries on them, and feed them into particular functions. Lists are too slow; vectors are too primitive; hash tables are too restrictive. Further, queries start looking really hairy as lambdas, reduces, and mapcars chain together. SQL has shown a clearly superior succinctness of syntax. Worse, these queries are ridiculously non-optimized out of the gate. I've had to deal with this situation in multiple industry positions, and it is *not* acceptable for getting work done. It is too slow, too incoherent, and too inelegant.
Hence, I am working on a solution; it started out as CL-LINQ, or, Common Lisp Language INtegrated Queries, a derivative of the C# approach. The initial cut can be found at my github for interested parties. It suffers from a basic design flaw: 100% in-memory storage and using lists for internal representation.
I am proud to note that I've been able to begin work on an entirely improved and redesigned system. This system is derived from several key pieces. The first and most important is the data storage system, which is what I've been working on recently.
Data is stored in data frames; each data frame has information about its headers. Data itself is in a 2D Common Lisp array, ensuring nearly-constant access time to a known cell. Data frames are loaded by pages, each of which contains a reference to the data table, as well as a reference to the backing store. Pages store information about the data in the data frame. Each page has a 1:1 mapping to a data frame. Pages are routed through a caching layer with a configurable caching strategy, allowing only data of interest to be loaded in memory at a given point in time. Finally, a table contains a number of pages, along with methods to access the headers, particular rows in the table, etc.
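To make the layering concrete, here's a hypothetical sketch of the shape of these objects - the class names and slots are illustrative only, not the actual cl-linq code:

(defclass data-frame ()
  ((headers :initarg :headers :accessor frame-headers)
   (data    :initarg :data    :accessor frame-data))
  (:documentation "Column headers plus a 2D array holding the rows."))

(defclass page ()
  ((frame         :initarg :frame         :accessor page-frame)
   (backing-store :initarg :backing-store :accessor page-backing-store)
   (stats         :initarg :stats         :accessor page-stats))
  (:documentation "1:1 wrapper around a data frame; routed through the cache layer."))

(defclass table ()
  ((pages :initarg :pages :initform nil :accessor table-pages))
  (:documentation "A collection of pages, plus methods for headers and row access."))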
After this system is done (perhaps 80% of the way done now), then the index system can be built. By building the indexes separate from the raw storage system, I can tune both for optimal behavior - indexes can be built as a tree over the data, while the data can be stored in an efficiently accessible mechanism.
Finally, as the motivating factor, the query engine will be designed with both prior systems in mind. The query engine's complexity will be interacting with the index system, to ensure high speed JOINs. A carefully developed query macro system could actually precompile desired queries for optimal layout and speed, for instance.
Features that will be considered for this project include:
- integration with postgres as the storage engine
- compiled optimization of queries
- a pluggable conversion system for arbitrary objects and their analysis.
At the completion of this project, a library will be available for loading large amounts of data into data tables, computing queries and processing upon them, and then storing the transformed data into external sources.