Spreadsheets for the New Millennium

"We shape our tools, and thereafter our tools shape us." ~ Marshall McLuhan

I've done a lot of writing on "big data" and NoSQL solutions over the past couple of months, and it's time to take stock of it all: "Why should anyone care about any of this?" I've spent some time in 2011 playing with the four main varieties of new "NoSQL" data stores:

  • I started with a simple little note-taking app, hosted on the cloud and backed by the document-oriented store MongoDB
  • The second application was a URL shortener, hosted on the cloud and backed by the key-value NoSQL store Redis -- based on a terrific example: An URL shortener with Ruby on Rails 3 and Redis by Christoph Petschnig. You can play with the URL shortener here: Mini-URLs -- bit.ly for the masses
  • The third application was a nice Hadoop / Cloudera model to perform word counts across a bunch of files. It was based on Phil Whelan's terrific posting, and it showed just how easy it can be to trigger massively parallel processing (MPP) with Hadoop and Cloudera
  • My final example is a "6 Degrees of Kevin Bacon" social-graph-solution generator written in the graph database Neo4j, based on Ian Dees' terrific Everyday JRuby posting
    This is a fun little app, and you can play the 6-Degrees game here: 6-Degrees of Kevin Bacon
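
To give a flavor of what the 6-Degrees app computes, here is a minimal sketch in plain Python rather than Neo4j or JRuby. It is not the code behind the app: the co-star graph is a made-up toy, the function name is hypothetical, and an ordinary breadth-first search stands in for the graph database's path finding.

```python
from collections import deque

# Toy, made-up co-star graph (an assumption for illustration, not real film data).
# Each name maps to the set of people they have appeared with.
CO_STARS = {
    "Kevin Bacon": {"Actor A", "Actor B"},
    "Actor A": {"Kevin Bacon", "Actor C"},
    "Actor B": {"Kevin Bacon"},
    "Actor C": {"Actor A", "Actor D"},
    "Actor D": {"Actor C"},
    "Actor E": set(),  # never appeared with anyone in this toy graph
}


def degrees_of_separation(graph, start, target):
    """Return the shortest chain of names from start to target, or None."""
    if start not in graph or target not in graph:
        return None
    queue = deque([[start]])  # each queue entry is a full path so far
    seen = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == target:
            return path
        # Breadth-first order guarantees the first path found is a shortest one.
        for neighbor in sorted(graph[person]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    chain = degrees_of_separation(CO_STARS, "Actor D", "Kevin Bacon")
    print(" -> ".join(chain), "| degrees:", len(chain) - 1)
```

A graph database earns its keep when the graph is too large or too interconnected to hold in a dictionary like this, but the question being asked is the same: what is the shortest chain of relationships between two nodes?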

So we have some examples up, but why does this matter? Who cares?

As it turns out we have a confluence of a number of technologies and trends that make solutions of an entirely new type possible. Here's what's new:

  • The Internet makes it possible to market to the masses, and keep detailed records of their response-to-stimuli
  • The Cloud makes it possible to spin up supercomputer-level resources with no capital expenses
  • The "NoSQL" family of data stores has arisen to deal with non-relational data challenges: Hadoop for MPP, Memcached, Redis and Voldemort for transient data, SETI and CouchDB for log data, etc.
  • Visualization tools like Tableau make it possible to create stories and narratives that result from the other tools here

So why does this matter, and how might things turn out for the pioneers of something new like this? For the answer let's look back to a previous example, in which an MBA student decided to try programming a "toy" personal computer to manage data for the business cases he encountered at school. In 1979 the MBA (Dan Bricklin) asked a buddy (Bob Frankston) to help him code up a solution on a primitive, toy Apple II. How did that turn out for them? As Steven Levy wrote, way back in 1984:

It is not far-fetched to imagine that the introduction of the electronic spreadsheet will have an effect like that brought about by the development during the Renaissance of double-entry bookkeeping. Like the new spreadsheet, the double-entry ledger, with its separation of debits and credits, gave merchants a more accurate picture of their businesses and let them see – there, on the page – how they might grow by pruning here, investing there. The electronic spreadsheet is to double entry what an oil painting is to a sketch. And just as double-entry changed not only individual businesses but business, so has the electronic spreadsheet.

And so it happened -- a novel application (the spreadsheet) on a toy machine gave business managers a new way to track and value their businesses. The "and value" is key here -- this was an application of the technology that Bricklin and Frankston never really considered. Advances to support valuation (1MB memory space and macros, in Mitch Kapor's Lotus 1-2-3) enabled the mergers and acquisitions boom that has continued to this day, and this application (in vehicles like hedge funds) created more than a dozen billionaires in 2007 alone.

So this is why NoSQL, big data, the cloud and visualization are so fascinating to watch. They enable solutions to problems that would have been inconceivable even a decade ago. The cloud era doesn't have its "spreadsheet" yet, but it will...