Thursday, May 19, 2011

The Big Easy - Spreadsheets for the New Millennium Part 2

Back in January I wrote a post that I called Spreadsheets for the New Millennium, and in that posting I suggested that Big Data would never take hold in the public consciousness until there were gateways to it -- tools that could let anybody play with it, tools that could make it easy.

I love the idea of Mardi Gras -- everybody picks a time (the Tuesday before the start-of-Lent Ash Wednesday) and a place (New Orleans' French Quarter) to get together to have a party. I love it because that practically never happens in technology. As much as we celebrate advances in technology, we need to celebrate them because progress (both discovery and adoption) is so hard.

Why does it seem to be so hard for communities to assemble and take up advances in science? This is one of the key questions in Thomas Kuhn's The Structure of Scientific Revolutions.

What does it take for an idea to break through? Here is Kuhn's answer:

Kuhn saw that for a new candidate paradigm to be accepted by a scientific community, "First, the new candidate must seem to resolve some outstanding and generally recognized problem that can be met in no other way.

Second, the new paradigm must promise to preserve a relatively large part of the concrete problem solving activity that has accrued to science through its predecessors..."

Or, as Steve Jobs might say: Think Different, but not Too Different.

For Big Data to become mainstream technology we need to satisfy two conditions:

  1. Solve a generally-recognized problem that can be met no other way. Check - Progressive Insurance can give you an estimated insurance quote on the spot, but only because they pre-calculate a rate quote for every car in the US every night -- using Hadoop. The problem and solution are simple, but without Hadoop they could never generate a quote for every car, every night...
  2. ...Preserve its predecessors. FAIL. Hadoop is a terrific tool, but unless you've had the good fortune to take MIT's 6.001 or Yale's CS 323 you've probably never seen anything like it.

The Hadoop community has desperately added Hive and Pig to try to reduce the foreignness-barrier of functional programming and get Hadoop over Kuhn's second barrier.
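To give a feel for why the map/reduce style looks foreign to spreadsheet users, here is a minimal sketch in plain Python (not Hadoop itself) of a nightly per-model quote calculation. The records, the `base_rate * risk_factor` formula, and every function name are hypothetical illustrations, not anything Progressive or Hadoop actually uses.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input records: (vehicle_model, base_rate, risk_factor).
RECORDS = [
    ("sedan", 500.0, 1.0),
    ("sedan", 500.0, 1.5),
    ("truck", 700.0, 2.0),
]


def map_phase(record):
    """Map step: turn one record into a (key, value) pair."""
    model, base_rate, risk_factor = record
    # Assumed toy formula for a quote; a real rating model is far richer.
    return model, base_rate * risk_factor


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: fold all values for one key into a single result."""
    total = reduce(lambda a, b: a + b, values)
    return key, total / len(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = map(map_phase, records)
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a spreadsheet the same result is a column formula plus a pivot table, which is roughly the gap that Hive and Pig try to close.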

Brook Byers of Kleiner Perkins put me onto Kuhn back when I was at Stanford, so it's no surprise that KPCB announced a $9M round with the Hadoop-y Big Data company Datameer. From the announcement:

Datameer’s Analytics Solution, which integrates the data mining power of Hadoop with a spreadsheet interface, enables business users to run analytics against very large data sets with no programming required. The product is designed to help users with little to no computer engineering experience handle massive amounts of data.

Kleiner and Datameer aren't alone in the race for Spreadsheets for the New Millennium; Factual is another player that's raised a lot of smart money.

These are great approaches, but in the best open-source tradition we'll bring up a solution that does a lot of the same things -- based in open-source code -- in one of my next postings.