Sunday
Sep 07, 2014

I Saw Sparks

I've long been a follower of Joel Spolsky and his writings on software development, and some of them (e.g. Can Your Programming Language Do This?) are practically QED for their topics. I think I can go him one better on another similarly terrific piece of his: Smart and Gets Things Done.

You can't argue with "smart" — as a software engineer, manager or executive you have to expect that your discipline and skill set will turn over 99% (the remaining 1% being vi and UNIX commands from the '80s) every 3-4 years. If you're not really smart and really dedicated you can't possibly keep up past a single product cycle.

"Gets Things Done" is similarly dispositive — even the brainiest developer won't get products out if they

  • Stay locked in their Microsoft-mandated individual offices, never talk to anyone, and stay alive only if/because someone keeps sliding pizzas under their door
  • Are so spectacularly abrasive that they make the rest of your team take to living in their Microsoft-mandated offices — leading to endless shifts in the product schedule (and ever-increasing pizza bills!)

Interpersonal skills tend to be undervalued in technology, but they are essential to getting things done. Even more than pure skill, "Gets Things Done" is a testimony to human grace: it takes a lot of humility to get products out the door, and you might be a Putnam Fellow, but without some give and take all that brilliant code will never make it off your machine!

To Spolsky's pair I'd add one more category — one more thing that I look for when I'm hiring or building teams: "Sparks." Sparks are those odd nuggets that pop up on a resume — seemingly unrelated to anything, that indicate the kind of rare gifts that make our world the wonder it is. I once interviewed (and hired!) a fantastic software engineer, former graduate EE whose "spark" was that she'd done research work on (and helped write the book on) chinchillas! She was qualified in all the Spolsky ways — but to me the chinchilla book was the clincher. Few are those who do EE research on chinchillas, but only the rarest write the book following that research.

Smart?
Check!
Gets Things Done?
Check!
Chinchillas? Chinchillas??? Chinchillas!

HIRED!

Sparks have brought me some of the best things in my life: my wife Kate and some great friends and co-workers (among them a founding Menudo member, another of the greatest musicians of all time, several rare inventors and scientists, and artists and more…)

Sparks are also one of the things I look for in software development efforts, and thus (for my own efforts, and for my work) I tend to stay away from development approaches and tools that require teams and timelines that only Cecil B. DeMille could master.

This might be fine for some, but I think it's just too much to expect the apex of human expression and genius to appear under those conditions. We don't know what we're doing, but that doesn't stop us from trying, and genius is sometimes the result. As I've written before, the great breakthrough that was Lotus 1-2-3 came from its macro capability — the magic decoder ring that gave spreadsheet users the ability to do things its inventors might never have dreamed of!

This is what makes Spark such a big win for Big Data — it's light and interactive, and it rewards people who might have that spark of insight — even if they can't afford a 10-geek programming team. With the balance of this post we'll get Spark started, and in my next post we'll go deeper into the wonders that Spark can do.

First, we're going to want to update our Java runtime and JDK environments. There are options in this space now, but as a former Oracle employee (and still an Oracle rooter and fan) I'll head directly over to Larry's site for what we need:

And we're set. I'm running on a Macintosh and I've chosen Java 8 (finally! closures!!). OS X ships with its own version of Java, though, so you're going to want to add a little magic to your ~/.bash_profile to make it recognize the latest JDK (a sample is shown below). Once that's in, you can run
$ java -version

java version "1.8.0_20" Java(TM) SE Runtime Environment (build 1.8.0_20-b26) Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)

from your terminal to confirm that we're all ready with Java.
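
For reference, the "little magic" in ~/.bash_profile can be as small as these two lines (a minimal sketch — /usr/libexec/java_home ships with OS X and locates the matching JDK; adjust the version flag to whatever you installed):

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
export PATH=$JAVA_HOME/bin:$PATH

Open a fresh terminal (or source ~/.bash_profile) and the version check above should report the new JDK.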

Next comes an installation of (or update to) Hadoop. I know I've spent most of my past four Big Data posts moaning about Hadoop's batch-y, non-interactive style, but for data that really is embarrassingly parallel it's the tool you need. Cloudera has taken a lot of the adventure out of Hadoop installations, but for my Mac I'm grateful to Yi Wang's Terrific Tech Note on Hadoop for Mac OSX. I've installed Hadoop 2.4.1, and the tech note covers the installation and does a nice job on getting started with the core-site.xml, hdfs-site.xml, and yarn-site.xml setup as well. Again, once you can run

$ hadoop jar ./hadoop-mapreduce-examples-2.4.1.jar wordcount LICENSE.txt out

from the MapReduce Examples folder you're set. Now, for Sparks and the show we've all been waiting for:
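
Here's a minimal sketch of that first Spark run (assuming you've grabbed a prebuilt Spark release from the Apache Spark downloads page, untarred it, and launched ./bin/spark-shell — the shell hands you a ready-made SparkContext named sc). It mirrors the Hadoop word count above:

// Inside ./bin/spark-shell — sc is the SparkContext the shell creates for you
val lines  = sc.textFile("LICENSE.txt")              // the same file we fed to Hadoop
val counts = lines.flatMap(_.split("\\s+"))          // split each line into words
                  .map(word => (word, 1))            // pair each word with a count of 1
                  .reduceByKey(_ + _)                 // sum the counts per word
counts.take(10).foreach(println)                      // peek at the first few results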

Monday
Sep 01, 2014

Spreadsheets for the New Millennium -- Getting There...

I've written some posts about my hopes for a next generation of computing; about the rise of "Spreadsheets for the New Millennium" here: part 1, part 2, and part 3. Well, it's been a couple of years since I wrote about Spreadsheets, and it's a decade now since our current computational generation began, so let's see what we've got:

Google published their breakthrough paper in late 2004, and it was based on work that had been ongoing since at least 2002. Now 2002 was a great year (though with Nickelback topping the charts we might want to reconsider just how great), but that's more than 10 years ago now and computing has changed a lot since then. Here are some things that were unknown in 2002 that are commonplace now:

  • SSDs — My current Mac has a 768GB SSD, and computer disks have since gone the way of… CRT displays
  • Flat-screen displays — HP used to sell monitors that my friend Julie Funk (correctly) called "2 men and a small boy" monitors — because that's what it took to carry one. Nobody misses them now — gone and forgotten
  • Multicore processors — I'm still waiting for faster living from my GPU, but Moore's Law still lives on in multicore
  • "10Gig-E" networks — I'm old enough to still remember IBM token-ring networks. Now "E" has been replaced by "Gig-E", which is itself headed for the "10Gig-E" boneyard.
  • GigaRAM — I did some work for Oracle back in the 2000s that showed the largest Oracle transactional DB running on about 1TB of memory. That was a lot then, but you can buy machines with a TB of RAM now. Memory is the new disk, and disk is the new tape…

There are still more innovations, but even with what I've listed so far I believe we can safely say that we're not living in the same computational world that Brin & Page found in the early days of Google. "Big Data" has had lots of wins in the technology domain and even some that have reached general public recognition (such as IBM's Jeopardy-playing Watson and Amazon's "Customers who bought … also bought"). Expectations for results from data have risen, and it's time for some new approaches. Technologies from then just don't meet the needs of now.

Hadoop was a terrific advance (basically a Dennis Machine for the rest of us), but by today’s standards it’s clumsy, slow and inefficient. Hadoop brought us parallel computing and Big Data, but has done it through a disk-y solution model that really doesn't "feed the bulldog" now:

  1. Everything gets written to disk (the new tape), including all the interim steps — and there are lots of interim steps
  2. Hadoop really doesn't handle intermediate results and you'll need to chain lots of jobs together to perform your analysis, making Problem 1 even worse.
  3. I've written beautiful MapReduce code with ruby and streaming, but nobody's willing to pay the streaming performance penalty and thus we have the bane of Java MapReduce code — the API is rudimentary, it's difficult to test and it's impossible to confirm results. Hadoop has spawned a pile of add-on tools such as Hive and Pig that make this easier, but the API problems here are fundamental:
    • You have to write and test piles of code to perform even modest tasks
    • You have to generate volumes of "boilerplate" code
    • Hadoop doesn’t do anything out of the box. It can be a herculean writing and configuration effort to tackle even modest problems.

This brings us to the biggest problem of the now-passing MapReduce era — most haystacks DO NOT have any needles in them! The "Big Data" era is still only just beginning, but if you're looking for needles then lighter, more interactive approaches are already a better way to find them.

The great news is that solutions are emerging that increasingly provide my long-dreamed Spreadsheets for the New Millennium. One of my favorite of these new approaches is Apache Spark and the work evolving from the Berkeley Data Analytics Stack.

Spark is a nice framework for general-purpose in-memory distributed analysis. I've sung the praises of in-memory before (Life Beyond Hadoop), and in-memory is a silver bullet for real-time computation. Spark is also familiar: you can deploy Spark as a cluster and submit jobs to it — much as you would with Hadoop. Spark also offers Spark SQL (formerly Shark), which brings advances beyond Hive in the Spark environment.
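
To give a flavor of what that looks like, here's a sketch against the Spark 1.x API (the orders.csv file, its fields, and the query are all hypothetical, and method names may differ slightly between minor releases):

// In spark-shell, with a hypothetical orders.csv of "customer,amount" lines
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD                  // lets an RDD of case classes act as a table

case class Order(customer: String, amount: Double)
val orders = sc.textFile("orders.csv")
               .map(_.split(","))
               .map(f => Order(f(0), f(1).toDouble))

orders.registerTempTable("orders")                  // expose the RDD to SQL
sqlContext.cacheTable("orders")                     // the caching that makes repeat queries fast
val totals = sqlContext.sql(
  "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer")
totals.collect().foreach(println)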

Many of the major Hadoop vendors have embraced Spark, and it's a strong Hadoop replacement because it tackles the fundamental issues that have plagued Hadoop in the 2010s:

  • Hadoop has a single point of failure (namenode) — fixed using Hadoop v2 or Spark
  • Hadoop lacks acceleration features — Spark is in-memory and parallelized and fast
  • Hadoop provides neither data integrity nor data provenance — RDDs (resilient distributed datasets) are (re)generated by provenance, and legacy data management can be augmented by Loom in the Hadoop ecosystem
  • HDFS stores three copies of all data (basically by brute force) — Spark RDDs are cleaner and faster
  • Hive is slow — Spark SQL (with caching) is fast - routinely 10X to 100X faster

Spark supports both batch and (unlike Hadoop) streaming analysis, so you can use a single framework for real-time exploration as well as batch processing. Spark also introduces a nice Scala-based functional programming model, which offers a gentler way into the map and reduce patterns that dominate Hadoop's Map/Reduce.
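
Here's a tiny illustration of how close that model feels to everyday Scala collections (a sketch — the events.log file and the ERROR filter are made up):

// The same pipeline written twice: once on a plain Scala collection...
val local  = List("INFO started", "ERROR disk full", "ERROR timeout")
val errs   = local.filter(_.startsWith("ERROR")).map(_.toLowerCase)

// ...and once on a Spark RDD spread across the cluster, with identical transformations
val logs    = sc.textFile("events.log")
val rddErrs = logs.filter(_.startsWith("ERROR")).map(_.toLowerCase)
rddErrs.count()                                     // actions like count() kick off the distributed work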

So Spark is:

  • An in-memory cluster computing framework
  • Built in Scala, so it runs on the JVM and is compatible with existing Java code and libraries
  • 10-100 times faster than Hadoop Map/Reduce because it runs in memory and avoids Hadoop-y disk I/O
  • A riff on the Scala collections API -- working with large distributed datasets
  • Batch and stream processing in single framework with a common API

Spark really looks like the next step forward. It also satisfies Thomas Kuhn's Structure of Scientific Revolutions requirements for the next step forward, in that it preserves many of the existing Big Data approaches while simultaneously moving beyond them. Spark has native language bindings for Scala, Python, and Java and offers some interesting advances, including a native graph processing library called GraphX and a machine learning library (like Mahout) called MLlib.

These are all valuable steps beyond the Toy Elephant, and they give us a great way to find needles while controlling "Needle-less Haystack" risks and costs. So here is our core scenario:

  • You have a haystack
  • You think there might be a needle (or needles!) in it
  • You want to staff and fund a project to find needles — even if you don't know where they are or exactly how to find them

So — do you:

Staff a big project with lots of resources, writing lots of boilerplate code that you'll run slowly in batch mode -- all while praying that magical "needle" answers appear?

or

Start experimenting in real time, with most of your filters and reducers pre-written for you — producing new knowledge and results in 1/10th to 1/100th the time!

With Spark and Spark SQL, to paraphrase William Gibson, "the future is already here, and it's about to get a lot more evenly distributed!" More on rolling with Spark and Shark / Spark SQL in future postings…

Saturday
May 17, 2014

The New Old Software

The forward march of technology is magnificent — sometimes the path to progress is so breathtaking that we lose track of just how far that path might stray from our everyday lives. It is great that Twitter and Facebook are producing all kinds of ingenious innovations, but before we dive in we have to make sure we don't blot out previous generations of working software (and leave users high and dry). They (the kool kids) have different goals and requirements than we do…

The Road to Ruin

This may not draw nearly the buzz and press of its Facebook-y counterparts, but we are seeing a terrific evolution in the treatment of "legacy" software — a lot of great pieces are all coming together, and developers are just catching on to the combined wonder of them now. Here is the Hobbesian world of the-rest-of-software today:

  • Vastly more brownfield than "greenfield" development — yes, the kool kids may be inventing Rails 4.1.x and AngularJS and MongoDB, but we're NOT. We are more commonly asked to provide updates to a current site, even if it was written all the way back in the bubble in ASP or JSP or something, with piles of HTML table-layouts and stored procedures.
  • Outdated technology. A/JSP pages may have been great then, but they scream "2001: A Code Odyssey" now. There was nothing wrong with writing software that way back then, but we might do better now…
  • No clear requirements, either for the legacy system or for the futuristic wonder that we are being asked to create to replace it
  • No documentation of the legacy system, or libraries of documents that are ponderous and out-of-date
  • Few tests or coverage of the legacy system, or…
  • Bad tests — lovely unit tests that may all pass (if we don't touch anything), but might all break if we modify a single line of code...
  • BUT (and this is critical) the current software mostly does work, and has lots of users who count on it!

We don't need kool-kid-ware, we just need the tools to advance in the world we live in. Most code stinks and simply disappears… The code that remains is still with us because it met a real need, with real users. Whatever sins may plague them, we're left with the winners and we need to take them boldly into the next generation.

We need the right approach, now — hesitation leads to even worse issues…

The Road to Ruin

So here is our task — to skip the magical incantations (that might be great for Google-glass alpha-testers) and to mix a little magic into the systems that serve our customers and pay the bills every day. Here are the rules we will follow into that new world.

  • Brownfield - we will evolve the code base, but we cannot break it and we'd be idiots to rewrite it!
  • We will adopt leading technologies to provide new features — leading edge but not bleeding edge
  • Our user base is engaged, but their requirements are sketchy — one of the most important creations from our work will be clear requirements for any future work!
  • Our requirements and our tests will be read by future generations, but (face it) NOBODY is ever going to read our documentation, and we have no time or budget to write any
  • Tests — TDD and BDD are a great step forward, so we will write tests AND we'll write them the right way, at the right level!
  • We will update the system incrementally, and at every step in our updates everything will still work!

In my next post I'll describe how to write new old software, updated for 2014...

Sunday
Feb 09, 2014

Life Beyond Hadoop

But I'm here to tell you … There's something else … the afterworld.

~ Prince

"Yet, for all of the SQL-like familiarity, they ignore one fundamental reality – MapReduce (and thereby Hadoop) is purpose-built for organized data processing (jobs). It is baked from the core for workflows, not ad hoc exploration."

Why the Days are Numbered for MapReduce as we Know It

I first started with Big Data back in 2008, when CouchDB introduced itself as New! and Different! and offered live links to the Ruby on Rails development that we were doing. It worked, but we couldn't easily say why it was better than the MySQL and PostgreSQL that we were using. By 2010 I had hired (and gotten brilliant work from) Cindy Simpson and built a web system backed by MongoDB, and it was great for reasons we could understand, namely:

  • It could handle loosely structured and schema'd data
  • Mongoid and MongoMapper gave it a nice link to Ruby on Rails
  • It was a straightforward step beyond its more SQL-y cousins
  • Binary JSON (and the end of XML as we knew it)

I wrote about it here (NoSQL on the Cloud), and followed that writing with other notes on Redis, Hadoop, Riak, Cassandra, and Neo4J before heading off for the Big Data wilds of Accenture.

Accenture covered every kind of data analysis, but everyone's love there was All Hadoop, All the Time and Hadoop projects produced a lot of great results for Accenture teams and customers. Still, it's been 5 years since I started in Big Data, and it's more than time to take a look to see what else the Big Data world offers BEYOND Hadoop. Let's start with what Hadoop does well:

What Hadoop / MapReduce does well:

  • ETL. This is Hadoop's greatest hit: MapReduce makes it very easy to program data transformations, and Hadoop is perfect for turning the mishmash you got from the web into nice analytic rows-and-columns
  • MapReduce runs in massively parallel mode right "out of the box." More hardware = faster, without a lot of extra programming.
  • MapReduce through Hadoop is open source and freely (as in beer) licensed; DW tools have recently run as much as $30K / terabyte in licensing fees
  • Hadoop has become the golden child, the be-all and end-all of modern advanced analytics (even where it really doesn't fit the problem domain)

These are all great, but even Mighty Hadoop falls short of The Computer Wore Tennis Shoes, 'Open the pod bay door, HAL', and Watson. It turns out that there are A LOT of places where Hadoop really doesn't fit the problem domain. The first problems are tactical issues — things that Hadoop might do well, but it just doesn't:

Tactical Hadoop Issues:

  • Hadoop is run by a master node (namenode), providing a single point of failure.
  • Hadoop lacks acceleration features, such as indexing
  • Hadoop provides neither data integrity nor data provenance, making it practically impossible to prove that results aren't wooden nickels
  • HDFS stores three copies of all data (basically by brute force) — DBMS and advanced file systems are more sophisticated and should be more efficient
  • Hive (which provides SQL joins, and rudimentary searches and analysis on Hadoop) is slow.

Strategic Hadoop Issues:

Then there are the strategic issues — places where map and reduce just aren't the right fit for the solution domain. Hadoop may be Turing-complete, but that doesn't mean it's a great match for the whole solution domain; yet as the Big Data Golden Child, Hadoop has been applied to everything! The realm of data and data analysis is (unbeknownst to many) so much larger than just MapReduce! These different solution domains were once thought to be few — the first paper on them identified seven, so they were referred to as the "7 Dwarfs."

More have been revealed since that first paper, and Dwarf Mine offers a more general look at the kinds of solutions that make up the Data Problem Domain:

  1. Dense Linear Algebra
  2. Sparse Linear Algebra
  3. Spectral Methods
  4. N-Body Methods
  5. Structured Grids
  6. Unstructured Grids
  7. MapReduce
  8. Combinational Logic
  9. Graph Traversal
  10. Dynamic Programming
  11. Backtrack and Branch-and-Bound
  12. Graphical Models
  13. Finite State Machines

These dwarves cover the Wide Wide World of Data, and MapReduce (and thus Hadoop) is merely one dwarf among many. "Big Data" can be so much bigger than we've seen, so

Let's see what paths to progress we might make from here…

If you have just some data: Megabytes to Gigabytes

  • Julia — Julia is a new entry in the systems-language armory for solving just about anything with data that may scale to big. Julia has a sweet, simple syntax, and as the following table shows it is already blisteringly fast:

Julia is new, but it was built from the ground up with support for parallel computing, so I expect to see more from Julia as time goes by.

If you have kind of a lot of data: up to a Terabyte

Parallel Databases

Parallel and in-memory databases start from a known world (RDBMS storage and analytics) and extend it to order-of-magnitude greater processing speeds, with the same ACID features and SQL access that generations of developers have already used very successfully. The leading parallel players also generally offer the following advantages over Hadoop:

  • Flexibility: MapReduce gives the programmer a lot more generality and almost limitless freedom, as long as you stay in the map/reduce processing model and are willing to give up intermediate results and state. Modern database systems generally support user-defined functions and stored procedures that trade that freedom for a more conventional programming model.
  • Schema support: Parallel databases require data to fit into the relational data model, whereas MapReduce systems let users keep data in free format. The added work is a drawback, but since the principal patterns we're seeking are analytics and reporting, that "free format" generally doesn't last long in the real world.
  • Indexing: Indexes are so fast and valuable that it's hard to imagine a world without indexing. Moving text searches from SQL to SOLR or Sphinx is the nearest comparison I can make in the web programming world — once you've tried it you'll never go back. This feature is however lacking in the MapReduce paradigm.
  • Programming Language: SQL is not exactly Smalltalk as a high-level language, but almost every imaginable problem has already been solved and a Google search can take even novices to some pretty decent sample code.
  • Data Distribution: In modern DBMS systems, query optimizers can work wonders in speeding access times. In most MapReduce systems, data distribution and optimization are still often manual programming tasks

I've highlighted SciDB/Paradigm4 and VoltDB in the set above, not (only) because they are both the brainchild of Michael Stonebraker, but because both he and they have some of the best writing on the not-Hadoop side of the big data (re)volution.

Specific Solutions: Real-time data analysis

  • Spark
    • Spark is designed to make data analytics fast to write, and fast to run. Unlike many MapReduce systems, Spark allows in-memory querying of data, and consequently Spark out-performs Hadoop on many iterative algorithms.
      • Spark Advantages:
        • Speed
        • Ease of Use
        • Generality (with SQL, streaming, and complex analytics)
        • Integrated with Hadoop (see my notes from Thomas Kuhn's Structure of Scientific Revolutions here and, most importantly, here)
  • Storm
    • Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation.
  • MPI: Message Passing Interface
    • MPI is a standardized and portable message-passing system designed to function on a wide variety of parallel computers. The MPI standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in Fortran or the C programming language.

Specific Solution Types: Graph Navigation:

I've written about Graph databases before (Graph Databases and Star Wars), and they are the most DSL-like approach to many kinds of social networked problems, such as The Six Degrees of Kevin Bacon. The leaders in the field (as of this writing) are:

Specific Solution Types: Analysis:

Hadoop is great at crunching data, yet inefficient for analyzing it, because each time you add, change or manipulate data you must stream over the entire dataset.

  • Dremel
    • Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data.
  • Percolator
    • Percolator is a system for incrementally processing updates to large data sets. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, you significantly speed up the process and reduce the time to analyze data.
    • Percolator’s architecture provides horizontal scalability and resilience. Percolator reduces latency (the time between page crawling and availability in the index) by a factor of 100, and it simplifies the algorithm. The big advantage of Percolator is that indexing time is now proportional to the size of the page being indexed, no longer to the size of the whole existing index.

If you really do have lots of data: Terabytes — but want something other than Hadoop

  • HPCC
    • HPCC enables parallel-processing workflows through Enterprise Control Language (ECL), a declarative (like SQL and Pig), data-centric language.
    • Like Hadoop, HPCC has a rich ecosystem of technologies. HPCC has two “systems” for processing and serving data: the Thor Data Refinery Cluster and the Roxie Rapid Data Delivery Cluster. Thor is a data processor, like Hadoop. Roxie is similar to a data warehouse (like HBase) and supports transactions. HPCC uses a distributed file system.
  • MapReduce-y solutions for Big Data without Hadoop:

Data is a rich world, and even this timestamped note will likely be outdated by the time it's published. The most exciting part of the "Big Data" world is that "Big" is increasingly a misnomer — ALL data is "big", and ever more powerful tools are appearing for all scales of data. Hadoop is a great tool, but in some aspects it has "2005" written all over it. Review the field, choose the tools for your needs, and…

"…go crazy — put it to a higher floor…"

Saturday
Jan 11, 2014

Predictions for 2014 - Part #1

"The most exciting phrase to hear in science... is not 'Eureka!' (I found it!) but 'That's funny.'" ~ Isaac Asimov

"One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them." ~ JRR Tolkien

1. The Big Easy — Big Data gets smaller part #1

I started with big data solutions back in 2008, not (as Twitter or Facebook did) because I needed a solution to escape the CAP limitations of SQL solutions, but in search of new value from data that we’d otherwise have discarded. CouchDB came first as an experiment in moving off MySQL for Rails apps. MongoDB came next, and persisted because of the following features:

  • Easy structure and protocols for SQL-trained DBAs to adopt
  • Mongoid and MongoMapper data modeling gems for Rails
  • JSON syntax and conventions

This got things started, and as time went on the Hadoop environment produced richer and richer toolsets for bigger and bigger data. These are great for global web-scale companies, but might miss the point for the rest of us. I made the point a couple of years ago that, for the rest of us, the key thing was NOT that big data was BIG — it was that big data was FAST. Now in 2014 we’re ready to take the next step forward: in 2014 everything is FAST, and so big data now needs to be EASY.

I’ve also written before that “there are only 2 kinds of problem that big data solves”: "Hindsight" (where something has happened and you want to know what in your pile of data might have predicted it) and "Foresight" (where you have a pile of data and want to know what it leads to). Foresight solutions probably outnumber Hindsight 10:1, and it being 2014, everybody should be familiar with recommendation engines. So here’s what we can expect in 2014:

  • Seeking insight from all forms (web, orders, social, searches) of social data has moved from Innovative to Best Practice
  • All that data is unstructured and needs structure. Hadoop-as-ETL rules the day — leading to…
  • Return of the Jedi — SQL databases and reporting tools were always good but couldn’t handle unstructured data. Hadoop is magical with unstructured data, but doesn’t easily provide real-time results, reporting, or hands-on analysis. So…

2. Business Intelligence is BACK - Big Data gets smaller part #2

The ubiquity of Y2K-spawned ERP and enterprise data systems led to a golden age for BI, but those implementations are mossbacked now, and more than 65% of the new data generated today is unstructured. Standard BI solutions before the Hadoop boom might run $50K per terabyte in licensing fees alone, and cost and structure made them a tough fit for vast, sloppy piles of interaction data. MapReduce to the rescue — and the operative part of the term is Reduce — as petabytes of document data crystallize like diamonds into gigabytes of nice, reduced rows and columns in conventional data stores. Manage your data right, and the "Spreadsheets for the New Millennium" that I’ve written about previously here, here and here become exactly that:

Spreadsheets!
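
To make that crystallization concrete, here's a small sketch (the clickstream.log format, its fields, and the output path are all hypothetical, and it's written with Spark rather than raw MapReduce — the Reduce idea is the same):

// Hypothetical clickstream lines like: "2014-01-11T08:15:00 user42 /products/123"
val clicks = sc.textFile("clickstream.log")
val rows   = clicks.map(_.split("\\s+"))
                   .filter(_.length == 3)                        // drop malformed lines
                   .map(f => ((f(0).take(10), f(2)), 1))         // key by (day, page)
                   .reduceByKey(_ + _)                            // page views per day — the Reduce doing the shrinking
rows.map { case ((day, page), views) => s"$day,$page,$views" }   // plain rows-and-columns
    .saveAsTextFile("page_views_by_day")                          // ready for a conventional store or BI tool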

3. JavaScript everywhere - MEAN and Meteor

Much of my earlier big data work started with Ruby on Rails — Rails was a great DSL for the web, and it provided a couple of wonders that are still wonders today, and were absolutely magical in 2005:

  • A web domain-specific-language. Write for the Web in the language (thinly-veneered http/html) of the web
  • Active Record — object-relational mapping for the rest of us — escaping the nasty .Net antipattern of starting with brittle stored procedures first and subsequently coding your way out into the user domain
  • Full-stacked-ness — with Active Record and similar patterns it only took one language (Ruby) to create your entire application — front end, back end, databases and presentation layer and all. Seemingly gone was the need for distinct (and non-cross-communicating) teams of DB-developers, middleware developers, and front-end developers.

I still love Rails, but by Version 4 Rails has left its simple past. Gone are the days when DHH (and everyone like him) could produce a good demo of a blog application, crafted from scratch in the course of a 15-minute video. As Rails has gotten bigger and richer, its universality has declined: it got cumbersome, and newer programming models started from the assumptions Rails pioneered.

I still love full-stack development and tools, and much as I like Rails as a replacement for sweet-but-overgrown J2EE, it’s now time to see what might advance us beyond sweet-but-overgrown Rails. Such an überstack might feature:

  • Full-stack: one programming language and model, top-to-bottom
  • Fast: I’ve always loved Smalltalk (Alan Kay’s gift to programming languages), but the cobra still bites at anything > 100 ms.
  • Universal: Separate teams with separate development languages is SO 1994! Even worse, desktops are so 1981 and modern code needs to expect to run on everything from handsets to big-screen displays.

Can any language and platform meet all these requirements? Fortunately, there is a solution!

Here’s what we need — power and consistency at the database, application, and presentation layers, with a common language and syntax across all layers. In Rails we covered the layers with Ruby from Active Record up to ERB, and if we’re going to get better and faster, we’ve got to get MEAN:

  • MongoDB (the database)
  • Express (the server-side web framework)
  • AngularJS (the front-end framework)
  • Node.js (the JavaScript runtime and web server)

Meteor and Ember.js are great emerging frameworks as well, and they all lead to full-stack development with JavaScript everywhere. As I’ve written before, Node.js is wicked fast with modern JS engines. There are still other nice frameworks rising in the JS world — Derby.js and others — and JavaScript is the world’s most popular programming language, having moved as near to ubiquity as a programming language can. For a great introduction to JavaScript and what makes it good, and a nice intro to MapReduce, you might look here: Can Your Programming Language Do This?. To paraphrase William Gibson, the future is already here, and it's about to become more evenly distributed...

We’ll need these tools and popularity for the yin and the yang of the modern web age: The Internet of Things and BubblePop, which I’ll cover next time.