Wednesday
May 11, 2011

Cobra strike - 100 milliseconds to understanding new architectures

I've written many times now about NoSQL architectures and the rise of whole new species of data stores as the software of the Facebook age. But what's going on here, really? As Ian Fleming wrote in Goldfinger:

"Once is happenstance. Twice is coincidence. The third time it's enemy action."

I've probably written a dozen of the same pieces now on "new software architecture," so let's 1) take a look at what this is all about, and 2) see if we can tell where it's all headed.

We see so many new components (Hadoop, NoSQL, Sphinx/SOLR, Node.js) with seemingly nothing to link them, other than as different exotic beasts in the new-software zoo. There are some fundamental truths behind why Mutual of Omaha's Software Kingdom is featuring them now, and Google has the answer. Not "Google the search engine" ... but Google the company.

Robin Bloor put a nice light on this back in 2009 with Why Google Won In The Search Market. In that post, Bloor might have been thinking of Google VP Marissa Mayer's famous "Users really respond to speed" quotation when he wrote:

We can normally react to a stimulus in the 140-200 millisecond range, which is great news for cobras, because it takes a cobra about 100 milliseconds to bite. To put it another way, if a cobra is within striking range and it decides to bite you, it’s too late to stop it. If the mouse pointer moves more than 100 milliseconds after you move the mouse, it feels slow.

That brings me to the fundamental truths of modern software that link all the beasts in our zoo and point the direction ahead. They are:

  • In a high-resolution, handsetted, wifi'd world the distinctions blur between Enterprise software, Desktop software, and Handset software. It's all just software, delivered as a service everywhere.
  • If your software can't respond in about 100 milliseconds you're dead. Down to 100ms, the faster cobra always wins.
  • If you can't make it down to 100ms, it doesn't matter how "good" your architecture is. It fails.

These three rules explain a lot about what's going on in software today. In my next postings we'll do a quick tour of the zoo with these new perspectives, and introduce a really neat package that is a harbinger of where this all is headed.

I've got 911 on speed dial. ~ Douglas Coupland

Saturday
Apr 23, 2011

Wicked Fast

“Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” Lewis Carroll ~ Through the Looking-Glass

I really love Ruby on Rails. My biggest pet peeve with software development platforms has always been their quest for generality -- "with our program, you could build anything, from an iPhone tic-tac-toe app to systems code for the Space Shuttle!" The problem here is that nobody wants to build just "anything" -- people's needs at any given time tend to be pretty specific. A platform that claims to be good for everything is generally good for nothing. That's where Rails comes in -- web apps are all it does.

The best frameworks are in my opinion extracted, not envisioned. And the best way to extract is first to actually do. ~ David Heinemeier Hansson ~ Ruby on Rails

This is where the "PT boats to Battleships" metaphor I wrote about in my last post comes in. I believe Rails is unbeatable for web apps, as long as the definition of a web app doesn't change. It was perfect for what it did -- but do we still do that anymore?

As I mentioned last post -- the world is changing and the old patterns may not work anymore. So what do you do? Is there a "Rails" for Ajax applications between handheld devices?

NodeJS

There is -- or at least there's the start of a platform built around a very different set of assumptions of what Internet applications are all about. It's called Node.js, and it springs from work that Ryan Dahl first published in 2009.

Node is really interesting and it builds on a capability of its core JavaScript language that Joel Spolsky wrote about in 2006: Can Your Programming Language Do This? -- the ability to package rich objects (including inline functions) as parameters in function calls. Hmmmm ... this sounds like this could get deep and theoretical... but stay with me: here's why it matters:

  1. In the web-pagey world, to respond to a request you compose and send a page. With simple web pages you can do this sequentially and are probably fine, and threads are there to bail you out wherever you aren't fine.
  2. Today, with Big Data databases and media files, you might get a request and not know if that request is EVER going to complete!? Processes block, and even the fastest processor can't do much while it's just sitting and waiting.
  3. The solution? A non-blocking architecture. It's fine to have long requests -- as long as you're not stuck waiting for them to finish.... so how do we do that?
  4. This is the problem Dahl solves with Node.js. Node is an event-driven architecture -- when requests come in, Node processes them by attaching a callback routine to them and launching them, and then moving on to the next request.
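Here's a minimal sketch of that idea in plain JavaScript. It isn't Node's internals: setTimeout stands in for a slow operation, and handleRequest, finished, and the request names are hypothetical, made up just for this illustration.

```javascript
// Toy illustration of "call me when you're done" -- not Node's internals.
// handleRequest, finished, and the request names are hypothetical.
var events = [];

// setTimeout stands in for slow work (a big query, a media file);
// it returns immediately, so the caller is never stuck waiting.
function handleRequest(name, delayMs, onDone) {
  events.push('start ' + name);
  setTimeout(function () {
    onDone(name);
  }, delayMs);
}

// The callback that each request invokes when it completes.
function finished(name) {
  events.push('done ' + name);
  console.log('done ' + name);
}

handleRequest('chunky', 50, finished); // slow request
handleRequest('skinny', 5, finished);  // fast request
events.push('loop is free');           // runs before either callback fires
```

Both requests are launched before either one finishes, and the skinny request calls back first even though it was started second.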

This is the perfect architecture for a modern web age with a mix of skinny and chunky requests. Rather than grinding through it all in sequence, you tell each request "Here you go -- call me when you're done..." and move on to the next thing. It's a clean approach, and with modern JavaScript engines, such as Google V8 or Mozilla SpiderMonkey, this kind of approach is fast.

Wicked fast.

Node.js is tight and clean, and it's amazing what you can get done with just a little code. Like all Unix-y code since Kernighan and Ritchie, Node.js has its Hello World app:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124);
console.log('Server running at http://127.0.0.1:8124/');

The explanation of the code is really simple -- it's just as it reads:

  1. Create an HTTP server
  2. Hand it a function that receives each request and its response
  3. Write a 200 code for success with plain text
  4. Write "Hello World"
  5. Listen on port 8124
  6. Tell the console that we're listening on 8124

It really is just that simple. I'm not sure how you'd make space shuttle code with it, but if you're looking for evented web apps with a tiny footprint, Node.js is it. For this blog posting I wanted to try something a bit bigger, so I put Ben Gourley's little NodeJS-driven presentation page up on the Amazon cloud:

The code was adapted from Ben's "CLOCK BLOG" site, and while the presentation isn't exactly full-featured, it has fewer than 100 lines of code invested in it, and it is...

Wicked Fast.

As you can see, the presentation is simple, but it's in the best 15-minute tradition of new web platform development.

One of the beauties of Node.js with the Express package is that, despite its simplicity, it is still full Model-View-Controller, so setting up the code was easy, and it's laid out in a nice, clean, beautiful way:

There's a lot to write about Node.js, its package manager npm, and development packages like Express, Connect, and Websockets/socket.io, and those will come in other posts. There's a lot here -- maybe the future of the handheld, small-screened, peer-to-peer web.

It really is ... WICKED FAST!

"This is your last chance. After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes." Morpheus ~ The Matrix

Saturday
Apr 23, 2011

Back to the Future

Don't worry. As long as you hit that wire with the connecting hook at precisely 88mph the instant the lightning strikes the tower... everything will be fine. ~ Back to the Future (1985)

One of the great challenges of working in technology is that patterns of thinking change quickly and from time to time, no matter how wired-in you are, you discover that everything you know is wrong. Novelist William Gibson is right: the future has already arrived -- it's just not evenly distributed... When I first learned Ruby on Rails back in 2006, it struck me as a wondrous advance on the Java development I was doing. Java had bulked up as an Enterprise solution so now, 5 years later, it's little surprise that Java End of Life is something Thoughtworks worries about.

In tech we often see the tail end of Clayton Christensen's The Innovator's Dilemma. In TID, disruptive technologies catch on because whatever they lack in robust features they make up for in agility. With time, though, the PT Boats grow into Battleships, and the cycle starts anew.

There are signs that this is happening now with Internet technology -- our toolsets (like Rails) have grown so fit to the task that they seem a bit ponderous as the task shifts. With enough shift we again conclude that everything we know is wrong and the cycle starts again.

"It's not what you don't know that kills you, it's what you know for sure that ain't true." ~ Mark Twain

Here's what we know about Internet technology today:

  • "Computers" are how people interact with the Internet
  • Modern apps display web pages and submit information
  • Pages are served from servers (of course)
  • The client-server Internet model works fine

WRONG, WRONG, WRONG, and WRONG. Here's the world we've been living in for a while now:

  • Today there are more wireless handsets than there are people on earth
  • In 2011, nobody updates a whole page anymore -- Ajax rules
  • To paraphrase Bill Joy -- no matter where you are, most of the interesting content is somewhere else (on someone else's handset)
  • Pages are easier -- and if we wait maybe those pesky smartphones will just go away...

We're ready for a new programming world, and I've been investigating that new world for a while now. In my next post I'll write up what I've found. As in the sound clip below, you may not be ready for this yet -- but it'll be here soon "...and your kids are gonna love it!"

Tuesday
Mar 8, 2011

Happy Birthday, Bobby Fischer

We opened a group meeting at work today with the classic icebreaker "Two truths and a lie." In TTL everyone in the class writes down two obscure truths about themselves along with a single lie, and hands them in to the instructor on a folded sheet of paper. The instructor then selects a sheet at random, reads the "3 truths" and the class has to guess 1) who they apply to, and 2) which one is the falsehood.

I have a great fun truth for the game that runs like this:

I once won a chess tournament, playing blindfolded, and then didn't go on a date for 3 whole years!

This is a true story -- I was a terrific chess player and really did once win a tournament playing blindfolded in the town I grew up in. I also really did NOT go on a date for the 3 years that followed -- but that was OK, not because I was hopelessly geeky but because I was only 13 years old when I won, blindfolded.

My skills were in part the product of the man above, who would have celebrated his 68th birthday today. Bobby Fischer was the greatest American chess player ever -- maybe the greatest chess player ever, period. He won the World Chess Championship back in the Cold-Warry summer of 1972, and made the game of chess as much a sensation as chess could be back then.

I played chess all the time because Fischer was a sensation, and you could find people to play against easily back then. I played blindfolded because I'd read about historical American champion Paul Morphy, who was said to have been a great blindfold player by age 12. I was 13, and how hard could it be? It really wasn't that hard, and my mental images of board positions weren't blurred by my opponents having to recite their every move to me.

That was a fun time -- Fischer - Spassky taught a generation of American kids to spell "Reykjavik," and books like Fischer's My 60 Memorable Games gave my dad and me hours of fun -- playing each other and playing the classics. For me Fischer's most memorable game is his Game of the Century -- a breathtaking classic by a 13-year-old boy against one of the strongest masters of his day.

It's a fine line between brilliance and madness, and the eccentric Fischer crossed over and back freely between his triumph in 1972 and his death in 2008. His gifts were wondrous games that we can still enjoy even today -- his birthday (born 1943).

Happy Birthday, Bobby Fischer.

The Game of the Century (scorecard)

Monday
Feb 14, 2011

Casi Casi ... Cassandra

I've written a couple of times about the "N+1 Queries" problem and I've suggested that it's a bane to relational databases. But there's a way out of it -- let me tell you about it.

But first let's wallow a bit in it. I'm in Twitter, I've written a tweet and I'm ready for it to be sent out to all of my (countless) followers... Here's what my code for that broadcast might look like:

All fine so far -- that's a Rubyish take on the twittery world we all live in. I can send out my breathless message of what I had for breakfast, and then Twitter picks it up and broadcasts the message from me (as well as all the messages from the other tweeters):

So here we're going to do a query for each of the X tweeters, and for them we'll do another query for each of their Y followers.

Code smell! Fail Whale!!!

(particularly when you consider Dare Obasanjo's take on Twitter combinatorics)

The problem here is Relational: we need a SELECT to find me, and then a new SELECT to get the info on each of my followers. This "N+1 SELECTS" problem is a simplified version of a real problem, where relational databases stagger and where column-oriented databases are much more what we're looking for. Column oriented databases are designed to be fast at grabbing all of the attributes (columns) associated with a given entity. To understand why this is vital for a Twitter or any other social application, consider the one-to-manys: Twitter has many tweeters, who have many followers, who themselves have many followers... and so on.

Let's think, though, about the code that gets generated when I tweet. If we're using a relational database we'll follow a SELECT for each of my followers with a SELECT for each of their followers -- so we've got a polynomial number of SELECTs grinding away for each tweet, and as I get more popular the disks whirr and the lights dim every time I tweet about anything.
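To make the counting concrete, here's a rough sketch in JavaScript, with in-memory objects standing in for the relational tables. The table contents, the follower names, and the helpers (selectFollowerIds, selectUser, fanOut) are all invented for illustration; each helper call stands for one SELECT.

```javascript
// In-memory stand-in for relational tables. All names and rows here are
// hypothetical; each helper call below represents one SELECT.
var selectCount = 0;
var followersTable = { mudcat: ['ann', 'bo', 'cy'] };
var usersTable = {
  ann: { screenName: 'ann' },
  bo: { screenName: 'bo' },
  cy: { screenName: 'cy' }
};

// SELECT follower_id FROM followers WHERE user_id = ?
function selectFollowerIds(userId) {
  selectCount += 1;
  return followersTable[userId] || [];
}

// SELECT * FROM users WHERE id = ?
function selectUser(userId) {
  selectCount += 1;
  return usersTable[userId];
}

// Naive broadcast: one query for the follower list, then one per follower.
function fanOut(authorId, text) {
  return selectFollowerIds(authorId).map(function (followerId) {
    var follower = selectUser(followerId);
    return follower.screenName + ' <- ' + text;
  });
}

var deliveries = fanOut('mudcat', 'eggs for breakfast');
console.log(selectCount); // 4: one SELECT for the list plus one per follower
```

Three followers cost four queries. Repeat the same pattern one level down, for each follower's followers, and the count multiplies.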

So to save the power grid let's try a little Twitter application, but this time using the column-oriented data store Cassandra to handle our users and tweets.

I'll run this from the same Amazon Cloud instance that I've used for my previous postings. So, in my terminal connected to Amazon, I enter:

sudo gem install cassandra

I've already put Java on my base instance, so I'm just about good to go! A single-line command, and it really does run...

Now, let's start Twitter and get tweeting. We'll use the Ruby interpreter IRB on Amazon to enter our users and their tweets:

root@ip-10-245-133-190:/var/www/apps# irb

We're rolling -- first we'll enter our requirements: rubygems to run our additional toys, cassandra to link to the data store we just installed, and SimpleUUID to identify our tweeters:

Now we'll start Twitter in Cassandra, and put in some users and screen names (I've mostly left the Cassandra responses out for brevity here):

Great so far -- we have user 5, "mudcat," and we've given him a tweet. Let's give him someone to tweet to:

And there we are -- we have a reasonable data model for Twitter, backed by the Cassandra data store. Let's review what we've got here:

Cassandra works as a kind of multidimensional hash, and the data it contains can be referenced as:

  • A keyspace
  • A column family
  • An optional super column
  • A column, and
  • A key

Source: http://nimbledais.com/?tag=column-family

Here's what these all mean:

The keyspace is the highest, most abstract level of organization. Our Cassandra conf/storage-conf.xml file contains our keyspace definitions at startup.

The column family is the chunk of data that corresponds to a particular key. In Cassandra each column family is stored in a separate file on disk, so data that is frequently accessed together should be placed in the same column family for fastest access. Column families are also defined at startup.

A super column is a named list, containing standard columns stored in recency order.

A column is a tuple, a key-value pair with a key (name) and a value.

A key is the permanent name of the record, and keys are defined on the fly.

With this structure we're basically defining a schema, and I'd like to claim it's original, but this one was taken from Twissandra by Eric Florenzano.
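The "multidimensional hash" idea can be sketched with nested JavaScript objects. This is only a toy model of the lookup path, not the Cassandra client API and not the Twissandra schema itself. The column family names (Users, Statuses, UserRelationships), the column names, the timeline key, and the get helper are assumptions chosen for illustration.

```javascript
// Toy model: keyspace -> column family -> row key -> column name -> value.
// Names below are hypothetical; only "Twitter" and user 5 "mudcat" come
// from the walkthrough above.
var store = {
  Twitter: {
    Users: { '5': { screen_name: 'mudcat' } },
    Statuses: { '1': { user_id: '5', text: 'Eggs for breakfast' } },
    UserRelationships: {
      // A super column ("user_timeline") holds its own named columns.
      // A real timeline would use time-based UUIDs as column names.
      '5': { user_timeline: { 't-0001': '1' } }
    }
  }
};

// One lookup by key returns every column stored for that row.
function get(keyspace, columnFamily, key) {
  return store[keyspace][columnFamily][key] || {};
}

var user = get('Twitter', 'Users', '5');
var timeline = get('Twitter', 'UserRelationships', '5').user_timeline;

// Resolve each timeline entry to the text of the tweet it points at.
var tweets = Object.keys(timeline).map(function (columnName) {
  return get('Twitter', 'Statuses', timeline[columnName]).text;
});

console.log(user.screen_name, tweets);
```

The point of the shape is that everything about a row comes back from a single keyed lookup, rather than from a SELECT per related record.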

The great thing about Cassandra is that it evolved to solve real-world problems, and that it may have a free form but it is NOT exactly schema-less. Cassandra may fall in the "NoSQL" class with Hadoop, but the use cases that apply to it could scarcely be more different. Runtime lookups can be handled really well in Cassandra, due to Cassandra's low-latency organization and strict definition. Asynchronous analytics, which can tolerate high latency and demand greater flexibility, are a better fit for systems like Hadoop.

Cassandra generally offers terrific performance. There is a tradeoff in eventual consistency, something that perhaps I'll take up in my next blog post.
