
You Only Live Twice (Basho and Riak)

You only live twice...
When you are born, and
When you look death in the face
Ian Fleming ~ "You Only Live Twice"

It's not about the bike. It's a metaphor for life...
Lance Armstrong ~ "It's Not About the Bike"

Today was a big day for me. Way back on June 6, 2008 I was in a terrible car-bike accident. It was so bad that the first word sent to a traffic copter overhead was that I'd been killed. I hadn't, but it took a couple of months of hospitalization and six months of hard rehab before I was back to anything like my life before the accident. I got great support from my family, and with great care and therapy I even got back on the bike.

January 1, 2009 was my first post-accident bike ride - 1.4 miles around Clement Park lake here in Littleton, Colorado. As little as that was, I kept at it and today, 3 years later, I completed my 10,000th mile since the accident. It's true that you "only live twice," and the greatest gift in life is to come back from that edge.

The quotation above is a haiku coined by James Bond in the book "You Only Live Twice," which Bond himself declares "...after Basho..." -- referring to Matsuo Basho, the great Japanese poet (1644-1694). Basho was the master of the haiku, and a nice sampling of his work can be found here: A Selection of Matsuo Basho's Haiku.

Basho may be revered as a poet-laureate of Japan (much as Robert Frost is regarded here), but it's a shame that there's so little awareness of his work. Our world is full of fine, obscure art, and the joy of an internet-enabled world is that it's not so hard to find it anymore.

Basho's name (if not his verse) lives on in the NoSQL datastore company Basho, and through their key-value store database Riak. I spent the weekend getting Riak rolling in the cloud -- it's not hard to set up, and it's scalable, flexible and fast as a key-value store. Here's a quick peek at how I got there:

Riak was designed for robustness, speed and scalability. It was built with Erlang, so to get started with Riak you'll need to install Erlang first. Erlang is a terrific jackrabbit of a language that even on its own is absolutely worth a look. I was running Ubuntu 10.04 LTS (Lucid Lynx) on AWS, and in that world the Erlang install took only 4 steps:

curl -O http://erlang.org/download/otp_src_R14B03.tar.gz
tar zxvf otp_src_R14B03.tar.gz
cd otp_src_R14B03
./configure && make && sudo make install

The latest Erlang (R15B) doesn't work yet with the latest Riak (1.0.2), so you'll want to make sure you're pairing compatible versions of Erlang and Riak. Once Erlang is installed, it's also a simple set of steps to install Riak:

curl -O http://downloads.basho.com/riak/riak-1.0.2/riak-1.0.2.tar.gz
tar zxvf riak-1.0.2.tar.gz
cd riak-1.0.2
make rel
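
Because the Erlang/Riak pairing matters, a provisioning script could check it before building. The following is only a rough sketch: the table holds just the one pair used in this post, and the names KNOWN_GOOD_PAIRS, compatible_pair? and installed_otp_release are made up for illustration.

```ruby
# Erlang/Riak pairs known to build together. Only the pair from this post
# is listed (an assumption); extend the table as you verify others.
KNOWN_GOOD_PAIRS = {
  "1.0.2" => ["R14B03"]
}

# True when the given OTP release is listed for the given Riak version.
def compatible_pair?(riak_version, otp_release)
  KNOWN_GOOD_PAIRS.fetch(riak_version, []).include?(otp_release)
end

# Ask the installed Erlang for its release string (e.g. "R14B03").
# Returns nil when erl is not available or prints nothing.
def installed_otp_release
  out = `erl -noshell -eval 'io:format("~s", [erlang:system_info(otp_release)]), halt().' 2>/dev/null`
  out.strip.empty? ? nil : out.strip
rescue Errno::ENOENT
  nil
end

if __FILE__ == $PROGRAM_NAME
  release = installed_otp_release
  ok = compatible_pair?("1.0.2", release)
  puts "Erlang release #{release.inspect} with Riak 1.0.2: #{ok ? 'known good' : 'not verified'}"
end
```

A pair that isn't in the table is reported as "not verified" rather than "broken", since the table only records what has actually been tried.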

With Erlang and Riak installed we're ready to get rolling. Inasmuch as I see "Big Data" as an emerging data structure and both NoSQL and Hadoop as tools forming the operating system around that data structure, I like (where I can) to stick to high-level languages and OBDM (object-big-data-mapping) tools for access to the structure. Fortunately, Sean Cribbs has just released Ripple, an Active Model-based document abstraction utility based on Active Record and MongoMapper. With Ripple added, we just need a bit of code (and a big assist to Justin Pease) to migrate our Redis-based URL shortener over to Riak. But first, let's get Riak working:

First we'll need a new Rails project to test Riak:

$ rails new riaktest

Then we'll go into riaktest and add Ripple and curb to our Rails 3.x Gemfile, and do a bundle install:

gem 'ripple', :git => 'http://github.com/seancribbs/ripple.git'
gem 'curb'

Save the Gemfile, and then

$ bundle install

Next we'll add Ripple into our config/database.yml:

    port: 8098
    host: localhost

Next we'll add a little Url class in app/models/url.rb:

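require 'ripple'

# A shortened URL stored as a Riak document via Ripple
class Url
  include Ripple::Document
  property :ukey, String, :presence => true  # short key; must be present
  property :url,  String                     # full target URL
end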

And finally we'll fire up Riak:

$ /var/www/apps/riak-1.0.2/rel/riak/bin/riak start
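
Before opening the console it can be worth confirming that the node is actually answering. Riak exposes a ping resource over HTTP, so a small check might look like the sketch below. It assumes the host and port from the config above (localhost, 8098); the helper names riak_ping_uri and riak_up? are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Build the URI for Riak's HTTP ping resource.
def riak_ping_uri(host = "localhost", port = 8098)
  URI::HTTP.build(:host => host, :port => port, :path => "/ping")
end

# True when the node answers the ping with HTTP 200; false when it is
# unreachable or too slow to respond.
def riak_up?(host = "localhost", port = 8098)
  uri = riak_ping_uri(host, port)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = 2   # seconds to wait for the connection
  http.read_timeout = 2   # seconds to wait for the reply
  http.get(uri.path).code == "200"
rescue SystemCallError, SocketError, Timeout::Error
  false
end

puts(riak_up? ? "Riak is answering on port 8098" : "Riak is not reachable yet")
```

The node can take a moment to come up after "riak start", so a script would probably want to retry this a few times before giving up.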

With our development environment complete, we can now dive into the Rails console and play with our Riak data store:

$ rails console
Loading development environment (Rails 3.1.3)
ruby-1.9.2-p290 :001 > url = Url.new
 => <Url:[new] ukey=nil url=nil>
ruby-1.9.2-p290 :002 > url.ukey = "2432"
 => "2432" 
ruby-1.9.2-p290 :003 > url.url = "http://www.ibm.com"
 => "http://www.ibm.com" 
ruby-1.9.2-p290 :004 > url.valid?
 => true 
ruby-1.9.2-p290 :005 > url.save
 => true 
ruby-1.9.2-p290 :006 > exit

Great -- we've initialized our data store and gone away (thus the "exit" above). Now we can come back and access our Riak store:

$ rails console
Loading development environment (Rails 3.1.3)
ruby-1.9.2-p290 :001 > newurl = Url.first
 => <Url:TdxQ3iFGEwkmfMrYQBmvwcZYoCM ukey="2432" url="http://www.ibm.com">
ruby-1.9.2-p290 :002 > exit
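
To convince yourself the document really lives in Riak and not just in Rails, you can read it back over Riak's HTTP interface without Ripple. This is a sketch under a few assumptions: that Ripple stored the model as JSON in a bucket named "urls" (the pluralized class name), that the object is reachable at /riak/<bucket>/<key>, and that the key is the one printed in the session above. The helper names are made up for illustration.

```ruby
require 'net/http'
require 'uri'
require 'json'

# URI of one object through Riak's HTTP interface (/riak/<bucket>/<key>).
def riak_object_uri(bucket, key, host = "localhost", port = 8098)
  URI::HTTP.build(:host => host, :port => port, :path => "/riak/#{bucket}/#{key}")
end

# Pull the two properties of the Url model out of a stored JSON document.
def url_fields(json_body)
  doc = JSON.parse(json_body)
  { :ukey => doc["ukey"], :url => doc["url"] }
end

# Fetch and decode one document; nil when the key is missing or Riak is down.
# The default bucket name "urls" is an assumption about Ripple's naming.
def fetch_url(key, bucket = "urls")
  response = Net::HTTP.get_response(riak_object_uri(bucket, key))
  response.code == "200" ? url_fields(response.body) : nil
rescue SystemCallError, SocketError, Timeout::Error
  nil
end

p fetch_url("TdxQ3iFGEwkmfMrYQBmvwcZYoCM")
```

If this prints nil while the console session works, the bucket name is the first thing to double-check.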

So we have Riak operational on the Amazon cloud, and it's a small matter of coding to move our Redis URL shortener over to a new back end. In my next posting I'll show how we can do that, and do a little Apache Benchmark testing to see how our little example applications benchmark out.

We'll end with a little inspiration from Lance Armstrong: