<?xml version="1.0" encoding="UTF-8"?>
<!--Generated by Squarespace Site Server v5.11.81 (http://www.squarespace.com/) on Thu, 31 May 2012 06:30:58 GMT--><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><title>Journal</title><link>http://www.pikasoft.com/journal/</link><description></description><lastBuildDate>Tue, 29 May 2012 20:46:09 +0000</lastBuildDate><copyright></copyright><language>en-US</language><generator>Squarespace Site Server v5.11.81 (http://www.squarespace.com/)</generator><item><title>Understanding Social Media "Insanity"</title><dc:creator>John Repko</dc:creator><pubDate>Mon, 28 May 2012 03:05:37 +0000</pubDate><link>http://www.pikasoft.com/journal/2012/5/27/understanding-social-media-insanity.html</link><guid isPermaLink="false">512153:5946984:16467870</guid><description><![CDATA[<p><a href="http://blog.eloqua.com/social-media-infgraphic/"><img src="http://img.scoop.it/rfrD5bTwZOy_6QRhU9f8BDl72eJkfbmt4t8yenImKBVaiQDB_Rd1H6kmuBWtceBJ" alt="" /></a></p>

<blockquote><p>"Insanity is relative. It depends on who has who locked in what cage." ~ Ray Bradbury</p></blockquote>

<p>Well, the Facebook <span class="caps">IPO </span>is done, and the first crazy thing we might consider is how divided opinion is on whether the <span class="caps">IPO </span>succeeded or failed.  Put me in the "success" camp -- the objective of an <span class="caps">IPO </span>is to raise money in exchange for a share of the company.  Selling shares at $38 was a great deal for Facebook, and if the market now values those shares 16% lower (roughly $32), that only reinforces the notion that Facebook got an impressive price for them.</p>

<p>The valuation of Facebook is a second insanity that we might consider.  Most analysts have focused on the monetization of pageviews, noting (for example) that <a href="http://cdixon.org/2012/05/15/facebooks-business-model/">Google generates a lot more revenue per pageview</a>, and arguing that this points to strong monetization upside for Facebook.  This may be so, but we should also consider that Facebook is a media channel, and that a booming number of media channels are competing for eyeballs and online time.</p>

<p>Business Insider started all this with their article: <a href="http://www.businessinsider.com/social-media-marketing-landscape-complicated-2012-5">This <span class="caps">INSANE</span> Graphic Shows How Ludicrously Complicated Social Media Marketing Is Now.</a>  That graphic, like the more florid <a href="http://www.pamorama.net/wp-content/uploads/2010/03/conversationprism.jpg">The Conversation Prism</a>, shows hundreds of competitors for a slice of the social pie.</p>

<p>So many companies, so little time.  Why do they bother?  Why would yet another company enter the social space?  Sure, Google and Facebook might buy a bunch of them, but why would Google and Facebook want to?  To clear up this seeming insanity, let's look at how eyeballs and market share might shift within a social media market, using a technique called "Markov analysis."</p>

<p>Markov analysis is an evaluation approach that uses the current movement of a variable to predict the future movement of that variable.  Here we'll look at the "<span class="caps">URL </span>shortener" subset of the social space, but the same approach works for any number of companies.  We've played with <span class="caps">URL </span>shorteners before, describing them here: <a href="http://www.pikasoft.com/journal/2011/1/29/spreadsheets-for-the-new-millennium.html">Spreadsheets for the New Millennium</a> and implementing one here: <a href="http://jkr-blog.dyndns.org:3001/mini_urls">MiniURLs for the Masses</a>, but this time we're going to look at three of the leading <span class="caps">URL </span>shortener offerings: <a href="http://bit.ly">bit.ly</a>, <a href="http://tinyarrows.com/">tinyarrows</a> and <a href="http://tinyurl.com/">tinyurl.com</a>.</p>

<p>To get started with our analysis, we need each provider's current market share and a sense of where each provider's customers come from and go to, for Bit.ly, TinyArrows and TinyURL.  A <strong>hypothetical</strong> model of that information is presented in what is called a Transition Table, shown below:</p>

<p><img src="http://s3.amazonaws.com/web_picts/Transition.jpg" alt="" /></p>

<p>Here's how to read a Transition Table:</p>

<ol>
<li>Start with initial customer counts and market share</li>
<li>For each provider and each competitor, note the gains and losses for the time period in question</li>
<li>A single "play" of the Transition Table takes us from May market share to June market share</li>
</ol>
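<p>In code, one "play" is just the current share vector multiplied by a transition matrix whose rows are "this month's provider" and whose columns are "next month's provider." The Ruby sketch below assumes plain nested arrays; only the Bit.ly row {.920, .023, .057} comes from this post, and the TinyArrows and TinyURL rows are made-up placeholders, not the values in the table above.</p>

```ruby
# Hypothetical transition matrix: row = provider this month, column =
# provider next month, both in the order [Bit.ly, TinyArrows, TinyURL].
# Only the Bit.ly row is from the post; the other two rows are placeholders.
TRANSITIONS = [
  [0.920, 0.023, 0.057],
  [0.100, 0.850, 0.050],
  [0.120, 0.030, 0.850]
].freeze

# One "play" of the table: next_share[j] = sum over i of share[i] * P[i][j].
def play(share, matrix)
  (0...share.size).map do |j|
    share.each_with_index.sum { |s, i| s * matrix[i][j] }
  end
end

may_share  = [1.0 / 3, 1.0 / 3, 1.0 / 3] # even shares
june_share = play(may_share, TRANSITIONS)
puts june_share.map { |s| s.round(3) }.inspect # prints [0.38, 0.301, 0.319]
```

<p>Because every row sums to 1, the new shares still sum to 1: customers only move between providers, they don't appear or vanish.</p>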

<p>Microsoft Excel is not a bad place to start for share analyses, but for our calculations (and certainly for a greater number of providers) we'll want a more powerful tool with matrix math and linear algebra functionality, like NumPy (for Python) or linalg (for Fortran through Ruby).  For the purposes of this review, I'll use Mathematica to show the essential matrix calculations behind a Markov estimate of how market share evolves.</p>

<p>In this analysis we'll use a first-order Markov process, and assume that each customer's choice for a given month depends <strong>only</strong> on the provider they used the month before, not on any earlier history.  <a href="http://www.siam.org/meetings/sdm01/pdf/sdm01_04.pdf">Studies have shown</a> that first-order Markov processes can be successful at predicting web behavior, particularly if the transition matrix is stable.</p>
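<p>In symbols, using the standard textbook formulation: let X<sub>t</sub> be the provider a customer uses in month t, s<sub>t</sub> the row vector of market shares in month t, and p<sub>ij</sub> the fraction of provider i's customers who use provider j the following month. Then:</p>

```latex
\begin{gathered}
\Pr(X_{t+1} = j \mid X_t = i,\, X_{t-1}, \dots, X_0) = \Pr(X_{t+1} = j \mid X_t = i) = p_{ij},
\qquad \sum_{j} p_{ij} = 1 \\
s_{t+1,j} = \sum_{i} s_{t,i}\, p_{ij} \quad\Longleftrightarrow\quad s_{t+1} = s_t P \\
\pi = \pi P, \qquad \sum_{j} \pi_j = 1
\end{gathered}
```

<p>The last line defines the equilibrium shares &pi;. Assuming every entry of the transition matrix is positive, that equilibrium is unique, and the starting shares s<sub>0</sub> don't appear in it at all.</p>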

<p>We can load our transition matrix into Mathematica, building each provider's row vector from the customers it kept and the customers it lost to each competitor.  Bit.ly, for example, kept 920 of its 1,000 May customers but lost 23 to TinyArrows and 57 to TinyURL, yielding its row vector of {.920, .023, .057}.</p>

<p>The result is shown below:</p>

<p><img src="http://s3.amazonaws.com/web_picts/initial.jpg" alt="" /></p>
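<p>Building those row vectors from raw counts is just a matter of dividing each row by its total. A minimal Ruby sketch; the Bit.ly counts are the ones quoted above, and the other two rows are hypothetical placeholders that happen to total 1,000 customers each.</p>

```ruby
# Hypothetical May-to-June customer counts: row = provider in May,
# column = provider in June, ordered [Bit.ly, TinyArrows, TinyURL].
# Only the Bit.ly row (920 kept, 23 to TinyArrows, 57 to TinyURL) is from
# the post; the TinyArrows and TinyURL rows are placeholders.
COUNTS = [
  [920, 23, 57],
  [100, 850, 50],
  [120, 30, 850]
].freeze

# Divide each row by its total so every row sums to 1.
def to_transition_matrix(counts)
  counts.map do |row|
    total = row.sum.to_f
    row.map { |count| count / total }
  end
end

matrix = to_transition_matrix(COUNTS)
p matrix.first # prints [0.92, 0.023, 0.057]
```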

<p>The key to Markov analysis is the ability to determine or estimate the number of customers gained-from and lost-to competitors.  Web analytics can often provide an estimate for such customer migrations, as can the results of a "competitive upgrade" marketing program.</p>
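<p>If web analytics can tell us which provider each customer used last month and this month, the gains and losses fall out of a simple tally. The records below are invented for illustration, and the helper name is hypothetical; dividing each resulting row by its total then gives the transition matrix.</p>

```ruby
PROVIDERS = ["Bit.ly", "TinyArrows", "TinyURL"].freeze

# Invented per-customer records: [provider last month, provider this month].
observations = [
  ["Bit.ly", "Bit.ly"],
  ["Bit.ly", "TinyURL"],
  ["TinyArrows", "Bit.ly"],
  ["TinyURL", "TinyURL"],
  ["Bit.ly", "Bit.ly"]
]

# Tally from -> to moves into a square table ordered like PROVIDERS.
# Hash#fetch raises KeyError on a provider name we don't know about.
def tally_transitions(observations, providers)
  index = providers.each_with_index.to_h
  counts = Array.new(providers.size) { Array.new(providers.size, 0) }
  observations.each do |from, to|
    counts[index.fetch(from)][index.fetch(to)] += 1
  end
  counts
end

counts = tally_transitions(observations, PROVIDERS)
# Row 0 (Bit.ly): kept 2, lost 0 to TinyArrows, lost 1 to TinyURL
p counts # prints [[2, 0, 1], [1, 0, 0], [0, 0, 1]]
```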

<p>Markov analysis for a single month can show meaningful transitions, but a more useful analysis can be had when</p>

<ol>
<li>The transition matrix is assumed to be stable, and</li>
<li>The model is used to determine equilibrium market shares</li>
</ol>

<p>Such an analysis is shown below:</p>

<p><img src="http://s3.amazonaws.com/web_picts/second.jpg" alt="" /></p>
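<p>Assuming a stable matrix, the equilibrium can also be found without special tools by simply replaying the table until the shares stop moving. This Ruby sketch uses a hypothetical matrix in which only the Bit.ly row {.920, .023, .057} comes from the post, so its numbers will not match the Mathematica output above.</p>

```ruby
# Hypothetical matrix; only the Bit.ly row is from the post.
MATRIX = [
  [0.920, 0.023, 0.057],
  [0.100, 0.850, 0.050],
  [0.120, 0.030, 0.850]
].freeze

# One play: next_share[j] = sum over i of share[i] * matrix[i][j].
def play(share, matrix)
  (0...share.size).map do |j|
    share.each_with_index.sum { |s, i| s * matrix[i][j] }
  end
end

# Replay the table until no share moves by more than `tolerance`.
def equilibrium(share, matrix, tolerance: 1e-12, max_plays: 10_000)
  max_plays.times do
    next_share = play(share, matrix)
    converged = next_share.zip(share).none? { |a, b| (a - b).abs > tolerance }
    return next_share if converged
    share = next_share
  end
  raise "no equilibrium within #{max_plays} plays"
end

shares = equilibrium([1.0 / 3, 1.0 / 3, 1.0 / 3], MATRIX)
puts shares.map { |s| (100 * s).round(1) }.inspect # prints [58.6, 14.4, 27.0]
```

<p>With these placeholder numbers Bit.ly settles at about 58.6%, roughly double TinyURL's 27.0%; iterating is slower than solving &pi; = &pi;P directly, but it mirrors the "play the game" intuition.</p>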

<p>As we might guess from the initial transition table, this is a very favorable market for Bit.ly, based on the hypothetical numbers presented here.  Bit.ly started with an even share of the market, but under this transition table it evolves to nearly double the market share of its competitors.  If there are second-order effects (such as Bit.ly being seen as a "leader" in potential customers' eyes), then the share gain may be even larger than shown here.</p>

<p>But that's not the only fascinating thing about Markov analysis here.  <strong><em>It's not where you start the game, but how well you play it.</em></strong>  If we keep the transition matrix (i.e. how the game is played) constant, then even if we drop Bit.ly and TinyArrows to 1% market share each and play the game to equilibrium, we still end up with the same equilibrium we reached from even shares!  The effect of playing this game to equilibrium is shown below:</p>

<p><img src="http://s3.amazonaws.com/web_picts/final.jpg" alt="" /></p>
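<p>One way to see why the starting point washes out: raise the transition matrix to a high power and every row converges to the same vector, so any starting share vector times that power gives the same answer. A Ruby sketch with a hypothetical matrix (only the Bit.ly row is from the post), starting from the 1% / 1% / 98% split described above.</p>

```ruby
# Hypothetical matrix; only the Bit.ly row is from the post.
MATRIX = [
  [0.920, 0.023, 0.057],
  [0.100, 0.850, 0.050],
  [0.120, 0.030, 0.850]
].freeze

# Plain matrix product of two nested arrays.
def mat_mul(a, b)
  a.map do |row|
    b.first.each_index.map do |j|
      row.each_with_index.sum { |x, k| x * b[k][j] }
    end
  end
end

# Square repeatedly: after 10 squarings we have MATRIX to the 1024th power.
power = MATRIX
10.times { power = mat_mul(power, power) }

# Every row is now (numerically) the same equilibrium vector...
power.each { |row| puts row.map { |x| x.round(4) }.inspect }

# ...so a lopsided start and an even start land in the same place.
lopsided = mat_mul([[0.01, 0.01, 0.98]], power).first
even     = mat_mul([[1.0 / 3, 1.0 / 3, 1.0 / 3]], power).first
```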

<p>So perhaps this is the "Ah HA!" of the crowded social media space, and the reason that small companies keep entering the space to try to carve their niche in it.  The model here might suggest the following:</p>

<ol>
<li>In a world of compute clouds, the barriers to entry for social media startups are low</li>
<li>The social media space is new enough that many firms with "one-stripe zebra" distinctive competencies might still carve out and defend niches successfully -- they <strong><em>play well</em></strong></li>
<li>Publicly traded firms (like Facebook and Google) are compelled to increase market share and earnings and have powerful incentives to change the nature of competition -- to "shake up" the transition matrix from time to time</li>
<li>Nothing shakes up a transition matrix like the acquisition of a competitor</li>
<li><a href="http://www.embracingchaos.com/2011/07/google-and-facebook%E2%80%99s-natural-monopoly-in-social-networks.html">Technology tends to produce natural monopolies</a>, but only if a leader can acquire enough share that higher-order monopolistic effects take over</li>
</ol>

<p>So -- when all is said and done, it really is in the interest of lots of niche firms to try to carve out a defensible space, and it is in Facebook's and Google's interest to acquire the pieces that let the "natural monopolies" play out.</p>

<p>So -- Social Media "Insanity?" --  "Crazy like a fox" is more like it.</p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-16467870.xml</wfw:commentRss></item><item><title>Consumerizing Big Data</title><dc:creator>John Repko</dc:creator><pubDate>Sun, 05 Feb 2012 21:52:44 +0000</pubDate><link>http://www.pikasoft.com/journal/2012/2/5/consumerizing-big-data.html</link><guid isPermaLink="false">512153:5946984:14885893</guid><description><![CDATA[<p><a href="http://www.presentationzen.com/presentationzen/2009/06/simplicity-in-las-vegas.html"><img src="https://images20120204.s3.amazonaws.com/simplicity.jpeg" alt="" /></a></p>

<blockquote><p>Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.<br />
~ Antoine de Saint Exupéry</p></blockquote>

<p>These are great days for Big Data -- Oracle's now in the game <a href="http://www.oracle.com/us/products/database/exadata/overview/index.html">with an appliance</a> and <a href="http://www.oracle.com/technetwork/database/berkeleydb/overview/index.html">a new database</a>, Microsoft <a href="http://blogs.msdn.com/b/uk_faculty_connection/archive/2011/11/20/microsoft-big-data-solution-sql-server-apache-hadoop-and-windows-azure.aspx">has</a> <a href="http://radar.oreilly.com/2012/01/microsoft-big-data.html">all</a> <a href="http://www.zdnet.com/blog/microsoft/understanding-microsofts-big-picture-plans-for-hadoop-and-project-isotope/11466">kinds</a> <a href="http://blogs.technet.com/b/next/archive/2011/12/06/big-data-and-microsoft-s-codename-data-explorer.aspx">of new initiatives</a> <a href="http://www.zdnet.com/blog/microsoft/microsoft-drops-dryad-puts-its-big-data-bets-on-hadoop/11226">post-Dryad</a>, and Amazon is going <a href="https://forums.aws.amazon.com/ann.jspa?annID=1326">big data and Enterprise with DynamoDB</a>.</p>

<p>Where are we going with this?   The new initiatives may validate the space but they belie the notion that  "more is better."   More <strong><em>is</em></strong> better, but only until the field gets swept by <strong><em>less</em></strong>.   37Signals suggests that you <a href="http://gettingreal.37signals.com/ch02_Build_Less.php">Underdo your competition</a>, and the late Steve Jobs raised simplicity to a high art.   I suggest that Big Data will reach gestalt when we agree, not on more, but on <strong><em>less</em></strong>.</p>

<p>To appreciate the power of less, let's go back to one of my favorite Big Data solutions -- the one based on the terrific Phil Whelan article: <a href="http://www.bigfastblog.com/map-reduce-with-ruby-using-hadoop">Map Reduce with Ruby Using Hadoop.</a>  We got a nice solution working last year, and I <a href="http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html">posted about it then</a>.  In that posting, I noted that Cloudera scripts make Hadoop accessible for the masses, but is that all there is to it?</p>

<p>As on late-night TV, I have to offer: "But Wait!  There's <strong><em>More</em>...</strong>"  Indeed there is, and better yet there's <strong><em>Less</em></strong>.  To show where we're headed, let's take another look at that Hadoop solution.</p>

<p>The Hadoop app we wrote last year was based on an earlier version of Cloudera's Hadoop release -- <span class="caps">CDH </span><a href="http://archive.cloudera.com/cdh/3/whirr-0.1.0+23.tar.gz">version 0.1.0+23</a>.   That version was a lot of Cloudera ago, so we'll explore Hadoop with the latest version, <a href="https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information#CDHVersionandPackagingInformation-CDHDownloadInformation"><span class="caps">CDH </span>version 3 Update 3</a>.   <span class="caps">CDH3</span> U3 <a href="https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs">integrates Hadoop 0.20.2 with a lot of goodies</a> that we'll see later, including</p>

<ul>
<li>Mahout 0.5+9.3 -- we'll see this later as part of our Recommendation Engine</li>
<li>Hive 0.7.1+42.36 and Pig 0.8.1+28.26 for programming</li>
<li>Whirr 0.5.0+4.8 -- which we'll use here for cloud integration, and </li>
<li>ZooKeeper 3.3.4+19 -- to coordinate the processes we spawn</li>
</ul>

<p>Download and installation work much as they did last year, and we'll start with the same kind of word-count application we ran then.  But first -- let's define our input data sources and output directory, and kick off our Hadoop run:</p>

<p><img src="https://images20120204.s3.amazonaws.com/1_dict_setup_and_map.jpg" alt="" /></p>

<p>Now we've got our input ($IN) and output ($OUT) locations set, and after a bunch of output to <span class="caps">STDOUT </span>we pull things together with:</p>

<p><img src="https://images20120204.s3.amazonaws.com/1_dict_reduce.jpg" alt="" /></p>

<p>...and we can go to $OUT to see the results:</p>

<p><img src="https://images20120204.s3.amazonaws.com/1_dict_results.jpg" alt="" /></p>
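<p>The reduce.rb itself only appears in the screenshots, but the job it does under Hadoop Streaming is easy to sketch. What follows is a hypothetical Ruby reducer, not the post's code: Streaming hands the reducer tab-separated "key, value" lines already sorted by key, so it only has to total each run of identical keys.</p>

```ruby
# Hypothetical Streaming-style reducer. Input lines look like
# "key", a tab, then "value", and arrive sorted so equal keys are adjacent.
def reduce_counts(lines)
  results = []
  current_key = nil
  total = 0
  lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    next if key.nil? || key.empty?
    if key != current_key
      # A new key starts: emit the finished total for the previous one.
      results.push("#{current_key}\t#{total}") unless current_key.nil?
      current_key = key
      total = 0
    end
    total += value.to_i
  end
  results.push("#{current_key}\t#{total}") unless current_key.nil?
  results
end
```

<p>Inside an actual streaming job, a line such as <code>reduce_counts(STDIN).each { |line| puts line }</code> would drive it.</p>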

<p>So far, so good -- we've got the same 13 aardvarks and aardwolves we had last year, from the same Macintosh dictionary file.  One dictionary is nice, but because we've set up input and output directories, we can run Hadoop on much more than just one file.  Since we routinely run on Ubuntu Linux, let's take its dictionary file as well and add it to the mix.  Here I've got a copy of the Ubuntu dictionary, named "unix_words."  Let's copy it in and have another run.</p>

<p>First we'll add in unix_words and kick off the Hadoop run:</p>

<p><img src="https://images20120204.s3.amazonaws.com/2_dict_map.jpg" alt="" /></p>

<p>It runs much as before, and here are our results:</p>

<p><img src="https://images20120204.s3.amazonaws.com/2_dict_results.jpg" alt="" /></p>

<p>Bingo!  Our varks and wolves have been displaced from the top of our list by "a'", but there are now 21 of them.  We could add hundreds or thousands more input files, and the analysis would still be a one-line command.  But that's not all we can do.  As we did last year, we have simple map and reduce files -- let's try adjusting the map file to group words by their first <span class="caps">THREE </span>letters this time.</p>
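<p>For small inputs it can be handy to sanity-check a cluster run locally. The helper below is a hypothetical single-process stand-in for the whole job (not part of the post's scripts): it reads every input "file," keys each word by its first <em>n</em> letters, counts, and returns the keys in sorted byte order, roughly the order a single reducer writes them. That ordering would also explain why "a'" lands ahead of "aa": an apostrophe sorts before any letter.</p>

```ruby
# Hypothetical local stand-in for the whole MapReduce job: each element of
# `files` is one input file's lines; words are keyed by their first n letters.
def prefix_counts(files, n = 2)
  counts = Hash.new(0)
  files.each do |lines|
    lines.each do |line|
      word = line.strip
      counts[word[0, n]] += 1 unless word.empty?
    end
  end
  counts.sort.to_h # keys in byte order, like a single reducer's output
end

mac_words  = ["aardvark\n", "aardwolf\n", "abacus\n"]
unix_words = ["a'\n", "aardvark\n"]

prefix_counts([mac_words, unix_words])    # a' -> 1, aa -> 3, ab -> 1
prefix_counts([mac_words, unix_words], 3) # a' -> 1, aar -> 3, aba -> 1
```

<p>With real files, something like <code>prefix_counts(Dir.glob("input/*").map { |f| File.readlines(f) })</code> would do; the directory name is an assumption.</p>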

<p>It's a simple 2-line change to make our map function grab 3-letter combinations.  Here's our new map.rb:</p>

<p><img src="https://images20120204.s3.amazonaws.com/map.rb.jpg" alt="" /></p>

<p>We can save it, and since we've already defined a run_hadoop function and set $IN and $OUT, we can just trigger ./run_hadoop and see the new results.</p>

<p><img src="https://images20120204.s3.amazonaws.com/new_map_start.jpg" alt="" /></p>

<p>Simple start -- we'll clear out our previous $OUT results, and with the new map.rb file we'll kick off another Hadoop run.   Here we made a simple change (2 letters to 3) but there's no reason we couldn't get more creative with our simple map and reduce functions.  Let's see what we get:</p>

<p><img src="https://images20120204.s3.amazonaws.com/new_map_results.jpg" alt="" /></p>
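<p>The 2-to-3-letter change only shows up in the map.rb screenshot above, so here is a hypothetical Ruby mapper in the same spirit, with the prefix length pulled out as a parameter. It emits one tab-separated "prefix, 1" line per word, which is the shape a Streaming reducer then totals.</p>

```ruby
# Hypothetical mapper sketch (the post's map.rb is only shown as an image).
# PREFIX_LENGTH is the knob the post turns from 2 letters to 3.
PREFIX_LENGTH = 3

# Emit one "prefix TAB 1" line per non-blank word.
def map_words(lines, prefix_length = PREFIX_LENGTH)
  lines.map { |line| line.strip }
       .reject { |word| word.empty? }
       .map { |word| "#{word[0, prefix_length]}\t1" }
end
```

<p>In a streaming job, <code>map_words(STDIN).each { |pair| puts pair }</code> would print the pairs for Hadoop to sort and shuffle; this version reads its whole input split into memory, which is fine for dictionary-sized files.</p>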

<p>So there we are.   Our analysis is not exactly Turing-award rich, but we've got a couple of things here that might really change the game for Big Data analysis.   Specifically, we've got</p>

<ul>
<li>A standard input target directory (could be "file system," but this is a start)</li>
<li>A standard output target</li>
<li>A flexible, readable map function</li>
<li>Standard location and processing for output</li>
</ul>

<p>We have the core components of a big data application emerging.  Rather than "one-offing" Big Data analysis, we can standardize the basic approach by </p>

<ul>
<li>Enriching the mappers and reducers </li>
<li>Expanding our input processing, and </li>
<li>Feeding our outputs to visualization tools like Jaspersoft or Tableau</li>
</ul>

<p>If we put the platform on a standard (HBase) data store and tie in search engine and matrix processing we start to approach the long-sought <em>spreadsheet for the new millennium</em>.   We're still just getting started, but the future is this way...</p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-14885893.xml</wfw:commentRss></item><item><title>You Only Live Twice (Basho and Riak)</title><dc:creator>John Repko</dc:creator><pubDate>Mon, 16 Jan 2012 02:14:06 +0000</pubDate><link>http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html</link><guid isPermaLink="false">512153:5946984:14595523</guid><description><![CDATA[<p><a href="http://4.bp.blogspot.com/_i8bLtPph-Ug/RvDvc8NyaUI/AAAAAAAAAoI/Vqi_2WrAmTE/s1600/Basho+3..jpg"><img src="https://pikasoft.s3.amazonaws.com/basho_mountain.jpg" alt="" /></a></p>


<blockquote><p>You only live twice<br />
Once when you are born, and once<br />
When you look death in the face<br />
<em>Ian Fleming ~ "You Only Live Twice"</em></p></blockquote>

<blockquote><p>It's not about the bike. It's a metaphor for life...<br />
<em>Lance Armstrong ~ "It's Not About the Bike"</em></p></blockquote>

<p>Today was a big day for me.  Way back on June 6, 2008 I was in a terrible car-bike accident.   It was so bad that the first word that got sent to a traffic copter overhead was that I'd been killed. I hadn't, but it was a couple of months of hospitalization and six months of hard rehab before I was back to anything like my life before the accident again.   I got great support from my wife Barbara and son Bryan, and with great care and therapy I even got back on the bike again.   </p>

<p>January 1, 2009 was my first post-accident bike ride - 1.4 miles around Clement Park lake here in Littleton, Colorado.  As little as that was, I kept at it and today, 3 years later, I completed my 10,000th mile since the accident.  It's true that you "only live twice," and the greatest gift in life is to come back from that edge.</p>

<p>The first quotation above is a haiku composed by James Bond in Ian Fleming's novel "You Only Live Twice," which Bond himself describes as written "...after Basho..." -- referring to <a href="http://en.wikipedia.org/wiki/Matsuo_Bash%C5%8D">Matsuo Basho</a>, the great Japanese poet (1644-1694).  Basho was the master of the haiku, and a nice sampling of his work can be found here:  <a href="http://thegreenleaf.co.uk/hp/basho/00bashohaiku.htm">A Selection of Matsuo Basho's Haiku</a>.</p>

<p>Basho may be revered as something like Japan's poet laureate (much as Robert Frost is regarded here), but it's a shame that there's so little awareness of his work.  Our world is full of fine, obscure art, and the joy of an internet-enabled world is that it's not so hard to find it anymore.</p>

<p>Basho's name (if not his verse) lives on in the NoSQL datastore company <a href="http://basho.com/">Basho</a>, and through their key-value store database Riak.   I spent the weekend getting Riak rolling in the cloud -- it's not hard to set up, and it's scalable, flexible and fast as a key-value store.   Here's a quick peek at how I got there:</p>

<p>Riak was designed for robustness, speed and scalability, and to get started with Riak you'll need to install the programming language <a href="http://www.erlang.org/">Erlang</a> first.  Riak was built with Erlang, and Erlang is a terrific jackrabbit of a language that even on its own is absolutely worth a look.  I was running Ubuntu 10.04 <span class="caps">LTS </span>(Lucid Lynx) on <span class="caps">AWS, </span>and in that world the Erlang install only took 4 steps:</p>

<p><code>curl -O http://erlang.org/download/otp_src_R14B03.tar.gz</code><br />
<code>tar zxvf otp_src_R14B03.tar.gz</code><br />
<code>cd otp_src_R14B03</code><br />
<code>./configure &amp;&amp; make &amp;&amp; sudo make install</code></p>

<p>The latest Erlang (R15B) doesn't work yet with the latest Riak (1.0.2), so you'll want to make sure you're pairing compatible versions of Erlang and Riak (R14B03 and Riak 1.0.2, as here).  Once that's complete, it's also a simple set of steps to install Riak:</p>

<p><code>curl -O http://downloads.basho.com/riak/riak-1.0.2/riak-1.0.2.tar.gz</code><br />
<code>tar zxvf riak-1.0.2.tar.gz</code><br />
<code>cd riak-1.0.2</code><br />
<code>make rel</code></p>

<p>With Erlang and Riak installed we're ready to get rolling.  Inasmuch as I see "Big Data" as an emerging data structure and both NoSQL and Hadoop as tools forming the operating system around that data structure, I like (where I can) to stick to high-level languages and <span class="caps">OBDM </span>(object-big-data-mapping) tools for access to the structure.  Fortunately, Sean Cribbs has just released <a href="https://github.com/seancribbs/ripple#readme">Ripple</a>, an Active Model-based document abstraction utility modeled on Active Record and <a href="http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html">MongoMapper</a>.  With Ripple added, we just need a bit of code (and a big assist from <a href="http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html">Justin Pease</a>) to migrate our Redis-based <span class="caps">URL </span>shortener over to Riak.  But first, let's get Riak working:</p>

<p>First we'll need a new Rails project to test Riak:</p>

<p><code>$ rails new riaktest</code></p>

<p>Then we'll go into riaktest and add Ripple and curb to our Rails 3.x Gemfile, and do a bundle install:  </p>

<p><code>gem 'ripple', :git =&gt; 'http://github.com/seancribbs/ripple.git'</code><br />
<code>gem 'curb'</code></p>

<p>Save the Gemfile, and then</p>

<p><code>$ bundle install</code></p>

<p>Next we'll add Ripple into our config/database.yml:</p>



<pre>
ripple:
  development:
    port: 8098
    host: localhost
</pre>



<p>Next we'll add a little Url class in app/models/url.rb:</p>



<pre>
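require 'ripple'

# Ripple::Document adds Active Model-style properties and validations,
# and stores each Url as a JSON document in Riak
class Url
  include Ripple::Document
  property :ukey, String, :presence =&gt; true  # short key; required
  property :url,  String                      # full target URL
end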
</pre>



<p>And finally we'll fire up Riak:  </p>

<p><code>$ /var/www/apps/riak-1.0.2/rel/riak/bin/riak start</code></p>

<p>With our Development environment complete, we can now dive into Rails on the console and play with our Riak data store:</p>



<pre>
$ rails console
Loading development environment (Rails 3.1.3)
ruby-1.9.2-p290 :001 &gt; url = Url.new
 =&gt; &lt;Url:[new] ukey=nil url=nil&gt;
ruby-1.9.2-p290 :002 &gt; url.ukey = &quot;2432&quot;
 =&gt; &quot;2432&quot; 
ruby-1.9.2-p290 :003 &gt; url.url = &quot;http://www.ibm.com&quot;
 =&gt; &quot;http://www.ibm.com&quot; 
ruby-1.9.2-p290 :004 &gt; url.valid?
 =&gt; true 
ruby-1.9.2-p290 :005 &gt; url.save
 =&gt; true 
ruby-1.9.2-p290 :006 &gt; exit
</pre>



<p>Great -- we've saved a record to our data store and left the console (thus the "exit" above).  Now we can start a fresh console and read the record back from our Riak store:</p>



<pre>
$ rails console
Loading development environment (Rails 3.1.3)
ruby-1.9.2-p290 :001 &gt; newurl = Url.first
 =&gt; &lt;Url:TdxQ3iFGEwkmfMrYQBmvwcZYoCM ukey=&quot;2432&quot; url=&quot;http://www.ibm.com&quot;&gt;
ruby-1.9.2-p290 :002 &gt; exit
</pre>
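<p>In this setup Ripple reaches Riak over HTTP (that's what port 8098 and the curb gem are for). A rough sketch of equivalent raw requests in plain Ruby, using Riak's older <code>/riak/bucket/key</code> URL form: the "urls" bucket name and the helper names are assumptions, and unlike Ripple (which let Riak generate the key shown in the console output above), this sketch uses the short key as the Riak key.</p>

```ruby
require "json"
require "net/http"

# Assumed connection details, matching the database.yml settings above.
RIAK_HOST = "localhost"
RIAK_PORT = 8098

# Build a PUT that stores a JSON document under bucket/key.
def riak_put_request(bucket, key, document)
  request = Net::HTTP::Put.new("/riak/#{bucket}/#{key}",
                               { "Content-Type" => "application/json" })
  request.body = JSON.generate(document)
  request
end

# Build a GET that fetches the document stored under bucket/key.
def riak_get_request(bucket, key)
  Net::HTTP::Get.new("/riak/#{bucket}/#{key}", { "Accept" => "application/json" })
end

# Send a prepared request to the Riak node (requires Riak to be running).
def send_to_riak(request)
  Net::HTTP.start(RIAK_HOST, RIAK_PORT) { |http| http.request(request) }
end

put = riak_put_request("urls", "2432",
                       { "ukey" => "2432", "url" => "http://www.ibm.com" })
get = riak_get_request("urls", "2432")
# With Riak up: send_to_riak(put); JSON.parse(send_to_riak(get).body)
```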



<p>So we have Riak operational on the Amazon cloud, and it's a small matter of coding to move our Redis <span class="caps">URL </span>shortener over to a new back end.   In my next posting I'll show how we can do that, and do a little Apache Benchmark testing to see how our little example applications benchmark out.</p>

<p>We'll end with a little inspiration from Lance Armstrong:</p>

<p><embed type="application/x-shockwave-flash" flashvars="audioUrl=http://s3.amazonaws.com/funny_jkr/lance.mp3" src="http://www.google.com/reader/ui/3523697345-audio-player.swf" width="400" height="27" quality="best"></embed></p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-14595523.xml</wfw:commentRss></item><item><title>How do I get started? A General Solution to Discovery in Big Data</title><dc:creator>John Repko</dc:creator><pubDate>Wed, 25 May 2011 19:20:06 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/5/25/how-do-i-get-started-a-general-solution-to-discovery-in-big.html</link><guid isPermaLink="false">512153:5946984:11575940</guid><description><![CDATA[<p><img src="http://s3.amazonaws.com/jkr_images/yellow_brick_road.jpg" alt="" /></p>

<p><em>Source: http://www.flickr.com/photos/41829005@N02/6162370327/</em></p>

<p>I've used the "spreadsheet" as a metaphor for an epiphany -- in this case combining enabling technologies (cheap PC processing, high-resolution displays and cheap memory) to provide a new metaphor for problem solving.  Spreadsheet visual programming is a perfect metaphor for financial analysis because the rows-and-columns of financial ledgers map crisply to rows and columns on a computer screen.  The final essential piece of the "PC Data" revolution arrived when Lotus 1-2-3 added a macro language that Visicalc had lacked.  This single feature guaranteed the hegemony of 1-2-3 and spreadsheets, as the macro language made them capable of solving problems outside the domains envisioned by the first spreadsheets' developers.</p>

<p>Before spreadsheets, if you had a problem you could either lay it out on paper, or have a programmer write a specific program to perform the analysis you wanted.   "Exploration" and "Discovery" were limited to what you could describe to a developer to program.   Life before spreadsheets was brutish and short…</p>

<p><img src="http://appraisalnewsonline.typepad.com/photos/uncategorized/2008/01/08/matrix_data.jpg" alt="" /></p>

<p><em>Source: http://appraisalnewsonline.typepad.com/photos/uncategorized/2008/01/08/matrix_data.jpg</em></p>

<p>So here we are today, at the dawn of the Big Data era.  The core toolset is emerging (MapReduce via the Hadoop family of products) and word is spreading that remarkable solutions might be found in data that we formerly thought of as "disposable."  The old problem is back, though -- if you (as a manager or executive) want solutions, you'd better go find a programmer.  There are steps being taken to bring us <em>spreadsheets for big data</em> -- <a href="http://techcrunch.com/2010/04/13/datameer-raises-2-5-million-for-apache-hadoop-based-analytics-platform/">Datameer</a> in particular is bringing spreadsheets to Big Data.  Or, more properly, bringing Big Data to spreadsheets.  They may move Big Data forward, but there's an impedance mismatch here -- if Big Data naturally fit in the rows and columns of spreadsheets it would already have made the jump and be found there.  If Big Data describes a world beyond rows and columns, then the spreadsheet metaphor will end up fitting Big Data like a bad suit.  Sure, we'll have our familiar rows and columns, but like Mozart played on a kazoo, something in the essential nature of the data will be lost.</p>

<p>The answer for Big Data is a spreadsheet <em>conceptually</em>, but with a richer representational metaphor than rows and columns.     We want fundamental insights from big data, so our building blocks should match the topologies that we're studying.   Here's a first take at what "rows and columns" for Big Data might look like:</p>

<ul>
<li><strong>Predictive Modeling</strong> -- stripped of scale, are there linear relationships in the data that offer explanatory or predictive value?</li>
<li><strong>Clustering Partition</strong> -- is the data uniformly distributed or clustered, and what can we learn from the clusters?</li>
<li><strong>N-Dimensional Visualization</strong> -- US Supreme Court Justice Potter Stewart once wrote that he couldn't define hard-core pornography, but "I know it when I see it."  Are there visual representations of Big Data that provide insight?</li>
<li><strong>Outlier Analysis</strong> -- does the data follow a predictable distribution (normal, exponential, Poisson, etc.), and if we can fit the data to control charts, what do the outliers from those charts tell us?</li>
<li><strong>A/B Analysis</strong> -- The data may be noisy, but can we use it to measure the performance of key variables against each other?</li>
<li><strong>Markov Chains</strong> -- You know the score this far into the game, and your customers' web interactions foreshadow their interests going forward.   Where are we heading, and when do we get there?</li>
</ul>
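<p>To make the "Outlier Analysis" row concrete, here is a minimal Ruby sketch of the simplest control-chart check: compute limits of mean ± 3 standard deviations from a baseline period that's assumed to be in control, then flag new observations outside them. The numbers are invented, and a textbook individuals chart would usually estimate sigma from the average moving range rather than the sample standard deviation used here.</p>

```ruby
# Control limits from an in-control baseline: mean +/- sigmas * sample std dev.
def control_limits(baseline, sigmas = 3.0)
  n = baseline.size
  mean = baseline.sum.to_f / n
  variance = baseline.sum { |x| (x - mean)**2 } / (n - 1)
  sd = Math.sqrt(variance)
  [mean - sigmas * sd, mean + sigmas * sd]
end

# Observations falling outside the limits are the outliers.
def outliers(observations, limits)
  lower, upper = limits
  observations.reject { |x| x.between?(lower, upper) }
end

baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 8] # invented daily counts
limits = control_limits(baseline)               # roughly [6.54, 13.46]
outliers([10, 13, 25], limits)                  # => [25]
```

<p>Setting the limits from a baseline, rather than from the data being tested, keeps a single extreme value from inflating the standard deviation enough to hide itself.</p>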

<p>These are our rows and columns, and in my next post I'll describe the architecture I'm pursuing to explore them, an architecture built around:</p>

<p><img src="https://s3.amazonaws.com/jkr_images/discovery.jpg" alt="" /></p>

<ul>
<li><span class="caps">HDFS </span>for general data storage</li>
<li>HBase for data management</li>
<li>Hadoop for unstructured data analysis</li>
<li>ZooKeeper for task management</li>
<li><span class="caps">SOLR </span>for structured "free text" search</li>
<li>Thrift for access to external development languages and platforms</li>
<li>Massive_record to provide <span class="caps">ORM</span>-access to all that HBase data</li>
<li>jQuery for unobtrusive JavaScript and core visual presentation</li>
<li><span class="caps">SIMILE </span>for advanced visual presentation</li>
<li>Tableau for advanced visual presentation</li>
<li>Node.js to serve up all that JavaScript</li>
</ul>

<p>That's a lot to describe and it'll take some posting to do it, but the ultimate objective never changes -- to provide a sandbox that managers can play with and coax Big Data into giving up its secrets.</p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11575940.xml</wfw:commentRss></item><item><title>Spreadsheets for the New Millennium -- Part 3</title><dc:creator>John Repko</dc:creator><pubDate>Mon, 23 May 2011 22:33:13 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/5/23/spreadsheets-for-the-new-millennium-part-3.html</link><guid isPermaLink="false">512153:5946984:11554492</guid><description><![CDATA[<p><img src="http://datatreespreadsheets.co.uk/ESW/Images/A_World_Beyond_Spreadsheets.JPG?xcache=1122" alt="" /></p>

<p>So here's what comes next:   </p>

<p>When I write about "spreadsheets," I'm thinking about technology bringing a real innovation to market.  Spreadsheets were a breakthrough in modern business because they took new technologies -- low-cost PCs, high-resolution displays and comparatively large amounts of <span class="caps">RAM</span> -- and combined them into a facile metaphor that fit a rich set of problems.  Hadoop and MapReduce are terrific but they are elemental -- they provide a rich, parallel, functional-programming approach, but they remain basically metaphor-free.  They are to Big Data what Quicksort is to elementary computer science -- a nice step beyond <em>Bubblesort</em>, but in themselves just tools.  The Killer App lies elsewhere.</p>

<p>For that reason I think <a href="http://www.datameer.com/">Datameer</a> and <a href="http://www.factual.com/">Factual</a> are a step forward in the routinization of big data, but I don't think they've got it yet either. The metaphor is still wrong.   Visicalc and Lotus 1-2-3 were a big step forward because they gave a hands-on way for non-IT people to grasp the rows-and-columns world of financial analysis. The impedance barrier went away because you could make financial models in a visual domain-specific language (DSL) that mirrored the world you were modeling. </p>

<p><img src="http://blogs.msdn.com/blogfiles/devschool/WindowsLiveWriter/WOWLuaWorldofWarcraftandAI_A08E/clip_image004_thumb.jpg" alt="" /></p>

<p>The <span class="caps">DSL </span>has to match the world you're modeling, so I expect that jamming big data into a spreadsheet today will be like jamming financial calculations into WordStar would have been back then. It's a step forward (maybe a big one), but the gestalt will arrive elsewhere. When I wrote that "big data needs a spreadsheet" in <a href="http://www.pikasoft.com/journal/2011/1/29/spreadsheets-for-the-new-millennium.html">Spreadsheets for the New Millennium</a>, what I meant was that big data needs a metaphor and a <span class="caps">DSL </span>-- a way to put big data <strong>understanding</strong> into the hands of everyday users. Putting big data in a spreadsheet is a start, but these aren't rows-and-columns problem domains, and stuffing them into rows and columns might provide some facility, but at a cost of richness and understanding. Big Data deserves its own metaphor and a <span class="caps">DSL</span> ... somebody's incubating it ... even as I type this ... now, where is it??? In my next post I'll lay out <em>a few steps to the epiphany</em>.</p>
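<p>A toy illustration of what "stuffing them into rows and columns" can cost: a nested, graph-shaped record (hypothetical data) forced into one flat table. The <code>toRows</code> helper is made up for this sketch; a real tool could choose a different flattening, but some structure always has to give.</p>

```javascript
// A nested, graph-shaped record of the kind social apps produce.
const user = {
  name: 'Bill Smith',
  friends: ['Ann', 'Raj'],
  posts: [
    { id: 1, likes: ['Ann'] },
    { id: 2, likes: ['Ann', 'Raj'] },
    { id: 3, likes: [] },
  ],
};

// Flatten to one row per (post, like) pair, spreadsheet-style.
function toRows(u) {
  const rows = [];
  for (const post of u.posts) {
    for (const liker of post.likes) {
      rows.push({ name: u.name, postId: post.id, likedBy: liker });
    }
  }
  return rows;
}

const rows = toRows(user);
console.log(rows.length); // 3, with 'Bill Smith' repeated on every row
// Post 3 (no likes) has vanished, and the friends list has no place in
// this table at all; recovering either needs more tables and joins.
```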

<p><img src="http://profile.ak.fbcdn.net/hprofile-ak-snc4/50252_321170220545_6621995_n.jpg" alt="" /></p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11554492.xml</wfw:commentRss></item><item><title>The Big Easy - Spreadsheets for the New Millennium Part 2</title><dc:creator>John Repko</dc:creator><pubDate>Thu, 19 May 2011 19:30:50 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/5/19/the-big-easy-spreadsheets-for-the-new-millennium-part-2.html</link><guid isPermaLink="false">512153:5946984:11512052</guid><description><![CDATA[<p><img src="https://s3.amazonaws.com/jkr_images/burboun_st.jpg" alt="" />
Back in January I wrote a post that I called <a href="http://www.pikasoft.com/journal/2011/1/29/spreadsheets-for-the-new-millennium.html">Spreadsheets for the New Millennium,</a> and in that posting I suggested that Big Data would never take hold in the public consciousness until there were gateways to it -- tools that could let anybody play with it, tools that could make it <em>easy</em>.   </p>

<p>I love the idea of Mardi Gras -- everybody picks a time (the Tuesday before the start-of-Lent Ash Wednesday) and a place (New Orleans' Bourbon St.) to get together to have a party. I love it because that <strong><em>practically never happens</em></strong> in technology. We celebrate advances in technology as much as we do because progress (both discovery and adoption) is <strong><em>so hard</em></strong>.</p>

<p>Why does it seem to be so hard for communities to assemble and take up advances in science? This is one of the key questions in Thomas Kuhn's <a href="http://www.amazon.com/Structure-Scientific-Revolutions-Thomas-Kuhn/dp/0226458083/ref=pd_bbs_sr_1?ie=UTF8&amp;s=books&amp;qid=1230764643&amp;sr=8-1">The Structure of Scientific Revolutions</a>.</p>

<p>What does it take for an idea to break through?  Here is Kuhn's answer:</p>

<blockquote><p>Kuhn saw that for a new candidate for paradigm to be accepted by a scientific community, "First, the new candidate must seem to resolve some outstanding and generally recognized problem that can be met in no other way.</p></blockquote>

<blockquote><p>Second, the new paradigm must promise to preserve a relatively large part of the concrete problem solving activity that has accrued to science through its predecessors..."</p></blockquote>

<p>Or, as Steve Jobs might say:  <em>Think Different, <strong>but not Too Different.</strong></em></p>

<p>For Big Data to become mainstream technology we need to satisfy two conditions:</p>

<ol>
<li><strong><em>Solve a generally-recognized problem that can be met no other way.</em></strong>  Check - Progressive Insurance can give you <a href="http://www.cringely.com/tag/bigtable/">an estimated insurance quote on the spot</a>, but only because they pre-calculate a rate quote for every car in the US every night -- using Hadoop. The problem and solution are simple, but without Hadoop they could never generate <em>every car, every night...</em></li>
<li><strong><em>...Preserve its predecessors.</em></strong>   <span class="caps">FAIL.   </span> Hadoop is a terrific tool, but unless you've had the good fortune to take <span class="caps">MIT</span>'s 6.001 or Yale's CS 323, you've probably never seen anything like it.</li>
</ol>
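<p>The Progressive example is a pattern worth spelling out: do the expensive work in a nightly batch so the daytime request is a single lookup. A minimal sketch, assuming a made-up <code>rateQuote</code> formula and a two-car list standing in for "every car in the US"; none of this is Progressive's actual model, and the batch loop is the part Hadoop would parallelize.</p>

```javascript
// Hypothetical rating formula; a real insurer's model is far more involved.
function rateQuote(car) {
  const base = 500;
  const ageFactor = Math.max(0, 15 - car.age) * 20; // made-up age surcharge
  return base + ageFactor + (car.sports ? 300 : 0);
}

// Nightly batch: price every car up front.
function buildQuoteTable(cars) {
  const table = new Map();
  for (const car of cars) table.set(car.vin, rateQuote(car));
  return table;
}

// Daytime request: no computation, just a lookup (null if we never priced it).
function quoteOnTheSpot(table, vin) {
  return table.has(vin) ? table.get(vin) : null;
}

const cars = [
  { vin: 'VIN001', age: 3, sports: false },
  { vin: 'VIN002', age: 10, sports: true },
];
const nightly = buildQuoteTable(cars);
console.log(quoteOnTheSpot(nightly, 'VIN002')); // 900 (500 + 5 * 20 + 300)
```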

<p>The Hadoop community has desperately added <a href="http://wiki.apache.org/hadoop/Hive">Hive</a> and <a href="http://pig.apache.org/">Pig</a> to try to reduce the foreignness-barrier of functional programming  and get Hadoop over Kuhn's second barrier.</p>

<p>Brook Byers of Kleiner Perkins put me onto Kuhn back when I was at Stanford, so it's no surprise that <span class="caps">KPCB </span>led a $9M round in the Hadoop-y Big Data company <a href="http://techcrunch.com/2011/05/16/kleiner-perkins-leads-9m-round-in-apache-hadoop-based-analytics-platform-datameer/">Datameer</a>, announcing:</p>

<blockquote><p>Datameer’s Analytics Solution, which integrates the data mining power of Hadoop with a spreadsheet interface, enables business users to run analytics against very large data sets with no programming required. The product is designed to help users with little to no computer engineering experience handle massive amounts of data.</p></blockquote>

<p>Kleiner and Datameer aren't alone in the race for Spreadsheets for the New Millennium; <a href="http://www.readwriteweb.com/start/2010/12/big-opportunities-for-startups.php">Factual</a> is another player that's raised a lot of smart money.</p>

<p>These are great approaches, but in the best open-source tradition we'll bring up a solution that does a lot of the same things -- based on open-source code -- in one of my next postings.</p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11512052.xml</wfw:commentRss></item><item><title>Not "Big Data" -- FAST Data</title><dc:creator>John Repko</dc:creator><pubDate>Sat, 14 May 2011 03:23:11 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html</link><guid isPermaLink="false">512153:5946984:11456959</guid><description><![CDATA[<p><img src="https://s3.amazonaws.com/jkr_images/big_vs_fast.jpg" alt="" /></p>

<p>One of the great things about working in technology is that it's marked by seasons, and watching the seasons you learn that you can plan what's ahead. Much as a robin is a harbinger of Spring, a new McKinsey Technology report signals the arrival of a new technology for McKinsey to ponder. Now, I like McKinsey -- they are fine strategists, and I have a bunch of friends from Stanford <span class="caps">GSB </span>who landed there. Still, as technologists they might do well to stretch out their fingers and log in a bit deeper sometimes. At the beginning of 2009 their <a href="http://www.isaca.org/Groups/Professional-English/cloud-computing/GroupDocuments/McKinsey_Cloud%20matters.pdf">Clearing the air on cloud computing</a> declared that <strong><em>'Cloud computing' is approaching the top of the Gartner Hype-cycle.</em></strong> That was two full years ago, and the clouds haven't exactly burned off "cloud computing" since.</p>

<p>Well, like the baddies in Poltergeist <span class="caps">II, </span><em>"They're baaaack…"</em>   This time McKinsey weighs in on Big Data, and in classic McKinsey fashion they deliver terrific facts without providing any insight on why all this is happening around them.    McKinsey's latest, <a href="http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation">Big data, the next frontier for innovation, competition and productivity</a>, starts with the common Big Data red herring: <em>"...the volume of data is growing at an exponential rate..."</em>  which is indisputable but totally misses the point.     Data has been growing exponentially since at least the <span class="caps">IBM</span> 360 era -- almost 50 years now.  The key point is <span class="caps">NOT </span>that the data is "Big."   The data has always been big.  The question is not <em>Why Big Data?</em> but <strong><em>Why Now?</em></strong> </p>

<p><img src="https://s3.amazonaws.com/jkr_images/google-instant.jpg" alt="" /></p>

<p>The answer is not "Now the data is big" -- the answer is "Now the data is fast!"  Google didn't become Google because their data was big --  Google went to MapReduce so they could keep growing the number of sites-crawled while still returning results in &lt; 200 milliseconds, and now they're going to <a href="http://en.wikipedia.org/wiki/Google_Instant#Instant_Search">Google Instant</a> because even 200 milliseconds isn't fast enough anymore.   Consider all the action we're seeing today in NoSQL data stores -- the point is <span class="caps">NOT </span>that they are big -- the point is that apps need to quickly serve data that is globally partitioned and remarkably de-normalized.   Even the best web-era app isn't successful if it isn't fast. </p>

<p>So for now let's forget about McKinsey.   If you are looking for opportunity, the question to ask is <span class="caps">NOT </span>"Where is there Big Data?"  the question to ask is <strong><em>"Where can fast data really make a difference?"</em></strong></p>

<p><em>…Even the best web-era app isn't successful if it isn't fast…</em>  This is the thinking that brought all the NoSQL data stores to social networking software.   The new applications like Twitter and Facebook are huge and distributed but still have to be fast.    To the billionaires who founded them, throwing out the conventions of the Relational model was a small price to pay for the success and scale that speed brought.</p>

<p><img src="https://s3.amazonaws.com/jkr_images/NoSQL.jpg" alt="" /></p>

<p>The core idea behind HBase and Cassandra as NoSQL leaders is that they may be schemaless (which is nice for web data) but they are not unstructured!   What makes the column-oriented databases so magical is that they avoid the "6-JOIN" database push-up problem that Dare Obasanjo wrote up in <a href="http://www.25hoursaday.com/weblog/2007/08/03/WhenNotToNormalizeYourSQLDatabase.aspx">When Not to Normalize your <span class="caps">SQL</span> Database</a>.   To get speed we're willing to make compromises with some of the core components of heretofore-modern data processing.     To get speed we change some of the rules of the game.</p>
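<p>A toy illustration of the trade-off Obasanjo describes: assembling a profile from normalized tables costs one round trip per table (and per friend), while a denormalized record comes back in one read. The tables, the <code>dbGet</code> counter and the data are hypothetical; a <code>Map</code> stands in for each database table.</p>

```javascript
// Each Map stands in for a table; every dbGet() counts as one database round trip.
let lookups = 0;
function dbGet(table, key) {
  lookups += 1;
  return table.get(key);
}

// Normalized: the profile is spread across several tables.
const users = new Map([[42, { name: 'Bill Smith', cityId: 7 }]]);
const cities = new Map([[7, { city: 'Palo Alto' }]]);
const friendLists = new Map([[42, [43, 44]]]);
const names = new Map([[43, 'Ann'], [44, 'Raj']]);

function profileNormalized(userId) {
  const user = dbGet(users, userId);
  const city = dbGet(cities, user.cityId);
  const friendIds = dbGet(friendLists, userId);
  const friends = friendIds.map((id) => dbGet(names, id));
  return { name: user.name, city: city.city, friends };
}

// Denormalized: the same profile stored as one pre-joined record.
const profiles = new Map([
  [42, { name: 'Bill Smith', city: 'Palo Alto', friends: ['Ann', 'Raj'] }],
]);

function profileDenormalized(userId) {
  return dbGet(profiles, userId);
}

lookups = 0;
profileNormalized(42);
console.log('normalized lookups:', lookups); // 5
lookups = 0;
profileDenormalized(42);
console.log('denormalized lookups:', lookups); // 1
```

<p>The single read is paid for on writes: if Ann changes her name, every denormalized profile that copies it has to be updated, which is where background jobs and "eventual" consistency come in.</p>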

<h3>Here are the new rules for software delivery in the Web era</h3>

<ul>
<li>You have 100 milliseconds to respond to a user action in a web application.  This is where we ended up in my last post:  <a href="http://www.havemacwillblog.com/2009/02/why-google-won-in-the-search-market/">Much over 100ms == <span class="caps">FAIL</span></a>.   </li>
</ul>

<ul>
<li>100ms is one tough target, because<ul>
<li>Accessing a web server in Palo Alto from a site in NY costs 50-80 ms just in latency (unless you can increase the speed of light)</li>
<li>Every router-hop = about 3ms</li>
<li><span class="caps">ESB </span>response times = 10s of ms (maybe 100s)                              </li>
<li><span class="caps">XML </span>marshalling / unmarshalling = 10s of ms (maybe 100s) <br />
(this is why <span class="caps">JSON </span>is replacing <span class="caps">XML </span>in web apps)</li>
<li>DB access = ~1 (good) to 10 (cheap) ms -- this is why <a href="http://www.25hoursaday.com/weblog/2007/08/03/WhenNotToNormalizeYourSQLDatabase.aspx">6-JOINS</a> <span class="caps">FAIL</span></li>
</ul></li>
</ul>
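<p>The arithmetic is worth doing explicitly. Here is a minimal budget calculator using the rough figures above; the two request paths and the specific number picked from each range (65 ms for the coast-to-coast trip, 30 ms for an ESB call, 20 ms for XML marshalling, 5 ms per DB access) are assumptions for illustration.</p>

```javascript
// Rough per-component costs in milliseconds, picked from the ranges above.
const COSTS_MS = {
  crossCountry: 65, // NY to Palo Alto, middle of the 50-80 ms range
  routerHop: 3,
  esbCall: 30, // "10s of ms"
  xmlMarshal: 20, // "10s of ms"
  dbAccess: 5, // between ~1 (good) and 10 (cheap)
};

const BUDGET_MS = 100;

// Sum the cost of a request path described as counts of each component.
function pathCost(path) {
  let total = 0;
  for (const [component, count] of Object.entries(path)) {
    if (!(component in COSTS_MS)) throw new Error('Unknown component: ' + component);
    total += COSTS_MS[component] * count;
  }
  return total;
}

// Hypothetical paths: a 6-JOIN enterprise stack vs. a nearby cache plus one lookup.
const enterpriseStyle = { crossCountry: 1, routerHop: 5, esbCall: 1, xmlMarshal: 1, dbAccess: 6 };
const webStyle = { routerHop: 3, dbAccess: 1 };

for (const [name, path] of Object.entries({ enterpriseStyle, webStyle })) {
  const cost = pathCost(path);
  console.log(name, cost + ' ms', cost > BUDGET_MS ? 'FAIL' : 'ok');
}
// enterpriseStyle 160 ms FAIL
// webStyle 14 ms ok
```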

<p>So if we believe that the faster cobra <strong>always</strong> wins, here are the rules that fall out from this:</p>

<h3>Rules for App Delivery in the Web Age</h3>

<ol>
<li>You need to cache data near users -- cross-country transmission = <span class="caps">FAIL </span>right off the bat (too far, too many hops to be fast)</li>
<li><span class="caps">ESB</span>s for enterprise apps may be fine, but probably fail for web apps</li>
<li><span class="caps">XML </span>went away in web-space because it had to (JBoss' Marc Fleury once wrote a great article on this)</li>
<li>One DB access is not fatal, but 6-10 surely are -- thus for the biggest data in web apps we find no <span class="caps">JOINS, </span>no Transactions, no Stored Procedures, and ultimately NO <span class="caps">DATABASES </span> (see the classic <a href="http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf">eBay Architecture</a> deck, where slides 22-23 show that eBay has already moved most DB ops into App/RAM space)</li>
<li>Zero lookups are better than one:  Hello <a href="http://www.memcached.org">Memcached!</a></li>
<li>If you have a <span class="caps">DB, </span>you better get all you need with that one lookup.   Thus columnar databases like HBase and Cassandra -- if I lookup "Bill Smith" I get a big chunk of <span class="caps">EVERYTHING </span>known about Bill -- I then work on the chunk, and write it to storage as an object.  <span class="caps">RAM </span>is cheap, busses are fast and this approach works in web-app-land.</li>
<li>"Eventual" consistency is fine, as long as you have some idea of how eventual "eventual" really is</li>
<li>Hadoop can prowl around in the background, making sure our data stores all eventually sync up</li>
<li>Conventional data models no longer work here -- the world of fast big data is all about <a href="http://en.wikipedia.org/wiki/Denormalization">denormalization</a> and <a href="http://en.wikipedia.org/wiki/Data_deduplication">deduplication</a></li>
</ol>
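<p>Rules 5 and 7 combine into the familiar cache-aside pattern: check the cache first, fall back to the store on a miss, and accept that a cached value may be up to <code>ttlMs</code> stale, which is one way of knowing how eventual "eventual" really is. A minimal in-process sketch: a <code>Map</code> stands in for Memcached (a real deployment would use a Memcached client over the network), the loader is a hypothetical slow lookup, and the clock is injectable so the example is deterministic.</p>

```javascript
// Cache-aside with a time-to-live. A Map stands in for Memcached here.
class CacheAside {
  constructor(loadFromDb, ttlMs, now = () => Date.now()) {
    this.loadFromDb = loadFromDb; // hypothetical slow lookup
    this.ttlMs = ttlMs; // upper bound on how stale a cached value can be
    this.now = now;
    this.entries = new Map();
    this.dbHits = 0;
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry && this.now() - entry.storedAt < this.ttlMs) {
      return entry.value; // zero database lookups
    }
    this.dbHits += 1; // miss or expired: one database lookup
    const value = this.loadFromDb(key);
    this.entries.set(key, { value, storedAt: this.now() });
    return value;
  }
}

// Usage with a fake clock so the behaviour is easy to follow.
let clock = 0;
const db = { bill: 'Bill Smith, Palo Alto' };
const cache = new CacheAside((key) => db[key], 1000, () => clock);

cache.get('bill'); // miss: loads from the "database"
cache.get('bill'); // hit
db.bill = 'Bill Smith, New York'; // the source of truth changes...
console.log(cache.get('bill')); // ...but the cache still returns Bill Smith, Palo Alto
clock = 1500; // once the TTL has expired
console.log(cache.get('bill')); // Bill Smith, New York
console.log(cache.dbHits); // 2
```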

<p>In this big data-driven world the data model morphs to provide the fast data that apps require.  We thus have a new kind of app / data model -- much more object-oriented than the pure-data stores that have taken us this far.     </p>

<p>In this world we go to NoSQL for access speed, and gain all kinds of other processing possibilities in the process.  The beauty of Google (and other similar Hadoop-y efforts) is that once you get used to working in a Googleplex with MapReduce as a routine operation, you discover that there are all kinds of other operations that you can do in similarly massively parallel fashion.    It's likely that most of the wins we're seeing in Big Data are coming, not from intrepid data explorers, but from routine operations-people who went in looking for speed and figured out that the approach yielded other discoveries as well.</p>

<p>What would happen if we <span class="caps">STARTED </span>with a data framework with an infinite distributed data store, MapReduce built in for unstructured data analysis, and Apache <span class="caps">SOLR </span>as well for free-text search and structured data querying?   We'd have an environment where the speed was free, and we could devote our energies to finding patterns in data.  Now <strong><span class="caps">THAT</span></strong> would be magical…</p>
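<p>Free-text search of the kind Solr provides rests on an inverted index: a map from each term to the documents that contain it. Below is a toy in-memory version, nothing like Solr's actual implementation (no ranking, stemming, or persistence), just to show why a lookup by word is cheap once the index exists. <code>buildIndex</code> and <code>search</code> are hypothetical names.</p>

```javascript
// Tokenize text into lowercase words.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Build a term -> Set(docId) map once, up front.
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// AND query: documents that contain every query term, sorted by id.
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    result = result === null ? new Set(ids) : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort();
}

const docs = {
  a: 'Hadoop runs MapReduce jobs in the background',
  b: 'Memcached keeps hot data in RAM',
  c: 'Cassandra and HBase keep data fast',
};
const index = buildIndex(docs);
console.log(search(index, 'data')); // [ 'b', 'c' ]
console.log(search(index, 'fast data')); // [ 'c' ]
```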

<p><img src="http://drzaius.ics.uci.edu/blogs/setbang/raiders_of_the_lost_ark_1.jpg" alt="" /></p>


<p><em><strong>Indy: That's the Ark of the Covenant.</strong></em> <br />
<em><strong>Elsa: Are you sure?</strong></em><br />
<em><strong>Indy: Pretty sure.</strong></em></p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11456959.xml</wfw:commentRss></item><item><title>Cobra strike - 100 milliseconds to understanding new architectures</title><dc:creator>John Repko</dc:creator><pubDate>Wed, 11 May 2011 11:58:24 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/5/11/cobra-strike-100-milliseconds-to-understanding-new-architect.html</link><guid isPermaLink="false">512153:5946984:11427700</guid><description><![CDATA[<p><img src="https://s3.amazonaws.com/jkr_images/king_cobra.jpg" alt="" /></p>

<p>I've written many times now about NoSQL architectures and the rise of whole new species of data stores as <em>the software of the Facebook age.</em>  But what's going on here, really?   As Ian Fleming wrote in <a href="http://en.wikipedia.org/wiki/Auric_Goldfinger">Goldfinger</a>:</p>

<blockquote><p> "Once is happenstance. Twice is coincidence. The third time it's enemy action."</p></blockquote>

<p>I've probably written a dozen of the same pieces now on "new software architecture," so let's 1) take a look at what this is all about, and 2) see where it's all headed.</p>

<p>We see so many new components (Hadoop, NoSQL, Sphinx/SOLR, Node.js) with seemingly nothing to link them, other than as different exotic beasts in the new-software zoo.   There are some fundamental truths behind why Mutual of Omaha's Software Kingdom is featuring them now, and Google has the answer.  Not "Google the search engine" ... but Google <strong><em>the company</em></strong>.</p>

<p>Robin Bloor put a nice light on this back in 2009 with <a href="http://www.havemacwillblog.com/2009/02/why-google-won-in-the-search-market/">Why Google Won In The Search Market</a>.    In that post, Bloor might have been thinking of Google VP Marissa Mayer's famous <em>...Users really respond to speed...</em>  <a href="http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html">quotation</a> when he wrote:</p>

<blockquote><p><em>We can normally react to a stimulus in the 140-200 millisecond range, which is great news for cobras, because it takes a cobra about 100 milliseconds to bite. To put it another way, if a cobra is within striking range and it decides to bite you, it’s too late to stop it. If the mouse pointer moves more than 100 milliseconds after you move the mouse, it feels slow.</em></p></blockquote>

<p>That brings me to the fundamental truths of modern software that link all the beasts in our zoo and point the direction ahead.  They are:</p>

<ul>
<li>In a high-resolution, handsetted, wifi'd world the distinctions blur between Enterprise software, Desktop software, and Handset software.  It's all just software, delivered as a service everywhere</li>
</ul>

<ul>
<li>If your software can't respond in about 100 milliseconds you're dead.  Down to 100ms the faster cobra always wins.</li>
</ul>

<ul>
<li>If you can't make it down to 100ms, it doesn't matter how "good" your architecture is. It fails.</li>
</ul>
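<p>One practical consequence of the 100ms rule is that a slow backend call gets a deadline rather than an open-ended wait. A minimal sketch using <code>Promise.race</code>; <code>withDeadline</code>, <code>simulated</code> and the "cached result" fallback are hypothetical, and production code would also cancel the abandoned work.</p>

```javascript
// Resolve with the task's result if it finishes within `ms`,
// otherwise resolve with `fallback` so the user still gets a fast answer.
function withDeadline(task, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Simulated backend: resolves with `value` after `delayMs`.
function simulated(value, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(value), delayMs));
}

async function main() {
  const fast = await withDeadline(simulated('fresh result', 20), 100, 'cached result');
  const slow = await withDeadline(simulated('fresh result', 300), 100, 'cached result');
  console.log(fast); // fresh result
  console.log(slow); // cached result
  return { fast, slow };
}

main();
```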

<p>These three rules explain a lot about what's going on in software today.  In my next postings we'll do a quick tour of the zoo with these new perspectives, and introduce a really neat package that is a harbinger of where this all is headed.</p>

<blockquote><p><em>I've got 911 on speed dial.</em>  ~  Douglas Coupland</p></blockquote>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11427700.xml</wfw:commentRss></item><item><title>Wicked Fast</title><dc:creator>John Repko</dc:creator><pubDate>Sun, 24 Apr 2011 00:16:14 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/4/23/wicked-fast.html</link><guid isPermaLink="false">512153:5946984:11247946</guid><description><![CDATA[<p><img src="http://www.floppingaces.net/wp-content/uploads/Memorex-man-in-chair-e1292523824959.jpg" alt="" /></p>

<blockquote><p>“Now, here, you see, it takes all the running you can do, to stay in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”   Lewis Carroll ~ Through the Looking-Glass</p></blockquote>

<p>I really love Ruby on Rails.  My biggest pet peeve with software development platforms has always been their quest for generality -- "with our program, you could build <em><strong>anything</strong></em>, from an iPhone tic-tac-toe app to systems code for the Space Shuttle!"   The problem here is that nobody wants to build just "anything" -- people's needs at any given time tend to be pretty  specific.   A platform that claims to be good for everything is generally good for <strong><em>nothing</em></strong>. That's where Rails comes in -- web apps is all it does.   </p>

<blockquote><p>The best frameworks are in my opinion extracted, not envisioned. And the best way to extract is first to actually do. ~  David Heinemeier Hansson ~ Ruby on Rails</p></blockquote>

<p>This is where the "PT boats to Battleships" metaphor I wrote about in my last post comes in.  I believe Rails is unbeatable for web apps, <em>as long as the definition of a web app doesn't change.</em>   It was perfect for what it did -- but do we still do that anymore? </p>

<p>As I mentioned last post -- the world is changing and the old patterns may not work anymore.  So what do you do?   Is there a "Rails" for Ajax applications between handheld devices?</p>

<p><a title='By Michael Borejdo [Public domain], via Wikimedia Commons' href='http://commons.wikimedia.org/wiki/File:NodeJS.png'><img width='240' alt='NodeJS' src='http://upload.wikimedia.org/wikipedia/commons/6/67/NodeJS.png'/></a></p>

<p>There is -- or at least there's the start of a platform built around a <strong>very</strong> different set of assumptions of what Internet applications are all about.   It's called <a href="http://www.nodejs.org">Node.js</a>, and it springs from work that Ryan Dahl first published in 2009.   </p>

<p>Node is really interesting, and it builds on a capability of its core JavaScript language that Joel Spolsky wrote about in 2006: <a href="http://www.joelonsoftware.com/items/2006/08/01.html">Can Your Programming Language Do This?</a> -- the ability to package rich objects (including inline functions) as parameters in function calls. Hmmmm ... this sounds like it could get deep and theoretical... <em><strong>but stay with me:</strong></em> here's why it matters:</p>
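<p>For readers who haven't seen it, here is the capability in a few lines (in the spirit of Spolsky's examples, not copied from them): a function that takes another function as a parameter, called once with a named function and once with an inline one. <code>applyToAll</code> and <code>shout</code> are hypothetical names.</p>

```javascript
// applyToAll takes data *and* the behaviour to apply to it.
function applyToAll(items, fn) {
  const out = [];
  for (const item of items) out.push(fn(item));
  return out;
}

function shout(word) {
  return word.toUpperCase() + '!';
}

// Passing a named function...
console.log(applyToAll(['node', 'rails'], shout)); // [ 'NODE!', 'RAILS!' ]

// ...or an inline, anonymous one, written right at the call site.
console.log(applyToAll([1, 2, 3], (n) => n * n)); // [ 1, 4, 9 ]
```

<p>Node leans on exactly this: the "what to do when the data arrives" travels along with each request as a function.</p>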

<ol>
<li>In the web-pagey world, to respond to a request you compose and send a page.  With simple web pages you can do this sequentially and are probably fine, and threads are there to bail you out wherever you aren't fine.</li>
<li>Today, with Big Data databases and media files, you might get a request and not know if that request is <span class="caps">EVER </span>going to complete!?  Processes block, and even the fastest processor can't do much while it's just sitting and waiting.</li>
<li>The solution?  A non-blocking architecture.  It's fine to have long requests -- as long as you're not stuck waiting for them to finish.... <em>so how do we do that?</em></li>
<li>This is the problem Dahl solves with Node.js. Node is an event-driven architecture -- when a request comes in, Node starts its slow work (disk, network, database), attaches a callback routine to run when that work completes, and moves right on to the next request.</li>
</ol>
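<p>A small simulation of the difference, using only Node's built-in timers: three requests arrive together, the first of them slow. <code>handleRequest</code> and the delays are made up; <code>setTimeout</code> stands in for a database or file read that Node would hand off without waiting. Because nothing blocks, the two skinny requests finish first even though the chunky one was dispatched first.</p>

```javascript
// Each "request" registers a callback and returns immediately;
// setTimeout stands in for slow I/O (a big query, a media file).
function handleRequest(name, ioDelayMs, done) {
  setTimeout(() => done(name + ' finished after ' + ioDelayMs + ' ms'), ioDelayMs);
}

const completed = [];
const requests = [
  ['chunky request', 200],
  ['skinny request A', 10],
  ['skinny request B', 20],
];

for (const [name, delay] of requests) {
  // "Here you go -- call me when you're done..."
  handleRequest(name, delay, (message) => {
    completed.push(name);
    console.log(message);
  });
}
console.log('all requests dispatched; the loop never waited');
// Completion order: skinny request A, skinny request B, chunky request.
```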

<p>This is the perfect architecture for a modern web age with a mix of skinny and chunky requests. Rather than grinding through it all in sequence, you tell each request <em>"Here you go -- call me when you're done..."</em> and move on to the next thing. It's a clean approach, and with modern JavaScript engines, such as Google's V8 (the engine Node is built on) or Mozilla's SpiderMonkey, this kind of approach is fast.</p>

<p>Wicked fast.</p>

<p>Node.js is tight and clean, and it's amazing what you can get done with just a little code.  Like all Unix-y code since <a href="http://www.amazon.com/Programming-Language-2nd-Brian-Kernighan/dp/0131103628">Kernighan and Ritchie</a>, Node.js has its Hello World app:</p>


<pre><code>var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(8124);
console.log('Server running at http://127.0.0.1:8124/');</code></pre>

<p>The explanation of the code is really simple -- it's just as it reads:</p>

<ol>
<li>Create an <span class="caps">HTTP </span>server</li>
<li>Give it a callback that receives each request (<code>req</code>) and writes the response (<code>res</code>)</li>
<li>Write a 200 code for success with plain text</li>
<li>Write "Hello World"</li>
<li>Listen on port 8124</li>
<li>Tell the console that we're listening on 8124</li>
</ol>
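<p>To check the same server from code rather than from a browser, here is a variant that listens on a port the operating system picks (port 0, chosen here so the sketch doesn't collide with anything already on 8124), fetches the page with Node's built-in <code>http.get</code>, prints the result, and shuts down. The <code>received</code> variable is only there so the result can be inspected afterwards.</p>

```javascript
const http = require('http');

// The same server as above.
const server = http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
});

let received = null; // filled in once the self-test request completes

// Port 0 asks the OS for any free port; the callback runs once listening.
server.listen(0, function () {
  const port = server.address().port;
  http.get('http://127.0.0.1:' + port + '/', function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      received = { status: res.statusCode, body: body };
      console.log(res.statusCode, JSON.stringify(body)); // 200 "Hello World\n"
      server.close(); // stop listening so the process can exit
    });
  });
});
```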

<p>It really is just that simple.  I'm not sure how you'd make space shuttle code with it, but if you're looking for evented web apps with a tiny footprint, Node.js is it.   For this blog posting I wanted to try something a bit bigger, so I put a little NodeJS-driven weblog up on the Amazon cloud:</p>

<ul>
<li>You can access (and enter postings on) the blog here: <a href="http://jkr-blog.dyndns.org:8124">nodejs_blog blog site</a></li>
<li>You can download the source code here: <a href="https://s3.amazonaws.com/file_repository/nodejs_blog.zip">nodejs_blog source code</a></li>
</ul>

<p>The code was adapted from the <a href="http://expressjs.com">Express</a> examples site, and while the blog isn't exactly full-featured, I probably don't have 100 lines of code invested in it, and it is...</p>

<p>Wicked Fast.   </p>

<p>As you can see, my blog may be tiny, but it's in the best <a href="http://www.youtube.com/watch?v=Gzj723LkRJY">15-minute tradition</a> of new web platform development. Let's see some pictures:</p>

<p>First, let's go to the site and create a new posting:<br />
<img src="https://s3.amazonaws.com/file_repository/Blog.jpg" alt="" /></p>

<p>Then we'll enter some text into it and save it:<br />
<img src="https://s3.amazonaws.com/file_repository/Blog-1.jpg" alt="" /></p>

<p>And finally see the summary of what we've done so far:<br />
<img src="https://s3.amazonaws.com/file_repository/Blog-2.jpg" alt="" /></p>

<p>One of the beauties of Node.js with the Express package is that, despite its simplicity, it is still full Model-View-Controller, so setting up the code was easy, and laid out in a nice, clean beautiful way:<br />
<img src="https://s3.amazonaws.com/file_repository/TextMate.jpg" alt="" /></p>
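<p>For readers who haven't worked in an MVC layout, here is roughly how the responsibilities split, sketched in plain JavaScript with no Express at all. This is not the code from the zip file above; <code>PostModel</code>, <code>renderIndex</code> and <code>createPost</code> are hypothetical names, and the view renders plain text to keep the sketch short.</p>

```javascript
// Model: owns the data (an in-memory array here; a real blog would persist it).
class PostModel {
  constructor() {
    this.posts = [];
  }
  add(title, body) {
    const post = { id: this.posts.length + 1, title, body };
    this.posts.push(post);
    return post;
  }
  all() {
    return this.posts.slice();
  }
}

// View: turns data into something to show, and nothing else.
function renderIndex(posts) {
  if (posts.length === 0) return 'No posts yet.';
  return posts.map((p) => p.id + '. ' + p.title).join('\n');
}

// Controller: handles a request, talks to the model, picks the view.
function createPost(model, form) {
  if (!form.title) return { status: 400, body: 'A title is required.' };
  model.add(form.title, form.body || '');
  return { status: 201, body: renderIndex(model.all()) };
}

const model = new PostModel();
createPost(model, { title: 'Wicked Fast', body: 'Node.js is...' });
console.log(createPost(model, { title: 'Back to the Future' }).body);
// 1. Wicked Fast
// 2. Back to the Future
```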


<p>There's much more to write about Node.js, its package manager npm, and development packages like Express, Connect, and WebSockets/socket.io, and those will come in other posts. There's a lot here -- maybe the future of the handheld, small-screened, peer-to-peer web.</p>

<p>It really is ... <span class="caps">WICKED FAST</span>!</p>

<blockquote><p>"This is your last chance. After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes."  Morpheus ~ The Matrix</p></blockquote>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11247946.xml</wfw:commentRss></item><item><title>Back to the Future</title><dc:creator>John Repko</dc:creator><pubDate>Sat, 23 Apr 2011 19:24:08 +0000</pubDate><link>http://www.pikasoft.com/journal/2011/4/23/back-to-the-future.html</link><guid isPermaLink="false">512153:5946984:11242707</guid><description><![CDATA[<p><img src="http://farm2.static.flickr.com/1027/757609629_508f24829d.jpg" alt="" /></p>

<blockquote><p>Don't worry. As long as you hit that wire with the connecting hook at precisely 88mph the instant the lightning strikes the tower... everything will be fine. ~ <a href="http://www.imdb.com/title/tt0088763/quotes">Back to the Future</a> (1985)</p></blockquote>


<p>One of the great challenges of working in technology is that patterns of thinking change quickly, and from time to time, no matter how wired-in you are, you discover that <em>everything you know is <strong>wrong.</strong></em> Novelist William Gibson is right: <em>the future <em><strong>has</strong></em> already arrived -- it's just not evenly distributed</em>... When I first learned Ruby on Rails back in 2006, it struck me as a wondrous advance on the Java development I was doing. Java had bulked up as an Enterprise solution, so now, 5 years later, it's little surprise that <a href="http://www.infoq.com/news/2010/01/ThoughtWorks-Technology-Radar">Java End of Life</a> is something ThoughtWorks worries about.</p>

<p>In tech we often see the tail end of Clayton Christensen's <a href="http://www.amazon.com/Innovators-Dilemma-Revolutionary-Business-Essentials/dp/0060521996">The Innovator's Dilemma</a>.   In <span class="caps">TID, </span>disruptive technologies catch on because whatever they lack in robust features they make up for in agility.   With time, though, the PT Boats grow into Battleships, and the cycle starts anew.</p>

<p>There are signs that this is happening now with Internet technology -- our toolsets (like Rails) have grown so fit to the task that they seem a bit ponderous as the task shifts.   With enough shift we again conclude that everything we know is wrong and the cycle starts again.</p>

<blockquote><p>"It's not what you don't know that kills you, it's what you know for sure that ain't true." ~ Mark Twain</p></blockquote>

<p>Here's what we know about Internet technology today:</p>

<ul>
<li>"Computers" are how people interact with the Internet</li>
<li>Modern apps display web pages and submit information</li>
<li>Pages are served from servers (of course)</li>
<li>The client-server Internet model works fine</li>
</ul>

<p><span class="caps">WRONG, WRONG, WRONG, </span>and <span class="caps">WRONG.  </span> Here's the world we've been living in for a while now:</p>

<ul>
<li>Today there are more wireless handsets than there are people on earth</li>
<li>In 2011, nobody updates a whole page anymore -- Ajax rules</li>
<li>To paraphrase Bill Joy -- no matter where you are, most of the interesting content is somewhere else (on someone else's handset)</li>
<li>Pages are easier -- and if we wait maybe those pesky smartphones will <strong>just go away...</strong></li>
</ul>

<p>We're ready for a new programming world, and I've been investigating that new world for a while now. With my next post I'll write up what I've found. As in the sound clip below, you may not be ready for this yet -- but it'll be here soon <em>"...and your kids are gonna love it!"</em></p>

<p><embed type="application/x-shockwave-flash" flashvars="audioUrl=https://s3.amazonaws.com/file_repository/johnny_b_goode.mp3" src="http://www.google.com/reader/ui/3523697345-audio-player.swf" width="400" height="27" quality="best"></embed></p>
]]></description><wfw:commentRss>http://www.pikasoft.com/journal/rss-comments-entry-11242707.xml</wfw:commentRss></item></channel></rss>
