Wicked - Cold!

I love computer science for the beauty of it, but often that beauty is obscured by the Moore's Law-rate of progress in the field. Things get so much better so quickly that it's easy to lose track of just how far we've come. It's also possible that even the most profound advances in processing speed get swallowed up by still-faster-bloating software.

Moore's Law, mentioned above, strictly says that the number of transistors on a chip doubles about every two years; in practice, computing speed has roughly doubled on the same schedule. So the computers we buy this Christmas should run about twice as fast as the computers we got back in 2011. Sounds great, and while we still might be stuck waiting for computers of any generation to boot, worries about things like "speed of macro calculation" are lost to the distant past. Big Data is the computing frontier, because advanced analytics is the only area where calculations aren't instantaneous - yet.

At the other end of the spectrum, computer graphics is one of the best areas to see just how far computer power (and the human ingenuity that it advances) have come. My wife Kate and I went out to see the Disney film Frozen over Thanksgiving, and it was a great example of just how far we've come. The animation was breathtaking and beautiful, and it was even more beautiful because it was INVISIBLE.

Frozen is a fun movie and I hope you go see it, but forget the computer animation for a minute: Great songs and two strong female characters, with the remarkable Idina Menzel as the "bad witch" and Kristen Bell playing the good witch this time. The songs are straight from (or possibly heading STRAIGHT TO) Broadway -- Frozen is really not a cartoon animated by computers, but a Broadway show animated by computers!

Forget "Frozen" -- they might have called it "Wicked - Cold!"

But back to our main point. The wonder of the computer animation of Frozen is NOT that it's wonderful (it is, but we've grown to expect that from that-which-once-was-Pixar). The remarkable thing is that the animation is invisible -- John Lasseter is back with his storytelling genius, and the show unfolds beautifully before our eyes.

John Lasseter is one of the wonders of our time, and it's a joy to see the computers let him tell his story. Such has it ever been: even 25 years ago (in 1988) Lasseter was already at work with a set of Macintosh IIs at his command. Each machine in his arsenal now is roughly 6,000 times more powerful than each box was then, but even then he was the sorcerer and the machines did his bidding.
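That "roughly 6,000 times" figure is about what the doubling rule predicts: 25 years at one doubling every two years is 12.5 doublings, which works out to a little under 5,800. Here is a quick back-of-the-envelope check in Ruby; the function name is my own, and the two-year doubling period is the rule-of-thumb assumption, not a measurement of any particular machine.

```ruby
# Growth factor implied by one doubling every `doubling_years` years.
# The two-year default is the usual rule-of-thumb reading of Moore's Law.
def moores_law_factor(years, doubling_years = 2.0)
  2.0**(years / doubling_years)
end

span = 2013 - 1988 # from the Macintosh II era to today
puts format("%d years -> about %.0fx", span, moores_law_factor(span))
```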

To appreciate the storyteller and how much richer his current computer canvases are, just watch this wonderful time-piece -- his video "Pencil Test" from way back in 1988!

Mmmmmmmm... Computer animation sure has come a long way in the past 25 years. Craftsmen are still craftsmen and whizzy graphics are nothing without a great story and a compelling storyteller. The great wonder of a John Lasseter or a Steve Jobs is that they could envision stories like "Frozen" -- back in 1988 and probably even earlier...

...and that's something we can all enjoy and be thankful for this Thanksgiving.

...Happy Thanksgiving, everybody...!


Pearls After Breakfast ~ Systems Orchestration with Ansible

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. ~ Antoine de Saint Exupéry

I've long been a fan and I've always been amazed at the skill of violinist Joshua Bell. I've seen him in concert a half-dozen times in the past 5 years, and in each performance he has succeeded in producing what I can only call "Joshua Bell Moments" - instances in time, perfect in themselves in which you can just watch, listen and marvel. Bell is a dynamic performer onstage, and it's remarkable to see musical precision generated so dynamically. If you're not familiar with Joshua Bell or his music, you should start with the Pulitzer Prize-winning introduction here: Pearls Before Breakfast

In Pearls Before Breakfast, Bell describes his skills as 'largely interpretive', and his ability to mesh with a symphony orchestra is a wonder to behold. Orchestration in any field is hard, and it's the holy grail of modern computer architectures. MapReduce might be elegant but Hadoop surely isn't - the wonder isn't that it works elegantly, but that it works at all. These days it's routine to require a dozen or dozens of systems to support modern applications, so when a system promises "Radically simple IT orchestration," I've got to take a look.

But is the problem really that bad? Can we just "do it by hand" or simply script these things? In his book, Taste Test: Puppet - Chef - Salt - Ansible, Matt Jaynes writes:

There were about 60 servers altogether and no two were exactly the same since they had all been set up by hand. Over time, the systems engineers would periodically update the servers, but they always seemed to miss some. Every server was a “beautiful snowflake” and unique from all the others.

The development environments were all slightly different too. So, an app release that worked on one developer’s environment, wouldn’t work on another developer’s environment. And an app release that worked on all the developers’ environments would have bugs in the staging environment. And an app release that worked in the staging environment would break in production. You get the idea.

If you've worked in cloud applications you certainly do get the idea. So what Ansible offers is a simple, YAML-based approach to managing the raft of servers that run most modern web applications. Does it work? Is it Radically Simple? Let's try it out and see what we find.

We'll be doing a lot of cloud-work on the new platform that I'll call "Cloudburst", so first we'll update the Linux base machine I'm running on. I'm on Ubuntu 13.04 "Raring Ringtail" for the usual reasons: not because it's necessarily a "better" distribution, but because there is practically limitless documentation on Ubuntu distributions online.

So first let's update Ringtail, and make sure we've got an up-to-date-enough Python version for Ansible:

$ sudo apt-get update
$ sudo apt-get upgrade
$ python --version
Python 2.7.4

Great so far. Ansible does its work via SSH (not unlike the soon-to-be-written-about GitHub), so let's make sure our SSH keys are in place as well:

$ ssh-keygen -t rsa
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys

The SSH server on the target machine checks incoming keys against the file "authorized_keys" in the SSH directory, so once our copy is complete we're ready to install Ansible. Let's do that, then use ifconfig to identify the IP address of our target machine and add it to Ansible's hosts file.

$ sudo apt-get install ansible
$ ifconfig
$ sudo vi /etc/ansible/hosts

    # Here's another example of host ranges, this time there are no
    # leading 0s:


$ sudo apt-get install openssh-server

All set! Generally I'll use ifconfig to get the IP address for servers in my cloud stack, but we'll just use the IP address of our base Ubuntu machine here. Now let's try a quick call-out from Ansible to our target machine to confirm that our installation is complete, and that all is well:

$ ansible all -m ping
<target-ip> | success >> {
    "changed": false,
    "ping": "pong"
}

$ ansible all -a "/bin/echo hello Ansible"
<target-ip> | success | rc=0 >>
hello Ansible

GREAT! We're almost finished. To confirm Ansible (and have it do a little work for us), I'm going to create a simple YAML file for Ansible to confirm that the web server nginx is installed and the config file is in the right place. Our YAML file will just be a start, and even as an example it's great at showing how closely the YAML description matches our task list. So what we want to do is (in pseudocode):

  • For all of our servers
    • Make sure https support is enabled
    • Make sure nginx is installed
    • Make sure the nginx.conf file is installed in the right place, and
    • Make sure nginx is running

So how much scripting and syntactic sugar do we need to get these tasks done? With Ansible, not much:


- hosts: all
  tasks:

    - name: Ensure https support for apt is installed
      apt: pkg=apt-transport-https state=present

    - name: Ensure nginx is installed
      apt: pkg=nginx-full state=present

    - name: Ensure the nginx configuration file is set
      copy: src=/app/config/nginx.conf dest=/etc/nginx/nginx.conf

    - name: Ensure nginx is running
      service: name=nginx state=started

Let's try it out, and see how our script works:

$ ansible-playbook nginx.yml

Success! This is just the beginning - from here we can use Ansible to

  1. Make a standard server install definition (Note: Not a script!)
  2. Apply that definition to all the servers in our stack
  3. Use the definition for regular stack updates

Our stack is now just another variable in our solutions, and with the setup-magic taken away, we can focus on human interaction and delivery-magic! It might be too much to aspire to the performance of a Joshua Bell, but with Ansible we can set up and tune the whole backing orchestra correctly every time...


Cloudburst - What "Client Server" Grew Up Into

Never trust a computer you can't throw out a window. ~ Steve Wozniak

Computing is not about computers any more. It is about living. ~ Nicholas Negroponte

At this point in the countdown, it seems appropriate to say to the crew of Discovery -- good luck, have a safe flight and to say once again 'Godspeed John Glenn' ~ Scott Carpenter

I grew up in the era of The Computer Wore Tennis Shoes, and I think it's in spite of this mindset that information systems have come as far as they have. Computers won't replace human thought, and that leaves us with the yin and yang of computers as tattletales and spies VERSUS computers as creation and communication tools.

I'm going to side with the creative Miller Puckettes of this world and write a bit today about what the computer - minus tennis shoes, minus client-server, minus "the Internet changes everything!" - has evolved into. Back in 2008 I had the privilege of designing an application (the revolutionary Sales Sonar by Innovadex) that employed many of the latest developments in the web delivery stack: Amazon AWS, Ruby on Rails v2.x, nginx, thin, mysql, JQuery, god, haproxy, Google Maps and more. Today I'm going to list out the soul of a 2013 new machine, secure in the knowledge that this genie too shall be surpassed, but equally sure that there will be fun to be had in building a platform of the latest pieces.

Here's where we begin -- I've written about many of the pieces here before, and in 2013 AFAIK the killer environment has/does the following:

  • Web deployed - Everything connects through the 'web
  • Handheld - Never trust a computer bigger than a Whopper
  • Cloud based - Nothing like the ability to spin up dozens of servers when we need them
  • Big Data'd - Lots of the data will be unstructured, and we'll want to ETL it with Hadoop
  • Data'd - Our mySQL fork MariaDB will handle routine data chores
  • Git'd - Git is de rigueur now, more and in different ways than Subversion was then
  • Scrumban'd - JIRA is very popular, but for friendliness and flexibility I'm going to give the nod to VersionOne
  • Ruby'd - this is still my favorite language, and with Ruby 2 and Rails 4 new vistas (like embedded applications) are possible
  • Handheld - here we'll use two orthogonal tools: RubyMotion and Phonegap
  • Automatically deployed - Chef and Puppet are worthy candidates, but we're going to go lighter and faster with Ansible

I'll have lots more tools as people weigh in on their favorites (and slam the unworthy), but this gives us a start so let's kick the tires, light the fires and see how this new platform flies!

Godspeed Scott Carpenter - May 1, 1925 - October 10, 2013


My Maria ... DB

You set my soul free like a ship sailing on the sea
~ "My Maria" - BW Stevenson, (later Brooks & Dunn)

As a former Oracle employee I was pleased today to see the last of the stirring comeback for Oracle Team USA, and thrilled to see some footage shot from 101 California St. in San Francisco by former co-workers at Booz Allen. Faulkner was right: "The past is never dead. It's not even past."

The best part of that past world was working for Chris Russell and meeting with Larry Ellison every Friday (3:00PM or whenever Larry showed up to 5:00PM) to go over our progress with the hosted Oracle offering Oracle Business OnLine. Chris is a fantastic manager and a great person -- we got BOL (as it was called) rolling, and I was thrilled just coming out of our first couple of Larry-meetings with the feeling: "He liked it! We didn't get fired!" Little secret: Larry was a better manager than he ever gets credit for, and/but he is magnificently competitive!

Oracle is a terrific database, but when Oracle acquired Sun Microsystems they also acquired the previously-acquired-by-Sun MySQL database. MySQL is a nice open-source database, and was at the core of Ruby on Rails development efforts practically back to RoR 1.0 in late 2005. Rails has long since broken that direct linkage, but it was a nice luxury to tie in MySQL and get ActiveRecord object-relational management for free. I've missed MySQL, and there have been varied worries that MySQL would be the red-headed stepchild in the Oracle household, but now Maria has stepped in to take our worries away.

"Maria" in this case is MariaDB. As Wikipedia notes, "MariaDB is a community-developed fork of the MySQL relational database management system, the impetus being the community maintenance of its free status under the GNU GPL." Having a good, trusted, universal SQL database was a terrific luxury, and that sounds enough like the old MySQL battle cry, so let's get started.

The first thing we'll want to do is clear out any installations or vestiges of mySQL that currently exist on our servers. For the MariaDB example, I'm going to clear out both my development machine (a MacBook Pro running OSX 10.8.5) and my target machine (a Linux box running Ubuntu 13.04 Raring Ringtail). Tom Keur offers a nice example here: How to remove MySQL completely Mac OS X Leopard

$ sudo rm /usr/local/mysql
$ sudo rm -rf /usr/local/mysql*
$ sudo rm -rf /Library/StartupItems/MySQLCOM
$ sudo rm -rf /Library/PreferencePanes/My*
$ sudo rm -rf /Library/Receipts/mysql*
$ sudo rm -rf /Library/Receipts/MySQL*
$ sudo rm /etc/my.cnf

That clears out the Mac version, and on Ubuntu a simple...

$ sudo apt-get remove --purge mysql-server mysql-client mysql-common

... should do the trick. Now we'll install MariaDB. For work on a Macintosh we can let Homebrew do most of the work for us. As MariaDB is a drop-in replacement for mysql, once we have it we can have it install itself:

$ brew install mariadb
$ unset TMPDIR
$ mysql_install_db
$ cp /usr/local/Cellar/mariadb/5.5.32/homebrew.mxcl.mariadb.plist \ 

The final plist copy ensures that MariaDB starts up whenever we boot the Macintosh. It's just a little more work on Ubuntu:

$ sudo apt-get install mariadb-server
$ sudo apt-get install libmariadbd-dev

The second call, to install libmariadbd-dev, is something we'll need in order to install the mysql2 gem on Ubuntu. With our database installed, we'll now install the mysql2 gem to be our database adaptor.

$ sudo gem install --no-rdoc --no-ri mysql2 -- \
--with-mysql-dir=$(brew --prefix mariadb) \
--with-mysql-config=$(brew --prefix mariadb)/bin/mysql_config

On Ubuntu, the commands to start and stop the database will be familiar to any Linux sysadmin (the MariaDB service keeps the mysql name):

$ sudo /etc/init.d/mysql start
$ sudo /etc/init.d/mysql stop

Now that the installation is complete, let's try a nice standard Rails app to confirm that our DB and adaptor are working properly. Let's create a new Rails app and add in the necessary components to run it. Since I've just updated my development Mac to Apple's new "Mavericks" OSX 10.9 release, let's call our app Mavericks, put it on mySQL, and add in the web server 'thin':

$ rails new mavericks -d mysql
$ cd mavericks
$ gem install thin
  Fetching: eventmachine-1.0.3.gem (100%)
  Building native extensions.  This could take a while...
  Successfully installed eventmachine-1.0.3
  Fetching: daemons-1.1.9.gem (100%)
  Successfully installed daemons-1.1.9
  Fetching: thin-1.6.0.gem (100%)
  Building native extensions.  This could take a while...
  Successfully installed thin-1.6.0
3 gems installed

Now the final step is to create a test mavericks_development database, and of course we'll add thin to our Gemfile. We'll want our first MariaDB app to do a bit, so let's give it a controller to show us some pages. Here's the command to generate our controller 'pages', and to stub in 'index' and 'about' methods:

$ rails generate controller pages index about

We'll create some "Hello World" text in our pages_controller:

and pass that code to be displayed in our index.html.erb file:
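As a stand-alone sketch of that hand-off, here is the same idea in plain Ruby with the standard ERB library, so it runs without Rails. The class name, the template string and the greeting text are all my own stand-ins, not the actual Mavericks files: the action sets an instance variable, and the template interpolates it.

```ruby
require "erb"

# Hypothetical stand-in for app/views/pages/index.html.erb.
INDEX_TEMPLATE = "<h1><%= @greeting %></h1>"

# Hypothetical stand-in for PagesController; a real one would inherit
# from ApplicationController and let Rails do the rendering.
class PagesSketch
  # The action only sets an instance variable for the view to read.
  def index
    @greeting = "Hello World from Mavericks"
  end

  # Run the action, then evaluate the template against this object's
  # binding so the template can see @greeting.
  def render_index
    index
    ERB.new(INDEX_TEMPLATE).result(binding)
  end
end

puts PagesSketch.new.render_index
```

In the real app Rails does the second step for us: any instance variable set in the action is visible in the matching view.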

Once we've added these in we can fire the Mavericks application up:

$ rails server

We'll go to the web url of our pages' index page, and here we are:

YAY! Rails is up and running, with our mySQL replacement MariaDB under the hood. Let's take a look at the Rails Environment, and we can confirm that all is well with our mysql adapter as well.

Victory! It's not quite an America's Cup regatta triumph, but we've got a nice defensibly-open database under our application and we can roll forward Oraclelessly from here. But who cares, really? Oracle mySQL goes back to the future with MariaDB, but where does that take us from here?

The answer is: it's all about architecture.

As Proust said, "The real voyage of discovery consists not in seeking new landscapes, but in having new eyes." Once upon a time the IT landscape was all mainframe-based, and no one got fired for buying IBM. Then came the minicomputer and PC ages, both (as DEC's disappearance and Microsoft's present travails show us) replaced by the web. But the web isn't the leading edge of systems design anymore. With MariaDB we've reopened the data layer and in my next posts I'll explore the soul of these new machines -- web enabled and handheld, event-driven and big-data ready. The right architecture will be the DeLorean for our trip into the new millennium of computer services.


Wooden Nickels

It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so. ~ Mark Twain

In theory there is no difference between theory and practice. In practice there is. ~ Yogi Berra

I love Big Data analysis -- the prospect that from mighty oaks of data tiny acorns of insight might be gathered. As much as I love it, I expect that the Gartner Hype Cycle will eventually catch up with it, and here I'm going to make a breathless prediction -- that the Big Data "Trough of Disillusionment" will soon be upon us.

I'm not predicting these analytics sloughs of despond willy-nilly - here the trough will come because of the meshing of 5 specific leading indicators:

  • The Early Adopters (Google with MapReduce, Progressive Insurance with insurance updates for every car, every night, etc.) have succeeded, been recognized and rewarded. The pioneering innovators are doing a victory-lap.
  • Early Majority applications (IBM's Jeopardy-winning Watson, running on $3M of hardware) are HARD
  • Everybody is in the game now, but major new wins are scarce
  • It's easy to fudge - with Big Data / MapReduce data provenance is nonexistent, and all results (even nonsensical ones) can be taken as "valid" if big-enough, complex-enough solutions produced them...
  • Lots of money has been bet on Early Majority solutions (I've been writing about them for four years now), and the bets are still out there...

With this type of dynamic, we might expect a couple of things to happen now:

  • Outcomes start to become selected before there are data sets to justify them, and
  • Managers and executives start learning how to properly question results

My posting last week Glittering Data was the start of a set of posts on how to judge Big Data results -- it talked about data sets that can be shown to have no magic numbers in them. This post is about "wooden nickels" - how to know whether to trust Big Systems or our lying eyes when our results are different from the facts. So let's get started...

In our last posting, Glittering Data, we talked about the T-Distribution calculation, and in that example we used it to show that sometimes there really isn't a pony there. We can also use it to show the opposite. Our data might contain magic numbers, just not the magic numbers that we were hoping for...

Let's take a marketing example: You've been running pilot tests of your new car, and you've been collecting data from focus groups, interviews and social media sources. Everybody loves your new car! The data back from your Marketing group shows a clear winner .. then you decide to do some drill-downs...

You start worrying with your first look into the drill-downs. According to Marketing, 70% of pilot customers loved your car - rating it an average of 4 (out of 5) on their scale. But when you dug into individual surveys the results were a bit different: the first dozen reviewers you read hated it, rating it only 2 out of 5. Can these results be real? Let's see what our test shows...

In our T-distribution calculation, we have 10,000 inputs collected with an average of 4 and a standard deviation of 1.3. The first dozen "actuals" you reviewed have a mean of 2.0 and a standard deviation of 1.3. Can such results legitimately have come at random from our larger data set?

Here's what our T-distribution shows:

We might be fine if our test sample failed at the 80% confidence level. Even 90% or 95% might be easily explained away by the way we grabbed our sample. But our tests show that our sample represents different data, at the 99.9% confidence level! Time to take a hard look at the numbers...
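The arithmetic behind a check like this is short. The Ruby sketch below treats the reported average of 4 as the claimed population mean and runs a one-sample t-test on the dozen reviews. That simplification, the function name, and the hard-coded critical values (two-sided, 11 degrees of freedom, copied from a standard t-table) are my assumptions for illustration.

```ruby
# One-sample t statistic: the distance between the sample mean and the
# claimed mean, measured in standard errors of the sample mean.
def t_statistic(sample_mean, claimed_mean, sample_sd, n)
  standard_error = sample_sd / Math.sqrt(n)
  (sample_mean - claimed_mean) / standard_error
end

# Two-sided critical values of Student's t for 11 degrees of freedom
# (a sample of 12), taken from a standard t-table.
T_CRITICAL_DF11 = { 0.80 => 1.363, 0.90 => 1.796, 0.95 => 2.201, 0.999 => 4.437 }.freeze

# The dozen reviews: mean 2.0, standard deviation 1.3, against a claimed mean of 4.
t = t_statistic(2.0, 4.0, 1.3, 12)
puts format("t = %.2f", t)

T_CRITICAL_DF11.each do |confidence, critical|
  verdict = t.abs > critical ? "sample differs" : "consistent with chance"
  puts format("%.1f%% level (|t| > %.3f): %s", confidence * 100, critical, verdict)
end
```

With these inputs the statistic comes out near -5.3, beyond even the 99.9% threshold, which is why a dozen reviews is enough to raise the alarm here.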

There are several explanations that might account for such a discrepancy, such as

  • Our sample was drawn from pre-sorted data, so our results are not randomly selected
  • Sample bias - our sample was drawn from a user set prejudicially disposed against our product
  • Various mathematical errors, in either our calculation or the data selection

But there could be other, darker causes at work...

This is where data provenance (still a novelty in Big Data analysis) will be so valuable:

  • Yes - drill-down data can differ from the general Big Data population, but
  • No - the laws of statistics still apply, and if our actuals are that different from our expectations from the greater population, then we need to take a hard look somewhere.

In data, getting Trustworthy will be even more important than getting Big was.

The great enemy of the truth is very often not the lie — deliberate, contrived, and dishonest — but the myth — persistent, persuasive and realistic. ~ John F. Kennedy
