Thursday, Jul 23, 2015

Beyond Scrum - The Evolution of Software Development

To Infinity and Beyond

It might have seemed unkind to sack Scrum and Agile development methodologies in my previous post.  After all, they are an improvement on the time-honored "Waterfall" method. Few know that their predecessor, Waterfall, was itself based on a mistake, a mistake so appalling that even its inventor, Winston Royce, wrote in the paper that delivered Waterfall to the world:
“I believe in this concept, but the implementation described above is risky and invites failure.”
The bar was low, and Scrum and Agile methods easily cleared it.  We can't forget, though, that they were the products of their time, and many of the things they took for granted may have applied only during that tiny sliver of Internet-bubble time.    Let's take another look at the implied software development environment of the approach that I wrote up last time (paraphrased):
  • We’re all co-located...
  • We don’t know how to write requirements...
  • Our leader doesn't work here / is a consultant...
  • Our entire team are newbies, so we'll
    • let everyone own all the code
    • track performance daily with check-ins and velocity
    • eliminate any concept of seniority
  • We’ll code in sprints that last from 2-4 weeks each
    • it’s only payroll
    • if it’s something big we’ll just have the sprints go on forever
    • everything can be fit into a 2-4 week sprint
  • We'll deliver software at the end of each sprint.
    • the software doesn’t have to be great, just delivered
These assumptions might have been fine for Internet-bubble days, but in 2015 and beyond we can do better.  We won't throw the (Scrum-)baby out with the bathwater, but that implied Scrum environment is no longer valid for software development teams today.

"The real voyage of discovery consists not in seeking new landscapes, but in having new eyes." ~ Proust

"Form follows function" ~ Ludwig Mies van der Rohe
Let's start with some new assumptions and our development approaches will follow:
  1. The development team is distributed.   EVERY team is distributed, even if it is co-located!   Work-from-home vs. work-in-the-office, "morning people" vs. evening (or 2:00AM) people, flu season.    The first rule of new development is to have everyone on the team work in the environment that is best for them. The second rule of software is Linus's Law: given enough eyes, all bugs are shallow. We will need great collaboration tools for code review and pair programming.
  2. Timely delivery is nice, but compelling delivery captures markets.    The Apple App Store is a microcosm of the software industry overall: your software might not have 1.5 million applications competing with it, but you surely have competitors and, as Bill Gates and Mark Zuckerberg can confirm, in markets large and small the winner takes all.   If you write software you had better be ready to put a ding in the universe; otherwise you and your "snowflake" will disappear completely.
  3. Different components have different delivery schedules.   The transition to microservices is well underway, but until that transition is complete we will be writing software with widely varying levels of granularity.   Form and schedule follow function -- let the software determine its own best delivery schedule.
  4. Timing?  Long sprints and short sprints are a false choice -- in 2015 we deliver software continuously (with Jenkins, and increasingly with Docker; a minimal sketch follows this list) and take advantage of the heartbeat that the regular old workweek gives our teams.
  5. Fast-Track everything that can be fast-tracked.   The Project Management Institute has the right idea here - serialization is the enemy of compelling, continuous delivery.
  6. "Pull" development beats "push".   Pull has revolutionized manufactured components, why shouldn't it also revolutionize conceptual ones?   If you want compelling delivery (per point #2), then you need a buy/sell market where the customer has the right of refusal.
These are the rules -- software has come a long way since the days of Chrysler Payroll.  Let's move forward...
"Scrum programmers are destined to code Scrum sprints for the rest of their lives, and thereafter." ~ John Repko (after Bertrand Meyer)
 

 

 

Sunday, Jul 19, 2015

Fighting the Last War — Software Development Beyond Agile and Scrum

The Last War...

"If you don't show Kansas, Oz isn't all that special." 

~ Frank Decker, on adding a dystopian LA backstory to the movie Demolition Man

"Generals always fight the last war"

~ Edward P. Warner (1934)

1. The Scientific Revolution Ahead

For the next Jonas Salk to be a mathematician, we need new tools that bring a mathematical focus to previously unsolvable problems.  Software lies between that vision and the practical world we live in, and software has long been the gap that keeps mathematical insights from becoming real-world solutions.   We have 200,000+ clinical trials' worth of data in storage, but what use are they if you can't search them, analyze them for results and plan next steps from there?   "Big Data" is a catch-all term for the new kinds of analysis tailored to massive or genomic data sets, but our software deficiencies only start with wrangling the data.   The world ahead needs two things:  1) a way to channel the myriad human communications  (spoken, texted, emailed, webbed, blogged, fitbitted, jawboned and more) into an Interstellar-tesseract library for review,

The Tesseract

and 2) a better way to write the tools and applications that we'll need to navigate and draw wisdom from that tesseract.

2. The World We've Come From

It's said that "generals always fight the last war," so it shouldn't be surprising that while big data tools are new, our software development approaches are still the ones we developed when "the internet changed everything."   Extreme Programming and the Agile Manifesto got us started, and the backsides of the manifesto authors are an appropriate image for how that manifesto suits us now in the Age of Analytics.  Time will make fools of us all, but the principles from 2001 read like platitudes now -- they are not wrong, per se, but a lot has changed since 2001, and we can choose better rules today.  Let's start with some things that have changed:

  • Teams are more likely to be distributed now.   Daily stand-ups will have a different flavor when no two people are on the same continent or in the same time zone, rather than the same room.
  • Web development has matured.   In the late 90's everyone was a "web newbie."   The game had already changed when David Heinemeier Hansson released a video building a weblog in 15 minutes, and even that was almost 10 years ago now.
  • Web tools have matured.   Java, .Net, Rails, Python and PHP are old hat now, and RCS got replaced by Subversion and then Git as the tools raced ahead even faster.
  • Communications tools are richer now.   By now most developers have video-conferenced with Facetime or Skype or Lync, IM'd with Yammer or Campfire, wiki'd with Sharepoint or Basecamp, and done code reviews through Git (or read the sordid code history with Git Blame).
  • The competition to your software is a lot greater than it was then.   The godfather of agile methodologies, Extreme Programming (XP), was introduced in 1996 by Kent Beck when he was working on a Chrysler payroll project.  Chrysler payroll?   Not exactly the hair-on-fire environment that Jamie Zawinski lived in at Netscape, much less the edgier parts of Facebook or Google.    Kent Beck 1) had an internal audience 2) who had no competitor software to choose from, and 3) who couldn't effectively decline to use his software on release.    Apple's App Store now has 1.5 million iPhone apps, available for review and updated continuously.    Kent Beck did great work, but the App Store represents a massively more competitive environment than he faced at Chrysler.
According to Conway's Law, modern software designs match the communication structures of the organizations that built the software.  Cobol mainframe software was thus the product of The Organization Man age.  Here I'll posit Repko's corollary, which states that modern software methodologies match the prevailing organizational mores of their era.   Thus it was 2000-era technology mores that gave us Scrum:
  • Here in Payroll...
    • We're all co-located, so we'll improve communications with morning stand-up meetings
    • We don't know how to write requirements that are a) rigorous enough to be developed swiftly, nor b) flexible enough to stand the test of time, so we'll have a Product Owner that we can rap with on a daily basis
    • Our leader is a consultant who doesn't work here, so we'll define the role of Scrum Master and (like our consultant) that role will have power but not authority
    • Web development is new and Java is even newer, so our entire team are newbies.   We'll handle this by
      • letting everyone own all the code -- easier to spot problem-children, and to rebalance the team if we have to "cull the herd"
      • tracking performance daily with check-ins and velocity (what other profession monitors itself daily?)
      • eliminating any concept of seniority -- if nobody knows anything about web or Java, what's the benefit of seniority?
    • We'll prepare code in sub-projects called sprints that last from 2-4 weeks each
      • it's only payroll -- it's not like we're building an operating system or air-traffic control
      • if it were something big like an operating system, we'd just have the sprints go on forever
      • we don't have any functionality (like architectural advances) that can't be fit into a 2-4 week sprint
    • Our goal is to have deliverable software at the end of each sprint.
      • the software doesn't have to be good (or compete with a dozen similar apps on the App Store), it just has to be deliverable
It's not that I don't like Scrum -- it is easily grasped, it provides a structure to software projects, and it's a positive step forward from the Waterfall software development approaches that preceded it.   It is, though, the product of its times and environment, and competitive-market communications and advanced analytics software needn't be saddled with Internet-bubble-Chrysler-payroll software practices.

While better than nothing, many of the "agile" practices that comprise Scrum don't make a lick of sense anymore.   Their era has passed, and as much as people still bow to the Imperial Science of Scrum and its Certified Scrum Masters, late-90's-payroll approaches just aren't going to cut it in 2015.

Scrum is the development methodology of the last war.  A quick note about the image at the header of this posting.   Many readers might name the Pearl Harbor attack of December 7, 1941 as the turning point in naval warfare -- the specific point at which aircraft carriers (with their bombers and torpedo planes) became THE capital ship for Navy planning.   Some with greater perspective might flag the earlier Battle of Taranto (Italy) of 11-12 November 1940 as that turning point.  In fact that turning point, the warfare revolution of its day, came far sooner -- and we're approaching its anniversary.

US General Billy Mitchell came out of World War I with the belief that his aircraft -- 1920's-era biplanes seemingly little-advanced beyond what the Wright brothers flew at Kitty Hawk -- could sink any ship of any navy, and he pleaded for a chance to prove it.   His bombers did surprisingly well in early tests, and so he was presented with the ultimate challenge:   the Ostfriesland -- pride of the WWI German navy, due for demolition but still thought by many (stop me if you've heard this before) to be "unsinkable."   In an earlier effort, US Secretary of the Navy Josephus Daniels had smugly proclaimed:
"I'm so confident that neither Army nor Navy aviators can hit the <ship> when she is under way that I would be perfectly willing to be on board her when they bomb her!"

Good thing he didn't.  He would soon discover the truth of Victor Hugo's words: "No army can stop an idea whose time has come."  When the paradigm shifts, you're either on the right side of the shift or you are lost to history.    The first bomb fell at 12:17pm on that day 94 years ago -- July 21, 1921.  The six bombers came at two-minute intervals, with the last bomb falling at 12:27pm.   The new era of warfare dawned just 13 minutes later, when at 12:40pm the mighty Ostfriesland slipped beneath the waves.

It's time to change the way we write software, and I'll write about that new paradigm:

Beyond Scrum -- The Evolution of Software Development

in my next posting.

 
Tuesday, May 5, 2015

Pathways

"Without [taking a process perspective of business], business improvement efforts amount to rearranging deck chairs on the Titanic.”

~ Michael Hammer & James Champy, Reengineering The Corporation

In the Beginning: Reengineering the Corporation

Back when I was a newly-minted MBA, client-server computing had already passed its sell-by date but the Internet as we know it was yet to be born, so when I left school it was to do business-process consulting at Booz Allen & Hamilton. Process consulting was the province of McKinsey and Booz, Bain and BCG, but Hammer & Champy's Reengineering the Corporation jolted the business world when it came out in 1993. With Reengineering, new rules emerged and lots of businesses started dabbling with reengineering.

Those days were not such a great time for Oracle Corporation; in fact Oracle nearly went bankrupt in the early '90s. Seeking a new executive team, Oracle CEO Larry Ellison hired Ray Lane from Booz Allen, and thence began the Oracle takeover of Booz's Information Systems Group: Ray Lane hired Robert Shaw, both hired their lieutenants, who in turn hired their lieutenants, and in short order Oracle had about 40 new managers from Booz Allen, including me.

But what were we to do? To a new MBA it was obvious: Oracle had a lot of "hot" accounts that needed serious, young, suit-wearing professionals to keep Oracle from being thrown out of them. We were just the ticket, and a Business Process Reengineering (BPR) consulting group was born inside Oracle. The remarkable thing about BPR at Oracle is that, as misplaced as we were in a purely-techie company, we got really good at it! We started with Hammer & Champy and then wrote our own rules from there.

Our run as reengineering superstars was bound to come to an end, and did so around 2000 for a variety of reasons:

  • Oracle V6 was a leading-edge product and the Oracle Applications built on it didn't always work right. Oracle V7 and Smart Client apps were a much sharper effort, so our target market of enraged clients and accounts simply dried up.

  • Reengineering itself got a flood of new practitioners, who discovered that even if they didn't know what they were doing they could always just drive a bunch of layoffs and (courtesy of the same information flows that made our reengineering efforts possible) the organization would survive somehow while the consultant basked in the glory of the cost savings.

Reengineering's death couldn't come soon enough, but many of the new reengineering ideas were good ones and are even better today with the advent of Advanced Analytics. The business world was so chastened by "reengineering" that many of these approaches were actively forgotten. Their rebirth has come -- from outside the business world. In 2015 it's time to ask: "Is there a doctor in the house?"

Pathways: Advanced Analytics and Healthcare

"In the next 10 years, data science and software will do more for medicine than all of the biological sciences together."

~ Vinod Khosla, Director, Founder and Partner of Khosla Ventures

In his book The Agenda, Michael Hammer defined a process as “an organized group of related activities that work together to transform one or more kinds of input into outputs that are of value to the customer.” These are simple words and early BPR practitioners probably never dreamed that such a simple idea could become synonymous with 'mass layoffs'.

But so it has, leaving behind a sometimes-insightful body of work in the process. To bring the magic back we have to start with the definition of another kind of process:

In biochemistry, a metabolic pathway is a series of chemical reactions occurring within a cell. In a pathway, the initial chemical (metabolite) is modified by a sequence of chemical reactions. These reactions are catalyzed by enzymes, where the product of one enzyme acts as the substrate for the next.

Pathways in biochemistry are analogous to processes in business, but lack the taint that process reengineering wrought in the business world. Biochemists' ignorance really has been bliss, and pathways have brought lifesaving breakthroughs in the world of chronic myelogenous leukemia that I mentioned in my last post. We will start with Jim Collins' brutal facts and then walk through the process, with credit to the National Cancer Institute:

  • Prior to 2001, less than 1 in 3 chronic myelogenous leukemia (CML) patients survived 5 years past diagnosis.

  • In 1960, Peter Nowell, M.D., of the University of Pennsylvania, and David Hungerford, Ph.D., of the Fox Chase Cancer Center in Philadelphia, reported finding an abnormally short chromosome in bone marrow cells from patients with CML. The tiny chromosome was quickly dubbed "the Philadelphia chromosome" for the city in which it was discovered.

  • Not until the 1970s did researchers learn how the Philadelphia chromosome was formed. In 1973, using new DNA-staining technology, Janet Rowley, M.D., of the University of Chicago, discovered that chromosome 22 and chromosome 9 had exchanged bits of DNA. This phenomenon is known as chromosomal translocation—when one piece of a chromosome breaks off and attaches to another or when pieces from two different chromosomes trade places.

…things start moving faster now…

  • In the 1980s, Nora Heisterkamp, M.D., then an NCI intramural scientist and now of Children’s Hospital in Los Angeles, and her colleagues figured out that translocation resulted in the fusion of two genes that created a new gene known as BCR-ABL.

  • In 1986, Owen Witte, M.D., and his colleagues at UCLA discovered that this fusion gene causes the body to produce an abnormally active form of an enzyme called a tyrosine kinase that stimulates uncontrolled cell growth in white blood cells.

…A HA! Now we have a mechanism driving growth — the "Achilles heel" for cancer progression! If this highly active enzyme could be suppressed, CML might be treatable. Now we move to the endgame…

  • By this time, Brian Druker, M.D., of Oregon Health and Science University, had already been studying tyrosine kinases as possible targets for precisely aimed treatment and he was focusing on CML as a promising disease to study … Because CML has a singular mutation, these researchers targeted this disease and began screening compounds developed by Dr. Lydon’s lab. Eventually, Dr. Druker found one compound, called STI571, that was more effective than others. This compound, which eventually became known as imatinib, would kill every CML cell in a petri dish, every time.

This is the approach that's driving a scientific revolution in healthcare, and can drive a similar revolution in business. The new approach is, loosely:

Here scientific advances were critical, but the path to victory over CML began with "screening compounds." The best-in-class were identified by the screen, chemically modified to enhance the desired properties (tyrosine kinase inhibition), and the winning compound became Gleevec.

We can adopt a similar approach with Advanced Analytics for a new era of business process revolution. Next we'll talk about how AA can drive new approaches to business.

The Path Forward: a New Look at Business Processes with Advanced Analytics

"We approached the case, you remember, with an absolutely blank mind, which is always an advantage. We had formed no theories. We were simply there to observe and to draw inferences from our observations."

~ Sherlock Holmes - The Adventure of the Cardboard Box

The wonder of BPR in its early days was the belief that if you could just draw up the right process — boxes and arrows and swim lanes — all of the business's problems would be solved. In fact, a great many businesses advanced by getting just such basic controls in place. Even though the image of "BPR" is tarnished today, the word did get out and in 2015 most businesses are at least modestly process-aligned. This is progress, but it's only the beginning.

Eli Goldratt, with his book The Goal, accurately pointed out the problem with pure process-ness: all process-steps are NOT equal, and the path to The Goal requires us to a) identify the constrained steps, and b) exploit the constraints. This too is a real advance, but it is not enough. In a world where processes are in place and the obvious rate-limiting steps have been removed, how can we still move forward? Even in our Hammered, Champied and Goldratted world, it's still all too common to hear:

We have processes and we have data. What we can't explain is why both our mean manufacturing cycle times and our on-time delivery rates are dropping!?

This is where Sherlock Holmes' observation above comes into play, along with another observation commonly attributed to Albert Einstein:

"We can not solve our problems with the same level of thinking that created them"

We can go with more boxes, arrows and swim lanes, but these are what got us to where we are now! We don't need more processes but we do need "new eyes" to discover new insights on our challenges. We already have metrics that describe the nature of competition in different industries; these can give us insights into where to look for new wins in existing business.

Let's consider some different businesses, such as high tech manufacturing, aerospace & defense, and consumer packaged goods. Each of these industries has key performance indicators (KPIs) that describe how it is run — at a high level an Aerospace and Defense company might have the same boxes-and-arrows for core Procurement or Fulfillment processes as a CPG company, but the nature of the two businesses could scarcely be more different, and the critical KPIs for those businesses will differ as well. These critical KPIs are where we'll use our new tools to start a new search for business advancement.

The following table shows how some major industries compare in the types of metrics we might look at every day:

Table 1. KPIs For Some Leading Industries

Let's consider this new "pathways" approach for a business that we know. "Rocketera" (a fictional company) manufactures embedded circuit boards for the automotive and aerospace industries. They are a skilled high tech manufacturer, but they also have some real business challenges to deal with:

  • Their customers are major auto companies and defense contractors — big companies with JIT manufacturing and demanding delivery schedules

  • Their suppliers are electronics companies in Taiwan and Japan — products are frequently revised and updated and delivery times for the most complex products can be long

  • Materials management matters a lot to Rocketera — the length of the supply chain, short lifespan of components and the cost of those components makes supply chain monitoring more important in HT-MFG than in almost any other industry

  • Rocketera targets a gross margin of 25%, and keeping gross margins with Moore's Law-era products means that Rocketera has to cut supplier costs by 30% each year — just to stay in the game!

This is just a sample case, but even with these first rough numbers we have a few obvious candidates for analytics and Advanced Analytics:

  • We need better forecasts — here advanced analytics can supplement Rocketera's existing planning system by
    • Reviewing customer order histories to find signals that forecast unexpected orders
    • Tracking media sources on the growth and business prospects of its customers
    • Tracking political trends (as via 538) on key candidates with hawkish or dovish positions
    • Tracking social sources for customer and public opinions on high-profile products or customers

  • Supply chain tracking is also critical — Rocketera can also update its planning system by:
    • Reviewing supplier delivery histories to find signals that forecast unexpected delays
    • Analyzing the spot market (or setting up an exchange) for critical components
    • Integrating weather updates into supply chain planning for potentially-delayed shipments

The approach shown here is a general one, but advanced analytics with modern tools gives it a new relevance. Every industry group has its own set of KPIs that describe the locus of competition, and by focusing on the KPIs that provide differentiation we can ensure that even small changes have magnified impacts. The great promise of Advanced Analytics is that it makes business advances possible — even in (in fact, particularly in) businesses that have already dotted their i's and crossed their t's in process and business planning. Done properly, an agile business can move from data to information, and from information to wisdom. Pathways are the next step in business planning.

"The real voyage of discovery consists not in seeking new lands but seeing with new eyes."
~ Proust

Friday, Apr 17, 2015

Analytics and Healthcare: the Next Scientific Revolution

"The next Jonas Salk will be a Mathematician, not a Doctor."

~ Jack Einhorn, Chief Scientist, Inform Laboratories

"Soon, you're probably not going to be able to say that you're a molecular biologist if you don't understand some statistics or rudimentary data-handling technologies," says Blume. "You're simply going to be a dinosaur if you don't."

~ John Blume, VP of Product Development, Affymetrix in Nature

Pieces of April

I'm excited about the start of baseball season — for someone who grew up in Western New York the coming of April, spring and warm weather is always a good thing. April meant that the days of taking ground balls off of gymnasium floors were over and it might be cold and awful outside, but you were finally out playing baseball! The cold dark winter was finally over…

Later, when I was at MIT and lived in the Boston area, April also meant the Boston Marathon. Patriot's Day was the one day of the year when (in sympathy) I always hoped for cool, damp weather.

This year's April 20 Marathon is a fun one for me, because this year I have a rooting interest and this year (courtesy of the Boston Athletic Association, Nike and the Internet) it will be possible to track the split times of all the runners as they race throughout the day. My rooting interest is for my niece Amanda — Mars rocket scientist at the Jet Propulsion Lab who ran a 3:16 (WOW!) marathon out in California last year to qualify. Amanda is a great person, a great runner, and her run Monday is a great win for the next Scientific Revolution that is upon us:

Amanda has leukemia.

Has. Not had, but qualifying for Boston does show that she has a certain leg-up on things. Leukemia. Think of old movie images — Ali MacGraw as Jennifer Cavalleri in Love Story, fading away romantically (if unrealistically) to close the movie… Not anymore — That was then, this is now.

The difference between 2010s medicine and 1970s bathos is advanced analytics — here manifested as rational drug design. As summarized in the article on the breakthrough:

Imatinib was developed by rational drug design. After the Philadelphia chromosome mutation and hyperactive bcr-abl protein were discovered, the investigators screened chemical libraries to find a drug that would inhibit that protein. With high-throughput screening, they identified 2-phenylaminopyrimidine. This lead compound was then tested and modified by the introduction of methyl and benzamide groups to give it enhanced binding properties, resulting in imatinib.

This is where the new scientific revolution is coming from. We've reached the limits of the classic scientific method, where scientific advancement came in a three-step process:

In the world ahead we advance our understanding by:

These two new first steps change the world as we know it.

So, amid the many steps that drive the advance of cancer, there may be some where the cancer shows an Achilles heel — here the rogue bcr-abl protein that is critical to cancer progression. FORGET about "curing" cancer — if you can identify and deliver a mechanism that stops that protein you will stop the progression of the disease and change the cancer from dreaded-evil-of-pulp-movies into just another serious-but-treatable condition.

Rule #1 of the new Revolution: Answers aren't All or Nothing anymore. If you can identify things that make your world a little better, you might then find big wins by targeting their underlying mechanism.

You can track individual runners and results for Monday's Boston Marathon here: Live Raceday Coverage.

Moneyball

So back to baseball. Part of my love of the sport was that it was possible to be an effective baseball player without necessarily being all that great a baseball player. In high school and at MIT I had an OK fastball, no curveball but a surprising knuckleball, and I could throw strikes, hit, bunt and field decently. I was the type of pitcher who had batters cursing themselves on the way back to the dugout, but if I played the percentages I could survive more than I deserved by "throwing junk."

The Michael Lewis book Moneyball is fascinating because it is based on the notion that, to paraphrase The Bard: "There are more ways to win baseball games, Horatio, Than are dreamt of in your philosophy."

The magic here was not that there were still ways for a cheap, bad team to win baseball games. The magic was that, even in as statistical a game as baseball (Proof: How many home runs did Babe Ruth hit? If you know baseball then you didn't have to look that up.), we paid most of our attention to the wrong things!

Everybody who has ever been around a ballpark has heard of the Triple Crown (home runs, RBIs and batting average) for hitters, and W/L records and maybe strikeouts per inning pitched for pitchers. These may be fine, but a very different set of indicators are the statistics that made Moneyball happen.

Before Moneyball, who ever heard of

(and for you Fantasy Baseball GMs)

These bring us to our second rule of the new scientific revolution. Back in the DBS (days before spreadsheets) we worried about things like "statistical sample sizes" because it was practically impossible to track and measure everything.

Again — that was before. With a million rows possible in Excel and smartphone processors 150 million times faster than the computers that took Man to the Moon (much less new tools like Hadoop), our sample sizes can now be the entire population and we can track anything imaginable within that population. In baseball we historically tracked HRs, BA and RBIs because you could calculate them in time for the next edition of the morning paper, not because they were what we should have tracked. So:

Rule #2 of the new Revolution: Your sample size is the entire population: what things do you need to track to make things better in Rule #1?

Application to Modern Medicine and Healthcare

While it's easy to focus on innovations in medicine and patient care from advanced analytics, there are also positive advances in healthcare business management that shouldn't be overlooked when considering data in the healthcare equation. The diagram below shows some of the kinds of advances that are becoming everyday practices for leading healthcare providers.

Integrated patient data in the evolving age of EMR is just one of the places where advanced analytics can improve facility operations. Analytics and heuristics can advance this critical area, potentially augmenting advanced practices such as those outlined by Grant Landsbach in a recent paper.

Assisted diagnosis is another area of rapid evolution. IBM's Jeopardy-winning analytics system 'Watson' has been retargeted at medical diagnosis, and even if the breathless claim IBM's Watson Supercomputer May Soon Be The Best Doctor in the World falls short in actual practice, everyday benefits are likely in all fields of medicine as smart analytics evolve to resemble 'Librarians' from Neal Stephenson's futuristic novel Snow Crash.

Genomic treatment is a third area that is rapidly evolving today. Tamoxifen (where metabolism to endoxifen is genetically limited in 20% of cases) and Plavix (which by a similar mechanism is ineffective in almost 25% of patients) are just two of potentially many cases where medicine must be personalized to be effective.

This brings us to the third rule of the new Revolution. As the sample size grows to the whole population, the target population shrinks to one. I still remember being stunned, reading the headline back in 1991 when Magic Johnson announced that he had the AIDS virus. With the devastation that AIDS had wrought by 1991, who would have dreamed that Magic would be alive and well almost 25 years later? We still have no "cure" for AIDS, but drug cocktails have moved from Dallas Buyers Club to real medicine. So, finally:

Rule #3 of the new Revolution: Wins are statistical — just because you don't have the all-or-nothing of a "cure" doesn't mean that progress on a little can't add up to a lot.

The Revolution Realized — Moving to Personalized Healthcare

The Healthcare advanced analytics field is ripe and significant advances are occurring daily. Even in an increasingly rugged business environment, Healthcare leaders are still driving better business practices and innovations in medicine and patient care.

Innovations in healthcare are increasingly appearing from outside of the classical scientific method, and these advances in management and patient care may match the revolutionary breakthroughs from Lister and Semmelweis more than a century ago. Leukemia treatable? — sure, we've had a range of approaches for fifty years now. But glioblastoma multiforme?

Advanced analytics is changing medicine and healthcare, and innovative leaders are changing the practice of medicine and with it are changing life as we know it. Managerial and clinical advances are a marathon effort, and many of the tools and techniques for advanced analytics are still in their infancy. We are, both healthcare providers and analytics experts, just getting started.

YAY Amanda! 3:27:42 in driving rain IN THE BOSTON MARATHON!!!

Find your strength. Change the world. Be part of next revolution at TeamAmandaStrong !

Play Ball!

Wednesday, Oct 22, 2014

Spark 1.1 live - from Kitty Hawk to Infinity (and beyond...)

"The credit belongs to the man who is actually in the arena … who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.”

~ Theodore Roosevelt

It's not fair to be too hard on technological pioneers; the path to great progress is often marked with fine innovations that are trumpeted as "better than sliced bread," even if hindsight later shows them to be merely VHS ("better than Beta") — a humble step on the road to DVDs and then digital video.

So it has been with Big Data technologies; Big Data has done great things for my Stanford classmate Omid Kordestani at Google; even if Google doesn't use MapReduce anymore, it was still a milestone on our path, not just to the "Internet of Things" but to the hopefully-coming "Internet of Us."

So it's not surprising that Big Data is taking a pounding these days, exemplified by machine learning's Michael Jordan decrying the Delusions of Big Data. This is par for the course; even as advanced analytics becomes too big to simply dismiss, the techniques are still subject to the ills that flesh (and technology) is heir to — welcome to the human condition.

Jordan notes:

These are all true - this is the imperfect world we inhabit. I still see great possibilities in big data, and my take on Jordan's comments falls somewhere between physicist Niels Bohr:

"The opposite of a great truth is also true."

and an unknown writer (possibly Vonnegut), who opined:

"A pioneer in any field, if he stays long enough in the field, becomes an impediment to progress in that field…"

Progress changes everything. We must try to imagine the mindset of a Henry Ford, advancing manufacturing processes to put automobiles in the hands of all of his employees, even though there was little gasoline to power them, few gas stations to fill them, and hardly any paved roads to drive them on. The first models were technological marvels of their age, but that doesn't mean we can't laugh at them now:

So it is with the advances of big data technologies. I might reasonably agree with both Jordan and Michael Stonebraker that Hadoop, the darling of the first Data Age, is not just a yellow elephant but has some of the characteristics of a white elephant as well.

I've written about the foibles of Hadoop before. Hadoop is (and continues to advance as) a terrific technology for working with embarrassingly parallel data, but in a real-time world its drawbacks are like a manual crank on a car — it may work, but it's not what everybody (anybody?) would choose going forward. Here's what's wrong:

  • Limited acceleration options
  • Poor data provenance
  • Disk-based (not RAM-based) triply-redundant storage — bulky and slow
  • Slow (Hive) support for SQL — the data query language that everybody knows

Fortunately, the next step in technology evolution has reached version 1.1.0 since I last wrote. Spark can solve all of these problems, so let's go get it from the Spark download site:

Once we've downloaded the latest Spark tar file, we can un-tar it and set it up:

$ curl -o ~/Downloads/spark-1.1.0.tgz http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0.tgz
$ mkdir -p ~/src && cd ~/src
$ tar xzvf ~/Downloads/spark-1.1.0.tgz
$ cd spark-1.1.0

Got it! Now let's try running Spark 1.1.0…

$ ./bin/run-example SparkPi 10
Failed to find Spark examples assembly in /Users/jkrepko/src/spark-1.1.0/lib or /Users/jkrepko/src/spark-1.1.0/examples/target

Whoops — spoke too soon. Let's build Spark, starting with Hadoop and including Scala and any of the other tools we'll need.

First let's install Hadoop 2.4.1 by downloading it from our chosen download mirror.

Once the Hadoop 2.4.1 download is complete, we untar it under /usr/local and symlink it:

$ cd /usr/local
$ sudo tar xzvf $HOME/Downloads/hadoop-2.4.1.tar.gz
$ sudo ln -s hadoop-2.4.1 hadoop

Now we'll set the ownership of the installed files. First let's check our own user and group:

$ ls -ld $HOME

which for me gives

drwxr-xr-x+ 127 jkrepko  staff  4318 Oct 20 09:43 /Users/jkrepko

Let me set the ownership for our Hadoop install and we can roll on from here

$ sudo chown -R jkrepko:staff hadoop-2.4.1 hadoop

We can then check the changes with

$ ls -ld hadoop*
which for me gives
lrwxr-xr-x   1 jkrepko  staff   12 Oct 21 10:04 hadoop -> hadoop-2.4.1
drwxr-xr-x@ 12 jkrepko  staff  408 Jun 21 00:38 hadoop-2.4.1

We'll want to update our ~/.bashrc file to make sure HADOOP_HOME and our other key globals are set correctly:

export HADOOP_PREFIX="/usr/local/hadoop"
export HADOOP_HOME="${HADOOP_PREFIX}"
export HADOOP_COMMON_HOME="${HADOOP_PREFIX}"
export HADOOP_CONF_DIR="${HADOOP_PREFIX}/etc/hadoop"
export HADOOP_HDFS_HOME="${HADOOP_PREFIX}"
export HADOOP_MAPRED_HOME="${HADOOP_PREFIX}"
export HADOOP_YARN_HOME="${HADOOP_PREFIX}"
export "PATH=${PATH}:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin"
export SCALA_HOME=/usr/local/bin/scala

Now that Hadoop is installed, we can walk through the .sh and .xml files to ensure that our Hadoop installation is configured correctly. These are all routine Hadoop configurations. We'll start with hadoop-env.sh — comment out the first HADOOP_OPTS, and add the following line:

vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
## export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

Next up are our updates to core-site.xml

vi /usr/local/hadoop/etc/hadoop/core-site.xml

Here we'll add the following lines to configuration:

<configuration>
  <property>
     <name>hadoop.tmp.dir</name>
     <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
     <description>A base for other temporary directories.</description>
  </property>
  <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Next up is our mapred-site.xml.

vi /usr/local/hadoop/etc/hadoop/mapred-site.xml

Immediately following installation this file won't exist (only mapred-site.xml.template does), so feel free to copy and edit the template, or simply add the following code to a new, blank mapred-site.xml file:



<configuration>
       <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9010</value>
       </property>
</configuration>

Our final configuration file is hdfs-site.xml — let's edit it as well:

$ vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following property inside the <configuration> element:

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>

Finally, to start / stop Hadoop let's add the following to our ~/.profile or ~/.bashrc file

$ vi ~/.profile
alias hstart="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
alias hstop="$HADOOP_HOME/sbin/stop-yarn.sh;$HADOOP_HOME/sbin/stop-dfs.sh"

And source the file to make hstart and hstop active

$ source ~/.profile

Before we can run Hadoop we first need to format the HDFS using

$ hadoop namenode -format

This yields a lot of configuration messages ending in

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at jkrepko-2.local/10.0.0.153
************************************************************/

Just as housekeeping: if you haven't done so already, you must make your ssh keys available. I already have keys (which can otherwise be generated with ssh-keygen), so I just need to add:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

I can then confirm that ssh is working with:

$ ssh localhost
$ exit

We can now start hadoop with

$ hstart

Let's see how our Hadoop system is running by entering

http://localhost:50070
Bravo! Hadoop 2.4.1 is up and running, and port 50070 gives us a basic heartbeat.
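If you'd rather check from the command line, jps (which ships with the JDK) will list the running Hadoop daemons. The exact set of processes and the process IDs will vary with your configuration, but the output should look something like:

$ jps
4211 NameNode
4297 DataNode
4388 SecondaryNameNode
4475 ResourceManager
4556 NodeManager
4631 Jps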

We've started Hadoop, and we can stop it with

$ hstop

Now that Hadoop is installed, we can build Spark, but first we have to set up Maven properties

Add this to ~/.bashrc

export MAVEN_OPTS="-Xmx2g  -XX:ReservedCodeCacheSize=512m"

Now we can build Spark with its built-in sbt builder. First let's make sure we have version 2.10 or later of Scala installed; on a Macintosh (my base machine here) this is a simple Homebrew install command:

$ brew install scala

Now we can run the Spark build utility:

$ SPARK_HADOOP_VERSION=2.4.1 sbt/sbt assembly
NOTE: SPARK_HADOOP_VERSION is deprecated, please use -Dhadoop.version=2.4.1

The build succeeded, but with the deprecation warning we might do better in the future with something more like:

 $ sbt/sbt assembly -Dhadoop.version=2.4.1

We're now LIVE on Spark 1.1.0! Before we start, let's turn the logging down a bit. We can do this by copying the conf/log4j.properties.template file to conf/log4j.properties and editing that copy. The template's default is:

log4j.rootCategory=INFO, console

Let's lower the log level so that we only see WARN messages and above. Here we change the rootCategory as follows:

log4j.rootCategory=WARN, console

Now we're live and can run some examples! Time for some Pi…

$ ./bin/run-example SparkPi 10
Pi is roughly 3.140184

Not bad, but let's bump up the precision a bit:

$ ./bin/run-example SparkPi 100
Pi is roughly 3.14157

Mmmmmnnnn, Mmmmmnnnn good!
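While we're at it, here's a quick illustrative spark-shell session. It is just a sketch (counting the lines of Spark's own README.md that mention Spark), but it confirms that the interactive shell and its built-in SparkContext (sc) are ready to go:

$ ./bin/spark-shell
scala> val readme = sc.textFile("README.md")                   // RDD of the lines in Spark's README
scala> readme.filter(line => line.contains("Spark")).count()   // how many lines mention Spark?
scala> :quit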

There are lots of other great emerging Spark examples, but we're up and running here and we'll stop for now.

It's a long road from Kitty Hawk to the (sadly missed) Concorde or the 787, and we won't get there in just one step. In my next post I'll lay out the toolkit we have today that should take Big Data from the sandy shores of North Carolina and a team of crazy bike guys (who should never have beaten Samuel Langley to first flight, but did anyway!) to Lindbergh crossing the Atlantic, and maybe even to the DC-3 — the airplane that brought air travel (big data?) to everyone.

Ad astra per aspera!