Look, Ma, I’m a Data Scientist!

Posted by Robert Merrill on January 31, 2012 under Data Scientist

Tom Groenfeldt just told me what to start calling myself.

What’s a data scientist?

In “Big Data Needs Data Scientists, Or Quants, Or Excel Jockeys,” he quotes Randy Lea at Teradata’s Aster Center of Innovation, who “defines a Data Scientist as a person with mathematical and statistical skills, an investigative mind, an understanding of computer languages like C++ and Java,” and ability to write code. Groenfeldt’s own definition includes, “multi-skilled experts who understand programming, large-scale mathematics, statistics and business.” That’s me!

Why I claim to be enough of a data scientist to be useful (a.k.a. War Stories)

(Robert, I believe you—spare me!)

In graduate school, I crunched daily sounding-balloon observations around 10 years’ worth of typhoons and 27 years’ worth of hurricanes. I then built a bigger data set of winds from passenger jets and cloud motions in satellite loops and used that to study the exhaust plumes from hurricanes.

At the National Hurricane Center, I used multivariate analysis to build a statistical model to predict hurricane intensity, the first one to include environmental conditions as well as storm history. We had specially modified confidence limits on our F-tests to prevent spurious selection of predictors by our stepwise regression package (I’ll be glad to explain why that’s a problem in a way you’ll understand). The data set was small by modern standards but was pushing the limits of what we had to work with for storage: 6250 bpi 9-track tapes.
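For the curious, here is a rough sketch of why an unmodified F-test gets fooled when a stepwise procedure picks the best of many candidate predictors. It is not the Hurricane Center's method or its modified confidence limits, just a small simulation under assumed numbers: 30 observations, 20 candidate predictors that are all pure noise, and a critical value of about 4.20 (roughly the 5% point of F with 1 and 28 degrees of freedom). The function names are made up for illustration.

```python
import random


def f_statistic(x, y):
    """F statistic (1 and n-2 degrees of freedom) for regressing y on one predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_squared = sxy * sxy / (sxx * syy)
    return r_squared * (n - 2) / (1.0 - r_squared)


def spurious_selection_rate(n_obs=30, n_candidates=20, n_trials=500,
                            f_crit=4.20, seed=0):
    """Fraction of trials in which the best pure-noise predictor passes the F-test.

    f_crit is an assumed value: approximately the 5% critical point of
    F(1, 28), which matches the default n_obs of 30.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # The predictand and every candidate predictor are independent noise,
        # so any "significant" predictor is spurious by construction.
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best_f = max(
            f_statistic([rng.gauss(0, 1) for _ in range(n_obs)], y)
            for _ in range(n_candidates)
        )
        if best_f > f_crit:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("one candidate:    ", spurious_selection_rate(n_candidates=1))
    print("twenty candidates:", spurious_selection_rate(n_candidates=20))
```

With a single candidate, the test lets noise through at about its nominal rate. With twenty candidates to choose from, the winner clears the same bar far more often, which is why the selection step needs stricter limits than the textbook table provides.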

At the University of Wisconsin, I built and analyzed sets of weather satellite data for all sorts of things: winds from cloud motions on satellite loops, apparent temperatures of the cloud and background to estimate the altitude of the moving cloud (without which the wind measurement isn’t worth much), and the air temperatures in the centers of hurricanes. The software we had to put map overlays on satellite pictures broke down over the polar regions, so I re-wrote it using vector algebra instead of trigonometry (I’ll be glad to help you understand why that was the right solution, even though management rightly questioned me when I said we needed to “rewrite the whole thing”; you should never let a programmer do that!)
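A small illustration of the polar problem, offered as a sketch and not as the actual overlay code: longitude stops meaning much near a pole, so arithmetic done directly on latitude/longitude pairs goes wrong there, while the same calculation on 3-D unit vectors does not care where the pole is. The example finds the point halfway between two locations on opposite sides of the North Pole. All function names are hypothetical.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(v):
    """Convert a 3-D unit vector back to (latitude, longitude) in degrees."""
    x, y, z = v
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def midpoint_naive(p, q):
    """Average the latitudes and longitudes directly (misleading near the poles)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def midpoint_vector(p, q):
    """Add the two unit vectors and renormalize.

    Not defined for exactly antipodal points, where the sum is the zero vector.
    """
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    total = tuple(ai + bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(c * c for c in total))
    return to_lat_lon(tuple(c / norm for c in total))


if __name__ == "__main__":
    p, q = (80.0, 0.0), (80.0, 180.0)   # same latitude, opposite sides of the pole
    print("naive: ", midpoint_naive(p, q))
    print("vector:", midpoint_vector(p, q))
```

The naive average stays at 80 degrees north, well off the shortest path between the two points, while the vector version lands on the pole itself. The longitude it reports there is arbitrary, which is exactly the degeneracy that trips up latitude/longitude trigonometry.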

The map outlines on our satellite pictures weren’t very good, so they asked me to redo them. I met my next Big Data (at the time), the Defense Mapping Agency’s Digital Chart of the World. It came on 4 CD-ROMs. We had a computer in the library with a CD reader on it. The problem of picking out the layers we wanted, especially just the “major” lakes and rivers, was too complex for me to keep straight and implement in C, so I asked my boss if I could learn C++ and program it in that (something else you should never do—let a programmer use a newly learned language on a major project!) My auto-feature extractors were not quite perfect, so I taught myself enough Java (the new new thing in 1995!) to build a little graphical editor to clean up our map overlays.

Lately, I’ve studied Bioinformatics, learned R, and used it and the Processing data visualization toolkit to visualize and understand xDSL broadband speeds.

Oh yeah, back at UW, I also figured out how to get an Excel spreadsheet to fit B-splines to gridded weather data. We needed to figure out how to get the boundary conditions right, and being able to visualize what was going wrong using Excel charts made all the difference.

On my last engagement, I really got my hands greasy with what I can only describe as Little Big Data: moving 100K customers out of QuickBooks and into Fishbowl Inventory, including a largely automated address and inactive-customer scrubber, built in Microsoft Access. Probably beneath the dignity of most “Data Scientists,” but at the time, I didn’t know I was one.

Now I do! I’m a “Data Scientist!”

…Or maybe I’m a “Data Dog?”

Until I learned about Data Scientists, I (informally) called myself a “Data Dog”—“Dig up the Good Stuff and roll in it!©” So I’m still not too proud to take on your QuickBooks and Excel mess. But Hadoop and NoSQL are on my to-learn list.

So if you need a Data Scientist to wrangle some Big Data (or Little Big Data) in Madison, WI, I just might be able to help.

Forced March, not Death March

Posted by Robert Merrill on January 17, 2012 under Software Development, Software teams

For almost exactly two months, from mid-November 2011 through mid-January 2012, I was on a Forced March, meaning lots of overtime and weekend work to get a software system up and running. It’s still not live. The sponsor switched from Date-Driven to Done-Driven and slipped the launch date six weeks, but not before I’d worked pretty hard.

These are sometimes called Death Marches, after Edward Yourdon’s book, in turn named after (I guess) the Bataan Death March. This was nowhere close. First, comparing anything that’s ever happened on a software project to the likes of Bataan shows a terrible lack of perspective. Second, even as software projects go, this was short and not at all harmful for someone in my circumstances of life. Third, “death march” implies at best a selfish motive, at worst a truly sinister one, and most likely just macho stupidity (read about Electronic Arts and judge for yourself). On the contrary, this Period Of Harder Work Than I Would Have Preferred had a worthwhile aim—get the new system live in time for year-end inventory and closing of the books. All the people involved were great to work with. I willingly signed up for it.

That’s the first “Lesson from a Forced March.” Sometimes a forced march is the right response to a threat or an opportunity. Wiktionary defines it as, “Soldiers, especially infantry, being made to move at a speed that would normally tire them excessively, to meet a military necessity.” Napoleon I was a master. Says historian Alistair Horne of the French redeployment from the English Channel to the Black Forest in the fall of 1805, “Napoleon moved with this great speed, 200,000 men marching 500 miles in 40 days…So there he has already defeated half the Austrian army.” Napoleon himself wrote of the surrender of 27,000 surrounded Austrians at Ulm, “I have accomplished my object. I have destroyed the Austrian army by simply marching.”

I’ve heard the phrase, “We have an aggressive timeline,” uttered at too many project kick-offs, for no reason other than a sponsor’s ego, or a misunderstanding of software development economics or programmer motivation. Forced marches, even the software kind, are risky. There needs to be a real possibility of a real reward.

If you can’t explain why finishing on that target date is decisively superior to finishing a week later or a month later, you have no reason to undertake a forced march.

But what if you don’t know a forced march will be required until after you’ve hit the trail? There are good reasons why that can happen, and then there’s the reason it happened to me—and the second Lesson from a Forced March.

Lesson #1: A valuable (read non-arbitrary) target date can make a Forced March worth the cost and risk.

Lessons from a Forced March

Posted by Robert Merrill on December 26, 2011 under Software Development, Software teams

My primary client is going live in one week (1/2/2012) on an Order Management System that’s been in the works since March 2011 and was selected in August. For me, it turned into a forced march in mid-November: six-day weeks, extra hours, and more stress all around than I would have liked.

It’s part of my professional mission to prevent such things, and I ended up with one anyway. The tuition’s largely paid, so let’s at least make sure we get something out of the course.

  • What was the primary cause?
  • What were the contributing factors?
  • What have been the consequences?
  • Could it have been prevented?
  • What can I, and you, learn from it?

Stay metaphorically tuned for a series of blog posts, but not for another week or more. One of the direct consequences was no blogging and very little other marketing activity for uFunctional LLC. The long-term consequences of an empty pipeline are about to become apparent.