
Two Ways to Bridge the Big Data Analytics Skills Gap

I just got back from Big Data Tech Con in Boston. It’s a marvelous con put on by BZ Media, the folks who publish SD Times. I’m always impressed by the sharp people I meet and the depth of the technical presentations there. I gave three presentations myself. One was a technical tutorial on using KNIME and Actian DataFlow. Another was a commercial presentation on Actian Vortex, titled “To Boldly Go Where No Cluster Has Gone Before.” The one that seemed to be the biggest hit was a five-minute lightning talk titled “Bridging the Big Data Analytics Skills Gap.” Everyone at Big Data Tech Con was clearly invested in improving their own skills to help close the giant gap between the business need for big data skills and the people who have those skills. I proposed that there were two ways analytics software could help bridge that gap.

A lot of folks told me that what I said really resonated with them. So, I thought I’d share that short speech with you, as best I can in a blog post. It was an interactive speech, so this may be a bit odd, but here goes.

I started by reading what sounded like gibberish to the audience. They couldn’t see what I was reading, but you can:

public class WordCount {
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context

What the heck language am I reading? Klingon? Who knows? Raise your hand. (One brave guy raised his hand and said, “Java.”)

Technically, you’re correct. That’s MapReduce.

In fact, that’s the beginning of Word Count, the “Hello World” of MapReduce.
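For readers who want to see what Word Count actually computes, here is a minimal single-process sketch in plain Java. It is not the Hadoop API, and the class and method names (`LocalWordCount`, `map`, `reduce`, `wordCount`) are hypothetical. The map step emits a (word, 1) pair for every whitespace-separated token, much like the TokenizerMapper above, and the reduce step groups those pairs by word and sums them. A real Hadoop job runs the same two steps spread across a cluster, with the framework grouping pairs by key in between.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process illustration of Word Count; not Hadoop code.
public class LocalWordCount {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Group + reduce step: collect pairs by word and sum each word's counts.
    // A TreeMap keeps the output sorted by word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run the map step over every input line, then reduce all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "hello big data");
        System.out.println(wordCount(lines));
        // prints {big=1, data=1, hello=2, world=1}
    }
}
```

Even this stripped-down version takes a few dozen lines of Java; the Hadoop version adds job configuration and cluster plumbing on top.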

So, how many people here in this room could sit down and write that code, the most basic MapReduce application there is? (About a dozen hands went up in a group of over a hundred people.)

Keep your hands up for a second and look around. You’re at Big Data Tech Con. You are looking at the highest concentration of MapReduce coders you are ever likely to find in one place.

Businesses right now are desperately searching for people with big data analysis skills. They’re struggling with a MASSIVE gap between the need for big data analytics skills and the people with the skills to fill that need. Businesses are drowning in data because they don’t know a mapper from a reducer. You know what the number one use of Hadoop clusters today is? (Lots of folks mumbling, “storage.” Smart crowd.)

Storage. Coolest, most extraordinary advancement in data management of the century, and it’s getting used as a dumping ground, because businesses are hoping that some day, they’ll find people with the skills to do something with that data.

[Image: Bridge to Unknown]

I know half of you spent your day in Spark camp, so I won’t ask if you can write a word count in Scala or Python with Spark. Instead, how many of you could have done that when you arrived here at Big Data Tech Con? (About a half dozen hands went up. All women! Yay, women in tech!)

Keep your hands up and look around. You are all big data analysis professionals, or students here to learn how to become big data analysis professionals, and that’s how rare those skills are.

OK, I’m going to try this again.

SELECT * FROM customer WHERE last_name = 'Roberts';

Who knows what language I’m speaking now? (Lots of hands went up. I pointed at one at random: “SQL.”)

SQL. Right. So, how many of you can write a basic SQL statement? (Nearly every hand in the room went up.)

Look around. Big difference, huh?

Did you know that you could query hundreds of terabytes of data with SQL and get back a response in seconds? (Some nods.)

I’m not talking about Impala or Spark SQL, both good technologies. I’m talking about something as much as thirty times faster than either of those, using plain old ANSI-standard SQL. (Lots of skeptical looks and thoughtful faces.) Technology like that would go a long way toward bridging the big data analysis skills gap, wouldn’t it? (A lot of nods.)

At 3:15 tomorrow, I’m teaching a class on the technology that drives Actian Vortex, by far the highest-performing SQL in Hadoop on the market. (They moved that to 11:30 the next day, but c’est la vie.) Bring your laptops, come up to me after the class, and I’ll give you a free copy. You can judge for yourself. (I brought nearly a dozen cheap memory sticks loaded with Actian Vector Express to the presentation and still ran out. I told folks they could download it from the website or, if the internet was sluggish, swing by the booth with a laptop and I’d copy it onto their hard drives from my last remaining stick.)

OK, I’ve got one more question. How many of you can click, double-click, and drag and drop an icon with a mouse? (Half the room’s hands went up. The others were too busy laughing.) I assume that if your hand isn’t up, it’s because both of your arms are broken.

You are the answer to bridging the big data skills gap.

That’s why you’re here, right? To gain big data analysis skills. You already have the basic tools you need: a passion for data, the drive to learn, and the ability to click a mouse.

Did you know that you could design an end-to-end big data analysis workflow using nothing but a mouse-driven graphical user interface? (A few nods, a few skeptical faces.)

Did you know that you could execute that workflow on a cluster, where it would leave workflows written by MapReduce coders in the dust and run neck and neck with the workflows you just learned to program in Spark? (Raised eyebrows; skeptical but interested faces.)

At 8:30 AM on Wednesday, you can design your own analytics workflow in KNIME, an open source, graphical, point-click-and-configure data analytics platform, and I’ll show you how to execute it with Actian DataFlow, a free MapReduce replacement that runs between 10 and 100 times faster than MapReduce. (I had WAY too much material for a one-hour class. I’ll propose it as a half-day tutorial for the next Big Data Tech Con.)

You’re here to help bridge the big data skills gap. I believe the software needs to meet you halfway.

If you have a passion for working with data, Actian will meet you halfway. We’ll give you the next generation of data analysis technology, and you won’t have to speak Klingon to put it to work.


A lot of people stopped by the Actian booth in the vendor area the next day to tell us that what I had to say made a lot of sense to them. Open source software is on the cutting edge with a lot of capabilities, but ease of use isn’t one of them. Tech- and data-savvy folks everywhere are working hard to gain new skills as fast as they can and build the bridge across that skills gap. But they can only do so much. If the software, open source or commercial, can build half the bridge from the other side by making the skills people already have useful again, that’s a win for everyone.

[Image: Bridge over big data skills gap]



Two Comments

  1. Paige On May 6, 2015 at 21:07

    Lots of discussion on this post ended up on LinkedIn in the Advanced Business Analytics, Data Mining and Predictive Modeling group here:

  2. Danny Myers On July 30, 2015 at 6:38

    Sharp and clear work…I’ve read this post on LinkedIn and seen people discuss about the problem, so just visit your blog and give you few supporting words. I really enjoy reading the blog…keep it up!