David and Goliath

Pitching Stones with David

It’s a brand new year, and I’ve got a brand new job. As of today, you’re looking at the new Product Marketing Manager for Syncsort.

It’s true. After spending half a year doing a little freelance white paper work for the Bloor Group, and documenting for Hortonworks the most complex ETL process I’ve seen in nearly two decades in the business, I’ve found a new home to settle into. I got courted by some Goliaths in the data management software and hardware space, but in the end, I chose a tech savvy David, Syncsort.

So, you might be wondering, who the heck is Syncsort and what do they do? Or, if you know a bit about their history as the company that created a widely used mainframe sort acceleration application, you might be wondering, what the heck is Paige, who is all about the Hadoop, big data, open source, etc. world, doing at a stodgy old mainframe company?

Well, in some ways, I’m still figuring out some of what Syncsort does, but when I was looking around for a new gig, I was very pleasantly surprised to learn some of the cool open source, big data, Hadoop related stuff this little company has been doing.

David

First, the basics: Syncsort makes software that does data integration and data sorting. It’s been doing sort acceleration on mainframes for about 30 years, so this isn’t exactly a start-up company. Pretty much, if you have a mainframe in your business, it probably already has Syncsort software on it.

So, they do mainframe software, right? The Syncsort tag line is “Big Iron to Big Data.” A lot of the same data wrangling grief that parallel mainframe boxes have been dealing with for ages is now causing Hadoop admins to tear their hair out. So, Syncsort applied their mature data crunching expertise to develop some new technology, and opened a whole new market. I will, no surprise here, be working in the big data division.

For a company that’s been around since the dawn of computers, they’ve done some pretty cool and very modern stuff. For example, they created a mainframe data connector for Sqoop and donated it to the open source world. This, of course, makes moving data from old big iron systems to new Hadoop clusters relatively hassle free.

The thing that most people don’t realize about modern computing is that lots of companies aren’t moving off of their mainframes; they’re just buying better mainframes. And they want those new, better mainframes integrated with the rest of the business. Four years ago, I worked on a big, new integration project with CSC. My job was to translate medical claims EDI data into COBOL data, and pass it into mainframe systems for verification and response. Then I built a translator to turn the response back into EDI to send out. Mainframes are still a bedrock technology of a lot of modern businesses.

But, still, mainframes have been around forever. That’s some less than exciting technology when compared to the cool stuff happening in the Hadoop ecosystem.

True, but that’s not all Syncsort does. In addition to a Sqoop connector, the big data division developed a modification to the sort aspect of the open source MapReduce 2 API. With the modification, MR 2 workflows can either skip the sort altogether if it’s not needed, plug in a faster parallel sort engine like Syncsort’s, or do hash aggregations or other forms of reducing that don’t require a full sort. Now, that’s pretty cool.
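To make the "hash aggregation instead of a full sort" idea a bit more concrete, here is a minimal sketch in plain Python. This is not Hadoop or Syncsort code, and the function names and sample records are made up purely for illustration. It just shows that a per-key sum can be computed either by sorting first and then grouping, or by accumulating into a hash table with no sort at all.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_based_sum(pairs):
    """Sum values per key by fully sorting first, then grouping adjacent keys."""
    ordered = sorted(pairs, key=itemgetter(0))  # full sort of every record
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def hash_based_sum(pairs):
    """Sum values per key with a hash table; no sort is performed."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical (key, value) records, like the output of a map step.
records = [("ny", 3), ("tx", 5), ("ny", 4), ("ca", 1), ("tx", 2)]

# Both approaches produce the same totals.
assert sort_based_sum(records) == hash_based_sum(records)
```

The sort-based version pays for ordering every record before it can reduce anything. The hash-based version makes a single pass, which is the kind of shortcut that only becomes possible once the sort step is optional.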

Anyone who works with data knows that sorting is the most common bottleneck in any data processing job, compounded hugely by the volume of the data. Sort is the nasty resource hog that often crashes systems. One of the weaknesses of MapReduce 1 was that it required sorting and re-distribution of data at every single step. With sorting eliminated when not really needed, that one change alone gives MR 2 a significant performance boost over its predecessor on most jobs. The option to plug in a third-party sort accelerator can speed things up even more. Nicely done.

And that’s just the tech they’ve developed for the open source community. Their proprietary tech is full of even more brilliant gems of innovation to fling in the face of their larger competitors.

In any case, it’s clear that Syncsort really gets the one thing that I think is most important for working with the open source community: you have to contribute. And it’s also clear to me that they have some interesting tech for me to learn, play and experiment with, and extol the virtues of.

When it came to making a choice where to go next, I opted to walk away from some bigger players with less interesting tech. I am a sucker for small companies building awesome software for data wrangling. And, I am always impressed by companies that slug it out with giants and regularly come out on top due to sheer technological know-how and chutzpah.
