Archive for Heron tag

The Spark that Set the Hadoop World on Fire

On August 18, 2015 in Big Data, Data Management, Hadoop

Spark is the darling of the open source community right now. It’s setting the Hadoop world on fire with its power and speed in large-scale data processing on Hadoop clusters. Spark is one of the most active big data open source projects, has bunches of enthusiastic committers, has its own group of ecosystem applications, and is now part of most standard Hadoop distributions. Neat trick for a data processing framework that didn’t even start life as a Hadoop project.

The Tragedy of Tez

On June 23, 2015 in Big Data, Hadoop

Tez is one of the marvelous ironies of the fast-moving big data and open source software space: a piece of brilliant technology that was obsolete almost as soon as it was released. In the second of my series of short posts on Hadoop data processing frameworks, I’ll look at the bouncing baby born of the Stinger Initiative, and point out where it’s ugly.

Using MapReduce is Like Plumbing with Pre-Clogged Pipes

On June 16, 2015 in Big Data, Data Management, Hadoop

MapReduce is no longer the only way to process data on Hadoop. In fact, it’s arguably the worst Hadoop data processing framework.

By now, everyone knows how awesome Hadoop is for large-scale data storage, processing, and analysis. Hadoop is the darling of large-scale data processing, while MapReduce keeps getting nothing but bad press and complaints that it’s too slow, too hard to use, and generally doesn’t live up to its hype. But aren’t Hadoop and MapReduce the same thing?

Big Data Page by Paige

Thoughts on Analytics, Software and Data Management
