Archive for Tez tag

Owen O’Malley on the Origins of Hadoop, Spark and a Vulcan ORC

On September 12, 2016 in Big Data, Hadoop, Spark

Owen O’Malley is one of the folks I chatted with at the last Hadoop Summit in San Jose. I had already discovered the first time I met him that he was the big Tolkien geek behind the naming of ORC files, as well as the one making sure that Not All Hadoop Users Drop ACID. In this conversation, I learned that Hadoop and Spark are both partially his fault, heard about the amazing performance strides Hive has made with ORC, Tez and LLAP, and found out that he’s a Trek geek, too.

Tags: ACID, big data, Flink, Hadoop, Hadoop Summit, Hive, Hortonworks, MapReduce, ORC, real-time, Spark, SQL in Hadoop, Storm, streaming data, Syncsort, Tez

Schema on Read vs Schema on Write and Why Shakespeare Hates Me

On September 22, 2015 in Hadoop, Life

A couple of months ago, I found myself without a full-time gig for the first time in decades, and I did a little freelance blogging. Being an overachiever, I wrote such a long post for Adaptive Systems Inc. that I broke it into two parts. The first part got published before I dove headfirst into documenting and unit testing a big Hadoop implementation. The second part got published last week.

It was interesting to read the opinions I held a few months ago on the nature and comparative strengths of the various strategies and technologies. It had been long enough that I didn’t remember what I’d written. I got a kick out of comparing that earlier perspective with my current one, now that I have some recent hands-on experience digging through Hive code and comparing query speed with ORC vs without, or with MapReduce vs Tez.

Tags: Hadoop, Hive, life, MapReduce, ORC, schema on read, schema on write, SQL in Hadoop, Storm, Tez

The Spark that Set the Hadoop World on Fire

On August 18, 2015 in Big Data, Data Management, Hadoop

Spark is the darling of the open source community right now. It’s setting the Hadoop world on fire with its power and speed in large-scale data processing on Hadoop clusters. Spark is one of the most active big data open source projects, has bunches of enthusiastic committers, has its own group of ecosystem applications, and is now part of most standard Hadoop distributions. Neat trick for a data processing framework that didn’t even start life as a Hadoop project.

Tags: Actian DataFlow, big data, Flink, Hadoop, Heron, MapReduce, Spark, Storm, Tez

Actian DataFlow, the Little Hadoop Engine That Could, But Probably Won’t

On July 27, 2015 in Big Data, Hadoop

In Hadoop’s ecosystem of massively parallel cluster computing frameworks, Actian DataFlow is an anomaly. It’s a powerful little engine that thinks it can take on any data processing problem, no matter the scale. The trouble is that unlike MapReduce, Tez, Spark, Storm and all of the other Hadoop engines, DataFlow is proprietary, not open source.

Tags: Actian DataFlow, big data, Hadoop, KNIME, MapReduce, Spark, Tez

The Tragedy of Tez

On June 23, 2015 in Big Data, Hadoop

Tez is one of the marvelous ironies of the fast-moving big data and open source software space: a piece of brilliant technology that was obsolete almost as soon as it was released. In the second in my series of short posts on Hadoop data processing frameworks, I’ll look at the bouncing baby born of the Stinger Initiative, and point out where it’s ugly.

Tags: big data, Flink, Hadoop, Heron, Hive, Hortonworks, MapReduce, Spark, SQL in Hadoop, Storm, Tez

Using MapReduce is Like Plumbing with Pre-Clogged Pipes

On June 16, 2015 in Big Data, Data Management, Hadoop

MapReduce is no longer the only way to process data on Hadoop. In fact, it’s arguably the worst Hadoop data processing framework.

By now, everyone knows how awesome Hadoop is for large-scale data storage, processing and analysis. Hadoop is the darling of large-scale data processing, while MapReduce keeps getting nothing but bad press and complaints that it’s too slow, too hard to use, and generally doesn’t live up to its hype. But aren’t Hadoop and MapReduce the same thing?

Tags: Actian DataFlow, big data, Flink, Hadoop, Hadoop Summit, Heron, MapReduce, Spark, Storm, Tez

Big Data Page by Paige

Thoughts on Analytics, Software and Data Management
