Archive for Hadoop Summit tag

Orc O'Malley of the Yellow Elephant clan says LLAP

Owen O’Malley on the Origins of Hadoop, Spark and a Vulcan ORC

On September 12, 2016 in Big Data, Hadoop, Spark

Owen O’Malley is one of the folks I chatted with at the last Hadoop Summit in San Jose. The first time I met him, I discovered that he was the big Tolkien geek behind the naming of ORC files, as well as the one making sure that Not All Hadoop Users Drop ACID. In this conversation, I learned that Hadoop and Spark are both partially his fault, heard about the amazing performance strides Hive has made with ORC, Tez and LLAP, and found out that he’s a Trek geek, too.

Tags: ACID, big data, Flink, Hadoop, Hadoop Summit, Hive, Hortonworks, MapReduce, ORC, real-time, Spark, SQL in Hadoop, Storm, streaming data, Syncsort, Tez

Ten Years of Hadoop, Apache Nifi and Being Alone in a Crowd

On August 29, 2016 in Big Data, Hadoop, Life, Streaming

Hadoop Summit in San Jose this year celebrated Hadoop’s 10th birthday. All of the folks on stage were people who contributed to Hadoop during those 10 years. One of them was Yolanda Davis.

Yolanda and I worked together on a Hortonworks project last year. She was in charge of the user interface design and development team. I caught up with her early in the morning of the last day of Hadoop Summit, and quizzed her on this new project she’s working on that you may have heard of, Apache Nifi. As promised, here is my interview with her on the subject of Nifi and the new HDF (Hortonworks Data Flow) streaming data processing platform, which includes Nifi, Apache Kafka and Apache Storm.

Tags: big data, Hadoop, Hadoop Summit, Hortonworks, Kafka, life, real-time, Storm, streaming data, women in tech

Cyber Security with Apache Metron and Storm

On August 2, 2016 in Analytics, Big Data, Hadoop

A few weeks ago at Hadoop Summit, I caught up with some friends from the project I worked on last year with Hortonworks, including Ryan Merriman, who is now an Apache Metron architect. Since Apache Metron was a project I knew virtually nothing about beforehand, I quizzed Ryan about it. The conversation evolved into a discussion of the merits of Storm versus Flink and Heron, something I’ve been meaning to delve into here for months.

Tags: big data, cyber security, Flink, Hadoop, Hadoop Summit, Hive, Hortonworks, Kafka, life, Metron, Nifi, real-time, Spark, Storm, streaming data

Holden Karau's audience at High Performance Spark preso at Data Day Texas

Interviews with Brilliant People on Hadoop and the Future of Big Data Tech

On July 6, 2016 in Big Data, Data Management, Hadoop, Life, Spark

I have been doing some very cool interviews with brilliant people, usually at events like Strata + Hadoop World and Hadoop Summit. The intention is to use their brilliant thoughts so that I don’t have to take the extra time to come up with my own. Not to mention I get the bonus of learning new things, and getting the unique perspectives of folks who really know their stuff. Nothing like learning tech from the folks who literally wrote the book on it.

Tags: big data, Data Day Texas, Data Geeks, Hadoop, Hadoop Summit, Hive, Hortonworks, life, MapR, NASA, ORC, Spark, SQL in Hadoop, Strata Hadoop World, streaming data, women in tech

Using MapReduce is Like Plumbing with Pre-Clogged Pipes

On June 16, 2015 in Big Data, Data Management, Hadoop

MapReduce is no longer the only way to process data on Hadoop. In fact, it’s arguably the worst Hadoop data processing framework.

By now, everyone knows how awesome Hadoop is for large-scale data storage, processing and analysis. Hadoop is the darling of large-scale data processing, while MapReduce keeps getting nothing but bad press and complaints that it’s too slow, too hard to use, and generally doesn’t live up to its hype. But aren’t Hadoop and MapReduce the same thing?

Tags: Actian DataFlow, big data, Flink, Hadoop, Hadoop Summit, Heron, MapReduce, Spark, Storm, Tez

Big Data Page by Paige

Thoughts on Analytics, Software and Data Management
