Archive for SQL in Hadoop tag

What You Never Knew About Vertica Could Surprise You

On March 11, 2019 in Analytics, Big Data, Data Management, Life

I just started working on the Vertica team. As the “new guy,” I’ve been cramming as much Vertica information into my brain as possible in the shortest time possible. Some things really surprised me, and I bet they’ll surprise you, too.

Tags: big data, databases, Kafka, RDBMS, real-time, SQL in Hadoop, streaming data, Vertica

Orc O'Malley of the Yellow Elephant clan says LLAP

Owen O’Malley on the Origins of Hadoop, Spark and a Vulcan ORC

On September 12, 2016 in Big Data, Hadoop, Spark

Owen O’Malley is one of the folks I chatted with at the last Hadoop Summit in San Jose. The first time I met him, I discovered that he was the big Tolkien geek behind the naming of ORC files, as well as the person making sure that Not All Hadoop Users Drop ACID. In this conversation, I learned that Hadoop and Spark are both partially his fault, heard about the amazing performance strides Hive with ORC, Tez and LLAP have made, and found out that he’s a Trek geek, too.

Tags: ACID, big data, Flink, Hadoop, Hadoop Summit, Hive, Hortonworks, MapReduce, ORC, real-time, Spark, SQL in Hadoop, Storm, streaming data, Syncsort, Tez

Holden Karau's audience at High Performance Spark preso at Data Day Texas

Interviews with Brilliant People on Hadoop and the Future of Big Data Tech

On July 6, 2016 in Big Data, Data Management, Hadoop, Life, Spark

I have been doing some very cool interviews with brilliant people, usually at events like Strata + Hadoop World and Hadoop Summit. The intention is to use their brilliant thoughts so that I don’t have to take the extra time to come up with my own. Not to mention I get the bonus of learning new things, and getting the unique perspectives of folks who really know their stuff. Nothing like learning tech from the folks who literally wrote the book on it.

Tags: big data, Data Day Texas, Data Geeks, Hadoop, Hadoop Summit, Hive, Hortonworks, life, MapR, NASA, ORC, Spark, SQL in Hadoop, Strata Hadoop World, streaming data, women in tech

Schema on Read vs Schema on Write and Why Shakespeare Hates Me

On September 22, 2015 in Hadoop, Life

A couple of months ago, I found myself without a full-time gig for the first time in decades, and I did a little freelance blogging. Being an overachiever, I wrote such a long post for Adaptive Systems Inc. that I broke it into two parts. The first part got published before I dove headfirst into documenting and unit testing a big Hadoop implementation. The second part got published last week.

It was interesting reading my opinions from a few months ago on the nature and comparative strengths of the various strategies and technologies. It had been long enough that I didn’t remember what I’d written. I got a kick out of comparing that perspective with my current one, now that I have some recent hands-on experience digging through Hive code and comparing query speed with ORC vs without, or with MapReduce vs Tez.

Tags: Hadoop, Hive, life, MapReduce, ORC, schema on read, schema on write, SQL in Hadoop, Storm, Tez

Abusing Shakespeare and Celebrating Independence

On July 2, 2015 in Hadoop, Life

For the last few weeks, I have been pondering whether to move on to another company and take a permanent position, or to launch my own thing and become an independent analyst / consultant. And my decision is …

Tags: Hadoop, Hortonworks, life, schema on read, schema on write, SQL in Hadoop

The Tragedy of Tez

On June 23, 2015 in Big Data, Hadoop

Tez is one of the marvelous ironies of the fast-moving big data and open source software space: a piece of brilliant technology that was obsolete almost as soon as it was released. In the second in my series of short posts on Hadoop data processing frameworks, I’ll look at the bouncing baby born of the Stinger Initiative, and point out where it’s ugly.

Tags: big data, Flink, Hadoop, Heron, Hive, Hortonworks, MapReduce, Spark, SQL in Hadoop, Storm, Tez

Hadoop Can’t Do That

On June 1, 2015 in Data Management, Hadoop, Life

I just got back from a little executive summit conference in Dallas for Chief Data Officers. Frustratingly, I heard a lot of folks telling me what Hadoop CAN’T do. Now, I know that Hadoop can’t bring about world peace or get my husband to put the toilet seat down, but the things people keep saying it can’t do are things that I’ve personally DONE on Hadoop clusters, so I know they’re doable.

If you asked most people if water could cut through steel, they would probably tell you it can’t. They would be wrong, too.

Tags: ACID, Actian DataFlow, Actian Vortex, big data, Hadoop, KNIME, life, MapR, Splice Machine, SQL in Hadoop

In-Memory wave crests in-chip wave coming

In-Memory Analytic Databases are So Last Century

On May 19, 2015 in Big Data, Data Management, Hadoop

In an article written last year, an industry analyst I respect, IDC’s Carl Olofson, gave the impression that in-memory analytics are the wave of the future, the new paradigm for high performance analytic databases. He said, “embrace the new paradigm and plan for it.”

For once, I didn’t agree with him.

In-memory analytics are last decade’s revolution, or even last century’s. The wave of the future is something far faster, and far more revolutionary.

Tags: ACID, Actian Vector, Actian Vortex, big data, databases, Hadoop, IBM, in-chip, in-memory, Oracle, RDBMS, Sisense, SQL in Hadoop, vector processing

Two Ways to Bridge the Big Data Analytics Skills Gap

On May 4, 2015 in Analytics, Big Data, Hadoop, Life

All of the people at Big Data Tech Con were clearly invested in improving their skills to help address the gap between the business need for big data skills and the people who can meet that need. I proposed that there were two ways that analytics software could help.

Tags: Actian DataFlow, Actian Vortex, big data, Big Data Tech Con, Hadoop, KNIME, SQL in Hadoop

Not All Hadoop Users Drop ACID

On April 6, 2015 in Data Management, Hadoop

In the age of businesses with data that lives on dozens or even hundreds of servers, expecting transactional integrity, data consistency and data currency is an old-fashioned notion. On Hadoop, you just have to settle for the new NoSQL standard of BASE and eventual consistency. That’s what they say. But, as usual, “they” are wrong. Not all Hadoop users have to drop ACID…

Tags: ACID, Actian Vortex, big data, Hadoop, HBase, Hive, Hortonworks, Splice Machine, SQL in Hadoop

Big Data Page by Paige

Thoughts on Analytics, Software and Data Management
