Several options are being touted as alternatives to Hadoop for data analytics. Here are a few, along with their pros and cons.
I’ve been presenting a bunch and writing some blog posts, all on changes in the data management and analytics industry …
I’ve been doing a ton of writing and speaking, just not here. My first post on Medium is on COVID-19 myths and has a ton of links to reliable data sources to help dispel them. I’ve been writing some on the Vertica blog, doing a few projects for O’Reilly, and writing my usual web content and technical architecture papers. But the main thing I’ve spent my time on in the last couple of years is public speaking. I was set to travel five weeks out of the last six to speak at conferences. That didn’t exactly happen.
The recent DBTA Data Summit provided a lot to think about. I gave a short talk in the “Analytics in Action” track about how data analysts, architects and engineers can turn the endless waves of disruption we keep getting hit with into opportunities to boost the bottom line. Other speakers gave some very cool talks as well. For me, the highlights of the conference were Michael Stonebraker’s keynote and the Data Kitchen folks diving into the principles of DataOps.
At the recent Data Day Texas event, I sat down with Davin Potts, whom I have known for many years, and had a long conversation about a wide variety of subjects. Over on the Vertica blog, I broke the conversation into chunks, but I wanted to put it all together in one place so you can see what we chatted about end to end. So, here’s all of it, from machine learning to open source, from Python to KNIME, and why the heck DO we move data out of a database to analyze it?
I just started working on the Vertica team. As the “new guy,” I’ve been cramming as much Vertica information into my brain as possible in the shortest time possible. Some things really surprised me, and I bet they’ll surprise you, too.
The theme of Data Day TX 2019 was the highly cooperative relationship between proprietary and open source software, and the idea that a good architect doesn’t choose sides.
A few weeks ago at Hadoop Summit, I caught up with some friends from the project I worked on last year with Hortonworks, including Ryan Merriman who is now an Apache Metron architect. Since Apache Metron was a project I knew virtually nothing about beforehand, I quizzed Ryan about it. The conversation evolved into a discussion of the merits of Storm versus Flink and Heron, something I’ve been meaning to delve into for months here.
A few months back, I was presenting with a friend at a Chief Data Officer summit in Dallas, and my co-presenter put up a slide that said, “60% of all big data analytics projects fail.” Someone in the audience asked, “Why do they fail?” My friend said, “I think Paige could answer that better than I could.”
Put on the spot, I immediately thought of three reasons that have been confirmed by multiple sources, and I used those three to answer the question. But later, when I had time to think, I realized there was one other reason that shows up repeatedly but often gets downplayed or written off as not the REAL problem, when in my opinion, it very much is.