Archive for Analytics category

Technical History of Paige

On October 10, 2020 in Analytics, Big Data, Data Management, Hadoop, Life, Machine Learning, Open Source

For a personal blog, I don’t spend much time on here talking about myself. It’s my place to talk shop, even when there’s no one around to talk shop with. But I gotta wonder, why are you readers taking my word for any of it? You have no idea what I know or how I know. So, I figured there ought to be a post about that.

What is the Best Hadoop Alternative?

On August 28, 2020 in Analytics, Big Data, Data Management, Hadoop, Open Source, Spark

Several options are being touted for doing Hadoop data analytics. Here are a few and their pros and cons as Hadoop alternatives.

Data Management and Analytics Changes and Travelling from My Chair

On July 27, 2020 in Analytics, Big Data, Data Management, Hadoop, Life, Machine Learning, Open Source, Spark, Streaming

I’ve been presenting a bunch and writing some blog posts, all on changes in the data management and analytics industry …

Can Presto SQL on Hadoop Replace Your Data Warehouse?

On July 6, 2020 in Analytics, Big Data, Data Management, Hadoop, Machine Learning, Open Source

Presto is the best of the SQL on Hadoop open source bunch. Why not just use it and ditch your analytical database? Uber knows why …

COVID19 and New Life

On April 27, 2020 in Analytics, Big Data, Life

I’ve been doing a ton of writing and speaking, just not here. My first post on Medium is on COVID-19 myths and has a ton of links to reliable data sources to help dispel them. I’ve been writing some on the Vertica blog, doing a few projects for O’Reilly, and I’ve been writing my usual web content and technical architecture papers. But the main thing I’ve spent my time on in the last couple of years is public speaking. I was set to travel five weeks out of the last six to speak at conferences. That didn’t exactly happen.

Tags: COVID19

Paige Roberts presenting to full room with Zynga case study slide showing

DBTA Data Summit – The Rise of DataOps

On June 6, 2019 in Analytics, Big Data, Data Management, Machine Learning, Open Source, Spark, Streaming

The recent DBTA Data Summit provided a lot to think about. I did a short talk in the “Analytics in Action” track about how data analysts, architects and engineers can turn the endless waves of disruption we keep getting hit with into opportunities to boost the bottom line. There were some very cool talks by other folks as well. For me, the highlights of the conference were Michael Stonebraker’s keynote and the Data Kitchen folks diving into the principles of DataOps.

Tags: IOT

Davin Potts, CEO Appliomics, Founder KNIME, Core Python Committer

One on One with Davin Potts

On April 10, 2019 in Analytics, Big Data, Data Management, Machine Learning, Open Source

At the recent Data Day Texas event, I sat down with Davin Potts, who I have known for many years, and had a long conversation about a wide variety of subjects. Over on the Vertica blog, I broke the conversation into chunks, but I wanted to put it all together in one place so you can see what we chatted about end to end. So, here’s all of it, from machine learning to open source, from Python to KNIME, and why the heck DO we move data out of a database to analyze it?

Tags: big data, Data Day Texas, Data Geeks, databases, Kafka, KNIME, RDBMS, real-time, Spark, streaming data

What You Never Knew About Vertica Could Surprise You

On March 11, 2019 in Analytics, Big Data, Data Management, Life

I just started working on the Vertica team. As the “new guy,” I’ve been cramming as much Vertica information into my brain as possible in the shortest time possible. Some things really surprised me, and I bet they’ll surprise you, too.

Tags: big data, databases, Kafka, RDBMS, real-time, SQL in Hadoop, streaming data, Vertica

Data Day Texas: Keep Your Architecture Open and Avoid Mindset Lock-In

On February 21, 2019 in Analytics, Big Data, Data Management

The theme of Data Day TX 2019 was the highly cooperative landscape between proprietary and open source, and a good architect doesn’t choose sides.

Tags: big data, Data Day Texas, Data Geeks, databases, KNIME, streaming data

Cyber Security with Apache Metron and Storm

On August 2, 2016 in Analytics, Big Data, Hadoop

A few weeks ago at Hadoop Summit, I caught up with some friends from the project I worked on last year with Hortonworks, including Ryan Merriman, who is now an Apache Metron architect. Since Apache Metron was a project I knew virtually nothing about beforehand, I quizzed Ryan about it. The conversation evolved into a discussion of the merits of Storm versus Flink and Heron, something I’ve been meaning to delve into for months here.

Tags: big data, cyber security, Flink, Hadoop, Hadoop Summit, Hive, Hortonworks, Kafka, life, Metron, Nifi, real-time, Spark, Storm, streaming data

Big Data Page by Paige

Thoughts on Analytics, Software and Data Management
