Happy 10 Years Hadoop

Ten Years of Hadoop, Apache NiFi and Being Alone in a Crowd

Hadoop Summit in San Jose this year celebrated Hadoop’s 10th birthday. All of the folks on stage are people who contributed to Hadoop during those 10 years. One of them is Yolanda Davis.

Yolanda and I worked together on a Hortonworks project last year. She was in charge of the user interface design and development team. I caught up with her early in the morning of the last day of Hadoop Summit and quizzed her on the new project she’s working on, which you may have heard of: Apache NiFi. As promised, here is my interview with her on the subject of NiFi and the new HDF (Hortonworks Data Flow) streaming data processing platform, which includes NiFi, Apache Kafka and Apache Storm.

Just to give you an idea of what she’s been doing lately, I’ll start off with a few tweets from the end of that day, when there was a Birds of a Feather session for all things NiFi and streaming.

Variable registry for @apachenifi created by @YolandaMDavis – Example of cool tech. (From #Streaming BOF session at #HS16SJ)

“We have all these deficiencies, and @YolandaMDavis swoops in and makes it more useful.” #Streaming and @apachenifi BOF session #HS16SJ

After getting the skinny on Apache NiFi, I also chatted with Yolanda briefly about her experiences as a woman of color in a white-male-dominated field, and got some info on good organizations that encourage young women to go into engineering fields. I’ll talk about that a bit more at the end.

My friend @YolandaMDavis on what can we do about culture of “Oh” As in Oh! You’re an engineer? @DataWomen #HS16SJ

Paige Roberts: Let’s start with an introduction. What’s your new position?

Yolanda Davis: I’m a Senior Software Engineer. I work with Hortonworks, specifically HDF engineering, so the Hortonworks Data Flow products and framework, which is powered by Apache NiFi. The goal of that framework is to help deal with the whole data ingress/egress problem. A lot of people are just trying to get their data in, trying to get high quality data, so they can go ahead and process it. How can they get there quickly? That is what HDF is helping to resolve: How can we get this data from the edge, then process and transform it into a form that we can start doing some real analytics on? That is what the framework is designed for, and I’m part of that team.

Paige: HDF is all about streaming. So you guys are working a lot with data in motion, right?

Yolanda: Exactly. It’s all about dealing with the problem of data in motion, at play and at rest. We have NiFi, which helps you get data from multiple sources of whatever type. If you have, like, a REST service you’re trying to pull in, absolutely you can do that. Your typical, regular data stores or regular database, you can pull them in that way. Or, we have the newer project in the community called MiNiFi that’s helping us extend the edge even further. So now, whether it’s a particular device that you need to track, a sensor or some other device out in the field, MiNiFi will help resolve that problem. It has an agent that’s out there getting that data from that device and transporting it back to NiFi to do the rest of the processing. So that’s your data in motion. And the whole deal is not only do you want to capture that data, but you want to see what happens to that data along the way, which is another problem.

Tracking and lineage.

Yeah, tracking the change. What happened when you received it? What did it look like? What are the attributes of that data? What was processed along the way? How do we track it? That’s the provenance data.

All along the way as we process through NiFi, there is a record that’s kept, so you know what happened at this place and time. It’s all based on flow programming. Part of it is that once the data reaches a step, comes in and gets transformed, that step doesn’t care what happened before or after. It spits the data back out with the information left behind of, “This is what occurred.”

It’s especially great for those governance challenges where people want to see the history of the data, what happened to it, or even replay what happened in the flow. That is a critical piece of putting together the picture of your whole data journey.
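
[Editor’s illustration] To make the idea of per-step provenance a little more concrete, here is a minimal Python sketch. It is not NiFi’s API or implementation; every name in it (FlowRecord, ProvenanceEvent, run_flow, replay_from, and the three toy steps) is hypothetical. It only assumes what Yolanda describes: each step transforms the data without knowing about its neighbors, and a record of what occurred is left behind so the flow can be inspected or replayed later.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical unit of data moving through the flow."""
    content: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class ProvenanceEvent:
    """Hypothetical record of what one step did to one FlowRecord."""
    step: str
    timestamp: str
    content_before: str
    content_after: str
    attributes_before: Dict[str, str]
    attributes_after: Dict[str, str]


Step = Tuple[str, Callable[[FlowRecord], FlowRecord]]


# Toy steps: each one sees only its own input and returns a new record.
def trim(record: FlowRecord) -> FlowRecord:
    return replace(record, content=record.content.strip())


def tag_source(record: FlowRecord) -> FlowRecord:
    return replace(record, attributes={**record.attributes, "source": "sensor"})


def to_upper(record: FlowRecord) -> FlowRecord:
    return replace(record, content=record.content.upper())


def run_flow(record: FlowRecord, steps: List[Step],
             log: List[ProvenanceEvent]) -> FlowRecord:
    """Run each step in order, appending one provenance event per step."""
    for name, transform in steps:
        before = record
        record = transform(before)
        log.append(ProvenanceEvent(
            step=name,
            timestamp=datetime.now(timezone.utc).isoformat(),
            content_before=before.content,
            content_after=record.content,
            attributes_before=dict(before.attributes),
            attributes_after=dict(record.attributes),
        ))
    return record


def replay_from(log: List[ProvenanceEvent], index: int,
                steps: List[Step]) -> FlowRecord:
    """Re-run the flow from the input recorded at log[index]."""
    event = log[index]
    start = FlowRecord(event.content_before, dict(event.attributes_before))
    return run_flow(start, steps[index:], [])


steps: List[Step] = [("trim", trim), ("tag_source", tag_source),
                     ("to_upper", to_upper)]
log: List[ProvenanceEvent] = []
result = run_flow(FlowRecord("  temp=21.5  "), steps, log)
for event in log:
    print(event.step, repr(event.content_before), "->", repr(event.content_after))
```

Because each event stores the input as well as the output of its step, the log alone is enough to answer “what did the data look like here?” and to replay the flow from any point. That is the design choice this sketch is meant to show.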

How does it share that tracking and lineage data? Does it integrate with Apache Atlas?


That’s going to be the end game. Atlas will use that provenance data. The whole goal is for that provenance information to be fed into Atlas. Then that can help create a larger picture for you.

I’m working with the Atlas product manager. We’re talking with the PMs from Atlas and from NiFi. Basically what we’re bringing into the game is mainframe data.

Ah, okay. So in terms of connecting through some messaging service or —

We do Kafka.

Yeah, well that’s part of that stack, right? A critical part.

Yeah. It seems like Kafka is becoming almost like a backbone of the stack.

Yeah! It is a backbone. It’s a commit log. It can be supported with HDFS or not, which makes it really, this is just my opinion, but I think it makes it attractive to a lot of people who might not be ready for the whole HDP [Hortonworks Data Platform] yet. They might not have clusters in play. But the thing with HDF, it is a great introduction, right?


If you want to get there and get there quickly. So, as you know, in a previous life, what was I doing to get data?

Yeah. And that was not quick. [laughter] 

It was not quick. The other challenge, too, was you had to have developers. You had to have a deployment model. And then, if you had to make a change, then you had to have another developer come in and make that change.

With the whole interactive command and control in HDF, it helps eliminate that need. You can create your flow, your environment within this UI, and put it out there without having to deploy some code. That helps to not only eliminate the need for a developer in order to make this happen, but also, you can have different levels of people interacting with the system. Whether it’s testing out within a small test environment or your operational folks, it makes it more accessible.

Then, you get to see real-time live what’s going on with your data, and you can make changes real time live if you need to. That’s what makes it awesome, I think, making it more accessible. You see, even here [at Hadoop Summit], the presentations where, not only were people able to quickly get their jobs done, but also, NiFi has a lot of the controls that you need for guaranteed delivery, and solving the problems that we talked about before – things that people look for in their production environment. The framework comes with them.

So it expands your user base, gets your time to value shortened, and gives you a lot of enterprise features without having to sit down and write code.

Exactly, exactly.

Well, that’s pretty awesome. I want to ask some more Yolanda-specific questions, as opposed to NiFi questions, okay?

Okay. I don’t mind Yolanda questions.

You stood up at the Women in Big Data lunch yesterday.


I did.

And you were talking about the culture of “Oh.” As in, “Oh, you’re an engineer?”


How did that affect you in your career?

I have to say I’ve been fortunate. I don’t feel in any way that I have been stymied from what I wanted to do. However, I know that walking in the door, I have to assert myself probably more than my colleagues in demonstrating what I know. That is, not only as a woman, but also as a black woman. Just to put the numbers out there, between 1% and 3% of computer science degrees are awarded to African-American women. That’s really specific. So, I know that I am rare, but at the same time, my work speaks for itself.

Even fewer women are going into engineering and data fields now than in the ’80s. There are so many hurdles. Of all those Hadoop committers that got up on stage yesterday, only two were women.

I’ve been fortunate that I’ve had people who helped me along the way, our CEO [Rob Bearden] included.


Yes. I’m going to cite him specifically because he has spoken to me and said, “We value you being here.” I’m appreciative of him acknowledging that. There are people who have supported me along the way always in my career, and helped and guided me, who did not look like me at all. That has been important for me. That is, unfortunately, not a shared experience for many people.

The one thing that I do, on the personal Yolanda side, is I have worked with a lot of organizations that work with kids, to help girls who look like me, to tell them that they are welcome in this field too. And to help change that culture of “Oh!”

You worked with Black Girls Code in Atlanta, right?

I did. Yes. It’s a great organization. Girls Who Code is making a big difference for people, too, I think. Anything that encourages more young women to get into the field, and doesn’t leave them feeling isolated, like the only ones doing this, is a good thing.

I’m sorry, I’ve got to run to the keynote. My boss is presenting.

No problem. I want to catch that, too. Thanks for taking the time.

Good seeing you again.


During that keynote, all the Apache committers in the audience were asked to come up on stage to celebrate Hadoop’s 10th birthday. I had to nudge Ryan Merriman to get up there. A lot of programmers are not big on stages, but one thing that stood out like a spotlight when all those open source contributors stepped up was that there were maybe two women in the whole crowd. Talk about lonely. It really makes you wonder why open source, and software engineering in general, seems like such a men’s club.

During the Women in Big Data lunch, a young intern working on Apache Ambari credited Girls Who Code with getting her started in programming. One of the things she stood up and said during that lunch was, “If you want to get girls involved, don’t just include one girl.” Being the only one who looks like you in any crowd can be intimidating. Girls Who Code and Black Girls Code are great organizations for helping young women get started in the field, and also helping them not feel alone.

Women get satisfaction from working with wicked cool software to solve real world problems. @DataWomen #HS16SJ

Great organizations like that are a good start, but it takes a lot of support over time to counteract the barriers that nearly every woman faces as she progresses in the field. Two articles that give a good idea of the obstacles women face are “When Women Stopped Coding” and “Report: Disturbing drop in women in computing field.” I have been told by a close friend that “Women can’t really program. Their brains aren’t built for that kind of logic.” I had a new co-worker, with whom I shared an office, mansplain how to do the job I’d been doing for over a year, and insist that code I had written must have been written by my male predecessor. If I got going on the discouragement I’ve received over the years, I could go all day. The simple truth is that nearly every woman hits discrimination in any tech field, whether it’s overt or subtle. It is a rare woman who hasn’t, at the very minimum, had that experience of looking around a room filled with people in her field, and realizing she’s the only woman there.

My own experience has been similar to Yolanda’s, in that I have been given opportunities, help, and encouragement all along the way by people who didn’t look like me. From the man who gave me an entry-level software engineer position because he believed in my ability, despite my not having a computer-related degree, to the man who groomed me for my first management position, to the man who fought to get me a 25% raise so that I was paid an equivalent salary for the job I was doing, I owe a lot to men in this field who believed in me and gave me a chance. Any man working with a woman can make her job worse or better, just by following Wheaton’s Law, or not. But if you are the manager of women, you may be the one person with the most power to make a difference in a smart woman’s career, either good or bad.

You might make the difference between a rockstar like Yolanda kicking technical butt on your team, or her and her talents leaving for another company, or even another field.

The other people in a position to make a huge difference are other women already in the field. From the woman who hired me when I made the jump from teaching to tech, to the woman who mentored me when I first started as an engineer, to my current boss at Syncsort who gave me a shot at yet another new level of challenge, women in tech can make a huge difference for each other. At the very minimum, you can make sure that the other women working with you don’t have to feel alone in a crowd.

Hadoop and the open source community have accomplished a lot in the last 10 years, with women largely excluded. I’d love to see what they could accomplish if, when the number on that big slide says 20, half those folks on stage are brilliant women like Yolanda.
