How Big Data Deep Analysis and Agile SQL Querying Give
2016 Campaigners an Edge to Define and Reach
Individually Qualified Voters
Transcript of a discussion on how data-analysis services startup BlueLabs in Washington helps presidential campaigns better know and engage with potential voters.
Listen to the podcast. Find it on iTunes. Get the mobile app. Sponsor: Hewlett
Packard Enterprise.
Dana Gardner: Hello, and welcome to the next edition of the Hewlett Packard Enterprise
(HPE) Voice of the Customer podcast series. I’m Dana Gardner, Principal Analyst at Interarbor
Solutions, your host and moderator for this ongoing discussion on business
digital transformation. Stay with us now to learn how agile companies are
fending off disruption in favor of innovation.
Our next case study explores how data-analysis services startup BlueLabs in
Washington D.C. helps presidential campaigns better know and engage with
potential voters.
We'll learn how BlueLabs relies on analytics platforms that allow a democratization of querying, opening the value of vast big data resources to more of those with a need to know.
In this example of helping organizations work smarter by leveraging innovative statistical
methods and technology, we'll discover how specific types of voters can be identified and
reached.
Here to describe how big data is being used creatively by contemporary political organizations
for two-way voter engagement, we're joined by Erek Dyskant, Co-Founder and Vice President of
Impact at BlueLabs Analytics in Washington. Welcome, Erek.
Join myVertica
To Get the Free
HPE Vertica Community Edition
Erek Dyskant: I'm so happy to be here, and thanks for having me.
Gardner: Obviously, this is a busy season for the analytics people who are focused on politics
and campaigns. What are some of the trends that are different in 2016 from just four years ago?
It’s a fast-changing technology set; it's also a fast-changing methodology, and of course, the
trends about how voters think, react, use social, and engage are also dynamic. So what's different
this cycle?
Dyskant: From a voter-engagement perspective, in 2012, we could reach most of our voters
online through a relatively small set of social media channels -- Facebook, Twitter, and a little bit
on the Instagram side. Moving into 2016, we see a fragmentation of the online
and offline media consumption landscape and many more folks moving
towards purpose-built social media platforms.
If I'm at the HPE Conference and I want my colleagues back in D.C. to see
what I'm seeing, then maybe I'll use Periscope, maybe Facebook Live, but
probably Periscope. If I see something that I think one of my friends will think
is really funny, I'll send that to them on Snapchat.
Where political campaigns have traditionally broadcast messages out through
the news-feed style social-media strategies, now we need to consider how it is that one-to-one
social media is acting as a force multiplier for our events and for the ideas of our candidates,
filtered through our campaign’s champions.
Gardner: So, perhaps a way to look at that is that you're no longer focused on precincts physically, and you're no longer able to simply broadcast through social media. It’s much more of an
influence within communities and identifying those communities in a new way through these
apps, perhaps more than platforms.
Social media
Dyskant: That's exactly right. Campaigns have always organized voters at the door and on the
phone. Now, we think of one more way. If you want to be a champion for a candidate, you can be
a champion by knocking on doors for us, by making phone calls, or by making phone calls
through online platforms.
You can also use one-to-one social media channels to let your friends know why the
election matters so much to you and why they should turn out and vote,
or vote for the issues that really matter to you.
Gardner: So, we're talking about retail campaigning, but it's a bit more virtual.
What’s interesting though is that you can get a lot more data through the interaction than you
might if you were physically knocking on someone's door.
Dyskant: The data is different. We're starting to see a shift from demographic targeting. In 2000,
we were targeting on precincts. A little bit later, we were targeting on combinations of
demographics, on soccer moms, on single women, on single men, on rural, urban, or suburban
communities separately.
Moving to 2012, we looked at everything that we knew about a person and built individual-level predictive models, so that we knew how each person's individual set of characteristics made that person more or less likely to be someone with whom our candidate would have an engaging conversation through a volunteer.
Now, what we're starting to see is behavioral characteristics trumping demographic or even
consumer data. You can put whiskey drinkers in your model, you can put cat owners in your
model, but isn't it a lot more interesting to put in your model the fact that this person has an
online profile on our website and this is their clickstream? Isn't it much more interesting to put
into a model that this person is likely to consume media via TV, is likely to be a cord-cutter, is
likely to be a social media trendsetter, is likely to view multiple channels, or to use both
Facebook and media on TV?
That lets us have a really broad reach or really broad set of interested voters, rather than just
creating an echo chamber where we're talking to the same voters across different platforms.
Gardner: So, over time, the analytics tools have gone from semi-blunt instruments to much
more precise, and you're also able to better target what you think would be the right voter for you
to get the right message out to.
One of the things you mentioned that struck me is the word "predictive." I suppose I think of
campaigning as looking to influence people, and that polling then tries to predict what will
happen as a result. Is there somewhat less daylight between these two than I am thinking, that
being predictive and campaigning are much more closely associated, and how would that work?
Predictive modeling
Dyskant: When I think of predictive modeling, what I think of is predicting something that the
campaign doesn't know. That may be something that will happen in the future or it may be
something that already exists today, but that we don't have an observation for it.
In the case of the role of polling, what I really see about that is understanding what issues matter
the most to voters and how it is that we can craft messages that resonate with those issues. When
I think of predictive analytics, I think of how is it that we allocate our resources to persuade and
activate voters.
Over the course of elections, what we've seen is an exponential trajectory of the amount of data
that is considered by predictive models. Even more important than that is an exponential set of
the use cases of models. Today, we see that every time a predictive model is used, it’s used in a million and one ways, whereas in 2012 it might have been used in 50, 20, or 100 sessions about each voter contact.
Gardner: It’s a fascinating use case to see how analytics and data can be brought to bear on the
democratic process and to help you get messages out, probably in a way that's better received by
the voter or the prospective voter, like in a retail or commercial environment. You don’t want to
hear things that aren’t relevant to you, and when people do make an effort to provide you with
information that's useful or that helps you make a decision, you benefit and you respect and even
admire and enjoy it.
Dyskant: What I really want is for the voter experience to be as transparent and easy as possible,
that campaigns reach out to me around the same time that I'm seeking information about who I'm
going to vote for in November. I know who I'm voting for in 2016, but in some local races, I
may not have made that decision yet. So, I want a steady stream of information to be reaching
voters, as they're in those key decision points, with messaging that really is relevant to their lives.
I also want to listen to what voters tell me. If a voter has a conversation with a volunteer at the
door, that should inform future communications. If somebody has told me that they're definitely
voting for the candidate, then the next conversation should be different from someone who says,
"I work in energy. I really want to know more about the Secretary’s energy policies."
Gardner: Just as when a salesperson is engaging with a prospect, they use customer relationship
management (CRM), and that data is captured, analyzed, and shared. That becomes a much
better process for both the buyer and the seller. It's the same thing in a campaign, right? The
better information you have, the more likely you're going to be able to serve that user, that voter.
Dyskant: There definitely are parallels to marketing, and that’s how we at BlueLabs decided to
found the company and work across industries. We work with Fortune 100 retail organizations
that are interested in how, once someone buys one item, we can bring them back into the store to
buy the follow-on item or maybe to buy the follow-on item through that same store’s online
portal. How is it that we can provide relevant messaging as users engage in complex processes online? All of those things are driven from our lessons in politics.
Politics is fundamentally different from retail, though. It's a civic decision, rather than an
individual-level decision. I always want to be mindful that I have a duty to voters to provide
extremely relevant information to them, so that they can be engaged in the civic decision that
they need to make.
Gardner: Suffice it to say that good quality comparison shopping is still good quality
comparison decision making.
Dyskant: Yes, I would agree with you.
Relevant and speedy
Gardner: Now that we've established how really relevant, important, and powerful this type of
analysis can be in the context of the 2016 campaign, I'd like to learn more about how you go
about getting that analysis and making it relevant and speedy across a large variety of data sets and
content sets and so forth. But first, let’s hear more about BlueLabs. Tell me about your company,
how it started, why you started it, maybe a little bit about yourself as well.
Dyskant: Of the four of us who started BlueLabs, some of us met in the 2008 elections and some
of us met during the 2010 midterms working at the Democratic National Committee (DNC).
Throughout that pre-2012 experience, we had the opportunity as practitioners to try a lot of
things, sometimes just once or twice, sometimes things that we operationalized within those
cycles.
Jumping forward to 2012, we had the opportunity to scale all that research and development, to say that we did this one thing that was a different way of building models, and it worked in this congressional race. We decided to make this three people’s full-time jobs and scale that up.
Moving past 2012, we got to build potentially one of the fastest-growing startups, one of the
most data-driven organizations, and we knew that we had built a special team. We wanted to continue working together, along with the folks who we worked with and who made all this
possible. We also wanted to apply the same types of techniques to other areas of social impact
and other areas of commerce. This individual-level approach to identifying conversations is
something that we found unique in the marketplace. We wanted to expand on that.
Join myVertica
To Get the Free
HPE Vertica Community Edition
Increasingly, what we're working on is this segmentation-of-media problem. It's this idea that
some people watch only TV, and you can't ignore a TV. It has lots of eyeballs. Some people
watch only digital and some people consume a mix of media. How is it that you can build media
plans that are aware of people's cross-channel media preferences and reach the right audience
with their preferred means of communications?
Gardner: That’s fascinating. You start with the rigors of the demands of a political campaign,
but then you can apply it in so many ways, answering the types of questions, and anticipating the types of questions, that more verticals, more sectors, and charitable organizations would want to be
involved with. That’s very cool.
Let’s go back to the data science. You have this vast pool of data. You have a snappy analytics
platform to work with. But one of the things that I am interested in is how you get more people, whether it's in your organization, a campaign like the Hillary Clinton campaign, or the DNC, to then be able to utilize that data to get to these inferences, to get to these insights that you want.
What is it that you look for and what is it that you've been able to do in that form of getting more
people able to query and utilize the data?
Dyskant: Data science happens when individuals have direct access to ask complex questions of
a large, gnarly, but well-integrated data set. If I have 30 terabytes of data across online contacts,
offline contacts, and maybe a sample of clickstream data, and I want to ask things like: of all the people who went to my online platform, clicked the password reset because they couldn't remember their password, and then never followed up with the e-mail, how many of them showed up at a retail location within the next five days? They tried to engage online, and it didn't work out for them. I want to know whether we're losing them or whether they're showing up in person.
That type of question maybe would make it into a business-intelligence (BI) report a few months from now, but people who are thinking about what we do every day would say, "I wonder about this," turn it into a query, and say, "I think I found something. If we give these customers phone calls, maybe we can reset their passwords over the phone and reengage them."
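To make the shape of that kind of ad-hoc question concrete, here is a minimal SQL sketch. The table and column names (web_events, retail_transactions, the event_type values, and so on) are hypothetical stand-ins, not BlueLabs' actual schema.

    -- Of the people who clicked password reset and never completed it,
    -- how many showed up at a retail location within the next five days?
    SELECT COUNT(DISTINCT w.customer_id) AS stuck_then_in_store
    FROM   web_events w
    JOIN   retail_transactions r
           ON  r.customer_id = w.customer_id
           AND r.purchase_date BETWEEN w.event_date
                                   AND w.event_date + INTERVAL '5 days'
    WHERE  w.event_type = 'password_reset_click'
      AND  w.customer_id NOT IN (
             SELECT customer_id
             FROM   web_events
             WHERE  event_type = 'password_reset_complete');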
Human intensive
That's just one tiny, micro example, which is why data science is truly a human-intensive
exercise. You get 50-100 people working at an enterprise solving problems like that and what
you ultimately get is a positive feedback loop of self-correcting systems. Every time there's a
problem, somebody is thinking about how that problem is represented in the data. How do I
quantify that? If it’s significant enough, then how is it that the organization can improve in this
one specific area?
The interesting piece is that all of that can be done with business logic. You need very granular data that's accessible via query, and you need reasonably fast query times, because you can’t ask questions like that if you have to go get coffee every time you run a query.
Layering predictive modeling on top of that allows you to understand the opportunity for impact if you fix that problem. One hypothesis is that those users who cannot reset their passwords maybe aren't that engaged in the first place; you fix their password, but it doesn’t move the needle. The other hypothesis is that these are people who are actively trying to engage with your service and are unsuccessful because of this one very specific barrier. If you have a model of user engagement at an individual level, you can say that these are really high-value users that are having this problem, or maybe they aren’t. So you take data science, align it with really smart individual-level business analysis, and what you get is an organization that continues to improve without having to make an executive-level decision for each one of those things.
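As a sketch of that layering, an engagement model score can be joined onto the same cohort to see whether the users hitting the password barrier are high-value; the engagement_scores table and its columns are again hypothetical.

    -- Do the users stuck on password reset score as high-value or not?
    SELECT CASE WHEN s.engagement_score >= 0.8 THEN 'high_value'
                ELSE 'other' END            AS segment,
           COUNT(DISTINCT w.customer_id)    AS users_stuck_on_reset
    FROM   web_events w
    JOIN   engagement_scores s ON s.customer_id = w.customer_id
    WHERE  w.event_type = 'password_reset_click'
    GROUP  BY 1;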
Gardner: So a great deal of inquiry, experimentation, iterative improvement, and feedback loops can all come together very powerfully. I'm all for the data-scientist full-employment movement, but we need to do more than have people go through data scientists to use, access, and develop these feedback insights. What is it about SQL, natural language, or APIs? What is it
that you like to see that allows for more people to be able to directly relate and engage with these
powerful data sets?
Dyskant: One of the things is the product management of data schemas. So whenever we build
an analytics database for a large-scale organization, I think a lot about an analyst who is 22,
knows VLOOKUP, took some statistics classes in college, and has some personal stories about
the industry that they're working in. They know, "My grandmother isn't a native English speaker,
and this is how she would use this website."
So it's taking that hypothesis that’s driven from personal stories, and being able to, through a
relatively simple query, translate that into a database query, and find out if that hypothesis proves
true at scale.
Then, they can potentially take the results of that query, dump them into a statistical-analysis language, or use database analytics to answer the question in a more robust way. What that means is that we favor very wide schemas, because I want someone to be able to write a three-line SQL statement, with no joins, that answers a business question I wouldn't have thought to put in a report. So that’s the first line -- analyst-friendly schemas that are accessed via SQL.
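As a hypothetical illustration of such a three-line, no-join query against a wide, denormalized table (voter_contacts and its columns are illustrative only, not an actual BlueLabs table):

    -- The wide table already carries demographics and outcomes on every row.
    SELECT media_preference, COUNT(*) AS conversations
    FROM   voter_contacts
    WHERE  contact_result = 'had_conversation' AND age_band = '18-29'
    GROUP  BY media_preference;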
The next line is deep key performance indicators (KPIs). Once we step out of the analytics
database, consumers drop into the wider organization that’s consuming data at a different level. I
always want reporting to report on opportunity for impact, to report on whether we're reaching
our most valuable customers, not how many customers are we reaching.
"Are we reaching our most valuable customers" is much more easily addressable; you just talk to
different people. Whereas, when you ask, "Are we reaching enough customers," I don’t know how to find that out. I can go over to the sales team and yell at them to work harder, but ultimately, I
want our reporting to facilitate smarter working, which means incorporating model scores and
predictive analytics into our KPIs.
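A hedged sketch of what such a KPI query might look like, weighting reach by a predicted value score rather than counting customers; the customers and customer_scores tables and the value_score column are hypothetical.

    -- Share of total predicted value reached, per region, instead of raw counts.
    SELECT c.region,
           SUM(CASE WHEN c.contacted THEN s.value_score ELSE 0 END)
             / NULLIF(SUM(s.value_score), 0) AS share_of_value_reached
    FROM   customers c
    JOIN   customer_scores s ON s.customer_id = c.customer_id
    GROUP  BY c.region;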
Getting to the core
Gardner: Let’s step back from the edge, where we engage the analysts, to the core, where we
need to provide the ability for them to do what they want and to get them those great results.
It seems to me that when you're dealing in a campaign cycle that is very spiky, you have a short
period of time where there's a need for a tremendous amount of data, but that could quickly go
down between cycles of an election, or in a retail environment, be very intensive leading up to a
holiday season.
Do you therefore take advantage of cloud models for your analytics that enable a fit-for-purpose, pay-as-you-go approach to data and analytics? Tell us a little bit about your strategy for the data and the analytics engine.
Dyskant: All of our customers have a cyclical nature to them. I think that almost every business
is cyclical, just some more than others. Horizontal scaling is incredibly important to us. It would
be very difficult for us to do what we do without using a cloud model such as Amazon Web
Services (AWS).
Also, one of the things that works well for us with HPE Vertica is the licensing model where we
can add additional performance with only the cost of hardware or hardware provision through the
cloud. That allows us to scale up our cost areas during the busy season. We'll sometimes even
scale them back down during slower periods so that we can have those 150 analysts asking their
own questions about the areas of the program that they're responsible for during busy cycles, and
then during less busy cycles, scale down the footprint of the operation.
Gardner: Is there anything else about the HPE Vertica OnDemand platform that benefits your
particular need for analysis? I'm thinking about the scale and the rows. You must have so many variables when it comes to a retail situation, a commercial situation, where you're trying to really
understand that consumer?
Dyskant: I do everything I can to avoid aggregation. I want my analysts to be looking at the data
at the interaction-by-interaction level. If it’s a website, I want them to be looking at clickstream
data. If it's a retail organization, I want them to be looking at point-of-sale data. In order to do
that, we build data sets that are very frequently in the billions of rows. They're also very
frequently incredibly wide, because we don't just want to know that every transaction was for this dollar amount. We want to know things like what the variables were, and where that store was located.
Getting back to the idea that we want our queries to be dead simple, that means that we very
frequently append additional columns on to our transactional tables. We’re okay that the table is
big, because in a columnar model, we can pick out just the columns that we want for that
particular query.
Then, moving into some of the in-database machine-learning algorithms allows us to perform more of the higher-order computation within the database and do less data shipping.
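As a rough sketch of the wide-table pattern described above: context columns can be appended to the transactional table once so that later queries stay join-free, and a columnar engine reads only the few columns each query touches. The point_of_sale table and its columns are hypothetical names, not the actual schema.

    -- Append context onto the transactional table once, then query without joins.
    ALTER TABLE point_of_sale ADD COLUMN store_region VARCHAR(64);

    -- Columnar storage reads only these columns, even on a billion-row table.
    SELECT store_region, SUM(sale_amount) AS revenue
    FROM   point_of_sale
    WHERE  sale_date >= '2016-01-01'
    GROUP  BY store_region;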
Gardner: We're almost out of time, but I wanted to do some predictive analysis ourselves.
Thinking about the next election cycle, midterms, only two years away, what might change
between now and then? We hear so much about machine learning, bots, and advanced
algorithms. How do you predict, Erek, the way that big data will come to bear on the next
election cycle?
Behavioral targeting
Dyskant: I think that a big piece of the next election will be around moving even more away
from demographic targeting, towards even more behavioral targeting. How is it that we reach
every voter based on what they're telling us about themselves, what matters to them, and how that matters to them? That will increasingly drive our models.
To do that involves probably another 10X scale in the data, because that type of data is generally at the clickstream level, at the interaction-by-interaction level. Incorporating things like Twitter feeds adds an additional level of complexity and computational necessity to the data.
Gardner: It almost sounds like you're shooting for sentiment analysis on an issue-by-issue basis,
a very complex undertaking, but it could be very powerful.
Dyskant: I think that it's heading in that direction, yes.
Gardner: I am afraid we'll have to leave it there. We've been exploring how data analysis
services startup BlueLabs in Washington, DC helps presidential campaigns better know and
engage with potential voters. And we've learned how organizations are working smarter by
leveraging innovative statistical methods and technologies, and in this case, looking at two-way
voter engagement in entirely new ways in this and in future election cycles.
Join myVertica
To Get the Free
HPE Vertica Community Edition
So, please join me in thanking our guest. We have been here with Erek Dyskant, Co-Founder and
Vice President of Impact at BlueLabs in Washington. Thank you, Erek.
Dyskant: Thank you.
Gardner: And a big thank you as well to our audience for joining us for this Hewlett-Packard
Enterprise Voice of the Customer digital transformation discussion.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HPE-sponsored interviews. Thanks again for listening, and please come back next time.
Listen to the podcast. Find it on iTunes. Get the mobile app. Sponsor: Hewlett
Packard Enterprise.
Transcript of a discussion on how data-analysis services startup BlueLabs in Washington helps presidential campaigns better know and engage with potential voters. Copyright Interarbor
Solutions, LLC, 2005-2016. All rights reserved.