GSN Games Wins Big Using Vertica to Uncover Deep Customer Insights
Transcript of a sponsored BriefingsDirect podcast on how big data and instant analysis can
provide valuable feedback on company initiatives.
Listen to the podcast. Find it on iTunes. Sponsor: HP
Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I’m
Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this
ongoing sponsored discussion on IT innovation and how it’s making an impact
on people’s lives.
Once again, we’re focusing on how companies are adapting to the new style of
IT to improve IT performance and deliver better user experiences and business
results. This time, we’re coming to you directly from the HP Discover 2013
Conference in Barcelona.
We’re here the week of December 9 to learn directly from IT and business leaders alike how big
data, mobile, and cloud, along with converged infrastructure are all supporting their goals.
Our next innovation case study interview highlights how GSN Games is using big data to
uncover more information to produce and deliver entertainment for their audience. With that, I’d
like to welcome our guest, Portman Wills, Vice President of Data at GSN Games
in San Francisco. Welcome, Portman.
Portman Wills: Hi. Nice to be here.
Gardner: We’re glad to have you. Tell us a little bit about GSN Games. What do you do, and
who is playing these games?
Wills: GSN started as a cable network in the U.S. We’re distributed in 80 million
households as the Game Show Network, and then we also have a digital wing that
produces casual and social games on Facebook, web, tablets, and mobile. That
division has 110 million registered game-players. My team takes data from all over
those worlds, throws it into a big data warehouse, and starts trying to find trends and insights
for both our TV audience and our online game-players.
In terms of the games, which is really where the growth is, our core demographic is older
females, believe it or not, who love playing casual games. We skew more in the 55-plus age
range and we have players from all over the world.
Because we’re here in Spain, a quick tidbit that we uncovered recently is that our main
timeframe in every country on earth, when people play games, is 7 p.m. to 11 p.m., except in
Spain where it’s 1 p.m. to 3 p.m. -- siesta time. That’s just one of the examples of how we use big
data to discover insights about our players and our audiences worldwide.
Understanding the audience
Gardner: I have to imagine that the data that led you to that inference in Spain was something
other than what we might consider typical structured data. How did the different data brought
together allow you to understand your Spanish audience better?
Wills: We use this product from HP called Vertica, which is just a tremendous data warehouse,
that lets us throw every single click, touch, or swipe in all of our games into a
big table. By big, I mean right now it’s I think 1.3 trillion rows. We keep saying
that we should really archive this thing. Then, we say we’ll archive it when it
slows down, and then it just never slows down, so we have yet to archive it.
We put all of the click stream data in there. The traditional joins, schemas, and
all of that don’t really have to happen because we have one table with all of the
interactions. You have the device, the country, the player, all these attributes.
It’s a very wide table. So if you want to do things like ask what the usage is in
five-minute slices by country, it’s a simple SQL query, and you get your results.
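A query of that kind might look like the sketch below. It is not GSN's actual schema or query, and it runs against an in-memory SQLite database rather than Vertica so that it is self-contained. The table name `events`, its columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

def usage_by_slice(conn, slice_seconds=300):
    """Count events and distinct players per country per time slice."""
    query = """
        SELECT country,
               (event_time / ?) * ? AS slice_start,  -- integer division floors to the slice
               COUNT(*) AS events,
               COUNT(DISTINCT player_id) AS players
        FROM events
        GROUP BY country, slice_start
        ORDER BY country, slice_start
    """
    return conn.execute(query, (slice_seconds, slice_seconds)).fetchall()

# Hypothetical single wide table: one row per click, touch, or swipe.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_time INTEGER, country TEXT, player_id INTEGER, device TEXT)"
)
# Invented sample rows; event_time is seconds since the Unix epoch.
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1386615600, "ES", 1, "iPad"),
        (1386615720, "ES", 2, "iPhone"),   # two minutes later, same slice
        (1386615960, "ES", 1, "iPad"),     # six minutes later, next slice
        (1386615660, "US", 3, "Android"),
    ],
)

result = usage_by_slice(conn)
for row in result:
    print(row)
```

Because every attribute sits on the event row itself, the query needs no joins: grouping by country and by the floored timestamp is enough.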
Gardner: The word “games” means a lot of different things to a lot of people. We’re talking
about a heritage of network television games back in the ’60s and ’70s that have led us to what is
now your organization. But what sort of games are we talking about, and what proportion of
them are online games, versus more of the passive watching on cable or another media outlet?
Wills: Originally, when our games division started as a branch of GSN, it was companion games
to Wheel of Fortune, Minute to Win It, whatever the hot game show was. That's still a part of it,
but the growth in the last few years has been in social games on Facebook, where a lot of our
games are more casual titles and have nothing to do with the game show -- tile-matching games
or solitaire games.
Then, in the last year or year-and-a-half for us, like everyone else, there’s been this explosion in
mobile. So it’s iPad, Android, and iPhone games, and there we have the solitaires and the tile
Increasingly, a lot of our success and growth has come from virtual casino games. People are
playing Bingo, video poker, even slots, virtual slots. We have this title called GSN Casino. That’s
an umbrella app with a lot of mini games that are casino-themed, and that one has really just
exploded in the last six months. It's a long way from Point A of Family Feud reruns to
Point Z of virtual slot machines, but hopefully you can see how we got there.
Gardner: It seems like a long distance but it’s been also a fairly short amount of time. It wasn't
that long ago that the information you might have on your audience came through Nielsen for
passive audiences, and you had basically a one- or two-dimensional view of that individual,
based on an estimate of the time devoted to that show. But now, with the mobile devices in
particular, you have a plethora of data.
Tell us a little bit about the types of data that you can get and what volumes we are talking about.
Wills: Let’s take mobile because I think it's easy to grok. Everything about the device is
exposed to us. The fact that you’re playing on an iPad Mini Retina versus an iPad 1 tells us a lot
about you, whether you know it or not.
Then, a lot of our users sign-in via Facebook, which is another vector for information. If you
sign-in via Facebook, Facebook provides us your age range, gender, some granular location. For
every player, we get between 40 and 50 dimensions of data about that player or about that
That’s one bucket, but the actual gameplay is another whole bucket. What games do you choose
to play in our catalogue? How long do you play them? What time of day do you play them?
Those start to classify users into various buckets from the casual commute player, who plays for
15 minutes every morning and afternoon, to the hard-core player who spends 8 to 10 hours a day,
believe it or not, playing our games on their mobile device.
At that point, and this is a little bit of a pet peeve of mine, mobile doesn’t necessarily mean
mobile, like out and about. A lot of our players are on their iPad, sitting on the couch in their
It’s not mobility. They’re not using 3G. They’re not using augmented reality. It’s just a device
that happens to be a very convenient device for playing games. So it’s much more of a laptop
replacement than any sort of mobile thing. That’s sort of a side track.
We collect all of this data, and it’s a fair amount. Right now, we’re generating about 900 million
events per day across all of our players. That’s all streamed into our data warehouse, and there
are a few tables, event time series tables, that we put the stuff into. A small table for us would be
a few hundred billion records, and a large table, as I said, is 1.3 trillion records right now.
So the scale is big for us. I know that for other companies that seems like peanuts. It’s funny how
big data is so broad. What’s big to one person is tiny to someone else, but this is the world that
we’re dealing in right now.
We have 110 million players. Thankfully, not all of them are active at one time. That would be
really big data. But we will have about 20 million at any given time in peak time playing
concurrently. That’s a little bit about the numbers in our warehouse.
Gardner: Understanding your audience through this data is something fairly new. Before, you
couldn’t get this amount of data. Now that you have it, what is it able to do for you? Are you
crafting new games based on your findings? Are you finding information that you can deliver
back to a marketer or advertiser that links them to the audience better? There must be many
things you can do.
Wills: First of all, we don’t do any advertising in our mobile games. So that’s one piece that
we’re not doing, although I know others are. But there are two broad buckets in which we use
data. The first is that we run a lot of A/B tests and experiments. All of our games are constantly
being multivariate tested with different versions of that same game in the field.
We run 20 to 40 tests per week. As an example, we have a Wheel of Fortune game that we
recently released, and there was all this debate about the difficulty of the puzzles. How hard
should the puzzles be? Should they be very obscure pieces of eastern literature, mainstream pop
culture, or even easier?
So, we tested different levels of difficulty. Some players got the easy, some players got the
medium, and some players got the hard ones. We can measure the return rate, the session
duration, and the monetization for people who buy power-ups, and see which level of difficulty
performs the best. In the first test of easy, medium, hard, easy overwhelmingly did the best.
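A per-variant comparison of those three measures could be sketched as follows. The record layout, field names, and all of the numbers are invented for illustration; they are not GSN's data or results, and a real test would also need enough players per variant to check statistical significance.

```python
from collections import defaultdict
from statistics import mean

def summarize(players):
    """Average return rate, session length, and spend for each test variant."""
    groups = defaultdict(list)
    for player in players:
        groups[player["variant"]].append(player)
    summary = {}
    for variant, rows in groups.items():
        summary[variant] = {
            "return_rate": mean(1.0 if r["returned"] else 0.0 for r in rows),
            "avg_session_minutes": mean(r["session_minutes"] for r in rows),
            "avg_spend": mean(r["spend"] for r in rows),
        }
    return summary

# Hypothetical per-player records; every value here is made up.
players = [
    {"variant": "easy", "returned": True, "session_minutes": 30.0, "spend": 2.0},
    {"variant": "easy", "returned": True, "session_minutes": 20.0, "spend": 0.0},
    {"variant": "medium", "returned": True, "session_minutes": 10.0, "spend": 1.0},
    {"variant": "medium", "returned": False, "session_minutes": 20.0, "spend": 0.0},
    {"variant": "hard", "returned": False, "session_minutes": 5.0, "spend": 0.0},
    {"variant": "hard", "returned": False, "session_minutes": 15.0, "spend": 0.0},
]

summary = summarize(players)
for variant, metrics in summary.items():
    print(variant, metrics)
```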
So we generated a whole bunch of new puzzles that were even easier than the previous easy ones
and tested that against what was now the control. The easier puzzles won again. So we generated
a whole new set of puzzles that were absurdly easy. We were trying to prove the point that if we
gave Wheel of Fortune puzzles that are four-letter words like “bird” and “cups,” nobody would
enjoy playing something that simplistic.
Well, it turns out that they do -- surprise, surprise -- and so that’s how we evolved into a version of
Wheel of Fortune that, compared to the game show, looks very different, but it’s actually what
customers want. It’s what players want. They want to relax and solve simple puzzles like “door.”
Gardner: So it determined that everyone is a winner on GSN, but you’re able to do real-time
focus-group types of activities. The data, because it's so fast, because there is so much
information available and you can deal with it so quickly, means that you’re able to tune your
games to the audience virtually overnight.
Wills: Hopefully faster than overnight. Overnight is a little too slow these days. We push updates
twice a day, both to our platform code and to all of our games: in the morning around 11 a.m. and
in the afternoon around 3:30. Each one of those releases is based on the data that came from the
So we're constantly evolving these games. I want to go back to your previous question, because I
only got to talk about one bucket, which is this experimentation. The other bucket is using the
usage patterns that customers have to evolve our product in ways that aren’t necessarily
structured around an A/B test.
We thought when we launched our iPhone app that there would be a lot of commuting usage. We
had in our head this hypothetical bus player, who plays on the bus in the morning. And so we
thought we would build all the stuff around daily patterns. We built this daily return bonus that
you can do in the morning and then again in the evening.
The data showed us that that really was only a tiny fraction of our players. There were, in fact,
very few players who had this bimodal, morning and evening usage pattern. Most people didn't
play at all until after dinner and then they would play a lot, sometimes even binge from 7 p.m.
until 2 a.m. on games.
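One way to check an assumption like the "bus player" against usage data is to measure what share of players actually show sessions in both a morning and an evening window. The sketch below assumes a simple mapping from player to the local hours in which their sessions started; the window boundaries, player IDs, and hours are all hypothetical.

```python
def is_bimodal(session_hours, morning=range(6, 10), evening=range(16, 20)):
    """True if the player has at least one session in each of the two windows."""
    has_morning = any(hour in morning for hour in session_hours)
    has_evening = any(hour in evening for hour in session_hours)
    return has_morning and has_evening

# Hypothetical session start hours (0-23, local time) per player.
sessions_by_player = {
    "p1": [7, 17],        # plays in both commute windows
    "p2": [20, 21, 23],   # evening only
    "p3": [19, 22, 1],    # evening into the night
    "p4": [21],
}

bimodal_players = [p for p, hours in sessions_by_player.items() if is_bimodal(hours)]
bimodal_share = len(bimodal_players) / len(sessions_by_player)
print(bimodal_players, bimodal_share)
```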
That was an area where we didn't even set up an experiment. We just had false assumptions
about our player base. And that happens a surprising amount of the time. We all -- especially the
game-design team and people who spent their careers designing video games -- have
assumptions about their audience that half the time are just wrong. One of the things we use data
for is to challenge all of our assumptions about our own products and our own businesses.
It's really gotten to a point where it's almost religious in our company. The moment two people
start debating what should or shouldn't happen, they say, “Well, let's just let the data decide.”
That's been a core change not just for us, but for the game industry as a whole.
Gardner: I expect that to be a change across many more industries. What you’re describing is
very much desired by many types of businesses: to understand a massive amount of
data about their audience, to be able to react quickly to that, and then to stop guessing about
products and pricing and distribution and logistics and supply chain and be driven by the data.
You’re a really interesting harbinger of things to come.
Portman, tell me a little bit about the process by which you were able to do this. Did you have an
older data warehouse? What did you use before and how did you make a transition to Vertica?
Wills: When we started the social mobile business three years ago, we were on MySQL, which
we are still on for our transactional load. We have three data centers around the world. When
people are playing our games, it’s recording, reading, and writing 125,000 transactions per
second, and MySQL, sharded out, works great for that.
When you want to look at your entire player base and do a cross-shard query, we found that
MySQL really fell down. Our original Vertica proof of concept (POC) was just to replace these
A/B test queries which have to look across the entire population.
So in comes Vertica. We set up a single node, a Vertica data warehouse. We pull in a year's worth
of data, and the same query to synthesize these sessions ran in 800 milliseconds.
So the thing that took 24 hours, which is 86,400 seconds, ran in less than one second. By the
way, that 24-hour query was running across dozens of machines, and this Vertica query was
running on a single server of commodity hardware.
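The speedup implied by those two figures can be worked out directly. This is a small sketch using only the numbers quoted above (24 hours and 800 milliseconds); the variable names are illustrative.

```python
# Work in whole milliseconds so the division is exact.
old_runtime_ms = 24 * 60 * 60 * 1000   # 24 hours = 86,400 seconds
new_runtime_ms = 800                   # the 800-millisecond run quoted above

speedup = old_runtime_ms // new_runtime_ms
print(f"{old_runtime_ms // 1000} s vs {new_runtime_ms} ms: about {speedup:,}x faster")
```

On those figures the single-node run is roughly five orders of magnitude faster, before accounting for the difference in machine count.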
That's when we really became believers in the power of the column store and column-oriented
data warehouses. From the small beginning of just one simple query, it’s now expanded and
pretty much our whole business runs on top of HP Vertica on the data warehouse side.
Gardner: As I said, I think GSN Games is really a harbinger of what a lot of other companies
in many different vertical industries will be seeking. Do you have any thoughts in terms of
lessons learned, as you progressed over the past three years to this size of data set and this level
of inference that you can deliver to virtually everyone in your company?
Looking back, if you had to do it again, what might you have done differently or what
suggestions might you have for others who would like to be able to do what you are doing?
Wills: I definitely wish that we had switched to a column store sooner. I think the reason that
we've been so successful at this is because of our game design team, which was so open to using
I’ve heard horror stories from other companies where they want to use a data-driven approach, and
there's just a lot of cultural inertia and pushback against doing that. It's hard to be consistently
proven wrong in your job, which is always what happens when you rely on data.
The real thing that's helped us get to this point is a culture and a company where
everybody is open to being wrong and open to being proven wrong by the data, which I am very
Gardner: Well, it's good to be data-driven, and I think you should feel good being responsible
for making 110 million people feel good about themselves every day.
I'm afraid we will have to leave it there. We've been talking about how GSN Games is using HP
Vertica to gather amazing insights and go beyond instinct and intuition into more of a science for
their audiences' benefit and for their business’s benefit.
I would like to thank our guest, Portman Wills, Vice President of Data at GSN Games in San
Francisco. Thank you, sir.
Wills: Thank you.
Gardner: And thank you to our audience as well for joining us for this special new style of IT
discussion, coming to you directly from the HP Discover 2013 Conference in Barcelona.
I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HP-sponsored discussions. Thanks again for listening, and come back next time.
Copyright Interarbor Solutions, LLC, 2005-2014. All rights reserved.