SlideShare a Scribd company logo
GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
@fzk
frisovanvollenhoven@godatadriven.com
Go Data Driven NOW!
Friso van Vollenhoven
CTO
Modal
asking price
3470.- EUR / m2
(Hint:This only works if you know how to look at data.)
What about Big Data?
(source: Google Trends)
A question
The answer
(bron:Wikipedia)
The Big Data way
(http://trends.google.com)
The Big Data way
(source: Google Trends)
(Hint:This only works if you put effort
into the general availability of data.)
Nobody understands
Moore’s law
1985 €100000.00
1990 €110408.08
1995 €121899.44
2000 €134586.83
2005 €148594.74
2010 €164060.60
2015 €181136.16
1,900,000,000
FLOPS
$16,000,000.-
7,000,000,000,000
FLOPS
$999.-
computing power price
1985 1900000000 16000000
2015 7000000000000 999
change times 370 divide by 16000
Cray 2
iPad 2
(source: http://www.phoronix.com/scan.php?page=news_item&px=MTE4NjU)
(source: http://hblok.net/blog/storage/)
(Hint: If hardware and licensing costs are part
of your Big Data business cases, you’re working
on the wrong cases in the wrong way.)
Beyond word
counting…
User Engagement through Topic Modelling in Travel
Athanasios Noulas
⇤
Rembrandt Square Office
Herengracht 597, 1017 CE Amsterdam
athanasios.noulas@booking.com
Mats Stafseng Einarsen
†
Rembrandt Square Office
Herengracht 597, 1017 CE Amsterdam
mats.einarsen@booking.com
ABSTRACT
Booking.com engages its users in di↵erent ways, like for
example email campaigns or on-site recommendations, in
which the user receives suggestions for the destination of
their next trip. This engagement is data-driven, and its pa-
rameters emerge from the corresponding relevant past be-
havioural pattern of users in the form of collaborative fil-
tering or other recommender algorithms. In this work we
use a secondary database with meta-information about the
recommended destination in the form of user endorsements.
We model the endorsements using Latent Dirichlet alloca-
tion, a well-known principled probabilistic framework, and
use the resulting latent space to optimise user engagement.
We demonstrate measurable benefits in two distinct inter-
actions with the user in the form of email marketing and
menu-based website browsing.
1. INTRODUCTION
Booking.com [2] is the world’s largest online travel agent
and millions of users visit its e-shop to reserve their accom-
modation. As in most shopping experiences, the users have
often only partially settled their product expectations. For
example, the user might have a fixed budget and dates, but
still be flexible in terms of the final destination. In such
cases, optimising user engagement becomes a high impact
objective, as an engaged user has higher chances on settling
the final details of his choice and making a purchase which
leads to short- and long-term prosperity of the company.
There has been extensive work on recommending relevant
items to the user [4] and much of those recommendations are
provided both on the website of Booking.com as“alternative
destinations” as well as through our email campaign which
reaches millions of subscribers all over the world. Unfor-
tunately, in a world-wide-web drowning in advertisements,
these recommendations often fail to di↵erentiate themselves,
⇤Data Scientist, Visitor Profiling
†Senior Product Owner, Front End
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
KDD, The Second Workshop on User Engagement Optimization 2014 New
York City, New York USA
Copyright 2014 ACM X-XXXXX-XX-X/XX/XX ...$15.00.
Endorsement # given # destinations # users
Shopping 513536 9105 467143
Food 242647 13595 228130
Beach 237035 8255 220603
Adventure 9622 2327 9374
Chinatown 2041 132 1992
Table 1: Example endorsements, the number of
times this endorsement was given by our users, the
number of destinations that have been endorsed
with this endorsement and the number of users that
have given this endorsement
and they are therefore discarded. Moreover, simple collab-
orative filter-like messages, e.g. user who travelled to Rome
also travelled to Florence, miss the personal touch which will
engage the user optimally.
Over the last year1
, Booking.com launched the Destina-
tion Finder [1], a project that allowed users to endorse des-
tinations for di↵erent activities. The objective of the project
was to help users choose a destination based on their favourite
activities rather than the location itself. What di↵erentiates
the Destination Finder endorsements from usual travel web-
sites is that they come strictly from people who have stayed
at the destination they endorse — as endorsements are as-
sociated to a completed hotel reservation. The objective of
our project was to use the endorsement meta-data to opti-
mise and personalise user engagement in every part of their
interaction with Booking.com.
The remainder of this paper is organised as follows. Sec-
tion 2 contains a short description of the endorsement data.
Section 3 gives a brief overview of the Latent Dirichlet Allo-
cation[3], a probabilistic topic model we used to analyse the
endorsement data. Section 4 presents the results acquired on
the endorsement dataset. Section 5 describes di↵erent suc-
cessful applications of the resulting endorsement data model
along with their quantitative comparative results.
2. ENDORSEMENT DATA
When this work was carried out the destination finder
had collected more than 10 million endorsements from more
than 2 million real users. These endorsements can be free,
unprocessed text, or be one of 213 fixed endorsements. Some
statistics of the most and least common fixed endorsements
are shown in table 1.
1
The first endorsement was recorded in September 2013
Bangkok Book now
Shopping
Food
Nightlife
Sightseeing
Temples
12416
3980
2745
1862
1822
endorsements
endorsements
endorsements
endorsements
endorsements
Clothes Shopping 1048 Street Food 906 Markets 852 Gourmet Food 747 Monuments 654 City Trip 650Culture 1424
Relaxation 603
Show all
Culturally Diverse Food 589 Friendly People 557 Luxury Brand Shopping 396 Fashion Bargains 313 Local Food 309
London Book now
Shopping
Sightseeing
Museums
Theater
Culture
32075
18474
12133
10438
8908
endorsements
endorsements
endorsements
endorsements
endorsements
History 6486 Food 4714 Entertainment 4331 City Trip 4113 Nightlife 3616 Musicals 2999Monuments 1424
Restaurants 2957
Show all
Parks 2776 Pubs 2575 Architecture 2493 Luxury Brand Shopping 2350 Live Music 2237
Figure 1: The endorsement pages of Destination Finder for Bangkok and London
The endorsement dataset presents a number of challenges
which are common in natural language processing applica-
tions:
1. Sparsity: Most of the endorsements represent a very
long tail appearing very rarely
2. Ambiguity: Endorsements like ”Shopping” can repre-
sent a variety of activities, ranging from shopping for
luxury brands to buying souvenirs or bargaining in an
open market
3. Competition: Most users give a limited number of en-
dorsements, so inevitably endorsements in large cities,
where multiple activities are available, will compete
against each other for this limited space.
4. Redundancy: People mention ”Food”, ”Street Food”
and so on interchangeably
5. Relativity: London might be a great destination for a
city-trip from Amsterdam, but it is a far-away des-
tination for someone coming from Thailand. Also,
Culturally-Diverse-Food, which is a very common en-
dorsement, has completely di↵erent meaning at di↵er-
ent places of the world.
As a result, most cities usually have groups of endorse-
ments with similar frequency which depend both on the city
and the visitors it attracts. For example, figure 1 contains
the endorsements given to Bangkok and London as they are
presented in each city’s Destination Finder page. We can
see that Shopping is first in both destination, while Bangkok
gets higher percentage of endorsements related to nightlife
in contrast to London which gets more culture-related en-
dorsements.
3. LATENT DIRICHLET ALLOCATION
Latent Dirichlet Allocation [3] is a generative model which
has been used extensively to model the distribution of words
in documents. The naming comes from the fact that these
topics are latent in the sense that we retrieve them from
an unlabelled dataset as opposed to produce them using
labelled data. The Dirichlet part refers to the priors we
provide for the topic distribution in the documents and the
word distribution in each topic. The objective behind the
application of Latent Dirichlet Allocation is to model each
document as a mixture of a small number of topics and at
the same time attribute each word of the document to one
of the topics present in the corresponding document.
The probabilistic graphical model of the Latent Dirich-
Figure 2: The probabilistic graphical model of LDA
let Allocation is visible in figure 2. The plates represent M
documents each one containing some of the N words of our
vocabulary. In mathematical terms ↵ and are priors for
the Dirichlet distributions of topics and words. Each docu-
ment i has a topic distribution parameterised by ✓i and each
topic k has a word distribution parameterised by k. The
jth word in document i is then wij and it is associated to
topic zij. The generative process then works in three steps:
1. Choose ✓i ⇠ Dir(↵), where i 2 {1, . . . , M} and Dir(↵)
is the Dirichlet distribution for parameter ↵
2. Choose k ⇠ Dir( ) , where k 2 {1, . . . , K}
3. For each of the word positions i, j, where j 2 {1, . . . , Ni},
and i 2 {1, . . . , M}
• Choose a topic zi,j ⇠ Multinomial(✓i).
• Choose a word wi,j ⇠ Multinomial( zi,j ).
In practice we need to provide the learning algorithm with
a corpus containing the M documents as the counts for each
of the N word for each document. We further provide the
desired parameter K which represents the total number of
topics we want to discover. During learning, the model will
retrieve the distribution over words for each topic so as to
maximise the complete corpus likelihood:
P(W , Z, ✓, '; ↵, ) =
KY
i=1
P('i; )
MY
j=1
P(✓j; ↵) · (1)
NY
t=1
P(Zj,t|✓j)P(Wj,t|'Zj,t ) (2)
we can then use this distribution to map each of the corpus’
documents to a mixture over the latent topics. We refer the
reader interested in the details of learning and inference for
the Latent Dirichlet Allocation model to David Blei’s page 2
which contains more information regarding topic modelling,
many open-source implementations of Latent Dirichlet Al-
location as well as a review paper and a tutorial presented
in KDD-2011.
4. MODELLING ENDORSEMENT DATA
The application of Latent Dirichlet Allocation (LDA) on
our endorsement data is quite straightforward. Each en-
dorsement is one word, while each reservation and its cor-
responding endorsements can be seen as a single document
2
http://www.cs.princeton.edu/ blei/topicmodeling.html
Topic
Dominant Endorsements
(probability of appearance)
Example Des-
tinations
1
Romantic (0.64), Roman-
ruins (0.06), Photogra-
phy (0.06), Historical-
Landmarks (0.06)
Brugge, Vi-
enna, Verona
2
Snorkelling (0.23), Diving
(0.19), Sun-Bathing (0.11),
Walking-with-Kids (0.10)
Bayahibe, Ko
Phi Phi Don,
Rhodes
4
Shopping (0.40), Food
(0.35), Walking (0.07),
Entertainment (0.06)
Taipei,
Buenos Aires,
Valencia
23
Shopping (0.28), Monu-
ments (0.12), Culturally-
Diverse-Food (0.10), Busi-
ness (0.08)
Sydney,
Lisbon, Liver-
pool
Table 2: Some of the topics discovered by the Latent
Dirichlet Allocation. The topic number an identifier
of the latent topics discovered by LDA. The domi-
nant endorsements are the endorsements that ap-
pear most probably when a trip is associated to the
corresponding topic, while in brackets we have their
probability of appearance.
and its corresponding words. LDA requires a user-defined
number of latent topics, and after experimentation with 5,
20, 40, 100 or 200 topics we acquired the best empirical re-
sults with 40 topics. A few sample topics are visible in table
2, along with the most common endorsements associated to
each topic.
At this moment there is no topic modelling algorithm uni-
versally accepted as the optimal solution. We found LDA to
be very e cient for our endorsement data, but as with any
application on a real-world dataset there were some inter-
esting observations which we organise here in three sections,
namely the good, the bad and the ugly.
4.1 The good
The LDA allows us to map any trip to a mixture of top-
ics, since during training we used each individual trip as a
document. Moreover, we can express each destination as a
mixture of the trips done there and thus map each destina-
tion to the latent topic space.
In the example of Bangkok and London the main topics
are visible in table 3. In both destinations, as in any big
city, Shopping is one of the most common activities trav-
ellers do, but in this case it is disambiguated into di↵erent
topics. People who travel to London combine shopping with
Sightseeing and Theater, or do it as part of the Christmas
Shopping, while people in Bangkok combine it with Food
and Entertainment or monuments and Culturally-Diverse-
Food. Finally, the two destinations are clearly separated by
their second most prominent topic which in case of London
is Museums, while in the case of Bangkok it is Culture and
Temples.
Mapping destinations to topics works extremely well for
major destinations since we have thousands of trips to de-
scribe them. This mapping also works very well for less pop-
ular destinations, discovering great places for niche endorse-
ments like ”Fine-Dining”, ”Volcanoes”or ”Bar-hopping”. More-
over, it is very easy to express popularity by multiplying the
City London Bangkok
Topic 25 (0.29) 10 (0.10) 18 (0.097) 4(0.27) 35 (0.13) 23 (0.11)
End. Shopping (0.40) Museums (0.8) Shopping (0.78) Shopping(0.40) Culture(0.30) Shopping(0.28)
End. Sightseeing (0.36) Parks (0.03) Christmas (0.05) Food (0.35) Temples (0.24) Monuments(0.12)
End. Theatre (0.36) Galleries (0.03) Culture (0.04) Entertainment(0.06) Food (0.20) Diverse Food (0.10)
Table 3: Main topics and their endorsements for London and Bangkok
topic distribution of a destination with its total number of
visitors, or discover trends by finding relative changes in the
topics of each month’s endorsements.
4.2 The bad
Similarly we can map users to topics based on the endorse-
ments they provided for their past trips. Unfortunately, the
travel industry has notoriously sparse data. Netflix users
might watch tens or hundreds of items in a year, while our
average user performs just below two trips. Moreover, users
often do not give endorsements feedback on their trips.
We, therefore, model each user’s preferences as the prod-
uct of the topic distributions of the destinations they visit.
A di↵erent choice would be modelling the user as a mixture
of their visited destinations, but we are interested to the
single topic that will engage them the most, rather than all
the topics they have been exposed to.
On the destination dataset, there exists a set of destina-
tions that have a very small number of endorsements with
respect to the total number of visitors they get — like for
example airports and other travelling hubs. The small num-
ber of endorsements can produce a very peaked distribution
in the topic space, while the high number of visitors might
bring destinations like Heathrow or Schiphol on the top of
the respective topic’s lists. We address this issue by black-
listing the 1‰ of the destinations with the worst endorse-
ments to visitors ratio.
4.3 The ugly
Endorsement data are user generated and so are the fixed
endorsements we used for the LDA model. Inevitably, there
exist terms which might not be socially acceptable by some
cultures or can be found o↵ensive in others, like for example
the endorsement People Watching which was given approx-
imately 6000 times. As long as we stay within the most
common endorsements, we are relatively safe, but using the
full width of our vocabulary requires careful inspection of
user engagement.
The idea behind Latent Dirichlet Allocation is to map a
large number of words to a small number of topics. This
means that many words will be grouped together, and in
some topics, these activities might be very diverse. For
example Topic 17 contains Culturally-Diverse-Food (0.43),
Theatre (0.35), Modern-Art (0.09) and Scuba-Diving (0.05).
We need to be careful before using these endorsements si-
multaneously, as the final result might be confusing to the
user.
Last but not least, LDA returns a number of topics with
no clear naming. The success of our experiments will depend
on the name we give to the topics ourselves, and simple
solutions like using the top endorsement for each topic might
take away the content discovered by the topic modelling —
see for example di↵erent Shopping topics in table 3.
Eng. Users Interaction Net Conversion Cancellations
Eml 1 34M+ +18.34% +10.00% +14.80%
Eml 2 34M+ +18.71% +7.14% +4.39%
Menu 40M+ Ongoing Experiments
Table 4: Quantitative Results of the user engage-
ment campaigns. Users as split randomly between
the base and variant groups, and bold indicates sta-
tistically significant results.
5. USER ENGAGEMENT APPLICATIONS
The probabilistic nature of the resulting endorsement model
makes many di↵erent applications feasible. We experimented
in multiple areas of interaction with the user to optimise
their engagement to our product and facilitate their accom-
modation search, e.g. email marketing, product organisation
and recommendation personalisation. All our experiments
were A/B tested for statistical significance, and in almost all
of our experiments we saw a positive response to our KPIs.
In this section we present three experiments in which we
observed statistical significance in net conversion in. The
experiments’ results in two main KPIs, namely Net Conver-
sion and Cancellations, are listed in table 4. The increase
in net conversion is
Net Conversion =
Net Conversion in B
Net Conversion in A
1 (3)
where Net Conversion is the probability a user engaged will
reserve an accommodation in our website and stay at the
reserved accommodation.
The Cancellations KPI is the probability that a customer
who reserved an accommodation will later cancel it — and
we try to keep this ratio as small as possible — although a
more engaging campaign produces a higher cancellers ratio.
5.1 Email Marketing
Email campaigns are a direct and economical way to ex-
pose our users to our products. In the case of booking.com,
these emails contain proposed travelling destinations and in-
formation about special deals from partner hotels. Both per-
sonalised recommendations and deals have an obvious value
to the user, however, the repeating nature of the email makes
the campaigns wear-o↵ over time.
5.2 Email 1
We used the latent topics discovered through the Latent
Dirichlet Allocation analysis to detect the common topics
between the destinations visited by the target user and the
destinations returned by our recommender. A sample of the
resulting email is visible in figure 3, in which case the main
endorsements of the topic selected for the target user are
Beach, Relaxation and Food.
We performed A/B testing on more than 34 million users,
half of whom received the most popular endorsements email
Figure 3: The LDA-based email
(group A) and half the email with their personalised LDA-
based topic selection (group B).
The campaign was very beneficial in multiple KPIs, in-
cluding users clicking on the campaign, interacting with the
website and finally converting from visitors to bookers. More
specifically, We observed a 18% uplift in clicks at the same
bounce rate, 10% uplift in net conversion and 14% increase
in cancellations (22% increase in gross conversion) . All of
these di↵erences were statistically significant for a confidence
interval of 90%.
5.3 Email 2
The first email campaign used a quite bold statement in
the form of “our team of travel scientists ... based on your
past adventures they think that you have a passion for... ”,
visible in figure 3. This spawned a vivid response from the
community on twitter with people being excited about our
campaign, people complaining about the recommendations
they got and people being sceptical about the access we have
to their past data 4.
Our original message could have been misleading, as we
didn’t look at a person’s activities or endorsements but rather
the most common activities in the places they had visited,
as explained in section 4.2. We launched a second email
campaign where we used a less intrusive message promot-
ing endorsements in suggested destinations without a direct
reference to how these endorsements were selected.
We sent the email to more than 22 Million users with half
of them (group A) receiving the old text and half of them
(group B) receiving the new text. We saw an impressive
increase in net and gross conversion (+7.14% and 10.5%)
and a modest increase in cancellations (+4.39%).
5.4 Browsing Menu
The Destination Finder provided our users with a search
engine that allowed them to find the ideal destination for
their desired activity. Although this method o↵ers complete
freedom of search to the user, it requires them having a
concrete activity in mind before coming to the website. In
contrast, most e-shops organise their inventory in intuitive
hierarchies, and users are familiar with menu-based naviga-
tion.
Figure 4: Example of twits related to the latent topic
modelling campaign.
We produced country-specific menus, where we present
as a top menu hierarchy the topics of destinations which
bookers from the corresponding country prefer the most. For
example, people from the Netherlands get top-level menus
for Shopping, Fine Dining, Cycling, Sightseeing and Beach,
as opposed to Beach, Food, Historical Landmarks, Museums
and Shopping for Britain.
The second layer provides the countries where these activi-
ties are carried out most often. For example, people from the
Netherlands and Britain share destinations like Spain, Italy
and Greece for the topic Beach, but they only receive their
individual home-country in the second level of their menus.
Lastly, each of the countries opens a third level with specific
destinations in this country. An example of the menu for a
user from the Netherlands is visible in figure 5
The menu-based navigation was a major success. We ex-
posed more than 40 million users in a A/B test experiment
and observed statistical significance in all the user engage-
ments KPIs we had set. Unfortunately, since the menu im-
provement is an ongoing project at our website at this mo-
ment, we can not publish numerical results.
6. CONCLUSIONS AND FUTURE WORK
We have demonstrated a successful application of topic
modelling of user generated endorsement data using Latent
Dirichlet Allocation. The automatically discovered topics
have semantic significance and they can be used to optimise
the user engagement experience, for example in the form
Figure 5: The LDA-based menu
of personalised email marketing and menu-based product
browsing.
The initial results are very promising but we are still far
away from exploiting the potential of the latent topic space.
Our destination recommendations are still based on algo-
rithms that ignore the underlying topic distribution, and the
e↵ect that the individual topics (and their manually-selected
names) have on the users is hardly accounted for.
At the moment we monitor the individual interactions of
each latent topic to each cluster of users in our database, in
order to use them not only according to their likelihood but
also according to their e↵ectiveness. Moreover, we work on
an email campaign based on consecutive follow-ups depen-
dent on the user’s responsiveness, e.g. using the second best
topic choice if the first one failed to engage the user.
7. ACKNOWLEDGEMENTS
We would like to thank Stefano Baccianella, Gabriel Agu
for their work on the data processing for the Latent Dirich-
let Allocation and production implementation of the menu-
based navigation software, as well as Rozina Szogyenyi and
Leandro Pinto for their work on the Destination Finder and
Fabrizio Salzano for overseeing the implementation of the
email marketing campaign.
8. REFERENCES
[1] Destination finder webpage. http:
//www.booking.com/destinationfinder.en-gb.html.
Accessed: 2014-07-10.
[2] webpage. http:
//www.booking.com/destinationfinder.en-gb.html.
Accessed: 2014-07-10.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent
dirichlet allocation. J. Mach. Learn. Res., 3:993–1022,
Mar. 2003.
[4] F. Ricci, L. Rokach, and B. Shapira. Introduction to
recommender systems handbook. In Recommender
Systems Handbook, pages 1–35. 2011.
(source: http://www.ueo-workshop.com/wp-content/uploads/2014/04/noulasKDD2014.pdf)
(Hint: In reality you need to think long and hard
about problems and then still most experiments fail.
Succes lies in increasing the rate of experimentation
/ decreasing the cost of experimentation.)
Now what?
Do you trust gut feeling
over quantified priors?
“As a … I want …”, but
what is the
optimisation goal?
If you internally make
your data generally
available…
…and you attract the
right kind of data
science people…
…and you can buy
supercomputers for the
price of a game console…
…then the limiting
factor is the maximum
rate of experimentation.
Do you have an
experimentation
framework?
GoDataDriven
We’re hiring / Questions? / Thank you!
@fzk
frisovanvollenhoven@godatadriven.com
Friso van Vollenhoven
CTO

More Related Content

Viewers also liked

Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
GoDataDriven
 

Viewers also liked (12)

Robustifying Descriptor Instability using Fisher Vectors
Robustifying Descriptor Instability using Fisher VectorsRobustifying Descriptor Instability using Fisher Vectors
Robustifying Descriptor Instability using Fisher Vectors
 
PyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at ScalePyData Amsterdam - Name Matching at Scale
PyData Amsterdam - Name Matching at Scale
 
Divolte collector overview
Divolte collector overviewDivolte collector overview
Divolte collector overview
 
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with HadoopI Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
 
Bare metal Hadoop provisioning
Bare metal Hadoop provisioningBare metal Hadoop provisioning
Bare metal Hadoop provisioning
 
Embarrassingly parallel database calls with Python (PyData Paris 2015 )
Embarrassingly parallel database calls with Python (PyData Paris 2015 )Embarrassingly parallel database calls with Python (PyData Paris 2015 )
Embarrassingly parallel database calls with Python (PyData Paris 2015 )
 
Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
 
Magic, art or science? Deep learning unraveled
Magic, art or science? Deep learning unraveledMagic, art or science? Deep learning unraveled
Magic, art or science? Deep learning unraveled
 
Yes you can play Monopoly with a Genetic Algorithm - Niels Zeilemaker
Yes you can play Monopoly with a Genetic Algorithm - Niels ZeilemakerYes you can play Monopoly with a Genetic Algorithm - Niels Zeilemaker
Yes you can play Monopoly with a Genetic Algorithm - Niels Zeilemaker
 
Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @Twitter
 
Parallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkRParallelizing Existing R Packages with SparkR
Parallelizing Existing R Packages with SparkR
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 

Similar to Xebicon 2015 - Go Data Driven NOW!

Digital inclusion financial_capability
Digital inclusion financial_capabilityDigital inclusion financial_capability
Digital inclusion financial_capability
MinistryThrift
 
MKTG704_Brandon Langford_CoolHuntFinalReport_20151221
MKTG704_Brandon Langford_CoolHuntFinalReport_20151221MKTG704_Brandon Langford_CoolHuntFinalReport_20151221
MKTG704_Brandon Langford_CoolHuntFinalReport_20151221
Brandon Langford
 

Similar to Xebicon 2015 - Go Data Driven NOW! (20)

10 itf-tutorial mmmmm
10 itf-tutorial mmmmm10 itf-tutorial mmmmm
10 itf-tutorial mmmmm
 
www-valueappz-com-blog-types-of-marketplace-solutions.pdf
www-valueappz-com-blog-types-of-marketplace-solutions.pdfwww-valueappz-com-blog-types-of-marketplace-solutions.pdf
www-valueappz-com-blog-types-of-marketplace-solutions.pdf
 
242031463 e-marketing-ch-1
242031463 e-marketing-ch-1242031463 e-marketing-ch-1
242031463 e-marketing-ch-1
 
Prototyping digital business and services
Prototyping digital business and servicesPrototyping digital business and services
Prototyping digital business and services
 
Computational Marketing at Groupon - JCSSE 2017
Computational Marketing at Groupon - JCSSE 2017Computational Marketing at Groupon - JCSSE 2017
Computational Marketing at Groupon - JCSSE 2017
 
Food Chain Theory February 2010 Revisited
Food Chain Theory February 2010 RevisitedFood Chain Theory February 2010 Revisited
Food Chain Theory February 2010 Revisited
 
The 2015 Digital Consumer Collaborative Charter
The 2015 Digital Consumer Collaborative CharterThe 2015 Digital Consumer Collaborative Charter
The 2015 Digital Consumer Collaborative Charter
 
Digital inclusion financial_capability
Digital inclusion financial_capabilityDigital inclusion financial_capability
Digital inclusion financial_capability
 
Tran
TranTran
Tran
 
Marketing in travel and tourism(london)
Marketing in travel and tourism(london)Marketing in travel and tourism(london)
Marketing in travel and tourism(london)
 
Cmm 2011 final
Cmm 2011 finalCmm 2011 final
Cmm 2011 final
 
Webinar slides - Internet of Things Convergence: Preparatory Studies
Webinar slides - Internet of Things Convergence: Preparatory StudiesWebinar slides - Internet of Things Convergence: Preparatory Studies
Webinar slides - Internet of Things Convergence: Preparatory Studies
 
MKTG704_Brandon Langford_CoolHuntFinalReport_20151221
MKTG704_Brandon Langford_CoolHuntFinalReport_20151221MKTG704_Brandon Langford_CoolHuntFinalReport_20151221
MKTG704_Brandon Langford_CoolHuntFinalReport_20151221
 
Two-sided Internet platforms: A business model lifecycle perspective
Two-sided Internet platforms: A business model lifecycle perspectiveTwo-sided Internet platforms: A business model lifecycle perspective
Two-sided Internet platforms: A business model lifecycle perspective
 
IMAGE CONTENT IN LOCATION-BASED SHOPPING RECOMMENDER SYSTEMS FOR MOBILE USERS
IMAGE CONTENT IN LOCATION-BASED SHOPPING RECOMMENDER SYSTEMS FOR MOBILE USERSIMAGE CONTENT IN LOCATION-BASED SHOPPING RECOMMENDER SYSTEMS FOR MOBILE USERS
IMAGE CONTENT IN LOCATION-BASED SHOPPING RECOMMENDER SYSTEMS FOR MOBILE USERS
 
Consumer perception towards business websites Mba project
Consumer perception towards business websites Mba project Consumer perception towards business websites Mba project
Consumer perception towards business websites Mba project
 
Product customization in the ayurveda change new
Product customization in the ayurveda  change newProduct customization in the ayurveda  change new
Product customization in the ayurveda change new
 
Impact of electronic commerce in marketing and brand developing
Impact of electronic commerce in marketing and brand developingImpact of electronic commerce in marketing and brand developing
Impact of electronic commerce in marketing and brand developing
 
E commerce digital markets digital goods
E commerce digital markets  digital goodsE commerce digital markets  digital goods
E commerce digital markets digital goods
 
E-Tourism Assignment
E-Tourism AssignmentE-Tourism Assignment
E-Tourism Assignment
 

More from GoDataDriven

DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
GoDataDriven
 

More from GoDataDriven (20)

Streamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogStreamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature Catalog
 
Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small Screen
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organization
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics Engineer
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
How to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectHow to create a Devcontainer for your Python project
How to create a Devcontainer for your Python project
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de Haan
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven Companies
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 

Recently uploaded

一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

Xebicon 2015 - Go Data Driven NOW!

  • 1. GoDataDriven PROUDLY PART OF THE XEBIA GROUP @fzk frisovanvollenhoven@godatadriven.com Go Data Driven NOW! Friso van Vollenhoven CTO
  • 2.
  • 3.
  • 4.
  • 5.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12. (Hint:This only works if you know how to look at data.)
  • 13. What about Big Data?
  • 17. The Big Data way (http://trends.google.com)
  • 18. The Big Data way (source: Google Trends)
  • 19. (Hint:This only works if you put effort into the general availability of data.)
  • 21. 1985 €100000.00 1990 €110408.08 1995 €121899.44 2000 €134586.83 2005 €148594.74 2010 €164060.60 2015 €181136.16
  • 22.
  • 25. computing power price 1985 1900000000 16000000 2015 7000000000000 999 change times 370 divide by 16000
  • 26.
  • 27. Cray 2 iPad 2 (source: http://www.phoronix.com/scan.php?page=news_item&px=MTE4NjU)
  • 29.
  • 30.
  • 31.
  • 32. (Hint: If hardware and licensing costs are part of your Big Data business cases, you’re working on the wrong cases in the wrong way.)
  • 34.
  • 35. User Engagement through Topic Modelling in Travel Athanasios Noulas ⇤ Rembrandt Square Office Herengracht 597, 1017 CE Amsterdam athanasios.noulas@booking.com Mats Stafseng Einarsen † Rembrandt Square Office Herengracht 597, 1017 CE Amsterdam mats.einarsen@booking.com ABSTRACT Booking.com engages its users in di↵erent ways, like for example email campaigns or on-site recommendations, in which the user receives suggestions for the destination of their next trip. This engagement is data-driven, and its pa- rameters emerge from the corresponding relevant past be- havioural pattern of users in the form of collaborative fil- tering or other recommender algorithms. In this work we use a secondary database with meta-information about the recommended destination in the form of user endorsements. We model the endorsements using Latent Dirichlet alloca- tion, a well-known principled probabilistic framework, and use the resulting latent space to optimise user engagement. We demonstrate measurable benefits in two distinct inter- actions with the user in the form of email marketing and menu-based website browsing. 1. INTRODUCTION Booking.com [2] is the world’s largest online travel agent and millions of users visit its e-shop to reserve their accom- modation. As in most shopping experiences, the users have often only partially settled their product expectations. For example, the user might have a fixed budget and dates, but still be flexible in terms of the final destination. In such cases, optimising user engagement becomes a high impact objective, as an engaged user has higher chances on settling the final details of his choice and making a purchase which leads to short- and long-term prosperity of the company. There has been extensive work on recommending relevant items to the user [4] and much of those recommendations are provided both on the website of Booking.com as“alternative destinations” as well as through our email campaign which reaches millions of subscribers all over the world. Unfor- tunately, in a world-wide-web drowning in advertisements, these recommendations often fail to di↵erentiate themselves, ⇤Data Scientist, Visitor Profiling †Senior Product Owner, Front End Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD, The Second Workshop on User Engagement Optimization 2014 New York City, New York USA Copyright 2014 ACM X-XXXXX-XX-X/XX/XX ...$15.00. Endorsement # given # destinations # users Shopping 513536 9105 467143 Food 242647 13595 228130 Beach 237035 8255 220603 Adventure 9622 2327 9374 Chinatown 2041 132 1992 Table 1: Example endorsements, the number of times this endorsement was given by our users, the number of destinations that have been endorsed with this endorsement and the number of users that have given this endorsement and they are therefore discarded. Moreover, simple collab- orative filter-like messages, e.g. user who travelled to Rome also travelled to Florence, miss the personal touch which will engage the user optimally. Over the last year1 , Booking.com launched the Destina- tion Finder [1], a project that allowed users to endorse des- tinations for di↵erent activities. The objective of the project was to help users choose a destination based on their favourite activities rather than the location itself. What di↵erentiates the Destination Finder endorsements from usual travel web- sites is that they come strictly from people who have stayed at the destination they endorse — as endorsements are as- sociated to a completed hotel reservation. The objective of our project was to use the endorsement meta-data to opti- mise and personalise user engagement in every part of their interaction with Booking.com. The remainder of this paper is organised as follows. Sec- tion 2 contains a short description of the endorsement data. Section 3 gives a brief overview of the Latent Dirichlet Allo- cation[3], a probabilistic topic model we used to analyse the endorsement data. Section 4 presents the results acquired on the endorsement dataset. Section 5 describes di↵erent suc- cessful applications of the resulting endorsement data model along with their quantitative comparative results. 2. ENDORSEMENT DATA When this work was carried out the destination finder had collected more than 10 million endorsements from more than 2 million real users. These endorsements can be free, unprocessed text, or be one of 213 fixed endorsements. Some statistics of the most and least common fixed endorsements are shown in table 1. 1 The first endorsement was recorded in September 2013 Bangkok Book now Shopping Food Nightlife Sightseeing Temples 12416 3980 2745 1862 1822 endorsements endorsements endorsements endorsements endorsements Clothes Shopping 1048 Street Food 906 Markets 852 Gourmet Food 747 Monuments 654 City Trip 650Culture 1424 Relaxation 603 Show all Culturally Diverse Food 589 Friendly People 557 Luxury Brand Shopping 396 Fashion Bargains 313 Local Food 309 London Book now Shopping Sightseeing Museums Theater Culture 32075 18474 12133 10438 8908 endorsements endorsements endorsements endorsements endorsements History 6486 Food 4714 Entertainment 4331 City Trip 4113 Nightlife 3616 Musicals 2999Monuments 1424 Restaurants 2957 Show all Parks 2776 Pubs 2575 Architecture 2493 Luxury Brand Shopping 2350 Live Music 2237 Figure 1: The endorsement pages of Destination Finder for Bangkok and London The endorsement dataset presents a number of challenges which are common in natural language processing applica- tions: 1. Sparsity: Most of the endorsements represent a very long tail appearing very rarely 2. Ambiguity: Endorsements like ”Shopping” can repre- sent a variety of activities, ranging from shopping for luxury brands to buying souvenirs or bargaining in an open market 3. Competition: Most users give a limited number of en- dorsements, so inevitably endorsements in large cities, where multiple activities are available, will compete against each other for this limited space. 4. Redundancy: People mention ”Food”, ”Street Food” and so on interchangeably 5. Relativity: London might be a great destination for a city-trip from Amsterdam, but it is a far-away des- tination for someone coming from Thailand. Also, Culturally-Diverse-Food, which is a very common en- dorsement, has completely di↵erent meaning at di↵er- ent places of the world. As a result, most cities usually have groups of endorse- ments with similar frequency which depend both on the city and the visitors it attracts. For example, figure 1 contains the endorsements given to Bangkok and London as they are presented in each city’s Destination Finder page. We can see that Shopping is first in both destination, while Bangkok gets higher percentage of endorsements related to nightlife in contrast to London which gets more culture-related en- dorsements. 3. LATENT DIRICHLET ALLOCATION Latent Dirichlet Allocation [3] is a generative model which has been used extensively to model the distribution of words in documents. The naming comes from the fact that these topics are latent in the sense that we retrieve them from an unlabelled dataset as opposed to produce them using labelled data. The Dirichlet part refers to the priors we provide for the topic distribution in the documents and the word distribution in each topic. The objective behind the application of Latent Dirichlet Allocation is to model each document as a mixture of a small number of topics and at the same time attribute each word of the document to one of the topics present in the corresponding document. The probabilistic graphical model of the Latent Dirich- Figure 2: The probabilistic graphical model of LDA let Allocation is visible in figure 2. The plates represent M documents each one containing some of the N words of our vocabulary. In mathematical terms ↵ and are priors for the Dirichlet distributions of topics and words. Each docu- ment i has a topic distribution parameterised by ✓i and each topic k has a word distribution parameterised by k. The jth word in document i is then wij and it is associated to topic zij. The generative process then works in three steps: 1. Choose ✓i ⇠ Dir(↵), where i 2 {1, . . . , M} and Dir(↵) is the Dirichlet distribution for parameter ↵ 2. Choose k ⇠ Dir( ) , where k 2 {1, . . . , K} 3. For each of the word positions i, j, where j 2 {1, . . . , Ni}, and i 2 {1, . . . , M} • Choose a topic zi,j ⇠ Multinomial(✓i). • Choose a word wi,j ⇠ Multinomial( zi,j ). In practice we need to provide the learning algorithm with a corpus containing the M documents as the counts for each of the N word for each document. We further provide the desired parameter K which represents the total number of topics we want to discover. During learning, the model will retrieve the distribution over words for each topic so as to maximise the complete corpus likelihood: P(W , Z, ✓, '; ↵, ) = KY i=1 P('i; ) MY j=1 P(✓j; ↵) · (1) NY t=1 P(Zj,t|✓j)P(Wj,t|'Zj,t ) (2) we can then use this distribution to map each of the corpus’ documents to a mixture over the latent topics. We refer the reader interested in the details of learning and inference for the Latent Dirichlet Allocation model to David Blei’s page 2 which contains more information regarding topic modelling, many open-source implementations of Latent Dirichlet Al- location as well as a review paper and a tutorial presented in KDD-2011. 4. MODELLING ENDORSEMENT DATA The application of Latent Dirichlet Allocation (LDA) on our endorsement data is quite straightforward. Each en- dorsement is one word, while each reservation and its cor- responding endorsements can be seen as a single document 2 http://www.cs.princeton.edu/ blei/topicmodeling.html Topic Dominant Endorsements (probability of appearance) Example Des- tinations 1 Romantic (0.64), Roman- ruins (0.06), Photogra- phy (0.06), Historical- Landmarks (0.06) Brugge, Vi- enna, Verona 2 Snorkelling (0.23), Diving (0.19), Sun-Bathing (0.11), Walking-with-Kids (0.10) Bayahibe, Ko Phi Phi Don, Rhodes 4 Shopping (0.40), Food (0.35), Walking (0.07), Entertainment (0.06) Taipei, Buenos Aires, Valencia 23 Shopping (0.28), Monu- ments (0.12), Culturally- Diverse-Food (0.10), Busi- ness (0.08) Sydney, Lisbon, Liver- pool Table 2: Some of the topics discovered by the Latent Dirichlet Allocation. The topic number an identifier of the latent topics discovered by LDA. The domi- nant endorsements are the endorsements that ap- pear most probably when a trip is associated to the corresponding topic, while in brackets we have their probability of appearance. and its corresponding words. LDA requires a user-defined number of latent topics, and after experimentation with 5, 20, 40, 100 or 200 topics we acquired the best empirical re- sults with 40 topics. A few sample topics are visible in table 2, along with the most common endorsements associated to each topic. At this moment there is no topic modelling algorithm uni- versally accepted as the optimal solution. We found LDA to be very e cient for our endorsement data, but as with any application on a real-world dataset there were some inter- esting observations which we organise here in three sections, namely the good, the bad and the ugly. 4.1 The good The LDA allows us to map any trip to a mixture of top- ics, since during training we used each individual trip as a document. Moreover, we can express each destination as a mixture of the trips done there and thus map each destina- tion to the latent topic space. In the example of Bangkok and London the main topics are visible in table 3. In both destinations, as in any big city, Shopping is one of the most common activities trav- ellers do, but in this case it is disambiguated into di↵erent topics. People who travel to London combine shopping with Sightseeing and Theater, or do it as part of the Christmas Shopping, while people in Bangkok combine it with Food and Entertainment or monuments and Culturally-Diverse- Food. Finally, the two destinations are clearly separated by their second most prominent topic which in case of London is Museums, while in the case of Bangkok it is Culture and Temples. Mapping destinations to topics works extremely well for major destinations since we have thousands of trips to de- scribe them. This mapping also works very well for less pop- ular destinations, discovering great places for niche endorse- ments like ”Fine-Dining”, ”Volcanoes”or ”Bar-hopping”. More- over, it is very easy to express popularity by multiplying the City London Bangkok Topic 25 (0.29) 10 (0.10) 18 (0.097) 4(0.27) 35 (0.13) 23 (0.11) End. Shopping (0.40) Museums (0.8) Shopping (0.78) Shopping(0.40) Culture(0.30) Shopping(0.28) End. Sightseeing (0.36) Parks (0.03) Christmas (0.05) Food (0.35) Temples (0.24) Monuments(0.12) End. Theatre (0.36) Galleries (0.03) Culture (0.04) Entertainment(0.06) Food (0.20) Diverse Food (0.10) Table 3: Main topics and their endorsements for London and Bangkok topic distribution of a destination with its total number of visitors, or discover trends by finding relative changes in the topics of each month’s endorsements. 4.2 The bad Similarly we can map users to topics based on the endorse- ments they provided for their past trips. Unfortunately, the travel industry has notoriously sparse data. Netflix users might watch tens or hundreds of items in a year, while our average user performs just below two trips. Moreover, users often do not give endorsements feedback on their trips. We, therefore, model each user’s preferences as the prod- uct of the topic distributions of the destinations they visit. A di↵erent choice would be modelling the user as a mixture of their visited destinations, but we are interested to the single topic that will engage them the most, rather than all the topics they have been exposed to. On the destination dataset, there exists a set of destina- tions that have a very small number of endorsements with respect to the total number of visitors they get — like for example airports and other travelling hubs. The small num- ber of endorsements can produce a very peaked distribution in the topic space, while the high number of visitors might bring destinations like Heathrow or Schiphol on the top of the respective topic’s lists. We address this issue by black- listing the 1‰ of the destinations with the worst endorse- ments to visitors ratio. 4.3 The ugly Endorsement data are user generated and so are the fixed endorsements we used for the LDA model. Inevitably, there exist terms which might not be socially acceptable by some cultures or can be found o↵ensive in others, like for example the endorsement People Watching which was given approx- imately 6000 times. As long as we stay within the most common endorsements, we are relatively safe, but using the full width of our vocabulary requires careful inspection of user engagement. The idea behind Latent Dirichlet Allocation is to map a large number of words to a small number of topics. This means that many words will be grouped together, and in some topics, these activities might be very diverse. For example Topic 17 contains Culturally-Diverse-Food (0.43), Theatre (0.35), Modern-Art (0.09) and Scuba-Diving (0.05). We need to be careful before using these endorsements si- multaneously, as the final result might be confusing to the user. Last but not least, LDA returns a number of topics with no clear naming. The success of our experiments will depend on the name we give to the topics ourselves, and simple solutions like using the top endorsement for each topic might take away the content discovered by the topic modelling — see for example di↵erent Shopping topics in table 3. Eng. Users Interaction Net Conversion Cancellations Eml 1 34M+ +18.34% +10.00% +14.80% Eml 2 34M+ +18.71% +7.14% +4.39% Menu 40M+ Ongoing Experiments Table 4: Quantitative Results of the user engage- ment campaigns. Users as split randomly between the base and variant groups, and bold indicates sta- tistically significant results. 5. USER ENGAGEMENT APPLICATIONS The probabilistic nature of the resulting endorsement model makes many di↵erent applications feasible. We experimented in multiple areas of interaction with the user to optimise their engagement to our product and facilitate their accom- modation search, e.g. email marketing, product organisation and recommendation personalisation. All our experiments were A/B tested for statistical significance, and in almost all of our experiments we saw a positive response to our KPIs. In this section we present three experiments in which we observed statistical significance in net conversion in. The experiments’ results in two main KPIs, namely Net Conver- sion and Cancellations, are listed in table 4. The increase in net conversion is Net Conversion = Net Conversion in B Net Conversion in A 1 (3) where Net Conversion is the probability a user engaged will reserve an accommodation in our website and stay at the reserved accommodation. The Cancellations KPI is the probability that a customer who reserved an accommodation will later cancel it — and we try to keep this ratio as small as possible — although a more engaging campaign produces a higher cancellers ratio. 5.1 Email Marketing Email campaigns are a direct and economical way to ex- pose our users to our products. In the case of booking.com, these emails contain proposed travelling destinations and in- formation about special deals from partner hotels. Both per- sonalised recommendations and deals have an obvious value to the user, however, the repeating nature of the email makes the campaigns wear-o↵ over time. 5.2 Email 1 We used the latent topics discovered through the Latent Dirichlet Allocation analysis to detect the common topics between the destinations visited by the target user and the destinations returned by our recommender. A sample of the resulting email is visible in figure 3, in which case the main endorsements of the topic selected for the target user are Beach, Relaxation and Food. We performed A/B testing on more than 34 million users, half of whom received the most popular endorsements email Figure 3: The LDA-based email (group A) and half the email with their personalised LDA- based topic selection (group B). The campaign was very beneficial in multiple KPIs, in- cluding users clicking on the campaign, interacting with the website and finally converting from visitors to bookers. More specifically, We observed a 18% uplift in clicks at the same bounce rate, 10% uplift in net conversion and 14% increase in cancellations (22% increase in gross conversion) . All of these di↵erences were statistically significant for a confidence interval of 90%. 5.3 Email 2 The first email campaign used a quite bold statement in the form of “our team of travel scientists ... based on your past adventures they think that you have a passion for... ”, visible in figure 3. This spawned a vivid response from the community on twitter with people being excited about our campaign, people complaining about the recommendations they got and people being sceptical about the access we have to their past data 4. Our original message could have been misleading, as we didn’t look at a person’s activities or endorsements but rather the most common activities in the places they had visited, as explained in section 4.2. We launched a second email campaign where we used a less intrusive message promot- ing endorsements in suggested destinations without a direct reference to how these endorsements were selected. We sent the email to more than 22 Million users with half of them (group A) receiving the old text and half of them (group B) receiving the new text. We saw an impressive increase in net and gross conversion (+7.14% and 10.5%) and a modest increase in cancellations (+4.39%). 5.4 Browsing Menu The Destination Finder provided our users with a search engine that allowed them to find the ideal destination for their desired activity. Although this method o↵ers complete freedom of search to the user, it requires them having a concrete activity in mind before coming to the website. In contrast, most e-shops organise their inventory in intuitive hierarchies, and users are familiar with menu-based naviga- tion. Figure 4: Example of twits related to the latent topic modelling campaign. We produced country-specific menus, where we present as a top menu hierarchy the topics of destinations which bookers from the corresponding country prefer the most. For example, people from the Netherlands get top-level menus for Shopping, Fine Dining, Cycling, Sightseeing and Beach, as opposed to Beach, Food, Historical Landmarks, Museums and Shopping for Britain. The second layer provides the countries where these activi- ties are carried out most often. For example, people from the Netherlands and Britain share destinations like Spain, Italy and Greece for the topic Beach, but they only receive their individual home-country in the second level of their menus. Lastly, each of the countries opens a third level with specific destinations in this country. An example of the menu for a user from the Netherlands is visible in figure 5 The menu-based navigation was a major success. We ex- posed more than 40 million users in a A/B test experiment and observed statistical significance in all the user engage- ments KPIs we had set. Unfortunately, since the menu im- provement is an ongoing project at our website at this mo- ment, we can not publish numerical results. 6. CONCLUSIONS AND FUTURE WORK We have demonstrated a successful application of topic modelling of user generated endorsement data using Latent Dirichlet Allocation. The automatically discovered topics have semantic significance and they can be used to optimise the user engagement experience, for example in the form Figure 5: The LDA-based menu of personalised email marketing and menu-based product browsing. The initial results are very promising but we are still far away from exploiting the potential of the latent topic space. Our destination recommendations are still based on algo- rithms that ignore the underlying topic distribution, and the e↵ect that the individual topics (and their manually-selected names) have on the users is hardly accounted for. At the moment we monitor the individual interactions of each latent topic to each cluster of users in our database, in order to use them not only according to their likelihood but also according to their e↵ectiveness. Moreover, we work on an email campaign based on consecutive follow-ups depen- dent on the user’s responsiveness, e.g. using the second best topic choice if the first one failed to engage the user. 7. ACKNOWLEDGEMENTS We would like to thank Stefano Baccianella, Gabriel Agu for their work on the data processing for the Latent Dirich- let Allocation and production implementation of the menu- based navigation software, as well as Rozina Szogyenyi and Leandro Pinto for their work on the Destination Finder and Fabrizio Salzano for overseeing the implementation of the email marketing campaign. 8. REFERENCES [1] Destination finder webpage. http: //www.booking.com/destinationfinder.en-gb.html. Accessed: 2014-07-10. [2] webpage. http: //www.booking.com/destinationfinder.en-gb.html. Accessed: 2014-07-10. [3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003. [4] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. In Recommender Systems Handbook, pages 1–35. 2011. (source: http://www.ueo-workshop.com/wp-content/uploads/2014/04/noulasKDD2014.pdf)
  • 36. (Hint: In reality you need to think long and hard about problems and then still most experiments fail. Succes lies in increasing the rate of experimentation / decreasing the cost of experimentation.)
  • 38. Do you trust gut feeling over quantified priors?
  • 39. “As a … I want …”, but what is the optimisation goal?
  • 40. If you internally make your data generally available…
  • 41. …and you attract the right kind of data science people…
  • 42. …and you can buy supercomputers for the price of a game console…
  • 43. …then the limiting factor is the maximum rate of experimentation.
  • 44.
  • 45. Do you have an experimentation framework?
  • 46. GoDataDriven We’re hiring / Questions? / Thank you! @fzk frisovanvollenhoven@godatadriven.com Friso van Vollenhoven CTO