 

Optimization of Digital Marketing Campaigns

Armando Vieira, Inesting
  
	
  
Abstract

In this work we apply several clustering, visualization and predictive machine learning techniques to analyse data from digital marketing campaigns. For data exploration we used unsupervised techniques such as k-means, Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and Self-Organizing Maps (SOM). We identified patterns that help the analyst understand the vast amount of data produced by digital trails and that guide their actions (actionable insights). Support Vector Machines and Random Forest algorithms were used for the supervised prediction of conversions.
  	
  
	
  
Keywords: ad optimization, Adwords, predictive analytics, SEO, digital marketing
  	
  

1 Introduction	
  
Online advertising has evolved into a $50 billion industry and continues to grow at double-digit rates. At the same time, powerful web analytics tools, such as Google Analytics, Facebook Insights or Kissmetrics, make key data easily available to anyone who wants to monitor the performance of their online campaigns. For e-commerce sites, the analyst can track every single action of the visitor along the conversion path and answer the fundamental questions of who, what, why, how and when, from lead to purchase.
  	
  
Our interest lies in monitoring the impact campaigns have on website traffic, engagement and revenue (in the case of e-commerce). A principal form of online advertising is the promotion of products and services through search-based advertising. Today's most popular search-based advertising platform is Google Adwords, which has the largest share of revenues. Search remains the largest online advertising revenue format, accounting for 46.5% of 2011 advertising revenues, up from 44.8% in 2010. In 2011, search revenues totalled $14.8 billion, up almost 27% from $11.7 billion in 2010.
  	
  
This gives unprecedented power to the marketing team, but at a cost: huge amounts of unstructured, disparate and complex data have to be processed and many parameters adjusted. The effort required to deal with the number of options and configurations for optimal performance of a company website is simply far beyond human capabilities. Furthermore, some parameters have non-linear interactions: for instance, the quality of the SEO boosts the position of the ad in Adwords campaigns, thus achieving better performance for a lower PPC. The budget allocated to the campaign also influences the ad position. There are even subtler influences and nuances when measuring the ROI. For instance, it is known that although display advertising brings very few direct sales, it may boost the performance of search ads, since users were previously exposed to the product or brand.

To optimize this myriad of parameters we need to rely on machine learning algorithms to extract actionable insights and answer some simple questions, such as: How can I improve my return on investment (ROI)? How can I boost customer engagement? Which products generate most interest? What catalyses sales? Which strategy should I opt for? Which channels should I choose? How much should I invest, when and how? These are very important questions with no single clear answer. Most of them depend on each case, and some are too vague to be answered. Under these circumstances, the safe strategy is to design the ad carefully, select adequate keywords, set the bids, segment the campaign properly and test continuously for fine-tuning. If results are not as expected, look at the data, learn, make corrections, and repeat the cycle.
  
	
  
Most research has focused on the publisher side, trying to devise strategies to maximize the CTR of ads by means of content contextualization and ad personalization, among others [**]. In this work, however, we take the perspective of the advertiser and explore the potential of machine learning tools for prediction and optimization of the marketing strategy. The objective is to maximize the performance and effectiveness of marketing campaigns, namely the Return On Investment (ROI). We propose a system to extract information from Google Analytics and determine the variables that are most important for optimization.
  	
  
The article is organized as follows. In Section 2 we introduce the data and pre-processing. In Section 3 we explore the data and extract relevant features using clustering and projection techniques such as k-means, PCA, MDS and SOM. In Section 4 we introduce supervised learning, where we predict conversions, revenues and user engagement. Finally, in Section 5 some conclusions are drawn.
  

2 Data	
  	
  
2.1 Data extraction and description
  	
  
Data was collected from a customer running campaigns on an e-commerce site with Adwords, Facebook and email marketing. The data, collected daily over a period of 6 months, is described in Table 1. Our main data sources are Google Analytics (GA), which aggregates data from Google Adwords, and Facebook Insights. We focused on inputs that may give us access to insights, namely correlations between conversions and site usage or Adwords campaigns.
  	
  
We used the R package RGoogleAnalytics (RGA) to extract data from Google Analytics into R. We collected data from Adwords, Facebook and email campaigns (Table 1). Data was collected over different timeframes and consolidated by date. In some cases data was broken down by traffic source in GA, and by group segment in the case of Adwords, so each data point corresponds to a specific segment on a specific day. Two data sets were built: Data 1, with Adwords data only, and Data 2, with Analytics, Facebook and email data.
  
	
  

Table 1: Variables used for analysis. The coloured fields are data from campaigns. In the Name column, (M) marks a metric and (D) a dimension.

Group      Variable                    Name       Comments
General    Traffic source              TO (D)     Organic, Email, Adwords, Facebook, Others
General    Visit length                VL (M)
General    Number of visits            NV (M)
General    Bounce rate                 BR (M)
General    Pages per visit             PV (M)
AdWords    Ad/campaign group           CG (D)     Group of Ad
AdWords    Cost per click              CPC (M)
AdWords    Position                    P (M)
AdWords    Type                        T (D)      Search, display
AdWords    Click-through rate          CTR (M)
AdWords    Conversion rate             CRA (M)
AdWords    Impressions                 Imp (M)
Facebook   Click-through rate          CTRf (M)
Facebook   Cost per like               CPL (M)
Facebook   Conversion rate Facebook    CRF (M)
Emails     Emails sent                 Em (M)
Emails     Open rate                   OR (M)
Emails     Click rate                  CT (M)
Emails     Conversion rate email       CRE (M)
-          Total revenue               Re (M)     Revenue from sales

2.2 Performance ratios
  
For visualization purposes, we consider several aggregated metrics to benchmark the performance of a website and of the digital campaigns. We divide the metrics into two major categories: website usability and financial performance. All indexes are defined to have values between 0 and 1. A site can be highly engaging…
  

	
  
Website usability metrics

We define engagement as a composite index, following [8]:

$$ E = \sum_i \left[ \mathrm{Cd}_i + \mathrm{Dd}_i + \mathrm{Id}_i + (1 - \mathrm{Br}_i) \right] $$

where Br is the bounce rate and the other indices are defined below. The sum runs over the values i of any aggregation dimension we may be interested in. The terms are computed from the sessions originating from a particular value of that dimension: visitor id, traffic source, time, etc. This index has the advantage of benchmarking both the quality of the site and the interaction of users with the content.
  	
  
The Click Depth index (Cd) measures how deep visits go into the site and is defined as:

$$ \mathrm{Cd} = \frac{\text{Sessions with at least 4 page views}}{\text{All sessions}} $$

	
  
The Duration Depth index (Dd) measures the intensity of the visits, captured by the duration of visits on the website. It is defined as:

$$ \mathrm{Dd} = \frac{\text{Sessions with a duration of at least 3 min}}{\text{All sessions}} $$

The Interaction Depth index (Id) captures the visitor's interaction with content or functionality designed to increase the level of attention. It is defined as:

$$ \mathrm{Id} = \frac{\text{Sessions where visitors complete an action}}{\text{All sessions}} $$

	
  
where an action can be defined as a goal in GA, ranging from downloading a document to filling in a form or watching a video.
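The indices above can be computed directly from session records. The following is a minimal sketch in Python (not the R environment used in this work); the `Session` fields, the choice of traffic source as the aggregation dimension i, and the three sample sessions are illustrative assumptions, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Session:
    source: str             # traffic source, used here as the dimension i (assumption)
    page_views: int
    duration_s: float       # visit duration in seconds
    completed_action: bool  # a GA goal was completed during the session
    bounced: bool


def indices(sessions):
    """Return (Cd, Dd, Id, Br) for a non-empty list of sessions."""
    n = len(sessions)
    cd = sum(s.page_views >= 4 for s in sessions) / n    # at least 4 page views
    dd = sum(s.duration_s >= 180 for s in sessions) / n  # at least 3 minutes
    id_ = sum(s.completed_action for s in sessions) / n
    br = sum(s.bounced for s in sessions) / n
    return cd, dd, id_, br


def engagement(sessions):
    """E = sum over groups i of Cd_i + Dd_i + Id_i + (1 - Br_i)."""
    groups = {}
    for s in sessions:
        groups.setdefault(s.source, []).append(s)
    total = 0.0
    for group in groups.values():
        cd, dd, id_, br = indices(group)
        total += cd + dd + id_ + (1 - br)
    return total


# Hypothetical sessions, for illustration only.
sessions = [
    Session("organic", 5, 240.0, True, False),
    Session("organic", 1, 10.0, False, True),
    Session("email", 4, 200.0, False, False),
]
print(engagement(sessions))
```

Each per-group term lies between 0 and 4, so dividing by four times the number of groups would be one way to bring E back into the 0 to 1 range; that normalization is an assumption and is not stated in the text.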
  
	
  
Financial metrics
  
Engagement with a website is important, but the really important metrics, especially for e-commerce sites, are sales or leads. These are captured by financial ratios.
  
	
  
There are dozens of financial ratios to measure the efficiency of a sales channel, but we will focus on the following:
  	
  
	
  
• CR, Conversion Rate
• RPC, Revenue Per Channel
• ROI, Return On Investment
  
	
  
The CR is simply defined as:

$$ \mathrm{CR} = \frac{\text{Sessions where visitors purchase a product}}{\text{All sessions}} $$

	
  
Typical CRs are low: 1% is considered very good for most sites, but the CR can be as low as 0.001%. The Revenue Per Channel (RPC) is the total value earned by a sales channel over a fixed period of time. The ROI of a channel is simply the ratio of its revenue to the total investment made in that channel:

$$ \mathrm{ROI} = \frac{\mathrm{RPC}}{\text{Total cost}} $$
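As a worked example of the two ratios, the Python sketch below computes CR and ROI for two channels. The channel names are taken from the text, but all figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def conversion_rate(purchasing_sessions, all_sessions):
    """CR = sessions with a purchase / all sessions."""
    if all_sessions <= 0:
        raise ValueError("all_sessions must be positive")
    return purchasing_sessions / all_sessions


def roi(revenue_per_channel, total_cost):
    """ROI = RPC / total cost, the ratio used in this section."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return revenue_per_channel / total_cost


# Hypothetical per-channel totals over a fixed period (not data from the study).
channels = {
    "adwords": {"sessions": 2000, "purchases": 4, "revenue": 300.0, "cost": 200.0},
    "facebook": {"sessions": 1000, "purchases": 5, "revenue": 450.0, "cost": 150.0},
}

for name, c in channels.items():
    cr = conversion_rate(c["purchases"], c["sessions"])
    print(f"{name}: CR = {cr:.2%}, ROI = {roi(c['revenue'], c['cost']):.2f}")
```

With this definition an ROI above 1 means the channel earned more than it cost; note that it is a revenue-to-cost ratio rather than the net-profit form (revenue minus cost, over cost) used in some other contexts.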

	
  
In Figure 1 we show the evolution of Engagement and ROI over time for the two main traffic sources.
  
Figure 1: Engagement over time (days), using a moving average.
  

	
  

	
  
	
  
In Figure 2 we plot the revenue per origin of traffic. The most important source of revenue was Facebook, while Google Organic ranks second and Adwords third. The most consistent channels are direct traffic and the email newsletter.
  

Figure 2: Revenue distribution per channel (top 6).
  

	
  

3 Data visualization with unsupervised techniques
  
In this section we use several techniques for data exploration and visualization in order to detect patterns and features hidden in high-dimensional data. We use unsupervised techniques, from simpler ones, like k-means, to more elaborate ones, like Self-Organizing Maps (SOM) and Multidimensional Scaling (MDS).
  
	
  
3.1 Adwords data
  
We start by characterizing the data with the box plots in Figure 3, where the number of conversions, the CTR and the CR are displayed for all ad groups in our campaign. Three ad groups account for the majority of conversions (sales): groups 9, 10 and 11. The average CTR is almost constant for most of the groups (around 6%), but in some cases we do not have enough data to evaluate it accurately. The average position is 1.68 and the average CR is 0.2%, with the CR showing greater variability than the CTR.
  	
  
	
  	
  
	
  

	
  

Figure 3: Boxplot of CTR (red), number of conversions (blue) and CR (green) for all Adwords groups.
  
	
  
	
  
In Figure 4 we plot the weekly revenues and costs of the Adwords campaign over a period of 6 months. Initially the campaign was not very efficient, since we ran a trial period to test and optimize its content, targeting and keywords. After week 6, a boost in investment brought a more than proportional increase in sales.
  	
  
	
  
 

Figure 4: Revenue and cost per week on Adwords campaigns.
  
	
  
Clustering	
  
We then cluster the data using the k-means algorithm. K-means is one of the simplest and most widely used algorithms for unsupervised clustering. The only inputs are the number of clusters k and the metric used to calculate the distances between points. We tested the algorithm with two to five clusters, using the Euclidean distance, on the Adwords data. The best compromise between intra-cluster and inter-cluster distance was achieved at k = 3 clusters. Results are presented in Figure 5, where we selected CTR and number of clicks as representative axes. The three patterns are very clear in this figure, and the centroids are presented in Table 2. It can be seen that most conversions come from the green group, which corresponds to the largest number of visits and clicks. The number of page visits is also a strong indicator of revenue. A companion figure (its cross-reference is missing in the source) shows the clustering on page views and visitors. CTR, CPC and position are almost the same for the three groups.
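The procedure described above (run k-means for k from two to five and compare the within-cluster distances) can be sketched as follows. This is a plain-Python illustration of Lloyd's algorithm, not the R code used in this work; the deterministic farthest-point seeding is chosen here only to make the sketch reproducible, and the nine two-dimensional points are synthetic. Real metrics such as cost, clicks and impressions would need to be scaled first, since the Euclidean distance is otherwise dominated by the largest-valued variable.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


def within_ss(points, centroids, labels):
    """Total within-cluster sum of squared distances."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, labels))


# Synthetic, already-scaled points forming three groups (illustration only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (0, 10), (1, 10), (0, 11)]
for k in range(2, 6):
    centroids, labels = kmeans(points, k)
    print(k, round(within_ss(points, centroids, labels), 2))
```

Plotting the within-cluster sum of squares against k and looking for the point where the curve flattens is one common way to pick the number of clusters; the exact criterion used in this work is not specified beyond the intra versus inter cluster compromise.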
  

Figure 5: K-means algorithm with 3 clusters for data set 1.
  

	
  
 
Table 2: Centres of the 3 clusters obtained by k-means for the Adwords data set. CTR is given as a fraction.

Cluster   Cost   Clicks   Imp.   Revenue   CTR    CPC    Position
1         56.7   327      4739   85.1      0.07   0.14   1.79
2         81.7   474      6610   124.9     0.08   0.15   1.71
3         20.8   73       1194   14.1      0.06   0.17   1.30
  

	
  
	
  

	
  
In Figure 6 we plot the correlation graph for the Adwords data set, obtained with the R function qgraph. There are strong correlations between **.???
  

	
  

Figure 6: Correlations with qgraph.
  

	
  

	
  

3.2 PCA	
  
Principal Component Analysis is one of the oldest and most widely used approaches to compress high-dimensional data into a small set of linear components. It has the disadvantage of being a linear model, but it is still very useful. In Figure 7 we plot the eigenvalues of the components in a two-dimensional plot. Two main principal components are clearly seen. Note that conversions are highly correlated with ad groups.
  
 

Figure 7: PCA for the Adwords data (left) and the Google Analytics data (right).
  

	
  

	
  

3.3 SOM	
  
The self-organizing map (SOM) is an unsupervised neural network proposed by Kohonen (Kohonen 2001) for visual cluster analysis. The neurons of the map are located on a regular grid embedded in a low-dimensional (usually 2 or 3) space, and are associated with the cluster prototypes. In the course of the learning process, the neurons compete with each other through the best-matching principle, i.e., the input is projected to the nearest neuron under a defined distance metric. The winner neuron and its neighbours on the map are adjusted towards the input in proportion to the neighbourhood distance; consequently, neighbouring neurons are likely to represent similar patterns of the input data space. Because it combines data clustering with spatialization through a topology-preserving projection, SOM is widely used in visual clustering applications.
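The learning rule described above (find the best-matching neuron, then pull it and its grid neighbours towards the input) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used in this work: the Gaussian neighbourhood, the linear decay of the learning rate and radius, and the default parameter values are all choices made for the sketch, and the sample data is synthetic. The 6x5 grid matches the map size reported in Figure 8.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid cell whose prototype is nearest to x (Euclidean distance)."""
    return min(weights, key=lambda cell: math.dist(weights[cell], x))


def train_som(data, rows=6, cols=5, epochs=30, lr0=0.5, seed=0):
    """Train a rows x cols map; inputs are assumed to be scaled to [0, 1]."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One prototype vector per grid cell, randomly initialised.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            decay = 1 - step / total_steps
            lr = lr0 * decay                   # learning rate shrinks over time
            sigma = max(sigma0 * decay, 0.5)   # so does the neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


# Synthetic two-dimensional inputs in [0, 1] (illustration only).
data = [(0.10, 0.10), (0.15, 0.20), (0.20, 0.10),
        (0.90, 0.90), (0.85, 0.80), (0.80, 0.95)]
som = train_som(data)
for x in data:
    print(x, "->", best_matching_unit(som, x))
```

After training, each input is displayed at the grid position of its best-matching unit, which is what produces the visual clusters on the map.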
  	
  	
  
SOM is well suited to analysing the high-dimensional data of digital metrics. A range of research groups concentrate on the bankruptcy prediction problem, usually solved as a classification task that separates companies into distressed and healthy categories (binary) or into a number of predefined credit ratings (multi-class).
  	
  
SOM is used to determine the class through visual exploration (Merkevicius, Garsva & Simutis 2004). An enhanced version of Learning Vector Quantization (LVQ) can boost the prediction performance of a multi-layer perceptron neural network (Neves & Vieira 2006). In combination with independent component analysis for dimensionality reduction, LVQ has been employed to recognize distressed French companies (Chen & Vieira 2009).
  
Figure 8: SOM for data set 1 (Adwords campaigns) on a 6x5 = 30 cell space.
  

	
  

3.4 MDS	
  
SOM methods, presented previously, involve the estimation of the conditional probability, which is computationally expensive and hard to extract. Here we test the Multidimensional Scaling (MDS) algorithm. MDS is a non-linear approach, mostly used for visualization, that captures the level of similarity between the individual cases of a dataset. It is used to display the information contained in a distance matrix, evaluated according to some metric. The MDS algorithm places each object in an N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the object locations for a two-dimensional scatterplot (Figure 9).
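"Preserved as well as possible" is usually quantified by a stress function that compares the original distances with those in the low-dimensional layout. The Python sketch below computes one such normalized stress; the particular normalization (by the sum of squared original distances) is an assumption made for illustration, and the points are synthetic. It evaluates a given layout and does not perform the MDS optimization itself.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def stress(original, embedded):
    """Normalized stress: 0 means all between-object distances are preserved."""
    d_orig = pairwise_distances(original)
    d_emb = pairwise_distances(embedded)
    residual = sum((d_orig[pair] - d_emb[pair]) ** 2 for pair in d_orig)
    scale = sum(d ** 2 for d in d_orig.values())
    return math.sqrt(residual / scale)


# Three synthetic objects that happen to lie in a plane of a 3-D space.
original = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
good_layout = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]   # keeps every distance
bad_layout = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]    # collapses all objects

print(stress(original, good_layout))
print(stress(original, bad_layout))
```

An MDS routine searches for the layout that minimizes a quantity of this kind for the chosen number of dimensions N.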
  

Figure 9: Aggregation by MDS on data set 2. Colours represent revenue levels (black = lowest, light blue = highest).
  

	
  

	
  

3.5 Heatmaps and ROI
  
We now investigate the return on investment (ROI) of the Adwords and Facebook campaigns. The Facebook campaign ran over the same period as the Adwords campaign, with a daily budget between 10 and 40 euros (Figure 10). The ROI is in general greater than 1, meaning that the campaign is producing good results. When we consider the global performance (sales originating from all channels), the ROI almost doubles, counting as cost only the investment in Adwords and Facebook.
  

	
  
Figure 10: ROI over time (days), using moving averages: (red) Adwords, (blue) Total.
  
	
  
We now plot the ROI for the paid channels. Email is number one, as expected, due to the small cost of promotion. ROI and Eng for Data 1. **
  
	
  

Heat maps
  
Heat maps are a good visualization method for data exploration and for suggesting explanations. Here we use them to lay out conversions and engagement on a calendar in order to spot trends visually. We use the ggplot2 library to create a calendar heatmap with data from 6 months. We plot engagement, visits and transactions on the calendar to get a perspective on how they interact over time.
  
In this case it is interesting to note that Tuesdays have a high number of visits, but Wednesday has been the day when most transactions occur. Visits increase towards the end of the year (shopping season) and then slow down towards the start of the year. Engagement has been improving over time.
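The weekday pattern read off the calendar heatmap can also be checked numerically by totalling a daily metric per day of the week. The Python sketch below shows the aggregation; the dates and visit counts are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def totals_by_weekday(daily):
    """Sum a daily metric per weekday; `daily` is an iterable of (date, value)."""
    totals = defaultdict(float)
    for day, value in daily:
        totals[WEEKDAYS[day.weekday()]] += value  # weekday(): Monday is 0
    return dict(totals)


# Hypothetical daily visit counts (illustration only).
visits = [
    (date(2024, 1, 2), 120),   # Tuesday
    (date(2024, 1, 3), 90),    # Wednesday
    (date(2024, 1, 9), 130),   # Tuesday
    (date(2024, 1, 10), 95),   # Wednesday
]
by_day = totals_by_weekday(visits)
busiest = max(by_day, key=by_day.get)
print(by_day, busiest)
```

Running the same aggregation on transactions instead of visits gives the second half of the comparison made in the text.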
  	
  
 

Figure 11: Calendar heatmap of visits (top) and revenue (bottom) over the last 6 months.
  	
  

	
  

	
  

	
  

4 Supervised Learning for Revenue Prediction
  
In the previous sections we explored the data patterns without concern for causality between observations (unsupervised learning). In this section we go a step further and use supervised learning to make predictions based on past records. This is very important because it provides explanation, "the why" instead of "the what", as we enter the field of predictive analytics.
  	
  
First we consider the problem from a broader perspective: can we predict the revenue from a certain channel by looking at the traffic data generated? If so, with how much accuracy and confidence? How does the behaviour of a user who finalizes a purchase differ from that of other users? To answer these questions we run supervised algorithms trained on past data and perform a classification analysis.
  	
  
As a first step, we enrich our data by extracting extra metrics, drilled down by 5 dimensions (time, traffic source, Adwords ad group, operating system, and city). The metrics used are: number of visits, average pages per visit, average visit duration, bounce rate, visit depth, CTR, page load time, social interaction, and cost of ads on Adwords and Facebook. From these metrics we derive the additional performance ratios described in Section 2.2. As far as traffic sources are concerned, we selected only the top 10 performers. We count a conversion when at least one sale is concluded. All data is aggregated with a daily granularity.
  	
  
	
  
We run the algorithms as a classification task, trying to predict whether a given session leads to a conversion. The data set contains 5680 sessions, of which 432 have conversions. We used Support Vector Machines and the Random Forest algorithm since they can easily deal with categorical and continuous inputs, can be trained with relatively few examples, and are fairly robust to overfitting.
  	
  	
  
	
  
Since many more visits lead to non-conversions than to conversions, we create a balanced data set by randomly eliminating entries that do not lead to conversions. We end up with 864 training examples (the 432 conversions plus an equal number of non-conversions). All data was normalized and the algorithms were tested using 10-fold cross-validation.
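The balancing step can be sketched as random undersampling of the majority class. The Python code below is an illustration rather than the code used in this work: the row layout and the `converted` field are assumptions, and the rows are a synthetic stand-in that only reproduces the class counts reported above (5680 sessions, 432 conversions).

```python
import random


def undersample(rows, is_positive, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    positives = [r for r in rows if is_positive(r)]
    negatives = [r for r in rows if not is_positive(r)]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid having the classes in two ordered blocks
    return balanced


# Synthetic stand-in with the class counts reported in the text.
rows = [{"id": i, "converted": i < 432} for i in range(5680)]
balanced = undersample(rows, lambda r: r["converted"])
print(len(balanced), sum(r["converted"] for r in balanced))
```

Undersampling discards most of the non-converting sessions, which keeps training simple at the price of throwing information away; class weighting would be an alternative, though it is not what the text describes.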
  
In Figure 13 we plot the ROC curve obtained over a period of 165 days. The AUC obtained with Random Forest was 0.84. For comparison, we used SVM and the AUC = **. This is a somewhat surprising result given the small set of inputs. In order to separate out the traffic from Adwords, we also ran the algorithm without traffic from this source. The results improved slightly.
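For readers unfamiliar with the AUC, it equals the probability that a randomly chosen converting session receives a higher score than a randomly chosen non-converting one. The Python sketch below computes it from that pairwise definition; the labels and scores are made up for illustration and are unrelated to the 0.84 reported above.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = conversion, 0 = no conversion.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.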
  	
  
Random Forest returns several measures of variable importance. The most reliable measure is based on the decrease in classification accuracy when the values of a variable are permuted randomly, and this is the measure of variable importance used here.
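The idea behind this measure can be illustrated independently of Random Forest: shuffle one input column, re-score the model, and record how much the accuracy drops. The Python sketch below is a model-agnostic version of that idea, not the out-of-bag procedure built into Random Forest implementations; the threshold rule standing in for a trained classifier and the ten data rows are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: column 0 drives the label, column 1 is irrelevant.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0], [0.45, 3.0],
     [0.6, 5.0], [0.7, 1.0], [0.8, 4.0], [0.9, 2.0], [0.95, 3.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def predict(row):
    """Stand-in for a trained classifier: a threshold rule on column 0."""
    return int(row[0] > 0.5)


print(permutation_importance(predict, X, y))
```

A variable the model ignores gets an importance of exactly zero, while shuffling a variable the model relies on lowers the accuracy.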
  	
  
	
Table 3 presents the best discriminating indicators for predicting conversions, led by traffic origin and the number of visits (see also Figure 12).
  
	
  
 
Figure 12: Dispersion of inputs for data set 2.
  

	
  

Figure 13: ROC curve for the conversion prediction with the Random Forest and SVM algorithms. FPR: false positive rate; TPR: true positive rate.
  
Table 3: Best performing conversion prediction indicators for the two datasets.

Rank   All variables       All without Adwords
1      Traffic source      Number of visits
2      Number of visits    Bounce rate
3      Bounce rate         Visit length
4      Visit length        Time on site
5 Conclusions
  

In this work we used a set of machine learning techniques for data exploration and predictive analytics. We showed that exploratory tools can help to understand the dynamics of digital campaigns.
  	
  
	
  
We used Random Forest (a collection of decision trees) and SVM to predict conversions with reasonable accuracy. The most important features are the number of visits, the origin of traffic and the visit duration. Surprisingly, we found that CTR and CR have little influence as predictors of conversions.
  	
  


Optimization of digital marketing campaigns

  • 1.   Optimization  of  Digital  Marketing  Campaigns     Armando  Vieira,  Inesting     Abstract   In   this   work   we   apply   several   clustering,   visualization   and   predictive   machine   learning   techniques   to   analyse   data   from   digital   marketing   campaigns.   For   data   exploration   we   used   unsupervised  techniques  like  k-­‐means,  Principal  Component  Analysis  (PCA),  Multidimensional   Scaling   (MDS)   and   Self-­‐Organized   Maps   (SOM).   We   identified   patterns   that   help   the   analyst   understand   the   vast   amount   of   data   produced   by   digital   trails   and   guide   their   actions   (actionable   insights).   Support   Vector   Machines   and   Random   Forest   algorithm   were   used   for   supervised  learning  of  conversions  prediction.       Keywords:  ad  optimization,  Adwords,  Predictive  Analytics,  SEO,  digital  marketing     1 Introduction   Online   advertising   has   evolved   into   a   $50   billion   industry   and   continues   to   grow   by   double   digits.   On   the   other   hand,   powerful   web   analytic   tools,   such   as   Google   Analytics,   Facebook   Insights  or  Kissmetrics,  provide  key  data  easily  available  to  anyone  who  wants  to  monitor  the   performance   of   their   campaigns   online.   For   e-­‐commerce   sites,   the   analyst   has   the   ability   to   track  every  single  action  of  the  visitor  over  the  conversion  path  and  answer  the  fundamental   questions:  who,  what,  why,  how  and  when,  from  a  lead  to  the  purchase.     Our   interest   lies   in   monitoring   the   impact   campaigns   have   on   website   traffic,   engagement  and  revenue  (in  the  case  of  e-­‐commerce).    A  principal  form  of  online  advertising  is   the   promotion   of   products   and   services   through   search-­‐based   advertising.   
Today’s   most   popular   search-­‐based   advertising   platform   is   Google   Adwords,   having   the   largest   share   of   revenues.  Search  remains  the  largest  online  advertising  revenue  format,  accounting  for  46.5%   of  2011  advertising  revenues,  up  from  44.8%  in  2010.  In  2011,  Search  revenues  totalled  $14.8   billion,  up  almost  27%  from  $11.7  billion  in  2010.     This   gives   an   unprecedent   power   to   the   marketing   team   but   at   a   cost:   the   huge   amounts  of  unstructured,  disparate  and  complex  data  to  be  processed  and  parameters  to  be   adjusted.   The   effort   required   to   deal   with   the   number   of   options   and   configurations   for   optimal  performance  of  a  company  website  is  simple  far  beyond  human  capabilities.   Furthermore  some  parameters  have  non-­‐linear  interactions:  for  instance  the  quality  of   the   SEO   boosts   the   position   of   the   Ad   in   Adwords   campaigns,   thus   achieving   a   better   performance   for   a   lower   PPC.   The   budget   allocated   to   the   campaign   also   influences   the   Ad   position.  There  are  even  subtler  influences  and  nuances  when  measuring  the  ROI.  For  instance,   it   is   known   that   although   display   advertising   brings   very   little   direct   sales,   it   may   boost   the   performance  of  search  Ads  since  users  where  previously  exposed  to  the  product  or  brand.   To  optimize  this  myriad  of  parameters  we  need  to  rely  on  machine  learning  algorithms   to   extract   actionable   insights   and   answers   some   simple   questions   like:   how   to   improve   my   return   on   investment   (ROI)?   How   to   boost   costumer   engagement?     What   product   generate   most   interest?   What   catalysis   sales?   What   strategy   to   opt?   What   channels   to   choose?   How   much  should  I  invest?  When,  how?  
These are very important questions with no single clear answer. Most of them depend on each case, and some are too vague to be answered.

Under these circumstances, the safe strategy starts by carefully designing an ad, selecting adequate keywords, setting the bids, segmenting the campaign properly and testing continuously for
fine-tuning. If results are not as expected, then look at the data, learn, make corrections, and repeat the cycle.

Most of the research has focused on the publisher side, trying to devise strategies to maximize the CTR of Ads by means of content contextualization and ad personalization, among others [**]. In this work, however, we take the perspective of the advertiser and explore the potential of machine learning tools for prediction and optimization of the marketing strategy. The objective is to maximize the performance and effectiveness of marketing campaigns, namely the Return On Investment (ROI). We propose a system to extract information from Google Analytics and determine which of it is most important for optimization.

The article is organized as follows. In Section 2 we introduce the data and pre-processing. In Section 3 we explore the data and extract relevant features using unsupervised techniques such as k-means, PCA, MDS and SOM. In Section 4 we introduce supervised learning, where we predict conversions, revenues and user engagement. Finally, in Section 5 some conclusions are drawn.

2 Data

2.1 Data Extraction and description
Data was collected from a customer running campaigns on an e-commerce site through Adwords, Facebook and email marketing. The data, collected at a daily frequency over a period of 6 months, is described in Table 1. Our main data sources are Google Analytics (GA), which aggregates data from Google Adwords, and Facebook Insights.
We focused on inputs that may give us access to insights, namely correlations between conversions and site usage or Adwords campaigns.

We used the package RGoogleAnalytics (RGA) to extract data from Google Analytics into R. We collected data from Adwords, Facebook and email campaigns (Table 1). Data was collected over different timeframes and consolidated by date. In some cases, data was decomposed by traffic source in GA, and by ad group in the case of Adwords, so each data point corresponds to a specific segment on a specific day. Two data sets were built: Data 1, with Adwords data only, and Data 2, with Analytics, Facebook and email data.

Table 1: Variables used for analysis. The coloured fields are data from campaigns. (M) denotes a metric and (D) a dimension.

| Type | Variable | Name (Metric/Dimension) | Comments |
| General | Traffic source | TO (D) | Organic, Email, Adwords, Facebook, Others |
| General | Visit length | VL (M) | |
| General | Number of visits | NV (M) | |
| General | Bounce rate | BR (M) | |
| General | Pages per visit | PV (M) | |
| AdWords | Ad/campaign group | CG (D) | Group of Ad |
| AdWords | Cost per Click | CPC (M) | |
| AdWords | Position | P (M) | |
| AdWords | Type | T (D) | Search, display |
| AdWords | Click Through Rate | CTR (M) | |
| AdWords | Conversion Rate | CRA (M) | |
| AdWords | Impressions | Imp (M) | |
| Facebook | Click through rate | CTRf (M) | |
| Facebook | Cost per like | CPL (M) | |
| Facebook | Conversion Rate Facebook | CRF (M) | |
| Emails | Emails Sent | Em (M) | |
| Emails | Open Rate | OR (M) | |
| Emails | Click Rate | CT (M) | |
| Emails | Conversion Rate email | CRE (M) | |
| | Total revenue | Re (M) | Revenue from sales |

2.2 Performance Ratios
For visualization purposes, we consider several aggregated metrics to benchmark the performance of a website and of the digital campaigns. We divide the metrics into two major categories: website usability and financial performance. All indexes are defined to have values between 0 and 1.

A site can be highly engaging…

Website usability metrics
We define engagement as a composite index, following [8]:

\[ E = \sum_i \left[ Cd_i + Dd_i + Id_i + (1 - Br_i) \right] \]

where Br is the bounce rate and the other indices are defined below. The sum runs over any aggregation that we may be interested in. The coefficients are obtained from sessions originating from a particular dimension: visitor id, traffic source, time, etc. This index has the advantage of benchmarking the quality of the site and the interaction of users with the content.

The Click Depth index (Cd) measures the depth of visits and is defined as:

\[ Cd = \frac{\text{Sessions with at least 4 page views}}{\text{All sessions}} \]

The Duration Depth index (Dd) measures the intensity of the visits, captured by the duration of visits on the website. It is defined as:
\[ Dd = \frac{\text{Sessions with a duration of at least 3 min}}{\text{All sessions}} \]

The Interaction Depth index (Id) captures the visitor's interaction with content or functionality designed to increase the level of attention. It is defined as:

\[ Id = \frac{\text{Sessions where visitors complete an action}}{\text{All sessions}} \]

where an action can be defined as a goal on GA, from downloading a document to filling in a form or watching a video.

Financial metrics
Engagement with a website is important, but the really important metrics, especially for e-commerce sites, are sales or leads. These are captured by financial ratios.

There are dozens of financial ratios to measure the efficiency of a sales channel, but we will focus on the following:

• CR, Conversion Rate
• RPC, Revenue Per Channel
• ROI, Return On Investment

The CR is simply defined as:

\[ CR = \frac{\text{Sessions where visitors purchase a product}}{\text{All sessions}} \]

Typical CRs are low: 1% is considered very good for most sites, but it can be as low as 0.001%.

The Revenue Per Channel (RPC) is the total value earned by a sales channel over a fixed period of time. The ROI of a channel is simply the ratio of revenue to the total investment made in this channel:

\[ ROI = \frac{RPC}{\text{Total cost}} \]

In Figure 1 we show the evolution of Engagement and ROI over time for the 2 main traffic sources.
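The ratios defined in this section can be computed directly from session records. The following is a minimal sketch in Python, not the code used in this work (which relied on R). The `Session` fields and the example values are hypothetical; the 4-page and 3-minute thresholds are the ones given in the definitions of Cd and Dd.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One visit; all field names are hypothetical."""
    page_views: int
    duration_s: float
    completed_action: bool  # reached a goal defined in GA
    bounced: bool
    purchased: bool


def share(sessions, predicate):
    """Fraction of sessions satisfying predicate (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if predicate(s)) / len(sessions)


def engagement(groups):
    """E = sum over groups i of Cd_i + Dd_i + Id_i + (1 - Br_i)."""
    total = 0.0
    for sessions in groups:
        cd = share(sessions, lambda s: s.page_views >= 4)    # click depth
        dd = share(sessions, lambda s: s.duration_s >= 180)  # duration depth
        idx = share(sessions, lambda s: s.completed_action)  # interaction depth
        br = share(sessions, lambda s: s.bounced)            # bounce rate
        total += cd + dd + idx + (1.0 - br)
    return total


def conversion_rate(sessions):
    """CR = sessions with a purchase / all sessions."""
    return share(sessions, lambda s: s.purchased)


def roi(revenue, total_cost):
    """ROI = revenue of a channel / total cost of that channel."""
    return revenue / total_cost


# Made-up sessions forming a single aggregation group.
example_sessions = [
    Session(5, 200.0, True, False, True),
    Session(1, 10.0, False, True, False),
    Session(4, 60.0, False, False, False),
    Session(2, 240.0, True, False, False),
]
print(engagement([example_sessions]))     # 2.25
print(conversion_rate(example_sessions))  # 0.25
print(roi(150.0, 100.0))                  # 1.5
```

With a single group, E is the sum of four terms that each lie between 0 and 1, so it ranges from 0 to 4; summing over several groups scales that range accordingly.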
Figure 1: Engagement over time (days), using a moving average.

In Figure 2 we plot the revenue per origin of traffic. The most important source of revenue was Facebook, while Google Organic ranks second and Adwords third. The most consistent channels are Direct traffic and the email newsletter.

Figure 2: Revenue distribution per channel (top 6).

3 Data visualization with unsupervised techniques
In this section we use several techniques for data exploration and visualization in order to detect patterns and features that are hidden in high-dimensional data. We use unsupervised techniques, from simpler ones, like k-means, to more elaborate ones, like Self-Organizing Maps (SOM) and Multidimensional Scaling (MDS).
3.1 Adwords Data
We start by characterizing the data with the box plots in Figure 3, where the number of conversions, the CTR and the CR are displayed for all Ad groups in our campaign. Three Ad groups account for the majority of conversions (sales): groups 9, 10 and 11. The average CTR is almost constant for most of the groups (around 6%), but in some cases we do not have enough data to evaluate it accurately. The average position is 1.68 and the average CR is 0.2%, showing greater variability than the CTR.

Figure 3: Boxplot of CTR (red), number of conversions (blue) and CR (green) for all Adwords groups.

In Figure 4 we plot the weekly revenues and costs of the Adwords campaign over a period of 6 months. Initially the campaign was not very efficient, since we ran a trial period to test and optimize its content, targeting and keywords. After week 6, a boost in investment also brought a more than proportional increase in sales.
Figure 4: Revenue and cost per week on Adwords campaigns.

Clustering
We then cluster the data using the k-means algorithm. K-means is one of the simplest and most widely used algorithms for unsupervised clustering. The only inputs are the number of clusters k and the metric used to calculate the distances between points. We tested the algorithm with two to five clusters using the Euclidean distance on the Adwords data. The optimum compromise between intra-cluster and inter-cluster distance was achieved at k = 3 clusters. Results are presented in Figure 5, where we selected CTR and number of Clicks as representative axes. The three patterns are very clear in this figure and the centroids are presented in Table 2. It can be seen that most conversions come from the green group, which corresponds to the greatest number of visits and clicks. The number of page visits is also a strong indicator of revenue. The clustering on page views and visitors was also plotted (figure reference missing). CTR, CPC and position are almost the same for the three groups.

Figure 5: K-means algorithm with 3 clusters for data set 1.
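The clustering step described above can be sketched as follows. This is a minimal pure-Python version of the standard k-means iteration (Lloyd's algorithm) with Euclidean distance, not the implementation used in this work. The toy points, the `init` parameter and the function names are assumptions made for illustration. Comparing the within-cluster sum of squares for several values of k is one simple way to look for the compromise mentioned in the text.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    for _ in range(max_iter):
        assignment = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    assignment = [nearest(p, centroids) for p in points]
    return centroids, assignment


def within_cluster_ss(points, centroids, assignment):
    """Sum of squared distances from each point to its centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))


# Toy data: three well-separated groups of made-up points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
centroids3, labels3 = kmeans(points, 3, init=[points[0], points[3], points[6]])
centroids2, labels2 = kmeans(points, 2, init=[points[0], points[3]])
wss3 = within_cluster_ss(points, centroids3, labels3)
wss2 = within_cluster_ss(points, centroids2, labels2)
print(labels3, wss3, wss2)
```

With random initialization the algorithm can stop in a poor local optimum, so in practice it is usually run from several starting points and the best result is kept.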
Table 2: Centres of the 3 clusters obtained by k-means for the Adwords data set.

| Cluster | Cost | Clicks | Imp. | Revenue | CTR (fraction) | CPC | Position |
| 1 | 56.7 | 327 | 4739 | 85.1 | 0.07 | 0.14 | 1.79 |
| 2 | 81.7 | 474 | 6610 | 124.9 | 0.08 | 0.15 | 1.71 |
| 3 | 20.8 | 73 | 1194 | 14.1 | 0.06 | 0.17 | 1.30 |

In Figure 6 we plot the correlation graph for the Adwords data set, obtained with the R function qgraph. There are strong correlations between **.???

Figure 6: Correlations obtained with qgraph.

3.2 PCA
Principal Component Analysis is one of the oldest and most widely used approaches to compress high-dimensional data into a subset of linear components. It has the disadvantage of being a linear model, but it is still very useful. In Figure 7 we plot the components in a two-dimensional plot. Two main principal components are clearly seen. Note that conversions are highly correlated with ad groups.
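To illustrate the idea behind PCA, the sketch below computes the leading principal component of a small data set by power iteration on the covariance matrix, together with the share of variance it explains. It is a pure-Python illustration under stated assumptions (made-up data, hypothetical function names), not the procedure used to produce Figure 7.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no direction to find
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return eigenvalue, v


# Made-up data lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov = covariance_matrix(rows)
eigenvalue, component = leading_component(cov)
explained = eigenvalue / sum(cov[j][j] for j in range(len(cov)))
print(component, explained)
```

Because the example points are perfectly collinear, a single component carries all the variance; on real campaign data the explained share of each component indicates how many components are worth plotting.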
Figure 7: PCA for the Adwords data (left) and the Google Analytics data (right).

3.3 SOM
The self-organizing map (SOM) is an unsupervised neural network proposed by Kohonen (Kohonen 2001) for visual cluster analysis. The neurons of the map are located on a regular grid embedded in a low-dimensional space (usually 2 or 3 dimensions) and are associated with the cluster prototypes. During the learning process, the neurons compete with each other through the best-matching principle, i.e., the input is projected to the nearest neuron using a defined distance metric. The winning neuron and its neighbours on the map are adjusted towards the input in proportion to the neighbourhood distance; consequently, neighbouring neurons are likely to represent similar patterns of the input data space. Because it combines data clustering with spatialization through a topology-preserving projection, SOM is widely used in visual clustering applications.

SOM is very appropriate for analysing the high-dimensional data of digital metrics. A range of research groups have concentrated on the bankruptcy prediction problem, usually solved as a classification task that separates companies into distressed and healthy categories (binary) or into a number of predefined credit ratings (multi-class).

SOM is used to determine the class through visual exploration (Merkevicius, Garsva & Simutis 2004). An enhanced version of Learning Vector Quantization (LVQ) can boost the prediction performance of a multi-layer perceptron neural network (Neves & Vieira 2006).
Combined with independent component analysis for dimensionality reduction, LVQ has been employed to identify distressed French companies (Chen & Vieira 2009).
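The learning rule described in this section (find the best-matching neuron, then pull it and its grid neighbours towards the input) can be sketched in a few lines. The version below is a minimal illustration, assuming a rectangular grid, a Gaussian neighbourhood and linearly decaying learning rate and radius. These choices, the toy data and all names are assumptions, not the configuration used for Figure 8.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid cell whose prototype is closest to x (Euclidean distance)."""
    return min(weights, key=lambda cell: math.dist(weights[cell], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # One prototype per grid cell, initialised from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            decay = 1.0 - step / total_steps
            lr = lr0 * decay                   # learning rate shrinks over time
            sigma = sigma0 * decay + 1e-3      # so does the neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])  # move prototype towards x
            step += 1
    return weights


# Made-up two-dimensional samples forming two groups.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
som = train_som(data, rows=3, cols=2)
for sample in data:
    print(sample, "->", best_matching_unit(som, sample))
```

After training, each sample is assigned to the grid cell of its best-matching neuron, which is the basis for a map such as the 6x5 grid described in the paper.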
Figure 8: SOM for data set 1 (Adwords campaigns) on a 6x5 = 30 cell grid.

3.4 MDS
SOM methods, presented previously, involve the estimation of conditional probabilities, which is computationally expensive and hard to extract. Here we test the Multidimensional Scaling (MDS) algorithm. MDS is a non-linear approach, mostly used for visualization, that captures the level of similarity of the individual cases of a dataset. It is used to display the information contained in a distance matrix, evaluated according to some metric. The MDS algorithm places each object in an N-dimensional space such that the between-object distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions N of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the object locations for a two-dimensional scatterplot (Figure 9).

Figure 9: Aggregation by MDS on data set 2. Colours represent revenue levels (black = lowest, light blue = highest).

3.5 Heatmaps and ROI
We now investigate the return on investment (ROI) of the Adwords and Facebook campaigns. The Facebook campaign ran over the same period as the Adwords campaign, with a daily budget between
10 and 40 euros (Figure 10). The ROI is in general greater than 1, meaning that the campaign is producing good results. When we consider the global performance (sales originating from all channels), the ROI almost doubles, considering as cost only the investment in Adwords and Facebook.

Figure 10: ROI over time (days), using moving averages: Adwords (red), Total (blue).

We now plot the ROI for the paid channels. Email is number one, as expected, due to the small cost of promotion. ROI and Eng for Data 1. **

Heat maps
Heat maps are a good visualization method for data exploration and causality explanation. In this case we use them to lay out conversions and engagement on a calendar in order to visually spot trends. We use the ggplot2 library to create a calendar heatmap with data from 6 months. We plot engagement, visits and transactions on a calendar to get a perspective on how they interact over the timeline.

It is interesting to note that Tuesdays are high-visit days, but Wednesday is the day when most transactions occur. Visits increase towards the end of the year (shopping season) and then slow down at the start of the year. Engagement has been improving over time.
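The data preparation behind a calendar heatmap amounts to arranging daily values into a week-by-weekday grid. The sketch below shows that step and a per-weekday average in Python. It only prepares the numbers and does not draw the plot, which in this work was produced with ggplot2. The dates and visit counts are made up, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def calendar_grid(daily):
    """Map (ISO year, ISO week) -> 7 values (Monday..Sunday), None if missing."""
    grid = defaultdict(lambda: [None] * 7)
    for day, value in daily.items():
        iso_year, iso_week, iso_weekday = day.isocalendar()
        grid[(iso_year, iso_week)][iso_weekday - 1] = value
    return dict(grid)


def weekday_means(daily):
    """Average value per weekday, index 0 = Monday ... 6 = Sunday."""
    sums, counts = [0.0] * 7, [0] * 7
    for day, value in daily.items():
        sums[day.weekday()] += value
        counts[day.weekday()] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up visits for 14 consecutive days, with a bump on Tuesdays.
start = date(2012, 10, 1)
visits = {}
for offset in range(14):
    day = start + timedelta(days=offset)
    visits[day] = 100 + (10 if day.weekday() == 1 else 0)

grid = calendar_grid(visits)
means = weekday_means(visits)
busiest = means.index(max(means))  # 1 means Tuesday
print(grid)
print(means, busiest)
```

Each row of `grid` corresponds to one calendar week and each column to a weekday, which is the layout a calendar heatmap colours cell by cell.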
Figure 11: Heatmap calendar for visits (top) and revenue (bottom) over the last 6 months.

4 Supervised Learning for Revenue Prediction
In previous sections we explored the data patterns without concern for causality between observations (unsupervised learning). In this section we go a step further and use supervised learning to make predictions based on past records. This is very important as it provides explanation, "the why" instead of "the what", as we enter the field of predictive analytics.

First we consider the problem from a broader perspective: can we predict the revenue from a certain channel by looking at the traffic data generated? If so, with how much accuracy and confidence? How does the behaviour of a user who finalizes a purchase differ from that of other users? To answer these questions we run supervised algorithms trained with past data and perform classification analysis.

As a first step, we enrich our data by extracting extra metrics, drilled down by 5 dimensions (time, traffic source, Adwords ad group, operating system and city). The metrics used are: number of visits, average pages per visit, average visit duration, bounce rate, visit depth, CTR, page load time, social interaction and cost of ads on Adwords and Facebook. From these metrics we extract the additional performance ratios described in Section 2.2. Regarding traffic sources, we selected only the top 10 performers. We consider a conversion to occur when at least one sale is concluded. All data is aggregated with a daily granularity.

We run the algorithms as a classification task, trying to predict whether a given visit leads to a conversion in a given session. The data set contains 5680 sessions, of which 432 have conversions. We used Support Vector Machines and Random Forest algorithms since they can easily deal with categorical and continuous inputs, can be trained with relatively few examples, and are relatively robust to overfitting.

Since many more visits lead to non-conversions than to conversions, we create a balanced data set by randomly eliminating entries that do not lead to conversions. We end up with 864 training examples (432 conversions and 432 non-conversions). All data was normalized and the algorithm was tested using 10-fold cross-validation.

In Figure 13 we plot the ROC curve obtained over a period of 165 days. The AUC obtained was 0.84.
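The balancing and validation steps described above can be sketched as follows. This is a minimal illustration in Python using only the class counts stated in the text (5680 sessions, 432 conversions). The label vector is synthetic, the function names are hypothetical, and the actual features and classifiers are left out.

```python
import random


def undersample(labels, seed=0):
    """Indices of a balanced subset: all minority cases plus an equal
    number of randomly chosen majority cases, in shuffled order."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def k_fold_indices(n, k=10):
    """Yield (train, test) position lists that split range(n) into k folds."""
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Synthetic labels with the class counts reported in the text.
labels = [1] * 432 + [0] * (5680 - 432)
kept = undersample(labels)
folds = list(k_fold_indices(len(kept), k=10))
print(len(kept), sum(labels[i] for i in kept))    # 864 432
print([len(test) for _, test in folds])
```

Each fold serves once as the test set while the remaining nine are used for training, so every one of the 864 balanced examples is used for testing exactly once.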
For comparison, we also used SVM, obtaining AUC = **. This is a somewhat surprising result given the small set of inputs. In order to separate out the effect of Adwords traffic, we ran the algorithm without traffic from this source. The results improved slightly.

Random forest returns several measures of variable importance. The most reliable measure is based on the decrease in classification accuracy when the values of a variable in a node of a tree are permuted randomly, and this is the measure of variable importance we use.

Table 3 presents the best discriminating indicators for predicting conversions: traffic origin and the number of visits (see also Figure 12).
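The idea of measuring importance by permutation can be illustrated independently of the random forest itself. The sketch below is a simplified, model-agnostic variant: it shuffles one input column at a time over the whole data set and records the resulting drop in accuracy of a fixed classifier. It is not the per-tree procedure of the random forest algorithm, and the threshold rule standing in for a trained model, the toy data and all names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(1 for r, y in zip(rows, labels) if predict(r) == y) / len(rows)


def permutation_importance(predict, rows, labels, repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by a shuffled value.
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / repeats)
    return importances


def rule(row):
    """Hypothetical stand-in for a trained model: uses only column 0."""
    return 1 if row[0] >= 5 else 0


# Made-up rows: (number of visits, an unrelated noise value).
rows = [(v, n) for v, n in zip(range(10), [3, 1, 4, 1, 5, 9, 2, 6, 5, 3])]
labels = [rule(r) for r in rows]
importances = permutation_importance(rule, rows, labels)
print(importances)
```

A column the classifier never uses gets an importance of exactly zero, while shuffling a column it depends on can only lower the accuracy of this perfectly fitted rule.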
Figure 12: Dispersion of inputs for data set 2.

Figure 13: ROC curve for the conversion prediction with the Random Forest and SVM algorithms. FPR: false positive rate, TPR: true positive rate.

Table 3: Best performing conversion prediction indicators for the two datasets.

| All variables | All without Adwords |
| Traffic source | Number of visits |
| Number of visits | Bounce rate |
| Bounce rate | Visit length |
| Visit length | Time on site |
5 Conclusions
In this work we have used a set of machine learning techniques for data exploration and predictive analytics. It was shown that exploratory tools can help us understand the dynamics of digital campaigns.

We used Random Forest (a collection of decision trees) and SVM to predict conversions with reasonable accuracy. The most important features are the number of visits, the origin of traffic and the visit duration. Surprisingly, we found that CTR and CR have little influence as predictors of conversions.

6 References
1. Benjamin Edelman, Michael Ostrovsky and Michael Schwarz. "Internet Advertising and the Generalized Second-Price Auction: Selling Billions of Dollars Worth of Keywords". American Economic Review 97(1), 2007, pp. 242-259.
2. P. Maille, E. Markakis, M. Naldi, G. D. Stamoulis, B. Tuffin. Sponsored Search Auctions: An Overview of Research with Emphasis on Game Theoretic Aspects. To appear in the Electronic Commerce Research journal (ECR).
3. Andrei Broder, Vanja Josifovski. Introduction to Computational Advertising. Course, Stanford University, California.
4. Anand Rajaraman and Jeffrey D. Ullman. Mining of Massive Datasets. Cambridge University Press, 2012. Chapter 8: Advertising on the Web.
5. James Shanahan. Digital Advertising and Marketing: A review of three generations. Tutorial at WWW 2012.
7. IAB's Internet Advertising Revenue Report. http://www.iab.net/AdRevenueReport
8. http://www.webanalyticsdemystified.com/downloads/Web_Analytics_Demystified_and_NextStage_Global_-_Measuring_the_Immeasurable_-_Visitor_Engagement.pdf