…
How old
are you?
FROM THE BIG BANG
TO ECOMMERCE, A
JOURNEY IN MAKING
SENSE OF BIG DATA
Patrick Deglon
Director of Global Traffic Analytics
pdeglon@ebay.com
linkd.in/pdeglon
Agenda
1. Introduction: CERN & eBay
2. eBay Infrastructure
3. Examples of Analysis
4. Partnership & Trust

Image: CERN
From 1996 to 2002, I worked at CERN
(the European Laboratory for Particle Physics)
for my MS and PhD at the University of Geneva
Mont
Blanc

Geneva
Switzerland

17-mile underground tunnel
for the LEP & LHC accelerators
Source: CERN
Image: CERN
Example of a particle collision

Solving the puzzle… which particles go together?
1. AB + CD?
2. AC + BD?
3. AD + BC?

A

B

?

D

C
PAW – Physics Analysis Workstation
Source: Wikipedia

Tape robot

Data collection & analysis were
done in Fortran. Advanced
analysis/statistics were done
in PAW. [1996-2002]

Source: CERN

Solution: Big Data infrastructure enables large-scale
computation, such as combining all possibilities (cross-product)
Schematic View

CERN Example
(discovery of a new particle bb)

Signal
(particle resonance)

Statistical Noise

Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html
Size of the electron?

R < 5.1 × 10^-19 m ***

*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3
au LEP, Th. phys. Genève, 2002; Sc. 3332

Extra dimension?
M_S > 1.1 TeV ***

Diagram labels: graviton; extra dimension; e+, e+, e-, e-; our universe in 4 dimensions

*** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3
au LEP, Th. phys. Genève, 2002; Sc. 3332

2004, joined
eBay European HQ
in Bern, Switzerland

$68 billion in merchandise traded in 2011 ... or
$1.3 million every 10 minutes

eBay: The World's Online Marketplace®
Every 26 min. a Ford Mustang is sold
Every 2 min. a major appliance is sold
Every 4 sec. a pair of shoes is sold

CERN vs EBAY

CERN
• Write kilometers-long Fortran code
• Analyses can run for many hours… before a batch robot error
• Study data from billions of collisions
• Great depth of data structure & complexity
• Know your local expert for questions, but try to find the solution by yourself first… much quicker
• Remove “bad runs” (unclean data batches)
• Transform a complex system into insights
• Communicate findings at conferences
• Strong competitive landscape (4 distinct experiments competing to be the first to publish, or to publish better results)

EBAY
• Write miles-long SQL code
• Queries can run for many hours… before a spool space error
• Study data from billions of transactions
• Great depth of data structure & complexity
• Know your local expert for questions, but try to find the solution by yourself first… much quicker
• Remove “wackos” (non-material transactions)
• Transform a complex system into insights
• Communicate recommendations at business reviews
• Strong competitive landscape

Analytics at eBay
“CIO”

“CDO”

“CAO”

“CMO”

Analytics Platforms & Delivery (APD)

Analytics

Marketing







Technology

Finance

Business
Units

End Users
of Big Data

What my friends
think I do

What my mum
thinks I do

What the BU
thinks I do

What I think I do

What the BU
wants me to do

What I really do

Source: Pierre Donzier
EBAY INFRASTRUCTURE

Core Analytics
Data Access
Business Centric

DataHub

MS Excel

Tableau

Data Platform

Technology Centric

SAS/R

OBIEE

MicroStrategy

Analyze & Report

SOA/DAL

Purpose-Built Apps

SQL

Discover & Explore

EDW: enterprise-class system; relational data; dual system; 10+ PB
“SINGULARITY”: low-end enterprise-class system; semi-structured & relational data; deep storage; 40+ PB
HADOOP CLUSTERS: commodity hardware system; unstructured data; pattern detection; deep storage; 40+ PB

Teradata 55xx and 66xx Series

Data Integration
Ab Initio

Informatica

Golden Gate

UC4

BES

MapReduce

DW Sandbox enables agile analytics

Analytics teams have access
to sandboxes within eBay
Teradata data warehouses
(~ 100 GB per sandbox):
• Keep the “Single Point of Truth” philosophy
• Improved Time To Market – Days / Weeks vs Months
• Enable the business to do agile prototyping
• Enable the users to “Fail Fast” – Make it easy to try out new ideas
• Eliminate isolated Data Marts
Diagram: analyst’s sandbox inside the Teradata Data Warehouse
SO… WHERE DO WE GO
FROM HERE?

Measuring impact of initiatives

A/B test (illustrative example, simulation)
Chart: number of purchases (axis 0 to 450) from Aug 1st to Oct 1st for a test group and a control group; the gap between the two groups after the initiative is launched is the impact of the initiative.
• Randomized Test/Control group methodology is a gold standard in research

Pre/Post analysis (illustrative example, simulation)
Chart: number of listings (axis 0 to 35,000) from Aug 1st to Oct 1st for 2011 and 2012, with pre and post periods around the initiative launch and points labeled A, B, C and D; the impact of the initiative is read from the post period.
• Used to measure the impact of an initiative in a full market or a market segment
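The test/control comparison above can be sketched in a few lines. This is a minimal illustration, not the method used at eBay: it assumes each group is summarized by a number of users and a number of purchasers, and the function name `ab_test_lift` and the sample figures are hypothetical.

```python
from math import sqrt


def ab_test_lift(test_conversions, test_size, control_conversions, control_size):
    """Return (absolute lift, standard error, z-score) for a test/control comparison.

    Assumes each user either converts or not (binomial outcome) and that
    users were assigned to the two groups at random.
    """
    p_test = test_conversions / test_size
    p_control = control_conversions / control_size
    lift = p_test - p_control
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_test * (1 - p_test) / test_size
              + p_control * (1 - p_control) / control_size)
    z = lift / se if se > 0 else 0.0
    return lift, se, z


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 users per group.
    lift, se, z = ab_test_lift(450, 10_000, 400, 10_000)
    print(f"lift={lift:.4f} se={se:.4f} z={z:.2f}")
```

A z-score well above 2 would suggest the gap between the groups is unlikely to be statistical noise; smaller values call for a longer test or larger groups.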
Marketing 101
Diagram: Purchase vs No Purchase outcomes under “Don’t Do Marketing” and “Do Marketing”, showing Cost, Direct Return and Incr Return

Medici Effect
• New ideas proliferate when professional or cultural fields collide.
That’s the “Medici Effect.”
• During the Renaissance, the Medici family enabled such collisions
by funding various fields and facilitating interdisciplinary creativity.

House of Medici

Michelangelo

Source:

Remember this physics problem?
1. AB + CD?
2. AC + BD?
3. AD + BC?

A

B

?

D

C
Solution: Big Data infrastructure enables large-scale
computation, such as combining all possibilities (cross-product)
Schematic View

CERN Example
(discovery of a new particle bb)

Signal
(particle resonance)

Statistical Noise

Combining correlated and uncorrelated events produces a system with
statistical noise (which is simple enough to extract) plus the signal being searched for
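The "combine all possibilities" idea can be illustrated with a short sketch. It assumes each particle is given as an (E, px, py, pz) four-vector in natural units (c = 1); the function names and the sample event are hypothetical and this is not the analysis code used at CERN.

```python
from itertools import combinations
from math import sqrt


def invariant_mass(p1, p2):
    """Invariant mass of a pair of particles given as (E, px, py, pz) tuples."""
    energy = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    mass_squared = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by rounding.
    return sqrt(max(mass_squared, 0.0))


def all_pair_masses(particles):
    """Compute the invariant mass of every pair of particles (the cross-product)."""
    return {
        (a, b): invariant_mass(particles[a], particles[b])
        for a, b in combinations(sorted(particles), 2)
    }


if __name__ == "__main__":
    # Hypothetical event with four particles A, B, C, D.
    event = {
        "A": (5.0, 5.0, 0.0, 0.0),
        "B": (5.0, -5.0, 0.0, 0.0),
        "C": (3.0, 0.0, 3.0, 0.0),
        "D": (3.0, 0.0, 0.0, 3.0),
    }
    for pair, mass in all_pair_masses(event).items():
        print(pair, round(mass, 3))
```

Filling a histogram with the masses of all pairs over many events lets the correct pairings pile up as a peak (the signal), while the wrong pairings spread out as a smooth background (the statistical noise).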
Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html
Big Data technologies enable the full Cartesian product of
Marketing action & Revenue generating events
Clicks – Conversion
Playground

Marketing Events
(Clicks or Impressions)
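A small sketch of the clicks × conversions cross-product, under stated assumptions: events are plain dictionaries with hypothetical `user`, `channel`, `ts` (timestamp in seconds) and `amount` fields, and only clicks that precede a purchase by the same user are kept. At eBay's scale this join would run on the data platform rather than in memory.

```python
from itertools import product


def click_purchase_pairs(clicks, purchases):
    """Pair every click with every purchase, keeping same-user clicks that precede the purchase."""
    pairs = []
    for click, purchase in product(clicks, purchases):
        if click["user"] != purchase["user"]:
            continue
        latency = purchase["ts"] - click["ts"]
        if latency >= 0:
            pairs.append({
                "user": click["user"],
                "channel": click["channel"],
                "latency": latency,  # time between marketing event and purchase
                "amount": purchase["amount"],
            })
    return pairs


if __name__ == "__main__":
    # Hypothetical events.
    clicks = [
        {"user": "u1", "channel": "youtube_display", "ts": 0},
        {"user": "u1", "channel": "google_ps", "ts": 50},
        {"user": "u2", "channel": "google_ns", "ts": 10},
    ]
    purchases = [{"user": "u1", "ts": 100, "amount": 250.0}]
    for pair in click_purchase_pairs(clicks, purchases):
        print(pair)
```

The distribution of `latency` over all pairs plays the same role as the invariant-mass histogram: related click/purchase pairs cluster at short latencies, unrelated pairs form a flat background.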

Alternative way to understand customer behavior &
incrementality: geographic experimentation
Chart: Revenues / Cost as 3-period moving averages for Groups 1 to 4, across Baseline, Phase 1 and Phase 2

CREATING VALUE
THROUGH THE
ORGANIZATION
Analytics as a function?
Embedded Model: “I’m following my BU leader, but can’t get promoted”
Functional Model: “I’m a partner of business execution”
→ Need to track satisfaction/loyalty/trust of our partnership
Net Promoter Score

NPS: How likely is it that you will recommend [Brand Name] to a friend or a colleague?

Answer scale: 0 (very unlikely) to 10 (very likely)
Detractors: 0 to 6
Passives: 7 or 8
Promoters: 9 or 10

NPS = % Promoters - % Detractors

The logic behind NPS
• To improve NPS, a company needs to work on 2 fronts:
– Move Detractors into Passives
(i.e. fix the holes: no more unacceptably bad experiences)
– Move Passives into Promoters
(i.e. improve the whole experience: best-in-class buyer experience)

Scale: Detractors (0 to 6), Passives (7 or 8), Promoters (9 or 10)

NPS = % Promoters - % Detractors

Side note: Error on NPS measurement
• NPS answers follow a multinomial distribution with
– p the probability to answer 0 to 6
– q the probability to answer 7 or 8
– r the probability to answer 9 or 10
– N the number of answers
• The expected value of the Net Promoter Score is then
E(NPS) = r – p
• The variance follows from the multinomial covariance Cov(r,p) = – r p / N:
V(NPS) = V(r-p) = V(r) + V(p) – 2 Cov(r,p) =
r (1-r) / N + p (1-p) / N + 2 r p / N
• Hence the error on NPS, i.e. the standard deviation, is
σ(NPS) = SQRT [ r (1-r) / N + p (1-p) / N + 2 r p / N ]
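The error formula above translates directly into code. The sketch below assumes the survey answers are available as a list of integers from 0 to 10; the function name `nps_with_error` and the sample answers are hypothetical.

```python
from math import sqrt


def nps_with_error(answers):
    """Return (NPS, standard deviation of NPS) for a list of 0-10 survey answers.

    Uses the multinomial result from the slide:
    V(NPS) = [r(1-r) + p(1-p) + 2rp] / N
    with p the share of detractors (0-6) and r the share of promoters (9-10).
    """
    n = len(answers)
    if n == 0:
        raise ValueError("at least one answer is required")
    p = sum(1 for a in answers if a <= 6) / n   # detractors
    r = sum(1 for a in answers if a >= 9) / n   # promoters
    nps = r - p
    variance = (r * (1 - r) + p * (1 - p) + 2 * r * p) / n
    return nps, sqrt(variance)


if __name__ == "__main__":
    # Hypothetical survey: 50 promoters, 30 passives, 20 detractors.
    answers = [10] * 50 + [8] * 30 + [3] * 20
    nps, error = nps_with_error(answers)
    print(f"NPS = {100 * nps:.0f} ± {100 * error:.0f} points")
```

With only 100 answers the error is already around 8 points, which is why small internal surveys should be read directionally.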

NPS is a measurement of Loyalty in a free environment. In a
paid environment, it’s more a measurement of Trust between
co-workers/partners

Net Promoter Score
How likely is it that you would recommend working
with Analyst XXX to a friend or colleague?
(answer scale: 0 to 10)

eNPS Survey
Team eNPS Survey and Partner eNPS Survey

• Identify opportunities to better partner with the business
• Identify opportunities to work better together as a team
• Enable a directional assessment of eNPS, keeping in mind the
biases: low N, subjective question, unlikely to promote an
unknown entity, partner <> client (e.g. Finance vs Agency)
Now that we have a
measurement,
how to improve it?
What is Trust? How to improve it?

Trust =
Credibility (Words): Convincing & believable
Reliability (Actions): Consistently good in quality & performance
Intimacy (Emotions): Feel comfortable talking to you about the sensitive, personal issues connected to the surface issue
Unselfishness (Motives): Know that you care about serving higher interests
http://www.collieassociates.com/common/Trust_Equation.pdf

Build Trust: Trust Equation
Trust = R × C × I × U

Reliability (Actions = Consistently good in quality & performance)
  eBay Success Factor: Lead completely
  Insights Discovery® Colors: Fiery RED, “Do it now!”
  Hartman Personality Profiles: RED, Power Wielders

Credibility (Words = Convincing & believable)
  eBay Success Factor: Practice judgment
  Insights Discovery® Colors: Cool BLUE, “Do it right!”
  Hartman Personality Profiles: BLUE, The Do-gooders

Intimacy (Emotions = Feel comfortable talking to you about the sensitive/personal issues connected to the surface issue)
  eBay Success Factor: Keep it human
  Insights Discovery® Colors: Earth GREEN, “Do it harmoniously!”
  Hartman Personality Profiles: WHITE, The Peacekeepers

Unselfishness (Motives = Know that you care about serving our higher interests)
  eBay Success Factor: Trust each other
  Insights Discovery® Colors: Sunshine YELLOW, “Do it together!”
  Hartman Personality Profiles: YELLOW, The Fun Lovers

Carl Jung,
Swiss psychologist

Example of an internal partners survey on
the Trust foundation
Reliability (4.9)
  Translates ideas and concepts into action: 4.9
  Turns around requests effectively: 5.0
  Is comfortable with change: 5.0
  Is adept at prioritizing tasks: 4.9

Credibility (5.3)
  Does what one says one will do: 5.2
  Tells the truth: 5.6
  Is genuine in saying ‘Thank you’ or ‘I don’t know’: 5.5
  Is comfortable saying 'no' at the beginning rather than being unable to deliver in the end: 5.0

Intimacy (5.2)
  Creates an environment to address potential conflicts openly: 5.0
  Seeks help when facing difficulties: 5.3
  Has an appropriate sense of humor: 5.3
  Responds to and understands the feelings/needs of others: 5.4

Unselfishness (5.3)
  Uses ‘we’ rather than ‘they’ or ‘I’: 5.2
  Makes time for others: 5.4
  Supports ideas for innovation from others: 5.3
  Trusts others to make decisions and get things done for them: 5.2

Please complete each of the following statements using the rating guide. Try to provide a rating for every statement
and be honest with your feedback.
Weak in this area=1, Some concerns=2, A minor shortfall=3, Competent=4, Better than competent=5, Outstanding=6

Trust Equation assessment by the team and our partners
Scatter plot: partner average answer (vertical axis, 60 to 90) against team average answer (horizontal axis, 60 to 90), split into an under-confidence zone and an over-confidence zone.
Points plotted:
• Intimacy, Keep It Human
• Credibility, Meets Quality
• Non Political, Unselfishness
• Reliability, Meets Deadline

Reliability: Value of an Analysis

Keep It Simple & Stupid
Chart: Direct Return, Total Cost and Net Return (Profit) plotted against Complexity of Analytics, marking the optimal level of complexity, the analyst’s preferred level of complexity and the individual limit

Credibility: Principle Of Least Surprise (POLS)

Don’t surprise executives & partners
with new metrics, new definitions,
new formats or anything new…
without a proper business reason.
Set up Insights & Recommendations
in a natural, logical, global &
agreed-upon framework.

Credibility: Fixed Standard… or Flexible Chaos?

Standardized
Global
Metrics

Store anything to
enable measuring any
metric to answer any
question

Chaos enables
flexibility, but requires a
strong process to
maintain credibility

(Business) Intimacy
• Keep It Human – meet people, talk to people, walk to their desk, pick up the phone
• Seek help when needed
• Have a good sense of humor – “It’s just a website…”
• Create an environment where people can open up and discuss the underlying issue
• Respond to the needs/feelings of others
• CONNECT with people (Avatar’s “I see you”)

Unselfishness
• Don’t work in a silo
• Consider “we” rather than “I” or “they”
• Support ideas for innovation from others (improv’s “yes, and…”)
• Trust others to make the right decision – and live with it
• Be AVAILABLE – make time for others

Wrapping Up
How complexity can spark innovation, but also kill effectiveness
• Medici principle
• KISS
• Managing chaos
Why an embedded or client-centric Analytics organization is not
necessarily a great idea
• Enable career path with an Analytics organization
• Partner vs Client
• eNPS - Maintain the pulse on the internal-client/partner satisfaction
Why analyst creativity is antagonistic to executive reporting
• Trust pillars: Reliability, Credibility, Intimacy, Unselfishness
• POLS
Q&A
Credibility: Key Phases of an Analytics Project
Move the
Business

Follow-up /
Implementation

Readout

Executive
Summary

Scoping

Hypothesis
to be verified

Scoping the
question

Measurement
set up

Measuring

Query

Data check

Guiding the
Business

Story Line /
Deck

Driving
Insights

Facts / Slides

Review
hypothesis
Data
manipulation

Interpretation

Statistics

Graphs

James, 32, lives in Pittsburgh,
married, 1 child, Electronics Enthusiast
Site Visit

Site Visit

YouTube
Display Click

Site Visit
Offline
Store
Visit

Google Search on
“Digital Camera”,
click on eBay PS Ad

Google Search on
“eBay Digital Camera”
Click on NS link

Purchase

Loyalty Level
i.e. Likelihood to purchase on eBay
“Whoa… they really have nice deals on eBay”
“Ah… yes, eBay was a good idea – what do they have?”
“That’s really expensive in a store”
“Let’s get that camera now”

Time
Marketing Attribution Logic
$
YouTube
Display Impression

Google Search on
“Digital Camera”,
click on eBay PS Ad

Google Search on
“eBay Digital Camera”
Click on NS link

Purchase

How does the purchase correlate to the customer touch points?
How “close”/“distant” are the clicks & the purchase?

Which one is the most important?

What is more important:
the front wheel or the back wheel?

Marketing Attribution Management
YouTube
Display Impression

Google Search on
“Digital Camera”,
click on eBay PS Ad

Google Search on
“eBay Digital Camera”
Click on NS link

Purchase

Define correlation (“distance”) between
customer touch points and purchase and
the likelihood that it happens
distance in time
distance in KW space

distance in Mindset

• Latency: time between click and ROI event (2 minutes? 2 hours? 2 days?)

• Relevancy: difference between Search keyword and Item purchased (KW-Title
relevancy, KW-Vertical relevancy)
• Loyalty: mindset of customer, i.e. RFM segment (Reactivation or Top Buyer)
• …

Marketing Attribution Management
Share of the purchase credited to each touch point, by attribution model:

Touch point                                                  Last Click   First Click   All Clicks   Model
YouTube Display Impression                                   -            100%          33%          60%
Google Search on “Digital Camera”, click on eBay PS Ad       -            -             33%          35%
Google Search on “eBay Digital Camera”, click on NS link     100%         -             33%          5%
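The four attribution rules on this slide can be sketched as one small function. This is an illustration only: the function name `attribute`, the model labels and the touch-point identifiers are hypothetical, and the 60/35/5 weights are simply the ones shown on the slide, not the output of an actual model.

```python
def attribute(touchpoints, model="last_click", weights=None):
    """Split the credit for one purchase across an ordered list of touch points.

    model: "last_click", "first_click", "all_clicks" (equal split) or
    "custom" (relative weights supplied by the caller).
    Returns a dict mapping each touch point to its share of the purchase.
    """
    n = len(touchpoints)
    if n == 0:
        return {}
    if model == "last_click":
        shares = [0.0] * (n - 1) + [1.0]
    elif model == "first_click":
        shares = [1.0] + [0.0] * (n - 1)
    elif model == "all_clicks":
        shares = [1.0 / n] * n
    elif model == "custom":
        if weights is None or len(weights) != n or sum(weights) <= 0:
            raise ValueError("custom model needs one positive-sum weight per touch point")
        total = sum(weights)
        shares = [w / total for w in weights]
    else:
        raise ValueError(f"unknown model: {model}")
    credit = {}
    for touchpoint, share in zip(touchpoints, shares):
        # The same touch point may appear more than once in a journey.
        credit[touchpoint] = credit.get(touchpoint, 0.0) + share
    return credit


if __name__ == "__main__":
    journey = ["youtube_display", "google_ps", "google_ns"]
    for model in ("last_click", "first_click", "all_clicks"):
        print(model, attribute(journey, model))
    print("custom", attribute(journey, "custom", weights=[60, 35, 5]))
```

Summing these shares, multiplied by the purchase amount, over all journeys gives the revenue credited to each channel under a given model, which is what changes the channel-level ROI picture.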

… So what?
Last Click
Channel      GMB    ROI
Channel A    8%     +20%
Channel B    5%     -10%
Channel C    1%     +10%
• Reduce spend on channel B
• Invest in channel A
• When prioritizing, ignore channel C

<>

All Clicks Model
Channel      GMB    ROI
Channel A    7%     -20%
Channel B    6%     +30%
Channel C    12%    +60%
• Reduce spend on channel A
• Invest heavily in channel C
• Marketing actually accounts for 25% of the site

Example of the International Weekly Variance
Infrastructure (2007)

Automated SQL

Core DW

database

Excel
inputs

PDF
print-out

PET*

Modular
Back-end

single
pivot table

PPT &
Excel
report

Flexible
Front-end

* PET is a small database inside the Teradata Data Warehouse for building prototypes.
Example of Automated Quarterly Market Review deck (2007)

PowerPoint chart object with a
“SQL” field containing an EXEC
MACRO to refresh the data content
of the chart

Linked to an Excel file that
we can refresh when needed

PowerPoint table object with a
“SQL” field containing an EXEC
MACRO to refresh the table
content

PowerPoint Reporting Tool (2012)
Update the content of the selected objects (table or chart)
Update the content of all objects in the PowerPoint

Login to DW

Add a “SQL” tag to
objects (table or chart)
and edit the SQL

Create a dummy chart

Example of BI report using Tableau


From the Big Bang to Ecommerce, a journey in making sense of Big Data

  • 1.
  • 2.
  • 3.
    FROM THE BIGBANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA Patrick Deglon Director of Global Traffic Analytics pdeglon@ebay.com linkd.in/pdeglon
  • 4.
    Agenda 1 Introduction: CERN & eBay 2 eBay Infrastructure 3 Examples ofAnalysis 4 Partnership & Trust FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 4
  • 5.
  • 6.
    During 1996-2002, workedat CERN (the European Laboratory for Particle Physics) for my MS and PhD at the University of Geneva Mont Blanc Geneva Switzerland 17 miles underground tunnel for the LEP & LHC accelerator Source: CERN 6 Image: CERN
  • 7.
  • 8.
    Example of aparticle collision FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 8
  • 9.
    Solving the puzzle…which particles go together? 1. AB + CD? 2. AC + BD? 3. AD + BC? A B ? D C FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 9
  • 10.
    PAW – PhysicsAnalysis Workstation Source: Wikipedia Tape robot Data collection & analysis was done in Fortran. Advance analysis/statistics was done through PAW. [1996-2002] Source: CERN FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 10
  • 11.
    Solution: Big Datainfrastructure enables large scale computational such as combine all possibilities (cross-product) Schematic View CERN Example (discovery of a new particle bb) Signal (particle resonance) Statistical Noise FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html 11
  • 12.
    Size of theelectron? R < 5.1 x 10-19 m *** *** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc. 3332 FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 12
  • 13.
    Extra dimension? MS >1.1 TeV *** graviton extra dimension e+ e+ ee- our universe in 4 dimensions *** Patrick Deglon, Etude de la diffusion Bhabha avec le détecteur L3 au LEP, Th. phys. Genève, 2002; Sc. 3332 FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 13
  • 14.
    2004, joined eBay EuropeanHQ in Bern, Switzerland FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 14
  • 15.
    $68 billion in merchandisetraded in 2011 ... or $1.3 million every 10 minutes FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 15
  • 16.
    eBay: The World'sOnline Marketplace® every every every 26 2 4 min. min. sec. a Ford Mustang is sold a major appliance is sold a pair of shoes is sold FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 16
  • 17.
    CERN vs EBAY CERN EBAY •Write kilometers long Fortran code • Analysis can run for many hours… before a batch robot error • Write miles long SQL code • Queries can run for many hours… before a spool space error • Study billions of collision data • Study billions of transactional data • Great depth of data structure & complexity • Great depth of data structure & complexity • Know your local expert for question – but try to find the solution by yourself… much quicker • Know your local expert for question – but try to find the solution by yourself… much quicker • Remove “bad runs” (unclean data batch) • Remove “wackos” (non material transactions) • Transform a complex system into insights • Transform a complex system into insights • Communicate findings to conferences • Communicate recommendation to business review • Strong competitive landscape (4 distinct experiments competing to the first to publish, or publish better results) • Strong competitive landscape FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 17
  • 18.
    Analytics at eBay “CIO” “CDO” “CAO” “CMO” AnalyticsPlatforms & Delivery (APD) Analytics Marketing    Technology Finance Business Units End Users of Big Data FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 18
  • 19.
    What my friends thinkI do What my mum thinks I do What the BU thinks I do What I think I do What the BU wants me to do What I really do Source: Pierre Donzier FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 19
  • 20.
    EBAY INFRASTRUCTURE 1 Introduction: CERN &eBay 2 eBay Infrastructure 3 Examples of Analysis 4 Partnership & Trust
  • 21.
    Core Analytics Data Access BusinessCentric DataHub MS Excel Tableau Data Platform Technology Centric SAS/R OBIEE MicroStrategy Analyze & Report SOA/DAL Purpose Built Aps SQL Discover & Explore EDW “SINGULARITY” HADOOP CLUSTERS ENTERPRISE-CLASS SYSTEM LOW END ENTERPRISE-CLASS SYSTEM COMMODITY HARDWARE SYSTEM Teradata 55xx and 66xx Series Relational Data Dual System 10+ PB Semi Structured & Relational Data Deep Storage Unstructured Data Pattern Detection Deep Storage 40+ PB 40+ PB Data Integration Ab Initio Informatica Golden Gate UC4 BES MapReduce 21
  • 22.
    DW Sandbox enablesagile analytics Analytics teams have access to sandboxes within eBay Teradata data warehouses (~ 100 GB per sandbox): • Enable to keep the “Single analyst’s sandbox Teradata Data Warehouse Point of Truth” philosophy • Improved Time To Market – Days / Weeks vs Months • Enable the business to do agile prototyping • Enable the users to “Fail Fast” – Make it easy to try out new ideas • Eliminate isolated Data Marts FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 22
  • 23.
    SO… WHERE DOWE GO FROM HERE? 1 Intro: CERN & eBay 2 eBay Infrastructure 3 Examples of Analysis 4 Partnership & Trust
  • 24.
    Measuring impact ofinitiatives A/B test Pre/Post analysis illustrative example (Simulation) illustrative example (Simulation) Number of purchases Number of listings 35,000 Initiative launched 450 400 Impact of the initiative 350 300 test group 200 150 50 0 Aug 1st pre 2012 post D 25,000 20,000 250 100 30,000 Impact of the initiative Initiative launched B 15,000 2011 C 10,000 control group Sep 1st 5,000 Oct 1st • Randomized Test/Control group methodology is a golden standard in research A 0 Aug 1st Sep 1st Oct 1st • Used to measure the impact of an initiative in a full market or a market segment FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA
  • 25.
    Marketing 101 Cost Direct Return Purchase LC L Incr Return ? No Purchase ? C D Don‟t Do Marketing D Do Marketing FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 25
  • 26.
    Medici Effect • Newideas proliferate when professional or cultural fields collide. That‟s the “Medici Effect.“ • During the Renaissance, the Medici family enabled such collisions by funding various fields and facilitating interdisciplinary creativity. House of Medici Michelangelo Source: FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 26
  • 27.
    Remember this physicsproblem? 1. AB + CD? 2. AC + BD? 3. AD + BC? A B ? D C FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 27
  • 28.
    Solution: Big Datainfrastructure enables large scale computational such as combine all possibilities (cross-product) Schematic View CERN Example (discovery of a new particle bb) Signal (particle resonance) Statistical Noise Combine correlated events and uncorrelated events produce a system with a statistical noise (which is simple enough to extract) and the researched signal FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA Source: http://www.atlas.ch/news/2011/ATLAS-discovers-its-first-new-particle.html 28
  • 29.
    Big Data technologiesenable the full Cartesian product of Marketing action & Revenue generating events Clicks – Conversion Playground Marketing Events (Clicks or Impressions) FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 29
  • 30.
    Alternative way tounderstand customer behavior & incrementally: geographic experimentation Revenues / Cost 3 per. Mov. Avg. (Group 1) Baseline 3 per. Mov. Avg. (Group 2) 3 per. Mov. Avg. (Group 3) Phase 1 3 per. Mov. Avg. (Group 4) Phase 2 FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 30
  • 31.
    CREATING VALUE THROUGH THE ORGANIZATION 1 Introduction: CERN& eBay 2 eBay Infrastructure 3 Examples of Analysis 4 Partnership & Trust
  • 32.
    Analytics as afunction? Embedded Model Functional Model “I‟m following my BU leader, but can‟t get promoted” “I‟m a partner of business execution”  Need to track satisfaction/loyalty/trust of our partnership FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 32
  • 33.
    Net Promoter Score NPS:How likely is that you will recommend [Brand Name] to a friend or a colleague? 0 1 2 3 4 5 6 7 8 very unlikely 9 10 very likely Detractors Passives Promoters NPS = % Promoters - % Detractors FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 33
  • 34.
    The logic behindNPS • To improve NPS, a company need to work on 2 fronts: – Move Detractors into Passives (i.e. fix the holes, i.e. no more unacceptable bad experiences) – Move Passives into Promoters (i.e. improve the whole experience, best-in-class buyer experience) 0 1 2 3 Detractors 4 5 6 7 8 Passives 9 10 Promoters NPS = % Promoters - % Detractors FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 34
  • 35.
    Side note: Erroron NPS measurement • NPS is a multinomial distribution with – p the probability to answer 0 to 6 – q the probability to answer 7 or 8 – r the probability to answer 9 or 10 – N the number of answers • The Expected value for the Net Promoter Score is then E(NPS) = r – p • The Variance is then V(NPS) = V(r-p) = V(r) + V(p) – 2 Cov(r,p) = r (1-r) / N + p (1-p) / N + 2 r p / N • Hence the error on NPS, i.e. the Standard Deviation, is then (NPS) = SQRT [ r (1-r) / N + p (1-p) / N + 2 r p / N ] FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 35
  • 36.
    NPS is ameasurement of Loyalty in a free environment. In a paid environment, it‟s more a measurement of Trust between co-workers/partners Net Promoter Score How likely is it that you would recommend working with Analyst XXX to a friend or colleague? 0 1 2 3 4 5 6 7 8 9 10 FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 36
  • 37.
    eNPS Survey Team eNPS Survey PartnereNPS Survey • Identify opportunity to better partner with the business • Identify to better work together as a team • Enable directional assessment of eNPS; keeping in mind biases: low N, subjective question, unlikely to promote an unknown entity, partner <> client (i.e. Finance vs Agency) Now that we have a measurement, how to improve it? FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 37
  • 38.
    What is Trust?How to improve it? Trust = Credibility Reliability Intimacy Unselfishness http://www.collieassociates.com/common/Trust_Equation.pdf Words: Convincing & believable Actions: Consistently good in quality & performance Emotions: Feel comfortable talking to you about the sensitive, personal issues connected to the surface issue Motives: Know that you care about serving higher interests FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 38
  • 39.
    Build Trust: TrustEquation Trust = R × C × I × Trust Component Reliability (Actions = Consistently good in quality & performance) Credibility (Words = Convincing & believable) Insights Discovery ® Colors Hartman Personality Profiles Lead completely Fiery RED “Do it now!” RED Power Wielders Practice judgment Cool BLUE “Do it right!” BLUE The Do-gooders Keep it human Earth GREEN “Do it harmoniously!” WHITE The Peacekeepers Trust each other Sunshine YELLOW “Do it together!” YELLOW The Fun Lovers Intimacy (Emotions = Feel comfortable talking to you about the sensitive/personal issues connected to the surface issue) Unselfishness U eBay Success Factor (Motives = Know that you care about serving our higher interests) Carl Jung, Swiss psychologist FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA 39
  • 40.
    Example of an internal partners survey on the Trust foundation
    Translates ideas and concepts into action. 4.9 Turns around requests effectively. 5.0 Is comfortable with change. 5.0 Is adept at prioritizing tasks. Does what one says one will do. Tells the truth. Is genuine in saying 'Thank you' or 'I don't know'. Is comfortable saying 'no' at the beginning rather than being unable to deliver in the end. Creates an environment to address potential conflicts openly. Reliability (4.9) 4.9 5.2 5.6 5.5 Credibility (5.3) 5.0 5.0 Seeks help when facing difficulties. 5.3 Has an appropriate sense of humor. 5.3 Responds to and understands the feelings/needs of others. 5.4 Uses 'we' rather than 'they' or 'I'. Makes time for others. Intimacy (5.2) 5.2 5.4 Supports ideas for innovation from others. 5.3 Trusts others to make decisions and get things done for them. Unselfishness (5.3) 5.2
    Please complete each of the following statements using the rating guide. Try to provide a rating for every statement and be honest with your feedback.
    Rating guide: Weak in this area = 1, Some concerns = 2, A minor shortfall = 3, Competent = 4, Better than competent = 5, Outstanding = 6
  • 41.
    Trust Equation assessment by the team and our partners
    [Scatter plot: team average answer (x-axis, 60 to 90) vs. partner average answer (y-axis, 60 to 90), with an under-confidence zone and an over-confidence zone. Points: Intimacy (Keep It Human); Credibility (Meets Quality); Unselfishness (Non Political); Reliability (Meets Deadline).]
  • 42.
    Reliability: Value of an Analysis
    Keep It Simple & Stupid
    [Chart: Total Cost, Direct Return and Net Return (Profit) plotted against Complexity of Analytics, marking the optimal level of complexity, the analyst's preferred level of complexity, and the individual limit.]
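The trade-off on the "Value of an Analysis" slide can be sketched numerically. This is only an illustration under stated assumptions: the slide gives no formulas, so the curve shapes below (a return that flattens out and a cost that grows linearly with complexity) are hypothetical, as are all function names and constants. Net return is taken to be direct return minus total cost.

```python
import math


def direct_return(complexity):
    """Hypothetical direct return: grows quickly, then flattens."""
    return 40.0 * math.log(1.0 + complexity)


def total_cost(complexity):
    """Hypothetical total cost: grows linearly with complexity."""
    return 10.0 * complexity


def net_return(complexity):
    """Net return (profit), assumed to be direct return minus total cost."""
    return direct_return(complexity) - total_cost(complexity)


def optimal_complexity(levels):
    """Return the complexity level with the highest net return."""
    return max(levels, key=net_return)


levels = range(0, 11)  # hypothetical complexity scale from 0 to 10
best = optimal_complexity(levels)
preferred = 8  # hypothetical level an analyst might prefer

print("optimal level:", best)
print("net return at optimal level:", round(net_return(best), 2))
print("net return at preferred level:", round(net_return(preferred), 2))
```

Under these made-up curves the analyst's preferred level sits well past the optimum, so extra complexity still adds return but adds cost faster.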
  • 43.
    Credibility: Principle Of Least Surprise (POLS)
    Don't surprise executives & partners with new metrics, new definitions, new formats or anything new… without a proper business reason.
    Set up insights & recommendations in a natural, logical, global & agreed-upon framework.
  • 44.
    Credibility: Fixed Standard… or Flexible Chaos?
    Fixed standard: standardized global metrics.
    Flexible chaos: store anything to enable measuring any metric to answer any question.
    Chaos enables flexibility, but requires a strong process to maintain credibility.
  • 45.
    (Business) Intimacy
      • Keep It Human: meet people, talk to people, walk to their desk, pick up the phone
      • Seek help when needed
      • Have a good sense of humor: “It's just a website…”
      • Create an environment where people can open up and discuss the underlying issue
      • Respond to the needs/feelings of others
      • CONNECT with people (Avatar's “I see you”)
  • 46.
    Unselfishness
      • Don't work in silos
      • Consider “we” rather than “I” or “they”
      • Support ideas for innovation from others (improv's “yes, and…”)
      • Trust others to make the right decision, and live with it
      • Be AVAILABLE: make time for others
  • 47.
    Wrapping Up
    How complexity can spark innovation, but also kill effectiveness
      • Medici principle
      • KISS
      • Managing chaos
    Why an embedded or client-centric Analytics organization is not necessarily a great idea
      • Enable career path with an Analytics organization
      • Partner vs Client
      • eNPS: maintain the pulse on the internal-client/partner satisfaction
    Why analyst creativity is antagonistic to executive reporting
      • Trust pillars: Reliability, Credibility, Intimacy, Unselfishness
      • POLS
  • 48.
  • 49.
    FROM THE BIG BANG TO ECOMMERCE, A JOURNEY IN MAKING SENSE OF BIG DATA
    Patrick Deglon, Director of Global Traffic Analytics
    pdeglon@ebay.com
    linkd.in/pdeglon
  • 50.
    Credibility: Key Phases of an Analytics Project
    [Diagram. Phases: Scoping; Measuring; Driving Insights; Guiding the Business; Move the Business. Activities shown: scoping the question, hypothesis to be verified, measurement set up, query, data check, data manipulation, statistics, graphs, interpretation, review hypothesis, facts / slides, story line / deck, executive summary, readout, follow-up / implementation.]
  • 51.
    James, 32, lives in Pittsburgh, married, 1 child, Electronics Enthusiast
    [Timeline chart: loyalty level (i.e. likelihood to purchase on eBay) over time.]
    Touch points: Site Visit; Site Visit; YouTube Display Click; Site Visit; Offline Store Visit; Google Search on “Digital Camera”, click on eBay PS Ad; Google Search on “eBay Digital Camera”, click on NS link; Purchase.
    Thoughts along the way: “Whoa… they really have nice deals on eBay”; “Ah… yes, eBay was a good idea: what do they have?”; “That's really expensive in a store”; “Let's get that camera now”.
  • 52.
    Marketing Attribution Logic
    Touch points: YouTube Display Impression; Google Search on “Digital Camera”, click on eBay PS Ad; Google Search on “eBay Digital Camera”, click on NS link; Purchase.
      • How does the purchase correlate to the customer touch points?
      • How “close”/“distant” are the clicks & the purchase?
      • Which one is the most important?
  • 53.
    What is more important: the front wheel or the back wheel?
  • 54.
    Marketing Attribution Management
    Touch points: YouTube Display Impression; Google Search on “Digital Camera”, click on eBay PS Ad; Google Search on “eBay Digital Camera”, click on NS link; Purchase.
    Define the correlation (“distance”) between customer touch points and purchase, and the likelihood that it happens: distance in time, distance in KW space, distance in mindset.
      • Latency: time between click and ROI event (2 minutes? 2 hours? 2 days?)
      • Relevancy: difference between search keyword and item purchased (KW-Title relevancy, KW-Vertical relevancy)
      • Loyalty: mindset of customer, i.e. RFM segment (Reactivation or Top Buyer)
      • …
  • 55.
    Marketing Attribution Management
    Touch point | Last Click | First Click | All Clicks | Model
    YouTube Display Impression | - | 100% | 33% | 60%
    Google Search on “Digital Camera”, click on eBay PS Ad | - | - | 33% | 35%
    Google Search on “eBay Digital Camera”, click on NS link | 100% | - | 33% | 5%
    All four journeys end in a Purchase.
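The four attribution schemes on this slide can be sketched as weight vectors over the ordered touch points. This is a minimal sketch, not eBay's implementation: the function names and the purchase value of 100 are hypothetical, the 60% / 35% / 5% model weights are simply copied from the slide, and how such weights are derived is not covered here.

```python
def last_click(n):
    """All credit to the final touch point before the purchase."""
    return [0.0] * (n - 1) + [1.0]


def first_click(n):
    """All credit to the first touch point of the journey."""
    return [1.0] + [0.0] * (n - 1)


def all_clicks(n):
    """Equal credit to every touch point."""
    return [1.0 / n] * n


def attribute(touch_points, weights, purchase_value):
    """Split a purchase value across touch points using the given weights."""
    if not touch_points or len(touch_points) != len(weights):
        raise ValueError("need one weight per touch point")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return {tp: w * purchase_value for tp, w in zip(touch_points, weights)}


journey = [
    "YouTube display impression",
    "Google search 'Digital Camera', eBay PS ad click",
    "Google search 'eBay Digital Camera', NS link click",
]
model_weights = [0.60, 0.35, 0.05]  # example weights shown on the slide
purchase_value = 100.0              # hypothetical purchase amount

for name, weights in [
    ("Last Click", last_click(len(journey))),
    ("First Click", first_click(len(journey))),
    ("All Clicks", all_clicks(len(journey))),
    ("Model", model_weights),
]:
    print(name, attribute(journey, weights, purchase_value))
```

Keeping each scheme as a plain weight vector makes it easy to run the same journeys through several schemes and compare the resulting channel totals side by side.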
  • 56.
    … So what?
    Last Click
      Metric | Channel A | Channel B | Channel C
      GMB | 8% | 5% | 1%
      ROI | +20% | -10% | +10%
      • Reduce spend on channel B
      • Invest in channel A
      • When prioritizing, ignore channel C
    <>
    All Clicks Model
      Metric | Channel A | Channel B | Channel C
      GMB | 7% | 6% | 12%
      ROI | -20% | +30% | +60%
      • Reduce spend on channel A
      • Invest heavily in channel C
      • Marketing actually accounts for 25% of the site
  • 57.
    Example of the International Weekly Variance Infrastructure (2007)
    [Architecture diagram. Layers: Modular Back-end; Flexible Front-end. Components: Automated SQL; Core DW database; Excel inputs; PET*; single pivot table; PPT & Excel report; PDF print-out.]
    * PET is a small database inside the Teradata Data Warehouse for building prototypes.
  • 58.
    Example of Automated Quarterly Market Review deck (2007)
      • PowerPoint chart object with a “SQL” field containing an EXEC MACRO to refresh the data content of the chart
      • Linked to an Excel file that we can refresh when needed
      • PowerPoint table object with a “SQL” field containing an EXEC MACRO to refresh the table content
  • 59.
    PowerPoint Reporting Tool (2012)
      • Update the content of the selected objects (table or chart)
      • Update the content of all objects in the PowerPoint
      • Log in to the DW
      • Add a “SQL” tag to objects (table or chart) and edit the SQL
      • Create a dummy chart
  • 60.
    Example of BI report using Tableau