Take back control of your 
Web Tracking 
www.dataiku.com 
@ClementStenac 
CTO, Dataiku
Give me dashboards ! 
Choose one 
Raw data 
Do what you want 
Your money 
Access to raw data is a premium feature
Who cares about raw data ? 
• SaaS analytics are full-featured 
• Custom variables to link with your backend data 
• Did you really join all the data you will need in the future ? 
• Do you have access to all the necessary data, and do you want to push it to the JS ? 
• What kinds of analysis will you do later on ? 
A real example 
Segmentation and tracking user-satisfaction 
Raw tracking data (TB) → User-level stats (GB) → User base segmentation → Metrics per segment → Tracking over time
User-level data 
Clustering 
Labeling 
Here you need your business intelligence 
• Search for a specific topic 
• Newcomer from Google News 
• Foreigner discovering the site 
• Fan who loves to comment 
• Home page wanderer 
• Dark bot (competitor?)
Compute metrics per segment 
Here you need to cross with your CRM 
[Figure: the segments (Search for a specific topic; Newcomer from Google News; Foreigner discovering the site; Fan that loves to comment; Home page wanderer; Dark bot (competitor?)), each annotated with sessions, € per session and acquisition costs. Annotations in slide order: 738k sessions, 0.83€ per session, 0.73€ acquisition costs; 938k sessions, 0.3€ per session, 0.23€ acquisition costs; 13k sessions, 1.3€ per session, 0.23€ acquisition costs; 938k sessions, 0.3€ per session, 0.23€ acquisition costs; 68k sessions, 0.3€ per session, 1.23€ acquisition costs; 1k sessions, 0€ per session, 0€ acquisition costs]
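As a rough illustration of the "metrics per segment" step, the sketch below aggregates session rows that already carry a segment label. The row keys (`segment`, `revenue`, `acquisition_cost`), the function name and the sample values are all assumptions made for the example; none of them come from the slides.

```python
from collections import defaultdict


def metrics_per_segment(sessions):
    """Aggregate session rows into per-segment metrics.

    Each row is a dict with hypothetical keys:
    segment, revenue, acquisition_cost.
    """
    totals = defaultdict(lambda: {"sessions": 0, "revenue": 0.0, "acquisition": 0.0})
    for row in sessions:
        bucket = totals[row["segment"]]
        bucket["sessions"] += 1
        bucket["revenue"] += row["revenue"]
        bucket["acquisition"] += row["acquisition_cost"]
    return {
        segment: {
            "sessions": b["sessions"],
            "eur_per_session": b["revenue"] / b["sessions"],
            "acquisition_per_session": b["acquisition"] / b["sessions"],
        }
        for segment, b in totals.items()
    }


# Made-up rows, only to show the shape of the computation
rows = [
    {"segment": "Home page wanderer", "revenue": 1.0, "acquisition_cost": 0.5},
    {"segment": "Home page wanderer", "revenue": 0.5, "acquisition_cost": 0.0},
    {"segment": "Dark bot", "revenue": 0.0, "acquisition_cost": 0.0},
]
metrics = metrics_per_segment(rows)
```

In practice the revenue and acquisition columns would come from joining tracking data with the CRM, which is the point the slide makes.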
Track metrics over time 
Using your already-computed segments 
[Figure: metrics over time for each segment (Search for a specific topic; Newcomer from Google News; Fan that loves to comment; Home page wanderer; Foreigner discovering the site; Dark bot (competitor?))] 
Damn, our latest release has diverging effects on segments
A few other examples 
• Churn prediction and explanation 
• Customer lifetime value prediction 
OK, I WANT TO DO IT
So, I have these Apache logs 
• First level of web tracking 
• "Nothing required" 
Are backend logs a solution ? 
Challenge 1 : Identify a visitor 
• IP ? 
– NAT / Proxy 
– Not everyone has a public IP address 
• IP + user-agent ? 
– Big companies: many employees share one IP and the same browser build
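To make the "IP + user-agent" limitation concrete, here is a minimal sketch that derives a visitor key from those two fields. The helper name, the hashing choice and the sample IP and user-agent strings are illustrative assumptions, not something the slides prescribe.

```python
import hashlib


def visitor_key(ip, user_agent):
    """Derive a pseudo visitor id from IP + user-agent (hypothetical helper)."""
    digest = hashlib.sha1(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()
    return digest[:12]


# Two different employees behind the same corporate proxy,
# both using the same company-managed browser build.
ua = "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"
alice = visitor_key("203.0.113.7", ua)
bob = visitor_key("203.0.113.7", ua)

# Both people collapse into a single "visitor"
same_visitor = alice == bob
```

The key is stable and cheap to compute from backend logs, but it cannot tell apart users who share both fields.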
Are backend logs a solution ? 
Challenge 2 : Re-create sessions 
• Using expiration times 
• Advanced SQL / Hive / … 
makes this easier
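Re-creating sessions with an expiration time can be sketched as follows: hits from the same visitor belong to the same session until the gap between two consecutive hits exceeds a timeout. The 30-minute timeout, the function name and the sample hits are assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed expiration time


def sessionize(hits):
    """Assign a per-visitor session index to each hit.

    hits: iterable of (visitor_id, timestamp) tuples.
    Returns a list of (visitor_id, session_index, timestamp).
    """
    result = []
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))
    for visitor, visitor_hits in groupby(ordered, key=lambda h: h[0]):
        session_index = 0
        previous = None
        for _, ts in visitor_hits:
            # A gap longer than the timeout starts a new session
            if previous is not None and ts - previous > SESSION_TIMEOUT:
                session_index += 1
            result.append((visitor, session_index, ts))
            previous = ts
    return result


hits = [
    ("a", datetime(2014, 10, 30, 9, 0)),
    ("a", datetime(2014, 10, 30, 9, 10)),
    ("a", datetime(2014, 10, 30, 11, 0)),
    ("b", datetime(2014, 10, 30, 9, 5)),
]
sessions = sessionize(hits)
```

The same logic is what a windowed SQL or Hive query would express with a lag over timestamps partitioned by visitor.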
Are backend logs a solution ? 
Challenge 3 : single-page webapps 
• Track behaviour within each page 
• Track events, not pages 
Also: getting logs from IT is sometimes another challenge  
Client-side tracking 
• visitor_id and session_id handled with cookies 
• Tracking page loads and various events 
• Historically, "tracking" = fetching a 1x1 image 
• AJAX 
[Diagram: the browser loads the page from www.website.com, fetches the JS tracking code from tracker.com, then sends tracking calls to tracker.com]
Are cookies good for your (web) health ? 
• Each cookie belongs to a domain (and its subdomains) 
• Who can write a cookie ? 
– The HTTP server, which becomes the owner 
(via the Set-Cookie HTTP header) 
– JS code running on the "owner" domain 
• Who can read a cookie ? 
– The owner HTTP server (sent by the browser) 
– JS code running on the "owner" domain
First-party cookies 
• Set by the originating server (HTTP) or JS code 
• Belong to the originating domain 
• Sent by HTTP to the originating domain only 
• Readable by JS code 
Browser cookies for www.website.com: none 
1. Browser → www.website.com: GET / (Cookies: none); the server returns the page contents 
2. Browser → tracker.com: fetch tracking script 
3. Tracking JS code: read cookies for www.website.com 
4. Tracking JS code: create visitor id and set cookie
First-party cookies 
Browser cookies for www.website.com: visitor_id=d37ecba 
1. JS code: send AJAX request to tracker.com with visitor_id 
2. Browser → tracker.com: GET /track?visitor_id=d37ecba (Cookies: none)
Third-party cookies 
• Set (in HTTP) by the tracker's domain, so they belong to the tracker's domain 
• Not sent by HTTP to the originating domain (they do not belong to it) 
• NOT readable by JS code on the page (they do not belong to its domain) 
Browser cookies for www.website.com: none 
Browser cookies for tracker.com: none 
1. Browser → www.website.com: GET / (Cookies: none); the server returns the page contents 
2. Browser → tracker.com: fetch tracking script
Third-party cookies 
Browser cookies for www.website.com: none 
Browser cookies for tracker.com: none 
1. Browser → tracker.com: GET /track (Cookies: none) 
2. Tracker code: assign visitor_id 
3. tracker.com → Browser: 200 OK, Set-Cookie: visitor_id=33d7
Third-party cookies 
Browser cookies for www.website.com: none 
Browser cookies for tracker.com: visitor_id=33d7 
1. Browser → tracker.com: GET /track (Cookies: visitor_id=33d7) 
2. Tracker code: read visitor_id 
3. tracker.com → Browser: 200 OK
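The third-party cookie exchange can be sketched from the tracker's side as a small handler: read `visitor_id` from the incoming `Cookie` header if present, otherwise assign one and return it in `Set-Cookie`. This is only an illustration; the function name, the plain-dict representation of headers and the id format are assumptions, and it is not the code of any particular tracker.

```python
import uuid
from http.cookies import SimpleCookie


def handle_track(request_headers):
    """Handle a GET /track call on the tracker's domain.

    request_headers: dict of HTTP request headers (hypothetical representation).
    Returns (visitor_id, response_headers).
    """
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "visitor_id" in cookie:
        # Returning visitor: the browser sent the tracker's own cookie back
        return cookie["visitor_id"].value, {}
    # First call: assign an id and ask the browser to store it for this domain
    visitor_id = uuid.uuid4().hex[:8]
    return visitor_id, {"Set-Cookie": f"visitor_id={visitor_id}; Path=/"}


# First call carries no cookie, the second one replays the assigned id
first_id, first_headers = handle_track({})
second_id, second_headers = handle_track({"Cookie": f"visitor_id={first_id}"})
```

Because the cookie belongs to the tracker's domain, the same id comes back from every website that embeds this tracker.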
Why each ? 
www.dataiku.com 
First party cookie 
• Tracks on a single website 
• Requires JS code for tracking 
• Reduced privacy impact: no exchange of information between sites 
• Usage: track your users' behaviour 
• Rarely blocked (used for logins) 
Third party cookie 
• Tracks across all websites using the same tracker 
• More frowned upon 
• Usage: generally ads, but also multi-website tracking 
• Blocked by up to 40% of visitors
What are your obligations ? 
With ALL cookies 
• You should ask users whether they accept cookies 
• Even non-tracking related cookies 
• Yes, even login-related ones 
What are your obligations ? 
With third party cookies 
• Obey the Do-Not-Track header 
1. Browser → tracker.com: GET /track (Cookies: none, DNT: 1) 
2. Tracker code: DO NOTHING 
3. tracker.com → Browser: 200 OK
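A minimal sketch of honouring Do-Not-Track on the tracker side: when the request carries `DNT: 1`, answer 200 OK but assign no cookie and record nothing. The function names, the dict-based headers (assumed already normalised to the `DNT` spelling) and the sample cookie value are assumptions for illustration.

```python
def should_track(request_headers):
    """Return True unless the browser sent DNT: 1."""
    return request_headers.get("DNT", "").strip() != "1"


def handle_track_with_dnt(request_headers):
    """Return (status_code, response_headers) for a GET /track call."""
    if not should_track(request_headers):
        # Answer 200 OK, set no cookie, store no event
        return 200, {}
    # Placeholder for the normal tracking path (hypothetical cookie value)
    return 200, {"Set-Cookie": "visitor_id=33d7"}


dnt_response = handle_track_with_dnt({"DNT": "1"})
normal_response = handle_track_with_dnt({})
```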
What are your obligations ? 
With third party cookies 
• Provide an opt-out URL 
• Allows the user to call /optin, /optout or /status 
See it in action: www.youronlinechoices.com 
What are your obligations ? 
With third party cookies 
• Provide a P3P policy ("What are you doing with my data ?") 
• Else, older IE blocks you 
Looks like this: 
CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
Tracking in mobile apps 
• Preserve battery 
– Each network call is costly 
– Do not track everything synchronously 
• Network access is intermittent 
– Queue events and wait for network access
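The two mobile constraints (few network calls, intermittent connectivity) suggest a local queue flushed in batches. The sketch below is written in Python for consistency with the other examples rather than as mobile code; the class name, the `send_batch` callback and the batch size are assumptions for illustration.

```python
from collections import deque
from itertools import islice


class EventQueue:
    """Queue events locally and send them in batches when the network is up."""

    def __init__(self, send_batch, batch_size=20):
        self._send_batch = send_batch  # callable(list_of_events) -> bool (success)
        self._batch_size = batch_size
        self._pending = deque()

    def track(self, event):
        # Recording an event never touches the network
        self._pending.append(event)

    def flush(self, network_available):
        """Send pending events in batches; return how many were sent."""
        sent = 0
        while network_available and self._pending:
            batch = list(islice(self._pending, self._batch_size))
            if not self._send_batch(batch):
                break  # keep the events queued for the next attempt
            for _ in batch:
                self._pending.popleft()
            sent += len(batch)
        return sent


received = []
queue = EventQueue(send_batch=lambda batch: received.append(batch) or True, batch_size=2)
for name in ("open", "tap", "close"):
    queue.track({"event": name})
sent_offline = queue.flush(network_available=False)  # offline: nothing leaves the device
sent_online = queue.flush(network_available=True)    # online: batches of 2 and 1
```

Events are removed from the queue only after a batch is reported as sent, so a failed call loses nothing.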
So, what are my choices ? 
• You might really want to be your own web tracker 
• Most used open source web tracker: Piwik 
• Provides both raw data and nice dashboards 
– MySQL backend 
– Raw data via API 
– Slightly less suited for analytics
WT1: YOUR OWN TRACKER IN MINUTES
WT1 
An open source (Apache License) server to build your own web tracking 
https://github.com/dataiku/wt1 
• Designed to provide you with raw data, directly usable for analytics 
• Very high performance and scalability
Features 
• 1st or 3rd party cookies 
– Handling of DNT and opt-out 
– Helps with handling P3P 
• Track events or pages with key-value data 
• Visitor-scope and session-scope variables 
• "Live view" debugging console
Features 
• Dashboards: None  
• Events processing and storage 
– Filesystem, S3 
– Event queues: Flume 
– Custom processors 
• JSON API for custom tracking 
• iOS library
Architecture 
Client-side: JS tracker, iOS library 
• 1st or 3rd party cookies 
• Event-level tracking 
• Automatic batching 
• Queuing to deal with network interruptions 
Client-side → WT1 Server: JSON POST 
WT1 Server 
• Java 
• > 20K events / second 
• Handles DNT, P3P, opt-out, … 
Raw storage 
• Filesystem 
• S3 
Event processors 
• Real-time aggregations 
• Custom code 
Event queues 
• Flume 
• Kafka, RabbitMQ, …
Future work 
• Android library 
• More event queues supported OOTB 
– Kafka 
– RabbitMQ 
• Avro storage
Thank you ! 
Clément Stenac 
clement.stenac@dataiku.com 
@ClementStenac 
www.dataiku.com

OWF 2014 - Take back control of your Web tracking - Dataiku

OWF14 - Big Data Track : Take back control of your web tracking Go further by...
OWF14 - Big Data Track : Take back control of your web tracking Go further by...OWF14 - Big Data Track : Take back control of your web tracking Go further by...
OWF14 - Big Data Track : Take back control of your web tracking Go further by...Paris Open Source Summit
 
Web Analytics Primer
Web Analytics PrimerWeb Analytics Primer
Web Analytics PrimerChad Richeson
 
Web前端性能优化 2014
Web前端性能优化 2014Web前端性能优化 2014
Web前端性能优化 2014Yubei Li
 
Tracking and business intelligence
Tracking and business intelligenceTracking and business intelligence
Tracking and business intelligenceSebastian Schleicher
 
10 Things You Can Do to Speed Up Your Web App Today
10 Things You Can Do to Speed Up Your Web App Today10 Things You Can Do to Speed Up Your Web App Today
10 Things You Can Do to Speed Up Your Web App TodayChris Love
 
Datasets, APIs, and Web Scraping
Datasets, APIs, and Web ScrapingDatasets, APIs, and Web Scraping
Datasets, APIs, and Web ScrapingDamian T. Gordon
 
Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013Digital Methods Initiative
 
Website & Internet + Performance testing
Website & Internet + Performance testingWebsite & Internet + Performance testing
Website & Internet + Performance testingRoman Ananev
 
Altitude SF 2017: The power of the network
Altitude SF 2017: The power of the networkAltitude SF 2017: The power of the network
Altitude SF 2017: The power of the networkFastly
 
20 tips for website performance
20 tips for website performance20 tips for website performance
20 tips for website performanceAndrew Siemer
 
Introduction to Search Engine.pdf
Introduction to Search Engine.pdfIntroduction to Search Engine.pdf
Introduction to Search Engine.pdfPraveen Kurup
 
Introduction to Search Engine.pdf
Introduction to Search Engine.pdfIntroduction to Search Engine.pdf
Introduction to Search Engine.pdfPraveen Kurup
 
Scrapinghub Deck for Startups
Scrapinghub Deck for StartupsScrapinghub Deck for Startups
Scrapinghub Deck for StartupsScrapinghub
 
External JavaScript Widget Development Best Practices (updated) (v.1.1)
External JavaScript Widget Development Best Practices (updated) (v.1.1) External JavaScript Widget Development Best Practices (updated) (v.1.1)
External JavaScript Widget Development Best Practices (updated) (v.1.1) Volkan Özçelik
 
ISS Capstone - Martinez Technology Consulting and Cedar Hills Church Security...
ISS Capstone - Martinez Technology Consulting and Cedar Hills Church Security...ISS Capstone - Martinez Technology Consulting and Cedar Hills Church Security...
ISS Capstone - Martinez Technology Consulting and Cedar Hills Church Security...Robert Conti Jr.
 
10 things you can do to speed up your web app today stir trek edition
10 things you can do to speed up your web app today   stir trek edition10 things you can do to speed up your web app today   stir trek edition
10 things you can do to speed up your web app today stir trek editionChris Love
 
5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday Season
5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday Season5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday Season
5 Steps To Deliver The Fastest Mobile Shopping Experience This Holiday SeasonG3 Communications
 

Similar to OWF 2014 - Take back control of your Web tracking - Dataiku (20)

OWF 2014 - Take back control of your Web tracking - Dataiku

  • 1. Take back control of your Web Tracking www.dataiku.com @ClementStenac CTO, Dataiku
  • 2. Give me dashboards!
  • 3. Choose one: Raw data; Do what you want; Your money. Access to raw data is a premium feature
  • 4. Who cares about raw data? • SaaS analytics are full-featured • Custom variables to link with your backend data • Did you really join all data for your future needs? • Do you have access to, or want to push to the JS, all the necessary data? • What kinds of analysis will you do later on?
  • 5. A real example: segmentation and tracking user satisfaction. Raw tracking data -> User-level stats -> User base segmentation -> Metrics per segment -> Tracking over time (from TB to GB)
  • 8. Labeling. Segments: Search for a specific Topic; Newcomer from Google News; Foreigner Discovering The Site; Fan who loves to comment; Home Page Wanderer; Dark Bot (Competitor?). Here you need your business intelligence
  • 9. Compute metrics per segment. Here you need to cross with your CRM. Segments: Search for a specific Topic; Newcomer from Google News; Foreigner Discovering The Site; Fan who loves to comment; Home Page Wanderer; Dark Bot (Competitor?). Figures shown: 738k sessions, 0.83€ per session, 0.73€ acquisition costs; 938k sessions, 0.3€ per session, 0.23€ acquisition costs; 13k sessions, 1.3€ per session, 0.23€ acquisition costs; 938k sessions, 0.3€ per session, 0.23€ acquisition costs; 68k sessions, 0.3€ per session, 1.23€ acquisition costs; 1k sessions, 0€ per session, 0€ acquisition costs
  • 10. Track metrics over time, using your already-computed segments: Search for a specific Topic; Newcomer from Google News; Fan who loves to comment; Home Page Wanderer; Foreigner Discovering The Site; Dark Bot (Competitor?). "Damn, our latest release has diverging effects on segments"
  • 11. A few other examples • Churn prediction and explanation • Customer lifetime value prediction
  • 12. OK I WANT TO DO IT
  • 13. So, I have these Apache logs • First level of web tracking • "Nothing required"
  • 14. Are backend logs a solution? Challenge 1: Identify a visitor • IP? • NAT / Proxy • Not everyone has a public IP address • IP + user-agent? • Big companies!
  • 15. Are backend logs a solution? Challenge 2: Re-create sessions • Using expiration times • Advanced SQL / Hive / … www.dataiku.com makes this easier
  • 16. Are backend logs a solution? Challenge 3: single-page webapps • Track behaviour within each page • Track events, not pages. Also: getting logs from IT is sometimes another challenge
  • 17. Client-side tracking • visitor_id and session_id handled with cookies • Tracking page loads and various events • Historically, "tracking" = fetching a 1x1 image • AJAX. Diagram: the Browser loads www.website.com, runs the JS tracking code, and sends tracking calls to tracker.com
  • 18. Are cookies good for your (web) health? • Each cookie belongs to a domain (and its subdomains) • Who can write a cookie? – The HTTP server, which becomes owner (via the Set-Cookie HTTP header) – JS code running on the "owner" domain • Who can read a cookie? – The owner HTTP server (sent by the browser) – JS code running on the "owner" domain
  • 19. First-party cookies • Set by the originating server (HTTP) or JS code • Belong to the originating domain • Sent by HTTP to the originating domain only • Readable by JS code. Diagram (Browser, www.website.com, tracker.com): GET / (Cookies: none), Contents; Fetch tracking script; Tracking JS code: read cookies for www.website.com (Cookies for www.website.com: None); Tracking JS code: create visitor id and set cookie
  • 20. First-party cookies • Set by the originating server (HTTP) or JS code • Belong to the originating domain • Sent by HTTP to the originating domain only • Readable by JS code. Diagram (Browser, www.website.com, tracker.com): Cookies for www.website.com: visitor_id=d37ecba; JS code: send AJAX request to tracker.com with visitor_id; GET /track?visitor_id=d37ecba (Cookies: None)
  • 21. Third-party cookies • Set (in HTTP) by the tracker's domain – Belong to the tracker's domain • Not sent by HTTP to the originating domain (does not belong) • NOT readable by JS code (does not belong). Diagram (Browser, www.website.com, tracker.com): GET / (Cookies: none), Contents; Fetch tracking script; Cookies for www.website.com: None; Cookies for tracker.com: None
  • 22. Third-party cookies • Set (in HTTP) by the tracker's domain – Belong to the tracker's domain • Not sent by HTTP to the originating domain (does not belong) • NOT readable by JS code (does not belong). Diagram (Browser, www.website.com, tracker.com): Cookies for www.website.com: None; Cookies for tracker.com: None; GET /track (Cookies: None); Tracker code: assign visitor_id; 200 OK, Set-Cookie: visitor_id=33d7
  • 23. Third-party cookies • Set (in HTTP) by the tracker's domain – Belong to the tracker's domain • Not sent by HTTP to the originating domain (does not belong) • NOT readable by JS code (does not belong). Diagram (Browser, www.website.com, tracker.com): Cookies for www.website.com: None; Cookies for tracker.com: visitor_id=33d7; GET /track (Cookies: visitor_id=33d7); Tracker code: read visitor_id; 200 OK
  • 24. Why each? First-party cookie • Tracks on a single website • Requires JS code for tracking • Reduced privacy impact: no exchange of information between sites • Usage: track your users' behaviour • Rarely blocked (used for logins). Third-party cookie • Tracks across all websites using the same tracker • More frowned upon • Usage: generally ads, but also multi-website • Blocked by up to 40% of visitors
  • 25. What are your obligations? With ALL cookies • You should ask users whether they want cookies • Even non-tracking-related cookies • Yes, even login-related ones
  • 26. What are your obligations? With third-party cookies • Obey the Do-Not-Track header. Diagram (Browser, www.website.com, tracker.com): GET /track (Cookies: None, DNT: 1); Tracker code: DO NOTHING; 200 OK
  • 27. What are your obligations? With third-party cookies • Provide an opt-out URL • Allows the user to /optin, /optout or /status. See in action: www.youronlinechoices.com
  • 28. What are your obligations? With third-party cookies • Provide a P3P policy ("What are you doing with my data?") • Else, older IE blocks you. Looks like this: CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
  • 29. Tracking in mobile apps • Preserve battery – Each network call is costly – Do not track everything synchronously • Network access is intermittent – Queue events and wait for network access
  • 30. So, what are my choices? • You might really want to be your own web tracker • Most used open source web tracker: Piwik • Provides both raw data and nice dashboards – MySQL backend – Raw data via API – Slightly less suited for analytics
  • 31. WT1: YOUR OWN TRACKER IN MINUTES
  • 32. WT1: an open source (Apache License) server to build your own web tracking https://github.com/dataiku/wt1 • Designed to provide you with raw data, directly usable for analytics • Very high performance and scalability
  • 33. Features • 1st or 3rd party cookies – Handling of DNT and opt-out – Helps with handling P3P • Track events or pages with key-value data • Visitor-scope and session-scope variables • "Live view" debugging console
  • 34. Features • Dashboards: None • Events processing and storage – Filesystem, S3 – Event queues: Flume – Custom processors • JSON API for custom tracking • iOS library
  • 35. Architecture. Client side: JS tracker, iOS library (1st or 3rd party cookies; event-level tracking; automatic batching; queuing to deal with network interruptions) -> JSON POST -> WT1 Server (Java; > 20K events / second; handles DNT, P3P, opt-out, …) -> Raw storage (Filesystem, S3); Event processors (real-time aggregations, custom code); Event queues (Flume, Kafka, RabbitMQ, …)
  • 36. Future work • Android library • More event queues supported OOTB – Kafka – RabbitMQ • Avro storage
  • 37. Thank you! Clément Stenac clement.stenac@dataiku.com @ClementStenac www.dataiku.com
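
Slide 15 mentions re-creating sessions from backend logs "using expiration times". A minimal Python sketch of that idea follows. It assumes each log line has already been reduced to a (visitor_id, timestamp) pair. The function name sessionize, the 30-minute timeout and the sample hits are illustrative assumptions, not taken from the slides or from WT1.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity closes a session (a common
# convention, not a value given in the slides).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp) hits into sessions.

    Returns a dict mapping each visitor_id to a list of sessions, where a
    session is a chronologically sorted list of timestamps.
    """
    by_visitor = defaultdict(list)
    for visitor_id, ts in hits:
        by_visitor[visitor_id].append(ts)

    sessions = {}
    for visitor_id, stamps in by_visitor.items():
        stamps.sort()
        visitor_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            # A gap longer than the timeout starts a new session.
            if ts - visitor_sessions[-1][-1] > timeout:
                visitor_sessions.append([ts])
            else:
                visitor_sessions[-1].append(ts)
        sessions[visitor_id] = visitor_sessions
    return sessions


# Made-up sample hits, for illustration only.
hits = [
    ("d37ecba", datetime(2014, 10, 30, 9, 0)),
    ("d37ecba", datetime(2014, 10, 30, 9, 10)),
    ("d37ecba", datetime(2014, 10, 30, 11, 0)),
    ("33d7", datetime(2014, 10, 30, 9, 5)),
]
result = sessionize(hits)
```

Here visitor d37ecba ends up with two sessions, because the 11:00 hit arrives more than 30 minutes after the 9:10 hit. On real logs the hard part remains challenge 1 (choosing a reliable visitor_id), which this sketch takes as given.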

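The third-party cookie exchange on slides 22, 23 and 26 can be sketched as one decision function on the tracker side. This is a hedged illustration, not WT1's actual code: the name handle_track, the plain-dict representation of headers (with exact-case keys) and the 8-character id format are all assumptions.

```python
import uuid
from http.cookies import SimpleCookie


def handle_track(request_headers):
    """Decide how the tracker answers a GET /track call.

    request_headers is assumed to be a plain dict of header name to value.
    Returns (status, response_headers, visitor_id).
    """
    # Slide 26: with DNT: 1 the tracker does nothing and sets no cookie.
    if request_headers.get("DNT") == "1":
        return 200, {}, None

    cookies = SimpleCookie(request_headers.get("Cookie", ""))
    if "visitor_id" in cookies:
        # Slide 23: the browser sent the tracker's own cookie back.
        return 200, {}, cookies["visitor_id"].value

    # Slide 22: first call, so assign a visitor_id and set it over HTTP.
    visitor_id = uuid.uuid4().hex[:8]
    headers = {"Set-Cookie": f"visitor_id={visitor_id}; Path=/"}
    return 200, headers, visitor_id
```

Because the cookie is set by the tracker's HTTP response, it belongs to the tracker's domain and the browser sends it back on later /track calls from any site embedding the same tracker, which is what makes it a third-party cookie.
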
Editor's Notes

  1. Web tracking is important, right? You must understand how your users behave on your website. One of the core points of lean. So, let's not do it anymore and let others do it!
  2. A huge number of SaaS solutions provide great dashboards. Chances are good that you should use one of them! This talk is about encouraging you to do it yourself, but as a startup you should probably start with a hosted solution.
  3. You generally have to choose between "cheap" (or free) solutions. Free: Google Analytics -> entry point to sell ads. Not bad, but you should know what it's about.
  4. Example of data to add: complaints / support calls. History prior to setting up *this* tracking. Analysis: ML, not inaccessible or only for elites.
  5. Track user satisfaction metrics over time *by behaviour*. Not science fiction. Raw -> User: recreate *features* for users. Time-based aggregations.
  6. What Olivier Grisel just said
  7. Just a few quick remarks
  8. Fairly standard if you are used to web trackers. GA-like API.