7/11/2016
VIEWBIX ENHANCED CREATIVE
- VIDEO
- BRANDING
- CALL TO ACTION
Solution 1
- Send tracking events as query string params to a server hosted on Rackspace
- Hourly job parses the log files and inserts summary data into SQL
- Problems:
  - Network bottleneck: dropped events
  - Managing SQL Server drive space
  - No scalability
  - Because of sizing problems we limited what we collected, which meant poor analytics
  - No enrichment process
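
The hourly job in Solution 1 can be sketched roughly as follows. This is an illustration only, not the original Viewbix code: the log line layout (an ISO timestamp followed by the request path) and the `e` query parameter name are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def summarize(log_lines):
    """Count events per (hour, event name) from hypothetical access-log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            # Assumed layout: "<ISO timestamp> <request path with query string>"
            timestamp, url = line.split(" ", 1)
            hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
            params = parse_qs(urlsplit(url.strip()).query)
            event = params["e"][0]  # "e" is an assumed parameter name
        except (ValueError, KeyError):
            continue  # malformed line: skipped, and therefore lost
        counts[(hour, event)] += 1
    return counts


lines = [
    "2016-07-11T10:05:00 /track?e=play&vid=42",
    "2016-07-11T10:45:10 /track?e=play&vid=7",
    "2016-07-11T11:01:00 /track?e=click&vid=42",
    "garbage",
]
summary = summarize(lines)
for (hour, event), count in sorted(summary.items()):
    print(hour, event, count)
```

Only the hourly counts survive in the summary. The per-event detail (here the `vid` parameter) is discarded, which is one way a summary-only design ends up limiting the analytics.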
Solution 2
- Distribute the collection of the tracking events to the Akamai cloud (GET requests to a CDN endpoint)
- Akamai aggregates the logs and sends a batch via FTP every 4 hours
- Hadoop, Hive, and SQL summary tables, all hosted in the Azure cloud
- Problems:
  - Need for faster end-to-end reporting
  - Staying scalable requires summary tables, so we lose granular reporting
  - Changes to the data we report on require rebuilding, and possibly re-importing, the raw data (data modeling)
[Diagram: Akamai, Hadoop/Hive/SQL]
Requirements doc for new solution
- Work with Flash and JavaScript trackers
- Robust data modeling: ability to change business requirements on the fly
- No need for summary data: granular reporting
- Robust and reliable enrichment process
- Fast and flexible end-to-end solution
3rd Party Solution
- Ability to send unlimited events and unstructured data
- Pricing not based on event volume (779 million events in Dec.)
- We own the data
- Hand-holding: managed service
- Beautiful and useful visualizations and data export API (may require an additional 3rd party)
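
For sizing purposes, the December volume can be turned into an average event rate. This is a back-of-the-envelope sketch: it assumes the 779 million figure covers all 31 days of December, and the peak multiplier is a purely hypothetical planning number.

```python
# Assumption: 779 million events were collected over the 31 days of December.
EVENTS_IN_DECEMBER = 779_000_000
SECONDS_IN_DECEMBER = 31 * 24 * 60 * 60

# Hypothetical: traffic is bursty, so plan for a multiple of the average rate.
PEAK_FACTOR = 5


def average_events_per_second(events, seconds):
    """Average event rate over the period."""
    return events / seconds


avg = average_events_per_second(EVENTS_IN_DECEMBER, SECONDS_IN_DECEMBER)
print(f"average: {avg:.0f} events/s")
print(f"planning peak: {avg * PEAK_FACTOR:.0f} events/s")
```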
How’d we do?
- Work with Flash and JavaScript trackers
- Pricing not based on event volume (779 million events in Dec.)
- Ability to send unlimited events and unstructured data
- Hand-holding
- Fast and flexible end-to-end solution
- We own the data
- Robust data modeling: ability to change business requirements on the fly
- No need for summary data: granular reporting
- Robust and reliable enrichment process
- Beautiful and useful visualizations and data export API (may require an additional 3rd party)
Solution: Snowplow
- We wrote an open source AS3 tracker
- Fixed monthly fee + AWS usage
- No limits on size or event type
- Amazing customer service
- Pipeline can be adjusted based on needs
- Sits in our AWS account
- Because all data is stored, we can change the pipeline rules at any time and re-run
- We learned to live with summary data
- Constantly growing; today it surpasses our needs
- Today using Bime Analytics; soon to be in-house charting components or Amazon QuickSight
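
The idea of changing the pipeline rules and re-running over stored data can be illustrated with a small sketch. This is not Snowplow's enrichment API: the `enrich` function, both rules, and the `useragent` and `platform` field names are hypothetical and exist only for this example.

```python
def enrich(raw_events, rules):
    """Apply every rule to every stored raw event and return new enriched copies."""
    enriched = []
    for event in raw_events:
        out = dict(event)  # copy, so the stored raw event is never modified
        for rule in rules:
            out.update(rule(event))
        enriched.append(out)
    return enriched


def device_rule(event):
    # Hypothetical rule: classify the device from the user agent string.
    return {"device": "mobile" if "Mobile" in event.get("useragent", "") else "desktop"}


def tracker_rule(event):
    # Hypothetical rule added later: record which tracker sent the event.
    return {"tracker": "as3" if event.get("platform") == "flash" else "js"}


raw = [{"event": "play", "useragent": "Mozilla/5.0 (iPhone) Mobile", "platform": "web"}]

first_run = enrich(raw, [device_rule])
second_run = enrich(raw, [device_rule, tracker_rule])  # new rule, same raw data
print(first_run)
print(second_run)
```

Because the raw events are kept untouched, adding `tracker_rule` later only requires another pass over the same data, with no re-collection.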
Gotchas we ran into
- Errors in the raw data being sent in: garbage in, garbage out!
- The solution, at the time, was not auto-scaling.
- Redshift is not MS SQL Server: you need to understand the nuances of columnar database queries and optimizations.
- Real data analysts don’t want charts, they want data. We spent a lot of time and money perfecting our charts when ultimately our customers want CSV exports. Today our charts are about 95% for marketing purposes.
- AWS cost forecasting and control
- Data modeling: ultimately we do need to summarize, but at an acceptable level.
  - Invest heavily in this stage.
  - Overestimate your needs. You don’t know what you don’t know.
  - Work with Snowplow (at extra cost) to get it right
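
One way to guard against the "garbage in, garbage out" problem is to check events before they are sent or loaded. The sketch below is a minimal illustration under assumed conditions: the required fields and their types are hypothetical and are not the Viewbix or Snowplow event schema.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"event": str, "video_id": str, "timestamp": int}


def validate(event):
    """Return a list of problems found in one event (empty list means valid)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors


def partition(events):
    """Split events into valid ones and (event, errors) pairs for review."""
    good, bad = [], []
    for event in events:
        errors = validate(event)
        if errors:
            bad.append((event, errors))
        else:
            good.append(event)
    return good, bad


events = [
    {"event": "play", "video_id": "42", "timestamp": 1468231500},
    {"event": "play", "video_id": 42, "timestamp": 1468231500},
    {"event": "click"},
]
good, bad = partition(events)
print(len(good), "valid,", len(bad), "rejected")
for event, errors in bad:
    print(event, errors)
```

Keeping the rejected events together with the reasons, rather than dropping them silently, makes it easier to trace bad data back to the tracker that produced it.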
What value do our analytics provide?
“It’s not that big data is bad, but by looking for the big wins, we risk losing the most exciting potential of big data: the very small actionable insights that are unique to each individual. The real future potential of big data isn’t in its capacity to be big, but rather in just how small it can get.”
Glen Tullman, Forbes
THANK YOU

Viewbix tracking journey
