This document summarizes the evolution of video measurement and analytics solutions used by a company. It describes several solutions the company implemented, including sending tracking events to a server hosted on Rackspace [1], distributing event collection to Akamai and processing with Hadoop/Hive/SQL in Azure [2], and ultimately implementing a solution using Snowplow that addressed all of their requirements [3]. Key benefits of Snowplow included no limits on data, flexible data modeling, fast reporting and owning their own data. The document ends by discussing lessons learned around data quality, infrastructure costs, modeling needs and focusing on small, actionable insights from big data.
2. 2
HUMAN BEHAVIOR EVOLVED
A PICTURE IS WORTH
1,000
WORDS
6 SECONDS OF VIDEO
1.8 MILLION WORDS PER MINUTE
DR JAMES MCQUIVEY
OF FORRESTER RESEARCH
CONSUMERS
EXPECT MORE
TAP, TOUCH, ENGAGE
AND INTERACT
6. 6
3.2% of video viewers submitted their email addresses in the forms inside the video
7.1% of video viewers clicked a call to action and visited Cuisinart’s product pages
23.8% of video viewers “Liked” Cuisinart’s Facebook page
7. 7
30% improvement in CTR compared to the rest of the advertising campaign
15% reduction in costs over previous campaign efforts using video
9. 9
Solution 1
- Send tracking events as query string params to a server hosted on Rackspace
- Hourly job to parse log files and insert summary data into SQL
- Problems:
  - Network bottleneck: events were dropped
  - Managing SQL Server drive space
  - No scalability
  - Because of sizing problems we limited what we collected, which led to poor analytics
  - No enrichment process
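The hourly parse-and-summarize job in Solution 1 can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the log line format, the `/track` path, and the `event` and `video_id` parameter names are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Hypothetical log format: "<ISO timestamp> <request path with query string>"
SAMPLE_LOG = """\
2015-01-10T14:02:11 /track?event=play&video_id=abc
2015-01-10T14:17:45 /track?event=cta_click&video_id=abc
2015-01-10T14:59:03 /track?event=play&video_id=xyz
2015-01-10T15:01:30 /track?event=play&video_id=abc
"""


def summarize_hourly(log_text):
    """Count events per (hour, event name, video id)."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        timestamp, _, request = line.partition(" ")
        # The tracking payload travels in the query string of the request.
        params = parse_qs(urlsplit(request).query)
        hour = datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")
        event = params.get("event", ["unknown"])[0]
        video = params.get("video_id", ["unknown"])[0]
        counts[(hour, event, video)] += 1
    return counts


summary = summarize_hourly(SAMPLE_LOG)
for key, count in sorted(summary.items()):
    print(key, count)
```

Only the counts would be written to SQL in a setup like this, which is why anything not captured in the summary key is lost once the raw logs are gone.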
10. 10
Solution 2
Components: Akamai, Hadoop/Hive/SQL
- Distribute the collection of tracking events to the Akamai cloud (GET requests to a CDN endpoint)
- Akamai aggregates the logs and sends a batch via FTP every 4 hours
- Hadoop, Hive, and SQL summary tables, all hosted in the Azure cloud
- Problems:
  - Need for faster end-to-end reporting
  - Staying scalable requires summary tables, which lose granular reporting
  - Changes to the data we need to report on require rebuilding, and possibly re-importing, the raw data (data modeling)
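The trade-off between summary tables and granular reporting can be illustrated with a small sketch. The event fields and the `build_summary` helper are hypothetical and stand in for the Hive/SQL summary jobs; the point is only that a summary keyed on fixed dimensions cannot answer a question about a dimension it dropped.

```python
from collections import Counter

# Hypothetical raw events; the field names are illustrative only.
RAW_EVENTS = [
    {"video_id": "abc", "event": "play", "device": "mobile"},
    {"video_id": "abc", "event": "play", "device": "desktop"},
    {"video_id": "abc", "event": "cta_click", "device": "mobile"},
    {"video_id": "xyz", "event": "play", "device": "mobile"},
]


def build_summary(events, dimensions):
    """Aggregate raw events into counts keyed by the chosen dimensions."""
    return Counter(tuple(e[d] for d in dimensions) for e in events)


# The original summary table: small and fast, but 'device' is gone.
summary = build_summary(RAW_EVENTS, ("video_id", "event"))

# A new reporting requirement (plays per device) cannot be derived from
# 'summary'; the table has to be rebuilt from the raw events.
by_device = build_summary(RAW_EVENTS, ("video_id", "event", "device"))

print(summary[("abc", "play")])
print(by_device[("abc", "play", "mobile")])
```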
11. 11
Requirements doc for new solution
- Work with Flash and JavaScript trackers
- Robust data modeling: ability to change business requirements on the fly
- No need for summary data; granular reporting
- Robust and reliable enrichment process
- Fast and flexible end-to-end solution
3rd Party Solution
- Ability to send unlimited events and unstructured data
- Pricing not based on event volume (Dec.: 779 million events)
- We own the data
- Amazing customer service
- Beautiful and useful visualizations and data export API (may require additional 3rd party)
12. 12
Requirements doc for solution
- Work with Flash and JavaScript trackers
- Pricing not based on event volume (Dec.: 779 million events)
- Ability to send unlimited events and unstructured data
- Amazing customer service
- Fast and flexible end-to-end solution
- We own the data
- Robust data modeling: ability to change business requirements on the fly
- No need for summary data; granular reporting
- Robust and reliable enrichment process
- Beautiful and useful visualizations and data export API (may require additional 3rd party)
Solution: Snowplow
- We wrote an open source AS3 tracker
- Fixed monthly fee + AWS usage
- No limits on size or event type
- Amazing customer service
- Pipeline can be adjusted based on needs
- Sits in our AWS account
- Because all data is stored, we can change the pipeline rules at any time and re-run
- We learned to live with summary data
- Constantly growing; today it surpasses our needs
- Today using Bime Analytics; soon to be in-house charting components or Amazon QuickSight
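The idea of changing the pipeline rules and re-running them over stored data can be sketched as follows. This is a generic illustration and not Snowplow's API: `enrich_v1`, `enrich_v2`, `run_pipeline`, and the user-agent checks are hypothetical stand-ins for real enrichment rules.

```python
# Hypothetical stored raw events; they are never modified in place.
RAW_EVENTS = [
    {"event": "play", "useragent": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0)"},
    {"event": "play", "useragent": "Mozilla/5.0 (Windows NT 6.1)"},
]


def enrich_v1(event):
    """First rule set: flag mobile traffic only."""
    enriched = dict(event)  # copy, so the raw event stays untouched
    enriched["is_mobile"] = "iPhone" in event["useragent"]
    return enriched


def enrich_v2(event):
    """Revised rule set: keep the mobile flag and add a platform field."""
    enriched = enrich_v1(event)
    useragent = event["useragent"]
    if "iPhone" in useragent:
        enriched["platform"] = "ios"
    elif "Windows" in useragent:
        enriched["platform"] = "windows"
    else:
        enriched["platform"] = "other"
    return enriched


def run_pipeline(raw_events, enrich):
    """Apply one version of the enrichment rules to every stored raw event."""
    return [enrich(event) for event in raw_events]


first_run = run_pipeline(RAW_EVENTS, enrich_v1)
# The rules changed later: re-run over the same raw data, no re-collection.
rerun = run_pipeline(RAW_EVENTS, enrich_v2)
print(rerun)
```

Because the raw events are kept unchanged, a new rule only costs a re-run instead of a loss of history.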
13. 13
Gotchas we ran into
- Errors in the raw data being sent in: garbage in, garbage out!
- The solution, at the time, was not auto-scaling.
- Redshift is not MS SQL Server: you need to understand the nuances of columnar database queries and optimizations
- Real data analysts don’t want charts, they want data. We spent a lot of time and money perfecting our charts when ultimately our customers want CSV exports. Today our charts are about 95% for marketing purposes.
- AWS cost forecasting and control
- Data modeling: ultimately we do need to summarize, but at an acceptable level.
  - Invest heavily in this stage.
  - Overestimate your needs. You don’t know what you don’t know.
  - Work with Snowplow (at extra cost) to get it right
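One way to guard against the "garbage in, garbage out" problem is to validate events before they reach the warehouse. The sketch below is an assumption-laden example: the required fields, their types, and the `find_problems` and `split_events` helpers are hypothetical and not taken from the original pipeline.

```python
# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "event": str,
    "video_id": str,
    "position_seconds": (int, float),
}


def find_problems(event):
    """Return a list of human-readable problems; empty means the event is OK."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            actual = type(event[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    position = event.get("position_seconds")
    if isinstance(position, (int, float)) and position < 0:
        problems.append("position_seconds is negative")
    return problems


def split_events(events):
    """Separate loadable events from bad ones, keeping the reasons."""
    good, bad = [], []
    for event in events:
        problems = find_problems(event)
        if problems:
            bad.append((event, problems))
        else:
            good.append(event)
    return good, bad


incoming = [
    {"event": "play", "video_id": "abc", "position_seconds": 0},
    {"event": "play", "video_id": "abc", "position_seconds": "12"},
    {"event": "cta_click", "position_seconds": -3},
]
good, bad = split_events(incoming)
for event, problems in bad:
    print(event, problems)
```

Keeping the rejected events together with their reasons makes it easier to trace bad data back to the tracker that sent it.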
14. 14
What value do our analytics provide?
“It’s not that big data is bad, but by looking for the big wins, we risk losing the most exciting potential of big data: the very small actionable insights that are unique to each individual. The real future potential of big data isn’t in its capacity to be big, but rather in just how small it can get.”
Glen Tullman - Forbes