Infectious Media runs on data. But as an ad-tech company that records hundreds of thousands of web events per second, they have to deal with data at a scale most companies never see. You cannot make decisions with data when people have to write SQL by hand, only for queries to take 10-20 minutes to return. Infectious Media made the switch to Google BigQuery and Looker, and now every member of every team can get the data they need in seconds.
Infectious Media shares:
- Why they chose their current stack
- Why faster data means happier customers
- Advantages and practical implications of storing and processing that much data
Check out the recording at https://info.looker.com/h/i/308848878-power-to-the-people-a-stack-to-empower-every-user-to-make-data-driven-decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
1. Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
2. Housekeeping
• We will do Q&A at the end.
• You should see a box on the right side of your screen.
• There is a button marked “Q&A” on the bottom menu.
• We are recording this.
• We will send you the recording & slides tomorrow.
3. Meet Our Presenters
Zev Lebowitz, Senior Sales Engineer
Daniel de Sybel, CTO
Karol Ussher, Head of Technology Partnerships, EMEA
7. What is Google BigQuery?
Durable and Highly Available
Convenience of SQL
Petabyte-scale Storage and Queries
Fully Managed, Serverless Enterprise Data Warehouse
8. BigQuery for Enterprise Features
• Flat-rate Pricing
• Standard SQL
• ODBC & JDBC Connectors
• DML
• Identity and Access Management
• Stackdriver
9. Google Research in Data Technologies
Google confidential. Do not distribute.
Timeline axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013
Technologies shown: GFS, MapReduce, BigTable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, PubSub, F1
Google Research Publications referenced are available here: http://research.google.com/pubs/papers.html
10. Now: Typical Big Data Tasks
Next: Big Data with Google
No-Ops, Auto Everything
Analysis and Insights
Resource provisioning
Performance tuning
Monitoring
Reliability
Deployment & configuration
Handling growing scale
Utilization improvements
Analysis and Insights
Understanding
11. Think about the Data Warehouse
Laura
Dremel
BigQuery
13. Confidential & Proprietary, Google Cloud Platform
"We are very excited about the productivity
benefits offered by Cloud Dataflow and Cloud
Pub/Sub. It took half a day to rewrite something
that had previously taken over six months to build
using Spark"
Paul Clarke, Director of Technology, Ocado
http://googlecloudplatform.blogspot.co.uk/2015/08/Announcing-General-Availability-of-Google-Cloud-Dataflow-and-Cloud-Pub-Sub.html
14.
“Spotify chose Google in part because its
services for analyzing large amounts of data,
tools like BigQuery, are more advanced than data
services from other cloud providers.”
Nicholas Harteau, VP of Infrastructure, Spotify
https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
15.
“Right at the start of the partnership we were able to reduce
time to insight from 96 hours to 30 minutes by using
BigQuery.”
– Gary Sanders, Head of Digital Analytics, Lloyds Banking
Group
17. A Data Analytics platform that...
Makes it easy for everyone to find, explore and understand the data that drives your business.
18. DATA BOTTLENECK
• Which features increase engagement?
• What triggers a customer churn?
• Which web page works best?
• How is pipeline for Q4?
• Will we meet our revenue targets?
• Which customer is at risk?
• Which campaigns convert best?
• Which rep is converting best?
• Can we speed up our operations?
• Are we investing in the right area?
• Who are our happiest customers?
• What industries are we doing well in?
• Where should we spend more budget?
20. IS THERE A WAY TO FIND BALANCE?
Standards
Scalability
Governance
Self-Service
Agility
Flexibility
21. THE TECHNICAL PILLARS THAT MAKE IT POSSIBLE
100% In Database
• Leverage all your data
• Avoid summarizing or moving it
Modern Web Architecture
• Access from anywhere
• Share and collaborate
• Extend to anyone
LookML Intelligent Modeling Layer
• Describe the data
• Create reusable and shareable business logic
22. LOOKER: A DATA PLATFORM
Explore Everything
Find, explore and understand all the data
Create Standards
Define your data and business metrics
Any SQL Database
Analyze all of your data where it is stored
Build a Data Culture
Anyone can ask and answer questions
• How is pipeline for Q4?
• Will we meet our revenue targets?
• Which campaigns convert best?
• Which rep is converting best?
• Which customer is at risk?
• Can we speed up our operations?
23. Looker - BigQuery Integration Highlights
In-Database Architecture
The power of BigQuery is directly leveraged by Looker because all transformation is done in-database.
Support for Native BigQuery Functions
Integration with features unique to BigQuery, in both the product and the modeling layer, makes for a seamless experience.
Highest Level of Looker Features
We’ve invested in providing Looker features for BigQuery to make the best experience possible.
25. OUR BUSINESS
● Founded in 2008
● Leading International Programmatic agency
● Covering all biddable media
● Activity live in 30+ markets
● Highly customisable O&O technology stack – DMP & DSP
● Transparent model
26. Impression Desk
OUR DATA-DRIVEN ADVERTISING PLATFORM THAT PROVIDES FULL ACCESS TO THE FRAGMENTED LANDSCAPE OF INVENTORY AND DATA
[Diagram labels: Bidder, Bidders]
27. RTB: The Data Problem
Data Processing
• 4k requests / sec @ 1 kB = 4 MB/s (~0.4 TB / day)
• 500k requests / sec @ 1 kB = 0.5 GB/s (~40 TB / day)
Analytics
• Impression level data is a goldmine
• Anything that doesn’t fit in Excel generally needs techie help
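The data-volume figures on this slide can be checked with a quick back-of-envelope calculation. The sketch below assumes decimal units (1 kB = 1,000 bytes) and a steady request rate over a full day; the function names are illustrative, not part of the presentation. It gives roughly 0.35 TB and 43 TB per day, which the slide rounds to 0.4 and 40.

```python
# Back-of-envelope RTB data volumes, assuming 1 kB = 1,000 bytes per request
# and a constant request rate for 24 hours (both are simplifying assumptions).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def throughput_mb_per_sec(requests_per_sec: int, bytes_per_request: int = 1_000) -> float:
    """Return the sustained throughput in megabytes per second."""
    return requests_per_sec * bytes_per_request / 1e6


def daily_volume_tb(requests_per_sec: int, bytes_per_request: int = 1_000) -> float:
    """Return the data volume in terabytes per day for a steady request rate."""
    bytes_per_day = requests_per_sec * bytes_per_request * SECONDS_PER_DAY
    return bytes_per_day / 1e12


# The two request rates quoted on the slide.
for rate in (4_000, 500_000):
    print(f"{rate:>7,} req/s -> {throughput_mb_per_sec(rate):,.0f} MB/s, "
          f"{daily_volume_tb(rate):.2f} TB/day")
```

Note that the rates come out in bytes per second, not bits per second: 4,000 requests of 1 kB each is 4 MB/s.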
28. Where we started...
Infobright Community Edition
• Fantastic open source columnar database
• Could be easily installed in Amazon Web Services on a single server
• Used standard SQL for queries
Problems
• Concurrency wasn’t great
• Single threaded
• Could only manage around 1-2 TB of data
• Data load could be slow
29. Up next...
Infobright Enterprise Edition
• Simple upgrade path
• Multi-threaded
• Parallel data loads
Problems
• Concurrency still wasn’t great
• Not cloud native
• Licence costs grew linearly with data volume
30. From there...
Hadoop
• Everyone else is doing it
• No licence costs
• Perfect for cloud deployment
Problems
• Analysts had to learn new ways of writing queries
• Concurrency was non-existent
• Server costs were difficult to control
• Took an army of infrastructure engineers to maintain it
32. Some Stats
Before BQ
• 20 mins to query 1 month of data
• Stored < 5 TB of data
• 1 infrastructure engineer to manage server
• 2 data engineers to manage data
• 3 analysts to query data
After BQ
• 2 mins to query 3 months of data
• Store > 50 TB of data
• 0 infrastructure engineers (no-one cares about the backend)
• 1 data engineer to manage data
• 6 analysts to query data
They cost the same!
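To put the before/after figures on a common footing, the sketch below normalises query time per month of data scanned and compares the team make-up. The numbers are taken from the slide; the dictionary keys and function names are illustrative, and the storage values are the slide's bounds (under 5 TB, over 50 TB), so the storage ratio is only a lower bound.

```python
# Figures quoted on the slide. Storage values are bounds ("< 5" and "> 50"),
# so the ratio computed from them is a lower bound, not an exact figure.
BEFORE = {"query_minutes": 20, "months_scanned": 1, "storage_tb": 5,
          "infra_engineers": 1, "data_engineers": 2, "analysts": 3}
AFTER = {"query_minutes": 2, "months_scanned": 3, "storage_tb": 50,
         "infra_engineers": 0, "data_engineers": 1, "analysts": 6}


def minutes_per_month_of_data(stats: dict) -> float:
    """Query time normalised by how many months of data the query covers."""
    return stats["query_minutes"] / stats["months_scanned"]


def analyst_share(stats: dict) -> float:
    """Fraction of the data team that spends its time querying data."""
    team = stats["infra_engineers"] + stats["data_engineers"] + stats["analysts"]
    return stats["analysts"] / team


speedup = minutes_per_month_of_data(BEFORE) / minutes_per_month_of_data(AFTER)
storage_ratio = AFTER["storage_tb"] / BEFORE["storage_tb"]

print(f"Query time per month of data: about {speedup:.0f}x faster")
print(f"Storage: at least {storage_ratio:.0f}x more")
print(f"Analyst share of team: {analyst_share(BEFORE):.0%} -> {analyst_share(AFTER):.0%}")
```

On these numbers, a month of data is queried about 30 times faster, and analysts go from half of the team to six of seven people.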
33. Something missing
• Optimisation managers still had to go to Analytics to ask questions
• Slowed down campaign optimisations and insights
• Led to impatience and frustration
34. Enter Looker
• Elegant abstraction of our perfect DW via LookML
• Safe data exploration for Optimisers without needing Analysts
• Simple automated queries to email or import into Excel for clients
• Easy extension and evolution of data model with db
• Wait... user defined dashboards?
35. Audience Comparison
• Optimisers looking to extend travel campaign to Paris
• Compared Paris audience with existing London audience
• Use insight to create new strategy
• Sped up optimal campaign creation by a week
36. Fraud and Brand Safety
• Dashboard can pinpoint problems on sites/exchanges
• Identifying fraud/brand safety early reduces wasted spend
• Problem sites/exchanges added to blocklists
• Traders need to tackle arms race with fraudsters
37. Ongoing work
• Costs have quickly increased
  - Built cost monitoring dash in Looker
  - Investigating flat rate pricing
• Release of standard SQL
  - Has made queries faster
  - Requires a migration in LookML
• Release of BigQuery regions
  - Allows better data governance
  - But creates problems for querying across regions
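The choice between paying per query and flat-rate pricing comes down to a break-even volume. The sketch below is a minimal illustration of that reasoning, not Infectious Media's analysis: both prices are hypothetical placeholders and are not actual BigQuery list prices, and the function names are invented for this example.

```python
# Hypothetical placeholder prices for illustration only; these are NOT
# actual BigQuery prices. Substitute your own contract figures.
ON_DEMAND_PRICE_PER_TB = 5.0     # assumed cost per TB scanned by queries
FLAT_RATE_PER_MONTH = 10_000.0   # assumed fixed monthly fee


def on_demand_cost(tb_scanned: float, price_per_tb: float = ON_DEMAND_PRICE_PER_TB) -> float:
    """Monthly cost when paying per TB scanned."""
    return tb_scanned * price_per_tb


def break_even_tb(flat_rate: float = FLAT_RATE_PER_MONTH,
                  price_per_tb: float = ON_DEMAND_PRICE_PER_TB) -> float:
    """TB scanned per month at which both plans cost the same."""
    return flat_rate / price_per_tb


def cheaper_plan(tb_scanned: float,
                 flat_rate: float = FLAT_RATE_PER_MONTH,
                 price_per_tb: float = ON_DEMAND_PRICE_PER_TB) -> str:
    """Name the cheaper plan for a given monthly scan volume (ties go to on-demand)."""
    if on_demand_cost(tb_scanned, price_per_tb) > flat_rate:
        return "flat-rate"
    return "on-demand"


print(f"Break-even: {break_even_tb():,.0f} TB scanned per month")
for tb in (500, 5_000):
    print(f"{tb:>5,} TB/month -> {cheaper_plan(tb)}")
```

A cost-monitoring dashboard supplies the one input this needs from real usage: the volume of data scanned per month.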
38. Final thoughts
• Scale is the constant enemy
• Scale makes even simple questions require smart solutions
• BigQuery handles the scale most use Hadoop for
• Layering on Looker allows your team to get more answers, not more problems
40. THANK YOU FOR JOINING
Recording and slides will be posted. We will email you the links tomorrow.
Our Next Webinar: Parse.ly & Looker, Beyond the Dashboard: What You Can Learn From Raw Audience Data, on Thursday
See how Google BigQuery and Looker work with your data. Visit cloud.google.com/free-trial and looker.com/free-trial or email discover@looker.com.