In this talk, glomex shares details about its award-winning data management infrastructure. The speakers show how a serverless approach can scale automatically to the demands of a highly unpredictable industry in which video clips can go viral at any moment. What is the best architecture for real-time data processing? How does a batch-driven BI workflow fit in? What are the key benefits of moving to the cloud? Which AWS services should you use?
Scale up - How to build adaptive data systems in the age of virality
1. glomex – A company of ProSiebenSat.1 Media SE
glomex
THE GLOBAL MEDIA EXCHANGE
Scale up - How to build adaptive data systems in the age of virality
Winner “Best Data Management and Infrastructure”
Gartner Data & Analytics Summit 2017, London
2.
About us
Head of Site Reliability Engineering
@jobrandstetter
• 1 year with glomex (founded in May ’16)
• Deeply involved in building Data Platform
• Software Developer, MongoDB Master, Metrics Nerd
Vice President of Engineering
@michael_muckel
• Founding member of the glomex Data Platform Team
• Responsible for glomex Engineering
• Software Architecture, Analytics, Machine Learning
3.
No.1 TV sales house in GSA
~3bn video views per year
50+ online & mobile platforms
20 years of technical excellence
Where We Come From
No.1 TV broadcaster in Germany
glomex – a ProSiebenSat.1 company
4.
glomex is …
• the platform as a service provider for online video management, serving 3bn video views per year today
• the platform owner of a global marketplace for online video distribution
• the pioneer in video delivery management for publishers and content owners around the globe
Who we are
5.
Video Value Service
Media Exchange Service
Media Delivery Service
glomex
Content providers
Publishers
The global B2B content marketplace
TV broadcasters and web-only content producers
6.
Video Value Service, Media Delivery Service, Media Exchange Service
Data Platform
Real-Time-Monitoring, Analytics, Machine Learning
Data Platform
7.
Serverless Computing
8.
Serverless Basics
Evolution of Computing
• On-premise: weeks
• Virtual machines (Amazon EC2): minutes
• Containers (Amazon ECS): seconds
9.
AWS Lambda
Diagram: new object uploaded to Amazon S3 → notification → AWS Lambda processes the object → Amazon S3 / Amazon DynamoDB
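The flow on this slide (a new S3 object triggers a notification, and Lambda processes the object) can be sketched as a Python handler. This is a minimal sketch under stated assumptions, not glomex's code: `store_item` is a hypothetical stand-in for a DynamoDB write, the item fields are illustrative, and treating DynamoDB as the destination is a reading of the diagram. Only the event layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follows the S3 notification format.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key, size) tuples from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key, s3["object"].get("size", 0)))
    return objects


def handler(event, context, store_item=print):
    """Lambda entry point; store_item is a hypothetical DynamoDB writer."""
    items = [
        {"bucket": bucket, "key": key, "size": size}
        for bucket, key, size in extract_objects(event)
    ]
    for item in items:
        store_item(item)
    return {"processed": len(items)}
```

Passing the writer as a parameter keeps the sketch runnable without AWS credentials; in a real function it would wrap a DynamoDB client call.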
10.
AWS Lambda, AWS Lambda, Amazon API Gateway, Amazon Kinesis
Serverless – Ingest and Serving
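For the serving side, a Lambda function behind Amazon API Gateway can be sketched as follows. This is an illustration only: the `video_id` query parameter and the `FAKE_METRICS` dictionary are hypothetical stand-ins for whatever store backs the real API. The response shape (`statusCode`, `headers`, string `body`) is what the Lambda proxy integration expects.

```python
import json

# Hypothetical in-memory stand-in for the serving store behind the API.
FAKE_METRICS = {"video-123": {"views": 1000}}


def handler(event, context):
    """Answer a GET request with a video_id query parameter."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    video_id = params.get("video_id")
    if video_id not in FAKE_METRICS:
        return _response(404, {"error": "unknown video_id"})
    return _response(200, {"video_id": video_id, **FAKE_METRICS[video_id]})


def _response(status, payload):
    # Proxy integration expects statusCode, headers and a string body.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```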
11.
• Read data from Kinesis Firehose / S3
• Server downtime / scheduler
• Load to Elasticsearch
• Clean Elasticsearch and Redshift
• Advanced Redshift monitoring
• EBS Snapshots
Be Serverless. Everywhere.
12.
AWS Lambda Execution
13.
Some Facts
• 50 GB per day of click-stream data in
• 50 million click-stream records processed per day
• ~100 ms data freshness to S3
• 25 GB per day as zipped CDN log files
• 1 billion CDN records processed per day
• < 1 min data freshness to API
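To put these daily totals in perspective, the averages per second and per record can be worked out as below. This is back-of-the-envelope arithmetic that assumes an evenly spread day and decimal gigabytes (10**9 bytes); real traffic is spiky, so peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(records_per_day):
    """Average records per second, assuming an evenly spread day."""
    return records_per_day / SECONDS_PER_DAY


def bytes_per_record(gigabytes_per_day, records_per_day):
    """Average record size, assuming decimal gigabytes (10**9 bytes)."""
    return gigabytes_per_day * 10**9 / records_per_day


clickstream_rate = per_second(50_000_000)
cdn_rate = per_second(1_000_000_000)
clickstream_size = bytes_per_record(50, 50_000_000)
cdn_size = bytes_per_record(25, 1_000_000_000)

print(f"click-stream: {clickstream_rate:.0f} records/s, {clickstream_size:.0f} bytes/record")
print(f"CDN logs: {cdn_rate:.0f} records/s, {cdn_size:.0f} zipped bytes/record")
```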
14.
• AWS gives you primitives
• AWS tools totally unopinionated
• At the beginning of 2016, hardly any good tooling was available
• Lots of problems with reliable Lambda deployments (binary packages)
• Lack of a common workflow for various deployment scenarios
• Better Developer Experience
Development Workflow
15.
glomex Cloud Deployment Tools
Agile Cloud Deployment
雲
kumo
ラムダ
ramuda
幽玄
yugen
展開
tenkai
• Used by other teams
• Slack and monitoring integration
• Simplify build automation
• Codify proven practices
• Enable self-service
• Automate deployments
• 6000 deployments in 02/2017
• 1100 deployments on prod in 02/2017
• 300 errors (5%)
16.
Use Case - Monitoring Video Streaming Experience
17.
Focus on Metrics from the User’s Perspective
From Server-Uptime To (anonymized) Real-User Monitoring
Monitoring Video Streaming Experience
18.
Scalable, Stream-based Ingest Pipeline
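One stage of a stream-based ingest pipeline can be sketched as a Lambda function triggered by a Kinesis stream. This is a minimal sketch, not the pipeline from the talk: it assumes each record carries a JSON payload, and the `event_type` field is hypothetical. The base64-encoded `Records[].kinesis.data` field follows the event format Lambda receives from Kinesis.

```python
import base64
import json


def decode_records(event):
    """Decode the base64 payloads of a Kinesis-triggered Lambda event."""
    decoded = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers send JSON-encoded player events.
        decoded.append(json.loads(raw))
    return decoded


def handler(event, context):
    """Count player events per type (hypothetical 'event_type' field)."""
    counts = {}
    for payload in decode_records(event):
        event_type = payload.get("event_type", "unknown")
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts
```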
19.
Traffic Patterns
20.
Use Case 2 - Serverless Recommender System
Publisher’s URL
Serverless Recommender
Generated Playlist
Publisher Website
Adaptive Page Crawler
Entity Extraction
Topic Modeling
Search Metadata Index
21.
Infrastructure as Code
Agile Cloud Deployment
Structure, Speed, Security, Health
22.
Quick Lessons Learned
• Focus on feature development and robust pipelines, not on infrastructure management
• AWS managed services provide a robust way to run complex big data infrastructures
• Cross-functional teams help velocity and quality
• Use a microservice architecture
23.
Thanks for Listening!
Visit us at – explore.glomex.com
We’ve recently opened a London office and we’re hiring:
• Content & Publishers Sales Managers
• Media Sales Leaders
• Account Managers
Editor's Notes
Video value service: video player + ad block prevention, content monetization
Media exchange service
Media delivery service: VoD / live
Serverless computing, also known as function as a service (FaaS), is a cloud computing code execution model in which the cloud provider fully manages starting and stopping a function's container (platform as a service, PaaS) as necessary to serve requests. Requests are billed by an abstract measure of the resources required to satisfy them, rather than per virtual machine, per hour.[1]
Deploy pure functions in Java, Python, Node.js and C#
Build event-driven apps
Build RESTful APIs in conjunction with Amazon API Gateway
Pay as you go: number of requests + execution time (100 ms slots)
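The pay-as-you-go model in the last note (requests plus execution time in 100 ms slots) can be sketched as a small estimator. The default prices below are illustrative assumptions, not figures from the talk, and the free tier is ignored; check current AWS pricing before relying on the numbers.

```python
import math


def lambda_cost(requests, avg_duration_ms, memory_mb,
                price_per_million_requests=0.20,
                price_per_gb_second=0.00001667):
    """Estimate Lambda cost with duration rounded up to 100 ms slots.

    The default prices are illustrative assumptions, not quoted from the talk.
    """
    # Each invocation is billed for a whole number of 100 ms slots.
    billed_ms = math.ceil(avg_duration_ms / 100) * 100
    gb_seconds = requests * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = requests / 1_000_000 * price_per_million_requests
    return request_cost + gb_seconds * price_per_gb_second
```

Under this model a 101 ms invocation is billed as 200 ms, so shaving a function from just over a slot boundary to just under it halves its duration charge.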
One example: we work serverless and scale automatically
API for internal services
$1 per hour for 25 GB of CDN log file processing
AWS also gives tools: chalice, ecs-tool,
Your job as a responsible engineer is to add guard rails, build a clear promotion path for validating changesets into production, and limit the scope of the world the tooling is capable of destroying
gcdt (glomex Cloud Deployment Tools) already solves existing problems with both services (wiring, rollbacks, failsafes, binary packages, bundling)
kumo (雲 from Japanese: cloud)
ramuda (ラムダ from Japanese: lambda)
yugen (幽玄 from Japanese: “dim”, “deep” or “mysterious”)
tenkai (展開 from Japanese: deployment)
We come from a technical background, so we measured server uptime, but that doesn’t say anything about user experience
Time to video start
Video quality (which streams do users get?)
Video player either web or native
The spikes are Thursdays: GNTM (Germany's Next Topmodel) with Heidi Klum
API Gateway / Lambda
Structure: common services look alike
Speed: most of the Data Platform (DP) was built by 4 developers in 5 months
Health: Monitoring part of IaC, ramuda does sanity checks
Security: custom checks before deployments, e.g. ports open to the world, non-SSL traffic, custom ciphers for ELB
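A pre-deployment security check of the kind described in the last note might look like the following. This is a sketch under assumptions: the `config` layout (`ingress_rules`, `listeners`) is hypothetical and is not the format used by gcdt; it only illustrates flagging ports open to the world and unencrypted listeners.

```python
def find_violations(config):
    """Return human-readable findings for a hypothetical deployment config."""
    findings = []
    for rule in config.get("ingress_rules", []):
        # 0.0.0.0/0 means the port is reachable from anywhere.
        if rule["cidr"] == "0.0.0.0/0":
            findings.append(f"port {rule['port']} is open to the world")
    for listener in config.get("listeners", []):
        # Only HTTPS and SSL listeners count as encrypted in this sketch.
        if listener["protocol"].upper() not in ("HTTPS", "SSL"):
            findings.append(f"listener on port {listener['port']} is not encrypted")
    return findings
```

A deployment tool could refuse to proceed whenever the returned list is non-empty.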
Microservice: separation of concerns