Real-time Insights into Application
Events
Who Are We?
Software Engineers in Netflix’s
Platform Engineering team,
working on very large scale data
infrastructure
Bui...
Why We Are Here?
No Monitoring Metrics Today
Netflix is a log generating company
that also happens to stream movies
- Adrian Cockcroft

photo credit: http://www.flickr....
1,500,000
70,000,000,000
Making Sense of Billions of Events
A Humble Beginning
Things Changed
Application

Application

Application
Application

Application

Application

Application

Application

Application

Applic...
So We Evolved

hgrep -C 10 -k 5,2,3 'users.*[1-9]{3}' *catalina.out s3//bucket
What Is Missing?
Interactive Exploration
Use Cases
Real-time or Product
Business Operational
Metrics
Insights
Getting Results Back in Seconds

150,000
Querying Data Along Different Dimensions
Discover Outstanding Data

HTTP 500
Discover Outstanding Data
See Trends Over Time
See Data Distributions
It’s All about Extracting Small Data
Out of Big Data
But Then What?
Intelligent Alerts
Guided Debugging in the Right Context
Guided Debugging in the Right Context
Guided Debugging in the Right Context
Technical Challenges
Problem:
Minimizing programming effort

Solution:

-Homogeneous architecture
- Separating producing logs from
consuming lo...
Field Name

Field Value

Client

“API”

Server

“Cryptex”

StatusCode

200

ResponseTime

73
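
The field table above suggests that each log event is a flat set of named fields rather than a formatted text line. A minimal Python sketch of such an event, using the four fields from the slide; the JSON encoding is an assumption for illustration, since the slides do not state the wire format:

```python
import json

# Field names and values come from the slide; JSON as the wire format is an assumption.
event = {
    "Client": "API",
    "Server": "Cryptex",
    "StatusCode": 200,
    "ResponseTime": 73,
}


def serialize(evt: dict) -> str:
    """Encode an event as one compact JSON line with a stable key order."""
    return json.dumps(evt, sort_keys=True, separators=(",", ":"))


def deserialize(line: str) -> dict:
    """Decode a line produced by serialize()."""
    return json.loads(line)


line = serialize(event)
```

Keeping events as field/value pairs is what lets consumers query along any dimension later without parsing free text.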
A Single Data Pipeline

Log data

Log Filter

Collector
Agent
Log Collectors

LogManager.logEvent(anEvent)
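
The slide shows only the producer-side call `LogManager.logEvent(anEvent)`. A rough Python sketch of what such a facade could look like follows; the bounded buffer, the drop-on-full policy, and the `drain` method used by the collector agent are all assumptions, not the actual Netflix implementation:

```python
import queue


class LogManager:
    """Hypothetical producer-side facade: applications only call log_event()."""

    def __init__(self, max_pending: int = 1000):
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def log_event(self, event: dict) -> bool:
        """Enqueue without blocking the caller; count a drop if the buffer is full."""
        try:
            self._pending.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, batch_size: int = 100) -> list:
        """Called by the (assumed) collector agent: take up to batch_size events."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        return batch


manager = LogManager(max_pending=2)
accepted = [manager.log_event({"StatusCode": code}) for code in (200, 500, 404)]
batch = manager.drain()
```

The point of the sketch is the separation: the application never learns where events go, so consumers can change without touching producer code.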
Reliable/Flexible Data Pipeline

Log Filter

Sink Plugin

Log Filter

Sink Plugin

Log Filter

Sink Plugin

Server Farm

S...
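
The diagram pairs each log filter with a sink plugin. One way to read that is as a list of routes, sketched below in Python. The `Router` class, the predicates, and the list-backed sinks are hypothetical stand-ins for the real filters and the Hadoop, Kafka, and Druid sinks:

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
LogFilter = Callable[[Event], bool]
Sink = Callable[[Event], None]


class Router:
    """Hypothetical collector-side router: each route is a (filter, sink) pair."""

    def __init__(self) -> None:
        self._routes: List[Tuple[LogFilter, Sink]] = []

    def add_route(self, log_filter: LogFilter, sink: Sink) -> None:
        self._routes.append((log_filter, sink))

    def dispatch(self, event: Event) -> int:
        """Send the event to every sink whose filter accepts it; return the count."""
        delivered = 0
        for log_filter, sink in self._routes:
            if log_filter(event):
                sink(event)
                delivered += 1
        return delivered


# Lists stand in for real sinks (for example a batch store and a real-time store).
batch_store: List[Event] = []
realtime_store: List[Event] = []

router = Router()
router.add_route(lambda e: True, batch_store.append)
router.add_route(lambda e: int(e.get("StatusCode", 0)) >= 500, realtime_store.append)

events = [{"StatusCode": 200}, {"StatusCode": 500}, {"StatusCode": 503}]
counts = [router.dispatch(e) for e in events]
```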
Problem:
Not All Logs Are Worth Processing

Solution:
Dynamic Filtering
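
The slides do not say how dynamic filtering is configured. As a sketch under that caveat, a filter whose rule can be replaced while the pipeline keeps running might look like this in Python; the rule format (field name mapped to a set of allowed values) is an assumption:

```python
import threading


class DynamicFilter:
    """Hypothetical filter whose rule can be swapped at runtime."""

    def __init__(self, rule=None):
        self._lock = threading.Lock()
        self._rule = dict(rule or {})

    def update(self, rule: dict) -> None:
        """Replace the whole rule atomically; an empty rule accepts every event."""
        with self._lock:
            self._rule = dict(rule)

    def accepts(self, event: dict) -> bool:
        with self._lock:
            rule = self._rule
        # Every field named in the rule must hold one of its allowed values.
        return all(event.get(field) in allowed for field, allowed in rule.items())


log_filter = DynamicFilter()
before = log_filter.accepts({"StatusCode": 200})

# Narrow the filter to server errors without restarting anything.
log_filter.update({"StatusCode": {500, 503}})
after_ok = log_filter.accepts({"StatusCode": 200})
after_error = log_filter.accepts({"StatusCode": 500})
```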
Problem:
Realtime Ingestion

Solution:
Druid & ElasticSearch
ElasticSearch

- Distributed RESTful search analytics
- Lucene based, Full text search
- High availability
- Faceted search...
Druid

-Real-time indexing and querying
- Arbitrary slicing and dicing, rolling
up and drilling down

- Packaged queries -...
Druid Architecture
RealTime Nodes

Hand off data

Historical Nodes

Deep Storage
Query API

Query API
Query Rewrite
Scatte...
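
The broker's scatter/gather step can be illustrated with a small Python sketch: the query is fanned out to each segment, and the partial results are merged. This shows only the general pattern; the function names are hypothetical, plain lists stand in for segments, and real Druid brokers do considerably more (segment discovery, caching, query rewriting):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_by_status(segment):
    """Partial result computed by one (simulated) data node for one segment."""
    return Counter(event["StatusCode"] for event in segment)


def scatter_gather(segments):
    """Scatter the same query to every segment, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(count_by_status, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


segments = [
    [{"StatusCode": 200}, {"StatusCode": 500}],
    [{"StatusCode": 500}, {"StatusCode": 500}, {"StatusCode": 404}],
]
totals = scatter_gather(segments)
```

Counts merge cleanly because addition is associative; aggregates such as exact distinct counts need a different merge step.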
Column Compression
timestamp
2011-01-01T00:01:35Z
2011-01-01T00:03:63Z
2011-01-01T00:04:51Z
2011-01-01T01:00:00Z
2011-01-0...
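
The full version of this slide assigns ids (bieberfever.com -> 0, ultratrimfast.com -> 1) and stores the publisher column as [0, 0, 0, 1, 1, 1]. A minimal Python sketch of that dictionary encoding, with a hypothetical function name:

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer id, in order of first appearance."""
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)
        encoded.append(ids[value])
    return ids, encoded


publisher = ["bieberfever.com"] * 3 + ["ultratrimfast.com"] * 3
advertiser = ["google.com"] * 6

publisher_ids, publisher_column = dictionary_encode(publisher)
advertiser_ids, advertiser_column = dictionary_encode(advertiser)
```

Low-cardinality string columns shrink to arrays of small integers, which are cheaper to store and scan.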
Bitmap Index
timestamp
2011-01-01T00:01:35Z
2011-01-01T00:03:63Z
2011-01-01T00:04:51Z
2011-01-01T01:00:00Z
2011-01-01T02:0...
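
The full version of this slide maps each publisher to the rows where it occurs (bieberfever.com -> [0, 1, 2] -> [111000]). A small Python sketch of building such bitmaps and combining them to answer a filter; the helper names are hypothetical, and the CONCISE compression mentioned on the slide is not implemented here:

```python
def build_bitmap_index(values):
    """Return {value: bitmap}, where bitmap[i] is 1 if row i holds that value."""
    index = {}
    for row, value in enumerate(values):
        index.setdefault(value, [0] * len(values))[row] = 1
    return index


def to_bits(bitmap):
    return "".join(str(bit) for bit in bitmap)


def bitmap_and(left, right):
    """Rows matching both conditions: bitwise AND of the two bitmaps."""
    return [a & b for a, b in zip(left, right)]


def matching_rows(bitmap):
    return [row for row, bit in enumerate(bitmap) if bit]


publisher = ["bieberfever.com"] * 3 + ["ultratrimfast.com"] * 3
gender = ["Male"] * 3 + ["Female"] * 3

publisher_index = build_bitmap_index(publisher)
gender_index = build_bitmap_index(gender)

# Filter: publisher = bieberfever.com AND gender = Male
hits = matching_rows(bitmap_and(publisher_index["bieberfever.com"], gender_index["Male"]))
```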
Problem:
JSON Payload Is Tedious

Solution:
Build a parser
curl -X POST http://druid -d @data
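
To give a sense of why hand-written payloads get tedious, here is a Python sketch that builds the kind of JSON body the `curl` command above would post. The keys follow Druid's native topN query format; the helper name, data source, dimension, and interval are hypothetical, and the parser the speakers built is not shown in the slides:

```python
import json


def top_n_query(data_source, dimension, threshold, interval):
    """Build a Druid-style topN query body that ranks dimension values by row count."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "threshold": threshold,
        "metric": "count",
        "granularity": "all",
        "aggregations": [{"type": "count", "name": "count"}],
        "intervals": [interval],
    }


# Hypothetical data source and interval, for illustration only.
query = top_n_query("request_logs", "StatusCode", 5, "2013-10-01/2013-10-02")
payload = json.dumps(query, indent=2)
```

Writing `payload` to a file named `data` would make it usable with the `-d @data` form shown on the slide.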
There’s More
System Monitoring
System Resilience
System Operability
Problem:
So many combinations of configurations

Solution:
Build a flexible load testing tool
Problem:
Managing data sources can be hairy

Solution:
Use cell-like deployment
Druid

Kafka

Druid

Druid

Kafka

Kafka

Log Data Pipeline
Problem:
How do we know everything about the
new systems?

Solution:
Extensive instrumentation with
Servo and Atlas
Problem:
Many open-sourced solutions
assume static configuration

Solution:
Integrating with Netflix platform,
particularl...
Problem:
Zookeeper goes down, and so do
connections for Kafka clients

Solution:
Replacing zkClient with Apache
Curator
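
The slide says only that zkClient was replaced with Apache Curator. Curator ships retry policies such as `ExponentialBackoffRetry`; the Python sketch below illustrates that general retry pattern. It is not Curator's implementation, and the slides do not say that retries were the deciding feature. The function and parameter names are hypothetical:

```python
import time


def connect_with_retry(connect, max_retries=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # give up after the configured number of retries
            sleep(min(max_delay, base_delay * 2 ** attempt))
            attempt += 1


# Simulated coordinator that is unreachable for the first two attempts.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise ConnectionError("coordinator unavailable")
    return "connected"


waits = []
result = connect_with_retry(flaky_connect, sleep=waits.append)
```

Passing `sleep=waits.append` records the delays instead of sleeping, which keeps the example fast to run.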
Technology Stacks
- Netflix OSS: Powerful cloud computation
- Suro: Internal main data pipeline
- Kafka: High-throughput a...
Thank You!
213 event processingtalk-deviewkorea.key

  1. 1. Real-time Insights into Application Events
  2. 2. Who Are We? Software Engineers in Netflix’s Platform Engineering team, working on very large scale data infrastructure Building and operating Netflix’s cloud real-time query service
  3. 3. Why We Are Here?
  4. 4. No Monitoring Metrics Today
  5. 5. "Netflix is a log generating company that also happens to stream movies" - Adrian Cockcroft. Photo credit: http://www.flickr.com/photos/decade_null/142235888/sizes/o/in/photostream/
  6. 6. 1,500,000
  7. 7. 70,000,000,000
  8. 8. Making Sense of Billions of Events
  9. 9. A Humble Beginning
  10. 10. Things Changed
  11. 11. Application Application Application Application Application Application Application Application Application Application
  12. 12. So We Evolved hgrep -C 10 -k 5,2,3 'users.*[1-9]{3}' *catalina.out s3//bucket
  13. 13. What Is Missing?
  14. 14. Interactive Exploration
  15. 15. Use Cases Real-time or Product Business Operational Metrics Insights
  16. 16. Getting Results Back in Seconds 150,000
  17. 17. Querying Data Along Different Dimensions
  18. 18. Discover Outstanding Data HTTP 500
  19. 19. Discover Outstanding Data
  20. 20. See Trends Over Time
  21. 21. See Data Distributions
  22. 22. It’s All about Extracting Small Data Out of Big Data
  23. 23. But Then What?
  24. 24. Intelligent Alerts
  25. 25. Guided Debugging in the Right Context
  26. 26. Guided Debugging in the Right Context
  27. 27. Guided Debugging in the Right Context
  28. 28. Technical Challenges
  29. 29. Problem: Minimizing programming effort Solution: -Homogeneous architecture - Separating producing logs from consuming logs
  30. 30. Field Name Field Value Client “API” Server “Cryptex” StatusCode 200 ResponseTime 73
  31. 31. A Single Data Pipeline Log data Log Filter Collector Agent Log Collectors LogManager.logEvent(anEvent)
  32. 32. Reliable/Flexible Data Pipeline Log Filter Sink Plugin Log Filter Sink Plugin Log Filter Sink Plugin Server Farm Server Farm Log Collectors Hadoop Kafka Druid Server Farm photo credit: http://www.flickr.com/photos/decade_null/142235888/sizes/m/in/photostream/ Kafka ElasticSearch
  33. 33. Problem: Not All Logs Are Worth Processing Solution: Dynamic Filtering
  34. 34. Problem: Realtime Ingestion Solution: Druid & ElasticSearch
  35. 35. ElasticSearch - Distributed RESTful search analytics - Lucene based, Full text search - High availability - Faceted search, a little slow
  36. 36. Druid -Real-time indexing and querying - Arbitrary slicing and dicing, rolling up and drilling down - Packaged queries - TopN, Time Series, Histograms, Cardinalities
  37. 37. Druid Architecture RealTime Nodes Hand off data Historical Nodes Deep Storage Query API Query API Query Rewrite Scatter/Gather Broker Nodes slide credit: Eric Cheddar @Metamx
  38. 38. Column Compression
      timestamp             publisher          advertiser  gender  country  ...
      2011-01-01T00:01:35Z  bieberfever.com    google.com  Male    USA      0.65
      2011-01-01T00:03:63Z  bieberfever.com    google.com  Male    USA      0.62
      2011-01-01T00:04:51Z  bieberfever.com    google.com  Male    USA      0.45
      2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Female  UK       0.87
      2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       0.99
      2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       1.53
      ...
      Create Ids: bieberfever.com -> 0, ultratrimfast.com -> 1
      Store: publisher -> [0, 0, 0, 1, 1, 1], advertiser -> [0, 0, 0, 0, 0, 0]
      slide credit: Eric Cheddar @Metamx
  39. 39. Bitmap Index
      timestamp             publisher          advertiser  gender  country  ...
      2011-01-01T00:01:35Z  bieberfever.com    google.com  Male    USA      0.65
      2011-01-01T00:03:63Z  bieberfever.com    google.com  Male    USA      0.62
      2011-01-01T00:04:51Z  bieberfever.com    google.com  Male    USA      0.45
      2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Female  UK       0.87
      2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       0.99
      2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Female  UK       1.53
      ...
      bieberfever.com -> [0, 1, 2] -> [111000]
      ultratrimfast.com -> [3, 4, 5] -> [000111]
      Compress CONCISE http://ricerca.mat.uniroma3.it/users/colanton/co
      slide credit: Eric Cheddar @Metamx
  40. 40. Problem: JSON Payload Is Tedious Solution: Build a parser
  41. 41. curl -X POST http://druid -d @data
  42. 42. There’s More
  43. 43. System Monitoring System Resilience System Operability
  44. 44. Problem: So many combinations of configurations Solution: Build a flexible load testing tool
  45. 45. Problem: Managing data sources can be hairy Solution: Use cell-like deployment
  46. 46. Druid Kafka Druid Druid Kafka Kafka Log Data Pipeline
  47. 47. Problem: How do we know everything about the new systems? Solution: Extensive instrumentation with Servo and Atlas
  48. 48. Problem: Many open-sourced solutions assume static configuration Solution: Integrating with Netflix platform, particularly Eureka
  49. 49. Problem: Zookeeper goes down, and so do connections for Kafka clients Solution: Replacing zkClient with Apache Curator
  50. 50. Technology Stacks - Netflix OSS: Powerful cloud computation - Suro: Internal main data pipeline - Kafka: High-throughput and durable message queue - Druid: Efficient real-time multi-dimensional database on large-scale data - ElasticSearch: Distributed search engine - Kibana: ElasticSearch UI - Zookeeper: Distributed coordinator
  51. 51. Thank You!
