Moving From MySQL to Elasticsearch for Analytics
This presentation gives a technical overview of Percolate's next-generation Analytics system. It describes the first-generation system built on MySQL, its challenges, and how Percolate uses Kafka and Elasticsearch to build its new Analytics system.

Published in: Data & Analytics

  1. Moving from MySQL to Elasticsearch for Analytics (Yannick Dawant & Vinh Nguyen)
  2. Agenda
     - What is Analytics, and why is it important to Percolate?
     - Analytics 1.0 - MySQL
     - Analytics 2.0 - Elasticsearch
     - Next Steps
  3. The System of Record for Marketing
  4. What does Analytics mean to Percolate? How does it work?
  5. Analytics 1.0 - Design
     [Diagram: Facebook, Twitter, Instagram, LinkedIn, [...]; Crawlers; metrics; MySQL; API; UI]
  6. MySQL Data Model

     [post]
     post_id  service_id  tag            created_at
     1        1           blog           2016-01-01 10:11:15
     2        1           blog, video    2016-01-01 12:12:30
     3        2           election 2016  2016-01-01 10:10:57

     [service]
     service_id  name
     1           facebook
     2           twitter
     3           instagram

     [metric_names]
     metric_id  service_id  name
     1          1           likes
     2          1           comments
     3          1           follows
     4          2           follows
     5          2           mentions
     6          2           retweets

     [post_metrics]
     post_id  metric_id  metric_value  captured_at
     1        1          10            2016-01-01 10:11:15
     1        1          20            2016-01-01 12:12:30
     2        2          5             2016-01-01 10:10:57
     2        2          10            2016-01-01 13:12:20
     3        1          15            2016-01-01 13:12:45
     3        2          30            2016-01-01 17:05:11
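
The four tables on this slide can be loaded and queried as a rough sketch. SQLite stands in for MySQL here so the example runs without a server, and the column types and the "latest value per post and metric" query are assumptions for illustration, not taken from the deck. It shows the kind of three-way join a simple analytics question needs under this model.

```python
import sqlite3

# Assumption: SQLite replaces MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (service_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, service_id INTEGER,
                   tag TEXT, created_at TEXT);
CREATE TABLE metric_names (metric_id INTEGER PRIMARY KEY, service_id INTEGER,
                           name TEXT);
CREATE TABLE post_metrics (post_id INTEGER, metric_id INTEGER,
                           metric_value INTEGER, captured_at TEXT);
""")

# Rows copied from the slide.
conn.executemany("INSERT INTO service VALUES (?, ?)",
                 [(1, "facebook"), (2, "twitter"), (3, "instagram")])
conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", [
    (1, 1, "blog", "2016-01-01 10:11:15"),
    (2, 1, "blog, video", "2016-01-01 12:12:30"),
    (3, 2, "election 2016", "2016-01-01 10:10:57"),
])
conn.executemany("INSERT INTO metric_names VALUES (?, ?, ?)", [
    (1, 1, "likes"), (2, 1, "comments"), (3, 1, "follows"),
    (4, 2, "follows"), (5, 2, "mentions"), (6, 2, "retweets"),
])
conn.executemany("INSERT INTO post_metrics VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2016-01-01 10:11:15"), (1, 1, 20, "2016-01-01 12:12:30"),
    (2, 2, 5, "2016-01-01 10:10:57"), (2, 2, 10, "2016-01-01 13:12:20"),
    (3, 1, 15, "2016-01-01 13:12:45"), (3, 2, 30, "2016-01-01 17:05:11"),
])

# Latest captured value of every metric for every post: three joins plus a
# correlated subquery over the ever-growing post_metrics table.
LATEST_METRICS_SQL = """
SELECT p.post_id, s.name, m.name, pm.metric_value
FROM post_metrics AS pm
JOIN post AS p ON p.post_id = pm.post_id
JOIN metric_names AS m ON m.metric_id = pm.metric_id
JOIN service AS s ON s.service_id = p.service_id
WHERE pm.captured_at = (
    SELECT MAX(latest.captured_at)
    FROM post_metrics AS latest
    WHERE latest.post_id = pm.post_id AND latest.metric_id = pm.metric_id
)
ORDER BY p.post_id, m.name
"""
rows = conn.execute(LATEST_METRICS_SQL).fetchall()
for row in rows:
    print(row)
```

Every row of the result needs a lookup in all four tables, which is the query-time cost the later tradeoff slides refer to.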
  7. Why MySQL?
     - Relational data models
     - Very well-known pattern
     - Application-level objects map cleanly to DB tables
     - Joins are easy to do
     - Easy to use
     - Amazon RDS for managed hosting/deployment/monitoring
     - Very familiar to Ops team and other developers, shared knowledge base
     - Lots of support available online
     - Met product requirements
  8. Seems reasonable. What are the tradeoffs?
  9. MySQL Tradeoffs
     - Data Modeling Issues
     - Starts easy but becomes complex over time (increasing number of tables)
     - Schema inflexibility (dynamic changes, unused columns)
     - Hard to modify live schemas, may require downtime
     - Slow Queries
     - Lots of joins at query time
     - Tables grow larger and larger over time
     - Hard to partition time series data
     - Expensive post-processing on application side
  10. MySQL Tradeoffs
     - Scalability Issues
     - Database grows larger and larger over time
     - Scaling is mostly vertical (add more CPU/RAM/disk to same node), may require downtime
     - Hard to scale horizontally
     - Not suitable for our Search needs
  11. Where do we go from here?
  12. Analytics 1.0 - Design
     [Diagram: Facebook, Twitter, Instagram, LinkedIn, [...]; Crawlers; metrics; MySQL; API; UI]
  13. Analytics 2.0 - Design
     [Diagram: Facebook, Twitter, Instagram, LinkedIn, [...]; Crawlers; metrics; Kafka; Data Transformation (shown twice); Elasticsearch; MySQL; API; UI]
  14. Why Kafka?
     - Decouples data collection from storage
     - Enhances reliability of our data pipelines
     - Message queue persistence, replay
     - Enhances horizontal scalability of our data pipelines
     - Multiple brokers, parallel consumers/producers
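
The decoupling and replay properties listed above can be illustrated with a tiny in-memory stand-in for a single topic partition. This is not the Kafka client API; `Topic`, `Consumer`, `produce`, `poll`, and `seek` are hypothetical names chosen for this sketch. The point is only that producers append to a persistent log while each consumer tracks its own offset, so a consumer can rewind and reprocess.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Append-only log, standing in for one Kafka topic partition."""
    log: list = field(default_factory=list)

    def produce(self, message):
        # The producer (a crawler) only appends; it knows nothing about storage.
        self.log.append(message)
        return len(self.log) - 1  # offset of the stored message

    def read(self, offset, max_messages=100):
        return self.log[offset:offset + max_messages]


@dataclass
class Consumer:
    """Reads from a topic and remembers how far it has read."""
    topic: Topic
    offset: int = 0

    def poll(self):
        batch = self.topic.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Replay: move back to an earlier offset and read the messages again.
        self.offset = offset


topic = Topic()
for likes in (10, 20, 30):
    topic.produce({"post_id": 1, "likes": likes})

worker = Consumer(topic)
first_pass = worker.poll()   # all three messages
nothing_new = worker.poll()  # empty until the crawler produces more
worker.seek(0)
replayed = worker.poll()     # same three messages, e.g. after a failed write
```

Because the log outlives any single read, a downstream failure (such as a datastore timeout) does not lose crawled data; the worker seeks back and tries again.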
  15. Data Transformation
     - Applies data transformation rules
     - Validation, enrichment, denormalization, rollups
     - Writes data to various indexes in ES
     - Error handling
     - Network issues, ES load/timeout issues, mapping conflicts
     - Multiple workers to increase overall throughput
     - Real time and asynchronous workers
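
A transformation rule of the kind listed above might look roughly like the following. The shape of the incoming message, the `transform` function, and the specific validation checks are assumptions made for illustration; only the service names and the output field names come from the slides. The sketch validates a raw crawler message and denormalizes it by replacing the service ID with the service name.

```python
from datetime import datetime

# Lookup copied from the [service] table of the MySQL data model.
SERVICE_NAMES = {1: "facebook", 2: "twitter", 3: "instagram"}

REQUIRED_FIELDS = ("post_id", "service_id", "captured_at", "metrics")


def transform(message):
    """Validate a raw crawler message and return a denormalized document."""
    # Validation: all required fields must be present.
    for key in REQUIRED_FIELDS:
        if key not in message:
            raise ValueError(f"missing field: {key}")

    # Denormalization: store the service name instead of a foreign key.
    service = SERVICE_NAMES.get(message["service_id"])
    if service is None:
        raise ValueError(f"unknown service_id: {message['service_id']}")

    # Validation: the timestamp must parse (raises ValueError otherwise).
    captured_at = datetime.fromisoformat(message["captured_at"])

    # Validation: metric values must be non-negative integers.
    metrics = {}
    for name, value in message["metrics"].items():
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"invalid value for metric {name!r}: {value!r}")
        metrics[name] = value

    return {
        "post_id": str(message["post_id"]),
        "service": service,
        "captured_at": captured_at.isoformat(),
        "metrics": metrics,
        "tags": sorted(set(message.get("tags", []))),
    }


doc = transform({
    "post_id": 19398339,
    "service_id": 1,
    "captured_at": "2016-10-31T20:32:17+00:00",
    "metrics": {"comments": 13, "likes": 50},
    "tags": ["video", "blog", "blog"],
})
```

A real worker would also need the error handling the slide mentions (network issues, timeouts, mapping conflicts), which is left out here.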
  16. Elasticsearch Data Model
     {
       "_index": "analytics_2016-11-01",
       "_type": "post",
       "_id": "f6065582-a2d7-11e6-bee7-22000ae51cc9",
       "post_id": "19398339",
       "service": "facebook",
       "captured_at": "2016-10-31T20:32:17+00:00",
       "metrics": {
         "comments": 13,
         "consumptions": 132,
         "engaged": 24,
         "impressions": 132,
         "likes": 50,
         "negative_feedback": 5,
         "reach": 93,
         "shares": 76,
         "video_views": 42
       },
       "tags": ["blog", "video"]
     }
  17. Why Elasticsearch?
     - Document based datastore
     - Flexible schemas, dynamic mapping, mapping templates
     - JSON, rich data structures, nested objects
     - REST APIs make integration simple
     - Query performance
     - Shards spread across nodes (versus entire MySQL DB/table on single node)
     - Rolling indexes for time series data == querying only the indexes needed (versus entire MySQL table)
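
The rolling-index point can be made concrete with a small helper that picks only the indexes covering a query's date range. The `analytics_YYYY-MM-DD` naming pattern is taken from the sample document; that there is one index per day, and the helper name `indexes_for_range`, are assumptions for this sketch. Elasticsearch accepts a comma-separated list of index names in the search path, which is what the last line builds.

```python
from datetime import date, timedelta


def indexes_for_range(start, end, prefix="analytics_"):
    """Return the index names covering start..end inclusive.

    Assumes one index per day, named <prefix>YYYY-MM-DD.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [
        f"{prefix}{(start + timedelta(days=i)).isoformat()}"
        for i in range(days + 1)
    ]


names = indexes_for_range(date(2016, 10, 30), date(2016, 11, 1))
search_path = "/" + ",".join(names) + "/_search"
```

A query for three days touches three indexes, however much older data the cluster holds, whereas the equivalent MySQL query scans or filters one large table.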
  18. Why Elasticsearch?
     - Search
     - Rich set of built-in queries
     - Powerful aggregations (and sub aggregations)
     - Scalability
     - More control over shards and indexes
     - Horizontally scale by adding more nodes and clusters
     - Easy to archive old data/indexes to free up resources
     - Meets current and *new* product requirements
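
As an example of an aggregation with a sub aggregation, the request body below groups documents by service and sums likes within each group. The field names come from the sample document; the aggregation names `by_service` and `total_likes` are made up for this sketch, and it assumes `service` is mapped as a non-analyzed (keyword-style) field so that a `terms` aggregation works on it. The body is built as a plain dictionary and would be sent to a `_search` endpoint.

```python
import json


def likes_by_service_body():
    """Search body: bucket by service, then sum metrics.likes per bucket."""
    return {
        "size": 0,  # return only aggregation results, no matching documents
        "aggs": {
            "by_service": {
                "terms": {"field": "service"},
                "aggs": {
                    # Sub aggregation, computed inside each service bucket.
                    "total_likes": {"sum": {"field": "metrics.likes"}},
                },
            },
        },
    }


body = likes_by_service_body()
print(json.dumps(body, indent=2))
```

With the denormalized document model, this replaces the joins across `post`, `service`, and `post_metrics` that the MySQL design needed for the same question.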
  19. Seems reasonable. What are the tradeoffs?
  20. Elasticsearch Tradeoffs
     - Data updates are more complex
     - Update by query, upserts, script security issues
     - Not truly schema-less
     - Reindexing is time consuming
     - Adding fields, mapping conflicts
     - Still need a custom index management layer
     - Index mappings, settings, templates, naming patterns, data retention, backup/restore
     - Operating ES requires effort
     - Deployment, configuration, performance tuning, monitoring
  21. Next Steps
     - More index management
     - Better support for different types of indexes, each with own settings
     - Add APIs + Tools for operations
     - Avoid oversharding, which causes cluster stability issues
     - More focus on UPDATE operations
     - Field updates (e.g. tags) require update by query/script
     - Faster reindexing (e.g. adding new fields, changing field mappings)
     - Slow updates/reindexing can affect other system operations/transactions
     - Data denormalization vs joins
     - More production monitoring
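
To show why a tag change needs update by query, here is a sketch of a request body for the `_update_by_query` endpoint that appends a tag to every document of one post. The deck does not say which Elasticsearch version or scripting language Percolate used, so the Painless script and the `source` key are assumptions (older releases used a different key name for inline scripts). The helper names are hypothetical, and `apply_locally` is a plain-Python equivalent of the script, included only to make the behaviour visible.

```python
def add_tag_request(post_id, tag):
    """Body for POST /<index>/_update_by_query: add a tag to matching docs."""
    return {
        # Denormalized data: every captured document of the post must change.
        "query": {"term": {"post_id": post_id}},
        "script": {
            "lang": "painless",  # assumption, not stated in the deck
            "source": (
                "if (!ctx._source.tags.contains(params.tag)) "
                "{ ctx._source.tags.add(params.tag) }"
            ),
            "params": {"tag": tag},
        },
    }


def apply_locally(doc, tag):
    """Plain-Python equivalent of the script above, for illustration only."""
    updated = dict(doc)
    tags = list(doc.get("tags", []))
    if tag not in tags:
        tags.append(tag)
    updated["tags"] = tags
    return updated


request_body = add_tag_request("19398339", "campaign")
before = {"post_id": "19398339", "tags": ["blog", "video"]}
after = apply_locally(before, "campaign")
unchanged = apply_locally(after, "campaign")  # adding twice has no effect
```

In MySQL the same change is a single row update in `post`; here it fans out to every document that carries the post's tags, which is the cost of denormalization noted above.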
  22. We're Hiring! https://percolate.com/careers/
