Moving From MySQL to Elasticsearch for Analytics

Percolate
Yannick Dawant & Vinh Nguyen
— What is Analytics, and why is it important to Percolate?
— Analytics 1.0 - MySQL
— Analytics 2.0 - Elasticsearch
— Next Steps
Agenda
The System of Record for Marketing
What does Analytics mean to Percolate?

How does it work?
Analytics 1.0 - Design
[Diagram: Facebook, Twitter, Instagram, LinkedIn, […] → Crawlers → (metrics) → MySQL → API → UI]
MySQL Data Model

[post]
post_id  service_id  tag            created_at
1        1           blog           2016-01-01 10:11:15
2        1           blog, video    2016-01-01 12:12:30
3        2           election 2016  2016-01-01 10:10:57

[metric_names]
metric_id  service_id  name
1          1           likes
2          1           comments
3          1           follows
4          2           follows
5          2           mentions
6          2           retweets

[post_metrics]
post_id  metric_id  metric_value  captured_at
1        1          10            2016-01-01 10:11:15
1        1          20            2016-01-01 12:12:30
2        2          5             2016-01-01 10:10:57
2        2          10            2016-01-01 13:12:20
3        1          15            2016-01-01 13:12:45
3        2          30            2016-01-01 17:05:11

[service]
service_id  name
1           facebook
2           twitter
3           instagram
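
To make the four-table layout concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and asks for the latest value of each metric per post. SQLite is only a stand-in for MySQL so the example runs without a server; the column types and the query are assumptions, since the slides show only the table contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (service_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, service_id INTEGER,
                   tag TEXT, created_at TEXT);
CREATE TABLE metric_names (metric_id INTEGER PRIMARY KEY,
                           service_id INTEGER, name TEXT);
CREATE TABLE post_metrics (post_id INTEGER, metric_id INTEGER,
                           metric_value INTEGER, captured_at TEXT);
""")
conn.executemany("INSERT INTO service VALUES (?, ?)",
                 [(1, "facebook"), (2, "twitter"), (3, "instagram")])
conn.executemany("INSERT INTO post VALUES (?, ?, ?, ?)", [
    (1, 1, "blog", "2016-01-01 10:11:15"),
    (2, 1, "blog, video", "2016-01-01 12:12:30"),
    (3, 2, "election 2016", "2016-01-01 10:10:57"),
])
conn.executemany("INSERT INTO metric_names VALUES (?, ?, ?)", [
    (1, 1, "likes"), (2, 1, "comments"), (3, 1, "follows"),
    (4, 2, "follows"), (5, 2, "mentions"), (6, 2, "retweets"),
])
conn.executemany("INSERT INTO post_metrics VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2016-01-01 10:11:15"),
    (1, 1, 20, "2016-01-01 12:12:30"),
    (2, 2, 5, "2016-01-01 10:10:57"),
    (2, 2, 10, "2016-01-01 13:12:20"),
    (3, 1, 15, "2016-01-01 13:12:45"),
    (3, 2, 30, "2016-01-01 17:05:11"),
])

# Three joins plus a correlated subquery just to read "latest metric per post".
LATEST_METRICS = """
SELECT p.post_id, s.name, mn.name, pm.metric_value
FROM post_metrics pm
JOIN post p ON p.post_id = pm.post_id
JOIN service s ON s.service_id = p.service_id
JOIN metric_names mn ON mn.metric_id = pm.metric_id
WHERE pm.captured_at = (
    SELECT MAX(captured_at) FROM post_metrics
    WHERE post_id = pm.post_id AND metric_id = pm.metric_id)
ORDER BY p.post_id, mn.name
"""
rows = conn.execute(LATEST_METRICS).fetchall()
for row in rows:
    print(row)
```

Even this small question touches all four tables, which is the kind of query-time join cost the tradeoff slides describe.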
— Relational data models
— Very well known pattern
— Application-level objects map cleanly to DB tables
— Joins are easy to do
— Easy to use
— Amazon RDS for managed hosting/deployment/monitoring
— Very familiar to Ops team and other developers, shared knowledge base
— Lots of support available online
— Met product requirements
Why MySQL?
Seems reasonable.

What are the tradeoffs?
— Data Modeling Issues
— Starts easy but becomes complex over time (increasing number of tables)
— Schema inflexibility (dynamic changes, unused columns)
— Hard to modify live schemas, may require downtime
— Slow Queries
— Lots of joins at query time
— Tables grow larger and larger over time
— Hard to partition Time series data
— Expensive post-processing on application side
MySQL Tradeoffs
— Scalability Issues
— Database grows larger and larger over time
— Scaling is mostly vertical (add more CPU/RAM/disk to same node), may require downtime
— Hard to scale horizontally
— Not suitable for our Search needs
MySQL Tradeoffs
Where do we go from here?
Analytics 2.0 - Design
[Diagram: Facebook, Twitter, Instagram, LinkedIn, […] → Crawlers → (metrics) → Kafka → Data Transformation → Elasticsearch → API → UI; MySQL also appears as a component]
— Decouples data collection from storage
— Enhances reliability of our data pipelines
— Message queue persistence, replay
— Enhances horizontal scalability of our data pipelines
— Multiple brokers, parallel consumers/producers
Why Kafka?
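
The persistence and replay points can be illustrated with a toy append-only log. This is not a Kafka client and does not reflect how Percolate's pipeline is written; `TopicLog` and its methods are hypothetical names used only to show why a log that keeps messages after they are read lets a consumer rewind and lets independent consumers read at their own pace.

```python
class TopicLog:
    """Toy append-only log: messages stay in place after being consumed."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # consumer group -> next offset to read

    def produce(self, message):
        """Append a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def consume(self, group, max_messages=10):
        """Return the next batch for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind (or skip ahead) so a group can reprocess old messages."""
        self._offsets[group] = offset


log = TopicLog()
for likes in (10, 20, 30):
    log.produce({"post_id": 1, "likes": likes})

first_pass = log.consume("es-writer")
log.seek("es-writer", 0)              # e.g. after a failed bulk write
replayed = log.consume("es-writer")
other_group = log.consume("archiver")  # independent reader, own offset
```

Because producers only append and each consumer group tracks its own offset, the crawlers never need to know what is reading the data or whether the storage layer is currently healthy.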
— Applies data transformation rules
— Validation, enrichment, denormalization, rollups
— Writes data to various indexes in ES
— Error handling
— Network issues, ES load/timeout issues, mapping conflicts
— Multiple workers to increase overall throughput
— Real time and asynchronous workers
Data Transformation
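
A minimal sketch of one transformation step covering validation, light enrichment, and denormalization. The function name, the input event shape, and the `tags_by_post` lookup are all hypothetical. The slides do not say how the date in an index name is chosen, so the caller passes it in explicitly as `index_date`.

```python
from datetime import date, datetime

REQUIRED_FIELDS = ("post_id", "service", "captured_at", "metrics")


def transform(event, tags_by_post, index_date):
    """Validate a raw crawler event and return (index_name, document)."""
    missing = [key for key in REQUIRED_FIELDS if key not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    post_id = str(event["post_id"])
    # Parsing the timestamp rejects malformed values before they reach storage.
    captured = datetime.fromisoformat(event["captured_at"])
    document = {
        "post_id": post_id,
        "service": event["service"],
        "captured_at": captured.isoformat(),
        "metrics": {name: int(value) for name, value in event["metrics"].items()},
        # Denormalization: copy tags onto the document instead of joining later.
        "tags": tags_by_post.get(post_id, []),
    }
    return f"analytics_{index_date.isoformat()}", document


index_name, doc = transform(
    {
        "post_id": 19398339,
        "service": "facebook",
        "captured_at": "2016-10-31T20:32:17+00:00",
        "metrics": {"likes": "50", "comments": 13},
    },
    tags_by_post={"19398339": ["blog", "video"]},
    index_date=date(2016, 11, 1),
)
```

Raising on invalid input keeps the error-handling decision (retry, skip, dead-letter) with the worker that calls the function.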
{
  "_index": "analytics_2016-11-01",
  "_type": "post",
  "_id": "f6065582-a2d7-11e6-bee7-22000ae51cc9",
  "post_id": "19398339",
  "service": "facebook",
  "captured_at": "2016-10-31T20:32:17+00:00",
  "metrics": {
    "comments": 13,
    "consumptions": 132,
    "engaged": 24,
    "impressions": 132,
    "likes": 50,
    "negative_feedback": 5,
    "reach": 93,
    "shares": 76,
    "video_views": 42
  },
  "tags": ["blog", "video"]
}
Elasticsearch Data Model
— Document based datastore
— Flexible schemas, dynamic mapping, mapping templates
— JSON, rich data structures, nested objects
— REST APIs make integration simple
— Query performance
— Shards spread across nodes (versus entire MySQL DB/table on single node)
— Rolling indexes for time-series data == querying only the indexes needed (versus entire MySQL table)
Why Elasticsearch?
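
The rolling-index point can be sketched as follows, assuming one index per day named like the `analytics_2016-11-01` example. The slides show only a single index name, so the daily period and the helper name `indexes_for_range` are assumptions.

```python
from datetime import date, timedelta


def indexes_for_range(start, end, prefix="analytics_"):
    """Return one index name per day in [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    day_count = (end - start).days + 1
    return [
        f"{prefix}{(start + timedelta(days=offset)).isoformat()}"
        for offset in range(day_count)
    ]


names = indexes_for_range(date(2016, 10, 30), date(2016, 11, 1))
# Elasticsearch search endpoints accept a comma-separated list of index names,
# so only these three indexes are touched by the query.
search_path = "/" + ",".join(names) + "/_search"
print(search_path)
```

A query over three days reads three small indexes, whereas the equivalent MySQL query would scan or index-seek within one table holding the full history.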
— Search
— Rich set of built-in queries
— Powerful aggregations (and sub aggregations)
— Scalability
— More control over shards and indexes
— Horizontally scale by adding more nodes and clusters
— Easy to archive old data/indexes to free up resources
— Meets current and *new* product requirements
Why Elasticsearch?
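
As an illustration of an aggregation with a sub-aggregation, the sketch below builds a search request body that buckets documents by `service` and sums one metric inside each bucket. It assumes `service` is mapped as an exact-value (non-analyzed) field and uses the field names from the sample document. Whether a sum is the right rollup depends on whether documents hold deltas or cumulative snapshots, which the slides do not say.

```python
import json


def metric_sum_by_service(metric, start, end):
    """Build a search body: terms aggregation with a sum sub-aggregation."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "query": {"range": {"captured_at": {"gte": start, "lt": end}}},
        "aggs": {
            "by_service": {
                "terms": {"field": "service"},
                "aggs": {"total": {"sum": {"field": f"metrics.{metric}"}}},
            }
        },
    }


body = metric_sum_by_service("likes", "2016-10-01", "2016-11-01")
print(json.dumps(body, indent=2))
```

The grouping and arithmetic run inside the cluster, which replaces the application-side post-processing listed among the MySQL tradeoffs.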
Seems reasonable.

What are the tradeoffs?
— Data updates are more complex
— Update by query, upserts, script security issues
— Not truly schema-less
— Reindexing is time consuming
— Adding fields, mapping conflicts
— Still need a custom index management layer
— Index mappings, settings, templates, naming patterns, data retention, backup/restore
— Operating ES requires effort
— Deployment, configuration, performance tuning, monitoring
Elasticsearch Tradeoffs
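
To show why updates are more involved than a SQL `UPDATE`, here is a sketch of a request body for the `_update_by_query` endpoint that replaces the tags on every snapshot of one post. The helper name is hypothetical, and the script keys follow recent Elasticsearch releases (`source`, Painless); releases from around the time of this talk used a different key (`inline`) and other script languages, so treat the exact shape as version-dependent.

```python
def retag_request(post_id, tags):
    """Build a body for POST /<index>/_update_by_query."""
    return {
        # Every snapshot document of the post must be rewritten.
        "query": {"term": {"post_id": post_id}},
        "script": {
            "lang": "painless",
            # Passing values as params keeps the script text constant.
            "source": "ctx._source.tags = params.tags",
            "params": {"tags": list(tags)},
        },
    }


request_body = retag_request("19398339", ["blog", "video", "launch"])
```

Because tags are denormalized onto each document, one logical change fans out to many document rewrites, which is the cost side of avoiding joins.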
— More index management
— Better support for different types of indexes, each with own settings
— Add APIs + Tools for operations
— Avoid oversharding, which causes cluster stability issues
— More focus on UPDATE operations
— Field updates (e.g. tags) require update by query/script
— Faster reindexing (e.g. adding new fields, changing field mappings)
— Slow updates/reindexing can affect other system operations/transactions
— Data denormalization vs joins
— More production monitoring
Next Steps
Moving From MySQL to Elasticsearch for Analytics
https://percolate.com/careers/
We're Hiring!