ARC202 
Real-World Real-Time Analytics 
Gustavo Arjones | @arjones 
CTO, Socialmetrix 
November 13, 2014 | Las Vegas, NV 
Sebastian Montini | @sebamontini 
Solutions Architect, Socialmetrix 
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
• SaaS company, since 2008 
• Social media analytics: track and measure the activity 
of brands and personalities, providing information for 
market research and brand comparison 
• Multilanguage technology (English, Portuguese, 
and Spanish) 
• Leader in Latin America, with operations in 5 
countries and customers in Latin America and the US 
• One of 34 companies in the Twitter Certified Program worldwide
Our customers
Share of topics 
Ranking   Brand 1 (Q2)   Brand 1 (Q3)   Brand 2 (Q2)   Brand 2 (Q3)   Brand 3 (Q2)   Brand 3 (Q3) 
1°        Flavor         Breakfast      Flavor         Flavor         Advertising    Flavor 
2°        Healthy        Flavor         Packaging      Brand I love   Flavor         Breakfast 
3°        Components     Components     Healthy        Packaging      Healthy        Healthy 
4°        Advertising    Healthy        Components     Addiction      Components     Advertising 
5°        Enquiries      Desire         Prices         Consumption    Prices         Components 
TOTAL     1,401          8,189          463            5,519          1,081          2,445 
Which conversations are my brand and my competitors’ brands driving?
smx.io/reinvent #reinvent
Challenges
Challenges: Variety 
• Different data sources 
• Different APIs 
• SLA 
• Method (pull or push) 
• Rate limits, backoff strategy
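
The rate-limit and backoff bullet can be sketched as below. This is a minimal illustration, not the crawler from the talk: `RateLimitError`, `call_with_backoff`, and every parameter name and default are assumptions made for the example. It retries a failing call with exponential backoff and full jitter.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error a crawler raises when an API reports a rate limit."""


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call fetch() and retry on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide
            # The ceiling doubles on each attempt and is capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random fraction of the ceiling so that
            # many crawlers do not all retry at the same instant.
            sleep(ceiling * rng())
```

Passing `sleep` and `rng` as arguments is only a convenience for testing the retry schedule without waiting.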
Challenges: Velocity 
• Updates every second 
• Top users, top hashtags each minute 
• After-event analysis is done in batch over the complete dataset 
• Spikes of 20,000+ tweets per minute 
[Chart annotations: "Last TV Debate", "Results Announced"]
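
A per-minute "top hashtags" count can be sketched with one-minute tumbling windows. This is an illustrative, single-process version only; the function name, the `(epoch_seconds, text)` input shape, and the whitespace tokenizer are assumptions, and a production pipeline would run this on a stream processor.

```python
from collections import Counter, defaultdict


def top_hashtags_per_minute(tweets, k=3):
    """Return the k most frequent hashtags for each one-minute window.

    tweets is an iterable of (epoch_seconds, text) pairs (assumed shape).
    """
    windows = defaultdict(Counter)
    for ts, text in tweets:
        minute = int(ts) // 60 * 60  # start of the window this tweet falls in
        for token in text.split():
            # Naive tokenizer: trailing punctuation is not stripped.
            if token.startswith("#") and len(token) > 1:
                windows[minute][token.lower()] += 1
    return {minute: counts.most_common(k)
            for minute, counts in sorted(windows.items())}
```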
Challenges: Meaning 
• Disambiguation 
• Data enrichment 
– Demographics 
– Sentiment 
– Influencers 
• Human analysis 
[Disambiguation examples shown: "Oi" (Oi Telecom vs. "Hi!"), "PAN", "Orange" (Orange Telecom)]
Challenges: Alert and report 
• Clear and 
understandable UI 
• Slice and dice for business 
users (not BI experts) 
• Real-time alerts for 
anomalies
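
One simple way to raise an alert on an anomaly is a z-score check of the current per-minute count against recent history. The sketch below is an assumption for illustration, not the alerting method used in the talk; the function name and the default threshold of 3 standard deviations are made up for the example.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, threshold=3.0):
    """Flag current if it lies more than threshold standard deviations
    away from the mean of the recent per-minute counts in history."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is unusual
    return abs(current - mu) / sigma > threshold
```

With a history of around 100 tweets per minute, a spike to 20,000 is flagged while a reading of 102 is not.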
Architecture evolution
Drivers for architecture evolution 
• More customers, bigger customers 
• Add new features 
• Keep costs under control
Architecture evolution 
[Chart: active customers per architecture iteration (#1 to #4); axis from 0 to 120]
Architecture—1st iteration 
What we needed: 
• Complete data isolation 
• Trying different solutions/offerings
Architecture—1st iteration 
What we did: 
• All-in-one approach 
• Multi-instance architecture 
• Simple vertical scalability 
• MySQL performance tuning
Architecture—1st iteration 
What we've learned: 
• Multi-instance is harder to administer, but 
minimizes the impact of instability on customers 
• Vertical scalability: poor resource management 
• MySQL schema changes translate into downtime
Architecture—2nd iteration 
What we needed: 
• Separation of responsibilities (crawling, processing) 
• Horizontal scalability 
• Fast provisioning 
• Cost reduction
Architecture—2nd iteration 
What we changed: 
• Migrated to AWS 
• RabbitMQ (Single Node) 
• Replaced MySQL with 
Amazon RDS 
• AWS CloudFormation 
• Auto Scaling groups
Architecture—2nd iteration 
What we've learned: 
• PIOPS (Provisioned IOPS) 
• Tuning the Auto Scaling policies can be hard 
• AWS CloudFormation: great for migration, not 
enough for daily ops
Architecture—3rd iteration 
What we needed: 
• Deliver new features (NRT, more complex analytics) 
• Scale fast 
• Be resilient against failure 
• Adding and improving data sources 
• Keep costs under control (always)
Architecture—3rd iteration 
What we changed: 
• Apache Storm 
• RabbitMQ HA 
• Amazon Elastic MapReduce 
(Hadoop/Hive) 
• AWS CloudFormation + Chef 
• Amazon Glacier + Amazon S3 
lifecycle policies
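
An S3 lifecycle policy that moves older objects to Glacier might look like the sketch below. The prefix `raw-tweets/`, the 30-day threshold, the bucket name, and the helper function are all hypothetical; only the rule layout follows the S3 lifecycle configuration format.

```python
def glacier_lifecycle_rule(prefix, days_until_archive):
    """Build one S3 lifecycle rule that transitions objects under
    prefix to the GLACIER storage class after the given number of days."""
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days_until_archive, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical values, chosen only for illustration.
lifecycle_configuration = {"Rules": [glacier_lifecycle_rule("raw-tweets/", 30)]}

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration)
```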
Architecture—3rd iteration 
What we've learned: 
• Spot Instances + Reserved Instances 
• Hive = SQL; SQL scripts are hard to test 
• Bulk upserts on Amazon RDS can be expensive 
(PIOPS) 
• Amazon DynamoDB is great, but expensive (for 
our use-case)
Dashboard
Architecture—4th iteration 
What we needed: 
• Monitor millions of social media profiles 
• Make data accessible (exploration, PoC) 
• Improve UI response times 
• Testing our data pipelines 
• Reprocessing (faster)
Architecture—4th iteration 
What we changed: 
• Cassandra (DSE) 
• MongoDB MMS 
• Apache Spark
Architecture—4th iteration 
What we've learned: 
• Leverage AWS ecosystem 
• DataStax AMI + OpsCenter integration 
• MongoDB MMS: automation magic! 
• Apache Spark unit testing + Amazon EC2 
launch scripts 
• Amazon EMR doesn’t have the latest stable 
versions
Architecture evolution 
[Chart: costs and active customers per architecture iteration (#1 to #4); two value axes, 0 to 160 and 0 to 120]
Lessons learned
Lessons learned 
• Automate from day 1 (CloudFormation + Chef) 
• Monitor system activity and understand your data 
patterns, e.g. with Logstash (ELK) 
• Always have a Source of Truth (Amazon S3 + 
Glacier) 
• Make your Source of Truth searchable
Lessons learned (II) 
• Approximation is a good thing: HyperLogLog (HLL), Count-Min Sketch (CMS), Bloom filters 
• Write your pipelines with reprocessing 
needs in mind 
• Avoid framework explosion at all costs 
• The AWS ecosystem allows rapid prototyping
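
To make the approximation point concrete, here is a minimal Count-Min Sketch. It is a teaching sketch under assumed parameters (`width`, `depth`, SHA-256 based hashing), not the implementation used by Socialmetrix. It keeps approximate counts in fixed memory: estimates never undercount, and may overcount when items collide.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter with fixed memory (width * depth cells)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash per row, derived by prefixing the row number to the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum across rows is the cell least inflated by collisions.
        return min(self.table[row][col] for row, col in self._cells(item))
```

A usage example: after `sketch.add("#reinvent")` five times, `sketch.estimate("#reinvent")` returns at least 5.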
Socialmetrix NextGen 
2015
Architecture evolution 
[Chart: active customers per architecture iteration (#1 to #4); axis from 0 to 120]
Architecture NextGen 
• Reduce moving parts 
• Apache Spark as central processing framework 
– Real-time (micro-batch) 
– Batch processing 
• Kafka (message broker) 
• Cassandra (time-series storage) 
• Elasticsearch (content indexer)
To infinity … 
and beyond! 
Architecture evolution 
[Chart: active customers per architecture iteration (#1 to #4, plus NextGen); axis from 0 to 120]
Feedback and Q&A 
Gustavo Arjones, CTO 
@arjones | gustavo@socialmetrix.com 
Sebastian Montini, Solutions Architect 
@sebamontini | sebastian@socialmetrix.com 
Let’s talk at Venetian—Titian Hallway
Please give us your feedback on this 
presentation 
ARC202: Real-World 
Real-Time Analytics 
Thank you! 
Join the conversation on Twitter with #reinvent 

More Related Content

What's hot

Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Fastly
 
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB
 
WSO2 Intro Webinar - Simplifying Enterprise Integration with Configurable WS...
WSO2 Intro Webinar -  Simplifying Enterprise Integration with Configurable WS...WSO2 Intro Webinar -  Simplifying Enterprise Integration with Configurable WS...
WSO2 Intro Webinar - Simplifying Enterprise Integration with Configurable WS...WSO2
 
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitHadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Rekha Joshi
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
HostedbyConfluent
 
Datastax Expedia
Datastax ExpediaDatastax Expedia
Datastax Expedia
Eddie Satterly
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
HostedbyConfluent
 
RedisConf18 - Redis Analytics Use Cases
RedisConf18 - Redis Analytics Use CasesRedisConf18 - Redis Analytics Use Cases
RedisConf18 - Redis Analytics Use Cases
Redis Labs
 
Traitement d'événements
Traitement d'événementsTraitement d'événements
Traitement d'événements
Amazon Web Services
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
DataWorks Summit/Hadoop Summit
 
Building a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platformBuilding a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platform
Jampp
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
GlobalLogic Ukraine
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
confluent
 
Pragmatic CQRS with existing applications and databases (Digital Xchange, May...
Pragmatic CQRS with existing applications and databases (Digital Xchange, May...Pragmatic CQRS with existing applications and databases (Digital Xchange, May...
Pragmatic CQRS with existing applications and databases (Digital Xchange, May...
Lucas Jellema
 
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
HostedbyConfluent
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
confluent
 
Cloud , DC Advisory & Transformation – Success Stories | Happiest Minds
Cloud , DC Advisory & Transformation – Success Stories | Happiest MindsCloud , DC Advisory & Transformation – Success Stories | Happiest Minds
Cloud , DC Advisory & Transformation – Success Stories | Happiest Minds
Happiest Minds Technologies
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
Dr. Mirko Kämpf
 
Pyramid vs QlikView
Pyramid vs QlikViewPyramid vs QlikView
Pyramid vs QlikView
Pyramid Analytics
 
Spark Summit EU: IBM Keynote
Spark Summit EU: IBM KeynoteSpark Summit EU: IBM Keynote
Spark Summit EU: IBM Keynote
sparktc
 

What's hot (20)

Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...
 
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
 
WSO2 Intro Webinar - Simplifying Enterprise Integration with Configurable WS...
WSO2 Intro Webinar -  Simplifying Enterprise Integration with Configurable WS...WSO2 Intro Webinar -  Simplifying Enterprise Integration with Configurable WS...
WSO2 Intro Webinar - Simplifying Enterprise Integration with Configurable WS...
 
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At IntuitHadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
Hadoop Summit 2016 - Evolution of Big Data Pipelines At Intuit
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Datastax Expedia
Datastax ExpediaDatastax Expedia
Datastax Expedia
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
RedisConf18 - Redis Analytics Use Cases
RedisConf18 - Redis Analytics Use CasesRedisConf18 - Redis Analytics Use Cases
RedisConf18 - Redis Analytics Use Cases
 
Traitement d'événements
Traitement d'événementsTraitement d'événements
Traitement d'événements
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Building a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platformBuilding a real-time, scalable and intelligent programmatic ad buying platform
Building a real-time, scalable and intelligent programmatic ad buying platform
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
 
Pragmatic CQRS with existing applications and databases (Digital Xchange, May...
Pragmatic CQRS with existing applications and databases (Digital Xchange, May...Pragmatic CQRS with existing applications and databases (Digital Xchange, May...
Pragmatic CQRS with existing applications and databases (Digital Xchange, May...
 
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
 
Cloud , DC Advisory & Transformation – Success Stories | Happiest Minds
Cloud , DC Advisory & Transformation – Success Stories | Happiest MindsCloud , DC Advisory & Transformation – Success Stories | Happiest Minds
Cloud , DC Advisory & Transformation – Success Stories | Happiest Minds
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
 
Pyramid vs QlikView
Pyramid vs QlikViewPyramid vs QlikView
Pyramid vs QlikView
 
Spark Summit EU: IBM Keynote
Spark Summit EU: IBM KeynoteSpark Summit EU: IBM Keynote
Spark Summit EU: IBM Keynote
 

Similar to ARC202:real world real time analytics

(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
Amazon Web Services
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Dmitry Anoshin
 
Big data and Analytics on AWS
Big data and Analytics on AWSBig data and Analytics on AWS
Big data and Analytics on AWS
2nd Watch
 
How city of chicago boosts their sap business objects environment prepares fo...
How city of chicago boosts their sap business objects environment prepares fo...How city of chicago boosts their sap business objects environment prepares fo...
How city of chicago boosts their sap business objects environment prepares fo...
Sebastien Goiffon
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
Amazon Web Services
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
Provectus
 
AWS Dublin Briefing - Logentries Customer Presentation
AWS Dublin Briefing - Logentries Customer PresentationAWS Dublin Briefing - Logentries Customer Presentation
AWS Dublin Briefing - Logentries Customer Presentation
Amazon Web Services
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Amazon Web Services
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
Torsten Steinbach
 
Managing Performance Globally with MySQL
Managing Performance Globally with MySQLManaging Performance Globally with MySQL
Managing Performance Globally with MySQL
Daniel Austin
 
Bdf16 big-data-warehouse-case-study-data kitchen
Bdf16 big-data-warehouse-case-study-data kitchenBdf16 big-data-warehouse-case-study-data kitchen
Bdf16 big-data-warehouse-case-study-data kitchen
Christopher Bergh
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
Carter Wickstrom
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics Service
Christian Beedgen
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
Amazon Web Services
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
Sri Ambati
 
Le big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entrepriseLe big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entreprise
Rubedo, a WebTales solution
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
DataWorks Summit
 
API and Big Data Solution Patterns
API and Big Data Solution Patterns API and Big Data Solution Patterns
API and Big Data Solution Patterns WSO2
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017
BrandonWilhelm4
 
FEALTY TECHNOLOGIES Portfolio - LATEST.pptx
FEALTY TECHNOLOGIES Portfolio - LATEST.pptxFEALTY TECHNOLOGIES Portfolio - LATEST.pptx
FEALTY TECHNOLOGIES Portfolio - LATEST.pptx
AmarVirdi2
 

Similar to ARC202:real world real time analytics (20)

(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Big data and Analytics on AWS
Big data and Analytics on AWSBig data and Analytics on AWS
Big data and Analytics on AWS
 
How city of chicago boosts their sap business objects environment prepares fo...
How city of chicago boosts their sap business objects environment prepares fo...How city of chicago boosts their sap business objects environment prepares fo...
How city of chicago boosts their sap business objects environment prepares fo...
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
AWS Dublin Briefing - Logentries Customer Presentation
AWS Dublin Briefing - Logentries Customer PresentationAWS Dublin Briefing - Logentries Customer Presentation
AWS Dublin Briefing - Logentries Customer Presentation
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
Managing Performance Globally with MySQL
Managing Performance Globally with MySQLManaging Performance Globally with MySQL
Managing Performance Globally with MySQL
 
Bdf16 big-data-warehouse-case-study-data kitchen
Bdf16 big-data-warehouse-case-study-data kitchenBdf16 big-data-warehouse-case-study-data kitchen
Bdf16 big-data-warehouse-case-study-data kitchen
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
 
Using AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics ServiceUsing AWS To Build A Scalable Machine Data Analytics Service
Using AWS To Build A Scalable Machine Data Analytics Service
 
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Le big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entrepriseLe big data à l'épreuve des projets d'entreprise
Le big data à l'épreuve des projets d'entreprise
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
API and Big Data Solution Patterns
API and Big Data Solution Patterns API and Big Data Solution Patterns
API and Big Data Solution Patterns
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017
 
FEALTY TECHNOLOGIES Portfolio - LATEST.pptx
FEALTY TECHNOLOGIES Portfolio - LATEST.pptxFEALTY TECHNOLOGIES Portfolio - LATEST.pptx
FEALTY TECHNOLOGIES Portfolio - LATEST.pptx
 

More from Sebastian Montini

Ansible 202 - sysarmy
Ansible 202 - sysarmyAnsible 202 - sysarmy
Ansible 202 - sysarmy
Sebastian Montini
 
Ansible 202
Ansible 202Ansible 202
Ansible 202
Sebastian Montini
 
Not all that glitter is AWS - Nerdearla2016
Not all that glitter is AWS - Nerdearla2016Not all that glitter is AWS - Nerdearla2016
Not all that glitter is AWS - Nerdearla2016
Sebastian Montini
 
Designed to fail
Designed to failDesigned to fail
Designed to fail
Sebastian Montini
 
Devopsconf 2015 sebamontini
Devopsconf 2015 sebamontiniDevopsconf 2015 sebamontini
Devopsconf 2015 sebamontini
Sebastian Montini
 
Startup Weekend 2015
Startup Weekend 2015Startup Weekend 2015
Startup Weekend 2015
Sebastian Montini
 
Aws meetup (sep 2015) exprimir cada centavo
Aws meetup (sep 2015)   exprimir cada centavoAws meetup (sep 2015)   exprimir cada centavo
Aws meetup (sep 2015) exprimir cada centavo
Sebastian Montini
 
AWS Anti patterns
AWS Anti patternsAWS Anti patterns
AWS Anti patterns
Sebastian Montini
 
Debian vs. Ubuntu
Debian vs. UbuntuDebian vs. Ubuntu
Debian vs. Ubuntu
Sebastian Montini
 
software libre al servicio de la educacion
software libre al servicio de la educacionsoftware libre al servicio de la educacion
software libre al servicio de la educacionSebastian Montini
 
gnu/linux: la libertad a un paso de distancia
gnu/linux: la libertad a un paso de distanciagnu/linux: la libertad a un paso de distancia
gnu/linux: la libertad a un paso de distanciaSebastian Montini
 

More from Sebastian Montini (14)

Ansible 202 - sysarmy
Ansible 202 - sysarmyAnsible 202 - sysarmy
Ansible 202 - sysarmy
 
Ansible 202
Ansible 202Ansible 202
Ansible 202
 
Not all that glitter is AWS - Nerdearla2016
Not all that glitter is AWS - Nerdearla2016Not all that glitter is AWS - Nerdearla2016
Not all that glitter is AWS - Nerdearla2016
 
Designed to fail
Designed to failDesigned to fail
Designed to fail
 
Devopsconf 2015 sebamontini
Devopsconf 2015 sebamontiniDevopsconf 2015 sebamontini
Devopsconf 2015 sebamontini
 
Startup Weekend 2015
Startup Weekend 2015Startup Weekend 2015
Startup Weekend 2015
 
Aws meetup (sep 2015) exprimir cada centavo
Aws meetup (sep 2015)   exprimir cada centavoAws meetup (sep 2015)   exprimir cada centavo
Aws meetup (sep 2015) exprimir cada centavo
 
AWS Anti patterns
AWS Anti patternsAWS Anti patterns
AWS Anti patterns
 
Mentalidad Alternativa 08
Mentalidad Alternativa 08Mentalidad Alternativa 08
Mentalidad Alternativa 08
 
freeNas
freeNasfreeNas
freeNas
 
Awesome
AwesomeAwesome
Awesome
 
Debian vs. Ubuntu
Debian vs. UbuntuDebian vs. Ubuntu
Debian vs. Ubuntu
 
software libre al servicio de la educacion
software libre al servicio de la educacionsoftware libre al servicio de la educacion
software libre al servicio de la educacion
 
gnu/linux: la libertad a un paso de distancia
gnu/linux: la libertad a un paso de distanciagnu/linux: la libertad a un paso de distancia
gnu/linux: la libertad a un paso de distancia
 

Recently uploaded

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
ARC202: Real-World Real-Time Analytics

  • 1. ARC202 Real-World Real-Time Analytics Gustavo Arjones | @arjones CTO, Socialmetrix November 13, 2014 | Las Vegas, NV Sebastian Montini | @sebamontini Solutions Architect, Socialmetrix © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. • SaaS company since 2008 • Social media analytics: track and measure the activity of brands and personalities, providing information for market research and brand comparison • Multilanguage technology (English, Portuguese, and Spanish) • Leader in Latin America, with operations in 5 countries and customers in Latin America and the US • One of 34 companies in the Twitter Certified Program worldwide
  • 6. Share of topics: which conversations are my brand and my competitors’ brands driving?

        Ranking | Brand 1 Q2  | Brand 1 Q3 | Brand 2 Q2 | Brand 2 Q3   | Brand 3 Q2  | Brand 3 Q3
        1°      | Flavor      | Breakfast  | Flavor     | Flavor       | Advertising | Flavor
        2°      | Healthy     | Flavor     | Packaging  | Brand I love | Flavor      | Breakfast
        3°      | Components  | Components | Healthy    | Packaging    | Healthy     | Healthy
        4°      | Advertising | Healthy    | Components | Addiction    | Components  | Advertising
        5°      | Enquiries   | Desire     | Prices     | Consumption  | Prices      | Components
        TOTAL   | 1.401       | 8.189      | 463        | 5.519        | 1.081       | 2.445

  • 9. Challenges: Variety • Different data sources • Different APIs • SLA • Method (pull or push) • Rate limits, backoff strategy
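
The "rate limit, backoff strategy" bullet on slide 9 can be illustrated with a minimal sketch of exponential backoff with jitter. The slides do not say which strategy Socialmetrix used; `RateLimited`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the base delay, cap, and attempt count are illustrative assumptions.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error: the source API answered with a rate-limit response."""


def backoff_delays(base=1.0, cap=60.0, attempts=5):
    """Delay ceilings before jitter: base * 2**n, never above `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def call_with_backoff(fetch, base=1.0, cap=60.0, attempts=5,
                      sleep=time.sleep, rng=random.random):
    """Call `fetch` until it succeeds.

    After each rate-limit error, sleep a random fraction of an
    exponentially growing ceiling so that many crawlers do not
    retry at the same instant.
    """
    for ceiling in backoff_delays(base, cap, attempts):
        try:
            return fetch()
        except RateLimited:
            sleep(ceiling * rng())
    return fetch()  # last attempt: let the error propagate to the caller
```

Passing `sleep` and `rng` as parameters keeps the retry logic testable without real waiting.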
  • 10. Challenges: Velocity • Updates every second • Top users and top hashtags each minute • After-event analyses run as batch jobs over the complete dataset • Spikes of 20,000+ tweets per minute [Chart annotations: Last TV Debate, Results Announced]
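
As a rough illustration of the "top hashtags each minute" requirement on slide 10, the sketch below buckets tweets into one-minute windows and counts hashtags per window. It is a single-process toy, not the production pipeline described in the talk; the `(epoch_seconds, text)` input shape and the function name are assumptions.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")


def top_hashtags_per_minute(tweets, k=3):
    """Return {minute_start: [(hashtag, count), ...]} with the top-k tags.

    `tweets` is an iterable of (epoch_seconds, text) pairs.
    """
    buckets = defaultdict(Counter)
    for ts, text in tweets:
        minute = int(ts) - int(ts) % 60  # start of the one-minute window
        buckets[minute].update(tag.lower() for tag in HASHTAG.findall(text))
    return {minute: counts.most_common(k) for minute, counts in buckets.items()}
```

For example, `top_hashtags_per_minute([(0, "#a #b"), (10, "#A"), (61, "#c")])` groups the first two tweets into the window starting at 0 and the third into the window starting at 60.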
  • 11. Challenges: Meaning • Disambiguation • Data enrichment: demographics, sentiment, influencers • Human analysis [Image labels: Oi Telecom, Hi!, PAN, Orange Telecom]
  • 12. Challenges: Alert and report • Clear and understandable UI • Slice-dice for business (not BI experts) • Real-time alerts for anomalies
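
One simple way to approach the "real-time alerts for anomalies" bullet on slide 12 is a rolling z-score over per-minute counts. The slides do not describe the detection method actually used, so this is only a sketch; `SpikeDetector`, the window size, the warm-up length of 5, and the threshold of 3 standard deviations are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Flag a per-minute count that sits far above the recent average."""

    def __init__(self, window=30, threshold=3.0):
        self.history = deque(maxlen=window)  # most recent counts only
        self.threshold = threshold

    def observe(self, count):
        """Return True if `count` is a spike versus history, then record it."""
        alert = False
        if len(self.history) >= 5:  # wait for a minimal baseline
            mu = mean(self.history)
            sigma = pstdev(self.history)
            alert = sigma > 0 and (count - mu) / sigma > self.threshold
        self.history.append(count)
        return alert
```

A burst such as the 20,000+ tweets per minute mentioned on slide 10 would stand out clearly against a baseline of a few hundred tweets per minute.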
  • 14. Drivers for architecture evolution • More customers, bigger customers • Add new features • Keep costs under control
  • 15. Architecture evolution [Chart: active customers across architecture iterations #1 to #4]
  • 16. Architecture—1st iteration What we needed: • Complete data isolation • Trying different solutions/offerings
  • 17. Architecture—1st iteration What we did: • All-in-one approach • Multi-instance architecture • Simple vertical scalability • MySQL performance tuning
  • 18. Architecture—1st iteration What we've learned: • Multi-instance is harder to administer, but minimizes the impact of instability on customers • Vertical scalability: poor resource management • MySQL schema changes translate into downtime
  • 19. Architecture—2nd iteration What we needed: • Separation of responsibilities (crawling, processing) • Horizontal scalability • Fast provisioning • Cost reduction
  • 20. Architecture—2nd iteration What we changed: • Migrated to AWS • RabbitMQ (single node) • Replaced MySQL with Amazon RDS • AWS CloudFormation • Auto Scaling groups
  • 21. Architecture—2nd iteration What we've learned: • PIOPS  • Tuning the Auto Scaling policies can be hard • AWS CloudFormation: great for migration, not enough for daily ops
  • 22. Architecture—3rd iteration What we needed: • Deliver new features (NRT, more complex analytics) • Scale fast • Be resilient against failure • Add and improve data sources • Keep costs under control (always)
  • 23. Architecture—3rd iteration What we changed: • Apache Storm • RabbitMQ HA • Amazon Elastic MapReduce (Hadoop/Hive) • AWS CloudFormation + Chef • Amazon Glacier + Amazon S3 lifecycle policies
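
The "Amazon Glacier + Amazon S3 lifecycle policies" item on slide 23 refers to rules that move older objects to cheaper archival storage. A minimal sketch of such a rule follows, built as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix, the 30-day threshold, and the rule ID are illustrative assumptions, not values from the talk.

```python
def glacier_lifecycle(prefix, days_to_glacier, rule_id="archive-raw-data"):
    """Build an S3 lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class after `days_to_glacier` days."""
    return {
        "Rules": [
            {
                "ID": rule_id,                    # hypothetical rule name
                "Filter": {"Prefix": prefix},     # which keys the rule covers
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Example values only: archive everything under raw/ after 30 days.
config = glacier_lifecycle("raw/", 30)
```

With boto3 installed and credentials configured, the dictionary could be applied with `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`.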
  • 24. Architecture—3rd iteration What we've learned: • Spot Instances + Reserved Instances • Hive = SQL; SQL scripts are hard to test • Bulk upserts on Amazon RDS can be expensive (PIOPS) • Amazon DynamoDB is great, but expensive (for our use case)
  • 26. Architecture—4th iteration What we needed: • Monitor millions of social media profiles • Make data accessible (exploration, PoC) • Improve UI response times • Test our data pipelines • Reprocess faster
  • 27. Architecture—4th iteration What we changed: • Cassandra (DSE) • MongoDB MMS • Apache Spark
  • 28. Architecture—4th iteration What we've learned: • Leverage the AWS ecosystem • DataStax AMI + OpsCenter integration • MongoDB MMS: automation magic! • Apache Spark unit testing + Amazon EC2 launch scripts • Amazon EMR doesn’t have the latest stable versions
  • 30. Architecture evolution [Chart: costs and active customers across architecture iterations #1 to #4]
  • 32. Lessons learned • Automate from day 1 (CloudFormation + Chef) • Monitor system activity and understand your data patterns, e.g. with Logstash (ELK) • Always have a Source of Truth (Amazon S3 + Glacier) • Make your Source of Truth searchable
  • 33. Lessons learned (II) • Approximation is a good thing: HLL (HyperLogLog), CMS (Count-Min Sketch), Bloom filters • Write your pipelines with reprocessing needs in mind • Avoid framework explosion at all costs • The AWS ecosystem allows rapid prototyping
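
To make the "approximation is a good thing" lesson on slide 33 concrete, here is a minimal Count-Min Sketch, one of the structures the slide names. It estimates how often an item was seen using fixed memory; the estimate can be too high but never too low. The width, depth, and hashing scheme below are illustrative assumptions, not details from the talk.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency counter that may overestimate, never underestimate."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the smallest one is the best guess.
        return min(self.table[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for tag in ["#reinvent"] * 5 + ["#aws"] * 2:
    sketch.add(tag)
```

Memory stays at `width * depth` counters no matter how many distinct hashtags arrive, which is the trade-off that makes this kind of structure attractive for high-volume streams.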
  • 35. Architecture evolution [Chart: active customers across architecture iterations #1 to #4]
  • 36. Architecture nextgen • Reduce moving parts • Apache Spark as central processing framework – Realtime (Micro-batch) – Batch-processing • Kafka (Message Broker) • Cassandra (Time-series storage) • ElasticSearch (Content Indexer)
  • 37. To infinity … and beyond! Architecture evolution [Chart: active customers across architecture iterations #1 to #4 and NextGen]
  • 38. Feedback and Q&A Gustavo Arjones, CTO @arjones | gustavo@socialmetrix.com Sebastian Montini, Solutions Architect @sebamontini | sebastian@socialmetrix.com Let’s talk at Venetian, Titian Hallway
  • 39. Please give us your feedback on this presentation ARC202: Real-World Real-Time Analytics Thank you! Join the conversation on Twitter with #reinvent