(ARC202) Real-World Real-Time Analytics | AWS re:Invent 2014


Working with big volumes of data is a complicated task, but it's even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS. Learn how you can use AWS services like Amazon RDS, AWS CloudFormation, Auto Scaling, Amazon S3, Amazon Glacier, and Amazon Elastic MapReduce to run high-performance, reliable, real-time big data analytics while saving time, effort, and money. Gain insight from two years of real-time analytics successes and failures so you don't have to go down this path on your own.

Published in: Technology

  1. 1. © 2014, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of, Inc. November 13, 2014 | Las Vegas, NV ARC202 Real-World Real-Time Analytics Gustavo Arjones | @arjones CTO, Socialmetrix Sebastian Montini | @sebamontini Solutions Architect, Socialmetrix
  2. 2. •SaaS company since 2008 •Social media analytics: tracks and measures the activity of brands and personalities, providing information for market research and brand comparison •Multilanguage technology (English, Portuguese, and Spanish) •Leader in Latin America, with operations in 5 countries and customers in Latin America and the US •One of 34 Twitter Certified Program partners worldwide
  3. 3. Our customers
  4. 4. Share of topics: Which conversations are my brand and my competitors’ brands driving?

        Ranking   Brand 1 Q2    Brand 1 Q3   Brand 2 Q2   Brand 2 Q3     Brand 3 Q2    Brand 3 Q3
        1°        Flavor        Breakfast    Flavor       Flavor         Advertising   Flavor
        2°        Healthy       Flavor       Packaging    Brand I love   Flavor        Breakfast
        3°        Components    Components   Healthy      Packaging      Healthy       Healthy
        4°        Advertising   Healthy      Components   Addiction      Components    Advertising
        5°        Enquiries     Desire       Prices       Consumption    Prices        Components
        TOTAL     1.401         8.189        463          5.519          1.081         2.445
  5. 5. #reinvent
  6. 6. Challenges
  7. 7. Challenges: Variety •Different data sources •Different APIs •SLA •Method (pull or push) •Rate limits, backoff strategy
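The slide names rate limits and a backoff strategy without giving details. A minimal sketch of one common approach, capped exponential backoff with full jitter, is shown here. The function names, the `is_rate_limited` predicate, and all default values are assumptions for illustration, not the speakers' implementation.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=5):
    """Yield capped exponential delays: base, base*factor, ... never above cap."""
    for attempt in range(retries):
        yield min(cap, base * factor ** attempt)


def call_with_backoff(fetch, is_rate_limited, sleep=time.sleep, **kwargs):
    """Call fetch() and retry with jittered backoff while the API rate-limits us.

    fetch and is_rate_limited are hypothetical callables supplied by the caller:
    fetch performs one API request, is_rate_limited(exc) says whether the raised
    exception means "slow down" (anything else is re-raised immediately).
    """
    for delay in backoff_delays(**kwargs):
        try:
            return fetch()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise
            # Full jitter: sleep a random amount up to the current delay so that
            # many crawlers do not retry in lockstep.
            sleep(random.uniform(0, delay))
    # Final attempt after the last wait; any error now propagates to the caller.
    return fetch()
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.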
  8. 8. Challenges: Velocity •Updates every second •Top users, top hashtags each minute •Post-event analyses run as batch jobs over the complete dataset •Spikes of 20,000+ tweets per minute
  9. 9. Challenges: Velocity (chart annotations: Last TV Debate, Results Announced)
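The "top hashtags each minute" requirement can be illustrated with a small in-memory sketch that buckets counts by minute. The class name, the hashtag regular expression, and the use of epoch seconds as timestamps are assumptions; the talk does not describe how this was implemented, and a production system at 20,000+ tweets per minute would more likely use a stream processor.

```python
import re
from collections import Counter, defaultdict

# Simplified hashtag pattern, assumed for illustration only.
HASHTAG = re.compile(r"#\w+")


class MinuteTopK:
    """Count hashtags in one-minute tumbling windows."""

    def __init__(self):
        # Maps minute index (epoch seconds // 60) to a Counter of hashtags.
        self.windows = defaultdict(Counter)

    def add(self, timestamp, text):
        """Record the hashtags of one message received at `timestamp` (seconds)."""
        minute = int(timestamp // 60)
        for tag in HASHTAG.findall(text.lower()):
            self.windows[minute][tag] += 1

    def top(self, minute, k=3):
        """Return the k most frequent hashtags of the given minute window."""
        return self.windows[minute].most_common(k)
```

Old windows are never evicted in this sketch; a long-running process would need to drop or archive them.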
  10. 10. Challenges: Meaning •Disambiguation •Data enrichment –Demographics –Sentiment –Influencers •Human analysis (slide image labels: PAN, Orange Telecom, Oi Telecom, Hi!)
  11. 11. Challenges: Alert and report •Clear and understandable UI •Slice-dice for business (not BI experts) •Real-time alerts for anomalies
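One simple way to raise real-time alerts for anomalies is a rolling z-score over per-minute message counts. The sketch below is an assumption about how such a check could look; the window size, threshold, and class name are hypothetical and are not taken from the talk.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Flag a per-minute count that sits far above the recent baseline."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)  # most recent per-minute counts
        self.threshold = threshold           # number of standard deviations
        self.min_samples = min_samples       # wait for a baseline before alerting

    def observe(self, count):
        """Return True if `count` is a spike, then add it to the history."""
        is_spike = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # max(sigma, 1.0) guards against a perfectly flat baseline.
            is_spike = (count - mu) > self.threshold * max(sigma, 1.0)
        self.history.append(count)
        return is_spike
```

Because a spike is appended to the history, it temporarily raises the baseline; excluding flagged values is a design choice left open here.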
  12. 12. Architecture evolution
  13. 13. Drivers for architecture evolution •More customers, bigger customers •Add new features •Keep costs under control
  14. 14. Architecture evolution (chart: active customers, 0 to 120, per architecture iteration #1 to #4)
  15. 15. Architecture: 1st iteration What we needed: •Complete data isolation •Trying different solutions/offerings
  16. 16. Architecture: 1st iteration What we did: •All-in-one approach •Multi-instance architecture •Simple vertical scalability •MySQL performance tuning
  17. 17. Architecture: 1st iteration What we've learned: •Multi-instance is harder to administer, but minimizes the impact of instability on customers •Vertical scalability: poor resource management •MySQL schema changes translate into downtime
  18. 18. Architecture: 2nd iteration What we needed: •Separation of responsibilities (crawling, processing) •Horizontal scalability •Fast provisioning •Cost reduction
  19. 19. Architecture: 2nd iteration What we changed: •Migrated to AWS •RabbitMQ (single node) •Replaced MySQL with Amazon RDS •AWS CloudFormation •Auto Scaling groups
  20. 20. Architecture: 2nd iteration What we've learned: •PIOPS (Provisioned IOPS) •Tuning the Auto Scaling policies can be hard •AWS CloudFormation: great for migration, not enough for daily ops
  21. 21. Architecture: 3rd iteration What we needed: •Deliver new features (NRT, more complex analytics) •Scale fast •Be resilient against failure •Adding and improving data sources •Keep costs under control (always)
  22. 22. Architecture: 3rd iteration What we changed: •Apache Storm •RabbitMQ HA •Amazon Elastic MapReduce (Hadoop/Hive) •AWS CloudFormation + Chef •Amazon Glacier + Amazon S3 lifecycle policies
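An Amazon S3 lifecycle policy that archives objects to Amazon Glacier can be expressed as a small configuration document. The sketch below builds one as a Python dict in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefix, rule ID, bucket name, and 30-day threshold are hypothetical; the talk does not state the values Socialmetrix used.

```python
def glacier_lifecycle(prefix, archive_after_days, rule_id="archive-raw-data"):
    """Build a lifecycle configuration that moves objects under `prefix` to Glacier."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical values: archive raw data 30 days after creation.
config = glacier_lifecycle(prefix="raw/tweets/", archive_after_days=30)

# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Keeping the rule in code or a template, rather than setting it by hand in the console, fits the deck's later advice to automate from day 1.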
  23. 23. Architecture: 3rd iteration What we've learned: •Spot Instances + Reserved Instances •Hive = SQL; SQL scripts are hard to test •Bulk upserts on Amazon RDS can be expensive (PIOPS) •Amazon DynamoDB is great, but expensive (for our use case)
  24. 24. Dashboard
  25. 25. Architecture: 4th iteration What we needed: •Monitor millions of social media profiles •Make data accessible (exploration, PoC) •Improve UI response times •Testing our data pipelines •Reprocessing (faster)
  26. 26. Architecture: 4th iteration What we changed: • Cassandra (DSE) • MongoDB MMS • Apache Spark
  27. 27. Architecture: 4th iteration What we've learned: •Leverage AWS ecosystem •DataStax AMI + OpsCenter integration •MongoDB MMS: automation magic! •Apache Spark unit testing + Amazon EC2 launch scripts •Amazon EMR doesn’t have the latest stable versions
  28. 28. Architecture evolution (chart: active customers and costs per architecture iteration #1 to #4)
  29. 29. Lessons learned
  30. 30. Lessons learned •Automate from day 1 (CloudFormation + Chef) •Monitor system activity and understand your data patterns, e.g. with Logstash (ELK) •Always have a Source of Truth (Amazon S3 + Glacier) •Make your Source of Truth searchable
  31. 31. Lessons learned (II) •Approximation is a good thing: HLL (HyperLogLog), CMS (Count-Min Sketch), Bloom filters •Write your pipelines with reprocessing needs in mind •Avoid framework explosion at all costs •The AWS ecosystem allows rapid prototyping
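To make the approximation point concrete, here is a minimal Count-Min Sketch, one of the structures named on the slide. It estimates item frequencies in fixed memory and may overestimate but never underestimates. The width, depth, and the choice of salted BLAKE2b as the hash family are assumptions for illustration, not details from the talk.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using depth x width integer cells."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        """Yield one (row, column) cell per row, using a differently salted hash."""
        data = item.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Increase the count of `item` in every row."""
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        """Return the minimum cell value, an upper-biased estimate of the count."""
        return min(self.table[row][col] for row, col in self._cells(item))
```

Memory stays at `width * depth` counters regardless of how many distinct hashtags or users are seen, which is the trade the slide recommends: a small, bounded error in exchange for predictable cost.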
  32. 32. Socialmetrix NextGen 2015
  33. 33. Architecture evolution (chart: active customers, 0 to 120, per architecture iteration #1 to #4)
  34. 34. Architecture nextgen •Reduce moving parts •Apache Spark as central processing framework –Real time (micro-batch) –Batch processing •Kafka or Amazon Kinesis (message broker) •Cassandra (time-series storage) •Elasticsearch (content indexer)
  35. 35. To infinity … and beyond! Architecture evolution (chart: active customers, 0 to 120, per architecture iteration #1 to #4 and NextGen)
  36. 36. Gustavo Arjones, CTO @arjones | Sebastian Montini, Solutions Architect @sebamontini | Feedback and Q&A