SlideShare a Scribd company logo
1 of 25
Download to read offline
Big Data 
@Hootsuite Analytics 
Claudiu Coman
About us 
● You might have heard of uberVU 
● Acquired by Hootsuite to develop their new analytics 
● New scalability challenges => millions of customers 
● Everything is in the cloud => Amazon 
● 10 M social media posts per day 
● 100+ Amazon EC2 instances of various sizes 
● 650 GB media posts/month 
● 400 GB worth of analytics data 
● 10+ MongoDB clusters
What this presentation will be about 
● What is the big data we’re working with 
● Infrastructure 
● Technologies 
● What we do on top of the data 
● How we currently display the data 
● What’s in store for the future
Our data 
● Our data is made up of media posts 
● Revolves around search queries 
● You can also connect accounts and pages 
● Sources: Twitter, Facebook, Google+, Wordpress, Flickr, 
Picasa and others 
● To acquire data 
○ a lot of REST 
○ some streaming 
○ very little scraping
Mentions 
● Every piece of data that we process 
○ gets normalized 
○ annotated 
○ convert everything to a standard JSON format 
● We call the end result a mention
Mentions (2)
Annotations 
● Language detection 
○ written in C++, wrapped in Python 
○ can process ~300 mentions/second 
● Sentiment detection 
○ external provider 
○ piggybacking for efficiency 
● Location detection 
○ in-house algorithm 
○ text tokenization and matching against a locations database
The Pipeline 
● Our processing infrastructure is a pipeline 
● Producer-Consumer pattern 
● Enables us to scale parts of the infrastructure separately
The pipeline (2)
The pipeline (3) 
● 100+ consumer types 
● 450 consumer instances 
● Automatic scaling algorithm developed by us 
○ whenever a consumer falls behind, the system deploys new 
consumer instances 
○ automatically adjusts cluster size
MongoDB 
● 10+ clusters 
● Our biggest cluster 
○ 1500 operations/second 
○ m2.xlarge instances (17GB RAM, 6.5 ECU) 
○ 8x80 GB RAID10 
● Hard to manage databases in multiple clusters 
○ we wrote mongo-pool https://github.com/uberVU/mongo-pool 
● Cluster pyramid structure for cost efficiency 
● Communication between clusters through our own mongo-oplogreplay 
https://github.com/uberVU/mongo-oplogreplay
MongoDB mention clusters pyramid
Kestrel 
● Distributed message queue developed by Twitter 
● Uses Memcache protocol 
● Disk persistence 
● 400 consumer operations/second 
● Part of our pipeline core 
● Extremely reliable 
● Sending gzipped content to save I/O costs
Redis, Memcache 
● Gradually replacing Memcache with Redis 
● Used for high-access temporary information 
● 60 GB worth of data
Other technologies 
● RabbitMQ asynchronous tasks 
● DynamoDB 
○ analytics 
○ auxiliary permanent storage use cases that don’t take a lot of 
space 
● S3 for data with low access rates 
● Glacier for archived data
System metrics & monitoring 
● Graphite 
● 150K system metrics 
● Alerts are being generated based on Graphite 
● In-house alert-detection 
● Nagios/Nagstamon 
● PagerDuty for on-call
Analytics overview 
● Currently in DynamoDB 
● MongoDB still runs in parallel, considering a move back to 
MongoDB 
● Breakdowns on language, location, sentiment, gender 
● Support for several resolutions (day, hour, 15min) 
● Optimized for language and location filtering (95% of our 
queries)
Aggregation pipeline 
● We aggregate analytics to reduce writes 
● Efficient but simple concurrency through Redis primitives 
● Got a 5x improvement !
Aggregation pipeline (2)
Tagcloud 
● Tagcloud algorithm that can detect n-grams of all lengths 
● Some of the data we analyze is blog content, can be very big 
● Needed something fast 
● In-house algorithm 
○ linear complexity (doesn’t go up with max n-gram size) 
○ based on statistical correlations
Signals 
● Need to synthesize all the data we’re collecting 
● Top Stories 
○ O(1) algorithm 
● Influencers 
○ dependency graph 
○ edges are interactions between users 
● Spikes & Bursts 
○ code written in C++ to reduce time 
○ statistical algorithms on top of our analytics timeseries 
○ adapt reads from analytics based on data size
Boards 
● Needed a good way to display all this data 
● Designed Boards 
○ released this year 
○ allows you to create a dashboard with metric visualizations 
○ drag-drop widgets to arrange them your way
Boards Demo
Future plans 
● Hootsuite has millions of users 
● Our analytics infrastructure will have to scale 
○ Transitioning to streaming services 
○ Larger MongoDB clusters 
■ more shards for write throughput 
■ more secondaries for reads 
● Add more metrics
Questions ? 
claudiu.coman@hootsuite.com

More Related Content

What's hot

MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...MongoDB
 
Analytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanAnalytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanDatabricks
 
Introducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackIntroducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackMirantis
 
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Severalnines
 
Cassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica ColoftCassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica ColoftJon Haddad
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)Mihnea Giurgea
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Miguel Pérez Colino
 
A Cheapskates Guide to AWS v2.0
A Cheapskates Guide to AWS v2.0A Cheapskates Guide to AWS v2.0
A Cheapskates Guide to AWS v2.0Michael Soh
 
Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices confluent
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucFraugster
 
Zentrales logging mit dem Elastic Stack
Zentrales logging mit dem Elastic StackZentrales logging mit dem Elastic Stack
Zentrales logging mit dem Elastic StackSimonSchneider24
 
MongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of EventsMongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of EventsMaxim Ligus
 
Elasticsearch as a time series database
Elasticsearch as a time series databaseElasticsearch as a time series database
Elasticsearch as a time series databasefelixbarny
 
A Cheapskates Guide to AWS
A Cheapskates Guide to AWSA Cheapskates Guide to AWS
A Cheapskates Guide to AWSMichael Soh
 
IOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOTIOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOTMongoDB
 
Monitoring Cassandra With An EYE
Monitoring Cassandra With An EYEMonitoring Cassandra With An EYE
Monitoring Cassandra With An EYEKnoldus Inc.
 

What's hot (20)

Windows Azure Tables e NoSQL
Windows Azure Tables e NoSQLWindows Azure Tables e NoSQL
Windows Azure Tables e NoSQL
 
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
 
Analytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari ShreedharanAnalytic Insights in Retail Using Apache Spark with Hari Shreedharan
Analytic Insights in Retail Using Apache Spark with Hari Shreedharan
 
Introducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStackIntroducing MagnetoDB, a key-value storage sevice for OpenStack
Introducing MagnetoDB, a key-value storage sevice for OpenStack
 
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
 
Cassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica ColoftCassandra meetup slides - Oct 15 Santa Monica Coloft
Cassandra meetup slides - Oct 15 Santa Monica Coloft
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
 
A Cheapskates Guide to AWS v2.0
A Cheapskates Guide to AWS v2.0A Cheapskates Guide to AWS v2.0
A Cheapskates Guide to AWS v2.0
 
Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices Kafka as an Eventing System to Replatform a Monolith into Microservices
Kafka as an Eventing System to Replatform a Monolith into Microservices
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 
Zentrales logging mit dem Elastic Stack
Zentrales logging mit dem Elastic StackZentrales logging mit dem Elastic Stack
Zentrales logging mit dem Elastic Stack
 
MongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of EventsMongoDB - Warehouse and Aggregator of Events
MongoDB - Warehouse and Aggregator of Events
 
Elasticsearch as a time series database
Elasticsearch as a time series databaseElasticsearch as a time series database
Elasticsearch as a time series database
 
A Cheapskates Guide to AWS
A Cheapskates Guide to AWSA Cheapskates Guide to AWS
A Cheapskates Guide to AWS
 
IOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOTIOT Paris Seminar 2015 - Storage Challenges in IOT
IOT Paris Seminar 2015 - Storage Challenges in IOT
 
Redis IU
Redis IURedis IU
Redis IU
 
Logging for Containers
Logging for ContainersLogging for Containers
Logging for Containers
 
Going Serverless
Going Serverless Going Serverless
Going Serverless
 
Monitoring Cassandra With An EYE
Monitoring Cassandra With An EYEMonitoring Cassandra With An EYE
Monitoring Cassandra With An EYE
 

Similar to Big data @ Hootsuite analtyics

MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...NETWAYS
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Hernan Costante
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...Anna Ossowski
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
How to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsHow to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsAlluxio, Inc.
 
Eko10 workshop - OPEN SOURCE DATABASE MONITORING
Eko10 workshop - OPEN SOURCE DATABASE MONITORINGEko10 workshop - OPEN SOURCE DATABASE MONITORING
Eko10 workshop - OPEN SOURCE DATABASE MONITORINGPablo Garbossa
 
Eko10 Workshop Opensource Database Auditing
Eko10  Workshop Opensource Database AuditingEko10  Workshop Opensource Database Auditing
Eko10 Workshop Opensource Database AuditingJuan Berner
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned Omid Vahdaty
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user groupAdam Doyle
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 

Similar to Big data @ Hootsuite analtyics (20)

MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ...
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
Eko10 - Security Monitoring for Big Infrastructures without a Million Dollar ...
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
How to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data PlatformsHow to Develop and Operate Cloud First Data Platforms
How to Develop and Operate Cloud First Data Platforms
 
Eko10 workshop - OPEN SOURCE DATABASE MONITORING
Eko10 workshop - OPEN SOURCE DATABASE MONITORINGEko10 workshop - OPEN SOURCE DATABASE MONITORING
Eko10 workshop - OPEN SOURCE DATABASE MONITORING
 
Eko10 Workshop Opensource Database Auditing
Eko10  Workshop Opensource Database AuditingEko10  Workshop Opensource Database Auditing
Eko10 Workshop Opensource Database Auditing
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 

Recently uploaded

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 

Recently uploaded (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 

Big data @ Hootsuite analtyics

  • 1. Big Data @Hootsuite Analytics Claudiu Coman
  • 2. About us ● You might have heard of uberVU ● Acquired by Hootsuite to develop their new analytics ● New scalability challenges => millions of customers ● Everything is in the cloud => Amazon ● 10 M social media posts per day ● 100+ Amazon EC2 instances of various sizes ● 650 GB media posts/month ● 400 GB worth of analytics data ● 10+ MongoDB clusters
  • 3. What this presentation will be about ● What is the big data we’re working with ● Infrastructure ● Technologies ● What we do on top of the data ● How we currently display the data ● What’s in store for the future
  • 4. Our data ● Our data is made up of media posts ● Revolves around search queries ● You can also connect accounts and pages ● Sources: Twitter, Facebook, Google+, Wordpress, Flickr, Picasa and others ● To acquire data ○ a lot of REST ○ some streaming ○ very little scraping
  • 5. Mentions ● Every piece of data that we process ○ gets normalized ○ annotated ○ convert everything to a standard JSON format ● We call the end result a mention
  • 7. Annotations ● Language detection ○ written in C++, wrapped in Python ○ can process ~300 mentions/second ● Sentiment detection ○ external provider ○ piggybacking for efficiency ● Location detection ○ in-house algorithm ○ text tokenization and matching against a locations database
  • 8. The Pipeline ● Our processing infrastructure is a pipeline ● Producer-Consumer pattern ● Enables us to scale parts of the infrastructure separately
  • 10. The pipeline (3) ● 100+ consumer types ● 450 consumer instances ● Automatic scaling algorithm developed by us ○ whenever a consumer falls behind, the system deploys new consumer instances ○ automatically adjusts cluster size
  • 11. MongoDB ● 10+ clusters ● Our biggest cluster ○ 1500 operations/second ○ m2.xlarge instances (17GB RAM, 6.5 ECU) ○ 8x80 GB RAID10 ● Hard to manage databases in multiple clusters ○ we wrote mongo-pool https://github.com/uberVU/mongo-pool ● Cluster pyramid structure for cost efficiency ● Communication between clusters through our own mongo-oplogreplay https://github.com/uberVU/mongo-oplogreplay
  • 13. Kestrel ● Distributed message queue developed by Twitter ● Uses Memcache protocol ● Disk persistence ● 400 consumer operations/second ● Part of our pipeline core ● Extremely reliable ● Sending gzipped content to save I/O costs
  • 14. Redis, Memcache ● Gradually replacing Memcache with Redis ● Used for high-access temporary information ● 60 GB worth of data
  • 15. Other technologies ● RabbitMQ asynchronous tasks ● DynamoDB ○ analytics ○ auxiliary permanent storage use cases that don’t take a lot of space ● S3 for data with low access rates ● Glacier for archived data
  • 16. System metrics & monitoring ● Graphite ● 150K system metrics ● Alerts are being generated based on Graphite ● In-house alert-detection ● Nagios/Nagstamon ● PagerDuty for on-call
  • 17. Analytics overview ● Currently in DynamoDB ● MongoDB still runs in parallel, considering a move back to MongoDB ● Breakdowns on language, location, sentiment, gender ● Support for several resolutions (day, hour, 15min) ● Optimized for language and location filtering (95% of our queries)
  • 18. Aggregation pipeline ● We aggregate analytics to reduce writes ● Efficient but simple concurrency through Redis primitives ● Got a 5x improvement !
  • 20. Tagcloud ● Tagcloud algorithm that can detect n-grams of all lengths ● Some of the data we analyze is blog content, can be very big ● Needed something fast ● In-house algorithm ○ linear complexity (doesn’t go up with max n-gram size) ○ based on statistical correlations
  • 21. Signals ● Need to synthesize all the data we’re collecting ● Top Stories ○ O(1) algorithm ● Influencers ○ dependency graph ○ edges are interactions between users ● Spikes & Bursts ○ code written in C++ to reduce time ○ statistical algorithms on top of our analytics timeseries ○ adapt reads from analytics based on data size
  • 22. Boards ● Needed a good way to display all this data ● Designed Boards ○ released this year ○ allows you to create a dashboard with metric visualizations ○ drag-drop widgets to arrange them your way
  • 24. Future plans ● Hootsuite has millions of users ● Our analytics infrastructure will have to scale ○ Transitioning to streaming services ○ Larger MongoDB clusters ■ more shards for write throughput ■ more secondaries for reads ● Add more metrics