Processing half a billion mentions daily

  • Talk about why each component is used.
  • Give an overview of each individual component.
  • Existing spouts and bolts, such as MysqlSpout, the Redis pub/sub spout, etc.

    1. How we process half a billion mentions a day. George & Shrikar
    2. Agenda: Who we are; some numbers about our system; open-source technologies we use; architecture of the system; component overview
    3. Who We Are: Social media analytics, monitoring, and engagement company (www.viralheat.com). We are based in San Mateo, CA.
    4. Data Crunched Daily: In total, we ingest around 1 TB of social data into our infrastructure every day. Social data: Twitter, Facebook, LinkedIn, Pinterest, blogs, etc.
    5. How we manage it: Redis, MySQL, Riak, Elasticsearch, Memcache, Storm (real-time data processing), Beanstalk
    6. Data Pipeline (diagram components): Crawlers, Beanstalk, Processor, Elasticsearch, Memcache, Stats, Redis, Storm Cluster, Riak
    7. Deep Dive: The Processor tags each social mention with sentiment and intent. Around 100 million social mentions every 5 hours. Elasticsearch indexes and ranks the social data. Stats calculates the analytics for each keyword, grouped by sentiment and intent.
    8. Near Realtime: We use Storm for our near-real-time data pipeline. Benefits: scalable, fault tolerant, and easy to operate; easy to load and store data from existing databases and queues.
    9. Q&A
    10. Thank You
    11. We are hiring! www.viralheat.com/company/careers/
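
The numbers and the Stats step from the Deep Dive slide can be illustrated with a small sketch. This is not the presenters' code. The `Mention` record, its field names, the sentiment and intent labels, and both function names are assumptions made for illustration only. The sketch first scales the stated rate (around 100 million mentions every 5 hours) to a 24-hour day, which gives roughly the half billion in the title, and then counts mentions per keyword grouped by sentiment and intent.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """A tagged social mention (hypothetical shape, not taken from the slides)."""
    keyword: str
    sentiment: str  # assumed labels, e.g. "positive", "negative", "neutral"
    intent: str     # assumed labels, e.g. "purchase", "none"


def mentions_per_day(count: int, window_hours: float) -> float:
    """Scale a count observed over `window_hours` to a 24-hour day."""
    return count * 24 / window_hours


def aggregate(mentions: Iterable[Mention]) -> Counter:
    """Count mentions per (keyword, sentiment, intent) group."""
    return Counter((m.keyword, m.sentiment, m.intent) for m in mentions)


if __name__ == "__main__":
    # 100 million mentions every 5 hours, scaled to one day.
    print(mentions_per_day(100_000_000, 5))

    sample = [
        Mention("acme", "positive", "purchase"),
        Mention("acme", "positive", "purchase"),
        Mention("acme", "negative", "none"),
        Mention("globex", "neutral", "none"),
    ]
    for group, n in sorted(aggregate(sample).items()):
        print(group, n)
```

An in-memory `Counter` only shows the grouping logic. At the volumes described in the slides, the counts would presumably live in one of the stores the deck lists, but the slides do not say which one backs the Stats component.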
