Term paper presented by:
• Akhtar S.Quereshi
• Anurag Arora
• Divya Gandhi
• Nishant Goyal
DDBMS term paper 1
Twitter tale of big data!
3 years, 2 months and 1 day. The time it took from the first Tweet to the billionth Tweet.
1 week. The time it took for users to send a billion Tweets in 2011.
50 million. The average number of Tweets people sent per day, 2010.
140 million. The average number of Tweets people sent per day, February 2011.
177 million. Tweets sent on March 11, 2011.
Half a billion. Tweets sent per day, October 2012.
572,000. Number of new accounts created on March 12, 2011.
460,000. Average number of new accounts per day over February 2011.
Real-time challenge
Agenda
• Managing social graphs: FlockDB
• Sharding: Gizzard
• Real-time data processing and storage: Hadoop and Storm
FlockDB: built on MySQL
Maintaining the social graph and processing queries
Challenges
• The timeline must rapidly go through a user's
*following* list and quickly display all of
their tweets, sorted by recency.
• Answer queries like "What's the intersection of
people I follow and people who are following
President Obama?"
• Handle heavy write traffic, as followers are added or
removed.
These features are difficult to
implement in a traditional
relational database.
What is FlockDB?
• FlockDB is a distributed graph database for storing
adjacency lists.
• It is optimized not for graph traversal but for very large
adjacency lists and fast reads and writes.
• It is able to support:
– a high rate of add/update/remove operations.
– potentially complex set arithmetic queries.
– paging through query result sets containing millions
of entries.
– the ability to "archive" edges and restore them later.
How does FlockDB deal with these challenges?
• FlockDB stores all information as edge
attributes in the graph.
• The four major attributes of each edge in the adjacency list: source ID, destination ID, position, and state.
• Each edge is actually stored twice:
– forward: Nick follows Robey at 9:54 today.
– backward: Robey is followed by Nick at 9:54 today.
• "Who follows me?" is just as efficient as
"Who do I follow?"
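The double write described above can be sketched in a few lines of Python. This is an illustrative in-memory model, not FlockDB's API: the class `EdgeStore`, its method names, and the sample positions are hypothetical, and plain dictionaries stand in for the MySQL-backed forward and backward tables.

```python
from collections import defaultdict


class EdgeStore:
    """Toy model of a graph store that writes every edge in both directions."""

    def __init__(self):
        # forward[source] -> {destination: position}
        self.forward = defaultdict(dict)
        # backward[destination] -> {source: position}
        self.backward = defaultdict(dict)

    def follow(self, source, destination, position):
        # One logical edge becomes two rows, one per direction.
        self.forward[source][destination] = position
        self.backward[destination][source] = position

    def following(self, user):
        """Who do I follow? A single lookup in the forward index."""
        return sorted(self.forward[user])

    def followers(self, user):
        """Who follows me? A single lookup in the backward index."""
        return sorted(self.backward[user])


store = EdgeStore()
store.follow("nick", "robey", position=954)
store.follow("ed", "robey", position=955)
print(store.following("nick"))   # ['robey']
print(store.followers("robey"))  # ['ed', 'nick']
```

The cost is twice the writes and storage; the benefit is that neither question requires scanning the other direction's rows.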
"What's the intersection of people I follow and
people who are following President Obama?"
This can be answered quickly by decomposing it into single-user
queries, such as "Who is following President Obama?"
• Data is partitioned by node, so these queries can
each be answered by a single partition, using an
indexed range query.
• Paging through long result sets is done by using the
position field (timestamp) as a cursor.
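A minimal sketch of both ideas, assuming each single-user query returns a list of `(position, user_id)` pairs sorted by position with unique positions. The function names `intersection` and `page` and the sample data are hypothetical and are not part of FlockDB.

```python
from bisect import bisect_right


def intersection(first_ids, second_ids):
    """Combine two single-user result sets with plain set arithmetic."""
    return sorted(set(first_ids) & set(second_ids))


def page(edges, cursor=None, count=2):
    """Return one page of edges plus the cursor for the next page.

    The cursor is the position of the last edge already returned, so the
    next page starts strictly after it.
    """
    positions = [position for position, _ in edges]
    start = 0 if cursor is None else bisect_right(positions, cursor)
    chunk = edges[start:start + count]
    more = start + count < len(edges)
    next_cursor = chunk[-1][0] if chunk and more else None
    return chunk, next_cursor


# (position, user_id) pairs, as two separate single-user queries might return them.
i_follow = [(100, 1), (105, 2), (110, 3), (120, 4), (130, 5)]
follows_obama = [(90, 2), (95, 4), (99, 9)]

common = intersection((u for _, u in i_follow), (u for _, u in follows_obama))
print(common)  # [2, 4]

first_page, cursor = page(i_follow)
second_page, cursor2 = page(i_follow, cursor)
print(first_page, cursor)    # [(100, 1), (105, 2)] 105
print(second_page, cursor2)  # [(110, 3), (120, 4)] 120
```

Using a position as the cursor, rather than a row offset, keeps paging stable when edges are added or removed between requests.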
The Gizzard framework is used to query
the FlockDB distributed datastore
and to handle the partitioning layer.
What is sharding?
Sharding
= Partitioning + Replication
The problem is: sharding is difficult.
Determining smart partitioning schemes for
particular kinds of data requires a lot of
thought. And even more difficult is ensuring
that all of the copies of the data are consistent
despite unreliable communication and
occasional computer failures.
Sharding
The advantages of sharding are:
• High availability
• Faster Queries
How is sharding different from
traditional architectures?
• Data are parallelized across many datastores.
• Data are more highly available.
• It does not depend on master-slave replication of one full database (individual shards can still be replicated).
• Data are denormalized.
Gizzard
Gizzard
Gizzard is a framework that offers a basic
template for solving a certain class of problem.
Gizzard
Here are some key features of Gizzard:
• Gizzard supports any data storage back-end
• Gizzard handles partitioning through a forwarding table
• Gizzard is middleware
• Gizzard handles replication through a replication tree
• Gizzard is fault-tolerant
• Gizzard supports migrations
• Gizzard handles write conflicts
How does
Gizzard work?
How does it work?
Gizzard is middleware
It sits “in the middle” between clients (web front-ends like PHP and Ruby
on Rails applications) and the many partitions and replicas of data, so
all data manipulation flows through Gizzard.
Architecture
Web/App Server → Gizzard (stateless) → MySQL
How does it work?
Gizzard handles partitioning through a
forwarding table
Gizzard handles partitioning by mapping ranges of data to particular
shards.
These mappings are stored in a forwarding table.
Partitioning
• Define a function Fun(id) that maps each id to a number in the key space
• Ranges of the key space do not have to be
equal in size
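A forwarding-table lookup can be sketched as a sorted list of range lower bounds searched with binary search. Everything here is an assumption made for illustration: the table contents, the shard names, and the modulo-based `fun` that stands in for the slide's Fun(id). It is not Gizzard's code.

```python
from bisect import bisect_right

# Hypothetical forwarding table: (lower bound of range, shard name), sorted by bound.
FORWARDING_TABLE = [
    (0, "shard_a"),
    (1_000, "shard_b"),   # ranges need not be the same size
    (50_000, "shard_c"),
]
LOWER_BOUNDS = [bound for bound, _ in FORWARDING_TABLE]


def fun(key_id):
    """Stand-in for Fun(id): map an id to a number in the table's key space."""
    return key_id % 100_000


def shard_for(key_id):
    """Find the shard whose range contains fun(key_id)."""
    index = bisect_right(LOWER_BOUNDS, fun(key_id)) - 1
    return FORWARDING_TABLE[index][1]


print(shard_for(42))       # shard_a
print(shard_for(1_000))    # shard_b
print(shard_for(75_000))   # shard_c
```

Because only the lower bounds are stored, splitting a hot range means inserting one more row into the table rather than rehashing every key.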
How does it work?
Gizzard handles replication through a
replication tree
Each shard referenced in the forwarding table can be either a physical
shard or a logical shard.
A physical shard is a reference to a particular data storage back-end.
A logical shard is just a tree of other shards.
Partitioning
• Logical sharding tree
• Define a replication policy for each logical shard:
read-only, write-only, or replicate
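The tree of physical and logical shards can be sketched as follows. The classes are hypothetical Python stand-ins written for this example (a dictionary replaces the storage back-end), and only two of the policies named above are modeled: replicate and write-only.

```python
class PhysicalShard:
    """Leaf of the tree: stands in for one storage back-end."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class ReplicatingShard:
    """Logical shard: sends writes to every child, reads from the first child that answers."""

    def __init__(self, children):
        self.children = children

    def write(self, key, value):
        for child in self.children:
            child.write(key, value)

    def read(self, key):
        for child in self.children:
            value = child.read(key)
            if value is not None:
                return value
        return None


class WriteOnlyShard:
    """Logical shard: passes writes through but never serves reads."""

    def __init__(self, child):
        self.child = child

    def write(self, key, value):
        self.child.write(key, value)

    def read(self, key):
        return None


shard_a = PhysicalShard("a")
shard_b = PhysicalShard("b")
tree = ReplicatingShard([shard_a, WriteOnlyShard(shard_b)])
tree.write(1, "follows")
print(tree.read(1))   # follows
print(shard_b.rows)   # {1: 'follows'}
```

Wrapping a replica in a write-only node is one way to let a new copy receive live writes while it is still being filled, without serving incomplete reads from it.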
How does it work?
Gizzard is fault-tolerant
Gizzard is designed to avoid any single points of failure.
If a certain replica in a partition has crashed, Gizzard routes requests to
the remaining healthy replicas, bearing in mind the weighting function.
Writes to an unavailable shard are buffered until the shard again becomes
available.
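A rough sketch of this behavior, under simplifying assumptions: the `Replica` and `BufferedWriter` classes are hypothetical, health is a simple flag, and the weighting function mentioned above is ignored (reads go to the first healthy replica).

```python
from collections import deque


class Replica:
    """One copy of a partition; `up` marks whether it is reachable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}


class BufferedWriter:
    """Routes reads to healthy replicas and queues writes for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = {replica.name: deque() for replica in replicas}

    def write(self, key, value):
        for replica in self.replicas:
            if replica.up:
                replica.rows[key] = value
            else:
                # Buffer the write until the replica is available again.
                self.pending[replica.name].append((key, value))

    def read(self, key):
        for replica in self.replicas:
            if replica.up:
                return replica.rows.get(key)
        raise RuntimeError("no healthy replica available")

    def replay(self, replica):
        """Apply the buffered writes once the replica is reachable again."""
        queue = self.pending[replica.name]
        while queue:
            key, value = queue.popleft()
            replica.rows[key] = value


r1, r2 = Replica("r1"), Replica("r2")
writer = BufferedWriter([r1, r2])
r2.up = False
writer.write("edge", "normal")
value_while_down = writer.read("edge")
rows_while_down = dict(r2.rows)
r2.up = True
writer.replay(r2)
print(value_while_down, rows_while_down, r2.rows)
```

Note that a replayed write can reach a replica after a newer write has already been applied there, so buffering alone does not preserve ordering.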
How does
Gizzard handle
write conflicts?
Write operations have to be idempotent and
commutative.
Example: A user quickly follows and unfollows me. How
is this write commutative?
Follow and unfollow translate to the same write event
to FlockDB, "set edge state to X". An update applies only
if the state on disk is older than the state in flight. So in
the case of follow then unfollow, it doesn't matter
which one is applied to MySQL first, the unfollow state
will always win as it is more recent.
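The rule "apply only if the state on disk is older than the state in flight" can be sketched as a single function. The row layout, the state names, and the integer timestamps are assumptions made for this example, not FlockDB's schema.

```python
def apply_write(row, new_state, timestamp):
    """Set the edge state only if the incoming write is newer than the stored one."""
    if row is None or timestamp > row["updated_at"]:
        return {"state": new_state, "updated_at": timestamp}
    return row


follow = ("normal", 100)     # follow issued at time 100
unfollow = ("removed", 101)  # unfollow issued at time 101

in_order = apply_write(apply_write(None, *follow), *unfollow)
reversed_order = apply_write(apply_write(None, *unfollow), *follow)
replayed = apply_write(in_order, *unfollow)

print(in_order)        # {'state': 'removed', 'updated_at': 101}
print(reversed_order)  # {'state': 'removed', 'updated_at': 101}
print(replayed)        # {'state': 'removed', 'updated_at': 101}
```

The first two results show commutativity (either order ends in the same state), and the third shows idempotence (applying the same write again changes nothing).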
How does Gizzard handle write
conflicts?
A write conflict occurs when two manipulations of the same record try to
change the record in differing ways.
Because Gizzard does not guarantee that operations will apply in order,
write operations must be both idempotent and commutative
in order to avoid conflicts.
In many cases this is actually an easier requirement to meet than guaranteeing
ordered delivery of messages with bounded latency and high availability.
Migration
Migrating from Datastore A to
Datastore A'
Twitter’s real-time data
processing and storage
needs
What type of data system
does it need?
Hadoop
• Hadoop Distributed File System (HDFS): it breaks
each file you give it into 64- or 128-MB chunks called
blocks and sends them to different machines in the
cluster, replicating each block three times along the
way.
– LZO compression
• MapReduce workflow system: it breaks analyses
over large sets of data into small chunks that can be
done in parallel across all (say) 100 machines.
It generates the precomputed views on which queries are
executed.
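The map, shuffle, and reduce steps can be imitated in plain Python to show how an analysis is split into chunks and recombined into a precomputed view. This sketch does not use Hadoop; the hashtag-counting job, the function names, and the sample tweets are made up for illustration, and the two lists stand in for blocks stored on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Mapper: emit (hashtag, 1) for every hashtag in one chunk of tweets."""
    for tweet in chunk:
        for word in tweet.split():
            if word.startswith("#"):
                yield word.lower(), 1


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts for each hashtag."""
    return {key: sum(values) for key, values in groups.items()}


# Each inner list stands in for one block processed by a different machine.
chunks = [
    ["love #hadoop", "#Storm is fast"],
    ["#hadoop batch views", "no tags here"],
]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
view = reduce_phase(shuffle(pairs))
print(view)  # {'#hadoop': 2, '#storm': 1}
```

Each call to `map_phase` depends only on its own chunk, which is what allows the mappers to run in parallel on separate machines.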
Storm topology
Example Query: streaming word count
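As a rough illustration of a streaming word count, the pipeline below chains Python generators in the shape of a topology: a source of sentences, a step that splits them into words, and a step that keeps running counts. It is not written against Storm's API; the names `sentence_spout`, `split_bolt`, and `count_bolt` only borrow Storm's spout and bolt vocabulary, and the two sentences are sample data.

```python
from collections import Counter


def sentence_spout():
    """Source of the stream; a real stream would not end."""
    yield from ["the cow jumped", "the moon"]


def split_bolt(sentences):
    """Split each incoming sentence into words."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words):
    """Emit the running count of each word as soon as the word arrives."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


updates = list(count_bolt(split_bolt(sentence_spout())))
print(updates)
# [('the', 1), ('cow', 1), ('jumped', 1), ('the', 2), ('moon', 1)]
```

Unlike the batch job, every incoming word immediately produces an updated count, so results are available while data is still arriving.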
USPs of Storm:
1. Guaranteed message processing.
2. Robust process management.
3. Fault detection and automatic reassignment.
4. Efficient message passing.
Monitoring popular queries
1. A Storm topology tracks statistics on search queries.
2. Popular queries are sent to human evaluators through Amazon’s Mechanical Turk, who answer questions about each query and categorize it.
3. Machine learning models evaluate the responses, and the information is then pushed to back-end systems.
Queries please!!
Thank you!

Twitter case study

  • 1. Term paper presented by: • Akhtar S.Quereshi • Anurag Arora • Divya Gandhi • Nishant Goyal DDBMS term paper 1
  • 2. Twitter tale of big data! 3 years, 2 months and 1 day: the time it took from the first Tweet to the billionth Tweet. 1 week: the time it took for users to send a billion Tweets in 2011. 50 million: the average number of Tweets people sent per day in 2010. 140 million: the average number of Tweets people sent per day in February 2011. 177 million: Tweets sent on March 11, 2011. Half a billion: Tweets sent per day in October 2012. 572,000: the number of new accounts created on March 12, 2011. 460,000: the average number of new accounts per day over February 2011.
  • 5. Agenda • Managing social graphs: FlockDB • Sharding: Gizzard • Real-time data processing and storage: Hadoop/Storm
  • 6. FlockDB, built over MySQL: maintaining the social graph and query processing
  • 8. Challenges • The timeline needs to rapidly go through the *following* list of a user and quickly display all their tweets, sorted by recency. • Answer queries like "What's the intersection of people I follow and people who are following President Obama?" • Handle heavy write traffic as followers are added or removed.
  • 10. These features are difficult to implement in a traditional relational database.
  • 11. What is FlockDB? • FlockDB is a distributed graph database for storing adjacency lists. • It is optimized not for graph traversal but for very large adjacency lists and fast reads and writes. • It is able to support: – a high rate of add/update/remove operations; – potentially complex set arithmetic queries; – paging through query result sets containing millions of entries; – the ability to "archive" and later restore archived edges.
  • 12. How does FlockDB deal with these challenges? • FlockDB stores all information as edge attributes in the graph. • Each row in the adjacency list has four major attributes: source ID, destination ID, position, and state.
  • 13. • Each edge is actually stored twice. Forward: Nick follows Robey at 9:54 today. Backward: Robey is followed by Nick at 9:54 today. • "Who follows me?" is just as efficient as "Who do I follow?"
  • 14. "What's the intersection of people I follow and people who are following President Obama?" • This can be answered quickly by decomposing it into single-user queries, such as "Who is following President Obama?" • Data is partitioned by node, so these queries can each be answered by a single partition, using an indexed range query. • Paging through long result sets is done by using the position field (a timestamp) as a cursor.
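The double storage and cursor paging described in slides 13 and 14 can be sketched in a few lines of Python. This is only an in-memory illustration, not FlockDB's implementation: the `EdgeStore` class, its method names, and the `(position, id)` cursor format are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


class EdgeStore:
    """Toy in-memory stand-in for a FlockDB-style edge table (hypothetical)."""

    def __init__(self):
        # Each adjacency list is kept sorted by (position, other_id).
        self.forward = defaultdict(list)   # source -> [(position, destination)]
        self.backward = defaultdict(list)  # destination -> [(position, source)]

    def follow(self, source, destination, position):
        # Store the edge twice so both directions are a single-list lookup.
        self._insert(self.forward[source], (position, destination))
        self._insert(self.backward[destination], (position, source))

    @staticmethod
    def _insert(rows, row):
        if row not in rows:
            rows.insert(bisect_right(rows, row), row)

    def following(self, user):
        return {dest for _, dest in self.forward[user]}

    def followers(self, user):
        return {src for _, src in self.backward[user]}

    def page_followers(self, user, cursor=None, count=2):
        """Return up to `count` followers after `cursor`, plus the next cursor."""
        rows = self.backward[user]
        start = 0 if cursor is None else bisect_right(rows, cursor)
        page = rows[start:start + count]
        next_cursor = page[-1] if start + count < len(rows) else None
        return [src for _, src in page], next_cursor


store = EdgeStore()
store.follow("nick", "robey", 954)
store.follow("nick", "obama", 955)
store.follow("ann", "obama", 956)
store.follow("robey", "obama", 957)

# The intersection query decomposes into two single-user lookups.
mutual = store.following("nick") & store.followers("obama")
first_page, cursor = store.page_followers("obama")
second_page, end = store.page_followers("obama", cursor)
```

Because the cursor is the last row already returned, the next page starts with a range lookup instead of counting rows from the beginning of the list.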
  • 15. The Gizzard framework is used to query the FlockDB distributed datastore and to handle the partitioning layer.
  • 17. Sharding = Partitioning + Replication. The problem is that sharding is difficult. Determining smart partitioning schemes for particular kinds of data requires a lot of thought. Even more difficult is ensuring that all of the copies of the data are consistent despite unreliable communication and occasional computer failures.
  • 18. Sharding. The advantages of sharding are: • High availability • Faster queries. How is sharding different from traditional architectures?
  • 19. How is sharding different from traditional architectures? • Data are parallelized across many datastores. • Data are more highly available. • It doesn't rely on master-slave replication for scaling. • Data are denormalized.
  • 21. Gizzard is a framework that offers a basic template for solving a certain class of problem.
  • 22. Key features of Gizzard: • Gizzard supports any data storage backend • Gizzard handles partitioning through a forwarding table • Gizzard is middleware • Gizzard handles replication through a replication tree • Gizzard is fault-tolerant • Gizzard supports migrations • Gizzard handles write conflicts
  • 24. How does it work? Gizzard is middleware. It sits "in the middle" between clients (web front-ends such as PHP and Ruby on Rails applications) and the many partitions and replicas of data, so all data manipulation flows through Gizzard.
  • 26. How does it work? Gizzard handles partitioning through a forwarding table. It maps ranges of data to particular shards, and these mappings are stored in the forwarding table.
  • 27. Partitioning • Define a function Fun(id) • Ranges do not have to be equal
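A forwarding table of this kind can be sketched as a sorted list of range lower bounds. The sketch below is an assumption-laden illustration rather than Gizzard's code: `fun` is a hypothetical stand-in for the slide's Fun(id), and the shard names and range boundaries are invented.

```python
from bisect import bisect_right


def fun(user_id):
    """Hypothetical stand-in for the slide's Fun(id): map an ID into the key space."""
    return user_id % 1000


class ForwardingTable:
    def __init__(self, entries):
        # entries: (lower_bound, shard_name); a shard owns keys from its
        # lower bound up to the next entry's lower bound.
        self.entries = sorted(entries)
        self.lower_bounds = [low for low, _ in self.entries]

    def shard_for(self, user_id):
        key = fun(user_id)
        index = bisect_right(self.lower_bounds, key) - 1
        if index < 0:
            raise KeyError(f"no shard covers key {key}")
        return self.entries[index][1]


# Ranges are deliberately unequal: 0-99, 100-699, 700-999.
table = ForwardingTable([(0, "shard_a"), (100, "shard_b"), (700, "shard_c")])
```

Unequal ranges let a heavily loaded part of the key space be split across more shards without touching the rest of the table.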
  • 28. How does it work? Gizzard handles replication through a replication tree. Each shard referenced in the forwarding table can be either a physical shard or a logical shard. A physical shard is a reference to a particular data storage back-end. A logical shard is just a tree of other shards.
  • 29. Partitioning • Logical shard tree • Define a replication policy: read-only, write-only, replicate
  • 30. How does it work? Gizzard is fault-tolerant. Gizzard is designed to avoid any single point of failure. If a certain replica in a partition has crashed, Gizzard routes requests to the remaining healthy replicas, bearing in mind the weighting function. Writes to an unavailable shard are buffered until the shard becomes available again.
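The replication tree and the failover behaviour can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class names are hypothetical, the weighting function is omitted (replicas are tried in order), and writes are buffered on the shard object itself rather than in whatever queue Gizzard actually uses.

```python
class ShardUnavailable(Exception):
    pass


class PhysicalShard:
    """Leaf of the tree: a reference to one storage back-end (here, a dict)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True
        self.pending = []  # writes buffered while the back-end is down

    def write(self, key, value):
        if not self.available:
            self.pending.append((key, value))
            return
        self.data[key] = value

    def read(self, key):
        if not self.available:
            raise ShardUnavailable(self.name)
        return self.data.get(key)

    def recover(self):
        # Replay buffered writes once the back-end is reachable again.
        self.available = True
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class ReplicatingShard:
    """Logical shard: a node whose children are other shards."""

    def __init__(self, children):
        self.children = children

    def write(self, key, value):
        for child in self.children:
            child.write(key, value)

    def read(self, key):
        # Try replicas in order and skip any that are down.
        for child in self.children:
            try:
                return child.read(key)
            except ShardUnavailable:
                continue
        raise ShardUnavailable("no healthy replica")


a, b = PhysicalShard("a"), PhysicalShard("b")
tree = ReplicatingShard([a, ReplicatingShard([b])])
a.available = False
tree.write("k", 1)
value_during_outage = tree.read("k")
buffered = list(a.pending)
a.recover()
```

Since logical and physical shards expose the same `read` and `write` methods, trees can be nested to any depth, which is what lets a replication policy be composed per partition.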
  • 31. How does Gizzard handle write conflicts?
  • 32. Write operations have to be idempotent and commutative. Example: a user quickly follows and then unfollows me. How is this write commutative? Follow and unfollow translate to the same write event to FlockDB: "set edge state to X". An update applies only if the state on disk is older than the state in flight. So in the case of follow then unfollow, it doesn't matter which one is applied to MySQL first; the unfollow state will always win because it is more recent.
  • 33. How does Gizzard handle write conflicts? Write conflicts occur when two manipulations of the same record try to change the record in differing ways. Because Gizzard does not guarantee that operations will apply in order, write operations must, as described, be both idempotent and commutative in order to avoid conflicts. In many cases this is actually an easier requirement than trying to guarantee ordered delivery of messages with bounded latency and high availability.
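The "newer state wins" rule from slide 32 can be checked with a short Python sketch. The `apply_write` function, the state names, and the integer timestamps are assumptions for illustration; the sketch also assumes that no two writes to the same edge carry the same timestamp.

```python
from itertools import permutations


def apply_write(table, edge, state, timestamp):
    """Apply "set edge state to X" only if it is newer than what is stored."""
    current = table.get(edge)
    if current is None or current[1] < timestamp:
        table[edge] = (state, timestamp)


writes = [
    (("nick", "robey"), "following", 100),  # follow
    (("nick", "robey"), "removed", 101),    # unfollow, one tick later
]

# Replay every ordering of the writes; each write appears twice to simulate
# retried (duplicate) deliveries.
results = []
for order in permutations(writes + writes):
    table = {}
    for edge, state, timestamp in order:
        apply_write(table, edge, state, timestamp)
    results.append(table[("nick", "robey")])
```

Every ordering ends in the same state, which is the sense in which the write is both commutative (order does not matter) and idempotent (duplicates do not matter).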
  • 34. Migration: migrating from Datastore A to Datastore A'
  • 35. Twitter's real-time data processing and storage needs. What type of data system does it need?
  • 44. Hadoop • Hadoop Distributed File System (HDFS): it breaks each file you give it into 64- or 128-MB chunks called blocks and sends them to different machines in the cluster, replicating each block three times along the way. – LZO compression • MapReduce workflow system: it breaks analyses over large sets of data into small chunks that can be done in parallel across all (say) 100 machines. It generates the precomputed view on which queries are executed.
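The block arithmetic on the Hadoop slide can be made concrete with a small calculation. The helper names below are hypothetical, and the round-robin placement is only an assumption to make the example deterministic; it is not how HDFS actually chooses machines.

```python
import math

BLOCK_SIZE_MB = 128      # the slide gives 64 or 128 MB
REPLICATION_FACTOR = 3   # each block is stored three times


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file is split into; the last block may be partial."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=REPLICATION_FACTOR):
    """Total cluster storage consumed once every block is replicated."""
    return file_size_mb * replication


def place_blocks(file_size_mb, machines, replication=REPLICATION_FACTOR):
    """Assign each block's replicas to consecutive machines, round-robin.

    Replicas land on distinct machines as long as there are at least
    `replication` machines in the list.
    """
    placement = {}
    for block in range(block_count(file_size_mb)):
        placement[block] = [
            machines[(block + offset) % len(machines)]
            for offset in range(replication)
        ]
    return placement


placement = place_blocks(300, ["m0", "m1", "m2", "m3"])
```

For a 1000 MB file this gives 8 blocks of up to 128 MB and 3000 MB of raw storage, which is the cost paid for surviving the loss of any two machines holding a given block.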
  • 63. Example query: streaming word count
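The slide's own example is not preserved in the transcript, so the following is a stand-in sketch of a streaming word count in plain Python. It borrows Storm's spout and bolt terminology for the function names but does not use Storm's API; generators play the role of the stream.

```python
from collections import Counter


def sentence_spout(sentences):
    """Emit one sentence at a time, standing in for an unbounded stream."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Split each incoming sentence into lower-cased words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Emit (word, running_count) after every word instead of a final total."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


updates = list(count_bolt(split_bolt(sentence_spout(
    ["the cow jumped", "the moon"]
))))
```

Unlike a batch job, the count for a word is updated and emitted as soon as the word arrives, so a client reading `updates` always sees the latest running totals.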
  • 64. USP of Storm: 1. Guaranteed message processing. 2. Robust process management. 3. Fault detection and automatic reassignment. 4. Efficient message passing.
  • 65. Monitoring popular queries: a Storm topology tracks statistics on search queries. Popular queries are sent to human evaluators on Amazon's Mechanical Turk, who answer questions about each query and categorize it. Machine learning models evaluate the responses and then push the information to back-end systems.
  • 66. References: 1. Twitter Engineering blog. 2. GitHub forums.
  • 67. Queries please!! Thank you!

Editor's Notes

  1. Source_id/destination_id is a unique user ID, unless the graph is the one storing favorite tweets, in which case the destination ID may be a tweet ID. Position is a timestamp. For example, when users delete their accounts, their edges are put into the "archived" state, allowing them to be restored later. When an edge is deleted, the row isn't actually deleted from MySQL; it's just marked as being in the deleted state, which has the effect of moving the primary key.
  2. Data is partitioned by node, so these queries can each be answered by a single partition, using an indexed range query.
  3. Unlike others, it is fault-tolerant and scalable. Storm can run a continuous query and stream the results to clients in real time.
  4. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
  5. 1. Tracks the task tree. 2. Workers are controlled by the supervisor, so a task will never be orphaned, sucking up memory. 3. Tasks heartbeat to Nimbus. 4. No intermediate queuing: messages are transferred directly between tasks. Storm guarantees that messages will be processed even in the face of failures.