Twitter by the Numbers (Columbia University)
Presentation Transcript
by the #s with @raffi
Giving a @twitter talk at Columbia University talking about Twitter’s Numbers! 22 Feb via Twitter for iPhone from Mudd Building at Columbia University, 500 West 120th Street, New York, New York
What’s a Tweet? It’s a short message that’s sent in at most 140 characters
How many are there? 110M!
110M tweets per day ≈ 1200 tweets per second
How big are they? 1 tweet text = 140 characters ≈ 200 bytes
1200 tweets per second ≈ 230 KB/sec ≈ 14 MB/min ≈ 19 GB/day. Just tweet text!
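The conversion on this slide can be checked with a few lines of arithmetic. This is a minimal sketch: the 200-byte figure is the slide's own estimate, while the function name and the choice of binary units (1 KB = 1024 bytes) are assumptions made here because they reproduce the slide's rounded numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TWEETS_PER_DAY = 110_000_000   # from the slides
BYTES_PER_TWEET = 200          # the slides' estimate for 140 characters of text


def text_throughput(tweets_per_second, bytes_per_tweet=BYTES_PER_TWEET):
    """Return tweet-text volume at several time scales (binary units assumed)."""
    bytes_per_second = tweets_per_second * bytes_per_tweet
    return {
        "KB/sec": bytes_per_second / 1024,
        "MB/min": bytes_per_second * 60 / 1024**2,
        "GB/day": bytes_per_second * SECONDS_PER_DAY / 1024**3,
    }


# 110M tweets per day is a little under 1300 per second; the slides round to 1200.
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
rates = text_throughput(1200)

for unit, value in rates.items():
    print(f"{value:.1f} {unit}")
```

With decimal units (1 KB = 1000 bytes) the daily figure comes out closer to 21 GB, so the slide's 19 GB/day suggests binary units were used.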
MySQL: can’t generate IDs fast enough; centralized and a single point of failure. snowflake: highly available and uncoordinated (10k qps); compatible with the ecosystem. http://github.com/twitter/snowflake
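One way to generate IDs without coordination is to pack a timestamp, a per-worker number, and a per-millisecond sequence into a single 64-bit integer. The sketch below illustrates that idea only. The bit widths (41/10/12), the epoch value, and the class and method names are assumptions made for illustration and are not taken from the slides; the snowflake repository is the place to look for the real layout.

```python
import threading
import time

# Assumed layout for this sketch: 41-bit millisecond timestamp,
# 10-bit worker id, 12-bit per-millisecond sequence.
WORKER_BITS = 10
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1_288_834_974_657  # assumed custom epoch, for illustration only


class IdGenerator:
    """Hypothetical uncoordinated ID generator; one instance per worker."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock          # injectable so the sketch can be tested
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Because the worker id is part of every ID, two workers never need to talk to each other to avoid collisions, and because the timestamp occupies the high bits, IDs from one worker sort roughly by creation time.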
1 TB generated per day. 10 TB generated per day. Photo used under Creative Commons from champura
10 TB per day in total ≈ 120 MB per sec. 80 MB per sec = (photo). Photo used under Creative Commons from Mac Users Guide
Where do they go? Followed by Following Asymmetric Digraph
Digraph (figure: nodes 1 to 4): need to represent this. Matrix (figure: rows and columns 1 to 4): naïve implementation is not scalable
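Why the matrix does not scale can be seen by counting cells: an adjacency matrix needs one cell per ordered pair of users, whether or not an edge exists. The sketch below contrasts that with adjacency sets, which store only the edges that exist. The class and function names are hypothetical, and the example edges are made up, since the edges in the slide's figure are not recoverable from the transcript.

```python
from collections import defaultdict


def matrix_cells(num_users):
    """Cells needed by a dense adjacency matrix: one per ordered pair."""
    return num_users * num_users


class FollowGraph:
    """Asymmetric follow graph stored as adjacency sets (illustrative only)."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> users they follow
        self.followers = defaultdict(set)  # user -> users who follow them

    def follow(self, src, dst):
        self.following[src].add(dst)
        self.followers[dst].add(src)

    def unfollow(self, src, dst):
        self.following[src].discard(dst)
        self.followers[dst].discard(src)

    def is_following(self, src, dst):
        return dst in self.following.get(src, ())


# Hypothetical edges between nodes 1 to 4.
graph = FollowGraph()
for src, dst in [(1, 2), (2, 3), (3, 1), (4, 1)]:
    graph.follow(src, dst)

# The edge 1 -> 2 does not imply 2 -> 1: the graph is asymmetric.
print(graph.is_following(1, 2), graph.is_following(2, 1))

# For 200M users the dense matrix needs 4 * 10**16 cells,
# about 5 PB even at one bit per cell.
print(matrix_cells(200_000_000))
```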
200M registered users (chart, time axis: 2006, 2008, 2010, 2011)
flockdb: distributed graph database. High rate of CRUD operations. Complex set arithmetic queries. http://github.com/twitter/flockdb Photo used under Creative Commons from jurvetson
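"Set arithmetic queries" on a follow graph means combining follower or following sets with intersection, union, and difference. The sketch below uses plain Python sets to show the kind of question being asked; it does not use or imitate the flockdb API, and the user names and helper functions are hypothetical.

```python
# Hypothetical follower sets: user -> set of users who follow them.
followers = {
    "alice": {"carol", "dave", "erin"},
    "bob": {"dave", "erin", "frank"},
}


def follow_both(graph, a, b):
    """Users who follow both a and b (intersection)."""
    return graph[a] & graph[b]


def follow_either(graph, a, b):
    """Users who follow a or b or both (union)."""
    return graph[a] | graph[b]


def follow_only(graph, a, b):
    """Users who follow a but not b (difference)."""
    return graph[a] - graph[b]


print(sorted(follow_both(followers, "alice", "bob")))
print(sorted(follow_either(followers, "alice", "bob")))
print(sorted(follow_only(followers, "alice", "bob")))
```

On in-memory sets these are one-liners; the hard part at the scale described in the slides is answering them when each set holds millions of members and changes constantly.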
@ladygaga: mother mons†er, 8.3 million followers. @justinbieber: Justin Bieber, 7.5 million followers. @BarackObama: 44th President of the United States, 6.7 million followers. @raffi: me!, 0.007 million followers.
How do they get out? 10B API calls per day ≈ 100,000 calls per second
REST API: XML/JSON API over HTTP; poll-based system / pseudo real-time. hosebird (Streaming API): long poll HTTP; near real-time delivery of Tweets.
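The difference between "pseudo real-time" polling and near real-time streaming can be illustrated with a toy model of delivery delay. This sketch makes no calls to any Twitter API: the tweet times, the 10-second poll interval, and the 0.1-second network delay are all made-up values, and the function names are hypothetical.

```python
import math


def poll_delivery_times(tweet_times, poll_interval):
    """A polling client sees each tweet at the first poll at or after its creation."""
    return [math.ceil(t / poll_interval) * poll_interval for t in tweet_times]


def stream_delivery_times(tweet_times, network_delay=0.0):
    """A streaming client sees each tweet as soon as it is pushed."""
    return [t + network_delay for t in tweet_times]


def mean_delay(created, delivered):
    """Average time between creation and delivery."""
    return sum(d - c for c, d in zip(created, delivered)) / len(created)


tweet_times = [0.5, 1.2, 7.9, 12.0, 29.3]  # seconds; hypothetical

polled = poll_delivery_times(tweet_times, poll_interval=10)
streamed = stream_delivery_times(tweet_times, network_delay=0.1)

print("polling mean delay:  ", round(mean_delay(tweet_times, polled), 2))
print("streaming mean delay:", round(mean_delay(tweet_times, streamed), 2))
```

In this model the polling delay depends on where each tweet falls within the poll interval, while the streaming delay is just the network delay, which is the trade-off the slide summarizes.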
Latency (chart, axis: 0 ms, 100 ms, 200 ms)
752% in 2008
1358% in 2009
Where do we want to be? Today - 200M people generate ~1200 TPS. Tomorrow - we want to support half the world and all its devices (5B phones and 6B people)
Real challenges in front of us: real time; indexing, search, and analytics; relevance systems; graph databases; storage; scalability and efficiency
Questions? Follow me at twitter.com/raffi