2. Giving a @twitter talk at Columbia University talking about Twitter’s numbers! 22 Feb via Twitter for iPhone from Mudd Building at Columbia University, 500 West 120th Street, New York, New York
8. What’s a Tweet? It’s a short message of up to 140 characters.
9. How many are there?
10. How many are there? 110M!
11. 110M tweets per day ≈ 1200 tweets per second
12. How big are they? 1 tweet text = 140 characters ≈ 200 bytes
13. 1200 tweets per second ≈ 230 KB/sec ≈ 14 MB/min ≈ 19 GB/day. Just tweet text!
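The rates on slides 10 to 13 follow from simple unit conversion. The sketch below reproduces them, assuming 110M tweets/day and ~200 bytes per tweet text as stated on the slides. The slide figures line up when KB, MB and GB are read as binary units (1 KB = 1024 bytes).

```python
# Back-of-the-envelope tweet throughput, using the figures quoted on the slides.
# Assumptions: 110M tweets/day (slide 10), ~200 bytes per tweet text (slide 12).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = 110_000_000
bytes_per_tweet = 200

# 110M / 86,400 is about 1273 tweets/s, rounded on the slide to "≈ 1200"
tweets_per_sec = tweets_per_day / SECONDS_PER_DAY

# Use the rounded 1200 TPS, as slide 13 does
bytes_per_sec = 1200 * bytes_per_tweet                    # 240,000 B/s
kib_per_sec = bytes_per_sec / 1024                        # ~234 KiB/s
mib_per_min = bytes_per_sec * 60 / 1024**2                # ~13.7 MiB/min
gib_per_day = bytes_per_sec * SECONDS_PER_DAY / 1024**3   # ~19.3 GiB/day

print(f"{tweets_per_sec:.0f} tweets/s, {kib_per_sec:.0f} KiB/s, "
      f"{mib_per_min:.1f} MiB/min, {gib_per_day:.1f} GiB/day")
```

This counts tweet text only; metadata, indexes and replication would add to it.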
14. MySQL: can’t generate IDs fast enough; centralized and a single point of failure. snowflake: highly available and uncoordinated (10k qps); compatible with the ecosystem. http://github.com/twitter/snowflake
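Snowflake can be uncoordinated because each worker embeds its own ID in every identifier it issues, so no two workers can collide. The following is a minimal Python sketch of that idea, not the project's actual code. It assumes the commonly described layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence). `EPOCH_MS`, `SnowflakeLikeGenerator` and `decode` are hypothetical names chosen for illustration.

```python
import threading
import time

# Assumed bit layout: | timestamp (41 bits) | worker id (10) | sequence (12) |
EPOCH_MS = 1288834974657          # custom epoch (assumption for this sketch)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    """Hypothetical k-ordered ID generator; one instance per worker."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # All 4096 sequence values used this ms: wait for the next ms
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def decode(id_: int) -> tuple[int, int, int]:
    """Split an ID back into (unix_ms, worker_id, sequence)."""
    sequence = id_ & SEQUENCE_MASK
    worker_id = (id_ >> SEQUENCE_BITS) & MAX_WORKER_ID
    unix_ms = (id_ >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return unix_ms, worker_id, sequence
```

For example, `SnowflakeLikeGenerator(worker_id=7).next_id()` returns IDs that increase over time and decode back to worker 7. Because the timestamp occupies the high bits, sorting IDs roughly sorts by creation time, which a random UUID would not give you.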
15. 1 TB generated per day; 10 TB generated per day. Photo used under Creative Commons from champura
16. 10 TB per day in total ≈ 120 MB per sec; 80 MB per sec. Photo used under Creative Commons from Mac Users Guide
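The per-second figure on slide 16 is a unit conversion. A worked version, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes):

```latex
\frac{10\ \text{TB/day}}{86\,400\ \text{s/day}}
  = \frac{10 \times 10^{12}\ \text{B}}{8.64 \times 10^{4}\ \text{s}}
  \approx 1.16 \times 10^{8}\ \text{B/s}
  \approx 116\ \text{MB/s} \approx 120\ \text{MB/s}
```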
17. Where do they go? Followed by / Following: an asymmetric digraph
18. Need to represent this digraph (nodes 1 to 4) as a matrix. Naïve implementation is not scalable. [Figure: a 4-node digraph and its 4×4 adjacency matrix]
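To see why the naïve matrix does not scale, compare it with adjacency sets. The sketch below is illustrative only. The edge list is hypothetical, because the edges in the slide's figure are not recoverable. The 200M user count comes from slide 19.

```python
from collections import defaultdict

# Hypothetical edges for a 4-user graph, as (follower, followee) pairs.
edges = [(1, 2), (2, 3), (3, 1), (4, 1), (1, 4)]
n = 4

# Naive approach: n x n matrix where matrix[a - 1][b - 1] means "a follows b".
matrix = [[False] * n for _ in range(n)]
for follower, followee in edges:
    matrix[follower - 1][followee - 1] = True

# Sparse approach: adjacency sets, kept in both directions because the graph
# is asymmetric ("following" is not the same as "followed by").
following = defaultdict(set)
followers = defaultdict(set)
for follower, followee in edges:
    following[follower].add(followee)
    followers[followee].add(follower)

# Matrix storage grows with n^2 regardless of how many follows exist.
# At 200M users, even one bit per cell is 5 * 10**15 bytes (about 5 PB).
users = 200_000_000
matrix_bytes = users * users // 8
```

Here `matrix[0][1]` is set (1 follows 2) while `matrix[1][0]` is not, which is the asymmetry the previous slide describes. The sparse sets grow only with the number of follow edges, so nearly all of the matrix's n² cells would sit empty.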
19. 200M registered users [Chart: registered users, 2006 to 2011]
20. flockdb: distributed graph database; high rate of CRUD operations; complex set arithmetic queries. http://github.com/twitter/flockdb Photo used under Creative Commons from jurvetson
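"Set arithmetic queries" means combining follower and following lists with intersection, difference and union. The sketch below uses plain Python sets on a hypothetical graph to show the kinds of query meant. It does not use FlockDB's API. `followers_of` scans every user, which is exactly the cost a graph store at scale would avoid by indexing edges in both directions (an assumption here, not a description of FlockDB's internals).

```python
# Hypothetical follow graph: user -> set of accounts they follow.
following = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"carol", "erin"},
    "carol": {"alice", "bob"},
    "dave": {"carol"},
    "erin": {"alice"},
}


def followers_of(user: str) -> set[str]:
    """Reverse lookup by full scan (fine for a toy graph, not at scale)."""
    return {u for u, fs in following.items() if user in fs}


# Intersection: whom do alice and bob both follow?
both_follow = following["alice"] & following["bob"]      # {"carol"}

# Difference: whom does alice follow that bob does not?
only_alice = following["alice"] - following["bob"]       # {"bob", "dave"}

# Mutual follows: accounts alice follows that also follow her back.
mutuals = following["alice"] & followers_of("alice")     # {"carol"}
```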
21. @ladygaga (mother mons†er): 8.3 million followers; @justinbieber (Justin Bieber): 7.5 million followers; @BarackObama (44th President of the United States): 6.7 million followers; @raffi (me!): 0.007 million followers
22. How do they get out? 10B API calls per day ≈ 100,000 calls per second
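The API rate on slide 22 is another per-day to per-second conversion. Worked out, it shows that "≈ 100,000" is an order-of-magnitude rounding:

```latex
\frac{10 \times 10^{9}\ \text{calls/day}}{86\,400\ \text{s/day}}
  \approx 1.16 \times 10^{5}\ \text{calls/s}
```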
23. REST API: XML/JSON API over HTTP; poll-based system / pseudo real-time. hosebird: Streaming API; long-poll HTTP; near real-time delivery of Tweets.
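With polling, the client repeatedly asks "anything new?". With a streaming endpoint, the client holds one HTTP response open and reads messages as the server writes them. The sketch below shows the consuming side. It assumes newline-delimited JSON framing with blank lines used as keep-alives, which is an assumption for illustration and not a documented hosebird contract. `iter_stream` is a hypothetical helper. It accepts any file-like object that yields lines, so an in-memory buffer stands in for the network connection here.

```python
import io
import json
from typing import BinaryIO, Iterator


def iter_stream(stream: BinaryIO) -> Iterator[dict]:
    """Yield one parsed message per line of a long-lived response body.

    Assumed framing: one JSON object per line; blank lines are keep-alives
    that the server sends to hold an idle connection open.
    """
    for raw in stream:            # blocks until the next line arrives
        line = raw.strip()
        if not line:
            continue              # keep-alive, nothing to deliver
        yield json.loads(line)


# Stand-in for an open HTTP response: two messages with a keep-alive between.
sample = io.BytesIO(
    b'{"id": 1, "text": "hello"}\r\n'
    b'\r\n'
    b'{"id": 2, "text": "world"}\r\n'
)
messages = list(iter_stream(sample))
```

Each message is handled as soon as its line arrives, so delivery latency depends on the connection rather than on a polling interval.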
24. Latency [Chart: latency axis, 0 ms to 200 ms]
25. 752% in 2008
26. 1358% in 2009
27. Where do we want to be? Today: 200M people generate ~1200 TPS. Tomorrow: we want to support half the world and all its devices (5B phones and 6B people).
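For a rough sense of scale, suppose the per-person tweet rate stayed constant (a simplifying assumption, not a claim from the talk). Supporting half of 6B people, about 3B, would then multiply today's load by 15:

```latex
\text{TPS}_{\text{tomorrow}}
  \approx 1200\ \text{TPS} \times \frac{3 \times 10^{9}}{2 \times 10^{8}}
  = 1200 \times 15 = 18\,000\ \text{TPS}
```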
28. Real challenges in front of us: real time; indexing, search, and analytics; relevance systems; graph databases; storage; scalability and efficiency.