Giving a talk at Cal’s CS10 - The
Beauty and Joy of Computing.
inst.eecs.berkeley.edu/~cs10/fa10
20 Oct via Twitter for iPhone
from UC Berkeley
155 Donner Lab
Berkeley, CA
What’s the data structure of a tweet?
{
"id": 787167620,
"text": "Free",
"date": "2008-04-11 13:27:26",
"user": "jamesbuck"
}*
*not quite, but something like this
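The record above can be loaded into a small typed structure. This is only a sketch under the assumption that a tweet has exactly the four fields shown here; `Tweet` and `parse_tweet` are hypothetical names for illustration, not part of any Twitter API, and the real payload carries more fields.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    """Hypothetical in-memory form of the simplified tweet record."""
    id: int
    text: str
    date: datetime
    user: str


def parse_tweet(raw: str) -> Tweet:
    """Parse the simplified JSON record into a Tweet (illustrative helper)."""
    fields = json.loads(raw)
    return Tweet(
        id=fields["id"],
        text=fields["text"],
        # The slide's timestamp format: "YYYY-MM-DD HH:MM:SS"
        date=datetime.strptime(fields["date"], "%Y-%m-%d %H:%M:%S"),
        user=fields["user"],
    )


raw = (
    '{"id": 787167620, "text": "Free", '
    '"date": "2008-04-11 13:27:26", "user": "jamesbuck"}'
)
tweet = parse_tweet(raw)
print(tweet.user, tweet.date.year)
```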
@ladygaga
mother mons†er
6.8 million followers
@justinbieber
Justin Bieber
5.7 million followers
@BarackObama
44th President of the United States
5.6 million followers
@raffi
me!
5.1 thousand followers
@ladygaga
mother mons†er
6.8 x 10^6 followers
@raffi
me!
5.1 x 10^3 followers
differs by 10^3
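The gap between the two follower counts can be checked with a quick calculation. The variable names are illustrative; the figures are the ones quoted above.

```python
import math

lady_gaga = 6.8e6  # followers, as quoted above
raffi = 5.1e3      # followers, as quoted above

ratio = lady_gaga / raffi
orders = round(math.log10(ratio))  # nearest power of ten
print(f"ratio is about {ratio:.0f}, roughly 10^{orders}")
```

A single tweet from one account may therefore need to reach about a thousand times more timelines than a tweet from the other.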
MySQL
Can’t generate IDs fast enough
Centralized and a single point of failure
snowflake
Highly available and uncoordinated (10k QPS)
Compatible with the ecosystem
http://github.com/twitter/snowflake
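One common way to generate IDs without a central database is to pack a millisecond timestamp, a worker number, and a per-millisecond sequence into one 64-bit integer. The sketch below assumes a 41/10/12 bit split and a custom epoch; the `IdGenerator` class, the constants, and the epoch value are assumptions for illustration, not a copy of the snowflake code base.

```python
import threading
import time

EPOCH_MS = 1288834974657  # assumed custom epoch, in milliseconds
WORKER_BITS = 10          # assumed: up to 1024 workers
SEQUENCE_BITS = 12        # assumed: up to 4096 IDs per worker per ms
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    """Hypothetical per-worker generator; workers never talk to each other."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = IdGenerator(worker_id=1)
ids = [gen.next_id() for _ in range(5)]
print(ids)
```

Because the timestamp occupies the high bits, IDs from one worker sort roughly by creation time, and two workers with different `worker_id` values cannot collide.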
How big are they?
1 tweet text = 140 characters ≈ 200 bytes
1000 tweets per second ≈ 195 KB/sec
≈ 11 MB/min
≈ 16 GB/day
Just tweet text!
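The figures above follow from simple multiplication. The sketch assumes binary units (1 KB = 1024 bytes), which is what reproduces the quoted numbers; the constant names are illustrative.

```python
BYTES_PER_TWEET = 200     # from above: 140 characters is about 200 bytes
TWEETS_PER_SECOND = 1000  # from above

KB, MB, GB = 1024, 1024**2, 1024**3  # binary units (assumption)

per_second = BYTES_PER_TWEET * TWEETS_PER_SECOND
per_minute = per_second * 60
per_day = per_minute * 60 * 24

print(f"{per_second / KB:.0f} KB/sec")
print(f"{per_minute / MB:.0f} MB/min")
print(f"{per_day / GB:.0f} GB/day")
```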
Digraph
Need to represent this
[Figure: directed graph on nodes 1, 2, 3, 4]
Matrix
Naïve implementation is not scalable
[Figure: adjacency matrix with rows and columns 1, 2, 3, 4]
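To see why a matrix does not scale, compare it with per-node adjacency sets. The edge list below is hypothetical (the edges of the original figure are not recoverable), and the names `matrix` and `following` are illustrative.

```python
# Hypothetical "follows" edges on four nodes; not the edges from the figure.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 1)]

# Adjacency matrix: one cell per ordered pair of nodes, so n * n cells.
index = {node: i for i, node in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for src, dst in edges:
    matrix[index[src]][index[dst]] = 1

# Adjacency sets: storage grows with the number of edges instead.
following = {node: set() for node in nodes}
for src, dst in edges:
    following[src].add(dst)

matrix_cells = len(nodes) ** 2
print(matrix_cells, "matrix cells for", len(edges), "edges")
```

With four nodes the difference is trivial, but a dense matrix over 160M accounts would need about 2.56 x 10^16 cells, almost all of them zero.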
Photo used under Creative Commons from jurvetson
flockdb
Distributed graph database
High rate of CRUD operations
Complex set arithmetic queries
http://github.com/twitter/flockdb
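"Set arithmetic" here means combining edge sets with intersection, union, and difference. The sketch uses plain Python sets with made-up follower names to show the kind of query involved; it is not the flockdb API.

```python
# Hypothetical follower sets; account names on the right are invented.
followers = {
    "ladygaga": {"alice", "bob", "carol"},
    "raffi": {"bob", "carol", "dave"},
}

# Who follows both accounts? (intersection)
both = followers["ladygaga"] & followers["raffi"]

# Who follows at least one of them? (union)
either = followers["ladygaga"] | followers["raffi"]

# Who follows @ladygaga but not @raffi? (difference)
only_gaga = followers["ladygaga"] - followers["raffi"]

print(sorted(both), sorted(either), sorted(only_gaga))
```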
Where do we want to be?
Today - 160M people generate ~1000 TPS
Tomorrow - we want to support half the world and all its devices
(right now, there are 6B people and 5B phones)