Small talk -- how many use Twitter? How many tweeted today? How many checked today? Feel free to tweet during the talk, I won’t get offended.
Who am I? I went to a couple universities, worked in a few places, now I work on the analytics infrastructure at Twitter.
The NoSQL term is bad because it defines something by what it is not, conflating a number of different techs. I will be talking about scaling problems and big data problems.
Will check if there is time left over.
Events that happen during the same millisecond are ordered semi-arbitrarily (it depends on which DC/worker they hit). We are OK with that.
DC and worker ids come from config + ZK sanity check.
VoltDB independently came up with basically the same approach. It’s amusing to look through their code and find the same solutions to the same weird corner cases.
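A rough sketch of the kind of ID layout these notes describe: a millisecond timestamp in the high bits, then DC and worker ids, then a per-millisecond counter. The bit widths, the epoch value, and the `IdGenerator` name are assumptions for illustration, not taken from the slides; the DC and worker ids are passed in directly, standing in for the config + ZK check.

```python
import threading
import time

# Assumed layout (not stated in these notes):
# [timestamp ms | 5-bit DC id | 5-bit worker id | 12-bit sequence]
DC_BITS = 5
WORKER_BITS = 5
SEQ_BITS = 12
SEQ_MASK = (1 << SEQ_BITS) - 1
EPOCH_MS = 1288834974657  # illustrative custom epoch, an assumption


class IdGenerator:
    """Hypothetical time-ordered ID generator for one (DC, worker) pair."""

    def __init__(self, dc_id, worker_id, clock=None):
        if not 0 <= dc_id < (1 << DC_BITS):
            raise ValueError("dc_id out of range")
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.dc_id = dc_id
        self.worker_id = worker_id
        # The clock is injectable so the behavior can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # One of the weird corner cases: refuse to issue ids out of order.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & SEQ_MASK
                if self._seq == 0:
                    # Counter exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (DC_BITS + WORKER_BITS + SEQ_BITS))
                | (self.dc_id << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self._seq
            )
```

With this layout, ids from different milliseconds sort by time, while ids from the same millisecond sort by DC id, then worker id, then counter. That is the semi-arbitrary ordering mentioned above.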
This is a slow query, even if you have indexes. We’ll talk about the indexes in a sec, but first let’s consider whether it even makes sense to run this query.
Knowing we are dealing with a list saves a lot of client-side code; merging lists in the store allows consistency control.
Logs are immutable; HDFS is great. Tables have mutable data. Ignore updates? Bad data. Pull updates, resolve at read time? Pain, time. Pull updates, resolve in batches? Pain, time. Let someone else do the resolving? Helloooo, HBase! Bonus: lookups, projection push-downs.
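To make the "pull updates, resolve" options concrete, here is a minimal sketch of the resolving step itself: merge an older snapshot with later updates and keep the newest version of each row. The `resolve` function and the `id` / `ts` field names are hypothetical, chosen only for illustration.

```python
def resolve(base_rows, updates):
    """Merge a snapshot with later updates, keeping the newest row per key.

    Rows are dicts with a hypothetical 'id' key and a 'ts' timestamp.
    On a timestamp tie, the row seen later in the input wins.
    """
    latest = {}
    for row in list(base_rows) + list(updates):
        key = row["id"]
        if key not in latest or row["ts"] >= latest[key]["ts"]:
            latest[key] = row
    # Sort by key so the output is deterministic.
    return sorted(latest.values(), key=lambda r: r["id"])


snapshot = [
    {"id": 1, "ts": 10, "name": "a"},
    {"id": 2, "ts": 10, "name": "b"},
]
changes = [
    {"id": 2, "ts": 20, "name": "b2"},
    {"id": 3, "ts": 15, "name": "c"},
]
merged = resolve(snapshot, changes)
```

Doing this by hand means rereading the whole snapshot on every pass, which is where the pain and time come from. A store that keeps timestamped versions per cell and serves the latest one takes that work off your hands.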