Nosql at twitter_devoxx2010

Published in: Technology
  • Small talk: how many of you use Twitter? How many tweeted today? How many checked it today? Feel free to tweet during the talk; I won’t get offended.

    Who am I? I went to a couple of universities, worked in a few places, and now work on the analytics infrastructure at Twitter.
  • The term “NoSQL” is a poor one: it defines something by what it is not, and it conflates a number of different technologies.
    I will be talking about scaling problems and big-data problems.

  • Will check if there is time left over.





  • Events that happen during the same millisecond are ordered semi-arbitrarily (the order depends on which datacenter/worker they hit). We are OK with that.

    Datacenter (DC) and worker IDs come from config, plus a ZooKeeper (ZK) sanity check.
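
    The ordering behavior described in this note can be sketched as follows. This is a minimal illustration, not the code from the talk: the bit widths, the `IdGenerator` class, and the injectable `clock` parameter are all assumptions made for the example. The timestamp occupies the high bits, so IDs sort by millisecond first; within one millisecond, order falls out of the datacenter and worker IDs.

```python
import threading
import time

# Bit widths are assumptions for illustration, not taken from the talk.
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1
WORKER_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS


class IdGenerator:
    """Hypothetical generator of roughly time-ordered 64-bit IDs."""

    def __init__(self, datacenter_id, worker_id, clock=None):
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._datacenter_id = datacenter_id
        self._worker_id = worker_id
        # The clock returns milliseconds; it is injectable for testing.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the per-worker sequence.
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                (now << TIMESTAMP_SHIFT)
                | (self._datacenter_id << DATACENTER_SHIFT)
                | (self._worker_id << WORKER_SHIFT)
                | self._sequence
            )
```

    Two generators with different datacenter or worker IDs can never collide, and no coordination is needed at generation time; the cost is exactly the semi-arbitrary ordering within a millisecond that the note accepts.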


  • VoltDB independently came up with basically the same approach.
    It’s amusing to look through their code and find the same solutions to the same weird corner cases.









  • This is a slow query, even if you have indexes.
    We’ll talk about the indexes in a sec, but first let’s consider whether it even makes sense to run this query.





  • Knowing that we are dealing with a list saves a lot of client-side code, and merging lists in the store allows consistency control.
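
    To make the point about list-aware storage concrete, here is a toy in-memory sketch. The `ListStore` class and its methods are hypothetical names invented for this example and do not reflect any real store's API. The idea shown is that when the store knows each value is a sorted, duplicate-free list of IDs, appends and merges become store-side operations and the client no longer has to read, modify, and write back whole lists.

```python
import bisect


class ListStore:
    """Toy store whose values are sorted, duplicate-free lists of IDs."""

    def __init__(self):
        self._lists = {}

    def append(self, key, item_id):
        """Insert item_id into the list at key, keeping it sorted and unique."""
        ids = self._lists.setdefault(key, [])
        pos = bisect.bisect_left(ids, item_id)
        if pos == len(ids) or ids[pos] != item_id:
            ids.insert(pos, item_id)

    def merge(self, key, other_ids):
        """Merge another copy of the list (e.g. from a replica) into ours.

        Because inserts are idempotent and order-independent, merging the
        same IDs twice, or in a different order, yields the same list.
        """
        for item_id in other_ids:
            self.append(key, item_id)

    def latest(self, key, limit):
        """Return up to `limit` of the highest IDs, newest first."""
        if limit <= 0:
            return []
        ids = self._lists.get(key, [])
        return ids[-limit:][::-1]
```

    Because the merge is idempotent and commutative, the store can reconcile diverged copies of a list on its own terms, which is one way to read "consistency control" in the note above.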
































  • Logs are immutable, so HDFS is great for them. Tables have mutable data.
    Ignore updates? Bad data. Pull updates and resolve at read time? Pain, time.
    Pull updates and resolve in batches? Pain, time. Let someone else do the resolving? Helloooo, HBase!
    Bonus: lookups and projection push-downs.
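
    The "resolve" step these options keep running into can be sketched as a last-write-wins merge. This is an illustration under stated assumptions, not code from the talk: the `resolve_latest` function and the row layout (a key mapped to a timestamp and a value) are invented for the example. HBase does the equivalent work itself by storing timestamped versions of each cell and returning the newest by default.

```python
def resolve_latest(snapshot, updates):
    """Apply updates to a snapshot, keeping the newest version of each row.

    snapshot: dict mapping row key -> (timestamp, value)
    updates:  iterable of (row key, timestamp, value) tuples, in any order
    Returns a new dict; the snapshot itself is not modified.
    """
    resolved = dict(snapshot)
    for key, timestamp, value in updates:
        current = resolved.get(key)
        # On equal timestamps the update seen last wins (an arbitrary choice).
        if current is None or timestamp >= current[0]:
            resolved[key] = (timestamp, value)
    return resolved
```

    Doing this at read time means paying the merge cost on every query; doing it in batches means rewriting large files on a schedule. Both are the "pain, time" the note refers to.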









