NoSQL at Twitter (Devoxx 2010)

Published in: Technology
  • Small talk: how many of you use Twitter? How many tweeted today? How many checked it today? Feel free to tweet during the talk; I won’t get offended.

    Who am I? I went to a couple of universities, worked in a few places, and now work on the analytics infrastructure at Twitter.
  • The term "NoSQL" is a bad one because it defines something by what it is not, conflating a number of different technologies.
    I will be talking about scaling problems and big-data problems.

  • Will check if there is time left over.





  • Events that happen during the same millisecond are ordered semi-arbitrarily (the order depends on which datacenter and worker they hit). We are OK with that.

    Datacenter and worker IDs come from configuration, plus a ZooKeeper (ZK) sanity check.
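
    The ordering behaviour described in this note can be illustrated with a minimal sketch of a time-ordered ID generator. It is not the code from the talk: the bit widths, the custom epoch, and the names `IdGenerator` and `next_id` are assumptions chosen for illustration, and the ZooKeeper sanity check on the configured IDs is left out.

```python
import threading
import time

# All widths and the epoch below are assumptions for illustration;
# the speaker notes do not state the actual layout.
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12
EPOCH_MS = 1262304000000  # arbitrary custom epoch: 2010-01-01T00:00:00Z

WORKER_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


def _now_ms():
    return int(time.time() * 1000)


class IdGenerator:
    """Hypothetical generator: timestamp | datacenter | worker | sequence."""

    def __init__(self, datacenter_id, worker_id, clock=_now_ms):
        # In the talk these IDs come from config and are sanity-checked
        # against ZooKeeper; here they are only range-checked.
        if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._datacenter_id = datacenter_id
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                # Same millisecond: bump the per-worker sequence.
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # Sequence exhausted; spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << TIMESTAMP_SHIFT)
                | (self._datacenter_id << DATACENTER_SHIFT)
                | (self._worker_id << WORKER_SHIFT)
                | self._sequence
            )
```

    Because the timestamp occupies the high bits, IDs from different milliseconds sort by time. Within one millisecond, the datacenter and worker bits decide the order, which is the "semi-arbitrary" ordering the note accepts.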


  • VoltDB independently came up with basically the same approach.
    It’s amusing to look through their code and find the same solutions to the same weird corner cases.









  • This is a slow query, even if you have indexes.
    We’ll talk about the indexes in a sec, but first let’s consider whether it even makes sense to run this query.





  • Knowing that we are dealing with a list saves a lot of client-side code, and merging lists in the store allows consistency control.
































  • Logs are immutable, so HDFS is great for them. Tables have mutable data.
    Ignore updates? Bad data. Pull updates and resolve at read time? Pain, time.
    Pull updates and resolve in batches? Pain, time. Let someone else do the resolving? Helloooo, HBase!
    Bonus: lookups and projection push-downs.
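
    To make the "resolve at read time" option concrete, here is a minimal sketch of what every reader would have to do if updates were simply appended next to an immutable snapshot. The function name `resolve_at_read`, the record shapes, and the last-write-wins rule by version number are assumptions for illustration, not the pipeline described in the talk.

```python
def resolve_at_read(snapshot, updates):
    """Merge an immutable snapshot with a log of later updates.

    snapshot: dict mapping key -> (version, row)
    updates:  iterable of (key, version, row); row is None for a delete
    Returns a dict mapping key -> row holding the newest live version.
    """
    merged = dict(snapshot)
    for key, version, row in updates:
        current = merged.get(key)
        # Last write wins: keep whichever record has the higher version.
        if current is None or version >= current[0]:
            merged[key] = (version, row)
    # Drop tombstones only at the end, so a stale update cannot
    # resurrect a row that a newer delete removed.
    return {key: row for key, (_, row) in merged.items() if row is not None}


snapshot = {1: (1, {"name": "a"}), 2: (1, {"name": "b"})}
updates = [
    (1, 2, {"name": "a2"}),    # newer version of row 1
    (2, 3, None),              # row 2 deleted
    (3, 1, {"name": "c"}),     # new row
    (1, 1, {"name": "stale"}), # out-of-order, older than what we have
]
current_rows = resolve_at_read(snapshot, updates)
```

    Every job that reads the table pays this merge cost again, which is the "pain, time" in the note. A store that keeps versioned cells and returns the latest one on read, as HBase does, takes that work off the reader.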









