Handling Data in Mega Scale Systems


Handling Data in Mega Scale Systems - Presentation Transcript

  1. Intelligent People. Uncommon Ideas.
    Handling Data in Mega Scale Web Apps(lessons learnt @ Directi)
    Vineet Gupta | GM – Software Engineering | Directi
    http://vineetgupta.spaces.live.com
    Licensed under Creative Commons Attribution Sharealike Noncommercial
  2. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  3. Not Covering
    Offline Processing (Batching / Queuing)
    Distributed Processing – Map Reduce
    Non-blocking IO
    Fault Detection, Tolerance and Recovery
  4. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  5. How Big Does it Get
    22M+ users
    Dozens of DB servers
    Dozens of Web servers
    Six specialized graph database servers to run recommendations engine
    Source:http://highscalability.com/digg-architecture
  6. How Big Does it Get
    1 TB / Day
    100 M blogs indexed / day
    10 B objects indexed / day
    0.5 B photos and videos
    Data doubles in 6 months
    Users double in 6 months
    Source:http://www.royans.net/arch/2007/10/25/scaling-technorati-100-million-blogs-indexed-everyday/
  7. How Big Does it Get
    2 PB Raw Storage
    470 M photos, 4-5 sizes each
    400 k photos added / day
    35 M photos in Squid cache (total)
    2 M photos in Squid RAM
    38k reqs / sec to Memcached
    4 B queries / day
    Source:http://mysqldba.blogspot.com/2008/04/mysql-uc-2007-presentation-file.html
  8. How Big Does it Get
    Virtualized database spans 600 production instances residing in 100+ server clusters distributed over 8 datacenters
    2 PB of data
    26 B SQL queries / day
    1 B page views / day
    3 B API calls / month
    15,000 App servers
    Source:http://highscalability.com/ebay-architecture/
  9. How Big Does it Get
    450,000 low cost commodity servers in 2006
    Indexed 8 B web-pages in 2005
    200 GFS clusters (1 cluster = 1,000 – 5,000 machines)
    Read / write throughput = 40 GB / sec across a cluster
    Map-Reduce
    100k jobs / day
    20 PB of data processed / day
    10k MapReduce programs
    Source:http://highscalability.com/google-architecture/
  10. Key Trends
    Data Size ~ PB
    Data Growth ~ TB / day
    No of servers – 10s to 10,000
    No of datacenters – 1 to 10
    Queries – B+ / day
    Specialized needs – more / other than RDBMS
  11. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  12. Vertical Scaling (Scaling Up)
    [Diagram: a single Host running the App Server and the DB Server, with multiple CPU and RAM units]
  13. Big Irons
                 Sunfire E20k               PowerEdge SC1435
    Processors   36x 1.8GHz processors      Dualcore 1.8 GHz processor
    Price        $450,000 - $2,500,000      Around $1,500
  14. Vertical Scaling (Scaling Up)
    Increasing the hardware resources on a host
    Pros
    Simple to implement
    Fast turnaround time
    Cons
    Finite limit
    Hardware does not scale linearly (diminishing returns for each incremental unit)
    Requires downtime
    Increases Downtime Impact
    Incremental costs increase exponentially
  15. Vertical Partitioning of Services
    [Diagram: two Hosts, one running the App Server and one running the DB Server]
  16. Vertical Partitioning of Services
    Split services on separate nodes
    Each node performs different tasks
    Pros
    Increases per application Availability
    Task-based specialization, optimization and tuning possible
    Reduces context switching
    Simple to implement for out of band processes
    No changes to App required
    Flexibility increases
    Cons
    Sub-optimal resource utilization
    May not increase overall availability
    Finite Scalability
  17. Horizontal Scaling of App Server
    [Diagram: a Load Balancer in front of three Web Servers, with one DB Server]
  18. Horizontal Scaling of App Server
    Add more nodes for the same service
    Identical, doing the same task
    Load Balancing
    Hardware balancers are faster
    Software balancers are more customizable
  19. The problem - State
    [Diagram: User 1 and User 2, a Load Balancer, three Web Servers and one DB Server]
  20. Sticky Sessions
    [Diagram: User 1 and User 2, a Load Balancer, three Web Servers and one DB Server]
    Asymmetrical load distribution
    Downtime
  21. Central Session Store
    [Diagram: User 1 and User 2, a Load Balancer, three Web Servers and one Session Store]
    SPOF
    Reads and Writes generate network + disk IO
  22. Clustered Sessions
    [Diagram: User 1 and User 2, a Load Balancer and three Web Servers]
  23. Clustered Sessions
    Pros
    No SPOF
    Easier to set up
    Fast Reads
    Cons
    n x Writes
    Increase in network IO with increase in nodes
    Stale data (rare)
  24. Sticky Sessions with Central Store
    [Diagram: User 1 and User 2, a Load Balancer, three Web Servers and one DB Server]
  25. More Session Management
    No Sessions
    Stuff state in a cookie and sign it!
    Cookie is sent with every request / response
    Super Slim Sessions
    Keep small amount of frequently used data in cookie
    Pull rest from DB (or central session store)
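The "stuff state in a cookie and sign it" idea can be sketched roughly as follows. This is a minimal illustration, not the deck's implementation: the key, the function names and the payload layout are all assumptions, and a real deployment would also want expiry and key rotation.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, for illustration only


def encode_cookie(state: dict) -> str:
    """Serialize the state and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def decode_cookie(cookie: str):
    """Return the state if the signature verifies, otherwise None."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None  # cookie was tampered with or signed with another key
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


cookie = encode_cookie({"user_id": 42, "theme": "dark"})
state = decode_cookie(cookie)
```

Because any web server holding the key can verify the cookie, no server-side session lookup is needed. Signing only protects integrity; the state is still readable by the client.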
  26. Sessions - Recommendation
    Bad
    Sticky sessions
    Good
    Clustered sessions for small number of nodes and / or small write volume
    Central sessions for large number of nodes or large write volume
    Great
    No Sessions!
  27. App Tier Scaling - More
    HTTP Accelerators / Reverse Proxy
    Static content caching, redirect to lighter HTTP
    Async NIO on user-side, Keep-alive connection pool
    CDN
    Get closer to your user
    Akamai, Limelight
    IP Anycasting
    Async NIO
  28. Scaling a Web App
    App-Layer
    Add more nodes and load balance!
    Avoid Sticky Sessions
    Avoid Sessions!!
    Data Store
    Tricky! Very Tricky!!!
  29. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  30. Replication = Scaling by Duplication
    [Diagram: App Layer over a single node holding tables T1, T2, T3, T4]
  31. Replication = Scaling by Duplication
    [Diagram: App Layer over five nodes, each holding tables T1, T2, T3, T4]
    Each node has its own copy of data
    Shared Nothing Cluster
  32. Replication
    Read : Write = 4:1
    Scale reads at cost of writes!
    Duplicate Data – each node has its own copy
    Master Slave
    Writes sent to one node, cascaded to others
    Multi-Master
    Writes can be sent to multiple nodes
    Can lead to deadlocks
    Requires conflict management
  33. Master-Slave
    [Diagram: App Layer, one Master and four Slaves]
    n x Writes – Async vs. Sync
    SPOF
    Async - Critical Reads from Master!
  34. Multi-Master
    [Diagram: App Layer, two Masters and three Slaves]
    n x Writes – Async vs. Sync
    No SPOF
    Conflicts!
  35. Replication Considerations
    Asynchronous
    Guaranteed, but out-of-band replication from Master to Slave
    Master updates its own db and returns a response to client
    Replication from Master to Slave takes place asynchronously
    Faster response to a client
    Slave data is marginally behind the Master
    Requires modification to App to send critical reads and writes to master, and load balance all other reads
    Synchronous
    Guaranteed, in-band replication from Master to Slave
    Master updates its own db, and confirms all slaves have updated their db before returning a response to client
    Slower response to a client
    Slaves have the same data as the Master at all times
    Requires modification to App to send writes to master and load balance all reads
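The application-side change described above (writes and critical reads go to the master, other reads are load balanced) might look like the sketch below. The class and method names are hypothetical, and nodes are represented by plain strings rather than real connections.

```python
import itertools


class ReplicatedRouter:
    """Chooses which database node should serve a request."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master if there are no slaves to balance across.
        self._slaves = itertools.cycle(slaves or [master])

    def node_for_write(self):
        # All writes go to the master, which replicates them to the slaves.
        return self.master

    def node_for_read(self, critical=False):
        # With async replication a slave may lag, so reads that must see
        # the latest write are sent to the master.
        if critical:
            return self.master
        return next(self._slaves)  # simple round-robin over slaves


router = ReplicatedRouter("master", ["slave-1", "slave-2"])
```

With synchronous replication the `critical` flag becomes unnecessary, since every slave already holds the master's data when the write returns.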
  36. Replication Considerations
    Replication at RDBMS level
    Support may exist in RDBMS or through 3rd party tool
    Faster and more reliable
    App must send writes to Master, reads to any db and critical reads to Master
    Replication at Driver / DAO level
    Driver / DAO layer ensures
    writes are performed on all connected DBs
    Reads are load balanced
    Critical reads are sent to a Master
    In most cases RDBMS agnostic
    Slower and in some cases less reliable
  37. Diminishing Returns
    Per server: 4R, 1W; then 2R, 1W; then 1R, 1W
    [Diagram: read and write load on each server as replicas are added]
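The arithmetic behind the diminishing returns can be sketched as follows, assuming the 4:1 read-to-write ratio from the Replication slide and assuming that every replica must apply every write. The function names and the server counts (1, 2 and 4) are illustrative choices that reproduce the 4R, 2R and 1R figures.

```python
def per_server_load(total_reads, total_writes, servers):
    """Reads are spread across replicas; every replica applies every write."""
    return total_reads / servers, total_writes


def read_fraction(total_reads, total_writes, servers):
    """Share of each server's work that is spent on reads."""
    reads, writes = per_server_load(total_reads, total_writes, servers)
    return reads / (reads + writes)


for n in (1, 2, 4):
    reads, writes = per_server_load(4, 1, n)
    print(f"{n} server(s): {reads:g}R, {writes}W per server")
```

Each added replica takes a smaller slice of reads off the others while the write load per server stays constant, so with four servers half of every server's work is already writes.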
  38. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  39. Partitioning = Scaling by Division
    Vertical Partitioning
    Divide data on tables / columns
    Scale to as many boxes as there are tables or columns
    Finite
    Horizontal Partitioning
    Divide data on rows
    Scale to as many boxes as there are rows!
    Limitless scaling
  40. Vertical Partitioning
    [Diagram: App Layer over a single node holding tables T1, T2, T3, T4, T5]
    Note: A node here typically represents a shared nothing cluster
  41. Vertical Partitioning
    [Diagram: App Layer over five nodes, one each for tables T1, T2, T3, T4 and T5]
    Facebook - User table, posts table can be on separate nodes
    Joins need to be done in code (Why have them?)
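A join done in code across two vertically partitioned nodes could look like this sketch. The two "nodes" are in-memory structures standing in for separate databases, and all names and rows are made up for illustration.

```python
# Hypothetical users node and posts node, modelled in memory.
USERS_NODE = {1: {"name": "alice"}, 2: {"name": "bob"}}
POSTS_NODE = [
    {"post_id": 10, "user_id": 1, "text": "hello"},
    {"post_id": 11, "user_id": 2, "text": "hi"},
    {"post_id": 12, "user_id": 1, "text": "again"},
]


def posts_with_authors():
    """Join posts to users in the application instead of in SQL."""
    posts = list(POSTS_NODE)  # query 1: fetch posts from the posts node
    user_ids = {p["user_id"] for p in posts}
    # Query 2: one batched lookup on the users node, not one per post.
    users = {uid: USERS_NODE[uid] for uid in user_ids}
    return [{**p, "author": users[p["user_id"]]["name"]} for p in posts]


joined = posts_with_authors()
```

Batching the second lookup keeps the cost at two round trips regardless of how many posts are returned.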
  42. Horizontal Partitioning
    [Diagram: App Layer over three nodes, each holding tables T1, T2, T3, T4 and T5]
    First million rows
    Second million rows
    Third million rows
  43. Horizontal Partitioning Schemes
    Value Based
    Split on timestamp of posts
    Split on first letter of user name
    Hash Based
    Use a hash function to determine cluster
    Lookup Map
    First Come First Serve
    Round Robin
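The three families of schemes above can be sketched side by side. Everything here is illustrative: the function names, the letter boundaries and the choice of MD5 as the hash are assumptions, not something the slides specify.

```python
import hashlib


def shard_by_first_letter(user_name, boundaries=("h", "q")):
    """Value based: a-g -> shard 0, h-p -> shard 1, q-z -> shard 2."""
    first = user_name[0].lower()
    return sum(first >= b for b in boundaries)


def shard_by_hash(key, num_shards):
    """Hash based: a stable hash of the key picks the shard."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class LookupMapPartitioner:
    """Lookup map: assign new keys round robin and remember the choice."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.assignments = {}

    def shard_for(self, key):
        if key not in self.assignments:
            self.assignments[key] = len(self.assignments) % self.num_shards
        return self.assignments[key]
```

The lookup map costs an extra read per request but lets individual keys be moved later; the value and hash based schemes need no lookup but tie placement to the key itself.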
  44. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  45. CAP Theorem
    Source:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.1495
  46. Transactions
    Transactions make you feel alone
    No one else manipulates the data when you are
    Transactional serializability
    The behavior is as if a serial order exists
    Source:http://blogs.msdn.com/pathelland/
  47. Life in the “Now”
    Transactions live in the “now” inside services
    Time marches forward
    Transactions commit
    Advancing time
    Transactions see the committed transactions
    A service’s biz-logic lives in the “now”
    Source:http://blogs.msdn.com/pathelland/
  48. Sending Unlocked Data Isn’t “Now”
    Messages contain unlocked data
    Assume no shared transactions
    Unlocked data may change
    Unlocking it allows change
    Messages are not from the “now”
    They are from the past
    There is no simultaneity at a distance!
    • Similar to speed of light
    • Knowledge travels at speed of light
    • By the time you see a distant object it may have changed!
    • By the time you see a message, the data may have changed!
    Services, transactions, and locks bound simultaneity!
    • Inside a transaction, things appear simultaneous (to others)
    • Simultaneity only inside a transaction!
    • Simultaneity only inside a service!
    Source:http://blogs.msdn.com/pathelland/
  49. Outside Data: a Blast from the Past
    All data from distant stars is from the past
    • 10 light years away; 10 year old knowledge
    • The sun may have blown up 5 minutes ago
    • We won’t know for 3 minutes more…
    All data seen from a distant service is from the “past”
    By the time you see it, it has been unlocked and may change
    Each service has its own perspective
    Inside data is “now”; outside data is “past”
    My inside is not your inside; my outside is not your outside
    This is like going from Newtonian to Einsteinian physics
    • Newton’s time marched forward uniformly
    • Instant knowledge
    • Classic distributed computing: many systems look like one
    • RPC, 2-phase commit, remote method calls…
    • In Einstein’s world, everything is “relative” to one’s perspective
    • Today: No attempt to blur the boundary
    Source:http://blogs.msdn.com/pathelland/
  50. Versions and Distributed Systems
    Can’t have “the same” data at many locations
    Unless it is a snapshot
    Changing distributed data needs versions
    Creates a snapshot…
    Source:http://blogs.msdn.com/pathelland/
  51. Subjective Consistency
    Given what I know here and now, make a decision
    Remember the versions of all the data used to make this decision
    Record the decision as being predicated on these versions
    Other copies of the object may make divergent decisions
    Try to sort out conflicts within the family
    If necessary, programmatically apologize
    Very rarely, whine and fuss for human help
    Subjective Consistency
     Given the information I have at hand, make a decision and act on it !
     Remember the information at hand !
    Ambassadors Had Authority
    Back before radio, it could be months between communication with the king. Ambassadors would make treaties and much more... They had binding authority. The mess was sorted out later!
    Source:http://blogs.msdn.com/pathelland/
  52. Eventual Consistency
    Eventually, all the copies of the object share their changes
    “I’ll show you mine if you show me yours!”
    Now, apply subjective consistency:
    “Given the information I have at hand, make a decision and act on it!”
    Everyone has the same information, everyone comes to the same conclusion about the decisions to take…
    Eventual Consistency
    • Given the same knowledge, produce the same result !
    • Everyone sharing their knowledge leads to the same result...
    This is NOT magic; it is a design requirement !
    Idempotence, commutativity, and associativity of the operations(decisions made) are all implied by this requirement
    Source:http://blogs.msdn.com/pathelland/
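One way to see why idempotence, commutativity and associativity matter is a merge based on set union. This is a sketch under the assumption that each replica's knowledge is a set of observed facts; the names are hypothetical and this is not taken from the cited source.

```python
import itertools


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Combine two replicas' knowledge; union is idempotent,
    commutative and associative."""
    return a | b


def converge(replicas):
    """Fold all replicas' knowledge together in the given order."""
    result = frozenset()
    for knowledge in replicas:
        result = merge(result, knowledge)
    return result


def all_orders_agree(replicas):
    """True if every exchange order leads to the same final state."""
    outcomes = {converge(order) for order in itertools.permutations(replicas)}
    return len(outcomes) == 1


r1 = frozenset({"debit:10"})
r2 = frozenset({"credit:5", "debit:10"})
r3 = frozenset({"credit:7"})
print(all_orders_agree([r1, r2, r3]))
```

Receiving the same fact twice or in a different order does not change the result, which is what lets every copy reach the same conclusion once knowledge has been shared.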
  53. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  54. Why Normalize?
    Classic problem with de-normalization
    Can’t update Sam’s phone # since there are many copies
    Emp #  Emp Name  Mgr #  Mgr Name  Emp Phone  Mgr Phone
    47     Joe       13     Sam       5-1234     6-9876
    18     Sally     38     Harry     3-3123     5-6782
    91     Pete      13     Sam       2-1112     6-9876
    66     Mary      02     Betty     5-7349     4-0101
    Normalization’s Goal Is Eliminating Update Anomalies
    Can Be Changed Without “Funny Behavior”
    Each Data Item Lives in One Place
    De-normalization is
    OK if you aren’t going to update!
    Source:http://blogs.msdn.com/pathelland/
  55. Eliminate Joins
  56. Eliminate Joins
    6 joins for 1 query!
    Do you think FB would do this?
    And how would you do joins with partitioned data?
    De-normalization removes joins
    But increases data volume
    But disk is cheap and getting cheaper
    And can lead to inconsistent data
    If you are lazy
    However this is not really an issue
  57. “Append-Only” Data
    Many Kinds of Computing are “Append-Only”
    Lots of observations are made about the world
    Debits, credits, Purchase-Orders, Customer-Change-Requests, etc
    As time moves on, more observations are added
    You can’t change the history but you can add new observations
    Derived Results May Be Calculated
    Estimate of the “current” inventory
    Frequently inaccurate
    Historic Rollups Are Calculated
    Monthly bank statements
  58. Databases and Transaction Logs
    Transaction Logs Are the Truth
    High-performance & write-only
    Describe ALL the changes to the data
    Data-Base: the Current Opinion
    Describes the latest value of the data as perceived by the application
    Log
    DB
    The Database Is a Caching of the Transaction Log !
    It is the subset of the latest committed values represented in the transaction log…
    Source:http://blogs.msdn.com/pathelland/
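The claim that the database is a cache of the transaction log can be illustrated by replaying a log into a key-value view. The log format used here (operation, key, value tuples) is an assumption made up for the sketch.

```python
def replay(log):
    """Rebuild the current database state from an append-only log."""
    db = {}
    for op, key, value in log:
        if op == "set":
            db[key] = value  # later entries overwrite earlier ones
        elif op == "delete":
            db.pop(key, None)
    return db


log = [
    ("set", "balance:alice", 100),
    ("set", "balance:bob", 50),
    ("set", "balance:alice", 80),
    ("delete", "balance:bob", None),
]
current = replay(log)
```

The log keeps every change, while the replayed dictionary holds only the latest committed value per key, so it can always be discarded and rebuilt from the log.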
  59. We Are Swimming in a Sea of Immutable Data
    Source:http://blogs.msdn.com/pathelland/
  60. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  61. Caching
    Makes scaling easier (cheaper)
    Core Idea
    Read data from persistent store into memory
    Store in a hash-table
    Read first from cache, if not, load from persistent store
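The core read path above can be sketched as follows, with a plain dict standing in for the persistent store. The class name and the miss counter are illustrative additions.

```python
class CachedStore:
    """Read-through lookup: check the in-memory hash table first,
    then fall back to the persistent store and remember the result."""

    def __init__(self, persistent_store):
        self.store = persistent_store  # hypothetical: any dict-like store
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.misses += 1
        value = self.store[key]  # slow path: load from persistent store
        self.cache[key] = value
        return value


store = CachedStore({"user:1": "alice"})
store.get("user:1")  # miss, loads from the persistent store
store.get("user:1")  # hit, served from memory
```

This sketch never evicts or invalidates entries; a real cache needs both, otherwise it grows without bound and can serve stale data after writes.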
  62. Write thru Cache
    App Server
    Cache
  63. Write back Cache
    App Server
    Cache
  64. Sideline Cache
    App Server
    Cache
  65. Memcached
  66. How does it work
    In-memory Distributed Hash Table
    Memcached instance manifests as a process (often on the same machine as web-server)
    Memcached Client maintains a hash table
    Which item is stored on which instance
    Memcached Server maintains a hash table
    Which item is stored in which memory location
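The two levels of hashing described above can be modelled as in the sketch below. It is not the real memcached client API: the classes are hypothetical stand-ins, the server is a plain dict, and the client-side mapping is modelled as a hash of the key modulo the number of instances, which is an assumption about how the client decides where an item lives.

```python
import zlib


class FakeMemcachedServer:
    """Stand-in for one instance: a hash table from key to value."""

    def __init__(self):
        self.table = {}

    def set(self, key, value):
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)


class ShardingClient:
    """Client side: decides which instance holds which item."""

    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        index = zlib.crc32(key.encode()) % len(self.servers)
        return self.servers[index]

    def set(self, key, value):
        self._server_for(key).set(key, value)

    def get(self, key):
        return self._server_for(key).get(key)


servers = [FakeMemcachedServer() for _ in range(3)]
client = ShardingClient(servers)
client.set("user:1", "alice")
```

The instances never talk to each other; all knowledge of the distribution lives in the client. With a plain modulo, changing the number of instances remaps most keys.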
  67. Outline
    Characteristics
    App Tier Scaling
    Replication
    Partitioning
    Consistency
    Normalization
    Caching
    Data Engine Types
  68. It’s not all Relational!
    Amazon - S3, SimpleDb, Dynamo
    Google - App Engine Datastore, BigTable
    Microsoft – SQL Data Services, Azure Storages
    Facebook – Cassandra
    LinkedIn - Project Voldemort
    Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, CouchDB, Hbase, Hypertable
  69. Tuplespaces
    Basic Concepts
    No tables - Containers-Entity
    No schema - each tuple has its own set of properties
    Amazon SimpleDB – strings only
    Microsoft Azure SQL Data Services
    Strings, blob, datetime, bool, int, double, etc.
    No x-container joins as of now
    Google App Engine Datastore
    Strings, blob, datetime, bool, int, double, etc.
  70. Key-Value Stores
    Google BigTable
    Sparse, Distributed, multi-dimensional sorted map
    Indexed by row key, column key, timestamp
    Each value is an un-interpreted array of bytes
    Amazon Dynamo
    Data partitioned and replicated using consistent hashing
    Decentralized replica sync protocol
    Consistency thru versioning
    Facebook Cassandra
    Used for Inbox search
    Open Source
    Scalaris
    Keys stored in lexicographical order
    Improved Paxos to provide ACID
    Memory resident, no persistence
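The consistent hashing mentioned for Dynamo can be sketched as a hash ring with virtual nodes. This is a generic illustration, not Dynamo's actual implementation; the class name, the use of MD5 and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node is placed at several points on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        """First node clockwise from the key's position."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def preference_list(self, key, n):
        """First n distinct nodes clockwise: owner plus replicas."""
        start = bisect.bisect_right(self._points, _hash(key))
        result = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in result:
                result.append(node)
            if len(result) == n:
                break
        return result


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
replicas = ring.preference_list("user:42", 2)
```

When a node is added, a key either stays where it was or moves to the new node, so only a fraction of the data has to be transferred.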
  71. In Summary
    Real Life Scaling requires trade offs
    No Silver Bullet
    Need to learn new things
    Need to un-learn
    Balance!
  72. QUESTIONS?
  73. Intelligent People. Uncommon Ideas.
    Licensed under Creative Commons Attribution Sharealike Noncommercial

Directi Group, 3 weeks ago


Handling Data in Mega Scale Systems by Vineet Gupta
