Membase Meetup - Silicon Valley


Published in: Technology
  • Get better channels at bottom myspace, forbes
    About ShareThis, stats/metrics: a consumer-facing company, but we collect data… scale...
  • Data used as signals for user model
    Social Network used
    Search Keywords

    Membase stores user cookie -> segments

  • What we offer: display campaigns that are audience targeted.

    Example of ads, content, different partners etc.
  • We’re very serious about simplicity. Because it is based on the memcached protocol, membase is already compatible with a huge number of languages, frameworks and even applications. The verbs that the clients in those languages expose are very simple: set, get, atomic increment/decrement, append and prepend.

    membase has persistence and is designed for distributed environments, meaning we need to replicate the data. The goal is to keep the simple interface and client compatibility while making it simple to define rules about persistence and replication of data items and, just as importantly, simple to grow your capacity while maintaining consistency.

    Without changes, many applications can directly use membase as a K/V store
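
To make the verbs named in the note above concrete, here is a minimal sketch of their semantics. It is an in-memory stand-in written for illustration, not a Membase or memcached client: the class name `TinyKV` and the key names are hypothetical, and the behavior modeled (counters stored as decimal text, decrement clamped at zero, append and prepend failing on a missing key) follows the memcached protocol as generally documented.

```python
class TinyKV:
    """In-memory stand-in for a memcached-protocol store (illustration only)."""

    def __init__(self):
        self._data = {}  # key (str) -> value (bytes)

    def set(self, key, value):
        self._data[key] = bytes(value)
        return True

    def get(self, key):
        return self._data.get(key)  # None models a miss

    def incr(self, key, delta=1):
        # Counters are kept as ASCII decimal digits, as memcached does.
        if key not in self._data:
            return None
        new_value = int(self._data[key]) + delta
        self._data[key] = str(new_value).encode()
        return new_value

    def decr(self, key, delta=1):
        # memcached clamps decrements at zero instead of going negative.
        if key not in self._data:
            return None
        new_value = max(0, int(self._data[key]) - delta)
        self._data[key] = str(new_value).encode()
        return new_value

    def append(self, key, suffix):
        if key not in self._data:
            return False  # nothing to append to
        self._data[key] += bytes(suffix)
        return True

    def prepend(self, key, prefix):
        if key not in self._data:
            return False
        self._data[key] = bytes(prefix) + self._data[key]
        return True


kv = TinyKV()
kv.set("user:42:visits", b"10")
kv.incr("user:42:visits")
kv.set("user:42:log", b"login")
kv.append("user:42:log", b",share")
kv.prepend("user:42:log", b"v1:")
print(kv.get("user:42:visits"), kv.get("user:42:log"))
```

An application that already issues these calls through a memcached client library would keep the same call pattern when pointed at a membase node, which is the compatibility point the note makes.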
  • Membase Meetup - Silicon Valley

    1. Silicon Valley
    2. Tonight
       • Membase Overview
       • Use Cases and Deployment Examples
       • Membase Architecture
       • Demo!
       • Developing with Membase
       • A Glimpse into the Future
    3. What is Membase?
    4. Before: Application scales linearly, data hits wall
       • Application Scales Out: Just add more commodity web servers
       • Database Scales Up: Get a bigger, more complex server
    5. Membase is a distributed database
       • Diagram labels: Membase Servers, In the data center, Web application server, Application user, On the administrator console
    6. Built-in Memcached Caching Layer
       • Diagram labels: Memcached, Membase Database, Memcached Mode, Membase Mode
       • The Membase development team has contributed over half of the source code to the Memcached project.
    7. Deployment options (diagram; labels recovered from the slide)
       • Deployment Option 1, embedded proxy: application logic with an OTC memcached client and a server list; Membase Server with proxy and vbucket map
       • Deployment Option 2, standalone proxy: application logic with an OTC memcached client; localhost proxy with vbucket map; Membase Server
       • Deployment Option 3, “vBucket-aware” client: application logic with a NEW memcached client holding the vbucket map; Membase Server
       • Other labels: OTC Memcached Server, data operations, cluster operations, ports 11210 and 11211
    8. Secure multitenant support
       • Diagram labels: Membase data servers, In the data center, Web application server, Application user, On the administrator console, Bucket 1, Bucket 2, Aggregate Cluster Memory and Disk Capacity
    9. Membase is Simple, Fast, Elastic
       • Five minutes or less to a working cluster
         – Downloads for Linux and Windows
         – Start with a single node
         – One button press joins nodes to a cluster
       • Easy to develop against
         – Just SET and GET, no schema required
         – Drop it in. 10,000+ existing applications already “speak membase” (via memcached)
         – Practically every language and application framework is supported, out of the box
       • Easy to manage
         – One-click failover and cluster rebalancing
         – Graphical and programmatic interfaces
         – Configurable alerting
    10. Membase is Simple, Fast, Elastic
        • Predictable
          – “Never keep an application waiting”
          – Quasi-deterministic latency and throughput
        • Low latency
          – Built-in Memcached technology
          – Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
          – Selectable write behavior: asynchronous, synchronous (on replication, persistence)
        • High throughput
          – Multi-threaded
          – Low lock contention
          – Asynchronous wherever possible
          – Automatic write de-duplication
    11. Membase is Simple, Fast, Elastic
        • Zero-downtime elasticity
          – Spread I/O and data across commodity servers (or VMs)
          – Consistent performance with linear cost
          – Dynamic rebalancing of a live cluster
        • All nodes are created equal
          – No special case nodes
          – Clone to grow
        • Extensible
          – Filtered TAP interface provides hook points for external systems (e.g. full-text search, backup, warehouse)
          – Data bucket: engine API for specialized container types
          – Membase NodeCode [FUTURE]
    12. Proven at small, and extra large scale
        • Leading cloud service (PaaS) provider
          – Over 65,000 hosted applications
          – Membase Server supporting over 3,000 Heroku customers
        • Social game leader: FarmVille, Mafia Wars, Café World
          – Over 230 million monthly users
          – Zynga is a core contributor to and large-scale user of Membase Server
    13. After: Data layer scales like application logic layer
        • Data layer now scales with linear cost and constant performance.
        • Application Scales Out: Just add more commodity web servers
        • Database Scales Out: Just add more commodity data servers (Membase Servers)
        • Scaling out flattens the cost and performance curves.
    14. Membase - A practical path to “NoSQL” adoption
    15. Use Cases
    16. Leading Membase Deployments
        • Leading cloud service (PaaS) provider
          – Over 65,000 hosted applications
          – Membase Server serving over 1,200 Heroku customers (as of June 10, 2010)
        • Social game leader: FarmVille, Mafia Wars, Café World
          – Over 230 million monthly users
          – Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World
    17. Use case – Ad targeting
        • Diagram labels: events; profiles, campaigns; profiles, real time campaign statistics
        • 40 milliseconds to come up with an answer.
    18. Search and Gaming Portal Database
    19. Targeting at ShareThis
    20. About ShareThis
        • Largest integrated sharing network
        • We make sharing simple, engaging & valuable
        • Powerful Social Analytics & Audience Monetization
        • 450 million consumers per month, ~850 thousand sites, 50+ social channels
    21. This is how it works
        • Diagram labels: Log Files, Search Keywords, Page Views, Sharing Behavior, HDFS, Map/Reduce, Content Analysis, Taxonomy, Ad Server, User, Membase
    22. ShareThis Ad Products
    23. Membase Architecture
    24. Clustering
        • Underlying cluster functionality based on Erlang/OTP
        • Have a custom, vector clock based way of storing and propagating...
          – Cluster topology
          – vBucket mapping
        • Collect statistics from many nodes of the cluster
          – Identify hot keys, resource utilization
    25. (no transcribed text)
    26. TAP
        • A generic, scalable method of streaming mutations from a given server
          – As data operations arrive, they can be sent to arbitrary TAP receivers
        • Leverages the existing memcached engine interface, and the non-blocking IO interfaces to send data
        • Three modes of operation
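
The TAP slide describes mutations being streamed to arbitrary receivers as data operations arrive. The sketch below models only that idea, as in-process callbacks with an optional key filter. It is not the TAP wire protocol: `TapSource`, `register`, and `key_filter` are hypothetical names, delivery here is synchronous rather than non-blocking, and the three modes of operation mentioned on the slide are not modeled because the transcript does not name them.

```python
class TapSource:
    """Toy data server that forwards every mutation to registered receivers."""

    def __init__(self):
        self._data = {}
        self._receivers = []  # list of (callback, key_filter or None)

    def register(self, receiver, key_filter=None):
        # key_filter is a hypothetical hook: a predicate on the key.
        self._receivers.append((receiver, key_filter))

    def _emit(self, op, key, value):
        for receiver, key_filter in self._receivers:
            if key_filter is None or key_filter(key):
                receiver(op, key, value)

    def set(self, key, value):
        self._data[key] = value
        self._emit("set", key, value)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._emit("delete", key, None)


backup_log = []    # receiver 1: records every mutation
search_index = {}  # receiver 2: mirrors only "doc:" keys


def backup(op, key, value):
    backup_log.append((op, key, value))


def index_documents(op, key, value):
    if op == "set":
        search_index[key] = value
    else:
        search_index.pop(key, None)


source = TapSource()
source.register(backup)
source.register(index_documents, key_filter=lambda k: k.startswith("doc:"))

source.set("doc:1", "hello")
source.set("session:9", "abc")
source.delete("doc:1")
print(len(backup_log), search_index)
```

The filter is what lets one receiver act as a full backup while another sees only the subset it cares about, which matches the external-system hook points (full-text search, backup, warehouse) listed on the earlier "Extensible" slide.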
    27. Clients, nodes and other nodes
    28. Data buckets are secure membase “slices”
        • Diagram labels: Membase data servers, In the data center, Web application server, Application user, On the administrator console, Bucket 1, Bucket 2, Aggregate Cluster Memory and Disk Capacity
    29. vBucket mapping
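
The transcript carries only the title of the vBucket mapping slide, so the following is a minimal sketch of the general idea under stated assumptions: a key hashes to a fixed vBucket, and a separate map says which server currently owns each vBucket. The CRC32-modulo hash, the count of 1024 vBuckets, the round-robin assignment, and the server addresses are all assumptions for illustration and may differ from what Membase actually does.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed value for this sketch


def vbucket_id(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a key to a vBucket. This step does not depend on the server list."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign vBuckets to servers round-robin (hypothetical policy)."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def server_for(key, vbucket_map):
    """Two-step lookup: key -> vBucket -> server that owns the vBucket."""
    return vbucket_map[vbucket_id(key, len(vbucket_map))]


servers = ["10.0.0.1:11210", "10.0.0.2:11210"]
vbucket_map = build_vbucket_map(servers)

key = "user:42"
before = server_for(key, vbucket_map)

# Rebalance: hand this key's vBucket to a new node by editing the map only.
vbucket_map[vbucket_id(key)] = "10.0.0.3:11210"
after = server_for(key, vbucket_map)
print(before, "->", after)
```

Because the key-to-vBucket step never changes, growing the cluster only requires moving whole vBuckets and publishing an updated map, which is consistent with the "dynamic rebalancing of a live cluster" claim earlier in the deck.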
    30. Disk > Memory
        • The dataset may contain many infrequently accessed items.
        • However, memcached's LRU behavior is different from what is wanted with membase.
        • Still, traditional (most) RDBMS implementations are not 100% correct for us either.
        • The speed of a miss is very, very important.
    31. Membase Demo
    32. Thanks!
    33. Key-Value Patterns
    34. Key-Value (with a replica)
        • Items have:
          – Key
          – Value
          – Expiration
          – Flags
          – CAS (more on this later)
        • Operations include:
          – Get/Set
          – Increment/Decrement
          – Append/Prepend
    35. Membase Datatypes
        • byte[]
          – Does your data have 1s and 0s? “Any customer can have a car painted any colour that he wants so long as it is black.”
        • Items do have flags
          – Many clients use flags
          – Data type options: Google protobuf, Thrift, Avro
    36. Transactions
        • Lock == slow me down
        • CAS operations
          – Optimistic locking
        • Very useful with complex datatypes
          – Imagine two clients trying to update a complex item
        • You’re likely using CAS already... if you use a CPU
        • Diagram labels: User 1, User 2
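
The two-clients scenario on the Transactions slide can be sketched as follows. This is an in-memory model of optimistic locking with a per-item CAS value, not a Membase client: `CasStore`, `gets`, `cas`, and `update_with_retry` are illustrative names (the first two mirror the memcached verbs), and the retry limit is an arbitrary assumption.

```python
class CasStore:
    """In-memory store where every write gives the item a new CAS value."""

    def __init__(self):
        self._items = {}  # key -> (value, cas_id)
        self._next_cas = 1

    def set(self, key, value):
        self._items[key] = (value, self._next_cas)
        self._next_cas += 1

    def gets(self, key):
        """Return (value, cas_id) so the caller can attempt a CAS later."""
        return self._items[key]

    def cas(self, key, value, cas_id):
        """Write only if nobody has written since cas_id was read."""
        current = self._items.get(key)
        if current is None or current[1] != cas_id:
            return False
        self.set(key, value)
        return True


def update_with_retry(store, key, mutate, max_attempts=10):
    """Optimistic update loop: re-read and retry when the CAS check fails."""
    for _ in range(max_attempts):
        value, cas_id = store.gets(key)
        if store.cas(key, mutate(value), cas_id):
            return True
    return False


store = CasStore()
store.set("inventory", ("sword",))

# User 1 and User 2 both read the same version of the item.
value1, cas1 = store.gets("inventory")
value2, cas2 = store.gets("inventory")

user1_ok = store.cas("inventory", value1 + ("shield",), cas1)  # succeeds
user2_ok = store.cas("inventory", value2 + ("potion",), cas2)  # stale, rejected

# User 2 retries against the fresh value, so neither update is lost.
update_with_retry(store, "inventory", lambda v: v + ("potion",))
print(user1_ok, user2_ok, store.gets("inventory")[0])
```

No client ever holds a lock here. The cost of a conflict is one extra read and write for the loser, which is the trade the slide's "Lock == slow me down" line points at.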
    37. Common Use: Sessions
        • Web user sessions
          – Read-heavy, with fewer writes in many cases
          – Protocol advantage of memcached: options already exist for PHP, Ruby and Java
        • Application state
          – Not necessarily “entity” style things
          – May be appropriate for a “cache” pool
    38. Common Use (cache): Rate Limiting
        • Want to provide API calls into the system
          – Twitter search
          – Google search services
        • Use the atomic increment
          – Set an item with a unique ID
          – Upon API request, increment and check
        • HTTP 420: go away and come back later
        • Diagram labels: Your Users, Your App
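
The rate-limiting recipe on this slide (set an item with a unique ID, increment on each API request, check the result) can be sketched like this. The store is an in-memory stand-in, and the key layout, the limit of 3 requests, and the 60-second window are assumptions chosen for the example. A real deployment would also set an expiration on each window key so old counters disappear, which this sketch omits.

```python
class CounterStore:
    """In-memory stand-in for a store with 'add' and atomic 'incr'."""

    def __init__(self):
        self._counters = {}

    def add(self, key, value):
        """Store value only if the key is absent, like memcached 'add'."""
        if key in self._counters:
            return False
        self._counters[key] = value
        return True

    def incr(self, key, delta=1):
        self._counters[key] += delta
        return self._counters[key]


def handle_api_request(store, client_id, now, limit=3, window_seconds=60):
    """Return an HTTP status: 200 if under the limit, 420 otherwise."""
    # One counter per client per time window (hypothetical key layout).
    window = int(now // window_seconds)
    key = f"rate:{client_id}:{window}"
    store.add(key, 0)          # create the counter on first use
    count = store.incr(key)    # increment, then check
    return 200 if count <= limit else 420


store = CounterStore()
statuses = [handle_api_request(store, "client-a", now=t) for t in (0, 1, 2, 3, 61)]
print(statuses)  # [200, 200, 200, 420, 200]
```

The fourth request falls in the same window as the first three and is refused. The request at second 61 lands in a new window and starts a fresh counter. Because the increment is atomic on the server, many application servers can share one counter without coordinating among themselves.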
    39. Looking Ahead: NodeCode (Frank Weigel, Membase)
    40. NodeCode – Motivation
        • Beyond key-value
          – Indexing/Range Queries
          – Advanced Data Structures
          – Sub-object direct manipulation
        • Validation and In-flight transformation
          – Block mutations failing validation
          – Enrich or transform objects
        • Connectors (Integrate easily with other systems)
          – Solr
          – Hadoop
          – MySQL
    41. NodeCode - What is it?
        • Method for extending & customizing Membase
        • Separate code modules
        • Defined interface to datapath and cluster manager
        • Notification on events
          – Synchronous
          – Asynchronous
    42. NodeCode – Drivers
        • Simple
          – Packaged modules for easy install and enable
          – Library of “off the shelf” modules
          – Module monitoring
          – Straightforward development and debugging
        • Fast
          – Low latency/high-throughput
          – Per-bucket process isolation
          – Don’t break data manager performance/correctness
        • Elastic
          – Automatically migrate and instantiate on rebalance
          – Provide support for migration of internal data
          – Leverage native Membase engine for internal data storage
    43. Block-level architecture
    44. NodeCode 1.0 Plans
        • Java only, jar format
        • Must implement minimal module API
          – Initial module startup
          – Module removal
          – Association with bucket
        • NodeCode library helper functions
          – Register synchronous & asynchronous listeners/callbacks
          – Register protocol extension/callbacks
          – Register rebalance callback
          – Register cluster manager event callbacks
          – Membase data access
    45. Q&A
    46. Attributions
        • g_of_China.png
        • g_of_South_Korea.svg
        • g_of_Japan.svg