Solbase & Real-time Activity

Solbase, the real-time open-source search engine, is now available on GitHub. Solbase was developed by Photobucket.com and is built upon Lucene, Solr and HBase. Photobucket has also recently released a real-time community activity stream capturing the 4 million daily uploads as well as all of your friends' comments and favorite photos. The system is founded on HBase and also employs Kestrel queues. This talk will cover the architecture and implementation details and share many of the lessons learned while developing this real-time big data system.

Published in: Technology
  • We should go over the agenda and introduce each of the presenters.
  • First, Koh is going to talk about Solbase. That's our real-time search engine, built on top of Lucene, Solr, and HBase. We started presenting Solbase about 9 months ago. At that time we reported that our standard implementation of Lucene/Solr was no longer scaling to meet our needs, and our initial tests of Solbase gave us hope that we were going to solve that problem AND dramatically improve performance. In addition, we were updating our search index in real time. Great results, but possibly the bigger news at that time was that we were planning to open source all the code. Tonight Koh is here to deliver on that promise. The next topic we'll cover is another HBase feature developed at PB: our activity stream. It's what you'd probably expect: a social network feature that distributes events about photos and videos in near real time. We've seen a number of presentations on similar features, but rarely do you see any detail on the architecture or lessons learned that would help you build your own. Ron and Josh are going to do exactly that. But before we jump into all that... why do you care? Who is PB?
  • We're the biggest dedicated photo site on the web and we're right next door.   We have millions of active users and billions of photos.
  • Here's a quick slide on our size compared to our peers. It's a little old, but you get the idea. We have millions of unique visitors.
  • Over time those users have contributed half a billion public photos and videos to our search index, and we generate a boatload of social events around all that public media.
  • Lucene's field cache for sorting and filtering became very problematic for us. Turnaround time for building the entire set of indices was about a day. Every 100 ms improvement in response time equates to approximately 1 extra page view. It was impractical to add a significant number of new docs and data.
  • In a nutshell, Solbase replaces the indices stored on the local filesystem with a database in HBase, and it also overcomes Lucene's inherent limitations. One major limitation we solved is sort/filter.
  • Ron Here
  • Kestrel is open source and was developed at Twitter.
  • Talk about scale and real-time processing speed. Ops per second: one thread pushes 40/s all the way to HBase.
  • Josh here. HBase is a distributed, Bigtable-like database built upon Hadoop components. It leverages HDFS, Hadoop's distributed file system. Because it is built upon Hadoop, it scales to a massive size, virtually limitless. It is used by many large-scale companies such as Facebook and Yahoo; Google uses its own Bigtable implementation. Ask who has used HBase.
  • Josh here. To fix: 1. Features: column store; key/value store with semi-structured values. 2. Why use HBase? Horizontal scalability; high write throughput; millions of columns, billions of rows.
  • An HBase cluster consists of master nodes with a set of region servers to distribute the data. The master is the gateway that directs clients to the proper region server for the requested data. Data is replicated among several data nodes by Hadoop's file system, HDFS. There is 'locational affinity' between the region server and the data it serves.
  • Each table consists of rows identified by a row key, a set of defined column families, and an arbitrary number of qualified columns for each family. Keys are stored lexicographically, so range scans between two keys are extremely fast. All data is binary. Interestingly, this is similar to the concept of the inverted index, where the 'terms' are stored lexicographically; this is something that we leverage in our implementation.
  • Mention using lexicographical key to pre-sort data.
  • Get: single-row access, similar to a SQL query by primary key. Put: single-row update/insert (can be done in batch). Scan: lexicographic range query between 2 specified keys.
  • Back to Ron. HBase optimization: scans continue to be fast, large multi-gets have been an issue.
  • Solbase & Real-time Activity

    1. 2. Who we are
          • Doug McCuen - Director of Engineering
          • Ron White - Senior Software Engineer
          • Josh Hollander - Senior Software Engineer
          • Kyungseog Oh - Senior Software Engineer
    2. 3. Agenda
          • Photobucket
          • Solbase
          • Activity Stream
    3. 4. Photobucket Overview
          • Photobucket is the most-visited photo site with 23.4 million UVs
          • Over 9 billion photos stored!
          • Users upload 4 million images per day!
          • Photobucket users spend more time per visit than users of any other photo site: 3.8 avg mins/visit
          • 2.0 million avg daily visitors - more daily visits than Flickr and Picasa combined
          Sources: (1) comScore May 2011, (2) internal data
    4. 5. [Chart: unique visitors (UVs) by photo site; site labels not recoverable. Values shown: 23.4M, 9.9M, 9.5M, 7.9M, 1.6M, 19.7M, 6.0M]
    5. 6. Photobucket Stats
          • Upload
              • 4M images/videos uploaded per day
          • Search
              • Over 30M requests per day
          • Social Activity
              • 20k "Likes"/day
              • 5k comments/day
              • 10k "Follows"/day
          Sources: (1) comScore May 2011, (2) internal data
    6. 7. What is Solbase?
          • Solbase is an open-source, real-time search platform based on Lucene, Solr and HBase, built at Photobucket
    7. 8. Why Solbase?
          • Memory Issue
          • Indexing time
          • Speed
          • Capacity
    8. 9. Summary of what we did
          • Overcame Lucene's inherent limitations (memory issues) with embedded sort/filter fields
          • Replaced Lucene index file with distributed database, HBase
          • Moved initial indexing process to map/reduce framework for faster processing time
          • Provided real-time indexing capability
    9. 10. Results
          • Average query time for native Solr/Lucene: 169 ms
          • Average query time for Solbase: 109 ms, or a 35% decrease
          • Term 'me' has ~14M docs
              • 'me' takes 13 seconds to load from HBase, 500 ms from term vector cache
          • Most terms not in cache take < 200 ms
          • Most cached terms take < 20 ms
          • ~300 real-time updates per second
    10. 11. Next Steps
          • Geo-search
          • Other data products within Photobucket, outside of search, as a general query engine for large data sets
    11. 12. Solbase repos
          • https://github.com/Photobucket/Solbase
          • https://github.com/Photobucket/Solbase-Solr
          • https://github.com/Photobucket/Solbase-Lucene
    12. 13. What is Activity Stream?
          • Activity Stream is a social networking feature using HBase, Flume, Kestrel and Camel, built at Photobucket
    13. 14. Activity Events
          • Somebody you follow:
              • Uploads new photos or videos
              • Comments on media
              • Likes media
          • Somebody follows you
          • Somebody likes your content
          • Somebody comments on your media
    14. 15. Activity Events Rendered
    15. 16. Delivering Activities
          • Difficult Problem
              • Especially with "real-time" requirements
          • Options
              • Fan-in
                  • Very slow
              • Scatter-gather
                  • Used by Facebook
                  • Parallelized
              • Fan-out
                  • Simpler Engineering
                  • Massive amounts of data (very de-normalized)
    16. 17. Discussion Overview
          • Flume & Kestrel - how we collect user activity
          • Processor - how we fan out that activity to other users
          • Query service - providing this data back to our PHP front end
          • HBase - how we store tons of data
    17. 18. Activity Collection
    18. 19. Flume & Kestrel
          • Flume
              • Part of Hadoop Stack
              • Distributed real-time log processing tool
              • Collects logs written by PHP web servers
          • Kestrel
              • Open source, developed at Twitter
              • Fast, reliable, durable queue
              • Horizontally scalable to infinity
              • Not strongly ordered
              • Memcache protocol
    19. 20. Fanout Processor & Camel
          • Camel
              • Enterprise Integration Patterns (EIP) framework
          • Fanout Processor
              • Receives a message from the queue
              • Six different processors
                  • Easily configured with Camel
              • Writes a copy of the activity for all users interested in that event
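The fan-out step on this slide can be sketched in a few lines. This is only an illustration, not Photobucket's processor: the `followers` graph, the event fields and the in-memory `feeds` dictionary (a stand-in for per-user rows in HBase) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical follow graph: actor -> users who follow that actor.
followers = {"alice": ["bob", "carol"], "bob": ["alice"]}

# In-memory stand-in for per-user activity rows in HBase.
feeds = defaultdict(list)


def fan_out(event, followers, feeds):
    """Write one copy of the event into the feed of every follower of the actor.

    Returns the number of copies written.
    """
    recipients = followers.get(event["actor"], [])
    for user in recipients:
        # Each follower gets an independent copy (de-normalized storage).
        feeds[user].append(dict(event))
    return len(recipients)
```

Reads then become a single lookup of one user's feed, at the cost of storing one copy of the event per follower.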
    20. 21. Query Service
          • HBase/PHP adapter
              • Provides a simple service interface for PHP web servers
          • Caching - In Memory
              • Consistent hashing load balancer
              • No Serialization/Deserialization penalty
          • Custom Rollup Logic
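The slide does not say how the consistent hashing load balancer is built. As a rough sketch of the general technique, a hash ring with virtual nodes routes each key to a fixed cache server, so the same user's data keeps landing in the same in-memory cache. The class name, the use of MD5 and the replica count below are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed on the ring `replicas` times to even out load.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring position after the key's hash."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

When a server is added, only the keys that now fall just before one of the new server's ring positions move to it; every other key stays on its previous server, which keeps most cached entries valid.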
    21. 22. Performance
          • Throughput
              • 40 events/sec per processor thread
              • Nominal load of 5/sec
          • Latency
              • Users typically see new activity within 1 second of event
              • Delete events slower
          • Responsiveness
              • 90% < 1 sec query time
              • Average response 301 ms
    22. 23. What is HBase?
          • HBase is:
              • Based on Google's Bigtable
                  • "distributed, versioned, column oriented store"
                  • persistent, sorted, multidimensional map
              • Pure-Java implementation
              • Built on top of Hadoop
                  • HDFS for storage
                  • Zookeeper
              • Used extensively at Facebook, Yahoo and StumbleUpon among others.
    23. 24. Why HBase?
          • Highly horizontally scalable
              • We store huge amounts of user activity data
              • Fanout implies duplication of that data
              • Need to be able to expand storage/servers easily
          • High write throughput
              • Our users generate a lot of activity very quickly.
              • Want fanout to be near real time
          • "Millions of columns, Billions of rows"
    24. 25. Hadoop/HBase Architecture
    25. 26. HBase Tables
          Schema:
              {row key 1 {
                  column family 1 {
                      column 1 {data 1},
                      column 2 {data 2}
                      … }
                  ... }
              }
              {row key 2 {...}}
          Example:
              {dog:spotty {owner {matt {age 41}, linda {age 41}} vaccinations {rabies {july 2011}}}}
              {cat:fluffy {owner {doug {age 41}, heather {age 41}} vaccinations {rabies {june 2011}}}}
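One way to read the schema on this slide is as a nested map: row key, then column family, then column qualifier, then value. The sketch below rebuilds the slide's pet example with plain Python dictionaries. Strings are used for readability (an assumption; the speaker notes say all HBase data is binary), and the `cell` helper is hypothetical.

```python
# Row key -> column family -> column qualifier -> value.
table = {
    "dog:spotty": {
        "owner": {"matt": "age 41", "linda": "age 41"},
        "vaccinations": {"rabies": "july 2011"},
    },
    "cat:fluffy": {
        "owner": {"doug": "age 41", "heather": "age 41"},
        "vaccinations": {"rabies": "june 2011"},
    },
}


def cell(table, row_key, family, qualifier):
    """Look up one cell; returns None when any level is missing."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# Rows are kept sorted by key, so every "cat:" row precedes every "dog:" row.
ordered_keys = sorted(table)
```

Because rows sort by key, a shared key prefix such as `dog:` keeps related rows next to each other, which is what makes range scans cheap.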
    26. 27. Our Schema Design
          • Key
              • Salted userid + inverted timestamp
              • Scan by userid from timestamp 0 to timestamp fffffff
          • Each Activity has up to 6 items with similar data
              • Traditionally would be normalized into another table
              • No joins in HBase
              • We use multiple column families each with the same column schema
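A row key of the form "salted userid + inverted timestamp" could look like the sketch below. The details are assumptions, since the slide gives none: the salt is taken as two hex characters of an MD5 hash of the user id, `:` is the separator (so user ids must not contain it), and the timestamp is subtracted from the largest signed 64-bit value and written as 16 zero-padded hex digits.

```python
import hashlib

MAX_TS = 2**63 - 1  # assumed ceiling: largest signed 64-bit value


def salt(user_id: str) -> str:
    """Two hex chars derived from the user id, to spread users across regions."""
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()[:2]


def row_key(user_id: str, ts_millis: int) -> str:
    """Build a key that sorts a user's newest activity first."""
    inverted = MAX_TS - ts_millis
    # Fixed-width hex keeps lexicographic order equal to numeric order.
    return f"{salt(user_id)}:{user_id}:{inverted:016x}"


def scan_range(user_id: str):
    """Start (inclusive) and stop (exclusive) keys covering one user's rows."""
    prefix = f"{salt(user_id)}:{user_id}:"
    # "g" sorts after every hex digit, so it bounds all inverted timestamps.
    return prefix + "0" * 16, prefix + "g"
```

Inverting the timestamp means a forward scan over a user's key range returns the most recent events first, so a feed page can stop after the first few rows.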
    27. 28. HBase Client API
          • Get
              • Single row query
          • Put
              • Single row update/insert
          • Delete
          • Scan
              • Range query between start and end keys
              • Can be filtered by column data filters
          • Batched operations (GET, PUT, DELETE)
              • Actions across multiple regions are parallelized
          • HBase Abstraction
              • Built JDBC-template-like HBase wrapper classes
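To make the semantics of the four operations concrete, here is a toy in-memory table. It is not the HBase client API: `ToyTable` and its method signatures are invented for illustration, and it only mimics the behaviour described on the slide (single-row get/put/delete, plus a scan over a sorted key range with an optional filter).

```python
from bisect import bisect_left, insort


class ToyTable:
    """In-memory stand-in for a table whose rows are sorted by key."""

    def __init__(self):
        self._rows = {}
        self._keys = []  # row keys, kept in sorted order

    def put(self, row_key, columns):
        """Insert a row or update some of its columns."""
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        """Single-row lookup; returns None if the row does not exist."""
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def delete(self, row_key):
        """Remove a row if present."""
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.pop(bisect_left(self._keys, row_key))

    def scan(self, start_key, stop_key, column_filter=None):
        """Yield rows with start_key <= key < stop_key, in key order."""
        lo = bisect_left(self._keys, start_key)
        hi = bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if column_filter is None or column_filter(row):
                yield key, dict(row)
```

The scan only touches the slice of keys between the two bounds, which is why a well-chosen key prefix matters more than any secondary index.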
    28. 29. Challenges
          • Kestrel configuration
              • Pre-define queues in the config
          • Threading issues
              • HBase/Kestrel use many threads & connections
              • Set high limits for nprocs & nofiles
          • Million Follower Problem
              • Chunk large batch operations
              • Limits to Abstraction
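"Chunk large batch operations" can be sketched as follows. The chunk size of 1000 and the `write_batch` callback (a stand-in for whatever performs one batched write) are assumptions; the slides do not give the values Photobucket used.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fan_out_in_chunks(follower_ids, write_batch, chunk_size=1000):
    """Send one bounded batch per chunk instead of one huge batch.

    `write_batch` is a hypothetical callable that persists one chunk.
    Returns the number of batches issued.
    """
    batches = 0
    for chunk in chunked(follower_ids, chunk_size):
        write_batch(chunk)
        batches += 1
    return batches
```

Bounding the batch size keeps memory use and per-request latency predictable when one popular user has a very large number of followers.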
    29. 30. HBase Challenges
          • Hardware configuration
              • Don't RAID
              • Dedicated cluster switch
          • Replication
              • Still in Beta
              • Needed for Disaster Recovery
              • Worked through several issues
          • Hot Regions
              • User activity is not well distributed
          • Manual Region Splitting & Major Compaction
          • Garbage collection
              • HBase memory hog
    30. 31. References
          • http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-advanced-hbase-schema-design
          • http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
    31. 32. Q&A