eBay Marketplaces has been working hard on its next-generation search infrastructure and software system, code-named Cassini. The new search engine processes over 250 million search queries and serves more than 2 billion page views each day. Its indexing platform is based on Apache Hadoop and Apache HBase. Apache HBase is a distributed persistence layer built on Hadoop that supports billions of updates per day. Its easy sharding, fast writes and table scans, very fast bulk data loads, and natural integration with Hadoop provide the cornerstones for successful continuous index builds. We will share the technical details, as well as the difficulties and challenges we have gone through and are still facing in the process.
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
1. HBase, the Use Case in eBay Cassini
Thomas Pan
Principal Software Engineer
eBay Marketplaces
2. eBay Marketplaces
97 million
active buyers and sellers worldwide
200+ million items
in more than 50,000 categories
2 billion page views
each day
9 petabytes of data
in our Hadoop and Teradata clusters
250 million queries
each day to our search engine
3. Cassini
eBay’s new Search Engine
Entirely new codebase
World-class, from a world-class team
Platform for ranking innovation
Four major tracks, 100+ engineers
Likely launch in 2012
4. Indexing in Cassini
Index with more data and more history
More computationally expensive work at index time (and less at query time)
Ability to rescore and reclassify entire site inventory
The entire site inventory is stored in HBase
Indexes are built via MapReduce jobs and stored in HDFS (see the sketch below)
Build the entire site inventory in hours
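The index-build flow outlined above can be pictured as a table-scan MapReduce job over the inventory stored in HBase. The sketch below is illustrative only, not Cassini's actual indexing code; the table name ("active_items"), column family ("d"), and column ("title") are assumptions.

```java
// Illustrative sketch: scan an HBase inventory table with MapReduce and write
// index records to HDFS. Table/family/column names are hypothetical.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class InventoryIndexJob {

  // Maps each HBase row to a (docId, indexable text) pair.
  static class IndexMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
        throws IOException, InterruptedException {
      byte[] title = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("title"));
      if (title != null) {
        ctx.write(new Text(Bytes.toStringBinary(rowKey.get())),
                  new Text(Bytes.toString(title)));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "inventory-index-build");
    job.setJarByClass(InventoryIndexJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for full-table scans
    scan.setCacheBlocks(false);  // don't pollute the block cache with MR scans

    TableMapReduceUtil.initTableMapperJob(
        "active_items", scan, IndexMapper.class, Text.class, Text.class, job);
    job.setNumReduceTasks(0);    // map-only: write index records straight to HDFS
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[0]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```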
5. (image slide; no text captured)
6. (image slide; no text captured)
7. HBase Table Data Import
Bulk Load
Batch processing on demand or every couple of hours
Load a large amount of data quickly
PUT
Near-real-time updates
Better for updating small amounts of data
Read after PUT for better random read performance
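As a rough illustration of the PUT path above (not the production code), the sketch below writes one row with the HBase client API and then reads it back to warm the block cache; the "active_items" table and "d" family are hypothetical names. Bulk load, by contrast, writes HFiles (for example via HFileOutputFormat) and hands them to the region servers in one step.

```java
// Illustrative sketch of a near-real-time update followed by a read-after-PUT.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NearRealTimeUpdate {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "active_items");   // hypothetical table name

    byte[] rowKey = Bytes.toBytes(Long.reverse(1234567890L)); // bit-reversed doc id
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("price"), Bytes.toBytes("19.99"));
    table.put(put);

    // "Read after PUT": touch the row once so its block is pulled into the
    // region server's block cache, improving subsequent random reads.
    table.get(new Get(rowKey));

    table.close();
  }
}
```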
8. HBase Tables
3 major tables: active items, completed items and sellers
15TB data
3600 pre-split regions per table with auto-split disabled
3 column families with a maximum of 200 columns
Automatic major compaction disabled
RowKey is the bit reversal of the document id (unsigned 64-bit integer)
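A minimal sketch of how such a pre-split table might be created with the client API, assuming hypothetical family names and a uniform split of the 64-bit key space; the production schema is certainly more involved.

```java
// Illustrative sketch: a pre-split table keyed by the bit-reversed document id,
// with auto-splitting effectively disabled. Names and values are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {

  // Bit reversal spreads mostly-sequential document ids evenly across the key
  // space, so new writes don't all land on the last region.
  static byte[] rowKeyFor(long docId) {
    return Bytes.toBytes(Long.reverse(docId));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("active_items");
    desc.addFamily(new HColumnDescriptor("d"));  // family names are hypothetical
    desc.addFamily(new HColumnDescriptor("r"));
    desc.addFamily(new HColumnDescriptor("m"));
    // A very large max region size effectively disables automatic splitting.
    desc.setMaxFileSize(100L * 1024 * 1024 * 1024);

    // Pre-split into 3600 regions spread uniformly over the 64-bit key space.
    byte[] startKey = Bytes.toBytes(1L);
    byte[] endKey = Bytes.toBytes(-1L);  // eight 0xFF bytes: top of the unsigned range
    admin.createTable(desc, startKey, endKey, 3600);
    admin.close();
  }
}
```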
10. Numbers
Data import
Bulk data import: 30 minutes for 500 million full rows
Random writes: ~200,000,000 rows per day
1.2 TB of data imported daily
Scan Performance
Scan speed: 2,004 rows per second per region server (with an average of 3 versions), 465 rows per second per region server (with an average of 10 versions)
Scan speed with filters: 325 to 353 rows per second per region server
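For context on what such scans look like, here is a minimal, hypothetical sketch of a multi-version scan with a server-side filter; the table, family, and column names are assumptions, and the tuning values are illustrative rather than the settings used for the numbers above.

```java
// Illustrative sketch: a full-table, multi-version scan with an optional filter.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "active_items");   // hypothetical table name

    Scan scan = new Scan();
    scan.setMaxVersions(10);     // read up to 10 versions per cell
    scan.setCaching(1000);       // fetch rows in large batches per RPC
    scan.setCacheBlocks(false);  // full scans should not evict the block cache

    // Server-side filtering reduces data on the wire but costs scan throughput.
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("d"), Bytes.toBytes("site"),
        CompareOp.EQUAL, Bytes.toBytes("US")));

    long rows = 0;
    long start = System.currentTimeMillis();
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      rows++;
    }
    scanner.close();
    System.out.printf("scanned %d rows in %d ms%n",
        rows, System.currentTimeMillis() - start);
    table.close();
  }
}
```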
11. Operations
Monitoring
Ganglia
Nagios
OpenTSDB
Testing
Unit tests and regression tests
HBaseTestingUtility for unit tests (see the sketch after this slide)
Standalone HBase for regression tests (mvn verify)
Cluster level
Fault Injection Tests [HBASE-4925]
Region balancer
Manual major compaction
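A minimal sketch of a unit test built on HBaseTestingUtility's in-process mini cluster, as referenced above; the table name, family, and assertion are illustrative only.

```java
// Illustrative sketch: put/get round trip against a mini cluster in a unit test.
import static org.junit.Assert.assertArrayEquals;

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class ItemTableTest {
  private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setUp() throws Exception {
    UTIL.startMiniCluster();  // spins up ZooKeeper, HDFS and a region server in-process
  }

  @AfterClass
  public static void tearDown() throws Exception {
    UTIL.shutdownMiniCluster();
  }

  @Test
  public void putThenGetRoundTrips() throws Exception {
    HTable table = UTIL.createTable(Bytes.toBytes("active_items"), Bytes.toBytes("d"));
    byte[] row = Bytes.toBytes(Long.reverse(42L));  // bit-reversed doc id

    Put put = new Put(row);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("title"), Bytes.toBytes("vintage camera"));
    table.put(put);

    byte[] stored = table.get(new Get(row))
        .getValue(Bytes.toBytes("d"), Bytes.toBytes("title"));
    assertArrayEquals(Bytes.toBytes("vintage camera"), stored);
  }
}
```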
12. Operations (Cont’d)
Disable swap
Significantly increase the file descriptor limit and the DataNode xceiver count
Metrics to watch
jvm.DataNode.metrics.threadRunnable: connection leakage (check with netstat)
hbase.regionserver.compactionQueueSize: major/minor compactions
dfs.datanode.blockReports_avg_time: data block reporting (for too many data blocks)
network_report: network bandwidth usage (for data locality)
13. Community
Acknowledgement
Eli Collins
Kannan Muthukkaruppan
Karthik Ranganathan
Konstantin Shvachko
Lars George
Michael Stack
Ted Yu
Todd Lipcon
Editor's Notes
45 nodes per rack, with 5 racks of data nodes in total. Each node has 12 * 2TB of disk space, 72GB RAM and 24 cores with hyper-threading. Each node runs a region server, task tracker and data node, with 8 open slots for mappers and 6 open slots for reducers. Enterprise nodes are dual-powered and dual-homed with active-active top-of-rack switches (TORs), and backed up by a NetApp filer. No TOR redundancy on the data node racks. Why share HMaster with the ZooKeeper nodes?
----- Meeting Notes (1/26/12 14:02) -----
# TOR lack of redundancy: share racks among different clusters. Then network bandwidth on the TORs could become an issue. With the extra 5 racks, the impact is much smaller.
MapReduce is used to slice and dice the data, leveraging the large-scale cluster. The indexing job converts raw data into pieces of data that are easy to merge, in index format and grouped under query node columns. The merge jobs run in parallel; among them, the posting list merge job is the most expensive and will become more expensive. Column group data is copied 4 times and posting list data is copied 5 times in the pipeline.
----- Meeting Notes (1/26/12 14:02) -----
Nick: Why not collapse all three merge/packing/packaging phases together?