HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live Website

Gap Inc Direct, the online division of Gap Inc., uses HBase to serve, in real time, the apparel catalog for all of its brands' and markets' websites. This case study reviews the business case as well as key decisions regarding schema design and cluster configuration, and discusses implementation challenges and lessons learned.


Slide 1: HBaseCon 2012 Applications Track – Case Study

Slide 2:
  - Suraj Varma, Director of Technology Implementation, Gap Inc Direct (GID), San Francisco, CA (IRC: svarma)
  - Gupta Gogula, Director-IT & Domain Architect of Catalog Management & Distribution, Gap Inc Direct (GID), San Francisco, CA

Slide 3:
  - Problem Domain
  - HBase Schema Specifics
  - HBase Cluster Specifics
  - Learning & Challenges

Slide 4: [Timeline diagram, 2005 through 2010: new site launch, Universality, Piperlime, Athleta, and CA & EU markets; incoming traffic (US, CA, EU) flowing to per-brand/market application servers and databases]

Slide 5:
  - Evolution of the GID Apparel Catalog
    - 2005: three independent brands in the US
    - 2010: 5 integrated brands in US, CA, and EU
  - Rapid expansion of the apparel catalog
  - However, each brand/market combination necessitated separate logical catalog databases

Slide 6:
  - Single catalog store for all brands/markets
  - Horizontally scalable over time
  - Cross-brand business features
  - Access the data store directly, to take advantage of item inventory awareness
  - Minimal caching, only for optimization (keeping caches in sync is a problem)
  - Highly available

Slide 7:
  - Sharded RDBMS, memcached, etc.: significant effort was required, and scalability limits remained
  - Non-relational alternatives considered
  - HBase POC (early 2010): promising results; decided to move ahead

Slide 8:
  - Strong consistency model
  - Server-side filters
  - Automatic sharding, distribution, failover
  - Hadoop integration out of the box
  - General purpose: other use cases outside of Catalog
  - Strong community!

Slide 9: [Architecture diagram: incoming requests for catalog data hit backend services, which read from the HBase cluster; mutations arrive from near-real-time inventory updates, pricing updates, and item updates]

Slide 10:
  - Read mostly: website traffic; sync MR jobs
  - Write/delete bursts: catalog publish (phasing out to near-real-time updates from originating systems); MR jobs on the live cluster
  - Continuous writes: inventory updates

Slide 11:
  - Hierarchical data (primarily): SKU -> Style lookups (child -> parent); cross-brand sell (sibling <-> sibling)
  - Rows: 100 KB average size; 1000-5000 columns; sparse rows
  - Data access patterns: full product graph in one read; single path of the graph from root to leaf node; search via secondary indices; large feed files

Slide 12: [Diagram: reading the full product graph vs. reading a single path/edge]

Slide 13:
  - Built a custom "bean to schema mapper"
    - POJO graph <-> HBase qualifiers
    - Flexibility to shorten column qualifiers
    - Flexibility to change schema qualifiers (per environment/developer)
  - Example mapping configuration:

      <…>
      <association>one-to-many</association>
      <prefix>SC</prefix>
      <uniqueId>colorCd</uniqueId>
      <beanName>styleColorBean</beanName>
      <…>

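A minimal sketch of what such a mapper's write path might look like, in Java against the HBase client API of that era (the QualifierMapper class is hypothetical; only the XML fragment above is from the talk):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical sketch: emit a bean property as an HBase qualifier using
    // the short prefix configured in the mapping XML.
    public class QualifierMapper {
        private final String prefix;  // e.g. "SC" for styleColorBean

        public QualifierMapper(String prefix) {
            this.prefix = prefix;
        }

        /**
         * Adds one property as <prefix>_<id>_<property> under the family;
         * id is the value of the bean's uniqueId field, e.g. its colorCd.
         */
        public void addProperty(Put put, byte[] family, String id,
                                String property, String value) {
            String qualifier = prefix + "_" + id + "_" + property;
            put.add(family, Bytes.toBytes(qualifier), Bytes.toBytes(value));
        }
    }

Because qualifiers live in the mapping file rather than in code, shortening them (to save space across 1000-5000 columns per row) or renaming them per environment requires no recompile.
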
Slide 14: Pattern: ancestor ids embedded in the qualifier name
  - <PP>_<id1>_QQ_<id2>_RR_<id3>_name, where PP is the parent, QQ the child, and RR the grandchild
  - Example qualifiers:

      cf1:VAR_1_SC_0012_colorCd
      cf2:VAR_1_SC_0012_SCIMG_10_path

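A minimal sketch of composing such a qualifier (the helper class and ids are hypothetical; the two cf1/cf2 examples above are from the slide):

    // Hypothetical helper: build a qualifier that embeds the full ancestor path.
    public class QualifierNames {
        /**
         * qualifier("path", "VAR","1", "SC","0012", "SCIMG","10")
         * returns "VAR_1_SC_0012_SCIMG_10_path".
         */
        public static String qualifier(String attr, String... prefixIdPairs) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < prefixIdPairs.length; i += 2) {
                sb.append(prefixIdPairs[i]).append('_')       // node type, e.g. "SCIMG"
                  .append(prefixIdPairs[i + 1]).append('_');  // node id, e.g. "10"
            }
            return sb.append(attr).toString();
        }
    }

Embedding ancestor ids this way lets a qualifier prefix match select a whole subtree of the product graph within a single wide row.
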
Slide 15: Pattern: secondary index to hierarchical ancestors
  - Secondary index: <id3> => RR ; QQ ; PP
  - FilterList with the (RR, QQ, PP) ids to get a thin slice of the row
  - [Diagram: index row KEY_5555 resolving the ancestor chain 4444, 333, 22, 1]

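A minimal sketch of the resulting thin-slice read (table name, row key, and ids are hypothetical; the slide names a FilterList over the three ids, simplified here to a single prefix filter over the embedded ancestor path):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ThinSliceRead {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "catalog");
            // The secondary index row for id3 yielded ancestors PP=1, QQ=22, RR=333.
            Get get = new Get(Bytes.toBytes("PRODUCT_KEY"));
            get.setFilter(new ColumnPrefixFilter(
                Bytes.toBytes("PP_1_QQ_22_RR_333_")));  // only this path's cells
            Result slice = table.get(get);
            table.close();
        }
    }
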
Slide 16:
  - "Publish at Midnight": future-dated PUTs; Get/Scan with a time range
  - Large feed files: sharded into smaller chunks, < 2 MB per cell
  - Pattern: sharded chunks [Diagram: row KEY_nnnn holding chunk cells S_1, S_2, S_3, S_4]

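A minimal sketch of the future-dated PUT pattern (row key, family, values, and the midnight timestamp source are hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MidnightPublish {
        static void publishAndRead(HTable table, long midnightMillis)
                throws IOException {
            // Write the new price with a future timestamp; it stays invisible
            // to time-bounded reads until midnight passes.
            Put put = new Put(Bytes.toBytes("KEY_nnnn"));
            put.add(Bytes.toBytes("cf1"), Bytes.toBytes("price"),
                    midnightMillis, Bytes.toBytes("29.99"));
            table.put(put);

            // Serving reads cap the time range at "now", excluding future cells.
            Get get = new Get(Bytes.toBytes("KEY_nnnn"));
            get.setTimeRange(0, System.currentTimeMillis());
            Result current = table.get(get);
        }
    }

Cell timestamps also drive version and TTL handling, so future-dated cells warrant careful testing.
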
Slide 17:
  - 16 slave nodes (RegionServer + TaskTracker + DataNode), 8 and 16 GB RAM
  - 3 master nodes (HMaster, ZooKeeper, JobTracker, NameNode), 8 GB RAM
  - NameNode failover via NFS

Slide 18:
  - Block cache: maximize block cache; hfile.block.cache.size: 0.6
  - Garbage collection: MSLAB enabled; CMSInitiatingOccupancyFraction tuned

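A sketch of where those settings live (the 0.6 value is from the slide; the explicit MSLAB property and the CMS threshold of 70 are illustrative assumptions):

hbase-site.xml:

    <property>
      <name>hfile.block.cache.size</name>
      <value>0.6</value>  <!-- 60% of RegionServer heap for the block cache -->
    </property>
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>

hbase-env.sh:

    # CMS collector with an illustrative occupancy threshold
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70"
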
Slide 19:
  - Quick recovery on node failure: default timeouts are too large
  - ZooKeeper: zookeeper.session.timeout
  - Region server: hbase.rpc.timeout
  - DataNode: dfs.heartbeat.recheck.interval (heartbeat.recheck.interval in older Hadoop releases)

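A sketch of the corresponding overrides (the values shown are illustrative assumptions, not GID's production numbers):

hbase-site.xml:

    <property>
      <name>zookeeper.session.timeout</name>
      <value>60000</value>  <!-- ms; lower than the default for faster detection -->
    </property>
    <property>
      <name>hbase.rpc.timeout</name>
      <value>30000</value>  <!-- ms -->
    </property>

hdfs-site.xml:

    <property>
      <name>dfs.heartbeat.recheck.interval</name>
      <value>45000</value>  <!-- ms; speeds up marking a DataNode dead -->
    </property>

The trade-off: timeouts short enough for quick failover, but long enough that a GC pause does not get a healthy node declared dead.
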
Slide 20:
  - Block cache size tuning: block cache churn
  - Hot row scenarios: perf tests and phased rollouts
  - Hot region issues: perf tests and pre-split regions
  - Filters: CPU intensive; profiling needed

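A minimal sketch of pre-splitting a table at creation time so a bulk publish does not hammer a single region (table name, family, and split points are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplit {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            HTableDescriptor desc = new HTableDescriptor("catalog");
            desc.addFamily(new HColumnDescriptor("cf1"));
            // Split points chosen from the known key distribution.
            byte[][] splits = {
                Bytes.toBytes("KEY_2000"), Bytes.toBytes("KEY_4000"),
                Bytes.toBytes("KEY_6000"), Bytes.toBytes("KEY_8000")
            };
            admin.createTable(desc, splits);
            admin.close();
        }
    }
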
Slide 21:
  - Monitoring is crucial: layer by layer, find the bottleneck; metrics to target optimization and tuning; troubleshooting
  - Non-uniform hardware: sub-optimal region distribution; hefty boxes lightly loaded

Slide 22:
  - M/R jobs running on the live cluster have an impact, so they cannot run full throttle; go easy
  - Feature enablement: phase it in; don't turn on several features together; this makes it easier to identify potential hot regions/rows, overloaded region servers, etc.

Slide 23: [Diagram: incoming requests plus inventory, pricing, and item updates drive a lot more requests through backend services into the HBase cluster; feature "A" enabled adds N req/sec, feature "B" enabled adds K req/sec] Enable features individually to measure impact and tune the cluster accordingly.

Slide 24:
  - Search: no out-of-the-box secondary indexes; custom solution with Solr
  - Transactions: only row-level atomicity, but not everything can be packed into a single row; atomic cross-row Put/Delete and HBASE-5229 appear to be potential partial solutions (0.94+)

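For context on the row-level guarantee: every cell in a single Put to one row commits atomically, but nothing spans rows, which is what the cross-row work aims to relax. A minimal sketch (row key, family, and values are hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowAtomicity {
        static void atomicRowUpdate(HTable table) throws IOException {
            Put put = new Put(Bytes.toBytes("PRODUCT_KEY"));
            put.add(Bytes.toBytes("cf1"),
                    Bytes.toBytes("VAR_1_SC_0012_colorCd"), Bytes.toBytes("indigo"));
            put.add(Bytes.toBytes("cf1"),
                    Bytes.toBytes("VAR_1_SC_0012_price"), Bytes.toBytes("29.99"));
            table.put(put);  // atomic for this row; no cross-row guarantee
        }
    }
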
Slide 25:
  - Orthogonal access patterns: optimize for the most frequently used pattern
  - Filters may suffice, with early-out configurations, but they impact CPU usage
  - Duplicating data for every access pattern is too drastic: too much effort to keep all copies in sync

Slide 26:
  - Rebuild from source data: takes time, but no data loss
  - Export/import based backups: faster, but stale; another MR job on the live cluster
  - Better options in future releases

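The export path uses the stock MapReduce tools, invoked roughly like this (table name and HDFS paths are placeholders):

    # Dump a table to HDFS, then restore it into a (pre-created) table.
    hbase org.apache.hadoop.hbase.mapreduce.Export catalog /backups/catalog-20120522
    hbase org.apache.hadoop.hbase.mapreduce.Import catalog /backups/catalog-20120522

Since Export is itself a MapReduce scan over the table, it competes with live traffic, which is the "another MR on live cluster" caveat above.
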
Slide 27: We're hiring! http://www.gapinc.com