HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

HBase brings interactivity to Hadoop and allows users to collect, manage and process data in real time. Lily wraps HBase and Solr in a comprehensive Big Data platform, with HBase-native secondary indexing complementing ad-hoc structured search. Through spare write-cycles during read operations, Lily transforms HBase into a scalable data management engine providing interactive analytics, profile harvesting and real-time recommendations. This talk highlights the architecture of Lily, shows how it completes HBase, and explains some of its implementation use cases.



  1. Making Sense of Data. Lily goes shopping: real-time recommendations with HBase. HBaseCon, May 2012. Steven Noels, VP Product, @stevenn. WWW.NGDATA.COM
  2. Lily Core: 2’ recap
     • HBase-backed data repository, with batteries included
     • Data model:
       • high-level data model on top of HBase’s byte[]’s
       • schema
       • versioning (schema and data)
       • links, variants
     • Java & REST APIs
     • Indexing:
       • through configuration, not implementation
       • incremental and batch index maintenance
     • RowLog: distributed, durable queue for secondary actions
     • Open Source (Apache License)
     [Diagram labels: client app, Lily, RowLog, HBase, Solr et al.]
  3. Why HBase?
     • BigTable model
     • sparseness
     • atomic row updates aka consistency
     • auto-partitioning
     • Apache license
     • A great community led by a Saint :-)
  4. Portfolio Overview [layered diagram; labels in source order]
     • Real-time AI Recommendations; Industry algorithms and rules [label: commercial availability]
     • Trend Analytics; Pattern Detection; Profile Development; Context and Activity Tracking [label: open source]
     • Social Stream Ingestion; Schema and Data Management; Total Data Aggregation; Real-time Index and Retrieval; Security and Enterprise Connectors
  5. Lily (=HBase) In Use
     Some of the larger Lily deployments:
     • media
       • aggregation, database publishing and online archives
     • finance
       • real-time identity fraud detection
     • retail banking
       • contextualized (time+loc+person) mobile coupons
     • retail
       • e-commerce platform: product catalog, consumer data store, real-time indexing
  6. Collaborative Filtering? Recommend items similar to a user’s highly-preferred items
  7. Collaborative Filtering is … Matrixes
     • Sean likes “Scarface” a lot: (123,654,5.0)
     • Robin likes “Scarface” somewhat: (789,654,3.0)
     • Grant likes “The Notebook” not at all: (345,876,1.0)
     • …
     • (Magic)
     • Grant may like “Scarface” quite a bit: (345,654,4.5)
     • …
  8. Contextualized recommendations [diagram labels: Personalized offers; shops & merchants; Profile; Activity; Item; product families; offers/coupons; credit card statements]
  9. Fitting Recommendations into the Lily Architecture
     [Architecture diagram; recoverable labels: LILY CRUD API; Lily/HBase secondary indexes; read/write demultiplexer; co-occurrence matrix; lookup; rowlog; activity store; data store; profile store; LILY recommender engine; data, activity, profile indexes; scoring; algorithm support: propensity, k-means, ALS, custom, ...]
     [Overlaid contact card: Steven Noels, Gent (Belgium), telephone: +32 9 33 88 220, Makers of Lily Core Repository]
  10. Preferencing aka Feeding the Matrix
      • Transaction-based preferencing
        • Pluggable preference strategies, using Lily-based data (HBase & Solr) for decision making
        • e.g. credit card statement = transactions between users and product families
        • Preference weighting
        • Ingest: REST API, bulk support
        • Real-time updating of the recommendation model
      • Profile Store
        • Profile activities can be preferenced
        • Support for Profile behavior analysis
  11. Making recommendations
      • Recommender
        • Pluggable recommender strategies, using Lily-based data (HBase & Solr) for decision making
        • Multi-model support: user-item & item-user recommendations
        • Estimation of both preferenced and non-preferenced items
        • Geolocation-based recommendations
        • Re-scoring
        • REST API
      • (Planned)
        • Support for Classifications (scenario: recommend me all (possible) coffee drinkers)
        • Matrix / recommendation indexing
  12. Other upcoming Lily Features
      • Secondary indexes (= Lily Core!)
        • indexes are defined through configuration
        • single or multi-field indexes
        • range queries and prefix queries
        • asc or desc sorted results
        • can read huge, sorted lists
        • synchronously updated: index updates are applied by rowlog secondary actions
        • online building of new indexes (no table locks)
        • MapReduce integration
      • SolrCloud integration
        • Index shards and configuration managed through ZooKeeper
  13. Making Sense of Data. Questions? Thank you!
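
Slide 7 presents collaborative filtering as a matrix fed by triples, and slide 9 mentions a co-occurrence matrix. The following is a minimal in-memory sketch of that idea, not Lily's implementation. It assumes the triples read as (user id, item id, preference). The first three triples come from slide 7; the others, including item id 111, are made up so that the users overlap. The function names are hypothetical. The resulting score is an unnormalized ranking value, not a predicted rating like the 4.5 on the slide.

```python
from collections import defaultdict

# (user_id, item_id, preference) triples.
PREFERENCES = [
    (123, 654, 5.0),  # slide 7: Sean, "Scarface"
    (789, 654, 3.0),  # slide 7: Robin, "Scarface"
    (345, 876, 1.0),  # slide 7: Grant, "The Notebook"
    (123, 111, 4.0),  # hypothetical extra data (item 111 is invented)
    (789, 111, 2.0),  # hypothetical extra data
    (345, 111, 4.5),  # hypothetical extra data
]


def build_cooccurrence(prefs):
    """Group preferences per user and count how often two items share a user."""
    by_user = defaultdict(dict)
    for user, item, value in prefs:
        by_user[user][item] = value

    cooc = defaultdict(lambda: defaultdict(int))
    for items in by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    cooc[a][b] += 1
    return by_user, cooc


def recommend(user, by_user, cooc, top_n=3):
    """Score unseen items by co-occurrence count times the user's preference."""
    seen = by_user.get(user, {})
    scores = defaultdict(float)
    for item, value in seen.items():
        for other, count in cooc[item].items():
            if other not in seen:
                scores[other] += count * value
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


by_user, cooc = build_cooccurrence(PREFERENCES)
print(recommend(345, by_user, cooc))
```

With this sample data, user 345 (Grant) is offered item 654 ("Scarface") because two other users who share item 111 with him also preferred it.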
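
Slide 10 describes transaction-based preferencing, where a credit card statement becomes transactions between users and product families, followed by preference weighting. The slides do not say how the weighting is done, so the sketch below shows one hypothetical strategy: count transactions per (user, product family), dampen the counts with a logarithm, and scale them to a 0 to 5 range relative to each user's most frequent family. The function name, the tuple layout and the sample data are all assumptions for illustration.

```python
import math
from collections import Counter


def transactions_to_preferences(transactions, max_pref=5.0):
    """Turn (user, product_family, amount) transactions into preference weights.

    Hypothetical strategy: log-damped transaction counts, normalized per user.
    The amount is ignored here; a different strategy could use it.
    """
    counts = Counter((user, family) for user, family, _amount in transactions)

    # Highest transaction count per user, used as the normalization base.
    per_user_max = {}
    for (user, _family), n in counts.items():
        per_user_max[user] = max(per_user_max.get(user, 0), n)

    prefs = {}
    for (user, family), n in counts.items():
        weight = math.log1p(n) / math.log1p(per_user_max[user])
        prefs[(user, family)] = round(max_pref * weight, 2)
    return prefs


# Made-up statement lines: (user, product family, amount).
statement = [
    ("u1", "coffee", 3.5),
    ("u1", "coffee", 4.0),
    ("u1", "coffee", 3.0),
    ("u1", "books", 20.0),
    ("u2", "books", 15.0),
]
for key, pref in sorted(transactions_to_preferences(statement).items()):
    print(key, pref)
```

The output of such a step has the same (user, item, preference) shape as the triples on slide 7, which is what lets it feed the matrix.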
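
Slide 12 lists range queries, prefix queries and ascending or descending results for the secondary indexes. These all follow from keeping index entries sorted, in the same way HBase keeps row keys sorted. The sketch below is an in-memory illustration using Python's `bisect` module; the `SortedIndex` class and its methods are hypothetical and are not the Lily or HBase API.

```python
import bisect


class SortedIndex:
    """Toy secondary index: a sorted list of (field_value, row_key) entries."""

    def __init__(self):
        self._entries = []

    def put(self, value, row_key):
        # insort keeps the list ordered by value, then by row key.
        bisect.insort(self._entries, (value, row_key))

    def remove(self, value, row_key):
        i = bisect.bisect_left(self._entries, (value, row_key))
        if i < len(self._entries) and self._entries[i] == (value, row_key):
            del self._entries[i]

    def range(self, low, high, descending=False):
        """Row keys whose value v satisfies low <= v < high."""
        # A 1-tuple sorts before any 2-tuple with the same first element.
        start = bisect.bisect_left(self._entries, (low,))
        stop = bisect.bisect_left(self._entries, (high,))
        keys = [row_key for _value, row_key in self._entries[start:stop]]
        return keys[::-1] if descending else keys

    def prefix(self, prefix):
        """Row keys whose string value starts with the given prefix."""
        start = bisect.bisect_left(self._entries, (prefix,))
        keys = []
        for value, row_key in self._entries[start:]:
            if not value.startswith(prefix):
                break  # sorted order: no later entry can match
            keys.append(row_key)
        return keys


index = SortedIndex()
for value, row_key in [("banana", "r3"), ("apple", "r1"),
                       ("cherry", "r4"), ("apricot", "r2")]:
    index.put(value, row_key)

print(index.range("apricot", "cherry"))
print(index.range("apricot", "cherry", descending=True))
print(index.prefix("ap"))
```

Because both query types reduce to finding a start position and reading forward, they stay cheap even when the sorted list is very long, which matches the slide's point about reading huge, sorted lists.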