HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Cloudera, Inc.
Making Sense of Data




        Lily goes shopping –
real-time recommendations with HBase
                         HBaseCon, May 2012




         Steven Noels – VP Product – @stevenn


                             WWW.NGDATA.COM
Lily Core 2’ recap
•  HBase-backed data repository, with batteries included
•  Data model:
    •  high-level data model on top of HBase’s byte[]’s
    •  schema
    •  versioning (schema and data)
    •  links, variants
•  Java & REST APIs
•  Indexing:
    •  through configuration, not implementation
    •  incremental and batch index maintenance
•  RowLog: distributed, durable queue for secondary actions
•  Open Source: www.lilyproject.org (Apache License)

[Diagram, top to bottom: client app; Lily; RowLog; HBase and Solr et al.]


Why HBase?
•  BigTable model
•  sparseness
•  atomic row updates aka consistency
•  auto-partitioning
•  Apache license
•  A great community led by a Saint :-)




Portfolio Overview

Real-time AI
Recommendations
Industry algorithms and rules

Trend Analytics
Pattern Detection

Profile Development
Context and Activity Tracking
Social Stream Ingestion

Schema and Data Management
Total Data Aggregation
Real-time Index and Retrieval
Security and Enterprise Connectors

[Diagram labels: "commercial availability" (upper part), "open source" (lower part)]




Lily (=HBase) In Use
Some of the larger Lily deployments

•  media
    •  aggregation, database publishing and online archives
•  finance
     •  real-time identity fraud detection
•  retail banking
     •  contextualized (time+loc+person) mobile coupons
•  retail
    •  e-commerce platform: product catalog, consumer data store, real-time indexing




Collaborative Filtering?

  Recommend items similar to a user’s highly-preferred items




Collaborative Filtering is … Matrixes


   Sean likes “Scarface” a lot             (123,654,5.0)
   Robin likes “Scarface” somewhat         (789,654,3.0)
   Grant likes “The Notebook” not at all   (345,876,1.0)
   …                                       …

                  (Magic)

   Grant may like “Scarface” quite a bit   (345,654,4.5)
   …                                       …
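
The "(Magic)" step can be sketched in a few lines. What follows is a minimal illustration of item-based collaborative filtering over (user, item, preference) triples, not the algorithm Lily actually uses: the first three triples come from the slide, the rest are made-up values added so that the items share some raters, and the function names `by_item`, `cosine` and `estimate` are assumptions for this example. Because of the made-up data, the estimate it produces will not match the 4.5 shown on the slide.

```python
from collections import defaultdict
from math import sqrt

# (user_id, item_id, preference) triples. The first three are from the slide;
# the others are hypothetical values so that items have raters in common.
PREFS = [
    (123, 654, 5.0),
    (789, 654, 3.0),
    (345, 876, 1.0),
    (123, 876, 2.0),  # hypothetical
    (123, 111, 4.5),  # hypothetical
    (789, 111, 3.5),  # hypothetical
    (345, 111, 4.0),  # hypothetical
]


def by_item(prefs):
    """Index preferences as item -> {user: value}."""
    items = defaultdict(dict)
    for user, item, value in prefs:
        items[item][user] = value
    return dict(items)


def cosine(a, b):
    """Cosine similarity of two items over the users who rated both."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)


def estimate(prefs, user, target_item):
    """Estimate a preference as the similarity-weighted average of the
    user's known preferences for other items. Returns None when there
    is nothing to base an estimate on."""
    items = by_item(prefs)
    target = items.get(target_item, {})
    num = den = 0.0
    for item, ratings in items.items():
        if item == target_item or user not in ratings:
            continue
        sim = cosine(target, ratings)
        num += sim * ratings[user]
        den += sim
    return num / den if den else None
```

Calling `estimate(PREFS, 345, 654)` fills in the missing cell for user 345 and item 654; a user with no known preferences yields `None`.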



Contextualized recommendations


[Diagram labels: Personalized offers; Profile; Activity; Item (shops & merchants, product families, offers/coupons); credit card statements]

Fitting Recommendations into the Lily Architecture

[Architecture diagram labels: LILY CRUD API; read/write demultiplexer; Lily/HBase Secondary Indexes; rowlog; data store; profile store; activity store; indexes; data, activity, profile scoring; Lily Core Repository; LILY recommender engine; co-occurrence lookup matrix; algorithm support: ALS, k-means, propensity, custom ...]

Steven Noels
stevenn@ngdata.com
www.ngdata.com
telephone: +32 9 33 88 220
Gent (Belgium)
Makers of

Preferencing aka Feeding the Matrix
•  Transaction-based preferencing
    •  Pluggable preference strategies, using Lily-based data (HBase & Solr) for decision making
        •  e.g. credit card statement = transactions between users and product families
    •  Preference weighting
    •  Ingest: REST API, bulk support
    •  Real-time updating of the recommendation model

•  Profile Store
    •  Profile activities can be preferenced
    •  Support for Profile behavior analysis
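
As a rough illustration of pluggable preference strategies with weighting, the sketch below turns credit card transactions into (user, product family) preference values. It is only an assumption of how such a plug-in point could look: the names `count_strategy`, `capped_amount_strategy` and `preference_matrix`, the cap of 100, and the sample transactions are all hypothetical and not part of Lily's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Transaction = Tuple[str, str, float]   # (user, product_family, amount)
Strategy = Callable[[float], float]    # maps a transaction amount to a weight


def count_strategy(amount: float) -> float:
    """Every transaction counts the same, regardless of its amount."""
    return 1.0


def capped_amount_strategy(amount: float) -> float:
    """Weight by amount, capped at 100 so one large purchase cannot dominate."""
    return min(amount, 100.0) / 100.0


def preference_matrix(
    transactions: Iterable[Transaction], strategy: Strategy
) -> Dict[Tuple[str, str], float]:
    """Accumulate weighted preferences per (user, product_family) cell."""
    matrix: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, family, amount in transactions:
        matrix[(user, family)] += strategy(amount)
    return dict(matrix)


# Hypothetical statement lines for one user.
STATEMENT = [
    ("u1", "coffee", 4.0),
    ("u1", "coffee", 6.0),
    ("u1", "books", 250.0),
]
```

Swapping the strategy changes what the matrix says: with `count_strategy` the two coffee purchases outweigh the single book purchase, while `capped_amount_strategy` ranks the book purchase higher.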



Making recommendations
•  Recommender
    •  Pluggable recommender strategies, using Lily-based data (HBase & Solr) for decision making
    •  Multi-model support: user-item & item-user recommendations
    •  Estimation of both preferenced and non-preferenced items
    •  Geolocation-based recommendations
    •  Re-scoring
    •  REST API

•  (Planned)
    •  Support for Classifications (scenario: recommend me all (possible) coffee drinkers)
    •  Matrix / recommendation indexing
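
Re-scoring and geolocation-based recommendations can be combined in one step. The sketch below is one possible reading, assuming that each candidate carries a score and a location: it drops candidates beyond a radius and scales the remaining scores linearly with distance. The function names, the 5 km default radius and the linear decay are assumptions for illustration, not Lily's behaviour.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rescore_by_distance(candidates, user_pos, max_km=5.0):
    """Re-score (item, score, (lat, lon)) candidates for a user position.

    Candidates farther away than max_km are dropped; the rest have their
    score scaled down linearly with distance. Returns (item, score) pairs,
    best first.
    """
    rescored = []
    for item, score, (lat, lon) in candidates:
        dist = haversine_km(user_pos[0], user_pos[1], lat, lon)
        if dist <= max_km:
            rescored.append((item, score * (1.0 - dist / max_km)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

A candidate with a high base score but a location tens of kilometres away is filtered out, so a nearby offer with a lower base score can end up on top.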


Other upcoming Lily Features
•  Secondary indexes (= Lily Core!)
    •  indexes are defined through configuration
    •  single or multi-field indexes
    •  range queries and prefix queries
    •  asc or desc sorted results
    •  can read huge, sorted lists
    •  synchronously updated: index updates are applied by rowlog
       secondary actions
    •  online building of new indexes (no table locks)
    •  MapReduce integration


•  SolrCloud integration
    •  Index shards and configuration managed through ZooKeeper
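
Range queries, prefix queries and sorted results all follow from keeping index entries in sorted order, which is what HBase does with row keys. The in-memory sketch below illustrates the idea with a sorted list; it is a stand-in for explanation only, and the `SortedIndex` class and its methods are hypothetical, not the Lily secondary index API.

```python
import bisect


class SortedIndex:
    """Single-field index kept as a sorted list of (value, record_id)."""

    def __init__(self):
        self._entries = []

    def put(self, value, record_id):
        """Insert an entry while keeping the list sorted."""
        bisect.insort(self._entries, (value, record_id))

    def range(self, low, high, descending=False):
        """Record IDs whose value v satisfies low <= v < high."""
        # A 1-tuple (x,) sorts before every (x, record_id), so bisect_left
        # finds the first entry whose value is >= x.
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        ids = [record_id for _, record_id in self._entries[start:end]]
        return ids[::-1] if descending else ids

    def prefix(self, prefix, descending=False):
        """Record IDs whose string value starts with the given prefix.

        Simplification: uses U+FFFF as an upper bound, so values that
        continue with characters at or above U+FFFF are not matched.
        """
        return self.range(prefix, prefix + "\uffff", descending)
```

A prefix query is just a range query with a derived upper bound, and descending order is the same slice read backwards, which is why one sorted structure can serve all three features.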



Making Sense of Data




Questions? Thank you!




               WWW.NGDATA.COM
1 of 13

Recommended

HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures by
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresHBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index StructuresCloudera, Inc.
4.1K views16 slides
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro by
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroCloudera, Inc.
5.5K views38 slides
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems by
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.
6.1K views35 slides
Content Identification using HBase by
Content Identification using HBaseContent Identification using HBase
Content Identification using HBaseHBaseCon
3.8K views16 slides
A Survey of HBase Application Archetypes by
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesHBaseCon
20K views60 slides
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data by
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataCloudera, Inc.
3.5K views17 slides

More Related Content

What's hot

Building a Hadoop Data Warehouse with Impala by
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
2K views37 slides
Building a Business on Hadoop, HBase, and Open Source Distributed Computing by
Building a Business on Hadoop, HBase, and Open Source Distributed ComputingBuilding a Business on Hadoop, HBase, and Open Source Distributed Computing
Building a Business on Hadoop, HBase, and Open Source Distributed ComputingBradford Stephens
42.2K views89 slides
Design Patterns for Building 360-degree Views with HBase and Kiji by
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and KijiHBaseCon
4.3K views37 slides
Engineering practices in big data storage and processing by
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processingSchubert Zhang
1.6K views54 slides
HBase Status Report - Hadoop Summit Europe 2014 by
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
1.1K views40 slides
What database by
What databaseWhat database
What databaseRegunath B
3.2K views21 slides

What's hot(20)

Building a Hadoop Data Warehouse with Impala by huguk
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk2K views
Building a Business on Hadoop, HBase, and Open Source Distributed Computing by Bradford Stephens
Building a Business on Hadoop, HBase, and Open Source Distributed ComputingBuilding a Business on Hadoop, HBase, and Open Source Distributed Computing
Building a Business on Hadoop, HBase, and Open Source Distributed Computing
Bradford Stephens42.2K views
Design Patterns for Building 360-degree Views with HBase and Kiji by HBaseCon
Design Patterns for Building 360-degree Views with HBase and KijiDesign Patterns for Building 360-degree Views with HBase and Kiji
Design Patterns for Building 360-degree Views with HBase and Kiji
HBaseCon4.3K views
Engineering practices in big data storage and processing by Schubert Zhang
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
Schubert Zhang1.6K views
HBase Status Report - Hadoop Summit Europe 2014 by larsgeorge
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge1.1K views
What database by Regunath B
What databaseWhat database
What database
Regunath B3.2K views
Apache Drill by Ted Dunning
Apache DrillApache Drill
Apache Drill
Ted Dunning17.7K views
New Security Features in Apache HBase 0.98: An Operator's Guide by HBaseCon
New Security Features in Apache HBase 0.98: An Operator's GuideNew Security Features in Apache HBase 0.98: An Operator's Guide
New Security Features in Apache HBase 0.98: An Operator's Guide
HBaseCon10.6K views
Cloudera Impala: A Modern SQL Engine for Apache Hadoop by Cloudera, Inc.
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera, Inc.5K views
HBase and Impala Notes - Munich HUG - 20131017 by larsgeorge
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
larsgeorge10.2K views
Architecting Applications with Hadoop by markgrover
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
markgrover765 views
Impala: Real-time Queries in Hadoop by Cloudera, Inc.
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
Cloudera, Inc.12.6K views
In Search of Database Nirvana: Challenges of Delivering HTAP by HBaseCon
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAP
HBaseCon1.6K views
An introduction to apache drill presentation by MapR Technologies
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
MapR Technologies2.7K views
Application architectures with hadoop – big data techcon 2014 by Jonathan Seidman
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
Jonathan Seidman2.3K views
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase by HBaseCon
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon3.3K views
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize... by Data Con LA
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Data Con LA974 views
SQL Engines for Hadoop - The case for Impala by markgrover
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover1.2K views
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0 by Adam Muise
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.02013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Adam Muise3.1K views

Viewers also liked

HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget by
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
3.1K views26 slides
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase by
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase Cloudera, Inc.
4.6K views23 slides
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th... by
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...Cloudera, Inc.
3.4K views8 slides
HBaseCon 2012 | Real-time Analytics with HBase - Sematext by
HBaseCon 2012 | Real-time Analytics with HBase - SematextHBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextCloudera, Inc.
8K views40 slides
HBaseCon 2013: Scalable Network Designs for Apache HBase by
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseCloudera, Inc.
5.7K views47 slides
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W... by
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...Cloudera, Inc.
5.8K views27 slides

Viewers also liked(20)

HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget by Cloudera, Inc.
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
Cloudera, Inc.3.1K views
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase by Cloudera, Inc.
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
Cloudera, Inc.4.6K views
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th... by Cloudera, Inc.
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
Cloudera, Inc.3.4K views
HBaseCon 2012 | Real-time Analytics with HBase - Sematext by Cloudera, Inc.
HBaseCon 2012 | Real-time Analytics with HBase - SematextHBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
Cloudera, Inc.8K views
HBaseCon 2013: Scalable Network Designs for Apache HBase by Cloudera, Inc.
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBase
Cloudera, Inc.5.7K views
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W... by Cloudera, Inc.
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
Cloudera, Inc.5.8K views
HBaseCon 2013: Full-Text Indexing for Apache HBase by Cloudera, Inc.
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
Cloudera, Inc.7.3K views
HBaseCon 2012 | HBase, the Use Case in eBay Cassini by Cloudera, Inc.
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
Cloudera, Inc.6.1K views
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data... by Cloudera, Inc.
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
Cloudera, Inc.3.5K views
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural... by Cloudera, Inc.
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
Cloudera, Inc.8.8K views
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc... by Cloudera, Inc.
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
HBaseCon 2013: Streaming Data into Apache HBase using Apache Flume: Experienc...
Cloudera, Inc.9.3K views
HBaseCon 2013: Near Real Time Indexing for eBay Search by Cloudera, Inc.
HBaseCon 2013: Near Real Time Indexing for eBay SearchHBaseCon 2013: Near Real Time Indexing for eBay Search
HBaseCon 2013: Near Real Time Indexing for eBay Search
Cloudera, Inc.5.9K views
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera by Cloudera, Inc.
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaHBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, Cloudera
Cloudera, Inc.5.5K views
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce by Cloudera, Inc.
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
Cloudera, Inc.41.7K views
HBase for Dealing with Large Matrices by gcapan
HBase for Dealing with Large MatricesHBase for Dealing with Large Matrices
HBase for Dealing with Large Matrices
gcapan612 views
20130404 emacs conf 2013 sketchnotes by Sacha Chua
20130404 emacs conf 2013 sketchnotes20130404 emacs conf 2013 sketchnotes
20130404 emacs conf 2013 sketchnotes
Sacha Chua2.3K views
Quantified Awesome: Tracking Clothes, Groceries, and Other Small Things by Sacha Chua
Quantified Awesome: Tracking Clothes, Groceries, and Other Small ThingsQuantified Awesome: Tracking Clothes, Groceries, and Other Small Things
Quantified Awesome: Tracking Clothes, Groceries, and Other Small Things
Sacha Chua6.4K views
Emacs Modes I can't work without by Hitesh Sharma
Emacs Modes I can't work withoutEmacs Modes I can't work without
Emacs Modes I can't work without
Hitesh Sharma2.4K views

Similar to HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Use of EMR for Marketing Segmentation by
Use of EMR for Marketing SegmentationUse of EMR for Marketing Segmentation
Use of EMR for Marketing SegmentationAmazon Web Services
958 views21 slides
Streaming Hadoop for Enterprise Adoption by
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionDATAVERSITY
829 views16 slides
Common MongoDB Use Cases by
Common MongoDB Use CasesCommon MongoDB Use Cases
Common MongoDB Use CasesDATAVERSITY
11K views22 slides
Slash n: Tech Talk Track 1 – Art and Science of Cataloguing - Utkarsh by
Slash n: Tech Talk Track 1 – Art and Science of Cataloguing - UtkarshSlash n: Tech Talk Track 1 – Art and Science of Cataloguing - Utkarsh
Slash n: Tech Talk Track 1 – Art and Science of Cataloguing - Utkarshslashn
3.1K views25 slides
Introducing the Big Data Ecosystem with Caserta Concepts & Talend by
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & TalendCaserta
3.1K views25 slides
Bigdata antipatterns by
Bigdata antipatternsBigdata antipatterns
Bigdata antipatternsAnurag S
217 views46 slides

Similar to HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata(20)

Streaming Hadoop for Enterprise Adoption by DATAVERSITY
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise Adoption
DATAVERSITY829 views
Common MongoDB Use Cases by DATAVERSITY
Common MongoDB Use CasesCommon MongoDB Use Cases
Common MongoDB Use Cases
DATAVERSITY11K views
Slash n: Tech Talk Track 1 – Art and Science of Cataloguing - Utkarsh by slashn
Slash n: Tech Talk Track 1 – Art and Science of Cataloguing - UtkarshSlash n: Tech Talk Track 1 – Art and Science of Cataloguing - Utkarsh
Slash n: Tech Talk Track 1 – Art and Science of Cataloguing - Utkarsh
slashn3.1K views
Introducing the Big Data Ecosystem with Caserta Concepts & Talend by Caserta
Introducing the Big Data Ecosystem with Caserta Concepts & TalendIntroducing the Big Data Ecosystem with Caserta Concepts & Talend
Introducing the Big Data Ecosystem with Caserta Concepts & Talend
Caserta 3.1K views
Bigdata antipatterns by Anurag S
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
Anurag S217 views
Common MongoDB Use Cases Webinar by MongoDB
Common MongoDB Use Cases WebinarCommon MongoDB Use Cases Webinar
Common MongoDB Use Cases Webinar
MongoDB752 views
Next Generation Data Platforms - Deon Thomas by Thoughtworks
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
Thoughtworks3.2K views
Combining Hadoop RDBMS for Large-Scale Big Data Analytics by DataWorks Summit
Combining Hadoop RDBMS for Large-Scale Big Data AnalyticsCombining Hadoop RDBMS for Large-Scale Big Data Analytics
Combining Hadoop RDBMS for Large-Scale Big Data Analytics
DataWorks Summit12.1K views
Millions quotes per second in pure java by Roman Elizarov
Millions quotes per second in pure javaMillions quotes per second in pure java
Millions quotes per second in pure java
Roman Elizarov5.1K views
The Microsoft BigData Story by Lynn Langit
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
Lynn Langit3.5K views
7 Databases in 70 minutes by Karen Lopez
7 Databases in 70 minutes7 Databases in 70 minutes
7 Databases in 70 minutes
Karen Lopez4.8K views
2011 - TDWI Big Data Forum - The New Analytics by Casey Kiernan
2011 - TDWI Big Data Forum - The New Analytics 2011 - TDWI Big Data Forum - The New Analytics
2011 - TDWI Big Data Forum - The New Analytics
Casey Kiernan457 views
SRV307 Applying AWS Purpose-Built Database Strategy: Match Your Workload to ... by Amazon Web Services
 SRV307 Applying AWS Purpose-Built Database Strategy: Match Your Workload to ... SRV307 Applying AWS Purpose-Built Database Strategy: Match Your Workload to ...
SRV307 Applying AWS Purpose-Built Database Strategy: Match Your Workload to ...
Big Data Paris : Hadoop and NoSQL by Tugdual Grall
Big Data Paris : Hadoop and NoSQLBig Data Paris : Hadoop and NoSQL
Big Data Paris : Hadoop and NoSQL
Tugdual Grall1.1K views
Processing Big Data by cwensel
Processing Big DataProcessing Big Data
Processing Big Data
cwensel817 views
No Sql Movement by Ajit Koti
No Sql MovementNo Sql Movement
No Sql Movement
Ajit Koti600 views
How we use Hive at SnowPlow, and how the role of HIve is changing by yalisassoon
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changing
yalisassoon3.4K views
Big Data with Not Only SQL by Philippe Julio
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
Philippe Julio21.2K views

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx by
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
107 views55 slides
Cloudera Data Impact Awards 2021 - Finalists by
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
6.4K views34 slides
2020 Cloudera Data Impact Awards Finalists by
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
6.3K views43 slides
Edc event vienna presentation 1 oct 2019 by
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
4.5K views67 slides
Machine Learning with Limited Labeled Data 4/3/19 by
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
3.6K views36 slides
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 by
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
2.5K views21 slides

More from Cloudera, Inc.(20)

Partner Briefing_January 25 (FINAL).pptx by Cloudera, Inc.
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.107 views
Cloudera Data Impact Awards 2021 - Finalists by Cloudera, Inc.
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.6.4K views
2020 Cloudera Data Impact Awards Finalists by Cloudera, Inc.
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.6.3K views
Edc event vienna presentation 1 oct 2019 by Cloudera, Inc.
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.4.5K views
Machine Learning with Limited Labeled Data 4/3/19 by Cloudera, Inc.
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

  • 1. Making Sense of Data Lily goes shopping – real-time recommendations with HBase HBaseCon, May 2012 Steven Noels – VP Product – @stevenn WWW.NGDATA.COM
  • 2. Lily Core 2’ recap •  HBase-backed data repository, with batteries included •  Data model: •  high-level data model on top of HBase’s byte[]’s •  schema •  versioning (schema and data) •  links, variants •  Java & REST APIs •  Indexing: •  through configuration, not implementation •  incremental and batch index maintenance •  RowLog: distributed, durable queue for secondary actions •  Open Source: www.lilyproject.org (Apache License) [Diagram labels: client app; Lily; RowLog; HBase; Solr et al.]
  • 3. Why HBase? •  BigTable model •  sparseness •  atomic row updates aka consistency •  auto-partitioning •  Apache license •  A great community led by a Saint :-)
  • 4. Portfolio Overview Real-time AI Recommendations Industry algorithms and rules [commercial availability] Trend Analytics Pattern Detection Profile Development Context and Activity Tracking [open source] Social Stream Ingestion Schema and Data Management Total Data Aggregation Real-time Index and Retrieval Security and Enterprise Connectors
  • 5. Lily (=HBase) In Use Some of the larger Lily deployments •  media •  aggregation, database publishing and online archives •  finance •  real-time identity fraud detection •  retail banking •  contextualized (time+loc+person) mobile coupons •  retail •  e-commerce platform: product catalog, consumer data store, real-time indexing
  • 6. Collaborative Filtering? Recommend items similar to a user’s highly-preferred items
  • 7. Collaborative Filtering is … Matrixes •  Sean likes “Scarface” a lot (123,654,5.0) •  Robin likes “Scarface” somewhat (789,654,3.0) •  Grant likes “The Notebook” not at all (345,876,1.0) •  … •  (Magic) •  Grant may like “Scarface” quite a bit (345,654,4.5) •  …
  • 8. Contextualized recommendations Personalized offers shops & merchants Profile Activity Item product families offers/coupons credit card statements
  • 9. Fitting Recommendations into the Lily Architecture [Architecture diagram; labels: LILY CRUD API; Lily/HBase Secondary Indexes; read/write demultiplexer; co-occurrence lookup matrix; rowlog; activity store; LILY recommender engine; data store; profile store; data, activity, profile indexes; propensity scoring; algorithm support: custom ..., k-means, ALS] Contact: Steven Noels, stevenn@ngdata.com, www.ngdata.com, telephone: +32 9 33 88 220, Gent (Belgium), Makers of Lily Core Repository
  • 10. Preferencing aka Feeding the Matrix •  Transaction-based preferencing •  Pluggable preference strategies, using Lily-based data (HBase&Solr) for decision making •  e.g. credit card statement = transactions between users and product families •  Preference weighting •  Ingest: REST API, bulk support •  Real-time updating of the recommendation model •  Profile Store •  Profile activities can be preferenced •  Support for Profile behavior analysis
  • 11. Making recommendations •  Recommender •  Pluggable recommender strategies, using Lily-based data (HBase&Solr) for decision making •  Multi-model support: user-item & item-user recommendations •  Estimation of both preferenced and non-preferenced items •  Geolocation-based recommendations •  Re-scoring •  REST API •  (Planned) •  Support for Classifications (scenario - Recommend me all (possible) coffee drinkers) •  Matrix / recommendation indexing
  • 12. Other upcoming Lily Features •  Secondary indexes (= Lily Core!) •  indexes are defined through configuration •  single or multi-field indexes •  range queries and prefix queries •  asc or desc sorted results •  can read huge, sorted lists •  synchronously updated: index updates are applied by rowlog secondary actions •  online building of new indexes (no table locks) •  MapReduce integration •  SolrCloud integration •  Index shards and configuration managed through ZooKeeper
  • 13. Making Sense of Data Questions? Thank you!
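
Slide 7 lists preferences as (user ID, item ID, preference) triples and leaves the estimation step as "(Magic)". The sketch below shows one possible way to fill in a missing cell, assuming a simple item-based scheme with cosine similarity between item columns. This is an illustration only, not the algorithm used by Lily or shown in the talk, and the function names (`build_item_vectors`, `cosine`, `estimate`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# (user_id, item_id, preference) triples as given on slide 7
ratings = [
    (123, 654, 5.0),  # Sean, "Scarface"
    (789, 654, 3.0),  # Robin, "Scarface"
    (345, 876, 1.0),  # Grant, "The Notebook"
]

def build_item_vectors(triples):
    """Map each item to a sparse {user: preference} column of the matrix."""
    items = defaultdict(dict)
    for user, item, pref in triples:
        items[item][user] = pref
    return items

def cosine(a, b):
    """Cosine similarity between two sparse {user: preference} vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def estimate(triples, user, item):
    """Similarity-weighted average of the user's own preferences.

    Returns None when no item the user has rated is similar to the target.
    """
    items = build_item_vectors(triples)
    target = items.get(item, {})
    num = den = 0.0
    for other, column in items.items():
        if other == item or user not in column:
            continue
        sim = cosine(target, column)
        num += sim * column[user]
        den += abs(sim)
    return num / den if den else None
```

With only the three triples from the slide, `estimate(ratings, 345, 654)` returns `None`: Grant shares no rated item with Sean or Robin, so this scheme has no evidence to work from. The slide's 4.5 presumably comes from a much larger matrix and a different algorithm.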
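
Slide 10 describes transaction-based preferencing, where a credit card statement is read as transactions between users and product families, with preference weighting. A minimal sketch of such a strategy follows, assuming that a preference is the transaction count multiplied by a per-family weight and capped at a maximum. The function name, the weighting rule, and the sample data are all hypothetical; the slides do not specify how Lily weights preferences.

```python
from collections import Counter

def transactions_to_preferences(transactions, family_weights=None, max_pref=5.0):
    """Turn (user, product_family) transactions into (user, family, preference) triples.

    Hypothetical strategy: preference = transaction count * family weight,
    capped at max_pref. Families without an explicit weight default to 1.0.
    """
    family_weights = family_weights or {}
    counts = Counter(transactions)
    triples = []
    for (user, family), n in sorted(counts.items()):
        weight = family_weights.get(family, 1.0)
        triples.append((user, family, min(max_pref, n * weight)))
    return triples

# Made-up statement lines: one (user, product family) pair per transaction
statement = [
    ("alice", "coffee"), ("alice", "coffee"), ("alice", "books"),
    ("bob", "coffee"),
]
prefs = transactions_to_preferences(statement, family_weights={"books": 2.0})
```

The resulting triples have the same (user, item, preference) shape as the matrix entries on slide 7, which is what lets a pluggable strategy like this feed the recommendation model.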
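
Slide 12 lists range queries, prefix queries, and ascending or descending sorted results among the secondary index features. The toy in-memory index below sketches why all three fall out of keeping index entries sorted by value. It is not Lily's API: the `SortedIndex` class and its methods are hypothetical, and a real index would live in HBase tables rather than a Python list.

```python
import bisect

class SortedIndex:
    """Toy secondary index: a sorted list of (field_value, row_id) entries."""

    def __init__(self):
        self._entries = []

    def put(self, value, row_id):
        # insort keeps the list ordered by (value, row_id)
        bisect.insort(self._entries, (value, row_id))

    def range(self, low, high, descending=False):
        """Row ids with low <= value < high, sorted by value."""
        # A 1-tuple (x,) sorts before every (x, row_id), so bisect_left
        # lands on the first entry whose value is >= x.
        start = bisect.bisect_left(self._entries, (low,))
        end = bisect.bisect_left(self._entries, (high,))
        rows = [row for _, row in self._entries[start:end]]
        return rows[::-1] if descending else rows

    def prefix(self, prefix, descending=False):
        """Row ids whose string value starts with prefix.

        Toy upper bound: assumes values contain no character >= U+FFFF.
        """
        return self.range(prefix, prefix + "\uffff", descending)
```

A prefix query is just a range query with a computed upper bound, and descending order is the same scan read backwards, so one sorted structure serves all three features.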