Real-Time BI in Hadoop

Bradford Stephens
Lead Engineer, Visible Technologies
Principal Consultant, Drawn to Scale Consulting
Topics

• Scalability and BI
• Costs and Abilities
• Search as BI
What Is BI?
What Is “Real-Time”?


• Understanding Latency
• We aim for <5 secs.
Scalability in BI

• Scalability matters now
• Social Media: Catalyst
• All data is important
• Data volume no longer scales with business size
Search as BI


• Katta = Distributed Search on Hadoop
• Bobo = Faceted Lucene
Doing it Cheap

• 100 TB, Structured and Unstructured
• Oracle - $100,000,000
• “NewSQL” - $4,000,000
• Hadoop + Katta - $250,000
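
A quick back-of-the-envelope on the figures above, expressed as cost per terabyte. The dollar amounts and the 100 TB size come from the slide; the variable names are illustrative only.

```python
# Cost figures and dataset size as quoted on the slide (USD, terabytes).
DATASET_TB = 100
costs_usd = {
    "Oracle": 100_000_000,
    "NewSQL": 4_000_000,
    "Hadoop + Katta": 250_000,
}

# Cost per terabyte for each option.
cost_per_tb = {name: total / DATASET_TB for name, total in costs_usd.items()}

# How many times more expensive each option is than the cheapest one.
cheapest = min(costs_usd.values())
multiple_of_cheapest = {name: total / cheapest for name, total in costs_usd.items()}

for name in costs_usd:
    print(f"{name}: ${cost_per_tb[name]:,.0f}/TB, "
          f"{multiple_of_cheapest[name]:.0f}x the cheapest option")
```

On these numbers the gap between the most and least expensive option is a factor of 400.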
Why We Need Hadoop

• Need to process high-latency data to get
  the “small stuff” fast
• Robust Ecosystem
• Need more than SQL: an RDBMS is not a Swiss Army knife
Aggregation Is Real-Time

• Distributed Search w/ Katta + Facets =
  Aggregation-Based BI
• Sum, Count, Filter, Avg, Group
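
A minimal sketch of what facet-style aggregation means, written in plain Python rather than against the Katta or Bobo APIs (neither is shown here). The documents, field names, and the `facet_aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, as they might sit in a search index.
docs = [
    {"source": "twitter", "brand": "acme", "mentions": 3},
    {"source": "twitter", "brand": "acme", "mentions": 5},
    {"source": "blog", "brand": "acme", "mentions": 2},
    {"source": "blog", "brand": "zenith", "mentions": 7},
]

def facet_aggregate(docs, group_field, value_field, predicate=lambda d: True):
    """Filter docs, group them by a facet field, and compute count, sum, avg."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0})
    for doc in docs:
        if not predicate(doc):               # Filter
            continue
        bucket = buckets[doc[group_field]]   # Group
        bucket["count"] += 1                 # Count
        bucket["sum"] += doc[value_field]    # Sum
    # Avg is derived from the per-bucket sum and count.
    return {
        key: {**b, "avg": b["sum"] / b["count"]}
        for key, b in buckets.items()
    }

# Mentions of one brand, broken down by source.
result = facet_aggregate(
    docs, "source", "mentions", predicate=lambda d: d["brand"] == "acme"
)
print(result)
```

In a distributed search setup, each shard would compute its own buckets and a merge step would add up the counts and sums before deriving the average.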
Protips: Review

• Understand High vs. Low Latency data
• Hadoop makes it cheap
• Pre-aggregate w/ Hadoop, Explore w/ Katta
  + Faceted Search
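
The last tip can be sketched as two phases: a slow batch pass that rolls raw events up into small summaries, and a fast lookup over those summaries. This is a single-process imitation of the map and reduce steps, not Hadoop or Katta code; the event data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw events: (day, brand, mention_count).
raw_events = [
    ("2010-03-01", "acme", 1),
    ("2010-03-01", "acme", 1),
    ("2010-03-01", "zenith", 1),
    ("2010-03-02", "acme", 1),
]

def map_event(event):
    """Map step: emit a ((day, brand), count) pair for one raw event."""
    day, brand, count = event
    yield (day, brand), count

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Batch phase (high latency): run once over all of the raw data.
rollup = reduce_counts(chain.from_iterable(map_event(e) for e in raw_events))

def mentions_for_brand(rollup, brand):
    """Query phase (low latency): answer from the small rollup only."""
    return {day: n for (day, b), n in rollup.items() if b == brand}

acme_by_day = mentions_for_brand(rollup, "acme")
print(acme_by_day)
```

The design point is that the query phase never touches the raw events, so its cost depends on the size of the rollup rather than the size of the original data.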
The Future


• Search/BI as a Platform: “Google my Data
  Warehouse”
• Real-Time MR on HBase
