Real Time BI with Hadoop

A brief synopsis of using the Apache Hadoop stack to build a Real-Time Business Intelligence application, including data warehousing and search.

Comments
  • I hate to display my ignorance among such an august body, but after downloading the presentation (in .key format) absolutely nothing would recognize it: not PowerPoint, not Safari, not QuickTime, nothing. A quick Google search revealed no plausible alternate application tied to '.key' files, and there doesn't appear to be an independent app called 'Apple Keynote'. What gives?
    -- Confused in Denver
  • I guess you mean NoSQL and not NowSQL, right? ;-)

    Cheers,
    Herbert

Presentation Transcript

  • Real-Time BI in Hadoop • Bradford Stephens • Lead Engineer, Visible Technologies • Principal Consultant, Drawn to Scale Consulting
  • Topics • Scalability and BI • Costs and Abilities • Search as BI
  • What Is BI?
  • What Is “Real-Time”? • Understanding Latency • We aim for <5 secs.
  • Scalability in BI • Scalability matters now • Social Media: Catalyst • All data is important • Data doesn’t scale with business size any more
  • Search as BI • Katta = Distributed Search on Hadoop • Bobo = Faceted Lucene
  • Doing it Cheap • 100 TB, Structured and Unstructured • Oracle: $100,000,000 • “NewSQL”: $4,000,000 • Hadoop + Katta: $250,000
  • Why We Need Hadoop • Need to process high-latency data to get the “small stuff” fast • Robust Ecosystem • Need more than SQL. An RDBMS is not a Swiss Army Knife
  • Aggregation is Real-Time • Distributed Search w/ Katta + Facets = Aggregation-Based BI • Sum, Count, Filter, Avg, Group
  • Protips: Review • Understand High vs. Low Latency data • Hadoop makes it cheap • Pre-aggregate w/ Hadoop, Explore w/ Katta + Faceted Search
  • The Future • Search/BI as a Platform: “Google my Data Warehouse” • Real-Time MR on HBase
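
The "pre-aggregate w/ Hadoop, explore w/ Katta + Faceted Search" idea from the slides can be illustrated with a small sketch. This is not the presenter's implementation and does not use the Hadoop, Katta, or Bobo APIs: plain Python functions stand in for the batch job and for the faceted query, and the event fields (day, brand, sentiment, reach) are hypothetical sample data chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw events (e.g. social-media mentions); all names and values
# here are illustrative assumptions, not data from the presentation.
EVENTS = [
    {"day": "2010-03-01", "brand": "acme", "sentiment": "pos", "reach": 120},
    {"day": "2010-03-01", "brand": "acme", "sentiment": "neg", "reach": 80},
    {"day": "2010-03-01", "brand": "zenith", "sentiment": "pos", "reach": 200},
    {"day": "2010-03-02", "brand": "acme", "sentiment": "pos", "reach": 100},
    {"day": "2010-03-02", "brand": "zenith", "sentiment": "neg", "reach": 50},
]

KEYS = ("day", "brand", "sentiment")


def pre_aggregate(events):
    """Batch step (stand-in for a Hadoop job): roll events up per key tuple."""
    cells = defaultdict(lambda: {"count": 0, "sum": 0})
    for e in events:
        cell = cells[tuple(e[k] for k in KEYS)]
        cell["count"] += 1
        cell["sum"] += e["reach"]
    # Flatten into small "documents" that a search index could hold.
    return [dict(zip(KEYS, key), **vals) for key, vals in cells.items()]


def facet(docs, group_by, **filters):
    """Query step (stand-in for faceted search): filter, group, aggregate."""
    out = defaultdict(lambda: {"count": 0, "sum": 0})
    for d in docs:
        if all(d[f] == v for f, v in filters.items()):
            bucket = out[d[group_by]]
            bucket["count"] += d["count"]
            bucket["sum"] += d["sum"]
    # Avg is derived from the stored count and sum at query time.
    return {k: {**v, "avg": v["sum"] / v["count"]} for k, v in out.items()}


docs = pre_aggregate(EVENTS)
by_brand = facet(docs, "brand", sentiment="pos")
```

Because the heavy roll-up happens in the batch step, the query step only touches the much smaller set of pre-aggregated documents, which is what makes a low-latency Sum, Count, Filter, Avg, or Group plausible at query time.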