Hw09 Real Time Business Intelligence

Published in: Technology, Business

  1. 1. Real-Time BI in Hadoop
       • Bradford Stephens
       • Lead Engineer, Visible Technologies
       • Principal Consultant, Drawn to Scale Consulting
  2. 2. Topics
       • Scalability and BI
       • Costs and Abilities
       • Search as BI
  3. 6. What Is BI?
  4. 8. What Is “Real-Time”?
       • Understanding Latency
       • We aim for < 5 seconds.
  5. 10. Scalability in BI
       • Scalability matters now
       • Social Media: Catalyst
       • All data is important
       • Data doesn’t scale with business size any more
  6. 11. Search as BI
       • Katta = Distributed Search on Hadoop
       • Bobo = Faceted Lucene
  7. 17. Doing it Cheap
       • 100 TB, Structured and Unstructured
       • Oracle: $100,000,000
       • “NewSQL”: $4,000,000
       • Hadoop + Katta: $250,000
  8. 18. Why We Need Hadoop
       • Need to process high-latency data to get the “small stuff” fast
       • Robust Ecosystem
       • Need more than SQL; an RDBMS is not a Swiss Army knife
  9. 19. Aggregation is Real-Time
       • Distributed Search w/ Katta + Facets = Aggregation-Based BI
       • Sum, Count, Filter, Avg, Group
  10. 20. Protips: Review
       • Understand High vs. Low Latency data
       • Hadoop makes it cheap
       • Pre-aggregate w/ Hadoop, Explore w/ Katta + Faceted Search
  11. 21. The Future
       • Search/BI as a Platform: “Google my Data Warehouse”
       • Real-Time MR on HBase
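
The "Aggregation is Real-Time" slide equates distributed search plus facets with aggregation-based BI (Sum, Count, Filter, Avg, Group). The following is a minimal sketch of that idea in plain Python, not the Katta or Bobo API: each shard filters and groups its own documents into partial counts and sums, and a merge step combines the partials. The sample documents, the field names (`source`, `sentiment`), and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each inner list stands in for one index shard.
SHARDS = [
    [
        {"source": "blog", "sentiment": 0.5},
        {"source": "forum", "sentiment": -1.0},
    ],
    [
        {"source": "blog", "sentiment": 1.5},
        {"source": "news", "sentiment": 0.0},
        {"source": "forum", "sentiment": -0.5},
    ],
]


def shard_facets(docs, group_field, value_field, predicate):
    """Filter one shard's documents, then group them into [count, sum] per facet value."""
    partial = defaultdict(lambda: [0, 0.0])
    for doc in docs:
        if predicate(doc):
            bucket = partial[doc[group_field]]
            bucket[0] += 1
            bucket[1] += doc[value_field]
    return partial


def merge_facets(partials):
    """Combine per-shard partials; the average is derived from the merged count and sum."""
    merged = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {
        key: {"count": count, "sum": total, "avg": total / count}
        for key, (count, total) in merged.items()
    }


def facet_query(shards, group_field, value_field, predicate=lambda doc: True):
    """Run the per-shard step on every shard and merge the results."""
    return merge_facets(
        shard_facets(docs, group_field, value_field, predicate) for docs in shards
    )


result = facet_query(SHARDS, "source", "sentiment")
```

Shards exchange counts and sums rather than averages because an average of per-shard averages would be wrong whenever shards hold different numbers of matching documents. In a real deployment the per-shard step would run on separate search nodes; here it runs in a single process to keep the sketch self-contained.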