Big data and mstr bridge the elephant


Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”

Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive, yet nimble business analytics with MicroStrategy.

The Google generation, armed with iPads and Droid phones, brings big, bold ideas on how “Big Data” will solve the new wave of business problems; traditional users know that addressing them requires more than just embracing buzzwords like “sentiment”, “R” and “Hadoop.” Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis, in combination with scalable self-service dashboards, reporting and analytics, is no longer just an idea but a MUST DO.

This informative presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.

Published in: Technology

Transcript of "Big data and mstr bridge the elephant"

  1. Big Data and MicroStrategy: Building a Bridge for the Elephant. Paul Groom, Chief Innovation Officer, Jan 2013
  2. Let’s start at… The End.
  3. Panacea
  4. You…built the EDW
  5. You…built the BICC
  6. and yes you built…lots of cool reports and dashboards
  7. Epilogue: A comfortable status quo
  8. How are you really judged?
     • Fast?
     • Consistent?
     • All users?
  9. Rrrrrriiiiiiinnnnnngggggg! Back to the real world
  10. Disruption
  11. Disruptor: New Data
  12. Disruptor: Social Media & Sentiment
  13. Disruptor: Data ?
  14. Disruptor: More Connected Users
  15. Disruptor: Data Discovery Tools. Choices for engaging quickly with data. Business users’ heads distracted from core BI!
  16. BI Wild West
  17. Where it matters
  18. Lots of variety of DW and EDW
  19. The Reality of the DW analytical workload
  20. EDW says no or not now! …and CFO says no big upgrades
  21. Pragmatism …ok so you enable plenty of caching, limit drill anywhere and add Intelligent Cubes
  22. And then came…
  23. Distraction or Boon http://oris-rake.deviantart.com/
  24. Scalable, resilient, bit bucket
  25. Experimenting © 20th Century Fox
  26. The Hadoop stack: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS
  27. Hadoop Performance Reality
     • Hadoop is batch oriented
     • HDFS access is fast but crude
     • MapReduce is powerful but has overheads
       – ~30 second base response time
       – Too much latency in stack and processing model
       – Trade-off in optimization and latency
     • MapReduce complex
       – Typically multiple Java routines
     https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
  28. SQL to the Rescue
     • So MapReduce is complicated
       – use Hive (SQL) as the easy way out
     (Stack diagram: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS)
  29. Hive
     • Simplifies access
     “Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!”
     • Only basic SQL support
     • Concurrency needs careful system admin
     • It’s not a silver bullet for interactive BI usage
  30. Conclusion: Hadoop just too slow for interactive BI! “while hadoop shines as a processing platform, it is painfully slow as a query tool” …loss of train-of-thought
  31. Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.
     I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.
  32. Why can’t Hadoop be in-memory? Why can’t I have giant icubes?
  33. Remember…
     • Lots of these: Hadoop inherently disk oriented
     • Not so many of these: Typically low ratio of CPU to Disk
  34. Larger cubes. Issues: Time to Populate, Proliferation
  35. Alternative - In-memory Processing
     • Analytics requires CPU, Cores do the work!
     • RAM keeps the data close
     • Scale with the data
  36. Goals: Minimise Disruption, Cut Latency
     • Don’t change the existing BI and analytics
     • Support more creative and dynamic BI
     • Don’t introduce yet more slow disk
       – Help the DW investment
     • No complex ETL, just pull data as required
     • Pull data simply and intelligently from Hadoop
     • Simplify – fewer cubes, caches
     • Improve sharing of data
     • Increase concurrency and throughput
       – It’s all about queries per hour!
     • Minimal DBA requirement
  37. Kognitio Hadoop Connectors
     HDFS Connector
     • Connector defines access to HDFS file system
     • External table accesses row-based data in HDFS
     • Dynamic access or “pin” data into memory
     • Selected HDFS file(s) loaded into memory
     Filter Agent Connector
     • Connector uploads agent to Hadoop nodes
     • Query passes selections and relevant predicates to agent
     • Data filtering and projection takes place locally on each Hadoop node
     • Only data of interest is loaded into memory via parallel load streams
  38. BI – Central Governance
     • Centrally defined data models
     • Persist data in natural store
     • Fetch when needed, agile
     • Available to all tools
     • Analytical power
  39. Engineering for Success. Thomas Herbrich
  40. connect
     • NA: +1 855 KOGNITIO
     • EMEA: +44 1344 300 770
     • www.kognitio.com
     • linkedin.com/companies/kognitio
     • twitter.com/kognitio
     • tinyurl.com/kognitio
     • youtube.com/kognitio
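
The Filter Agent Connector idea on slide 37 (filter and project on each Hadoop node, then load only the rows of interest into memory over parallel streams) can be illustrated with a small Python sketch. This is not Kognitio's implementation or API: the function names `filter_agent` and `load_into_memory`, the sample rows, and the use of a thread pool to stand in for parallel load streams are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def filter_agent(rows: Iterable[Row],
                 columns: List[str],
                 predicate: Callable[[Row], bool]) -> List[Row]:
    """Hypothetical agent: runs 'on the node', keeps rows that match the
    predicate and then keeps only the requested columns (projection)."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def load_into_memory(nodes: List[List[Row]],
                     columns: List[str],
                     predicate: Callable[[Row], bool]) -> List[Row]:
    """One simulated load stream per node; only filtered, projected rows
    are returned to the in-memory store."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        streams = pool.map(
            lambda rows: filter_agent(rows, columns, predicate), nodes)
        # pool.map yields results in node order, so the output is stable.
        return [row for stream in streams for row in stream]


# Made-up data: each inner list stands for the rows held on one Hadoop node.
nodes = [
    [{"region": "EMEA", "product": "A", "sales": 120},
     {"region": "NA", "product": "B", "sales": 75}],
    [{"region": "EMEA", "product": "C", "sales": 40},
     {"region": "APAC", "product": "A", "sales": 200}],
]

in_memory = load_into_memory(
    nodes, ["product", "sales"], lambda r: r["region"] == "EMEA")
print(in_memory)
```

In this toy run only two of the four rows, and two of the three columns, reach the in-memory side, which is the point the slide makes about loading only the data of interest.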