Big Data and MicroStrategy: Building a Bridge for the Elephant


Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”

Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies, and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive yet nimble business analytics with MicroStrategy.

The Google generation, armed with iPads and Android phones, brings big, bold ideas about how “Big Data” will solve the new wave of business problems; traditional users know that addressing those problems requires more than embracing buzzwords like “sentiment”, “R” and “Hadoop.” Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis, in combination with scalable self-service dashboards, reporting and analytics, is no longer an idea but a MUST DO.

This informative presentation describes these business challenges and shows how one organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.



  1. Big Data and MicroStrategy: Building a Bridge for the Elephant. Paul Groom, Chief Innovation Officer, Jan 2013
  2. Let’s start at… The End.
  3. Panacea
  4. You…built the EDW
  5. You…built the BICC
  6. and yes you built…lots of cool reports and dashboards
  7. Epilogue: A comfortable status quo
  8. How are you really judged? • Fast? • Consistent? • All users?
  9. Rrrrrriiiiiiinnnnnngggggg! Back to the real world
  10. Disruption
  11. Disruptor: New Data
  12. Disruptor: Social Media & Sentiment
  13. Disruptor: Data ?
  14. Disruptor: More Connected Users
  15. Disruptor: Data Discovery Tools. Choices for engaging quickly with data; business users’ heads distracted from core BI!
  16. BI Wild West
  17. Where it matters
  18. Lots of variety of DW and EDW
  19. The reality of the DW analytical workload
  20. EDW says no, or not now! …and the CFO says no big upgrades
  21. Pragmatism… OK, so you enable plenty of caching, limit drill-anywhere and add Intelligent Cubes
  22. And then came…
  23. Distraction or Boon
  24. Scalable, resilient, bit bucket
  25. Experimenting © 20th Century Fox
  26. The Hadoop stack: Pig, Hive, HBase, MapReduce, Oozie, HCatalog, ZooKeeper / Ambari, HDFS
  27. Hadoop Performance Reality
    • Hadoop is batch oriented
    • HDFS access is fast but crude
    • MapReduce is powerful but has overheads – ~30 second base response time; too much latency in the stack and processing model; trade-off between optimization and latency
    • MapReduce is complex – typically multiple Java routines
  28. SQL to the Rescue • MapReduce is complicated – use Hive (SQL) as the easy way out (Hadoop stack diagram: Pig, Hive, HBase, MapReduce, Oozie, HCatalog, ZooKeeper / Ambari, HDFS)
  29. Hive
    • Simplifies access – “Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!”
    • Only basic SQL support
    • Concurrency needs careful system admin
    • It’s not a silver bullet for interactive BI usage
  30. Conclusion: Hadoop is just too slow for interactive BI! “While Hadoop shines as a processing platform, it is painfully slow as a query tool” …loss of train-of-thought
  31. Hive is based on Hadoop, which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed, as opposed to real-time queries. As a result it should not be compared with systems like Oracle, where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively, with the response times between iterations being less than a few minutes. For Hive queries, response times for even the smallest jobs can be of the order of 5-10 minutes, and for larger jobs this may even run into hours. I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.
  32. Why can’t Hadoop be in-memory? Why can’t I have giant iCubes?
  33. Remember… Lots of these (disks) – Hadoop is inherently disk oriented. Not so many of these (CPUs) – typically a low ratio of CPU to disk
  34. Larger cubes – issues: time to populate, proliferation
  35. Alternative: in-memory processing. Analytics requires CPU – cores do the work! RAM keeps the data close. Scale with the data.
  36. Goals: Minimise Disruption, Cut Latency
    • Don’t change the existing BI and analytics
    • Support more creative and dynamic BI
    • Don’t introduce yet more slow disk – help the DW investment
    • No complex ETL – just pull data as required
    • Pull data simply and intelligently from Hadoop
    • Simplify – fewer cubes and caches
    • Improve sharing of data
    • Increase concurrency and throughput – it’s all about queries per hour!
    • Minimal DBA requirement
  37. Kognitio Hadoop Connectors
    HDFS Connector:
    • Connector defines access to the HDFS file system
    • External table accesses row-based data in HDFS
    • Dynamic access, or “pin” data into memory
    • Selected HDFS file(s) loaded into memory
    Filter Agent Connector:
    • Connector uploads an agent to the Hadoop nodes
    • Query passes selections and relevant predicates to the agent
    • Data filtering and projection take place locally on each Hadoop node
    • Only data of interest is loaded into memory, via parallel load streams
  38. BI – Central Governance: centrally defined data models; persist data in its natural store; fetch when needed – agile; available to all tools; analytical power
  39. Engineering for Success – Thomas Herbrich
  40. connect NA: +1 855 EMEA: +44 1344 300
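Slide 27 notes that MapReduce is batch oriented with per-job overheads. The job model itself – map, shuffle, reduce – can be sketched in miniature as a toy word count in plain Python; real jobs run these phases as distributed tasks across nodes, which is where the ~30-second base response time and stack latency come from:

```python
from collections import defaultdict

# Map phase: emit (word, 1) pairs from each input "split".
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group intermediate pairs by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate each key's list of values.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big ideas", "data moves in batches"]
counts = reduce_phase(shuffle(map_phase(splits)))
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster each phase is scheduled as separate tasks and the shuffle moves data over the network, so even this trivial job pays the fixed start-up cost the slide describes.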
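Slides 28-29 position Hive as the SQL on-ramp to Hadoop: a familiar aggregate works unchanged, but Hive compiles it into batch MapReduce jobs with minutes of latency rather than executing it interactively. An illustrative sketch of the kind of query involved, using Python's sqlite3 in place of Hive (an assumption for runnability; the basic GROUP BY shape is the same in HiveQL):

```python
import sqlite3

# sqlite3 stands in for Hive here: this simple group-by is exactly the
# sort of query Hive accepts, but Hive would run it as MapReduce jobs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, visitor TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("home", "a"), ("home", "b"), ("cart", "a")],
)
rows = conn.execute(
    "SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('cart', 1), ('home', 2)]
```

The table and column names are hypothetical; the point is that identical SQL can mean sub-second response on an analytical platform but 5-10 minutes through Hive's job-submission model, which is the train-of-thought loss slide 30 warns about.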
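Slide 37's Filter Agent Connector pushes the query's selections and predicates down to the Hadoop nodes, so filtering and projection happen locally and only the rows of interest are loaded into memory. A minimal sketch of that idea, with a hypothetical row layout and predicate (the real connector operates on HDFS data in parallel load streams):

```python
# Each "node" holds a shard of raw rows; the uploaded agent applies the
# query's predicate and projection locally, so only matching values of
# the requested columns travel over the network.
def filter_agent(shard, predicate, columns):
    return [tuple(row[c] for c in columns) for row in shard if predicate(row)]

node_shards = [
    [{"region": "EMEA", "sales": 120}, {"region": "NA", "sales": 340}],
    [{"region": "NA", "sales": 75}, {"region": "APAC", "sales": 60}],
]

# Only NA rows, and only the sales column, reach the in-memory layer.
loaded = []
for shard in node_shards:
    loaded.extend(filter_agent(shard, lambda r: r["region"] == "NA", ["sales"]))
print(loaded)  # [(340,), (75,)]
```

This is the design choice behind "pull data simply and intelligently from Hadoop" on slide 36: move the filter to the data instead of moving all the data to the filter.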