Big data and mstr bridge the elephant



Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”

Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive, yet nimble business analytics with MicroStrategy.

The Google generation, armed with iPads and Droid phones, brings big, bold ideas on how “Big Data” will solve the new wave of business problems; traditional users know that addressing those problems requires more than just embracing buzzwords like “sentiment”, “R” and “Hadoop.” Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis, in combination with scalable self-service dashboards, reporting and analytics, is no longer an idea but a MUST DO.

This informative presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.

Published in: Technology


  • 1. Big Data and MicroStrategy: Building a Bridge for the Elephant. Paul Groom, Chief Innovation Officer, Jan 2013
  • 2. Let’s start at… The End.
  • 3. Panacea
  • 4. You…built the EDW
  • 5. You…built the BICC
  • 6. and yes you built…lots of cool reports and dashboards
  • 7. Epilogue: A comfortable status quo
  • 8. How are you really judged? • Fast? • Consistent? • All users?
  • 9. Rrrrrriiiiiiinnnnnngggggg! Back to the real world
  • 10. Disruption
  • 11. Disruptor: New Data
  • 12. Disruptor: Social Media & Sentiment
  • 13. Disruptor: Data ?
  • 14. Disruptor: More Connected Users
  • 15. Disruptor: Data Discovery Tools. Choices for engaging quickly with data. Business users’ heads distracted from core BI!
  • 16. BI Wild West
  • 17. Where it matters
  • 18. Lots of variety of DW and EDW
  • 19. The Reality of the DW analytical workload
  • 20. EDW says no or not now!…and CFO says no big upgrades
  • 21. Pragmatism…ok so you enable plenty of caching, limit drill anywhere and add Intelligent Cubes
  • 22. And then came…
  • 23. Distraction or Boon
  • 24. Scalable, resilient, bit bucket
  • 25. Experimenting © 20th Century Fox
  • 26. The Hadoop stack: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS
  • 27. Hadoop Performance Reality
      • Hadoop is batch oriented
      • HDFS access is fast but crude
      • MapReduce is powerful but has overheads
          – ~30 second base response time
          – Too much latency in stack and processing model
          – Trade-off in optimization and latency
      • MapReduce complex
          – Typically multiple Java routines
  • 28. SQL to the Rescue
      • So MapReduce is complicated – use Hive (SQL) as the easy way out
      Stack: Pig, Hive, ZooKeeper / Ambari, HBase, MapReduce, Oozie, HCatalog, HDFS
  • 29. Hive
      • Simplifies access
      • Only basic SQL support
      • Concurrency needs careful system admin
      • It’s not a silver bullet for interactive BI usage
      “Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!”
  • 30. Conclusion: Hadoop just too slow for interactive BI! “while hadoop shines as a processing platform, it is painfully slow as a query tool” …loss of train-of-thought
  • 31. Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.
      I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.
  • 32. Why can’t Hadoop be in-memory? Why can’t I have giant icubes?
  • 33. Remember… Lots of these: Hadoop inherently disk oriented. Not so many of these: Typically low ratio of CPU to Disk
  • 34. Larger cubes. Issues: Time to Populate, Proliferation
  • 35. Alternative: In-memory Processing. Analytics requires CPU; cores do the work! RAM keeps the data close. Scale with the data
  • 36. Goals: Minimise Disruption, Cut Latency
      • Don’t change the existing BI and analytics
      • Support more creative and dynamic BI
      • Don’t introduce yet more slow disk
          – Help the DW investment
      • No complex ETL, just pull data as required
      • Pull data simply and intelligently from Hadoop
      • Simplify – fewer cubes, caches
      • Improve sharing of data
      • Increase concurrency and throughput
          – It’s all about queries per hour!
      • Minimal DBA requirement
  • 37. Kognitio Hadoop Connectors
      HDFS Connector
      • Connector defines access to HDFS file system
      • External table accesses row-based data in HDFS
      • Dynamic access or “pin” data into memory
      • Selected HDFS file(s) loaded into memory
      Filter Agent Connector
      • Connector uploads agent to Hadoop nodes
      • Query passes selections and relevant predicates to agent
      • Data filtering and projection takes place locally on each Hadoop node
      • Only data of interest is loaded into memory via parallel load streams
  • 38. BI – Central Governance
      • Centrally defined data models
      • Persist data in natural store
      • Fetch when needed, agile
      • Available to all tools
      • Analytical power
  • 39. Engineering for Success Thomas Herbrich
  • 40. connect NA: +1 855 EMEA: +44 1344 300
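
The Filter Agent Connector idea from slide 37 (pass the query's selections and predicates to an agent, filter and project locally on each Hadoop node, then load only the data of interest through parallel streams) can be illustrated with a small sketch. This is not Kognitio's implementation or API: the node names, the sample rows, and the functions `filter_agent` and `load_into_memory` are all hypothetical, and a thread pool merely stands in for the parallel load streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for row-based data held on three Hadoop nodes.
NODES = {
    "node1": [{"id": 1, "region": "EMEA", "sales": 120},
              {"id": 2, "region": "NA", "sales": 80}],
    "node2": [{"id": 3, "region": "EMEA", "sales": 45},
              {"id": 4, "region": "APAC", "sales": 60}],
    "node3": [{"id": 5, "region": "NA", "sales": 200},
              {"id": 6, "region": "EMEA", "sales": 310}],
}


def filter_agent(rows, predicate, columns):
    """Simulates the agent on one node: keep matching rows (selection)
    and only the requested columns (projection)."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def load_into_memory(nodes, predicate, columns):
    """One simulated load stream per node, run in parallel.
    Only rows that survive the node-local filter are returned."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        streams = pool.map(
            lambda rows: filter_agent(rows, predicate, columns),
            nodes.values(),
        )
        return [row for stream in streams for row in stream]


# Example query: EMEA sales only, projecting two columns.
in_memory = load_into_memory(
    NODES, lambda r: r["region"] == "EMEA", ["id", "sales"]
)
total_rows = sum(len(rows) for rows in NODES.values())
print(f"loaded {len(in_memory)} of {total_rows} rows")
```

The point of the design is that filtering happens where the data lives, so the in-memory layer receives only the rows and columns the query needs rather than whole files.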