BI/Analytics on NoSQL: Review of Architectures

NoSQL is great for running your apps: it is flexible and scalable. Traditional SQL-centric BI tools, however, are challenging if not impossible to use with data in NoSQL systems.

We will cover and discuss existing implementations, and the broad set of architectures for how organizations are "doing BI" on top of NoSQL systems. We will cover the challenges, strengths, and war stories of these various architectures along with practical advice for those who are adopting or building out these solutions. In particular we hope to help attendees answer the following questions:

How do I enable ad hoc, self-service reporting on data in my NoSQL system?
My NoSQL system is massively scalable, but my users complain that reports are slow: how do I improve report performance on my NoSQL data?
How do I integrate my NoSQL data with my existing Data Warehouse or BI systems?
Some of my data is in traditional RDBMSes: how do I build BI based on NoSQL plus additional outside data?
How do I do simple reporting on NoSQL data, but also do the rich, complex analytics that only my NoSQL system allows (graph analytics, social media analytics, etc.)?

Published in: Technology

  1. BI/Analytics for NoSQL: Review of Architectures
  2. What we'll answer in 50 minutes
     • Who is this guy?
     • How do I enable ad hoc, self-service reporting on NoSQL?
     • How do I improve the performance of dashboards on top of NoSQL?
     • How do I integrate NoSQL data with my other data not inside NoSQL?
     • How do I enable easy-to-build simple reports but also preserve the ability for rich NoSQL queries?
  3. Nicholas Goodman
     • Open Source BI thought leader
       – 50+ Open Source BI customer projects
       – Blogger, whitepapers, etc.
     • Entrepreneur
       – DynamoBI Corporation
       – Bayon Technologies, Inc.
     • Data geek, hacker, tinkerer, committer
     GOAL: Share perspectives, research, opinions.
     DISCLAIMER: Your Mileage ...
  4. How do we answer those Qs?
  5. Promise of “Big Data”
     • NoSQL/Hadoop/MapReduce systems
       – Keep more of it
       – Cost effective analysis
       – “Massive scale” data, now accessible to everyone (elastic)
       – Not just SQL queries, more complex analysis
     ACCOMPLISHED: WEB SCALE, MASSIVE, NEVER BEFORE SEEN SCALE OF DATA STORAGE AND PROCESSING
  6. Reality Check!
     Yes:
     • Petabytes
     • Cheap storage
     • Raw processing
     • Rich query languages
     • Flexible data structures
     • Reliable, fault tolerant
     No:
     • Fast queries
     • Ad hoc access
     • Accessibility to commodity BI tools
     • Easy report authoring
     • Levels of aggregation
     • Integrated data
     Big Data has solved the INFRASTRUCTURE of raw/core data storage but has provided less value to what BUSINESS users want for analytics.
  7. Data Gaps too!
     • Code, developers vs. analysts w/ Excel, dashboards
     • MR, rich graph/access vs. simple 2D (tables, charts)
     • Hierarchical, unstructured vs. filtering and easy analytics
  8. Levels of Aggregation
     SAME DATA AT VARIOUS LEVELS OF AGGREGATION: HUGELY IMPORTANT IN REAL LIFE IMPLEMENTATIONS!
     (Diagram row-count labels: 1 row; 10K; 1 million to 100 million; 1 billion rows; 100 billion)
  9. Architectures
     • NoSQL reports
     • NoSQL thru and thru
     • NoSQL + MySQL
     • NoSQL as ETL Source
     • NoSQL programs in BI Tools
     • NoSQL via BI Database (SQL)
  10. NoSQL reports
     • Pay Developer to build applications for reports
     (Diagram: Apps)
     Strengths:
     • 100% richness of NoSQL
     • Up to date, current
     • Excellent performance on large datasets
     • Custom built, beautiful reports/dashboards
     • Single system to manage
     Challenges:
     • $$, developer driven process
     • No commodity BI tools
     • Managing rollups/summaries
     • Schema-less = Harder!
     • Hard to integrate other reporting information
  11. NoSQL thru and thru
     • Pay Developer to build FLEXIBLE applications for reports
     (Diagram: Indices, Aggs, Advanced Apps)
     Strengths:
     • All of NoSQL report advantages
     • Managed aggregations, rollups
     • “Guided Adhoc” available inside application
     • Higher performance for dashboards/summaries
     Challenges:
     • $$, developer driven process
     • $$, app required for aggs
     • No commodity BI tools
     • Hard to integrate other reporting information
     • Limited AdHoc (only developer built combinations)
  12. NoSQL + MySQL
     • Pay Developer to build FLEXIBLE applications for reports
     (Diagram: ETL App, MySQL)
     Strengths:
     • Less IT $$ since developers aren't “building reports”
     • Rich NoSQL analysis left in place (ETL + NoSQL)
     • Easy, Ad Hoc reporting via commodity BI tools
     • Easier to understand data for self service reports
     Challenges:
     • Data freshness (24 hrs old)
     • Once into MySQL, no rich NoSQL application use (M/R)
     • BI Tool can connect ONLY to data in MySQL, not NoSQL
     • Aggregations still self managed in MySQL
  13. NoSQL as ETL Data Source
     • NoSQL treated like any other data source
     (Diagram: Informatica, Teradata)
     Strengths:
     • Allows use of consolidated BI tool for AdHoc
     • Enables integrated (combined) datasets for reporting
     • Aggregations often “managed”
     • Best of Breed tools
     Challenges:
     • ETL development expense
     • Data latency
     • Loss of NoSQL language richness
     • Traditional DW tools are $$
     • Scaling issues with DW database
  14. NoSQL programs in BI Tools
     • Write a program in BI tool that flattens data, output into report
     Strengths:
     • Rich use of NoSQL native language
     • Direct, up to date access
     • Access to 100% of dataset
     • Leverage “guided” report parameter pages
     • Less expensive than apps
     Challenges:
     • Developer required to write program ($$)
     • Slow-er (aggs, summaries)
     • Lacks integration with other datasets
     • Still (usually) no AdHoc access
  15. NoSQL via BI Database (SQL)
     • Enable NoSQL data access via SQL (gasp!)
     (Diagram: Live Query; Cached, 24hr data)
     Strengths:
     • Easy reports, easy (SQL)
     • Integration with other data
     • ETL is simple INSERT/MERGEs
     • Live, up to date access
     • High performance, cached data
     • AdHoc access to Live + Cached
     • Aggregations/Summaries
     Challenges:
     • Another system in between
     • Still needs to be refreshed, nightly
     • Not all capabilities for NoSQL richness available via SQL
  16. Mozilla: NoSQL thru and thru (DB)
     • Socorro Project: Crash reports, optionally sent to Mozilla
     • https://crash-stats.mozilla.com
  17. X: NoSQL via SQL
     • Using “Splunk” (i.e., a commercial NoSQL-eee data aggregator/etc.)
     • Desire to use Tableau for advanced analytics/visualization
  18. Meteor Solutions: NoSQL thru and thru
     • Using Cloudant BigCouch solution (SaaS)
     • High performance set of multi purpose indices on pre defined aggregations
     • Up to date aggregation/reports
     • Better fit for Social Media graph structures over relational DB
     • Custom built BI applications (dashboards/reports) providing a flexible guided view through data
     (Diagram: Advanced Apps)
  19. A, B, C: NoSQL + MySQL
     • Many, many companies (3 we've worked with)
     • All “web related” companies (semi structured, some, mostly volume)
     • Heavy lifting and storage, and “ETL/Data preparation” inside Hadoop
     • Push summarized, aggregated data into MySQL for analysis by easy dashboarding/BI Tools
     (Diagram: ETL App, MySQL)
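
The "NoSQL + MySQL" pattern from the slides (roll raw data up, then push the summary into a relational table that commodity BI tools can query) might look roughly like the sketch below. It is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL, the in-memory RAW_EVENTS list stands in for documents pulled from a NoSQL store or Hadoop job, and the table name page_hits_daily and all function names are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical raw event documents, standing in for data read from a NoSQL store.
RAW_EVENTS = [
    {"day": "2012-01-01", "page": "/home", "user": "a"},
    {"day": "2012-01-01", "page": "/home", "user": "b"},
    {"day": "2012-01-01", "page": "/buy", "user": "a"},
    {"day": "2012-01-02", "page": "/home", "user": "c"},
]


def summarize(events):
    """Roll raw events up to one row per (day, page) with a hit count."""
    counts = Counter((e["day"], e["page"]) for e in events)
    return sorted((day, page, hits) for (day, page), hits in counts.items())


def load_summary(conn, rows):
    """Upsert aggregated rows into the summary table (sqlite3 stands in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits_daily ("
        "day TEXT, page TEXT, hits INTEGER, PRIMARY KEY (day, page))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_hits_daily (day, page, hits) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def hits_per_day(conn):
    """A coarser level of aggregation, as a BI tool might request it via SQL."""
    cur = conn.execute(
        "SELECT day, SUM(hits) FROM page_hits_daily GROUP BY day ORDER BY day"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_summary(conn, summarize(RAW_EVENTS))
    print(hits_per_day(conn))
```

Re-running the load replaces rows with the same (day, page) key, which is one simple way to handle the nightly refresh the slides mention. The trade-off listed on the slide still applies: the BI tool sees only what was loaded into the summary table, not the raw NoSQL data.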

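The "flattening" step that several of these architectures rely on (turning hierarchical documents into the simple 2D tables that report tools expect) can be sketched as follows. This is a minimal illustration, not how any particular BI tool does it: the functions flatten and explode, the dotted column naming, and the sample order document are all assumptions made for the example.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into one dict with dotted column names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def explode(doc, list_field):
    """Emit one flat row per dict in doc[list_field], repeating the parent fields."""
    parent = {k: v for k, v in doc.items() if k != list_field}
    rows = []
    for item in doc.get(list_field, []):
        row = flatten(parent)
        row.update(flatten(item, prefix=f"{list_field}."))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical hierarchical document, as a document store might return it.
    order = {
        "id": 1,
        "customer": {"name": "Ann", "city": "Oslo"},
        "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
    }
    for row in explode(order, "items"):
        print(row)
```

Each nested list element becomes its own row with the parent fields repeated, which is what makes the data usable in tables and charts, at the cost of losing the original hierarchy.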