NoSQL Now 2011: Review of AdHoc Architectures


Transcript

  • 1. BI/Analytics for NoSQL: Review of Architectures
  • 2. What we'll answer in 50 minutes
      • Who is this guy?
      • How do I enable AdHoc, self-service reporting on NoSQL?
      • How do I improve the performance of dashboards on top of NoSQL?
      • How do I integrate NoSQL data with my other data not inside NoSQL?
      • How do I make simple reports easy to build but also preserve the ability for rich NoSQL queries?
  • 3. Nicholas Goodman
      • Open Source BI thought leader
          – 50+ Open Source BI customer projects
          – Blogger, whitepapers, etc
      • Entrepreneur
          – DynamoBI Corporation
          – Bayon Technologies, Inc.
      • Data Geek, hacker, tinkerer, committer
    GOAL: Share perspectives, research, opinions.
    DISCLAIMER: Your Mileage ...
  • 4. How do we answer those Qs?
  • 5. Promise of “Big Data”
      • NoSQL/Hadoop/MapReduce Systems
          – Keep more of it
          – Cost effective analysis
          – “Massive scale” data, now accessible to everyone (elastic)
          – Not just SQL queries, more complex analysis
    ACCOMPLISHED: WEB SCALE, MASSIVE NEVER BEFORE SEEN SCALE OF DATA STORAGE AND PROCESSING
  • 6. Reality Check!
      • Petabytes? Y
      • Cheap Storage? Y
      • Raw Processing? Y
      • Rich Query Languages? Y
      • Flexible data structures? Y
      • Reliable, Fault Tolerant? Y
      • Fast Queries? N
      • Ad Hoc access? N
      • Accessibility to commodity BI tools? N
      • Easy report authoring? N
      • Levels of Aggregation? N
      • Integrated Data? N
    Big Data has solved the INFRASTRUCTURE of raw/core data storage but has provided less value to what BUSINESS users want for analytics.
  • 7. Data Gaps too!
      • Code, Developers vs. Analysts w/ Excel, Dashboards
      • MR, Rich Graph/Access vs. Simple 2D (tables, charts)
      • Hierarchical, Unstructured vs. Filtering and easy analytics
  • 8. Levels of Aggregation
    SAME DATA AT VARIOUS LEVELS OF AGGREGATION: HUGELY IMPORTANT IN REAL LIFE IMPLEMENTATIONS!
    (Diagram labels: 1 ROW; 10K; 1 MILLION TO 100 MILLION; 1 BILLION ROWS; 100 BILLION)
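The idea of keeping the same data at several levels of aggregation can be sketched in a few lines of Python. This is only an illustration: the event tuples, the `rollup` helper, and the chosen grouping keys are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical raw events: (day, country, page_views). Not from the slides.
events = [
    ("2011-08-01", "US", 3),
    ("2011-08-01", "US", 2),
    ("2011-08-01", "DE", 4),
    ("2011-08-02", "US", 1),
]

def rollup(rows, key_fn):
    """Sum page views grouped by whatever key_fn extracts from a row."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[2]
    return dict(totals)

# The same data summarized at three levels, from finest to coarsest.
by_day_country = rollup(events, lambda r: (r[0], r[1]))
by_day = rollup(events, lambda r: r[0])
grand_total = rollup(events, lambda r: "all")
```

A dashboard would typically read the coarse summaries, while drill-down queries fall back to the finer ones or to the raw rows.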
  • 9. Architectures
      • NoSQL reports
      • NoSQL thru and thru
      • NoSQL + MySQL
      • NoSQL as ETL Source
      • NoSQL programs in BI Tools
      • NoSQL via BI Database (SQL)
  • 10. NoSQL reports
      • Pay Developer to build applications for reports
    (Diagram label: Apps)
    Advantages:
      • 100% Richness of NoSQL
      • Up to date, current
      • Excellent performance on large datasets
      • Custom built, beautiful reports/dashboards
      • Single system to manage
    Drawbacks:
      • $$, developer driven process
      • No commodity BI tools
      • Managing rollups/summaries
      • Schema-less = Harder!
      • Hard to integrate other reporting information
  • 11. NoSQL thru and thru
      • Pay Developer to build FLEXIBLE applications for reports
    (Diagram labels: Indices Advanced Aggs Apps)
    Advantages:
      • All of NoSQL report advantages
      • Managed aggregations, rollups
      • “Guided Adhoc” available inside application
      • Higher performance for dashboards/summaries
    Drawbacks:
      • $$, developer driven process
      • $$, app required for aggs
      • No commodity BI tools
      • Hard to integrate other reporting information
      • Limited AdHoc (only developer built combinations)
  • 12. NoSQL + MySQL
      • Pay Developer to build FLEXIBLE applications for reports
    (Diagram labels: ETL App, MySQL)
    Advantages:
      • Less IT $$ since developers aren't “building reports”
      • Rich, NoSQL analysis left in place (ETL + NoSQL)
      • Easy, Ad Hoc reporting via commodity BI tools
      • Easier to understand data for self service reports
    Drawbacks:
      • Data freshness (24 hrs old)
      • Once into MySQL no rich NoSQL application use (M/R)
      • BI Tool can connect ONLY to data in MySQL, not NoSQL
      • Aggregations still self managed in MySQL
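A minimal sketch of the "ETL App" step in this architecture: summarize schema-less documents, then push the summary into a relational table that a BI tool can query. Everything here is an assumption for illustration: the sample documents, the `event_summary` table, and the helper names are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server (`INSERT OR REPLACE` is SQLite syntax; MySQL would use something like `INSERT ... ON DUPLICATE KEY UPDATE`).

```python
import sqlite3
from collections import Counter

# Hypothetical schema-less documents, standing in for records read from a NoSQL store.
docs = [
    {"user": "a", "event": "click", "meta": {"country": "US"}},
    {"user": "b", "event": "click", "meta": {"country": "DE"}},
    {"user": "a", "event": "view"},  # no "meta" key at all
    {"user": "c", "event": "click", "meta": {"country": "US"}},
]

def summarize(documents):
    """Count events per (event, country), tolerating missing fields."""
    counts = Counter()
    for d in documents:
        country = d.get("meta", {}).get("country", "unknown")
        counts[(d["event"], country)] += 1
    return counts

def load(conn, counts):
    """Write the summary into a flat table; rerunning overwrites old counts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_summary ("
        "event TEXT, country TEXT, n INTEGER, PRIMARY KEY (event, country))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO event_summary VALUES (?, ?, ?)",
        [(event, country, n) for (event, country), n in counts.items()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL
load(conn, summarize(docs))
rows = conn.execute(
    "SELECT event, country, n FROM event_summary ORDER BY event, country"
).fetchall()
```

Running a job like this nightly is what produces the "24 hrs old" freshness drawback listed on the slide.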
  • 13. NoSQL as ETL Data Source
      • NoSQL treated like any other data source
    (Diagram labels: Informatica, Teradata)
    Advantages:
      • Allows use of consolidated BI tool for AdHoc
      • Enables integrated (combined) datasets for reporting
      • Aggregations often “managed”
      • Best of Breed tools
    Drawbacks:
      • ETL Development Expense
      • Data Latency
      • Loss of NoSQL language richness
      • Traditional DW tools are $$
      • Scaling issues with DW Database
  • 14. NoSQL programs in BI Tools
      • Write a program in BI tool that flattens data, output into report
    Advantages:
      • Rich use of NoSQL native language
      • Direct, up to date access
      • Access to 100% of dataset
      • Leverage “guided” report parameter pages
      • Less expensive than apps
    Drawbacks:
      • Developer required to write program ($$)
      • Slow-er (aggs, summaries)
      • Lacks integration with other datasets
      • Still (usually) no AdHoc access
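The "program that flattens data" could look roughly like the sketch below, which turns nested documents into a 2D table a report can render. The `flatten` and `to_rows` helpers and the dotted column naming are assumptions made for this example, not something the slides specify.

```python
def flatten(doc, prefix=""):
    """Turn a nested dict into a single-level dict with dotted column names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

def to_rows(documents):
    """Return (columns, rows); fields missing from a document become None."""
    flat_docs = [flatten(d) for d in documents]
    columns = sorted({column for d in flat_docs for column in d})
    rows = [[d.get(column) for column in columns] for d in flat_docs]
    return columns, rows

# Hypothetical documents with differing shapes, as a schema-less store allows.
sample = [
    {"id": 1, "user": {"name": "ann", "geo": {"country": "US"}}},
    {"id": 2, "user": {"name": "bob"}},
]
columns, rows = to_rows(sample)
```

Because the column set is derived from the data on each run, documents with new fields simply add columns, which is one way to cope with the schema-less structures mentioned earlier in the deck.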
  • 15. NoSQL via BI Database (SQL)
      • Enable NoSQL data access via SQL (gasp!)
    (Diagram labels: Live Query; Cached, 24hr data)
    Advantages:
      • Easy reports, easy (SQL)
      • Integration with other data
      • ETL is simple INSERT/MERGEs
      • Live, up to date access
      • High performance, cached data
      • AdHoc access to Live + Cached
      • Aggregations/Summaries
    Drawbacks:
      • Another system in between
      • Still needs to be refreshed, nightly
      • Not all capabilities for NoSQL richness available via SQL
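One way to picture the "Live + Cached" split is a thin layer that answers from a cache when the cached result is under 24 hours old and otherwise runs the live query. This is a rough sketch under stated assumptions: the `CachedSource` class, its `live_query` callback, and the TTL constant are all hypothetical, and a real BI database would do this with tables and SQL rather than a Python dictionary.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # matches the "Cached, 24hr data" label

class CachedSource:
    """Serve cached results while fresh; go to the live store otherwise."""

    def __init__(self, live_query, clock=time.time):
        self._live_query = live_query  # hypothetical callable: key -> result
        self._clock = clock
        self._cache = {}  # key -> (fetched_at, result)

    def query(self, key, live=False):
        now = self._clock()
        hit = self._cache.get(key)
        if not live and hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]
        # Cache miss, stale entry, or an explicit request for live data.
        result = self._live_query(key)
        self._cache[key] = (now, result)
        return result
```

Passing `live=True` corresponds to the "Live Query" path on the slide; the default path corresponds to the cached data that dashboards would normally hit.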
  • 16. Mozilla: NoSQL thru and thru (DB)
      • Socorro Project: Crash reports, optionally sent to Mozilla
      • https://crash-stats.mozilla.com
  • 17. X: NoSQL via SQL
      • Using “Splunk” (ie, a commercial NoSQL-eee data aggregator/etc)
      • Desire to use Tableau for advanced analytics/visualization
  • 18. Meteor Solutions: NoSQL thru and thru
      • Using Cloudant BigCouch solution (SaaS)
      • High performance set of multi purpose indices on pre defined aggregations
      • Up to date aggregation/reports
      • Better fit for Social Media graph structures over relational DB
      • Custom built BI applications (dashboards/reports) providing a flexible guided view through data
    (Diagram label: Advanced Apps)
  • 19. A,B,C: NoSQL + MySQL
      • Many, many companies (3 we've worked with)
      • All “web related” companies (semi structured, some, mostly volume)
      • Heavy lifting and storage, and “ETL/Data preparation” inside Hadoop
      • Push summarized, aggregated data into MySQL for analysis by easy dashboarding/BI Tools
    (Diagram labels: ETL App, MySQL)