Kognitio Spark! Modern Data Platform

Hadoop Meets Mature BI: Where the rubber meets the road for the Modern Data Platform.

This presentation supports the Spark! Event in Atlanta, where Kognitio is a key sponsor. The event discusses the shift in how information is collected, stored, and analyzed in a Big Data world.

More on the Radiant Advisors' Spark! Events at http://radiantadvisors.com/spark/

Published in: Technology, Business

  1. @Kognitio #SparkEvent Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform. Michael Hiskey, Futurist, Product Evangelist (and VP, Marketing and Business Development). www.kognitio.com
  2. Today, and the Future: Big Data, Advanced Analytics, In-memory, Modern Data Platform, Hybrid Data Ecosystem, ‘Logical Data Warehouse’, Predictive Analytics, Data Scientists, Data
  3. 64% have invested or plan to invest in Big Data tech; 8% have started using it (via TechCrunch, 23 Sept 2013). 200: average TBs of stored data, 2x the Walmart DW in 1999 (Insights & Publications, May 2011).
  4. The Data Scientist: Sexiest job of the 21st Century?
  5. The Analytical Enterprise: Data Scientist, Business Analyst, Systems Admin
  6. Remember: Decision Support Systems? …accessed with ease and simplicity. Historical information, latency. BI tools have plateaued. Advanced analytics & data science: more math…a lot more math.
  7. Behind the numbers (script excerpt; several lines are cut off on the slide):

       create external script LM_PRODUCT_FORECAST environment rsint
       receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
       partition by PRODNO order by PRODNO, ROW_ID
       sends ( R_OUTPUT varchar )
       isolate partitions
       script S'endofr(
       # Simple R script to run a linear fit on daily sales
       prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
       colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
       dim1<-dim(prod1)
       daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
       daily1[,2]<-daily1[,2]/sum(daily1[,2])
       basesales<-array(0,c(dim1[1],2))
       basesales[,1]<-prod1$ID
       basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
       colnames(basesales)<-c("ID","BASESALES")
       fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
  8. What has changed? More connected-users? More-connected users?
  9. Don’t be a Railroad Stoker! Highly skilled engineering required … but the world innovated around them.
  10. The drive for deeper understanding (chart axes: Technology/Automation, Analytical Complexity): Machine learning algorithms, Dynamic Simulation, Statistical Analysis, Clustering, Behavior modelling, Reporting & BPM, Fraud detection, Dynamic Interaction, Campaign Management
  11. Key: “Graduation”. Projects will need to graduate from the Data Science Lab and become part of Business as Usual.
  12. Your goal: PRESS HERE …and really cool Big Data stuff happens!
  13. Data flow
  14. © 20th Century Fox
  15. No need to pre-process; no need to align to schema; no need to triage; null storage concerns
  16. Hadoop just too slow for interactive BI! …loss of train-of-thought. “while Hadoop shines as a processing platform, it is painfully slow as a query tool”
  17. Hadoop is… inherently disk oriented, with a typically low ratio of CPU to disk (image labels: “Lots of these”, “Not so many of these”)
  18. Analytics needs low latency, no I/O wait: high speed in-memory processing
  19. A* Modern Data Platform Reference Architecture (*not THE). Diagram labels: Analytical Platform; Near-line Storage (optional); Access; Application & Client Layer (All BI Tools, All OLAP Clients, Excel); Persistence Layer (Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, …); Reporting; Cloud Storage
  20. (another) Next-Generation Data Architecture (© Hortonworks Inc. 2013). Diagram labels: APPLICATIONS; DATA SYSTEMS; Microsoft Applications; DATA SOURCES; Traditional Sources (RDBMS, OLTP, OLAP); In-memory MPP Accelerator; BI Tools & OLAP Clients; TRADITIONAL REPOS (RDBMS, EDW, MPP); OPERATIONAL TOOLS (MANAGE & MONITOR); DEV & DATA TOOLS (BUILD & TEST); New Sources (web logs, email, sensors, social media); HORTONWORKS DATA PLATFORM
  21. Analytical Platform
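
The R script on slide 7 appears to do three things for each product: work out each day of the week's share of sales, divide daily sales by that share to get a day-of-week-adjusted "base sales" series, and fit a straight line of base sales against row ID. The following is a minimal Python sketch of that idea, not the slide's code. The function names are hypothetical, the aggregation is assumed to be a sum (the slide cuts that argument off), and the least-squares fit is written out by hand where the slide calls R's lm.

```python
from collections import defaultdict


def day_of_week_shares(rows):
    """Return {dow: fraction of total sales} for rows of (dow, row_id, daily_sales).

    Assumption: sales are summed per day of week; the slide truncates
    the aggregation function, so this choice is illustrative only.
    """
    totals = defaultdict(float)
    for dow, _row_id, sales in rows:
        totals[dow] += sales
    grand_total = sum(totals.values())
    return {dow: total / grand_total for dow, total in totals.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def base_sales_trend(rows):
    """Divide out the day-of-week share, then fit base sales against row ID."""
    shares = day_of_week_shares(rows)
    ids = [row_id for _dow, row_id, _sales in rows]
    base = [sales / shares[dow] for dow, _row_id, sales in rows]
    return fit_line(ids, base)
```

In the slide's setup the database partitions the input by PRODNO, so a function like base_sales_trend would be applied once per product partition.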
