Kognitio Spark! Modern Data Platform

Hadoop Meets Mature BI: Where the rubber meets the road for the Modern Data Platform.

This presentation supports the Spark! event in Atlanta, where Kognitio is a key sponsor. The event discusses the shift in how information is collected, stored, and analyzed in a Big Data world.

More on the Radiant Advisors' Spark! Events at http://radiantadvisors.com/spark/



  • 1. @Kognitio #SparkEvent Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform. Michael Hiskey, Futurist, Product Evangelist (and VP, Marketing and Business Development), www.kognitio.com
  • 2. @Kognitio #SparkEvent Today, and the Future: Big Data, Advanced Analytics, In-memory, Modern Data Platform, Hybrid Data Ecosystem, ‘Logical Data Warehouse’, Predictive Analytics, Data Scientists, Data
  • 3. @Kognitio #SparkEvent 64%: have invested/plan to invest in Big Data tech. 8%: have started using it (via TechCrunch, 23 Sept 2013). 200: average TBs of stored data, 2x Walmart DW in 1999 (Insights & Publications, May 2011)
  • 4. @Kognitio #SparkEvent The Data Scientist Sexiest job of the 21st Century?
  • 5. @Kognitio #SparkEvent Data  Scientist The Analytical Enterprise Business  Analyst Systems  Admin
  • 6. @Kognitio #SparkEvent Remember: Decision Support Systems? …accessed with ease and simplicity. Historical information, latency. BI tools have plateaued. Advanced analytics & data science: more math…a lot more math
  • 7. @Kognitio #SparkEvent Behind the numbers
      create external script LM_PRODUCT_FORECAST environment rsint
      receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
      partition by PRODNO order by PRODNO, ROW_ID
      sends ( R_OUTPUT varchar )
      isolate partitions
      script S'endofr(
      # Simple R script to run a linear fit on daily sales
      prod1<-read.csv(file=file("stdin"), header=FALSE,row.names
      colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
      dim1<-dim(prod1)
      daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW),
      daily1[,2]<-daily1[,2]/sum(daily1[,2])
      basesales<-array(0,c(dim1[1],2))
      basesales[,1]<-prod1$ID
      basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
      colnames(basesales)<-c("ID","BASESALES")
      fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
  • 8. @Kognitio #SparkEvent What has changed? More connected-users? More-connected users?
  • 9. @Kognitio #SparkEvent Don’t be a Railroad Stoker! Highly skilled engineering required … but the world innovated around them.
  • 10. @Kognitio #SparkEvent The drive for deeper understanding. Machine learning algorithms, Dynamic Simulation, Statistical Analysis, Clustering, Behavior modelling, Reporting & BPM, Fraud detection, Dynamic Interaction, Campaign Management. Axes: Technology/Automation, Analytical Complexity
  • 11. @Kognitio #SparkEvent Key: “Graduation”. Projects will need to Graduate from the Data Science Lab and become part of Business as Usual
  • 12. @Kognitio #SparkEvent Your goal: PRESS HERE …and really cool Big Data stuff happens!
  • 13. @Kognitio #SparkEvent Data flow
  • 14. @Kognitio #SparkEvent © 20th Century Fox
  • 15. @Kognitio #SparkEvent No need to pre-process; no need to align to schema; no need to triage; null storage concerns
  • 16. @Kognitio #SparkEvent Hadoop just too slow for interactive BI! …loss of train-of-thought. “while Hadoop shines as a processing platform, it is painfully slow as a query tool”
  • 17. @Kognitio #SparkEvent Hadoop is… inherently disk oriented; typically low ratio of CPU to Disk. Lots of these; not so many of these
  • 18. @Kognitio #SparkEvent Analytics needs low latency, no I/O wait. High speed in-memory processing
  • 19. A* Modern Data Platform  Reference Architecture Analytical Platform Near‐line Storage (optional) Access Application & Client Layer All BI Tools All OLAP Clients Excel Persistence Layer Hadoop Clusters Enterprise Data Warehouses Legacy Systems … Reporting Cloud  Storage *(not THE)
  • 20. © Hortonworks Inc. 2013 (another) Next-Generation Data Architecture. APPLICATIONS: Microsoft Applications, BI Tools & OLAP Clients. DATA SYSTEMS: In-memory MPP Accelerator; TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM; OPERATIONAL TOOLS (MANAGE & MONITOR); DEV & DATA TOOLS (BUILD & TEST). DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
  • 21. Analytical Platform