Kognitio spark modern data platform print

Published in: Technology, Business

  1. @Kognitio #SparkEvent. Hadoop meets Mature BI: Where the rubber meets the road for the Modern Data Platform. Michael Hiskey, Futurist, Product Evangelist (and VP, Marketing & Business Development)
  2. Today, and the Future: Big Data, Advanced Analytics, In-memory, Modern Data Platform, Hybrid Data Ecosystem, ‘Logical Data Warehouse’, Predictive Analytics, Data Scientists, Data
  3. The Data Scientist: Sexiest job of the 21st Century?
  4. The Analytical Enterprise: Data Scientist, Business Analyst, Systems Admin
  5. Remember: Decision Support Systems? …accessed with ease and simplicity. Historical information, latency. BI tools have plateaued. Advanced analytics & data science: more math…a lot more math
  6. Behind the numbers:
     select Trans_Year, Num_Trans,
            count(distinct Account_ID) Num_Accts,
            sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
            cast(sum(total_spend)/1000 as int) Total_Spend,
            cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
            rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
            rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
     from (
       select Account_ID,
              Extract(Year from Effective_Date) Trans_Year,
              count(Transaction_ID) Num_Trans,
              sum(Transaction_Amount) Total_Spend,
              avg(Transaction_Amount) Avg_Spend
       from Transaction_fact
       where extract(year from Effective_Date) < 2009
         and Trans_Type = 'D'
         and Account_ID <> 9025011
         and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid = 1)
       group by Account_ID, Extract(Year from Effective_Date)
     ) Acc_Summary
     group by Trans_Year, Num_Trans
     order by Trans_Year desc, Num_Trans;
  7. What has changed? More connected-users? More-connected users?
  8. Don’t be a Railroad Stoker! Highly skilled engineering required … but the world innovated around them.
  9. The drive for deeper understanding. Chart axes: Technology/Automation, Analytical Complexity. Items: Machine learning algorithms, Dynamic Simulation, Statistical Analysis, Clustering, Behavior modelling, Reporting & BPM, Fraud detection, Dynamic Interaction, Campaign Management
  10. Key: “Graduation”. Projects will need to Graduate from the Data Science Lab and become part of Business as Usual
  11. Your goal: PRESS HERE …and really cool Big Data stuff happens!
  12. Data flow
  13. © 20th Century Fox
  14. No need to pre‐process; no need to align to schema; no need to triage; null storage concerns
  15. Hadoop just too slow for interactive BI! …loss of train‐of‐thought. “while Hadoop shines as a processing platform, it is painfully slow as a query tool”
  16. Hadoop is… inherently disk oriented; typically low ratio of CPU to Disk (image labels: “Lots of these”, “Not so many of these”)
  17. Analytics needs low latency, no I/O wait: High speed in‐memory processing
  18. A* Modern Data Platform Reference Architecture (*not THE). Analytical Platform; Near‐line Storage (optional); Access; Application & Client Layer: All BI Tools, All OLAP Clients, Excel; Persistence Layer: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, …; Reporting; Cloud Storage
  19. © Hortonworks Inc. 2013. (another) Next-Generation Data Architecture. APPLICATIONS; DATA SYSTEMS; Microsoft Applications; DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); In‐memory MPP Accelerator; BI Tools & OLAP Clients; TRADITIONAL REPOS: RDBMS, EDW, MPP; OPERATIONAL TOOLS: MANAGE & MONITOR; DEV & DATA TOOLS: BUILD & TEST; New Sources (web logs, email, sensors, social media); HORTONWORKS DATA PLATFORM
  20. Analytical Platform
  21. It’s all about getting work done. Tasks evolving: Used to be simple fetch of value; Then was compute dynamic aggregate; Now complex algorithms!