© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Real-Time Processing in Hadoop
Big Data for Business
Shane Kumpf & Mac Moore
Solutions Engineers, Hortonworks
April 2015
© Hortonworks Inc. 2012
Professional Services
Agenda
• Introduction & about Hortonworks HDP
• Overview of logistics industry scenario
• Overview of streaming architecture on HDP
• Streaming Demo #1
• Integrating Predictive Analytics in streaming scenarios
• Streaming Demo with Predictive additions
• Q & A
Preface: Enabling Technologies
• Problems solved at scale, via fundamentally new approaches…
• Make it possible, even simple, to produce new products/applications that would have been too cost-prohibitive, or simply impossible, beforehand.
• Just as foundation tech like Li-Ion batteries, retina displays, & tiny HD cameras (from smartphones) has enabled electric cars, quad-copters, VR displays, & more…
• Hadoop has similarly led to breakthroughs in big data capability, and enables new real-time advanced analytic applications.
Why did Hadoop emerge?
April 2015
Traditional systems under pressure
Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to scale
(Figure: business value of new data sources (clickstream, geolocation, web data, Internet of Things, docs & emails, server logs) versus traditional ERP, CRM and SCM data, for laggards versus industry leaders; data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020.)
Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Spring 2015
Hortonworks. We do Hadoop.
Customer Momentum
•  330+ customers (as of year-end 2014)
Hortonworks Data Platform
•  Completely open multi-tenant platform for any app & any data.
•  A centralized architecture of consistent enterprise services for
resource management, security, operations, and governance.
Partner for Customer Success
•  Open-source community leadership with a focus on enterprise needs
•  Unrivaled world-class support
•  Founded in 2011
•  Original 24 architects, developers,
operators of Hadoop from Yahoo!
•  600+ Employees
•  1000+ Ecosystem Partners
Customer Partnerships matter
Driving our innovation through Apache Software Foundation projects

Apache Project | Committers | PMC Members
Hadoop | 27 | 21
Pig | 5 | 5
Hive | 18 | 6
Tez | 16 | 15
HBase | 6 | 4
Phoenix | 4 | 4
Accumulo | 2 | 2
Storm | 3 | 2
Slider | 11 | 11
Falcon | 5 | 3
Flume | 1 | 1
Sqoop | 1 | 1
Ambari | 34 | 27
Oozie | 3 | 2
ZooKeeper | 2 | 1
Knox | 13 | 3
Ranger | 10 | n/a
TOTAL | 161 | 108
Source: Apache Software Foundation. As of 11/7/2014.

Hortonworkers are the architects and engineers who lead development of open source Apache Hadoop at the ASF
•  Expertise
Uniquely capable of solving the most complex issues & ensuring success with the latest features
•  Connection
Provide customers & partners direct input into the community roadmap
•  Partnership
We partner with customers through a subscription offering. Our success is predicated on yours.
(Figure: Apache Hadoop committers by employer: Hortonworks 27, Cloudera 11, Yahoo 10, Facebook 5, LinkedIn 2, IBM 2, Others 23.)
Technology Partnerships matter
Hortonworks relationship types: Named Partner, Certified Solution, Resells, Joint Engineering
(Figure: partner relationship matrix. Microsoft, HP, SAP and Teradata are marked in all four columns; SAS, IBM, Pivotal, Red Hat and Informatica in three; Oracle in two.)
It is not just about packaging and certifying software…
Our joint engineering with our partners drives open source standards for Apache Hadoop.
HDP is Apache Hadoop.
HDP delivers a Centralized Architecture
Modern Data Architecture
•  Unifies data and processing.
•  Enables applications to have access to
all your enterprise data through an
efficient centralized platform
•  Supported with a centralized approach to governance, security and operations
•  Versatile to handle any applications
and datasets no matter the size or type
(Figure: Modern Data Architecture with HDP. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured, plus existing systems (ERP, CRM, SCM). Core: HDFS (Hadoop Distributed File System) with YARN: Data Operating System hosting batch, interactive, real-time and partner ISV workloads. Analytics: data marts, business analytics, visualization & dashboards, MPP and EDW. Applications: business analytics, visualization & dashboards.)
Real-World Use Case: Trucking Company
Spring 2015
Scenario Overview
Trucking company with a large fleet of trucks in the Midwest
A truck generates millions of events for a given route; an event could be:
• 'Normal' events: starting / stopping of the vehicle
• ‘Violation’ events: speeding, excessive acceleration and braking, unsafe tail distance
The company uses an application that monitors truck locations and violations from the truck/driver in real time.
Route? Truck? Driver?
Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers.
Trucking Company’s YARN-enabled Architecture
(Figure components: Truck Sensors; Inbound Messaging (Kafka); Stream Processing (Storm); Real-time Serving (HBase); Alerts & Events (ActiveMQ); Real-Time User Interface; SQL Interactive Query (Hive on Tez); Many Workloads: YARN; Distributed Storage: HDFS. One cluster with consistent security, governance & operations.)
What is Kafka?
• High-throughput distributed messaging system
• Publish-subscribe semantics, but re-imagined at the implementation level to operate at speed with big data volumes
• Kafka @LinkedIn:
– 800 billion messages per day
– 175 terabytes of data written per day
– 650 terabytes of data read per day
– Over 13 million messages/2.75GB of data per second
(Figure: producers publish to a Kafka cluster; consumers read from it.)
Kafka: Anatomy of a Topic
(Figure: a topic split into Partition 0, Partition 1 and Partition 2; each partition is an ordered sequence of messages numbered from 0, with writes appended at the new end and older messages at the other.)
• Partitioning allows topics to scale beyond a single machine/node
• Topics can also be replicated, for high availability.
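To make the partitioning idea concrete, here is a minimal in-memory sketch in Python; it is not the Kafka client API. It assumes messages are routed by hashing a key (as Kafka producers do when a key is supplied), so all events for one driver stay in one partition, in order. `ToyTopic`, the driver-id keys and the use of `zlib.crc32` are illustrative assumptions; Kafka's own default partitioner uses a murmur2 hash.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned topic (not the Kafka API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an ordered, append-only list of (key, value) messages.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # A deterministic hash of the key picks the partition, so every
        # message with the same key lands in the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        # Offsets are positions within a single partition, starting at 0.
        return p, len(self.partitions[p]) - 1

    def read(self, partition, from_offset):
        # Consumers track their own offset and read sequentially from it.
        return self.partitions[partition][from_offset:]


if __name__ == "__main__":
    topic = ToyTopic("truck_events", 3)
    for key, event in [("driver-11", "Normal"), ("driver-12", "Normal"),
                       ("driver-11", "Overspeed")]:
        print(key, event, "->", topic.append(key, event))
```

Adding partitions spreads keys over more machines, but ordering is only guaranteed within a partition, which is why keying by driver keeps each driver's events in sequence.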
  
Apache Storm
• Distributed, real-time, fault-tolerant stream processing platform.
• Provides processing guarantees.
• Key concepts include:
– Tuples
– Streams
– Spouts
– Bolts
– Topology
Tuples and Streams
• What is a Tuple?
– The fundamental data structure in Storm: a named list of values that can be of any data type.
• What is a Stream?
– An unbounded sequence of tuples.
– The core abstraction in Storm: streams are what you “process” in Storm.
Spouts
• What is a Spout?
– A source of streams
– E.g.: JMS, Twitter, log, and Kafka spouts
– You can spin up multiple instances of a spout and dynamically adjust as needed
Bolts
• What is a Bolt?
– Processes any number of input streams and produces output streams
– Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores, and alerting logic
– You can spin up multiple instances of a bolt and dynamically adjust as needed
• Bolts used in the use case:
1.  HBaseBolt: persisting and counting in HBase
2.  HDFSBolt: persisting into HDFS as Avro files using Flume
3.  MonitoringBolt: reads from HBase and creates alerts via email and a message to ActiveMQ if the number of illegal driver incidents exceeds a given threshold.
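The MonitoringBolt's rule can be sketched in plain Python as below. This is not the demo's source or the Storm bolt API: `incident_store` is a dict standing in for the HBase counts written by the HBaseBolt, `send_alert` stands in for the email/ActiveMQ publisher, and the one-alert-per-driver behaviour is an assumption added so a driver is not re-alerted on every later event.

```python
class MonitoringBolt:
    """Threshold alerting sketch (hypothetical; not the demo's source)."""

    def __init__(self, incident_store, threshold, send_alert):
        self.incident_store = incident_store  # driver id -> incident count (HBase stand-in)
        self.threshold = threshold
        self.send_alert = send_alert          # email/ActiveMQ stand-in
        self.alerted = set()                  # drivers already alerted

    def execute(self, tup):
        # Normal events never trigger an alert.
        if tup["eventType"] == "Normal":
            return
        driver = tup["driverId"]
        count = self.incident_store.get(driver, 0)
        # Alert once when the stored count exceeds the threshold.
        if count > self.threshold and driver not in self.alerted:
            self.alerted.add(driver)
            self.send_alert(
                f"Driver {driver} has {count} incidents (threshold {self.threshold})"
            )


if __name__ == "__main__":
    sent = []
    bolt = MonitoringBolt({"11": 5, "12": 2}, threshold=3, send_alert=sent.append)
    for t in [{"driverId": "12", "eventType": "Overspeed"},
              {"driverId": "11", "eventType": "Overspeed"},
              {"driverId": "11", "eventType": "Overspeed"}]:
        bolt.execute(t)
    print(sent)
```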
  
Topology
• What is a Topology?
– A network of spouts and bolts wired together into a workflow
Truck-Event-Processor Topology
(Figure: a Kafka Spout, an HBase Bolt, a Monitoring Bolt, an HDFS Bolt and a WebSocket Bolt, connected by streams.)
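The spout/bolt/topology vocabulary can be simulated without Storm: a spout yields tuples (modelled here as dicts of named values), and each subscribed bolt's `execute` runs on every tuple. This is a minimal sketch under stated assumptions, not Storm's API; the CSV layout `driverId,truckId,eventType` and the two bolt classes are hypothetical stand-ins for the HBase and HDFS bolts.

```python
from collections import Counter


def kafka_spout(raw_lines):
    """Stand-in spout: emit one tuple (a dict of named values) per raw line.

    The CSV layout driverId,truckId,eventType is a hypothetical format.
    """
    for line in raw_lines:
        driver_id, truck_id, event_type = line.strip().split(",")
        yield {"driverId": driver_id, "truckId": truck_id, "eventType": event_type}


class CountingBolt:
    """Plays the HBase bolt's role: counts non-Normal events per driver."""

    def __init__(self):
        self.violations = Counter()

    def execute(self, tup):
        if tup["eventType"] != "Normal":
            self.violations[tup["driverId"]] += 1


class ArchiveBolt:
    """Plays the HDFS bolt's role: keeps every tuple for later batch analysis."""

    def __init__(self):
        self.archived = []

    def execute(self, tup):
        self.archived.append(tup)


def run_topology(spout, source, bolts):
    """Deliver every tuple from the spout's stream to each subscribed bolt."""
    for tup in spout(source):
        for bolt in bolts:
            bolt.execute(tup)


if __name__ == "__main__":
    events = ["11,101,Normal", "11,101,Overspeed", "12,102,Unsafe tail distance"]
    counter, archive = CountingBolt(), ArchiveBolt()
    run_topology(kafka_spout, events, [counter, archive])
    print(dict(counter.violations), len(archive.archived))
```

In real Storm the bolts run as parallel tasks on different workers and a stream grouping decides which task receives each tuple; the sequential loop here only shows the data flow.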
Key Constructs in Apache HBase
•  HBase = key/value store
•  Designed for petabyte scale
•  Supports low latency reads, writes and updates
•  Key features
– Updateable records
– Versioned Records
– Distributed across a cluster of machines
– Low Latency
– Caching
•  Popular use cases:
– User profiles and session state
– Object store
– Sensor apps
Data Assignment
(Figure: the keys within an HBase table are divided among different RegionServers.)
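The assignment is range-based: each region covers a contiguous, sorted slice of row keys, and RegionServers host regions. The toy `ToyRegionMap` below (not the HBase client API) assumes fixed, already-sorted split keys and one region per listed server, and shows how a row key finds its server by binary search.

```python
import bisect


class ToyRegionMap:
    """Range-partitioned row-key space (illustrative; not the HBase API)."""

    def __init__(self, split_keys, servers):
        # n sorted split keys define n + 1 regions:
        # [-inf, s0), [s0, s1), ..., [s_last, +inf)
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.split_keys = list(split_keys)
        self.servers = servers

    def region_index(self, row_key):
        # bisect_right counts split keys <= row_key, which is the index of the
        # region whose (inclusive) start key is the largest one <= row_key.
        return bisect.bisect_right(self.split_keys, row_key)

    def server_for(self, row_key):
        return self.servers[self.region_index(row_key)]


if __name__ == "__main__":
    regions = ToyRegionMap([b"driver-3", b"driver-6"], ["rs1", "rs2", "rs3"])
    for key in [b"driver-1", b"driver-3", b"driver-5", b"driver-9", b"driver-10"]:
        print(key, "->", regions.server_for(key))
```

Because row keys sort as bytes, b"driver-10" falls before b"driver-3"; that lexicographic ordering is why row-key design (for example zero-padding or salting) matters for spreading load across RegionServers.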
  
Data Access
• Get
– Retrieves a single cell, all cells with a matching row key, or all cells in a column family with a matching row key
• Put
– Inserts a new version of a cell
• Scan
– Scans the whole table row by row, or a section of the table starting at a particular start key and ending at a particular end key
• Delete
– Actually a special kind of put: it adds a new version carrying a deletion marker
• SQL via Apache Phoenix
– Unique capability in the NoSQL market
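To illustrate how Put and Delete both add versions, here is a toy in-memory model in Python. It is not the HBase client API: the logical clock, the `family:qualifier` column strings and the single tombstone per delete are simplifying assumptions (real HBase has several delete-marker types and physically removes masked data during compaction).

```python
class ToyVersionedTable:
    """Toy model of HBase-style cell versioning (not the HBase client API)."""

    TOMBSTONE = object()  # marker value written by delete()

    def __init__(self):
        # (row, "family:qualifier") -> list of (timestamp, value), newest first
        self.cells = {}
        self.clock = 0  # logical clock standing in for write timestamps

    def put(self, row, column, value):
        # Every put adds a new version; older versions are kept.
        self.clock += 1
        self.cells.setdefault((row, column), []).insert(0, (self.clock, value))

    def delete(self, row, column):
        # A delete is itself a write: a newer version carrying a deletion marker.
        self.put(row, column, self.TOMBSTONE)

    def get(self, row, column):
        # Return the newest value, or None if the newest version is a tombstone.
        versions = self.cells.get((row, column))
        if not versions or versions[0][1] is self.TOMBSTONE:
            return None
        return versions[0][1]

    def get_row(self, row):
        # All live cells with a matching row key.
        live = {}
        for (r, col) in self.cells:
            if r == row:
                value = self.get(r, col)
                if value is not None:
                    live[col] = value
        return live

    def scan(self, start_row, stop_row):
        # Rows in sorted key order; start inclusive, stop exclusive.
        rows = sorted({r for (r, _) in self.cells if start_row <= r < stop_row})
        return [(r, self.get_row(r)) for r in rows if self.get_row(r)]


if __name__ == "__main__":
    table = ToyVersionedTable()
    table.put("driver-11", "info:name", "Jamie")
    table.put("driver-11", "info:name", "Jamie S.")
    table.delete("driver-11", "info:name")
    print(table.get("driver-11", "info:name"), table.cells[("driver-11", "info:name")])
```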
(Figure: the move from Hadoop with MapReduce to Hadoop 2 & YARN. Hadoop with MapReduce (timeline labels 2006 and 2009): MapReduce, largely batch processing, on HDFS (Hadoop Distributed File System); silo’d clusters, a largely batch system, difficult to integrate. Hadoop 2 & YARN (MR-279: YARN): a Hadoop 2 & YARN based architecture with YARN: Data Operating System on HDFS, running batch, interactive and real-time workloads. Architected & led development of YARN to enable the Modern Data Architecture. October 23, 2013.)
Benefits of YARN as the Data Operating System
• The container-based model allows for running nearly any workload.
– Enables the centralized architecture.
– MapReduce is no longer the only data processing engine.
– Docker containers managed by YARN? Yes, please!
• Decouples resource scheduling from application lifecycle.
– Improved scalability and fault tolerance
• Dynamically allocated resources, resulting in huge utilization gains
– Versus static allocation of “slots” in Hadoop 1.0
Yahoo has over 30,000 nodes running YARN across over 365 PB of data.
They calculate that they run about 400,000 jobs per day, for about 10 million hours of compute time.
They also estimate a 60%–150% improvement in node usage per day since moving to YARN.
Apache HDFS – Hadoop Distributed File System
•  Very large scale distributed file system
•  10K nodes, tens of millions of files and PBs of data
•  Supports large files
•  Designed to run on commodity hardware; assumes hardware failures
•  Files are replicated to handle hardware failure
•  Detects failures and recovers from them automatically
•  Optimized for large-scale processing
•  Data locations are exposed so that computations can move to where the data resides
•  Data coherency
•  Write-once, read-many access pattern
•  Files are broken up into chunks called ‘blocks’
•  Blocks are distributed over nodes
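A small sketch of the block and replication bullets: a file is cut into fixed-size blocks and each block is stored on several DataNodes, so losing one node loses no data. The 128 MB block size and replication factor of 3 match the Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication`; the round-robin placement is a simplifying assumption, since real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) per block; only the last block may be shorter."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Put each block on `replication` distinct DataNodes, round-robin.

    Simplification: real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    n = len(datanodes)
    return [[datanodes[(b + k) % n] for k in range(replication)]
            for b in range(num_blocks)]


def blocks_lost(placement, failed_nodes):
    """Indices of blocks whose replicas are all on failed nodes."""
    failed = set(failed_nodes)
    return [i for i, nodes in enumerate(placement) if set(nodes) <= failed]


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    for (offset, length), nodes in zip(blocks, placement):
        print(offset // MB, "MB +", length // MB, "MB on", nodes)
    print("lost if dn2 fails:", blocks_lost(placement, ["dn2"]))
```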
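Streaming Demo – High-Level Architecture
(Figure: truck streaming data T(1), T(2) … T(N) is published to the Truck Events topic in Inbound Messaging (Kafka). Storm Stream Processing reads it through a Kafka Spout and feeds an HBase Bolt (writing to the HBase Dangerous Events table), an HDFS Bolt (writing Truck Events to HDFS) and a Monitoring Bolt (sending alerts to ActiveMQ); a Web App presents the results. Everything runs on YARN over Distributed Storage: HDFS.)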
Demo – Streaming Dashboard
A New Challenge
CDO’s vision: Build a Predictive Business, not a Reactive one
CDO’s Requirements
• Offline predictions
– Identify investments that will increase safety and reduce the company’s liabilities
• Real-time predictions
– Anticipate driver violations before they happen and take precautionary actions
Data Scientist’s Response
• Need to explore data & form a hypothesis
• Verify trends against TBs of events data via machine learning
• Generate predictive models with Spark MLlib on HDP
• Plug models into the Storm topology to predict driver violations in real time
♬ I’ve been waiting for this moment all my life ♬
Demo – Analyzing Events with Tableau
Analyzing Raw Events – dangerous drivers
Analyzing Raw Events – dangerous routes
Analyzing Raw Events – violations by location
Enriching truck events for analysis with Pig
(Figure: raw truck events, raw weather data sets and payroll data from the HR & payroll DBs are stored in HDFS, with metadata in HCatalog. Pig pipeline: load raw truck events → clean & filter → cleaned events → transform → transformed events → join with HR & weather data → enriched events → store; Tableau analyzes the enriched events.)
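The Pig dataflow (load, clean & filter, transform, join, store) can be mirrored in plain Python to show what each stage does to a record. This is not Pig Latin; the field names, the date-only weather key and the dict-based lookups are hypothetical simplifications. The inner-join behaviour matches Pig's default `JOIN`, which drops events that have no matching HR or weather record.

```python
def enrich_events(raw_events, drivers, weather):
    """Clean & filter, transform, and join raw events with HR and weather data.

    raw_events: list of dicts with hypothetical keys driverId, date, eventType
    drivers:    driverId -> HR/payroll attributes (e.g. certified, wagePlan)
    weather:    date -> weather flags (e.g. foggy, rainy, windy)
    """
    enriched = []
    for event in raw_events:
        # Clean & filter: drop records missing a join key.
        if not event.get("driverId") or not event.get("date"):
            continue
        # Transform: normalise free-text event types such as " overspeed ".
        event_type = event.get("eventType", "").strip().title()
        hr = drivers.get(event["driverId"])
        wx = weather.get(event["date"])
        # Join: inner-join semantics, so unmatched events are dropped.
        if hr is None or wx is None:
            continue
        enriched.append({**event, "eventType": event_type, **hr, **wx})
    return enriched


if __name__ == "__main__":
    raw = [{"driverId": "11", "date": "2015-04-01", "eventType": " overspeed "}]
    drivers = {"11": {"certified": False, "wagePlan": "miles"}}
    weather = {"2015-04-01": {"foggy": True, "rainy": False, "windy": False}}
    print(enrich_events(raw, drivers, weather))
```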
  	
  
Analyzing Enriched Events – noncertified and fatigued drivers are more dangerous
Analyzing Enriched Events – top 3 dangerous routes seem to be driven by fatigued drivers
Analyzing Enriched Events – foggy weather leads to violations
Analyzing Enriched Events – but top 3 safest routes are also foggy
Integrating Predictive Analytics
Building the Predictive Model on HDP
1. Explore a small subset of events in Tableau to identify predictive features and form a hypothesis. E.g. hypothesis: “foggy weather causes driver violations”
2. Identify suitable ML algorithms to train a model; we will use classification algorithms, as we have labeled events data
3. Transform enriched events data to a format that is friendly to Spark MLlib; many ML libs expect training data in a certain format
4. Train a logistic regression model in Spark on YARN, with the above events as training input, and iterate to fine-tune the generated model
5. Integrate the Spark MLlib model in a Storm bolt to predict violations in real time
Integrate Predictive Analytics in Stream Processing
(Figure: truck sensors feed Inbound Messaging (Kafka) and Stream Processing (Storm), alongside Interactive Query (Hive on Tez) and Real-time Serving (HBase), on YARN over HDFS. Machine Learning (Spark) trains a Spark ML model with millions of enriched truck events; the Spark model is then plugged into a Storm Prediction Bolt.)
Streaming Demo – Updated Architecture
(Figure: truck streaming data T(1), T(2) … T(N) flows into the Truck Events topic in Inbound Messaging (Kafka) and through a Kafka Spout into Storm Stream Processing, where an HBase Bolt, an HDFS Bolt and a Monitoring Bolt write to HBase, HDFS (Truck Events) and ActiveMQ, with a Web App on top; all on YARN over Distributed Storage: HDFS. New: a Prediction Bolt enriches each event using the HBase PayRoll table, predicts violations in real time & alerts via MQ, and the real-time predictions are rendered on the UI.)
Transforming training data for Spark MLlib

Enriched Events Data
Event Type | Is Driver Certified? | Wage Plan | Hours Driven | Miles Driven | Longitude | Latitude | Weather: Foggy | Weather: Rainy | Weather: Windy
Normal | Yes | Hourly | 45 | 2721 | -91.3 | 38.14 | No | No | No
Overspeed | No | Miles | 72 | 4152 | -94.23 | 37.09 | Yes | Yes | No
… | … | … | … | … | … | … | … | … | …

Spark MLlib Training Data
Label | Is Driver Certified? | Wage Plan | Hours Driven | Miles Driven | Weather: Foggy | Weather: Rainy | Weather: Windy
0 | 1 | 1 | 0.45 | 0.2721 | 0 | 0 | 0
1 | 0 | 0 | 0.72 | 0.4152 | 1 | 1 | 0
… | … | … | … | … | … | … | …

• Normal events labeled as 0 and violation events as 1
• Feature scaling applied to hours and miles to improve algorithm performance
• Features with binary values denoted as 0 and 1
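The mapping between the two tables can be written as a small function. This sketch assumes the scaling divisors (100 for hours, 10,000 for miles) inferred from the sample rows, and that any event type other than Normal counts as a violation. It also writes the LIBSVM text format (`label index:value ...`, 1-based indices, zeros omitted), which Spark MLlib reads with `MLUtils.loadLibSVMFile`; whether the demo's `truck_training` data used exactly this layout is an assumption.

```python
HOURS_SCALE = 100.0    # assumption: inferred from 45 -> 0.45 in the sample rows
MILES_SCALE = 10000.0  # assumption: inferred from 2721 -> 0.2721


def yes_no(value):
    """Encode a Yes/No column as 1/0."""
    return 1 if value == "Yes" else 0


def to_training_row(event):
    """Map one enriched event to (label, features) in the training-table order.

    Longitude and latitude are dropped, as in the training table.
    """
    label = 0 if event["eventType"] == "Normal" else 1
    features = [
        yes_no(event["certified"]),
        1 if event["wagePlan"] == "Hourly" else 0,
        event["hoursDriven"] / HOURS_SCALE,
        event["milesDriven"] / MILES_SCALE,
        yes_no(event["foggy"]),
        yes_no(event["rainy"]),
        yes_no(event["windy"]),
    ]
    return label, features


def to_libsvm_line(label, features):
    """LIBSVM text: 'label index:value ...' with 1-based indices, zeros omitted."""
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


if __name__ == "__main__":
    row = {"eventType": "Overspeed", "certified": "No", "wagePlan": "Miles",
           "hoursDriven": 72, "milesDriven": 4152,
           "foggy": "Yes", "rainy": "Yes", "windy": "No"}
    print(to_libsvm_line(*to_training_row(row)))
```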
  
Running Spark ML on YARN
1. Run the spark-submit script to launch a Spark job on YARN; /user/root/truck_training is the training data location on HDFS:

    spark-submit --class org.apache.spark.examples.mllib.BinaryClassification \
      --master yarn-cluster --num-executors 3 --driver-memory 512m \
      --executor-memory 512m --executor-cores 1 \
      truckml.jar --algorithm LR --regType L2 --regParam 1.0 \
      /user/root/truck_training --numIterations 100

2. Monitor the progress of the Spark job in the YARN ResourceManager UI.
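For intuition about what `--algorithm LR --regType L2 --regParam 1.0 --numIterations 100` asks for, here is plain batch gradient descent on average logistic loss plus an L2 penalty of (reg_param / 2)·||w||², in Python. It is a sketch, not MLlib's optimizer (MLlib uses its own SGD/L-BFGS implementations with different step-size schedules), and `step_size` is an assumed value.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_l2(rows, reg_param=1.0, step_size=1.0, num_iterations=100):
    """Batch gradient descent on mean log-loss + (reg_param / 2) * ||w||^2.

    rows: list of (label in {0, 1}, feature list). Returns (weights, intercept);
    the intercept is not regularised.
    """
    n_features = len(rows[0][1])
    m = len(rows)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(num_iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for label, x in rows:
            # Prediction error for this example under the current model.
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / m
            grad_b += err / m
        # The L2 term contributes reg_param * w, shrinking weights toward zero.
        w = [wi - step_size * (gw + reg_param * wi) for wi, gw in zip(w, grad_w)]
        b -= step_size * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (violation) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= threshold else 0


if __name__ == "__main__":
    # Toy data: a single "foggy" feature that perfectly predicts the label.
    data = [(1, [1.0]), (1, [1.0]), (0, [0.0]), (0, [0.0])]
    weights, intercept = train_logistic_l2(data)
    print(weights, intercept, predict(weights, intercept, [1.0]))
```

With regParam 1.0 the penalty is strong, so the learned weights stay small; that is one reason the slide mentions iterating to fine-tune the model.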
  
Interpreting Spark Logistic Regression Results
Precision: 87.5%   Recall: 88%
Top three predictors of violations:
1. Foggy weather
2. Rainy weather
3. Driver certification
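Precision and recall here follow the usual definitions: precision = TP / (TP + FP), the share of predicted violations that were real, and recall = TP / (TP + FN), the share of real violations that were caught. For instance, 7 correct out of 8 predicted violations gives 7 / 8 = 87.5% precision. The helper below computes both from label lists; the inputs in the example are made up, not the demo's evaluation data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Define both metrics as 0.0 when their denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(precision_recall([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]))
```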
  
Integrating Spark model in Storm
Kafka Spout → Storm Prediction Bolt:
• Initialize Spark model
• Parse truck event
• Enrich event with HBase data
• Predict violation with model
• Send alert if violation predicted
(Figure: the Prediction Bolt reads from Real-time Serving (HBase) and sends alerts through ActiveMQ to the Ops Center and LOB dashboards.)
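The five Prediction Bolt steps can be sketched as one `execute` method. Everything here is a hypothetical stand-in: the coefficients play the role of a trained MLlib logistic regression model, `driver_lookup` replaces the HBase enrichment read, `send_alert` replaces the ActiveMQ publisher, and the CSV event layout is invented. Feature order and the assumed scaling follow the training-data layout described earlier.

```python
import math


class PredictionBolt:
    """Sketch of the prediction bolt's steps (hypothetical names throughout)."""

    def __init__(self, weights, intercept, driver_lookup, send_alert, threshold=0.5):
        # Step 1: "initialize Spark model"; here just learned coefficients.
        self.weights = weights
        self.intercept = intercept
        self.driver_lookup = driver_lookup  # stands in for the HBase driver/payroll table
        self.send_alert = send_alert        # stands in for the ActiveMQ publisher
        self.threshold = threshold

    def execute(self, raw_event):
        # Step 2: parse the event (hypothetical CSV: driverId,hours,miles,foggy,rainy,windy).
        driver_id, hours, miles, foggy, rainy, windy = raw_event.split(",")
        # Step 3: enrich with driver attributes.
        driver = self.driver_lookup[driver_id]
        features = [driver["certified"], driver["hourly"],
                    float(hours) / 100.0, float(miles) / 10000.0,
                    int(foggy), int(rainy), int(windy)]
        # Step 4: score the event with the logistic model.
        z = self.intercept + sum(w * x for w, x in zip(self.weights, features))
        probability = 1.0 / (1.0 + math.exp(-z))
        # Step 5: alert only when a violation is predicted.
        if probability >= self.threshold:
            self.send_alert({"driverId": driver_id, "probability": probability})
        return probability


if __name__ == "__main__":
    lookup = {"11": {"certified": 0, "hourly": 0}}
    bolt = PredictionBolt([-1.0, 0.0, 0.0, 0.0, 2.0, 1.5, 0.0], -1.0, lookup, print)
    bolt.execute("11,72,4152,1,1,0")
```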
  
Summary: Solution Value
Value of large-scale ML on HDP
• Accelerate time to market/value
– Test out multiple ML algorithms against TBs of training data in reasonable time frames
– Confirm hypotheses against TBs of training data with confidence
– We confirmed that fog does impact safety and wage plans do not, whereas BI tools indicated otherwise
• Easily integrate predictive models in data-driven apps
– Run predictive models in Storm or any other app in your enterprise
• Run all of the above in a multi-tenant YARN cluster
– Large-scale ML on YARN respects other tenants in an HDP cluster
Recommendations to CDO
• Investment recommendations, in order of priority
1.  Invest in visibility sensors and auto-braking systems to deal with foggy conditions
2.  Invest in slip-resistant tires to fight rainy conditions
3.  Invest in certifying drivers to reduce violation probability
• Power of real-time predictions
– 40% reduction in violation rates by predicting high-risk situations in real time and sending immediate alerts to drivers
Predictive Demo
Q & A
Big Data for Business

More Related Content

What's hot

Eric Baldeschwieler Keynote from Storage Developers Conference (Hortonworks)
Powering Big Data Success On-Prem and in the Cloud (Hortonworks)
Hortonworks for Financial Analysts Presentation (Hortonworks)
2015 02 12 talend hortonworks webinar challenges to hadoop adoption (Hortonworks)
Your Self-Driving Car - How Did it Get So Smart? (Hortonworks)
Democratizing Big Data with Microsoft Azure HDInsight (Hortonworks)
Protecting enterprise Data in Hadoop (DataWorks Summit)
Oracle Solaris Build and Run Applications Better on 11.3 (OTN Systems Hub)
Edw Optimization Solution (Hortonworks)
What's New in Apache Hive 3.0? (DataWorks Summit)
Enterprise data science at scale (Carolyn Duby)
Dynamic Column Masking and Row-Level Filtering in HDP (Hortonworks)
Unlocking insights in streaming data (Carolyn Duby)
Supporting Financial Services with a More Flexible Approach to Big Data (Hortonworks)
Unlock Value from Big Data with Apache NiFi and Streaming CDC (Hortonworks)
Go Zero to Big Data in 15 Minutes with the Hortonworks Sandbox (Hortonworks)
Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Pla... (Hortonworks)
Pivotal - Advanced Analytics for Telecommunications (Hortonworks)

Similar to Storm Demo Talk - Denver Apr 2015

Internet of things Crash Course Workshop (DataWorks Summit)
Internet of Things Crash Course Workshop at Hadoop Summit (DataWorks Summit)
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ... (Hortonworks)
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop (Hortonworks)
Introduction to the Hortonworks YARN Ready Program (Hortonworks)
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit... (Innovative Management Services)
Hortonworks - What's Possible with a Modern Data Architecture? (Hortonworks)
Discover.hdp2.2.ambari.final[1] (Hortonworks)
Building a Modern Data Architecture with Enterprise Hadoop (Slim Baltagi)
Mrinal devadas, Hortonworks Making Sense Of Big Data (PatrickCrompton)
Hortonworks Hadoop @ Oslo Hadoop User Group (Mats Johansson)
Apache Hadoop on the Open Cloud (Hortonworks)
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform (EMC)
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1 (Hortonworks)
Trafodion – an enterprise class sql based on hadoop (Krishna-Kumar)
SoCal BigData Day (John Park)
Hortonworks Oracle Big Data Integration (Hortonworks)
Boost Performance with Scala – Learn From Those Who’ve Done It! (Cécile Poyet)
Boost Performance with Scala – Learn From Those Who’ve Done It! (Hortonworks)

Storm Demo Talk - Denver Apr 2015

  • 1. Page  1   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   Real-Time Processing in Hadoop Big Data for Business Shane  Kumpf  &  Mac  Moore   SoluEons  Engineers,  Hortonworks   April  2015  
  • 2. Page  2   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   © Hortonworks Inc. 2012 Professional Services Agenda   §  IntroducEon  &  about  Hortonworks  HDP   §  Overview  of  logisEcs  industry  scenario   §  Overview  of  streaming  architecture  on  HDP   §  Streaming  Demo  #1   §  IntegraEng  PredicEve  AnalyEcs  in  streaming  scenarios   §  Streaming  Demo  with  PredicEve  addiEons   §  Q  &  A   Page  2  
  • 3. Page  3   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   © Hortonworks Inc. 2012 Professional Services Preface:  Enabling  Technologies   Page  3   • Problems solved at scale, via fundamentally new approaches… • Make it possible, even simple, to produce new products/applications that would have been too cost prohibitive – or simply impossible - beforehand. • Where foundation tech like Li-­‐Ion  baUeries,  reEna  displays,  &  Eny  HD  cameras  (from   smartphones)  have  enabled  Electric  cars,  quad-­‐copters,  VR  displays,  &  more…   • Hadoop  has  similarly  led  to  breakthroughs  in  big  data  capability,  and  enables  new  real-­‐ Eme  advanced  analyEc  applicaEons.  
  • 4. Page  4   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   Why did Hadoop emerge? April  2015  
  • 5. Page  5   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved       Traditional systems under pressure Challenges •  Constrains data to app •  Can’t manage new data •  Costly to Scale Business  Value           Clickstream   GeolocaEon   Web  Data   Internet  of  Things   Docs,  emails   Server  logs   2012   2.8  Ze5abytes   2020   40  Ze5abytes   LAGGARDS   INDUSTRY   LEADERS   1 2 New Data   ERP   CRM   SCM   New     TradiKonal  
  • 6. Page  6   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP Spring  2015   Hortonworks. We do Hadoop.
  • 7. Page  7   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   Hadoop  for  the  Enterprise:     Implement  a  Modern  Data  Architecture  with  HDP   Customer Momentum •  330+ customers (as of year-end 2014) Hortonworks Data Platform •  Completely open multi-tenant platform for any app & any data. •  A centralized architecture of consistent enterprise services for resource management, security, operations, and governance. Partner for Customer Success •  Open source community leadership focus on enterprise needs •  Unrivaled world class support •  Founded in 2011 •  Original 24 architects, developers, operators of Hadoop from Yahoo! •  600+ Employees •  1000+ Ecosystem Partners
  • 8. Page  8   ©  Hortonworks  Inc.  2011  –  2014.  All  Rights  Reserved   Customer Partnerships matter Driving  our  innovaKon  through   Apache  SoSware  FoundaKon  Projects   Apache  Project   Commi5ers   PMC   Members   Hadoop   27   21   Pig   5   5   Hive   18   6   Tez   16   15   HBase   6   4   Phoenix   4   4   Accumulo   2   2   Storm   3   2   Slider   11   11   Falcon   5   3   Flume   1   1   Sqoop   1   1   Ambari   34   27   Oozie   3   2   Zookeeper   2   1   Knox   13   3   Ranger   10   n/a   TOTAL   161   108   Source:  Apache  Sobware  FoundaEon.  As  of  11/7/2014.   Hortonworkers  are  the  architects  and   engineers  that  lead  development  of  open   source  Apache  Hadoop  at  the  ASF   •  ExperKse   Uniquely  capable  to  solve  the  most  complex  issues  &   ensure  success  with  latest  features   •  ConnecKon   Provide  customers  &  partners  direct  input  into     the  community  roadmap   •  Partnership   We  partner  with  customers  with  subscripEon  offering.   Our  success  is  predicated  on  yours.   27   Cloudera:  11     Facebook:  5     LinkedIn:  2     IBM:  2     Others:  23     Yahoo   10    
  • 9. Technology Partnerships Matter. Relationship types tracked: Named Partner, Certified Solution, Resells, Joint Engineering. Checked in all four: Microsoft, HP, SAP, Teradata. Checked in three: SAS, IBM, Pivotal, Red Hat, Informatica. Checked in two: Oracle. It is not just about packaging and certifying software… Our joint engineering with our partners drives open source standards for Apache Hadoop. HDP is Apache Hadoop.
  • 10. HDP Delivers a Centralized Architecture. Modern Data Architecture: • Unifies data and processing. • Enables applications to access all your enterprise data through an efficient centralized platform. • Supported by a centralized approach to governance, security, and operations. • Versatile enough to handle any application and any dataset, no matter the size or type. Diagram: SOURCES (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems such as ERP, CRM, and SCM) feed an HDP cluster built on HDFS (Hadoop Distributed File System) and YARN: Data Operating System, which runs batch, interactive, real-time, and partner ISV workloads alongside MPP and EDW systems. ANALYTICS (data marts, business analytics, visualization & dashboards) and applications consume the results.
  • 11. Real World Use Case: Trucking Company
  • 12. Scenario Overview
  • 13. Trucking company with a large fleet of trucks in the Midwest. A truck generates millions of events for a given route; an event could be: • 'Normal' events: starting / stopping of the vehicle • 'Violation' events: speeding, excessive acceleration and braking, unsafe following distance. The company uses an application that monitors truck locations and violations from the truck/driver in real time. Route? Truck? Driver? Analysts query a broad history to understand whether today's violations are part of a larger problem with specific routes, trucks, or drivers.
  • 14. Trucking Company's YARN-enabled Architecture. Components: Truck Sensors; Inbound Messaging (Kafka); Stream Processing (Storm); Real-time Serving (HBase); Alerts & Events (ActiveMQ); Real-Time User Interface; SQL Interactive Query (Hive on Tez). All run as many workloads on YARN over Distributed Storage: HDFS. One cluster with consistent security, governance & operations.
  • 16. What is Kafka? APACHE KAFKA • High-throughput distributed messaging system • Publish-subscribe semantics, but re-imagined at the implementation level to operate at speed with big data volumes • Kafka @LinkedIn: 800 billion messages per day; 175 terabytes of data written per day; 650 terabytes of data read per day; over 13 million messages (2.75 GB of data) per second. Diagram: several producers write to a Kafka cluster, from which several consumers read.
  • 17. Kafka: Anatomy of a Topic. Diagram: a topic split into Partition 0, Partition 1, and Partition 2, each an ordered sequence of messages numbered by offset (0, 1, 2, …); writes are appended at the new end of each partition, and the oldest messages sit at the other end. APACHE KAFKA • Partitioning allows topics to scale beyond a single machine/node. • Topics can also be replicated for high availability.
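To make partitions and offsets concrete, here is a minimal in-memory sketch, assuming simplified semantics: it is not the Kafka client API, `PartitionedTopic` is a hypothetical name, and crc32 stands in for the murmur2 hash that Kafka's Java producer uses by default for keyed messages. It shows that events with the same key land in one partition in order, and that consumers read forward from an offset they track themselves.

```python
import zlib


class PartitionedTopic:
    """Toy in-memory model of a Kafka-style topic; this is NOT the Kafka client API."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; a message's offset is its list index.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic key hash (crc32 here, murmur2 in Kafka's default partitioner),
        # so every event for the same key lands in the same partition, in order.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))  # writes are appended at the "new" end
        return p, len(self.partitions[p]) - 1     # (partition, offset)

    def consume(self, partition, offset, max_messages=10):
        # Consumers remember their own offset and read forward from it.
        return self.partitions[partition][offset:offset + max_messages]


topic = PartitionedTopic(num_partitions=3)
p1, off1 = topic.produce("truck-17", "Normal")
p2, off2 = topic.produce("truck-17", "Overspeed")
print(p1 == p2, off1, off2)
print(topic.consume(p1, offset=0))
```

Keying by truck (or driver) is one plausible choice: it preserves per-truck ordering while still spreading the fleet across partitions. Replication is not modeled.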
  • 19. Apache Storm • Distributed, real-time, fault-tolerant stream processing platform. • Provides processing guarantees. • Key concepts include: Tuples, Streams, Spouts, Bolts, Topology
  • 20. Tuples and Streams • What is a Tuple? – The fundamental data structure in Storm: a named list of values, which can be of any data type. • What is a Stream? – An unbounded sequence of tuples. – Streams are the core abstraction in Storm and are what you "process" in Storm.
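As a rough illustration of "a named list of values" and "an unbounded sequence of tuples", the sketch below models a tuple with `namedtuple` and a stream with a never-ending generator. The field names and value ranges are hypothetical, not the demo's schema, and this is plain Python rather than Storm's API; Storm's own `Tuple` offers comparable access through `getValueByField(...)` and `getValue(i)`.

```python
import itertools
import random
from collections import namedtuple

# Hypothetical field names for a truck-event tuple. In Storm a component declares
# its output fields and each tuple carries values for them; namedtuple mimics that.
TruckEvent = namedtuple("TruckEvent", ["driver_id", "truck_id", "event_type", "latitude", "longitude"])


def truck_event_stream(seed=0):
    """An unbounded stream of tuples: this generator never finishes on its own."""
    rng = random.Random(seed)
    while True:
        yield TruckEvent(
            driver_id=rng.randint(1, 10),
            truck_id=rng.randint(100, 110),
            event_type=rng.choice(["Normal", "Overspeed", "Lane Departure"]),
            latitude=round(rng.uniform(36.0, 42.0), 2),
            longitude=round(rng.uniform(-95.0, -87.0), 2),
        )


# A consumer processes the stream incrementally; here we just take five tuples.
for event in itertools.islice(truck_event_stream(), 5):
    # Values are reachable by field name or by position.
    print(event.event_type, event[2])
```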
  • 21. Spouts • What is a Spout? – A source of streams: it generates or reads in the tuples that enter the topology – E.g.: JMS, Twitter, log, and Kafka spouts – You can spin up multiple instances of a spout and adjust the count dynamically as needed
  • 22. Bolts • What is a Bolt? – Processes any number of input streams and produces output streams – Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores, and alerting logic – You can spin up multiple instances of a bolt and adjust the count dynamically as needed • Bolts used in the use case: 1. HBaseBolt: persisting and counting in HBase 2. HDFSBolt: persisting into HDFS as Avro files using Flume 3. MonitoringBolt: reads from HBase and creates alerts via email and a message to ActiveMQ if the number of illegal driver incidents exceeds a given threshold.
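The MonitoringBolt's threshold rule can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the demo's code: events arrive as plain dicts, an in-memory `Counter` stands in for the incident counts the real bolt reads from HBase, and an `alerts` list stands in for the email and ActiveMQ notifications. `MonitoringBoltSketch` is a hypothetical name.

```python
from collections import Counter


class MonitoringBoltSketch:
    """Plain-Python sketch of the MonitoringBolt's threshold check (not Storm's bolt API).

    Assumptions: events arrive as dicts, an in-memory Counter stands in for the
    incident counts the real bolt reads from HBase, and the `alerts` list stands
    in for the email / ActiveMQ notifications.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.incidents = Counter()
        self.alerts = []

    def execute(self, event):
        if event["event_type"] == "Normal":
            return  # only violations count as illegal incidents
        driver = event["driver_id"]
        self.incidents[driver] += 1
        # Alert once, at the moment the count first exceeds the threshold.
        if self.incidents[driver] == self.threshold + 1:
            self.alerts.append(
                f"driver {driver}: {self.incidents[driver]} violations (threshold {self.threshold})"
            )


bolt = MonitoringBoltSketch(threshold=2)
events = [{"driver_id": 7, "event_type": "Overspeed"}] * 3 + [{"driver_id": 8, "event_type": "Normal"}] * 5
for e in events:
    bolt.execute(e)
print(bolt.alerts)
```

Alerting only when the count first crosses the threshold avoids flooding the queue with one message per additional violation. A real deployment might instead re-alert per time window.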
  • 23. Topology • What is a Topology? – A network of spouts and bolts wired together into a workflow. Diagram: the Truck-Event-Processor topology, in which a Kafka Spout and four bolts (HBase Bolt, Monitoring Bolt, HDFS Bolt, WebSocket Bolt) are connected by streams.
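The idea of "spouts and bolts wired together" can be modeled in a single process. The sketch below is not Storm's `TopologyBuilder`: `ToyTopology` and the bolt functions are hypothetical, and the wiring only loosely follows the Truck-Event-Processor diagram, because the slide does not show the real stream groupings. Each bolt receives every tuple emitted by the components it subscribes to, and anything it returns is forwarded downstream.

```python
from collections import defaultdict


class ToyTopology:
    """Single-process model of spouts and bolts wired by streams (not Storm's TopologyBuilder)."""

    def __init__(self):
        self.spouts = {}                      # name -> iterable of tuples
        self.bolts = {}                       # name -> function(tuple) -> list of output tuples
        self.subscribers = defaultdict(list)  # component name -> bolts reading its stream

    def set_spout(self, name, source):
        self.spouts[name] = source

    def set_bolt(self, name, fn, inputs):
        self.bolts[name] = fn
        for upstream in inputs:
            self.subscribers[upstream].append(name)

    def run(self):
        for name, source in self.spouts.items():
            for tup in source:
                self._emit(name, tup)

    def _emit(self, component, tup):
        # Deliver the tuple to every subscribed bolt, then forward whatever each bolt emits.
        for bolt_name in self.subscribers[component]:
            for out in self.bolts[bolt_name](tup):
                self._emit(bolt_name, out)


# Illustrative wiring only; the lists stand in for HBase, HDFS, and the web UI.
hbase_rows, hdfs_files, ui_messages = [], [], []

def hbase_bolt(t):
    hbase_rows.append(t)
    return []

def hdfs_bolt(t):
    hdfs_files.append(t)
    return []

def monitoring_bolt(t):
    return [t] if t["event_type"] != "Normal" else []

def websocket_bolt(t):
    ui_messages.append(t)
    return []

topology = ToyTopology()
topology.set_spout("kafka_spout", [{"driver_id": 1, "event_type": "Normal"},
                                   {"driver_id": 2, "event_type": "Overspeed"}])
topology.set_bolt("hbase_bolt", hbase_bolt, inputs=["kafka_spout"])
topology.set_bolt("hdfs_bolt", hdfs_bolt, inputs=["kafka_spout"])
topology.set_bolt("monitoring_bolt", monitoring_bolt, inputs=["kafka_spout"])
topology.set_bolt("websocket_bolt", websocket_bolt, inputs=["monitoring_bolt"])
topology.run()
print(len(hbase_rows), len(hdfs_files), ui_messages)
```

In real Storm, each component runs as many parallel tasks across the cluster, and a grouping decides which task receives each tuple. This sketch collapses all of that into direct function calls.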
  • 25. Key Constructs in Apache HBase • HBase = key/value store • Designed for petabyte scale • Supports low-latency reads, writes, and updates • Key features: – Updateable records – Versioned records – Distributed across a cluster of machines – Low latency – Caching • Popular use cases: – User profiles and session state – Object store – Sensor apps
  • 26. Data Assignment. Diagram: the keys within an HBase table are divided among different RegionServers.
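Dividing keys among RegionServers amounts to splitting the sorted row-key space into contiguous ranges (regions). The sketch below looks up which region, and therefore which server, owns a key. The split points and server names are hypothetical; real HBase keeps this mapping in its meta table and moves regions as they split and rebalance.

```python
import bisect

# Hypothetical region layout: region i holds row keys in [START_KEYS[i], START_KEYS[i+1]).
START_KEYS = ["", "driver_25", "driver_50", "driver_75"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate(row_key):
    """Return (region index, RegionServer) for a row key, using lexicographic key order."""
    i = bisect.bisect_right(START_KEYS, row_key) - 1
    return i, REGION_SERVERS[i]


for key in ["driver_10", "driver_50", "driver_9"]:
    print(key, locate(key))
```

Row keys sort as bytes, not numbers, so "driver_9" lands after "driver_75" in the last region. Zero-padding numeric parts (for example "driver_09") keeps numeric and key order aligned, which matters for scans and for avoiding hot regions.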
  • 27. Data Access • Get – Retrieves a single cell, all cells with a matching rowkey, or all cells in a column family with a matching rowkey • Put – Inserts a new version of a cell • Scan – Reads the whole table, row by row, or a section of the table starting at a particular start key and ending at a particular end key • Delete – Actually a kind of put: it adds a new version carrying a deletion marker • SQL via Apache Phoenix – Unique capability in the NoSQL market
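The sketch below models the Get / Put / Scan / Delete semantics listed above, including versioned cells and deletes implemented as a new version with a tombstone marker. It is an in-memory stand-in, not the HBase client API. `ToyVersionedTable` is hypothetical, a logical counter replaces HBase's cell timestamps, and only the "delete all versions of a column" case is modeled.

```python
from collections import defaultdict


class ToyVersionedTable:
    """In-memory sketch of HBase-style Get/Put/Scan/Delete semantics (not the HBase client API)."""

    TOMBSTONE = object()

    def __init__(self):
        self.cells = defaultdict(list)  # (row, column) -> [(timestamp, value), ...], oldest first
        self.clock = 0                  # logical timestamp standing in for cell timestamps

    def put(self, row, column, value):
        self.clock += 1
        self.cells[(row, column)].append((self.clock, value))

    def delete(self, row, column):
        # Like HBase, nothing is erased here; a deletion marker is written as a new version.
        self.put(row, column, self.TOMBSTONE)

    def versions(self, row, column):
        """Live versions, newest first; a tombstone masks everything written before it."""
        live = []
        for _, value in self.cells.get((row, column), []):
            live = [] if value is self.TOMBSTONE else live + [value]
        return live[::-1]

    def get(self, row, column):
        live = self.versions(row, column)
        return live[0] if live else None

    def scan(self, start_row, stop_row):
        """Latest live value of every cell with start_row <= row < stop_row, in row-key order."""
        result = {}
        for row, column in sorted(self.cells):
            if start_row <= row < stop_row:
                value = self.get(row, column)
                if value is not None:
                    result.setdefault(row, {})[column] = value
        return result


t = ToyVersionedTable()
t.put("driver_1", "info:name", "Ann")
t.put("driver_1", "info:name", "Anne")   # new version; the old one is kept
t.put("driver_2", "info:name", "Bob")
print(t.get("driver_1", "info:name"), t.versions("driver_1", "info:name"))
t.delete("driver_1", "info:name")        # tombstone masks both versions
print(t.scan("driver_0", "driver_9"))
```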
  • 29. Diagram: from Hadoop with MapReduce to Hadoop 2 & YARN (timeline markers 2006 and 2009). Hadoop w/ MapReduce: MapReduce over HDFS (Hadoop Distributed File System) on nodes 1…N; largely batch processing, with silo'd clusters that are difficult to integrate. Hadoop 2 & YARN (MR-279: YARN): a Hadoop 2 & YARN based architecture in which YARN: Data Operating System runs interactive, real-time, and batch workloads over HDFS on nodes 1…N. Architected & led development of YARN to enable the Modern Data Architecture. October 23, 2013
  • 30. Benefits of YARN as the Data Operating System • The container-based model allows running nearly any workload. – Enables the centralized architecture. – MapReduce is no longer the only data processing engine. – Docker containers managed by YARN. Yes please! • Decouples resource scheduling from the application lifecycle. – Improved scalability and fault tolerance • Dynamically allocated resources, resulting in HUGE utilization gains – Versus static allocation of "slots" in Hadoop 1.0. Yahoo has over 30,000 nodes running YARN across over 365 PB of data. They calculate running about 400,000 jobs per day for about 10 million hours of compute time. They have also estimated a 60%–150% improvement in node usage per day since moving to YARN.
  • 32. Apache HDFS – Hadoop Distributed File System • Very large scale distributed file system – 10K nodes, tens of millions of files, and PBs of data – Supports large files • Designed to run on commodity hardware; assumes hardware failures – Files are replicated to handle hardware failure – Detects failures and recovers from them automatically • Optimized for large-scale processing – Data locations are exposed so that computations can move to where the data resides • Data coherency – Write-once, read-many access pattern • Files are broken up into chunks called 'blocks' – Blocks are distributed over nodes
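Breaking files into blocks and replicating each block can be illustrated with a small planning function. The block size (128 MB) and replication factor (3) match the Hadoop 2.x defaults for `dfs.blocksize` and `dfs.replication`, but your cluster may be configured differently. The round-robin placement is a deliberate simplification of HDFS's rack-aware policy, and `plan_blocks` is a hypothetical helper.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize (Hadoop 2.x default: 128 MB)
REPLICATION = 3                 # assumed dfs.replication (default: 3)


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and pick `replication` distinct DataNodes per block.

    Round-robin placement is a simplification; real HDFS uses a rack-aware
    placement policy chosen by the NameNode.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    plan = []
    for b in range(math.ceil(file_size / block_size)):
        length = min(block_size, file_size - b * block_size)  # the last block may be partial
        replicas = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        plan.append((b, length, replicas))
    return plan


MB = 1024 * 1024
for block_id, length, replicas in plan_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
    print(block_id, length // MB, "MB", replicas)
```

A 300 MB file becomes two full blocks plus a 44 MB tail. Because each block lives on three different nodes, losing any single node leaves every block readable, and computations can be scheduled on any node holding a replica.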
  • 33. Streaming Demo – High Level Architecture. Diagram: truck streaming data from trucks T(1), T(2), …, T(N) goes to Inbound Messaging (Kafka), Truck Events topic. A Storm stream processing topology reads it through the Kafka Spout and passes events to an HBase Bolt (writing the Dangerous Events table in HBase), an HDFS Bolt (writing truck events to HDFS), and a Monitoring Bolt (sending alerts to ActiveMQ, which feeds the Web App). Everything runs on YARN over Distributed Storage: HDFS.
  • 34. Demo – Streaming Dashboard
  • 35. A New Challenge
  • 36. CDO's vision: Build a Predictive Business, not a Reactive one. CDO's Requirements: • Offline predictions – Identify investments that will increase safety and reduce the company's liabilities • Real-time predictions – Anticipate driver violations before they happen and take precautionary actions. Data Scientist's Response: • Need to explore data & form a hypothesis • Verify trends against TBs of events data via machine learning • Generate predictive models with Spark MLlib on HDP • Plug models into the Storm topology to predict driver violations in real time. ♬ I've been waiting for this moment all my life ♬
  • 37. Demo – Analyzing Events with Tableau
  • 38. Analyzing Raw Events – dangerous drivers
  • 39. Analyzing Raw Events – dangerous routes
  • 40. Analyzing Raw Events – violations by location
  • 41. Enriching Truck Events for Analysis with Pig. Diagram: raw truck events, raw weather data (weather data sets), and payroll data (from HR & payroll DBs) are stored in HDFS and described in HCatalog (metadata). A Pig pipeline loads the raw truck events, cleans and filters them (cleaned events), transforms them (transformed events), and joins them with HR & weather data (enriched events). The enriched events are stored and then analyzed in Tableau.
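The slide's pipeline (load, clean & filter, transform, join with HR & weather data) is written in Pig in the demo. The sketch below expresses the same steps in plain Python so the data flow is easy to follow. All field names, sample rows, and the join keys (driver_id for HR/payroll, date for weather) are hypothetical assumptions, not the demo's schema.

```python
import csv
import io

# Hypothetical inputs: field names and values are illustrative, not the demo's schema.
RAW_EVENTS = """driver_id,event_type,date,hours,miles
1,Normal,2015-04-01,45,2721
2,Overspeed,2015-04-01,72,4152
bad,row
3,Lane Departure,2015-04-02,,1200
"""
HR_PAYROLL = {1: {"certified": "Yes", "wage_plan": "Hourly"},
              2: {"certified": "No", "wage_plan": "Miles"}}
WEATHER = {"2015-04-01": {"foggy": True, "rainy": False, "windy": False}}

REQUIRED = ("driver_id", "event_type", "date", "hours", "miles")


def load(text):
    return list(csv.DictReader(io.StringIO(text)))


def clean_and_filter(rows):
    # Drop malformed rows: short lines get None for missing fields, blanks are "".
    return [r for r in rows if all(r.get(f) for f in REQUIRED)]


def transform(rows):
    # Cast numeric fields, roughly what a Pig FOREACH ... GENERATE with casts would do.
    return [{**r, "driver_id": int(r["driver_id"]), "hours": int(r["hours"]), "miles": int(r["miles"])}
            for r in rows]


def enrich(rows, hr, weather):
    # Inner join on driver_id (HR & payroll) and on date (weather), like a Pig JOIN.
    return [{**r, **hr[r["driver_id"]], **weather[r["date"]]}
            for r in rows if r["driver_id"] in hr and r["date"] in weather]


enriched = enrich(transform(clean_and_filter(load(RAW_EVENTS))), HR_PAYROLL, WEATHER)
for e in enriched:
    print(e)
```

Because the join is an inner join, events without matching HR or weather records drop out. Whether that is acceptable, or an outer join is needed, depends on the analysis.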
  • 42. Analyzing Enriched Events – non-certified and fatigued drivers are more dangerous
  • 43. Analyzing Enriched Events – top 3 dangerous routes seem to be driven by fatigued drivers
  • 44. Analyzing Enriched Events – foggy weather leads to violations
  • 45. Analyzing Enriched Events – but top 3 safest routes are also foggy
  • 46. Integrating Predictive Analytics
  • 47. Building the Predictive Model on HDP. 1. Tableau: explore a small subset of events to identify predictive features and form a hypothesis, e.g. the hypothesis "foggy weather causes driver violations". 2. Identify suitable ML algorithms to train a model; we use classification algorithms because we have labeled events data. 3. Transform the enriched events data into a format that is friendly to Spark MLlib, since many ML libraries expect training data in a certain format. 4. Train a logistic regression model in Spark on YARN, with the above events as training input, and iterate to fine-tune the generated model. 5. Integrate the Spark MLlib model in a Storm bolt to predict violations in real time.
  • 48. Integrate Predictive Analytics in Stream Processing. Diagram: truck sensors feed Inbound Messaging (Kafka) and Stream Processing (Storm), alongside Real-time Serving (HBase) and Interactive Query (Hive on Tez), all on YARN over HDFS. Machine Learning (Spark) trains a Spark ML model with millions of enriched truck events, and the model is plugged into a Storm Prediction Bolt.
  • 49. Streaming Demo – Updated Architecture. Diagram: the same components as the high-level architecture (truck streaming data T(1)…T(N), the Kafka Truck Events topic, Kafka Spout, HBase Bolt, HDFS Bolt, Monitoring Bolt, ActiveMQ, and Web App, on YARN over HDFS), plus a PayRoll table in HBase and a new Prediction Bolt. The Prediction Bolt enriches each event, predicts violations in real time, and alerts via MQ, and the web app renders the real-time predictions on the UI.
  • 50. Transforming Training Data for Spark MLlib
Enriched events data:
Event Type | Driver Certified? | Wage Plan | Hours Driven | Miles Driven | Longitude | Latitude | Weather: Foggy | Weather: Rainy | Weather: Windy
Normal | Yes | Hourly | 45 | 2721 | -91.3 | 38.14 | No | No | No
Overspeed | No | Miles | 72 | 4152 | -94.23 | 37.09 | Yes | Yes | No
… | … | … | … | … | … | … | … | … | …
Spark MLlib training data:
Label | Driver Certified? | Wage Plan | Hours Driven | Miles Driven | Weather: Foggy | Weather: Rainy | Weather: Windy
0 | 1 | 1 | 0.45 | 0.2721 | 0 | 0 | 0
1 | 0 | 0 | 0.72 | 0.4152 | 1 | 1 | 0
… | … | … | … | … | … | … | …
Notes: normal events are labeled 0 and violation events 1; feature scaling is applied to hours and miles to improve algorithm performance; features with binary values are denoted as 0 and 1.
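The transformation in the tables can be written as a small mapping function. This sketch follows the slide's encoding: label 0 for Normal and 1 otherwise, Yes → 1, Hourly → 1, and longitude/latitude dropped. The divisors of 100 (hours) and 10,000 (miles) are inferred from the example rows, not taken from the demo's code, and the dict keys are hypothetical. The output uses the LIBSVM text format, which MLlib can read with `MLUtils.loadLibSVMFile`. That format is one plausible target, not necessarily the one the demo used.

```python
def to_training_row(event):
    """Map one enriched event (a dict) to (label, features) using the slide's encoding."""
    label = 0 if event["event_type"] == "Normal" else 1
    features = [
        1 if event["certified"] == "Yes" else 0,
        1 if event["wage_plan"] == "Hourly" else 0,
        event["hours"] / 100,    # inferred scaling: 45 -> 0.45
        event["miles"] / 10000,  # inferred scaling: 2721 -> 0.2721
        1 if event["foggy"] == "Yes" else 0,
        1 if event["rainy"] == "Yes" else 0,
        1 if event["windy"] == "Yes" else 0,
    ]
    return label, features


def to_libsvm_line(label, features):
    # LIBSVM format: "label index:value ..." with 1-based indices; zero entries are omitted.
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0)
    return f"{label} {pairs}".strip()


rows = [
    {"event_type": "Normal", "certified": "Yes", "wage_plan": "Hourly", "hours": 45, "miles": 2721,
     "foggy": "No", "rainy": "No", "windy": "No"},
    {"event_type": "Overspeed", "certified": "No", "wage_plan": "Miles", "hours": 72, "miles": 4152,
     "foggy": "Yes", "rainy": "Yes", "windy": "No"},
]
for r in rows:
    print(to_libsvm_line(*to_training_row(r)))
```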
  • 51. Running Spark ML on YARN
1. Run the spark-submit script to launch a Spark job on YARN (/user/root/truck_training is the training data location on HDFS):
spark-submit --class org.apache.spark.examples.mllib.BinaryClassification --master yarn-cluster \
  --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 \
  truckml.jar --algorithm LR --regType L2 --regParam 1.0 /user/root/truck_training \
  --numIterations 100
2. Monitor progress of the Spark job in the YARN ResourceManager UI.
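To show what `--algorithm LR --regType L2 --regParam … --numIterations …` ask for, here is a minimal sketch of L2-regularized logistic regression trained by batch gradient descent. It minimizes mean log-loss plus (reg_param / 2)·‖w‖², leaves the intercept unregularized, and uses a fixed step size. It is not MLlib's optimizer, which uses SGD or L-BFGS with its own step-size schedule depending on version. The function names and defaults are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, reg_param=0.01, step_size=0.5, num_iterations=500):
    """Batch gradient descent on mean log-loss + (reg_param / 2) * ||w||^2 (intercept not regularized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(num_iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # prediction minus label
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The L2 term adds reg_param * w to the gradient, shrinking weights toward zero.
        w = [wj - step_size * (gj + reg_param * wj) for wj, gj in zip(w, grad_w)]
        b -= step_size * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0


# Tiny illustrative dataset: one feature ("foggy"), label = violation.
X, y = [[0], [0], [1], [1]], [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print(w, b, predict(w, b, [1]), predict(w, b, [0]))
```

A larger reg_param, such as the slide's 1.0, shrinks weights more aggressively. That trades some fit on the training data for a simpler model that is less likely to overfit, which is why the slide describes iterating to fine-tune the model.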
  • 52. Interpreting Spark Logistic Regression Results. Precision: 87.5%. Recall: 88%. Top three predictors of violations: 1. Foggy weather 2. Rainy weather 3. Driver certification
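Precision is TP / (TP + FP): of the events the model flags as violations, the share that really were violations. Recall is TP / (TP + FN): of the real violations, the share the model caught. The sketch below computes both. The counts (154 true positives, 22 false positives, 21 false negatives, 803 true negatives) are made up solely so the arithmetic reproduces the slide's 87.5% and 88%; the demo's actual confusion matrix is not given.

```python
def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 154 TP, 22 FP, 21 FN, 803 TN.
y_true = [1] * 154 + [0] * 22 + [1] * 21 + [0] * 803
y_pred = [1] * 154 + [1] * 22 + [0] * 21 + [0] * 803
print(precision_recall(y_true, y_pred))  # 154/176 and 154/175
```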
  • 53. Integrating the Spark Model in Storm. Kafka Spout → Storm Prediction Bolt: • Initialize Spark model • Parse truck event • Enrich event with HBase data • Predict violation with model • Send alert if violation predicted. The diagram also shows Real-time Serving (HBase), ActiveMQ, the Ops Center, and LOB dashboards.
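The per-event steps of the Prediction Bolt can be sketched as one function. Everything specific here is hypothetical: the event string format, the weights and intercept (made up, not the demo's trained model), and the dict that stands in for the HBase lookup. The "initialize model" step, which a Storm bolt would typically do once in `prepare()`, is reduced to module-level constants. The feature order matches the training-data table.

```python
import math

# Hypothetical model for the feature order
# [certified, hourly_wage, hours/100, miles/10000, foggy, rainy, windy]; values are made up.
WEIGHTS = [-0.8, 0.1, 0.5, 0.3, 1.9, 1.2, 0.2]
INTERCEPT = -1.5

# Stand-in for the driver/payroll data the real bolt reads from HBase.
DRIVER_TABLE = {11: {"certified": 1, "hourly_wage": 0, "hours": 72, "miles": 4152}}


def predict_violation(event, driver_table=DRIVER_TABLE, threshold=0.5):
    """Parse, enrich, score; return an alert payload if a violation is predicted, else None."""
    # 1. Parse the truck event (assumed format: "driver_id,foggy,rainy,windy").
    driver_id, foggy, rainy, windy = (int(v) for v in event.split(","))
    # 2. Enrich it with driver attributes.
    d = driver_table[driver_id]
    features = [d["certified"], d["hourly_wage"], d["hours"] / 100, d["miles"] / 10000, foggy, rainy, windy]
    # 3. Score with the logistic regression model.
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + INTERCEPT
    probability = 1.0 / (1.0 + math.exp(-z))
    # 4. Build an alert (a real bolt would publish it to ActiveMQ).
    if probability >= threshold:
        return {"driver_id": driver_id, "probability": round(probability, 3)}
    return None


print(predict_violation("11,1,1,0"))
print(predict_violation("11,0,0,0"))
```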
  • 54. Summary: Solution Value
  • 55. Value of Large-Scale ML on HDP • Accelerate time to market/value – Test multiple ML algorithms against TBs of training data in reasonable time frames – Confirm hypotheses against TBs of training data with confidence – We confirmed that fog does impact safety and wage plans do not, whereas BI tools indicated otherwise • Easily integrate predictive models in data-driven apps – Run predictive models in Storm or any other app in your enterprise • Run all of the above in a multi-tenant YARN cluster – Large-scale ML on YARN respects other tenants in an HDP cluster
  • 56. Recommendations to CDO • Investment recommendations, in order of priority: 1. Invest in visibility sensors and auto-braking systems to deal with foggy conditions 2. Invest in slip-resistant tires to fight rainy conditions 3. Invest in certifying drivers to reduce violation probability • Power of real-time predictions – 40% reduction in violation rates by predicting high-risk situations in real time and sending immediate alerts to drivers
  • 57. Predictive Demo
  • 58. Q & A. Big Data for Business