SlideShare a Scribd company logo
Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning using Spark and DL4J for
fun and profit
Adam Gibson and Dhruv Kumar
2015
Version 1.0
Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Who are we?
Adam Gibson
- Co founder of Skymind
- Wrote DeepLearning4J, ND4J
Dhruv Kumar
- Sr Solutions Architect, HWX
- MS Umass, Mahout, ASF
Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
In this talk
- What’s Deep Learning?
- Architectures
- Implementation and Libraries in Real Life
- Demo!
Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning
• One of the many pattern recognition techniques in Data
Science
• Excels at rich media applications:
• Image recognition
• Speech translation
• Voice recognition
• Loosely inspired by human brain models
• Synonymous with Artificial Neural Networks, Multi Layer
Networks
Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Enterprise use cases
Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Doing this in real life for enterprise
Page7 © Hortonworks Inc. 2011 – 2014. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP FOR DATA AT
REST
HDF FOR DATA IN
MOTION
ACTIONABLE
INTELLIGENCE
MODERN DATA APPS
Modern Data Applications
in Enterprise: Connected,
Fast, Intelligent
PERISHABLE
INSIGHTS
HISTORICAL
INSIGHTS
INTERNET
OF
ANYTHING
How do we realize MDA in a Hadoop Centric World?
HDF
Hadoop
HDFS
HBase Hive SOLR
YARN
Storm
Service
Management /
Workflow
SIEM
Spark
Raw Network Stream
Network Metadata Stream
Data Stores
Syslog
Raw Application Logs
Other Streaming Telemetry
www.hortonworks.com
NiFi 1
NiFi 2
Storm 1
Kafka 1
Storm 2
Kafka 2
Storm 3
Kafka 3
DataNode 1
HBase 1
Source 1
Source 2
Source 3
Source N
NiFi Nodes
Edge Nodes
Master NodesClients 1
Clients 2
DataNode 2
Hbase 2
DataNode 3
Hbase 3
DataNode 4
Hbase 4
DataNode 5
Hbase 5
DataNode 6
Hbase 6
DataNode 7
Hbase 7
DataNode 8
Hbase 8
DataNode 9 DataNode 10
DataNode 31 DataNode 32
Master 1
Master 2
Master 3
Master 4
Master 5
Worker Nodes
HDF
HDP
World Azure
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Storm/Spark Streaming
Storm
Detailed Reference Architecture
HDF
Flume
Sink to
HDFS
Transform
Interactive
UI Framework
Hive
Hive
HDFS
HDFS
SOURCE DATA
Server logs
Application Logs
Firewall Logs
CRM/ERP
Sensor
Kafka
Kafka
Stream to
HDF
Forward to
Storm
Real Time Storage
Spark-ML
Pig
Alerts
Bolt to
HDFS
Dashboard
Silk
JMS
Alerts
Hive Server
HiveServer
Reporting
BI Tools
High Speed
Ingest
Real-Time
Batch Interactive
Machine Learning
Models
Spark
Pig
Alerts SQOOP
Flume
Iterative ML
Hbase/Pheonix
HBaseEvent Enrichment
Spark-Thrift
Pig
Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
For Model Building: Typical Workflow
11
1.Ingest training data and store it
2.Split data set into: training, testing and validation sets
3.Vectorize and extract features to go into next step
4.Architect multi layer network, initialize
5.Feed data and train
6.Test and Validate
7.Repeat steps 4 and 5 until desired
8.Store model
9.Put model in app, start generalizing on real data.
Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
So what do you get?
12
1.Ingest training data and store it using Nifi or other ingest tools
2.Split data set into: training, testing and validation sets
3.Vectorize and extract features to go into next step
4.Architect multi layer network, initialize
5.Feed data and train
6.Test and Validate
7.Repeat steps 4 and 5 until desired
8.Store model
9.Put model in app, start generalizing on real data.
Steps 2, 3, 4 and 5:
Use libraries such as
Deeplearning4j
Page13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deeplearning4j Architecture
13
Page14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
DL4J: Canova for Vectorization and Ingest
• Canova uses an input/output format system (similar to
how Hadoop uses MapReduce)
• Supports all major types of input data (text, CSV, audio,
image and video)
• Can be extended for specialized input formats
• Connects to Kafka
14
Page15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
ND4J:
• N-dimensional vector library
• Scientific computing for JVM
• DL4J uses it to do linear algebra for backpropagation
• Supports GPUs via CUDA and Native via Jblas
• Deploys on Android
• DL4J code remains unchanged whether using GPU or
CPU
15
Page16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 16
How to chose a
Neural Net in
DL4J core?
Page17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Demo!
Page18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Thank You
hortonworks.com

More Related Content

What's hot

Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Spark Summit
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Spark Summit
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
Spark Summit
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
Uwe Korn
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Spark Summit
 
How to Choose a Deep Learning Framework
How to Choose a Deep Learning FrameworkHow to Choose a Deep Learning Framework
How to Choose a Deep Learning Framework
Navid Kalaei
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
Databricks
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
P. Taylor Goetz
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
Navid Kalaei
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
Alluxio, Inc.
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark Summit
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Databricks
 
Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2O
Sri Ambati
 
The Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedInThe Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedIn
Carl Steinbach
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
data.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowledata.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowle
Sri Ambati
 

What's hot (20)

Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
Optimizing Spark Deployments for Containers: Isolation, Safety, and Performan...
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
 
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van NiekerkAPACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
APACHE TOREE: A JUPYTER KERNEL FOR SPARK by Marius van Niekerk
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
How to Choose a Deep Learning Framework
How to Choose a Deep Learning FrameworkHow to Choose a Deep Learning Framework
How to Choose a Deep Learning Framework
 
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
BigDL: Bringing Ease of Use of Deep Learning for Apache Spark with Jason Dai ...
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
LinkedIn
LinkedInLinkedIn
LinkedIn
 
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
Deep Learning to Big Data Analytics on Apache Spark Using BigDL with Xianyan ...
 
Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2O
 
The Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedInThe Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedIn
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
data.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowledata.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowle
 

Viewers also liked

Jessie j presentation
Jessie j presentationJessie j presentation
Jessie j presentationmollyruby5
 
Jessie J
Jessie JJessie J
Jessie Jhevaw
 
SKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetupSKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetup
Adam Gibson
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
Adam Gibson
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
Adam Gibson
 
Dl4j in the wild
Dl4j in the wildDl4j in the wild
Dl4j in the wild
Adam Gibson
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)
Adam Gibson
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
Adam Gibson
 
Deep learning in production with the best
Deep learning in production   with the bestDeep learning in production   with the best
Deep learning in production with the best
Adam Gibson
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
Adam Gibson
 
Dynamic and Static Modeling
Dynamic and Static ModelingDynamic and Static Modeling
Dynamic and Static ModelingSaurabh Kumar
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentation
Adam Gibson
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
Adam Gibson
 

Viewers also liked (14)

Jessie j presentation
Jessie j presentationJessie j presentation
Jessie j presentation
 
Jessie J
Jessie JJessie J
Jessie J
 
SKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetupSKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetup
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
 
Dl4j in the wild
Dl4j in the wildDl4j in the wild
Dl4j in the wild
 
Jessie j
Jessie jJessie j
Jessie j
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)
 
Anomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) EnglishAnomaly detection in deep learning (Updated) English
Anomaly detection in deep learning (Updated) English
 
Deep learning in production with the best
Deep learning in production   with the bestDeep learning in production   with the best
Deep learning in production with the best
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
 
Dynamic and Static Modeling
Dynamic and Static ModelingDynamic and Static Modeling
Dynamic and Static Modeling
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentation
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
 

Similar to Hadoop summit 2016

Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
Hortonworks
 
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
Raúl Marín
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
Rommel Garcia
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
Rommel Garcia
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
Ameet Paranjape
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
Hortonworks
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
MapR Technologies
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Hortonworks
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
Hortonworks
 

Similar to Hadoop summit 2016 (20)

Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 

More from Adam Gibson

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
Adam Gibson
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Deploying signature verification with deep learning
Deploying signature verification with deep learningDeploying signature verification with deep learning
Deploying signature verification with deep learning
Adam Gibson
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
Adam Gibson
 
Anomaly Detection and Automatic Labeling with Deep Learning
Anomaly Detection and Automatic Labeling with Deep LearningAnomaly Detection and Automatic Labeling with Deep Learning
Anomaly Detection and Automatic Labeling with Deep Learning
Adam Gibson
 
Strata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jStrata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4j
Adam Gibson
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
Adam Gibson
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
Adam Gibson
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
Adam Gibson
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
Adam Gibson
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round Table
Adam Gibson
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
Adam Gibson
 
Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvm
Adam Gibson
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
Adam Gibson
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptx
Adam Gibson
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextML
Adam Gibson
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Adam Gibson
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
Adam Gibson
 

More from Adam Gibson (18)

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Deploying signature verification with deep learning
Deploying signature verification with deep learningDeploying signature verification with deep learning
Deploying signature verification with deep learning
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
 
Anomaly Detection and Automatic Labeling with Deep Learning
Anomaly Detection and Automatic Labeling with Deep LearningAnomaly Detection and Automatic Labeling with Deep Learning
Anomaly Detection and Automatic Labeling with Deep Learning
 
Strata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4jStrata Beijing 2017: Jumpy, a python interface for nd4j
Strata Beijing 2017: Jumpy, a python interface for nd4j
 
Advanced deeplearning4j features
Advanced deeplearning4j featuresAdvanced deeplearning4j features
Advanced deeplearning4j features
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round Table
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
 
Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvm
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptx
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextML
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

Hadoop summit 2016

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning using Spark and DL4J for fun and profit Adam Gibson and Dhruv Kumar 2015 Version 1.0
  • 2. Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Who are we? Adam Gibson - Co founder of Skymind - Wrote DeepLearning4J, ND4J Dhruv Kumar - Sr Solutions Architect, HWX - MS Umass, Mahout, ASF
  • 3. Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved In this talk - What’s Deep Learning? - Architectures - Implementation and Libraries in Real Life - Demo!
  • 4. Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning • One of the many pattern recognition techniques in Data Science • Excels at rich media applications: • Image recognition • Speech translation • Voice recognition • Loosely inspired by human brain models • Synonymous with Artificial Neural Networks, Multi Layer Networks
  • 5. Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Enterprise use cases
  • 6. Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Doing this in real life for enterprise
  • 7. Page7 © Hortonworks Inc. 2011 – 2014. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP FOR DATA AT REST HDF FOR DATA IN MOTION ACTIONABLE INTELLIGENCE MODERN DATA APPS Modern Data Applications in Enterprise: Connected, Fast, Intelligent PERISHABLE INSIGHTS HISTORICAL INSIGHTS INTERNET OF ANYTHING
  • 8. How do we realize MDA in a Hadoop Centric World? HDF Hadoop HDFS HBase Hive SOLR YARN Storm Service Management / Workflow SIEM Spark Raw Network Stream Network Metadata Stream Data Stores Syslog Raw Application Logs Other Streaming Telemetry
  • 9. www.hortonworks.com NiFi 1 NiFi 2 Storm 1 Kafka 1 Storm 2 Kafka 2 Storm 3 Kafka 3 DataNode 1 HBase 1 Source 1 Source 2 Source 3 Source N NiFi Nodes Edge Nodes Master NodesClients 1 Clients 2 DataNode 2 Hbase 2 DataNode 3 Hbase 3 DataNode 4 Hbase 4 DataNode 5 Hbase 5 DataNode 6 Hbase 6 DataNode 7 Hbase 7 DataNode 8 Hbase 8 DataNode 9 DataNode 10 DataNode 31 DataNode 32 Master 1 Master 2 Master 3 Master 4 Master 5 Worker Nodes HDF HDP World Azure
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Storm/Spark Streaming Storm Detailed Reference Architecture HDF Flume Sink to HDFS Transform Interactive UI Framework Hive Hive HDFS HDFS SOURCE DATA Server logs Application Logs Firewall Logs CRM/ERP Sensor Kafka Kafka Stream to HDF Forward to Storm Real Time Storage Spark-ML Pig Alerts Bolt to HDFS Dashboard Silk JMS Alerts Hive Server HiveServer Reporting BI Tools High Speed Ingest Real-Time Batch Interactive Machine Learning Models Spark Pig Alerts SQOOP Flume Iterative ML Hbase/Pheonix HBaseEvent Enrichment Spark-Thrift Pig
  • 11. Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved For Model Building: Typical Workflow 11 1.Ingest training data and store it 2.Split data set into: training, testing and validation sets 3.Vectorize and extract features to go into next step 4.Architect multi layer network, initialize 5.Feed data and train 6.Test and Validate 7.Repeat steps 4 and 5 until desired 8.Store model 9.Put model in app, start generalizing on real data.
  • 12. Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So what do you get? 12 1.Ingest training data and store it using Nifi or other ingest tools 2.Split data set into: training, testing and validation sets 3.Vectorize and extract features to go into next step 4.Architect multi layer network, initialize 5.Feed data and train 6.Test and Validate 7.Repeat steps 4 and 5 until desired 8.Store model 9.Put model in app, start generalizing on real data. Steps 2, 3, 4 and 5: Use libraries such as Deeplearning4j
  • 13. Page13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deeplearning4j Architecture 13
  • 14. Page14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved DL4J: Canova for Vectorization and Ingest • Canova uses an input/output format system (similar to how Hadoop uses MapReduce) • Supports all major types of input data (text, CSV, audio, image and video) • Can be extended for specialized input formats • Connects to Kafka 14
  • 15. Page15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved ND4J: • N-dimensional vector library • Scientific computing for JVM • DL4J uses it to do linear algebra for backpropagation • Supports GPUs via CUDA and Native via Jblas • Deploys on Android • DL4J code remains unchanged whether using GPU or CPU 15
  • 16. Page16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 16 How to chose a Neural Net in DL4J core?
  • 17. Page17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Demo!
  • 18. Page18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Thank You hortonworks.com

Editor's Notes

  1. TALK TRACK I’m about to go over the products, consulting and training that Hortonworks offers, and I want you to keep this image in mind. Remember: The Internet of Anything is doubling the amount of data in the world every 2 years. Connected Data Platforms deliver an open-architected solution to manage data, both in motion and at rest, empowering your organization to gain Actionable Intelligence delivered to your end users through Modern Data Apps. Hortonworks DataFlow (aka HDF) manages your data in motion—bringing it to where you need it for real-time analysis to capture perishable insights or into storage for historical analysis. Hortonworks Data Platform (aka HDP) stores the data at rest and provides historical insights through deep, detailed analysis of everything that’s already happened. Those historical insights from HDP help optimize your data ingest with HDF, which in turn optimizes your data at rest. This is how HDF, HDP, and Modern Data Applications deliver actionable intelligence to your end users. And Actionable Intelligence is the beating heart animating the Future of Data. [NEXT SLIDE]
  2. CapOne – Ingesting from everywhere Email, Syslog, Applog, Netflow… Moving to “Cloud Only model”….even looking to use “docker Containers” in Amazon…
  3. The team puts together a detailed architecture of the proposed solution using HDP and HDF. The architecture considers sources data from the numerous sources including Server Logs, Application Logs, XML and Senso data. This data is easily accepted into the flexible schema of HDP using HDF and Sqoop. The data is processed using Pig and analyzed using Spark. Then the data is made available in a real-time dashboard as well as to visualization and reporting tools. [NEXT SLIDE]