Hadoop summit 2016

© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning using Spark and DL4J for
fun and profit
Adam Gibson and Dhruv Kumar
2015
Version 1.0

Who are we?
Adam Gibson
- Co founder of Skymind
- Wrote DeepLearning4J, ND4J
Dhruv Kumar
- Sr Solutions Architect, HWX
- MS Umass, Mahout, ASF

In this talk
- What’s Deep Learning?
- Architectures
- Implementation and Libraries in Real Life
- Demo!

Deep Learning
• One of the many pattern recognition techniques in Data
Science
• Excels at rich media applications:
• Image recognition
• Speech translation
• Voice recognition
• Loosely inspired by human brain models
• Synonymous with Artificial Neural Networks, Multi Layer
Networks

Enterprise use cases

Doing this in real life for enterprise

© Hortonworks Inc. 2011 – 2014. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP FOR DATA AT
REST
HDF FOR DATA IN
MOTION
ACTIONABLE
INTELLIGENCE
MODERN DATA APPS
Modern Data Applications
in Enterprise: Connected,
Fast, Intelligent
PERISHABLE
INSIGHTS
HISTORICAL
INSIGHTS
INTERNET
OF
ANYTHING

How do we realize MDA in a Hadoop Centric World?
HDF
Hadoop
HDFS
HBase Hive SOLR
YARN
Storm
Service
Management /
Workflow
SIEM
Spark
Raw Network Stream
Network Metadata Stream
Data Stores
Syslog
Raw Application Logs
Other Streaming Telemetry

www.hortonworks.com
NiFi 1
NiFi 2
Storm 1
Kafka 1
Storm 2
Kafka 2
Storm 3
Kafka 3
DataNode 1
HBase 1
Source 1
Source 2
Source 3
Source N
NiFi Nodes
Edge Nodes
Master NodesClients 1
Clients 2
DataNode 2
Hbase 2
DataNode 3
Hbase 3
DataNode 4
Hbase 4
DataNode 5
Hbase 5
DataNode 6
Hbase 6
DataNode 7
Hbase 7
DataNode 8
Hbase 8
DataNode 9 DataNode 10
DataNode 31 DataNode 32
Master 1
Master 2
Master 3
Master 4
Master 5
Worker Nodes
HDF
HDP
World Azure

Storm/Spark Streaming
Storm
Detailed Reference Architecture
HDF
Flume
Sink to
HDFS
Transform
Interactive
UI Framework
Hive
Hive
HDFS
HDFS
SOURCE DATA
Server logs
Application Logs
Firewall Logs
CRM/ERP
Sensor
Kafka
Kafka
Stream to
HDF
Forward to
Storm
Real Time Storage
Spark-ML
Pig
Alerts
Bolt to
HDFS
Dashboard
Silk
JMS
Alerts
Hive Server
HiveServer
Reporting
BI Tools
High Speed
Ingest
Real-Time
Batch Interactive
Machine Learning
Models
Spark
Pig
Alerts SQOOP
Flume
Iterative ML
Hbase/Pheonix
HBaseEvent Enrichment
Spark-Thrift
Pig

For Model Building: Typical Workflow
11
1.Ingest training data and store it
2.Split data set into: training, testing and validation sets
3.Vectorize and extract features to go into next step
4.Architect multi layer network, initialize
5.Feed data and train
6.Test and Validate
7.Repeat steps 4 and 5 until desired
8.Store model
9.Put model in app, start generalizing on real data.

So what do you get?
12
1.Ingest training data and store it using Nifi or other ingest tools
2.Split data set into: training, testing and validation sets
3.Vectorize and extract features to go into next step
4.Architect multi layer network, initialize
5.Feed data and train
6.Test and Validate
7.Repeat steps 4 and 5 until desired
8.Store model
9.Put model in app, start generalizing on real data.
Steps 2, 3, 4 and 5:
Use libraries such as
Deeplearning4j

Deeplearning4j Architecture
13

DL4J: Canova for Vectorization and Ingest
• Canova uses an input/output format system (similar to
how Hadoop uses MapReduce)
• Supports all major types of input data (text, CSV, audio,
image and video)
• Can be extended for specialized input formats
• Connects to Kafka
14

ND4J:
• N-dimensional vector library
• Scientific computing for JVM
• DL4J uses it to do linear algebra for backpropagation
• Supports GPUs via CUDA and Native via Jblas
• Deploys on Android
• DL4J code remains unchanged whether using GPU or
CPU
15

Demo!

Thank You
hortonworks.com

Hadoop summit 2016

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (14)

Similar to Hadoop summit 2016

Similar to Hadoop summit 2016 (20)

More from Adam Gibson

More from Adam Gibson (18)

Recently uploaded

Recently uploaded (20)

Hadoop summit 2016

Editor's Notes