Scalable Software Systems Laboratory
Department of Electrical and Computer Engineering
CognitiveEngine: Boosting Scientific Discovery
Xiaolin Andy Li
http://www.andyli.ece.ufl.edu
Information Technology
Timeline: 1939, 1946, 1970, 1980, 1990, New Age
• ABC (Atanasoff–Berry Computer): John Atanasoff, BSEE@UF, 1925
• ENIAC
• ARPANET
• The Internet: Vint Cerf, Bob Kahn
• Fiber optics: Charles Kuen Kao
• WWW: Tim Berners-Lee
• Mosaic web browser: Marc Andreessen and Eric Bina
• Mobile phone: Martin Cooper, 1973; Steve Jobs, 2007
• Cellular generations: 1G, 1980s; 2G, 1990s; 3G, 2000s; 4G, 2010s
Cloud Computing
• SaaS: Software as a Service
  • Salesforce, 1999
• StaaS: Storage as a Service
  • Amazon S3, 2006; Dropbox, 2008
• PaaS: Platform as a Service
  • Google App Engine, 2008; Microsoft Azure, 2010
  • Docker, 2013; IBM BlueMix, 2014
• IaaS: Infrastructure as a Service
  • Amazon AWS, 2002; Eucalyptus, 2008
  • Rackspace/NASA OpenStack, 2010; Google Compute Engine, 2012
SDN: Software-Defined Networking
Nick McKeown, Scott Shenker, Martin Casado
2009
Internet2 Innovation Platform, 2013
Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng, Demis Hassabis, 2013
2D IT Booming Cycles: 1970 → 1990 → 2010 → 2030 →
IT Boom V1, IT Boom V2, IT Boom V3, IT Boom V4
3D Computing Platform Cycles: 1950 → 1980 → 2010 → 2040
1st Platform, 2nd Platform, 3rd Platform, 4th Platform
Towards Intelligent Platform
Time for Change
Current unified big systems: Hadoop, OpenStack, Torque, Pig, Dryad, Pregel, Percolator, CIEL
Container / Virtual Machine / Bare Metal
GatorCloud: Towards Software-Defined Ecosystems
[Figure: two software-defined stacks.
 Software-Defined Computing (SDC): SDC apps; runtime for big data, PBS/Torque, and HPC programming models; Nova controller; virtual machines and containers.
 Software-Defined Networking (SDN): SDN apps (low latency, high throughput); SDN hypervisor; SDN controller; OVS, OF-Config, OpenFlow; GENI.]
GatorCloud Network Topology
[Figure: GatorCloud network topology. Sites: UF Physics CMS/OSG data center, SSRB, CNS Lab, NEB, S3Lab, CISE Lab, HCS Lab, Larsen, ECDC, FLR, the campus datacenter, and HPC centers (ES, Physics, Engineering). Components: GatorVisor, apps controller, nets controller, hybrid controller, Golfer, data clouds, VM clouds, cloud portal, Cloud Orange, and Cloud Green. Campus links run at 100G, 40G, and 10G. The external uplink to National LambdaRail, Internet2, and GENI (via Jacksonville) was upgraded from 2 × 10 Gb/s to 2 × 100 Gb/s.]
Deployed in 2012, one of the first 100 Gbps SDN campus research networks in the USA.
SDN switches and SDN control plane: Phase 1 SDN, 40G/10G; Phase 2 SDN, 100G.
HiPerGator Supercomputer
Ranking on the TOP500 supercomputer list:
• #4 among public universities in the US
• #8 among universities in the US
• #115 among all machines listed
Major Data Centers at UF
• HiPerGator Supercomputer
• CMS/OSG Physics
• HPC Centers
• ICBR: Interdisciplinary Center for Biotech Research
• CTSI: Clinical and Translational Science Institute
• ACIS/CAC Data Center
• CHREC Data Center (Novo-G)
• NEB Data Center
What Changed?
Source: Lecture 1, Fei-Fei Li & Andrej Karpathy & Justin Johnson, 4-Jan-16
[Figure: ImageNet winning architectures (legend: convolution, pooling, softmax, other).
 Year 2010: NEC-UIUC [Lin CVPR 2011]: dense grid descriptor (HOG, LBP); coding (local coordinate, super-vector); pooling, SPM; linear SVM.
 Year 2012: SuperVision [Krizhevsky NIPS 2012].
 Year 2014: GoogLeNet [Szegedy arxiv 2014]; VGG [Simonyan arxiv 2014].
 Year 2015: MSRA.]
Revolution of Depth: PASCAL VOC 2007 object detection mAP (%)
• HOG, DPM (shallow): 34
• AlexNet (RCNN, 8 layers): 58
• VGG (RCNN, 16 layers): 66
• ResNet (Faster RCNN*, 101 layers): 86
*with other improvements
Engines of visual recognition
Revolution of Depth: ImageNet classification top-5 error (%)
• ILSVRC'10 (shallow): 28.2
• ILSVRC'11 (shallow): 25.8
• ILSVRC'12 AlexNet (8 layers): 16.4
• ILSVRC'13 (8 layers): 11.7
• ILSVRC'14 VGG (19 layers): 7.3
• ILSVRC'14 GoogLeNet (22 layers): 6.7
• ILSVRC'15 ResNet (152 layers): 3.57
Beyond human
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.
CognitiveEngine: Beyond Hadoop and Spark
• Bulk Synchronous Parallel (BSP)
  • Both a blessing and a curse
  • Easy to schedule and to arrange dependencies
  • All synchronized
[Figure: Map → Reduce, and Stage → Stage → Stage → Stage pipelines.]
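The synchronization pattern described above can be sketched with Python's standard `threading.Barrier`: each simulated worker performs its local work for a superstep and then blocks until every worker arrives. This is a generic illustration of BSP under our own simplifications (threads stand in for cluster workers, and the "work" is just a log entry); it is not how Hadoop or Spark actually schedule stages.

```python
import threading
from typing import List, Tuple


def run_bsp(num_workers: int, num_supersteps: int) -> List[Tuple[int, int]]:
    """Simulate BSP: every worker computes, then waits at a barrier, each superstep.

    Returns the (superstep, worker_id) events in the order they were recorded.
    """
    barrier = threading.Barrier(num_workers)
    events: List[Tuple[int, int]] = []
    lock = threading.Lock()

    def worker(wid: int) -> None:
        for step in range(num_supersteps):
            # Local computation phase (stand-in: just record that it ran).
            with lock:
                events.append((step, wid))
            # Global synchronization: nobody starts step + 1 until all finish step.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_bsp(num_workers=4, num_supersteps=3)
steps = [s for s, _ in events]
print(steps == sorted(steps))  # True: supersteps never interleave
```

Because no worker can record superstep s + 1 before every worker has finished superstep s, the logged superstep numbers always come out sorted. The same barrier also means the slowest worker sets the pace of every superstep, which is the "curse" side of BSP.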
ADD Design Choices
• Asynchronous Distributed Datasets (ADD)
  • Inherit the easy-to-use programming interface
  • Distinguish static data (samples) from iteratively updated data (parameters)
  • Automatic asynchronous updates, with a user-specified bound
  • Asynchronous-aware scheduling
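The slide does not say what the user-specified bound measures. One common reading in parameter-server work is a bound on iteration lag between workers, as in stale synchronous parallel (SSP) execution. The sketch below assumes that reading; `BoundedStalenessClock` is a hypothetical name for illustration, not part of ADD.

```python
from typing import List


class BoundedStalenessClock:
    """Hypothetical helper: per-worker iteration clocks with a lag bound.

    A worker may start another iteration only while it is at most `bound`
    iterations ahead of the slowest worker; bound=0 gives BSP-style lockstep.
    """

    def __init__(self, num_workers: int, bound: int) -> None:
        self.clocks: List[int] = [0] * num_workers  # completed iterations per worker
        self.bound = bound

    def can_advance(self, worker: int) -> bool:
        return self.clocks[worker] - min(self.clocks) <= self.bound

    def advance(self, worker: int) -> bool:
        """Record one finished iteration, or return False if the worker must wait."""
        if not self.can_advance(worker):
            return False
        self.clocks[worker] += 1
        return True


clock = BoundedStalenessClock(num_workers=2, bound=1)
first_three = [clock.advance(0) for _ in range(3)]  # [True, True, False]
clock.advance(1)                                    # the slow worker finishes one iteration
after_catch_up = clock.advance(0)                   # True: the lag is back within the bound
```

With `bound=0` the rule degenerates to lockstep, and larger bounds let fast workers run ahead on staler parameters; that is one concrete way to expose a tradeoff between asynchrony and convergence rate.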
[Figure: ADD system with two ADD servers and three ADD clients. Each client has its own training samples and an ADD local copy, runs feed-forward + back-propagation, and exchanges model updates with the servers via async push and async pull.]
ADD features
•  Asynchronous push and pull of model updates
•  Users can specify the condition under which pull/push returns, so that they do not have to wait
•  Adaptive model update method: all-to-one, tree aggregation, or P2P approximate update
•  User-controllable tradeoff between asynchrony and convergence rate
•  Model snapshots and sharing
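A minimal, single-process sketch of asynchronous push and pull with a caller-specified return condition. `ParamServer`, `push`, and `pull(min_version, timeout)` are hypothetical names chosen for illustration; the actual ADD interface is not shown in these slides. Only all-to-one aggregation is modeled; tree aggregation and P2P approximate updates are not.

```python
import threading
from typing import List, Optional, Tuple


class ParamServer:
    """Hypothetical in-process parameter server (a sketch, not the ADD API)."""

    def __init__(self, dim: int, lr: float = 0.1) -> None:
        self.params = [0.0] * dim
        self.version = 0  # number of updates applied so far
        self.lr = lr
        self._cond = threading.Condition()

    def push(self, grad: List[float]) -> None:
        # Apply an SGD step right away; the pushing worker never waits for others.
        with self._cond:
            self.params = [p - self.lr * g for p, g in zip(self.params, grad)]
            self.version += 1
            self._cond.notify_all()

    def pull(self, min_version: int = 0,
             timeout: Optional[float] = None) -> Tuple[List[float], int]:
        # Return once the caller's condition holds (at least `min_version`
        # updates applied) or once `timeout` seconds pass, whichever is first.
        with self._cond:
            self._cond.wait_for(lambda: self.version >= min_version, timeout)
            return list(self.params), self.version


ps = ParamServer(dim=2)
pushers = [threading.Thread(target=ps.push, args=([1.0, -1.0],)) for _ in range(3)]
for t in pushers:
    t.start()
params, version = ps.pull(min_version=3, timeout=5.0)  # waits for all 3 pushes
for t in pushers:
    t.join()

# A pull whose condition is never met returns after the timeout instead of blocking.
_, stale_version = ParamServer(dim=1).pull(min_version=1, timeout=0.01)
```

Setting `min_version` low (or the timeout short) lets a worker continue with whatever parameters are available, which is the "don't have to wait" behaviour the feature list describes.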
Execution
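[Figure: ADD execution. An ADD partition holds static data and dynamic data; a handler function and per-task state drive several ADD tasks, with scheduling considerations such as locality and iteration. Each task cycles through Fetch → Compute → Update → Bookkeeping.]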
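The Fetch → Compute → Update → Bookkeeping cycle can be illustrated with a single sequential task. Everything here is an assumption for illustration: a plain dict stands in for the dynamic (parameter) data, a list of (x, y) pairs is the static partition, and the handler function fits a one-parameter linear model by gradient descent. `TaskState` and `run_task` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping kept between iterations."""
    iteration: int = 0
    last_version_seen: int = -1
    losses: List[float] = field(default_factory=list)


def run_task(partition: List[Tuple[float, float]], store: Dict[str, float],
             state: TaskState, iterations: int, lr: float = 0.05) -> TaskState:
    """Fetch -> compute -> update -> bookkeeping over one static-data partition."""
    n = len(partition)
    for _ in range(iterations):
        # Fetch: read the current dynamic data (parameter and its version).
        w, version = store["w"], int(store["version"])
        # Compute: the handler evaluates loss and gradient of y ~ w * x on local samples.
        loss = sum((w * x - y) ** 2 for x, y in partition) / n
        grad = sum(2.0 * (w * x - y) * x for x, y in partition) / n
        # Update: write the new parameter back and bump its version.
        store["w"] = w - lr * grad
        store["version"] = version + 1
        # Bookkeeping: record progress so the task could be resumed or rescheduled.
        state.iteration += 1
        state.last_version_seen = version
        state.losses.append(loss)
    return state


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # static data: y = 2x
store = {"w": 0.0, "version": 0}
state = run_task(samples, store, TaskState(), iterations=200)
```

Keeping the static samples separate from the versioned parameter is what lets a scheduler cache the partition near the task and ship only the small, changing dynamic data each iteration.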
Advantages
• Asynchronous updates
• Overlap of I/O and CPU work
• Fault tolerance
• Derives from, and coexists with, state-of-the-art systems
  • Spark
• Sharing among jobs and users
• Maximizing GPU parallelism
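One generic way to obtain the I/O and CPU overlap listed above is a bounded prefetch queue: a background thread loads upcoming batches while the consumer computes on the current one. This is a plain-Python sketch under our own assumptions (`load_batch` is a hypothetical loader that simulates I/O with a sleep), not ADD's actual mechanism.

```python
import queue
import threading
import time
from typing import Iterator, List


def load_batch(i: int) -> List[int]:
    """Stand-in for reading batch i from disk or network."""
    time.sleep(0.01)  # simulated I/O latency (releases the GIL)
    return list(range(i * 4, i * 4 + 4))


def prefetching_batches(num_batches: int, depth: int = 2) -> Iterator[List[int]]:
    """Yield batches in order while a background thread loads upcoming ones."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks once `depth` batches are waiting
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item  # consumer computes on this batch while the next one loads


totals = [sum(batch) for batch in prefetching_batches(num_batches=5)]
```

The queue's `maxsize` caps memory use: the loader can run at most `depth` batches ahead, so fast I/O never floods the consumer.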
DeepApps
• DeepScience
• DeepSky
• DeepDefense
• DeepHealth
• DeepBipolar
• DeepVital
• DeepGuard
• DeepCancer
• DeepBot/Dingding
• DeepDrug
DeepSky: Sloan Digital Sky Survey
With Jian Ge
Kepler Data
The animation shows how Kepler detects planets. As a planet passes between its host star and the spacecraft, the observed brightness of the star decreases slightly, signaling the potential detection of a planet.
Kepler observed over 150,000 stars continuously for four years in the constellations Cygnus and Lyra, seeking to record the slight periodic brightness changes that could reveal the presence of planets.
Kepler detects planets by taking a photometric measurement of the stars in its field of view every 30 minutes. A planet transit shows up as a small periodic dip in the “light curve” of a star over time.
Goal: detect planets currently missed by the Kepler team’s automatic search programs, most likely long-period “super-Earths”.
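To make the transit idea concrete, here is a toy period search on a synthetic light curve sampled every 30 minutes. All numbers (a 37-day period, a 1,000 ppm dip, the noise level, the 200-day span) are invented for illustration and are not Kepler data. The method is a simplified phase-fold-and-bin search in NumPy; real pipelines add detrending and more careful statistics (box least squares is a common choice), and this is not the search used in this work.

```python
import numpy as np


def synthetic_light_curve(days=200.0, cadence=0.5 / 24, period=37.0, t0=5.0,
                          duration=0.5, depth=1e-3, noise=2e-4, seed=0):
    """Invented test data: constant star + box-shaped transits + Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, days, cadence)                 # one sample every 30 minutes
    flux = 1.0 + rng.normal(0.0, noise, t.size)
    flux[np.mod(t - t0, period) < duration] -= depth  # periodic dips
    return t, flux


def fold_search(t, flux, periods, bin_width=0.25):
    """Fold the light curve at each trial period, bin it in phase, and keep the
    period whose lowest bin lies furthest below the median flux."""
    resid = flux - np.median(flux)
    best_period, best_depth = None, -np.inf
    for p in periods:
        nbins = int(round(p / bin_width))
        bins = np.minimum((np.mod(t, p) / p * nbins).astype(int), nbins - 1)
        counts = np.bincount(bins, minlength=nbins)
        sums = np.bincount(bins, weights=resid, minlength=nbins)
        filled = counts > 0
        depth = -(sums[filled] / counts[filled]).min()
        if depth > best_depth:
            best_period, best_depth = p, depth
    return best_period, best_depth


t, flux = synthetic_light_curve()
period, depth = fold_search(t, flux, np.arange(20.0, 60.0, 0.01))
print(f"best period ~ {period:.2f} d, depth ~ {depth * 1e6:.0f} ppm")
```

The trial range deliberately stops below 74 days: folding at exactly twice the true period also stacks every transit into full-depth bins and would tie with the true period. Long-period planets are hard for the same reason the fold works at all: with few transits inside the observing window, each phase bin averages fewer dips against the same noise.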
Quasar Spectra Pair Method
The identification of the 2175 Å bump relies on the Mg II absorber catalog, which imposes two limitations:
•  The 2175 Å bump can only be identified in the redshift range 0.7 to 2.5.
•  If the Mg II absorber catalog is incomplete, the 2175 Å bump sample may also be incomplete.
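The redshift window translates directly into observed wavelengths, because redshift stretches every wavelength by a factor (1 + z): λ_obs = 2175 Å × (1 + z). The helper below (our own, for illustration) shows that z = 0.7 to 2.5 places the bump centre at roughly 3,700 to 7,600 Å in the observed frame. That this window is set by the survey spectrograph's wavelength coverage is a plausible reading, not something stated here.

```python
REST_BUMP_ANGSTROM = 2175.0  # rest-frame centre of the 2175 Å bump


def observed_wavelength(z: float, rest: float = REST_BUMP_ANGSTROM) -> float:
    """Redshift stretches every wavelength by the factor (1 + z)."""
    return rest * (1.0 + z)


z_min, z_max = 0.7, 2.5
window = (observed_wavelength(z_min), observed_wavelength(z_max))
print(f"bump centre observed between {window[0]:.1f} and {window[1]:.1f} Å")
```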
Analysis of the Effects
(a) Input data with bumps
(b) Filters of the first convolutional layer
(c) Feature map of the last convolutional layer
Reconstruction of Bumps
(d) Reconstructed input image with bump
(e) Reconstructed input image without bump
DeepDefense: DDoS Detection
DeepDefense Architecture
[Figure: input data sequences (DataSequence1 to DataSequence4) pass through CNN layers (spatial features) and then cascading recurrent LSTM layers (temporal features), ending in a CTC layer; training uses BPTT and BPTS. Downstream stages: feature analysis, ensemble analysis, knowledge fusion, performance evaluation, and searchable outputs.]
BPTT: Backpropagation Through Time
BPTS: Backpropagation Through Space
CNN: Convolutional Neural Network
LSTM: Long Short-Term Memory
CTC: Connectionist Temporal Classification
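The temporal stage of this architecture uses LSTM layers. For reference, the standard LSTM recurrence can be written as a short NumPy forward pass. This sketch assumes the CNN stage has already turned each time window into a feature vector; the weights are random and untrained, and it omits the CTC layer, BPTT/BPTS training, and everything else specific to DeepDefense.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(xs, W, U, b, hidden):
    """Run a single-layer LSTM over a sequence.

    xs: (T, D) inputs, e.g. per-window feature vectors from a CNN.
    W: (4H, D), U: (4H, H), b: (4H,), gates stacked as [input, forget, output, candidate].
    Returns the (T, H) hidden states.
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[0:hidden])               # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        g = np.tanh(z[3 * hidden:])            # candidate cell update
        c = f * c + i * g                      # cell state carries long-term memory
        h = o * np.tanh(c)                     # hidden state passed to the next step/layer
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
T, D, H = 10, 8, 4
xs = rng.normal(size=(T, D))
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
hs = lstm_forward(xs, W, U, b, H)
print(hs.shape)  # (10, 4)
```

Cascading LSTMs, as in the figure, amounts to feeding `hs` from one layer in as `xs` for the next.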
Data-Driven DeepHealth
With Azra Bihorac, Lizi Wu, Parisa Rashidi, and others
Bipolar Disorder & Challenge Objectives
•  Bipolar disorder is a brain disease that causes unusual mood shifts.
•  An estimated 51% of the affected population goes untreated in a given year.
•  Detection is not straightforward: symptoms and test metrics are not very different from those of other brain diseases.
•  Recent studies indicate heritability and genetic factors as causes, opening a new area of detection using genome data.
•  CAGI challenge: predict bipolar disorder from exomes.
•  Exome sequencing data of 1000 samples: 500 for training and 500 for prediction.
Image source: http://www.nimh.nih.gov/health/statistics/prevalence/bipolar-disorder-among-adults.shtml
Data Pre-Processing
• Extracted genotype information from the exomes
• Genotypes take the values 0/0, 0/1, 1/1, and ./. (missing)
• Applied a one-hot encoding to the genotypes, i.e., 0/0 encoded as 0100, 0/1 encoded as 0010, etc.
• One-hot encoding treats all categories as equidistant
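A small sketch of the encoding step. The slide fixes only 0/0 → 0100 and 0/1 → 0010; placing the missing call ./. in the first column and 1/1 in the last is our assumption, consistent with those two. `GENOTYPE_INDEX` and `one_hot_genotypes` are hypothetical names. If every variant takes four columns, the 2008-column input on the following slides would correspond to 502 variants; that reading is an inference, not stated here.

```python
from typing import List

import numpy as np

# Assumed column order: the slide fixes 0/0 -> 0100 and 0/1 -> 0010; putting the
# missing call ./. first and 1/1 last is a guess consistent with those two.
GENOTYPE_INDEX = {"./.": 0, "0/0": 1, "0/1": 2, "1/1": 3}


def one_hot_genotypes(calls: List[str]) -> np.ndarray:
    """Encode each genotype call as a 4-long indicator vector, concatenated into one row."""
    idx = np.array([GENOTYPE_INDEX[c] for c in calls])
    onehot = np.zeros((len(calls), 4), dtype=np.int8)
    onehot[np.arange(len(calls)), idx] = 1
    return onehot.reshape(-1)  # 4 columns per variant


row = one_hot_genotypes(["0/0", "0/1", "1/1", "./."])
encoded = "".join(str(v) for v in row)  # "0100" + "0010" + "0001" + "1000"

# Equidistance: every pair of distinct one-hot codes is exactly sqrt(2) apart.
codes = np.eye(4)
pair_distances = {round(float(np.linalg.norm(codes[a] - codes[b])), 6)
                  for a in range(4) for b in range(4) if a != b}
```

The equidistance check is the point of the last bullet: unlike an ordinal 0/1/2 coding, one-hot does not tell the network that 0/1 lies "between" 0/0 and 1/1, nor where a missing call belongs.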
DeepBipolar V1: Convolutional DNN
Genotype data: 2008 × 1000 × 1
Feature-map size after each layer (1000-long axis listed first):
• Conv: 32 kernels, 4×4×1, stride (1,4) → 997 × 502 × 32
• Conv: 32 kernels, 3×3×32, stride (1,1) → 995 × 500 × 32
• Max pooling: size (3,3), stride (3,2) → 331 × 249 × 32
• Conv: 64 kernels, 3×3×32, stride (1,1) → 329 × 247 × 64
• Conv: 64 kernels, 3×3×64, stride (1,1) → 327 × 245 × 64
• Max pooling: size (1,3), stride (3,3) → 109 × 81 × 64
• Conv: 128 kernels, 3×3×64, stride (1,1) → 107 × 79 × 128
• Conv: 128 kernels, 3×3×128, stride (1,1) → 105 × 77 × 128
• Max pooling: size (2,2), stride (2,2) → 52 × 38 × 128
• Conv: 128 kernels, 3×3×128, stride (1,1) → 50 × 36 × 128
• Max pooling: size (3,3), stride (2,2) → 24 × 17 × 128
• Conv: 1 kernel, 1×1, stride (1,1) → 24 × 17 × 1
• Fully connected layer: 64 neurons
• Output layer: sigmoid probability
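The feature-map sizes above follow from unpadded ("valid") windows, where each output length is ⌊(n − k)/s⌋ + 1 for input length n, window k, and stride s. The check below assumes the 2008 × 1000 input is read as 1000 rows by 2008 columns, the orientation that reproduces every size on the slide; the layer list is transcribed from the slide, and `out_size`/`trace_shapes` are helper names of our own.

```python
from typing import List, Optional, Tuple

Layer = Tuple[str, Tuple[int, int], Tuple[int, int], Optional[int]]

# (kind, kernel (kh, kw), stride (sh, sw), output channels or None for pooling)
LAYERS: List[Layer] = [
    ("conv", (4, 4), (1, 4), 32),
    ("conv", (3, 3), (1, 1), 32),
    ("pool", (3, 3), (3, 2), None),
    ("conv", (3, 3), (1, 1), 64),
    ("conv", (3, 3), (1, 1), 64),
    ("pool", (1, 3), (3, 3), None),
    ("conv", (3, 3), (1, 1), 128),
    ("conv", (3, 3), (1, 1), 128),
    ("pool", (2, 2), (2, 2), None),
    ("conv", (3, 3), (1, 1), 128),
    ("pool", (3, 3), (2, 2), None),
    ("conv", (1, 1), (1, 1), 1),
]


def out_size(n: int, k: int, s: int) -> int:
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (n - k) // s + 1


def trace_shapes(h: int, w: int, c: int, layers: List[Layer]):
    shapes = []
    for kind, (kh, kw), (sh, sw), out_c in layers:
        h, w = out_size(h, kh, sh), out_size(w, kw, sw)
        c = c if out_c is None else out_c  # pooling keeps the channel count
        shapes.append((kind, h, w, c))
    return shapes


shapes = trace_shapes(1000, 2008, 1, LAYERS)
for shape in shapes:
    print(shape)
```

The final 24 × 17 × 1 map (408 values) is what the 64-neuron fully connected layer consumes before the sigmoid output.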
DeepBipolar V2: Convolutional AutoEncoder
Genotype data: 2008 × 1000 × 1
Encoder (feature-map size after each layer, 1000-long axis listed first):
• Conv: 32 kernels, 4×4×1, stride (1,4) → 997 × 502 × 32
• Conv: 32 kernels, 3×3×32, stride (1,1) → 995 × 500 × 32
• Max pooling (MP): size (3,3), stride (3,2) → 331 × 249 × 32
• Conv: 64 kernels, 3×3×32, stride (1,1) → 329 × 247 × 64
• Conv: 64 kernels, 3×3×64, stride (1,1) → 327 × 245 × 64
• Max pooling: size (1,3), stride (3,3) → 109 × 81 × 64
• Conv: 128 kernels, 3×3×64, stride (1,1) → 107 × 79 × 128
Decoder (operations as labelled on the slide; feature maps grow back through 109 × 81, 327 × 245, 329 × 247, 331 × 249, and 995 × 500 to 1000 × 2008):
• Up-sampling: size (3,3), stride (3,2)
• Deconvolution: 2 × 64 kernels, 3×3×64
• Up-sampling: size (1,3), stride (3,3)
• Deconvolution: 2 × 32 kernels, 3×3×64
• 1×1 convolution layer → 2008 × 1000 × 1 (reconstructed input data)
DeepCloud: Towards a Composable Intelligent Platform
[Figure: DeepCloud architecture. Components: GatorCloud (SDN-enabled campus cloud), SDE controller, SDDC hypervisor, SDE App Store, Golfer, GolfVisor, GolfStore, and CloudDashboard; Gator, GENI, and testbed racks with 100G links to Internet2/NLR and GENI. Users (researchers, scientists, developers, engineers, admins) are served through IaaS, PaaS, SaaS, CPSaaS, NaaS, HPCaaS, iBDaaS, and StaaS, with security apps, network apps, big data apps, HPC apps, GENI apps, and self-protection.]
Major Data Centers at UF: HiPerGator Supercomputer; CMS/OSG Physics; HPC Centers; ICBR: Interdisciplinary Center for Biotech Research; CTSI: Clinical and Translational Science Institute; ACIS Data Center; NEB Data Center.
S3Lab Research Highlights
• Finest smartphone indoor location ecosystem
• First SDN-enabled campus cloud: GatorCloud
• Fastest campus research network: 100G
• Fourth platform: DeepCloud intelligent platform
IMPACT
NSF I/UCR Center for Big Learning (Pending)
Deep Learning, Big Systems, Big Data, Intelligence
Member Benefits
• Leveraging world-class talent (about 40 professors and 200 graduate students) in the era of big learning, big data, and big systems.
• Realizing a 10:1 return on investment.
• Discovering top students at top universities.
• Joining peer members from high-profile companies and research units.
CBL Consortium: University of Florida (UF, South), Carnegie Mellon University (CMU, East), University of Missouri at Kansas City (UMKC, Central), University of Notre Dame (ND, North), University of Oregon (UO, West), and a large number of industrial partners.
Thank You!
