Cognitive Engine: Boosting Scientific Discovery

Xiaolin (Andy) Li, University of Florida, Presentation at Cognitive Systems Institute Group April 28, 2016

  1. Scalable Software Systems Laboratory, Department of Electrical and Computer Engineering. CognitiveEngine: Boosting Scientific Discovery. Xiaolin Andy Li, http://www.andyli.ece.ufl.edu
  2. Information Technology (timeline figure; axis labels: 1939, 1946, 1970, 1980, 1990, New Age). ABC (John Atanasoff, BSEE@UF, 1925); ENIAC; ARPANET; The Internet (Vint Cerf, Bob Kahn); Fiber Optics (Charles Kuen Kao); WWW (Tim Berners-Lee); Mosaic Web Browser (Marc Andreessen and Eric Bina); Martin Cooper, 1973; Steve Jobs, 2007; 1G, 1980s; 2G, 1990s; 3G, 2000s; 4G, 2010s.
  3. Cloud Computing (year marker: 2000)
     • SaaS: Software as a Service. Salesforce, 1999
     • StaaS: Storage as a Service. Amazon S3, 2006; Dropbox, 2008
     • PaaS: Platform as a Service. Google App Engine, 2008; Microsoft Azure, 2010; Docker, 2013; IBM BlueMix, 2014
     • IaaS: Infrastructure as a Service. Amazon AWS, 2002; Eucalyptus, 2008; Rackspace/NASA OpenStack, 2010; Google Compute Engine, 2012
  4. SDN: Software-Defined Networking. Nick McKeown, Scott Shenker, Martin Casado. 2009
  5. Internet2 Innovation Platform, 2013
  6. Geoffrey Hinton, Yann LeCun, Yoshua Bengio, Andrew Ng, Demis Hassabis. 2013
  7. 2D IT Booming Cycles: 1970 → 1990 → 2010 → 2030 → (IT Boom V1, IT Boom V2, IT Boom V3, IT Boom V4). 3D Computing Platform Cycles: 1950 → 1980 → 2010 → 2040 (1st Platform, 2nd Platform, 3rd Platform, 4th Platform). Towards Intelligent Platform.
  8. Time for Change. Current Unified Big Systems. Hadoop, OpenStack, Torque, Pig, Dryad, Pregel, Percolator, CIEL. Container, Virtual Machine, Bare Metal.
  9. GatorCloud - Towards Software-Defined Ecosystems (architecture figure). OpenFlow. Software-Defined Computing: SDC Apps, Runtime, Big Data, PBS/Torq, Virtual Machine, Container, Nova Controller, HPC, Program Models. Software-Defined Networking: SDN Apps, Low Latency, SDN Hypervisor, OVS, OF-Config, OpenFlow, GENI, SDN Controller, High Throughput.
  10. GatorCloud Network Topology (figure). Deployed in 2012, one of the first 100Gbps SDN Campus Research Networks in USA. Uplinks: National Lambda Rail, Internet2, GENI (via Jacksonville), 2*10Gb/s upgraded to 2*100Gb/s. Figure labels: UF Physics CMS/OSG Data Center; GatorVisor; SSRB; CNS Lab; NEB S3Lab; CISE Lab; Apps Controller; Nets Controller; Hybrid Controller; Golfer; Data Cloud; VM Cloud; Cloud Portal; Cloud Orange; Cloud Green; FLR; ECDC; HPC Center - ES; Physics HPC Center - Phy; Larsen HPC Center - Eng; SSRB Campus Datacenter; Larsen HCS Lab; link speeds 10G, 40G, 100G. Legend: SDN Switch; Phase 1 SDN, 40G/10G; Phase 2 SDN, 100G; SDN Control Plane.
  11. HiPerGator Supercomputer. Ranking from the top500 supercomputer list: #4 among public universities in US; #8 among universities in US; #115 among all machines listed. Major Data Centers at UF: HiPerGator Supercomputer; CMS/OSG; Physics HPC Centers; ICBR: Interdisciplinary Center for Biotech Research; CTSI: Clinical and Translational Science Institute; ACIS/CAC Data Center; CHREC Data Center (Novo-G); NEB Data Center.
  12. What Changed? (figures credited to Lecture 1, Fei-Fei Li & Andrej Karpathy & Justin Johnson, and to Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun, "Deep Residual Learning for Image Recognition", arXiv 2015)
     • Engines of visual recognition: Year 2010, NEC-UIUC [Lin CVPR 2011] (dense grid descriptor: HOG, LBP; coding: local coordinate, super-vector; pooling, SPM; linear SVM); Year 2012, SuperVision [Krizhevsky NIPS 2012]; Year 2014, GoogLeNet [Szegedy arxiv 2014] and VGG [Simonyan arxiv 2014]; Year 2015, MSRA. Layer legend: Convolution, Pooling, Softmax, Other.
     • Revolution of Depth, PASCAL VOC 2007 Object Detection mAP (%): HOG, DPM (shallow) 34; AlexNet (RCNN) (8 layers) 58; VGG (RCNN) (16 layers) 66; ResNet (Faster RCNN)* (101 layers) 86. *w/ other improvem
     • Revolution of Depth, ImageNet Classification top-5 error (%): ILSVRC'10 (shallow) 28.2; ILSVRC'11 (shallow) 25.8; ILSVRC'12 AlexNet (8 layers) 16.4; ILSVRC'13 (8 layers) 11.7; ILSVRC'14 VGG (19 layers) 7.3; ILSVRC'14 GoogleNet (22 layers) 6.7; ILSVRC'15 ResNet (152 layers) 3.57. Beyond Human.
  13. CognitiveEngine: Beyond Hadoop and Spark
     • Bulk Synchronous Parallel
     • Both a blessing and a curse
     • Easy to schedule and arrange dependencies
     • All synchronized
     Figure: Map and Reduce phases drawn as a sequence of stages.
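The "all synchronized" property of the bulk-synchronous model on this slide can be illustrated with a minimal sketch. It is not code from Hadoop, Spark, or CognitiveEngine; `run_bsp` and `sum_plus_one` are hypothetical names, and threads on one machine stand in for distributed workers. Every worker reads, waits at a barrier, writes, and waits again, so no worker starts superstep k+1 until all have finished superstep k.

```python
import threading


def run_bsp(num_workers, num_supersteps, local_compute):
    """Run local_compute(worker_id, step, visible) in lock-step supersteps."""
    barrier = threading.Barrier(num_workers)
    shared = [0] * num_workers      # one published value per worker
    log = []                        # (step, worker_id) completion records
    log_lock = threading.Lock()

    def worker(worker_id):
        for step in range(num_supersteps):
            visible = list(shared)  # read values from the previous superstep
            barrier.wait()          # barrier 1: everyone has finished reading
            shared[worker_id] = local_compute(worker_id, step, visible)
            with log_lock:
                log.append((step, worker_id))
            barrier.wait()          # barrier 2: everyone has finished writing

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, log


def sum_plus_one(worker_id, step, visible):
    # Hypothetical per-worker computation: depends on every peer's last value.
    return sum(visible) + 1


final_values, completion_log = run_bsp(3, 3, sum_plus_one)
```

The two barriers make the schedule easy to reason about (the "blessing"), but every superstep runs at the pace of the slowest worker (the "curse").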
  14. ADD Design Choices
     • Asynchronous Distributed Datasets (ADD)
     • Inherits the easy-to-use programming interface
     • Differentiates static data (samples) from iteratively updated data (parameters)
     • Automatic asynchronous updates, with a user-specified bound
     • Asynchrony-aware scheduling
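The slide does not say how the user-specified bound is enforced. One common reading, assumed here, is a staleness bound: a worker may start its next iteration only while it is at most `bound` iterations ahead of the slowest worker. The sketch below is a hypothetical illustration of that rule (`StalenessClock` is not an ADD API).

```python
class StalenessClock:
    """Tracks per-worker iteration counts under an assumed staleness bound."""

    def __init__(self, num_workers, bound):
        self.clocks = [0] * num_workers  # completed iterations per worker
        self.bound = bound               # max allowed lead over the slowest

    def can_advance(self, worker_id):
        # Allowed only if this worker's lead over the slowest is within bound.
        return self.clocks[worker_id] - min(self.clocks) <= self.bound

    def advance(self, worker_id):
        """Record one finished iteration; return False if the worker must wait."""
        if not self.can_advance(worker_id):
            return False
        self.clocks[worker_id] += 1
        return True


clock = StalenessClock(num_workers=2, bound=1)
history = [clock.advance(0), clock.advance(0), clock.advance(0)]
# Worker 0 runs ahead twice, then must wait until worker 1 catches up.
```

With `bound=0` this rule degenerates to fully synchronized execution, and a larger bound trades convergence guarantees for less waiting, which matches the tradeoff the deck describes.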
  15. ADD System (figure). Figure labels: ADD Server; ADD Client; ADD local copy; Training samples; Async push; Async pull; Feed Forward + Back Propagation.
     ADD features:
     • Async push and pull of model updates
     • Users can specify the condition for returning from pull/push, so that they don't have to wait
     • Adaptive model update method: all-to-one / tree aggregation / P2P approximate update
     • User-controllable tradeoff between asynchrony and convergence rate
     • Model snapshot and sharing
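The push/pull behaviour listed above can be sketched in a single process. This is a toy model under stated assumptions, not the ADD implementation: `ToyServer`, `apply_pending`, and the `max_lag` return condition are hypothetical names chosen for illustration, and a real system would run the server loop on separate machines.

```python
class ToyServer:
    """Single-process stand-in for a parameter server with async push/pull."""

    def __init__(self, params):
        self.params = list(params)
        self.version = 0     # number of updates applied so far
        self.pending = []    # updates pushed but not yet applied

    def push(self, delta):
        """Queue an update and return at once; the client does not wait."""
        self.pending.append(list(delta))

    def apply_pending(self):
        """Stand-in for the server's background loop: apply queued updates."""
        for delta in self.pending:
            self.params = [p + d for p, d in zip(self.params, delta)]
            self.version += 1
        self.pending.clear()

    def pull(self, local_version, max_lag):
        """Return (version, params) only if the caller lags by more than max_lag.

        Returning None means the client's local copy is fresh enough to reuse.
        """
        if self.version - local_version > max_lag:
            return self.version, list(self.params)
        return None


server = ToyServer([0.0, 0.0])
server.push([1.0, 2.0])                        # returns immediately
before = server.pull(local_version=0, max_lag=0)
server.apply_pending()
tolerated = server.pull(local_version=0, max_lag=1)
refreshed = server.pull(local_version=0, max_lag=0)
```

Here `max_lag` plays the role of the user-specified return condition: a larger value lets a client keep training on its local copy instead of waiting for fresh parameters.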
  16. Execution (figure). Figure labels: Static Data; Dynamic Data; Handler Function State; ADD Partition; ADD Task (three shown); Locality, Iteration, etc.; Fetch; Compute; Update; Bookkeeping.
  17. Advantages
     • Asynchronous update
     • IO / CPU overlap
     • Fault tolerant
     • Derives from and lives with a state-of-the-art system: Spark
     • Sharing among jobs and users
     • Maximizing parallelism of GPUs
  18. DeepApps: DeepScience, DeepSky, DeepDefense, DeepHealth, DeepBipolar, DeepVital, DeepGuard, DeepCancer, DeepBot/Dingding, DeepDrug
  19. DeepSky: Sloan Digital Sky Survey. With Jian Ge
  20. Kepler Data. Goal: detect planet(s) currently missed by the Kepler Team’s automatic search programs, likely “super-Earths” with long periods. The animation shows how Kepler detects planets. As the planet passes between the host star and the spacecraft, the observed star brightness decreases slightly, signaling the potential detection of a planet. Kepler looked at over 150,000 stars continuously for four years in the constellations Cygnus and Lyra, seeking to record the slight periodic brightness changes in stars that could reveal the presence of planets. Kepler detects planets by taking a photometric measurement of the stars in its field of view every 30 minutes. A planet transit will show as a small periodic dip in the “light curve” of a star over time.
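The transit description above (a small periodic dip in a light curve sampled every 30 minutes) can be made concrete with a deliberately naive sketch. It is a simple threshold detector on noise-free synthetic data, not the Kepler pipeline and not the deep-learning method of DeepSky; `find_transits`, `estimate_period_hours`, and the synthetic `flux` values are all hypothetical.

```python
from statistics import median


def find_transits(flux, depth_threshold):
    """Return (start, end) index pairs where flux dips below the baseline."""
    baseline = median(flux)
    events, start = [], None
    for i, value in enumerate(flux):
        dipped = value < baseline - depth_threshold
        if dipped and start is None:
            start = i                       # a dip begins
        elif not dipped and start is not None:
            events.append((start, i - 1))   # the dip ended on the previous sample
            start = None
    if start is not None:
        events.append((start, len(flux) - 1))
    return events


def estimate_period_hours(events, cadence_minutes=30):
    """Median spacing between dip centers, converted from samples to hours."""
    if len(events) < 2:
        return None
    centers = [(s + e) / 2 for s, e in events]
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    return median(gaps) * cadence_minutes / 60


# Synthetic light curve: a 1% dip lasting 3 samples, repeating every 48 samples.
flux = [0.99 if i % 48 in (10, 11, 12) else 1.0 for i in range(150)]
events = find_transits(flux, depth_threshold=0.005)
period = estimate_period_hours(events)
```

Real light curves are noisy and long-period planets produce few transits, which is why a fixed threshold like this one misses candidates and a learned detector is attractive.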
  21. Quasar Spectra Pair Method. The identification of the 2175 Å bump is based on the Mg II absorber catalog, with limitations:
     • We can only identify the 2175 bump in the redshift range from 0.7 to 2.5.
     • The method is based on the Mg II absorber catalog. If the Mg II absorber catalog is not complete, the 2175 bump sample may not be complete.
  22. Analysis of the Effects. (a) Input data with bumps (b) Filters of the first convolutional layer (c) Feature map of last convolutional layer
  23. Reconstruction of Bumps. (d) Reconstructed input image with bump (e) Reconstructed input image without bump
  24. DeepDefense: DDoS Detection
  25. DeepDefense Architecture (figure). Figure labels: DataSequence1 through DataSequence4; CNN blocks; stacked LSTM blocks; CTC; Spatial; Temporal, Recurrent, Cascading LSTM; BPTT; BPTS; Feature Analysis; Ensemble Analysis; Knowledge Fusion; Performance Evaluation; Searchable Outputs. Legend: BPTT: Backpropagation Through Time; BPTS: Backpropagation Through Space; CNN: Convolutional Neural Network; LSTM: Long Short-Term Memory; CTC: Connectionist Temporal Classification.
  26. Data-Driven DeepHealth. With Azra Bihorac, Lizi Wu, Parisa Rashidi, etc.
  27. Bipolar Disorder & Challenge Objectives
     • Bipolar disorder is a brain disease that causes unusual mood shifts
     • An estimated 51% of the affected population goes untreated in a given year
     • Detection is not straightforward: symptoms and test metrics are not too dissimilar from those of other brain diseases
     • Recent studies indicate heritability and genetic factors as causes, opening a new area of detection using genome data
     • CAGI challenge: predict bipolar disorder using exomes
     • Exome sequencing data of 1000 samples, with 500 for training and 500 for the prediction challenge
     Image source: http://www.nimh.nih.gov/health/statistics/prevalence/bipolar-disorder-among-adults.shtml
  28. Data Pre-Processing
     • Extracted genotype information from the exomes
     • The genotypes were 0/0, 0/1, 1/1 and ./.
     • One-hot encoding transformation on the genotypes, e.g. 0/0 encoded as 0100, 0/1 encoded as 0010, etc.
     • One-hot encoding treats all categories as equidistant
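The encoding step above can be sketched as follows. The slide gives only two of the four codes (0/0 as 0100 and 0/1 as 0010); the codes for ./. and 1/1 below are assumptions filled in by position, and `encode` and `hamming` are hypothetical helper names.

```python
from itertools import combinations

GENOTYPE_CODES = {
    "./.": (1, 0, 0, 0),  # assumed: not stated on the slide
    "0/0": (0, 1, 0, 0),  # from the slide
    "0/1": (0, 0, 1, 0),  # from the slide
    "1/1": (0, 0, 0, 1),  # assumed: not stated on the slide
}


def encode(genotypes):
    """Map a sequence of genotype strings to their one-hot codes."""
    return [GENOTYPE_CODES[g] for g in genotypes]


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


encoded = encode(["0/0", "0/1", "./."])
pairwise = {hamming(a, b) for a, b in combinations(GENOTYPE_CODES.values(), 2)}
```

Every pair of distinct codes differs in exactly two positions, which is the sense in which one-hot encoding treats all categories as equidistant: the encoding imposes no ordering between, say, 0/0 and 1/1.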
  29. DeepBipolar V1: Convolutional DNN (figure), layers in order:
     • Genotype data: 2008 * 1000 * 1
     • 32 kernels, kernel size: 4*4*1, stride: (1,4)
     • 32 kernels, kernel size: 3*3*32, stride: (1,1)
     • Max Pooling: pool size (3,3), stride=(3,2)
     • 2 x 64 kernels, size: 3*3*32, stride: (1,1)
     • Max Pooling: size (1,3), stride=(3,3)
     • 128 kernels, kernel size: 3*3*64, stride: (1,1)
     • 128 kernels, kernel size: 3*3*128, stride: (1,1)
     • Max Pooling: size (2,2), stride=(2,2)
     • 128 kernels, size: 3*3*128, stride: (1,1)
     • Max Pooling: size (3,3), stride=(2,2)
     • 1 kernel, size: 1*1, stride: (1,1)
     • Fully Connected Layer, 64 neurons
     • Sigmoid - Probability Output Layer
     Feature-map dimension labels in the figure: 997 502 32 32 995 500 331 249 32 64 329 247 64 327 81 109 245 107 128 79 128 77 105 52 38 128 36 50 128 128 17 24 24 17 1 64
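The dimension labels on this slide can be checked with the standard output-size formula for an unpadded convolution or pooling layer, floor((n - k) / s) + 1. The sketch below assumes no padding, floor rounding, and that the 2008 * 1000 input is laid out as (1000, 2008) so that the stride of 4 acts on the 2008 axis; these are inferences from the numbers, not statements from the slide, and `out_size` and `trace_shapes` are hypothetical names.

```python
def out_size(n, k, s):
    """Output length of an unpadded conv/pool layer along one axis."""
    return (n - k) // s + 1


def trace_shapes(shape, layers):
    """Apply (kernel, stride) pairs in order and return every intermediate shape."""
    shapes = []
    for kernel, stride in layers:
        shape = tuple(out_size(n, k, s) for n, k, s in zip(shape, kernel, stride))
        shapes.append(shape)
    return shapes


# (kernel, stride) per layer, in the order listed on the slide.
V1_LAYERS = [
    ((4, 4), (1, 4)),  # conv, 32 kernels
    ((3, 3), (1, 1)),  # conv, 32 kernels
    ((3, 3), (3, 2)),  # max pooling
    ((3, 3), (1, 1)),  # conv, 64 kernels
    ((3, 3), (1, 1)),  # conv, 64 kernels
    ((1, 3), (3, 3)),  # max pooling
    ((3, 3), (1, 1)),  # conv, 128 kernels
    ((3, 3), (1, 1)),  # conv, 128 kernels
    ((2, 2), (2, 2)),  # max pooling
    ((3, 3), (1, 1)),  # conv, 128 kernels
    ((3, 3), (2, 2)),  # max pooling
    ((1, 1), (1, 1)),  # 1*1 conv, 1 kernel
]

shapes = trace_shapes((1000, 2008), V1_LAYERS)
```

Under these assumptions the trace passes through 997 x 502, 995 x 500, 331 x 249, 327 x 245, 109 x 81, 105 x 77, 52 x 38, 50 x 36 and ends at 24 x 17, which agrees with the labels recovered from the figure.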
  30. DeepBipolar V2: Convolutional AutoEncoder (figure), labels in the order they appear:
     • Genotype data: 2008 * 1000 * 1
     • 32 kernels, kernel size: 4*4*1, stride: (1,4)
     • 32 kernels, kernel size: 3*3*32, stride: (1,1)
     • Max Pooling (MP): size (3,3), stride=(3,2)
     • 64 kernels, kernel size: 3*3*32, stride: (1,1)
     • 64 kernels, kernel size: 3*3*64, stride: (1,1)
     • Max Pooling: pool size (1,3), stride=(3,3)
     • 128 kernels, kernel size: 3*3*64, stride: (1,1)
     • Up Sampling: size (3,3), stride=(3,2)
     • 2 x 64 kernels, size: 3*3*64, Deconvolution
     • Up Sampling: size (1,3), stride=(3,3)
     • 2 x 32 kernels, size: 3*3*64, Deconvolution
     • 1*1 Convolution layer
     • Input data: 2008 1000 1
     Feature-map dimension labels in the figure: 997 502 32 32 995 500 331 249 32 64 329 247 64 327 81 109 107 128 79 128 128; 109 81 245; 64 327 245 64 329 247 331 249 64 995 32 500 1000 2008 32
  31. GatorCloud: SDN-enabled Campus Cloud. DeepCloud: Towards Composable Intelligent Platform (figure). Figure labels: SDE Controller; SDDC Hypervisor; SDE App Store; Golfer; GolfVisor; GolfStore; CloudDashboard; Gator, GENI, and Testbed Racks; Internet2/NLR, 100G; GENI Apps; Users, Researchers, Scientists, Developers, Engineers, Admins; IaaS, PaaS, SaaS, StaaS, CPSaaS, NaaS, HPCaaS, iBDaaS; Security Apps, Network Apps, BigData Apps, HPC Apps; Self-Protection. Major Data Centers at UF: HiPerGator Supercomputer; CMS/OSG; Physics HPC Centers; ICBR: Interdisciplinary Center for Biotech Research; CTSI: Clinical and Translational Science Institute; ACIS Data Center; NEB Data Center.
  32. S3Lab Research Highlights
     • Finest: Smartphone Indoor Location Ecosystem
     • First: SDN-enabled Campus Cloud, GatorCloud
     • Fastest: Campus Research Network, 100G
     • IMPACT
     • Fourth: DeepCloud Intelligent Platform
  33. NSF I/UCR Center for Big Learning (Pending). Deep Learning, Big Systems, Big Data, Intelligence.
     Member Benefits:
     • Leveraging world-class talent (about 40 professors and 200 graduate students) in the era of big learning, big data, and big systems.
     • Realizing a 10:1 return on investment.
     • Discovering top students in top universities.
     • Joining peer members from high-profile companies and research units.
     CBL Consortium: University of Florida (UF, South), Carnegie Mellon University (CMU, East), University of Missouri at Kansas City (UMKC, Central), University of Notre Dame (ND, North), University of Oregon (UO, West), and a large number of industrial partners.
  34. Thank You!
