Designing for intensity: parallelism from analytics to AI



No longer a niche reserved for scientific applications, High Performance Computing (HPC) architectures and designs now underpin the rapid growth of Deep Learning innovation and of intensive computing in all its forms. Parallelism, the hallmark of HPC, increases throughput and efficiency in data-intensive analytics solutions and multiplies the speed of the compute-intensive deep learning algorithms inside today's advanced AI applications. In this session, we will discuss co-creating integrated solutions that exploit parallelism at every level to deliver the most competitive intensive computing platforms, ready to adapt and scale as required.
Speakers: Ian Godfrey, Manju Oommen

  1. © Copyright 2017 FUJITSU. Fujitsu Forum 2017 #FujitsuForum
  2. Designing for intensity: parallelism from analytics to AI
      Ian Godfrey, Director of the Solutions Business for Fujitsu Systems Europe
      Manju Annie Oommen, Global Product Marketing Manager, Fujitsu
  3. Agenda
      1. HPC diversifies
      2. What changed over the years?
      3. Similarities between HPC and deep learning optimization
      4. Co-creating solutions
      5. Q&A
  4. HPC Diversifies: Hunger for compute power
      • 2016: 6.4Bn connected devices worldwide; a 10-zettabyte digital universe; 1000s of apps
      • >2020: 28Bn devices; 180 zettabytes; 20K new apps
      • 10 times more data to be generated by 2025*
      Emergence of High Performance Data Analytics (HPDA):
      • Fraud and anomaly detection: identifying harmful or potentially harmful patterns and their causes in real time, using graphical, semantic or other high-performance analytics techniques.
      • Marketing: promoting products or services using complex algorithms to discern potential customers' demographics, buying preferences and habits.
      • Business intelligence: using HPDA to identify opportunities to advance a business's market position and competitiveness by better understanding itself, its competitors and the evolving dynamics of the markets it participates in.
      • Other commercial HPDA: for example, using HPDA to manage large IT infrastructures, ranging from on-premise data centers to public clouds and Internet-of-Things (IoT) infrastructures, which involves solving complex problems.
      Existing HPC users: intelligence community, FSI; data-driven science/engineering (e.g., biology); knowledge discovery; ML/DL, cognitive, AI
      New commercial users: fraud/anomaly detection; business intelligence; affinity marketing; personalized medicine
      Customer benefits:
      • Fastest processing/transformation of large-volume data
      • Real-time analysis to extract invisible insight from the data
      • Accelerated deep-learning technology through GPU computation
      HPDA to grow robustly to become a $5.4Bn market*
      *Source: information from analysts and various telecommunication firms
  5. Neural Networks are Old – What Changed?
      Scale drives deep learning progress, thanks to the availability of:
      • More data
      • Faster compute/hardware
      • Better algorithms
      The best results are obtained by training a large neural network and/or feeding in more data, with repetitive training.
      History:
      • 1943 First electrical model of a neural network
      • 1958 Perceptron
      • 1986 Backpropagation
      • 1990s Convolutional networks (LeCun)
      • 2006 Deep Belief Network (Hinton)
      • 2013/14 Google buys DeepMind
      HPC is speeding up deep learning research.
  6. What does deep learning deal with?
      Deep learning is the machine's perception of:
      • Images: faces, self-driving
      • Sound: voice search, music generation, translation
      • Text: CRM, search, ads
      • Time series: health data, sensors, finance
      Artificial intelligence: a program that can sense, reason, act and adapt
      Machine learning: algorithms whose performance improves as they are exposed to more data over time
      Deep learning: multi-layered neural networks that learn from vast amounts of data
      Learning approaches: supervised learning; unsupervised learning (cluster analysis, time series, unstructured data); convolutional neural networks (CNN); recurrent neural networks (RNN); RNN + long short-term memory (LSTM); reinforcement learning
  7. Industry segmentation and use cases
      • Healthcare: pharmaceutical; genomics; imagery and medical diagnostics
      • Marketing automation: CRM; market classification; demand prediction; document generation; enterprise resource planning; predictive maintenance/analysis; machine transcription; machine translation
      • Defense and social security: surveillance and security; cyber security; image recognition; motion detection
      • Transport/logistics: autonomous cars; motion detection; networked cars/coordinated traffic; commercial drones; optimized routes
      • Consumer e-commerce/retail: sentiment analysis; classification; recommendation engines; demand prediction; automated consulting; search; emails; personalization; smart assistants; chatbots
      • Manufacturing/industrial
      • Others: education; fintech; gaming; telco; media
  8. Industry-wide presence of deep learning
      By sector: manufacturing 43%, distribution 26%, public sector 18%, financial 9%, social infrastructure 4%
      By application: call center 28%, knowledge utilization 20%, manufacturing 16%, demand prediction 13%, fintech 9%, maintenance 8%, healthcare 6%
      Source: based on projects & PoCs at Fujitsu
      "Artificial Intelligence is the new electricity." (Andrew Ng)
      • DL is not a vertical market. It is more akin to an algorithm or method of computation, like an FFT.
      • Intersect360 Research tracks AI (including deep learning, machine learning, cognitive computing, etc.) as part of the hyperscale market.
      • Similar to, but distinct from, HPC: low precision, intensely parallel, strong affinity to public cloud.
      • Cloud providers and end users are in the early stages of investment for their applications.
      • AI may become a pervasive technology embedded in non-hyperscale manifestations.
  9. Fujitsu shaping HPC diversification
  10. HPC: the foundation for accelerating AI technology
      • … & FX100 for simulation and pre-processing technology
      • Zinrai Deep Learning & DLU for a high-speed learning environment
      • Digital Annealer for combinatorial optimization problems
      (Diagram: quantum computing, deep learning, HPC)
  11. Proximity in AI and HPC
      (Diagram relating HPC and AI/DL, hyperscale and supercomputing, multi-node)
  12. Characterising Performance Computing
      Computational scope:
      • Primary focus is performance
      • Compute-intensive algorithms; maths solvers
      • Applications are arbitrarily scalable
      • It is still "HPC" on only a few nodes: there is entry-level HPC
      • The largest supercomputers cost >$100 million
      Customer usage: problem-solving, data analysis, scientific simulation, technical modelling, virtual prototyping
      Top-tier users push boundaries and influence technology throughout industry.
  13. Convolutional Neural Network Breakthrough
      Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
      Timeline: 2012 deep DNN "first blood"; 2013 Network in Network; 2014 deeper networks
      "One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000."
      Use of 2 GPUs: each GPU holds half of each layer's neurons, i.e. the model itself is split across the GPUs (model parallelism).
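The two-GPU split quoted above can be illustrated with a minimal sketch, assuming a single fully connected layer and plain Python lists standing in for GPU memory. The function names `dense`, `split_columns` and `model_parallel_dense` are hypothetical; the real network also splits its convolutional kernels and exchanges activations only at certain layers.

```python
# Model-parallelism sketch: one dense layer whose output neurons are divided
# between two hypothetical "devices" (here simply two calls to dense()).

def dense(x, weights, bias):
    """y[j] = sum_i x[i] * weights[i][j] + bias[j] (no activation)."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def split_columns(weights, bias, cut):
    """Give output neurons [0, cut) to device 0 and [cut, n) to device 1."""
    w0 = [row[:cut] for row in weights]
    w1 = [row[cut:] for row in weights]
    return (w0, bias[:cut]), (w1, bias[cut:])

def model_parallel_dense(x, weights, bias):
    cut = len(bias) // 2
    (w0, b0), (w1, b1) = split_columns(weights, bias, cut)
    # Each device sees the full input but computes only its own neurons.
    y0 = dense(x, w0, b0)   # "GPU 0"
    y1 = dense(x, w1, b1)   # "GPU 1"
    # Communication step: concatenate the partial outputs for the next layer.
    return y0 + y1
```

Because each device owns different neurons, the combined output is identical to the single-device result; only the concatenation requires communication.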
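  14. Neural Network Starting Point
      $\mathrm{Act}_L[j] = \sigma\left(\sum_i \mathrm{Act}_{L-1}[i] \times W_L[i][j] + \mathrm{Bias}_L[j]\right)$
      where σ is the activation function (e.g. tanh, ReLU) and $W_L[i][j]$ is the weight connecting neuron i of layer L−1 to neuron j of layer L.
      Illustrated on a feed-forward network with 3 neurons and 1 hidden layer: inputs $\mathrm{Act}_{L-1}[1]$, $\mathrm{Act}_{L-1}[2]$, $\mathrm{Act}_{L-1}[3]$ reach neuron 1 through weights $W_L[1][1]$, $W_L[2][1]$, $W_L[3][1]$.
      This is the fundamental multiply-add structure.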
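The layer equation above translates almost line for line into code. This is a minimal sketch using plain Python lists and 0-based indices (the slide counts from 1); `layer_forward` and `relu` are illustrative names, not part of any library.

```python
import math

def relu(z):
    return max(0.0, z)

def layer_forward(act_prev, W, bias, sigma=math.tanh):
    """Act_L[j] = sigma(sum_i Act_{L-1}[i] * W_L[i][j] + Bias_L[j])."""
    act = []
    for j in range(len(bias)):
        z = bias[j]
        for i in range(len(act_prev)):
            z += act_prev[i] * W[i][j]   # the multiply-add at the heart of DL
        act.append(sigma(z))
    return act

# Three inputs feeding one neuron: z = 0.5 + 1*0.5 + 2*(-1) + 3*1 = 2.0
print(layer_forward([1.0, 2.0, 3.0], [[0.5], [-1.0], [1.0]], [0.5], relu))  # prints [2.0]
```

The inner loop is exactly the multiply-add pattern that BLAS routines and GPU kernels are built to accelerate.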
  15. Vectorisation in Linear Algebra
      Core intensive code in the Linpack benchmark:
```fortran
c     dgefa: row elimination with column indexing
      do 30 j = kp1, n
         t = a(l,j)
         if (l .eq. k) go to 20
            a(l,j) = a(k,j)
            a(k,j) = t
   20    continue
         call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30 continue
c     dgesl: back substitution, solve u*x = y
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue
c     daxpy: dy = dy + da*dx, code for unequal increments
      do 10 i = 1,n
        dy(iy) = dy(iy) + da*dx(ix)
        ix = ix + incx
        iy = iy + incy
   10 continue
```
      Image: Fujitsu K computer
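The `daxpy` loop above is where Linpack spends most of its time. The following Python transcription is a sketch for reading, not for speed: it keeps the BLAS early returns for `n <= 0` and `da == 0`, but assumes positive increments only (reference BLAS also handles negative strides) and uses 0-based indexing. With unit strides and non-overlapping arrays every iteration is independent, which is what lets a compiler map the loop onto SIMD lanes.

```python
def daxpy(n, da, dx, incx, dy, incy):
    """dy := dy + da*dx over n elements with strides incx, incy (0-based).

    Simplified transcription: only positive increments are supported.
    """
    if n <= 0 or da == 0.0:
        return dy
    ix = 0
    iy = 0
    for _ in range(n):
        dy[iy] = dy[iy] + da * dx[ix]   # independent multiply-add per element
        ix += incx
        iy += incy
    return dy

print(daxpy(3, 2.0, [1.0, 2.0, 3.0], 1, [10.0, 10.0, 10.0], 1))  # prints [12.0, 14.0, 16.0]
```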
  16. Network Illustration (source: Nervana)
      Fully connected network for digit images; convolution is not present for now.
      • Input: N = 28 × 28 pixels = 784 input units
      • Hidden: N = 100 hidden units (a user-defined parameter)
      • Output: N = 10 output units, one for each digit; unit i encodes the probability that the input image is the digit i
      • Parameters: $W_{i \to j}$ of size 784 × 100 and $b_j$ of size 100 (input to hidden); $W_{i \to j}$ of size 100 × 10 and $b_j$ of size 10 (hidden to output)
      • Cost function: $c(\text{output}, \text{truth})$
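The total parameter count for this network follows directly from the matrix and bias sizes listed above. A small helper (a hypothetical name, for illustration) makes the arithmetic explicit:

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a fully connected net with the given layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # W is n_in x n_out, b has n_out entries
    return total

print(dense_param_count([784, 100, 10]))  # 78,400 + 100 + 1,000 + 10; prints 79510
```

Almost all of the parameters sit in the first weight matrix, which is why the input-to-hidden matrix multiply dominates the work for this network.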
  17. CNN Computing Operations
      • Dense matrix multiplies
      • Recurrent layers
      • Convolutions
      • All-reduce
      Deep learning ingredients:
      1. Randomly seed weights
      2. Forward pass
      3. Cost
      4. Backward pass
      5. Update weights
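The five ingredients can be seen in the smallest possible model: one linear neuron fitted by gradient descent. This is a sketch under stated assumptions: mean squared error as the cost, a hand-derived gradient instead of automatic differentiation, and a fixed learning rate; `train_linear_neuron` is an illustrative name.

```python
import random

def train_linear_neuron(data, epochs=1000, lr=0.05, seed=0):
    """Fit y ~ w*x + b by plain gradient descent, following the five steps."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)            # 1. randomly seed weights
    n = len(data)
    history = []
    for _ in range(epochs):
        preds = [w * x + b for x, _ in data]                   # 2. forward pass
        cost = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / n   # 3. cost (MSE)
        history.append(cost)
        grad_w = sum(2 * (p - y) * x for p, (x, y) in zip(preds, data)) / n  # 4. backward pass
        grad_b = sum(2 * (p - y) for p, (_, y) in zip(preds, data)) / n
        w -= lr * grad_w                                       # 5. update weights
        b -= lr * grad_b
    return w, b, history

samples = [(x, 3 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w, b, history = train_linear_neuron(samples)
```

In a real network, steps 2 and 4 become the dense matrix multiplies and convolutions listed on the slide, and in distributed training step 5 is preceded by an all-reduce of the gradients.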
  18. Parallelisation Hierarchy
      • Vectorisation: is SIMD parallelism used well?
      • Scalar tuning: what happens in the pipeline?
      • Memory: is cache usage maximised and RAM access streamlined?
      • Threading: do cores cooperate efficiently?
      • Communication: can coordination in a distributed or heterogeneous system be improved?
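The memory level is typically tackled with loop blocking (tiling). The sketch below shows the loop structure on a matrix multiply; it is written in Python only to show the shape of the transformation, since interpreter overhead hides any cache benefit, and in C or Fortran the same structure is what keeps each tile resident in cache. The function names and the default `tile=32` are assumptions for illustration.

```python
def matmul_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C

def matmul_blocked(A, B, tile=32):
    """Same result as matmul_naive, but works on tile x tile blocks so that each
    block of A, B and C can stay in cache while it is being reused."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for pp in range(0, k, tile):
            for jj in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ci, Ai = C[i], A[i]
                    for p in range(pp, min(pp + tile, k)):
                        a, Bp = Ai[p], B[p]
                        for j in range(jj, min(jj + tile, m)):
                            Ci[j] += a * Bp[j]
    return C

print(matmul_blocked([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], tile=1))
# prints [[19.0, 22.0], [43.0, 50.0]]
```

For each output element the additions still happen in the same order, so blocking changes only the memory access pattern, not the numerical result.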
  19. Naïve Nested Loops in CNN Algorithms
      (Figures: forward propagation, backward propagation, convolution)
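A naïve convolution in a CNN is just four nested loops. This sketch assumes one input channel, one filter, stride 1 and no padding. As in most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped). `conv2d_naive` is an illustrative name.

```python
def conv2d_naive(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers, written as plain nested loops."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = H - kh + 1, W - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):              # output rows
        for x in range(out_w):          # output columns
            acc = 0.0
            for u in range(kh):         # kernel rows
                for v in range(kw):     # kernel columns
                    acc += image[y + u][x + v] * kernel[u][v]
            out[y][x] = acc
    return out

print(conv2d_naive([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))
# prints [[-4.0, -4.0], [-4.0, -4.0]]
```

Real layers add loops over batch, input channels and output filters; optimised libraries reorder, tile and vectorise these loops or recast them as matrix multiplies.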
  20. A Short Word on Tensors
      • Tensors are systems of components, organized by one or more indices, that transform according to specific rules under a set of transformations
      • The number of indices is called the rank of the tensor
      • A tensor of rank 0 is a scalar
      • A tensor of rank 1 is a vector
      • Tensors are important in many areas of physics (general relativity, electromagnetic theory)
      • In N-dimensional space, a tensor of rank n has $N^n$ components
      • The transformation rules are independent of the choice of reference frame, which makes tensors ideal for expressing universal physical laws
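As a worked instance of the $N^n$ component count, assuming ordinary 3-dimensional space ($N = 3$):

```latex
\begin{aligned}
n = 0:&\quad s        &&\Rightarrow\; 3^0 = 1  \text{ component (scalar)}\\
n = 1:&\quad v_i      &&\Rightarrow\; 3^1 = 3  \text{ components (vector)}\\
n = 2:&\quad T_{ij}   &&\Rightarrow\; 3^2 = 9  \text{ components}\\
n = 3:&\quad T_{ijk}  &&\Rightarrow\; 3^3 = 27 \text{ components}
\end{aligned}
```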
  21. Optimised Functions
      • Software libraries: tensor functions hand-coded for CPUs or GPUs (e.g. Intel MKL-DNN)
      • Emergence of dedicated processing units and ISAs: tensor arithmetic in hardware
  22. Multi-threading CNN Training
      (Chart: training with 1, 4, 16 and 64 threads)
      Training on CIFAR-10 with Intel-Caffe, 1000 iterations, full solver.
      The dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images.
  23. MPI Parallelism in CFD
      • Global model decomposed into 8 balanced MPI domains
      • Domain surfaces adapted to cell weights
      • Halo at the interface between domains
      • Processes communicate using MPI primitives: MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
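The halo idea can be sketched on a 1-D field with a 3-point smoothing stencil. This is a single-process simulation under simplifying assumptions: the "ranks" are list slices, the ghost-cell copy stands in for the MPI_Send/MPI_Recv (or MPI_Sendrecv) exchange between neighbouring domains, and the global boundaries simply repeat their edge value. All function names are hypothetical.

```python
def split_domain(values, n_ranks):
    """Balanced 1-D decomposition: contiguous chunks whose sizes differ by at most one."""
    n = len(values)
    return [values[r * n // n_ranks:(r + 1) * n // n_ranks] for r in range(n_ranks)]

def exchange_halos(chunks):
    """Pad each chunk with one ghost cell per side, copied from its neighbours
    (the message-passing step); global boundaries reuse the chunk's own edge value."""
    padded = []
    for r, chunk in enumerate(chunks):
        left = chunks[r - 1][-1] if r > 0 else chunk[0]
        right = chunks[r + 1][0] if r < len(chunks) - 1 else chunk[-1]
        padded.append([left] + chunk + [right])
    return padded

def smooth_step(chunks):
    """One 3-point averaging sweep, computed independently per rank after the exchange."""
    return [[(p[i - 1] + p[i] + p[i + 1]) / 3.0 for i in range(1, len(p) - 1)]
            for p in exchange_halos(chunks)]

field = [float(i * i % 7) for i in range(20)]
parallel = [v for chunk in smooth_step(split_domain(field, 8)) for v in chunk]
serial = smooth_step([field])[0]
```

Once the halos are filled, each domain's update needs no further communication, so the decomposed result matches the serial sweep exactly.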
  24. MPI in Deep Learning
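In data-parallel deep learning, each worker computes gradients on its own mini-batch, and an all-reduce (MPI_Allreduce, listed on the CFD slide) sums them so that every worker applies the same update. The slide's own content is not recoverable, so this is an independent sketch: a single-process simulation of the ring all-reduce algorithm, one widely used way to implement that collective. The name `ring_allreduce` and the chunking scheme are assumptions for illustration.

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce (sum) over len(buffers) workers.

    Each worker's vector is cut into p chunks. In p-1 reduce-scatter steps every
    worker passes one chunk to its right-hand neighbour, which adds it to its own
    copy; in p-1 all-gather steps the fully reduced chunks are circulated. Each
    worker sends 2*(p-1)/p of the vector in total, i.e. less than twice its size.
    """
    p = len(buffers)
    n = len(buffers[0])
    bounds = [(c * n // p, (c + 1) * n // p) for c in range(p)]
    data = [list(b) for b in buffers]            # per-worker copies

    # Reduce-scatter: at step s, worker r sends chunk (r - s) % p to worker r + 1.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            lo, hi = bounds[(r - s) % p]
            msgs.append(((r + 1) % p, lo, data[r][lo:hi]))
        for dst, lo, payload in msgs:             # deliver only after all sends
            for k, v in enumerate(payload):
                data[dst][lo + k] += v
    # Now worker r holds the complete sum for chunk (r + 1) % p.

    # All-gather: at step s, worker r forwards chunk (r + 1 - s) % p.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            lo, hi = bounds[(r + 1 - s) % p]
            msgs.append(((r + 1) % p, lo, hi, data[r][lo:hi]))
        for dst, lo, hi, payload in msgs:
            data[dst][lo:hi] = payload
    return data

grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
summed = ring_allreduce(grads)
averaged = [g / len(grads) for g in summed[0]]   # the update every worker applies
```

The ring pattern keeps the data each worker sends roughly constant as workers are added. This is why it scales well on the high-bandwidth interconnects used in HPC clusters.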
  25. MPI Parallel Performance
  26. AI Evolution Driving CPU and GPU Releases
      (Chart of performance against deployment (datacenter vs. edge/cloud) for training and inference. Products shown: Intel® Xeon Phi™ processor (Knights Mill); Intel® Xeon® processor (Skylake); Intel® Xeon® processor + FPGA; Intel® Lake Crest deep neural network processor (Intel® Nervana); NVIDIA Tesla P4, P40; NVIDIA Drive PX; NVIDIA Tesla P100 (Pascal); Google TPU; FPGA SoC (Intel/Xilinx); FUJITSU PRIMERGY CX600; K computer)
  27. Fujitsu Gateway: Intelligent Application Platform
      • Cloud services: cloud bursting (Gateway); on-premise cloud (UNCAI)
      • Artificial intelligence: smart city surveillance; manufacturing process optimisation
      • HPC for data analytics: based on PRIMERGY with a parallel file system; reference architecture
      • Engineering cloud: Industry 4.0; MONOZUKURI
      PRIMEFLEX for HPC solutions, products and solutions from workgroup through departmental to data center:
      • CELSIUS
      • PRIMERGY RX2530 M4 and RX2540 M4 (Skylake-based)
      • CX400 M4 (Skylake-based): CX2550 M4 (HPC), CX2570 M4 (GPU), CX2580 M4 (FPGA)
      • CX600 M1 (KNL/KNM-based)
      • Intel & Mellanox cluster interconnect; NVIDIA GPGPU
      • Storage: entry ETERNUS, high-end ETERNUS, NetApp, DDN
      • Liquid cooling + immersion cooling (FY2018)
  28. New PRIMEFLEX Options
      • Reference designs defined for AI deep learning frameworks
      • PRIMEFLEX configuration tool provided for fast definition of a complete solution
      • PRIMEFLEX for HPC integrated solutions incorporate the Fujitsu Intelligent Application Platform as the application platform within the software stack
      • Reference architecture for off-premise deployment
      • Cloud-bursting capability
  29. DL/HPC Trends
      • The DL opportunity represents 6-7% of the hyperscale market (speculative figure, likely 100% year-on-year growth)
      • DL is not a vertical market; it is more akin to an algorithm or method of computation, like an FFT
      • AI/DL exists in proximity to HPC:
        - driven by the same architectural objective: performance and scale
        - converged math and programming methodologies
        - technological cross-fertilization in software (compilers, libraries, tools) and hardware (processors, memory, interconnect)
      Source: Intersect360 Research, 2016
  30. Summary
      • Combine algorithmic expertise in HPC and ML
      • Fujitsu has the rare capability to combine technologies and provide a fully optimized solution
      • The shape of a network is subject to skilled programming
      • Optimise through algorithmic and modelling discoveries
      • The relationship between depth and result quality remains largely empirical
      • AI usage today is primarily in the cloud
      • Customers look for simplified, integrated solutions
  32. Deep Learning Networks: Image Identity
  33. Unsupervised Learning: genome analysis, market segmentation, fraud detection, astronomical data analysis, Google News