
Deep Learning Applications on Embedded Systems (Gömülü Sistemlerde Derin Öğrenme Uygulamaları)



Embedded systems deliver high compute performance at low power, so they are widely used in drones, electro-optics, robotics and autonomous systems.
This training covers the embedded platforms on which deep learning applications can run (FPGA and GPU), example applications, and the application development process.



  1. 1. DEEP LEARNING APPLICATIONS ON EMBEDDED SYSTEMS, Ferhat Kurt, https://embedded.openzeka.com
  2. 2. AI MILESTONES: Microsoft & Google “Superhuman” Image Recognition; Microsoft “Super Deep Network”; Berkeley’s Brett End-to-End Reinforcement Learning; Deep Speech 2, one network, 2 languages; A New Computing Model Hits Pop Culture; AlphaGo Rivals a World Champion; TU Delft Deep-Learning Amazon Picking Champion
  3. 3. NVIDIA GPU: MORE THAN GRAPHICS. Deep learning and computer vision, graphics, GPU compute.
  4. 4. THE RISE OF AUTONOMOUS MACHINES: GPUs deliver superior performance and efficiency; integrated perception and deep learning enable autonomy; new use cases demand autonomy.
  5. 5. PIONEERING JETSON TECHNOLOGY: Powering the Next Generation of Autonomous Machines
  6. 6. Jetson TX1, a Supercomputer on a Module: unmatched performance under 10 W, advanced technology for autonomous machines, smaller than a credit card.
  7. 7. JETSON TX1, System on Module:
     GPU: 1 TFLOP/s 256-core Maxwell
     CPU: 4x 64-bit ARM A57 cores | 1.6 GHz
     Memory: 4 GB LPDDR4 | 25.6 GB/s
     Video decode: 4K 60 Hz H.264
     Video encode: 4K 30 Hz H.264
     CSI: up to 6 cameras | 1400 Mpix/s
     Display: 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI
     Wi-Fi: 802.11 2x2 ac
     Networking: 1 Gigabit Ethernet
     PCI-E: Gen 2, 1x1 + 1x4
     Storage: 16 GB eMMC, SDIO, SATA
     Other: 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
     Power: 10-15 W, 6.6-19.5 VDC
     Size: 50 mm x 87 mm
  8. 8. Jetson TX1 Developer Kit Jetson TX1 Developer Board 5MP Camera
  9. 9. NVIDIA JETPACK: DIGITS workflow, VisionWorks, Jetson Multimedia SDK, Deep Learning SDK, and other technologies including CUDA, Linux4Tegra, NSIGHT EE, OpenCV4Tegra, OpenGL, Vulkan, System Trace, Visual Profiler and Ubuntu 14.04.
  10. 10. JETSON SDK: THE DETAILS. Vertically integrated packages on the Jetson TX1: Linux for Tegra; Compute (CUDA): cuBLAS, cuFFT, cuSPARSE, cuSolver, cuRAND, NPP, Thrust, CUDA Math Library; Vision: V4L2, libjpeg; Machine Learning; Graphics; Tools: NVTX (NVIDIA Tools eXtension), source code editor, debugger, profiler, System Trace.
  11. 11. VISIONWORKS™, a CUDA-accelerated computer vision toolkit • Full OpenVX 1.1 implementation • Easy integration with existing CV pipelines • Custom extensions. Stack: applications, VisionWorks™ toolkit (VisionWorks™ API + frameworks), CUDA, Jetson TX1. Example applications: feature tracking, structure from motion, object tracking, dense optical flow (robotics, augmented reality, drones). Image arithmetic: AbsoluteDifference, AccumulateImage, AccumulateSquared, AccumulateWeighted, Add/Subtract/Multiply, ChannelCombine, ChannelExtract. Geometric transforms: Affine Warp + Perspective Warp, Flip Image, Gaussian Pyramid, Remap, Scale Image. Features: Canny Edge Detector, FAST Corners + FAST Track, Harris Corners + Harris Track, Hough Circles, Hough Lines.
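     A minimal OpenVX sketch (not from the deck) of the kind of pipeline VisionWorks accelerates: a Gaussian blur feeding a Canny edge detector, built as a graph that is verified once and then executed. The image size, threshold values and the specific pipeline are chosen purely for illustration, and the threshold attribute names assume an OpenVX 1.1 header.

        #include <VX/vx.h>
        #include <cstdio>

        int main() {
            vx_context context = vxCreateContext();
            vx_graph graph = vxCreateGraph(context);

            // input/output images; the intermediate is virtual, so the runtime
            // may choose its storage and optimize around it
            vx_image input   = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
            vx_image blurred = vxCreateVirtualImage(graph, 640, 480, VX_DF_IMAGE_U8);
            vx_image edges   = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);

            // hysteresis threshold for the Canny edge detector (illustrative values)
            vx_threshold hyst = vxCreateThreshold(context, VX_THRESHOLD_TYPE_RANGE, VX_TYPE_UINT8);
            vx_int32 lower = 50, upper = 100;
            vxSetThresholdAttribute(hyst, VX_THRESHOLD_THRESHOLD_LOWER, &lower, sizeof(lower));
            vxSetThresholdAttribute(hyst, VX_THRESHOLD_THRESHOLD_UPPER, &upper, sizeof(upper));

            // build the graph: Gaussian 3x3 blur, then Canny edge detection
            vxGaussian3x3Node(graph, input, blurred);
            vxCannyEdgeDetectorNode(graph, blurred, hyst, 3, VX_NORM_L1, edges);

            // verify once (lets the implementation optimize the graph), then run it
            if (vxVerifyGraph(graph) == VX_SUCCESS)
                vxProcessGraph(graph);
            else
                std::printf("graph verification failed\n");

            vxReleaseGraph(&graph);
            vxReleaseContext(&context);   // releases the remaining objects
            return 0;
        }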
  12. 12. A Comprehensive Developer Platform • JetPack SDK • Libraries • Developer tools • Design collateral • Developer Forum • Training and tutorials • Ecosystem. http://developer.nvidia.com/embedded-computing
  13. 13. GETTING STARTED: THE JETSON COMMUNITY. Developer Forums: devtalk.nvidia.com. eLinux Wiki: eLinux.org/Jetson_TX1
  14. 14. • Infrared devices: • SICK LIDAR (LMS 200); Hokuyo; rpLIDAR • Asus Xtion Pro Live (PrimeSense) • Intel RealSense (mult. generations) • Stereo and color cameras: • StereoLabs Zed (consumer-oriented) • Point Grey Research USB3 and GigE • e-con Systems CSI-MIPI Cameras with external ISP THE PERIPHERALS JETSON CONNECTS WITH including Community Contributions
  15. 15. DEPLOYING THE JETSON TX1 MODULE: a modular ecosystem of carrier boards for the TX1 module • ConnectTech Orbitty • ConnectTech Rosie • Auvidea J120 • Colorado Engineering TX1-SOM
  16. 16. Deploying Real-Time Deep Learning Networks with the GPU Inference Engine
  17. 17. HOW FAR ARE WE FROM AUTONOMY? ImageNet classification accuracy: 72% (2010), 74% (2011), 84% (2012), 88% (2013), 93% (2014), 96.4% (2015); human-level accuracy: 94.9%. Deep learning on GPUs drives the gains from 2012 onward.
  18. 18. DEEP LEARNING: What Is Different?
  19. 19. A NEW COMPUTING MODEL: deep learning (DNN + data + HPC) versus traditional computer vision (experts + time). Autonomous machines with onboard intelligence.
  20. 20. Object classification, segmentation, collision avoidance, 3D reconstruction, localization/mapping
  21. 21. POWERING THE DEEP LEARNING ECOSYSTEM: the NVIDIA SDK accelerates every major framework (developer.nvidia.com/deep-learning-software). Deep learning frameworks such as Mocha.jl; application areas: computer vision (image classification, object detection), speech and audio (voice recognition), natural language processing (language translation, sentiment analysis), recommendation engines. NVIDIA Deep Learning SDK: cuDNN, cuBLAS, cuSPARSE, NCCL, GIE.
  22. 22. A COMPLETE COMPUTE PLATFORM: manage/augment data and train/test models with DIGITS in the data center; deploy with the GPU Inference Engine to data center, automotive and embedded targets.
  23. 23. NVIDIA DIGITS, Interactive Deep Learning GPU Training System (developer.nvidia.com/digits): data management, DNN configuration, process monitoring, visualization; test images against the trained network.
  24. 24. ROBUST DATA COLLECTION: FIRST Team 900, the ZebraCorns (team900.org)
  25. 25. GPU INFERENCE ENGINE workflow: a trained neural network (e.g. from DIGITS) goes through the optimization engine to produce a plan, which the execution engine runs. developer.nvidia.com/gpu-inference-engine
  26. 26. NVIDIA GPU Inference Engine (GIE) provides even higher efficiency and performance for neural network inference. Tests performed using GoogLeNet. CPU-only: single-socket Intel Xeon (Haswell) E5-2698 v3 @ 2.3 GHz with HT. GPU: NVIDIA Tesla M4 + cuDNN 5 RC. GPU + GIE: NVIDIA Tesla M4 + GIE.
  27. 27. [Diagram: GoogLeNet network graph, from input through the inception/concat layers]
  28. 28. GPU INFERENCE ENGINE optimizations, turning a trained neural network into an optimized inference runtime: • Fuse network layers • Eliminate concatenation layers • Kernel specialization • Auto-tuning for the target platform • Select optimal tensor layout • Batch size tuning. developer.nvidia.com/gpu-inference-engine
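     A toy CUDA sketch (not GIE code) of why layer fusion pays off: applying a convolution's per-channel bias and ReLU in one kernel makes a single pass over the tensor in global memory, whereas separate kernels read and write it twice. The sizes and kernels here are purely illustrative.

        #include <cuda_runtime.h>
        #include <cstdio>
        #include <vector>

        // Unfused version: two launches, two trips through global memory.
        __global__ void addBiasKernel(float* x, const float* bias, int c, int hw, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] += bias[(i / hw) % c];              // per-channel bias, NCHW layout
        }
        __global__ void reluKernel(float* x, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n && x[i] < 0.f) x[i] = 0.f;
        }

        // Fused version: bias and ReLU applied in a single pass.
        __global__ void biasReluKernel(float* x, const float* bias, int c, int hw, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                float v = x[i] + bias[(i / hw) % c];
                x[i] = v > 0.f ? v : 0.f;
            }
        }

        int main() {
            const int c = 64, hw = 56 * 56, n = c * hw;         // one toy feature map
            std::vector<float> hx(n, -1.0f), hb(c, 0.5f);
            float *dx, *db;
            cudaMalloc(&dx, n * sizeof(float));
            cudaMalloc(&db, c * sizeof(float));
            cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
            cudaMemcpy(db, hb.data(), c * sizeof(float), cudaMemcpyHostToDevice);

            int threads = 256, blocks = (n + threads - 1) / threads;
            // the unfused path would be two launches:
            //   addBiasKernel<<<blocks, threads>>>(dx, db, c, hw, n);
            //   reluKernel<<<blocks, threads>>>(dx, n);
            biasReluKernel<<<blocks, threads>>>(dx, db, c, hw, n);   // fused path
            cudaDeviceSynchronize();

            cudaMemcpy(hx.data(), dx, n * sizeof(float), cudaMemcpyDeviceToHost);
            std::printf("x[0] after fused bias+ReLU: %f\n", hx[0]);  // -1.0 + 0.5, clamped to 0
            cudaFree(dx);
            cudaFree(db);
            return 0;
        }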
  29. 29. Graph Optimization. [Diagram: an unoptimized GoogLeNet inception module, with parallel 1x1, 3x3 and 5x5 convolution branches (each followed by bias and ReLU) plus a max-pool branch feeding a concat layer.]
  30. 30. Graph Optimization, vertical fusion. [Diagram: within each branch, convolution, bias and ReLU are fused into single 1x1/3x3/5x5 CBR kernels.]
  31. 31. Graph Optimization, horizontal fusion. [Diagram: the 1x1 CBR kernels that read the same input are fused into one wider kernel.]
  32. 32. Graph Optimization, concat elision. [Diagram: the concat layer is removed and the branch outputs are written directly into the next layer's input.]
  33. 33. Autotuning: choose the fastest kernel for each layer • Baseline is cuDNN / cuBLAS • Direct convolution kernels for small batches • Custom Winograd and implicit GEMM kernels for Half2 • Custom deconvolution for the filter size == stride case • Weight pre-transform for Winograd • Optimal T/N choice for BLAS • Run cudnnFindConvolutionForwardAlgorithmEx() with multiple iterations
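     A hedged sketch (not GIE source) of what that autotuning step can look like with cuDNN: benchmark the available forward-convolution algorithms for one layer shape several times and keep the fastest. The layer dimensions, workspace budget and iteration count are invented for illustration, and cudnnSetConvolution2dDescriptor is called with its cuDNN 6+ signature.

        #include <cudnn.h>
        #include <cuda_runtime.h>
        #include <cstdio>

        #define CHECK_CUDNN(call) do { cudnnStatus_t s_ = (call); \
            if (s_ != CUDNN_STATUS_SUCCESS) { std::printf("cuDNN error: %s\n", cudnnGetErrorString(s_)); return 1; } } while (0)

        int main() {
            // example layer: batch 1, 3x224x224 input, 64 filters of 3x3, pad 1, stride 1
            const int n = 1, c = 3, h = 224, w = 224, k = 64, r = 3, s = 3;

            cudnnHandle_t handle;
            CHECK_CUDNN(cudnnCreate(&handle));

            cudnnTensorDescriptor_t xDesc, yDesc;
            cudnnFilterDescriptor_t wDesc;
            cudnnConvolutionDescriptor_t convDesc;
            CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
            CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
            CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
            CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
            CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w));
            CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, k, c, r, s));
            CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                                        CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT));

            int on, oc, oh, ow;
            CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &on, &oc, &oh, &ow));
            CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, on, oc, oh, ow));

            // device buffers plus a scratch workspace for the benchmark (64 MB budget, arbitrary)
            float *dX, *dW, *dY;
            void *workspace;
            size_t workspaceSize = 64u << 20;
            cudaMalloc(&dX, sizeof(float) * n * c * h * w);
            cudaMalloc(&dW, sizeof(float) * k * c * r * s);
            cudaMalloc(&dY, sizeof(float) * on * oc * oh * ow);
            cudaMalloc(&workspace, workspaceSize);

            // run the exhaustive search a few times and keep the fastest algorithm seen
            const int requested = 8;
            cudnnConvolutionFwdAlgoPerf_t results[requested];
            cudnnConvolutionFwdAlgo_t best = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
            float bestTime = 1e30f;
            for (int iter = 0; iter < 5; ++iter) {
                int returned = 0;
                CHECK_CUDNN(cudnnFindConvolutionForwardAlgorithmEx(
                    handle, xDesc, dX, wDesc, dW, convDesc, yDesc, dY,
                    requested, &returned, results, workspace, workspaceSize));
                // results come back sorted by time; keep the best across iterations
                if (returned > 0 && results[0].status == CUDNN_STATUS_SUCCESS && results[0].time < bestTime) {
                    bestTime = results[0].time;
                    best = results[0].algo;
                }
            }
            std::printf("fastest forward algo: %d (%.3f ms)\n", (int)best, bestTime);

            // cleanup of descriptors and buffers omitted for brevity
            return 0;
        }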
  34. 34. Build: Importing a Caffe Model
     // create the network definition
     INetworkDefinition* network = infer->createNetwork();
     // create a map from caffe blob names to GIE tensors
     std::unordered_map<std::string, infer1::Tensor> blobNameToTensor;
     // populate the network definition and the map
     CaffeParser* parser = new CaffeParser;
     parser->parse(deployFile, modelFile, *network, blobNameToTensor);
     // tell GIE which tensors are required outputs
     for (auto& s : outputs)
         network->setOutput(blobNameToTensor[s]);
  35. 35. Build: Engine Creation
     // specify the maximum batch size and scratch space size
     CudaEngineBuildContext buildContext;
     buildContext.maxBatchSize = maxBatchSize;
     buildContext.maxWorkspaceSize = 1 << 20;
     // create the engine
     ICudaEngine* engine = infer->createCudaEngine(buildContext, *network);
     // serialize to a C++ stream
     engine->serialize(gieModelStream);
  36. 36. Runtime: Binding Buffers
     // get array bindings for input and output
     int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME),
         outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
     // set the array of input and output buffers
     void* buffers[2];
     buffers[inputIndex] = gpuInputBuffer;
     buffers[outputIndex] = gpuOutputBuffer;
  37. 37. Runtime: Running the Engine
     // specify the batch size
     CudaEngineContext context;
     context.batchSize = batchSize;
     // add the GIE kernels to the given stream
     engine->enqueue(context, buffers, stream, NULL);
     <…>
     // wait on the stream
     cudaStreamSynchronize(stream);
  38. 38. NVIDIA Deep Learning Institute: hands-on training for data scientists and software engineers. Training organizations and individuals to solve challenging problems using deep learning; on-site workshops and online courses presented by certified experts; covering complete workflows for proven application use cases: image classification, object detection, natural language processing, recommendation systems, and more. www.nvidia.com/dli
  39. 39. Deep Reinforcement Learning
  40. 40. PLAYING ATARI WITH DEEPMIND From Pixels to Actions: Human-level control through Deep Reinforcement Learning
  41. 41. http://arxiv.org/abs/1602.01783
  42. 42. http://arxiv.org/abs/1602.01783 Inside Google’s DeepMind AlphaGo GPU cluster
  43. 43. END-TO-END LEARNING. [Diagram: sensory inputs feed perception (perceptron/RNN), recognition and inference, driven by a user-task goal/reward; short-term motion control (motor PWM) and long-term autonomous navigation.]
  44. 44. SIMULATION for physical intuition: OpenAI Gym, Gazebo, Unreal4Torch, PhysX, and others.
  45. 45. Q-LEARNING: how does it work? A reinforcement learning agent includes state (environment), actions (controls) and reward (feedback). A value function predicts the future reward of performing actions in the current state. Given the recent state, the action with the maximum estimated future reward is chosen for execution. For agents with complex state spaces, deep networks are used as the Q-value approximator, and a numerical solver (gradient descent) optimizes the network on the fly based on reward inputs.
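     A minimal tabular Q-learning sketch (not from the deck) to make the slide's description concrete. The agent walks a small corridor and applies the update Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)); a deep Q-network replaces this table with a neural network trained by gradient descent. The environment, rewards and hyperparameters are invented for illustration.

        #include <array>
        #include <algorithm>
        #include <random>
        #include <cstdio>

        int main() {
            constexpr int kStates = 6;      // positions 0..5 in a corridor, goal at state 5
            constexpr int kActions = 2;     // 0 = step left, 1 = step right
            const float alpha = 0.1f, gamma = 0.95f, epsilon = 0.1f;

            std::array<std::array<float, kActions>, kStates> Q{};   // Q-table, zero-initialized
            std::mt19937 rng(42);
            std::uniform_real_distribution<float> coin(0.0f, 1.0f);
            std::uniform_int_distribution<int> randomAction(0, kActions - 1);

            for (int episode = 0; episode < 500; ++episode) {
                int s = 0;
                while (s != kStates - 1) {
                    // epsilon-greedy: mostly exploit the current estimate, sometimes explore
                    int a = (coin(rng) < epsilon) ? randomAction(rng)
                                                  : (Q[s][1] > Q[s][0] ? 1 : 0);

                    // environment step: move left/right, reward only on reaching the goal
                    int s2 = std::min(kStates - 1, std::max(0, s + (a == 1 ? 1 : -1)));
                    float r = (s2 == kStates - 1) ? 1.0f : 0.0f;

                    // Q-learning update toward the bootstrapped target
                    float target = r + gamma * *std::max_element(Q[s2].begin(), Q[s2].end());
                    Q[s][a] += alpha * (target - Q[s][a]);
                    s = s2;
                }
            }

            for (int s = 0; s < kStates; ++s)
                std::printf("state %d: Q(left) = %.3f, Q(right) = %.3f\n", s, Q[s][0], Q[s][1]);
            return 0;
        }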
  46. 46. LSTM ACCELERATION: launch a 2D grid of RNN cells; multiple layers in a single call are faster. LSTMs do not suffer from vanishing gradients and are able to adopt long-term strategies. Supports partially observable environments, uni- and bidirectional RNNs, non-uniform-length minibatches, and dropout between layers.
  47. 47. DEEP-LEARNING RESEARCH ROVER TURBO 2.0, github.com/dusty-nv
  48. 48. Open Zeka Architecture: deep learning servers (holding the libraries, datasets, network architectures and models); a request pre-processing and result-return layer; and a user interface (web + API support). Analysis services: image analysis, audio analysis, data analysis, and customer-specific analysis pipelines. Input: images, video, audio (signals) and data, in real time. Output: classified or semantically interpreted results.
  49. 49. Open Zeka API: running on a GPU and CPU cloud and on embedded systems; Jetson TX1/TK1, Raspberry Pi 3 (testing in progress).
  50. 50. Frame conversion and audio separation: what does the image say, and what does the audio say?
  51. 51. [Table: sources, types and models per input modality] Imagery: sources: photo, video frame; types: RGB, thermal (LWIR/SWIR), monochrome, MSI/HSI; models: object detection, face recognition, concept. Audio: concept. Text/Data: data.
  52. 52. Open Zeka Service: offering end users cloud-based image, audio and data analysis at a level close to human perception; a model hosting service (with developer interface support); an algorithm development and hosting service (flexible architecture).
  53. 53. Where Will It Be Used? • Real-time interpretation of camera imagery (stills and streams) • The entertainment industry • Driver assistance systems • Autonomous and robotic systems (embedded technology) • Bringing artificial intelligence to sensor-based architectures in the defense industry (decision support systems) • Image and data analysis in healthcare • Big-data analytics (finance) • Real-time, in-cloud analysis of security camera feeds
  54. 54. Open Zeka is the Turkish supplier of the Jetson TX1.
  55. 55. Turkey Deep Learning Group page: https://www.linkedin.com/grp/home?gid=8334641 Ankara Deep Learning Meetup page: http://www.meetup.com/Ankara-Deep-Learning Deep Learning group page: https://www.facebook.com/groups/derin.ogrenme http://www.derinogrenme.com
  56. 56. “If we knew what it was we were doing, it would not be called research, would it?” (attributed to Einstein). THANK YOU.
