Data platform at Samsung (Big Learning)


Published in: Engineering

  1. SRA-SV | Cloud Research Lab. Data Platform at Samsung. Guangdeng Liao, Zhan Zhang, Samsung Cloud Research Lab
  2. Our mission: provide scalable, reliable, and secure storage and computation for Samsung R&D. Samsung Data Platform resources:
       • Hundreds of machines
       • Petabytes of storage
       • Still growing
  3. What we have in our platform
       • Offline: distributed MR processing, data warehousing with Hive/Pig, in-house web-based ETL portal, and many more
       • Online: K-V store (HBase), in-house blob store, Storm, and many more
       • Dev. & management tools: Apache Mahout, ElasticSearch, in-house unified web portal, in-house Single Sign-On, visualization, and many more
     By using the platform, we have already significantly improved the ETL process, data management, and data processing for other teams.
  4. So, are we done? No. Many more complex challenges remain.
  5. Challenge #1: How to build scalable and efficient machine learning over Big Data?
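  6. MR-based Mahout is good, but it is not good at expressing data dependencies and iterative algorithms such as PageRank:
       • Map: distribute rank to link targets
       • Reduce: collect ranks from multiple sources
       • Iterate: $PR(x) = \alpha \frac{1}{N} + (1 - \alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$, where $t_1, \dots, t_n$ are the pages linking to $x$, $C(t_i)$ is the out-degree of $t_i$, $N$ is the total number of pages, and $\alpha$ is the random-jump probability
     Each iteration is one MapReduce job, so every iteration pays a startup penalty and an I/O penalty. Unfortunately, many MLDM (machine learning and data mining) algorithms are iterative.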
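The map and reduce roles described for PageRank on this slide can be sketched in plain Python. This is a minimal single-process illustration, not the Mahout or Hadoop API: the function names `map_phase`, `reduce_phase`, and `pagerank` are hypothetical, `alpha` is assumed to be the random-jump probability, and the graph is assumed to have no dangling pages (every page has at least one out-link and every link target is itself a key of `graph`).

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map: each page distributes its rank evenly to its link targets."""
    for page, targets in graph.items():
        # Emit a zero contribution so pages without in-links are not lost.
        yield page, 0.0
        share = ranks[page] / len(targets)
        for target in targets:
            yield target, share


def reduce_phase(pairs, n_pages, alpha):
    """Reduce: collect the contributions per page and apply the rank formula."""
    sums = defaultdict(float)
    for page, contribution in pairs:
        sums[page] += contribution
    return {p: alpha / n_pages + (1 - alpha) * s for p, s in sums.items()}


def pagerank(graph, alpha=0.15, iterations=20):
    """Run a fixed number of map/reduce rounds (one 'job' per iteration)."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), n, alpha)
    return ranks
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns one rank per page. In a real MapReduce deployment each pass through the loop would be a separate job that rereads and rewrites the whole rank table, which is where the startup and I/O penalties come from.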
  7. A graph naturally represents data dependency.
  8. Graph-based processing: think like a vertex. The data graph is held in memory over a cluster.
       • Vertex abstraction
         – Think like a vertex
         – In-memory processing
       • Communication
         – Message-based
         – Shared-memory-based
       • Execution engine
         – Bulk synchronous parallel
         – Asynchronous parallel
       • Scheduling
       • Popular frameworks
         – Giraph
         – GraphLab
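The "think like a vertex" model with message-based communication and bulk synchronous parallel (BSP) execution can be illustrated with a toy single-process engine. This is a hedged sketch, not the Giraph or GraphLab API: `Vertex`, `compute`, and `run_bsp` are hypothetical names, and the vertex program shown is the classic "propagate the maximum value" example rather than anything from the talk.

```python
class Vertex:
    def __init__(self, vid, value, neighbors):
        self.id = vid
        self.value = value
        self.neighbors = neighbors
        self.active = True


def compute(vertex, messages, superstep, send):
    """Vertex program: adopt the largest value seen and tell the neighbors."""
    new_value = max([vertex.value] + messages)
    if superstep == 0 or new_value > vertex.value:
        vertex.value = new_value
        for neighbor in vertex.neighbors:
            send(neighbor, vertex.value)
    else:
        vertex.active = False  # vote to halt until a new message arrives


def run_bsp(vertices, max_supersteps=50):
    """Run supersteps until no vertex is active and no message is pending."""
    inbox = {vid: [] for vid in vertices}
    for superstep in range(max_supersteps):
        outbox = {vid: [] for vid in vertices}

        def send(target, message):
            outbox[target].append(message)

        any_ran = False
        for v in vertices.values():
            messages = inbox[v.id]
            if messages:
                v.active = True  # an incoming message reactivates a vertex
            if v.active:
                compute(v, messages, superstep, send)
                any_ran = True
        inbox = outbox  # barrier: messages become visible in the next superstep
        if not any_ran:
            break
    return {vid: v.value for vid, v in vertices.items()}
```

For a chain of four vertices with values 3, 6, 2, 1, `run_bsp` converges to 6 on every vertex. The graph stays in memory between supersteps, so an iterative algorithm does not pay a per-iteration job startup or disk round trip.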
  9. Graph-based machine learning. We used Apache Giraph 1.0 and developed a machine learning library on top of it:
       • Recommendation: Alternating Least Squares (ALS), Weighted ALS, SGD (matrix factorization), Biased SGD
       • Graphical model: Belief Propagation
       • Clustering: KMeans, KMeans++, Fuzzy Clustering
     We see an order-of-magnitude speedup over the MR-based approach in our cluster.
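As an illustration of one algorithm in this list, SGD matrix factorization for recommendation can be sketched as follows. This is a single-machine sketch under stated assumptions, not the Giraph-based library from the talk: the names `init_factors`, `sgd_epoch`, and `rmse`, the learning rate `lr`, and the regularization weight `reg` are all hypothetical choices. In a graph formulation, users and items would be vertices and each rating an edge between them.

```python
import random


def predict(p, q):
    """Predicted rating: dot product of a user factor and an item factor."""
    return sum(a * b for a, b in zip(p, q))


def rmse(ratings, U, V):
    """Root-mean-square error over the observed (user, item, rating) triples."""
    squared = sum((r - predict(U[u], V[i])) ** 2 for u, i, r in ratings)
    return (squared / len(ratings)) ** 0.5


def init_factors(n_users, n_items, rank, seed=0):
    """Small random factors; the seed makes the run reproducible."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    return U, V


def sgd_epoch(ratings, U, V, lr=0.01, reg=0.02):
    """One pass over the ratings, updating both factors after each rating."""
    for u, i, r in ratings:
        err = r - predict(U[u], V[i])
        for k in range(len(U[u])):
            pu, qi = U[u][k], V[i][k]
            U[u][k] += lr * (err * qi - reg * pu)
            V[i][k] += lr * (err * pu - reg * qi)
```

A typical use is to call `init_factors` once and then `sgd_epoch` repeatedly while watching `rmse` on the training triples fall.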
  10. Challenge #2: How to make Big Model + Big Data workloads, such as deep learning, scalable and efficient?
  11. One example: deep learning [1]. There are many more examples (millions to billions of parameters) in speech recognition, image processing, and NLP.
      [1] "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012.
  12. Model-parallel framework: parallelize a big machine learning model over a cluster.
       • User-defined model
       • Auto-generation of model topology
       • Auto-partition of topology over the cluster (c1, c2, c3)
       • Auto-deployment of topology (in-memory)
       • Neuron-like programming
       • Message-based communication
       • Message-driven computation
  13. Architecture over YARN
       • Components: Controller (partitions and deploys the topology), Application Master, Node Managers, Containers
       • Data communication: node-level and group-level
       • Control communication is based on Thrift
       • Data communication is based on Netty
  14. Execution Engine
       • Execution engine (deep neural net); diagram: Input, RBM, RBM, Softmax (fully connected)
         – Training proceeds layer by layer, controlled by the execution engine
         – Progress reporting
         – Process control: the end user can control the training process and even restart it from a certain point
         – System snapshots for fault tolerance
       • Generic execution engine
         – Abstracts the common design patterns from our experience developing deep neural net algorithms
         – Generalized to support various other algorithms
  15. Model parallelism alone is still not scalable enough over Big Data.
  16. Deep learning platform: a hybrid of data parallelism and model parallelism.
       • Data parallelism: lots of model instances, each one model-parallel and fed its own data chunk
       • Parameter servers (1 to n) coordinate the parameters
       • Parameter servers help the model instances learn from each other
  17. Distributed parameter servers
       • Components: clients; Server 1, Server 2, Server 3, each with an in-memory cache/storage; HBase/HDFS; Netty communication layer (pull/push/sync)
       • Currently we support asynchronous parameter pulls and pushes
       • A synchronized version is also supported
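The asynchronous pull/push pattern against several parameter servers can be sketched in a single Python process. This is only an illustration under assumptions: the class and function names (`ParameterServer`, `Client`, `worker`, `run_workers`) are hypothetical, threads stand in for cluster workers, direct method calls stand in for the Netty layer, keys are routed to servers by a CRC32 hash, and a constant placeholder gradient replaces real model training. Persistence to HBase/HDFS is not modeled.

```python
import threading
import zlib


class ParameterServer:
    """Holds one shard of the parameters in memory and applies pushed gradients."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.params = {}
        self.lock = threading.Lock()

    def pull(self, keys):
        with self.lock:
            return {k: self.params.get(k, 0.0) for k in keys}

    def push(self, grads):
        with self.lock:
            for k, g in grads.items():
                self.params[k] = self.params.get(k, 0.0) - self.lr * g


class Client:
    """Routes each parameter key to the server that owns it."""

    def __init__(self, servers):
        self.servers = servers

    def _shard(self, key):
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]

    def pull(self, keys):
        values = {}
        for k in keys:
            values.update(self._shard(k).pull([k]))
        return values

    def push(self, grads):
        for k, g in grads.items():
            self._shard(k).push({k: g})


def worker(client, keys, steps):
    """One model instance: pull (possibly stale) weights, then push gradients."""
    for _ in range(steps):
        weights = client.pull(keys)
        grads = {k: 1.0 for k in weights}  # placeholder gradient, not a real model
        client.push(grads)  # no waiting for the other workers


def run_workers(n_workers, n_servers, keys, steps):
    servers = [ParameterServer() for _ in range(n_servers)]
    threads = [
        threading.Thread(target=worker, args=(Client(servers), keys, steps))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return Client(servers).pull(keys)
```

Workers never wait for one another between a pull and a push, so a worker may compute on slightly stale weights. A synchronized variant would add a barrier so that all workers push before any of them pulls again.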
  18. Deep learning algorithms. We aim at three major application fields: speech recognition, image processing, and NLP.
       • What we have developed / our roadmap: Feed Forward Neural Network, Restricted Boltzmann Machine, Deep Belief Network, Sparse Auto-encoder, Convolutional Neural Network, Recurrent Neural Network
  19. Summary
       • We provide a Hadoop-based data platform
         – Hundreds of machines, petabytes of storage
         – Hadoop ecosystem (MapReduce, HBase, YARN, HDFS, ZooKeeper, Oozie, Lipstick, Mahout, etc.)
         – In-house ETL pipeline
         – In-house unified web portal with SSO
       • We are working hard on big learning to make our platform intelligent
         – Large-scale graph-based machine learning
         – Large-scale deep learning
         – And many more in progress
  20. Q&A