
Big data computing overview

Overview of big data computing
Linux cluster
Google File System (GFS)
Hadoop Map & Reduce
Spark Stream Processing

Published in: Technology

  1. Big Data Computing Overview 2015.04.07 Youngsung Son
  2. Agenda § What am I doing? § Big Data Computing History – Supercomputer – Parallel Computing – Linux Cluster – Big Data Computing § Google File System (GFS) § Hadoop Map and Reduce § Spark Stream Processing § References
  3. What am I doing?
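  4. Healing Platform: 1. Personal Cloud Repository Access 2. Personal Health Record Retrieval 3. Case-based Reasoning (Similar Case Search) 4. Comparison among Similar Patients (for Health Planning, Prediction, Advice)
  5. Healing Platform (architecture diagram labels): Mobile platform; Open API; Medical data providers 1..N: 50 million people x 17 records / 365 days =~ 2 million records/day; Life-record providers 1..N: 50 million people x 5 times = ~300 million/day; Personal healing record repositories 1..N: 50 million/day; Request, transfer, store; Service analysis engine; Mobile services 1..N; RESTful API; Load requests within 3 seconds; Standard conversion; Targeted data / healing knowledge base (NoSQL DB): TD, TD, TD, KB; Conversion/filtering; Stream computing (update management); DB for high-speed computation; DW construction; DC, DC, DC, DC; Big Data Personal Data Control Service; Analysis platform; Data relay; Request, transfer; Public clinical case big data; Personal healing record case big data; Raw big data (HDFS); Similar-case search; Trends; Planning; TD construction; Knowledge base construction engine; Cluster, CBR, …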
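
The daily volumes on slide 5 can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that "17 records / 365 days" means 17 medical records per person per year and that "5 times" means 5 life-record uploads per person per day, which the slide does not state explicitly.

```java
final class HealingPlatformLoad {
    // User population given on the slide: 50 million people.
    static final long USERS = 50_000_000L;

    // Assumption: 17 medical records per person per year, averaged per day.
    static long medicalRecordsPerDay() {
        return USERS * 17 / 365;
    }

    // Assumption: 5 life-record uploads per person per day.
    static long lifeRecordsPerDay() {
        return USERS * 5;
    }

    public static void main(String[] args) {
        System.out.println("medical records/day: " + medicalRecordsPerDay()); // 2328767
        System.out.println("life records/day:    " + lifeRecordsPerDay());    // 250000000
    }
}
```

Under these assumptions the medical feed comes to about 2.3 million records per day and the life-record feed to 250 million per day, roughly in line with the slide's rounded figures of ~2 million and ~300 million.
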
  6. Big Data Computing History
  7. Supercomputer
  8. Supercomputer
  9. Architecture of HyperCube John P. Hayes, “Architecture of a Hypercube Supercomputer,” International Conference on Parallel Processing 1986. http://web.eecs.umich.edu/~tnm/trev_test/papersPDF/1986.08.Architecture%20Of%20A%20Hypercube%20Supercomputer_Conf_Parallel_Processing.pdf
  10. Architecture of HyperCube
  11. Architecture of HyperCube
  12. Architecture of HyperCube http://web.eecs.umich.edu/~tnm/trev_test/papersPDF/1986.08.Architecture%20Of%20A%20Hypercube%20Supercomputer_Conf_Parallel_Processing.pdf
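
The hypercube slides are mostly figures, so as a rough aid: in the standard hypercube topology, 2^d nodes carry d-bit addresses, and two nodes are linked when their addresses differ in exactly one bit. The sketch below illustrates that addressing rule. It is not taken from the slides or from the cited paper, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class HypercubeSketch {
    // Neighbors of `node` in a hypercube of the given dimension:
    // flip each address bit in turn.
    static List<Integer> neighbors(int node, int dimension) {
        List<Integer> result = new ArrayList<>();
        for (int bit = 0; bit < dimension; bit++) {
            result.add(node ^ (1 << bit));
        }
        return result;
    }

    // Minimum number of hops between two nodes: the number of
    // address bits in which they differ (Hamming distance).
    static int hops(int a, int b) {
        return Integer.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        // Node 5 (binary 0101) in a 4-dimensional, 16-node cube.
        System.out.println(neighbors(5, 4)); // [4, 7, 1, 13]
        System.out.println(hops(0, 15));     // 4
    }
}
```

Each node needs only d links to reach any of the 2^d nodes in at most d hops, which is the appeal of the topology; every added dimension doubles the node count and adds one more link per node.
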
  13. Parallel Computing § MPI – Message Passing Interface § PVM – Parallel Virtual Machine
  14. Parallel Computing § MPI (Message Passing Interface)
  15. Parallel Computing § PVM (Parallel Virtual Machine)
  16. Architecture of HyperCube Too costly!!!! Too difficult!!!!
  17. Linux Cluster
  18. Berkeley NOW Project (1995)
  19. Linux Cluster Project: CROWN System (Clustering Resources of Workstation’s Network, 1997~1999)
  20.
  21.
  22. Linux Cluster Specifications § 16 PCs § PC’s specification – Pentium3 – 16MB – 20GB § Myrinet (300Mbps)
  23. Linux Cluster’s Goals
  24. Linux Cluster’s Goals Real-time Rendering
  25. Limitations in achieving this goal § Visible Human Project – Data size: 40GB (~100GB) § Linux File System (ext2) – 16GB per file – IDE bandwidth: 33Mbps (66Mbps) – Ethernet bandwidth: 100Mbps (below 30Mbps) – RAM: not enough § Myrinet network interface – Too difficult to use – Kernel hooking required!!! § Programming Model – PVM or MPI – Too slow & difficult!!!
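
To get a feel for why these numbers rule out real-time rendering, the sketch below estimates how long it would take just to move the 40GB data set once over the Ethernet figures listed on this slide. It assumes decimal units (1 GB = 8,000 megabits), reads the parenthesized 30Mbps as the throughput actually achieved, and ignores protocol overhead; the class and method names are hypothetical.

```java
final class TransferTimeSketch {
    // Seconds needed to move `gigabytes` (decimal GB) over a link
    // running at `megabitsPerSecond`.
    static double seconds(double gigabytes, double megabitsPerSecond) {
        double megabits = gigabytes * 1000 * 8;
        return megabits / megabitsPerSecond;
    }

    public static void main(String[] args) {
        // Nominal 100Mbps Ethernet versus the 30Mbps quoted in parentheses.
        System.out.printf("100 Mbps: %.0f s%n", seconds(40, 100));
        System.out.printf(" 30 Mbps: %.0f s%n", seconds(40, 30));
    }
}
```

Even at the nominal rate a single pass over the data takes 3,200 seconds (about 53 minutes), and at 30Mbps it takes close to three hours.
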
  26. Google File System
  27. Google File System (GFS, 2003) Sanjay Ghemawat, “The Google File System,” http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
  28. Google File System (GFS) Distributed, Overlaid, Scalable File System
  29. Hadoop System 2005
  30. Map & Reduce § User Logs Counting
  31. Map & Reduce Why do it this way?
  32. How about this example? § Count phone call logs? – Each user’s total phone call time – KT’s case: 40TB / month – No exception available § Oracle Database – HW cost: ? – SW cost: over 400,000,000 Korean won – Time cost: about 1 day.
  33. Solution? § Simple is best – Log merge: for (int i = 0; i < max_log; i++) user[log[i].id].usage_time += log[i].usage_time; But it takes too much time!!!
  34. Map & Reduce § User Logs Counting
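
The per-user log counting above maps naturally onto the map, shuffle, and reduce phases. The following is a minimal single-process sketch in plain Java, not Hadoop API code: the `CallLog` type, its `userId` and `usageSeconds` fields, and the sample data are all assumptions made for illustration. On a real cluster each input split would be mapped on a different machine.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CallLogMapReduceSketch {
    // Hypothetical record type: one phone-call log entry.
    static final class CallLog {
        final String userId;
        final long usageSeconds;
        CallLog(String userId, long usageSeconds) {
            this.userId = userId;
            this.usageSeconds = usageSeconds;
        }
    }

    // Map phase: emit one (userId, usageSeconds) pair per log entry in a split.
    static List<Map.Entry<String, Long>> map(List<CallLog> split) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (CallLog log : split) {
            pairs.add(new SimpleEntry<String, Long>(log.userId, log.usageSeconds));
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key across all map outputs.
    static Map<String, List<Long>> shuffle(List<List<Map.Entry<String, Long>>> mapOutputs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Long>> output : mapOutputs) {
            for (Map.Entry<String, Long> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for each user.
    static Map<String, Long> reduce(Map<String, List<Long>> grouped) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            long sum = 0;
            for (long value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    // Runs the three phases in sequence over the given input splits.
    static Map<String, Long> totalUsage(List<List<CallLog>> splits) {
        List<List<Map.Entry<String, Long>>> mapOutputs = new ArrayList<>();
        for (List<CallLog> split : splits) {
            mapOutputs.add(map(split));
        }
        return reduce(shuffle(mapOutputs));
    }

    public static void main(String[] args) {
        List<List<CallLog>> splits = Arrays.asList(
            Arrays.asList(new CallLog("alice", 120), new CallLog("bob", 30)),
            Arrays.asList(new CallLog("alice", 60), new CallLog("carol", 45)));
        System.out.println(totalUsage(splits)); // {alice=180, bob=30, carol=45}
    }
}
```

The result is the same as the single merge loop on slide 33; the difference is that the map calls are independent of each other, so the splits can be processed in parallel and only the grouped pairs need to be brought together for the sum.
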
  35. Spark Stream Processing 2009
  36. Hadoop’s Performance Problem
  37. Hadoop’s Performance Problem
  38. Spark Stream Processing
  39. Spark Stream Processing
  40.
  41.
  42. Spark Code Example
  43. Conclusion § Big Data Computing? – Of course, it is needed!! But for us? § We did a lot. – Do we need to broaden our perspective? § What’s next? – Trends repeat!!! – Your major might come around again?
  44.
  45. Oh well, it’s all meaningless.
  46. References § John P. Hayes, “Architecture of a Hypercube Supercomputer,” International Conference on Parallel Processing 1986. § MPI code example, http://mpitutorial.com/tutorials/mpi-hello-world/ § PVM code example, http://www.netlib.org/pvm3/book/node17.html § Sanjay Ghemawat, “The Google File System,” SOSP 2003 § Hadoop code example, http://azure.microsoft.com/en-us/documentation/articles/hdinsight-sample-wordcount/ § Madhukara Phatak, Introduction to Apache Spark, http://blog.madhukaraphatak.com/introduction-to-spark/
  47. Thank you Young-Sung Son ysson@etri.re.kr
