Cloud and Big Data



  1. 1. Cloud and Big Data. Kun-Ta Chuang (莊坤達), Ph.D., Assistant Professor, National Cheng Kung University
  2. 2. Preliminaries ∗ Before going into the discussion, we watch some videos about the future
  3. 3. What is Cloud Computing? ∗ Before starting the discussion of ‘Cloud Computing’, we watch another video
  4. 4. What is Big Data? ∗ Again, we start by watching a video
  5. 5. ∗ Cloud Computing and Big Data are direct consequences of the Internet age ∗ We start the discussion with ‘Cloud Computing’
  6. 6. Introduction to Cloud Computing ∗ What is Cloud Computing? ∗ Different sides have different perspectives ∗ According to Wikipedia, "Cloud computing is Internet-based ("Cloud") development and use of computer technology."
  7. 7. The NIST Cloud Definition Framework
      ∗ Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Clouds
      ∗ Service Models: Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
      ∗ Essential Characteristics: On Demand Self-Service, Broad Network Access, Rapid Elasticity, Resource Pooling, Measured Service
      ∗ Common Characteristics: Massive Scale, Resilient Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low Cost Software, Advanced Security
  8. 8. What is Cloud Computing? ∗ A new business opportunity? ∗ Is it far beyond distributed/grid/cluster computing? ∗ Or, just a new term? ∗ Is it a new Holy Grail? ∗ Web 3.0, new web-scale problem? ∗ Social, Location, Mobile ∗ "I don’t understand what we would do differently in the light of cloud computing other than changing the wording of some of our ads" (Oracle’s CEO Larry Ellison)
  9. 9. New philosophy? What we did in the past
  10. 10. In the Cloud Era
  11. 11. We don’t need to work here
  12. 12. The Rise of a New Era in IT ∗ Mainframe: COBOL ∗ PC / Client-Server: Unix Services ∗ Web: Application Servers ∗ Cloud: Platform as a Service ∗ Each new era in computing brings a new application platform: for the Cloud era it is “PaaS”
  13. 13. Money?
  14. 14. Where can we get money? Source: Gartner (March 2009)
  15. 15. It is a new era, but is it a new business model? ∗ Let’s review the history of the IC industry ∗ Why do you think fabless design houses have been so strong over the past 10+ years?
  16. 16. [Figure: EDA tool map spanning Systems, Design, and Manufacturing. Tools named on the slide include Saber, SysStudio, VMM, Magellan, SysVerilog, Formality, DC Ultra, IC Compiler, VCS, NTB, Virtual Platform, Star RCXT, DesignWare IP, Analog IP, Hercules, CATS, Sigma C, Proteus, SiVL, PrimeTime, NanoSim, PrimeYield, HSIM, HSPICE, FE/BE/Manufacturing TCAD, DFM, Test, Libraries, Yield Mgmt]
  17. 17. Today: Global IC Market (2008 data; *2006 data)
      ∗ Systems: $1.26T (Computers, Communications, Consumer, Industrial, Military, …)
      ∗ Embedded SW: $2.5B
      ∗ IP: $1.4B
      ∗ Semiconductors: $269.9B (Micros, DSP, Memory, ASIC, ASSP, Analog, Discrete)
      ∗ Foundry Wafers: $20.9B
      ∗ Silicon Wafers: $11.4B
      ∗ Front-End Manufacturing, EDA, Masks*: $21.9B, $4.0B, $3.3B (labels and figures in the order they appear on the slide); equipment listed: Lithography/Mask Making, CMP equipment, Ion Implanters, Deposition, Etching and Cleaning, Other
      ∗ Back-End Manufacturing: $6.6B (Assembly Equipment, Assembly Inspect., Dicing, Bonding, Packaging, Int. Assembly Sys, Total Test)
      ∗ Source: VLSI Research, Gartner, IC Insights, SEMI, Information Network, Synopsys Estimates
  18. 18. A mature business
  19. 19. A mature business
  20. 20. Cloud -- Not Just a New Term? ∗ Is ‘Cloud Computing’ far beyond distributed/grid/cluster computing? ∗ Is it also mature? ∗ Learn from the past to understand the present
  21. 21. Do we have TSMC and Synopsys in the Cloud IT industry? ∗ Amazon AWS Marketplace
  22. 22. Look back ∗ We have TSMC and Synopsys, but we still need ASML and National Instruments
  23. 23. ∗ VMware
  24. 24. Cloud Hierarchy ∗ IaaS: Infrastructure ∗ PaaS: Platform ∗ SaaS: Software
  25. 25. Technology Hierarchy (from user level down to system level)
      ∗ User Level (applications): Social Computing, Enterprise, ISV, …
      ∗ User-Level Middleware (programming languages): Web 2.0 interfaces, Mashups, Workflows, …
      ∗ Core Middleware (control): QoS Negotiation, Admission Control, Pricing, SLA Management, Metering, …
      ∗ System Level (virtualization): VM, VM Management and Deployment
  26. 26. Deployment models ∗ Public cloud ∗ Community cloud ∗ Hybrid cloud ∗ Private cloud ∗ We focus on the public cloud: a cloud made available to the general public on a pay-as-you-go basis
  27. 27. Utility Computing -- Pay as you go ∗ Hours purchased via cloud computing can be distributed non-uniformly in time ∗ Cloud computing offers the economic benefits of elasticity and transference of risk ∗ Utility Computing: the service being sold in a public cloud ∗ Cloud Services = SaaS + Utility Computing
  28. 28. The spirit of ‘Pay as you go’ ∗ Large up-front capital is no longer required ∗ No need to worry about over-provisioning or under-provisioning caused by inaccurate prediction ∗ Course registration systems ∗ Startup companies ∗ Companies with large batch-oriented tasks can finish them quickly ∗ More elastic resources
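  29. 29. Example (Provision for peak load) ∗ Peak: 500 servers ∗ Trough: 100 servers ∗ The cloud needs 24 × 300 = 7,200 server-hours per day (an average of 300 servers over 24 hours) ∗ The traditional model, provisioned for the peak, needs 500 × 24 = 12,000 server-hours per day ∗ Peak provisioning therefore costs about 1.7 times as much as the cloud (12,000 / 7,200 ≈ 1.67)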
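The peak-load arithmetic in this example can be sketched in a few lines of Python. The hourly demand profile below is a hypothetical assumption (12 hours at the trough of 100 servers and 12 hours at the peak of 500), chosen only because it reproduces the slide's average of 300 servers; the slide does not give the actual profile. The sketch also assumes the same price per server-hour in both models.

```python
# Hypothetical 24-hour demand profile (an assumption, not from the slide):
# 12 off-peak hours at 100 servers and 12 peak hours at 500 servers,
# which averages to the 300 servers used in the example.
hourly_demand = [100] * 12 + [500] * 12


def pay_as_you_go_server_hours(demand):
    """Server-hours billed when capacity follows demand hour by hour."""
    return sum(demand)


def peak_provisioned_server_hours(demand):
    """Server-hours paid for when capacity is fixed at the daily peak."""
    return max(demand) * len(demand)


cloud_hours = pay_as_you_go_server_hours(hourly_demand)
fixed_hours = peak_provisioned_server_hours(hourly_demand)
cost_ratio = fixed_hours / cloud_hours

print(cloud_hours, fixed_hours, round(cost_ratio, 2))
```

With this profile the script prints `7200 12000 1.67`, matching the figures on the slide. A flatter demand curve would shrink the ratio, so the saving depends on how far the peak sits above the average.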
  30. 30. Example (Under-provision) ∗ Active user: someone who uses the site regularly ∗ Defector: someone who abandons the site ∗ Suppose 10% of the active users who receive poor service due to under-provisioning become defectors
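The under-provisioning assumption can be explored with a small simulation. This is only a sketch: the capacity, the initial user count, and the rule that every user beyond capacity counts as "poorly served" are hypothetical choices for illustration; only the 10% defection rate comes from the slide.

```python
def simulate_under_provisioning(capacity, initial_users, hours, defect_percent=10):
    """Return the active-user count after each hour.

    Users beyond `capacity` are treated as poorly served (an assumption),
    and `defect_percent` of them abandon the site every hour.
    """
    active = initial_users
    history = []
    for _ in range(hours):
        poorly_served = max(0, active - capacity)
        # Integer arithmetic keeps the user counts whole.
        defectors = poorly_served * defect_percent // 100
        active -= defectors
        history.append(active)
    return history


# Hypothetical numbers: capacity for 400,000 users, 500,000 users arrive.
history = simulate_under_provisioning(capacity=400_000, initial_users=500_000, hours=3)
print(history)
```

With these inputs the site loses 10,000 users in the first hour, then 9,000, then 8,100, so the user base drifts down toward the provisioned capacity. Under-provisioning costs lost users as well as lost revenue in the overloaded hour.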
  31. 31. Cloud can help ∗ The appearance of infinite computing resources, available on demand to absorb load surges ∗ The elimination of an up-front commitment by cloud users ∗ The ability to pay for the use of computing resources on a short-term basis ∗ Remember: to drink milk, you don't need to buy a cow
  32. 32. Famous new Companies ∗ 30,000,000 users ∗ Based on Amazon AWS ∗ Django web framework ∗ PostgreSQL database ∗ In-memory cache with Redis ∗ Acquired by Facebook ∗ Quoted from http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances-dozens-of
  33. 33. Famous new Companies ∗ Also based on Amazon AWS
  34. 34. Cloud Cost ∗ "In Silicon Valley, renting a server costs x per month and bandwidth costs x. In Taiwan, renting a server costs 0.5x to 1x per month, but bandwidth costs 30x to 40x!!" --- Ben Jai (翟本喬) ∗ "In the US I rented servers at US$169 to US$229 per server per month, but the traffic exceeded my expectations… In the end I needed a credit card limit of US$30,000 (about NT$900,000) per month" --- Steve Chen (陳士駿) ∗ In Taiwan it would be even worse: US$900,000 (NT$27 million) per month
  35. 35. Price ∗ Is a cloud service really cheaper? ∗ It depends, just as your age and financial situation determine whether you rent or buy a house
  36. 36. General Obstacles and Opportunities in Clouds
  37. 37. Top 10 Obstacles and Opportunities for Cloud Computing
  38. 38. Top 10 Obstacles and Opportunities for Cloud Computing ∗ 1. Availability/Business Continuity ∗ Q: Users and organizations worry whether utility computing services will have adequate availability, or whether the provider may even go out of business ∗ A: Use multiple, different cloud computing providers
  39. 39. Top 10 Obstacles and Opportunities for Cloud Computing ∗ 2. Data Lock-In ∗ Q: The storage APIs for cloud computing are still essentially proprietary, so customers cannot easily extract their data ∗ A: Standardize APIs; compatible software to enable surge or hybrid cloud computing
  40. 40. Top 10 Obstacles and Opportunities for Cloud Computing ∗ 3. Data Confidentiality/Auditability ∗ Q: Cloud users face security threats from both outside and inside the cloud ∗ Outside: any third party, the cloud vendor ∗ Inside: cloud users ∗ A: Against cloud users: virtualization ∗ Against the cloud vendor: user-level encryption ∗ Against any third party: firewalls
  41. 41. Top 10 Obstacles and Opportunities for Cloud Computing ∗ 4. Data Transfer Bottlenecks ∗ Q: Data transfer is expensive and slow because the data sets are surprisingly large ∗ A: Ship disks
  42. 42. Top 10 Obstacles and Opportunities for Cloud Computing ∗ 7. Bugs in large-scale distributed systems ∗ Q: Some bugs do not appear in smaller configurations but do appear in the production data center ∗ A: Use distributed VMs
  43. 43. Top 10 Obstacles and Opportunities for Cloud Computing ∗ 10. Software Licensing ∗ Q: Software licenses cost more when provisioned in the cloud ∗ A: Open source or pay-for-use licenses ∗ Why open source? Cost matters for startup teams
  44. 44. Question?
  45. 45. Talking about ‘Big Data’
  46. 46. New Data Source ∗ The number of smartphones is expected to exceed 1 billion in 2014
  47. 47. ∗ More than 10 billion apps have been downloaded ∗ Quoted from http://android-developers.blogspot.com/search/label/Android%20Market
  48. 48. Web-Scale Problems It is BIG DATA! ∗ Characteristics: ∗ Definitely data-intensive ∗ May also be processing intensive ∗ Examples: ∗ Crawling, indexing, searching, mining the Web ∗ Social Network ∗ Web 3.0 applications
  49. 49. ∗ In 2007 the average was 5,000 tweets per day ∗ In 2008 that had grown to 300,000 ∗ In 2009 tweets per day averaged 2.5 million ∗ In 2010 that number was 35 million tweets per day ∗ In March 2011 alone, an average of 140 million tweets were sent per day ∗ http://www.marketinggum.com/twitter-statistics-2011-updated-stats/
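To make the pace of this growth concrete, the year-over-year factors can be computed directly from the figures quoted on the slide. This is a small illustrative sketch; the function name `yearly_growth` is just a label chosen here, and the 2011 entry is the March 2011 average rather than a full-year figure.

```python
# Average tweets per day, as quoted on the slide.
tweets_per_day = {
    2007: 5_000,
    2008: 300_000,
    2009: 2_500_000,
    2010: 35_000_000,
    2011: 140_000_000,  # March 2011 average, not a full-year figure
}


def yearly_growth(series):
    """Map each year to its growth factor relative to the previous year."""
    years = sorted(series)
    return {
        year: series[year] / series[prev]
        for prev, year in zip(years, years[1:])
    }


growth = yearly_growth(tweets_per_day)
for year, factor in growth.items():
    print(f"{year}: x{factor:.1f}")

overall = tweets_per_day[2011] / tweets_per_day[2007]
print(f"2007 to 2011: x{overall:.0f}")
```

The daily volume grew 60-fold from 2007 to 2008 and 28,000-fold from 2007 to March 2011, which is the kind of curve that makes capacity planning by prediction so difficult.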
  50. 50. ∗ Twitter ranks eighth among the top websites ∗ Quoted from http://www.alexa.com/topsites
  51. 51. Web-Scale Problems: It is BIG DATA! ∗ Wayback Machine has 2 PB + 20 TB/month (2006) (http://archive.org/index.php) ∗ Google processes 20 PB a day (2008) ∗ “all words ever spoken by human beings” ~ 5 EB ∗ NOAA has ~1 PB climate data (2007) ∗ CERN’s LHC will generate 15 PB a year (2008) ∗ “640K ought to be enough for anybody.”
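A back-of-the-envelope calculation shows why volumes like these are out of reach for one machine. The sketch below asks how long a single disk would take just to read the 20 PB per day figure once. The sequential read rate of 100 MB/s and the use of decimal units (1 PB = 10^15 bytes) are assumptions made for illustration, not numbers from the slide.

```python
PB = 10**15  # decimal petabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)
SECONDS_PER_DAY = 86_400


def scan_seconds(total_bytes, bytes_per_second):
    """Whole seconds needed to read total_bytes sequentially from one disk."""
    return total_bytes // bytes_per_second


def disks_needed(total_bytes, bytes_per_second, deadline_seconds):
    """Disks reading in perfect parallel needed to finish by the deadline."""
    bytes_per_disk = bytes_per_second * deadline_seconds
    return -(-total_bytes // bytes_per_disk)  # ceiling division


daily_bytes = 20 * PB   # "20 PB a day" from the slide
disk_rate = 100 * MB    # hypothetical sequential read rate per disk

seconds = scan_seconds(daily_bytes, disk_rate)
days = seconds / SECONDS_PER_DAY
disks = disks_needed(daily_bytes, disk_rate, SECONDS_PER_DAY)

print(f"one disk: {seconds} s, about {days:.0f} days")
print(f"disks needed to read it within one day: {disks}")
```

Under these assumptions a single disk needs more than six years to read one day's worth of data, and over two thousand disks reading in perfect parallel are needed just to keep up, before any actual processing happens.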
  52. 52. Quoted from “Nosql big data Hadoop with microsoft”
  53. 53. What is the scale of Big Data? ∗ We can grasp the scale of 300 GB, since a single hard disk today holds more than that
  54. 54. What is the scale of Big Data? Quoted from “Nosql big data Hadoop with microsoft”
  55. 55. What is the scale of Big Data?
  56. 56. Quoted from “big data the next frontier for innovation competition and productivity”
  57. 57. Quoted from “big data the next frontier for innovation competition and productivity”
  58. 58. For Big Data Analytics ∗ These problems cannot be solved by a set of machines ∗ Many machines? ∗ Distributed/grid/cluster computing? ∗ We need huge machines! ∗ Less communication between computers ∗ Less synchronization between systems
  59. 59. Big Data Initiative in the US
  60. 60. Big Data is the trend. Open Its Power!
  61. 61. Databases in the cloud era
  62. 62. Relational Database Performance
  65. 65. Third-party Cloud Services ∗ Act as web services that provide relational database functionality ∗ Address obstacle (2), Data Lock-In
  66. 66. Snapshot of database.com
  67. 67. Snapshot of database.com
  68. 68. The traditional database model is no longer workable!
  70. 70. They are the future ∗ We have data and computing everywhere! ∗ New terms: M2M, Internet of Things ∗ The IT industry is growing but changing ∗ Software and ideas are more valuable than hardware and labor ∗ Small, diverse, open-source software is more beneficial
  71. 71. They are the future ∗ Working across disciplines will be the best way to evolve with the trend ∗ It is good to get involved in data-driven sciences ∗ Data Mining ∗ Since software is king, you are welcome to join us ∗ 9:00~12:00 Thursday ∗ 4204@CSIE Building ∗ Many talks about software or big data processing by experts from software companies such as Google, Yahoo!, Synopsys, Trend Micro
  72. 72. Q&A ∗ Is Taiwan ready? ∗ Our network environment? ∗ Our software environment? ∗ Our creativity? ∗ Whether you like it or not, the surge is coming ∗ Think big for the new opportunities!