Big Data and Cloud    Jun 30, 2011   Schubert Zhang
Published in: Technology, Business
Transcript of "Big data and cloud"

  1. Big Data and Cloud. Jun 30, 2011. Schubert Zhang
  2. Who am I
      • Schubert Zhang (张松波)
      • Chief Architect and Director of Big Data Engineering and Cloud
      • Researching cloud technology and developing cloud projects and products since 2007
      • Led the core development team of CMCC "Big Cloud". @Hanborq
      • 10 years of telecom product development and technical management. @UTStarcom
  3. Agenda
      • Introduction of Cloud Storage and Computing
      • Big Data and Cloud
      • Our Big-Data/Cloud Products and Solutions
      • Anything for Discussion …
  4. PART-1: INTRODUCTION OF CLOUD STORAGE AND COMPUTING
  5. A Popular Definition of Cloud …
      • Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
      • Cloud storage is a model of networked online storage where data is stored on multiple servers. Hosting companies operate large data centers, which provide resources according to the customer's requirements and expose them as storage pools that customers can use to store files or data objects. Physically, a resource may span multiple servers and/or data centers.
      • The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
  6. A Popular Definition of Cloud …
      • Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Clouds
      • Service Models: Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
      • Essential Characteristics: On Demand Self-Service, Broad Network Access, Rapid Elasticity, Resource Pooling, Measured Service
      • Common Characteristics: Massive Scale, Elastic Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low Cost Software, Advanced Security
  7. Examples of Famous Cloud Products
      • Google (Techs: GFS2/Bigtable/MapReduce/Megastore/Spanner/Pregel/Dremel …)
        – Google AppEngine (Storage for Database, etc.)
        – Google Storage (Storage for Objects)
      • Amazon AWS (Techs: Web-Service-Protocol/Bitstore/Keymap/Dynamo …)
        – Simple Storage Service – S3 (Storage for Objects)
        – Cloud Drive (Online Storage for Individuals)
        – SimpleDB (Storage for Database)
        – Elastic Compute Cloud – EC2 (Compute)
      • Rackspace (Techs: OpenStack …)
        – Cloud Servers (Compute)
        – Cloud Files (Storage for Objects)
      • Facebook (Techs: Hive/Scribe/Haystack/Hadoop …)
        – Messages
        – Photo Storage
      • Cloudera
        – Hadoop …
  8. We Focus on the Technologies Behind the Cloud
      Storage
      • High Scalability
        – Shared-Nothing
        – Object-Oriented
        – NoSQL
        – …
      • High Availability
        – Failure-Detecting
        – Server Clustering
        – Replication
        – Eventual Consistency
        – …
      • Big Data
        – PB level storage
        – Structured or non-structured
        – Indexing
        – Automatic re-sharding/re-partitioning
        – Automatic load balancing
        – …
      • High Throughput / Low Latency
        – Optimized IO and data write/read models.
      Computing
      • High Scalability
      • Parallel Computing Framework
        – MR - MapReduce
        – BSP - Bulk Synchronous Parallel
      • Job/Task scheduler
      • Failure rework
      • PDM - Parallel Data Analysis/Mining Algorithms
        – Simple Statistic/Analysis
        – Classification/Clustering …
        – Information Retrieval
        – For Recommendation and AD
        – …
  9. PART-2: BIG DATA AND CLOUD
  10. Big Data
      • Immutable Law of Big Data
        – Volume
        – Variety
        – Velocity
      • Need …
        – Distributed System
          • Many commodity machines
        – Scale-out vs. Scale-up
          • Scale-out: automatic vs. manual
  11. Big Data, Big Business
      • Figures shown: $2.25B, $400M, $1.7B, $250M, $263M, $2.35B, >>$30.5M (VC)
      • Categories: Storage Products/Solutions; Data Warehouse (MPP); NAS (Limited Scale-out)
  12. The Next Decade in Data Management
      • A stable system capable of supporting a variety of apps is necessary.
      • Innovations in database are a requirement.
      • New data stores are necessary.
      • Differentiation between programs will continue until key innovations in data management platforms become uniform.
  13. PART-3: OUR BIG-DATA/CLOUD PRODUCTS AND SOLUTIONS (Engineering)
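  14. Overview
      Layers, top to bottom:
      • Cloud Applications (MagicBox, EnterpriseApps …)
      • Cloud Datasets
      • Cloud Services, RESTful and web-based (ObjectStorage Service, DataStore Service, MapReduce Service, Compute Service …)
      • Cloud Stack (CloudOS, SandStor, PebStor, MapReduce, vCompute, …)
      • Cloud Solutions for: power, telecom, video, transportation, healthcare, government, Internet of Things, Internet, scientific research, NGO …
      Approach:
      • Build on the Cloud Stack cloud technology products and solutions;
      • Provide industry application solutions for large-scale data storage and processing: Cloud Solutions;
      • Provide storage, computing, and application cloud service products for the public and for enterprises: Cloud Services;
      • Provide cloud applications: Cloud Applications.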
  15. Our Focus
      • Enterprise Big Data Management
      • Leveraging cloud technology from Internet back ends
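  16. Hardware
      We use standard commodity server hardware (PC servers) and network equipment, and build flexible cluster systems on a big-data cluster software platform. Clusters scale from a few nodes to several thousand nodes, and storage capacity can reach the PB level. We rely more on software layer scalability (scale-out) and fault-tolerance.
      • Traditional servers: IBM minicomputer (p5 570), Lenovo cluster system (DeepComp 7000G), Dawning cluster system (Dawning TC5000), SUN servers, …
      • Traditional storage systems: NAS systems, SAN systems, disk arrays
      • Weaknesses of the traditional approach: expensive, hard to scale, many restrictions
      • Our approach:
        – Standard commodity PC servers
        – Built-in storage (a single node can exceed 10TB)
        – Easy to maintain
        – Nodes are replaceable
        – Easy cluster expansion
        – Flexible networking
        – Cluster-Level Soft RAID
      • We reject expensive, hard-to-scale, restrictive minicomputers, hardware-bound clusters, and storage devices such as SAN/NAS.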
  17. Products and Features
      Stack, top to bottom:
      • Cloud API
      • Cloud Services: DataStore, ObjectStorage, MapReduce, Compute
      • Cloud Stack: SandStor, PebStor, MapReduce, vCompute, all on CloudOS
      • Hardware & OS
      CloudOS
      • Distributed Cloud Platform
      • Commodity Hardware and Cluster Management
      • High Scalability
      • High Reliability (Data Replication)
      • High Availability
      • Strong Consistency
      • High Throughput
      • Load Balancing
      • Global Data Access
      • Global File system
      • Simplify Complexity of Apps
      • High Durability, no data loss
      SandStor
      • Distributed Structured Data Management
      • Common features of CloudOS
      • High efficiency Indexing
      • Multi-level Cache
      • Compression
      • Fast random access, Low Latency
      • Flexible Schema
      PebStor
      • Distributed Blob Data Management
      • Common features of CloudOS
      • Efficient indexes and metadata management
      • Efficient storage space management
      • De-duplication
      • Unlimited blob size
      MapReduce
      • Flexible Parallel Data Processing and Computing Framework
      • Common features of CloudOS
      • Large-scale
      • Highly parallelized
      • Locality computing
      • Simple model for programming
      • Abundant high-level languages and toolkits
      • Seamlessly integrated with storage system
      vCompute
      • Virtual Machines Resources management
      • Multi VMs support
      • Elastic VMs provisioning
      • Auto-scale
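  18. Cloud Service Platform
      Cloud Services and similar products or services:
      • ObjectStorage Cloud Service: Amazon S3, Google Storage for Developer, Rackspace Files/OpenStack Swift, Google BlobStore
      • DataStore Cloud Service: Amazon SimpleDB, Google DataStore
      • MapReduce Cloud Service: Amazon MapReduce, Hadoop
      • Video Media Cloud Service: … Video Delivery/Streaming/Transcoding/Time-shifting/Analytics
      Cloud Services API
        – Web-based, available anywhere
        – RESTful style, simple and easy to use
        – SDKs provided for multiple programming languages
      Features of Cloud Services
        – Users do not need to care about the implementation
        – Available anywhere
        – High data reliability
        – Strong scalability
        – High availability (99.9%)
        – Pay for actual usage
        – Simple and easy to use
        – API follows industry standards and conventions
        – Rich management and monitoring tools
        – Strict yet flexible security policies
        – An AAA service that integrates multiple cloud services
      Multi-Level Cloud Services
        – Infrastructure
        – Platform
        – Applications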
  19. Object Storage Platform: build another S3
      The RockStor Object Storage system provides object storage infrastructure services that guarantee efficiency, robustness, and load balance.
      Layers, top to bottom:
      • Object Access Layer: provides the client library
      • MetaStore Layer: DHT-based consistent overlay network
      • Data Chunk Store Layer: autonomous overlay network
      • Clustered storage nodes
      Properties labeled on the diagram: Object-Oriented, High Availability, High Scalability, Huge Capacity
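Slide 19 names a DHT-based overlay for the MetaStore layer but does not show how keys are placed on nodes. One common building block of DHT designs is consistent hashing, so the following is a minimal generic sketch of that technique, not RockStor's actual implementation. The class name `HashRing`, the node names, the virtual-node count, and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (position, node_name)
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _position(key):
        # Map any string to a 64-bit position on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node, vnodes=64):
        # Each physical node owns several points on the ring to smooth the load.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def lookup(self, key):
        # The owner is the first ring point at or after the key's position,
        # wrapping around to the start of the ring.
        pos = self._position(key)
        idx = bisect.bisect_right(self._ring, (pos, "")) % len(self._ring)
        return self._ring[idx][1]
```

With this layout, adding a node only takes over keys that now fall just before its ring points; every other key keeps its previous owner, which is what makes re-partitioning cheap as a cluster grows.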
  20. Object Storage Cloud Services
      • RESTful API example (a simple object upload/PUT operation)
      • Web-based management system for Object Storage
      • Similar to Amazon S3
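The request shown on slide 20 is not preserved in the transcript. As a rough illustration of what an S3-style object upload looks like, the sketch below builds (but does not send) an HTTP PUT request with Python's standard library. The endpoint `https://objectstorage.example.com`, the bucket and key names, and the helper `build_put_request` are hypothetical; a real service would also require authentication headers, which are omitted here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_put_request(endpoint, bucket, key, body,
                      content_type="application/octet-stream"):
    """Build an S3-style 'PUT /bucket/key' request without sending it."""
    # The object is addressed purely by its URL: endpoint/bucket/key.
    url = f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
    }
    # The object bytes travel as the request body.
    return Request(url, data=body, headers=headers, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would perform the upload against a live endpoint; keeping construction separate makes the request easy to inspect.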
  21. Performance of S3
      [Chart: count and dThrou (ops/sec) over time, y-axis 0 to 10000]
      • Write (8KB): 134220800 total operations; 1024 GB (=1TB) total data size; 4.93 hours total used time; 132.230 us total average latency; 7084.320 total throughput/sec
      • Read (8KB): 134220800 total operations; 1024 GB (=1TB) total data size; 17.267 hours total used time; 464.012 us total average latency; 2155.119 total throughput/sec
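The benchmark slide reports 134220800 operations of 8KB each and a total data size of 1024 GB. The short calculation below checks that those two figures agree, assuming 1 KB = 1024 bytes (the slide does not state which convention it uses).

```python
OPERATIONS = 134_220_800   # total operations reported on the slide
OBJECT_SIZE_KIB = 8        # each operation reads or writes 8KB

total_kib = OPERATIONS * OBJECT_SIZE_KIB
total_gib = total_kib / 1024 / 1024  # KiB -> MiB -> GiB

print(f"{total_gib:.2f} GiB")  # prints "1024.02 GiB", i.e. about 1 TB
```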
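  22. DataStore Platform: build a scalable BDMS
      BDMS software layers, top to bottom:
      • Application layer
      • Data access layer: SQL language, JDBC driver, API, import tools, data analysis interfaces (including Hadoop integration interfaces)
      • Data model and representation layer: data model and schema definition; storage engine mapping; API, SQL, and Hadoop MapReduce interfaces; index management; simple relational model
      • Distributed storage engine layer: WAL, write cache, and read cache; storage file structure and index structure; data compression and compaction; data distribution management and indexing; local analysis engine
      • Distributed storage platform layer: distributed data storage; load balancing; data replication and consistency management; data addressing
      • Cluster service layer: cluster node network topology; failure detection; distributed asynchronous communication framework
      BDMS cluster properties labeled on the diagram: Structured/Semi-, High Availability, High Scalability, Big Data
      Figures: BDMS logical architecture; BDMS software layer architecture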
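The storage engine layer on slide 22 lists a WAL (write-ahead log) together with a write cache. The slide gives no implementation detail, so the following is only a generic sketch of the idea: every write is appended to a log file and synced before it is applied to an in-memory table, so the table can be rebuilt after a restart. The class `TinyStore`, the JSON record format, and the file layout are assumptions for illustration and are not taken from the BDMS.

```python
import json
import os


class TinyStore:
    """Toy key-value store: write-ahead log plus in-memory write cache."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memtable = {}
        self._replay()  # rebuild state from the log before accepting writes
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["k"]] = record["v"]

    def put(self, key, value):
        # 1. Append to the log and force it to disk.
        self._wal.write(json.dumps({"k": key, "v": value}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        # 2. Only then apply the write to the in-memory table.
        self.memtable[key] = value

    def get(self, key):
        return self.memtable.get(key)

    def close(self):
        self._wal.close()
```

A production engine would additionally flush the in-memory table to sorted files and compact them, which is where the "compression and compaction" item on the slide would come in; that part is left out of this sketch.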
  23. Performance of BDMS
      [Chart: Streaming Ingest Data Throughput, write ops/sec (y-axis 0 to 140000); series: totalThroughput, deltaThroughput]
      [Chart: SLA of Random Query, percentage of read ops (0% to 100%); marker at 100ms]
      Query Result for: select * from table where msisdn > xxx limit N;
      • limit 1: 0.34 second
      • limit 10: 0.31 second
      • limit 100: 0.40 second
      • limit 1000: 0.46 second
      • limit 10000: 1.25 seconds
      • limit 500000: 55.42 seconds
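The benchmark query on slide 23 is a range scan on `msisdn` with a row limit. To make its shape concrete, the sketch below runs the same kind of query against an in-memory SQLite database, which merely stands in for the BDMS. The table name `cdr`, its columns, the index, and the sample values are hypothetical.

```python
import sqlite3


def build_demo_table(num_rows=1000):
    """Create an in-memory table with an index on msisdn (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cdr (msisdn INTEGER, payload TEXT)")
    conn.execute("CREATE INDEX idx_cdr_msisdn ON cdr (msisdn)")
    conn.executemany(
        "INSERT INTO cdr VALUES (?, ?)",
        ((13800000000 + i, f"record-{i}") for i in range(num_rows)),
    )
    conn.commit()
    return conn


def range_query(conn, lower_bound, limit):
    """Equivalent of: select * from table where msisdn > xxx limit N;"""
    cursor = conn.execute(
        "SELECT * FROM cdr WHERE msisdn > ? LIMIT ?", (lower_bound, limit)
    )
    return cursor.fetchall()
```

Calling `range_query(build_demo_table(), 13800000500, 10)` returns at most ten rows whose `msisdn` exceeds the bound; an index on the filtered column lets the engine seek to the first match instead of scanning the whole table.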
  24. CloudNAS+MagicBox Enterprise Solution
      Diagram elements:
      • Office/SOHO network and company LAN or WAN
      • BigdataCloud NAS Proxy: access files via CIFS/NFS/FTP
      • Enterprise Private BigdataCloud: Web Service RESTful API
      • MagicBox Service and MagicBox Client
      CloudNAS: NAS Proxy + NAS in BigdataCloud
        – File Server
        – Archive Server
        – Backup Server
      MagicBox: Backup/Sync/Sharing/Versioning
        – Documents Backup
        – Collaboration
  25. Parallel Computing Platform
      • Applications launch a job on the MapReduce JobTracker, which assigns map tasks and reduce tasks.
      • The dataset is the input; it is partitioned/split according to a user-defined policy.
      • Data Split-1 to Data Split-5 feed Map-1 to Map-5.
      • Map outputs feed Reduce-1 and Reduce-2, which produce Output-1 and Output-2.
      • Frameworks: MapReduce, BSP
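The data flow on slide 25 (splits to map tasks, map outputs partitioned to reduce tasks, one output per reducer) can be mimicked in a single process. The sketch below is a toy word count that follows that flow; it is not the platform's API. The function names, the CRC32-based partitioner, and the sample input are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def map_words(split):
    """Map task: emit (word, 1) for every word in one data split."""
    for word in split.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Pick the reducer for a key with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_counts(key, values):
    """Reduce task: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Run map, shuffle, and reduce in-process; return one dict per reducer."""
    # Shuffle: group map output by key inside each reducer's bucket.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for key, value in map_words(split):
            buckets[partition(key, num_reducers)][key].append(value)
    # Reduce: each bucket becomes one output, like Output-1 and Output-2.
    return [
        dict(reduce_counts(key, values) for key, values in sorted(bucket.items()))
        for bucket in buckets
    ]
```

Because the partitioner sends every occurrence of a key to the same reducer, the per-reducer outputs never overlap and can simply be concatenated to form the final result.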
  26. Cloud Management
  27. Thank You Very Much! Any more questions?
      • schubert.zhang@gmail.com
      • http://cloudepr.blogspot.com
      • http://www.slideshare.net/schubertzhang