This document introduces cloud storage and cloud computing models, then discusses how big data's volume, variety, and velocity require distributed systems that scale out across many commodity machines. It surveys well-known cloud products and their underlying technologies, and closes with an overview of the company's enterprise big-data offerings built on cloud technologies, including its data store, object storage, MapReduce, and compute cloud services.
David Loureiro - Presentation at HP's HPC & OSL TES (SysFera)
David Loureiro, SysFera CEO, talks about "Managing large-scale, heterogeneous infrastructures: from DIET to SysFera-DS" at HP's High Performance Computing and Open Source & Linux Technical Excellence Symposium, held 19-23 March 2012 in Grenoble, France.
We present a software model built on the Apache Big Data Stack (ABDS) that is widely used in modern cloud computing, and we enhance it with HPC concepts to derive HPC-ABDS.
We discuss the layers in this stack.
We give examples of integrating ABDS with HPC.
We discuss how to implement this in a world of multiple infrastructures and evolving software environments for users, developers, and administrators.
We present Cloudmesh as supporting Software-Defined Distributed System as a Service (SDDSaaS) with multiple services on multiple clouds/HPC systems.
We explain the functionality of Cloudmesh as well as the three administrator and three user modes it supports.
Bringing Structure, Scalability, and Services to Cloud-Scale Storage (MapR Technologies)
Deploying storage with a forklift is so 1990s, right? Today's applications and infrastructure demand systems and services that scale. Customers require performance and capacity that fit the use case and workloads, not the other way around. Architects need multi-temperature, multi-location, highly available, and compliance-friendly platforms that grow with the generational shift in data growth and utility.
A comparison of the big data processing platforms RDBMS, Hadoop, and Spark: the pros and cons of each platform are discussed, and business use cases are included.
High Performance Computing and Big Data (Geoffrey Fox)
We propose a hybrid software stack with large-scale data systems for both research and commercial applications, running on the commodity (Apache) Big Data Stack (ABDS) with High Performance Computing (HPC) enhancements, typically to improve performance. We give several examples taken from bio- and financial informatics.
We look in detail at parallel and distributed run-times including MPI from HPC and Apache Storm, Heron, Spark and Flink from ABDS stressing that one needs to distinguish the different needs of parallel (tightly coupled) and distributed (loosely coupled) systems.
We also study "Java Grande", i.e., the principles that allow Java codes to perform as fast as codes written in more traditional HPC languages. We also note the differences between capability (individual jobs using many nodes) and capacity (lots of independent jobs) computing.
We discuss how this HPC-ABDS concept allows one to discuss convergence of Big Data, Big Simulation, Cloud and HPC Systems. See http://hpc-abds.org/kaleidoscope/
1. Big Data Analytics
- Big Data
- Spark: Big Data Analytics
- Resilient Distributed Datasets (RDD)
- Spark libraries (SQL, DataFrames, MLlib for machine learning, GraphX, and Streaming)
- PFP: Parallel FP-Growth
2. Ubiquitous Computing
- Edge Computing
- Cloudlet
- Fog computing
- Internet of Things (IoT)
- Virtualization
- Virtual Conferencing
- Virtual Events (2D, 3D, and Hybrid)
Cloud architecture and deployment: The Kognitio checklist, Nigel Sanctuary, K... (CloudOps Summit)
CloudOps Summit 2012, Frankfurt, 20.9.2012 Track 2 - Build and Run
by Nigel Sanctuary, VP Propositions at Kognitio (www.kognitio.com)
http://cloudops.de/sprecher/#nigelsanctuary
Find the video of this talk at http://youtu.be/wQrHQNOMlKc
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments (MapR Technologies)
SAP HANA is an increasingly popular platform for various analytical and transactional use cases with its in-memory architecture. If you’re an SAP customer you’ve experienced the benefits.
However, the underlying storage for SAP HANA is painfully expensive. This slows down your ability to grow your SAP HANA footprint and serve up more applications.
Danny Quilton from Capacitas presented a paper, ‘Capacity Management and the Cloud’. The presentation made the case for capacity management of cloud-based services, highlighting the critical role of capacity management in controlling cloud cost. The presentation referenced a number of client engagement case studies to debunk some of the myths surrounding cloud:
Capacity can be turned up instantaneously
Capacity planning discipline is no longer required
Cloud capacity is cheap
Bottlenecks can be alleviated by expanding cloud capacity
Capacity management can be delegated to the cloud provider
Performance is guaranteed by the cloud provider
HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs (Schubert Zhang)
HFile mimics Google's SSTable and is now available in Hadoop HBase 0.20.0. Previous releases of HBase temporarily used an alternative file format, MapFile, a common file format in the Hadoop IO package. I think HFile should also become a common file format once it matures, and should be moved into Hadoop's common IO package in the future.
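The core idea of a block-indexed format like HFile can be shown in a few lines: records are kept sorted and grouped into blocks, and a small index records the first key of each block, so a lookup is a binary search over the index followed by a scan of a single block. The sketch below is our own in-memory illustration (names, block size, and structure are ours, not HBase's actual on-disk layout):

```python
# Illustrative sketch (not the real HFile code): a block-indexed store of
# sorted key-value pairs. The index holds the first key of each block, so
# a lookup touches the index plus exactly one block.
import bisect

BLOCK_SIZE = 3  # records per block; real HFile blocks are ~64 KB of bytes

def build(sorted_pairs):
    """Split sorted (key, value) pairs into blocks and build a block index."""
    blocks = [sorted_pairs[i:i + BLOCK_SIZE]
              for i in range(0, len(sorted_pairs), BLOCK_SIZE)]
    index = [block[0][0] for block in blocks]  # first key of each block
    return blocks, index

def get(blocks, index, key):
    """Locate the only block that can contain `key`, then scan it."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block's first key
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None

pairs = sorted({"apple": 1, "cherry": 3, "durian": 4, "fig": 5,
                "grape": 6, "kiwi": 7, "mango": 8}.items())
blocks, index = build(pairs)
print(get(blocks, index, "fig"))     # 5
print(get(blocks, index, "banana"))  # None
```

Because the index stays small relative to the data, it can be held in memory while blocks stay on disk, which is what makes the format attractive for HBase-style random reads.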
Cassandra Compression and Performance Evaluation (Schubert Zhang)
Even though we have abandoned Cassandra in all our products, we would like to share our work here.
Why did we abandon Cassandra in our products? Because:
(1) There is a major flaw in Cassandra's implementation, especially in its local storage engine layer, i.e., SSTable and indexing.
(2) Combining Bigtable and Dynamo was a mistake. Dynamo's hash-ring architecture is an obsolete technology for scaling, and its consistency and replication policies are also unusable for big data storage.
Big data represents a real challenge: technical, business, and societal. Exploiting massive data opens up possibilities for radical transformation of both enterprises and usage patterns, at least provided we are technically capable of it, because the acquisition, storage, and exploitation of massive quantities of data pose real technical challenges.
A big data architecture enables the creation and administration of all the technical systems that allow the data to be properly exploited.
A great many different tools exist for manipulating massive quantities of data, for storage, analysis, or distribution, for example. But how do you assemble these different tools into an architecture that scales, tolerates failures, and is easily extensible, all without exploding costs?
The success of big data depends on its architecture, on the right infrastructure, and on the use made of it: "Data into Information into Value".
A big data architecture is composed of four major parts: Integration, Data Processing & Storage, Security, and Operations.
We are in the midst of a computing revolution. As the cost of provisioning hardware and software stacks grows, and the cost of securing and administering these complex systems grows even faster, we're seeing a shift towards computing clouds. For cloud service providers, there is efficiency from amortizing costs and averaging usage peaks. Internet portals like Yahoo! have long offered application services, such as email for individuals and organizations. Companies are now offering services such as storage and compute cycles, enabling higher-level services to be built on top. In this talk, I will discuss Yahoo!'s vision of cloud computing, and describe some of the key initiatives, highlighting the technical challenges involved in designing hosted, multi-tenanted data management systems.
2. Who am I
• Schubert Zhang (张松波)
• Chief Architect and Director of Big Data Engineering and Cloud
• Researching cloud technologies and developing cloud projects and products since 2007
• Led the core development team of CMCC "Big Cloud" @Hanborq
• 10 years of telecom product development and technical management @UTStarcom
3. Agenda
• Introduction of Cloud Storage and Computing
• Big Data and Cloud
• Our Big-Data/Cloud Products and Solutions
• Anything for Discussion …
5. A Popular Definition of Cloud …
• Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
• Cloud storage is a model of networked online storage where data is stored on multiple servers. Hosting companies operate large data centers, which provide resources according to customer requirements and expose them as storage pools that customers can use to store files or data objects. Physically, a resource may span multiple servers and/or data centers.
• It promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
6. A Popular Definition of Cloud …
• Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Clouds
• Service Models: Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS)
• Essential Characteristics: On-Demand Self-Service, Broad Network Access, Rapid Elasticity, Resource Pooling, Measured Service
• Common Characteristics: Massive Scale, Elastic Computing, Homogeneity, Geographic Distribution, Virtualization, Service Orientation, Low-Cost Software, Advanced Security
7. Examples of Famous Cloud Products
• Google (Techs: GFS2/Bigtable/MapReduce/Megastore/Spanner/Pregel/Dremel …)
– Google AppEngine (Storage for Database, etc.)
– Google Storage (Storage for Objects)
• Amazon AWS (Techs: Web-Service-Protocol/Bitstore/Keymap/Dynamo …)
– Simple Storage Service – S3 (Storage for Objects)
– Cloud Drive (Online Storage for Individuals)
– SimpleDB (Storage for Database)
– Elastic Compute Cloud – EC2 (Compute)
• Rackspace (Techs: OpenStack …)
– Cloud Servers (Compute)
– Cloud Files (Storage for Objects)
• Facebook (Techs: Hive/Scribe/Haystack/Hadoop …)
– Messages
– Photo Storage
• Cloudera
– Hadoop …
8. We Focus on the Technologies Behind the Cloud
Storage:
• High Scalability
– Shared-Nothing
– Object-Oriented
– NoSQL
– …
• High Availability
– Failure Detection
– Server Clustering
– Replication
– Eventual Consistency
– …
• Big Data
– PB-level storage
– Structured or non-structured data
– Information Retrieval
– Indexing
– Automatic re-sharding/re-partitioning
– Automatic load balancing
– …
• High Throughput / Low Latency
– Optimized IO and data write/read models
Computing:
• High Scalability
• Parallel Computing Frameworks
– MR: MapReduce
– BSP: Bulk Synchronous Parallel
• Job/Task Scheduling
• Failure Rework
• PDM: Parallel Data Analysis/Mining Algorithms
– Simple Statistics/Analysis
– Classification/Clustering …
– For Recommendation and Advertising
– …
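Of the two parallel computing frameworks named above, BSP (Bulk Synchronous Parallel) is the less familiar: work proceeds in supersteps, each consisting of local computation, message exchange, and a global barrier. A toy single-process simulation can make the shape of the model visible (the worker count, names, and structure here are our illustration, not any particular BSP framework's API):

```python
# Illustrative sketch of the BSP model: superstep = local compute, then
# message exchange, then a barrier before the next superstep. Four
# simulated "workers" cooperatively sum a list.
def bsp_sum(values, workers=4):
    chunk = (len(values) + workers - 1) // workers
    partitions = [values[i * chunk:(i + 1) * chunk] for i in range(workers)]

    # Superstep 1: local computation -- each worker sums its own partition.
    partials = [sum(part) for part in partitions]

    # Communication phase: every worker sends its partial sum to worker 0.
    inbox = {0: partials}

    # Barrier: all messages are delivered before the next superstep starts.
    # Superstep 2: worker 0 combines the received partial sums.
    return sum(inbox[0])

print(bsp_sum(list(range(10))))  # 45
```

In a real BSP engine (Pregel-style systems, for example) the workers run on separate machines and the barrier is a cluster-wide synchronization point, but the compute/communicate/barrier rhythm is the same.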
10. Big Data
• Immutable Laws of Big Data
– Volume
– Variety
– Velocity
• Needs …
– Distributed systems
• Many, many commodity machines
– Scale-Out vs. Scale-Up
• Scale-out: automatic vs. manual
11. Big Data, Big Business
[Figure: acquisition and funding amounts ($2.25B, $400M, $1.7B, $250M, $263M, $2.35B, >>$30.5M VC) for vendors of storage products/solutions (NAS, limited scale-out) and data warehouses (MPP)]
12. The Next Decade in Data Management
A stable system capable of supporting a variety of apps is necessary.
Innovations in databases are a requirement.
New data stores are necessary.
Differentiation between programs will continue until key innovations in data management platforms become uniform.
17. Products and Features
Cloud API
Cloud Services: DataStore Cloud | ObjectStorage Cloud | MapReduce Cloud | Compute Cloud
CloudOS Stack: SandStor | PebStor | MapReduce Cloud | vCompute
Hardware & OS
• CloudOS: Distributed cloud platform on commodity hardware with cluster management; High Scalability; High Reliability (Data Replication); High Availability; Load Balancing; Global Data Access; Global File System
• SandStor: Distributed structured data management; common features of CloudOS; high-efficiency indexing; Strong Consistency; Multi-level Cache; High Throughput; Compression; fast random access with low latency; Flexible Schema; simplifies application complexity; High Durability, no data loss
• PebStor: Distributed blob data management; common features of CloudOS; efficient indexes and metadata management; efficient storage space management; de-duplication; unlimited blob size
• MapReduce Cloud: Flexible parallel data processing framework; common features of CloudOS; highly parallelized; locality computing; simple programming model; abundant high-level languages and toolkits; seamlessly integrated with the storage system
• vCompute: Virtual machines and computing resources management; multi-VM support; elastic VMs; large-scale provisioning; auto-scale
July 3, 2012
18. Cloud Service Platform
Cloud Services and comparable products/services:
• ObjectStorage Cloud Service: Amazon S3, Google Storage for Developers, Rackspace Files/OpenStack Swift, Google BlobStore
• DataStore Cloud Service: Amazon SimpleDB, Google DataStore
• MapReduce Cloud Service: Amazon MapReduce, Hadoop
• Video Media Cloud Service: video delivery/streaming/transcoding/time-shifting/analytics
• Multi-Level Cloud Services: Infrastructure, Platform, Applications
Cloud Services API:
– Web-based, accessible everywhere
– RESTful style, simple and easy to use
– SDKs provided for multiple languages
– APIs conform to industry standards and conventions
Characteristics of Cloud Services:
– Users need not care about the implementation
– Accessible everywhere
– High data reliability
– Strong scalability
– High availability (99.9%)
– Pay for actual usage
– Simple and easy to use
– Rich management and monitoring tools
– Strict yet flexible security policies
– AAA services integrating multiple cloud services
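To make the "RESTful style, simple and easy to use" claim concrete, here is a sketch of how a client might form an S3-style object upload request. The host name, bucket, and header set are hypothetical, patterned on typical object-storage services; nothing is actually sent over the network, we only assemble the request so its shape is visible:

```python
# Assemble (but do not send) an HTTP PUT for an object-storage service.
# Endpoint and names are illustrative, not any real service's API.
from email.utils import formatdate

def build_put_request(bucket, key, body, host="objectstorage.example.com"):
    path = "/%s/%s" % (bucket, key)       # object addressed as a URL path
    headers = {
        "Host": host,
        "Date": formatdate(usegmt=True),  # HTTP-date header value
        "Content-Length": str(len(body)),
        "Content-Type": "application/octet-stream",
    }
    request_line = "PUT %s HTTP/1.1" % path
    return request_line, headers

request_line, headers = build_put_request("photos", "2012/cat.jpg", b"...")
print(request_line)               # PUT /photos/2012/cat.jpg HTTP/1.1
print(headers["Content-Length"])  # 3
```

The appeal of this style is exactly what the slide lists: every object is a URL, so any language with an HTTP client gets an SDK almost for free, and GET/PUT/DELETE map directly onto read/write/remove.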
19. Object Storage Platform: "Build another S3"
The RockStor object storage system provides object storage infrastructure services with guaranteed efficiency, robustness, and load balancing.
• Object Access Layer: provides the client library; object-oriented; high availability
• MetaStore Layer: DHT-based consistent overlay network; high scalability
• Data Chunk Store Layer: autonomous overlay network of clustered storage nodes; huge capacity
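The consistent hashing that typically underlies a DHT-based overlay network like the MetaStore layer's can be sketched briefly (this is our generic illustration, not RockStor's actual code). Nodes and keys hash onto the same ring; a key belongs to the first node at or after its hash position, so adding or removing a node only remaps the keys nearest to it:

```python
# Generic consistent-hashing ring, as used in DHT-style overlay networks.
# Virtual nodes ("replicas") smooth out the key distribution.
import bisect
import hashlib

def _h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each physical node appears at `replicas` points on the ring.
        self.ring = sorted((_h("%s#%d" % (n, i)), n)
                           for n in nodes for i in range(replicas))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash (wrapping around).
        pos = bisect.bisect(self.points, _h(key)) % len(self.ring)
        return self.ring[pos][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("object-123") in ("node-a", "node-b", "node-c"))  # True
```

The scalability property the slide claims follows from this: membership changes touch only neighboring key ranges, so the overlay can grow or shrink without a global reshuffle.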
24. CloudNAS + MagicBox Enterprise Solution
Office/SOHO networks connect over the company LAN or WAN: a NAS proxy on premises serves files via CIFS/NFS/FTP, while the enterprise-private BigdataCloud is accessed through a Web Service RESTful API; the MagicBox service and MagicBox client run on the same infrastructure.
• CloudNAS (NAS proxy + NAS in BigdataCloud)
– File Server
– Archive Server
– Backup Server
• MagicBox (Backup/Sync/Sharing/Versioning)
– Documents Backup
– Collaboration
25. Parallel Computing Platform
Applications launch jobs with a dataset as input; the dataset is partitioned into splits according to a user-defined policy. The MapReduce JobTracker assigns the map and reduce tasks:
Data Split-1 → Map-1
Data Split-2 → Map-2 → Reduce-1 → Output-1
Data Split-3 → Map-3
Data Split-4 → Map-4 → Reduce-2 → Output-2
Data Split-5 → Map-5
Supported frameworks: MapReduce, BSP
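The split → map → shuffle → reduce flow in the figure can be simulated in a few lines of plain Python, using word count as the classic example. On the real platform the splits are distributed to JobTracker-assigned tasks on many machines; here each stage is just a loop, so only the data flow is illustrated:

```python
# Single-process simulation of the MapReduce data flow in the figure.
from collections import defaultdict

def map_phase(split):
    # One (word, 1) pair per word in the split.
    return [(word, 1) for word in split.split()]

def shuffle(mapped):
    # Group intermediate values by key, as the framework does between
    # the map and reduce phases.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

splits = ["big data big cloud", "cloud data", "big"]   # Data Split-1..3
mapped = [kv for s in splits for kv in map_phase(s)]   # map tasks
reduced = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(reduced["big"])    # 3
print(reduced["cloud"])  # 2
```

The platform's job is everything this toy omits: scheduling map tasks near their splits (the "locality computing" feature above), moving shuffle traffic across the network, and rerunning failed tasks.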