Dr. Anita Goel
Big Data and Cloud Computing
Seminar on Recent Trends in Data Analytics, RKGITM, India
9th September 2017
Dyal Singh College
Department of Computer Science
University of Delhi, India
Presentation Outline
•Introduction to Big Data
•Introduction to Cloud Computing
•Cloud Storage
•Software Defined Storage
•Concept of Virtualization
Sources of Data
• Earth sciences
• Internet of Things
• Social sciences
• Astronomy
• Business
• Industry
From where data is collected
• Web Browsers
• Search Engines
• Tablets and apps
• Mobile devices, tracking systems, RFID
• Sensor networks, social networks, automated record keeping, video archives, e-commerce
Who is collecting data
• Hospitals & Other Medical Systems
• Banking & Phone Systems
• Credit Card Companies
Why it is collected
• Targeted marketing
• Targeted information
Big Data..
Attributes of Big Data
Big Data Features
• Volume
• Velocity
• Veracity
• Value
• Variety
Volume of Data
•Data Sizes
• Exabyte (EB) = 1,024 petabytes = 2^60 = 1,152,921,504,606,846,976 bytes
• Zettabyte (ZB) = 1,024 exabytes = 2^70 = 1,180,591,620,717,411,303,424 bytes
• Yottabyte (YB) = 1,024 zettabytes = 2^80 = 1,208,925,819,614,629,174,706,176 bytes
•Volume
• 44x increase from 2009 to 2020
• Data volume increasing exponentially
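The byte counts above are powers of 1,024, and the "44x" figure can be read as a compound annual growth rate. A minimal sketch of both calculations, assuming binary (1,024-based) units as on the slide and treating 2009 to 2020 as 11 years of compounding:

```python
# Binary unit sizes as used on the slide: each step up is a factor of 1,024.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_bytes(unit: str) -> int:
    """Return the number of bytes in one unit, using powers of 1,024."""
    return 1024 ** (UNITS.index(unit) + 1)

def annual_growth(factor: float, years: int) -> float:
    """Compound annual growth rate implied by a total growth `factor` over `years`."""
    return factor ** (1 / years) - 1

if __name__ == "__main__":
    for u in ("EB", "ZB", "YB"):
        print(f"1 {u} = {unit_bytes(u):,} bytes")
    # 44x between 2009 and 2020 means 11 years of compounding.
    print(f"Implied growth: {annual_growth(44, 2020 - 2009):.1%} per year")
```

Under this reading, a 44x increase over 11 years corresponds to roughly 41% growth per year, which is what "increasing exponentially" means in practice.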
Variety of Data
•Formats, Types and Structures
• Unstructured - text on the web, audio, videos, images, PDF files, text documents (about 85%)
• Semi-structured - XML
• Structured - relational databases and spreadsheets
•Platforms - Enterprise, Social media, Sensors
•Flow of data - Static data vs. streaming data
•Single application generating/collecting many types of data
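The difference between the three structures shows up in how much work is needed to read the data. A small sketch using only the Python standard library; the patient readings are hypothetical sample data, not from the slide:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row follows a fixed schema (columns known in advance).
structured = "patient_id,heart_rate\n101,72\n102,88\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: XML tags describe the data, but fields may differ per record.
semi = "<readings><r id='101' hr='72'/><r id='102' hr='88' spo2='97'/></readings>"
records = [el.attrib for el in ET.fromstring(semi)]

# Unstructured: free text with no data model; any structure must be extracted.
unstructured = "Patient 101 reported mild chest pain after exercise."
words = unstructured.split()

print(rows)
print(records)
print(len(words), "tokens")
```

The CSV reader can rely on the header, the XML parser copes with a record that carries an extra field, and the free text offers nothing beyond tokens without further analysis.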
Velocity, Veracity and Value of Data
•Velocity
• Data is generated fast and needs to be processed fast, e.g. e-promotions, healthcare monitoring
•Veracity
• Data varies in quality, accuracy and trustworthiness
•Value
• All four Vs (volume, variety, velocity, veracity) matter for value-specific research and decision-support applications
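Processing data "fast" usually means handling each reading as it arrives instead of storing everything first. A minimal sketch of one common stream-processing pattern, a sliding-window average over a hypothetical heart-rate feed; the window size and alert threshold are illustrative assumptions:

```python
from collections import deque

class SlidingWindowMonitor:
    """Running mean over the last `size` readings of a (hypothetical) sensor stream."""

    def __init__(self, size: int, threshold: float):
        self.window = deque(maxlen=size)
        self.total = 0.0
        self.threshold = threshold

    def push(self, value: float) -> bool:
        """Add one reading; return True if the window mean exceeds the threshold."""
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # oldest reading is about to be evicted
        self.window.append(value)
        self.total += value
        return self.total / len(self.window) > self.threshold

monitor = SlidingWindowMonitor(size=3, threshold=100)
alerts = [monitor.push(hr) for hr in [80, 90, 95, 110, 120, 130]]
print(alerts)
```

Each reading is processed in constant time and memory, so the monitor keeps up no matter how long the stream runs.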
Big Data
•Relevant technologies and expertise are needed to
• Generate, collect, store, manage, process and analyze data
• Present and utilize the data, information and knowledge derived
Attributes of Big Data
Big Data Features
• Volume
• Velocity
• Veracity
• Value
• Variety
Requirements
• Hardware and software requirements
• Real-time processing ability
Technology Enablers for Big Data Analytics (BDA)
Sharma, S. K., & Wang, X. (2017). Live Data Analytics With Collaborative Edge and Cloud Processing in Wireless IoT Networks. IEEE Access, 5, 4621-4635.
Infrastructure Challenge for BDA
AWS data science chief Matt Wood
• Analytics is addictive
• Positive addiction sours if infrastructure can't keep up
• Need a platform that can move from one scale to the next, not a data center frozen in time
• By the time companies answer the original question, the business has moved on
InfoWorld, Matt Asay
• Picking Spark or Hadoop isn't the key to success; picking the right infrastructure is
• On-premises solutions are complex, costly and inflexible
• Difficult to keep up with exploding demand for real-time, actionable information
• Storing massive amounts of data and laying the infrastructure to perform analytics on it is both time- and resource-intensive
Big Data and Cloud
Cloud Computing
• Is infrastructure
• Offers scalability
• Elastic, on-demand, self-service model
• Provides elastic on-demand computing
Big Data
• Represents content
• Is big
• Is about extracting VALUE
• Needs large on-demand compute power and distributed storage
Gigaom Research
• 53% of large enterprises use cloud resources for BDA
Hortonworks, Connolly
• Data gravity: weather, census, machine and sensor data originate outside the enterprise, so use the cloud
• When the bulk of data is created on premises, run analytics on premises
• For stream processing of machine and sensor data, use the cloud
Computing Service
Computing as a Resource
Computing as a Service
What is a Cloud? (NIST)
•Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Source: National Institute of Standards and Technology (NIST)
What it Means to Users
• Accessible everywhere via the Internet
• Global access from thin clients, mobile devices and desktops
• No worries about storage capacity or upgrading applications
• Local machine can use all software without hosting it locally
• Cost savings in IT investments - infrastructure, software, personnel
• Pay-per-use utility
• Improved utilisation of compute resources
Cloud Model
•5 essential characteristics
•3 service models
•4 deployment models
Cloud Characteristics
On-demand Self Service
• Automatic provisioning - no human intervention
Broad Network Access
• Access the cloud from anywhere
Resource Pooling
• Sharing of resources, location transparency
Rapid Elasticity
• Can scale from 10 to 100 servers and vice versa
• Resources allocated and released on demand
Measured Service
• Pay-per-use
Source: NIST Working Definition of Cloud Computing
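Rapid elasticity and measured service work together: the pool is resized to the current load, and the user is billed only for what was provisioned. A minimal sketch of that idea, assuming a hypothetical per-server capacity and a hypothetical hourly price (neither is a real provider's figure); the 10 to 100 server range comes from the slide:

```python
import math

MIN_SERVERS, MAX_SERVERS = 10, 100   # scaling range from the slide
CAPACITY_PER_SERVER = 1_000          # hypothetical: requests/s one server handles
PRICE_PER_SERVER_HOUR = 0.10         # hypothetical price, not a real tariff

def servers_needed(load_rps: float) -> int:
    """Rapid elasticity: size the pool to the current load, within [MIN, MAX]."""
    wanted = math.ceil(load_rps / CAPACITY_PER_SERVER)
    return max(MIN_SERVERS, min(MAX_SERVERS, wanted))

def metered_cost(hourly_loads: list[float]) -> float:
    """Measured service: bill only for the server-hours actually provisioned."""
    return sum(servers_needed(load) for load in hourly_loads) * PRICE_PER_SERVER_HOUR

loads = [5_000, 45_000, 250_000, 8_000]   # hypothetical load for four hours
print([servers_needed(x) for x in loads], round(metered_cost(loads), 2))
```

Here the pool grows from 10 to 45 to the 100-server cap and shrinks back to 10, and the bill covers 165 server-hours instead of 400 for a pool permanently sized at 100.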
Cloud Service Models
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)
Infrastructure as a Service (IaaS)
Storage
• Provides web-based, scalable storage
• Allows hiring of storage space on cloud storage servers
• Manages data availability and security
Network
• Provides resource sharing across geographically separate locations
• Allows connection to desired resources
• Manages the network - VPNs, firewalls, etc.
Compute
• Provides processing power as a resource
• Allows provisioning of machines
• Manages multi-tenancy issues
Infrastructure as a Service (IaaS)
• Google has 450,000 systems running across 20 datacenters
• Microsoft's Windows Live team is doubling the number of servers it uses every 14 months
• Why buy machines when you can rent cycles?
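Whether renting cycles beats buying machines comes down to a break-even point. A simplified sketch, assuming one purchase price plus a flat hourly running cost for an owned machine versus a flat hourly rental price; all numbers are hypothetical, and depreciation, financing and discounts are ignored:

```python
def break_even_hours(purchase_cost: float, owned_cost_per_hour: float,
                     rent_per_hour: float) -> float:
    """Hours of use after which owning becomes cheaper than renting.

    Owning costs purchase_cost + owned_cost_per_hour * h;
    renting costs rent_per_hour * h. Solve for the h where they are equal.
    """
    if rent_per_hour <= owned_cost_per_hour:
        return float("inf")  # renting is never more expensive per hour
    return purchase_cost / (rent_per_hour - owned_cost_per_hour)

# Hypothetical: 3,000 to buy, 0.05/hour to run (power, admin), 0.20/hour to rent.
hours = break_even_hours(3000, 0.05, 0.20)
print(f"{hours:,.0f} hours (about {hours / (24 * 365):.1f} years of 24x7 use)")
```

With these assumed figures a machine must run around 20,000 hours before buying pays off; bursty or short-lived workloads never reach that point, which is the case for renting.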
Cloud Deployment Models
Public cloud
• Maintained by a 3rd party
• Available on a subscription basis (pay as you go)
Private cloud
• Runs within a company's own data center
• For internal and/or partner use
Hybrid cloud
• Mixed usage of private and public clouds
• Leases public cloud services when private capacity is insufficient
Community cloud
• Created to meet the needs of a community
• Integrates services of different clouds for the community
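The hybrid model's "lease public capacity when private capacity is insufficient" rule is often called cloud bursting. A minimal sketch of that placement policy, assuming demand is measured in VMs per hour and the private cloud has a fixed capacity (both hypothetical):

```python
def burst_plan(hourly_demand: list[int], private_capacity: int) -> list[tuple[int, int]]:
    """For each hour, return (VMs on the private cloud, VMs leased from a public cloud)."""
    plan = []
    for demand in hourly_demand:
        private = min(demand, private_capacity)   # use owned capacity first
        plan.append((private, demand - private))  # overflow bursts to the public cloud
    return plan

print(burst_plan([40, 80, 130, 60], private_capacity=100))
```

Only the peak hour spills 30 VMs onto public infrastructure; the rest of the time the private cloud absorbs the whole load, so public capacity is paid for only when it is actually needed.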
Storage Service
Cloud Storage - Necessary Conditions
Illusion of infinite storage capacity
• Eliminates the need to plan far ahead for provisioning
Elimination of up-front commitment by cloud users
• Allows users to start small and increase capacity as needed
Ability to pay for use as needed
• Pay for use on a short-term basis and release as needed
Source: a report from the University of California, Berkeley
Cloud Storage Solutions
Storage Types
• File Storage
• Object Storage
• Block Storage
Cloud Storage Solutions..
Block Storage
• Raw physical storage via a dedicated network
• Access protocols: Fibre Channel, iSCSI
• Examples: OpenStack Cinder, Amazon Elastic Block Store (EBS), Ceph RADOS Block Device (RBD)
File Storage
• Data is stored as files
• Access protocols: NFS, CIFS
• Examples: GlusterFS, Dropbox, Google Drive
Object Storage
• Data is stored as objects
• Access protocols: REST API, SOAP API
• Examples: OpenStack Swift, Amazon S3, Rackspace, Ceph
Variety of Data
•Formats, Types and Structures
• Unstructured - text on the web, audio, videos, images, PDF files, text documents (about 85%)
• Semi-structured - XML, sensor data
• Structured - relational databases and spreadsheets
•Platforms - Enterprise, Social media, Sensors
•Flow of data - Static data vs. streaming data
•Single application generating/collecting many types of data
Unstructured Data
•Information that either:
• Does not have a pre-defined data model, or
• Is not organized in a pre-defined manner
•Hard to maintain context and difficult to know the content
•Requirements for storing unstructured data
• Durable
• Accessible
• Low cost
• Manageable
File
System Metadata
• Filename: HeraPheri
• Created: 16/12/2013
• Last Modified: 17/12/2013
Metadata
•Object Description
• Describes the object
• Specifications
• Usage description
• Access permissions
• Helps identify the object needed
Object
Custom Metadata
• Director: abc
• Producer: dfgh
• Music Director: wert
• Playback singers: ghjl
• Actor: a1, a2, a3
• Release year: 2000
• Type: Comedy
Object
Object = File + Metadata
File Storage
System Metadata
• Filename: HeraPheri
• Created: 16/12/2013
• Last Modified: 17/12/2013
Object Storage
Custom Metadata
• Director: abc
• Producer: dfgh
• Music Director: wert
• Playback singers: ghjl
• Actor: a1, a2, a3
• Release year: 2000
• Type: Comedy
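The "Object = File + Metadata" idea can be shown as a data structure: a flat key, the file bytes, the system metadata a file system would keep, and custom metadata that can be searched. A toy in-memory sketch; `StoredObject` and `ObjectStore` are hypothetical names, not the API of S3, Swift or any other product:

```python
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    """Object = file data + system metadata + custom (user-defined) metadata."""
    key: str                                              # flat, unique identifier
    data: bytes                                           # the file contents
    system_metadata: dict = field(default_factory=dict)   # e.g. created / modified dates
    custom_metadata: dict = field(default_factory=dict)   # e.g. director, release year

class ObjectStore:
    """Toy in-memory object store (hypothetical; not any real product's API)."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}   # flat namespace: no directories

    def put(self, obj: StoredObject) -> None:
        obj.system_metadata.setdefault("size", len(obj.data))  # store-maintained field
        self._objects[obj.key] = obj

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def find(self, criteria: dict) -> list:
        """Return keys of objects whose custom metadata matches every criterion."""
        return [key for key, obj in self._objects.items()
                if all(obj.custom_metadata.get(k) == v for k, v in criteria.items())]

store = ObjectStore()
store.put(StoredObject(
    key="HeraPheri",
    data=b"<video bytes>",
    system_metadata={"created": "16/12/2013", "last_modified": "17/12/2013"},
    custom_metadata={"Director": "abc", "Release year": "2000", "Type": "Comedy"},
))
print(store.find({"Type": "Comedy"}))
```

A plain file system could only locate the film by its name; the custom metadata travels with the object, so the store can answer queries such as "all comedies released in 2000".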
Object Storage - Advantages
• Supports unstructured data
• Descriptive metadata
• Variable-sized data containers
• High performance
• High security
• Location independence
• Distributed
Storage Devices
•Direct Attached Storage - DAS
•Network Attached Storage - NAS
• Expensive, scaling issues, NAS islands
•Storage Area Network - SAN
• Expensive, scaling issues, Redundant Array of Independent Disks (RAID) recovery time
•Software Defined Storage - SDS
SDS Technology
•Uses commodity hardware instead of NAS/SAN
•Storage node = processor + storage
• Each node has its own computation power
•Control plane is separate from the data plane
•Uses commodity storage for the data plane
•Uses servers for the control plane
•Enabling technology
• Hadoop Distributed File System (HDFS)
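In HDFS the NameNode holds the file-system metadata and block locations (the control plane), while DataNodes on commodity servers store the data blocks (the data plane). A sketch of how one file is split and replicated, assuming the common defaults of 128 MB blocks (`dfs.blocksize`) and 3 replicas (`dfs.replication`); both are configurable per cluster:

```python
import math

BLOCK_SIZE_MB = 128   # common HDFS default block size (dfs.blocksize)
REPLICATION = 3       # common HDFS default replication factor (dfs.replication)

def hdfs_footprint(file_size_mb: float, block_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> tuple[int, float]:
    """Return (number of blocks, raw MB consumed across the cluster) for one file.

    The last block only occupies as much disk as the data it holds,
    so raw usage is file size x replication, not blocks x block size.
    """
    blocks = math.ceil(file_size_mb / block_mb)
    return blocks, file_size_mb * replication

blocks, raw_mb = hdfs_footprint(1000)
print(f"{blocks} blocks, {blocks * REPLICATION} block replicas, {raw_mb} MB raw")
```

A 1,000 MB file thus becomes 8 blocks and 24 block replicas spread over the DataNodes, consuming about 3,000 MB of raw disk; losing one node still leaves two copies of every block it held.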
HDFS Architecture
Popular Object Storage Cloud Providers
•Commercial Providers
• Amazon Simple Storage Service (S3)
• Windows Azure Blob Storage
• EMC Atmos
•Open Source Providers
• OpenStack
• Ceph
• Riak
What is Ceph?
•Open-source software
•Software defined storage system
•Unified storage solution
• Block storage, file storage, object storage
•Cost effective - runs on commodity hardware
• Provides enterprise-grade, highly reliable storage
•Easy to consume - client support in the Linux kernel
•Integrated with OpenStack (e.g. Cinder) and Ubuntu
Ceph: Architectural Philosophy
•Distributed storage system
•High-performance system
•Reliable system - no single point of failure
•Massively scalable - exabyte levels
• 1 EB ≈ 1,000 PB ≈ 1 million TB ≈ 1 billion GB (decimal units)
•Fault tolerant - data replication
•Self-managing, wherever possible
Key Features
•Decoupled data and metadata - uses CRUSH
• Files are striped onto predictably named objects
• CRUSH maps objects to storage devices
•Dynamic distributed metadata management
• Dynamic subtree partitioning distributes metadata among metadata servers (MDSs)
•Object-based storage
• Object storage devices (OSDs) handle migration, replication, failure detection and recovery
Source: Weil et al., "Ceph: A Scalable, High-Performance Distributed File System", OSDI 2006
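The key point of CRUSH is that placement is computed rather than looked up: any client that knows the cluster map can work out where an object lives. The sketch below imitates that flow (file → predictably named objects → placement group → OSDs), but it is **not** the CRUSH algorithm. It substitutes simple rendezvous (highest-random-weight) hashing, and the 4 MiB object size, 64 placement groups, OSD names and inode number are all assumptions for illustration:

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024   # assumed stripe unit: 4 MiB per object
PG_NUM = 64                     # hypothetical number of placement groups

def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

def object_names(inode: int, file_size: int) -> list[str]:
    """Stripe a file onto predictably named objects: '<inode hex>.<index as 8 hex digits>'."""
    count = max(1, -(-file_size // OBJECT_SIZE))   # ceiling division, at least one object
    return [f"{inode:x}.{i:08x}" for i in range(count)]

def place(obj_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Map object -> placement group -> OSDs, with no central lookup table.

    Rendezvous hashing stands in for CRUSH here: any client that knows
    the OSD list computes the same placement on its own.
    """
    pg = stable_hash(obj_name) % PG_NUM
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]

osds = [f"osd.{i}" for i in range(6)]
names = object_names(0x10000000000, 10 * 1024 * 1024)   # a 10 MiB file -> 3 objects
for name in names:
    print(name, "->", place(name, osds))
```

Because each object name is derived from the inode and stripe index, a client never needs to ask a metadata server where the data is; it only needs the OSD list. Removing an OSD that does not hold an object leaves that object's placement unchanged, which mirrors CRUSH's goal of moving little data when the cluster changes.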
Ceph Architecture Overview
• Ceph client storage services: File, Block, Object
• Ceph storage cluster
• Linux OS
• Underlying commodity hardware
Concept of Virtualization
•Decoupling of hardware and software
•Abstracts hardware and creates a layer of virtual resources
•Uses a hypervisor for abstraction
•Abstracted resources:
• Can be used and demanded
• Cannot be owned or configured
• Can be sliced, resized, combined and distributed
Traditional Picture
Virtualization Architecture
•An OS assumes complete control of the underlying hardware
•Virtualization preserves this illusion through a Virtual Machine Monitor (VMM)
•The hypervisor or VMM is a software layer that
• Allows multiple VMs to run on a single physical host
• Provides hardware abstraction to each guest OS
• Efficiently multiplexes hardware resources
Virtualization
Layers, top to bottom:
• Apps running inside each VM
• VMs: Guest OS (Linux), Guest OS (NetBSD), Guest OS (Windows)
• Virtual Machine Monitor (VMM) / Hypervisor
• Hardware
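"Efficiently multiplexes hardware resources" can be illustrated with the simplest possible CPU-sharing policy. A toy sketch in which one physical CPU is shared among the three guest VMs in round-robin time slices; the work amounts and slice length are hypothetical, and real hypervisors use far more elaborate schedulers:

```python
from collections import deque

def multiplex(vm_work: dict[str, int], time_slice: int) -> list[tuple[str, int]]:
    """Toy VMM scheduler: share one physical CPU among VMs in round-robin time slices.

    vm_work maps a VM name to the CPU time units it still needs.
    Returns the schedule as (vm, units run) pairs, in order.
    """
    queue = deque(vm_work.items())
    schedule = []
    while queue:
        vm, remaining = queue.popleft()
        run = min(time_slice, remaining)
        schedule.append((vm, run))
        if remaining > run:
            queue.append((vm, remaining - run))  # not finished: back of the queue
    return schedule

print(multiplex({"linux": 5, "netbsd": 2, "windows": 4}, time_slice=2))
```

Each guest OS behaves as if it had the CPU to itself, while the VMM interleaves their execution so no VM starves and the physical CPU stays busy.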
Benefits of Virtualization
•Instant provisioning - fast scalability
•Live migration possible
•Load balancing and consolidation in the data center possible
•Virtual hardware supports legacy OSes efficiently
•Security and fault isolation
Traditional OS
VMM and Guest OS
Pre VT-x and Post VT-x
Pre VT-x
• VMM ring de-privileging of guest OS
• Guest OS aware it is not at Ring 0
Post VT-x
• VMM executes in VMX root mode
• Guest OS de-privileging eliminated
Intel Virtualization Technology Processor Virtualization Extensions and Intel Trusted Execution Technology
Publications
1. An Overview of Data Storage on the Cloud, P. Jain, A. Goel, S. Gupta
In Proceedings of IEEE International Conference on Advanced Research in Engineering and Technology, India, pp. 318-322, 2013.
2. Object Storage as a Service, P. Jain, A. Goel, S. Gupta
International Journal of Innovations & Advancement in Computer Science, Vol. 4, pp. 605-614, 2015.
3. Monitoring Checklist for Ceph Object Storage Infrastructure, P. Jain, A. Goel, S. Gupta
In Proceedings of 5th IFIP International Conference on Computer Science and Its Application, Saida, Algeria, pp. 611-623, 2015.
4. Monitoring the Infrastructure of Riak CS, P. Jain, A. Goel, S. Gupta
In Proceedings of 11th International Multi Conference on Information Processing, Bangalore, India, pp. 137-146, 2015.
5. Requirement Checklist for Infrastructure Monitoring of Swift, P. Jain, A. Goel, S. Gupta
The 2015 International Conference on High Performance Computing & Simulation (HPCS), Amsterdam, Netherlands.
6. IaaS as a Service, A. Datt, A. Goel, S.C. Gupta
In Proceedings of SARC-IRAJ International Conference, New Delhi, India, June 2013, ISBN: 978-81-927147-6-9, pp. 18-23.
7. Comparing Infrastructure Monitoring with CloudStack Compute Services for Cloud Computing Systems, A. Datt, A. Goel, S.C. Gupta
In Proceedings of 10th International Workshop on Databases in Networked Information Systems (DNIS 2015), Japan, LNCS 8999, Springer, pp. 195-212, 2015.
8. Analysis of Infrastructure Monitoring Requirements for OpenStack Nova, A. Datt, A. Goel, S.C. Gupta
In Proceedings of Eleventh International Multi Conference on Communication Networks (ICCN 2015), Bangalore, India, August 21-23, 2015, Vol. 54, ISSN: 1877-0509, pp. 127-136.
9. Monitoring List for Compute Infrastructure in Eucalyptus Cloud, A. Datt, A. Goel, S.C. Gupta
In Proceedings of the 24th IEEE International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprise (WETICE), Cyprus, pp. 69-71, 2015.
10. Infrastructure Monitoring of Compute Cloud, A. Datt, A. Goel, S.C. Gupta
Journal of Advances in Economics and Business Management (AEBM), ISSN: 2394-1545, Vol. 2, Issue 5, pp. 439-444.
11. Cloud Service Orchestration Based Architecture of OpenStack Nova and Swift, P. Jain, A. Datt, A. Goel, S. Gupta
5th International Conference on Advances in Computing, Communications and Informatics, Jaipur, India, September 21-24, 2016.
12. Role of Hadoop in Big Data Analytics, A. Goel et al.
CSI Communications, Vol. 41, Issue 1, April 2017.
13. Session on OpenStack, P. Jain, A. Goel
3-hour session in "Recent Trends in Big Data and Cloud Computing", Indira Gandhi Delhi Technical University for Women (IGDTUW), India, 19th December 2013.
14. Software Defined Storage, S.C. Gupta, A. Goel
Half-day tutorial at the Asia Pacific Software Engineering Conference (APSEC), India, 1st December 2015.
Thank You
Contact: goel.anita@gmail.com
