The rise of “big data”
on cloud computing
Presenter
Muhammad Maaz Irfan
Reg No: 201824100008
School of Information Science and Engineering
University of Jinan, Jinan, Shandong, China
1
Contents
I. Introduction
II. Definition & characteristics of big data
III. Cloud Computing
IV. Relationship between cloud & big data
V. Case studies
VI. Big data storage system
VII. Hadoop background
VIII. Research Challenges
I. Scalability
II. Availability
III. Data integrity
IV. Transformation
V. Data quality
VI. Heterogeneity
VII. Privacy
VIII. Legal issues
IX. Governance
IX. Open Research issues
X. Conclusion
2
Introduction
• The rise of social media, Internet of Things (IoT),
and multimedia, has produced an overwhelming
flow of data in either structured
• Big data are characterized by three aspects:
– Data are numerous
– Data cannot be categorized into regular relational
databases
– Data are generated, captured, and processed rapidly.
Moreover, big data is transforming healthcare,
science, engineering, finance, business, and
eventually, the society.
3
Definition & characteristics of big data
• Big data is a set of techniques and
technologies that require new forms of
integration to uncover large hidden values
from large datasets that are
– Diverse
– Complex
– & In massive scale
4
Four Vs of big data
5
Big
data
Volume
Variety
Velocity
Value
Advantages of big data
6
Cloud Computing 7
8
Cloud Computing
9
• Cloud computing is a fast-growing technology that has
established itself in the next generation of IT industry
and business.
• Cloud computing promises reliable software, hardware,
and IaaS(Infrastructure as a Service) delivered over the
Internet and remote data centers .
• Cloud services have become a powerful architecture
to perform complex large-scale computing tasks and
span a range of IT functions from storage and
computation to database and application services.
Relationship between cloud
& big data
10
Relationship between cloud
& big data
• Cloud computing and big data are conjoined.
• Big data provides users the ability to use commodity
computing to process distributed queries across
multiple datasets and return resultant sets in a timely
manner.
• Cloud computing provides the underlying through the
use of Hadoop, a class of distributed data-processing
platforms.
• Big data utilizes distributed storage technology based
on cloud computing rather than local storage attached
to a computer or electronic device.
11
Cloud computing usage in big data
12
Case studies
• Studies provided by different vendors who
integrate big data technologies into their
cloud environment.
13
Big data storage system
14
Big data storage system
• A storage architecture that can be accessed in a highly efficient
manner while achieving availabilityand reliability is required to store
and manage large datasets.
• Existing technologies can be classified as direct attached storage
(DAS), network attached storage (NAS), and storage area network
(SAN).
• The organizational systems of data storages (DAS, NAS, and SAN)
can be divided into three parts:
– (i) disc array, where the foundation of a storage system provides the
fundamental guarantee
– (ii) connection and network subsystems, which connect one or more
disc arrays and servers
– (iii) storage management software, which oversees data sharing,
storage management, and disaster recovery tasks for multiple servers.
15
background
• Hadoop is an open-source software framework
for storing data and running applications on
clusters of commodity hardware. It provides
massive storage for any kind of data, enormous
processing power and the ability to handle
virtually limitless concurrent tasks or jobs
• The software framework that supports HDFS,
MapReduce and other related entities is called
the project Hadoop or simply Hadoop.
• This is open source and distributed by Apache
16
Features of Hadoop
I. Cost Effective System
II. Large Cluster of Notes
III. Parallel Processing
IV. Distributive Data
V. Automatic failover management
VI. Data Locality optimization
VII.Heterogeneous Cluster
VIII.Scalability
17
Research Challenges
I. Scalability = Scalability is the ability of the storage to handle increasing
amounts of data in an appropriate manner.
II. Availability = Availability refers to the resources of the system accessible on
demand by an authorized individual
III. Data integrity = A key aspect of big data security is integrity. Integrity means
that data can be modified only by authorized parties or the data owner to
prevent misuse.
IV. Transformation = Transforming data into a form suitable for analysis is an
obstacle in the adoption of big data.
V. Data quality = In the past, data processing was typically performed on clean
datasets from well-known and limited sources.
VI. Heterogeneity = Variety, one of the major aspects of big data characterization,
is the result of the growth of virtually unlimited different sources of data.
VII.Privacy = Privacy concerns continue to hamper users who outsource their
private data into the cloud storage.
VIII.Legal issues = Specific laws and regulations must be established to preserve
the personal and sensitive information of users.
IX. Governance = Data governance embodies the exercise of control and
authority over data-related rules of law, transparency. 18
Conclusion
• Two IT initiatives are currently top of mind for
organizations across the globe i.e.
– Big Data Analytics
– Cloud Computing
As a delivery model for IT services , cloud computing has the
potential to enhance business agility and productivity while
enabling greater efficiencies and reducing costs.
• In the current scenario , Big Data is a big challenge for the
organizations . To store and process such large volume of
data , variety of data and velocity of data Hadoop came into
existence.
• Our presentation is all about Cloud Computing , Big Data &
Big Data Analytics.
19
Thanks for your time 
If you any Questions please ask.
20

The rise of big data on cloud computing

  • 1.
    The rise of“big data” on cloud computing Presenter Muhammad Maaz Irfan Reg No: 201824100008 School of Information Science and Engineering University of Jinan, Jinan, Shandong, China 1
  • 2.
    Contents I. Introduction II. Definition& characteristics of big data III. Cloud Computing IV. Relationship between cloud & big data V. Case studies VI. Big data storage system VII. Hadoop background VIII. Research Challenges I. Scalability II. Availability III. Data integrity IV. Transformation V. Data quality VI. Heterogeneity VII. Privacy VIII. Legal issues IX. Governance IX. Open Research issues X. Conclusion 2
  • 3.
    Introduction • The riseof social media, Internet of Things (IoT), and multimedia, has produced an overwhelming flow of data in either structured • Big data are characterized by three aspects: – Data are numerous – Data cannot be categorized into regular relational databases – Data are generated, captured, and processed rapidly. Moreover, big data is transforming healthcare, science, engineering, finance, business, and eventually, the society. 3
  • 4.
    Definition & characteristicsof big data • Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are – Diverse – Complex – & In massive scale 4
  • 5.
    Four Vs ofbig data 5 Big data Volume Variety Velocity Value
  • 6.
  • 7.
  • 8.
  • 9.
    Cloud Computing 9 • Cloudcomputing is a fast-growing technology that has established itself in the next generation of IT industry and business. • Cloud computing promises reliable software, hardware, and IaaS(Infrastructure as a Service) delivered over the Internet and remote data centers . • Cloud services have become a powerful architecture to perform complex large-scale computing tasks and span a range of IT functions from storage and computation to database and application services.
  • 10.
  • 11.
    Relationship between cloud &big data • Cloud computing and big data are conjoined. • Big data provides users the ability to use commodity computing to process distributed queries across multiple datasets and return resultant sets in a timely manner. • Cloud computing provides the underlying through the use of Hadoop, a class of distributed data-processing platforms. • Big data utilizes distributed storage technology based on cloud computing rather than local storage attached to a computer or electronic device. 11
  • 12.
    Cloud computing usagein big data 12
  • 13.
    Case studies • Studiesprovided by different vendors who integrate big data technologies into their cloud environment. 13
  • 14.
  • 15.
    Big data storagesystem • A storage architecture that can be accessed in a highly efficient manner while achieving availabilityand reliability is required to store and manage large datasets. • Existing technologies can be classified as direct attached storage (DAS), network attached storage (NAS), and storage area network (SAN). • The organizational systems of data storages (DAS, NAS, and SAN) can be divided into three parts: – (i) disc array, where the foundation of a storage system provides the fundamental guarantee – (ii) connection and network subsystems, which connect one or more disc arrays and servers – (iii) storage management software, which oversees data sharing, storage management, and disaster recovery tasks for multiple servers. 15
  • 16.
    background • Hadoop isan open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs • The software framework that supports HDFS, MapReduce and other related entities is called the project Hadoop or simply Hadoop. • This is open source and distributed by Apache 16
  • 17.
    Features of Hadoop I.Cost Effective System II. Large Cluster of Notes III. Parallel Processing IV. Distributive Data V. Automatic failover management VI. Data Locality optimization VII.Heterogeneous Cluster VIII.Scalability 17
  • 18.
    Research Challenges I. Scalability= Scalability is the ability of the storage to handle increasing amounts of data in an appropriate manner. II. Availability = Availability refers to the resources of the system accessible on demand by an authorized individual III. Data integrity = A key aspect of big data security is integrity. Integrity means that data can be modified only by authorized parties or the data owner to prevent misuse. IV. Transformation = Transforming data into a form suitable for analysis is an obstacle in the adoption of big data. V. Data quality = In the past, data processing was typically performed on clean datasets from well-known and limited sources. VI. Heterogeneity = Variety, one of the major aspects of big data characterization, is the result of the growth of virtually unlimited different sources of data. VII.Privacy = Privacy concerns continue to hamper users who outsource their private data into the cloud storage. VIII.Legal issues = Specific laws and regulations must be established to preserve the personal and sensitive information of users. IX. Governance = Data governance embodies the exercise of control and authority over data-related rules of law, transparency. 18
  • 19.
    Conclusion • Two ITinitiatives are currently top of mind for organizations across the globe i.e. – Big Data Analytics – Cloud Computing As a delivery model for IT services , cloud computing has the potential to enhance business agility and productivity while enabling greater efficiencies and reducing costs. • In the current scenario , Big Data is a big challenge for the organizations . To store and process such large volume of data , variety of data and velocity of data Hadoop came into existence. • Our presentation is all about Cloud Computing , Big Data & Big Data Analytics. 19
  • 20.
    Thanks for yourtime  If you any Questions please ask. 20