8/29/2015
SECURITY ISSUES ASSOCIATED WITH BIG
DATA IN CLOUD COMPUTING
Seminar Advance Topics One
Submitted By
Md.Mehedi Hassan
Supervisor
Sajjad Waheed
Associate Professor
Dept. of ICT, MBSTU
Outline
 Introduction
 Big Data
 Why Big Data
 Cloud Computing
 How Big Data is Related with Cloud Computing
 Why Choose Big Data as a Thesis Topic
 Introduction to Hadoop
 MapReduce
 Hadoop Distributed File System(HDFS)
 Application
 Advantages of Big Data
 Alternative of Big Data
 Security Issue of Big Data
 Motivation and Related Work
 Issues and Challenges
 The Proposed Approaches
 Conclusions
Introduction
 To analyze complex data and identify patterns, it is essential to securely store, manage, and share large amounts of complex data (big data).
 Big data applications benefit organizations, businesses, and industries of every scale.
 Cloud resources are needed to support big data storage and projects, and big data is a strong business case for moving to the cloud.
 The main focus is on the security issues in cloud computing that are associated with big data.
Big Data
 Big Data is the term used to describe volumes of structured and unstructured data so large that they are very difficult to process using traditional databases and software technologies.
 Big Data Source :
Big Data
 Volume
 Many factors contribute to increasing volume: stored transactions, live streaming, data collected from sensors, etc.
 Variety
 Structured: relational data.
 Semi-structured: XML data.
 Unstructured: Word, PDF, text, media logs.
 Velocity
 Velocity is the pace at which data flows in from sources and human interaction.
Why Big Data
 Speed, Capacity and Scalability of Cloud Storage
 End Users Can Visualize Data
 Manage Data Better
 Company Can Find New Business Opportunities
 Data Analysis Methods and Capabilities Will Evolve
Cloud Computing
 Cloud Computing is a technology that relies on sharing computing resources rather than having local servers or personal devices handle applications.
 In Cloud Computing, the word “Cloud” stands for “the Internet”, so Cloud Computing is a type of computing in which services are delivered through the Internet.
How Big Data is Related with Cloud Computing
 Cloud computing is a powerful technology for performing massive-scale, complex computing.
 It eliminates the need to maintain expensive computing hardware, dedicated space, and software.
 Big Data needs large on-demand compute power and distributed storage to handle the 3V (volume, variety, velocity) data problem, and the cloud provides this elastic, on-demand capacity.
Why Choose Big Data as a Thesis Topic
 As a software developer, I have handled large volumes of banking transaction data.
 I have observed how long it takes to execute a particular SELECT statement or analytical SQL query on that data.
 The system is very slow when all branches are processing in parallel.
 Big Data concepts can overcome this problem.
 Big Data is already used by Facebook, Google, IBM, etc.
 Hadoop is open source.
 For these reasons I chose Big Data as my topic.
Introduction to Hadoop
 Hadoop: an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
 Named after a toy elephant that belonged to Doug Cutting's son.
 Hadoop architecture: two major layers
 Processing layer: MapReduce (distributed computation)
 Storage layer: Hadoop Distributed File System, HDFS (distributed storage)
 Also shown in the architecture diagram: the YARN framework and Common utilities
Introduction to Hadoop (cont.)
 How Hadoop works
 Hadoop runs its core tasks across a cluster of computers.
 Data is first divided into directories and files.
 The files are then distributed across various cluster nodes.
 HDFS supervises the processing.
 Blocks are replicated to handle hardware failure.
 The sort that takes place between the map and reduce stages is performed.
 The sorted data is sent to a certain computer.
 Advantages
 Low-cost alternative to building bigger servers
 Fault tolerance and high availability
 Dynamic clustering
 Automatic data distribution; open source
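The distribution and replication steps above can be illustrated with a small single-process sketch. It assumes a simple round-robin placement and made-up node names (`node-a` to `node-d`); the function names are hypothetical and this is not how Hadoop itself chooses nodes.

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin is used only for illustration.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_blocks(placement, failed_node):
    """Return the ids of blocks that still have a replica after one node fails."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


# Hypothetical 1000-byte file, 256-byte blocks, 4 nodes, 3 replicas per block.
data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"], replication=3)
still_readable = surviving_blocks(placement, failed_node="node-a")
```

Because every block lives on three different nodes, losing any single node leaves all blocks readable, which is the fault-tolerance property listed under the advantages.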
MapReduce
 What is MapReduce: a processing technique and programming model for distributed computing, implemented in Java in Hadoop.
 Mapper
 Shuffle
 Reducer
 Java based
 Key-value pairs
MapReduce Example
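A minimal single-process sketch of the mapper, shuffle, and reducer phases, assuming a word-count job over two made-up input lines. It only imitates the data flow of key-value pairs; it does not use Hadoop's Java API, and the function names are illustrative.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values by key and sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Reduce phase: collapse the list of counts for one word into a total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = run_job(["big data in the cloud", "big data security"])
```

In a real cluster the mapper and reducer calls run on different nodes and the shuffle moves data over the network, which is why internode communication appears among the security concerns.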
Hadoop Distributed File System(HDFS)
 HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
 Features
 Distributed storage and processing
 Name Node
 Data Node
 Interface in Hadoop
 Streaming access
 Cluster status check
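Hadoop Distributed File System (cont.)
HDFS architecture (diagram):
 Name Node: holds the metadata (name, replicas, …), e.g. /home/foo/data, 3, …
 Client: sends metadata ops to the Name Node, and reads and writes blocks on the Datanodes
 Name Node: sends block ops to the Datanodes
 Datanodes (Rack 1, Rack 2): store the blocks, which are replicated across Datanodes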
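The division of labour in the diagram, where the Name Node keeps only metadata while Datanodes hold the blocks, can be sketched as follows. The class, the datanode names (`dn1` to `dn4`), and the placement rule that alternates between racks are all assumptions made for illustration; HDFS's real placement policy is more involved.

```python
from dataclasses import dataclass, field
from itertools import chain, zip_longest


@dataclass
class NameNode:
    """Toy metadata service: knows which blocks form a file and where replicas live."""
    racks: dict                                     # rack name -> list of datanode names
    files: dict = field(default_factory=dict)       # path -> list of block ids
    locations: dict = field(default_factory=dict)   # block id -> list of datanodes

    def _interleaved_nodes(self):
        # Alternate between racks so consecutive picks land on different racks.
        columns = zip_longest(*self.racks.values())
        return [node for node in chain.from_iterable(columns) if node is not None]

    def add_file(self, path, num_blocks, replication=3):
        """Metadata op: register a file and choose datanodes for each block."""
        nodes = self._interleaved_nodes()
        if replication > len(nodes):
            raise ValueError("replication factor exceeds number of datanodes")
        start = len(self.locations)
        block_ids = list(range(start, start + num_blocks))
        for block_id in block_ids:
            self.locations[block_id] = [
                nodes[(block_id + r) % len(nodes)] for r in range(replication)
            ]
        self.files[path] = block_ids

    def lookup(self, path):
        """Metadata op: tell a client which datanodes to contact for each block."""
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


name_node = NameNode(racks={"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]})
name_node.add_file("/home/foo/data", num_blocks=2, replication=3)
block_map = name_node.lookup("/home/foo/data")
```

The client would then read or write each block directly on the listed Datanodes, so file contents never pass through the Name Node.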
Application
 Homeland Security
 Smarter Healthcare
 Multi-channel Sales
 Telecom
 Manufacturing
 Traffic Control
 Trading Analytics
 Search Quality
Advantages of Big Data
 Cost reduction
 Faster, better decision making
 New products and services
 Perform risk analysis
Alternative of Big Data
 Apache Spark (less secure than Hadoop)
 Cluster MapReduce (slower and less secure than Hadoop)
Issues and Challenges
 Network level
 Distributed Nodes
 Distributed Data
 Internode Communication
 Authentication level
 Data Protection
 Administrative Rights for Nodes
 Authentication of Applications and Nodes
 Logging
 Data level
 Confidentiality
 Integrity
 Availability
 Generic types
 Traditional Security Tools
 Use of Different Technologies
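Of the data-level concerns listed above, integrity is the easiest to illustrate. The sketch below assumes SHA-256 as the digest and uses hypothetical helper names; it stores a checksum next to each block and later reports the blocks whose content no longer matches.

```python
import hashlib


def checksum(block):
    """Return the SHA-256 hex digest of one block of bytes."""
    return hashlib.sha256(block).hexdigest()


def store(blocks):
    """Pair every block with the checksum computed at write time."""
    return [(block, checksum(block)) for block in blocks]


def verify(stored):
    """Return the indices of blocks whose content no longer matches its checksum."""
    return [
        index
        for index, (block, digest) in enumerate(stored)
        if checksum(block) != digest
    ]


stored = store([b"alpha", b"beta"])
clean_report = verify(stored)

# Simulate corruption of the second block while its recorded checksum stays the same.
stored[1] = (b"bet4", stored[1][1])
corrupt_report = verify(stored)
```

A plain checksum catches accidental corruption; an attacker who can rewrite both the block and its checksum would need to be stopped by a keyed construction or by storing the digests somewhere trusted.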
The Proposed Approaches
 File Encryption
 Network Encryption
 Logging
 Software Format and Node Maintenance
 Nodes Authentication
 Rigorous System Testing of Map Reduce Jobs
 Honeypot Nodes
 Layered Framework for Assuring Cloud
 Third Party Secure Data Publication to Cloud
 Access Control
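As one way to picture the node authentication item, the sketch below uses an HMAC challenge-response with a pre-shared key. This is an assumption chosen for illustration, with hypothetical function and node names; production Hadoop clusters typically rely on Kerberos for authentication instead.

```python
import hashlib
import hmac
import secrets


def issue_challenge():
    """Master side: create a fresh random challenge for a node that wants to join."""
    return secrets.token_bytes(16)


def sign_challenge(shared_key, node_id, challenge):
    """Node side: prove knowledge of the shared key by signing id + challenge."""
    return hmac.new(shared_key, node_id.encode() + challenge, hashlib.sha256).digest()


def verify_node(shared_key, node_id, challenge, response):
    """Master side: recompute the signature and compare in constant time."""
    expected = sign_challenge(shared_key, node_id, challenge)
    return hmac.compare_digest(expected, response)


shared_key = secrets.token_bytes(32)   # assumed to be provisioned out of band
challenge = issue_challenge()
response = sign_challenge(shared_key, "datanode-1", challenge)

accepted = verify_node(shared_key, "datanode-1", challenge, response)
wrong_identity = verify_node(shared_key, "datanode-2", challenge, response)
wrong_key = verify_node(secrets.token_bytes(32), "datanode-1", challenge, response)
```

A fresh challenge per attempt prevents a captured response from being replayed, and binding the node id into the signature stops one node from reusing another node's response.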
Conclusions
 I have highlighted the main advantages and applications of Big Data with cloud computing.
 I have summarized the security issues associated with big data in cloud computing.
 I propose that cloud environments can be secured for complex business operations.
 I propose approaches for Big Data security.
Future Works
 Implement a data chaptering algorithm with data security
 Secure, confidential data flow from Hadoop to the cloud
Q & A
