SECURITY ISSUES IN BIG DATA
Shallote Dsouza
WHAT IS BIG DATA?
Big data refers to data that is so large and complex that it exceeds the processing
capability of conventional data management systems and software techniques.
Data becomes big data when individual records stop mattering and only a large
collection of them, or the analysis derived from it, is of value.
Big data offers many opportunities: advancement of science, improvement of health care,
promotion of economic growth, enhancement of the education system and more ways
of social interaction and entertainment.
But big data also raises security and privacy issues because of its huge volume,
high velocity and large variety of data sources and formats.
DIMENSIONS OF BIG DATA
Big Data possesses characteristics that can be defined by several V’s
Volume
Refers to the quantity of data. Big data is defined as massive data sets measured in units
such as petabytes and zettabytes. Vast amounts of data are generated every second, today
by machines, networks and human interaction on systems like social media. The volume of
data to be analysed is massive.
Velocity
Deals with the accelerating speed at which data flows in from sources such as business
processes, machines, networks, social media sites and mobile devices. The flow of
data is continuous, and reacting quickly enough to deal with data velocity is a challenge
for most organizations.
Variety
Refers to the various formats of data. Structured, numeric data in traditional databases.
Unstructured text documents, email, video, audio, stock ticker data and financial
Veracity
Refers to the quality of big data: biases, noise, abnormalities, immeasurable
uncertainties, and the truthfulness and trustworthiness of the data. Data that are erroneous,
duplicated, incomplete or outdated are collectively referred to as dirty data.
Valence
Refers to the connectedness of big data in the form of graphs, by analogy with the bonds
between atoms. Data items are often directly connected to one another, as a city is connected
to its country or as two Facebook users are connected when they are friends. A high-valence
data set is denser.
Value
Refers to how big data benefits us and our organization. It measures the usefulness
of data in decision making: queries can be run on the stored data to deduce important
results and gain insights.
TOOLS FOR BIG DATA
Big Data storage and management tools
Hadoop- Provides a software framework for distributed storage and processing of big
data using the MapReduce programming model.
Cassandra- Used for fast processing under very heavy write and read loads, where the
stored data is too large to fit on a single server but a friendly, familiar interface is
still wanted.
MongoDB- Used for dynamic queries and for defining indexes that give good performance on a
big database, which makes applications faster and more efficient at scale.
Apache Hive- Used for analysis of large datasets stored in HDFS, including data
summarization, querying and ad-hoc analysis of structured and semi-structured data
in Hadoop.
HBase- Used for real-time big data applications whose tables contain billions of rows and
millions of columns and are built for low-latency operations.
Cloudera- A Hadoop distribution, described by its vendor as 100% open source, that offers
batch processing, interactive SQL and interactive search as well as enterprise-grade
continuous availability.
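The MapReduce programming model mentioned for Hadoop can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are invented for this illustration, and everything runs sequentially in plain Python only to show how data flows from map to shuffle to reduce.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # In a real cluster the map calls run in parallel on different nodes;
    # here they run one after another purely to show the data flow.
    emitted = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data flows fast"]))
```

Because each map call only sees its own split and each reduce call only sees one key, both phases can be spread across many machines, which is what makes the model suitable for very large data sets.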
TYPICAL BIG DATA ARCHITECTURE
Big data architecture varies based on a company's infrastructure and needs, but it usually
contains the following components:
1. Data sources: These can include databases, real-time sources, and static files
generated by applications, such as Windows logs.
2. Data store: Storage is needed for the data that will be processed by the big data
architecture. Often, data is stored in a data lake, a large repository that holds data
in its raw, often unstructured, form and scales easily.
3. A combination of batch processing and real-time processing: Large volumes of data
can be handled efficiently using batch processing, while real-time data
needs to be processed immediately to bring value.
4. Analytical data store: Keeps all the data in one place so that analysis can be
comprehensive, and is optimized for analysis rather than transactions. This might
take the form of a cloud-based data warehouse or a relational database.
5. Automation: Ingesting and transforming the data, moving it in batch and stream
processes, loading it into an analytical data store, and finally deriving insights must
form a repeatable workflow so that you can continually gain insights from your big data.
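The difference between the batch and real-time processing in component 3 can be shown with a small sketch. It assumes a hypothetical event format (a dict with a "value" key) and computes the same average in both styles: once over a complete stored data set, and once incrementally as events arrive.

```python
from typing import Dict, List

Event = Dict[str, float]  # hypothetical event shape: {"value": <number>}


def batch_average(events: List[Event]) -> float:
    """Batch processing: the whole stored data set is available at once."""
    return sum(e["value"] for e in events) / len(events)


class RunningAverage:
    """Real-time processing: update the result as each event arrives."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, event: Event) -> float:
        # Incremental mean: no earlier events need to be kept in memory.
        self.count += 1
        self.mean += (event["value"] - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    events = [{"value": 10.0}, {"value": 20.0}, {"value": 30.0}]
    print(batch_average(events))
    stream = RunningAverage()
    for event in events:
        print(stream.update(event))
```

The batch version is simpler but only produces a result after all data has been collected; the streaming version gives an up-to-date answer after every event.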
GENERAL BIG DATA SECURITY ISSUES
Insecure Computation
Attackers use malicious programs to extract sensitive information from data
sources. Such programs can also corrupt the data, leading to incorrect results in
prediction or analysis, and can result in Denial of Service (DoS).
Input Validation and Filtering
Big data collects inputs from multiple sources, so input validation is required. This
involves validating that data sources are trusted and filtering malicious data from good data.
The continuous flow of gigabytes and terabytes of data makes it very difficult
to perform input validation or data filtering on each incoming batch.
Privacy Concerns in Data Mining and Analytics
Monetization of big data involves sharing analytical results, which brings
challenges such as invasion of privacy, invasive marketing and unintentional disclosure of
information. One example: AOL Inc. released search logs from which
users could be identified easily.
Granular Access Controls
Big data systems were traditionally designed with almost no security in mind. As a
workaround, the parts of the data sets that users have the right to see are copied to a
separate big data warehouse and provided to particular user groups. For medical research,
for instance, only the medical information (without names and addresses) gets copied.
Volumes of big data grow even faster this way, and such complex solutions adversely affect
the system’s performance and maintenance.
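The medical research case, where only the medical information is copied for a user group, can be sketched as a simple projection. The field names in IDENTIFYING_FIELDS and the function research_view are hypothetical; a real data set would define its own list of identifying fields.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical field names chosen for this illustration.
IDENTIFYING_FIELDS = frozenset({"name", "address"})


def research_view(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the records without the identifying fields."""
    return [
        {key: value for key, value in record.items()
         if key not in IDENTIFYING_FIELDS}
        for record in records
    ]


if __name__ == "__main__":
    patients = [
        {"name": "A. Patient", "address": "1 Main St",
         "diagnosis": "flu", "age": 40},
    ]
    print(research_view(patients))
```

Removing direct identifiers like this does not by itself guarantee anonymity, since the remaining fields may still identify a person in combination.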
Insecure data storage
Authentication, authorization and encryption of data across thousands of nodes is
challenging work. Auto-tiering moves cold data, which might still be of use, to less secure
media. Encryption of real-time data may also affect performance. Secure
communication among the various nodes, middleware and end users is disabled by
default, so it needs to be enabled explicitly.
SECURITY ISSUES IN BIG DATA – SOME RELEVANT USE CASES
Vulnerability to fake data generation
For instance, if a manufacturing company uses sensor data to detect malfunctioning
production processes, cybercriminals can penetrate the system and make the sensors
show fake results. The company can then fail to notice alarming trends and miss the
opportunity to solve problems before serious damage is caused. Such challenges can be
addressed by applying a fraud detection approach.
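The slides do not say which fraud detection approach is meant. As one possible illustration, the sketch below flags sensor readings that sit far from the median, using the median absolute deviation (MAD). The function name suspicious_readings and the threshold of 3.5 are assumptions made for this example.

```python
from statistics import median
from typing import List


def suspicious_readings(readings: List[float], threshold: float = 3.5) -> List[int]:
    """Return the indexes of readings that sit far from the median.

    Uses the median absolute deviation (MAD), which a few extreme
    values cannot shift much. The 3.5 threshold is an assumed default.
    """
    centre = median(readings)
    mad = median([abs(r - centre) for r in readings])
    if mad == 0:
        # No spread at all: anything that differs from the median stands out.
        return [i for i, r in enumerate(readings) if r != centre]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [
        i for i, r in enumerate(readings)
        if 0.6745 * abs(r - centre) / mad > threshold
    ]


if __name__ == "__main__":
    temperatures = [20.1, 19.9, 20.0, 20.2, 19.8, 35.0, 20.1]
    print(suspicious_readings(temperatures))
```

A check like this only catches values that look implausible. Fake data crafted to stay within the normal range would need other defences, such as authenticating the sensors themselves.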
Amazon’s Galaxy Data Lakes
Challenges faced by Amazon: data silos, difficulty analyzing diverse datasets, and managing
data access and security.
1. A data silo is a situation in which only one group in an organization can access a set of
data. As a company expands internationally, data is stored in different places and in
different ways, which keeps important data hidden. A data lake solves this problem by
uniting all the data in one central location.
2. Amazon Prime has data for fulfilment centres and packaged goods, while Amazon
Fresh has data for grocery stores and food. Even shipping programs differ
internationally; for example, different countries sometimes have different box sizes
and shapes. Different systems may also hold the same type of information under
different labels: in Europe the term used is “cost per unit,” but in
North America it is “cost per package.”
Data lakes allow you to import any amount of data in any format because there is no
predefined schema.
3. Amazon’s operations finance data are spread across more than 25 databases, with
regional teams creating their own local versions of datasets. Audits and controls must
be in place for each database to ensure that nobody has improper access. With a data
lake, it’s easier to get the right data to the right people at the right time.
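The labelling problem in point 2 can be handled when data is read from the lake rather than when it is stored. The sketch below maps the two regional labels from the example onto one shared field name. The target name unit_cost and the FIELD_ALIASES table are hypothetical, and the sketch takes the slide's statement that both labels describe the same type of information at face value.

```python
from typing import Any, Dict

# Hypothetical mapping from regional labels to one shared field name.
FIELD_ALIASES = {
    "cost per unit": "unit_cost",
    "cost per package": "unit_cost",
}


def normalise(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known regional labels; keep unknown fields unchanged."""
    return {FIELD_ALIASES.get(key.lower(), key): value
            for key, value in record.items()}


if __name__ == "__main__":
    print(normalise({"Cost per unit": 2.5, "region": "EU"}))
    print(normalise({"cost per package": 3.0, "region": "NA"}))
```

Because the raw records stay untouched in the lake, the mapping can be corrected or extended later without re-importing the data.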
Possibility of sensitive information mining
Lack of control within big data solutions may let corrupt IT specialists or
unscrupulous business rivals mine unprotected data and sell it for their own benefit.
Companies can incur huge losses if such information concerns a new
product or service launch, or users’ personal information. An employee
in charge of the big data store can also misuse that power and violate
privacy policies, for example by stalking people through their chats. To
avoid this, proper security tools should be in place and access controls should
be applied strictly at different levels in the organization.
High speed of NoSQL databases’ evolution and lack of security focus
NoSQL databases handle many challenges of big data analytics but pay little attention
to security: it is usually embedded only in the middleware, and the database itself
provides no explicit security enforcement. NoSQL databases often have weak authentication
techniques and weak password storage mechanisms. They are subject to attacks such as JSON
injection, REST injection, man-in-the-middle attacks and schema injection. Lenient
security mechanisms also expose them to insider attacks. To
avoid this, the following should be done:
1. Encrypting sensitive database fields
2. Keeping unencrypted values in a sandboxed environment
3. Using sufficient input validation
4. Applying strong user authentication policies
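Two of the measures above, input validation and stronger password storage, can be sketched with the Python standard library alone. The sketch assumes a document store that accepts JSON-style query filters, where a dict such as {"$ne": None} supplied in place of a string would change the meaning of a query. The function names and the iteration count are assumptions made for this illustration, not the API of any particular database.

```python
import hashlib
import hmac
import os
from typing import Any, Dict, Optional, Tuple

PBKDF2_ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def is_plain_scalar(value: Any) -> bool:
    """Accept only simple values, never dicts or lists that could carry operators."""
    return isinstance(value, (str, int, float, bool))


def build_user_filter(username: Any) -> Dict[str, Any]:
    """Build a query filter, rejecting anything that is not a plain scalar."""
    if not is_plain_scalar(username):
        raise ValueError("query values must be plain scalars")
    return {"username": username}


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted, deliberately slow hash instead of storing the password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    print(build_user_filter("alice"))
    salt, digest = hash_password("s3cret")
    print(verify_password("s3cret", salt, digest))
```

Rejecting non-scalar values at the application boundary is a coarse rule. It is shown here because it blocks operator injection without needing any knowledge of the database's query language.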
RECOMMENDATIONS TO ENHANCE BIG DATA SECURITY
Secure Your Computation Code
To prevent malicious data entry, implement access control, code signing and dynamic
analysis of the computational code. Strategies are also needed to contain the
impact of untrusted code that has managed to get into the big data solution.
There are generally two ways of preventing attacks: securing the data when an insecure
mapper is present, and securing the mapper itself.
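Securing the mapper through code signing can be sketched as a check that runs before any mapper code is executed. Real code signing normally uses asymmetric signatures; this simplified sketch swaps in an HMAC with a shared secret key, and the names sign_code and is_trusted are invented for the illustration.

```python
import hashlib
import hmac


def sign_code(source: bytes, key: bytes) -> str:
    """Produce a keyed signature over the mapper's source bytes."""
    return hmac.new(key, source, hashlib.sha256).hexdigest()


def is_trusted(source: bytes, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_code(source, key), signature)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # placeholder secret for the demo
    mapper_source = b"def mapper(line): return line.split()"
    signature = sign_code(mapper_source, key)

    print(is_trusted(mapper_source, signature, key))
    # A single changed byte invalidates the signature.
    print(is_trusted(mapper_source + b" ", signature, key))
```

A job scheduler would refuse to run any mapper for which is_trusted returns False, so tampered or unreviewed code never reaches the data.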
Implement Comprehensive Input Validation and Filtering.
As a security practice, input validation and filtering should be implemented on both
internal and external sources. The key input validation and filtering features also need
to be evaluated properly.
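A minimal sketch of validation and filtering on incoming records follows. The record layout, the trusted source names and the accepted value range are all hypothetical. Records are filtered lazily, so a continuous flow never has to be held in memory as a whole.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical allowlist and plausible value range for this illustration.
TRUSTED_SOURCES = frozenset({"sensor-gateway", "billing-db"})
MIN_VALUE, MAX_VALUE = 0, 1000


def is_valid(record: Dict[str, Any]) -> bool:
    """Check that a record comes from a trusted source and is well formed."""
    source = record.get("source")
    if not isinstance(source, str) or source not in TRUSTED_SOURCES:
        return False
    value = record.get("value")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # NaN fails both comparisons and is therefore rejected as well.
    return MIN_VALUE <= value <= MAX_VALUE


def filter_stream(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only the valid records, one at a time."""
    return (record for record in records if is_valid(record))


if __name__ == "__main__":
    incoming = [
        {"source": "sensor-gateway", "value": 42},
        {"source": "unknown-host", "value": 42},
        {"source": "billing-db", "value": "42; DROP TABLE"},
    ]
    print(list(filter_stream(incoming)))
```

Using an allowlist of sources and an explicit type and range check means anything unexpected is dropped by default rather than passed on to the analysis.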
Implement Granular Access Control.
Defining and enforcing roles for the different kinds of users, such as admins,
knowledge workers, end users and developers, is the core of
implementing granular access control.
Use policy to define which sudo sessions are keystroke-logged, based on risk
and user. Implement granular assignments for who can switch sessions ("su")
and audit privileged activity.
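Role definitions like these can be expressed as a small permission table with a deny-by-default check. The roles follow the kinds of users named above, while the action names and the table ROLE_PERMISSIONS are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical permissions per role, for illustration only.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "admin": frozenset({"read", "write", "manage_users"}),
    "developer": frozenset({"read", "write"}),
    "knowledge_worker": frozenset({"read"}),
    "end_user": frozenset({"read_reports"}),
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(is_allowed("developer", "write"))
    print(is_allowed("end_user", "write"))
    print(is_allowed("contractor", "read"))
```

Keeping the table in one place makes it easy to audit who can do what, and the default of refusing unknown roles avoids granting access by accident.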
Secure data storage and computation.
This is important because a large share of sensitive data leaks occurs in
this phase. Sensitive data should therefore be segregated. Enabling data
encryption for sensitive data and auditing administrative access on data nodes
are major steps.
The final step for secure data storage and computation is verifying that the
API security of all components is configured properly.
CONCLUSION
Big data is trending. No new application can be imagined that does not produce
new forms of data, operate on data-driven algorithms and consume
a certain amount of data.
As data storage and computing environments become cheaper,
encryption and compliance have introduced challenges that need to
be handled in a very systematic manner.
A large ecosystem of tools exists for specific big data problems. The major
recommendations for dealing with the security issues are implementation of
data lakes, access controls, validation, filtering, and securing data storage and
computation.
THANK YOU

More Related Content

What's hot

Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Rajesh Kumar
 
Top ten big data security and privacy challenges
Top ten big data security and privacy challengesTop ten big data security and privacy challenges
Top ten big data security and privacy challengesBee_Ware
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringHadi Fadlallah
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Wired and Wireless Network Forensics
Wired and Wireless Network ForensicsWired and Wireless Network Forensics
Wired and Wireless Network ForensicsSavvius, Inc
 
Migrating Your Databases to AWS - Tools and Services.pdf
Migrating Your Databases to AWS -  Tools and Services.pdfMigrating Your Databases to AWS -  Tools and Services.pdf
Migrating Your Databases to AWS - Tools and Services.pdfAmazon Web Services
 
Role of Forensic Triage In Cyber Security Trends 2021
Role of Forensic Triage In Cyber Security Trends 2021Role of Forensic Triage In Cyber Security Trends 2021
Role of Forensic Triage In Cyber Security Trends 2021Amrit Chhetri
 
Demystifying Healthcare Data Governance
Demystifying Healthcare Data GovernanceDemystifying Healthcare Data Governance
Demystifying Healthcare Data GovernanceHealth Catalyst
 
Data Services Marketplace
Data Services MarketplaceData Services Marketplace
Data Services MarketplaceDenodo
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics amorshed
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 

What's hot (20)

Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Top ten big data security and privacy challenges
Top ten big data security and privacy challengesTop ten big data security and privacy challenges
Top ten big data security and privacy challenges
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Microsoft Azure Overview
Microsoft Azure OverviewMicrosoft Azure Overview
Microsoft Azure Overview
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Wired and Wireless Network Forensics
Wired and Wireless Network ForensicsWired and Wireless Network Forensics
Wired and Wireless Network Forensics
 
Information classification
Information classificationInformation classification
Information classification
 
Migrating Your Databases to AWS - Tools and Services.pdf
Migrating Your Databases to AWS -  Tools and Services.pdfMigrating Your Databases to AWS -  Tools and Services.pdf
Migrating Your Databases to AWS - Tools and Services.pdf
 
Role of Forensic Triage In Cyber Security Trends 2021
Role of Forensic Triage In Cyber Security Trends 2021Role of Forensic Triage In Cyber Security Trends 2021
Role of Forensic Triage In Cyber Security Trends 2021
 
Demystifying Healthcare Data Governance
Demystifying Healthcare Data GovernanceDemystifying Healthcare Data Governance
Demystifying Healthcare Data Governance
 
Big data
Big dataBig data
Big data
 
Data Services Marketplace
Data Services MarketplaceData Services Marketplace
Data Services Marketplace
 
Big Data
Big DataBig Data
Big Data
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Alteryx Presentation
Alteryx PresentationAlteryx Presentation
Alteryx Presentation
 
Business Intelligence (BI) and Data Management Basics
Business Intelligence (BI) and Data Management  Basics Business Intelligence (BI) and Data Management  Basics
Business Intelligence (BI) and Data Management Basics
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Data Security Explained
Data Security ExplainedData Security Explained
Data Security Explained
 

Similar to Security issues in big data

Similar to Security issues in big data (20)

Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
1
11
1
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
Unit 1
Unit 1Unit 1
Unit 1
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
All About Big Data
All About Big Data All About Big Data
All About Big Data
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Big data security
Big data securityBig data security
Big data security
 
Big data security
Big data securityBig data security
Big data security
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
Unit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfUnit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdf
 
BD1.pptx
BD1.pptxBD1.pptx
BD1.pptx
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 

Recently uploaded

Management and managerial skills training manual.pdf
Management and managerial skills training manual.pdfManagement and managerial skills training manual.pdf
Management and managerial skills training manual.pdffillmonipdc
 
Farmer Representative Organization in Lucknow | Rashtriya Kisan Manch
Farmer Representative Organization in Lucknow | Rashtriya Kisan ManchFarmer Representative Organization in Lucknow | Rashtriya Kisan Manch
Farmer Representative Organization in Lucknow | Rashtriya Kisan ManchRashtriya Kisan Manch
 
Board Diversity Initiaive Launch Presentation
Board Diversity Initiaive Launch PresentationBoard Diversity Initiaive Launch Presentation
Board Diversity Initiaive Launch Presentationcraig524401
 
Introduction to LPC - Facility Design And Re-Engineering
Introduction to LPC - Facility Design And Re-EngineeringIntroduction to LPC - Facility Design And Re-Engineering
Introduction to LPC - Facility Design And Re-Engineeringthomas851723
 
Unlocking Productivity and Personal Growth through the Importance-Urgency Matrix
Unlocking Productivity and Personal Growth through the Importance-Urgency MatrixUnlocking Productivity and Personal Growth through the Importance-Urgency Matrix
Unlocking Productivity and Personal Growth through the Importance-Urgency MatrixCIToolkit
 
Simplifying Complexity: How the Four-Field Matrix Reshapes Thinking
Simplifying Complexity: How the Four-Field Matrix Reshapes ThinkingSimplifying Complexity: How the Four-Field Matrix Reshapes Thinking
Simplifying Complexity: How the Four-Field Matrix Reshapes ThinkingCIToolkit
 
LPC Warehouse Management System For Clients In The Business Sector
LPC Warehouse Management System For Clients In The Business SectorLPC Warehouse Management System For Clients In The Business Sector
LPC Warehouse Management System For Clients In The Business Sectorthomas851723
 
Reflecting, turning experience into insight
Reflecting, turning experience into insightReflecting, turning experience into insight
Reflecting, turning experience into insightWayne Abrahams
 
Fifteenth Finance Commission Presentation
Fifteenth Finance Commission PresentationFifteenth Finance Commission Presentation
Fifteenth Finance Commission Presentationmintusiprd
 
Beyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why Diagram
Beyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why DiagramBeyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why Diagram
Beyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why DiagramCIToolkit
 
Motivational theories an leadership skills
Motivational theories an leadership skillsMotivational theories an leadership skills
Motivational theories an leadership skillskristinalimarenko7
 
LPC Operations Review PowerPoint | Operations Review
LPC Operations Review PowerPoint | Operations ReviewLPC Operations Review PowerPoint | Operations Review
LPC Operations Review PowerPoint | Operations Reviewthomas851723
 
Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)
Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)
Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)jennyeacort
 
原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证
原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证
原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证jdkhjh
 
Measuring True Process Yield using Robust Yield Metrics
Measuring True Process Yield using Robust Yield MetricsMeasuring True Process Yield using Robust Yield Metrics
Measuring True Process Yield using Robust Yield MetricsCIToolkit
 
How-How Diagram: A Practical Approach to Problem Resolution
How-How Diagram: A Practical Approach to Problem ResolutionHow-How Diagram: A Practical Approach to Problem Resolution
How-How Diagram: A Practical Approach to Problem ResolutionCIToolkit
 
From Goals to Actions: Uncovering the Key Components of Improvement Roadmaps
From Goals to Actions: Uncovering the Key Components of Improvement RoadmapsFrom Goals to Actions: Uncovering the Key Components of Improvement Roadmaps
From Goals to Actions: Uncovering the Key Components of Improvement RoadmapsCIToolkit
 

Recently uploaded (18)

Management and managerial skills training manual.pdf
Management and managerial skills training manual.pdfManagement and managerial skills training manual.pdf
Management and managerial skills training manual.pdf
 
Farmer Representative Organization in Lucknow | Rashtriya Kisan Manch
Farmer Representative Organization in Lucknow | Rashtriya Kisan ManchFarmer Representative Organization in Lucknow | Rashtriya Kisan Manch
Farmer Representative Organization in Lucknow | Rashtriya Kisan Manch
 
Board Diversity Initiaive Launch Presentation
Board Diversity Initiaive Launch PresentationBoard Diversity Initiaive Launch Presentation
Board Diversity Initiaive Launch Presentation
 
Introduction to LPC - Facility Design And Re-Engineering
Introduction to LPC - Facility Design And Re-EngineeringIntroduction to LPC - Facility Design And Re-Engineering
Introduction to LPC - Facility Design And Re-Engineering
 
Unlocking Productivity and Personal Growth through the Importance-Urgency Matrix
Unlocking Productivity and Personal Growth through the Importance-Urgency MatrixUnlocking Productivity and Personal Growth through the Importance-Urgency Matrix
Unlocking Productivity and Personal Growth through the Importance-Urgency Matrix
 
Simplifying Complexity: How the Four-Field Matrix Reshapes Thinking
Simplifying Complexity: How the Four-Field Matrix Reshapes ThinkingSimplifying Complexity: How the Four-Field Matrix Reshapes Thinking
Simplifying Complexity: How the Four-Field Matrix Reshapes Thinking
 
LPC Warehouse Management System For Clients In The Business Sector
LPC Warehouse Management System For Clients In The Business SectorLPC Warehouse Management System For Clients In The Business Sector
LPC Warehouse Management System For Clients In The Business Sector
 
Reflecting, turning experience into insight
Reflecting, turning experience into insightReflecting, turning experience into insight
Reflecting, turning experience into insight
 
Fifteenth Finance Commission Presentation
Fifteenth Finance Commission PresentationFifteenth Finance Commission Presentation
Fifteenth Finance Commission Presentation
 
Beyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why Diagram
Beyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why DiagramBeyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why Diagram
Beyond the Five Whys: Exploring the Hierarchical Causes with the Why-Why Diagram
 
Motivational theories an leadership skills
Motivational theories an leadership skillsMotivational theories an leadership skills
Motivational theories an leadership skills
 
LPC Operations Review PowerPoint | Operations Review
LPC Operations Review PowerPoint | Operations ReviewLPC Operations Review PowerPoint | Operations Review
LPC Operations Review PowerPoint | Operations Review
 
Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)
Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)
Call Us🔝⇛+91-97111🔝47426 Call In girls Munirka (DELHI)
 
原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证
原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证
原版1:1复刻密西西比大学毕业证Mississippi毕业证留信学历认证
 
Measuring True Process Yield using Robust Yield Metrics
Measuring True Process Yield using Robust Yield MetricsMeasuring True Process Yield using Robust Yield Metrics
Measuring True Process Yield using Robust Yield Metrics
 
How-How Diagram: A Practical Approach to Problem Resolution
How-How Diagram: A Practical Approach to Problem ResolutionHow-How Diagram: A Practical Approach to Problem Resolution
How-How Diagram: A Practical Approach to Problem Resolution
 
sauth delhi call girls in Defence Colony🔝 9953056974 🔝 escort Service
sauth delhi call girls in Defence Colony🔝 9953056974 🔝 escort Servicesauth delhi call girls in Defence Colony🔝 9953056974 🔝 escort Service
sauth delhi call girls in Defence Colony🔝 9953056974 🔝 escort Service
 
From Goals to Actions: Uncovering the Key Components of Improvement Roadmaps
From Goals to Actions: Uncovering the Key Components of Improvement RoadmapsFrom Goals to Actions: Uncovering the Key Components of Improvement Roadmaps
From Goals to Actions: Uncovering the Key Components of Improvement Roadmaps
 

Security issues in big data

  • 1. SECURITY ISSUES IN BIG DATA Shallote Dsouza
  • 2. WHAT IS BIG DATA? Big data refers to data that is so large and complex that it exceeds the processing capability of conventional data management systems and software techniques. Data becomes big data when individual data stops mattering and only a large collection of it or analysis derived from it are of value Offers many opportunities - advancement of science, improvement of health care, promotion of economic growth, enhancement of education system and more ways of social interaction and entertainment.  But Big data has its issues of security and privacy too due to its huge volume, high velocity, large variety in data sources and formats etc.
  • 3. DIMENSIONS OF BIG DATA Big Data possesses characteristics that can be defined by several V’s Volume Refers to quantity of data. Big data is defined as massive data sets with measures such as petabytes and zeta bytes. Vast amounts of data are generated every second. Today big data is generated by machines, networks and human interaction on systems like social media. Volume of data to be analysed is massive. Velocity Deals with the accelerating speed at which data flows in from sources like business processes, machines, networks like social media sites, mobile devices, etc. The flow of data is continuous. Reacting quickly enough to deal with data velocity is a challenge for most organizations. Variety Refers to various formats of data . Structured, numeric data in traditional databases. Unstructured text documents, email, video, audio, stock ticker data and financial
  • 4. Veracity Refers to the quality of big data like biases, noise, abnormality of data, immeasurable uncertainties and truthfulness and trustworthiness of data. Data that are erroneous, duplicate and incomplete or outdated, as a whole are referred to as dirty data. Valence Refers to the connectedness of big data in the form of graphs just like atoms. Data items are often directly connected to one another like a city is connected to its country. Two Facebook users are connected as they are friends. A high valence data is denser. Value Refers to the fact how big data is going to benefit us and our organization. It helps in measuring the usefulness of data in decision making. Queries can be run on the stored data so as to deduce important results and gain insights
  • 5. TOOLS FOR BIG DATA Big Data storage and management tools  Hadoop- Provides a software framework for distributed storage and processing of big data using the Map Reduce programming model  Cassandra- used for fast processing during very heavy writes and reads the environment and stored data which is very large to fit on the server, but still want a friendly familiar interface MongoDB- used for dynamic queries, defining indexes for good performance on a big database which makes applications faster and more efficiently at scale. Apache Hive- Analysis of large datasets stored in HDFS. Also, used for data summarization, query and ad-hoc analysis to process structured and semi-structured data in Hadoop Hbase- Used for real-time big data applications which contain billions of rows and millions of columns in tables built for low latency operations Cloudera- 100% open source and is the only Hadoop solution to offer batch processing, interactive SQL and interactive search as well as enterprise-grade continuous availability.
  • 6. TYPICAL BIG DATA ARCHITECTURE Big data architecture varies based on a company's infrastructure and needs, but it usually contains the following components: 1. Data sources: This can include data from databases, data from real-time sources, and static files generated from applications, such as Windows logs. 2. Data store: Need storage for the data that will be processed via big data architecture. Often, data will be stored in a data lake, which is a large unstructured database that scales easily. 3. A combination of batch processing and real-time processing: Large volume of data processed can be handled efficiently using batch processing, while real-time data needs to be processed immediately to bring value. 4. Analytical data store: Helps keep all the data is in one place so analysis can be comprehensive, and it is optimized for analysis rather than transactions. This might take the form of a cloud-based data warehouse or a relational database 5. Automation: Ingesting and transforming the data, moving it in batches and stream processes, loading it to an analytical data store, and finally deriving insights must be in a repeatable workflow so that you can continually gain insights from your big data
  • 7. GENERAL BIG DATA SECURITY ISSUES Insecure Computation: Attackers use malicious programs to extract sensitive information from data sources. Such programs can also corrupt the data, leading to incorrect results in prediction or analysis, and can cause denial of service (DoS). Input Validation and Filtering: Big data systems collect input from multiple sources, so input validation is required. This involves validating that data sources are trusted and filtering malicious data out from the good data. The continuous flow of gigabytes and terabytes of data makes it very difficult to validate or filter each incoming batch. Privacy Concerns in Data Mining and Analytics: Monetization of big data involves sharing analytical results, which brings challenges such as invasion of privacy, invasive marketing and unintentional disclosure of information. One example: AOL Inc. released search logs from which users could easily be identified.
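The input validation and filtering step described above can be sketched roughly as follows. This is only an illustration: the source names in `TRUSTED_SOURCES`, the record layout (`source`, `value`) and the accepted value range are assumptions, not part of any particular big data product.

```python
# Hypothetical allowlist of data sources that are considered trusted.
TRUSTED_SOURCES = {"sensor-gateway", "billing-db"}


def is_valid(record):
    """Return True if the record comes from a trusted source and is well formed."""
    if record.get("source") not in TRUSTED_SOURCES:
        return False
    value = record.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # Assumed plausible range for this illustrative feed.
    return 0.0 <= value <= 1000.0


def filter_stream(records):
    """Lazily yield only the records that pass validation."""
    for record in records:
        if is_valid(record):
            yield record
```

Because `filter_stream` is a generator, it checks records one at a time as they arrive instead of loading a whole batch into memory, which matters when the input is a continuous flow.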
  • 8. Granular Access Controls: Big data systems were traditionally designed with almost no security in mind. As a workaround, the parts of the data sets that users have the right to see are copied to a separate big data warehouse and provided to particular user groups. For medical research, for example, only the medical information (without names and addresses) is copied. Volumes of big data grow even faster this way, and complex solutions adversely affect the system’s performance and maintenance. Insecure Data Storage: Authentication, authorization and encryption of data at thousands of nodes is challenging. Auto-tiering moves cold data, which might still be of use, to a less secure medium. Encryption of real-time data may also affect performance. Secure communication among nodes, middleware and end users is disabled by default, so it needs to be enabled explicitly.
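The copy-for-a-user-group workaround described under Granular Access Controls can be sketched as a per-group field projection. The group name `medical_research` and the field names below are hypothetical and chosen only to mirror the medical research example.

```python
# Hypothetical mapping from user group to the fields that group may see.
ALLOWED_FIELDS = {
    "medical_research": {"diagnosis", "age", "treatment"},
}


def project_for_group(records, group):
    """Return copies of the records containing only the fields allowed for the group.

    Unknown groups get no fields at all (deny by default).
    """
    allowed = ALLOWED_FIELDS.get(group, set())
    return [
        {key: value for key, value in record.items() if key in allowed}
        for record in records
    ]
```

Note that dropping names and addresses is not by itself a guarantee of anonymity; the remaining fields may still allow re-identification when combined.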
  • 9. SECURITY ISSUES IN BIG DATA – SOME RELEVANT USE CASES Vulnerability to fake data generation: For instance, if a manufacturing company uses sensor data to detect malfunctioning production processes, cybercriminals can penetrate the system and make the sensors show fake results. The company may then fail to notice alarming trends and miss the opportunity to solve problems before serious damage is caused. Such challenges can be addressed by applying a fraud detection approach. Amazon’s Galaxy Data Lake: Challenges faced by Amazon: data silos, difficulty analyzing diverse datasets, and managing data access and security. 1. A data silo is a situation in which only one group in an organization can access a set of data. Because of international expansion, data is stored in different places and in different ways, which keeps important data hidden. A data lake solves this problem by uniting all the data in one central location.
  • 10. 2. Amazon Prime has data for fulfilment centres and packaged goods, while Amazon Fresh has data for grocery stores and food. Even shipping programs differ internationally; for example, different countries sometimes have different box sizes and shapes. Different systems may also hold the same type of information under different labels: in Europe the term used is “cost per unit,” but in North America it is “cost per package.” Data lakes allow you to import any amount of data in any format because there is no predefined schema. 3. Amazon’s operations finance data is spread across more than 25 databases, with regional teams creating their own local versions of datasets. Audits and controls must be in place for each database to ensure that nobody has improper access. With a data lake, it is easier to get the right data to the right people at the right time.
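Because a data lake stores records without a predefined schema, differing labels such as “cost per unit” and “cost per package” are typically reconciled when the data is read. The sketch below illustrates that idea only; the alias table and the common name `unit_cost` are assumptions and do not describe Amazon's actual implementation.

```python
# Hypothetical alias table mapping regional labels to one common field name.
LABEL_ALIASES = {
    "cost per unit": "unit_cost",     # label used in Europe, per the example
    "cost per package": "unit_cost",  # label used in North America
}


def read_with_schema(raw_record):
    """Apply a schema at read time by renaming known regional labels.

    Labels that are not in the alias table are passed through unchanged.
    """
    return {
        LABEL_ALIASES.get(key.strip().lower(), key): value
        for key, value in raw_record.items()
    }
```

The raw records stay untouched in storage, so a new regional label only requires a new entry in the alias table rather than a migration of existing data.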
  • 11. Possibility of sensitive information mining: Lack of control within big data solutions may let corrupt IT specialists or unscrupulous business rivals mine unprotected data and sell it for their own benefit. Companies can incur huge losses if such information concerns a new product or service launch, or users’ personal information. An employee in charge of the big data store can also misuse their power and violate privacy policies, for example by stalking people through monitoring their chats. To avoid this, proper security tools should be in place and access controls should be applied strictly at different levels of the organization.
  • 12. High speed of NoSQL databases’ evolution and lack of security focus: NoSQL databases handle many of the challenges of big data analytics but pay little attention to security, which is embedded only in the middleware; the database itself provides no explicit security enforcement. NoSQL databases have weak authentication techniques and weak password storage mechanisms. They are subject to attacks such as JSON injection, REST injection, schema injection and man-in-the-middle attacks, and, because of their lenient security mechanisms, to insider attacks as well. To avoid this, the following should be done: 1. Encrypting sensitive database fields 2. Keeping unencrypted values in a sandboxed environment 3. Using sufficient input validation 4. Applying strong user authentication policies
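Two of the measures listed above, stronger password storage and input validation, can be sketched with the Python standard library alone. The helper names are illustrative; the `$`-prefix check assumes MongoDB-style query operators (such as `$ne`), and the iteration count is an assumed value that should be tuned for the deployment.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """Derive a salted PBKDF2-HMAC-SHA256 digest instead of storing the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=200_000):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)


def reject_operators(value):
    """Raise ValueError if user-supplied input contains keys that look like query operators."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if str(key).startswith("$"):
                raise ValueError(f"operator not allowed in user input: {key}")
            reject_operators(inner)
    elif isinstance(value, list):
        for item in value:
            reject_operators(item)
    return value
```

A login handler would call `reject_operators` on the parsed request body before building a query, so that a payload like `{"user": {"$ne": None}}` is refused rather than passed to the database.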
  • 13. RECOMMENDATIONS TO ENHANCE BIG DATA SECURITY Secure Your Computation Code: To prevent malicious data entry, implement access control, code signing and dynamic analysis of the computational code. Strategies are also needed to limit the impact of untrusted code that does get into the big data solution. There are generally two ways of preventing attacks: securing the data when an insecure mapper is present, and securing the mapper itself. Implement Comprehensive Input Validation and Filtering: Input validation and filtering should be applied to both internal and external sources, and the key input validation and filtering features should be properly evaluated.
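As a rough illustration of securing the mapper through code signing, the sketch below refuses to load mapper code unless its signature matches. It deliberately substitutes a shared-key HMAC for the public-key signatures that real code signing normally uses, and the function names and the convention that the code defines a function called `mapper` are assumptions made for this example.

```python
import hashlib
import hmac


def sign_code(code: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the mapper source (stand-in for a real signature)."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def load_mapper(code: bytes, signature: str, key: bytes):
    """Execute the mapper source only if its tag matches, then return the mapper function."""
    if not hmac.compare_digest(sign_code(code, key), signature):
        raise PermissionError("mapper code failed the signature check")
    namespace = {}
    # Only reached for code whose tag was verified above.
    exec(code.decode("utf-8"), namespace)
    return namespace["mapper"]
```

Verification only shows that the code is the code that was signed; it does not show that the code is safe, which is why the slide pairs code signing with access control and dynamic analysis.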
  • 14. Implement Granular Access Control: Defining and enforcing roles for the different kinds of users, such as admins, knowledge workers, end users and developers, is the core of granular access control. Use policy to define which sudo sessions are keystroke-logged, based on risk and user. Implement granular assignments for who can switch sessions ("su") and audit privileged activity. Secure Data Storage and Computation: This is important because much sensitive data leakage occurs in this phase. Sensitive data should therefore be segregated. Enabling encryption for sensitive data and auditing administrative access on data nodes are major steps. The final step is to verify that API security is properly configured for all components.
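The role definitions and audit trail described above can be sketched as a small role-to-permission table with a logged authorization check. The roles follow the slide; the permissions assigned to each role, the in-memory `audit_log` list and the `authorize` signature are assumptions for illustration only.

```python
from datetime import datetime, timezone

# Hypothetical permissions per role; the role names follow the slide.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "configure"},
    "developer": {"read", "write"},
    "knowledge_worker": {"read"},
    "end_user": {"read"},
}

# In-memory audit trail; a real system would write to tamper-resistant storage.
audit_log = []


def authorize(user, role, action, resource):
    """Return True if the role permits the action, recording every decision."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed
```

Denied requests are logged as well as granted ones, since repeated denials are often the first visible sign of misuse.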
  • 15. CONCLUSION Big data is trending. No new application can be imagined without it producing new forms of data, operating on data-driven algorithms, and consuming a specified amount of data. As data storage and computing environments become cheaper, encryption and compliance introduce challenges that need to be handled systematically. A large ecosystem exists for specific big data problems. The major recommendations for dealing with the security issues are implementing data lakes, access controls, validation, filtering, and secure data storage and computation.