Hadoop Online Training: Kelly Technologies is the best Hadoop online training institute in Bangalore, providing Hadoop online training by real-time faculty in Bangalore.
2. OUTLINE
Objectives
Assured Information Sharing
Layered Framework for a Secure Cloud
Cloud-based Assured Information Sharing
Cloud-based Secure Social Networking
Other Topics
Secure Hybrid Cloud
Cloud Monitoring
Cloud for Malware Detection
Cloud for Secure Big Data
Education
Directions
Related Books
www.kellytechno.com
3. TEAM MEMBERS
Sponsor: Air Force Office of Scientific Research
The University of Texas at Dallas
Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen;
Dr. Zhiqiang Lin, Dr. Kamil Sarac
Sub-contractors
Prof. Elisa Bertino (Purdue)
Ms. Anita Miller, Late Dr. Bob Johnson (North Texas Fusion
Center)
Collaborators
Late Dr. Steve Barker, Dr. Maribel Fernandez, King's College,
University of London (EOARD)
Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria
(EOARD)
4. OBJECTIVES
Cloud computing is an example of computing in which dynamically
scalable and often virtualized resources are provided as a service over
the Internet. Users need not have knowledge of, expertise in, or control
over the technology infrastructure in the "cloud" that supports them.
Our research on cloud computing is based on Hadoop, MapReduce, and
Xen
Apache Hadoop is a Java software framework that supports data
intensive distributed applications under a free license. It enables
applications to work with thousands of nodes and petabytes of data.
Hadoop was inspired by Google's MapReduce and Google File System
(GFS) papers.
Xen is a virtual machine monitor developed at the University of
Cambridge, England
Our goal is to build a secure cloud infrastructure for assured information
sharing and related applications
5. INFORMATION OPERATIONS ACROSS INFOSPHERES:
ASSURED INFORMATION SHARING
Scientific/Technical Approach
Conduct experiments to measure how much information is
lost as a result of enforcing security policies in the
case of trustworthy partners
Develop more sophisticated policies based on role-
based and usage control based access control
models
Develop techniques based on game theoretical
strategies to handle partners who are semi-
trustworthy
Develop data mining techniques to carry out
defensive and offensive information operations
Accomplishments
Developed an experimental system for
determining information loss due to
security policy enforcement
Developed a strategy for applying game
theory for semi-trustworthy partners;
simulation results
Developed data mining techniques for
conducting defensive operations for
untrustworthy partners
Challenges
Handling dynamically changing trust
levels; Scalability
Objectives
Develop a Framework for Secure and Timely Data
Sharing across Infospheres
Investigate Access Control and Usage Control
policies for Secure Data Sharing
Develop innovative techniques for extracting
information from trustworthy, semi-trustworthy
and untrustworthy partners
[Figure: Agencies A, B, and C each hold a component with their own data/policy and publish data/policy to the shared data/policy for the coalition]
6. Our Approach
• Policy-based Information Sharing
• Integrate the Medicaid claims data and mine the data;
• Enforce policies and determine how much information has
been lost (Trustworthy partners);
• Application of Semantic web technologies
• Apply game theory and probing to extract information from
semi-trustworthy partners
• Conduct active defence and determine the actions of an
untrustworthy partner
– Defend ourselves from our partners using data analytics
techniques
– Conduct active defence: find out what our partners are
doing by monitoring them so that we can defend
ourselves in dynamic situations
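The information-loss measurement for trustworthy partners can be illustrated with a minimal sketch. The record layout, the policy (a set of hidden fields), and the loss metric (the fraction of unrestricted query answers that are no longer returned) are all assumptions chosen for illustration; the slides do not state which metric the project used.

```python
# Illustrative only: records, field names, and the loss metric are hypothetical.

def run_query(records, predicate):
    """Return the records that satisfy the query predicate."""
    return [r for r in records if predicate(r)]

def enforce_policy(records, hidden_fields):
    """Redact (set to None) every field the sharing policy does not release."""
    return [
        {k: (None if k in hidden_fields else v) for k, v in r.items()}
        for r in records
    ]

def information_loss(records, predicate, hidden_fields):
    """Fraction of the unrestricted answers that are lost under the policy."""
    full = run_query(records, predicate)
    if not full:
        return 0.0
    restricted = run_query(enforce_policy(records, hidden_fields), predicate)
    return 1.0 - len(restricted) / len(full)

claims = [
    {"id": 1, "diagnosis": "flu", "amount": 120},
    {"id": 2, "diagnosis": "flu", "amount": 80},
    {"id": 3, "diagnosis": "asthma", "amount": 300},
]

def is_flu(record):
    return record["diagnosis"] == "flu"

loss_open = information_loss(claims, is_flu, hidden_fields=set())
loss_hidden = information_loss(claims, is_flu, hidden_fields={"diagnosis"})
```

With no hidden fields the query loses nothing; hiding the field the query depends on loses every answer. Real policies would fall between these two extremes.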
9. SECURE QUERY PROCESSING WITH
HADOOP/MAPREDUCE
We have studied clouds based on Hadoop
Query rewriting and optimization techniques designed and
implemented for two types of data
(i) Relational data: Secure query processing with HIVE
(ii) RDF data: Secure query processing with SPARQL
Demonstrated with XACML policies
Joint demonstration with King's College and University of
Insubria
First demo (2011): Each party submits their data and policies
Our cloud will manage the data and policies
Second demo (2012): Multiple clouds
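The idea of policy-driven query rewriting for relational data can be sketched as follows. This is a simplified illustration, not the project's implementation: the policy is a hypothetical Python dictionary standing in for an XACML policy, and the function and key names (`rewrite_query`, `allowed_columns`, `row_filter`) are assumptions.

```python
# Hypothetical stand-in for an XACML policy: which columns a user may see
# and an optional row-level condition. Not the project's actual format.

def rewrite_query(table, columns, policy):
    """Rewrite a simple SELECT so it only touches what the policy permits."""
    allowed = [c for c in columns if c in policy["allowed_columns"]]
    if not allowed:
        raise PermissionError("policy permits none of the requested columns")
    sql = f"SELECT {', '.join(allowed)} FROM {table}"
    if policy.get("row_filter"):
        sql += f" WHERE {policy['row_filter']}"
    return sql

policy = {"allowed_columns": {"id", "amount"}, "row_filter": "state = 'TX'"}
rewritten = rewrite_query("claims", ["id", "ssn", "amount"], policy)
```

The rewritten query drops the column the policy does not release and adds the row-level restriction, so the unmodified query engine never sees the unauthorized request.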
10. Fine-grained Access Control with Hive
System Architecture
Table/View definition and loading:
Users can create tables as well as
load data into tables. Further, they
can also upload XACML policies
for the table they are creating.
Users can also create XACML
policies for tables/views.
Users can define views only if they
have permissions for all tables
specified in the query used to
create the view. They can also
either specify or create XACML
policies for the views they are
defining.
CollaborateCom 2010
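The view-creation rule described above (a user may define a view only with permissions on every table in the defining query) can be sketched as a simple check. The table-name extraction is deliberately naive and the permission map is hypothetical; a real system would use the query parser and the uploaded XACML policies.

```python
import re

def tables_in_query(query):
    """Naively collect identifiers that follow FROM or JOIN (illustration only)."""
    pattern = r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)"
    return set(re.findall(pattern, query, flags=re.IGNORECASE))

def can_create_view(user, query, permissions):
    """Allow the view only if the user holds permissions on all referenced tables."""
    tables = tables_in_query(query)
    return bool(tables) and tables <= permissions.get(user, set())

# Hypothetical permission map: user -> tables the user may read.
permissions = {"alice": {"claims", "providers"}, "bob": {"claims"}}
view_query = "SELECT c.id FROM claims c JOIN providers p ON c.pid = p.id"
```

Here `can_create_view("alice", view_query, permissions)` succeeds, while the same call for `bob` fails because `providers` is missing from his permissions.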
11. SPARQL Query Optimizer for Secure
RDF Data Processing
[Figure: system architecture. Web Interface (New Data, Query, Answer); Server Backend with a Data Preprocessor (N-Triples Converter, Prefix Generator, Predicate Based Splitter, Predicate Object Based Splitter) and a MapReduce Framework (Parser, Query Validator & Rewriter, XACML PDP, Query Rewriter By Policy, Plan Generator, Plan Executor)]
To build an efficient storage mechanism using Hadoop for large amounts of
data (e.g. a billion triples); build an efficient query mechanism for data
stored in Hadoop; integrate with Jena
Developed a query optimizer and query rewriting techniques for RDF data
with XACML policies and implemented on top of Jena
IEEE Transactions on Knowledge and Data Engineering, 2011
12. DEMONSTRATION: CONCEPT OF
OPERATION
[Figure: Agencies 1, 2, …, n connect to a User Interface Layer, which sits on top of Fine-grained Access Control with Hive (relational data) and the SPARQL Query Optimizer for Secure RDF Data Processing (RDF data)]
14. RDF-BASED POLICY ENGINE ON THE
CLOUD
[Figure: policy engine layers. User Interface Layer (High Level Specification, Policy Translator); Policy Parser Layer (Regular Expression-Query Translator; Data Controller, Provenance Controller, . . .); Policy Transformation Layer (Policy / Graph Transformation Rules; Access Control / Redaction Policy (Traditional Mechanism)); data sources DB, DB, RDF, XML; Query in, Result out]
A testbed for evaluating different policy sets over
different data representations. It also supports
provenance as a directed graph and viewing policy
outcomes graphically
Determine how access is granted to a
resource as well as how a document is
shared
Users specify policies: e.g., Access Control,
Redaction, Released Policy
Parse a high-level policy to a low-level
representation
Support Graph operations and
visualization. Policy executed as graph
operations
Execute policies as SPARQL queries over
large RDF graphs on Hadoop
Support for policies over Traditional
data and its provenance
IFIP Data and Applications Security,
2010, ACM SACMAT 2011
15. INTEGRATION WITH
ASSURED INFORMATION SHARING:
[Figure: Agencies 1, 2, …, n submit RDF data and policies and SPARQL queries through the User Interface Layer and receive results; beneath it sit the RDF Data Preprocessor, the Policy Translation and Transformation Layer, the MapReduce Framework for Query Processing, and Hadoop HDFS]
16. ARCHITECTURE
[Figure: Agencies 1, 2, …, n issue policy requests through the User Interface Layer to the Policy Engine, which holds Policies n-2, n-1, and n (Access Control, Redaction, Combined) and a Provenance component over an RDF graph (RDF Graph: Model; RDF Query: SPARQL). A Connection Interface (RDBMS Connection: DB; Connection: Cloud; Connection: Text) links to local and cloud-based stores]
17. POLICY RECIPROCITY
Agency 1 wishes to share its resources if Agency 2 also
shares its resources with it
Use our Combined policies
Allow agents to define policies based on reciprocity and mutual interest
amongst cooperating agencies
SPARQL query:
SELECT B
FROM NAMED uri1 FROM NAMED uri2
WHERE P
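The reciprocity condition can be illustrated with a small sketch that is independent of SPARQL. It assumes a hypothetical `shares` map recording which agencies each agency currently shares with; the names and data layout are illustrative and do not come from the policy engine itself.

```python
# Hypothetical sharing state: agency -> set of agencies it shares resources with.
shares = {
    "agency1": {"agency2"},
    "agency2": {"agency1"},
    "agency3": set(),
}

def reciprocity_holds(owner, requester, shares):
    """Owner releases a resource only if the requester shares with the owner."""
    return owner in shares.get(requester, set())

def visible_resources(owner, requester, resources, shares):
    """Return the owner's resources that the requester may see under reciprocity."""
    return list(resources) if reciprocity_holds(owner, requester, shares) else []

agency1_resources = ["report_a", "report_b"]
to_agency2 = visible_resources("agency1", "agency2", agency1_resources, shares)
to_agency3 = visible_resources("agency1", "agency3", agency1_resources, shares)
```

Agency 2 shares with Agency 1, so it sees Agency 1's resources; Agency 3 shares with no one, so it sees none.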
18. DEVELOP AND SCALE POLICIES
Agency 1 wishes to extend its existing policies with
support for constructing policies at a finer granularity.
The Policy engine
Policy interface that should be implemented by all policies
Add newer types of policies as needed
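A common policy interface of this kind can be sketched with an abstract base class. The class and method names below (`Policy`, `evaluate`, `PolicyEngine`) and the request format are assumptions made for illustration; the slides do not give the actual interface.

```python
from abc import ABC, abstractmethod

class Policy(ABC):
    """Hypothetical interface that every policy type implements."""

    @abstractmethod
    def evaluate(self, request: dict) -> bool:
        """Return True if the request is permitted under this policy."""

class AccessControlPolicy(Policy):
    def __init__(self, allowed_agencies):
        self.allowed_agencies = set(allowed_agencies)

    def evaluate(self, request):
        return request["agency"] in self.allowed_agencies

class TimeLimitedPolicy(Policy):
    """A newer policy type: sharing is allowed only until a given time."""

    def __init__(self, expires_at):
        self.expires_at = expires_at

    def evaluate(self, request):
        return request["time"] <= self.expires_at

class PolicyEngine:
    def __init__(self):
        self.policies = []

    def register(self, policy: Policy):
        self.policies.append(policy)

    def permits(self, request):
        # A request is permitted only if every registered policy allows it.
        return all(p.evaluate(request) for p in self.policies)

engine = PolicyEngine()
engine.register(AccessControlPolicy({"agency2"}))
engine.register(TimeLimitedPolicy(expires_at=10))
```

Because the engine depends only on the `evaluate` method, a new policy type can be added by writing one more subclass and registering it, without changing the engine.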
19. JUSTIFICATION OF RESOURCES
Agency 1 asks Agency 2 for a justification of resource R2
Policy engine
Allows agents to define policies over provenance
Agency 2 can provide the provenance to Agency 1
But protect it by using access control or redaction policies
20. OTHER EXAMPLE POLICIES
Agency 1 shares a resource with Agency 2
provided Agency 2 does not share with Agency 3
Agency 1 shares a resource with Agency 2
depending on the content of the resource or until
a certain time
Agency 1 shares a resource R with agency 2
provided Agency 2 does not infer sensitive data S
from R (inference problem)
Agency 1 shares a resource with Agency 2
provided Agency 2 shares the resource only with
those in its organizational (or social) network
21. ANALYZING AND SECURING
SOCIAL NETWORKS IN THE CLOUD
ANALYTICS
LOCATION MINING FROM ONLINE SOCIAL
NETWORKS
PREDICTING THREATS FROM SOCIAL NETWORK
DATA, SENTIMENT ANALYSIS
CLOUD PLATFORM FOR IMPLEMENTATION
SECURITY AND PRIVACY
PREVENTING THE INFERENCE OF PRIVATE
ATTRIBUTES (LIBERAL OR CONSERVATIVE; GAY OR
STRAIGHT)
ACCESS CONTROL IN SOCIAL NETWORKS
CLOUD PLATFORM FOR IMPLEMENTATION
22. SECURITY POLICIES FOR ON-
LINE SOCIAL NETWORKS (OSN)
Security policies are expressed in SWRL (Semantic
Web Rule Language); examples
23. SECURITY POLICY ENFORCEMENT
A reference monitor evaluates the requests.
Admin request for access control could be evaluated by
rule rewriting
Example: Assume Bob submits the following admin request
Rewrite as the following rule
25. SECURE SOCIAL NETWORKING IN THE
CLOUD WITH TWITTER-STORM
[Figure: Social Networks 1, 2, …, N connect to a User Interface Layer, which sits on top of Fine-grained Access Control with Hive (relational data) and the SPARQL Query Optimizer for Secure RDF Data Processing (RDF data)]
26. SECURE STORAGE AND QUERY
PROCESSING IN A HYBRID CLOUD
The use of hybrid clouds is an emerging trend in cloud
computing
Ability to exploit public resources for high throughput
Yet, better able to control costs and data privacy
Several key challenges
Data Design: how to store data in a hybrid cloud?
Solution must account for data representation used
(unencrypted/encrypted), public cloud monetary costs and query workload
characteristics
Query Processing: how to execute a query over a hybrid
cloud?
Solution must provide query rewrite rules that ensure the correctness of a
generated query plan over the hybrid cloud
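One simple rewrite rule of the kind mentioned above is that a selection distributes over a union of partitions, so a selection over the whole data set can be answered by running it separately on the private and public parts and merging the results. The sketch below assumes a hypothetical row-level split by sensitivity; it ignores encryption and monetary cost, which the actual data design problem must account for.

```python
def partition(rows, is_sensitive):
    """Split rows into a private-cloud part and a public-cloud part."""
    private = [r for r in rows if is_sensitive(r)]
    public = [r for r in rows if not is_sensitive(r)]
    return private, public

def hybrid_select(private, public, predicate):
    """Rewrite select(P, private + public) as select(P, private) + select(P, public)."""
    from_private = [r for r in private if predicate(r)]
    from_public = [r for r in public if predicate(r)]
    return from_private + from_public

# Hypothetical data: rows flagged as sensitive stay in the private cloud.
rows = [
    {"id": 1, "sensitive": True, "amount": 50},
    {"id": 2, "sensitive": False, "amount": 500},
    {"id": 3, "sensitive": False, "amount": 20},
    {"id": 4, "sensitive": True, "amount": 900},
]

def large_amount(row):
    return row["amount"] > 100

private_part, public_part = partition(rows, lambda r: r["sensitive"])
hybrid_ids = sorted(r["id"] for r in hybrid_select(private_part, public_part, large_amount))
single_site_ids = sorted(r["id"] for r in rows if large_amount(r))
```

The two identifier lists are equal, which is the correctness condition for this rule: the rewritten plan returns the same answer as the query over the unpartitioned data.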
27. HYPERVISOR INTEGRITY AND
FORENSICS
IN THE CLOUD
[Figure: cloud integrity and forensics stack. Hardware Layer; Virtualization Layer (Xen, vSphere) hosting multiple VMs (Linux, Solaris, XP, MacOS); integrity and forensics apply across Hypervisor, OS, and Applications]
Secure control flow of hypervisor code
Integrity via in-lined reference monitor
Forensics data extraction in the cloud
Multiple VMs
De-mapping (isolate) each VM memory from physical memory
28. CLOUD-BASED MALWARE DETECTION
[Figure: a stream of known malware or benign executables enters a buffer; feature extraction and selection are performed using the cloud, followed by training and model update of an ensemble of classification models. An unknown executable goes through feature extraction and is classified by the ensemble: class Malware → Remove; class Benign → Keep]
29. CLOUD-BASED MALWARE DETECTION
Binary feature extraction involves
Enumerating binary n-grams from the binaries and selecting the best n-
grams based on information gain
For training data with 3,500 executables, the number of distinct 6-grams
can exceed 200 million
On a single machine, this may take hours, depending on available
computing resources, which is not acceptable for training from a stream of
binaries
We use the cloud to overcome this bottleneck
A cloud MapReduce framework is used
to extract and select features from each chunk
A 10-node cloud cluster is 10 times faster than a single node
Very effective in a dynamic framework, where malware characteristics
change rapidly
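The feature extraction and selection step can be sketched on a single machine as follows. This is a minimal illustration with tiny, made-up byte strings and 2-grams instead of 6-grams; the function names are assumptions. The counting loop is the part that a MapReduce job would distribute across nodes.

```python
import math
from collections import Counter

def ngrams(data: bytes, n: int) -> set:
    """Enumerate the distinct binary n-grams in one executable."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with pos malware and neg benign samples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def select_features(samples, n, k):
    """Return the k n-grams with the highest information gain.

    samples is a list of (bytes, label) pairs, label 1 = malware, 0 = benign.
    """
    total = len(samples)
    total_pos = sum(label for _, label in samples)
    base = entropy(total_pos, total - total_pos)
    present, present_pos = Counter(), Counter()
    for data, label in samples:          # "map": emit each n-gram once per sample
        for gram in ngrams(data, n):     # "reduce": aggregate counts per n-gram
            present[gram] += 1
            present_pos[gram] += label
    gains = {}
    for gram, cnt in present.items():
        pos_in, neg_in = present_pos[gram], cnt - present_pos[gram]
        pos_out = total_pos - pos_in
        neg_out = (total - cnt) - pos_out
        conditional = (cnt / total) * entropy(pos_in, neg_in) \
            + ((total - cnt) / total) * entropy(pos_out, neg_out)
        gains[gram] = base - conditional
    return sorted(gains, key=lambda g: (-gains[g], g))[:k]

# Made-up training data for illustration only.
samples = [
    (b"\x90\x90\xcc", 1),
    (b"\x90\x90\xcd", 1),
    (b"\x00\x01\xcc", 0),
    (b"\x00\x01\x02", 0),
]
selected = select_features(samples, n=2, k=2)
```

In this toy data the 2-grams `\x90\x90` and `\x00\x01` each separate the two classes perfectly, so they receive the highest information gain and are selected.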
30. IDENTITY MANAGEMENT
CONSIDERATIONS IN A CLOUD
Trust model that handles
(i) various trust relationships, (ii) access control policies based on
roles and attributes, (iii) real-time provisioning, (iv) authorization,
and (v) auditing and accountability.
Several technologies are being examined to develop the
trust model
Service-oriented technologies; standards such as SAML and
XACML; and identity management technologies such as OpenID.
Does one size fit all?
Can we develop a trust model that will be applicable to all types of
clouds, such as private clouds, public clouds and hybrid clouds?
Identity architecture has to be integrated into the cloud
architecture.
31. Big Data and the Cloud
• Big Data describes large and complex data that cannot be managed by
traditional data management tools
• From Petabytes to Exabytes to Zettabytes of data
• Need tools for capture, storage, search, sharing, analysis, visualization of big
data.
• Examples include
– Web logs, RFID and surveillance data, sensor networks, social network data
(graphs), text and multimedia, data pertaining to astronomy, atmospheric
science, genomics, biogeochemical, biological fields, video archives
• Big Data Technologies
– Hadoop/MapReduce Platform, HIVE Platform, Twitter Storm Platform, Google
App Engine, Amazon EC2 Cloud, Offerings from Oracle and IBM for Big Data
Management, Other: Cassandra, Mahout, Pig Latin, ...
• Cloud Computing is emerging as a critical tool for Big Data Management
• Critical to maintain Security and Privacy for Big Data
32. Security and Privacy for Big Data
• Secure Storage and Infrastructure
– How can technologies such as Hadoop and MapReduce be
secured?
• Secure Data Management
– Techniques for Secure Query Processing
– Examples: Securing HIVE, Cassandra
• Big Data for Security
– Analysis of Security Data (e.g., Malware analysis)
• Regulations, Compliance, Governance
– What are the regulations for storing, retaining, managing,
transferring and analyzing Big Data?
– Are corporations compliant with the regulations?
• Privacy of individuals has to be maintained not just for raw
data but also for data integration and analytics
• Roles and Responsibilities must be clearly defined
33. Security and Privacy for Big Data
• Regulations Stifling Innovation?
– A major concern is that too many regulations will stifle
innovation
– Corporations must take advantage of Big Data
technologies to improve business
– But this could infringe on individual privacy
– Regulations may also interfere with privacy, for example
retaining the data
– Challenge: How can one carry out analytics and still
maintain privacy?
• National Science Foundation Workshop planned for Spring 2014 at
the University of Texas at Dallas
34. EDUCATION ON SECURE CLOUD
COMPUTING AND RELATED
TECHNOLOGIES
Secure Cloud Computing
NSF Capacity Building Grant on Assured Cloud Computing
Introduce cloud computing into several cyber security courses
Completed courses
Data and Applications Security, Data Storage, Digital Forensics, Secure
Web Services
Computer and Information Security
Capstone Course
One course that covers all aspects of assured cloud computing
Week-long course to be given at Texas Southern University
Analyzing and Securing Social Networks
Big Data Analytics and Security
35. DIRECTIONS
Secure VMM and VNM
Designing Secure XEN VMM
Developing automated techniques for VMM introspection
Determine a secure network infrastructure for the cloud
Integrate Secure Storage Algorithms into Hadoop
Identity Management in the Cloud
Secure cloud-based Big Data Management/Social
Networking
36. RELATED BOOKS
Developing and Securing the Cloud, CRC Press
(Taylor and Francis), November 2013
(Thuraisingham)
Secure Data Provenance and Inference Control
with Semantic Web, CRC Press 2014, In Print
(Cadenhead, Kantarcioglu, Khadilkar,
Thuraisingham)
Analyzing and Securing Social Media, CRC Press,
2014, In preparation (Abrol, Heatherly, Khan,
Kantarcioglu, Khadilkar, Thuraisingham)
Another research problem we are working on is to address “hypervisor integrity and forensics issues in the cloud”.
More specifically, we don’t want a malicious VM to compromise other VMs, so we need to ensure the integrity of the hypervisor.
Our approach is to instrument the hypervisor code and verify its integrity at run-time (like Kevin’s in-lined reference monitor).
The other problem we are dealing with is how to extract forensic data when a VM gets compromised.
The challenge is that there are multiple VMs in a cloud, so we have to isolate each VM’s memory from the physical memory. We are currently developing techniques to handle this problem.