SlideShare a Scribd company logo
1 of 12
Download to read offline
Security and Privacy in a Big Data World
                         Dr. Flavio Villanustre, CISSP, LexisNexis Risk Solutions
VP of Information Security & Lead for the HPCC Systems open source initiative
                                                                28 January 2013
But what is Big Data?
•   Gartner told us that it’s defined by its four dimensions:
       •   Volume
       •   Velocity
       •   Variety
       •   Complexity
•   Driven by the proliferation of social media, sensors, the
    Internet of Things and the such (a lot of the latter)
•   Became accessible thanks to open source distributed data
    intensive platforms (for example, the open source
    LexisNexis HPCC Systems platform and Hadoop)
•   Made popular by consumer oriented services such as
    recommendation systems and search engines


                                                                2
Big Data platforms: key design principles

•   Distributed local store
    • Move algorithm to the data – exploit locality
•   Many data problems are embarrassingly parallel
    • Leverage massive resource aggregation:
       •   Thousands of execution cores
       •   Hundreds of disk controllers
       •   Hundreds of network interfaces
       •   Terabytes of memory and massive memory bandwidth
•   Storage is cheap
•   Moving data into the system takes time (hence keep the
    data around, if possible)
•   It’s fine (and encouraged) to perform iterative exploration
•   In the end: It’s just and all about the data
                                                                  3
A timeline of the main open source Big Data platforms

 LexisNexis designs
 the HPCC Systems                             Google’s
platform to meet its                     MapReduce Paper        First Hadoop                 These platforms gain
own Big Data Needs.                        is Published.           Summit                    mainstream adoption




      Late
    90s/Early          2001               2004         2007            2008           2011               2012
     2000s



             The first few systems are              First Release of           The LN HPCC Systems
              sold to companies and               Hadoop (designed              platform is officially
                   organizations                  after Google’s Map            released as an open
                                                     Reduce ideas)                 source project




                                                                                                                4
Just when we thought that we knew data security…
Big Data is not your dad’s data
   a.   More data sources (beware! data + data > 2 * information)
   b.   Boiling the ocean is at the reach of your hand
   c.   Public clouds offer scalability but introduce risks
   d.   Distributed data stores can blur boundaries
   e.   Leveraging diverse skills implies more people accessing the data

Old tricks may not work
   a.   Tokenization only goes so far
   b.   Encryption at rest just protects against misplaced hardware
   c.   Tracking data provenance is hard
   d.   Enforcing data access controls can quickly get unwieldy
   e.   Conveying policy information across multiple systems is hard



                                                                           5
The ever present challenges

•   Security
    •   Keeping the bad guys out
    •   Making sure the good guys are good and stay good
    •   Preventing mistakes
    •   Disposing of unnecessary/expired data
    •   Enforcing “least privilege” and “need to know” basis
•   Privacy
    • Statistics safer than aggregates
    • Aggregates safer than tokenized samples
    • Tokenized samples safer than individuals
    • Don’t underestimate the power of de-anonymization
    • Mistakes in privacy are irreversible
•   Security <> Privacy

                                                               6
Be wary of “tokenization in a box”

•   Tokenized dataset * Tokenized dataset ~= identifiable data

•   The problem of eliminating inference is NP-complete

•   Several examples in the last decade:

    •   The “Netflix case”

    •   The “Hospital discharge data case”

    •   The “AOL case”


                                                                 7
Common sense to the rescue
•   Track data provenance, permissible use and expiration
    through metadata (data labels and RCM)
•   Enforce fine granular access controls through code/data
    encapsulation (data access wrappers)
•   Utilize homogeneous platforms that allow for end-to-end
    policy preservation and enforcement
•   Deploy (and properly configure) Network and host based
    Data Loss Prevention
•   A comprehensive data governance process is king
•   Use proven controls (Homomorphic encryption and PIR
    are, so far, only theoretical concepts)


                                                              8
And always keep in mind that…


• Access to the hardware ~= Access to the data
   •   Encryption at rest only mitigates the risk for the isolated
       hard drive, but NOT if the decryption key goes with it
   •   Compromise of a running system is NOT mitigated by
       encryption of data at rest

• Virtualization may increase efficiency, but…
   •   In virtual environments, s/he who has access to your
       VMM/Hypervisor, also has access to your data
   •   Cross-VM side-channel attacks are not just theoretical



                                                                     9
Remember


•   Data exhibits its own form of quantum entanglement:
    once a copy is compromised, all other copies instantly are

•   The closer to the source the [filtering, grouping,
    tokenization] is applied, the lower the risk

•   YCLWYDH! You can’t lose what you don’t have (data
    destruction)




                                                                 10
Useful Links
 LexisNexis Risk Solutions: http://lexisnexis.com/risk
 LexisNexis Open Source HPCC Systems Platform: http://hpccsystems.com
 Cross-VM side-channel attacks:
  http://blog.cryptographyengineering.com/2012/10/attack-of-week-cross-
  vm-timing-attacks.html
 K-Anonymity: a model for protecting privacy:
  http://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity.p
  df
 Robust de-anonymization of Large Sparse Datasets (the Netflix case):
  http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
 Broken Promises of Privacy: Responding to the surprising failure of
  anonymization:
  http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006
 Tamper detection and Relationship Context Metadata:
  http://blogs.gartner.com/ian-glazer/2011/08/19/follow-up-from-catalyst-
  2011-tamper-detection-and-relationship-context-metadata/

                                                          The HPCC Systems Platform   11
Questions?




             Email: info@hpccsystems.com


                                           The HPCC Systems Platform   12

More Related Content

What's hot

Hybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized DeduplicationHybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized Deduplication
Prem Rao
 
11.cyber forensics in cloud computing
11.cyber forensics in cloud computing11.cyber forensics in cloud computing
11.cyber forensics in cloud computing
Alexander Decker
 
Oruta privacy preserving public auditing
Oruta privacy preserving public auditingOruta privacy preserving public auditing
Oruta privacy preserving public auditing
shakas007
 
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Mahmuda Rahman
 
Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data Processing
Shantanu Sharma
 
Encryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloudEncryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloud
prjpublications
 

What's hot (20)

a hybrid cloud approach for secure authorized reduplications
a hybrid cloud approach for secure authorized reduplicationsa hybrid cloud approach for secure authorized reduplications
a hybrid cloud approach for secure authorized reduplications
 
A Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud ComputingA Study of Data Storage Security Issues in Cloud Computing
A Study of Data Storage Security Issues in Cloud Computing
 
Hybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized DeduplicationHybrid Cloud Approach for Secure Authorized Deduplication
Hybrid Cloud Approach for Secure Authorized Deduplication
 
How One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonomHow One to One Sharing Enforces Secure Collaboration - xonom
How One to One Sharing Enforces Secure Collaboration - xonom
 
11.cyber forensics in cloud computing
11.cyber forensics in cloud computing11.cyber forensics in cloud computing
11.cyber forensics in cloud computing
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...
 
Oruta privacy preserving public auditing
Oruta privacy preserving public auditingOruta privacy preserving public auditing
Oruta privacy preserving public auditing
 
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
Analysis-of-Security-Algorithms-in-Cloud-Computing [Autosaved]
 
DDS-to-JSON and DDS Real-time Data Storage with MongoDB
DDS-to-JSON and DDS Real-time Data Storage with MongoDBDDS-to-JSON and DDS Real-time Data Storage with MongoDB
DDS-to-JSON and DDS Real-time Data Storage with MongoDB
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Thoughts on Cybersecurity
Thoughts on CybersecurityThoughts on Cybersecurity
Thoughts on Cybersecurity
 
The “obsession” with checksums by Helen Hockx-Yu
The “obsession” with checksums by Helen Hockx-YuThe “obsession” with checksums by Helen Hockx-Yu
The “obsession” with checksums by Helen Hockx-Yu
 
Reactive Systems with Data Distribution Service (DDS)
Reactive Systems with Data Distribution Service (DDS)Reactive Systems with Data Distribution Service (DDS)
Reactive Systems with Data Distribution Service (DDS)
 
A Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationA Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-Duplication
 
Improved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providersImproved deduplication with keys and chunks in HDFS storage providers
Improved deduplication with keys and chunks in HDFS storage providers
 
C017421624
C017421624C017421624
C017421624
 
Secure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data ProcessingSecure and Privacy-Preserving Big-Data Processing
Secure and Privacy-Preserving Big-Data Processing
 
Paul Stokes (Jisc) - A provocation about preservation
Paul Stokes (Jisc) - A provocation about preservationPaul Stokes (Jisc) - A provocation about preservation
Paul Stokes (Jisc) - A provocation about preservation
 
Encryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloudEncryption based multi user manner secured data sharing and storing in cloud
Encryption based multi user manner secured data sharing and storing in cloud
 
Cloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion DetectionCloud Computing Using Encryption and Intrusion Detection
Cloud Computing Using Encryption and Intrusion Detection
 

Viewers also liked

Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and ReuseCyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cybera Inc.
 
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Kato Mivule
 
Conference Powerpoint Presentations
Conference Powerpoint PresentationsConference Powerpoint Presentations
Conference Powerpoint Presentations
apdh1312
 
The Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud ComputingThe Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud Computing
Ankit Singh
 

Viewers also liked (20)

IBM's four key steps to security and privacy for big data
IBM's four key steps to security and privacy for big dataIBM's four key steps to security and privacy for big data
IBM's four key steps to security and privacy for big data
 
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014
 
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Time Of Courage
Time Of CourageTime Of Courage
Time Of Courage
 
Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and ReuseCyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
Cyber Summit 2016: Privacy Issues in Big Data Sharing and Reuse
 
走出IT人才荒 研討會
走出IT人才荒 研討會走出IT人才荒 研討會
走出IT人才荒 研討會
 
Data Privacy &amp; Security Update 2012
Data Privacy &amp; Security Update 2012Data Privacy &amp; Security Update 2012
Data Privacy &amp; Security Update 2012
 
Privacy and Big Data Overload!
Privacy and Big Data Overload!Privacy and Big Data Overload!
Privacy and Big Data Overload!
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposure
 
The Impact of Cloud: Cloud Computing Security and Privacy
The Impact of Cloud: Cloud Computing Security and PrivacyThe Impact of Cloud: Cloud Computing Security and Privacy
The Impact of Cloud: Cloud Computing Security and Privacy
 
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
 
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
Literature Review: The Role of Signal Processing in Meeting Privacy Challenge...
 
Information Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data MiningInformation Security in Big Data : Privacy and Data Mining
Information Security in Big Data : Privacy and Data Mining
 
Big data
Big dataBig data
Big data
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Paper presentation held at national seminar
Paper presentation held at national seminarPaper presentation held at national seminar
Paper presentation held at national seminar
 
Big data security
Big data securityBig data security
Big data security
 
Conference Powerpoint Presentations
Conference Powerpoint PresentationsConference Powerpoint Presentations
Conference Powerpoint Presentations
 
The Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud ComputingThe Security and Privacy Threats to Cloud Computing
The Security and Privacy Threats to Cloud Computing
 

Similar to Global bigdata conf_01282013

2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
Chris Dwan
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Mohit Tare
 

Similar to Global bigdata conf_01282013 (20)

Data Analytics Governance and Ethics
Data Analytics Governance and EthicsData Analytics Governance and Ethics
Data Analytics Governance and Ethics
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 
The Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of dataThe Internet-of-things: Architecting for the deluge of data
The Internet-of-things: Architecting for the deluge of data
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big Data
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Toward a Mobile Data Commons
Toward a Mobile Data CommonsToward a Mobile Data Commons
Toward a Mobile Data Commons
 
Peer-to-peer Systems.ppt
Peer-to-peer Systems.pptPeer-to-peer Systems.ppt
Peer-to-peer Systems.ppt
 
Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017Big data and cloud computing 9 sep-2017
Big data and cloud computing 9 sep-2017
 
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open SourceLiberate Your Files with a Private Cloud Storage Solution powered by Open Source
Liberate Your Files with a Private Cloud Storage Solution powered by Open Source
 
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End UsersFrom Single Purpose to Multi Purpose Data Lakes - Broadening End Users
From Single Purpose to Multi Purpose Data Lakes - Broadening End Users
 
The How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementThe How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability Management
 
The How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability ManagementThe How and Why of Container Vulnerability Management
The How and Why of Container Vulnerability Management
 
110307 cloud security requirements gourley
110307 cloud security requirements gourley110307 cloud security requirements gourley
110307 cloud security requirements gourley
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Big Data
Big Data Big Data
Big Data
 

More from HPCC Systems

Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Global bigdata conf_01282013

  • 1. Security and Privacy in a Big Data World Dr. Flavio Villanustre, CISSP, LexisNexis Risk Solutions VP of Information Security & Lead for the HPCC Systems open source initiative 28 January 2013
  • 2. But what is Big Data? • Gartner told us that it’s defined by its four dimensions: • Volume • Velocity • Variety • Complexity • Driven by the proliferation of social media, sensors, the Internet of Things and the such (a lot of the latter) • Became accessible thanks to open source distributed data intensive platforms (for example, the open source LexisNexis HPCC Systems platform and Hadoop) • Made popular by consumer oriented services such as recommendation systems and search engines 2
  • 3. Big Data platforms: key design principles • Distributed local store • Move algorithm to the data – exploit locality • Many data problems are embarrassingly parallel • Leverage massive resource aggregation: • Thousands of execution cores • Hundreds of disk controllers • Hundreds of network interfaces • Terabytes of memory and massive memory bandwidth • Storage is cheap • Moving data into the system takes time (hence keep the data around, if possible) • It’s fine (and encouraged) to perform iterative exploration • In the end: It’s just and all about the data 3
  • 4. A timeline of the main open source Big Data platforms LexisNexis designs the HPCC Systems Google’s platform to meet its MapReduce Paper First Hadoop These platforms gain own Big Data Needs. is Published. Summit mainstream adoption Late 90s/Early 2001 2004 2007 2008 2011 2012 2000s The first few systems are First Release of The LN HPCC Systems sold to companies and Hadoop (designed platform is officially organizations after Google’s Map released as an open Reduce ideas) source project 4
  • 5. Just when we thought that we knew data security… Big Data is not your dad’s data a. More data sources (beware! data + data > 2 * information) b. Boiling the ocean is at the reach of your hand c. Public clouds offer scalability but introduce risks d. Distributed data stores can blur boundaries e. Leveraging diverse skills implies more people accessing the data Old tricks may not work a. Tokenization only goes so far b. Encryption at rest just protects against misplaced hardware c. Tracking data provenance is hard d. Enforcing data access controls can quickly get unwieldy e. Conveying policy information across multiple systems is hard 5
  • 6. The ever present challenges • Security • Keeping the bad guys out • Making sure the good guys are good and stay good • Preventing mistakes • Disposing of unnecessary/expired data • Enforcing “least privilege” and “need to know” basis • Privacy • Statistics safer than aggregates • Aggregates safer than tokenized samples • Tokenized samples safer than individuals • Don’t underestimate the power of de-anonymization • Mistakes in privacy are irreversible • Security <> Privacy 6
  • 7. Be wary of “tokenization in a box” • Tokenized dataset * Tokenized dataset ~= identifiable data • The problem of eliminating inference is NP-complete • Several examples in the last decade: • The “Netflix case” • The “Hospital discharge data case” • The “AOL case” 7
  • 8. Common sense to the rescue • Track data provenance, permissible use and expiration through metadata (data labels and RCM) • Enforce fine granular access controls through code/data encapsulation (data access wrappers) • Utilize homogeneous platforms that allow for end-to-end policy preservation and enforcement • Deploy (and properly configure) Network and host based Data Loss Prevention • A comprehensive data governance process is king • Use proven controls (Homomorphic encryption and PIR are, so far, only theoretical concepts) 8
  • 9. And always keep in mind that… • Access to the hardware ~= Access to the data • Encryption at rest only mitigates the risk for the isolated hard drive, but NOT if the decryption key goes with it • Compromise of a running system is NOT mitigated by encryption of data at rest • Virtualization may increase efficiency, but… • In virtual environments, s/he who has access to your VMM/Hypervisor, also has access to your data • Cross-VM side-channel attacks are not just theoretical 9
  • 10. Remember • Data exhibits its own form of quantum entanglement: once a copy is compromised, all other copies instantly are • The closer to the source the [filtering, grouping, tokenization] is applied, the lower the risk • YCLWYDH! You can’t lose what you don’t have (data destruction) 10
  • 11. Useful Links  LexisNexis Risk Solutions: http://lexisnexis.com/risk  LexisNexis Open Source HPCC Systems Platform: http://hpccsystems.com  Cross-VM side-channel attacks: http://blog.cryptographyengineering.com/2012/10/attack-of-week-cross- vm-timing-attacks.html  K-Anonymity: a model for protecting privacy: http://dataprivacylab.org/dataprivacy/projects/kanonymity/kanonymity.p df  Robust de-anonymization of Large Sparse Datasets (the Netflix case): http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf  Broken Promises of Privacy: Responding to the surprising failure of anonymization: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006  Tamper detection and Relationship Context Metadata: http://blogs.gartner.com/ian-glazer/2011/08/19/follow-up-from-catalyst- 2011-tamper-detection-and-relationship-context-metadata/ The HPCC Systems Platform 11
  • 12. Questions? Email: info@hpccsystems.com The HPCC Systems Platform 12