Big Data Security - The Perfect Storm
The Perfect Storm 1991
        It was the storm of the century, boasting waves
        over one hundred feet high, a tempest created
        by so rare a combination of factors that
        meteorologists deemed it "the perfect storm."
        When it struck in October 1991, there was
        virtually no warning.




*: http://books.wwnorton.com/books/detail.aspx?ID=5102

2
The Perfect Storm

[Diagram: Big Data at the center, surrounded by Regulations & Breaches, Customer Support, Social Media, Business Improvement, Security Analysis, Customer Profiles, and Sales & Marketing, with "Increased profits" labels around the outside]

3
Perfect storm

[Diagram, left to right: More Data, Weaker Security, Increased Regulations, Breach or Audit Fail ($$$)]




4
The Perfect Storm

      Big Data is a time bomb, given how several trends are
        coming together
        Big Data deployment is growing fast, and organizations are rushing into it
           • ROI is in focus
           • Security is not part of the strategy
        Shortage of Big Data skills
           • People don’t know what they are doing
        Big Data security solutions are not effective
        General shortage of security skills



5
Mankind Created Data

[Chart: data created, in exabytes (axis from 0 to 40,000), by year (2005 to 2020)]

Source: IBM

 6
What is Big Data?

7
What is Big Data?




Source: IBM 0307_Guardium_Final-.pdf

 8
What Happens in an Internet Minute?




Source: Intel

 9
Four Dimensions of Big Data




Source: IBM 0307_Guardium_Final-.pdf

 10
Big Data Sources




Source: IBM

 11
Business-driven Outcomes




Source: IBM

 12
How is Big Data Different?

13
How is Big Data Different?

             Why It’s Different Architecturally:
                • ‘Shared’ data
                • Inter-node communication
                • No separate archive – all data is online
                • No Security – breaches go undetected

           Why It’s Different Operationally:
                • Insider data access
                • Authentication of applications and nodes
                • Audit and logging


Source: Securosis SecuringBigData_FINAL.pdf

14
What is the Problem with Big Data Security?

15
Big Data and The Insider Threat




16
17
Many Ways to Hack Big Data

[Diagram: the Hadoop stack and the parties that can reach it.
 Stack: ETL Tools, BI Reporting, RDBMS; Pig (Data Flow), Hive (SQL), Sqoop; ZooKeeper (Coordination); Avro (Serialization); MapReduce (Job Scheduling/Execution System); HBase (Column DB); HDFS (Hadoop Distributed File System).
 Parties: Hackers; Unvetted Applications or Ad Hoc Processes; Privileged Users.]

Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase

 18
The Big Data platform may not be secure, but your information can be secure.
19
A Changing Threat Landscape
20
New York Times about China Attack on US




21
One Single Sample: The Chinese APT1 group
        Compromised 141 companies in 20 industries

        Stole hundreds of terabytes of data
              Technology blueprints, Proprietary manufacturing processes,
              Test results, Business plans, Pricing documents, Partnership
              agreements, Emails




*: http://intelreport.mandiant.com/Mandiant_APT1_Report.pdf

22
Dominating “hacktivism”




                           Attacks by Anonymous include
                           • 2012: CIA and Interpol
                           • 2011: Sony, Stratfor and HBGary Federal
     Source: http://www.verizonbusiness.com/Products/security/dbir/, http://en.wikipedia.org/wiki/Timeline_of_events_involving_Anonymous



23
http://public.dhe.ibm.com/common/ssi/ecm/en/wgl03027usen/WGL03027USEN.PDF

24
DataLossDB - Incidents Over Time - Increasing




http://public.dhe.ibm.com/common/ssi/ecm/en/wgl03027usen/WGL03027USEN.PDF

25
Breakout of Security Incidents by Country




http://public.dhe.ibm.com/common/ssi/ecm/en/wgl03027usen/WGL03027USEN.PDF

26
Ranking Volume and Type of Security Incidents*




*: % of Escalated Alerts

http://public.dhe.ibm.com/common/ssi/ecm/en/wgl03027usen/WGL03027USEN.PDF

27
Security Incidents - Malicious Code*




     *: % of Escalated Alerts

http://public.dhe.ibm.com/common/ssi/ecm/en/wgl03027usen/WGL03027USEN.PDF

28
What is the Cost of a Breach?


29
Cost of Data Breach per Record
     Independently Conducted by Ponemon Institute LLC March 2012




 http://www.symantec.com/content/en/us/about/media/pdfs/b-ponemon-2011-cost-of-data-breach-global.en-us.pdf


30
How are Breaches Discovered?

[Bar chart: discovery methods by percent of breaches, axis from 0 to 70%; bar values are not recoverable from the extraction]
     • Notified by law enforcement
     • Third-party fraud detection (e.g., CPP)
     • Reported by customer/partner affected
     • Brag or blackmail by perpetrator
     • Unknown
     • Witnessed and/or reported by employee
     • Other(s)
     • Internal fraud detection mechanism
     • Financial audit and reconciliation process
     • Log analysis and/or review process
     • Unusual system behavior or performance

     By percent of breaches. Source: 2012, http://www.verizonbusiness.com/Products/security/dbir/



31
What is the Trend in Regulations?

32
Regulations: Be Proactive in Protecting Data




33
HIPAA Omnibus - Penalties if PHI isn’t encrypted




http://www.diagnosticimaging.com/physicians-experts-make-case-secure-data-exchange-himss13


34
Regulations: Be Proactive in Protecting Data
         Big Data must prepare for the changing landscape
           • Trend: Encryption requirements are increasing
         PCI DSS, US state laws
         Health data regulations
           • Need for data segmentation (tokenization, encryption
             or masking)
           • Extra-sensitive data (drug abuse, HIV codes, sex
             abuse and more)
         Ponemon Institute, “Big Data Analytics in Cyber
         Defense”
           • 61 percent say big data analytics will solve pressing security issues
           • Only 35 percent currently have such solutions in place
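
The data segmentation idea above (tokenization, encryption or masking applied per field) can be sketched roughly as follows. This is a minimal illustration, not the presenter's product: the `TokenVault` class, the `protect_record` helper, the policy format and the field names are all hypothetical, and the in-memory vault stands in for a hardened token service. Encryption is left out because it would need a vetted cryptography library.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a token vault (hypothetical, illustration only)."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._to_token:
            token = "tok_" + secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original value.
        return self._to_value[token]


def protect_record(record: dict, policy: dict, vault: TokenVault) -> dict:
    """Apply a per-field policy: 'mask', 'tokenize' or 'clear' (the default)."""
    protected = {}
    for field, value in record.items():
        action = policy.get(field, "clear")
        if action == "mask":
            # Masking is one-way: the original value cannot be recovered.
            protected[field] = "*" * len(str(value))
        elif action == "tokenize":
            protected[field] = vault.tokenize(str(value))
        elif action == "clear":
            protected[field] = value
        else:
            raise ValueError(f"unknown policy action: {action}")
    return protected


vault = TokenVault()
policy = {"patient_id": "tokenize", "diagnosis_code": "mask"}
record = {"patient_id": "P-1001", "diagnosis_code": "CODE-123", "zip": "06901"}
protected = protect_record(record, policy, vault)
```

Defaulting unlisted fields to "clear" keeps the sketch short; a stricter policy could default to masking so that new fields are protected until someone classifies them.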

35
Balancing security and data insight

         Tug of war between security and data insight
         Big Data is designed for access, not security
         Privacy regulations require de-identification, which
         creates problems with privileged users in an access
         control security model
         The only way to truly protect data is to provide
         data-level protection
         Traditional means of security don’t offer granular
         protection that allows for seamless data use




36
The Solution is Finally Here

37
The Solution - Preventing Misuse of Data

[Diagram labels: User; Application; Data Protection Policy; Data Misuse Prevention; Selective Data Protection; Attackers (Hackers, Unvetted Applications, Ad Hoc Processes, Privileged Users, Administrators); Issued Patents]


38
Support Business Applications

[Chart: PAN protection levels. 4 digits clear: 90%; 6 digits clear: 8%; 6 digits encoded: 2%. Application transparent: 98%; application changes: 2%.]
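
One possible reading of the PAN breakdown above is that most applications only need a few digits of the card number in the clear, so the rest can be encoded without changing the application. The sketch below shows two such views under that assumption: first six and last four digits clear with the middle encoded, and last four only. The function names and the demo key are hypothetical, and the keyed-hash encoding is a stand-in chosen because it needs only the standard library; it is one-way, unlike vault-based tokenization.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; real key material belongs in a
# key-management system, not in source code.
SECRET_KEY = b"demo-key-not-for-production"


def _digits_only(pan: str) -> str:
    digits = pan.replace(" ", "").replace("-", "")
    if not digits.isdigit() or not 13 <= len(digits) <= 19:
        raise ValueError("expected a 13 to 19 digit PAN")
    return digits


def encode_middle(pan: str, key: bytes = SECRET_KEY) -> str:
    """Keep the first 6 and last 4 digits clear and encode the digits between."""
    digits = _digits_only(pan)
    head, middle, tail = digits[:6], digits[6:-4], digits[-4:]
    # Derive replacement digits from a keyed hash of the full PAN, so the same
    # PAN always yields the same output. The byte-to-digit mapping is slightly
    # biased, which is acceptable for a sketch but not for production use.
    mac = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    encoded = "".join(str(b % 10) for b in mac[: len(middle)])
    return head + encoded + tail


def last_four_only(pan: str) -> str:
    """Mask everything except the last 4 digits."""
    digits = _digits_only(pan)
    return "*" * (len(digits) - 4) + digits[-4:]


token = encode_middle("4111 1111 1111 1111")
masked = last_four_only("4111 1111 1111 1111")
```

Because the output keeps the length and the all-digit format of the input, downstream systems that validate field shape would, under these assumptions, keep working on the protected value.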
39
How can we handle the Risk with Big Data?

[Chart: Risk (Low to High) against data volume (Less / Small Data to More / Big Data). Labels: Traditional Access Control; Data Tokens; "Creativity happens at the edge"; Access Right Level.]


40
Securing the Data Flow

[Diagram: Legacy Systems on either side of the Big Data stack.
 Stack: ETL Tools, BI Reporting, RDBMS; Pig (Data Flow), Hive (SQL), Sqoop; MapReduce (Job Scheduling/Execution System); HBase (Column DB); HDFS (Hadoop Distributed File System).]




41
Support Data Classification and Analytics

[Diagram: Application above three forms of stored data: Data in Clear, Encrypted File, Secured Data Fields (encoded)]



42
The Process of Automating Security for Big Data

[Cycle diagram around Big Data:
 Understand: discover sensitive data
 Secure: lock down sensitive data
 Monitor: control usage of sensitive data
 Integrate: implement solution]
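
The "discover sensitive data" step could start with something as simple as scanning text for card-number candidates. The sketch below is an assumption about how such a scan might look, not a description of any product: it combines a regular expression with the Luhn checksum to cut down false positives. The function names are hypothetical, and a real discovery tool would cover many more data types and storage formats.

```python
import re
from typing import List

# Runs of 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            # Double every second digit from the right, folding 10+ back to 1-9.
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_candidate_pans(text: str) -> List[str]:
    """Return digit runs that look like card numbers and pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "card 4111 1111 1111 1111 on file, order 1234567890123"
found = find_candidate_pans(sample)
```

A Luhn pass only means the number is well-formed, so hits are candidates for review and lock-down rather than confirmed card numbers.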

43
SUMMARY


44
Big Data Security Problem - Summary

     Traditional security solutions cannot bridge the gaps
       between
     1. Protecting against data breaches and meeting compliance requirements
     2. Providing powerful analysis and data insight
     3. Utilizing the power of a big data environment




45
Proactive Data Protection for Big Data
        Know your data flow
           •   Protect the data flow, including legacy systems
        Protecting your data now could save significant time and money on retroactive
        security later
           •   Breaches and audits are on the rise. Organizations that fail to act now risk
               losing their hard-earned investments.
        Granular data protection is cost-effective
           •   Addresses regulations and data breaches
           •   Keeps data available for analytics and other usage
           •   Provides separation of duties for administrative functions
        Catch abnormal access to data
           •   Including (compromised) insider accounts
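
Catching abnormal access, including by compromised insider accounts, can be approximated by comparing each account's activity with its own history. The sketch below assumes daily record-access counts per user are already being logged; the function name, the input shapes and the threshold of three standard deviations are illustrative choices, not something the slides specify.

```python
from statistics import mean, pstdev


def flag_abnormal_access(history: dict, today: dict, threshold: float = 3.0) -> list:
    """Flag users whose access count today is far above their own baseline.

    history maps user -> list of past daily record counts;
    today maps user -> today's record count.
    """
    flagged = []
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < 2:
            # No usable baseline: surface the account for manual review.
            flagged.append(user)
            continue
        mu, sigma = mean(past), pstdev(past)
        # Floor the deviation at 1.0 so a perfectly flat history does not make
        # every small increase look abnormal.
        limit = mu + threshold * max(sigma, 1.0)
        if count > limit:
            flagged.append(user)
    return flagged


history = {"alice": [100, 110, 90, 105], "bob": [20, 25, 22, 21]}
today = {"alice": 108, "bob": 5000, "carol": 10}
suspicious = flag_abnormal_access(history, today)
```

A per-user baseline matters here because a privileged account reading thousands of records may be normal for one role and a strong warning sign for another.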




46


Editor's Notes

  • #12  IBM BD usage GBE03519USEN.pdf
  • #13  IBM BD usage GBE03519USEN.pdf
  • #18  http://www.xconomy.com/san-francisco/2013/03/19/should-big-data-businesses-be-forced-to-prevent-hacking/
  • #26 DataLossDB.org Incidents Over Time