The Big Data Cloud:
Are You Ready for the Zettabyte?

Steven C. Markey, MSIS, PMP, CISSP, CIPP, CISM, CISA, STS-EV, CCSK, CompTIA Cloud Essentials
Principal, nControl, LLC
Adjunct Professor
President, Cloud Security Alliance – Delaware Valley Chapter (CSA-DelVal)
Big Data Cloud

• Presentation Overview
  – Why Should You Care?
  – Cloud Overview
  – Big Data Overview
  – Cloud-Based Big Data Offerings
  – Securing Cloud-Based DB Solutions
Big Data Cloud
• Why Should You Care
  – Organizational Cost Reduction Requirements
     • Justify Investments
     • Improve Efficiencies (Productivity, Time to Market)
  – Digital Information – ~60% Annual Growth Rate (AGR)
  – Data Storage Capital Expenditure (CapEx) – 15-20% AGR
  – Categorization, Classification & Retention Demands Magnified by
     • Compliance, Legal & Privacy Regulations
  – Prevalent & Interconnected Business Ecosystems
     •   Supply Chains
     •   Business Process Outsourcers (BPO)
     •   Information Technology Outsourcers (ITO)
     •   Vendors’ Vendors                                   Source: IDC
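The gap between these two growth rates compounds quickly. The Python sketch below is only arithmetic on the rates quoted above, assuming they hold constant every year (an assumption, not an IDC projection). The two figures measure different things (information volume versus storage spending), so the output compares growth multipliers, not bytes to dollars.

```python
# Compounding sketch; assumes each annual growth rate stays constant (illustrative only).

def compound(start: float, annual_rate: float, years: int) -> float:
    """Return the value of `start` after `years` of growth at `annual_rate`."""
    return start * (1 + annual_rate) ** years


if __name__ == "__main__":
    for years in (1, 3, 5):
        data = compound(1.0, 0.60, years)        # ~60% AGR: digital information
        capex_low = compound(1.0, 0.15, years)   # 15% AGR: storage CapEx, low end
        capex_high = compound(1.0, 0.20, years)  # 20% AGR: storage CapEx, high end
        print(f"{years} yr: data x{data:.2f}, storage CapEx x{capex_low:.2f} to x{capex_high:.2f}")
```

Under those assumptions, five years of ~60% growth multiplies data volume about 10.5x, while 15-20% growth multiplies storage CapEx only about 2.0x to 2.5x.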
Source: NIST
Service Delivery Models




                     Source: Swain Techs
Source: Matthew Gardiner, Computer Associates
Big Data Cloud




                 Source: Flickr
Big Data Cloud
• Big Data Overview
  – Aggregated Data from the Following Sources
     • Traditional
     • Source
     • Social
Big Data Cloud
• Traditional Data
  – Database Management Systems
     •   Relational Database Management Systems (RDBMS)
     •   Object-Oriented Database Management Systems (OODBMS)
     •   Non-Relational, Distributed DB Management Systems (NRDBMS)
     •   Mobile Databases (SQLite, Oracle Lite)
  – Online Transaction Processing (OLTP)
     • Real-Time Data Warehousing
  – Online Analytical Processing (OLAP)
     • Operational Data Stores (ODS)
     • Enterprise Data Warehouse (EDW)
Big Data Cloud
• Traditional Data
  – OLAP
     • Business Intelligence (BI)
        – Data Mining
        – Reporting
        – OLAP (Continued)
            » Relational OLAP (ROLAP)
            » Multi-Dimensional OLAP (MOLAP)
            » Hybrid OLAP (HOLAP)


     OLTPODSEDW (Data Marts)BI (Data Mining)
     OLTPODSEDW (Data Marts)BI (Reporting)
     OLTPODSEDW (Data Marts)BI (OLAP)
Big Data Cloud




                 Source: Flickr
Big Data Cloud
• Source Data
  – Log Files
     • Event Logs / Operating System (OS)-Level
     • Appliance / Peripherals
     • Analyzers / Sniffers
  – Multimedia
     • Image Logs
     • Video Logs
  – Web Content Management (WCM)
     • Web Logs
     • Search Engine Optimization (SEO)
        – Web Metadata
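Web logs are a typical source-data feed. Many web servers can write access logs in the Common Log Format (CLF); assuming that format, a minimal Python parser that turns each raw line into a structured record ready for aggregation might look like this. The pattern and the added `method` / `path` fields are illustrative; a production pipeline would also need to handle extended formats and malformed lines more carefully.

```python
import re

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str):
    """Parse one CLF line into a dict, or return None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # CLF writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Split "GET /path HTTP/1.0" into method and path.
    method, _, rest = record["request"].partition(" ")
    record["method"] = method
    record["path"] = rest.rsplit(" ", 1)[0] if " " in rest else rest
    return record


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_clf(sample))
```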
Big Data Cloud
• Big Data Overview
  – Aggregators
     • Mostly NRDBMS Implementations
        – Not Only SQL (NoSQL)
     • NRDBMS Examples
        – Column Family Stores: BigTable (Google), Cassandra & HBase (Apache)
        – Key-Value Stores: App Engine Datastore (Google), DynamoDB &
          SimpleDB (AWS)
        – Document Databases: CouchDB, MongoDB
        – Graph Databases: Neo4j
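To make the four NRDBMS families concrete, the sketch below models each one with plain Python structures. These are illustrative data shapes only, not the APIs or storage engines of BigTable, DynamoDB, MongoDB, Neo4j, or the other products named above; all keys, values, and helper names are hypothetical.

```python
from collections import deque

# Key-value store: an opaque value looked up by a single key.
kv_store = {"session:42": b'{"user": "alice", "cart_items": 3}'}

# Column family store: row key -> column family -> column -> value (rows may be sparse).
column_family = {
    "user#alice": {
        "profile": {"name": "Alice", "city": "Philadelphia"},
        "activity": {"2012-06-01T10:00": "login"},
    },
}

# Document store: self-describing, nested records queried by field.
documents = [
    {"_id": 1, "user": "alice", "tags": ["cloud", "security"]},
    {"_id": 2, "user": "bob", "tags": ["hadoop"]},
]

# Graph store: nodes plus directed relationships (adjacency list).
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def find_docs(docs, field, value):
    """Document-style query: match a scalar field, or membership in a list field."""
    matches = []
    for doc in docs:
        current = doc.get(field)
        if current == value or (isinstance(current, list) and value in current):
            matches.append(doc)
    return matches


def reachable(g, start):
    """Graph-style traversal: every node reachable from `start` (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in g.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}
```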
Big Data Cloud
• Big Data Overview
  – Serial Processing
     • Hadoop
        – Hadoop Distributed File System (HDFS)
        – Hive – Data Warehouse (DW)
        – Pig – Data-Flow / Query Language (Pig Latin)
     • Riak
  – Parallel Processing
     • HadoopDB
  – Analytics
     • Google MapReduce
     • Apache MapReduce
     • Splunk (for Security Information / Event Management [SIEM])
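The MapReduce programming model behind Hadoop and Google MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The single-process Python sketch below applies that model to a SIEM-style question (failed SSH logins per source IP). It is not the Hadoop API, and the OpenSSH-style log format and sample lines are assumptions.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumed OpenSSH-style syslog message; group 1 captures the source IPv4 address.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def map_phase(line):
    """Map: emit (source_ip, 1) for every failed-login line."""
    match = FAILED_LOGIN.search(line)
    if match:
        yield match.group(1), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    logs = [
        "Jun  1 10:00:01 web1 sshd[411]: Failed password for root from 10.0.0.5 port 4022 ssh2",
        "Jun  1 10:00:03 web1 sshd[411]: Failed password for invalid user admin from 10.0.0.5 port 4023 ssh2",
        "Jun  1 10:00:09 web1 sshd[412]: Accepted password for alice from 10.0.0.7 port 4100 ssh2",
        "Jun  1 10:01:15 web1 sshd[420]: Failed password for bob from 10.0.0.9 port 4200 ssh2",
    ]
    print(map_reduce(logs))
```

In a real cluster the map and reduce functions run on many nodes and the framework performs the shuffle across the network; the sketch keeps all three steps in one process only to show the data flow.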
Source: Cloudera
Source: Wikispaces
Source: Google
Source: Cloudera
Big Data Cloud
• Cloud-Based Big Data Solutions
  – PaaS
     • DBaaS
        – Amazon Web Services (AWS)
            » DynamoDB
            » SimpleDB
            » Relational Database Service (RDS): Oracle 11g / MySQL
        – Google App Engine
            » Datastore
        – Microsoft SQL Azure
        – Oracle Public Cloud: 11g
     • Processing
        –   AWS Elastic MapReduce (EMR)
        –   Google App Engine MapReduce: Mapper API
        –   Microsoft: Apache Hadoop for Azure
        –   IBM SmartCloud Enterprise on IBM InfoSphere BigInsights Basics
Big Data Cloud
• Cloud-Based Database Solutions
  – IaaS
     • Basic Components: Compute & Storage Nodes
           –   AWS Elastic Compute Cloud (EC2)
           –   AWS Elastic Block Store (EBS)
           –   OpenStack Compute (Nova)
           –   OpenStack Storage (Swift)
     • Advanced Components
           – Apache Hadoop
           – Apache Hadoop MapReduce
     • Commercial Applications
           –   Cloudera
           –   DataStax
           –   MapR
           –   Splunk
Big Data Cloud
[Diagram: AWS Cloud. EC2 instances with attached EBS volumes in an EC2 Availability Zone; EBS snapshots stored in S3 Storage; access over the Internet.]
Source: Amazon
Big Data Cloud
• Big Data in the Cloud Use Cases
  – Public Cloud
     •   AWS: EC2 Hadoop & S3
     •   AWS: EC2 Hadoop, DynamoDB & EMR
     •   AWS: EC2 Linux, Apache (w / Tomcat), DynamoDB & EMR
     •   AWS: EC2 Cloudera Hadoop & EMR
     •   AWS: EC2 Splunk
  – Hybrid
     • Oracle Big Data Appliance & Connector, Google App Engine
     • OpenStack Swift, AWS EC2 Cloudera Hadoop & EMR
  – Private Cloud
     • OpenStack Nova & Swift, Apache Hadoop
     • OpenStack Nova & Swift, Cloudera Hadoop
Big Data Cloud
Source: Flickr
Big Data Cloud
• Securing Cloud-Based NRDBMS Solutions
  – General
     • Focus on Application / Middleware-Level Security
        – Injection Attacks Are Still Possible (e.g., NoSQL Query / Operator Injection)
        – Leverage Application IAM for NRDBMS User Rights Mgmt (URM)
        – Leverage Application & System Logging for Authentication,
          Authorization & Accounting (AAA)
     • Segregation of Duties
        – Read / Write Namespaces
        – Read-Only Namespaces
  – Specific
     • Document
        – Consistency Assurance
     • Key / Value
        – Ensure Referential Integrity
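Several of these controls can be enforced in the application tier rather than in the database itself. The sketch below is a hypothetical Python wrapper (GuardedStore, AccessDenied, and the grant layout are invented for illustration) that combines read/write versus read-only namespaces, rejection of operator-style keys such as $where or $ne (a common NoSQL injection vector), audit logging of each access decision, and a basic referential-integrity check for key/value writes.

```python
from __future__ import annotations

import logging
from typing import Any

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("nrdbms.audit")  # accounting trail for AAA


class AccessDenied(Exception):
    """Raised when a user lacks the right for a namespace."""


class GuardedStore:
    """Hypothetical application-tier guard in front of an in-memory key/value store."""

    def __init__(self, grants: dict[str, dict[str, set[str]]]):
        # grants[user][namespace] -> {"read"} (read-only) or {"read", "write"}
        self._grants = grants
        self._data: dict[str, dict[str, Any]] = {}

    def _authorize(self, user: str, namespace: str, action: str) -> None:
        allowed = action in self._grants.get(user, {}).get(namespace, set())
        audit.info("user=%s ns=%s action=%s allowed=%s", user, namespace, action, allowed)
        if not allowed:
            raise AccessDenied(f"{user} may not {action} namespace {namespace}")

    @classmethod
    def _reject_operators(cls, value: Any) -> None:
        """Refuse '$'-prefixed keys (e.g. $where, $ne) anywhere in user-supplied data."""
        if isinstance(value, dict):
            for key, inner in value.items():
                if str(key).startswith("$"):
                    raise ValueError(f"operator key not allowed: {key}")
                cls._reject_operators(inner)
        elif isinstance(value, list):
            for inner in value:
                cls._reject_operators(inner)

    def put(self, user: str, namespace: str, key: str, value: Any,
            refs: tuple[str, ...] = ()) -> None:
        self._authorize(user, namespace, "write")
        self._reject_operators(value)
        # Referential integrity: every "namespace/key" reference must already exist.
        for ref in refs:
            ref_ns, _, ref_key = ref.partition("/")
            if ref_key not in self._data.get(ref_ns, {}):
                raise KeyError(f"dangling reference: {ref}")
        self._data.setdefault(namespace, {})[key] = value

    def get(self, user: str, namespace: str, key: str) -> Any:
        self._authorize(user, namespace, "read")
        return self._data.get(namespace, {}).get(key)
```

With grants such as {"analyst": {"orders": {"read"}}}, an analyst can read orders but any write is refused and logged, which is the read-only namespace pattern in miniature.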
Big Data Cloud
• Securing Big Data in the Cloud
  – Identity & Access Management (IAM)
     • Security Assertion Markup Language (SAML)
     • Representational State Transfer (REST)
        – AWS IAM
        – Windows Azure Access Control Service (ACS)
     • Web Services – Trust Language (WS-Trust)
Source: OASIS
Source: Intuit
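In AWS, whether a caller arrives through SAML federation or a signed REST API request, the permission decision comes down to evaluating IAM policies. The Python sketch below builds an IAM-style read-only policy for one DynamoDB table and applies a simplified version of the documented evaluation order: an explicit Deny wins, otherwise a matching Allow grants access, otherwise access is implicitly denied. The account ID, region, and table name are hypothetical, and real AWS evaluation also considers conditions, resource-based policies, permission boundaries, and more.

```python
from fnmatch import fnmatchcase

# Hypothetical account, region and table; the action names are real DynamoDB actions.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"

READ_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:Get*", "dynamodb:BatchGetItem", "dynamodb:Query", "dynamodb:Scan"],
            "Resource": TABLE_ARN,
        },
        {"Effect": "Deny", "Action": "dynamodb:Delete*", "Resource": "*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified IAM-style evaluation: explicit Deny > Allow > implicit deny.

    Note: this sketch matches wildcards case-sensitively; AWS treats action names
    case-insensitively.
    """
    allowed = False
    for stmt in policy["Statement"]:
        action_match = any(fnmatchcase(action, pat) for pat in _as_list(stmt["Action"]))
        resource_match = any(fnmatchcase(resource, pat) for pat in _as_list(stmt["Resource"]))
        if not (action_match and resource_match):
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit deny always wins
        allowed = True
    return allowed  # no matching Allow means implicit deny
```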
Source: Apache
Big Data Cloud
• Securing Big Data in the Cloud
  – Electronic Discovery (eDiscovery)
     • eDiscovery Reference Model (EDRM)
     • Legal Holds
     • Litigation Response
  – Records & Information Management (RIM)
     •   Generally Accepted Recordkeeping Principles (GARP®)
     •   Information Governance Reference Model (IGRM)
     •   Information Lifecycle Management (ILM)
     •   MIKE2.0
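Legal holds and retention schedules interact in a simple but important way: a hold suspends disposal no matter what the schedule says. A minimal Python sketch of that disposition rule follows; the record classes, retention periods, and matter IDs are hypothetical and do not come from GARP, IGRM, or any specific regulation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    record_class: str
    created: date
    legal_holds: set[str] = field(default_factory=set)  # matter IDs placing a hold


# Hypothetical retention schedule: record class -> minimum retention period.
RETENTION_SCHEDULE = {
    "web_log": timedelta(days=90),
    "invoice": timedelta(days=7 * 365),
}


def disposition(record: Record, today: date) -> str:
    """Return 'hold', 'review', 'retain', or 'dispose' for one record."""
    if record.legal_holds:
        return "hold"  # a legal hold suspends disposal regardless of schedule
    period = RETENTION_SCHEDULE.get(record.record_class)
    if period is None:
        return "review"  # unclassified records go to a records manager
    return "dispose" if today >= record.created + period else "retain"
```

Checking holds before the schedule is the key design choice: a record under litigation must never reach the "dispose" branch, even if its retention period has long expired.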
Big Data Cloud
• Privacy & Data Protection for Big Data Clouds
  – Jurisdictions*
     • Regional: EU DPA
     • National: PIPEDA, GLBA, HIPAA / HITECH, COPPA, Safe Harbor
     • State / Provincial: Bavarian, CA SB 1386 / SB 24, MA 201 CMR 17, NV SB 227
  – Data Flow & Jurisdictional Adherence
     • Data Sharing with Third Parties
         – Pseudonymization / De-Identification
     • Consent & Notices
  – Contract Clauses
     • Model Contracts
  – Privacy Best Practices
     • Generally Accepted Privacy Principles (GAPP)
  * Not all-inclusive.
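Pseudonymization before sharing data with third parties can be sketched with a keyed hash. The Python example below uses HMAC-SHA256 from the standard library, so the same identifier always maps to the same token (joins across data sets still work), while a recipient without the key cannot confirm a guessed value by recomputing its token, as it could with a plain unsalted hash. The field names and the choice of which fields to drop or tokenize are assumptions. Pseudonymized data may still count as personal data in many jurisdictions, so this complements rather than replaces the contractual and consent controls listed above.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed token: HMAC-SHA256 over a normalized identifier."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes,
               tokenize: tuple = ("email", "ssn"),
               drop: tuple = ("name",)) -> dict:
    """Drop some direct identifiers, tokenize others, pass the rest through."""
    out = {}
    for field_name, value in record.items():
        if field_name in drop:
            continue
        out[field_name] = pseudonymize(value, key) if field_name in tokenize else value
    return out


if __name__ == "__main__":
    secret = b"keep-this-key-with-the-data-controller"  # hypothetical key
    print(deidentify({"name": "Alice", "email": "Alice@Example.com", "zip": "19103"}, secret))
```

The key stays with the data controller; only the de-identified records leave the organization.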
Big Data Cloud
• Presentation Take-Aways
  – Big Data in the Cloud is Here to Stay
  – It Has to Be Secure
     • Segregation of Data
     • Access Controls
        – Separation / Segregation of Duties
        – Federated Identities
        – Logging
• Questions?
• Contact
  –   Email: steve@ncontrol-llc.com
  –   Twitter: markes1
  –   LI: http://www.linkedin.com/in/smarkey
  –   CSA-DelVal: http://www.csadelval.org/

Editor's Notes

  • #30 http://qugstart.com/blog/amazon-web-services/how-to-set-up-db-server-on-amazon-ec2-with-data-stored-on-ebs-drive-formatted-with-xfs/ Here’s the procedure I decided on. It involves symlinking MySQL config files and data directories onto the EBS volume. Another trick I used, because I needed to migrate about 20 GiB of data to get started, was to initially set up an “Extra Large” instance with 10 GiB of RAM to handle the data import. After the data was migrated and imported into my database, I simply terminated the Extra Large instance and spun up a small instance connected to the same EBS volume. All the databases were preserved, and I no longer had to pay for an Extra Large instance. This exemplifies the value of thinking in the “cloud” mindset, where you can spin servers up and down in a matter of seconds. Hope this article helps someone else out there!
  • #39 realm