Don't Let Security Be The 'Elephant in the Room'
Enterprise security for big data

Don't Let Security Be The 'Elephant in the Room' Presentation Transcript

  • 1. Don’t Let Security Be The ‘Elephant in the Room’ Enterprise Security for Big Data Mitch Ferguson, VP Business Development, Hortonworks Jeremy Stieglitz, VP Business Development, Voltage Security 8/5/2013
  • 2. © Hortonworks Inc. 2013 Hortonworks Community Driven Enterprise Apache Hadoop June 2013 Page 2
  • 3. © Hortonworks Inc. 2013 A Brief History of Apache Hadoop Page 3. Timeline milestones (2004, 2006, 2008, 2010, 2012, 2013): Apache project established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform. Focus on INNOVATION (2005: Yahoo! creates team under E14 to work on Hadoop). Focus on OPERATIONS (2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters). Focus on STABILITY (2011: Hortonworks created to focus on “Enterprise Hadoop“, starting with 24 key Hadoop engineers from Yahoo).
  • 4. © Hortonworks Inc. 2013 Hortonworks Snapshot Page 4 • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform • We engineer, test & certify HDP for enterprise usage • We employ the core architects, builders and operators of Apache Hadoop • We drive innovation within Apache Software Foundation projects • We are uniquely positioned to deliver the highest quality of Hadoop support • We enable the ecosystem to work better with Hadoop Develop Distribute Support We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution Endorsed by Strategic Partners Headquarters: Palo Alto, CA Employees: 200+ and growing Investors: Benchmark, Index, Yahoo
  • 5. © Hortonworks Inc. 2013 Enabling Hadoop as Enterprise Big Data Platform. Diagram elements: DEVELOPER; Data Platform Services & Open APIs; Hortonworks Data Platform; Applications, Business Tools, Development Tools, Open APIs and access; Data Movement & Integration, Data Management Systems, Systems Management; Installation & Configuration, Administration, Monitoring, High Availability, Replication, Multi-tenancy, ...; Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs. Page 5
  • 6. © Hortonworks Inc. 2013 Hortonworks Partner Eco-System 140+ Page 6
  • 7. © Hortonworks Inc. 2013 Page 7 Apache Software Foundation Guiding Principles • Release early & often • Transparency, respect, meritocracy Key Roles held by Hortonworkers • PMC Members – Managing community projects – Mentoring new incubator projects – Over 20 Hortonworkers managing community • Committers – Authoring, reviewing & editing code – Over 50 Hortonworkers across projects • Release Managers – Testing & releasing projects – Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase Apache Hadoop Test & Patch Design & Develop Release Apache Pig Apache HCatalog Apache HBase Other Apache Projects Apache Hive Apache Ambari “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon Apache Community Leadership
  • 8. © Hortonworks Inc. 2013 Leadership that Starts at the Core Page 8 • Driving next generation Hadoop – YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery • 420k+ lines authored since 2006 – More than twice nearest contributor • Deeply integrating w/ecosystem – Enabling new deployment platforms – (ex. Windows & Azure, Linux & VMware HA) – Creating deeply engineered solutions – (ex. Teradata big data appliance) • All Apache, NO holdbacks – 100% of code contributed to Apache
  • 9. © Hortonworks Inc. 2013 Hortonworks Process for Enterprise Hadoop Page 9 Upstream Community Projects Downstream Enterprise Product Hortonworks Data Platform Design & Develop Distribute Integrate & Test Package & Certify Apache HCatalog Apache Pig Apache HBase Other Apache Projects Apache Hive Apache Ambari Apache Hadoop Test & Patch Design & Develop Release Virtuous cycle: development and issue fixes happen upstream, and stable project releases flow downstream No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects Stable Project Releases Fixed Issues “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
  • 10. © Hortonworks Inc. 2013 Enhancing the Core of Apache Hadoop Deliver high-scale storage & processing with enterprise-ready platform services Unique Focus Areas: • Bigger, faster, more flexible Continued focus on speed & scale and enabling near-real-time apps • Tested & certified at scale Run ~1300 system tests on large Yahoo clusters for every release • Enterprise-ready services High availability, disaster recovery, snapshots, security, … Page 10 HADOOP CORE Hortonworkers are the architects, operators, and builders of core Hadoop Distributed Storage & Processing PLATFORM SERVICES Enterprise Readiness
  • 11. © Hortonworks Inc. 2013 Page 11 HADOOP CORE DATA SERVICES Provide data services to store, process & access data in many ways Unique Focus Areas: • Apache HCatalog Metadata services for consistent table access to Hadoop data • Apache Hive Explore & process Hadoop data via SQL & ODBC-compliant BI tools Distributed Storage & Processing Hortonworks enables Hadoop data to be accessed via existing tools & systems Store, Process and Access Data PLATFORM SERVICES Enterprise Readiness Data Services for Full Data Lifecycle
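    A minimal sketch of the SQL-style access this slide describes, using the standard HiveServer2 JDBC driver so that existing BI and reporting tools can reach Hadoop data. The host name, port, credentials, and the web_logs table are illustrative assumptions, not part of the deck:

      // Java: query a Hive table over JDBC (HiveServer2).
      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class HiveJdbcSketch {
          public static void main(String[] args) throws Exception {
              Class.forName("org.apache.hive.jdbc.HiveDriver");
              try (Connection conn = DriverManager.getConnection(
                       "jdbc:hive2://hive-server.example.com:10000/default", "analyst", "");
                   Statement stmt = conn.createStatement();
                   // "web_logs" is a hypothetical table registered through Hive/HCatalog.
                   ResultSet rs = stmt.executeQuery(
                       "SELECT status, COUNT(*) FROM web_logs GROUP BY status")) {
                  while (rs.next()) {
                      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                  }
              }
          }
      }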
  • 12. © Hortonworks Inc. 2013 Operational Services for Ease of Use Page 12 OPERATIONAL SERVICES Include complete operational services for productive operations & management Unique Focus Area: • Apache Ambari: Provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues Only Hortonworks provides a complete open source Hadoop management tool Manage & Operate at Scale DATA SERVICES Store, Process and Access Data HADOOP CORE Distributed Storage & Processing PLATFORM SERVICES Enterprise Readiness
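    A minimal sketch of the REST integration point mentioned above: reading the list of clusters that Ambari manages over its HTTP API. The host, the default port 8080, and the admin credentials are assumptions for illustration:

      // Java: call Ambari's REST API with HTTP basic auth.
      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.net.HttpURLConnection;
      import java.net.URL;
      import java.util.Base64;

      public class AmbariApiSketch {
          public static void main(String[] args) throws Exception {
              URL url = new URL("http://ambari.example.com:8080/api/v1/clusters");
              HttpURLConnection conn = (HttpURLConnection) url.openConnection();
              String auth = Base64.getEncoder()
                                  .encodeToString("admin:admin".getBytes("UTF-8"));
              conn.setRequestProperty("Authorization", "Basic " + auth);
              try (BufferedReader in = new BufferedReader(
                       new InputStreamReader(conn.getInputStream()))) {
                  String line;
                  while ((line = in.readLine()) != null) {
                      System.out.println(line); // JSON listing of registered clusters
                  }
              }
          }
      }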
  • 13. © Hortonworks Inc. 2013 Page 13 Only Hortonworks allows you to deploy seamlessly across any deployment option • Linux & Windows • Azure, Rackspace & other clouds • Virtual platforms • Big data appliances Deployable Across a Range of Options OS Cloud VM Appliance PLATFORM SERVICES HADOOP CORE Enterprise Readiness HORTONWORKS DATA PLATFORM (HDP) OPERATIONAL SERVICES DATA SERVICES Distributed Storage & Processing Manage & Operate at Scale Store, Process and Access Data
  • 14. © Hortonworks Inc. 2013 OS Cloud VM Appliance HDP: Enterprise Hadoop Distribution Page 14 PLATFORM SERVICES HADOOP CORE Enterprise Readiness HORTONWORKS DATA PLATFORM (HDP) OPERATIONAL SERVICES DATA SERVICES Hortonworks Data Platform (HDP) Enterprise Hadoop • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability Distributed Storage & Processing Manage & Operate at Scale Store, Process and Access Data
  • 15. © Hortonworks Inc. 2013 OS Cloud VM Appliance HDP: Enterprise Hadoop Distribution Page 15 PLATFORM SERVICES HADOOP CORE Enterprise Readiness High Availability, Disaster Recovery, Security and Snapshots HORTONWORKS DATA PLATFORM (HDP) OPERATIONAL SERVICES DATA SERVICES HIVE & HCATALOG PIG HBASE OOZIE AMBARI HDFS MAP REDUCE Hortonworks Data Platform (HDP) Enterprise Hadoop • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability SQOOP FLUME NFS LOAD & EXTRACT WebHDFS
  • 16. © Hortonworks Inc. 2013 OS/VM Cloud Appliance HDP: Enterprise Hadoop Distribution 2.0 Page 16 PLATFORM SERVICES HADOOP CORE Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots HORTONWORKS DATA PLATFORM (HDP) OPERATIONAL SERVICES DATA SERVICES HIVE & HCATALOG PIG HBASE HDFS MAP REDUCE YARN* TEZ* OTHER OOZIE AMBARI FALCON* KNOX* SQOOP FLUME NFS WebHDFS LOAD & EXTRACT Hortonworks Data Platform (HDP) Enterprise Hadoop • The ONLY 100% open source and complete distribution • Enterprise grade, proven and tested at scale • Ecosystem endorsed to ensure interoperability (*included in HDP 2.0)
  • 17. © Hortonworks Inc. 2013 Secure Hadoop Cluster: Apache Knox Gateway Page 17. Diagram: a browser, REST client and JDBC client reach the cluster through a Knox Gateway cluster (GW, GW, GW) sitting in the DMZ between firewalls, backed by an Enterprise Identity Provider and an Enterprise/Cloud SSO Provider; inside the secure Hadoop cluster are the masters (JT, NN, WebHCat, Oozie, Hive, HBase, HUE, Ambari Server, YARN) and the slaves (DN, TT).
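    A minimal sketch of the access pattern in this diagram: a REST client lists an HDFS directory through the Knox gateway in the DMZ instead of talking to cluster services directly. The gateway host, port, topology name ("default"), directory path, and credentials are assumptions, and the JVM is assumed to trust the gateway's TLS certificate; in a real deployment Knox would authenticate the call against the enterprise identity provider shown in the diagram:

      // Java: WebHDFS LISTSTATUS routed through the Apache Knox gateway.
      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.net.URL;
      import java.util.Base64;
      import javax.net.ssl.HttpsURLConnection;

      public class KnoxWebHdfsSketch {
          public static void main(String[] args) throws Exception {
              URL url = new URL(
                  "https://knox.example.com:8443/gateway/default/webhdfs/v1/data/raw?op=LISTSTATUS");
              HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
              String auth = Base64.getEncoder()
                                  .encodeToString("analyst:secret".getBytes("UTF-8"));
              conn.setRequestProperty("Authorization", "Basic " + auth);
              try (BufferedReader in = new BufferedReader(
                       new InputStreamReader(conn.getInputStream()))) {
                  String line;
                  while ((line = in.readLine()) != null) {
                      System.out.println(line); // WebHDFS FileStatuses JSON, relayed by Knox
                  }
              }
          }
      }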
  • 18. © Hortonworks Inc. 2013 Hadoop Common Patterns of Use. Diagram: Big Data (transactions, interactions, observations) flows into the Hortonworks Data Platform to serve business cases through three patterns (Refine, Explore, Enrich), with batch, interactive, and online “right-time” access to data. Page 18
  • 19. © Hortonworks Inc. 2013 Operational Data Refinery (Refine, Explore, Enrich): transform & refine ALL sources of data; also known as Data Reservoir or Catch Basin. Diagram: (1) Capture from data sources, both traditional (RDBMS, OLTP, OLAP) and new (web logs, email, sensor data, social media); (2) Process in the Hortonworks Data Platform; (3) Distribute & retain to data systems (traditional repos: RDBMS, EDW, MPP) and applications (business analytics, custom applications, enterprise applications). Page 19
  • 20. © Hortonworks Inc. 2013 Big Data Exploration & Visualization (Refine, Explore, Enrich): leverage the “data lake” to perform iterative investigation for value. Diagram: (1) Capture from traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media); (2) Process in the Hortonworks Data Platform; (3) Explore & visualize through business analytics, custom applications and enterprise applications alongside traditional repos (RDBMS, EDW, MPP). Page 20
  • 21. © Hortonworks Inc. 2013 Application Enrichment (Refine, Explore, Enrich): create intelligent applications by collecting data, creating analytical models and delivering them to online apps. Diagram: (1) Capture from traditional sources (RDBMS, OLTP, OLAP) and new sources (web logs, email, sensor data, social media); (2) Process & compute the model in the Hortonworks Data Platform; (3) Deliver to custom applications, enterprise applications and NoSQL stores alongside traditional repos (RDBMS, EDW, MPP). Page 21
  • 22. Don’t Let Security Be The ‘Elephant in the Room’ Enterprise Security for Big Data Jeremy Stieglitz
  • 23. Extracting Value from Data Big Data Now Includes Sensitive Data • Marketing – analyze purchase patterns • Social media – find best customer segments • Financial systems – model trading data • Banking and insurance – 360 customer view • Security – identify credit card fraud • Healthcare – advance disease prevention Copyright 2013 Voltage Security 23 How do you liberate the value in data – without increasing risk?
  • 24. Hidden Risks in Big Data Adoption. Big data enables deeper data analysis and more value from old data, but brings new risks if data is not protected. Data concentration risks: financial positions, market position, changes to the big picture, corporate compliance risk. Cloud adoption risks: sensitive data in untrusted systems; data in storage, in use, and transmitted to the cloud. Data sharing risks: compliance challenges with third-party risk; data in and out of the enterprise. Breach risks: internal users, external shares, backups, Hadoop stores, data feeds. Copyright 2013 Voltage Security 24
  • 25. Data Security Approaches. IT infrastructure security, shown along a security-coverage axis: full disk encryption, Transparent Database Encryption (TDE), SSL/TLS, and authentication and access control, with a security gap between each layer. Copyright 2013 Voltage Security 25
  • 26. Data Security Approaches. The same IT infrastructure security stack (full disk encryption, Transparent Database Encryption (TDE), SSL/TLS, authentication and access control) still leaves security gaps between layers. At one end of the coverage spectrum: more keys, more secure, less computation, application aware. At the other: fewer keys, less secure, more computation, transparent “check box” encryption available from cloud providers. Copyright 2013 Voltage Security 26
  • 27. Traditional IT Security vs. Data-Centric Security. Traditional IT infrastructure security (full disk encryption; Transparent Database Encryption (TDE) and triggers; SSL/TLS and firewalls; authentication and access control) leaves security gaps where data sits in the clear. Data-centric security works top down: application-layer data protection provides seamless end-to-end data security; data is encrypted once and persistently protected from the point of capture, in storage, in transit, and in use; if attacked, the data has no value. Copyright 2013 Voltage Security 27
  • 28. Requirements for Big Data Security 28 Lock data in place More keys to manage Horizontal support to wherever your data travels Copyright 2013 Voltage Security
  • 29. Data – structure, value, and meaning. Take a simple Tax ID. It’s more than just a number. • It has a format and structure • It has value in being unique • Its parts have value – e.g. the last 4 digits Copyright 2013 Voltage Security 29
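    A small sketch of that point: the value's structure can be validated, and its meaningful part (the last four digits) can be kept while the rest is hidden. The masking policy shown is only an illustration:

      // Java: a Tax ID has a shape worth validating and parts worth keeping.
      public class TaxIdSketch {
          public static void main(String[] args) {
              String taxId = "934-72-2356";
              boolean wellFormed = taxId.matches("\\d{3}-\\d{2}-\\d{4}"); // the format has meaning
              String lastFour = taxId.substring(taxId.length() - 4);      // the part analysts often need
              String masked = "XXX-XX-" + lastFour;                       // keep partial value, hide the rest
              System.out.println(wellFormed + " " + masked);
          }
      }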
  • 30. Traditional Encryption Practically Eliminates Value in the Data • Destroys the original value – makes data secure, but incompatible • Changes format of data – requires schema changes • Changes size of field – increases storage • Always requires application and data flow changes: “Ripping up the Roads” • Destroys any special encoding or checksums (Luhn checksum in credit cards, driver’s license checksums for certain states) 934-72-2356 Tax ID AES-CBC uE28W&=209gX32F*52 Encrypted Tax ID Copyright 2013 Voltage Security 30
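    A minimal sketch of the problem this slide describes: ordinary AES-CBC turns an 11-character, digits-and-dashes Tax ID into opaque binary that no longer fits the original schema. The key and IV below are throwaway values generated on the spot:

      // Java: conventional AES-CBC destroys the format of the protected field.
      import java.security.SecureRandom;
      import java.util.Base64;
      import javax.crypto.Cipher;
      import javax.crypto.KeyGenerator;
      import javax.crypto.SecretKey;
      import javax.crypto.spec.IvParameterSpec;

      public class AesCbcFormatSketch {
          public static void main(String[] args) throws Exception {
              SecretKey key = KeyGenerator.getInstance("AES").generateKey();
              byte[] iv = new byte[16];
              new SecureRandom().nextBytes(iv);

              Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
              cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
              byte[] ct = cipher.doFinal("934-72-2356".getBytes("UTF-8"));

              // 11 readable characters in, 16 opaque bytes out: schema, field length and
              // checksum expectations all break -- the "ripping up the roads" problem.
              System.out.println("ciphertext bytes: " + ct.length);
              System.out.println("Base64 form:      " + Base64.getEncoder().encodeToString(ct));
          }
      }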
  • 31. • Standard, proven mode of AES (NIST FFX mode – ask NIST) • Encrypt at capture. Data stays protected at all times • Fit into existing systems, protocols, schemas – any data • Enable operation on encrypted data – retains the value of the original data • Protect live data in applications & databases, business process or transactions • Create de-identified data for test, cloud apps, outsourcers • Can preserve validation checksums Voltage Format-Preserving Encryption™ (FPE) 31 Credit Card 934-72-2356 Tax ID Regular AES 8juYE%Uks&dDFa2345^WFLERG FPE 7412 3423 3526 0000 298-24-2356 Ija&3k24kQarotugDF2390^32 7412 3456 7890 0000 Copyright 2013 Voltage Security
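    The sketch below is only a toy illustration of the contract format-preserving encryption offers: digits in, digits out, separators and length unchanged, reversible with the key. It is not NIST FFX/FF1 and not Voltage FPE, and the key and tweak values are made up; a real implementation would use the Feistel-based NIST construction the slide references, which the slide notes can also preserve validation checksums:

      // Java: toy "format-preserving" transform using an HMAC-derived digit stream.
      import javax.crypto.Mac;
      import javax.crypto.spec.SecretKeySpec;

      public class FormatPreservingToy {
          static String transform(String value, byte[] key, String tweak, boolean encrypt)
                  throws Exception {
              Mac mac = Mac.getInstance("HmacSHA256");
              mac.init(new SecretKeySpec(key, "HmacSHA256"));
              byte[] stream = mac.doFinal(tweak.getBytes("UTF-8"));
              StringBuilder out = new StringBuilder();
              int d = 0; // index over digit positions; separators pass through untouched
              for (char c : value.toCharArray()) {
                  if (Character.isDigit(c)) {
                      int shift = (stream[d % stream.length] & 0xFF) % 10;
                      int digit = c - '0';
                      int res = encrypt ? (digit + shift) % 10 : (digit - shift + 10) % 10;
                      out.append((char) ('0' + res));
                      d++;
                  } else {
                      out.append(c);
                  }
              }
              return out.toString();
          }

          public static void main(String[] args) throws Exception {
              byte[] key = "demo-key-not-for-production".getBytes("UTF-8");
              String cipher = transform("934-72-2356", key, "taxid-column", true);
              String plain  = transform(cipher, key, "taxid-column", false);
              System.out.println(cipher + " -> " + plain); // same ddd-dd-dddd shape, fully reversible
          }
      }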
  • 32. Stateless Key Management 32 Keys when you need them, not when you don’t. • Keys derived on the fly • Simple - lower risk, lower cost • Scale to millions of users • Keys don’t stay resident • Standards Based • FPE/AES Symmetric keys • Structured and unstructured data Identity Based Encryption IEEE 1363.3 Copyright 2013 Voltage Security
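    A minimal sketch of the "keys derived on the fly" idea: instead of storing a key per user or per field, the key is recomputed from a protected master secret plus the requesting identity whenever it is needed, so no per-identity key database has to be stored, replicated, or backed up. The HMAC-based derivation below is a simplification for illustration; the IEEE 1363.3 identity-based schemes the slide cites use pairing-based public-key cryptography, not HMAC:

      // Java: derive a per-identity key on demand from a master secret.
      import javax.crypto.Mac;
      import javax.crypto.spec.SecretKeySpec;

      public class StatelessKeySketch {
          static byte[] deriveKey(byte[] masterSecret, String identity) throws Exception {
              Mac mac = Mac.getInstance("HmacSHA256");
              mac.init(new SecretKeySpec(masterSecret, "HmacSHA256"));
              return mac.doFinal(identity.getBytes("UTF-8")); // 256-bit key, never stored anywhere
          }

          public static void main(String[] args) throws Exception {
              byte[] master = "keep-this-in-an-hsm".getBytes("UTF-8"); // placeholder master secret
              byte[] k1 = deriveKey(master, "fraud-analytics@bank.example");
              byte[] k2 = deriveKey(master, "fraud-analytics@bank.example");
              // Same identity, same key, nothing key-shaped kept resident between calls.
              System.out.println(java.util.Arrays.equals(k1, k2));
          }
      }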
  • 33. High-performance Data Security 33 Voltage SecureData™ for Hadoop Hadoop ecosystem: ETL tools, HIVE, MapReduce jobs, other query and analysis tools Copyright 2013 Voltage Security
  • 34. Three Insertion Points into Hortonworks Data Platform (HDP) #1. Upon Ingest: APIs, CL, Batch tools for ETL, SQOOP, Streaming, etc. Copyright 2013 Voltage Security 34
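    A minimal sketch of insertion point #1: protect a sensitive column while a file is being loaded into HDFS, so the raw value never lands in the cluster; the same transformation could equally sit inside the ETL tool or streaming collector the slide mentions. The protect() helper, file paths, and column position are hypothetical placeholders for whichever field-level protection API is actually used:

      // Java: tokenize/encrypt a field during ingest, then write the record to HDFS.
      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.PrintWriter;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ProtectOnIngest {
          static String protect(String ssn) {
              return "TOKEN(" + ssn.substring(ssn.length() - 4) + ")"; // placeholder, not real crypto
          }

          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              try (FileSystem fs = FileSystem.get(conf);
                   BufferedReader in = new BufferedReader(new FileReader("customers.csv"));
                   PrintWriter out = new PrintWriter(
                       fs.create(new Path("/data/protected/customers.csv")))) {
                  String line;
                  while ((line = in.readLine()) != null) {
                      String[] cols = line.split(",");
                      cols[2] = protect(cols[2]);          // assume the SSN is the third column
                      out.println(String.join(",", cols));
                  }
              }
          }
      }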
  • 35. Three Insertion Points into Hortonworks Data Platform (HDP) #2. Executed as Map Job Copyright 2013 Voltage Security 35
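    A minimal sketch of insertion point #2: running the protection step as a map task over records that already sit in the cluster. The protect() helper and the assumed CSV layout are placeholders; a real job would plug in the vendor's protection library plus an appropriate driver and output path:

      // Java: map-only MapReduce task that rewrites a sensitive field in place.
      import java.io.IOException;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class ProtectFieldMapper
              extends Mapper<LongWritable, Text, NullWritable, Text> {

          private final Text out = new Text();

          private String protect(String ssn) {
              return "TOKEN(" + ssn.substring(ssn.length() - 4) + ")"; // placeholder, not real crypto
          }

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              String[] cols = value.toString().split(",");
              cols[2] = protect(cols[2]);                 // assume the SSN is the third column
              out.set(String.join(",", cols));
              context.write(NullWritable.get(), out);
          }
      }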
  • 36. Three Insertion Points into Hortonworks Data Platform (HDP) #3. UDFs for PIG, Hive, etc. Copyright 2013 Voltage Security 36
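    A minimal sketch of insertion point #3: wrapping the protection call as a Hive UDF so analysts can use it inline in queries (a Pig EvalFunc would follow the same shape). The class name and the placeholder evaluate() body are illustrative, not the vendor's actual UDF:

      // Java: Hive user-defined function that protects a sensitive value.
      import org.apache.hadoop.hive.ql.exec.UDF;
      import org.apache.hadoop.io.Text;

      public class ProtectSsnUdf extends UDF {
          public Text evaluate(Text ssn) {
              if (ssn == null) {
                  return null;
              }
              String s = ssn.toString();
              // placeholder, not real crypto: keep only the last four digits visible
              return new Text("TOKEN(" + s.substring(Math.max(0, s.length() - 4)) + ")");
          }
      }

    Once the jar holding this class is added to the Hive session, the function would be registered with CREATE TEMPORARY FUNCTION and applied to the sensitive column inside an ordinary SELECT.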
  • 37. Benefits of Voltage SecureData • Solves complex global compliance issues • Ensures data stays protected wherever it goes • Enables accurate analytics on encrypted data • Optimizes performance • Flexibly adapts to the fast-growing Hadoop ecosystem • Delivers maximum return on information – without increased risk Copyright 2013 Voltage Security 37
  • 38. Use Case: Fortune 50 Healthcare Products and Services Company • Challenge – Sell new information-based services to medical suppliers & drug companies – Big Data team tasked with securing 1000 node Hadoop cluster for HIPAA, HITECH • Solution – Data de-identified in ETL move before entering Hadoop – Ability to decrypt analytic results when needed, through multiple tools • Benefits – Ability to monetize existing medical data, and fine-tune manufacturing and marketing 8/5/2013 38 Copyright 2013 Voltage Security
  • 39. Use Case: Banking Top Worldwide Financial Institution • Challenge – Credit risk and consumer fraud groups – PCI compliance is #1 driver – ETL offload use case with Hadoop alongside DW • Solution – Integrate with Sqoop on ingestion, and Hive and Pig on the applications / query side to protect 20 types of data – Fraud analysts work with SST (Secure Stateless Tokenization) tokenized credit card numbers and only de-tokenize as needed • Benefits – Enable fraud and risk analytics directly in Hadoop on protected data – Use Hadoop processing with security and compliance for faster time to insight 8/5/2013 39 Copyright 2013 Voltage Security
  • 40. 40 Contacts - Hortonworks - http://hortonworks.com/ - http://hortonworks.com/partners/certified-technology-program/ - USA: (855) 8-HORTON (1 for sales) - Intl: (408) 916-4121 (1 for sales) - Voltage Security - http://www.voltage.com/ - http://www.voltage.com/partners/technology-partners/hortonworks/ - Tel: +1 (408) 886-3200 - techpartners@voltage.com Copyright 2013 Voltage Security
  • 41. THANK YOU