Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache Knox

This presentation was included in a 30-minute webinar with Balaji Ganesan, Hortonworks senior director for enterprise security strategy, and Vinay Shukla, director of product management.

They discussed the features of Hortonworks Data Platform (HDP) 2.2 that deliver comprehensive security.

Balaji and Vinay discussed Apache Ranger and Apache Knox and how the two are integrated in HDP 2.2 to provide centrally administered fine-grained authorization, auditing, and API security.

Published in: Software

  1. Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger & Apache Knox
     © Hortonworks Inc. 2014
     Hortonworks. We do Hadoop.
  2. Speakers
     • Justin Sears, Hortonworks Product Marketing Manager
     • Vinay Shukla, Hortonworks Director of Product Management
     • Balaji Ganesan, Hortonworks Senior Director of Enterprise Security Strategy
  3. Agenda
     • Overview of Security in HDP 2.2:
         • Centralized security with Apache Ranger
         • API security with Apache Knox
     • Demo
     • Q & A
     We’ll move quickly:
     • Attendee phone lines are muted
     • Text any questions to Vinay Shukla using Webex chat
     • Questions answered at the end
     • Unanswered questions and answers in upcoming blog post
  4. Big Data, Hadoop & Data Center Re-platforming
     Business Drivers
     • From reactive analytics to proactive interactions
     • Insights that drive competitive advantage & optimal returns
     Financial Drivers
     • Cost of data systems, as % of IT spend, continues to grow
     • Cost advantages of commodity hardware & open source software
     Technical Drivers
     • Data is growing exponentially & existing systems overwhelmed
     • Predominantly driven by NEW types of data that can inform analytics
     There is an inequitable balance between vendor and customer in the market
  5. Hadoop Value: New Types of Data
     • Clickstream: Capture and analyze website visitors’ data trails and optimize your website
     • Sensors: Discover patterns in data streaming automatically from remote sensors and machines
     • Server Logs: Research logs to diagnose process failures and prevent security breaches
     • Sentiment: Understand how your customers feel about your brand and products – right now
     • Geographic: Analyze location-based data to manage operations where they occur
     • Unstructured: Understand patterns in files across millions of web pages, emails, and documents
  6. A Shift from Reactive to Proactive Interactions
     HDP and Hadoop allow organizations to use data to shift interactions from reactive (post transaction) to proactive (pre decision):
     • Advertising: from mass branding to 1x1 targeting
     • Financial Services: from educated investing to automated algorithms
     • Healthcare: from mass treatment to designer medicine
     • Retail: from static branding to real-time personalization
     • Telco: from break then fix to repair before break
  7. Enterprise Goals for the Modern Data Architecture
     • Consolidate siloed data sets, structured and unstructured
     • Central data set on a single cluster
     • Multiple workloads across batch, interactive and real time
     • Central services for security, governance and operation
     • Preserve existing investment in current tools and platforms
     • Single view of the customer, product, supply chain
     [Diagram labels: Sources (Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured); CRM, ERP, Other; Data System (RDBMS, EDW, MPP; YARN: Data Operating System; HDFS (Hadoop Distributed File System); Batch, Interactive, Real-Time); Applications (Business Analytics, Custom Applications, Packaged Applications)]
  8. YARN Transformed Hadoop & Opened a New Era
     YARN: The Architectural Center of Hadoop
     • Common data platform, many applications
     • Support multi-tenant access & processing
     • Batch, interactive & real-time use cases
     [Diagram labels: Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  9. YARN Extends Hadoop to Other Data Center Leaders
     YARN: The Architectural Center of Hadoop
     • Common data platform, many applications
     • Support multi-tenant access & processing
     • Batch, interactive & real-time use cases
     • Supports 3rd-party ISV tools (ex. SAS, Syncsort, Actian, etc.)
     YARN Ready Applications
     • Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing “YARN Ready” solutions
     [Diagram labels: Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  10. Enterprise Hadoop: Central Set of Services
      Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:
      • Governance: load data and manage according to policy
      • Operations: deploy and effectively manage the platform
      • Security: provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
      Everything that plugs into Hadoop inherits these services
      [Diagram labels: Governance; Security; Operations: Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie); Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  11. Hortonworks Development Investment for the Enterprise: Vertical Integration with YARN and HDFS
      • Ensure engines can run reliably and respectfully in a YARN based cluster
      • Implement features throughout the stack to accommodate
      [Diagram labels: Governance: load data and manage according to policy; Security: provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection; Operations: deploy and effectively manage the platform, Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie); Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  12. Hortonworks Development Investment for the Enterprise: Horizontal Integration for Enterprise Services
      • Ensure consistent enterprise services are applied across the entire Hadoop stack
      • Integrate with and extend existing data center solutions for these key requirements
      [Diagram labels: Governance: load data and manage according to policy; Security: provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection; Operations: deploy and effectively manage the platform, Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie); Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System)]
  13. HDP Delivers Enterprise Hadoop: Hortonworks Data Platform 2.2
      YARN is the architectural center of HDP
      • Common data set across all applications
      • Batch, interactive & real-time workloads
      • Multi-tenant access & processing
      Provides comprehensive enterprise capabilities
      • Governance
      • Security
      • Operations
      Enables broad ecosystem adoption
      • ISVs can plug directly into Hadoop
      The widest range of deployment options
      • Linux & Windows
      • On premises & cloud
      [Diagram labels: Governance: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS); Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; Security: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox); Operations: Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie); YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System); Deployment Choice: Linux, Windows, On-Premises, Cloud]
  14. HDP Delivers Enterprise Hadoop: Hortonworks Data Platform 2.2
      YARN is the architectural center of HDP
      • Common data set across all applications
      • Batch, interactive & real-time workloads
      • Multi-tenant access & processing
      Provides comprehensive enterprise capabilities
      • Governance
      • Security
      • Operations
      Enables broad ecosystem adoption
      • ISVs can plug directly into Hadoop
      The widest range of deployment options
      • Linux & Windows
      • On premises & cloud
      [Diagram labels: Governance: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS); Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez; Slider; Operations: Provision, Manage & Monitor (Ambari, Zookeeper), Scheduling (Oozie); YARN: Data Operating System (Cluster Resource Management); HDFS (Hadoop Distributed File System); Security: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox); Deployment Choice: Linux, Windows, On-Premises, Cloud]
  15. Apache Ranger for Centralized Security
  16. Apache Ranger (formerly Apache Argus): Central Security Administration, Authorization and Auditing for Hadoop
  17. Central Security Administration
      • Delivers a ‘single pane of glass’ for the security administrator
      • Centralizes administration of security policy
      • Ensures consistent coverage across the entire Hadoop stack
  18. Set Up Authorization Policies
      • File level access control with flexible definition
      • Control user and group permissions
  19. Monitor User Activity with Auditing
  20. New Apache Ranger Features in HDP 2.2
      New Components in Centralized Administration
      • Apache Storm Authorization & Auditing
      • Apache Knox Authorization & Auditing
      Deeper Integration with the Hadoop Stack
      • Windows Support
      • Integration with the new Hive authorization API, with support for grant/revoke commands
      • Support for grant/revoke commands in HBase
      Enterprise Readiness
      • REST APIs for policy manager
      • Store audit logs locally in HDFS
  21. About Apache Knox: API Security for Hadoop
  22. Knox: Securely Share the Data Lake w/ Many Users
      • Securely extends the reach of Hadoop APIs to anyone on any device
      • Serves as a gateway for Hadoop’s REST API
      • Different REST APIs have varying levels of authentication, authorization, SSL and SSO capabilities
      • For enterprise authentication, applies enterprise capabilities to all REST APIs: IDM Integration, SSO, OAuth, SAML
      • Avoids exposing the cluster port and hostnames to all users
  23. Extend Hadoop API Reach with Knox
      [Diagram labels: Application Tier (App A, App B, App C, App N); Business User; REST/HTTP; JDBC/ODBC; Load Balancer; Knox; Hadoop Cluster; Data Ingest, ETL (Falcon, Oozie, Sqoop, Flume); Data Operator; Admin/Operator; Hadoop Admin; Bastion Node; SSH; RPC Call]
  24. New Apache Knox Features in HDP 2.2
      • Use Ambari for Knox install, start, stop and configuration
      • New support for:
          • YARN REST API
          • HDFS HA
          • SSL to Hadoop cluster services (WebHDFS, HBase, Hive and Oozie)
          • Knox Management REST API
      • Integration with Apache Ranger for service level authorization
  25. Demo
  26. Q & A
  27. Thank you!
      • Learn more at: hortonworks.com/labs/security/
      • Register for the remaining 7 Discover HDP 2.2 Webinars: Hortonworks.com/webinars
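
Slide 18 describes file-level access control with user and group permissions. The idea can be sketched in a few lines of Python. This is only an illustration of path-based policy evaluation: the `Policy` class, its fields, and the `is_allowed` function are hypothetical names invented for this example and do not reflect Apache Ranger's actual policy model or API.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Policy:
    """Hypothetical policy: a path pattern plus per-user and per-group permissions."""
    path_pattern: str
    user_perms: dict = field(default_factory=dict)   # user name  -> set of permissions
    group_perms: dict = field(default_factory=dict)  # group name -> set of permissions


def is_allowed(policies, user, groups, path, perm):
    """Return True if any policy matching the path grants perm to the user or one of the groups."""
    for policy in policies:
        # Skip policies whose wildcard pattern does not cover the requested path.
        if not fnmatchcase(path, policy.path_pattern):
            continue
        if perm in policy.user_perms.get(user, set()):
            return True
        if any(perm in policy.group_perms.get(g, set()) for g in groups):
            return True
    # No matching policy granted the permission: deny by default.
    return False


# Example data (made up): alice may read and write finance files, analysts may only read them.
policies = [
    Policy(
        "/data/finance/*",
        user_perms={"alice": {"read", "write"}},
        group_perms={"analysts": {"read"}},
    ),
]
```

With these example policies, a member of `analysts` can read `/data/finance/report.csv` but cannot write to it, and any path outside `/data/finance/` is denied because no policy matches.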
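
Slide 22 describes Knox as a gateway for Hadoop's REST APIs that avoids exposing cluster ports and hostnames to users. The sketch below contrasts a direct WebHDFS URL with the same request routed through a gateway. It assumes the commonly documented Knox URL layout `https://{gateway-host}:{port}/gateway/{topology}/webhdfs/v1/{path}`; the host names, ports, and the topology name `default` are placeholder values, and the code only builds URLs without contacting any server.

```python
from urllib.parse import urlencode


def direct_webhdfs_url(namenode_host, namenode_port, path, op):
    """URL a client would use when talking to WebHDFS on the NameNode directly."""
    query = urlencode({"op": op})
    return f"http://{namenode_host}:{namenode_port}/webhdfs/v1{path}?{query}"


def knox_webhdfs_url(gateway_host, gateway_port, topology, path, op):
    """Same WebHDFS operation addressed to the gateway (assumed URL layout)."""
    query = urlencode({"op": op})
    return f"https://{gateway_host}:{gateway_port}/gateway/{topology}/webhdfs/v1{path}?{query}"


# Placeholder hosts and ports: the client only needs to know the gateway address.
print(direct_webhdfs_url("namenode.internal.example.com", 50070, "/tmp", "LISTSTATUS"))
print(knox_webhdfs_url("knox.example.com", 8443, "default", "/tmp", "LISTSTATUS"))
```

In the second form the client never sees the NameNode's host name or port, which is the point the slide makes about not exposing cluster details.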
