Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Hortonworks and Red Hat Webinar - Part 2

2,186 views

Published on

Learn more about creating reference architectures that optimize the delivery the Hortonworks Data Platform. You will hear more about Hive, JBoss Data Virtualization Security, and you will also see in action how to combine sentiment data from Hadoop with data from traditional relational sources.

Published in: Software
  • Be the first to comment

Hortonworks and Red Hat Webinar - Part 2

  1. 1. We’ll get started soon… Q&A box is available for your questions Webinar will be recorded for future viewing Thank you for joining! Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  2. 2. Evolving your data into strategic asset …using HDP and Red Hat JBoss Data Virtualization Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved We do Hadoop.
  3. 3. Your speakers… Raghu Thiagarajan, Director, Partner Product Management, Hortonworks Kimberly Palko, Principal Product Manager, Red Hat Kenny Peeples, Principal Technical Marketing Manager, Red Hat Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  4. 4. Clickstream Capture and analyze website visitors’ data trails and optimize your website Sensors Discover patterns in data streaming automatically from remote sensors and machines Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Server Logs Research logs to diagnose process failures and prevent security breaches Hadoop Value: New types of data Sentiment Understand how your customers feel about your brand and products – right now Geographic Analyze location-based data to manage operations where they occur Unstructured Understand patterns in files across millions of web pages, emails, and documents
  5. 5. HDP 2.1: Enterprise Hadoop HDP 2.1 Hortonworks Data Platform Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Provision, Manage & Monitor Ambari Zookeeper Scheduling Oozie Data Workflow, Lifecycle & Governance Falcon Sqoop Flume NFS WebHDFS YARN : Data OperaFng System DATA MANAGEMENT GOVERNANCE & DATA ACCESS SECURITY INTEGRATION AuthenFcaFon AuthorizaFon AccounFng Data ProtecFon Storage: HDFS Resources: YARN Access: Hive, … Pipeline: Falcon Cluster: Knox OPERATIONS Script Pig Search Solr SQL Hive/Tez, HCatalog NoSQL HBase Accumulo Stream Storm Others In-­‐Memory AnalyNcs, ISV engines 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N HDFS (Hadoop Distributed File System) Batch Map Reduce Deployment Choice Linux Windows On-Premise Cloud
  6. 6. HDP is deeply integrated in the data center YARN Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved DEV & DATA TOOLS OPERATIONAL TOOLS INFRASTRUCTURE SOURCES EXISTING Systems Clickstream Web &Social GeolocaFon Sensor & Machine Server Logs Unstructured DATA SYSTEM RDBMS EDW MPP HANA APPLICATIONS BusinessObjects BI HDP 2.1 Governance & Integration Security Operations Data Access Data Management • Enables millions of JBoss developers to quickly build applications with Hadoop • Simplifies deployment of Hadoop on OpenStack • Develops and deploys Apache Hadoop as integrated components of the open modern data architecture
  7. 7. Modern Data Architecture + Red Hat Data Virtualization Extract and Refine • Easily combine data from multiple sources without moving or copying data • Use any reporting or analytical tool NFS DBs FiFFleilisel es s JMS Queue’s Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Database Server Application AMBARI MAPREDUCE YARN HDFS REST DATA REFINEMENT PIG HIVE CUSTOM HTTP STREAM LOAD SQOOP FLUME WebHDFS SOURCE DATA - Sensor Logs - Clickstream - Flat Files - Unstructured - Sentiment - Customer - Inventory INTERACTIVE HIVE Server2
  8. 8. Hive creates Structure on Raw Sentiment Data Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  9. 9. Hive: Interactive SQL-IN-Hadoop Stinger Initiative – DELIVERED Next generation SQL based interactive query in Hadoop Speed Improve Hive query performance has increased by 100X to allow for interactive query times (seconds) Scale The only SQL interface to Hadoop designed for queries that scale from TB to PB SQL Support broadest range of SQL semantics for analytic applications running against Hadoop Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Governance & Integration Security Operations HDP 2.1 Data Access Data Management Business AnalyFcs SQL Custom Apps Apache Hive Apache MapReduce Apache Tez Apache YARN 1 ° ° ° HDFS ° ° ° ° (Hadoop Distributed File System) ° ° ° ° An Open Community at its finest: Apache Hive Contribution 1,672 Jira Tickets Closed 145 Developers 44 Companies ° ° N ~330,000 Lines Of Code Added… (2.5x) SFnger Project SFnger Phase 1: • Base OpNmizaNons • SQL Types • SQL AnalyNc FuncNons • ORCFile Modern File Format SFnger Phase 2: • SQL Delivered Types • SQL AnalyNc FuncNons • Advanced OpNmizaNons • Performance Boosts via YARN SFnger Phase 3 • Hive on Apache Tez • Query Service (always on) • Buffer Cache • Cost Based OpNmizer (OpNq) 13 Months
  10. 10. Federation: Regulatory requirements drive geo-specific clusters NFS NFS Refine Explore Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved DV Dashboard to analyze the aggregated data by User Role Consume Compose Connect JBoss Data Virtualization Store Web Catalog AMBARI MAPREDUCE YA RN HDFS REST DATA REFINEMENT PIG HIVE CUSTOM HTTP STREAM LOAD SQOOP FLUME WebHDFS AMBARI MAPREDUCE YA RN HDFS REST DATA REFINEMENT PIG HIVE CUSTOM HTTP STREAM LOAD SQOOP FLUME WebHDFS SOURCE 1: Hive/Hadoop in the HDP contains US Region Data SOURCE 2: Hive/Hadoop in the HDP contains EU Region Data INTERACTIVE HIVE Server2 INTERACTIVE HIVE Server2
  11. 11. Customer data with confidentiality requirements Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  12. 12. Red Hat JBoss Data Virtualization and Hortonworks HDP Kimberly Palko Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  13. 13. Engineering Collaboration Benefits Integration with JBoss Data Virtualization Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Enable agile Big Data Hadoop integration with existing enterprise assets and maximize universal data utilization to enable self-service analytics Integration with multiple Red Hat JBoss Middleware product family Enables millions of JBoss developers to quickly build applications with Hadoop Integration with Red Hat Storage Enables Hadoop to use Red Hat Storage secure resilient storage pool for data applications Integration with Red Hat Enterprise Linux OpenStack Platform Simplifies automated deployment of Hadoop on OpenStack Integrated with Red Hat Enterprise Linux and OpenJDK Develop and deploy Apache Hadoop as an integrated component for multiple deployment scenarios
  14. 14. Data Supply and Integration Solution Data Virtualization sits in front of multiple data sources and ü allows them to be treated a single source ü delivering the desired data ü in the required form ü at the right time ü to any application and/or user. THINK VIRTUAL MACHINE FOR DATA Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  15. 15. Easy Access to Big Data Hive Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved • Reporting tool accesses the data virtualization server via rich SQL dialect • The data virtualization server translates rich SQL dialect to HiveQL • Hive translates HiveQL to MapReduce • MapReduce runs MR job on big data MapReduce HDFS Analytical Reporting Tool Data Virtualization Server Hadoop Big Data
  16. 16. Different Users Different Views of Big Data Hive Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved • Logical tables with different forms of aggregation • Logical tables containing extra derived data • Logical tables with filtered data • All reports/users share the same specifications MapReduce HDFS
  17. 17. Caching For Faster Performance – Virtual View Query 1 Query 2 Virtual Database (VDB) Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Cached View 1 View 1 • Same cached view for multiple queries • Refreshed automatically or manually • Cache repository can be any supported data source
  18. 18. Caching for Faster Performance – Result Set Query 1 Query 1 Virtual Database (VDB) Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Result Set Cache View 1 • Results for a single query are cached after first execution • Each unique query has its own cache
  19. 19. Demonstration Combining sentiment data from Hadoop with data from traditional relational Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved sources Kenny Peeples
  20. 20. Use Case 1 - Overview Objective: -Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales Problem: -Cannot utilize social data and sentiment analysis with sales management system Solution: -Leverage JBoss Data Virtualization to mashup Sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data. Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Excel Powerview and DV Dashboard to analyze the aggregated data Consume Compose Connect JBoss Data Virtualization Hive SOURCE 1: Hive/Hadoop contains twi[er data including senFment SOURCE 2: MySQL data that includes Fcket and merchandise sales
  21. 21. Use Case 1 - Architecture APPLICATIONS DATA SYSTEM Business AnalyFcs RDBMS EDW MPP TRADITIONAL REPOSITORIES Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Custom ApplicaFons Packaged ApplicaFons VIRTUAL DATA MART
  22. 22. Use Case 1 - Resources • GUIDE How to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase1 Tutorial: http://hortonworks.com/hadoop-tutorial/evolving-data-stratagic-asset-using-hdp-red-hat-jboss-data- virtualization/ • VIDEOS: http://vimeo.com/user16928011/hortonworksusecase1short http://vimeo.com/user16928011/hortonworksusecase2short • SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase1 Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  23. 23. JBoss Data Virtualization Security Kimberly Palko Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  24. 24. Role based access control Roles • Define roles based on organization hierarchy Users • External authentication via Kerberos, LDAP, etc. VDB • Assign users and groups to a virtual data base Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  25. 25. Authentication Kerberos From client to the virtual data base New in Data Virtualization 6.1: Kerberos authentication to the data source Login Modules LDAP (MS Active Directory, OpenLDAP, etc.), any JAAS based security domain REST and Web Services WS-UsernameToken HTTP Basic authentication SAML SAML authentication for web client applications Page 25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  26. 26. Audit Logging via Dashboard Page 26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  27. 27. Row and Column Masking Page 27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved - Row based masking Ex: keyed off geographic marker - Column masking to a constant, null, or a SQL statement Example: change all but the Last 4 digits in a credit card number to stars concat('****', substring(column, length(column)-4))
  28. 28. Summary of Security Capabilities • Authentication – Kerberos, LDAP, WS-UsernameToken, HTTP Basic, SAML • Authorization – Virtual data views, Role based access control • Administration – Centralized management of VDB privileges • Audit – Centralized audit logging and dashboard • Protection – Row and column masking – SSL encryption (ODBC and JDBC) Page 28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  29. 29. Demonstration Geographically Distributed Hadoop Clusters with Data Virtualization - Securing Data by User Role Kenny Peeples Page 29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  30. 30. Use Case 2 - Overview Objective: -Secure data according to Role for row level security and Column Masking Problem: -Cannot hide region data from region specific users Solution: -Leverage JBoss Data Virtualization to provide Row Level Security and Masking of columns Page 30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  31. 31. Use Case 2 - Architecture APPLICATIONS DATA SYSTEM Business AnalyFcs Page 31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Custom ApplicaFons Packaged ApplicaFons VIRTUAL DATA MART
  32. 32. Use Case 2 - Resources • GUIDE How to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase2 Tutorial: Available soon • VIDEOS: http://vimeo.com/user16928011/hortonworksusecase2short http://vimeo.com/user16928011/hortonworksusecase2short • SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase2 Page 32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  33. 33. Benefits of JBoss Data Virtualization with Hortonworks HDP 2.1 • Combines new data in Hadoop with data in traditional data sources without moving or copying data • Gives access to a variety of BI and analytics tools • Provides caching for faster access to data • Provides consistent security policy across multiple data sources Page 33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  34. 34. Thank you! Page 34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Hortonworks and Red Hat JBoss Data Virtualization
  35. 35. Next Steps... More about Red Hat & Hortonworks http://hortonworks.com/partner/redhat Download the Hortonworks Sandbox Learn Hadoop Build Your Analytic App Try Hadoop 2 Contact us: events@hortonworks.com Page 35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  36. 36. Don’t Forget to Register for our Next Webinar! Page 36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved September 17th, 10 AM PST Red Hat JBoss Data Virtualization and Hortonworks Data Platform http://info.hortonworks.com/RedHatSeries_Hortonworks.html

×