Introduction to Hortonworks Data Platform for Windows


According to IDC, Windows Servers run more than 50% of the servers in the Enterprise Data Center. Hortonworks has worked closely with Microsoft to port Apache Hadoop to Windows to enable organizations to take advantage of this emerging Big Data technology. Join us in this informative webinar to hear about the new Hortonworks Data Platform for Windows.

In less than an hour, you’ll learn:

-Key capabilities available in Hortonworks Data Platform for Windows
-How HDP for Windows integrates with Microsoft tools
-Key workloads and use cases for driving Hadoop today

  • For the visual thinkers out there, let’s expand our mathematical model to show some concrete examples. ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases. Interactions are about how people and things interact with each other or with your business. Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content are classic places to find Interaction data. Observational data tends to come from the “Internet of Things”. Sensors for heat, motion, and pressure, along with RFID and GPS chips within such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data. Most folks would agree that video is “big” data. The analysis of what’s happening in that video (i.e., what you, me, and others are doing in the video) may not be “big”, but it is valuable and it does fit under our umbrella. Moreover, business data feeds and publicly available data sets are also “big data”, so we should not limit our thinking to just data that flows through an organization. For example, the mortgage-related data you may have COULD benefit from being blended with external data found in Zillow. The government has the Open Data Initiative, which means that more and more data is being made publicly available. One of the use cases I find interesting is Predictive Policing, where state and local law enforcement apply analytics to crime databases and other publicly available data to help predict where and when pockets of crime might spring up. These proactive analytics efforts have yielded real reductions in crime! Anyhow, this is what Big Data means to me… hopefully it makes sense to you.
  • , an amount that exceeds previous forecasts by 5 ZBs, resulting in a 50-fold growth from the beginning of 2010
  • At its core, Hadoop is about HDFS and MapReduce, two projects for distributed storage and data processing that are the underpinnings of Hadoop. In addition to Core Hadoop, we must identify and include the requisite “Platform Services” that are central to any piece of enterprise software. These include High Availability, Disaster Recovery, Security, etc., which enable use of the technology for a much broader (and mission-critical) problem set. This is accomplished not by introducing new open source projects, but rather by ensuring that these aspects are addressed within existing projects.
      • HDFS: Self-healing, distributed file system for multi-structured data; breaks files into blocks & stores them redundantly across the cluster
      • MapReduce: Framework for running large data processing jobs in parallel across many nodes & combining results
      • YARN: New application management framework that enables Hadoop to go beyond MapReduce apps
      • Enterprise-ready services: High availability, disaster recovery, snapshots, security, …
  • In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services, and Operational Services required by the Enterprise user. And all of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. And finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  • Not only is all of this backed by the architects, developers and operators of Hadoop, but it is also assisted by a world-class support team. With backgrounds from IBM, Oracle, MySQL and more, the team provides 24x7 support together with very mature support processes to ensure high-quality customer service and responsiveness.
  • Additionally, we are a leading provider of Hadoop support through our Hortonworks University, with courses for both development and operations. If required, we can also provide expert consulting services, either ourselves or through our System Integrator partners. And for anyone looking to get their hands on Hadoop, we have recently introduced the Hadoop Sandbox program, which enables users to download a full instance of HDP together with guided tutorials covering both development and administration topics.
  • At Hortonworks today, our focus is very clear: we Develop, Distribute and Support a 100% open source distribution of Enterprise Apache Hadoop. We employ the core architects, builders and operators of Apache Hadoop and drive the innovation in the open source community. We distribute the only 100% open source Enterprise Hadoop distribution: the Hortonworks Data Platform. Given our operational expertise of running some of the largest Hadoop infrastructure in the world at Yahoo, our team is uniquely positioned to support you. Our approach is also uniquely endorsed by some of the biggest vendors in the IT market.
      • Yahoo is both an investor and a customer, and most importantly, a development partner. We partner to develop Hadoop, and no distribution of HDP is released without first being tested on Yahoo’s infrastructure, using the same regression suite that they have used for years as they grew to have the largest production cluster in the world.
      • Microsoft has partnered with Hortonworks to include HDP in both their off-premise offering on Azure and their on-premise offering, under the product name HDInsight. This also includes integration with Visual Studio for application development and with System Center for operational management of the infrastructure.
      • Teradata includes HDP in their products in order to provide the broadest possible range of options for their customers.
  • HDP on Windows, HDP server on Windows, HD on Azure. MS customer that wants to leverage familiar Windows tools (System Center). Work with it like in Linux; bring your own scripts. What they will get and when they will get it. Integration with MS tooling. MS customer: gives them choice because the underpinning infrastructure bits are the same. So get started today. Driver is an ISV app that is vertical in nature and needs a choice to deploy on Windows today. Field positioning.
  • Beyond Core and Platform Services, we must add a set of Data Services that enable the full data lifecycle. This includes capabilities to store, process, and access data. For example: how do we maintain the consistent metadata required to determine how best to query data stored in HDFS? The answer: a project called Apache HCatalog. Or how do we access data stored in Hadoop from SQL-oriented tools? The answer: with projects such as Hive, which is the de facto standard for accessing data stored in HDFS. All of these are broadly captured under the category of “data services”.
      • Apache HCatalog: Metadata & Table Management
          – Metadata service that enables users to access Hadoop data as a set of tables without needing to be concerned with where or how their data is stored
          – Enables consistent data sharing and interoperability across data processing tools such as Pig, MapReduce and Hive
          – Enables deep interoperability and data access with systems such as Teradata, SQL Server, etc.
      • Apache Hive: SQL Interface for Hadoop
          – The de facto SQL-like interface for Hadoop that enables data summarization, ad-hoc query, and analysis of large datasets
          – Connects to Excel, Microstrategy, PowerPivot, Tableau and other leading BI tools via the Hortonworks Hive ODBC Driver
          – Hive currently serves batch and non-interactive use cases; in 2013, Hortonworks is working with the Hive community to extend use cases to interactive query. Cloudera, on the other hand, has chosen to abandon Hive in favor of Cloudera Impala (a Cloudera-controlled technology aimed at the analytics market and solely focused on non-operational interactive query use cases)
      • Apache HBase: NoSQL DB for Interactive Apps
          – Non-relational, columnar database that provides a way for developers to create, read, update, and delete data in Hadoop in a way that performs well for interactive applications
          – Commonly used for serving “intelligent applications” that predict user behavior, detect shifting usage patterns, or recommend ways for users to engage
      • WebHDFS: Web service interface for HDFS
          – Scalable REST API that enables easy access to HDFS
          – Move files in & out and delete from HDFS; leverages parallelism of the cluster
          – Perform file and directory functions
          – webhdfs://<HOST>:<HTTP PORT>/PATH
          – Included in versions 1.0 and 2.0 of Hadoop; created & driven by Hortonworkers
      • Talend Open Studio for Big Data: open source ETL tool available as an optional download with HDP
          – Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
          – Oozie scheduling allows you to manage and stage jobs
          – Connectors for any database, business application or system
          – Integrated HCatalog storage
  • Any data management platform that is operated at any reasonable scale requires a management technology: for example, SQL Server Management Studio for SQL Server, or Oracle Enterprise Manager for Oracle DB. Hadoop is no exception, and for Hadoop that means Apache Ambari, which is increasingly being recognized as foundational to the operation of Hadoop infrastructures. It allows users to provision, manage and monitor a cluster and provides a set of tools to visualize and diagnose operational issues. There are other projects in this category (such as Oozie), but Ambari is really the most influential.
      • Apache Ambari: Management & Monitoring
          – Makes Hadoop clusters easy to operate
          – Simplified cluster provisioning with a step-by-step install wizard
          – Pre-configured operational metrics for insight into the health of Hadoop services
          – Visualization of job and task execution for visibility into performance issues
          – Complete RESTful API for integrating with existing operational tools
          – Intuitive user interface that makes controlling a cluster easy and productive
  • It is for that reason that we focus on HDP interoperability across all of these categories:
      • Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more
      • BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, Microstrategy, Business Objects and more
      • Development tools: For .NET developers, Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop. For Java developers, Spring for Apache Hadoop makes it quick and easy to build Hadoop-based applications with HDP
      • Operational tools: Integration with System Center, and with Teradata Viewpoint
  • Rohit: Can you provide three bullet points of your demo?
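
The notes above describe MapReduce as a framework for running large data processing jobs in parallel across many nodes and combining the results. As a rough, single-machine sketch of that idea (this is plain Python, not Hadoop's actual API, and the names map_phase, shuffle, reduce_phase and word_count are illustrative assumptions), a word count might look like this:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on Windows", "Hadoop on Windows", "big clusters"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run on many nodes against blocks stored in HDFS; here everything runs in one process purely to show how the three steps fit together.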

Transcript

  • 1. Quick House Keeping Rule
      • Q&A panel is available if you have any questions during the webinar
      • There will be time for Q&A at the end
      • We will record the webinar for future viewing
      • All attendees will receive a copy of the slides and recording
      Page 1 © Hortonworks Inc. 2013
  • 2. Introducing Hortonworks Data Platform for Windows: Enterprise Apache Hadoop for Windows Environments. March 2013
  • 3. Our Speakers
      • John Kreisa, VP, Strategic Marketing
      • Saptak Sen, Sr. Product Manager
      • Rohit Bakshi, Product Manager
  • 4. Agenda
      • Why Hadoop on Windows?
      • Hortonworks Data Platform for Windows
      • Microsoft - Big Data and Apache Hadoop
      • Hortonworks Data Platform under the covers
      • Q&A
  • 5. Polling Question: Where are you with Hadoop?
      __ We are running it in production
      __ We have it running in our labs
      __ We are just investigating Hadoop
      __ What is Hadoop?
  • 7. Why Apache Hadoop on Windows?
      • According to IDC, Windows Server held 73% market share in 2012
          – Hadoop was traditionally built for Linux servers, so there are a large number of underserved organizations
      • Apache Hadoop: de facto platform for processing massive amounts of unstructured data
          – Complementary to existing Microsoft technologies
          – There is a huge untapped community of Windows developers and ecosystem partners
      • A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
  • 8. What Makes Up Big Data? Transactions + Interactions + Observations = BIG DATA
      [Diagram. Vertical axis, data volume by tier: Megabytes (ERP), Gigabytes (CRM), Terabytes (WEB), Petabytes (BIG DATA). Horizontal axis: Increasing Data Variety and Complexity. Example data types shown: Purchase detail, Purchase record, Payment record, Segmentation, Offer details, Offer history, Customer Touches, Support Contacts, Web logs, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Audio, Images, Speech to Text, Product/Service Logs, Social Interactions & Feeds, Business Data Feeds, User Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates]
  • 9. Big Data: Big and Getting Bigger Fast!
      • Unstructured data growth exceeds 80% year/year in most enterprises
          – Machine-generated data is a key driver in data growth
      • IDC projects digital universe will reach 40 zettabytes (ZB) by 2020
          – 1 ZB = 1,000,000,000,000 GBs!
          – Projected to increase 15x by 2020
      • According to 2012 Barclays CIO study, big data outranks virtualization as #1 spending initiative
      *2012 IDC Digital Universe Study
  • 10. Enter Apache Hadoop
      The core of the next generation data platform… OSS that delivers high-scale storage & processing with enterprise-ready platform services
      [Diagram: HADOOP CORE (HDFS, MAP REDUCE); PLATFORM SERVICES (Enterprise Readiness)]
      Hortonworkers are the original architects, operators, and builders of core Hadoop
  • 12. Introducing HDP for Windows
      Hortonworks Data Platform (HDP) For Windows
      • 100% Open Source Enterprise Hadoop
      • Component and version compatible with Microsoft HDInsight
      • Availability: Beta release available now; GA early 2Q 2013
      [Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
  • 13. Hortonworks Data Platform for Windows
      HDP: the first and only distribution available on Windows & Linux
      • Enterprise-grade Apache Hadoop on Windows
          – Enables same experience for Hadoop on Windows & Linux
      • More partners, more developers for Hadoop
          – Makes native Apache Hadoop available to Windows ecosystem
          – More options for Windows focused organizations
      • Hortonworks focus: Enterprise Apache Hadoop for all platforms
          – Trusted, reliable, production-ready distribution for on-premise Hadoop on Windows deployments
      • Built with joint investment and contributions from Microsoft
          – Deep engineering relationship ensures tight integration and maximum performance
  • 14. Hortonworks: Best In Class Hadoop Support
      • Experienced enterprise support team
          – Experience supporting enterprise clients in production
          – Core engineers have real operational experience: built and supported 44+K nodes in production
          – Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere
      • Global 24x7 operation
          – Support based in Sunnyvale, UK & India
      • Stringent case management processes ensure high quality customer service & responsiveness
  • 15. Transferring Our Hadoop Expertise to You
      The expert source for Apache Hadoop training & certification
      • World class training programs designed to help you learn fast
          – Role-based hands on classes with 50% lab time
          – New HDP on Windows course
      • Expert consulting services
          – Programs designed to transfer knowledge
      • Industry leading Hadoop Sandbox program
          – Fastest way to learn Apache Hadoop
          – Multi-level tutorials for wide applicability
          – Customizable and updateable
  • 16. Hortonworks Snapshot
      We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
      Headquarters: Palo Alto, CA
      Employees: 180+ and growing
      Investors: Benchmark, Index, Yahoo
      Develop
      • We employ the core architects, builders and operators of Apache Hadoop
      • We drive innovation within Apache Software Foundation projects
      Distribute
      • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
      • We engineer, test & certify HDP for enterprise usage
      Support
      • We are uniquely positioned to deliver the highest quality of Hadoop support
      • We enable the ecosystem to work better with Hadoop
      Endorsed by Strategic Partners
  • 18. Microsoft Big Data
      Microsoft Big Data
          – Simplifies data management for IT
          – Enables IT and users to easily enrich their data with the world’s data, and
          – Delivers agility to end users through familiar tools like Excel
  • 19. Microsoft End-To-End Big Data Platform
  • 21. Enhancing the Core of Apache Hadoop
      Deliver high-scale storage & processing with enterprise-ready platform services
      Unique Focus Areas:
      • Bigger, faster, more flexible: Continued focus on speed & scale and enabling near-real-time apps
      • Tested & certified at scale: Run ~1300 system tests on large clusters for every release
      • Enterprise-ready services: High availability, disaster recovery, snapshots, security, …
      [Diagram: HADOOP CORE (HDFS, WEBHDFS, MAP REDUCE); PLATFORM SERVICES (Enterprise Readiness)]
      Hortonworkers are the architects, operators, and builders of core Hadoop
  • 22. Data Services for Full Data Lifecycle
      Provide data services to store, process & access data in many ways
      Unique Focus Areas:
      • Apache HCatalog: Metadata services for consistent table access to Hadoop data
      • Apache Hive: Explore & process Hadoop data via SQL & ODBC-compliant BI tools
      [Diagram: DATA SERVICES (SQOOP, PIG, HIVE, HCATALOG); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
      Hortonworks enables Hadoop data to be accessed via existing tools & systems
  • 25. Operational Services for Ease of Use
      Include complete operational services for productive operations & management
      • Apache Oozie: Manage and schedule job execution for Hadoop jobs
      [Diagram: OPERATIONAL SERVICES (Oozie); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
      Only Hortonworks provides a complete open source Hadoop management tool
  • 26. Inside HDP for Windows
      Hortonworks Data Platform (HDP) For Windows
      • 100% Open Source Enterprise Hadoop
      • Component and version compatible with Microsoft HDInsight
      • Availability: Beta release available now; GA early 2Q 2013
      [Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale): Oozie; DATA SERVICES (Store, Process and Access Data): HIVE, PIG, SQOOP, HCATALOG; HADOOP CORE (Distributed Storage & Processing): HDFS, WEBHDFS, MAP REDUCE; PLATFORM SERVICES]
  • 27. Seamless Interoperability with Your Microsoft Tools
      • Integrated with Microsoft tools for native big data analysis
          – Bi-directional connectors for SQL Server and SQL Azure through SQOOP
          – Excel ODBC integration through Hive
      • Addressing demand for Hadoop on Windows
          – Ideal for Windows customers with Hadoop operational experience
      • Enables all common Hadoop workloads
          – Data refinement and ETL offload for high-volume data landing
          – Data exploration for discovery of new business opportunities
      [Diagram, top to bottom: APPLICATIONS (Microsoft Applications); DATA SYSTEMS (HORTONWORKS DATA PLATFORM For Windows); DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP) and New Sources (web logs, email, sensor data, social media); remaining labels as extracted: OLTP, MOBILE POS DATA SYSTEMS]
  • 28. Demo Time! Excel integration with HDP
      • Interact with HDP through Excel
      • Use Data Explorer to explore and turn raw data into valuable information
  • 29. Maximize Your Hadoop Deployment Choice
      • Use HDP for Windows for on-premises deployment on Windows Server
          – Ideal for Windows users with Hadoop experience
          – Perfect next step for those who are ready to move from POC to production
      • Use HDInsight for Microsoft tooling and Management and Provisioning
          – HDInsight Service that offers full benefit of Windows Azure (e.g. elasticity & low cost); available in Preview today
          – HDInsight Server for full integration of Hadoop with Microsoft tools on premises; Developer Preview available today
      • Full interoperability and deployment choice across platforms
          – Implement big data applications that run on-premise & cloud
          – Leveraging open source HDP enables seamless interoperability across environments: Linux, Windows, Windows Azure
  • 30. Next Steps
      • Download Hortonworks Sandbox: www.hortonworks.com/sandbox
      • Download Hortonworks Data Platform for Windows (Beta): www.hortonworks.com/download
      • Follow… @hortonworks, @hortonworks_U
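
For anyone who wants to try the WebHDFS component listed on slides 21 and 26, the slide notes give its URI form as webhdfs://<HOST>:<HTTP PORT>/PATH. Over plain HTTP, the same resource is addressed as http://<HOST>:<HTTP PORT>/webhdfs/v1/<PATH>?op=<OPERATION>. The Python sketch below only builds such URLs and does not contact a cluster; the helper name webhdfs_url, the host name and the port 50070 are illustrative assumptions, so substitute the values for your own NameNode.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build the HTTP URL for one WebHDFS REST operation.

    host, port and path are supplied by the caller; this helper is a
    hypothetical convenience and makes no network request.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation name and any extra parameters go in the query string.
    query = urlencode({"op": op, **params})
    # quote() leaves "/" intact and percent-encodes spaces and similar characters.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Assumed host and port, for illustration only.
    print(webhdfs_url("namenode.example.com", 50070, "/user/demo/logs", "LISTSTATUS"))
    print(webhdfs_url("namenode.example.com", 50070, "/user/demo/part 1.txt", "OPEN", offset=0))
```

A URL built this way can then be fetched with any HTTP client, which is what lets tools outside the Java ecosystem perform the file and directory functions mentioned in the notes.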