This document discusses how Hortonworks Data Platform (HDP) can enable enterprises to build a modern data architecture centered around Hadoop. It describes how HDP provides a centralized platform for managing all types of data at scale using technologies like YARN. Case studies are presented showing how companies have used HDP to optimize costs, develop new analytics applications, and work towards creating a unified "data lake". The document outlines the key components of HDP including its support for any application, any data, and deployment anywhere. It also highlights how partners extend HDP's capabilities and how Hortonworks provides enterprise-grade support.
YARN Ready: Integrating to YARN with Tez Hortonworks
YARN Ready webinar series helps developers integrate their applications to YARN. Tez is one vehicle to do that. We take a deep dive including code review to help you get started.
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
Hortonworks and Platfora in Financial Services - WebinarHortonworks
Big Data Analytics is transforming how banks and financial institutions unlock insights, make more meaningful decisions, and manage risk. Join this webinar to see how you can gain a clear understanding of the customer journey by leveraging Platfora to interactively analyze the mass of raw data that is stored in your Hortonworks Data Platform. Our experts will highlight use cases, including customer analytics and security analytics.
Speakers: Mark Lochbihler, Partner Solutions Engineer at Hortonworks, and Bob Welshmer, Technical Director at Platfora
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
Your Big Data strategy is only as good as the quality of your data. Today, deriving business value from data depends on how well your company can capture, cleanse, integrate and manage data. During this webinar, we discussed how to eliminate the challenges to Big Data management inside Hadoop.
Go over these slides to learn:
· How to use the scalability and flexibility of Hadoop to drive faster access to usable information across the enterprise.
· Why a pure-YARN implementation for data integration, quality and management delivers competitive advantage.
· How to use the flexibility of RedPoint and Hortonworks to create an enterprise data lake where data is captured, cleansed, linked and structured in a consistent way.
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
As the enterprise's big data program matures and Apache Hadoop becomes more deeply embedded in critical operations, the ability to support and operate it efficiently and reliably becomes increasingly important. To aid enterprise in operating modern data architecture at scale, Red hat and Hortonworks have collaborated to integrate Hortonworks Data Platform with Red Hat's proven platform technologies. Join us in this interactive 3-part webinar series, as we'll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data.
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
This document discusses optimizing a traditional enterprise data warehouse (EDW) architecture with Hortonworks Data Platform (HDP). It provides examples of how HDP can be used to archive cold data, offload expensive ETL processes, and enrich the EDW with new data sources. Specific customer case studies show cost savings ranging from $6-15 million by moving portions of the EDW workload to HDP. The presentation also outlines a solution model and roadmap for implementing an optimized modern data architecture.
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
The document discusses new features in Apache Hive 0.14 that improve SQL query performance. It introduces a cost-based optimizer that can optimize join orders, enabling faster query times. An example TPC-DS query is shown to demonstrate how the optimizer selects an efficient join order based on statistics about table and column sizes. Faster SQL queries are now possible in Hive through this query optimization capability.
Big Data Expo 2015 - Hortonworks Common Hadoop Use CasesBigDataExpo
When evaluating Apache Hadoop organizations often identifiy dozens of use cases for Hadoop but wonder where do you start? With hundreds of customer implementations of the platform we have seen that successful organizations start small in scale and small in scope. Join us in this session as we review common deployment patterns and successful implementations that will help guide you on your journey of cost optimization and new analytics with Hadoop.
YARN Ready: Integrating to YARN with Tez Hortonworks
YARN Ready webinar series helps developers integrate their applications to YARN. Tez is one vehicle to do that. We take a deep dive including code review to help you get started.
Join Cloudian, Hortonworks and 451 Research for a panel-style Q&A discussion about the latest trends and technology innovations in Big Data and Analytics. Matt Aslett, Data Platforms and Analytics Research Director at 451 Research, John Kreisa, Vice President of Strategic Marketing at Hortonworks, and Paul Turner, Chief Marketing Officer at Cloudian, will answer your toughest questions about data storage, data analytics, log data, sensor data and the Internet of Things. Bring your questions or just come and listen!
Hortonworks and Platfora in Financial Services - WebinarHortonworks
Big Data Analytics is transforming how banks and financial institutions unlock insights, make more meaningful decisions, and manage risk. Join this webinar to see how you can gain a clear understanding of the customer journey by leveraging Platfora to interactively analyze the mass of raw data that is stored in your Hortonworks Data Platform. Our experts will highlight use cases, including customer analytics and security analytics.
Speakers: Mark Lochbihler, Partner Solutions Engineer at Hortonworks, and Bob Welshmer, Technical Director at Platfora
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
Your Big Data strategy is only as good as the quality of your data. Today, deriving business value from data depends on how well your company can capture, cleanse, integrate and manage data. During this webinar, we discussed how to eliminate the challenges to Big Data management inside Hadoop.
Go over these slides to learn:
· How to use the scalability and flexibility of Hadoop to drive faster access to usable information across the enterprise.
· Why a pure-YARN implementation for data integration, quality and management delivers competitive advantage.
· How to use the flexibility of RedPoint and Hortonworks to create an enterprise data lake where data is captured, cleansed, linked and structured in a consistent way.
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
As the enterprise's big data program matures and Apache Hadoop becomes more deeply embedded in critical operations, the ability to support and operate it efficiently and reliably becomes increasingly important. To aid enterprise in operating modern data architecture at scale, Red hat and Hortonworks have collaborated to integrate Hortonworks Data Platform with Red Hat's proven platform technologies. Join us in this interactive 3-part webinar series, as we'll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data.
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
This document discusses optimizing a traditional enterprise data warehouse (EDW) architecture with Hortonworks Data Platform (HDP). It provides examples of how HDP can be used to archive cold data, offload expensive ETL processes, and enrich the EDW with new data sources. Specific customer case studies show cost savings ranging from $6-15 million by moving portions of the EDW workload to HDP. The presentation also outlines a solution model and roadmap for implementing an optimized modern data architecture.
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
The document discusses new features in Apache Hive 0.14 that improve SQL query performance. It introduces a cost-based optimizer that can optimize join orders, enabling faster query times. An example TPC-DS query is shown to demonstrate how the optimizer selects an efficient join order based on statistics about table and column sizes. Faster SQL queries are now possible in Hive through this query optimization capability.
Big Data Expo 2015 - Hortonworks Common Hadoop Use CasesBigDataExpo
When evaluating Apache Hadoop organizations often identifiy dozens of use cases for Hadoop but wonder where do you start? With hundreds of customer implementations of the platform we have seen that successful organizations start small in scale and small in scope. Join us in this session as we review common deployment patterns and successful implementations that will help guide you on your journey of cost optimization and new analytics with Hadoop.
Hortonworks, Novetta and Noble Energy Webinar Hortonworks
This document summarizes a webinar on enhancing security threat assessment presented by representatives from Noble Energy, Hortonworks, and Novetta. The webinar discussed Noble Energy's use of Hadoop and data analytics to gain insights into evolving security threats, provide critical information to decision makers, and safeguard operations across its global assets. It also described Novetta's security threat assessment solution which collects and analyzes online, subscription, and social media data using advanced profiles and an investigative interface. Finally, the webinar addressed Noble Energy's journey to becoming a more data-driven organization and the operational and strategic benefits as well as next steps in enhanced analytics.
Learn how when an organizations combine HP and Vertica Analytics Platform and Hortonworks, they can quickly explore and analyze broad variety of data types to transform to actionable information that allows them to better understand how their customers and site visitors interact with their business, offline and online.
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premise, this session will also provide an overview of the key scenarios and benefits of joining your on-premise Hadoop implementation with the cloud, by doing backup/archive, dev/test or bursting. Learn how you can get the benefits of an on-premise Hadoop that can seamlessly scale with the power of the cloud.
This webinar series covers Apache Kafka and Apache Storm for streaming data processing. Also, it discusses new streaming innovations for Kafka and Storm included in HDP 2.2
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Hortonworks
Accelerate Big Data Application Development with Cascading and HDP, webinar hosted by Hortonworks and Concurrent. Visit Hortonworks.com/webinars to access the recording.
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
The document discusses a Big Data Meetup organized by C-BAG (Chennai Big Data Analytic Group) on October 29, 2014 in Chennai. It provides details about two speakers, Dhruv Kumar from Concurrent Inc. and Vinay Shukla from Hortonworks, who will discuss reducing development time for production-grade Hadoop applications and Hortonworks' Hadoop platform respectively. The remainder of the document consists of presentation slides that cover topics including the modern data architecture with Hadoop, enterprise goals for data architecture, unlocking applications from new data types, and case studies.
This document summarizes a webinar presented by Hortonworks and Sqrrl on using big data analytics for cybersecurity. It discusses how the growth of data sources and targeted attacks require new security approaches. A modern data architecture with Hadoop can provide a common platform to analyze all security-related data and gain new insights. Sqrrl's linked data model and analytics run on Hortonworks to help investigate security incidents like a network breach, mapping different data sources and identifying abnormal activity patterns.
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
This document discusses using Hadoop and the Hortonworks Data Platform (HDP) for big data applications. It outlines how HDP can help organizations optimize their existing data warehouse, lower storage costs, unlock new applications from new data sources, and achieve an enterprise data lake architecture. The document also discusses how Talend's data integration platform can be used with HDP to easily develop batch, real-time, and interactive data integration jobs on Hadoop. Case studies show how companies have used Talend and HDP together to modernize their data architecture and product inventory and pricing forecasting.
1) The webinar covered Apache Hadoop on the open cloud, focusing on key drivers for Hadoop adoption like new types of data and business applications.
2) Requirements for enterprise Hadoop include core services, interoperability, enterprise readiness, and leveraging existing skills in development, operations, and analytics.
3) The webinar demonstrated Hortonworks Apache Hadoop running on Rackspace's Cloud Big Data Platform, which is built on OpenStack for security, optimization, and an open platform.
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
How can you simplify the management and monitoring of your Hadoop environment? Ensure IT can focus on the right business priorities supported by Hadoop? Take a look at this presentation and learn how you can simplify the management and monitoring of your Hadoop environment, and ensure IT can focus on the right business priorities supported by Hadoop.
This document discusses the author's 10 year journey with Hadoop, from 2006 to 2016. It describes the evolution of key Hadoop technologies like HDFS, MapReduce, YARN and the addition of engines for SQL, NoSQL, streaming and in-memory processing. The document also addresses trends around growth of data from devices, users and the internet of things. It presents a vision of the future where Hadoop (YARN.next) will assemble and securely operate a flexible menu of data access applications and engines.
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
Many enterprises are turning to Apache Hadoop to enable Big Data Analytics and reduce the costs of traditional data warehousing. Yet, it is hard to succeed when 80% of the time is spent on moving data and only 20% on using it. It’s time to swap the 80/20! The Big Data experts at Attunity and Hortonworks have a solution for accelerating data movement into and out of Hadoop that enables faster time-to-value for Big Data projects and a more complete and trusted view of your business. Join us to learn how this solution can work for you.
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
Real Time Monitoring requires a high scalable infrastructure of message bus, database, distributed event processing and scalable analytics engine. By bringing together leading open source projects of Apache Kafka, Apache HBase, Apache Storm and Apache Hive, the Hortonworks Data Platform offers a comprehensive Real Time Analysis platform. In this session, we will provide an in-depth overview all the key technology components and demonstrate a working solution for monitoring a fleet of trucks.
Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=0278dc8aa49a9991e1ce436c71f53d30
see the recording: http://youtu.be/qdhF1sfef10
Ofer Medelvitch, Director of Data Science of Hortonworks and Michael Zeller, Founder and CEO of Zementis present key learnings as to what drives successful implementations of big data analytics projects. Their knowledge comes from working with dozens of companies from small cloud-based start-ups to some of the largest companies in the world.
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
Enhancing a customer experience has become essential for communication service providers to effectively manage customer churn and build a strong, long lasting relationship with their customers. This has become increasingly challenging as customer interactions occur across multiple channels. Understanding customer behavior and how it applies across channels is the key to ensuring the best level of experience is achieved by each customer.
In this webinar Hortonworks and Apigee discuss how service providers can capture and visualize customer behavior across customer interaction points like call center events (IVR and chat) and combine it with network data, to predict customer calls and patterns of digital channel abandonment using Hadoop and predictive analysis and visualization tools..
We will identify ways to develop a 360 degree view across a customer’s household through an HDP Data Lake and visualize customer interaction patterns and predict expected behavior using Apigee Insights to identify and initiate the Next-Best-Action for a customer to ensure a superior level of customer experience.
Hortonworks Yarn Code Walk Through January 2014Hortonworks
This slide deck accompanies the Webinar recording YARN Code Walk through on Jan. 22, 2014, on Hortonworks.com/webinars under Past Webinars, or
https://hortonworks.webex.com/hortonworks/lsr.php?AT=pb&SP=EC&rID=129468197&rKey=b645044305775657
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
An organization’s information is spread across multiple repositories, on-premise and in the cloud, with limited ability to correlate information and derive insights. The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization.
- Leverage 100% of your data: Text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven (powered by HP IDOL and HP Vertica), making it possible to integrate this valuable content and insights into various line of business applications.
- Democratize and enable multi-dimensional content analysis: - Empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease, using the 100% open source Hortonworks Data Platform.
- Extend the enterprise data warehouse: Synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
- Dramatically reduce complexity with enterprise-ready SQL engine: Tap into the richest analytics that support JOINs, complex data types, and other capabilities only available with HP Vertica SQL on the Hortonworks Data Platform.
Speakers:
- Ajay Singh, Director, Technical Channels, Hortonworks
- Will Gardella, Product Management, HP Big Data
In 2012, we released Hortonworks Data Platform powered by Apache Hadoop and established partnerships with major enterprise software vendors including Microsoft and Teradata that are making enterprise ready Hadoop easier and faster to consume. As we start 2013, we invite you to join us for this live webinar where Shaun Connolly, VP of Strategy at Hortonworks, will cover the highlights of 2012 and the road ahead in 2013 for Hortonworks and Apache Hadoop.
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
This document provides information about using Scalding on Tez. It begins with prerequisites for using Scalding on Tez, including having a YARN cluster, Cascading 3.0, and the TEZ runtime library in HDFS. It then discusses setting memory and Java heap configuration flags for Tez jobs in Scalding. The document provides a mini-tutorial on using Scalding on Tez, covering build configuration, job flags, and challenges encountered in practice like Guava version mismatches and issues with Cascading's Tez registry. It also presents a word count plus example Scalding application built to run on Tez. The document concludes with some tips for debugging Tez jobs in Scalding using Cascading's
http://hortonworks.com/hadoop/spark/
Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=03debab5ba04b34a033dc5c2f03c7967
As the ratio of memory to processing power rapidly evolves, many within the Hadoop community are gravitating towards Apache Spark for fast, in-memory data processing. And with YARN, they use Spark for machine learning and data science use cases along side other workloads simultaneously. This is a continuation of our YARN Ready Series, aimed at helping developers learn the different ways to integrate to YARN and Hadoop. Tools and applications that are YARN Ready have been verified to work within YARN.
Hortonworks, Novetta and Noble Energy Webinar Hortonworks
This document summarizes a webinar on enhancing security threat assessment presented by representatives from Noble Energy, Hortonworks, and Novetta. The webinar discussed Noble Energy's use of Hadoop and data analytics to gain insights into evolving security threats, provide critical information to decision makers, and safeguard operations across its global assets. It also described Novetta's security threat assessment solution which collects and analyzes online, subscription, and social media data using advanced profiles and an investigative interface. Finally, the webinar addressed Noble Energy's journey to becoming a more data-driven organization and the operational and strategic benefits as well as next steps in enhanced analytics.
Learn how when an organizations combine HP and Vertica Analytics Platform and Hortonworks, they can quickly explore and analyze broad variety of data types to transform to actionable information that allows them to better understand how their customers and site visitors interact with their business, offline and online.
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premise, this session will also provide an overview of the key scenarios and benefits of joining your on-premise Hadoop implementation with the cloud, by doing backup/archive, dev/test or bursting. Learn how you can get the benefits of an on-premise Hadoop that can seamlessly scale with the power of the cloud.
This webinar series covers Apache Kafka and Apache Storm for streaming data processing. Also, it discusses new streaming innovations for Kafka and Storm included in HDP 2.2
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Hortonworks
Accelerate Big Data Application Development with Cascading and HDP, webinar hosted by Hortonworks and Concurrent. Visit Hortonworks.com/webinars to access the recording.
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
The document discusses a Big Data Meetup organized by C-BAG (Chennai Big Data Analytic Group) on October 29, 2014 in Chennai. It provides details about two speakers, Dhruv Kumar from Concurrent Inc. and Vinay Shukla from Hortonworks, who will discuss reducing development time for production-grade Hadoop applications and Hortonworks' Hadoop platform respectively. The remainder of the document consists of presentation slides that cover topics including the modern data architecture with Hadoop, enterprise goals for data architecture, unlocking applications from new data types, and case studies.
This document summarizes a webinar presented by Hortonworks and Sqrrl on using big data analytics for cybersecurity. It discusses how the growth of data sources and targeted attacks require new security approaches. A modern data architecture with Hadoop can provide a common platform to analyze all security-related data and gain new insights. Sqrrl's linked data model and analytics run on Hortonworks to help investigate security incidents like a network breach, mapping different data sources and identifying abnormal activity patterns.
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
This document discusses using Hadoop and the Hortonworks Data Platform (HDP) for big data applications. It outlines how HDP can help organizations optimize their existing data warehouse, lower storage costs, unlock new applications from new data sources, and achieve an enterprise data lake architecture. The document also discusses how Talend's data integration platform can be used with HDP to easily develop batch, real-time, and interactive data integration jobs on Hadoop. Case studies show how companies have used Talend and HDP together to modernize their data architecture and product inventory and pricing forecasting.
1) The webinar covered Apache Hadoop on the open cloud, focusing on key drivers for Hadoop adoption like new types of data and business applications.
2) Requirements for enterprise Hadoop include core services, interoperability, enterprise readiness, and leveraging existing skills in development, operations, and analytics.
3) The webinar demonstrated Hortonworks Apache Hadoop running on Rackspace's Cloud Big Data Platform, which is built on OpenStack for security, optimization, and an open platform.
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
How can you simplify the management and monitoring of your Hadoop environment? Ensure IT can focus on the right business priorities supported by Hadoop? Take a look at this presentation and learn how you can simplify the management and monitoring of your Hadoop environment, and ensure IT can focus on the right business priorities supported by Hadoop.
This document discusses the author's 10 year journey with Hadoop, from 2006 to 2016. It describes the evolution of key Hadoop technologies like HDFS, MapReduce, YARN and the addition of engines for SQL, NoSQL, streaming and in-memory processing. The document also addresses trends around growth of data from devices, users and the internet of things. It presents a vision of the future where Hadoop (YARN.next) will assemble and securely operate a flexible menu of data access applications and engines.
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
Many enterprises are turning to Apache Hadoop to enable Big Data Analytics and reduce the costs of traditional data warehousing. Yet, it is hard to succeed when 80% of the time is spent on moving data and only 20% on using it. It’s time to swap the 80/20! The Big Data experts at Attunity and Hortonworks have a solution for accelerating data movement into and out of Hadoop that enables faster time-to-value for Big Data projects and a more complete and trusted view of your business. Join us to learn how this solution can work for you.
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
Real Time Monitoring requires a high scalable infrastructure of message bus, database, distributed event processing and scalable analytics engine. By bringing together leading open source projects of Apache Kafka, Apache HBase, Apache Storm and Apache Hive, the Hortonworks Data Platform offers a comprehensive Real Time Analysis platform. In this session, we will provide an in-depth overview all the key technology components and demonstrate a working solution for monitoring a fleet of trucks.
Audience: Developers, Architects and System Engineers from the Hortonworks Technology Partner community.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=0278dc8aa49a9991e1ce436c71f53d30
see the recording: http://youtu.be/qdhF1sfef10
Ofer Medelvitch, Director of Data Science of Hortonworks and Michael Zeller, Founder and CEO of Zementis present key learnings as to what drives successful implementations of big data analytics projects. Their knowledge comes from working with dozens of companies from small cloud-based start-ups to some of the largest companies in the world.
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
Enhancing a customer experience has become essential for communication service providers to effectively manage customer churn and build a strong, long lasting relationship with their customers. This has become increasingly challenging as customer interactions occur across multiple channels. Understanding customer behavior and how it applies across channels is the key to ensuring the best level of experience is achieved by each customer.
In this webinar Hortonworks and Apigee discuss how service providers can capture and visualize customer behavior across customer interaction points like call center events (IVR and chat) and combine it with network data, to predict customer calls and patterns of digital channel abandonment using Hadoop and predictive analysis and visualization tools..
We will identify ways to develop a 360 degree view across a customer’s household through an HDP Data Lake and visualize customer interaction patterns and predict expected behavior using Apigee Insights to identify and initiate the Next-Best-Action for a customer to ensure a superior level of customer experience.
Hortonworks Yarn Code Walk Through January 2014Hortonworks
This slide deck accompanies the Webinar recording YARN Code Walk through on Jan. 22, 2014, on Hortonworks.com/webinars under Past Webinars, or
https://hortonworks.webex.com/hortonworks/lsr.php?AT=pb&SP=EC&rID=129468197&rKey=b645044305775657
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
An organization’s information is spread across multiple repositories, on-premise and in the cloud, with limited ability to correlate information and derive insights. The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization.
- Leverage 100% of your data: Text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven (powered by HP IDOL and HP Vertica), making it possible to integrate this valuable content and insights into various line of business applications.
- Democratize and enable multi-dimensional content analysis: - Empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease, using the 100% open source Hortonworks Data Platform.
- Extend the enterprise data warehouse: Synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
- Dramatically reduce complexity with enterprise-ready SQL engine: Tap into the richest analytics that support JOINs, complex data types, and other capabilities only available with HP Vertica SQL on the Hortonworks Data Platform.
Speakers:
- Ajay Singh, Director, Technical Channels, Hortonworks
- Will Gardella, Product Management, HP Big Data
In 2012, we released Hortonworks Data Platform powered by Apache Hadoop and established partnerships with major enterprise software vendors including Microsoft and Teradata that are making enterprise ready Hadoop easier and faster to consume. As we start 2013, we invite you to join us for this live webinar where Shaun Connolly, VP of Strategy at Hortonworks, will cover the highlights of 2012 and the road ahead in 2013 for Hortonworks and Apache Hadoop.
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
Hadoop is a great platform for storing and processing massive amounts of data. Elasticsearch is the ideal solution for Searching and Visualizing the same data. Join us to learn how you can leverage the full power of both platforms to maximize the value of your Big Data.
In this webinar we'll walk you through:
How Elasticsearch fits in the Modern Data Architecture.
A demo of Elasticsearch and Hortonworks Data Platform.
Best practices for combining Elasticsearch and Hortonworks Data Platform to extract maximum insights from your data.
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
This document provides information about using Scalding on Tez. It begins with prerequisites for using Scalding on Tez, including having a YARN cluster, Cascading 3.0, and the TEZ runtime library in HDFS. It then discusses setting memory and Java heap configuration flags for Tez jobs in Scalding. The document provides a mini-tutorial on using Scalding on Tez, covering build configuration, job flags, and challenges encountered in practice like Guava version mismatches and issues with Cascading's Tez registry. It also presents a word count plus example Scalding application built to run on Tez. The document concludes with some tips for debugging Tez jobs in Scalding using Cascading's
http://hortonworks.com/hadoop/spark/
Recording:
https://hortonworks.webex.com/hortonworks/lsr.php?RCID=03debab5ba04b34a033dc5c2f03c7967
As the ratio of memory to processing power rapidly evolves, many within the Hadoop community are gravitating towards Apache Spark for fast, in-memory data processing. And with YARN, they use Spark for machine learning and data science use cases along side other workloads simultaneously. This is a continuation of our YARN Ready Series, aimed at helping developers learn the different ways to integrate to YARN and Hadoop. Tools and applications that are YARN Ready have been verified to work within YARN.
YARN webinar series: Using Scalding to write applications to Hadoop and YARNHortonworks
This webinar focuses on introducing Scalding for developers and writing applications for Hadoop and YARN using Scalding. Guest speaker Jonathan Coveney from Twitter provides an overview, use cases, limitations, and core concepts.
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
HBase adoption continues to explode amid rapid customer success and unbridled innovation. HBase with its limitless scalability, high reliability and deep integration with Hadoop ecosystem tools, offers enterprise developers a rich platform on which to build their next generation applications. In this workshop we will explore HBase SQL capabilities, deep Hadoop ecosystem integrations and deployment & management best practices.
Ambari 2.1 includes several new features and improvements, including manual Kerberos configuration, customizable dashboards, guided configurations, rack awareness, and views framework enhancements. It adds support for Storm Nimbus HA, Ranger HA, RHEL/CentOS 7, and new JDKs. The views framework now supports auto-configuration using cluster properties and auto-creation of view instances when cluster requirements are met. Standalone Ambari servers allow running views without managing a cluster.
This document compares query performance times between Apache Hive versions 0.10 and 0.13 using a benchmark of 50 SQL queries on a 30TB dataset. The results show that Hive 0.13 was over 100 times faster for 6 queries and averaged 52 times faster for all queries compared to Hive 0.10. Significant performance improvements were achieved through optimizations made during the Stinger Initiative involving 145 developers from 44 companies over 13 months.
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...VMware Tanzu
Pivotal HAWQ, one of the world’s most advanced enterprise SQL on Hadoop technology, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your analytic efforts. The slides from this technical webinar present a deep dive on this powerful modern data architecture for analytics and data science.
Learn more here: http://pivotal.io/big-data/pivotal-hawq
Apache Ambari: Managing Hadoop and YARNHortonworks
Part of the Hortonworks YARN Ready Webinar Series, this session is about management of Apache Hadoop and YARN using Apache Ambari. This series targets developers and we will feature a demo on Ambari.
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
Hortonworks Data Platform (HDP) 2.3 includes several new capabilities:
1) It improves the user experience with more guided configuration, customizable dashboards, and improved workload management.
2) It enhances security with new data encryption at rest and extends data governance.
3) It adds proactive cluster monitoring through Hortonworks SmartSense to enhance support.
As Hadoop becomes the defacto big data platform, enterprises deploy HDP across wide range of physical and virtual environments spanning private and public clouds. This session will cover key considerations for cloud deployment and showcase Cloudbreak for simple and consistent deployment across cloud providers of choice.
Hortonworks technical workshop operations with ambariHortonworks
Ambari continues on its journey of provisioning, monitoring and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities including updated metric collections; Kerberos setup automation and developer views for Big Data developers. In this Hortonworks Technical Workshop session we will provide an in-depth look into Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the github demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
This document discusses Databricks Cloud, a platform for running Apache Spark workloads that aims to accelerate time-to-results from months to days. It provides a unified platform with notebooks, dashboards, and jobs running on Spark clusters managed by Databricks. Key benefits include zero management of clusters, interactive queries and streaming for real-time insights, and the ability to develop models and visualizations in notebooks and deploy them as production jobs or dashboards without code changes. The platform is open source with no vendor lock-in and supports various data sources and third party applications. It is being used by over 3,500 organizations for applications like data preparation, analytics, and machine learning.
I gave this talk on the Highload++ conference 2015 in Moscow. Slides have been translated into English. They cover the Apache HAWQ components, its architecture, query processing logic, and also competitive information
Beyond SQL: Speeding up Spark with DataFramesDatabricks
This document summarizes Spark SQL and DataFrames in Spark. It notes that Spark SQL is part of the core Spark distribution and allows running SQL and HiveQL queries. DataFrames provide a way to select, filter, aggregate and plot structured data like in R and Pandas. DataFrames allow writing less code through a high-level API and reading less data by using optimized formats and partitioning. The optimizer can optimize queries across functions and push down predicates to read less data. This allows creating and running Spark programs faster.
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
This document provides an agenda for an advanced Spark class covering topics such as RDD fundamentals, Spark runtime architecture, memory and persistence, shuffle operations, and Spark Streaming. The class will be held in March 2015 and include lectures, labs, and Q&A sessions. It notes that some slides may be skipped and asks attendees to keep Q&A low during the class, with a dedicated Q&A period at the end.
This document provides an overview of a solution using HPE infrastructure with SAP HANA Vora, Hortonworks HDP, and Apache Spark. It describes implementing interactive analytics across SAP HANA and Hadoop data and managing data movement between SAP HANA and HDP. The solution uses HPE ConvergedSystem 500 for SAP HANA with HPE Elastic Platform for Big Data Analytics running HDP and SAP HANA Vora on HPE Apollo 2000 and 4200 servers. It aims to simplify data access and management while accelerating analytics across enterprise and big data.
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
Your Big Data strategy is only as good as the quality of your data. Today, deriving business value from data depends on how well your company can capture, cleanse, integrate and manage data. During this webinar, we discuss how to eliminate the challenges to Big Data management inside Hadoop.
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
Companies in every industry look for ways to explore new data types and large data sets that were previously too big to capture, store and process. They need to unlock insights from data such as clickstream, geo-location, sensor, server log, social, text and video data. However, becoming a data-first enterprise comes with many challenges.
Join this webinar organized by three leaders in their respective fields and learn from our experts how you can accelerate the implementation of a scalable, cost-efficient and robust Big Data solution. Cisco, Hortonworks and Red Hat will explore how new data sets can enrich existing analytic applications with new perspectives and insights and how they can help you drive the creation of innovative new apps that provide new value to your business.
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
Hadoop is no longer optional. Companies of all sizes are in various phases of their own Big Data journey. Whether you are just starting to explore the platform or have multiple clusters up and running, everyone is presented with a similar challenge - developing their internal skillset. Hadoop specialists are hard to find. Hand coding is too prone to error when it comes to storing, integrating or analyzing your data. However, it doesn’t need to be this difficult.
In this recorded webinar, Talend and Hortonworks help you learn how to unify all your data in Hadoop, with no specialized Big Data skills.
Find the recording here. www.talend.com/resources/webinars/challenges-to-hadoop-adoption-if-you-can-dream-it-you-can-build-it
This webinar covers: How Hadoop opens a new world of analytic applications, How to bridge the skills gap with our Big Data solutions, Experience a real-world, simple technical demo
Hortonworks provides an open source Apache Hadoop data platform to help organizations solve big data problems. It was founded in 2011 and was the first Hadoop company to go public. Hortonworks has over 800 employees across 17 countries and over 1,350 technology partners. Hortonworks' Hadoop Data Platform is a collection of Apache projects that provides data management, data access, governance and integration, operations, and security capabilities for enterprises. The platform supports batch, interactive, and streaming analytics on large volumes of structured and unstructured data across on-premise and cloud deployments.
Hortonworks provides an open source Apache Hadoop data platform for managing large volumes of data. It was founded in 2011 and went public in 2014. Hortonworks has over 800 employees across 17 countries and partners with over 1,350 technology companies. Hortonworks' Data Platform is a collection of Apache projects that provides data management, access, governance, integration, operations and security capabilities. It supports batch, interactive and real-time processing on a shared infrastructure using the YARN resource management system.
Human Information is made up of ideas, is diverse, and has context.
Ideas don’t exactly match like data does; they have distance.
Human Information is not static – it’s dynamic and lives everywhere.
Details on applications
HAVEn is integrated to costumers architecture through other n Apps
HP has started modifying our existing application portfolio to use HAVEn
And HP is building new applications that leverage power of HAVEn
Many customers are already building applications that use multiple HAVEn
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
This document provides an overview of how Hortonworks uses Apache Hadoop to enable a modern data architecture. Some key points:
- Hadoop allows organizations to create a "data lake" to store all types of data in one place and process it in various ways for different use cases.
- This provides a multi-use data platform that unlocks new approaches to insights by enabling analysis across all data, rather than just subsets stored in silos.
- A modern data architecture with Hadoop integrates with existing investments while freeing up resources for more valuable tasks by offloading lower value workloads to Hadoop.
- Examples of business applications that can benefit from Hadoop include optimizing customer insights
Hortonworks Hadoop @ Oslo Hadoop User GroupMats Johansson
This document provides an overview of Hortonworks and Hadoop. It discusses Hortonworks' customer momentum, the Hortonworks Data Platform (HDP) which provides a multi-tenant platform for any application and data, and Hortonworks' focus on customer success through its open source community leadership and support. It also discusses how Hadoop has emerged as the foundation for a modern data architecture to unify data processing and analytics for both traditional and new data sources in order to drive business value.
This document provides an overview of Hortonworks and Hadoop. It discusses Hortonworks' customer momentum, the Hortonworks Data Platform (HDP), and Hortonworks' role as a partner for customer success. It also summarizes challenges with traditional data systems, how Hadoop emerged as a foundation for a new data architecture, and how HDP delivers a comprehensive data management platform.
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
In this webinar, WANdisco and Hortonworks look at three examples of using 'Big Data' to get a more comprehensive view of customer behavior and activity in the banking and insurance industries. Then we'll pull out the common threads from these examples, and see how a flexible next-generation Hadoop architecture lets you get a step up on improving your business performance. Join us to learn:
- How to leverage data from across an entire global enterprise
- How to analyze a wide variety of structured and unstructured data to get quick, meaningful answers to critical questions
- What industry leaders have put in place
This document provides an introduction to Hadoop and big data concepts. It discusses what big data is and how companies like Amazon and Netflix have seen returns on investment from applying data science to large amounts of data. It then covers Hadoop and HDFS, explaining what they are, their architecture, and common commands used to work with HDFS like put, get, ls, and cat. The document is an introductory presentation on big data and Hadoop.
Hortonworks & Bilot Data Driven Transformations with HadoopMats Johansson
- Traditional systems are under pressure due to their inability to manage new data sources and costly scaling. A modern data architecture using Apache Hadoop emerges to provide a centralized platform for all enterprise data and applications.
- Hortonworks Data Platform is powered by Apache Hadoop and provides a flexible, scalable platform for storing and processing all data types from any source and supports a variety of applications. It offers governance, security, and operations controls for enterprise data management.
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
Implementing and managing a Big Data environment effectively requires essential efficiencies such as automation, performance monitoring and flexible infrastructure management. Discover new innovations that enable you to manage entire Big Data environments with unparalleled ease of use and clear enterprise visibility across a variety of data repositories.
To learn more about Mainframe solutions from CA Technologies, visit: http://bit.ly/1wbiPkl
Hortonworks - IBM Cognitive - The Future of Data ScienceThiago Santiago
The document discusses Hortonworks and IBM's partnership around data management and analytics. It highlights how their combined platforms can power the modern data architecture with solutions for data at rest and in motion. Examples are provided of how customers like Merck and JPMC have leveraged Hortonworks' technologies to gain insights from their data and drive business outcomes. Industries that are investing in data science are also listed.
Up Your Analytics Game with Pentaho and Vertica Pentaho
Big Data is a game-changer.
In the face of exploding volumes and varieties of data, traditional data management and ETL systems just aren’t cutting it anymore. A new way of sifting through vast volumes of data to find the most relevant info, combining this data with other data sources to extract faster insights is desperately needed. Enter HP|Vertica and Pentaho with a proven solution for lightning fast queries and blended data and analytics capabilities for your business users.
Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.
Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at.
Accenture, CME Group, TransUnion, Syntel, Allstate, TransAmerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen, Deutshe Bahn.
Mr. Baltagi has also over 14 years of IT experience with an emphasis on full life cycle development of Enterprise Web applications using Java and Open-Source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
Languages: Java, Python, JRuby, JEE , PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MYSQL, PostreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, Japser Reports, Alfresco, Yslow, Terracotta, Toad, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax , Xstream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
This document discusses how Syncsort helps companies access and integrate data from various sources to power analytics. It provides examples of how Syncsort has helped companies in insurance, media, and hotels easily onboard and integrate both historical and streaming data from multiple sources like mainframes, databases, and IoT devices. This allows for faster insights, increased productivity, cost savings, and helps future proof applications.
Hortonworks and Voltage Security webinarHortonworks
Securing Hadoop data is a hot topic for good reason – no matter where you are in your Hadoop implementation plans, it’s best to define your data security approach now, not later. Hortonworks and Voltage Security are focused on deeply integrating Hadoop with your existing data center technologies and team capabilities. Attend this discussion to learn about a central policy administration framework across security requirements for authentication, authorization, auditing and data protection.
Similar to Webinar turbo charging_data_science_hawq_on_hdp_final (20)
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
The HDF 3.3 release delivers several exciting enhancements and new features. But, the most noteworthy of them is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
Cloudbreak, a part of Hortonworks Data Platform (HDP), simplifies the provisioning and cluster management within any cloud environment to help your business toward its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
Cybersecurity today is a big data problem. There’s a ton of data landing on you faster than you can load, let alone search it. In order to make sense of it, we need to act on data-in-motion, use both machine learning, and the most advanced pattern recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient, helps them find the hidden gems, or bombs in masses of logs and packets.
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
Last year IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership accomplishments and the conjoined road ahead with industry-leading analytics offers.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
The document provides an overview of Apache Druid, an open-source distributed real-time analytics database. It discusses Druid's architecture including segments, indexing, and nodes like brokers, historians and coordinators. It also covers integrating Druid with Hortonworks Data Platform for unified querying and visualization of streaming and historical data.
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
For years, the healthcare industry has had problems of data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades worth of clinical expertise. Clearsense is delivering smart, real-time streaming data, to its healthcare customers enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with EaseHortonworks
Every division in an organization builds its own database to keep track of its business. When the organization becomes big, those individual databases grow as well. The data from each database may become silo-ed and have no idea about the data in the other database.
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data ManagementHortonworks
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experience and create new products, yet these projects often stall in ideation phase to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategy to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today’s enterprises. More than 20 billion IoT devices are active on the planet today and thousands of use cases across IIOT, Healthcare and Manufacturing warrant capturing data-in-motion and delivering actionable intelligence right NOW. “Data decay” happens in a matter of seconds in today’s digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
The document discusses Apache NiFi and streaming change data capture (CDC) with Attunity Replicate. It provides an overview of NiFi's capabilities for dataflow management and visualization. It then demonstrates how Attunity Replicate can be used for real-time CDC to capture changes from source databases and deliver them to NiFi for further processing, enabling use cases across multiple industries. Examples of source systems include SAP, Oracle, SQL Server, and file data, with targets including Hadoop, data warehouses, and cloud data stores.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.