This document provides an overview of data science and machine learning. It discusses what data science and machine learning are, including extracting insights from data and computers learning without being explicitly programmed. It also covers Apache Spark, which is an open source framework for large-scale data processing. Finally, it discusses common machine learning algorithms like regression, classification, clustering, and dimensionality reduction.
Tuomas Autio's and Mikko Mattila's presentation from Hadoop & Azure Marketplace - digitalisaation tekijät Breakfast seminar on the 26th April. Find our blogs about Hadoop: http://www.bilot.fi/en/explore/?cat=blog&tag=hadoop
Pasi Vuorela's presentation from the Hadoop ja Azure Marketplace - digitalisaation tekijät - event. Vuorela works as Nordic Sales Manager @ Hortonworks
See the recording: http://youtu.be/qdhF1sfef10
Ofer Mendelevitch, Director of Data Science at Hortonworks, and Michael Zeller, Founder and CEO of Zementis, present key lessons on what drives successful implementations of big data analytics projects. Their knowledge comes from working with dozens of companies, from small cloud-based start-ups to some of the largest companies in the world.
Make Streaming IoT Analytics Work for You (Hortonworks)
Download Hortonworks DataFlow (HDF™) here - http://hortonworks.com/downloads/#dataflow. Making Streaming IoT Analytics Work For You With Apache NiFi, Storm, Raspberry Pi and more.
It is almost impossible to escape the topic of Data Science. While the core of Data Science has remained the same over the last decade, its emergence to the forefront is spurred by both the availability of new data types and a true realization of the value that it delivers. In this session, we will provide an overview of data science and the different classes of machine learning algorithms, and deliver an end-to-end demonstration of performing machine learning using Hadoop. Audience: Developers, Data Scientists, Architects and System Engineers.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=4175a7421d00257f33df146f50c41af8
Data proliferation from 7+ billion humans and 20+ billion devices from every walk of life has been the focus of the last decade. With the velocity, variety and volume of data, every data organization's goal has shifted to protecting and monetizing data from a rapidly growing network of IoT-embedded objects and sensors.
One tried-and-true business continuity method for storing and retrieving vast amounts of data has been replication of Hadoop systems on hybrid clouds and in geographically distributed data centers. Replication is similar to blockchain in that it uses autonomous smart contracts instantiated on the metadata and data, so that the replicated data follows a single source of truth.
Replicas can be maintained across geographically distributed data centers, giving the business continuity plan for the data sets greater risk tolerance. With intelligent predictive analytics based on usage patterns, dynamic tiering policies can be triggered on the data sets to add value to the data. The temperature of the data is used to move it between hot, warm, cold and archival storage based on configurable policies, reducing total cost of ownership.
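The temperature-based tiering described above can be illustrated with a minimal sketch. This is not the product's actual policy engine: the `Dataset` class, the `TIER_POLICY` thresholds, and the function names are all hypothetical, and "temperature" is approximated here simply as time since last access.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy: maximum time since last access for each tier,
# ordered from hottest to coldest. The thresholds are illustrative only.
TIER_POLICY = [
    (timedelta(days=7), "hot"),
    (timedelta(days=30), "warm"),
    (timedelta(days=365), "cold"),
]
ARCHIVE_TIER = "archival"  # anything older than the last threshold


@dataclass
class Dataset:
    name: str
    last_accessed: datetime
    tier: str  # tier the data set currently lives on


def target_tier(dataset, now):
    """Return the tier the policy assigns to a data set at time `now`."""
    age = now - dataset.last_accessed
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


def plan_moves(datasets, now):
    """List (name, current_tier, target_tier) for data sets that should move."""
    moves = []
    for ds in datasets:
        wanted = target_tier(ds, now)
        if wanted != ds.tier:
            moves.append((ds.name, ds.tier, wanted))
    return moves
```

Calling `plan_moves` on a schedule would yield the list of migrations to trigger. A real policy would likely weigh access frequency and predicted usage as well, not only the last access time.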
Users in 2018 and beyond demand absolute availability of data as and when they desire. Dynamic data access management is a fundamental concept for satisfying the business continuity plan. Seamless enterprise-grade disaster recovery to support the business continuity use case faces significant challenges around replicating security and governance on data sets. In this talk we will discuss how these challenges can be addressed to support seamless replication and disaster recovery for Hadoop-scale data. NIRU ANISETI, Product Manager, Hortonworks
Global Data Management – a practical framework to rethinking enterprise, oper... (DataWorks Summit)
Global data management is not a newly coined term. However, what it stands for is widening in scope, particularly around data-in-motion and data-at-rest. Significant technology trends such as IoT, cloud, AI/ML, blockchain, and streaming data have given rise to excessive data volumes and also innovative use cases. The scope of global data management now extends all the way from ingestion, processing, storage, governance, and security to analysis. With a good number of endpoints served through the cloud and major application footprints remaining on-premises, it is pertinent to have a global data management strategy that supports hybrid models and, more specifically, a multi-cloud model.
Many modern businesses struggle to balance the demands of rapidly innovating through new technologies like machine learning with the need to keep data safe and secure, all while responding to a constantly changing regulatory landscape. This puts data stewards, data engineers, architects, data scientists, and analysts under intense pressure as they must contend with existing and new applications, multiple logical and physical data stores and sources, diverse data types, and data spread across several deployment environments.
Attend this session led by Matt Aslett, Research Director at 451 Research and Dinesh Chandrasekhar, Director, Hortonworks to learn more about creating a framework for your enterprise that offers guidance on how to think about global data management—priorities, responsibilities, key stakeholders, compliance, and growth.
Speakers
Dinesh Chandrasekhar, Hortonworks, Director Product Marketing
Matt Aslett, 451 Research, Research Director, Data platforms and Analytics
Johns Hopkins - Using Hadoop to Secure Access Log Events (Hortonworks)
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 (Hortonworks)
View the recording of the meetup, including the live demos, here: https://www.youtube.com/watch?v=uaJWB3K8lkg
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises. However, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal, such as notebooks like Jupyter and Apache Zeppelin and IDEs such as RStudio, with languages like R, Python, and Scala and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
Why Data Science on Big Data?
In this meetup we will cover the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and fosters continuous learning and collaboration. We will show a demo of Apache Zeppelin, Apache Spark, Apache Livy and Apache Hadoop with a focus on integration, security, and model deployment and management.
Data Science at Scale DEMO
The demo will cover the data science life cycle: develop a model in a team environment, train the model with all the data on a Hadoop cluster, and deploy the model into production. The model will be a Spark ML model.
Practical ML with Apache Spark
To deliver machine learning solutions, data scientists need not only to fit models but also to handle familiar tasks such as data collection and wrangling, labelling, feature extraction and transformation, and model tuning and evaluation. Apache Spark provides a unified solution for all of this under the same framework.
For example, one can use Spark SQL to generate training data from different sources and then pass it directly to MLlib for feature engineering and model tuning, instead of using Hive/Pig for the first half and then downloading the data to a single machine to train models in R. The latter is actually very common in practice but painful to maintain. Spark MLlib makes life easier for data scientists and machine learning engineers so that they can focus on building better ML models and applications.
We will discuss the underlying principles required to develop practical machine learning and data science pipelines and share hands-on experience of using Apache Spark to solve typical machine learning and data science problems. We will also have a short discussion about how Spark MLlib faces challenges from other machine learning libraries such as TensorFlow and XGBoost.
Digital transformations require a new hybrid cloud, one that is open by design and frees clients to choose and change environments, data and services as needed. This approach allows cloud apps and services to be rapidly composed using the best relevant data and insights available, while maintaining clear visibility, control and security everywhere. How do you decide where to put data on a hybrid cloud and how to use it? What is the best hybrid cloud strategy in terms of data and workload? How should you leverage a 50/50 rule or an 80/20 rule and user interaction to evaluate which data and workloads to move to the cloud and which to keep on-premises? Hybrid cloud provides an open platform for innovation, including cognitive computing. Organizations are looking to take shadow IT out of the shadows by providing self-service access to information, and a hybrid cloud strategy allows that. Finally, how can hybrid cloud be used to better manage data sovereignty and compliance?
The General Data Protection Regulation (GDPR), which takes effect in 2018, brings new requirements for managing the personal and sensitive data of European Union subjects. The Privacy Shield framework, adopted in 2016, now regulates the movement of data between the EU and the US. Together, they are changing how CXOs think about procuring, storing and processing personal and sensitive data.
Over the last few years, open-source projects such as Apache Ranger and Apache Atlas have been driving comprehensive security and governance within Hadoop and the big data ecosystem. Solution vendors such as Privacera are leveraging the power of Hadoop and Apache projects such as Atlas and Ranger to help security and compliance teams within enterprises easily identify and protect data that is subject to the privacy regulations and monitor the use of such data.
This talk will walk through the current regulatory climate in Europe and how it can impact big data implementations. We will specifically walk through a business framework that enterprises can use to build a strategy to manage GDPR, Privacy Shield, and other regulations. We will use a live demonstration to show how projects such as Apache Ranger, Apache Atlas and solutions such as Privacera can be used effectively to address specific requirements of these regulations.
With the advent of Big Data in the threat analytics space, needs have emerged for near real-time (NRT) threat detection and automated interpretation that speed countermeasures and remediation. The AT&T Chief Security Organization (CSO) has developed an enterprise architecture that includes the near real-time outlier processes necessary to protect its network from cyber threats using the Hadoop ecosystem. One enterprise challenge that CSO has faced is summarized in a statement by Brian Rexroad, Executive Director of Technology and Security: "I feel there is too much emphasis on 'detecting'. Significantly more emphasis is needed in automated extraction of related information/activity and interpretation of that information." The CSO Engineering team therefore developed the Stratum™ architecture, which includes many open source and commercial products that facilitate the rapid development and operationalization of outlier detectors and interpreters. Extensive use of NRT data ingestion, enrichment, organization and random-access storage patterns makes these capabilities possible on top of a Hadoop-based ecosystem. The Stratum™ architecture offers the CSO the ability to minimize the time and effects of many cyber threats. Using Big Data technologies for cyber threat analysis is becoming quite common, but outlier detection and interpretation remain crucial for enterprise protection.
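As a rough illustration of what a near real-time outlier detector can look like, here is a minimal sketch using a rolling z-score over a stream of numeric measurements (for example, events per minute from one source). This is not the Stratum™ design, which the abstract does not describe in detail. The class name, window size, and threshold are assumptions chosen for the example.

```python
from collections import deque
from math import sqrt


class RollingOutlierDetector:
    """Flag values that deviate strongly from a sliding window of recent values."""

    def __init__(self, window=50, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=window)  # most recent observations
        self.threshold = threshold          # z-score above which a value is flagged
        self.min_samples = min_samples      # do not judge until this many values are seen

    def observe(self, value):
        """Return True if `value` is an outlier versus the window, then record it."""
        is_outlier = False
        n = len(self.values)
        if n >= self.min_samples:
            mean = sum(self.values) / n
            variance = sum((v - mean) ** 2 for v in self.values) / n
            std = sqrt(variance)
            # A zero standard deviation means the window is constant; skip the test.
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_outlier = True
        self.values.append(value)
        return is_outlier
```

Each call to `observe` does work proportional to the window size, so it can run inline on a stream. Note that a flagged value is still added to the window and inflates the variance for a while. A production detector would probably exclude or down-weight flagged values, and would pass each flagged event on to an interpretation step, which is the emphasis of the quote above.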
Accelerating Data Science and Real Time Analytics at Scale (Hortonworks)
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
Slides from the joint webinar. Learn how Pivotal HAWQ, one of the world's most advanced enterprise SQL on Hadoop technologies, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your Data Science efforts.
Together, Pivotal HAWQ and the Hortonworks Data Platform provide businesses with a Modern Data Architecture for IT transformation.
3 CTOs Discuss the Shift to Next-Gen Analytic Ecosystems (Hortonworks)
Wow! When have you ever sat in on a Big Data analytics discussion by three of the most influential CTOs in the industry? What do they talk about among themselves?
Join Teradata's Stephen Brobst, Informatica's Sanjay Krishnamurthi, and Hortonworks' Scott Gnau as they provide a framework and best practices for maximizing value for data assets deployed within a Big Data & Analytics Architecture.
With the rise of IoT and the growing complexity of applications, clouds, networks and infrastructure, it is becoming more difficult to protect data and infrastructure from attackers. When groups of bad actors collaborate, share information, provide unauthorized access, and offer botnets as a service, terabit-scale attacks become easy to launch. At the same time, it is difficult to find enough security analysts to deal with and defend against such attacks.
This is where community cooperation and open source efforts such as Apache Metron come in. Metron provides a comprehensive framework for application and network security, built on Apache Hadoop and the scalable data management and processing stacks of open source streaming analytics tools (e.g. Apache NiFi, Apache Kafka). Features such as profiling, machine learning, and visualization work with real-time streaming detection to make SOC analysts more efficient, while the intrinsic scalability of open source lets data scientists move security insights from the lab into production quickly.
This session explains and demonstrates how real-world businesses and managed service providers use Apache Metron to identify and resolve security threats at scale, and presents methods and ideas for adapting the platform to your security architecture.
Big Traffic, Big Trouble: Big Data Security Analytics (DataWorks Summit)
With the rise of IoT and the increasing complexity of applications, clouds, networks and infrastructure, the battle to keep your data and your infrastructure safe from attackers is getting harder. As groups of bad actors collaborate, sharing information and offering illegal access, and botnets as a service, terabits of attack can be launched cheaply. Meanwhile, it’s hard to find enough security analysts to catch and prevent these attacks.
This is where community collaboration and open source efforts like Apache Metron come in. Metron presents a comprehensive framework for application and network security, built on Apache Hadoop and the highly scalable data management and processing stacks of open source streaming analytics tools (e.g. Apache NiFi, Apache Kafka). Advanced features like profiling, machine learning, and visualization work with real-time streaming detection to make your SOC analysts more efficient, while the intrinsic extensibility of open source helps your data scientists get security insights out of the lab and into production fast.
We will discuss and demonstrate how some real-world businesses and managed service providers are using Apache Metron to identify and solve security threats at scale, and some approaches and ideas for how the platform can fit into your security architecture.
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi | Timothy Spann
A walkthrough of creating a dataflow that ingests Twitter data and analyzes the stream with NLTK VADER sentiment analysis in Python and Inception v3 in TensorFlow, both run via Python from Apache NiFi. The results are stored in Hadoop HDFS.
Overcoming the AI hype - and what enterprises should really focus on | DataWorks Summit
Deep learning, for all its hype, is brittle and non-generalizable, and its learnings are not readily transferable from one application to another. Since we are unlikely to see anything close to artificial general intelligence in the next few decades, we should instead focus on how enterprises can capitalize on the state of the art in machine learning, re-implement successful algorithms, and follow the data science lifecycles that generate the highest ROI.
This talk will cover the current state of the art in AI and its limits vs. hype, and discuss concrete steps that enterprises can take to achieve the desired ROI by re-implementing production-grade machine learning algorithms that have been hardened and demonstrated to work very well in specific, constrained domains.
By the end of this talk, attendees should have a better grasp on how to avoid costly and unnecessary investments in as yet unproven technologies, be better equipped to navigate the complex space of AI, and understand where to best focus their resources to maximize ROI.
Speaker: Robert Hryniewicz, Technical Evangelist, Hortonworks
This workshop will provide a hands-on introduction to basic Machine Learning techniques with Apache Spark ML using the cloud.
Format: A short introductory lecture on select important supervised and unsupervised Machine Learning techniques followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Machine Learning with Spark ML. In the lab, you will use the following components: Apache Zeppelin (a “Modern Data Science Toolbox”) and Apache Spark. You will learn how to analyze the data, structure the data, train Machine Learning models and apply them to answer real-world questions.
Pre-requisites: Registrants must bring a laptop that can run the Hortonworks Data Cloud.
At this Crash Course everyone will have a cluster assigned to them to try several workloads using Machine Learning, Spark and Zeppelin on the cloud.
Speakers: Robert Hryniewicz
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick and short hands-on introduction to ML with Python’s scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through everything from setting up your environment to exploring datasets and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL and of what problems they can solve, and will walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud, with no installation needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hour in). Basic knowledge of Python is highly recommended.
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Introduction: This workshop will provide a hands-on introduction to Apache Spark using the HDP Sandbox on students’ personal machines.
Format: A short introductory lecture about Apache Spark components used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache Spark. This lab will use the following Spark and Apache Hadoop components: Spark, Spark SQL, Apache Hadoop HDFS, Apache Hadoop YARN, Apache ORC, and Apache Ambari User Views. You will learn how to move data into HDFS using Spark APIs, create Apache Hive tables, explore the data with Spark and Spark SQL, transform the data and then issue some SQL queries.
Pre-requisites: Registrants must bring a laptop that can run the Hortonworks Data Cloud.
Speaker:
Robert Hryniewicz, Developer Advocate, Hortonworks
What's new in Hortonworks DataFlow 3.0 by Andrew Psaltis | Data Con LA
Abstract: Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.0.
The Car of the Future - Autonomous, Connected, and Data Centric | DataWorks Summit
incredibly data intensive endeavor. Traditional data management approaches are straining to cope with the demands imposed by autonomous driving research.
This session investigates the role of data in teaching cars to drive and the data management challenges that automakers must overcome in achieving this objective. Finally, a modern data architecture that leverages the latest advances in data management technologies is proposed to deliver on the promise of a self-driving future.
Speaker: Robert Hryniewicz, AI Evangelist, Hortonworks
Apache Spark in Cloud and Hybrid: Why Security and Governance Become More Imp... | Spark Summit
An increasing number of Apache Spark deployments are in cloud and hybrid environments. This often means that Spark workloads are ephemeral but the data lives in durable storage, either in the cloud or on-prem. The data also moves between cloud storage and on-prem. With this architecture in place, security and governance have become paramount for running Spark workloads across on-prem and cloud. In this keynote, we will walk through several issues and highlight a Spark workload running in an ephemeral cluster with security and governance across cloud and on-prem, and how the same security and governance is shared with other workloads.
Guest lecture for
Course: Front Lines on Adoption of Digital and AI-based Service Offerings
Course URL: https://www.nhh.no/en/courses/front-lines-on-adoption-of-digital-and-ai-based-services/
Prof Tor Andreassen LI URL: https://www.linkedin.com/in/tor-wallin-andreassen-1aa9031/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy | Hortonworks
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Many organizations currently process various types of data in different formats. Most often this data is free-form. As the number of consumers of this data grows, it is imperative that this free-flowing data adhere to a schema. A schema gives data consumers an expectation about the type of data they are getting and shields them from immediate impact if the upstream source changes its format. Having a uniform schema representation also gives the data pipeline an easy way to integrate and support various systems that use different data formats.
SchemaRegistry is a central repository for storing and evolving schemas. It provides an API and tooling to help developers and users register a schema and consume it without being affected if the schema changes. Users can tag different schemas and versions, register for notifications of schema changes with versions, etc.
In this talk, we will go through the need for a schema registry and schema evolution and showcase the integration with Apache NiFi, Apache Kafka, and Apache Storm.
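To make the register-then-evolve idea concrete, here is a minimal in-memory sketch in Python. It is not the SchemaRegistry API: the class `InMemorySchemaRegistry`, its methods, the `truck_events` schema name, and the compatibility rule (existing fields must keep their type, new fields may be added) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SchemaVersion:
    name: str
    version: int
    fields: dict  # field name -> type name, e.g. {"id": "long"}


class InMemorySchemaRegistry:
    """Toy registry that keeps every version of every named schema (hypothetical API)."""

    def __init__(self):
        self._versions = {}  # schema name -> list of SchemaVersion, oldest first

    def register(self, name, fields):
        """Store a new version and return its number; reject incompatible changes."""
        history = self._versions.setdefault(name, [])
        if history:
            previous = history[-1].fields
            # Simplified rule (an assumption): old fields must survive unchanged.
            broken = [f for f, t in previous.items() if fields.get(f) != t]
            if broken:
                raise ValueError(f"incompatible change to fields: {broken}")
        entry = SchemaVersion(name, len(history) + 1, dict(fields))
        history.append(entry)
        return entry.version

    def get(self, name, version):
        """Fetch a specific version (versions are numbered from 1)."""
        return self._versions[name][version - 1]

    def latest(self, name):
        """Fetch the most recently registered version."""
        return self._versions[name][-1]


registry = InMemorySchemaRegistry()
registry.register("truck_events", {"id": "long", "speed": "int"})
registry.register("truck_events", {"id": "long", "speed": "int", "route": "string"})
print(registry.latest("truck_events"))
```

A consumer pinned to version 1 keeps reading with `registry.get("truck_events", 1)` while producers move on to version 2, which is the kind of decoupling the abstract describes.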
There is an increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model could take hours. This is a problem when the model needs to be more up-to-date. For example, when recommending TV programs while they are being transmitted, the model should take into consideration users who watch a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but methods of online machine learning from streams are commonly believed to be more restricted and hence less accurate than batch-trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system for uniting batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
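One way to picture the batch-plus-online combination is a matrix factorization model that is first fitted on historical events and then nudged by single stochastic gradient steps as new events stream in. The abstract does not say which algorithm the speakers use, so the choice of matrix factorization, the class `OnlineMF`, and all hyperparameters and sample events below are assumptions for illustration only.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OnlineMF:
    """Matrix factorization trained with SGD; supports batch fits and single-event updates."""

    def __init__(self, n_factors=4, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = n_factors, lr, reg
        self.rng = random.Random(seed)
        self.users, self.items = {}, {}

    def _vector(self, table, key):
        # Unseen users or items get a small random vector on first access.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.k)]
        return table[key]

    def predict(self, user, item):
        return dot(self._vector(self.users, user), self._vector(self.items, item))

    def update(self, user, item, rating):
        """Online step: one SGD update for a single (user, item, rating) event."""
        p, q = self._vector(self.users, user), self._vector(self.items, item)
        err = rating - dot(p, q)
        for f in range(self.k):
            p_f, q_f = p[f], q[f]
            p[f] += self.lr * (err * q_f - self.reg * p_f)
            q[f] += self.lr * (err * p_f - self.reg * q_f)
        return err

    def fit_batch(self, events, epochs=50):
        """Batch step: repeated passes over the stored history."""
        for _ in range(epochs):
            for user, item, rating in events:
                self.update(user, item, rating)


history = [("ann", "news", 1.0), ("bob", "news", 1.0), ("ann", "sports", 0.0)]
model = OnlineMF()
model.fit_batch(history)           # periodic batch training
model.update("bob", "film", 1.0)   # immediate adaptation to a live event
print(model.predict("bob", "film"))
```

In a real deployment the batch fit and the per-event updates would run in separate systems, which is the integration problem the talk addresses; the sketch only shows that both can share one model state.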
Deep learning is not just hype: it outperforms state-of-the-art ML algorithms, one by one. In this talk we will show how deep learning can be used to detect anomalies in IoT sensor data streams at high speed using DeepLearning4J on top of different big data engines like Apache Spark and Apache Flink. Key to this talk is the absence of any large training corpus, since we are using unsupervised machine learning, a domain that current DL research treats as a stepchild. As we will see in this demo, LSTM networks can learn very complex system behavior, in this case data coming from a physical model simulating bearing vibration data. One drawback of deep learning is that normally a very large labeled training data set is required. This is particularly interesting since we can show how unsupervised machine learning can be used in conjunction with deep learning: no labeled data set is necessary. We are able to detect anomalies and predict breaking bearings with 10-fold confidence. All examples and all code will be made publicly available as open source. Only open source components are used.
QE automation for large systems is a great step forward in increasing system reliability. In the big-data world, multiple components have to come together to provide end users with business outcomes. This means that QE automation scenarios need to be detailed around actual use cases that cut across components. The system tests potentially generate large amounts of data on a recurring basis, and verifying that data is a tedious job. Given the multiple levels of indirection, false positives are more frequent than actual defects, and chasing them is generally wasteful.
At Hortonworks, we’ve designed and implemented an automated log analysis system, Mool, using statistical data science and ML. The current work in progress has a batch data pipeline followed by an ensemble ML pipeline, which feeds into the recommendation engine. The system identifies the root cause of test failures across multiple components by correlating the failing test cases with current and historical error records. The system works in unsupervised mode, with no perfect model, stable build, or source-code version to refer to. In addition, the system provides limited recommendations to file new tickets or reopen past ones, and compares run profiles with past runs.
Improving business performance is never easy! The Natixis Pack is like Rugby. Working together is key to scrum success. Our data journey would undoubtedly have been so much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (Not every problem is eligible for Big Data and standard databases are not dead)
• What are the most usable, the most interesting and the most promising technologies in the Apache Hadoop community
We will finish the story of a successful rugby team with insight into the special skills needed from each player to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
HBase has established itself as the backend for many operational and interactive use-cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tuneable write-ahead logging and so on. This talk is based on the research for the upcoming second release of the speaker's HBase book, correlated with practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of matching use-cases, through determining the number of servers needed, and leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
There has been an explosion of data digitising our physical world – from cameras, environmental sensors and embedded devices, right down to the phones in our pockets. Which means that, now, companies have new ways to transform their businesses – both operationally, and through their products and services – by leveraging this data and applying fresh analytical techniques to make sense of it. But are they ready? The answer is “no” in most cases.
In this session, we’ll be discussing the challenges facing companies trying to embrace the Analytics of Things, and how Teradata has helped customers work through and turn those challenges to their advantage.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotify's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HDFS (HopsFS) and the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared-nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, including some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open source and supports a pluggable database backend for distributed metadata, although it currently only supports MySQL Cluster as a backend. Hops opens up the potential for new directions for Hadoop when metadata is available for tinkering in a mature relational database.
In high-risk manufacturing industries, regulatory bodies stipulate continuous monitoring and documentation of critical product attributes and process parameters. On the other hand, sensor data coming from production processes can be used to gain deeper insights into optimization potentials. By establishing a central production data lake based on Hadoop and using Talend Data Fabric as a basis for a unified architecture, the German pharmaceutical company HERMES Arzneimittel was able to cater to compliance requirements as well as unlock new business opportunities, enabling use cases like predictive maintenance, predictive quality assurance or open world analytics. Learn how the Talend Data Fabric enabled HERMES Arzneimittel to become data-driven and transform Big Data projects from challenging, hard to maintain hand-coding jobs to repeatable, future-proof integration designs.
Talend Data Fabric combines Talend products into a common set of powerful, easy-to-use tools for any integration style: real-time or batch, big data or master data management, on-premises or in the cloud.
While you could be tempted to assume data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like "What happens if the entire datacenter fails?" or "How do I recover into a consistent state of data, so that applications can continue to run?" are not at all trivial to answer for Hadoop. Did you know that HDFS snapshots do not treat open files as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross-region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic as most customers are still in the "single use-case" or PoC phase, where data governance as far as backup and disaster recovery (BDR) is concerned is not (yet) important. This talk first introduces you to the overarching issue and difficulties of backup and data safety, then looks at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components and so on, and finally shows you a viable approach using built-in tools. You will also learn not to take this topic lightly and what is needed to implement and guarantee continuous operation of Hadoop cluster based solutions.
Hadoop Distributed File System (HDFS) is evolving from a MapReduce-centric storage system into a generic, cost-effective storage infrastructure where HDFS stores all of an organization's data. The new use case presents a new set of challenges to the original HDFS architecture. One challenge is to scale the storage management of HDFS: the centralized scheme within the NameNode becomes the main bottleneck, limiting the total number of files stored. Although a typical large HDFS cluster is able to store several hundred petabytes of data, it handles large numbers of small files inefficiently under the current architecture.
In this talk, we introduce our new design and in-progress work that re-architects HDFS to attack this limitation. The storage management is enhanced to a distributed scheme. A new concept of storage container is introduced for storing objects. HDFS blocks are stored and managed as objects in the storage containers instead of being tracked only by NameNode. Storage containers are replicated across DataNodes using a newly-developed high-throughput protocol based on the Raft consensus algorithm. Our current prototype shows that under the new architecture the storage management of HDFS scales 10x better, demonstrating that HDFS is capable of storing billions of files.
How to optimize Hortonworks Apache Spark ML workloads on Power - The POWER8/9 architecture is the latest offering from IBM and the OpenPOWER Foundation. It is the perfect platform for optimizing Hortonworks Spark's performance. During this presentation we will walk the audience through the steps required to optimize YARN, HDFS, and Spark on a Power cluster.
Steps required:
1) Classify workload into CPU, Memory, IO or mixed (CPU, memory, IO) intensive
2) Characterize "out-of-box" Hortonworks Spark workload to understand CPU, Memory, IO and Network performance characteristics
3) Floor Plan cluster resources
4) Tune "out-of-box" workload to navigate "Roofline" Performance space in the above named dimensions
5) If workload is Memory, IO, or Network bound, then tune Spark to increase operational intensity (operations/byte) as much as possible to make it CPU bound
6) Divide search space into regions and perform exhaustive search.
7) Identify Performance bottlenecks by resource monitoring and tune the System, JVM or application layer by profiling application and hardware counters if required.
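The roofline reasoning behind steps 4 and 5 can be sketched in a few lines of Python. This is an illustration rather than the presenters' tooling: the function names are made up here, and the peak-compute and bandwidth figures are hypothetical placeholders, not measured POWER8/9 values. It relies on the standard roofline bound, where attainable performance is the lower of peak compute and memory bandwidth multiplied by operational intensity.

```python
def attainable_gops(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gops, bandwidth_gbs * intensity_ops_per_byte)


def ridge_point(peak_gops, bandwidth_gbs):
    """Operational intensity (ops/byte) at which the two roofs meet."""
    return peak_gops / bandwidth_gbs


def classify(peak_gops, bandwidth_gbs, intensity_ops_per_byte):
    """Label a workload as memory-bound or CPU-bound under the roofline model."""
    if intensity_ops_per_byte < ridge_point(peak_gops, bandwidth_gbs):
        return "memory-bound"
    return "CPU-bound"


# Hypothetical machine: 500 Gops/s peak compute, 100 GB/s memory bandwidth.
PEAK_GOPS, BANDWIDTH_GBS = 500.0, 100.0

for intensity in (2.0, 8.0):
    label = classify(PEAK_GOPS, BANDWIDTH_GBS, intensity)
    bound = attainable_gops(PEAK_GOPS, BANDWIDTH_GBS, intensity)
    print(f"{intensity} ops/byte: {label}, at most {bound} Gops/s")
```

Raising a workload's operational intensity past the ridge point is what moves it from the sloped memory roof onto the flat compute roof, which is the goal stated in step 5.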
The challenge of computing big data for evolving digital business processes demands a variety of computation techniques and engines (SQL, OLAP, time-series, graph, document store) working in a unified framework. A simple architecture for data transformations, together with security, governance, and operational administration, is a critical component of enterprise production environments supporting day-to-day business processes. In this session, you will learn about best practices and critical components to ensure business value from the latest production deployments. Hear how existing customers are using SAP Vora and the value they have achieved so far with this in-memory engine for distributed data processing. The session provides you with a clear understanding of how SAP Vora and open source components like Apache Hadoop and Apache Spark offer an architecture that supports a wide variety of use cases and industries. You will also receive useful insight into where to find development resources, test drive demos, and general documentation.
Epistemic Interaction - tuning interfaces to provide information for AI support | Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... | James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Elevating Tactical DDD Patterns Through Object Calisthenics | Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
PHP Frameworks: I want to break free (IPC Berlin 2024) | Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
The Art of the Pitch: WordPress Relationships and Sales | Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: The Art of Triggers and Actions in FME | Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... | UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... | SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
The new frontiers of AI in RPA with UiPath Autopilot™ | UiPathCommunity
In this free online event, organized by the UiPath Italian Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of using Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Transcript: Selling digital books in 2024: Insights from industry leaders - T... | BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.