This presentation introduces the audience to DataOps and AIOps practices. It covers organizational and technical aspects, and provides hints to start your data journey.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
MLOps and Data Quality: Deploying Reliable ML Models in Production (Provectus)
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment, and their maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps). A minimal illustration of such a data-quality gate follows the agenda below.
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
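To make the Data QA idea concrete, here is a minimal sketch, not from the talk, of a rule-based data-quality gate over a pandas DataFrame; the column names, rules, and thresholds are illustrative assumptions.

# Minimal rule-based data-quality gate over a pandas DataFrame.
# Column names, rules, and thresholds are illustrative assumptions.
import pandas as pd

def validate_batch(df: pd.DataFrame) -> list:
    """Return human-readable data-quality violations for one batch."""
    issues = []
    if df.empty:
        issues.append("batch is empty")
        return issues
    # Completeness: critical columns must exist and contain no nulls.
    for col in ("user_id", "event_ts", "amount"):
        if col not in df.columns:
            issues.append("missing column: " + col)
        elif df[col].isna().any():
            issues.append("nulls in critical column: " + col)
    # Validity: simple domain rule on values.
    if "amount" in df.columns and (df["amount"] < 0).any():
        issues.append("negative values in 'amount'")
    # Uniqueness: the primary key must not repeat.
    if "user_id" in df.columns and df["user_id"].duplicated().any():
        issues.append("duplicate user_id values")
    return issues

if __name__ == "__main__":
    batch = pd.DataFrame({"user_id": [1, 2, 2],
                          "event_ts": ["2021-01-01", "2021-01-02", None],
                          "amount": [10.0, -5.0, 3.5]})
    for issue in validate_batch(batch):
        print("DATA QUALITY:", issue)

Such a gate would typically run in the validation pipeline before any training or serving job consumes the batch.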
DataOps: An Agile Method for Data-Driven Organizations (Ellen Friedman)
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
MLOps: Bridging the Gap between Data Scientists and Ops (Knoldus Inc.)
In this session we introduce the MLOps lifecycle and discuss the hidden loopholes that can affect an ML project. We then discuss the ML model lifecycle and the problems that arise during training, and introduce the MLflow Tracking module for tracking experiments.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as tracking experiments, reproducibility, deployment tooling, and model versioning. Get your hands dirty with a quick ML project using MLflow, released to production, to understand the MLOps lifecycle.
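As a flavor of the Tracking component described above, here is a minimal, hypothetical sketch using MLflow's Python API; the experiment name, model, and hyperparameters are placeholders rather than anything from the session.

# Minimal MLflow Tracking sketch: log params, a metric, and a model for one run.
# Experiment name, data, and hyperparameters are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

mlflow.set_experiment("demo-experiment")

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

with mlflow.start_run():
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X, y)
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mse", mean_squared_error(y, model.predict(X)))
    # Package the fitted model so it can be registered and deployed later.
    mlflow.sklearn.log_model(model, "model")

Running this and opening the MLflow UI shows the run, its parameters, metrics, and the stored model artifact side by side.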
Using MLOps to Bring ML to Production / The Promise of MLOps (Weaveworks)
In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have gained huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications through establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction for MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
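As a toy illustration of the canary pattern mentioned above (our sketch, not Weaveworks' tooling), the snippet below routes a configurable fraction of prediction requests to a candidate model; the model functions and the traffic weight are invented stand-ins.

# Toy canary router: send a small, configurable share of traffic to the
# candidate model and the rest to the stable one. Model functions are stubs.
import random

CANARY_WEIGHT = 0.1  # 10% of requests hit the canary

def stable_model(x):
    return {"version": "v1", "score": 0.42}

def canary_model(x):
    return {"version": "v2", "score": 0.43}

def predict(x):
    model = canary_model if random.random() < CANARY_WEIGHT else stable_model
    return model(x)

if __name__ == "__main__":
    results = [predict(None)["version"] for _ in range(1000)]
    print("canary share:", results.count("v2") / len(results))

In a GitOps setup the weight would live in version-controlled config, and a controller would raise it gradually while watching the canary's metrics.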
Architect’s Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Machine learning operations brings data science to the world of DevOps. Data scientists create models on their workstations; MLOps adds automation, validation, and monitoring to any environment, including machine learning on Kubernetes. In this session you will hear about the latest developments and see them in action.
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus (Manasi Vartak)
These are slides from Manasi Vartak's Strata Talk in March 2020 on Robust MLOps with Open-Source.
* Introduction to talk
* What is MLOps?
* Building an MLOps Pipeline
* Real-world Simulations
* Let’s fix the pipeline
* Wrap-up
The Data Phoenix Events team invites everyone to the first webinar in "The A-Z of Data" series, dedicated to MLOps, on August 17 at 19:00. In this introductory webinar, we will look at what MLOps is, its core principles and practices, the best tools, and possible architectures. We will start with a simple lifecycle for developing ML solutions and end with a complex, maximally automated cycle that MLOps allows us to implement.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
Given at the MLOps Summit 2020: I cover the origins of MLOps in 2018, how MLOps evolved from 2018 to 2020, and what I expect for the future of MLOps.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production machine learning lifecycle. Similar to the DevOps term in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. MLOps applies to the entire ML lifecycle - from integrating with model generation (software development lifecycle, continuous integration/continuous delivery), orchestration, and deployment, to health, diagnostics, governance, and business metrics.
To watch the full presentation click here: https://info.cnvrg.io/mlopsformachinelearning
In this webinar, we’ll discuss core practices in MLOps that will help data science teams scale to the enterprise level. You’ll learn the primary functions of MLOps, and what tasks are suggested to accelerate your team’s machine learning pipeline. Join us in a discussion with cnvrg.io Solutions Architect Aaron Schneider, and learn how teams use MLOps for more productive machine learning workflows.
- Reduce friction between science and engineering
- Deploy your models to production faster
- Health, diagnostics and governance of ML models (a drift-check sketch follows this list)
- Kubernetes as a core platform for MLOps
- Support advanced use-cases like continual learning with MLOps
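As one concrete flavor of model health monitoring (our illustration, not cnvrg.io's implementation), the sketch below computes the population stability index (PSI) between a feature's training-time and live distributions; the 0.2 alert threshold is a common rule of thumb, not a universal constant.

# Population Stability Index (PSI): a standard drift signal comparing the
# live distribution of a feature to its training-time distribution.
# Illustrative sketch; the bin count and 0.2 threshold are conventions.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin edges come from the training (expected) distribution.
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

if __name__ == "__main__":
    train = np.random.normal(0.0, 1.0, 10_000)
    live = np.random.normal(0.3, 1.1, 10_000)  # shifted: should raise PSI
    score = psi(train, live)
    print("PSI=%.3f" % score, "-> drift suspected" if score > 0.2 else "-> stable")

A job like this, run on a schedule against production traffic, is one small building block of the "health, diagnostics and governance" topic listed above.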
Experimentation to Industrialization: Implementing MLOps (Databricks)
In this presentation, drawing upon Thorogood’s experience with a customer’s global Data & Analytics division as their MLOps delivery partner, we share important learnings and takeaways from delivering productionized ML solutions and shaping MLOps best practices and organizational standards needed to be successful.
We open by providing high-level context & answering key questions such as “What is MLOps exactly?” & “What are the benefits of establishing MLOps Standards?”
The subsequent presentation focuses on our learnings & best practices. We start by discussing common challenges when refactoring experimentation use-cases & how to best get ahead of these issues in a global organization. We then outline an Engagement Model for MLOps addressing: People, Processes, and Tools. ‘Processes’ highlights how to manage the often siloed data science use case demand pipeline for MLOps & documentation to facilitate seamless integration with an MLOps framework. ‘People’ provides context around the appropriate team structures & roles to be involved in an MLOps initiative. ‘Tools’ addresses key requirements of tools used for MLOps, considering the match of services to use-cases.
Presentation on Data Mesh: a paradigm shift to a new type of ecosystem architecture, a move toward a modern distributed architecture that treats domain-specific data as "data-as-a-product" and enables each domain to handle its own data pipelines.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag... (James Serra)
Discover, manage, deploy, monitor – rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge and then easily customize it using your development tools while relying on Azure ML to optimize them to run in hardware accelerated environments for the cloud and the edge using FPGAs and Neural Network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Big Data Tools: A Deep Dive into Essential Tools (FredReynolds2)
Today, practically every firm uses big data to gain a competitive advantage in the market. With this in mind, freely available big data tools for analysis and processing are a cost-effective and beneficial choice for enterprises. Hadoop is the sector’s leading open-source initiative and the driving force of the big data wave. And this is not the final chapter: numerous other businesses pursue Hadoop’s free and open-source path.
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at... (Yael Garten)
2017 StrataHadoop SJC conference talk. https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56047
Description:
So, you finally have a data ecosystem with Kafka and Hadoop both deployed and operating correctly at scale. Congratulations. Are you done? Far from it.
As the birthplace of Kafka and an early adopter of Hadoop, LinkedIn has 13 years of combined experience using Kafka and Hadoop at scale to run a data-driven company. Both Kafka and Hadoop are flexible, scalable infrastructure pieces, but using these technologies without a clear idea of what the higher-level data ecosystem should be is perilous. Shirshanka Das and Yael Garten share best practices around data models and formats, choosing the right level of granularity of Kafka topics and Hadoop tables, and moving data efficiently and correctly between Kafka and Hadoop and explore a data abstraction layer, Dali, that can help you to process data seamlessly across Kafka and Hadoop.
Beyond pure technology, Shirshanka and Yael outline the three components of a great data culture and ecosystem and explain how to create maintainable data contracts between data producers and data consumers (like data scientists and data analysts) and how to standardize data effectively in a growing organization to enable (and not slow down) innovation and agility. They then look to the future, envisioning a world where you can successfully deploy a data abstraction of views on Hadoop data, like a data API as a protective and enabling shield. Along the way, Shirshanka and Yael discuss observations on how to enable teams to be good data citizens in producing, consuming, and owning datasets and offer an overview of LinkedIn’s governance model: the tools, process and teams that ensure that its data ecosystem can handle change and sustain #DataScienceHappiness.
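For a taste of the Kafka-to-Hadoop data movement discussed above (our sketch, not LinkedIn's Dali layer), here is a kafka-python consumer that lands JSON events into hour-partitioned files, a deliberately simplified stand-in for an HDFS loader; the topic and broker are placeholders.

# Tiny Kafka -> file landing sketch: consume JSON events and append them to
# hour-partitioned files, a simplified stand-in for a Kafka->HDFS loader.
# Topic and broker addresses are placeholders. Requires kafka-python.
import json
import os
from datetime import datetime, timezone
from kafka import KafkaConsumer

os.makedirs("landing", exist_ok=True)

consumer = KafkaConsumer(
    "page-views",                          # placeholder topic name
    bootstrap_servers=["localhost:9092"],  # placeholder broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for event in consumer:
    # Partition output by arrival hour, mirroring hourly Hadoop table partitions.
    hour = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H")
    with open("landing/page-views-%s.jsonl" % hour, "a") as f:
        f.write(json.dumps(event.value) + "\n")

The hourly partitioning mirrors the topic/table granularity decision the talk highlights: the landing layout fixes how cheaply downstream Hadoop jobs can read the data.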
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha... (Shirshanka Das)
Developing and deploying AI solutions on the cloud using Team Data Science Pr... (Debraj GuhaThakurta)
Presented at: Global Big AI Conference, Santa Clara, Jan 2018. Developing and deploying AI solutions on the cloud using the Team Data Science Process (TDSP) and Azure Machine Learning (AML).
Building Bridges: Merging RPA Processes, UiPath Apps, and Data Service to bu... (DianaGray10)
This session is focused on the art of application architecture, where we unravel the intricacies of creating a standard, yet dynamic application structure.
We'll explore:
- Essential components of a typical application, emphasizing their roles and interactions.
- How to connect UiPath RPA Processes, UiPath Apps, and Data Service together to build a stronger app.
- Insights into building more efficient, interconnected, and robust applications in the UiPath ecosystem.
Speaker:
David Kroll, Director, Product Marketing @Ashling Partners and UiPath MVP
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning (Kai Wähner)
Comparison of Data Preparation vs. Data Wrangling Programming Languages, Frameworks and Tools in Machine Learning / Deep Learning Projects.
A key task in creating appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources such as files, databases, big data storage, sensors, or social networks. This step can take up to 80% of the whole project.
This session compares different techniques for preparing data, including extract-transform-load (ETL) batch processing (e.g. Talend, Pentaho), streaming analytics ingestion (e.g. Apache Storm, Flink, Apex, TIBCO StreamBase, IBM Streams, Software AG Apama), and data wrangling (DataWrangler, Trifacta) within visual analytics. Various options and their trade-offs are shown in live demos using different advanced analytics technologies and open-source frameworks such as R, Python, Apache Hadoop, Spark, KNIME or RapidMiner. The session also discusses how this relates to visual analytics tools (like TIBCO Spotfire), and best practices for how the data scientist and business user should work together to build good analytic models. A small pandas-based sketch of this kind of preparation follows this entry.
Key takeaways for the audience:
- Learn various options for preparing data sets to build analytic models
- Understand the pros and cons and the targeted persona for each option
- See different technologies and open source frameworks for data preparation
- Understand the relation to visual analytics and streaming analytics, and how these concepts are actually leveraged to build the analytic model after data preparation
Video Recording / Screencast of this Slide Deck: https://youtu.be/2MR5UynQocs
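To give a feel for that 80% in code, here is a small pandas-based preparation sketch (not one of the session's demos); the file and column names are invented.

# Minimal data-preparation sketch with pandas: the unglamorous 80%.
# File name and columns are invented for illustration.
import pandas as pd

df = pd.read_csv("customers.csv")

# 1. Fix types: parse dates that arrived as strings; bad values become NaT.
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# 2. Handle missing values: median for numerics, explicit flag for categories.
df["age"] = df["age"].fillna(df["age"].median())
df["segment"] = df["segment"].fillna("unknown")

# 3. Remove exact duplicates introduced by multi-source integration.
df = df.drop_duplicates()

# 4. Encode categoricals so downstream ML frameworks can consume them.
df = pd.get_dummies(df, columns=["segment"], prefix="seg")

print(df.dtypes)

Each step here corresponds to a decision the session frames as a trade-off between one-off wrangling and a repeatable ETL pipeline.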
Machine learning in the cloud influences operations company-wide. Learn from data scientist Ahmed Sherif how to leverage cloud offerings including AWS, IBM Cloud, and Microsoft Azure.
Data science holds tremendous potential for organizations to uncover new insights and drivers of revenue and profitability. Big Data has brought the promise of doing data science at scale to enterprises; however, this promise also comes with challenges for data scientists to continuously learn and collaborate. Data scientists have many tools at their disposal: notebooks like Jupyter and Apache Zeppelin, IDEs such as RStudio, languages like R, Python, and Scala, and frameworks like Apache Spark. Given all the choices, how do you best collaborate to build your model and then work through the development lifecycle to deploy it from test into production?
In this session learn the attributes of a modern data science platform that empowers data scientists to build models using all the data in their data lake and foster continuous learning and collaboration. We will show a demo of DSX with HDP with the focus on integration, security and model deployment and management.
Speakers:
Sriram Srinivasan, Senior Technical Staff Member, Analytics Platform Architect, IBM
Vikram Murali, Program Director, Data Science and Machine Learning, IBM
Building a MLOps Platform Around MLflow to Enable Model Productionalization i... (Databricks)
Getting machine learning models to production is notoriously difficult: it involves multiple teams (data scientists, data and machine learning engineers, operations, …), who often do not talk to each other very well; the model can be trained in one environment but then productionalized in a completely different one; and it is not just about the code, but also about the data (features) and the model itself. At DataSentics, a machine learning and cloud engineering studio, we see this struggle firsthand, on our internal projects and our clients' projects alike.
Agile Testing Days 2017: Introducing Agile BI Sustainably - Exercises (Raphael Branger)
"We now do Agile BI too” is often heard in todays BI community. But can you really "create" agile in Business Intelligence projects? This presentation shows that Agile BI doesn't necessarily start with the introduction of an iterative project approach. An organisation is well advised to establish first the necessary foundations in regards to organisation, business and technology in order to become capable of an iterative, incremental project approach in the BI domain.
In this session you learn which building blocks you need to consider. In addition you will see what a meaningful sequence to these building blocks is. Selected aspects like test automation, BI specific design patterns as well as the Disciplined Agile Framework will be explained in more and practical details.
Coding software and tools used for data science management - Phdassistance (phdAssistance1)
The technique of extracting usable information from data is known as data science. This is the procedure of collecting, modelling, and analysing data in order to address real-world issues. Data science tools have been developed as a result of the vast range of applications and rising demand. The following section goes through the greatest data science tools in detail. The most notable attribute of these tools is that they do not require the use of programming languages to implement data science.
Read More: https://bit.ly/3rbp1Lb
For Enquiry:
India: +91 91769 66446
UK: +44 7537144372
Email: info@phdassistance.com
This presentation explains what serverless is all about, setting the context from the Dev & Ops points of view and presenting the various ways to achieve serverless (Functions as a Service, BaaS...). It also presents the various competitors on the market and demos one of them, OpenFaaS. Finally, it enlarges the picture, positioning serverless, combined with edge computing & IoT, as a valuable triptych that cloud vendors are building end-to-end offers on top of.
Two self-managed Docker clusters are deployed on public clouds and fight each other in a ruthless battle. One has been designed to resist any form of threat; the other one's only aim is to destroy the first. Who's going to win?
Although it's presented as entertainment, this talk shows off two serious platforms built on different principles. Beyond the technical aspects covered (Swarm/Kubernetes orchestration, IaaS clouds, various tools such as Terraform, kops or Helm), it is an opportunity to discuss broader architecture topics such as immutable infrastructure, hybridization, microservices, etc.
DevOps at scale: what we did, what we learned at Societe Generale (Adrien Blind)
The following talk discusses Societe Generale's transformation journey to DevOps, and more largely to continuous delivery principles, inside a large, traditional company. It emphasizes the importance of practices over tooling, a human-centric approach leaning massively on coaching, and our "framework" approach to scaling it up to the IS level.
It was initially delivered at the DevOps Rex conference with teammate Laurent Dussault, also a DevOps coach at Societe Generale.
Unleash software architecture leveraging on Docker (Adrien Blind)
The following talk first revisits key aspects of microservices architectures. It then shifts to Docker, explaining in this context the benefits of containers and especially the new orchestration features that appeared with version 1.12.
Docker, cornerstone of cloud hybridization? [Cloud Expo Europe 2016] (Adrien Blind)
The following talk discusses the opportunity to leverage Docker to create a hybrid logical cloud, built simultaneously on top of traditional datacenters and public cloud vendors, able to manage new kinds of containers (Windows, Linux on ARM). It also discusses the value of such a capacity for applications, in a context of topology orchestration and microservice-oriented applications.
DevOps at scale: what we did, what we learned at Societe Gen... (Adrien Blind)
Docker, cornerstone of a hybrid cloud? (Adrien Blind)
In this presentation, I propose to explore the orchestration & hybridization potential raised by Docker 1.12 Swarm Mode, and the subsequent benefits.
I'll first recall why Docker fits the microservices paradigm well, and how this architecture engenders new challenges: service discovery, app-centric security, scalability & resilience, and of course orchestration.
I'll then discuss the opportunity to create your own Docker CaaS platform, hybridizing simultaneously across various cloud vendors & traditional datacenters, rather than just relying on vendors' integrated offers.
Finally, I'll discuss the rise of new technologies (Windows containers, ARM architectures) in the Docker landscape, and the opportunity of integrating them into a global composite Docker orchestration, making it possible to describe complex apps globally.
Octo breakfast session - Infrastructure at the service of its projects (Adrien Blind)
This presentation covers Societe Generale's IT infrastructure automation project, in the broader context of rolling out continuous delivery and DevOps practices and tools.
Since many apps are not just a single container, this talk discusses the ability and benefits of creating a hybrid Docker cluster capacity spanning Linux + Windows OSes and x86 + ARM architectures.
Moreover, the Docker nodes composing this cloud will be hosted across several providers (local DC, cloud vendors such as Azure or AWS), in order to address various scenarios (cloud migration, elasticity...).
DevOps, NoOps, everything-as-code, commoditization... What future for Ops? (Adrien Blind)
Implementing continuous delivery puts new pressure on Ops, since an application's infrastructure and operability are now built at the ever-increasing pace of delivered iterations. In parallel, architecture patterns are evolving too: resilience and scalability are increasingly handled within the applications themselves, progressively reducing infrastructure to a commodity... Finally, Dev teams keep asking for more autonomy and an ergonomics better suited to their needs; cloud players and star solutions like Docker got it right by offering products that speak to them directly, and the temptation of NoOps is growing little by little...
The challenge for Ops is therefore to propose a positioning and an offering in tune with these new expectations. The challenges are numerous, covering both technical aspects (infra-as-code, software-defined network/storage, IS hybridization...) and non-technical ones (agility, craftsmanship, DevOps...).
Devs taking over the place of Ops, Ops acquiring Dev skills... In this session, we propose to explore these deep cultural and technical shifts, and we will share a few recipes for the greater benefit of Ops... and Devs alike. As Audiard wrote, "When things change, they change... Never let yourself be thrown off!"
Introduction to Unikernels at the first Paris Unikernels meetup (Adrien Blind)
This is an introduction to unikernels and their impact on architecture and IT organizations (in French; I'll translate it soon). I produced this talk for the first Paris Unikernels Meetup.
When Docker Engine 1.12 features unleash software architecture (Adrien Blind)
This slide deck deals with the new features delivered with Docker Engine 1.12, in the larger context of application architecture & security. It was presented at Voxxed Days Luxembourg 2016.
This presentation discusses how to achieve continuous delivery, leveraging Docker containers used as universal application artifacts. It was presented at Voxxed '15 Bucharest.
Docker: Redistributing DevOps cards, on the way to PaaS (Adrien Blind)
This talk first presents Docker through its key characteristics: portable, disposable, live, social. It then discusses a new type of cloud, CaaS (Container as a Service), and its potential benefits for PaaS (Platform as a Service).
Docker, cornerstone of continuous delivery? (Adrien Blind)
This presentation explores continuous delivery principles leveraging Docker: it depicts the use of Docker containers as universal application artifacts, flowing all along a deployment pipeline.
This slideshow was initially presented at the DevOps D-Day conference in Marseille.
Identity & Access Management in the cloudAdrien Blind
This presentation discusses the evolution of the IAM (Identity & Access Management) problem space, in a context pushing more & more externalization & opening (B2B, B2C) of enterprise IS, also leaning massively on the cloud.
The talk particularly focuses on IAM SSO & federation topics, and the corresponding technologies (SAML, OpenID, OAuth...).
The missing piece: when Docker networking and services finally unleashes so... (Adrien Blind)
Docker now provides several building blocks, combining engine, clustering, and componentization, while the new networking and service features enable many new use cases such as multi-tenancy. In this session, you will first discover the new experimental networking and service features expected soon, then drift rapidly to software architecture, explaining how a complete Docker stack unleashes microservices paradigms.
The first part of the talk introduces SDNs and service registries to the audience and covers the corresponding experimental Docker network & service features, with a technical focus. For instance, it explains how to create an overlay network on top of a swarm cluster, or how to publish services.
The second part of the talk moves from infrastructure to application concerns, explaining that application architecture paradigms are shifting. In particular, we discuss the growing porosity of companies' IS (especially due to massive use of cloud services), drifting security boundaries from the global IS perimeter to the application shape. We also note that traditional SOA patterns relying on buses (i.e. ESBs & ETLs) are being replaced by microservices promoting more direct, full-mesh interactions. To complete the picture, we'll also rapidly recall other trends and shifts already covered by other Docker components: scalability & resiliency supported by the apps themselves, fine-grained applications, and even infrastructure commoditization...
Most of all, the last part depicts a concrete, state-of-the-art application, applying all the properties discussed previously, and leveraging on a multi-tenant docker full stack using new networking and services features, in addition to traditional swarm, compose, and engine components. And just because we say it doesn’t mean it’s true, we’ll be happy to demonstrate this live !
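Below is a small, hypothetical sketch using the Docker SDK for Python (docker-py), not the talk's own demo, showing the two operations mentioned above: creating an overlay network on a swarm cluster and publishing a service on it. It assumes Docker is already running in swarm mode; the image, names, and ports are arbitrary examples.

# Sketch: create an overlay network and publish a service on a swarm cluster
# via the Docker SDK for Python. Assumes `docker swarm init` was already run;
# image, names, and ports are arbitrary examples.
import docker
from docker.types import EndpointSpec

client = docker.from_env()

# Overlay networks span swarm nodes, giving containers cross-host connectivity.
net = client.networks.create("app-net", driver="overlay", attachable=True)

# Publish the service: port 8080 on the cluster routes to port 80 in the task.
service = client.services.create(
    "nginx:alpine",
    name="web",
    networks=["app-net"],
    endpoint_spec=EndpointSpec(ports={8080: 80}),
)
print("service created:", service.name)

This is the programmatic equivalent of the overlay-network and service-publishing flow the first part of the talk demonstrates with the CLI.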
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides from my talk with Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We also ran a lovely workshop in which participants tried to find different ways to think about quality and testing in the different parts of the DevOps infinity loop.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report was prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics. A small sketch of the InfluxDB write path follows this entry.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
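To show the shape of the integration, the sketch below pushes response-time samples into InfluxDB 1.x using the influxdb Python client, where Grafana can chart them; in the real setup described above, JMeter's Backend Listener plays this role. The database, tags, and values are placeholders.

# Sketch: write load-test samples to InfluxDB 1.x so Grafana can chart them.
# In the real setup JMeter's Backend Listener does this; names are placeholders.
# Requires the `influxdb` package and a reachable InfluxDB 1.x server.
import random
import time
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="jmeter")
client.create_database("jmeter")  # no-op if the database already exists

for _ in range(10):
    point = {
        "measurement": "response_times",
        "tags": {"test": "demo", "transaction": "login"},
        "fields": {"ms": random.uniform(80, 300)},
    }
    client.write_points([point])
    time.sleep(1)

A Grafana dashboard pointed at the same database can then graph the "response_times" measurement live while the test runs.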
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and co-founder of FalkorDB, reviews two articles on the integration of language models with knowledge graphs. A toy retrieval sketch follows the links below.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
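As a toy illustration of the GraphRAG idea (ours, not from either paper), the sketch below retrieves an entity's neighborhood from a tiny knowledge graph and serializes it as grounding context for an LLM prompt; the graph content and prompt format are invented.

# Toy GraphRAG-style retrieval: pull an entity's neighborhood from a knowledge
# graph and serialize it as grounding context for an LLM prompt.
# Graph content and prompt format are invented for illustration.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("MLflow", "Databricks", relation="created_by")
kg.add_edge("MLflow", "experiment tracking", relation="used_for")
kg.add_edge("Databricks", "Apache Spark", relation="founded_around")

def graph_context(entity: str, depth: int = 1) -> str:
    """Serialize facts within `depth` hops of `entity` as one-line triples."""
    nodes = nx.ego_graph(kg, entity, radius=depth).nodes
    facts = ["%s -[%s]-> %s" % (u, d["relation"], v)
             for u, v, d in kg.edges(data=True) if u in nodes and v in nodes]
    return "\n".join(facts)

question = "Who created MLflow?"
prompt = "Answer using only these facts:\n%s\n\nQ: %s" % (graph_context("MLflow"), question)
print(prompt)  # this string would be sent to the LLM

The point of the pattern is that the retrieved subgraph, not the model's parametric memory, supplies the facts the LLM is allowed to answer from.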
PHP Frameworks: I want to break free (IPC Berlin 2024) (Ralf Eggert)
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Introduction to DataOps and AIOps (or MLOps)
1. An introduction to DataOps & AIOps (or MLOps)
Adrien Blind (@adrienblind)
Disclaimer and credits:
Parts of this presentation were built with former teammates, outside the context of Saagie:
- a broader talk initially co-developed and co-delivered with Frederic Petit for the DevOps D-Day and Snow Camp conferences. Original slides here: https://bit.ly/2Ci3Ilh
- a talk discussing Continuous Delivery and DevOps, co-developed and co-delivered with Laurent Dussault for the DevOps Rex conference. Slides here: https://bit.ly/2CmEIcB
5. The point is to operationalize data projects
From Proof of Concept to operational product:
● Robust, resilient
● Scalable
● Secure
● Updatable
● Shareable
6. Challenges delivering value from Big Data / AI
Value is hard to demonstrate, implementation times are long, and projects are rarely deployed in production:
● Only 27% of CxOs considered their Big Data projects valuable
● 12 to 18 months to build and deploy AI pilots
● Only 15% of AI projects have been deployed
Sources: Gartner CIO Survey (2018); The Big Data Payoff: Turning Big Data into Business Value (Capgemini and Informatica survey, 2016); BCG, Putting Artificial Intelligence to Work, September 2017
8. Challenges ㅡ Process
A DIY, time/budget-consuming, multi-skill, high-risk approach, with many hand-offs between roles (IT Ops, Security, Data Steward, Business Analyst, Data Engineer, Data Scientist):
provision cluster(s) → grant access → connect databases / files → integrate data frameworks → write/build ETL code → write/build ML code → deploy test jobs & validate models → define new policies → align processes with business requirements → change algos and integrate new libs → rewrite/build ETL code for prod → rewrite/build ML code for prod → deploy prod jobs → monitor & audit activity.
9. Challenges ㅡ People & organization
Barriers between organizations: silos and different cultures!
● BUSINESS: Data Analyst, Data Steward
● ANALYTICS TEAM: Data Engineer, Data Scientists
● IT: IT Ops, IT Architects & Coders
14. Infrastructure landscape: infrastructure driven
#0 ITOps provides compute & storage, on the Information Technology layer (on premises, cloud, etc.), to host data processing, models and app code.
15. Application landscape: API driven
#1 DevOps: build, deliver & run apps. Developers need pipelines (continuous improvement loops) to deliver innovative apps.
The operational I.S. (apps, ERP, CRM…) is API centric: input and output are business features exposed as APIs, used internally & shared externally, alongside external APIs you consume.
#0 ITOps: provides compute & storage, on the Information Technology layer (on premises, cloud, etc.), to host data processing / models / app code.
16. Data processing landscape: data driven
#1 DevOps: build, deliver & run apps. Developers need pipelines (continuous improvement) to deliver innovative apps.
#2 DataOps: process & share data. Data engineers need pipelines (continuous improvement) to deliver a capital of data. Inputs are the internal raw data generated by your apps, plus external data you consume (open data, data from partners...).
The Data Information System is data-processing centric: input is data, output is data and data models. It is generally not directly plugged into the operational IS (you copy data and process it there). The operational I.S. (apps, ERP, CRM…) remains API centric: input and output are business features as APIs (used internally & shared externally, alongside external APIs you consume).
#0 ITOps: provides compute & storage, on the Information Technology layer (on premises, cloud, etc.), to host data processing / models / app code.
17. Data processing landscape: outputs
#1 DevOps: build, deliver & run apps. Developers need pipelines to deliver innovative apps.
#2 DataOps: process & share data. Data engineers need pipelines (continuous improvement) to deliver a capital of data, fed by the internal raw data generated by your apps and by external data you consume (open data, data from partners...).
Outputs are datasets: delivered as shared datamarts and, more and more, as APIs (for analytics), provided as training sets for AI, shared externally, and shared back to the operational IS.
#0 ITOps: provides compute & storage. The operational I.S. (apps, ERP, CRM…) is API centric (input and output are business features as APIs); the Data Information System is data-processing centric (input is data, output is data and data models).
18. Data science landscape: model driven
#1 DevOps: build, deliver & run apps. Developers need pipelines to deliver innovative apps.
#2 DataOps: process & share data. Data engineers need pipelines (continuous improvement) to deliver a capital of data, fed by the internal raw data generated by your apps and by external data you consume (open data, data from partners...). Datasets are delivered as shared datamarts and, more and more, as APIs (for analytics), provide training sets for the data scientists, are shared externally, and are shared back to the operational IS.
#3 AIOps: explore & build models. Data scientists need pipelines (continuous improvement) to deliver valuable models, which are bundled and run as APIs in the operational IS. Performance drift analysis feeds back into the loop (to retrain & optimize models).
#0 ITOps: provides compute & storage, on the Information Technology layer (on premises, cloud, etc.). The operational I.S. (apps, ERP, CRM…) is API centric (input and output are business features as APIs); the Data Information System is data-processing centric (input is data, output is data and data models).
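The "performance drift analysis" loop can be made concrete with a small statistical check. Below is a minimal sketch, not the presentation's own tooling: it assumes you keep a reference sample of each feature from training time (the names and synthetic distributions are illustrative), and it uses SciPy's two-sample Kolmogorov-Smirnov test to decide when to trigger retraining.
```python
# Minimal drift check feeding the "retrain & optimize" loop.
# ASSUMPTION: a reference sample of each feature is kept from training time;
# names and synthetic data below are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_sample, live_sample, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test: flag drift when the live
    distribution of a feature departs from the training distribution."""
    statistic, p_value = ks_2samp(train_sample, live_sample)
    return p_value < alpha

rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, 5_000)   # what the model saw at training time
live = rng.normal(0.5, 1.0, 5_000)    # what the production API sees now
if feature_drifted(train, live):
    print("Drift detected: trigger the retraining pipeline")
```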
19. AIOps needs DataOps
In the data landscape, the spotlights are on data analytics, and even more on data science/AI, which valorize data in a revolutionary way… because they solve business challenges.
… But it first requires having built up a data capital to process!
Said differently, I like to say that… [slide visual: the (value) of AI rests on (your capital of) DATA]
20. Summary: Pensé par les Devs… Pansé par les Ops! ("Designed by the Devs… patched up by the Ops", a French pun that is less fun in English)

#0 ITOps
Tech side: ITOps operationalizes the delivery of infrastructure assets. The purpose is to deliver an underlying platform on top of which assets will be hosted (apps / data processing / ML). CloudOps lands here, but is opinionated about how to achieve this.
Non-tech side: fosters collaboration between infrastructure teams working in project mode to deliver new assets, and those running them (support/run/monitoring, etc.).

#1 DevOps
Tech side: DevOps operationalizes the delivery of app code (automates, measures, etc.). The purpose is to deliver innovative services to the business.
Non-tech side: fosters collaboration between the devs who build apps and the ops responsible for deploying & running these apps. "You build it, you run it!"

#2 DataOps
Tech side: DataOps operationalizes the setup of data (automates data processing). The purpose is to deliver/shape a capital (of data).
Non-tech side: fosters collaboration between the data engineers who own and shape the data, and the ops deploying the underlying data processing jobs.

#3 AIOps
Tech side: AIOps operationalizes the delivery of models. The purpose is to deliver value.
Non-tech side: fosters collaboration between the data scientists who explore data to build models, and the ops delivering these as usable assets.

So, what about BizDevOps, ITSecOps, DevFinOps, etc.? Business, Security, Finance, etc. are transversal topics which must be addressed anyway, whether we are speaking about DevOps, DataOps or AIOps.
22. Agile & DevOps are not enough for data projects
Agile + DevOps was good for app-centric projects, where data was isolated. But data-centric projects trigger new, additional challenges!
● New players to involve: data scientists, data engineers... They may have a completely different background (mathematicians...) and approach technology differently. → Need a common understanding and appropriate ergonomics (notebooks, GUIs…)
● A recurring technology/language stack for the various types of jobs to handle: ingestion, data prep, modeling… → Need a ready-to-use toolbox
● Coordinate the various jobs applied to the data → Need job pipelining/orchestration (see the sketch after this list)
● Feed the dev process massively with production data (e.g. for machine learning) → Strengthen security
● Identify the patrimony (cataloging), share data, control spreading → Need governance
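As a companion to the orchestration point above, here is a deliberately tiny sketch of running data jobs in dependency order, using only the Python standard library. Real platforms delegate this to a full orchestrator; the job names are illustrative.
```python
# Tiny sketch of job pipelining/orchestration: run data jobs in dependency
# order using the standard library (Python 3.9+). Job names are illustrative.
from graphlib import TopologicalSorter

def ingest():   print("ingest raw data from sources")
def prepare():  print("clean & prepare datasets")
def model():    print("train / score models")
def publish():  print("publish results to the datamart")

jobs = {"ingest": ingest, "prepare": prepare, "model": model, "publish": publish}

# Each job maps to the set of jobs it depends on.
dag = {"prepare": {"ingest"}, "model": {"prepare"}, "publish": {"model"}}

for name in TopologicalSorter(dag).static_order():
    jobs[name]()
```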
23. One DataOps definition
DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.
The goal of DataOps is to deliver value faster by creating predictable delivery and change management of data, data models and related artifacts.
DataOps uses technology to automate the design, deployment and management of data delivery with the appropriate levels of governance and metadata to improve the use and value of data in a dynamic environment.
Source: Gartner - Innovation Insight for DataOps - Dec. 2018
24. DataOps is gaining momentum
● The number of data and analytics experts in business units will grow at 3X the rate of experts in IT departments, which will force companies to rethink their organizational models and skill sets.
● 80% of organizations will initiate deliberate competency development in the field of data literacy, acknowledging their extreme deficiency.
26. Data engineers need pipelines to deliver data
Data processing: Extract → Transform → Aggregate → Share, producing shared dataset(s) & data APIs for consumers. That's where your good old data warehouse generally stands!
Data storing: datalakes, object storage, data virtualization.
If data is the new oil, datalakes are just oil fields (a passive mass of raw structured & unstructured data), Hive/Impala & co. are oil rigs, while the DataOps pipelines are refineries, aimed at processing data… Car engines are the data science leveraging this fuel to provide a disruptive way of transportation!
#1 The DATALAKE is not the point (although companies focused on it). Data processing is.
#2 You don't process data just for the pleasure. You do it to support activities which, in turn, bring value to the business. A minimal sketch of the four stages follows.
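To make the Extract → Transform → Aggregate → Share stages concrete, here is a minimal, self-contained sketch. The inline CSV stands in for a source system; a real job would read from the datalake and load a datamart or data API instead of printing.
```python
# Self-contained sketch of Extract → Transform → Aggregate → Share.
# The inline CSV is illustrative; a real job would read from the lake
# and load a datamart or data API.
import csv, io, json

RAW = "date,store,amount\n2019-01-01,Paris,120\n2019-01-01,Rouen,80\n"

def extract(raw):
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    return [{**row, "amount": float(row["amount"])} for row in rows]

def aggregate(rows):
    totals = {}
    for row in rows:
        totals[row["date"]] = totals.get(row["date"], 0.0) + row["amount"]
    return totals

def share(totals):
    return json.dumps(totals)  # stand-in for loading a shared datamart / API

print(share(aggregate(transform(extract(RAW)))))
```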
27. In comparison, devs needed pipelines to deliver innovative apps
Code → Commit → Compile & test → Package → Deploy to DEV & test → Promote to … & test → Promote to PROD → Running app
29. Inception: DataOps (and AIOps) delivered in a DevOps way
Extract → Transform → Aggregate → Share → Consume
Data processing jobs (for ingesting, transforming data, etc.) are ultimately just pieces of code. These pieces of code can themselves be delivered using DevOps principles :) automated through delivery pipelines, as the sketch below illustrates.
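A minimal sketch of what "delivered in a DevOps way" means for a data job: the transform is ordinary code, so an ordinary unit test can guard it in CI before the pipeline promotes the job. The transform and test below are illustrative, not taken from the presentation.
```python
# A data transform is ordinary code, so an ordinary unit test can guard it
# in CI before the delivery pipeline promotes the job. Illustrative example.
def normalize_country(code: str) -> str:
    """Transform step: normalize free-form country codes before loading."""
    return code.strip().upper()

def test_normalize_country():
    assert normalize_country(" fr ") == "FR"
    assert normalize_country("De") == "DE"

# A CI stage would simply run: pytest
```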
30. Inception: DataOps (& AIOps) to be achieved... in a DevOps way!
● A DataOps orchestrator enables the delivery and run of data projects (DataLab teams, data projects governance).
● A software factory feeds the regular landscape for apps (app servers…): feature teams x and y deliver successive versions (n, n+1, n+2, n+3) across the DEV, UAT, PREPROD and PROD environments, driven by business needs and exposed as APIs.
31. Building up a DataOps platform
Concretely, you need a platform providing the following features:
- It must enable deploying data processing jobs, leveraging the languages/stacks and technologies commonly used by data engineers (Apache Sqoop, Python, Java…). Regular ETLs may be part of the story.
- It must enable scheduling and running pipelines that aggregate jobs into logical sequences (acquiring data, preparing it, delivering it to datamarts (databases, indexing clusters…)).
- It must provide data cataloging & governance features (to give a clear view of the data patrimony), and enable managing data governance/security (access control, etc.).
- It must provide the appropriate types of datamarts for the data patrimony (structured/unstructured, time-oriented or not, etc.).
- It must offer ergonomics that let data engineers and DataOps people be autonomous and productive (avoid tools not designed for them, such as regular "Ops" schedulers, or raw use of complex tools such as Kubernetes…).
Progressively, more event-driven, data-streaming projects arrive on the market. They also need an appropriate set of underlying technologies (Kafka clusters among them); a hedged sketch follows.
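For the event-driven case, here is a hedged sketch of a streaming data job using the kafka-python client; the broker address, topic name and consumer group are assumptions for illustration, not part of the presentation.
```python
# Hedged sketch of an event-driven data job with the kafka-python client.
# ASSUMPTIONS: broker address, topic name and consumer group are illustrative.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "app-events",                        # hypothetical topic
    bootstrap_servers="localhost:9092",  # hypothetical broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
    group_id="dataops-demo",
)

for message in consumer:
    event = message.value
    # A streaming job would clean/enrich the event and push it onward here.
    print(event)
```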
33. Datahub commitments: build up a data capital
Building blocks: data dictionary & catalog; data extraction / lineage; expertise animation, marketing, communication; data exposition; data processing; data warehouse / data lake; data viz; data quality; governance / security; modelization.
Transversal commitment: build up & share a transverse data capital for the company. The process is largely geared by DataOps pipelines!
This is an extract from a longer presentation; the extended version can be found here: https://bit.ly/33tfoNJ
34. Datahub commitments: deliver usecases
Building blocks: data collection; data exploration & analysis tools; ML code; ML training (model); monitoring; data viz; data verification; service; presentation.
Deliver valuable usecases for the business. The process is largely geared by a combination of DevOps + DataOps + ML/AIOps pipelines!
This is an extract from a longer presentation; the extended version can be found here: https://bit.ly/33tfoNJ. A minimal sketch of the training-to-artifact path follows.
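A minimal sketch of the ML code → training → model → service chain listed above, using scikit-learn; the dataset, model choice and artifact name are illustrative.
```python
# Minimal sketch of ML code → training → model artifact, with scikit-learn.
# Dataset, model choice and artifact name are illustrative.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))  # input to monitoring

# The artifact an AIOps pipeline would bundle and expose as an API:
joblib.dump(model, "model.joblib")
```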
36. From DevOps to DataOps & AIOps
[Org sketch: a tribe of squads, with chapters for devs, data science, data engineering, etc.]
A false good idea: it sounds logical, extending the agile/devops paradigms. But it's too early! You don't have the maturity & critical mass to do this at the beginning!
37. From DevOps to DataOps & AIOps: short term
[Org sketch: the tribe keeps its dev chapters and squads; the data science and data engineering chapters are grouped, with their own squads, in a DataHub delivering valuable usecases for the business alongside transversal activities.]
Build a datahub first: it creates a clear positioning and visibility across the org.
Two objectives: deliver valuable usecases to ignite & show off the value of data, while the data used for them becomes the first data to enter your data catalog.
38. From DevOps to DataOps & AIOps: longer term
[Org sketch: data scientist chapters and data engineer chapters sit per tribe & datahub, linked through guilds.]
People working on business usecases will progressively move back into the regular organization: if you don't do this, you are just creating a new silo, while the devops/agile organizations were intended to remove them (a paradox). Useful as a first step, the datahub should progressively spread into the org. You may keep only a few squads working on very innovative tech to address new usecases (e.g. deep learning once regular ML has become common); they will also be responsible for fostering their expertise through the guild they animate. However, you keep people working on transversal data engineering topics.
39. Matrix organization & serendipity
This matrix organization (transversal datasets owned by the Datahub, securely shared with several isolated usecases) enables factorizing the work (and so raises your dataset ROI). Each time a usecase team needs a new dataset, it should be capitalized by integrating it into the data catalog owned by the datahub (see the central team's value?).
Serendipity: with a clear understanding of your data patrimony, you can of course valorize it, but it may also spark new ideas! "Since I have this data, and this one, I may be able to [your_new_idea_here]"
"If only HP knew what HP knows, we'd be three times more productive"
- Lew Platt, former CEO of Hewlett-Packard
[Slide matrix: a Data Catalog mapping Dataset #1–#4 to Usecase #1–#3, as sketched below.]
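A minimal sketch of such a catalog, assuming nothing more than an in-memory registry (entry and usecase names are illustrative); a real datahub would back this with a proper metadata store.
```python
# Minimal in-memory sketch of a data catalog mapping transversal datasets to
# the usecases consuming them; entry names are illustrative.
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str
    owner: str                       # the datahub team owning the dataset
    usecases: set = field(default_factory=set)

catalog = {}

def register(name, owner="datahub"):
    """Capitalize a dataset: one curated entry, reused by many usecases."""
    return catalog.setdefault(name, DatasetEntry(name, owner))

register("customer-360").usecases.add("churn-prediction")
register("customer-360").usecases.add("marketing-segmentation")

# Factorization effect: one curated dataset, several usecases.
for entry in catalog.values():
    print(entry.name, "->", sorted(entry.usecases))
```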
41. Data engineering vs Data Science
[80%] of a data project is roughly about data acquisition/preparation/sharing (data engineering).
[20%] of a data project is roughly about data valorization (data science, data analytics).
→ Your data scientists generally spend most of their time doing data engineering empirically when no clear data engineer position exists in your organization!
- It's not very efficient (data scientists cost much more than data engineers and are difficult to hire)
- They generally don't like this activity (and may end up leaving your company!)
- It happens regularly that two data scientists using the same data for different usecases create two identical ingestion/preparation pipelines for their projects (you miss the factorization effect)
42. Create clear Data Engineer and DataOps positions!

Data Engineers are the tech plumbers of data
Key missions
- Create and configure transformation/preparation jobs to ingest and shape the data
- Deliver them through appropriate datamarts (DBs, indexing clusters, APIs…)
- In small or lightly constrained setups, they may handle the deployment/run of these processes in PROD themselves (a quite "NoOps" pattern), or this is offloaded to a specialized DataOps person shared among several data engineers
Background
- Closer to a developer / integrator than to a data scientist! (but with an awareness of data challenges and technologies: Sqoop, HDFS, Hive, Impala, Spark, object storage, etc.)

Data analysts & scientists are experts in valorizing the data
Key missions
- Develop BI, analytics and models based on the datasets they have
Background
- May come from a very non-IT background (former statisticians are common); knowledgeable about specific frameworks (TensorFlow, etc.)

The Data Steward is a functional manager of data
Key missions
- Manage governance and security
Background
- Has a functional / business knowledge of the data

DataOps people are the local, specialized Ops attached to the data engineers & scientists (transversal, supporting data functions)
Key missions
- Offload the deployment of jobs, pipelines and various assets built by the data engineers (and data scientists) from dev to prod
- Set up CI/CD toolchains and teach data engineers to work "in a DevOps way"
- Instrument/monitor data flows and data quality, manage the runtime
- ...
Background
- Mostly a DevOps person, with an awareness of data challenges and technologies
44. How to start?
Focus on early usecase delivery to gain trust: data scientists and analysts should be your best friends.
● Define clear Data Engineer or even DataOps positions
● Provide them an industrial platform, enabling them to be more autonomous and productive (fewer round trips with ops)
● Empower pluridisciplinary data project teams and have them achieve some first (simple!) use cases, to create confidence and gain more budget if needed
● Empirically set up a basic data catalog made of the datasets gathered and prepared for your usecases
Don't enforce organization changes yet! Foster day-to-day collaboration on operational topics first. Adopting technologies and automation comes naturally to tech people (the IT dept. in the first row). But changing the organization is much more sensitive (management reorganization, changes to people's objectives, etc.). This should be done at a later step, when some early victories have helped to gain trust and prove your path is the right one.
45. How to start?
Now, it's time to shape your datahub.
● On the tech side: automate the whole toolchain (CI/CD); shift to more (complex) use cases (AI…); scale out the platform
● Start changing the organization / management: set up your datahub with a clear commitment, and spend more energy on the DataOps part, since enough usecases have been delivered to justify the factorization/transversal effect
On a longer term, scuttle your work!
● More seriously: your initial siloed approach gave you the critical mass to bootstrap. Now, it's time to de-silo your datalab and spread it through the whole IT dept; if you don't, you have just created a sub data-driven IT inside the larger IT ecosystem, with little porosity
46. BEWARE
Data engineering is a hidden (because the spotlights are on data scientists) key success factor to accelerate your data projects, increase their reliability and enhance their ROI.
But don't "do DataOps for DataOps' sake"!
Remember: DataOps is there to serve and offload the pains of data scientists & analysts, who in turn transform business needs into solutions. Exactly like ITOps is there to provide infrastructure assets to any app / data team of the IT dept...
47. WeWork
92 Av. des Champs-Élysées
75008 Paris - France
Seine Innopolis
72, rue de la République
76140 Le Petit-Quevilly - France
Thank you!
@adrienblind