Building the Artificially Intelligent Enterprise – Databricks
This session looks at where we are today with data and analytics and what is needed to transition to the Artificially Intelligent Enterprise.
How do you mobilise developers to exploit what data scientists and business analysts have built? How do you align it all with business strategy to maximise business outcomes? How do you combine BI, predictive and prescriptive analytics, automation and reinforcement learning to get maximum value across the enterprise? What is the blueprint for building the artificially intelligent enterprise?
•Data and analytics – Where are we?
•Why is the journey only halfway done?
•2021 and beyond – the new era of using AI, not just building it
•The requirement – event-driven, on-demand and automated analytics
•Operationalising what you build – DataOps, MLOps and RPA
•Mobilising the masses to integrate AI into processes – what needs to be done?
•Business strategy alignment – the guiding light to AI utilisation for high reward
•Agility step change – the shift to no-code integration of AI by citizen developers
•Recording decisions and analysing business impact
•Reinforcement learning – transitioning to continuous reward
Revolution in Business Analytics – Zika Virus Example – Bardess Group
Even from the “man in the street” perspective, there is a sense that we are living in an increasingly algorithmic world. Self-driving cars, pizza delivery by drone, and smart houses are commonplace. The technologies enabling this revolution are simultaneously mature and rapidly evolving.
In this session, we’ll take a look at a real-world problem, the recent global outbreak of the Zika virus, and use data analytics technologies to gain valuable insights that can help authorities and the general public understand and potentially prevent the spread of this disease.
Bardess Group, a sponsor of the event and a business analytics consulting firm, will demonstrate how huge, extremely jagged data from a variety of sources can be collected, prepared, and rapidly made available for analysis. Advanced machine learning and predictive analysis further enhance the value of those insights.
Finally, Bardess will make the case that a systematic approach to conceptually visualizing the strategic journey to insightful business analytics – the analytics value chain – can help any organization prepare for this revolution in analytics.
Also see http://cloudera.qlik.com for the demos.
This webinar, featuring Claudia Imhoff, President of Intelligent Solutions and Founder of the Boulder BI Brain Trust (BBBT); Matt Schumpert, Director of Product Management; and Azita Martin, CMO at Datameer, will highlight the latest technology trends in extending BI with big data analytics and the top high-impact use cases.
Attendees will hear about:
-- The extended architecture for today's modern analytics environment
-- The Internet of Things (IoT) and big data
-- The evolution of analytics – from descriptive to prescriptive
-- High impact use cases as a result of the changing analytics world
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop – Precisely
With so many new, evolving frameworks, tools, and languages, a new big data project can lead to confusion and unwarranted risk.
Many organizations have found Data Warehouse Optimization with Hadoop to be a good starting point on their Big Data journey. Offloading ETL workloads from the enterprise data warehouse (EDW) into Hadoop is a well-defined use case that produces tangible results for driving more insights while lowering costs. You gain significant business agility, avoid costly EDW upgrades, and free up EDW capacity for faster queries. This quick win builds credibility and generates savings to reinvest in more Big Data projects.
A proven reference architecture that includes everything you need in a turnkey solution – the Hadoop distribution, data integration software, servers, networking and services – makes it even easier to get started.
An AI Maturity Roadmap for Becoming a Data-Driven Organization – David Solomon
The initial version of a maturity roadmap to help guide businesses when adopting AI technology into their workflow. IBM Watson Studio is referenced as an example of technology that can help in accelerating the adoption process.
Managing the Impact of COVID-19 Using Data Virtualization – Denodo
Watch here: https://bit.ly/2UUa7K1
To help alleviate the ramifications of COVID-19, Denodo launched the Coronavirus Data Portal (CDP), a collaborative initiative that leverages data virtualization to unify critical datasets originally exposed in different formats from multiple sources and countries, and make the unified data open to everyone.
Using the CDP and the data virtualization capabilities of the Denodo Platform, pmOne created detailed reports and AI analysis, seamlessly orchestrating all of the information streams in the pmOne Share Cockpit.
Working together, Denodo and pmOne provide the global community with trustworthy, up-to-date data about COVID-19 that can be used to develop new intelligence about the disease and reduce its impact.
In this webinar, we will talk about how the CDP can accelerate your organization’s efforts to build solutions for fighting this terrible disease to save lives, the livelihood of workers, and our global economy.
Creating a Next-Generation Big Data Architecture – Perficient, Inc.
If you’ve spent time investigating Big Data, you quickly realize that the issues surrounding it are often complex to analyze and solve. The sheer volume, velocity, and variety change the way we think about data – including how enterprises approach data architecture.
Significant reduction in costs for processing, managing, and storing data, combined with the need for business agility and analytics, requires CIOs and enterprise architects to rethink their enterprise data architecture and develop a next-generation approach to solve the complexities of Big Data.
Creating the data architecture while integrating Big Data into the heart of the enterprise data architecture is a challenge. This webinar covered:
-Why Big Data capabilities must be strategically integrated into an enterprise’s data architecture
-How a next-generation architecture can be conceptualized
-The key components to a robust next generation architecture
-How to incrementally transition to a next generation data architecture
Agile BI Development Through Automation – Manta Tools
How can code life cycle automation satisfy the growing demands in modern enterprise business intelligence?
Whilst an agile approach to BI development is useful for delivering value in general, the use of advanced automation techniques can also save significant resources, prevent production errors, and shorten time to market.
Speakers from Data To Value, Manta Tools, Volkswagen and M&G Investments presented and discussed different approaches to agile BI development. Take a look!
Advanced Analytics and Machine Learning with Data Virtualization – Denodo
Watch full webinar here: https://bit.ly/3aXysas
Advanced data science techniques, like machine learning, have proven extremely useful for deriving valuable insights from your data. Data science platforms have become more approachable and user-friendly. Yet despite all these advancements, data scientists still spend most of their time massaging and manipulating data into a usable asset. How can we empower the data scientist? How can we make data more accessible and foster a data-sharing culture?
Join us, and we will show you how data virtualization can do just that, with an agile, AI/ML-infused data management platform. It can empower your organization, foster a data-sharing culture, and simplify the life of the data scientist.
Watch this webinar to learn:
- How data virtualization simplifies the life of the data scientist by overcoming data access and manipulation hurdles
- How the integrated Denodo Data Science notebook provides a unified environment
- How Denodo uses AI/ML internally to drive the value of data and expose insights
- How customers have used data virtualization in their data science initiatives
Big Data for Data Scientists – Info Session – WeCloudData
In this talk, WeCloudData introduces the Hadoop/Spark ecosystem and how businesses use big data tools and platforms. For more detail about WeCloudData's Big Data for Data Scientists course, please visit: https://weclouddata.com/data-science/
Getting Started with Big Data for Business Managers – Datameer
Big Data has become critical to the enterprise because of the massive amount of untapped data sources, and the potential to gain new insights that were previously not possible. So the question of how to get started with Big Data and Hadoop is more pertinent than ever.
Listen to Tony Baer, leading analyst at Ovum, as he discusses answers to the key questions around how to:
-- Approach Big Data and associated business challenges
-- Identify what types of new insights can be revealed by Big Data
-- Staff for this undertaking and implement the technology necessary to be successful
-- Take the first steps toward getting started with Big Data on Hadoop
Watch full webinar here: https://bit.ly/2vN59VK
What started as the most agile, real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
Innovative Data Strategies for Advanced Analytics Solutions and the Role of D... – Denodo
Watch the full webinar here: https://buff.ly/2FipFSD
Data is fueling a new digital economy and compelling companies to rapidly adopt modern technologies such as Machine Learning, AI and Cognitive Science. Consequently, assembling the right blend of data from disparate sources using agile and flexible techniques like logical data warehousing to create purposeful, accessible insights is one of the greatest strategic tasks before us.
To address the challenges associated with advanced analytics solutions, Neudesic uses a best-fit-engineering approach to enable enterprises to utilize the right tools for the right job to maximize their data and analytics strategy. When helping customers construct architectures that surface more data to an ever-growing number of data consumers without the need for data replication, Neudesic looks to Denodo as its tool of choice.
Join Neudesic and Denodo for an interactive webinar to learn how you can apply data virtualization to your advanced analytics strategy for the purpose of achieving growth objectives.
Register for this webinar to learn:
• Why data virtualization should be part of your advanced analytics strategy.
• How easily your use case will fit one of the numerous architecture patterns Denodo enables.
• How Denodo’s innovative engine offers best-of-breed data virtualization capabilities, through a product demonstration.
Self Service Analytics enabled by Data Virtualization from Denodo – Denodo
Watch full webinar here: https://bit.ly/39U9qY8
Self-service analytics/BI is often cited as letting users discover and access data without asking IT to create a data mart, or letting users export or copy data directly from the sources into their own analytics tools and systems. The challenge is not just providing access to the data – even Excel can do that – but doing so in real time, without creating processing overhead, while delivering trusted data with the best possible response time in a managed, governed and secure way, so that users can trust the output of their analysis.
Data Virtualization provides a data access platform that allows users to access the data they need from multiple data sources, when they need it, and with the best possible response time. In addition, a Data Marketplace built on top of this proven technology enables Self Service Analytics by exposing consistent and governed data sets to be discovered by users, providing the trusted foundation for a successful Self-Service Analytics initiative.
5 Tips to Building a Successful Big Data Strategy – Western Digital
Watch the full webinar here: http://bit.ly/1Yqr5Lz
Companies are seeking ways to leverage big data and analytics to improve business operations or create new revenue streams. But where do you begin? Join Janet George, SanDisk Chief Data Scientist, as she shares the biggest challenges companies face when first analyzing their data, common mistakes and 5 tips on how to build a successful big data strategy.
The Infochimps Platform is your end-to-end Big Data solution, complete with infrastructure and expertise. Scalably and affordably ingest data from your legacy databases, data feeds, data from the web, or our Data Marketplace. Make it useful with algorithm hosting, Elastic Hadoop, and in-stream data augmentation. Let us host and manage your database, or deliver data back to your current stack.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... – Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized, domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Continuous Intelligence: Keeping your AI Application in Production – Dr. Arif Wider
A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery has rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne... – Dr. Arif Wider
A talk about applying Continuous Delivery to Machine Learning (CD4ML) presented by Arif Wider from ThoughtWorks at NDC Sydney Conference 2019.
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery has rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
Continuous Intelligence: Moving Machine Learning into Production Reliably – Dr. Arif Wider
A workshop by Danilo Sato, Christoph Windheuser, Emily Gorcenski, and Arif Wider, given at Strata Data Conference 2019 in London.
Abstract:
So you want to include a machine learning component in your IT systems? The process is a little more involved than clicking through an AI tutorial on your laptop. It’s not just the first working model you run that you need to consider; you also need to think about things like integration, scaling, and testing. What’s more, post-launch, you’ll want to continuously adapt your model to respond to the changing environment.
ThoughtWorks pioneered continuous delivery—a set of tools and processes that ensure that software under development can be reliably released to production at any time and with high frequency.
Danilo Sato and Christoph Windheuser demonstrate how to apply continuous delivery to machine learning—what’s known as continuous intelligence. In a live scenario, you’ll change a machine learning model in a development environment, test its new performance, and, depending on the outcome, automatically deploy the new model into a production environment. The tech stack for this scenario will be Python, DVC (Data Science Version Control), and GoCD.
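The deploy-or-reject decision that the workshop automates can be sketched in plain Python. This is a minimal illustration only: the function names, the toy model, and the 0.9 threshold are invented here, and the actual workshop pipeline uses DVC and GoCD rather than code like this.

```python
# Minimal sketch of a CD4ML quality gate: evaluate a newly trained model
# on a hold-out set, and only promote it if it beats a fixed threshold.
# All names and the 0.9 threshold are illustrative, not the workshop's code.

def evaluate(model, holdout):
    """Return the fraction of hold-out examples the model predicts correctly."""
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)

def quality_gate(model, holdout, threshold=0.9):
    """Decide whether a newly trained model may be deployed."""
    score = evaluate(model, holdout)
    return ("deploy", score) if score >= threshold else ("reject", score)

# Toy usage: a trivial "model" that labels even numbers as class 1.
holdout = [(2, 1), (4, 1), (3, 0), (7, 0), (8, 1)]
decision, score = quality_gate(lambda x: 1 if x % 2 == 0 else 0, holdout)
```

In a real pipeline this gate would run as a stage after training, with the deployment step triggered only on a "deploy" outcome.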
Continuous Intelligence: Keeping your AI Application in Production – Dr. Arif Wider
A talk by Emily Gorcenski and Arif Wider presented at Strata Data Conference 2019 in London.
Abstract:
It’s already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, continuous delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months.
Nevertheless, in the data science world, continuous delivery is rarely applied holistically—due in part to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as is to machine learning projects.
Arif Wider and Emily Gorcenski explore continuous delivery (CD) for AI/ML along with case studies for applying CD principles to data science workflows. Join in to learn how they drew on their expertise to adapt practices and tools to allow for continuous intelligence—the practice of delivering AI applications continuously.
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics – Dr. Arif Wider
A talk by Sebastian Herold & Dr. Arif Wider at TDWI 2018 Munich.
Abstract:
More and more companies are migrating their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated and specially tailored metrics, which often combine product-specific data with company-wide data.
Having a centralized data team does not scale in this setting as it becomes the bottleneck between data producers and data consumers.
We created a Manifesto based on five general themes which break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDev: a culture shift similar to DevOps in which application developers own their data and take over responsibility for data & analytics.
Learn about our experiences and best practices with facilitating this cultural transformation at Zalando, one of Europe's largest online fashion platforms.
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics – Dr. Arif Wider
A talk given by Dr. Arif Wider (ThoughtWorks) and Sebastian Herold (Zalando) at OOP 2018 in Munich.
Abstract:
More and more companies are migrating their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated and specially tailored metrics, which often combine product-specific data with company-wide data.
Having a centralized data team does not scale in this setting as it becomes the bottleneck between data producers and data consumers.
We created a Manifesto of seven principles which break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDev: a culture shift similar to DevOps in which application developers own their data and take over responsibility for data & analytics.
Learn about our experiences and best practices with facilitating this cultural transformation at Scout24, the provider of Europe’s largest online markets for cars and real estate.
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices – Dr. Arif Wider
A talk by Sebastian Herold (Scout24) and Arif Wider (ThoughtWorks)
Abstract:
More and more companies are successfully migrating their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: unstructured data, huge amounts of data, and hundreds of data sources. Having a centralized data team does not scale in this setting as it becomes the bottleneck between application developers and business analysts.
We created a Data Manifesto of seven principles which break with traditional role separations and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture where application developers also own their data. Learn about our experiences facilitating this cultural transformation at Scout24, the provider of Europe’s largest online markets for cars and real estate.
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at... – Dr. Arif Wider
How we applied continuous delivery to data science to create a high-performance & quickly evolving data product. Presented at Predictive Analytics World Business London 2016 by Arif Wider (ThoughtWorks) and Christian Deger (AutoScout24).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... – Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large number of small workload submissions, and is expected to be a non-issue when the computation is performed on massive graphs.
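The levelwise scheme described in the abstract can be sketched in Python. This is an illustrative sketch, not the report's implementation: it assumes the SCC decomposition and its topological order are already given, and that the graph has no dead ends (the report's stated precondition).

```python
# Sketch of Levelwise PageRank: strongly connected components are
# processed in topological order, so the ranks of upstream components
# are already fixed before a component iterates to convergence.
# Assumes `components` lists the SCCs in topological order and that
# every vertex has at least one out-edge (no dead ends).

def levelwise_pagerank(out_edges, components, damping=0.85, tol=1e-10):
    n = sum(len(c) for c in components)
    rank = {v: 1.0 / n for c in components for v in c}
    # Precompute incoming edges once.
    in_edges = {v: [] for v in rank}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    for comp in components:          # one level (component) at a time
        while True:                  # iterate only within this component
            delta = 0.0
            for v in comp:
                new = (1 - damping) / n + damping * sum(
                    rank[u] / len(out_edges[u]) for u in in_edges[v])
                delta = max(delta, abs(new - rank[v]))
                rank[v] = new
            if delta < tol:
                break
    return rank

# Toy usage: SCC {A, B} feeds into SCC {C}; C's self-loop avoids a dead end.
out_edges = {'A': ['B'], 'B': ['A', 'C'], 'C': ['C']}
rank = levelwise_pagerank(out_edges, [['A', 'B'], ['C']])
```

Because each component only reads already-converged ranks from upstream components, the per-iteration communication of the standard method is avoided, which is what makes the distributed variant attractive.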
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... – pchutichetpong
M Capital Group (“MCG”) expects demand to grow as supply evolves, with institutional investment rotating out of offices and into work from home (“WFH”) while the need for data storage keeps expanding alongside global internet usage, which experts predict will reach 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as maturing cloud services and edge sites, and the industry is expected to see strong annual growth of 13% over the next four years.
Whilst competitive headwinds remain, as illustrated by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and the shift to a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow more than 3.6x by value by 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Data Mesh - It's not about technology, it's about people
1. Data Mesh
It’s not about technology, it’s about people
Dr. Arif Wider
Data Innovation Summit 2022, May 5-6, Stockholm, Sweden
2. Who am I
Dr. Arif Wider
● Former Head of Data & AI at Thoughtworks Germany
● Many years hands-on consulting, often as a lead engineer in Thoughtworks’ clients’ data teams
● Now tenured professor of software engineering at HTW Berlin
3. SCALE: What problem are we solving? (timeline: ~2005 Volume, ~2007 Velocity, ~2010+ Variety)
4. SCALE: What problem are we solving? (timeline adds: NOW – Getting value in the face of complexity)
9. The source of the problem (diagram: a checkout service emitting checkout events)
10. The source of the problem (diagram: the same checkout service and checkout events, now with friction between them)
11. Data Mesh is a synthesis of existing practices (diagram: Domain-Driven Design + Product Thinking + Platform Thinking → Data Mesh)
12. Data as a product
A Data Product Manager asks of a Domain Data Product:
● What is my market?
● What are the desires of my customers?
● What “price” is justified?
● How to do marketing?
● What’s the USP?
● Are my customers happy?
13. Applying domain-driven design…
Data Products: Data products belong inside domains. A domain will usually contain many data products that can be used both within and outside its domain. Types of data products: source-aligned, consumer-aligned, or aggregate data products. Data products don’t operate in isolation.
14. …to structure your organisation, i.e. your people
Data Products: Data products belong inside domains. A domain will usually contain many data products that can be used both within and outside its domain. Types of data products: source-aligned, consumer-aligned, or aggregate data products. Data products don’t operate in isolation.
Domain Teams: Teams own one or more data products, depending on the complexity. They usually sit within a domain. Teams have long-term ownership for data products.
15. It’s a mindset and a language shift, not an architecture (well, mostly)
From → To:
● THE data team → WHICH data product team
● Data pipelines → Data domains with data experts
● Producing data → Owning and serving data
● Data engineering team → Cross-functional data product teams
● Data lake / warehouse → Data product experience platform
16. Spreading a mindset: Incubate
To begin, incubate a small slice of the overall operating model to establish some of the foundational capabilities and learn from them.
17. Spreading a mindset: Sustain
As the foundations become sustainable, further data products can be introduced. With each new product coming on line, the teams can also begin to grow.
18. Spreading a mindset: Sustain
As teams learn and the platform begins to mature, the development of further data products begins to accelerate.
19. Spreading a mindset: Scale
As data products scale and mature, the ecosystem establishes itself with increased interconnectivity, creating new possibilities for deeper insight and collective knowledge across the organisation.