Telecom Bell is migrating its core applications to the cloud to improve network quality of service and enable personalized customer engagement using customer data. The company faces challenges with its on-premises data platform: lack of scalability, data silos, and governance issues. Databricks will help design a new cloud-based data platform architecture built on the Databricks platform, with Confluent for event streaming. The joint delivery approach between the Telecom Bell and Databricks teams will include establishing data governance, migrating applications in phases, providing change management support, and meeting the target timeline of May 2024.
This migration plan aims to explore the potential of migrating from on-premises Hadoop to Azure Databricks. By leveraging Databricks' scalability, performance, collaboration, and advanced analytics capabilities, organizations can unlock faster insights and facilitate data-driven decision-making.
Modernizing to a Cloud Data Architecture (Databricks)
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their successful migration of data and workloads to the cloud.
Building Modern Data Platform with Microsoft Azure (Dmitry Anoshin)
This presentation covers cloud history and Microsoft Azure data analytics capabilities. It also includes a real-world example of data warehouse modernization. Finally, we will look at an alternative solution on Azure using Snowflake and Matillion ETL.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... (Dr. Arif Wider)
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe's biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to a distributed Data Mesh architecture, and will outline the ongoing efforts to make the creation of data products as simple as applying a template.
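The template format itself isn't shown in the talk summary; purely as a hypothetical illustration of the idea, a data product descriptor might capture the owner, the published schema, and the quality guarantees in one small declarative structure (all field names are invented):

```python
# Hypothetical sketch of a data product descriptor; field names are illustrative,
# not Zalando's actual template format.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                      # globally unique product name
    owning_domain: str             # team with the domain knowledge
    schema: dict                   # column name -> type: the published contract
    freshness_sla_minutes: int     # guaranteed maximum staleness
    quality_checks: list = field(default_factory=list)  # e.g. "non_null:order_id"

orders = DataProduct(
    name="checkout.orders.v1",
    owning_domain="checkout",
    schema={"order_id": "string", "amount": "decimal(10,2)", "created_at": "timestamp"},
    freshness_sla_minutes=60,
    quality_checks=["non_null:order_id", "unique:order_id"],
)
```

The point of such a template is that creating a new data product becomes a matter of filling in a declaration, while the platform enforces the ownership and quality guarantees centrally.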
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesDATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch's Scott Davis, Global Vice President, and Ravi Murugesan, Sr. Solution Engineer, to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow (a generic trigger sketch follows this list)
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
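Stonebranch UAC's own configuration isn't shown here; as a generic, hypothetical illustration of the event-based trigger idea from the list above, the sketch below uses the third-party Python watchdog package to launch a downstream job the moment a new file lands, rather than polling on a schedule:

```python
# Generic illustration of an event-based trigger (not Stonebranch UAC's API):
# launch a downstream pipeline step as soon as a new file lands, instead of
# polling on a fixed schedule. The landing path and launch_pipeline() helper
# are hypothetical.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

def launch_pipeline(path: str) -> None:
    print(f"triggering ingest job for {path}")  # placeholder for a real job launch

class NewFileTrigger(FileSystemEventHandler):
    def on_created(self, event):
        # Fire only for files, not directories.
        if not event.is_directory:
            launch_pipeline(event.src_path)

observer = Observer()
observer.schedule(NewFileTrigger(), "/data/landing", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```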
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
At wetter.com we build analytical B2B data products and make heavy use of Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta, and share our experiences from different angles such as architecture, application logic, and user experience. We will look at how security, cluster configuration, resource consumption, and workflows changed by using Databricks clusters, as well as how using Delta tables simplified our application logic and data operations.
Building Data Quality pipelines with Apache Spark and Delta Lake (Databricks)
Technical Leads and Databricks Champions Darren Fuller & Sandy May will give a fast-paced view of how they have productionised Data Quality Pipelines across multiple enterprise customers. Their vision to empower business decisions on data remediation actions and self-healing of Data Pipelines led them to build a library of Data Quality rule templates, along with an accompanying reporting Data Model and PowerBI reports.
With the drive for more and more intelligence served from the Lake and less from the Warehouse, also known as the Lakehouse pattern, Data Quality at the Lake layer becomes pivotal. Tools like Delta Lake become building blocks for Data Quality with schema protection and simple column checking; however, for larger customers they often do not go far enough. Quick-fire notebook demos will show how Spark can be leveraged at the point of Staging or Curation to apply rules over data.
Expect to see simple rules, such as Net sales = Gross sales + Tax, or values existing within a list, as well as complex rules such as validation of statistical distributions and complex pattern matching. The session ends with a quick view into future work in the realm of Data Compliance for PII data, with generation of rules using regex patterns and Machine Learning rules based on transfer learning.
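The simple rule types mentioned above are easy to picture in code. Here is a minimal, illustrative PySpark sketch - not the speakers' actual rule library - showing a reconciliation rule (Net sales = Gross sales + Tax) and a value-in-list check applied at a staging layer; all table paths and column names are hypothetical, and a Delta-enabled Spark session is assumed:

```python
# Illustrative only: simplified versions of the rule types described above.
# Paths and column names (net_sales, gross_sales, tax, country) are hypothetical.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("dq-rules-sketch").getOrCreate()  # Delta-enabled session assumed

sales_df = spark.read.format("delta").load("/mnt/staging/sales")  # hypothetical staging table

ALLOWED_COUNTRIES = ["DE", "FR", "GB", "US"]

checked = (
    sales_df
    # Rule 1: reconciliation - net sales must equal gross sales plus tax (within a tolerance).
    .withColumn("dq_net_equals_gross_plus_tax",
                F.abs(F.col("net_sales") - (F.col("gross_sales") + F.col("tax"))) < 0.01)
    # Rule 2: domain check - country must exist within an allowed list.
    .withColumn("dq_country_in_list", F.col("country").isin(ALLOWED_COUNTRIES))
)

# Route failing rows to a quarantine table for remediation instead of failing the pipeline.
failures = checked.filter(~F.col("dq_net_equals_gross_plus_tax") | ~F.col("dq_country_in_list"))
failures.write.format("delta").mode("append").save("/mnt/quarantine/sales")
```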
In this session, Sergio covered the Lakehouse concept and how companies implement it, from data ingestion to insight. He showed how you can use Azure Data Services to speed up your analytics project, from ingesting and modelling data to delivering insights to end users.
Data Architecture Strategies: Data Architecture for Digital Transformation (DATAVERSITY)
Digital transformation rests on foundational data management approaches: MDM, data quality, data architecture, and more. At the same time, combining these foundational approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
The world of data architecture began with applications. Next came data warehouses. Then textual data was organized into the data warehouse.
Then one day the world discovered a whole new kind of data being generated by organizations: machine-generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Enabling a Data Mesh Architecture with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations adopt a data mesh architecture when they experience the shortcomings of highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slowness of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan... (HostedbyConfluent)
Organizations have been chasing the dream of data democratization - unlocking and accessing data at scale to serve their customers and business - for over half a century, since the early days of data warehousing. They have tried to reach this dream through multiple generations of architectures, such as the data warehouse and the data lake, through a Cambrian explosion of tools, and through large investments in building their next data platform. Despite the intentions and the investments, the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Data Lakehouse, Data Mesh, and Data Fabric (r2) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean, and how do they compare to a modern data warehouse? In this session I'll cover each of them in detail and compare their pros and cons. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I'll also include use cases so you can see which approach will work best for your big data needs, and I'll discuss Microsoft's version of the data mesh.
Architect's Open-Source Guide for a Data Mesh Architecture (Databricks)
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
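As a rough, unofficial illustration of those three properties with the open-source delta-spark package (paths are hypothetical), the same Delta table can be written as cheap object-store files, read back with ACID guarantees and time travel, and consumed as a streaming source:

```python
# Minimal sketch with the open-source delta-spark package; paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Data lake economics: plain files on cheap storage.
events = spark.range(1000).withColumnRenamed("id", "event_id")
events.write.format("delta").mode("append").save("/tmp/delta/events")

# Warehouse reliability: ACID reads plus time travel to an earlier version.
snapshot = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/events")

# Streaming: the same table doubles as a low-latency streaming source.
stream = (spark.readStream.format("delta").load("/tmp/delta/events")
          .writeStream.format("console").start())
```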
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures and serverless, microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader in data platforms, big data, data integration, and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data, and Database Migrations. While at IBM, he was head of all Information Integration, Replication, and Governance products; previously, Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra, and CTO of Modulant, and he has been engineering artificial-intelligence-based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young's Center for Technology Enablement. Jeff is also the author of "Semantic Web for Dummies" and "Adaptive Information," a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley's Extension on object-oriented systems, software development process, and enterprise architecture.
Wondering what this data mesh stuff is all about? What are the principles of data mesh? Can you, or should you, consider data mesh as the approach for your analytics platform? And most importantly - how can Snowflake help?
Given in Montreal on 14-Dec-2021
This is the presentation for the talk I gave at JavaDay Kiev 2015. It is about the evolution of data processing systems, from simple ones with a single DWH to complex approaches like the Data Lake, Lambda Architecture, and Pipeline architecture.
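For readers who haven't met the Lambda Architecture, a toy sketch of its core idea - a complete-but-stale batch view merged with a fresh speed-layer view at query time - is below; all names and numbers are illustrative:

```python
# Toy illustration of the Lambda Architecture's query-time merge; all data and
# names are illustrative. Batch views are complete but stale; the speed layer
# covers only events that arrived after the last batch run.
batch_view = {"page_a": 1000, "page_b": 250}   # precomputed from the master dataset
speed_view = {"page_a": 12, "page_c": 3}       # incremental counts since last batch

def query(page: str) -> int:
    """Serve a metric by merging the batch view with the real-time delta."""
    return batch_view.get(page, 0) + speed_view.get(page, 0)

assert query("page_a") == 1012   # batch + recent events
assert query("page_c") == 3      # seen only by the speed layer so far
```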
Data at the Speed of Business with Data Mastering and Governance (DATAVERSITY)
Do you ever wonder how data-driven organizations fuel analytics, improve customer experience, and accelerate business productivity? They succeed by governing and mastering data effectively so they can get trusted data to those who need it, faster. Efficient data discovery, mastering, and democratization are critical for swiftly linking accurate data with business consumers. When business teams can quickly and easily locate, interpret, trust, and apply data assets to support sound business judgment, it takes less time to see value.
Join data mastering and data governance experts from Informatica—plus a real-world organization empowering trusted data for analytics—for a lively panel discussion. You’ll hear more about how a single cloud-native approach can help global businesses in any economy create more value—faster, more reliably, and with more confidence—by making data management and governance easier to implement.
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A... (Denodo)
Watch full webinar here: https://bit.ly/3g9PlQP
It is no news that Oil and Gas companies face immense pressure to stay competitive, especially in the current climate, while striving to become data-driven at the heart of the process, to scale, and to gain greater operational efficiencies across the organization.
Hence the need for a logical data layer to help Oil and Gas businesses move towards a unified, secure, and governed environment that efficiently optimizes the potential of data assets across the enterprise and delivers real-time insights.
Tune in to this on-demand webinar where you will:
- Discover the role of data fabrics and Industry 4.0 in enabling smart fields
- Understand how to connect data assets and the associated value chain to high impact domain areas
- See examples of organizations accelerating time-to-value and reducing NPT
- Learn best practices for handling real-time/streaming/IoT data for analytical and operational use cases
Democratized Data & Analytics for the Cloud (Precisely)
In an era driven by data, organizations are constantly seeking ways to harness the power of their data assets to make informed decisions, gain competitive advantages, and foster innovation. The cloud has emerged as a game-changer, offering unparalleled scalability and accessibility for data and analytics solutions. However, achieving true democratization of data and analytics in the cloud remains a significant challenge.
In this session we will discuss:
· Why companies are pushing to move workloads to the cloud
· How data silos and a lack of democratized data can impact organizations
· Best practices and expectations for bringing data to the cloud for analytics
· Precisely’s solution for trusted data and analytics for the cloud
Watch our 10-minute webinar and embark on a journey to democratize data and analytics, enabling your organization to thrive in the data-driven age. Whether you are a data professional, IT leader, or business executive, this session will equip you with the knowledge and tools to harness the full potential of your data assets in the cloud.
On the Cloud? Data Integrity for Insurers in Cloud-Based Platforms (Precisely)
In our undeniably digital world, data is one of the most precious assets in business. This is especially true for the insurance industry, which is why many insurers are leveraging modern cloud-based platforms to improve performance, reduce costs, and capitalize on new opportunities to innovate. While all industries feel the pressure to preserve or enhance the integrity of their data through their cloud migration initiatives, insurers are especially affected given how crucial data is to their operations. With their high volumes of claims, policies, and premiums, an ineffective approach to data quality and validation not only slows down cloud migration but leaves organizations open to threats and risk. Although there is no one-size-fits-all solution for implementing and maintaining data integrity at insurance companies, the need to maximize the value they can extract from their data is universal.
If you are thinking of moving into a cloud platform or wondering what is next, join us to learn about:
- Integrating data silos and ensuring better security
- Leveraging data observability to proactively identify data issues before they impact the business
- Delivering quality data attributes that are trusted and fit for purpose
- Enhancing business data through data enrichment and location intelligence solutions to unlock valuable, hidden context and reveal critical relationships, transforming raw data into actionable insights
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) (Denodo)
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud: the potential benefits are huge - flexibility, agility, cost savings, scaling on demand, and more. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data they need, and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
Get ahead of the cloud or get left behind (Matt Mandich)
An enterprise cloud computing strategy results in:
- Broad consensus on the goals and expected results of moving select processes to the cloud
- A standardized, consistent approach to evaluating the benefits and challenges of cloud projects
- Clear requirements for the negotiation and monitoring of partnerships with cloud service providers
- Understanding and consensus on the enabling and managing role IT will play in future cloud initiatives
- Goals and a roadmap for transforming internal IT from asset managers to service brokers
Presentation on Data Mesh: the paradigm shift is to a new type of ecosystem architecture - a shift left towards a modern distributed architecture that gives domains ownership of their data and views "data-as-a-product," enabling each domain to handle its own data pipelines.
Gartner predicts that by 2026, 75% of organizations will adopt a digital transformation model predicated on cloud as the fundamental underlying platform. It is clear that cloud is here to stay and will continue to be top of mind for organizations of all sizes for years to come. To have a successful cloud strategy, not only is it important to know how other organizations are successfully migrating their architecture, but also how they are handling operations once they make the switch.
However, moving to and operating in the cloud successfully is not as easy as purchasing some public cloud credits and calling it a day. There are many common challenges that organizations face as they move to be cloud-first. By understanding more about these challenges, organizations can avoid expensive consequences.
Join this session to learn about:
- Top trends in cloud migration and computing
- Common challenges that organizations face as they move to a cloud-first approach
- Consequences that organizations face when they mishandle cloud adoption
Reinventing and Simplifying Data Management for a Successful Hybrid and Multi... (Denodo)
Watch full webinar here: https://bit.ly/3AdAzkW
Hybrid cloud has become the standard for businesses. A successful move will involve using an intelligent and scalable architecture and seeking the right help. At the same time, multi-cloud strategies are on the rise. More enterprise organizations than ever before are analyzing their current technology portfolio and defining a cloud strategy that encompasses multiple cloud platforms to suit specific app workloads, and move those workloads as they see fit. Learn the key challenges in multi-hybrid data management, and how you can accelerate your digital transformation journey in a multi-cloud environment with data virtualization.
Data Quality from Precisely: Spectrum Quality (Precisely)
Earlier this month, we announced new data quality capabilities in the Precisely Data Integrity Suite. During Precisely's annual Trust ’23 event, our experts discussed all these exciting advancements.
Join this informative webinar to hear specifically how Spectrum Quality customers can take advantage of the Data Integrity Suite for new use cases. You'll hear from our product experts about:
- Benefits of the Data Integrity Suite for Spectrum Quality customers
- Data quality use cases optimized in the Data Integrity Suite
- High-level roadmap for Spectrum Quality
The Shifting Landscape of Data Integration (DATAVERSITY)
Enterprises and organizations from every industry and scale are working to leverage data to achieve their strategic objectives — whether they are to be more profitable, effective, risk-tolerant, prepared, sustainable, and/or adaptable in an ever-changing world. Data has exploded in volume during the last decade as humans and machines alike produce data at an exponential pace. Also, exciting technologies have emerged around that data to improve our abilities and capabilities around what we can do with data.
Behind this data revolution, there are forces at work, causing enterprises to shift the way they leverage data and accelerate the demand for leverageable data. Organizations (and the climates in which they operate) are becoming more and more complex. They are also becoming increasingly digital and, thus, dependent on how data informs, transforms, and automates their operations and decisions. With increased digitization comes an increased need for both scale and agility at scale.
In this session, we have undertaken an ambitious goal of evaluating the current vendor landscape and assessing which platforms have made, or are in the process of making, the leap to this new generation of Data Management and integration capabilities.
Multi-Cloud Integration with Data Virtualization (ASEAN) (Denodo)
Watch full webinar here: https://bit.ly/3corOL4
More and more organizations are adopting multi-cloud strategies to provide greater flexibility, cost savings, and performance optimization. Even when organizations commit to a single cloud provider, they often have data and applications spread across different cloud regions to support different business units or geographies. The result of this is a highly distributed infrastructure that makes finding and accessing the data needed for reporting and analytics even more challenging.
The Denodo Platform Multi-Location Architecture provides quick and easy managed access to data while still providing local control to the 'data owners' and complying with local privacy and data protection regulations (think GDPR and CCPA).
In this on-demand webinar, you will learn about:
- The challenges facing organizations as they adopt multi-cloud data strategies
- How the Denodo Platform provides a managed data access layer across the organization
- The different multi-location architectures that can maximize local control over data while still making it readily available
- How organizations have benefited from using the Denodo Platform as a multi-cloud data access layer
Data Driven Advanced Analytics using Denodo Platform on AWS (Denodo)
Watch full webinar here: https://buff.ly/3JC8gCS
Accelerating cloud adoption and modernizing analytics in the cloud has become a necessity for timely, insightful, and impactful decision-making. However, data spread across an organization's disparate hybrid cloud data sources makes real-time, well-governed analytics a challenge. Data virtualization is a modern data integration technique in which a single semantic layer is built to help drive data democratization and speed up analytics in an efficient and cost-effective manner.
Watch this session to learn:
- How various AWS services (Redshift, S3, RDS) can be quickly integrated using the Denodo Platform's logical data management by implementing a logical data fabric (LDF)
- How an LDF helps you manage and deliver your data for data science and analytics programs, supporting your business users
- How a governed Data Services layer enables self-service analytics in your complex AWS data landscape
BATbern52: Swisscom's Journey into Data Mesh (BATbern)
Swisscom is taking one bold step after another to become a data-driven company. The approach is always business-first: finding data- and AI-driven solutions that enable the business to make the best decisions and offer a great experience to our customers. Our journey with Data Mesh is no different. Together with the business, we looked at the current challenges of quickly transforming data into information and insights while paying crucial attention to data management and governance. I invite you to this session to walk through our transformation from data to data products, how to foster co-creation between data producers and data consumers, and what it takes to strike the right balance between central governance and decentralized accountability for its implementation.
Webinar: Hybrid Cloud Integration - Why It's Different and Why It Matters (SnapLogic)
In this webinar, 451 Research analyst Carl Lehmann explains how IT organizations are being challenged like never before by several disruptive changes. As hybrid clouds proliferate and workloads shift across these environments, enterprises must now take a thoughtful and strategic approach to hybrid cloud integration.
This presentation features a discussion of the business and technical trends driving hybrid cloud integration, how hybrid cloud integration is different from traditional approaches to integration, and why it matters.
To learn more, visit: www.snaplogic.com/connect-faster
Data Governance for the Cloud with Oracle DRM (US-Analytics)
Ready to move away from “hope, email and spreadsheets” as a strategy for maintaining system alignments? There’s a better way. Find out how to bring people, processes, and technology together for control over ever-changing enterprise reporting hierarchies and data.
Marlabs Capabilities Overview: DWBI, Analytics and Big Data Services (Marlabs)
Marlabs’ Business Intelligence and Analytics practice can support customers’ needs throughout the information management lifecycle. As a vendor-agnostic and holistic service provider with expertise in a range of tools and technologies, we can help clients make informed decisions to employ the right technologies that align with their business needs.
Logical Data Fabric and Industry-Focused Solutions by IQZ Systems (Denodo)
Watch full webinar here: https://bit.ly/3egOGfr
Logical data fabric, the theme of DataFest 2021, is an exciting new development in data architecture and management, and it is already delivering dramatic benefits in terms of business acceleration and agility. But how did logical data fabric come about, and how can logical data fabric be used to build industry-focused solutions?
In this presentation, Sathish TK, CTO at IQZ Systems, a global technology solutions company, will provide an introduction to logical data fabric, and then discuss specific IQZ solutions for financial services and insurance that are based on logical data fabric.
You will learn:
- What percentage of the available data is actually used for analysis, on average, using traditional data integration strategies
- The benefits of logical data fabric
- The main deployment options for logical data fabric
- How one IQZ financial services customer established golden records for complex data entities by leveraging logical data fabric
- How IQZ leveraged logical data fabric to create a life policy status tracker application
Govern and Protect Your End User Information (Denodo)
Watch this Fast Data Strategy session with speakers Clinton Cohagan, Chief Enterprise Data Architect, Lawrence Livermore National Lab & Nageswar Cherukupalli, Vice President & Group Manager, Infosys here: https://buff.ly/2k8f8M5
In its recent report "Predictions 2018: A Year of Reckoning," Forrester predicts that 80% of firms affected by GDPR will not comply with the regulation by May 2018. Of those noncompliant firms, 50% will intentionally not comply.
Compliance doesn’t have to be this difficult! What if you have an opportunity to facilitate compliance with a mature technology and significant cost reduction? Data virtualization is a mature, cost-effective technology that enables privacy by design to facilitate compliance.
Attend this session to learn:
• How data virtualization provides a compliance foundation with data catalog, auditing, and data security.
• How you can enable single enterprise-wide data access layer with guardrails.
• Why data virtualization is a must-have capability for compliance use cases.
• How Denodo’s customers have facilitated compliance.
Evolving From Monolithic to Distributed Architecture Patterns in the Cloud (Denodo)
Watch full webinar here: https://goo.gl/rSfYKV
Gartner states in its report "Predicts 2018: Data Management Strategies Continue to Shift Toward Distributed":
"As data management activities are becoming more widespread in both distributed processing use cases, like IoT, and demands for new types of data, emerging roles such as data scientists or data engineers are expected to be driving the new data management requirements in the coming two years. These trends indicate that both the collection of data as well as the need to connect to data are rapidly becoming the new normal, and that the days of a single data store with all the data of interest — the enterprise data warehouse — are long gone."
Data management solutions are becoming distributed, heterogeneous and extremely diverse.
Attend this session to learn:
• How to evolve architecture patterns in the cloud using data virtualization.
• How data virtualization accelerates cloud migration and modernization.
• Successful cloud implementation case studies.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions), and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
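As a small taste of the hands-on part, here is a minimal sketch using the pypowsybl Python binding (assuming a recent pypowsybl release; the bundled IEEE 14-bus sample network stands in for a real grid):

```python
# Minimal PowSyBl sketch via the pypowsybl binding; assumes a recent pypowsybl
# release with the bundled IEEE 14-bus sample network.
import pypowsybl as pp

network = pp.network.create_ieee14()      # load a sample grid model
results = pp.loadflow.run_ac(network)     # run an AC power flow
print(results[0].status)                  # convergence status of the computation

# Network elements are exposed as pandas DataFrames for easy study.
print(network.get_buses().head())
```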
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, the aspects they consider when buying a new TV, and their TV buying preferences.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report was prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
3. Summary | Business challenges
1. TELECOM BELL must improve network QoS to align with consumers' changing emphasis on mobile connectivity and data usage.
2. As IoT and 5G advance, customers easily switch providers, prompting TELECOM BELL to prioritize personalized engagement using customer data for customized messaging and services.
3. TELECOM BELL is subject to many regulations, including data privacy and security regulations, and needs effective ways to adhere to these.
4. Power of data: there is a data-volume explosion, requiring both focus and new capabilities.
5. Pressure to show growth and profits is constant, and data and AI will be a critical enabler.
4. Summary | Technical Challenges
Today there are increased expectations and pressure on the Telecom organization to have a strong data & analytics strategy.
• Data platform is not scalable for analytics and AI/ML
• Upfront capacity planning and cost
• Governance of the data on HDFS is a challenge
• Data sits in silos and is not easy to integrate/connect
• Lack of discoverability of data (no catalog)
• Housekeeping: maintenance of the in-house cluster is difficult, spread across different portals and installations
• Advanced disaster recovery, durability, and availability are hard to achieve
• A bigger IT infra staff is required
5. Summary | Executive Plan
Telecom Bell wants to improve the Quality of Service (QoS) of its network and, to get there, will start migrating its core applications to the cloud.
Databricks will bring industry-leading expertise and Databricks platform expertise to drive the transformation at speed.
Confluent will bring an event streaming platform built on Kafka and the necessary platform support.
Telecom Bell has a team of 10 engineers with expertise in Kafka and Spark.
Desired timeline: May 2024.
7. Platform & Architecture | Current Architecture
Limitations
• Data platform is not scalable for analytics and AI/ML
• Upfront capacity planning and cost
• Governance of the data on HDFS is a challenge
• Data sits in silos and is not easy to integrate/connect
• Lack of discoverability of data (no catalog)
• Housekeeping: maintenance of the in-house cluster is difficult, spread across different portals and installations
• Advanced disaster recovery, durability, and availability are hard to achieve
• A bigger IT infra staff is required
8. Platform & Architecture | End state Architecture
Design the target state architecture for a scalable, secure and well-governed data platform (AI/ML self-serve, advanced engineering capabilities, including the necessary governance-on-lake capability).
Highlights
• Warehouse + Data Lake capabilities at scale, with governance
• Data product mindset: marketplace, self-service capabilities
• MLOps: full ML lifecycle
• Domain data tiers: advanced data management capabilities, curated democratized data layers
Designing and activating a World Class Data Platform: Fundamental Principles
• Scalability
• Performance
• Industrialized processes governing the pipeline
• Distributed, fault-tolerant architecture
• Open file format for better interoperability between systems
• Security and reliability
• Data provenance and lineage
• ACID compliant
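To make the "open file format" and "ACID compliant" principles concrete, here is a minimal PySpark sketch; the table, column, and app names are illustrative placeholders, not Telecom Bell's actual schema.

```python
# Minimal sketch: ACID writes and time travel with Delta Lake.
# All names below (qos_events, cell_id, latency_ms) are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-acid-sketch").getOrCreate()
# (On Databricks, a `spark` session is already provided.)

events = spark.createDataFrame(
    [("cell-001", 42.5), ("cell-002", 38.1)],
    ["cell_id", "latency_ms"],
)

# Delta stores open-format Parquet files plus a transaction log,
# so this write is atomic and the table stays consistent for readers.
events.write.format("delta").mode("overwrite").saveAsTable("qos_events")

# Time travel supports audits and lineage: query an earlier version.
spark.sql("SELECT * FROM qos_events VERSION AS OF 0").show()
```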
9. Platform & Architecture | Current vs New
What the new platform adds:
1. A more performant and optimized Spark engine
2. Governance under the same roof
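As a hedged illustration of the first point, these are the standard Databricks layout optimizations the speaker notes allude to (Z-ordering, vacuum, auto-optimize); the table name is a placeholder carried over from the earlier sketch.

```python
# Illustrative engine/layout optimizations on a Delta table (name is hypothetical).
# OPTIMIZE compacts small files; ZORDER BY co-locates rows for faster filtering.
spark.sql("OPTIMIZE qos_events ZORDER BY (cell_id)")

# VACUUM removes data files no longer referenced by the transaction log
# (168 hours = the default 7-day retention window).
spark.sql("VACUUM qos_events RETAIN 168 HOURS")

# Auto-optimize keeps new writes compacted without manual OPTIMIZE runs.
spark.sql("""
    ALTER TABLE qos_events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```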
10. Platform & Architecture | Artifacts
Key components of the data platform: A World Class Data Platform!
12. Approach | Our Tenets
Because "Approach is the first step towards achieving goals":
A. Security is job zero
B. Agile methodology
C. Continuous delivery of results
D. Leverage customer assets first
E. Multiple-velocity joint delivery approach
F. Zero downtime
G. Log the journey at every step to look back & learn
H. Principle of least access privilege (PoLAP)
13. Approach | Objectives
Four objectives, delivered across the horizons of the strategic roadmap:
1. Strategic roadmap: build the data strategy roadmap that empowers Telecom Bell to overcome its business challenges
2. Platform: build strong foundations with data platform development and implementation
3. Mindset: co-create an operating model that will take TELECOM BELL where it wants to go, in a sustainable way
4. Industrialization: migrate core applications to the cloud in a secure and reliable way
15. Operating Model | Joint Delivery Approach
Executive Leadership: Databricks Leadership (1), Telecom Bell Leadership (1)
Program Management: Databricks Lead (1), Telecom Bell Lead (1)
Delivery pods:
A. Application Team: Databricks Professional Services (5), Telecom Bell resources (3)
B. Platform Team: Databricks Professional Services (5), Telecom Bell resources (3)
C. Data Quality & Governance: Databricks Professional Services (3), Telecom Bell resources (4)
D. Bringing it Together: Databricks Professional Services (3), Telecom Bell resources (1)
Meeting Cadence
• Bi-weekly Steering Committee meetings
• Weekly PMO meetings
• Daily delivery team meetings
16. Operating Model | Pod Structure
Application Team: Leader, Scrum Master, Functional Domain Expert, Data Visualization Engineer, Customer Success Engineer, Data Engineer, Resident Solutions Architect
Platform: Leader, Azure Platform Cloud Architect, Cloud DevOps Engineer, Resident Solutions Architect, Delivery Solutions Architect, Specialist Solutions Architect (Security), Customer Success Engineer, Scrum Master
Data Quality & Governance: Leader, Test / Quality Lead, Data Quality Engineer, Data Governance Lead, Data Lineage and Profiling Engineer, Resident Solutions Architect, Specialist Solutions Architect (Security), Cloud DevOps Engineer
Bring it Together: Leader, Product Owner, Delivery Lead, Change Management Specialist, PMO Lead, Roadmap Officer
Roles are filled by a blend of Databricks and Telecom Bell resources (16 Databricks, 12 Telecom Bell); the Customer Success Engineer, Resident Solutions Architect, Specialist Solutions Architect (Security) and Cloud DevOps Engineer (2) are shared resources across pods, with Enterprise Support backing the delivery.
17. Operating Model | Road Map
Deliverables, from program kickoff to celebration:
1. Diagnostic of the current environment
2. End state architecture
3. Platform
4. Migration: 10%
5. Migration: 60%
6. Migration: 100%
Celebration: celebrate completion.
Along the way: measure progress; consistently communicate, remove roadblocks & eliminate friction; celebrate completion of quick wins to strengthen morale.
Process goals:
1. Human-centered change: focus on each individual team member's technical skills and capacity for change; reskill team members whose roles are changing.
2. Mindset change: adopt 'Data as a Product', a self-service platform, federated governance, and domain-specific ownership.
3. Migration playbook: a repeatable guideline to migrate applications to the new architecture.
18. Operating Model | Timeline
Q2 2023 through Q2 2024, run as an agile plan (update the roadmap and plan per evolving priorities), with Steerco meetings each quarter.
Application
• Current state diagnostics; assess current state & catalog critical data elements
• Define elements/sources/data
• Refactor the code; test & modify; deploy; document & KT
• Talk to the business team and incorporate changes
• Handover
Platform
• Databricks workspace setup; Confluent workspace setup
• Cost management reports; cost optimization
• Security and compliance (phase 1, then phase 2)
• Move towards Infrastructure as Code
• Handover
Data Quality + Governance
• Assess current state data governance
• Prepare governance strategy (identify roles, define the interaction model)
• Design & deliver governance structure
• Best practices and tagging
• Design target state DQ monitoring, then implement it
• Handover
Bring it together
• Assess skill and capability gaps within the organization
• Define pods and teams
• Create an upskilling curriculum and set up training sessions
• Establish ways of working: documentation, win celebrations
Project management
• Continuously monitor, foresee and mitigate risks, and fetch leadership guidance
• Arrange handover of all areas
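For the "design target state DQ monitoring, then implement it" activity above, a minimal sketch of what such a check could look like in PySpark; the table, columns, and thresholds are assumptions for illustration.

```python
# Minimal DQ-monitoring sketch (table, columns, and thresholds are placeholders).
from pyspark.sql import functions as F

df = spark.table("qos_events")

checks = {
    "table_not_empty": df.count() > 0,
    "no_null_keys": df.filter(F.col("cell_id").isNull()).count() == 0,
    "latency_in_range": df.filter(
        (F.col("latency_ms") < 0) | (F.col("latency_ms") > 10_000)
    ).count() == 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # In the target state this would feed an alert or a DQ dashboard
    # rather than simply raising.
    raise ValueError(f"DQ checks failed: {failed}")
```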
20. Additional Details | Future Scope
Industrialization: Competitive Differentiation
• High throughput of innovation analytics (AI/ML)
• Predictive analytics at scale
• Data-driven, real-time what-if analysis
• Harmonized MDM; ML- and AI-based DQ
• Fast, repeatable time-to-market from idea to product
21. Additional Details | Risk & Mitigation - Technical
• Data loss risk: reconciliation, checkpointing, audit, and monitoring; use fault-tolerant ingestion/migration tools such as Azure Data Factory's Copy activity and AzCopy.
• Data corruption and data integrity risk: data validation in which records are compared bidirectionally; each record in the old system is compared against the target system, and each record in the target system against the old system.
• Interference risks (simultaneous use of the source applications): align with the stakeholders of each source on how bandwidth can be shared; the "Bring it together" team comes into play to address this.
• Schema evolution (changing dimensions): the Delta file format's schema evolution feature, which relies on schema-on-read; further, to make sure no incompatible schemas come in, a catalog and governance layer will be leveraged (Databricks Unity Catalog).
• Authorization risk: MFA and identity federation, plus row- and column-level access controls on Delta Lake.
• Data security risk: apply encryption where possible and appropriate; store all tokens and keys securely in Azure Key Vault and rotate them at regular intervals.
• Downtime due to migration: a replicate-and-activate approach.
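To make the bidirectional data-validation mitigation concrete, a minimal PySpark sketch; the table names and the record_id join key are hypothetical placeholders.

```python
# Bidirectional reconciliation sketch: compare old and new systems both ways.
# Table names and the `record_id` key are illustrative placeholders.
source = spark.table("legacy.qos_events")   # on-premise copy (via ingestion)
target = spark.table("curated.qos_events")  # migrated Delta table

# Records present in the source but absent from the target, and vice versa.
missing_in_target = source.join(target, on="record_id", how="left_anti")
missing_in_source = target.join(source, on="record_id", how="left_anti")

print("source records not found in target:", missing_in_target.count())
print("target records not found in source:", missing_in_source.count())
```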
22. Additional Details | Risk & Mitigation - Other
• Resource availability & competing priorities: make sure employees are fully briefed about participation in workshops and/or interviews; get the right people at the right time.
• Senior leadership buy-in and delays in decision making: strong support from the leadership group, including areas not directly affected by the initial changes (One Team, One Direction); establish governance to provide clarity on accountabilities for decision making.
• Potential impacts on other projects: strong support from senior leadership if there is a need to put existing projects on hold; review the current state of ongoing projects to see how they impact the finance model; prioritize major changes and tackle the big obstacles upfront.
• Lack of people adoption (major change): agile and inspirational change management and communication structure; leverage the "Bring it together" team and roles such as change management experts to steward people readiness and prepare for change.
• Design in isolation (enterprise integration): work with scalable and flexible design principles in mind to ensure proper integration and alignment with the business; it is a partnership approach; gather key inputs to support cross-functional process design decisions where applicable.
• Availability of key data inputs and information: simplify data requests to collect data and information at the appropriate level of detail; assign designated Databricks and Telecom Bell contacts to ensure a smooth and timely transition of data; run a discovery phase to identify hidden environmental risks, so they can be foreseen and mitigated.
23. Additional Details | Assumptions
1. Platform: Telecom Bell's on-premise platform is owned and managed by Telecom Bell, and Databricks will get the necessary support to extend the setup to provision the solution per the scope of this effort.
2. Data Security: Telecom Bell is responsible for the design, integration and operation of all client Identity and Access Management, Security Incident and Event Management, Vulnerability Scanning and Security Testing tooling and processes as appropriate.
5. Access & Setup: Telecom Bell will provide system access to all source systems or applications required by the scope. Telecom Bell will provide access to systems and environments (including DEV, SIT) within 5 business days of receipt of a request.
6. Access & Setup: Databricks personnel will not have access to unencrypted PII data. Telecom Bell will be responsible for encrypting any PII data prior to extraction into the Databricks platform.
7. Access & Setup: PII and GDPR data handling will be done by Telecom Bell as per existing delivery practices; any additional arrangement is out of scope.
9. Project Management: Telecom Bell will provide relevant functional, technical and process documentation for the data platforms and systems required by the scope.
10. Project Management: Telecom Bell will nominate full-time business and technical SMEs aligned to this project as per the agreed pod structure.
11. Project Management: Telecom Bell data owners/nominees will make every attempt to attend the Scrum meetings and ceremonies to present their progress on the issues assigned.
12. Project Management: Telecom Bell will make sure we get the required time and support from all stakeholders for the complete success of the project.
14. Data Build: the Databricks team will reuse and extend the existing data ingestion tooling and framework to support ingestion into the platform. The project will carry out a data discovery exercise to assess local-market data quality and readiness.
15. Data Build: the source system inventory has already been identified and is already in place.
16. License: the Cloudera CDH on-premise license expired in March 2022; however, extended support is required and has been obtained.
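As a hedged sketch of assumption 6 (Databricks personnel never see unencrypted PII), one common Databricks pattern is a dynamic view that masks a column unless the reader belongs to an approved group; the schema, table, column, and group names below are placeholders.

```python
# Dynamic-view masking sketch; is_member() is a built-in Databricks SQL function.
# Schema, table, column, and group names below are hypothetical.
spark.sql("""
    CREATE OR REPLACE VIEW curated.customers_masked AS
    SELECT
        customer_id,
        CASE WHEN is_member('pii_readers') THEN msisdn
             ELSE '***MASKED***' END AS msisdn,
        plan_type
    FROM curated.customers
""")
```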
24. Additional Details | Questions
• Is there an onboarding guide for the consultants to get started in your environment?
• Is there a source system inventory already identified that can be shared?
• What are the roles and skills of the existing 10 engineers on the team?
• What is the current data governance mechanism?
• Other than Cloudera, what other paid subscriptions and packages are installed on the concerned architecture?
• Is there any major business contingency on this project plan? If so, what is the impact of a delayed delivery?
• What are all the compliance requirements and regulations that Telecom Bell needs to follow for the concerned data?
• Does Telecom Bell already have an Azure account? If so, what level of enterprise support plan is subscribed?
• Does Telecom Bell already have a Confluent account? If so, what level of enterprise support plan is subscribed?
• Are any licenses due to expire?
• What is Cloudera's extended support expiry date?
26. Yashodhan Kale
BACKGROUND
Modern technologist | Data and ML at scale. I design and drive clients' Data and AI journeys powered by cloud analytics expertise, offering data-product-mindset-driven solutions that deliver platforms and beyond: self-service frameworks, rapid experimentation labs, democratized data, data product marketplaces, multi-cloud solutions, data lakes, data fabric, and data mesh patterns with federated governance, domain-specific ownership, and more.
CERTIFICATIONS
• Amazon Web Services Certified Data Analytics - Specialty
• Amazon Web Services Solutions Architect - Associate
• Cloudera Certified Developer for Apache Hadoop (CCDH)
RELEVANT FUNCTIONAL AND INDUSTRY EXPERIENCE
Industry focus: Healthcare, Retail, Market Research, Finance
Functional expertise: Digital Transformation; Analytics and CDO Strategy; Open Source; Machine Learning, IoT; Data-Driven Re-invention
SELECTED EXPERIENCES
• Fortune 5 American healthcare company: established and managed DevOps, Data Engineering, and ML Engineering teams in close collaboration with data scientists. Set up a self-service Data and ML platform on the Azure cloud for a retail enterprise, incorporating an experimentation framework, model training pipelines, and real-time inference using Azure AKS, Kubeflow, and Snowflake. Implemented an Rx enterprise Data and ML platform on Azure, enabling ETL pipelines with Databricks and Apache Airflow. Led the development of large-scale projects, including legacy modernization and Rx and retail personalization programs that impact millions of lives daily. Collaborated with technology partners MSFT and NVIDIA to present objectives and findings and incorporate feedback for ML solutions on specialized NVIDIA GPUs. Architected and oversaw the implementation of a refrigerator IoT project on Azure, leveraging IoT Hub, Azure Analytics, and Databricks. Led the development of SAP HANA to Spark integration. Managed the data engineering enhancement team for pharmacy-related projects, ensuring critical business deliveries. Designed data-driven solutions, including self-service analytics platforms, rapid experimentation labs, democratized data, multi-cloud solutions, data fabric, and data mesh patterns with federated governance and domain-specific ownership. Developed an ingestion framework for seamless data migration across projects and cloud storage services.
• Multinational American information, data & market measurement company: built a retail store data aggregation engine (Retail Intelligence system) for 24 countries, initially using Hadoop MapReduce and later upgraded to Spark. Migrated on-premise batch processes to the cloud using Docker, Azure Batch Services, and Azure Shipyard for cost efficiency. Performed performance tuning on Apache Spark, cloud Hadoop clusters (HDI), and Databricks on the Azure and Hadoop platforms.
PREVIOUSLY
• Sr Cloud Solution Architect @ Amazon Web Services (Level 6)
• Sr ML Engineering Manager @ Databricks (Level 6)
WHAT HAS BROUGHT ME HERE
• Customer Obsession
• Deliver Results
• Earn Trust
• Learn and Be Curious
27.
• ACID compliant
• Time travel
• Data as product
• Interoperability
• Self-service experimentation
• Scale & pay-as-you-go
• Lakehouse governance
• Data migration
• Identity management, SSO
• Event streaming: exactly-once semantics
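A minimal sketch of the event-streaming capability (the broker address, topic, and paths are placeholders, and Confluent Cloud auth options are omitted): Structured Streaming checkpoints combined with Delta's transactional sink are what give the end-to-end exactly-once semantics listed above.

```python
# Kafka -> Delta streaming sketch with exactly-once delivery.
# Broker, topic, checkpoint path, and table name are hypothetical;
# SASL/SSL auth options for Confluent Cloud are omitted for brevity.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "pkc-xxxxx.confluent.cloud:9092")
       .option("subscribe", "network-qos-events")
       .option("startingOffsets", "earliest")
       .load())

(raw.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
 .writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/network-qos-events")
 .toTable("bronze.network_qos_events"))
```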
28.
• Upfront cost
• Not easy to integrate/connect
• Lack of discoverability
• Effort to make data HA & durable
• End of support
• Maintenance
29. Platform & Architecture | Artifacts
Key components of the data platform: A World Class Data Platform!
34. Databricks Notebooks
1. Share insights: quickly discover new insights with built-in interactive visualizations, or leverage libraries such as Matplotlib and ggplot. Export results and Notebooks in HTML or IPYNB format, or build and share dashboards that always stay up to date.
2. Work together: share Notebooks and work with peers across teams in multiple languages (R, Python, SQL and Scala) and libraries of your choice. Real-time co-authoring, commenting and automated versioning simplify collaboration while providing control.
3. Production at scale: schedule Notebooks to automatically run machine learning and data pipelines at scale. Create multistage pipelines using Databricks Workflows. Set up alerts and quickly access audit logs for easy monitoring and troubleshooting.
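A small notebook-cell sketch of the "share insights" flow (the table and column names are carried over from the earlier hypothetical examples): aggregate with Spark, then plot with Matplotlib.

```python
# Notebook-cell sketch: summarize with Spark, visualize with Matplotlib.
# The qos_events table is the same hypothetical example used earlier.
import matplotlib.pyplot as plt

pdf = (spark.table("qos_events")
       .groupBy("cell_id")
       .avg("latency_ms")
       .toPandas())

plt.bar(pdf["cell_id"], pdf["avg(latency_ms)"])
plt.xlabel("cell")
plt.ylabel("average latency (ms)")
plt.title("Network QoS by cell")
plt.show()
```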
Editor's Notes
Ex AWS; ex Databricks ML Engineering Sr Manager.
I have built data platforms and delivered campaign management and personalization that touch millions of lives a day.
Worked extensively in the retail, healthcare, telecom and finance industries, across 3 different countries; experienced start-up culture. And I know how to deliver results.
Qualities that have brought me here: customer obsession, delivering results, earning trust, and not giving up on learning.
FIFA, chess, salsa.
Transition: that's me. With that, let's get going.
10,000-ft overview: business, technical, and plan; then a slightly deeper look into the platform.
Transition: what, how, and when.
Personalization and customer engagement; regulations (data privacy and security); data volume; showing growth and profits.
Top priority: improve QoS.
Transition: let's look at some technical challenges.
To address the business challenges above, a strong data and analytics strategy is needed: maintaining the pace of innovation = experimentation capability = pay-as-you-go is crucial, plus easy access to data, plus a SaaS model of services.
Benchmark red flags: the Kafka and Spark architecture that processes network data; end of support.
Transition: we have seen the business and technical challenges, so what's the plan? The direction: next slide.
We will improve the QoS of the network and start with the migration.
Databricks. Confluent. Telecom Bell: 10 engineers.
We plan to complete this project in 12 months.
Transition: alright, how do we achieve this? I have put together a plan that I will walk you through. Feedback, suggestions, and concerns are all welcome; we'll craft the final version together.
TB's on-premise architecture would look more or less like this.
Transition: enough time on architecture; let's take two differences and move on from here.
1. More optimized and more performant, with fewer configurations to worry about: Z-ordering, vacuum, and auto-optimize features.
2. Integration with UC.
A leap towards a data-as-a-product mindset: federated governance, self-service platform, interoperability, sharing within and across the organization (notebooks and code). Add Marketplace.
Talk about a few in the interest of time.
Security: Azure Key Vault; encryption where possible.
Network setup: no data will flow through the public internet; private endpoints will be used.
Principle of least access privilege (PoLAP).
Zero downtime: replicate and then activate.
Leverage customer assets first: as you will see in the next few slides, the 10 engineers are distributed across all project areas.
The operating model is designed to deliver these objectives over the next 12 months, after the essential roadmap and planning piece.
Platform: not only run existing apps, but empower Bell to accelerate the pace of innovation and provide solutions beyond the scope of this project: personalized customer engagement and other business experiments.
Also, the OM delivers a needed shift in mindset: think "Data as a Product" and create a data-product culture (marketplace features, federated governance, Delta Sharing).
Lastly, it delivers a pay-as-you-go, secure, and low-maintenance solution that can handle the immediate need to migrate to the cloud, given the end of support.
Sharing resources where possible; all 10 TB engineers are used. Assuming enterprise support from Confluent and Azure. Total count.
Diagnostic: a complete picture of where we are, our pain points, the scope for improvement, and our assets.
Final end-state architecture.
Timeline activity: if any party has concerns, we can definitely revisit this and adjust to make it smoothly achievable.
Ex AWS; ex Accenture ML Engineering Sr Manager.
I have built data platforms and delivered campaign management and personalization that touch millions of lives a day.
Worked extensively in the retail, healthcare, telecom and finance industries, across 3 different countries; experienced start-up culture. And I know how to deliver results.
Qualities that have brought me here: customer obsession, delivering results, earning trust, and not giving up on learning.
FIFA, chess, salsa.
Transition: that's me. With that, let's get going.
A leap towards a data-as-a-product mindset: federated governance, self-service platform, interoperability, sharing within and across the organization (notebooks and code).
A few pain points: these services all run on premise; upgrades.
Limitations:
Data platform is not scalable for analytics, AI/ML
Upfront capacity planning and cost
Governance of the data on HDFS is a challenge
Data sits in silos and is not easy to integrate/connect
Lack of discoverability of data (catalog)
Housekeeping: maintenance of the in-house cluster is difficult through different portals and installations
Advanced disaster recovery, durability and availability
Bigger IT infra staff required
A leap towards a data-as-a-product mindset: federated governance, self-service platform, interoperability, sharing within and across the organization (notebooks and code). Add Marketplace.