Logical Data Warehouses and Data Lakes can play a role in many different types of projects, and in this presentation we will look at some of the most common patterns and use cases. Learn about analytical and big data patterns as well as performance considerations. Example implementations will be discussed for each pattern.
- Architectural patterns for logical data warehouse and data lakes.
- Performance considerations.
- Customer use cases and demo.
This presentation is part of the Denodo Educational Seminar, and you can watch the video here goo.gl/vycYmZ.
Data Modeling, Data Governance, & Data Quality - DATAVERSITY
Data Governance is often referred to as the people, processes, and policies around data and information, and these aspects are critical to the success of any data governance implementation. But just as critical is the technical infrastructure that supports the diverse data environments that run the business. Data models can be the critical link between business definitions and rules and the technical data systems that support them. Without the valuable metadata these models provide, data governance often lacks the “teeth” to be applied in operational and reporting systems.
Join Donna Burbank and her guest, Nigel Turner, as they discuss how data models & metadata-driven data governance can be applied in your organization in order to achieve improved data quality.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
This is the Data Vault Modeling and Methodology introduction that I presented at a Montreal event in September 2011. It provides an introduction to and overview of the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of Data Vault Modeling and Methodology.
If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me).
Thank you kindly,
Daniel Linstedt
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Parts 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial-intelligence-based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information,” a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Building a Logical Data Fabric using Data Virtualization (ASEAN) - Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent report Building the Unified Data Warehouse and Data Lake by the industry analysts TDWI, 64% of organizations stated that the objective of a unified Data Warehouse and Data Lake is to get more business value, and 84% of organizations polled felt that a unified approach to Data Warehouses and Data Lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, together with the associated technologies of machine learning, artificial intelligence, and data virtualization, to reduce time to value and hence increase the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Enterprise Architecture vs. Data Architecture - DATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Activate Data Governance Using the Data Catalog - DATAVERSITY
Data Governance programs depend on the activation of data stewards that are held formally accountable for how they manage data. The data catalog is a critical tool to enable your stewards to contribute and interact with an inventory of metadata about the data definition, production, and usage. This interaction is active Data Governance in the truest sense of the word.
In this RWDG webinar, Bob Seiner will share tips and techniques focused on activating your data stewards through a data catalog. Data Governance programs that involve stewards in daily activities are more likely to demonstrate value from their data-intensive investments.
Bob will address the following in this webinar:
- A comparison of active and passive Data Governance
- What it means to have an active Data Governance program
- How a data catalog tool can be used to activate data stewards
- The role a data catalog plays in Data Governance
- The metadata in the data catalog will not govern itself
Webinar future dataintegration-datamesh-and-goldengatekafka - Jeffrey T. Pollock
The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. This video is a replay of a Live Webinar hosted on 03/19/2020.
Join us for a timely 45-minute webinar to see our take on the future of Data Integration. As the global industry shift towards the “Fourth Industrial Revolution” continues, outmoded styles of centralized batch processing and ETL tooling continue to be replaced by realtime, streaming, microservices and distributed data architecture patterns.
This webinar will start with a brief look at the macro-trends happening around distributed data management and how that affects Data Integration. Next, we’ll discuss the event-driven integrations provided by GoldenGate Big Data, and continue with a deep-dive into some essential patterns we see when replicating Database change events into Apache Kafka. In this deep-dive we will explain how to effectively deal with issues like Transaction Consistency, Table/Topic Mappings, managing the DB Change Stream, and various Deployment Topologies to consider. Finally, we’ll wrap up with a brief look into how Stream Processing will help to empower modern Data Integration by supplying realtime data transformations, time-series analytics, and embedded Machine Learning from within data pipelines.
GoldenGate: https://www.oracle.com/middleware/tec...
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
To take a “ready, aim, fire” tactic to implement Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly assure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is a best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Agile Data Engineering - Intro to Data Vault Modeling (2016) - Kent Graziano
(Updated deck) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build and design structures incrementally, without constant refactoring
How to Strengthen Enterprise Data Governance with Data Quality - DATAVERSITY
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
- The overlaps between data governance and data quality
- The “data” dependencies of data governance – and how data quality addresses them
- Key considerations for deploying data quality for data governance
Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the technical components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics of how to build and design structures when using the Data Vault modeling technique. The target audience is anyone wishing to explore implementing a Data Vault style data model for an Enterprise Data Warehouse, Operational Data Warehouse, or Dynamic Data Integration Store. See more content like this by following my blog http://kentgraziano.com or follow me on Twitter @kentgraziano.
Modernizing to a Cloud Data Architecture - Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models help one customer scale their analytics and AI workloads, along with best practices from their experience of successfully migrating their data and workloads to the cloud.
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
This is Part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
Data Architecture Best Practices for Advanced Analytics - DATAVERSITY
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not yet worked into many enterprise data programs. These are keepers that organizations will need to move towards by one means or another, so it’s best to mindfully work them into the environment.
This is a presentation I gave in 2006 for Bill Inmon. The presentation covers Data Vault and how it integrates with Bill Inmon's DW2.0 vision. This is focused on the business intelligence side of the house.
If you want to use these slides, please put (C) Dan Linstedt, all rights reserved, http://LearnDataVault.com
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ... - Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Graph databases provide the ability to quickly discover and integrate key relationships between enterprise data sets. Business use cases such as recommendation engines, social networks, enterprise knowledge graphs, and more provide valuable ways to leverage graph databases in your organization. This webinar will provide an overview of graph database technologies, and how they can be used for practical applications to drive business value.
Given at Oracle Open World 2011: Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years now but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Fast Data Strategy Houston Roadshow Presentation - Denodo
Fast Data Strategy Houston Roadshow focused on the next industrial revolution on the horizon, driven by the application of big data, IoT and Cloud technologies.
• Denodo’s innovative customer, Anadarko, elaborated on how data virtualization serves as the key component in their prescriptive and predictive analytics initiatives, driven by multi-structured data ranging from customer data to equipment data.
• Denodo’s session, Unleashing the Power of Data, described the complexity of the modern data ecosystem and how to overcome challenges and successfully harness insights.
• Our Partner Noah Consulting, an expert analytics solutions provider in the energy industry, explained how your peers are innovating using new business models and reducing cost in areas such as Asset Management and Operations by leveraging Data Virtualization and Prescriptive and Predictive Analytics.
For more information on upcoming roadshows near you, follow this link: https://goo.gl/WBDHiE
Data Ninja Webinar Series: Realizing the Promise of Data Lakes - Denodo
Watch the full webinar: Data Ninja Webinar Series by Denodo: https://goo.gl/QDVCjV
The expanding volume and variety of data originating from sources that are both internal and external to the enterprise are challenging businesses in harnessing their big data for actionable insights. In their attempts to overcome big data challenges, organizations are exploring data lakes as consolidated repositories of massive volumes of raw, detailed data of various types and formats. But creating a physical data lake presents its own hurdles.
Attend this session to learn how to effectively manage data lakes for improved agility in data access and enhanced governance.
This is session 5 of the Data Ninja Webinar Series organized by Denodo. If you want to learn more about some of the solutions enabled by data virtualization, click here to watch the entire series: https://goo.gl/8XFd1O
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Activate Data Governance Using the Data CatalogDATAVERSITY
Data Governance programs depend on the activation of data stewards that are held formally accountable for how they manage data. The data catalog is a critical tool to enable your stewards to contribute and interact with an inventory of metadata about the data definition, production, and usage. This interaction is active Data Governance in the truest sense of the word.
In this RWDG webinar, Bob Seiner will share tips and techniques focused on activating your data stewards through a data catalog. Data Governance programs that involve stewards in daily activities are more likely to demonstrate value from their data-intensive investments.
Bob will address the following in this webinar:
- A comparison of active and passive Data Governance
- What it means to have an active Data Governance program
- How a data catalog tool can be used to activate data stewards
- The role a data catalog plays in Data Governance
- The metadata in the data catalog will not govern itself
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. This video is a replay of a Live Webinar hosted on 03/19/2020.
Join us for a timely 45min webinar to see our take on the future of Data Integration. As the global industry shift towards the “Fourth Industrial Revolution” continues, outmoded styles of centralized batch processing and ETL tooling continue to be replaced by realtime, streaming, microservices and distributed data architecture patterns.
This webinar will start with a brief look at the macro-trends happening around distributed data management and how that affects Data Integration. Next, we’ll discuss the event-driven integrations provided by GoldenGate Big Data, and continue with a deep-dive into some essential patterns we see when replicating Database change events into Apache Kafka. In this deep-dive we will explain how to effectively deal with issues like Transaction Consistency, Table/Topic Mappings, managing the DB Change Stream, and various Deployment Topologies to consider. Finally, we’ll wrap up with a brief look into how Stream Processing will help to empower modern Data Integration by supplying realtime data transformations, time-series analytics, and embedded Machine Learning from within data pipelines.
GoldenGate: https://www.oracle.com/middleware/tec...
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
To take a “ready, aim, fire” tactic to implement Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly assure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Agile Data Engineering - Intro to Data Vault Modeling (2016)Kent Graziano
(Updated deck) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build, and design structures incrementally, without constant refactoring
How to Strengthen Enterprise Data Governance with Data QualityDATAVERSITY
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
The overlaps between data governance and data quality
The “data” dependencies of data governance – and how data quality addresses them
Key considerations for deploying data quality for data governance
Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the technical components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics for how to build, and design structures when using the Data Vault modeling technique. The target audience is anyone wishing to explore implementing a Data Vault style data model for an Enterprise Data Warehouse, Operational Data Warehouse, or Dynamic Data Integration Store. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Modernizing to a Cloud Data ArchitectureDatabricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale its analytics and AI workloads, along with best practices from its successful migration of data and workloads to the cloud.
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
This is Part 3 of the series on Data Mesh, looking at the intersection of microservices architecture concepts, data integration / replication technologies, and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to set up the demo are included here as a PDF for viewers.
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are so many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not yet worked into many enterprise data programs. These are keepers that organizations will need to move toward, by one means or another, so it’s best to mindfully work them into the environment.
This is a presentation I gave in 2006 for Bill Inmon. The presentation covers Data Vault and how it integrates with Bill Inmon's DW2.0 vision. This is focused on the business intelligence side of the house.
If you want to use these slides, please credit them: (C) Dan Linstedt, all rights reserved, http://LearnDataVault.com
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Graph databases provide the ability to quickly discover and integrate key relationships between enterprise data sets. Business use cases such as recommendation engines, social networks, enterprise knowledge graphs, and more provide valuable ways to leverage graph databases in your organization. This webinar will provide an overview of graph database technologies, and how they can be used for practical applications to drive business value.
Given at Oracle Open World 2011: Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years now but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Fast Data Strategy Houston Roadshow PresentationDenodo
Fast Data Strategy Houston Roadshow focused on the next industrial revolution on the horizon, driven by the application of big data, IoT and Cloud technologies.
• Denodo’s innovative customer, Anadarko, elaborated on how data virtualization serves as the key component in their prescriptive and predictive analytics initiatives, driven by multi-structured data ranging from customer data to equipment data.
• Denodo’s session, Unleashing the Power of Data, described the complexity of the modern data ecosystem and how to overcome challenges and successfully harness insights.
• Our partner Noah Consulting, an expert analytics solutions provider in the energy industry, explained how your peers are innovating with new business models and reducing costs in areas such as Asset Management and Operations by leveraging Data Virtualization and Prescriptive and Predictive Analytics.
For more information on upcoming roadshows near you, follow this link: https://goo.gl/WBDHiE
Data Ninja Webinar Series: Realizing the Promise of Data LakesDenodo
Watch the full webinar: Data Ninja Webinar Series by Denodo: https://goo.gl/QDVCjV
The expanding volume and variety of data originating from sources both internal and external to the enterprise make it hard for businesses to harness their big data for actionable insights. In their attempts to overcome big data challenges, organizations are exploring data lakes as consolidated repositories of massive volumes of raw, detailed data of various types and formats. But creating a physical data lake presents its own hurdles.
Attend this session to learn how to effectively manage data lakes for improved agility in data access and enhanced governance.
This is session 5 of the Data Ninja Webinar Series organized by Denodo. If you want to learn more about some of the solutions enabled by data virtualization, click here to watch the entire series: https://goo.gl/8XFd1O
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
Watch full webinar here: https://bit.ly/34iCruM
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Data Virtualization: Challenges, Uses & BenefitsDenodo
Watch full webinar here: https://bit.ly/3oah4ng
Gartner recently described Data Virtualization as a cornerstone of data integration architectures.
Discover:
- The benefits of a data virtualization platform
- The growing range of use cases: Lakehouse, Data Science, Big Data, Data Services & IoT
- Creating a unified view of your data assets without compromising on performance
- Building an agile data integration architecture: on-premises, in the cloud, or hybrid
Data Virtualization. An Introduction (ASEAN)Denodo
Watch full webinar here: https://bit.ly/3uiXVoC
What is Data Virtualization and why should I care? In this webinar we intend to help you understand not only what Data Virtualization is but also why it's a critical component of any organization's data fabric and how it fits. We will show how data virtualization liberates and empowers your business users, from data discovery and data wrangling to the generation of reusable reporting objects and data services. Digital transformation demands that we empower all consumers of data within the organization, and it demands agility too. Data Virtualization gives you meaningful access to information that can be shared by a myriad of consumers.
Watch this session on demand to learn:
- What is Data Virtualization?
- Why do I need Data Virtualization in my organization?
- How do I implement Data Virtualization in my enterprise? Where does it fit?
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESBDenodo
Data integration is paramount. This presentation covers three different paradigms: using client-side tools, creating traditional data warehouses, and the data virtualization solution (the logical data warehouse). It compares them with each other and positions data virtualization as an integral part of any future-proof IT infrastructure.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/1q94Ka.
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Innovative Data Strategies for Advanced Analytics Solutions and the Role of D...Denodo
Watch the full webinar here: https://buff.ly/2FipFSD
Data is fueling a new digital economy and compelling companies to rapidly adopt modern technologies such as Machine Learning, AI and Cognitive Science. Consequently, assembling the right blend of data from disparate sources using agile and flexible techniques like logical data warehousing to create purposeful, accessible insights is one of the greatest strategic tasks before us.
To address the challenges associated with advanced analytics solutions, Neudesic uses a best-fit-engineering approach to enable enterprises to utilize the right tools for the right job to maximize their data and analytics strategy. When helping customers construct architectures that surface more data to an ever-growing number of data consumers without the need for data replication, Neudesic looks to Denodo as its tool of choice.
Join Neudesic and Denodo for an interactive webinar to learn how you can apply data virtualization to your advanced analytics strategy for the purpose of achieving growth objectives.
Register for this webinar to learn:
• Why data virtualization should be part of your advanced analytics strategy.
• How easily your use case will fit one of the numerous architecture patterns Denodo enables.
• How Denodo’s innovative engine offers best of breed data virtualization capabilities, through a product demonstration.
Watch full webinar here: https://bit.ly/2vN59VK
Having started out as the most agile, real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
Big Data LDN 2018: CONNECTING SILOS IN REAL-TIME WITH DATA VIRTUALIZATIONMatt Stubbs
Date: 14th November 2018
Location: Keynote Theatre
Time: 13:50 - 14:20
Speaker: Becky Smith
Organisation: Denodo
About: How many users inside and outside of your organization access your organization’s data? Dozens? Hundreds is probably more like it, each with their own structure and content requirements as well as different access rights. As a result, many organizations have witnessed the formation of “data delivery mills,” in various shapes and sizes. How does one create order and reliability in this world of chaotic data streams? Quite easily, if it’s done with data virtualization.
According to Gartner, “through 2020, 50% of enterprises will implement some form of data virtualization as one enterprise production option for data integration.” Data virtualization enables organizations to gain data insights from multiple, distributed data sources without the time-consuming processes of data extraction and loading. This allows for faster insights and fact-based decisions, which help businesses realize value sooner.
Join us to find out more about:
• What data virtualization actually means and how it differs from traditional data integration approaches.
• How you can connect and combine all your data in real-time, without compromising on scalability, security or governance.
• The benefits of data virtualization and its most important use cases.
Unlock Your Data for ML & AI using Data VirtualizationDenodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not replace data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Enabling Self-Service Analytics with Logical Data WarehouseDenodo
Watch full webinar here: https://buff.ly/2GNO8PC
What makes data scientists happy? Data, of course. They want it fast and flexible, and they want to do it themselves. But most classic data warehouses (DW) and data lakes do not lend themselves to agile data access. A more practical solution is the logical data warehouse (LDW), which has proven to be a more agile foundation for delivering and transforming data and makes it easy to quickly plug in new data sources.
Attend this webinar to learn:
* How easily new data sources can be made available for analytics and data science
* How your organization can successfully migrate to a flexible LDW architecture in a step-by-step fashion
* How LDWs help integrate self-service analytics with classic forms of business intelligence
Education Seminar: Self-service BI, Logical Data Warehouse and Data LakesDenodo
This educational seminar took place on Thursday, December 8th at the Westin Galleria Dallas, Texas.
Self-service BI, Logical Data Warehouse and Data Lakes – They are all essential components of Fast Data Strategy. Many companies are rapidly augmenting their traditional data warehouses, data marts, and ETL with their logical counterparts. Reason? Agility and rapid time-to-market.
Speakers include:
• Chuck DeVries, VP, Strategic Technology and Enterprise Architecture, Vizient
• Ravi Shankar, Chief Marketing Officer, Denodo
• Charles Yorek, Vice President, iOLAP
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Data Ninja Webinar Series: Accelerating Business Value with Data Virtualizati...Denodo
Watch the full webinar - Session one: Data Ninja Webinar Series by Denodo: https://goo.gl/yAdMpL
The following presentation was used during the webinar entitled: "Accelerating Business Value with Data Virtualization Solutions". It discusses the role of data virtualization in delivering real business value from your new and existing data assets.
This is session 1 of the Data Ninja Webinar Series organized by Denodo. If you want to learn more about some of the solutions enabled by data virtualization, click here to watch the entire series: https://goo.gl/8XFd1O
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Denodo
Watch full webinar here: https://bit.ly/2O2r3NP
In the last several decades, BI has evolved from large, monolithic implementations controlled by IT to orchestrated sets of smaller, more agile capabilities that include visual-based data discovery and governance. These new capabilities provide more democratic analytics accessibility that is increasingly being controlled by business users. However, given the rapid advancements in emerging technologies such as cloud and big data systems and the fast changing business requirements, creating a future-proof data management strategy is an incredibly complex task.
Catch this on-demand session to understand:
- BI program modernization challenges
- What is data virtualization and why is its adoption growing so quickly?
- How data virtualization works and how it compares to alternative approaches to data integration
- How modern data virtualization can significantly increase agility while reducing costs
Enterprise Monitoring and Auditing in DenodoDenodo
Watch full webinar here: https://buff.ly/3P3l4oK
Proper monitoring of an enterprise system is critical to understanding its capacity and growth, anticipating potential issues, and even understanding key ROI metrics. This also facilitates the implementation of policies and user access audits which are key to optimizing the resource utilization in an organization. Do you want to learn more about the new Denodo features for monitoring, auditing, and visualizing enterprise monitoring data?
Join us for the session with Vijayalakshmi Mani, Data Engineer at Denodo, to understand how the new features and components help you monitor your Denodo Servers and their resource utilization, and how to get the most out of the logs that the Denodo Platform generates, including FinOps information.
Watch on demand and learn:
- What the Denodo Monitor is and what’s new in it
- How to visualize Denodo Monitor information and use the Diagnostics & Monitoring Tool
- Introduction to the new Denodo Dashboard
- Demonstration of the Denodo Dashboard
Lunch and Learn ANZ: Mastering Cloud Data Cost Control: A FinOps ApproachDenodo
Watch full webinar here: https://buff.ly/4bYOOgb
With the rise of cloud-first initiatives and pay-per-use systems, forecasting IT costs has become a challenge. It's easy to start small, but it's equally easy to get skyrocketing bills with little warning. FinOps is a discipline that tries to tackle these issues, by providing the framework to understand and optimize cloud costs in a more controlled manner. The Denodo Platform, being a middleware layer in charge of global data delivery, sits in a privileged position not only to help us understand where costs are coming from, but also to take action, manage, and reduce them.
Attend this session to learn:
- The importance of FinOps in a cloud architecture.
- How the Denodo Platform can help you collect and visualize key FinOps metrics to understand where your costs are coming from.
- What actions and controls the Denodo Platform offers to keep costs at bay.
Achieving Self-Service Analytics with a Governed Data Services LayerDenodo
Watch full webinar here: https://buff.ly/3wBhxYb
In an increasingly distributed and complex data landscape, it is becoming ever harder to govern and secure data effectively throughout the enterprise. Whether it be securing data across different repositories or monitoring access across different business units, the proliferation of data technologies and repositories both on-premises and in the cloud is making the task unattainable. The challenge is only made greater by the ongoing pressure to offer self-service data access to business users.
Watch on-demand and learn:
- How to use a logical data fabric to build an enterprise-wide data access role model.
- How to centralise security when data is spread across multiple systems residing both on-premises and in the cloud.
- How to control and audit data access across different regions.
What you need to know about Generative AI and Data Management?Denodo
Watch full webinar here: https://buff.ly/3UXy0A2
It should be no surprise that Generative AI will have a profound impact on data management in years to come. Much like other areas of the technology sector, the opportunities presented by GenAI will accelerate our efforts around all aspects of data management, including self-service, automation, data governance and security. On the other hand, it is also becoming clearer that to unleash the true potential of AI assistants powered by GenAI, we need novel implementation strategies and a reimagined data architecture. This presents an exhilarating yet challenging future, demanding innovative thinking and methodologies in data management.
Join us on this webinar to learn about:
- The opportunities and challenges presented by GenAI today.
- Exploiting GenAI to democratize data management.
- How to augment GenAI applications with corporate data and knowledge.
- How to get started.
Mastering Data Compliance in a Dynamic Business LandscapeDenodo
Watch full webinar here: https://buff.ly/48rpLQ3
Join us for an enlightening webinar, "Mastering Data Compliance in a Dynamic Business Landscape," presented by Denodo Technologies and W5 Consulting. This session is tailored for business leaders and decision-makers who are navigating the complexities of data compliance in an ever-evolving business environment.
This webinar will focus on why data compliance is crucial for your business. Discover how to turn compliance into a competitive advantage, enhancing operational efficiency and market trust. We'll also address the risks of non-compliance, including financial penalties and the loss of customer trust, and provide strategies to proactively overcome these challenges.
Key Takeaways:
- How can your business leverage data management practices to stay agile and compliant in a rapidly changing regulatory landscape?
- Keys to balancing data accessibility with security and privacy in today's data-driven environment.
- What are the common pitfalls in achieving compliance with regulations like GDPR, CCPA, and HIPAA, and how can your business avoid them?
We will go beyond the technical aspects and delve into how you can strategically position your organization in the realm of data management and compliance. Learn how to craft a data compliance strategy that aligns with your business goals, enhances operational efficiency, and builds stakeholder trust.
Denodo Partner Connect: Business Value Demo with Denodo Demo LiteDenodo
Watch full webinar here: https://buff.ly/3OCQvGk
In this session, Denodo Sales Engineer, Yik Chuan Tan, will guide you through the art of delivering a compelling demo of the Denodo Platform with Denodo Demo Lite. Watch to uncover the significant functionalities that set Denodo apart and learn how to effectively win over potential customers.
In this session, we will cover:
Understanding the Denodo Platform & Tailoring Your Demo to Prospect Needs: By gaining a comprehensive understanding of the Denodo Platform, its architecture, and how it addresses data management challenges, you can customize your demo to align with the specific needs and pain points of your prospects, including:
- seamless data integration with real-time access
- data security and governance
- self-service data discovery
- advanced analytics and reporting
- performance optimization, scalability, and deployment
Watch this Denodo demo session and acquire the skills and knowledge necessary to captivate your prospects. Whether you're a seasoned technical professional or new to the field, this session will equip you with the skills to deliver compelling demos that lead to successful conversions.
Expert Panel: Overcoming Challenges with Distributed Data to Maximize Busines...Denodo
Watch full webinar here: https://buff.ly/3wdI1il
As organizations compete in new markets and new channels, business data requirements include new data platforms and applications. Migration to the cloud typically adds more distributed data when operations set up their own data platforms. This spreads important data across on-premises and cloud-based data platforms. As a result, data silos proliferate and become difficult to access, integrate, manage, and govern. Many organizations are using cloud data platforms to consolidate data, but distributed environments are unlikely to go away.
Organizations need holistic data strategies for unifying distributed data environments to improve data access and data governance, optimize costs and performance, and take advantage of modern technologies as they arrive. This TDWI Expert Panel will focus on overcoming challenges with distributed data to maximize business value.
Key topics this panel will address include:
- Developing the right strategy for your use cases and workloads in distributed data environments, such as data fabrics, data virtualization, and data mesh
- Deciding whether to consolidate data silos or bridge them with distributed data technologies
- Enabling easier self-service access and analytics across a distributed data environment
- Maximizing the value of data catalogs and other data intelligence technologies for distributed data environments
- Monitoring and data observability for spotting problems and ensuring business satisfaction
Watch full webinar here: https://buff.ly/3UE5K5l
The ability to recognize and flag sensitive information within corporate datasets is essential for compliance with emerging privacy laws, for completing a privacy impact assessment (PIA) or data subject access request (DSAR), and also for cyber-insurance compliance. During this session, we will discuss data privacy laws, the challenges they present, and how they can be applied with modern tools.
Join us for the session driven by Mark Rowan, CEO at Data Sentinel, and Bhavita Jaiswal, SE at Denodo, who will show how a data classification engine augments Data Catalog to support data governance and compliance objectives.
Watch on-demand & Learn:
- Changing landscape of data privacy laws and compliance requirements
- How to create a data classification framework
- How Data Sentinel classifies data and how this can be integrated into Denodo
- Using the enhanced data classifications via consuming tools such as Data Catalog and Power BI
Introduction to Data Virtualization for Data ProfessionalsDenodo
Watch full webinar here: https://buff.ly/3OETC08
According to the analyst firm Gartner, "by 2022, 60% of enterprises will include data virtualization as a primary data delivery method in their integration architecture." Gartner named Denodo a Leader in the 2020 Magic Quadrant for Data Integration Tools.
In this 1.5-hour session, you will learn how data virtualization is revolutionizing the business and IT approach to accessing, delivering, consuming, governing, and protecting data, regardless of the age of your technology, the format of the data, or its location. This mature technology bridges the gap between IT and business users and delivers significant cost and time savings.
**FORMAT
Online seminar lasting 1 hour 30 minutes.
Thanks to the recording, you can complete the exercises at your own pace.
**WHO IS THIS SEMINAR FOR?
IT managers / architects
Data scientists / analysts
CDOs
**CONTENT
On the agenda: an introduction to the essentials of data virtualization, use cases, real-world customer case studies, and a demonstration of the capabilities of the Denodo Platform:
Integrate and deliver data quickly and easily with Denodo Platform 8.0
The Denodo query optimizer delivers data in real time, on demand, even for very large data sets
Expose data as "data services" for consumption by a variety of users and tools
Data Catalog: Discover and document data with our Data Catalog
spaces for self-service data access.
Data virtualization plays a key role in governing and securing data in your organization
**AGENDA
Introduction to data virtualization
Use cases and customer case studies
Architecture - Governance and security
Performance
Demo
Next steps: how to test and deploy the platform yourself
Interactive Q&A session
Data Democratization: A Secret Sauce to Say Goodbye to Data FragmentationDenodo
Watch full webinar here: https://buff.ly/41Zf31D
Despite recent and evolving technological advances, the vast amounts of data that exist in a typical enterprise are not always available to all stakeholders when they need them. In modern enterprises, there are broad sets of users, with varying skill levels, who strive to make data-driven decisions daily but struggle to gain timely access to the data they need.
Join our webinar to learn how to:
- Unlock the Power of Your Data: Discover how data democratization can transform your organization by giving every user access to the data they need, when they need it.
- Say 'Goodbye' to Data Fragmentation: Learn practical strategies to break down data silos and foster a more collaborative and efficient data environment.
- Realize the Full Potential of Your Data: Hear success stories about industry leaders who have embraced data democratization and witnessed tangible results.
Denodo Partner Connect - Technical Webinar - Ask Me AnythingDenodo
Watch full webinar here: https://buff.ly/48ZpEf1
In this session, we will cover a deeper dive into the Denodo Platform 8.0 Certified Architect Associate (DEN80EDUCAA) exam by answering any questions that have developed since the previous session.
Additionally, we invite partners to bring any general questions related to Denodo, the Denodo Platform, or data management.
Lunch and Learn ANZ: Key Takeaways for 2023! - Denodo
Watch full webinar here: https://buff.ly/3SnH5QY
2023 is coming to an end, and organisations' dependency on trusted, accurate, secure and contextual data only grows more challenging. The perpetual search for new architectures, processes and organisational team structures to "get the business their data" and reduce operating costs continues unabated, while the business's confidence in the "value" derived, or "to be" delivered, from these investments in data is being heavily scrutinised. 2023 saw significant new releases from vendors, focusing on the Data Fabric.
At this session we will look at these topics and key takeaways for 2023, including:
- Data management and data integration market highlights for 2023
- Key achievements for Denodo in their journey as a leader in this market
- A few case studies from Australian organisations in how they are delivering strategic business value through Denodo's Data Fabric platform and what they have been doing differently
It’s a Wrap! 2023 – A Groundbreaking Year for AI and The Way Forward - Denodo
Watch full webinar here: https://buff.ly/3S4Y49o
A little over a year ago, we would not have expected the disruptions caused by the rise of Generative AI. If 2023 was a groundbreaking year for AI, what will 2024 bring? More importantly, what can you do now to take advantage of these trends and ensure you are future-proof?
For example:
- Generative AI will become more powerful and user-friendly, enabling novel and realistic content creation and automation.
- Data Architectures will need to adapt to feed these powerful new models.
- Data ecosystems are moving to the cloud, but there is a growing need to maintain control of costs and optimize workloads better.
Join us for a discussion on the most significant trends in the Data & AI space, and how you can prepare to ride this wave!
What are the key success factors for best applying the GDPR to your... - Denodo
Watch full webinar here: https://buff.ly/3O7rd2R
To comply with the GDPR, companies need an overview of all their data and must establish security controls across the entire infrastructure. Denodo's data virtualization brings multiple data sources together, makes them accessible from a single layer, and offers monitoring capabilities to track changes.
To this end, Square IT Services developed, for one of its major, prestigious French clients in the luxury sector, an ergonomic user interface that lets the client consult its customers' personal information, check their eligibility to exercise their right to be forgotten, and deactivate their various notification channels. It also includes an audit feature that traces the history of the operations performed, making it possible, in particular, to find the date on which a person was anonymized.
All the information surfaced in the application is retrieved from the REST APIs exposed by Denodo.
In this webinar, we will walk through all the features of the DPO-Cockpit application in a demo, and explain at each step the central role Denodo plays in simplifying GDPR management while remaining compliant.
Key points covered:
- Customer context in the face of GDPR challenges
- Challenges encountered
- Options and the choice made (Denodo)
- Approach: architecture of the proposed solution
- Tool demo: main features
Lunch and Learn ANZ: Achieving Self-Service Analytics with a Governed Data Se... - Denodo
Watch full webinar here: https://buff.ly/48zzN2h
In an increasingly distributed and complex data landscape, it is becoming increasingly difficult to govern and secure data effectively throughout the enterprise. Whether it be securing data across different repositories or monitoring access across different business units, the proliferation of data technologies and repositories across both on-premises and in the cloud is making the task unattainable. The challenge is only made greater by the ongoing pressure to offer self-service data access to business users.
Tune in and learn:
- How to use a logical data fabric to build an enterprise-wide data access role model.
- Centralise security when data is spread across multiple systems residing both on-premises and in the cloud.
- Control and audit data access across different regions.
How to Build Your Data Marketplace with Data Virtualization? - Denodo
Watch full webinar here: https://buff.ly/4aAi0cS
Organizations continue to collect mounds of data and it is spread over different locations and in different formats. The challenge is navigating the vastness and complexity of the modern data ecosystem to find the right data to suit your specific business purpose. Data is an important corporate asset and it needs to be leveraged but also protected.
By taking an alternate approach to data management and adopting a logical data architecture, data can be democratized while providing centralized control within a distributed data landscape. The web-based Data Catalog tool acts as a single access point for secure enterprise-wide data access and governance. This corporate data marketplace provides visibility into your data ecosystem and allows data to be shared without compromising data security policies.
Catch this live webinar to understand how this approach can transform how you leverage data across the business:
- Empower the knowledge worker with data and increase productivity
- Promote data accuracy and trust to encourage re-use of important data assets
- Apply consistent security and governance policies across the enterprise data landscape
Webinar #2 - Transforming Challenges into Opportunities for Credit Unions - Denodo
Watch full webinar here: https://buff.ly/3vhzqL5
Join our exclusive webinar series designed to empower credit unions with transformative insights into the untapped potential of data. Explore how data can be a strategic asset, enabling credit unions to overcome challenges and foster substantial growth.
This webinar will delve into how data can serve as a catalyst for addressing key challenges faced by credit unions, propelling them towards a future of enhanced efficiency and growth.
Enabling Data Catalog users with advanced usability - Denodo
Watch full webinar here: https://buff.ly/48A4Yu1
Data catalogs are increasingly important in any modern data-driven organization. They are essential to manage and make the most of the huge amount of data that any organization uses. As this information is continuously growing in size and complexity, data catalogs are key to providing Data Discovery, Data Governance, and Data Lineage capabilities.
Join us for the session driven by David Fernandez, Senior Technical Account Manager at Denodo, to review the latest features aimed at improving the usability of the Denodo Data Catalog.
Watch on-demand & Learn:
- Enhanced search capabilities using multiple terms.
- How to create workflows to manage internal requests.
- How to leverage the AI capabilities of Data Catalog to generate SQL queries from natural language.
Watch full webinar here: https://buff.ly/3vjrn0s
The purpose of the Denodo Platform 8.0 Certified Architect Associate (DEN80EDUCAA) exam is to provide organizations that use Denodo Platform 8.0 with a means of identifying suitably qualified data architects who understand the role and position of the Denodo Platform within their broader information architecture.
This exam covers the following technical topics and subject areas:
- Denodo Platform functionality, including
- Governance and metadata management
- Security
- Performance optimization
- Caching
- Defining Denodo Platform use scenarios
Along with some sample questions, a Denodo Sales Engineer will help you prepare for exam topics and ace the exam.
Join us now to start your journey toward becoming a Certified Denodo Architect Associate!
GenAI and the future of data management: myths and realities - Denodo
Watch full webinar here: https://buff.ly/3NLMSNM
Generative AI and Large Language Models (LLMs), led by OpenAI's GPT, have brought the biggest revolution in the computing world in recent years. But how do they really affect data management? Will LLMs replace the data management professional? How much is myth and how much is reality?
In this session we will review:
- What Generative AI is and why it matters for data management
- The present and future of applying genAI in the world of data
- How to prepare your organization for genAI adoption
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
3. THE LEADER IN DATA VIRTUALIZATION
Denodo provides agile, high performance data integration and data abstraction across the broadest range of enterprise, cloud, big data and unstructured data sources, and real-time data services at half the cost of traditional approaches.
HEADQUARTERS
Palo Alto, CA.
DENODO OFFICES, CUSTOMERS, PARTNERS
Global presence throughout North America, EMEA, APAC, and Latin America.
CUSTOMERS
250+ customers, including many F500 and G2000 companies across every major industry, have gained significant business agility and ROI.
LEADERSHIP
Longest continuous focus on data virtualization and data services.
Product leadership.
Solutions expertise.
4. Speakers
Paul Moxon, Senior Director of Product Management, Denodo
Pablo Álvarez, Principal Technical Account Manager, Denodo
Rubén Fernández, Technical Account Manager, Denodo
6. Agenda
1. The Logical Data Warehouse
2. Different Types, Different Needs
3. Performance in a LDW
4. Customer Success Stories
5. Q&A
7. What is a Logical Data Warehouse?
A logical data warehouse is a data system that follows the ideas of a traditional EDW (star or snowflake schemas) and includes, in addition to one (or more) core DWs, data from external sources.
The main motivations are improved decision making and/or cost reduction.
8. Logical Data Warehouse
Gartner Definition
Description:
“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy. The LDW will form a new best practice by the end of 2015.”
“The LDW is an evolution and augmentation of DW practices, not a replacement”
“A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
“The LDW permits an IT organization to make a large number of datasets available for analysis via query tools and applications.”
Gartner Hype Cycle for Enterprise Information Management, 2012
10. Logical Data Warehouse
Description:
A semantic layer on top of the data warehouse that keeps the business data definition.
Allows the integration of multiple data sources including enterprise systems, the data warehouse, additional processing nodes (analytical appliances, Big Data, …), Web, Cloud and unstructured data.
Publishes data to multiple applications and reporting tools.
11. Three Integration/Semantic Layer Alternatives
Gartner’s View of Data Integration
[Diagram: three alternative placements of the data integration/semantic layer over the EDW and ODS]
Application/BI Tool as Data Integration/Semantic Layer
EDW as Data Integration/Semantic Layer
Data Virtualization as Data Integration/Semantic Layer
12. Application/BI Tool as the Data Integration Layer
[Diagram: Application/BI Tool as Data Integration/Semantic Layer, on top of the EDW and ODS]
• Integration is delegated to end user tools and applications
• e.g. BI Tools with ‘data blending’
• Results in duplication of effort – integration defined many times in different tools
• Impact of change in data schema?
• End user tools are not intended to be integration middleware
• Not their primary purpose or expertise
13. EDW as the Data Integration Layer
[Diagram: EDW as Data Integration/Semantic Layer, on top of the ODS]
• Access to ‘other’ data (query federation) via EDW
• Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
• Often coupled with traditional ETL replication of data into EDW
• EDW ‘center of data universe’
• Provides data integration and semantic layer
• Appears attractive to organizations heavily invested in EDW
• More than one EDW? EDW costs?
14. Data Virtualization as the Data Integration Layer
[Diagram: Data Virtualization as Data Integration/Semantic Layer, on top of the EDW and ODS]
• Move data integration and semantic layer to independent Data Virtualization platform
• Purpose built for supporting data access across multiple heterogeneous data sources
• Separate layer provides semantic models for underlying data
• Physical to logical mapping
• Enforces common and consistent security and governance policies
• Gartner’s recommended approach
17. The State and Future of Data Integration. Gartner, 25 May 2016
Physical data movement architectures that aren’t designed to support the dynamic nature of business change, volatile requirements and massive data volume are increasingly being replaced by data virtualization.
Evolving approaches (such as the use of LDW architectures) include implementations beyond repository-centric techniques
18. What about the Logical Data Lake?
A Data Lake will not have a star or snowflake schema, but rather a more heterogeneous collection of views with raw data from heterogeneous sources.
The virtual layer will act as a common umbrella under which these different sources are presented to the end user as a single system.
However, from the virtualization perspective, a Virtual Data Lake shares many technical aspects with an LDW, and most of this content also applies to a Logical Data Lake.
20. Common Patterns for a Logical Data Warehouse
1. The Virtual Data Mart
2. DW + MDM
Data Warehouse extended with master data
3. DW + Cloud
Data Warehouse extended with cloud data
4. DW + DW
Integration of multiple Data Warehouses
5. DW historical offloading
DW horizontal partitioning with historical data in cheaper storage
6. Slim DW extension
DW vertical partitioning with rarely used data in cheaper storage
21. Virtual Data Marts
Simplified semantic models for business users
Business-friendly models defined on top of one or multiple systems, often “flavored” for a particular division
Motivation
Hide complexity of star schemas for business users
Simplify model for a particular vertical
Reuse semantic models and security across multiple reporting engines
Typical queries
Simple projections, filters and aggregations on top of curated “fat tables” that merge data from facts and many dimensions
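The "fat table" idea can be sketched with an ordinary SQL view. This is a minimal illustration using Python's built-in sqlite3 module rather than Denodo; the schema, the data and the view name `sales_mart` are all hypothetical.

```python
import sqlite3

# Hypothetical star schema; table, column and view names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE retailer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales    (product_fk INTEGER, retailer_fk INTEGER, amount REAL);

INSERT INTO product  VALUES (1, 'Anvil', 'ACME'), (2, 'Rocket', 'ACME');
INSERT INTO retailer VALUES (10, 'North Store'), (20, 'South Store');
INSERT INTO sales    VALUES (1, 10, 5.0), (1, 20, 7.5), (2, 10, 20.0);

-- The "fat table": facts pre-joined with their dimensions.
CREATE VIEW sales_mart AS
SELECT p.name AS product, p.brand AS brand, r.name AS retailer, s.amount AS amount
FROM sales s
JOIN product  p ON s.product_fk  = p.id
JOIN retailer r ON s.retailer_fk = r.id;
""")

def sales_by_retailer(connection):
    """Typical mart query: a plain aggregation, with no joins visible to the user."""
    rows = connection.execute(
        "SELECT retailer, SUM(amount) FROM sales_mart GROUP BY retailer ORDER BY retailer"
    )
    return dict(rows.fetchall())

totals = sales_by_retailer(conn)
```

The business user only sees `sales_mart`; the star schema behind it can change without the reports having to change.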
22. Virtual Data Marts
[Diagram: Sales and Product views defined over the EDW (Time dimension, sales fact table, Product and Retailer dimensions) and other sources (Prod. Details)]
23. DW + MDM
Slim dimensions with extended information maintained in an external MDM system
Motivation
Keep a single copy of golden records in the MDM that can be reused across systems and managed in a single place
Typical queries
Join a large fact table (DW) with several MDM dimensions, aggregations on top
Example
Revenue by customer, projecting the address from the MDM
25. DW + Cloud dimensional data
Fresh data from cloud systems (e.g. SFDC) is mixed with the EDW, usually on the dimensions. The DW is sometimes also in the cloud.
Motivation
Take advantage of “fresh” data coming straight from SaaS systems
Avoid local replication of cloud systems
Typical queries
Dimensions are joined with cloud data to filter based on some external attribute not available (or not current) in the EDW
Example
Report on current revenue on accounts where the potential for an expansion is higher than 80%
26. DW + Cloud dimensional data
[Diagram: EDW star schema (sales fact table with Time, Product and Customer dimensions) combined with Customer data from the SFDC CRM]
27. Multiple DW integration
Use of multiple DWs as if they were only one
Motivation
Mergers and acquisitions
Different DWs by department
Transition to new EDW deployments (migration to Spark, Redshift, etc.)
Typical queries
Joins across fact tables in different DWs with aggregations before or after the JOIN
Example
Get customers with purchases higher than 100 USD that do not have a fidelity card (purchases and fidelity card data in different DWs)
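The fidelity-card example can be sketched as follows. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the two warehouses, the table names and data are invented, and "purchases higher than 100 USD" is read as total purchases per customer.

```python
import sqlite3

def make_db(script):
    """Create an in-memory database that stands in for one warehouse."""
    db = sqlite3.connect(":memory:")
    db.executescript(script)
    return db

# Two independent, hypothetical warehouses.
finance_dw = make_db("""
CREATE TABLE purchases (customer_id INTEGER, amount_usd REAL);
INSERT INTO purchases VALUES (1, 80.0), (1, 40.0), (2, 150.0), (3, 30.0);
""")
marketing_dw = make_db("""
CREATE TABLE fidelity_cards (customer_id INTEGER);
INSERT INTO fidelity_cards VALUES (2);
""")

def big_spenders_without_card(threshold_usd):
    # Aggregation before the join: each warehouse returns only what is needed.
    spenders = {
        row[0]
        for row in finance_dw.execute(
            "SELECT customer_id FROM purchases "
            "GROUP BY customer_id HAVING SUM(amount_usd) > ?",
            (threshold_usd,),
        )
    }
    card_holders = {
        row[0] for row in marketing_dw.execute("SELECT customer_id FROM fidelity_cards")
    }
    # The cross-warehouse anti-join happens in the integration layer.
    return sorted(spenders - card_holders)

result = big_spenders_without_card(100)
```

Only customer ids cross the boundary between the two systems, because the aggregation is delegated to the warehouse that holds the purchases.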
28. Multiple DW integration
[Diagram: Finance EDW and Marketing EDW. Labels on the slide: Time Dimension, Sales fact, Product Dimension, Region, City, Customer, Fidelity facts, Product Dimension, Store]
*Real Examples: Nationwide POC, IBM tests
29. DW Historical Partitioning
Horizontal partitioning
Only the most current data (e.g. last year) is in the EDW. Historical data is offloaded to a Hadoop cluster
Motivations
Reduce storage cost
Transparently use the two datasets as if they were all together
Typical queries
Facts are defined as a partitioned UNION based on date
Queries join the “virtual fact” with dimensions and aggregate on top
Example
Queries on current date only need to go to the DW, but longer timespans need to merge with Hadoop
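A partitioned UNION with branch pruning might look like the sketch below. The cutoff date, the in-memory lists standing in for the EDW and the Hadoop archive, and the function names are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical cutoff: rows on or after this date live in the EDW,
# older rows live in the Hadoop archive.
CUTOFF = date(2016, 1, 1)

EDW_SALES = [(date(2016, 3, 1), 100.0), (date(2016, 5, 9), 50.0)]
HADOOP_SALES = [(date(2014, 7, 4), 70.0), (date(2015, 12, 31), 30.0)]

def branches_for(start, end):
    """Return the UNION branches whose date range overlaps [start, end]."""
    branches = []
    if start < CUTOFF:
        branches.append(("hadoop", HADOOP_SALES))
    if end >= CUTOFF:
        branches.append(("edw", EDW_SALES))
    return branches

def total_sales(start, end):
    """Sum sales in [start, end], reading only the partitions that can match."""
    used = branches_for(start, end)
    total = sum(
        amount
        for _, rows in used
        for day, amount in rows
        if start <= day <= end
    )
    return total, [name for name, _ in used]

recent = total_sales(date(2016, 1, 1), date(2016, 12, 31))
full = total_sales(date(2015, 1, 1), date(2016, 12, 31))
```

A query restricted to recent dates touches only the EDW branch, while a longer timespan merges both partitions.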
31. Slim DW extension
Vertical partitioning
Minimal DW, with more complete raw data in a Hadoop cluster
Motivation
Reduce cost
Transparently use the two datasets as if they were all together
Typical queries
Tables are defined virtually as 1-to-1 joins between the two systems
Queries join the facts with dimensions and aggregate on top
Example
Common queries only need to go to the DW, but some queries need attributes or measures from Hadoop
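The 1-to-1 join, and the case where it can be skipped, can be sketched like this. The dictionaries standing in for the DW and the Hadoop cluster, and the column names, are hypothetical.

```python
# Hypothetical layout: the DW keeps a slim row per sale, the Hadoop cluster
# keeps the remaining attributes under the same key.
DW_SALES = {1: {"amount": 100.0}, 2: {"amount": 50.0}}
HADOOP_SALES = {1: {"coupon_code": "SPRING"}, 2: {"coupon_code": None}}

DW_COLUMNS = {"amount"}

def query_sales(columns):
    """Return rows with the requested columns, touching Hadoop only if needed."""
    needs_hadoop = not set(columns) <= DW_COLUMNS
    sources = ["dw"] + (["hadoop"] if needs_hadoop else [])
    rows = []
    for sale_id, dw_row in sorted(DW_SALES.items()):
        merged = dict(dw_row)
        if needs_hadoop:
            # 1-to-1 join on the shared key.
            merged.update(HADOOP_SALES[sale_id])
        rows.append({"id": sale_id, **{c: merged[c] for c in columns}})
    return rows, sources

common_rows, common_sources = query_sales(["amount"])
extended_rows, extended_sources = query_sales(["amount", "coupon_code"])
```

When a query asks only for columns held in the DW, the join branch to Hadoop is never executed.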
34. It is a common assumption that a virtualized solution will be much slower than a persisted approach via ETL:
1. There is a large amount of data moved through the network for each query
2. Network transfer is slow
But is this really true?
35. Debunking the myths of virtual performance
1. Complex queries can be solved transferring moderate data volumes when the right techniques are applied
Operational queries
Predicate delegation produces small result sets
Logical Data Warehouse and Big Data
Denodo uses characteristics of underlying star schemas to apply query rewriting rules that maximize delegation to specialized sources (especially heavy GROUP BY) and minimize data movement
2. Current networks are almost as fast as reading from disk
10 Gb and 100 Gb Ethernet are a commodity
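A back-of-the-envelope calculation shows why the volume of data moved matters more than the link itself. The figures are assumptions chosen for illustration: 100 bytes per row and an ideal 10 Gb/s link with no latency or protocol overhead.

```python
def transfer_seconds(rows, bytes_per_row, link_gbps):
    """Idealised transfer time: payload bits divided by link bandwidth."""
    bits = rows * bytes_per_row * 8
    return bits / (link_gbps * 1e9)

# Moving a full 1-billion-row fact table versus a 10,000-row aggregated result.
full_scan = transfer_seconds(1_000_000_000, 100, 10)
aggregated = transfer_seconds(10_000, 100, 10)
```

Under these assumptions the unaggregated table needs about 80 seconds on the wire, while the aggregated result needs under a millisecond.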
36. Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
Denodo has done extensive testing using queries from the standard benchmarking test TPC-DS* and the following scenario.
Compares the performance of a federated approach in Denodo with an MPP system where all the data has been replicated via ETL.
[Diagram: the same three tables held in federated sources vs. in a single MPP system]
Customer Dim.: 2 M rows
Sales Facts: 290 M rows
Items Dim.: 400 K rows
* TPC-DS is the de-facto industry standard benchmark for measuring the performance of decision support solutions including, but not limited to, Big Data systems.
37. Performance Comparison
Logical Data Warehouse vs. Physical Data Warehouse
| Query Description | Returned Rows | Time Netezza | Time Denodo (Federated Oracle, Netezza & SQL Server) | Optimization Technique (automatically selected) |
|---|---|---|---|---|
| Total sales by customer | 1.99 M | 20.9 sec. | 21.4 sec. | Full aggregation push-down |
| Total sales by customer and year between 2000 and 2004 | 5.51 M | 52.3 sec. | 59.0 sec. | Full aggregation push-down |
| Total sales by item brand | 31.35 K | 4.7 sec. | 5.0 sec. | Partial aggregation push-down |
| Total sales by item where sale price less than current list price | 17.05 K | 3.5 sec. | 5.2 sec. | On the fly data movement |
38. Performance and optimizations in Denodo
Focused on 3 core concepts
Dynamic Multi-Source Query Execution Plans
Leverages processing power & architecture of data sources
Dynamic to support ad hoc queries
Uses statistics for cost-based query plans
Selective Materialization
Intelligent caching of only the most relevant and often used information
Optimized Resource Management
Smart allocation of resources to handle high concurrency
Throttling to control and mitigate source impact
Resource plans based on rules
39. Performance and optimizations in Denodo
Comparing optimizations in DV vs ETL
Although Data Virtualization is a data integration platform, architecturally speaking it is more similar to an RDBMS
Uses relational logic
Metadata is equivalent to that of a database
Enables ad hoc querying
Key difference between ETL engines and DV:
ETL engines are optimized for static bulk movements
Fixed data flows
Data virtualization is optimized for queries
Dynamic execution plan per query
Therefore, the performance architecture presented here resembles that of an RDBMS
41. How Dynamic Query Optimizer Works
Step by Step
Metadata Query Tree
• Maps query entities (tables, fields) to actual metadata
• Retrieves execution capabilities and restrictions for views involved in the query
Static Optimizer
• Query delegation
• SQL rewriting rules (removal of redundant filters, tree pruning, join reordering, transformation push-up, star-schema rewritings, etc.)
• Data movement query plans
Cost Based Optimizer
• Picks optimal JOIN methods and orders based on data distribution statistics, indexes, transfer rates, etc.
Physical Execution Plan
• Creates the calls to the underlying systems in their corresponding protocols and dialects (SQL, MDX, WS calls, etc.)
42. How Dynamic Query Optimizer Works
Example: Total sales by retailer and product during the last month for the brand ACME
[Diagram: EDW star schema (sales fact table with Time, Product and Retailer dimensions) and MDM]
    SELECT retailer.name,
           product.name,
           SUM(sales.amount)
    FROM sales
      JOIN retailer ON sales.retailer_fk = retailer.id
      JOIN product ON sales.product_fk = product.id
      JOIN time ON sales.time_fk = time.id
    WHERE time.date >= ADDMONTH(NOW(), -1)
      AND product.brand = 'ACME'
    GROUP BY product.name, retailer.name
43. How Dynamic Query Optimizer Works
Example: Non-optimized
[Execution plan: each table is queried separately; the three JOINs and the GROUP BY product.name, retailer.name run in the virtual layer. The JOINs produce 10,000,000 rows.]
Query on sales (1,000,000,000 rows):
    SELECT sales.retailer_fk, sales.product_fk, sales.time_fk, sales.amount
    FROM sales
Query on retailer (100 rows):
    SELECT retailer.name, retailer.id
    FROM retailer
Query on product (10 rows):
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
Query on time (30 rows):
    SELECT time.date, time.id
    FROM time
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
44. How Dynamic Query Optimizer Works
Step 1: Applies JOIN reordering to maximize delegation
[Execution plan: the JOIN between sales and time is delegated to the source; the two remaining JOINs and the GROUP BY product.name, retailer.name run in the virtual layer. The JOINs produce 10,000,000 rows.]
Query on sales joined with time (100,000,000 rows):
    SELECT sales.retailer_fk, sales.product_fk, sales.amount
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
Query on retailer (100 rows):
    SELECT retailer.name, retailer.id
    FROM retailer
Query on product (10 rows):
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
45. How Dynamic Query Optimizer Works
Step 2
Since the JOIN is on foreign keys (1-to-many), and the GROUP BY is on attributes from the dimensions, it applies the partial aggregation push down optimization
[Execution plan: the partially aggregated sales query is delegated to the source; the two remaining JOINs and the final GROUP BY product.name, retailer.name run in the virtual layer. The JOINs produce 1,000 rows.]
Query on sales joined with time, aggregated by foreign key (10,000 rows):
    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
    GROUP BY sales.retailer_fk, sales.product_fk
Query on retailer (100 rows):
    SELECT retailer.name, retailer.id
    FROM retailer
Query on product (10 rows):
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
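Why partial aggregation push-down is safe can be checked on a toy data set. The sketch below is not Denodo's implementation: an in-memory SQLite database stands in for the DW, a Python dictionary stands in for the product dimension held elsewhere, and all names and data are invented.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sources: the fact table lives in the "DW", products elsewhere.
dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE sales (product_fk INTEGER, amount REAL);
INSERT INTO sales VALUES (1, 10.0), (1, 15.0), (2, 4.0), (3, 1.0), (3, 2.0);
""")
# Two product ids share the name "Anvil", so a final aggregation is still needed.
PRODUCTS = {1: "Anvil", 2: "Rocket", 3: "Anvil"}

def naive_totals():
    """Pull every fact row, then join and aggregate in the virtual layer."""
    rows = dw.execute("SELECT product_fk, amount FROM sales").fetchall()
    totals = defaultdict(float)
    for product_fk, amount in rows:
        totals[PRODUCTS[product_fk]] += amount
    return dict(totals), len(rows)

def pushed_down_totals():
    """Aggregate by foreign key in the DW, then join and finish the aggregation."""
    rows = dw.execute(
        "SELECT product_fk, SUM(amount) FROM sales GROUP BY product_fk"
    ).fetchall()
    totals = defaultdict(float)
    for product_fk, partial_sum in rows:
        totals[PRODUCTS[product_fk]] += partial_sum
    return dict(totals), len(rows)

naive, naive_rows = naive_totals()
pushed, pushed_rows = pushed_down_totals()
```

Both plans return the same totals, but the pushed-down plan transfers one row per foreign key instead of one row per sale.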
46. How Dynamic Query Optimizer Works
Step 3
Selects the right JOIN strategy based on costs for data volume estimations
[Execution plan: a NESTED JOIN and a HASH JOIN, followed by the GROUP BY product.name, retailer.name, run in the virtual layer. The ids returned by the product query are pushed into the sales query as an IN filter. The JOINs produce 1,000 rows.]
Query on sales joined with time, aggregated by foreign key (1,000 rows):
    SELECT sales.retailer_fk, sales.product_fk, SUM(sales.amount)
    FROM sales JOIN time ON sales.time_fk = time.id
    WHERE time.date >= add_months(CURRENT_TIMESTAMP, -1)
      AND sales.product_fk IN (1,2,…)
    GROUP BY sales.retailer_fk, sales.product_fk
Query on retailer (100 rows):
    SELECT retailer.name, retailer.id
    FROM retailer
Query on product (10 rows):
    SELECT product.name, product.id
    FROM product
    WHERE product.brand = 'ACME'
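The two join strategies can be sketched as follows. The threshold rule in `choose_join` is a toy assumption, not Denodo's cost model, and every function name here is hypothetical.

```python
def choose_join(small_side_rows, in_list_limit=1000):
    """Toy cost rule (an assumption, not Denodo's cost model): use a nested join
    when the smaller side fits in an IN list, otherwise a hash join."""
    return "nested" if small_side_rows <= in_list_limit else "hash"

def nested_join_filter(column, keys):
    """Build the filter that a nested join would push to the other source."""
    placeholders = ", ".join("?" for _ in keys)
    return f"{column} IN ({placeholders})", list(keys)

def hash_join(left_rows, right_rows, left_key, right_key):
    """Classic hash join: index the right input, then probe it with the left."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[left_key], [])
    ]

strategy = choose_join(10)
clause, params = nested_join_filter("sales.product_fk", [1, 2])
joined = hash_join(
    [{"retailer_fk": 7, "total": 12.5}],
    [{"id": 7, "name": "North Store"}],
    "retailer_fk",
    "id",
)
```

A nested join reduces the rows fetched from the other source by filtering on the keys already retrieved, whereas a hash join fetches both inputs in full and matches them in memory.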
47. How Dynamic Query Optimizer Works
1. Automatic JOIN reordering
Groups branches that go to the same source to maximize query delegation and reduce processing in the DV layer
End users don’t need to worry about the optimal “pairing” of the tables
2. The Partial Aggregation push-down optimization is key in those scenarios. Based on PK-FK restrictions, it pushes the aggregation (for the PKs) to the DW
Leverages the processing power of the DW, optimized for these aggregations
Significantly reduces the data transferred through the network (from 1 billion rows to 10,000)
3. The Cost-based Optimizer picks the right JOIN strategies based on estimations of data volumes, existence of indexes, transfer rates, etc.
Denodo estimates costs differently for parallel databases (Vertica, Netezza, Teradata) than for regular databases, to account for the different way those systems operate (distributed data, parallel processing, different aggregation techniques, etc.)
Summary
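The first point of the summary, grouping branches that go to the same source, can be pictured with a few lines of Python. This is a deliberately simplified sketch: the source names are hypothetical, and a real optimizer would also have to check that the grouped tables can legally be joined together in the rewritten plan.

```python
from collections import defaultdict


def group_branches_by_source(table_to_source):
    """Group tables by the source that holds them, preserving input order."""
    groups = defaultdict(list)
    for table, source in table_to_source.items():
        groups[source].append(table)
    return dict(groups)


# Hypothetical source names; the tables are the ones used in the example query.
catalog = {
    "sales": "dw",
    "product": "product_db",
    "time": "dw",
    "retailer": "retailer_db",
}
delegation_units = group_branches_by_source(catalog)
print(delegation_units)
```

Under these assumptions `sales` and `time` end up in the same group, which matches the example query where their JOIN is sent to the source as a single statement.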
48. How Dynamic Query Optimizer Works
Automatic data movement
Creation of temp tables in one of the systems to enable complete delegation
Only considered as an option if the target source has the “data movement” option enabled
Use of native bulk load APIs for better performance
Execution Alternatives
If a view exists in more than one system, Denodo can decide at execution time which one to use
The goal is to maximize query delegation depending on the other tables involved in the query
Other relevant optimization techniques for LDW and Big Data
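The data movement idea can be sketched with two in-memory SQLite databases standing in for two separate sources. Everything here is illustrative: `move_table` is a hypothetical helper, the data is invented, and `executemany` is only a stand-in for the native bulk load APIs mentioned on the slide. The small table is copied into a temporary table next to the large one, so the whole JOIN and aggregation can run inside that source.

```python
import sqlite3

source_a = sqlite3.connect(":memory:")  # holds the small dimension table
source_b = sqlite3.connect(":memory:")  # holds the large fact table

source_a.executescript("""
CREATE TABLE product (id INTEGER, name TEXT);
INSERT INTO product VALUES (1, 'anvil'), (2, 'rocket');
""")
source_b.executescript("""
CREATE TABLE sales (product_fk INTEGER, amount REAL);
INSERT INTO sales VALUES (1, 10), (1, 5), (2, 7);
""")


def move_table(src, dst, table, columns):
    """Copy a small table from src into a temporary table on dst."""
    column_list = ", ".join(columns)
    rows = src.execute(f"SELECT {column_list} FROM {table}").fetchall()
    dst.execute(f"CREATE TEMP TABLE {table} ({column_list})")
    placeholders = ", ".join("?" for _ in columns)
    dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


move_table(source_a, source_b, "product", ["id", "name"])

# With both tables in one place, the full query is delegated to source_b.
totals = source_b.execute("""
SELECT p.name, SUM(s.amount)
FROM sales s JOIN product p ON s.product_fk = p.id
GROUP BY p.name
ORDER BY p.name
""").fetchall()
print(totals)
```

The trade-off is the cost of copying the small table against the cost of pulling the large one across the network, which is why it only pays off when one side is much smaller.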
49. How Dynamic Query Optimizer Works
Optimizations for Virtual Partitioning
Eliminates unnecessary queries and processing based on a pre-execution analysis of the views and the queries
Pruning of unnecessary JOIN branches
Relevant for vertical partitioning and “fat” semantic models when queries do not need attributes from all the tables
Pruning of unnecessary UNION branches
Enables detection of unnecessary UNION branches in horizontal partitioning scenarios
Push down of JOIN under UNION views
Enables the delegation of JOINs with dimensions
Automatic Data movement for partition scenarios
Enables the delegation of JOINs with dimensions
Other relevant optimization techniques for LDW and Big Data
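Pruning of unnecessary UNION branches can be illustrated with a small pre-execution check. The sketch assumes each branch of a partitioned UNION view declares the range of years it holds. The `Partition` class, the source names and the year ranges are all hypothetical. A query is then sent only to the branches whose range overlaps the query's filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    source: str
    min_year: int
    max_year: int  # inclusive


def prune_union_branches(partitions, year_from, year_to):
    """Keep only the branches whose year range overlaps [year_from, year_to]."""
    return [p for p in partitions
            if p.max_year >= year_from and p.min_year <= year_to]


# Hypothetical layout: historical data in one system, recent data in another.
partitions = [
    Partition("hadoop_archive", 2000, 2014),
    Partition("edw_current", 2015, 2016),
]

needed = prune_union_branches(partitions, 2015, 2016)
print([p.source for p in needed])
```

With a filter on 2015 to 2016 only the `edw_current` branch is queried; a filter that spans 2014 and 2015 would keep both.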
51. Caching
Sometimes real-time access and federation are not a good fit:
Sources are slow (e.g. text files, cloud apps like Salesforce.com)
A lot of data processing is needed (e.g. complex combinations, transformations, matching, cleansing, etc.)
Access to the sources is limited, or the impact on them has to be mitigated
For these scenarios, Denodo can replicate just the relevant data in the cache
Real time vs. caching
52. Caching
Denodo’s cache system is based on an external relational database
Traditional (Oracle, SQL Server, DB2, MySQL, etc.)
MPP (Teradata, Netezza, Vertica, Redshift, etc.)
In-memory storage (Oracle TimesTen, SAP HANA)
Works at view level.
Allows hybrid access (real-time / cached) of an execution tree
Cache Control (population / maintenance)
Manually – user initiated at any time
Time based - using the TTL or the Denodo Scheduler
Event based - e.g. using JMS messages triggered in the DB
Overview
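A minimal sketch of view-level caching with time-based (TTL) and manual control, in plain Python. It is an illustration only: the slide states that Denodo's cache lives in an external relational database, whereas this toy keeps rows in a dictionary, and the `ViewCache` class and view name are invented. The clock is injectable so the expiry logic can be exercised without waiting.

```python
import time


class ViewCache:
    """Toy per-view cache with a time-to-live; an assumption, not Denodo's design."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # view name -> (load time, rows)

    def get(self, view_name, load_from_source):
        entry = self._entries.get(view_name)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]          # still fresh: serve from the cache
        rows = load_from_source()    # missing or expired: go to the source
        self._entries[view_name] = (now, rows)
        return rows

    def invalidate(self, view_name):
        """Manual, user-initiated refresh: drop the cached rows."""
        self._entries.pop(view_name, None)


calls = []  # counts how often the slow source is actually hit


def slow_source():
    calls.append(1)
    return [("acct-1", 100)]


now = [0.0]  # fake clock so the example runs instantly
cache = ViewCache(ttl_seconds=60, clock=lambda: now[0])

cache.get("crm_accounts", slow_source)   # miss: loads from the source
cache.get("crm_accounts", slow_source)   # hit: served from the cache
now[0] = 61.0
cache.get("crm_accounts", slow_source)   # TTL expired: reloads
print(len(calls))
```

Views that are not registered in the cache would simply keep going to their sources, which is the hybrid real-time / cached access the slide refers to.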
54. Further Reading
Data Virtualization Blog (http://www.datavirtualizationblog.com)
Check the following articles by our CTO, Alberto Pan, on our blog:
• Myths in data virtualization performance
• Performance of Data Virtualization in Logical Data Warehouse scenarios
• Physical vs Logical Data Warehouse: the numbers
• Cost Based Optimization in Data Virtualization
Denodo Cookbook
• Data Warehouse Offloading
56. Autodesk Overview
• Founded 1982 (NASDAQ: ADSK)
• Annual revenues (FY 2015) $2.5B
Over 8,800 employees
• 3D modeling and animation software
Flagship product is AutoCAD
• Market sectors:
Architecture, Engineering, and Construction
Manufacturing
Media and Entertainment
Recently started 3D Printing offerings
57. Business Drivers for Change
• Software consumption model is changing
Perpetual licenses to subscriptions
Users want more flexibility in how they use software
• Autodesk needed to transition to subscription pricing
2016 – some products will be subscription only
• Lifetime revenue higher with subscriptions
Over 3-5 years, subscriptions = more revenues
• Changing a licensing model is disruptive
58. Technology Challenges
• Current ‘traditional’ BI/EDW architecture not
designed for data streams from online apps
Weblogs, Clickstreams, Cloud/Desktop apps, etc.
• Existing infrastructure can’t simply ‘go away’
Regulatory reporting (e.g. SEC)
Existing ‘perpetual’ customers
• ‘Subscription’ infrastructure must work in parallel
Extend and enhance existing systems
With single access point to all data
• Solution – ‘Logical Data Warehouse’
63. Case Study: Autodesk Successfully Changes Their Revenue Model and Transforms Business
Problem, Solution, Results
• Autodesk was changing their business revenue model from a conventional perpetual license model to subscription-based license model.
• Inability to deliver high quality data in a timely manner to business stakeholders.
• Evolution from traditional operational data warehouse to contemporary logical data warehouse deemed necessary for faster speed.
• General purpose platform to deliver data through logical data warehouse.
• Denodo Abstraction Layer helps live invoicing with SAP.
• Data virtualization enabled a culture of “see before you build”.
• Successfully transitioned to subscription-based licensing.
• For the first time, Autodesk can do single point security enforcement and have uniform data environment for access.
Autodesk, Inc. is an American multinational software corporation that makes software for the
architecture, engineering, construction, manufacturing, media, and entertainment industries.