The document provides an overview of data warehousing concepts including:
1) The differences between operational systems and data warehouses in terms of purpose, design, and usage.
2) Common data warehousing approaches including top-down and bottom-up, and their characteristics.
3) Key elements of a data warehousing technical architecture including the staging area, data warehouse, and data marts.
Cheetah is a custom data warehouse system built on top of Hadoop that provides high performance for storing and querying large datasets. It uses a virtual view abstraction over star and snowflake schemas to provide a simple yet powerful SQL-like query language. The system architecture utilizes MapReduce to parallelize query execution across many nodes. Cheetah employs columnar data storage and compression, multi-query optimization, and materialized views to improve query performance. Based on evaluations, Cheetah can efficiently handle both small and large queries and outperforms single-query execution when processing batches of queries together.
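Though Cheetah's own code is not reproduced in the summary, the MapReduce decomposition it relies on is easy to sketch. Below is a minimal pure-Python illustration (invented rows and column names, not Cheetah's implementation) of how a SQL-style GROUP BY aggregation splits into a map phase and a reduce phase:

```python
from collections import defaultdict

# Toy fact rows standing in for a star-schema fact table.
rows = [
    {"region": "EU", "sales": 120.0},
    {"region": "US", "sales": 80.0},
    {"region": "EU", "sales": 45.5},
]

def map_phase(row):
    # Emit a (group_key, value) pair per input row; Hadoop runs this in parallel.
    return (row["region"], row["sales"])

def reduce_phase(pairs):
    # Sum values per key, like SELECT region, SUM(sales) ... GROUP BY region.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

print(reduce_phase(map_phase(r) for r in rows))  # {'EU': 165.5, 'US': 80.0}
```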
The document discusses whether a company needs a data lake. It describes the customer's current and desired data warehouse situation, including a daily delta load requirement and a need for streaming and external data. It then covers data warehouse architectures, big data technologies, and reference architectures that combine a data warehouse with a data lake. While a data lake provides benefits like flexibility, streaming, and accessing more data sources and years of data, it also introduces costs, complexity, and new skills requirements.
This document summarizes a presentation by Kevin Kline on strategies for addressing common SQL Server challenges. The presentation covered topics such as tuning disk I/O, managing very large databases, and an overview of Quest software solutions for SQL Server monitoring and performance. Key points included strategies for tiered storage, partitioning very large databases, monitoring disk queue lengths and page reads/writes in SQL Server.
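To make the monitoring theme concrete, here is a hedged sketch that polls SQL Server's sys.dm_os_performance_counters view for buffer-manager page I/O; the connection string is a placeholder and this illustrates the idea, not code from the presentation:

```python
import pyodbc

# Connection details are placeholders; adjust for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=master;Trusted_Connection=yes;"
)
sql = """
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name IN ('Page reads/sec', 'Page writes/sec')
"""
for name, value in conn.cursor().execute(sql):
    # These counters are cumulative, so sample twice and diff to get a rate.
    print(name.strip(), value)
```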
This document provides a sector roadmap for cloud analytic databases in 2017. It discusses key topics such as usage scenarios, disruption vectors, and an analysis of companies in the sector. Some main points:
- Cloud databases can now be considered the default option for most selections in 2017 due to economics and functionality.
- Several newer cloud-native offerings have been able to leapfrog more established databases through tight integration of cloud features like elasticity and separation of compute and storage.
- While traditional database functionality is still required, cloud dynamics are causing needs for capabilities like robust SQL support, diverse data support, and dynamic environment adaptation.
- Vendor solutions are evaluated on disruption vectors including SQL support, optimization, elasticity, and environment adaptation.
Balance agility and governance with #TrueDataOps and The Data Cloud (Kent Graziano)
DataOps is the application of DevOps concepts to data. The DataOps Manifesto outlines WHAT that means, similar to how the Agile Manifesto outlines the goals of the Agile Software movement. But, as the demand for data governance has increased, and the demand to do "more with less" and be more agile has put more pressure on data teams, we all need more guidance on HOW to manage all this. Seeing that need, a small group of industry thought leaders and practitioners got together and created the #TrueDataOps philosophy to describe the best way to deliver DataOps by defining the core pillars that must underpin a successful approach. Combining this approach with an agile and governed platform like Snowflake's Data Cloud allows organizations to indeed balance these seemingly competing goals while still delivering value at scale.
Given in Montreal on 14-Dec-2021
The document discusses Cassandra and how it is used by various companies for applications requiring scalability, high performance, and reliability. It summarizes Cassandra's capabilities and how companies like Netflix, Backupify, Ooyala, and Formspring have used Cassandra to handle large and increasing amounts of data and queries in a scalable and cost-effective manner. The document also describes DataStax's commercial offerings around Apache Cassandra including support, tools, and services.
This document discusses building an integrated data warehouse with Oracle Database and Hadoop. It describes why a data warehouse may need Hadoop to handle big data from sources like social media, sensors and logs. Examples are given of using Hadoop for ETL and analytics. The presentation provides an overview of Hadoop and how to connect it to the data warehouse using tools like Sqoop and external tables. It also offers tips on getting started and avoiding common pitfalls.
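For a flavor of the Sqoop connectivity mentioned above, the sketch below shells out to a sqoop import against an Oracle source; hostnames, credentials, and table names are placeholders, and the flags should be verified against your Sqoop version:

```python
import subprocess

# Pull an Oracle table into HDFS so Hadoop jobs can process it.
# All connection values below are illustrative placeholders.
subprocess.run(
    [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@dbhost:1521:ORCL",
        "--username", "dw_user",
        "--password-file", "/user/dw_user/.sqoop_pw",
        "--table", "SALES",
        "--target-dir", "/warehouse/staging/sales",
        "--num-mappers", "4",
    ],
    check=True,
)
```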
Snowflake is a cloud-based data warehouse system that allows enterprises to store and analyze both structured and semi-structured data. It creates separate virtual warehouses for different workloads so they do not compete for computing resources and can easily scale up or down. Snowflake has grown exponentially since being founded in 2012, reaching a $3.5 billion valuation in October 2018. It sells data warehousing services using a pay-as-you-use business model.
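The workload isolation described above maps to ordinary Snowflake SQL. Here is a minimal sketch using the snowflake-connector-python package (credentials and warehouse names are placeholders, not from the document):

```python
import snowflake.connector

# Credentials are placeholders.
conn = snowflake.connector.connect(
    user="me", password="***", account="myorg-myaccount"
)
cur = conn.cursor()

# One warehouse per workload, so ETL and BI never contend for compute.
for name, size in [("ETL_WH", "LARGE"), ("BI_WH", "XSMALL")]:
    cur.execute(
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WAREHOUSE_SIZE = '{size}' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
    )

# Scaling up is a one-line ALTER and takes effect without downtime.
cur.execute("ALTER WAREHOUSE BI_WH SET WAREHOUSE_SIZE = 'MEDIUM'")
```

Because each warehouse has its own compute, resizing BI_WH affects neither running ETL jobs nor the stored data.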
Data mesh is a decentralized approach to managing and accessing analytical data at scale. It distributes responsibility for data pipelines and quality to domain experts. The key principles are domain-centric ownership, treating data as a product, and using a common self-service infrastructure platform. Snowflake is well-suited for implementing a data mesh with its capabilities for sharing data and functions securely across accounts and clouds, with built-in governance and a data marketplace for discovery. A data mesh implemented on Snowflake's data cloud can support truly global and multi-cloud data sharing and management according to data mesh principles.
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011 (Cloudera, Inc.)
- Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware.
- Cloudera's Distribution including Apache Hadoop (CDH) is an enterprise-grade Hadoop distribution that includes additional components for management, security, and integration with existing systems.
- CDH enables enterprises to leverage Hadoop for data agility, consolidation of structured and unstructured data sources, complex data processing using various programming languages, and economical storage of data regardless of type or size.
1) The document discusses big data strategies and technologies including Oracle's big data solutions. It describes Oracle's big data appliance which is an integrated hardware and software platform for running Apache Hadoop.
2) Key technologies that enable deeper analytics on big data are discussed including advanced analytics, data mining, text mining and Oracle R. Use cases are provided in industries like insurance, travel and gaming.
3) An example use case of a "smart mall" is described where customer profiles and purchase data are analyzed in real-time to deliver personalized offers. The technology pattern for implementing such a use case with Oracle's real-time decisions and big data platform is outlined.
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I'll cover all of them in detail and compare the pros and cons of each. I'll include use cases so you can see what approach will work best for your big data needs.
Enabling a Data Mesh Architecture with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called "data products," to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slow nature of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for "data products" in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Federated data architecture involves integrating data from multiple disparate sources to provide a logically integrated view. It allows existing systems to continue operating while being modernized. The US Air Force implemented a federated data solution to manage its $40 billion budget across 100 global locations. It integrated financial data from over 20 legacy systems and provided 15,000 users with real-time access and ad hoc querying capabilities while maintaining high performance.
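In miniature, federation means answering one query from several live systems rather than physically consolidating them. A toy sketch, assuming two SQLite files standing in for legacy systems and an invented budget table:

```python
import sqlite3

# Two legacy systems, each still owning its own data.
sources = ["finance_east.db", "finance_west.db"]

def federated_budget_total(fiscal_year: int) -> float:
    """Logically integrated view: fan the query out, combine the results."""
    total = 0.0
    for db in sources:
        conn = sqlite3.connect(db)
        (amount,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM budget WHERE fy = ?",
            (fiscal_year,),
        ).fetchone()
        total += amount
        conn.close()
    return total

print(federated_budget_total(2024))
```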
Barbara Zigman has over 25 years of experience in telecommunications management positions involving business development, sales, marketing, and product management. She has worked for several service providers and has led teams supporting the sale of complex technical products and services. Her technical expertise includes fiber networks, TDM networks, IP networking, PBX/VoIP systems, and wireless technologies.
Data Science Operationalization: The Journey of Enterprise AI (Denodo)
Watch full webinar here: https://bit.ly/3kVmYJl
As we move into a world driven by AI initiatives, we find ourselves facing new and diverse challenges when it comes to operationalization. Creating a solution and putting it into practice are certainly not the same. The challenges span various organizational and data facets. In many instances, the data scientists may be working in silos, and connecting to the live data may not always be possible. But how does one guarantee that a model developed in a silo is still relevant to live data? How can we manage the data flow and data access across the entire AI operationalization cycle?
Watch on-demand to explore:
- The journey and challenges of the Data Scientist
- How Denodo data virtualization with data movement streamlines operationalization
- The best practices and techniques when dealing with siloed data
- How customers have used data virtualization in their data science initiatives
This presentation explains the Integrator's Dilemma and how the SnapLogic Integration Cloud can help.
To learn more, visit: http://www.snaplogic.com/.
Snowflake: The Good, the Bad, and the Ugly (Tyler Wishnoff)
Learn how to solve the top 3 challenges Snowflake customers face, and what you can do to ensure high-performance, intelligent analytics at any scale. Ideal for those currently using Snowflake and those considering it. Learn more at: https://kyligence.io/
Govern and Protect Your End User Information (Denodo)
Watch this Fast Data Strategy session with speakers Clinton Cohagan, Chief Enterprise Data Architect, Lawrence Livermore National Lab & Nageswar Cherukupalli, Vice President & Group Manager, Infosys here: https://buff.ly/2k8f8M5
In its recent report "Predictions 2018: A year of reckoning", Forrester predicts that 80% of firms affected by GDPR will not comply with the regulation by May 2018. Of those noncompliant firms, 50% will intentionally not comply.
Compliance doesn't have to be this difficult! What if you have an opportunity to facilitate compliance with a mature technology and significant cost reduction? Data virtualization is a mature, cost-effective technology that enables privacy by design to facilitate compliance.
Attend this session to learn:
• How data virtualization provides a compliance foundation with data catalog, auditing, and data security.
• How you can enable a single enterprise-wide data access layer with guardrails.
• Why data virtualization is a must-have capability for compliance use cases.
• How Denodo's customers have facilitated compliance.
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc... (Kent Graziano)
A good data model, done right the first time, can save you time and money. We have all seen the charts on the increasing cost of finding a mistake/bug/error late in a software development cycle. Would you like to reduce, or even eliminate, your risk of finding one of those errors late in the game? Of course you would! Who wouldn't? Nobody plans to miss a requirement or make a bad design decision (well nobody sane anyway). No data modeler or database designer worth their salt wants to leave a model incomplete or incorrect. So what can you do to minimize the risk?
In this talk I will show you a best practice approach to developing your data models and database designs that I have been using for over 15 years. It is a simple, repeatable process for reviewing your data models. It is one that even a non-modeler could follow. I will share my checklist of what to look for and what to ask the data modeler (or yourself) to make sure you get the best possible data model. As a bonus I will share how I use SQL Developer Data Modeler (a no-cost data modeling tool) to collect the information and report it.
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach (Kent Graziano)
This document discusses using Oracle Business Intelligence Enterprise Edition (OBIEE) and the Data Vault data modeling technique to virtualize a business intelligence environment in an agile way. Data Vault provides a flexible and adaptable modeling approach that allows for rapid changes. OBIEE allows for the virtualization of dimensional models built on a Data Vault foundation, enabling quick iteration and delivery of reports and dashboards to users. Together, Data Vault and OBIEE provide an agile approach to business intelligence.
This document provides an introduction and overview of implementing Data Vault 2.0 on Snowflake. It begins with an agenda and the presenter's background. It then discusses why customers are asking for Data Vault and provides an overview of the Data Vault methodology including its core components of hubs, links, and satellites. The document applies Snowflake features like separation of workloads and agile warehouse scaling to support Data Vault implementations. It also addresses modeling semi-structured data and building virtual information marts using views.
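To ground the hub/link/satellite vocabulary, here is a hypothetical sketch (Snowflake-flavored SQL kept in Python strings; all names invented) of a customer hub, one satellite, and the kind of view-based virtual dimension the document describes:

```python
# Snowflake-flavored SQL for a tiny Data Vault slice; run each statement
# with your cursor, e.g. for stmt in ddl: cur.execute(stmt)
ddl = [
    # Hub: one row per business key.
    """CREATE TABLE hub_customer (
         hub_customer_key CHAR(32) PRIMARY KEY,   -- hash of the business key
         customer_bk      VARCHAR(50) NOT NULL,
         load_date        TIMESTAMP   NOT NULL,
         record_source    VARCHAR(50) NOT NULL)""",
    # Satellite: descriptive attributes, historized by load_date.
    """CREATE TABLE sat_customer_details (
         hub_customer_key CHAR(32)  NOT NULL REFERENCES hub_customer,
         load_date        TIMESTAMP NOT NULL,
         name             VARCHAR(100),
         segment          VARCHAR(20),
         PRIMARY KEY (hub_customer_key, load_date))""",
    # Virtual information mart: latest satellite row exposed as a dimension view.
    """CREATE VIEW dim_customer AS
       SELECT h.customer_bk, s.name, s.segment
       FROM hub_customer h
       JOIN sat_customer_details s ON s.hub_customer_key = h.hub_customer_key
       QUALIFY ROW_NUMBER() OVER (
         PARTITION BY s.hub_customer_key ORDER BY s.load_date DESC) = 1""",
]
```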
The document discusses databases versus data warehousing. It notes that databases are for operational purposes like storage and retrieval for applications, while data warehouses are used for informational purposes like business reporting and analysis. A data warehouse contains integrated, subject-oriented data from multiple sources that is used to support management decisions.
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014 (Amazon Web Services)
This document discusses a platform called EzBake that was created to help a US government customer modernize their systems and better analyze large amounts of data. EzBake provides tools to easily develop and deploy applications, integrate and analyze data from various sources, and implement security controls. It improved the customer's ability to share data and applications across many teams and networks, decreased development times from 6-8 months to 3-4 weeks, and reduced costs while increasing capabilities.
Cloud Based Data Warehousing and Analytics (Seeling Cheung)
This document discusses Marriott International's journey to implementing a cloud-based data warehouse and analytics platform using IBM BigSQL on Softlayer cloud infrastructure. It describes the limitations of their existing on-premises system, challenges faced in migrating data and queries to the cloud, lessons learned, and next steps to further improve the platform. The system is now in production use by an initial group of users at Marriott.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra and CTO of Modulant; he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young's Center for Technology Enablement. Jeff is also the author of "Semantic Web for Dummies" and "Adaptive Information", a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley's Extension for object-oriented systems, software development process and enterprise architecture.
How to Take Advantage of an Enterprise Data Warehouse in the Cloud (Denodo)
Watch full webinar here: https://buff.ly/2CIOtys
As organizations collect increasing amounts of diverse data, integrating that data for analytics becomes more difficult. Technology that scales poorly and fails to support semi-structured data fails to meet the ever-increasing demands of today's enterprise. In short, companies everywhere can't consolidate their data into a single location for analytics.
In this Denodo DataFest 2018 session we'll cover:
- Bypassing the mandate of a single enterprise data warehouse
- Modern data sharing to easily connect different data types located in multiple repositories for deeper analytics
- How cloud data warehouses can scale both storage and compute, independently and elastically, to meet variable workloads
Presentation by Harsha Kapre, Snowflake
The document provides an overview of a course on data warehousing. It includes a roadmap that covers topics such as why data warehousing is used, the difference between operational systems and data warehouses, data warehouse approaches, data modeling concepts, and ETL products. It also defines key concepts like operational systems, data warehouses, and dimensional modeling. Specific techniques covered include entity-relationship modeling and dimensional modeling.
The document provides an overview of data warehousing concepts including:
1) A data warehouse is a subject-oriented collection of integrated data used to support management decisions. It contains current and historical data.
2) A data warehouse architecture typically includes source systems, a staging area, and presentation layer for querying and reporting.
3) Data marts are focused subsets of a data warehouse tailored for specific business units or departments. There are dependent, independent, and hybrid approaches to building data marts.
Building an Effective Data Warehouse Architecture (James Serra)
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
The document provides an overview of key concepts in data warehousing and business intelligence, including:
1) It defines data warehousing concepts such as the characteristics of a data warehouse (subject-oriented, integrated, time-variant, non-volatile), grain/granularity, and the differences between OLTP and data warehouse systems.
2) It discusses the evolution of business intelligence and key components of a data warehouse such as the source systems, staging area, presentation area, and access tools.
3) It covers dimensional modeling concepts like star schemas, snowflake schemas, and slowly and rapidly changing dimensions.
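Since slowly changing dimensions are listed above, a small illustration may help: the classic Type 2 pattern expires the current row and appends a new version instead of overwriting. A minimal sketch with invented rows:

```python
from datetime import date

# Current dimension rows; is_current marks the active version (SCD Type 2).
dim_customer = [
    {"customer_bk": "C1", "city": "Boston", "valid_from": date(2020, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_bk, new_city, as_of):
    """Expire the active row for the key and append a new version."""
    for row in dim:
        if row["customer_bk"] == customer_bk and row["is_current"]:
            if row["city"] == new_city:
                return  # no change, nothing to do
            row["valid_to"], row["is_current"] = as_of, False
    dim.append({"customer_bk": customer_bk, "city": new_city,
                "valid_from": as_of, "valid_to": None, "is_current": True})

apply_scd2(dim_customer, "C1", "Chicago", date(2024, 6, 1))
# dim_customer now holds both the expired Boston row and the current Chicago row.
```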
Informix warehouse and accelerator overview (Keshav Murthy)
This document provides an overview of Informix Warehouse and Informix Warehouse Accelerator. It discusses data warehousing industry trends, features of Informix Warehouse 11.70 including loading, storage optimization, and query processing capabilities. It also describes the Informix Warehouse Accelerator which uses columnar storage, compression and massive parallelism to accelerate select queries with unprecedented response times.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... (DATAVERSITY)
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We'll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it's a bad fit.
Drop the herd mentality. In reality, there is no "one size fits all" right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Which Change Data Capture Strategy is Right for You? (Precisely)
Change Data Capture or CDC is the practice of moving the changes made in an important transactional system to other systems, so that data is kept current and consistent across the enterprise. CDC keeps reporting and analytic systems working on the latest, most accurate data.
Many different CDC strategies exist. Each strategy has advantages and disadvantages. Some put an undue burden on the source database. They can cause queries or applications to become slow or even fail. Some bog down network bandwidth, or have big delays between change and replication.
Each business process has different requirements, as well. For some business needs, a replication delay of more than a second is too long. For others, a delay of less than 24 hours is excellent.
Which CDC strategy will match your business needs? How do you choose?
View this webcast on-demand to learn:
• Advantages and disadvantages of different CDC methods
• The replication latency your project requires
• How to keep data current in Big Data technologies like Hadoop
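As one deliberately simple example of a strategy such a comparison covers, below is a query-based CDC poll driven by a last-modified timestamp: cheap to build, but it burdens the source with repeated queries, misses deletes, and its replication latency equals the poll interval. Schema and table names are hypothetical:

```python
import time
import sqlite3  # stand-in for any source database

def poll_changes(conn, last_seen):
    """Query-based CDC: fetch rows modified since the previous poll."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? "
        "ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    return rows, (rows[-1][2] if rows else last_seen)

conn = sqlite3.connect("source.db")
watermark = "1970-01-01 00:00:00"
while True:
    changes, watermark = poll_changes(conn, watermark)
    for row in changes:
        print("replicate:", row)   # ship to the target system here
    time.sleep(5)                  # replication latency ≈ poll interval
```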
Imagine an entire IT infrastructure controlled not by hands and hardware, but by software. One in which application workloads such as big data, analytics, simulation and design are serviced automatically by the most appropriate resource, whether running locally or in the cloud. A Software Defined Infrastructure enables your organization to deliver IT services in the most efficient way possible, optimizing resource utilization to accelerate time to results and reduce costs. It is the foundation for a fully integrated software defined environment, optimizing your compute, storage and networking infrastructure so you can quickly adapt to changing business requirements. A comprehensive portfolio of management tools dynamically manage workloads and data, transforming a static IT infrastructure into a workload- , resource- and data-aware environment.
Learn more: http://ibm.co/1wkoXtc
Watch the video presentation: http://insidehpc.com/2015/03/slidecast-software-defined-infrastructure/
What is a Data Warehouse and How Do I Test It? (RTTS)
ETL Testing: A primer for Testers on Data Warehouses, ETL, Business Intelligence and how to test them.
Are you hearing and reading about Big Data, Enterprise Data Warehouses (EDW), the ETL Process and Business Intelligence (BI)? The software markets for EDW and BI are quickly approaching $22 billion, according to Gartner, and Big Data is growing at an exponential pace.
Are you being tasked to test these environments or would you like to learn about them and be prepared for when you are asked to test them?
RTTS, the Software Quality Experts, provided this groundbreaking webinar, based upon our many years of experience in providing software quality solutions for more than 400 companies.
You will learn the answer to the following questions:
• What is Big Data and what does it mean to me?
• What are the business reasons for building a Data Warehouse and for using Business Intelligence software?
• How do Data Warehouses, Business Intelligence tools and ETL work from a technical perspective?
• Who are the primary players in this software space?
• How do I test these environments?
• What tools should I use?
This slide deck is geared towards:
- QA Testers
- Data Architects
- Business Analysts
- ETL Developers
- Operations Teams
- Project Managers
...and anyone else who (a) is new to the EDW space, (b) wants to be educated on the business and technical sides, and (c) wants to understand how to test these environments.
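One staple technique behind the "how do I test these environments" question is source-to-target reconciliation. A hedged sketch (queries and cursors are placeholders you would supply) that compares row counts and a column checksum between the source system and the warehouse load:

```python
def reconcile(src_cur, tgt_cur):
    """Compare a row count and a simple SUM checksum across source and target."""
    checks = {
        "row_count": "SELECT COUNT(*) FROM orders",
        "amount_sum": "SELECT COALESCE(SUM(amount), 0) FROM orders",
    }
    failures = []
    for name, sql in checks.items():
        src = src_cur.execute(sql).fetchone()[0]
        tgt = tgt_cur.execute(sql).fetchone()[0]
        if src != tgt:
            failures.append(f"{name}: source={src} target={tgt}")
    return failures

# Usage (cursors come from your source and warehouse connections):
# problems = reconcile(oltp_cursor, dw_cursor)
# assert not problems, problems
```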
High Value Business Intelligence for IBM Platform compute environments (Gabor Samu)
IBM Platform Analytics is an advanced analysis and visualization tool for analyzing workload data from IBM Platform LSF and IBM Platform Symphony clusters. It allows organizations to correlate workload, resource and license data from multiple clusters for data-driven decision making.
A data warehouse is a pool of data structured to support decision making. It integrates data from multiple sources and is time-variant and nonvolatile. Data warehouses can take the form of enterprise data warehouses, used across an organization for decision support, or data marts designed for a specific department. The data warehousing process involves extracting data from sources, transforming and loading it into a comprehensive database, and using middleware tools and metadata. Real-time data warehousing allows for information-based decision making using up-to-date data.
The document discusses tips and strategies for using SAP NetWeaver Business Intelligence 7.0 as an enterprise data warehouse (EDW). It covers differences between evolutionary warehouse architecture and top-down design, compares data mart and EDW approaches, explores real-time data warehousing with SAP, examines common EDW pitfalls, and reviews successes and failures of large-scale SAP BI-EDW implementations. The presentation also explores the SAP NetWeaver BI architecture and Corporate Information Factory framework.
This document discusses enterprise data warehousing positioning based on the SAP Real-Time Data Platform. It describes the needs for different types of data marts and analytics in enterprise environments. Complex landscapes often include operational data marts, agile data marts, real-time data marts, and predictive data marts. The document also discusses the role of the enterprise data warehouse as a single point of truth and consolidating data across the enterprise. SAP addresses these needs with offerings like SAP BW, SAP HANA, and Sybase IQ which can be used for packaged or custom-built data warehouse and data mart solutions.
IBM's Big Data platform provides tools for managing and analyzing large volumes of data from various sources. It allows users to cost effectively store and process structured, unstructured, and streaming data. The platform includes products like Hadoop for storage, MapReduce for processing large datasets, and InfoSphere Streams for analyzing real-time streaming data. Business users can start with critical needs and expand their use of big data over time by leveraging different products within the IBM Big Data platform.
The document discusses operational analytics and its performance on Informix, including what operational analytics is, how it can be implemented on Informix, and performance analysis of Informix on Intel platforms. It provides an overview of operational analytics and its challenges, how it can leverage Informix for the complete lifecycle, and benchmarks showing Informix's scaling on Intel's Xeon platforms for operational analytics workloads.
IBM's Big Data platform provides tools for managing and analyzing large volumes of structured, unstructured, and streaming data. It includes Hadoop for storage and processing, InfoSphere Streams for real-time streaming analytics, InfoSphere BigInsights for analytics on data at rest, and PureData System for Analytics (formerly Netezza) for high performance data warehousing. The platform enables businesses to gain insights from all available data to capitalize on information resources and make data-driven decisions.
Engage for success ibm spectrum accelerate 2 (xKinAnx)
IBM Spectrum Accelerate is software that extends the capabilities of IBM's XIV storage system, such as consistent performance tuning-free, to new delivery models. It provides enterprise storage capabilities deployed in minutes instead of months. Spectrum Accelerate runs the proven XIV software on commodity x86 servers and storage, providing similar features and functions to an XIV system. It offers benefits like business agility, flexibility, simplified acquisition and deployment, and lower administration and training costs.
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris... (DATAVERSITY)
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here?
In this webinar, we look at this foundational technology for modern Data Management and show how it evolved to meet the workloads of today, as well as when other platforms make sense for enterprise data.
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod... (Hortonworks)
Many enterprises are turning to Apache Hadoop to enable Big Data Analytics and reduce the costs of traditional data warehousing. Yet, it is hard to succeed when 80% of the time is spent on moving data and only 20% on using it. It's time to swap the 80/20! The Big Data experts at Attunity and Hortonworks have a solution for accelerating data movement into and out of Hadoop that enables faster time-to-value for Big Data projects and a more complete and trusted view of your business. Join us to learn how this solution can work for you.
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage... (EMC)
The document discusses Pivotal's big data suite and business data lake offerings. It provides an overview of the components of a business data lake, including storage, ingestion, distillation, processing, unified data management, and action components. It also defines various data processing approaches like streaming, micro-batching, batch, and real-time response. The goal is to help organizations build analytics and transactional applications on big data to drive business insights and revenue.
Similar to Dwh basics datastage online training (20)
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution.
The narrative then shifts to a captivating exploration of prominent desktop OSs: Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape.
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Digital Marketing Trends in 2024 | Guide for Staying Ahead (Wask)
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Jeffrey Haguewood
Â
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
Â
An English đŹđ§ translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech đ¨đż version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Â
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Â
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Â
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Â
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Â
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Â
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind fĂźr viele in der HCL-Community seit letztem Jahr ein heiĂes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und LizenzgebĂźhren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer mĂśglich. Das verstehen wir und wir mĂśchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lÜsen kÜnnen, die dazu fßhren kÜnnen, dass mehr Benutzer gezählt werden als nÜtig, und wie Sie ßberflßssige oder ungenutzte Konten identifizieren und entfernen kÜnnen, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnÜtigen Ausgaben fßhren kÜnnen, z. B. wenn ein Personendokument anstelle eines Mail-Ins fßr geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren LÜsungen. Und natßrlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Ăberblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und ĂźberflĂźssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps fßr häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
Â
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of whatâs possible in finance.
In summary, DeFi in 2024 is not just a trend; itâs a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
4. IBM Software Group | WebSphere software
Course Roadmap
• Why we use data warehousing
• Difference between an operational system and a data warehouse
• Introduction to data warehousing
• Data warehousing approaches
• Data warehouse technical architecture
• Data modeling concepts
• Operational data store
• Schema design of a data warehouse
• Data acquisition
• ETL products
• Project life cycle
5. IBM Software Group | WebSphere software
Why We Need Data Warehousing
• Better business intelligence for end users
• Reduced time to locate, access, and analyze information
• Consolidation of disparate information sources
• Storage of large volumes of historical detail data from mission-critical applications
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less responsive decision support systems
• Reduced demand on IS to generate reports
6. IBM Software Group | WebSphere software
What is an Operational System?
• Operational systems are just what their name implies: the systems that help us run the day-to-day enterprise operations.
• They are the backbone systems of any enterprise, covering functions such as order entry and inventory.
• Classic examples include airline reservations, credit-card authorizations, and ATM withdrawals.
7. IBM Software Group | WebSphere software
Characteristics of Operational Systems
• Continuous availability
• Predefined access paths
• Transaction integrity
• High transaction volume
• Low data volume per query
• Used by operational staff
• Supports day-to-day control operations
• Large number of users
8. IBM Software Group | WebSphere software
OLTP vs Data Warehouse
Operational System: transaction processing; predictable CPU usage; time-sensitive; operator view; normalized, efficient design for transaction processing.
Data Warehouse: query processing; random CPU usage; history-oriented; managerial view; denormalized design for query processing.
9. IBM Software Group | WebSphere software
OLTP vs Warehouse
Operational System: designed for atomicity, consistency, isolation, and durability (ACID); organized by transactions (order, input, inventory); relatively small database; many concurrent users; volatile data.
Data Warehouse: designed for a quiet or static database; organized by subject (customer, product); large database size; relatively few concurrent users; non-volatile data.
10. IBM Software Group | WebSphere software
Operational System: stores all data; performance-sensitive; not flexible; oriented toward efficiency.
Data Warehouse: stores relevant data; less performance-sensitive; flexible; oriented toward effectiveness.
11. IBM Software Group | WebSphere software
What is a Data Warehouse?
• A data warehouse is a
  - Subject-oriented
  - Integrated
  - Time-variant
  - Non-volatile
  collection of data in support of management's decision-making process.
W. H. (Bill) Inmon is regarded as the father of data warehousing.
13. IBM Software Group | WebSphere software
Subject-Oriented Analysis
• Transactional storage is process oriented: an Entry records Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, and Mail Address
• Data warehouse storage is subject oriented: Sales, Customers, Products
14. IBM Software Group | WebSphere software
Integration of Data
Transactional storage feeds the data warehouse through integration, which reconciles:
• Encoding: Appl. A uses M/F, Appl. B uses 1/0, Appl. C uses X/Y; the warehouse stores M, F
• Units of attributes: Appl. A measures pipeline in cm, Appl. B in inches, Appl. C in mcf; the warehouse stores pipeline in cm
• Physical attributes: Appl. A stores balance as dec(13,2), Appl. B as PIC 9(9)V99, Appl. C as float; the warehouse stores balance dec(13,2)
• Naming conventions: Appl. A calls it bal-on-hand, Appl. B current_balance, Appl. C balance; the warehouse stores balance
• Data consistency: Appl. A stores date (Julian), Appl. B date (yymmdd), Appl. C date (absolute); the warehouse stores date (Julian)
15. IBM Software Group | WebSphere software
Volatility of Data
• Transactional storage is volatile: record-by-record data manipulation through insert, access, change, and delete
• Data warehouse storage is non-volatile: mass load and access of data (load, access)
16. IBM Software Group | WebSphere software
Time-Variant Data Analysis
Transactional storage holds current data; data warehouse storage holds historical data.
(Chart: Sales in lakhs by region - East, West, North - for January, February, and March of Year 97, 1st quarter.)
17. IBM Software Group | WebSphere software
Data Warehouse - Differences from Operational Systems
• Operational system database: constant change; updated constantly through inserts, updates, and deletes; data changes according to need, not a fixed schedule
• Data warehouse: consistent points in time; populated by an initial load followed by incremental loads and updates; added to regularly, but loaded data is rarely directly changed
• This does NOT mean the data warehouse is never updated or never changes!
19. IBM Software Group | WebSphere software
DW Implementation Approaches
• Top-down
• Bottom-up
• Combination of both
• The choice depends on:
  - Current infrastructure
  - Resources
  - Architecture
  - ROI
  - Implementation speed
20. IBM Software Group | WebSphere software
EDW - "Top-Down" Approach
Heterogeneous source systems (Source 1, Source 2, Source 3) feed a common staging interface layer, which loads the enterprise data warehouse; the data mart bus architecture layer of incremental architected data marts (DM 1, DM 2, DM 3) is then built from the warehouse.
21. IBM Software Group | WebSphere software
EDW - "Bottom-Up" Approach
Heterogeneous source systems (Source 1, Source 2, Source 3) feed a common staging interface layer, which loads the data mart bus architecture layer of incremental architected data marts (DM 1, DM 2, DM 3); the enterprise data warehouse is then assembled from the marts.
22. IBM Software Group | WebSphere software
Ralph Kimball's Approach (Independent Data Marts: Ralph Kimball's Ideology)
Source systems are extracted into the data staging area, which loads the presentation area; users reach it through data access tools.
• Data staging area: services to transform from source to target and maintain conformed dimensions; no user query support; data store of flat files or relational tables; design goals of throughput and integrity/consistency
• Presentation area: Data Mart #1 (dimensional; atomic AND summary data; business-process centric; design goals of ease of use and query performance), Data Mart #2, Data Mart #..., connected by a data mart bus of conformed facts and dimensions
• Data access tools: ad hoc query tools, report writers, analytic applications, and modeling (forecasting, scoring, data mining)
23. IBM Software Group | WebSphere software
Bottom-Up Approach
• Staging data store: E/R design or flat file; retains history needed for regular processing; no end-user access
• Data marts: dimensional; transaction and summary data; each mart covers a single subject area (i.e., one fact table); multiple marts may exist in a single database instance
• Data warehouse: integrated data; timely user access; conformed dimensions; a single process to build each dimension
25. IBM Software Group | WebSphere software
Bill Inmon's Approach (Dependent Data Marts: Bill Inmon's Ideology)
Source systems are extracted into the data staging area, which loads the enterprise data warehouse (DWH); ETL then feeds the data marts in the presentation area, which users reach through data access tools.
• "Enterprise data warehouse": normalized tables; atomic data; user query support down to atomic data
• Data marts: Data Mart #1 (dimensional summary data; department-centric), Data Mart #2, Data Mart #...
26. IBM Software Group | WebSphere software
Top-Down Approach
• Staging data store (flat file): raw input data
• Data warehouse: E/R model; subject areas; transaction-level detail; historical persistency as justified, with archival for retrieval if needed; integrated data; timely user access; a single process to build each dimension
• Data marts: most are dimensional; data mart design by business function; summary-level data
27. IBM Software Group | WebSphere software
DW Implementation Approaches
Top-down:
• More planning and design initially
• Involves people from different workgroups and departments
• Data marts may be built later from the global DW
• Overall data model must be decided up front
Bottom-up:
• Can begin without waiting for global infrastructure
• Built incrementally
• Can be built before or in parallel with the global DW
• Less complexity in design
28. IBM Software Group | WebSphere software
DW Implementation Approaches
Top-down:
• Consistent data definitions and enforcement of business rules across the enterprise
• High cost; a lengthy, time-consuming process
• Works well when a centralized IS department is responsible for all hardware and resources
Bottom-up:
• Data redundancy and inconsistency between data marts may occur
• Integration requires careful planning
• Lower cost of hardware and other resources
• Faster payback
30. IBM Software Group | WebSphere software
End-to-End DW Architecture (diagram)
• Data sources (Prod, Mkt, HR, Fin, Acctg): transaction data (IBM IMS, VSAM, Oracle, Sybase); other internal data (ERP: SAP; clickstream web data: Informix); external data (demographics: Harte-Hanks)
• A staging area and an operational data store sit between the sources and the warehouse
• ETL software: extract (Ascential, Sagent, SAS), clean/scrub (Firstlogic), transform, and load (DataStage)
• Data stores: the data warehouse (Teradata, IBM) with its metadata, feeding data marts (Finance, Marketing, Sales) on Essbase or Microsoft
• Data analysis tools and applications: queries, reporting, DSS/EIS, and data mining via SQL, Cognos, SAS, MicroStrategy, Siebel, Business Objects, and web browsers
• Users: analysts, managers, executives, operational personnel, customers/suppliers
31. IBM Software Group | WebSphere software
Benefits of a DWH
• To formulate effective business, marketing, and sales strategies
• To precisely target promotional activity
• To discover and penetrate new markets
• To successfully compete in the marketplace from a position of informed strength
• To build predictive rather than retrospective models
33. IBM Software Group | WebSphere software
Data Modeling
What is a data model?
• A data model is an abstraction of some aspect of the real world (system).
Why a data model?
• Helps to visualize the business
• A model is a means of communication
• Models help elicit and document requirements
• Models reduce the cost of change
• The model is the essence of the DW architecture, on the basis of which the DW will be implemented
34. IBM Software Group | WebSphere software
Steps in Data Modeling
1. Problem and scope definition
2. Requirements gathering
3. Analysis
4. Logical database design
5. Database selection
6. Physical database design
7. Schema generation
35. IBM Software Group | WebSphere software
Levels of Modeling
• Conceptual modeling: describes data requirements from a business point of view, without technical details
• Logical modeling: refines conceptual models; data-structure oriented and platform independent
• Physical modeling: detailed specification of what is physically implemented using a specific technology
36. IBM Software Group | WebSphere software
Modeling Techniques
• Entity-relationship modeling
  - Traditional modeling technique
  - Technique of choice for OLTP
  - Suited for the corporate data warehouse
• Dimensional modeling
  - Analyzes business measures in their specific business context
  - Helps visualize very abstract business questions
  - End users can easily understand and navigate the data structure
37. IBM Software Group | WebSphere software
Entity-Relationship Modeling - Basic Concepts
• Relationship
  - A relationship between entities captures structural interaction and association
  - Described by a verb
  - Cardinality: 1:1, 1:M, M:M
  - Example: Books belong to Printed Media
38. IBM Software Group | WebSphere software
Entity-Relationship Modeling - Basic Concepts
• Attributes
  - Characteristics and properties of entities
  - Example: Book ID, Description, and Book Category are attributes of the entity "Book"
  - Attribute names should be unique and self-explanatory
  - Primary keys, foreign keys, and constraints are defined on attributes
39. IBM Software Group | WebSphere software
Review of Logical Modeling Terms & Symbols
• Entities define specific groups of information
  Example entity: Sales Organization (Sales Org ID, Distribution Channel)
40. IBM Software Group | WebSphere software
Review of Logical Modeling Terms & Symbols
• One or more attributes uniquely identify an instance of an entity
  Example identifier: Sales Org ID in the Sales Organization entity (Sales Org ID, Distribution Channel)
41. IBM Software Group | WebSphere software
Review of Logical Modeling Terms & Symbols
• The logical model identifies relationships between entities
  Example relationship: Sales Detail (Sales Record ID) is related to Sales Rep (Sales Rep ID)
43. IBM Software Group | WebSphere software
Logical Data Model (example)
Entities: Sales Detail (Sales Record ID); Customer (Customer ID); Product (Product SKU); Suppliers (Supplier ID); Manufacturing Group (Manufacturing Org ID); Factory (Factory ID); Sales Organization (Sales Org ID, Distribution Channel); Sales Rep (Sales Rep ID); Product Sales Plan (Plan ID); markets: Retail, Wholesale, Industry.
48. IBM Software Group | WebSphere software
Dimensional Modeling
• Dimensional modeling uses three basic concepts: measures, facts, and dimensions.
• It is powerful in representing the requirements of the business user in the context of database tables.
• It focuses on numeric data, such as values, counts, weights, balances, and occurrences.
49. IBM Software Group | WebSphere software
What is a Fact?
• A fact is a collection of related data items, consisting of measures and context data.
• Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or a business process.
• Facts are measured, "continuously valued", rapidly changing information; they can be calculated and/or derived.
• Granularity: the level of detail of data contained in the data warehouse, e.g., daily item totals by product, by store.
50. IBM Software Group | WebSphere software
Types of Facts
• Additive
  - Can be added along all the dimensions
  - Discrete numerical measures, e.g., retail sales in $
• Semi-additive
  - A snapshot taken at a point in time; a measure of intensity
  - Not additive along the time dimension, e.g., account balance, inventory balance
  - Can be summed and divided by the number of time periods to get a time average
• Non-additive
  - Numeric measures that cannot be added across any dimension
  - Intensity measures averaged across all dimensions, e.g., room temperature
  - Textual facts - AVOID THEM
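The additivity rules above are easy to see with a toy example. The sketch below uses hypothetical daily sales and end-of-day balance figures (the data and names are illustrative, not from the deck): the sales measure can be summed along time, while the balance is semi-additive and must be averaged over the number of time periods.

```python
# Illustrative sketch: additive vs. semi-additive facts.
# The rows and column meanings are hypothetical example data.
daily_rows = [
    # (day, store, sales_dollars, end_of_day_balance)
    ("Mon", "S1", 100.0, 5000.0),
    ("Tue", "S1", 150.0, 5100.0),
    ("Wed", "S1", 120.0, 4900.0),
]

# Additive fact: sales can be summed along the time dimension.
total_sales = sum(r[2] for r in daily_rows)                     # 370.0 -- meaningful

# Semi-additive fact: balances must NOT be summed over time;
# divide by the number of time periods to get a time average.
avg_balance = sum(r[3] for r in daily_rows) / len(daily_rows)   # 5000.0 -- meaningful
wrong_total = sum(r[3] for r in daily_rows)                     # 15000.0 -- NOT a real balance

print(total_sales, avg_balance, wrong_total)
```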
51. IBM Software Group | WebSphere software
Dimensions
• A dimension is a collection of members or units of the same type of view.
• Dimensions determine the contextual background for the facts.
• Dimensions represent the way business people talk about the data resulting from a business process, e.g., who, what, when, where, why, how.
52. IBM Software Group | WebSphere software
Dimensional Hierarchy (Geography dimension)
• World level: World
• Continent level: America, Europe, Asia
• Country level: USA, Canada, Argentina
• State level: FL, GA, VA, CA, WA
• City level: Miami, Tampa, Orlando, Naples
Each node is a dimension member (business entity) with a parent relation to the level above; attributes such as Population and Tourist Place can be attached to members.
53. IBM Software Group | WebSphere software
Dimension Types
• Conformed dimension
• Junk dimension
• Fast-changing dimension
• Role-playing dimension
• "Garbage" dimension
• Slowly changing dimension
• Degenerate dimension
54. IBM Software Group | WebSphere software
What is a Slowly Changing Dimension?
• Although dimension tables are typically static lists, most dimension tables do change over time.
• Since these changes are smaller in magnitude than changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.
55. IBM Software Group | WebSphere software
Slowly Changing Dimension - Classification
Slowly changing dimensions are classified into three types:
• Type I
• Type II
• Type III
56. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Type I
Type I overwrites the old attribute value with the new one; no history is kept.
Before the change:
  Source: Emp ID 1001, Name Shane, Email Shane@xyz.com
  Target: Emp ID 1001, Name Shane, Email Shane@xyz.com
After the change:
  Source: Emp ID 1001, Name Shane, Email Shane@abc.co.in
  Target: Emp ID 1001, Name Shane, Email Shane@abc.co.in (the old value Shane@xyz.com is overwritten)
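A minimal sketch of the Type I logic shown above, assuming an in-memory dimension keyed by Emp ID (the structure and function name are illustrative, not any specific tool's API):

```python
# Minimal SCD Type I sketch: overwrite the changed attribute, keep no history.
dim_employee = {1001: {"name": "Shane", "email": "Shane@xyz.com"}}

def scd_type1_apply(dim, emp_id, name, email):
    """Insert a new member, or overwrite the attributes of an existing one."""
    dim[emp_id] = {"name": name, "email": email}

# The source now shows a new email; Type I simply replaces the old value.
scd_type1_apply(dim_employee, 1001, "Shane", "Shane@abc.co.in")
print(dim_employee[1001]["email"])  # Shane@abc.co.in -- the old value is lost
```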
57. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Type II
Type II adds a new row for each change, preserving history behind a surrogate key and a version number.
Source: Emp ID 10, Name Shane, Email Shane@xyz.com
Target: PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_VERSION_NUMBER 0
59. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Versioning
Source: Emp ID 10, Name Shane, Email Shane@abc.co.in
Target:
  PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_VERSION_NUMBER 0
  PM_PRIMARYKEY 1001, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_VERSION_NUMBER 1
60. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Versioning
Source: Emp ID 10, Name Shane, Email Shane@abc.com
Target:
  PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_VERSION_NUMBER 0
  PM_PRIMARYKEY 1001, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_VERSION_NUMBER 1
  PM_PRIMARYKEY 1003, Emp ID 10, Name Shane, Email Shane@abc.com, PM_VERSION_NUMBER 2
61. IBM Software Group | WebSphere software
Slowly Changing Dimensions Type II - Flag
Source: Emp ID 10, Name Shane, Email Shane@xyz.com
Target: PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_CURRENT_FLAG Y
62. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Flag Current
Source: Emp ID 10, Name Shane, Email Shane@abc.co.in
Target:
  PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_CURRENT_FLAG N
  PM_PRIMARYKEY 1001, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_CURRENT_FLAG Y
63. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Flag Current
Source: Emp ID 10, Name Shane, Email Shane@abc.com
Target:
  PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_CURRENT_FLAG N
  PM_PRIMARYKEY 1001, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_CURRENT_FLAG N
  PM_PRIMARYKEY 1003, Emp ID 10, Name Shane, Email Shane@abc.com, PM_CURRENT_FLAG Y
65. IBM Software Group | WebSphere software
Slowly Changing Dimensions Type II - Effective Dates
Source: Emp ID 10, Name Shane, Email Shane@xyz.com
Target: PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_BEGIN_DATE 01/01/00, PM_END_DATE (null)
66. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Effective Date
Source: Emp ID 10, Name Shane, Email Shane@abc.co.in
Target:
  PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_BEGIN_DATE 01/01/00, PM_END_DATE 03/01/00
  PM_PRIMARYKEY 1001, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_BEGIN_DATE 03/01/00, PM_END_DATE (null)
67. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Effective Date
Source: Emp ID 10, Name Shane, Email Shane@abc.com
Target:
  PM_PRIMARYKEY 1000, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_BEGIN_DATE 01/01/00, PM_END_DATE 03/01/00
  PM_PRIMARYKEY 1001, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_BEGIN_DATE 03/01/00, PM_END_DATE 05/02/00
  PM_PRIMARYKEY 1003, Emp ID 10, Name Shane, Email Shane@abc.com, PM_BEGIN_DATE 05/02/00, PM_END_DATE (null)
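The three Type II variants above (version number, current flag, begin/end dates) are often maintained together. Below is a minimal illustrative sketch that combines them; the PM_* column names follow the slides, while the surrogate-key counter, data structure, and date handling are assumptions for the example:

```python
# Minimal SCD Type II sketch: every change expires the current row and appends
# a new version. Surrogate keys differ from the natural key (emp_id), which
# repeats across versions.
import itertools
from datetime import date

surrogate_key = itertools.count(1000)
dim_employee = []  # list of row dicts; history accumulates here

def scd_type2_apply(emp_id, name, email, effective):
    current = next((r for r in dim_employee
                    if r["emp_id"] == emp_id and r["PM_CURRENT_FLAG"] == "Y"), None)
    if current and current["email"] == email and current["name"] == name:
        return  # nothing changed
    if current:
        current["PM_CURRENT_FLAG"] = "N"   # expire the old version
        current["PM_END_DATE"] = effective
        version = current["PM_VERSION_NUMBER"] + 1
    else:
        version = 0
    dim_employee.append({
        "PM_PRIMARYKEY": next(surrogate_key),
        "emp_id": emp_id, "name": name, "email": email,
        "PM_VERSION_NUMBER": version, "PM_CURRENT_FLAG": "Y",
        "PM_BEGIN_DATE": effective, "PM_END_DATE": None,
    })

scd_type2_apply(10, "Shane", "Shane@xyz.com",   date(2000, 1, 1))
scd_type2_apply(10, "Shane", "Shane@abc.co.in", date(2000, 3, 1))
scd_type2_apply(10, "Shane", "Shane@abc.com",   date(2000, 5, 2))
for row in dim_employee:
    print(row)
```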
69. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Type III
Type III keeps limited history by storing the previous value in an extra column alongside the current value.
Source: Emp ID 10, Name Shane, Email Shane@xyz.com
Target: PM_PRIMARYKEY 1, Emp ID 10, Name Shane, Email Shane@xyz.com, PM_Prev_ColumnName (null), PM_EFFECT_DATE 01/01/00
70. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Type III
Source: Emp ID 10, Name Shane, Email Shane@abc.co.in
Target: PM_PRIMARYKEY 1, Emp ID 10, Name Shane, Email Shane@abc.co.in, PM_Prev_ColumnName Shane@xyz.com, PM_EFFECT_DATE 01/02/00
71. IBM Software Group | WebSphere software
Slowly Changing Dimensions - Type III
Source: Emp ID 10, Name Shane, Email Shane@abc.com
Target: PM_PRIMARYKEY 1, Emp ID 10, Name Shane, Email Shane@abc.com, PM_Prev_ColumnName Shane@abc.co.in, PM_EFFECT_DATE 01/03/00
Note that only the immediately previous value is retained; the older value Shane@xyz.com is lost.
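A corresponding Type III sketch, again with illustrative names: the "previous value" column is shifted on each change, which is exactly why only one generation of history survives.

```python
# Minimal SCD Type III sketch: one extra column holds only the previous value.
# Field names mirror the slides loosely and are illustrative assumptions.
from datetime import date

dim_employee = {10: {"name": "Shane", "email": "Shane@xyz.com",
                     "prev_email": None, "effect_date": date(2000, 1, 1)}}

def scd_type3_apply(emp_id, email, effective):
    row = dim_employee[emp_id]
    if row["email"] != email:
        row["prev_email"] = row["email"]  # shift current into the "previous" column
        row["email"] = email
        row["effect_date"] = effective

scd_type3_apply(10, "Shane@abc.co.in", date(2000, 2, 1))
scd_type3_apply(10, "Shane@abc.com",  date(2000, 3, 1))
print(dim_employee[10])  # prev_email is Shane@abc.co.in; Shane@xyz.com is gone
```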
72. IBM Software Group | WebSphere software
Degenerate Dimension
• Dimension keys in the fact table without corresponding dimension tables are called degenerate dimensions.
• Purpose of degenerate dimensions:
  1. Generally used when each fact record represents a transaction line item
  2. Useful for grouping transaction line items belonging to a single transaction
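To illustrate purpose 2, here is a hypothetical sketch in which the order number lives only in the fact table (there is no order dimension table) and is used to group line items back into whole transactions:

```python
# Sketch: the order number as a degenerate dimension. Schema and data are
# hypothetical examples, not from the course material.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sales_fact (
    order_number TEXT,      -- degenerate dimension: no dimension table behind it
    product_key  INTEGER,
    quantity     INTEGER,
    amount       REAL)""")
con.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("ORD-1", 101, 2, 20.0),
    ("ORD-1", 102, 1, 15.0),   # second line item of the same transaction
    ("ORD-2", 101, 3, 30.0),
])

# The degenerate dimension groups line items back into transactions.
for row in con.execute("""SELECT order_number, SUM(amount)
                          FROM sales_fact GROUP BY order_number"""):
    print(row)   # ('ORD-1', 35.0), ('ORD-2', 30.0)
```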
73. IBM Software Group | WebSphere software
Fast-Changing Dimension
A fast-changing dimension is a dimension whose attribute or attributes change rapidly over time for a given record (row).
1. Examples: age of associates, income, daily balance, etc.
2. Technique to handle a fast-changing dimension: create band tables, as sketched below.
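A small sketch of the band-table idea (the band boundaries here are assumptions): the dimension stores a band label rather than the raw value, so dimension rows do not churn every time the underlying value moves.

```python
# Illustrative band-table technique for a fast-changing attribute such as age.
AGE_BANDS = [(0, 18, "0-18"), (19, 35, "19-35"), (36, 60, "36-60"), (61, 200, "60+")]

def age_band(age: int) -> str:
    """Map a raw age to its band label."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")

print(age_band(34))  # 19-35
print(age_band(35))  # 19-35 -- still the same band, so no dimension change
print(age_band(36))  # 36-60
```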
74. IBM Software Group | WebSphere software
Role-Playing Dimension
A single dimension that is expressed under different roles in a fact table is called a role-playing dimension. This can be achieved by creating views on the dimension table.
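A minimal sketch of the view-based technique, using a hypothetical date dimension that plays the order-date and ship-date roles (all table and view names are illustrative):

```python
# Sketch: one date dimension playing two roles via views.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE orders_fact (order_date_key INTEGER, ship_date_key INTEGER, amount REAL);

-- Each role is just a renamed view over the single physical dimension.
CREATE VIEW order_date_dim AS SELECT date_key, calendar_date FROM date_dim;
CREATE VIEW ship_date_dim  AS SELECT date_key, calendar_date FROM date_dim;
""")
con.executemany("INSERT INTO date_dim VALUES (?, ?)",
                [(1, "2024-01-05"), (2, "2024-01-09")])
con.execute("INSERT INTO orders_fact VALUES (1, 2, 99.0)")

row = con.execute("""SELECT o.calendar_date, s.calendar_date, f.amount
                     FROM orders_fact f
                     JOIN order_date_dim o ON f.order_date_key = o.date_key
                     JOIN ship_date_dim  s ON f.ship_date_key  = s.date_key""").fetchone()
print(row)  # ('2024-01-05', '2024-01-09', 99.0)
```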
75. IBM Software Group | WebSphere software
Conformed Dimension
A conformed dimension means the same thing to each fact table to which it can be joined. Typically, dimension tables that are referenced, or are likely to be referenced, by multiple fact tables (multiple dimensional models) are called conformed dimensions.
76. IBM Software Group | WebSphere software
Conformed Dimension - Option #1
• Identical dimensions with the same keys, labels, definitions, and values
Sales schema: SALES facts (DATE KEY, PRODUCT KEY, STORE KEY, PROMO KEY) joined to the Product dimension (PRODUCT KEY, Product Desc, Brand Desc, Category Desc)
Inventory schema: INVENTORY facts (DATE KEY, PRODUCT KEY, STORE KEY) joined to the same Product dimension (PRODUCT KEY, Product Desc, Brand Desc, Category Desc)
77. IBM Software Group | WebSphere software
Conformed Dimension - Option #2
• A subset of the base dimension with common labels, definitions, and values
Sales schema: SALES $ facts (DATE KEY, PRODUCT KEY, STORE KEY, PROMO KEY) joined to the Product dimension (PRODUCT KEY, Product Desc, Brand Desc, Category Desc) and the Date dimension (DATE KEY, Day-of-week, Week Desc, Month Desc)
Forecast schema: SALES $ facts (MONTH KEY, BRAND KEY) joined to the Brand dimension (BRAND KEY, Brand Desc, Category Desc) and the Month dimension (MONTH KEY, Month Desc)
Example rows:
  Product: PROD KEY 12345, Prod Desc Cheerios 10, Brand Desc Cheerios, Category Desc Cereal
  Brand: BRAND KEY 12345, Brand Desc Cheerios, Category Desc Cereal
78. IBM Software Group | WebSphere software
"Garbage" Dimension
A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators, and status flags.
Approaches to handling a garbage dimension:
• Put the new attributes into existing dimension tables
• Put the new attributes into the fact table
• Create a separate "garbage dimension" table
79. IBM Software Group | WebSphere software
Junk Dimensions
• Whether to use a junk dimension depends on its potential cardinality:
  - 5 indicators, each with 3 values: 243 (3^5) rows, which is manageable
  - 5 indicators, each with 100 values: 10 billion (100^5) rows, which is impractical
• Also decide when to insert rows into the dimension (pre-populate all combinations, or insert combinations as they are encountered)
80. IBM Software Group | WebSphere software
Factless Fact Tables
The two types of factless fact tables are:
• Coverage tables
• Event-tracking tables
81. IBM Software Group | WebSphere software
Factless Fact Tables - Coverage Tables
Coverage tables are required when a primary fact table is sparse.
Example: tracking products in a store that did not sell
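A hypothetical sketch of the coverage-table idea: the coverage table records every product covered (e.g., on promotion) in a store on a given day, whether or not it sold, so the products that did NOT sell fall out of a set difference against the sparse sales fact. All names and data are illustrative.

```python
# Sketch: find covered products missing from the sparse sales fact.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE coverage_fact (date_key INTEGER, store_key INTEGER, product_key INTEGER);
CREATE TABLE sales_fact    (date_key INTEGER, store_key INTEGER, product_key INTEGER, amount REAL);
""")
con.executemany("INSERT INTO coverage_fact VALUES (?, ?, ?)",
                [(1, 1, 101), (1, 1, 102), (1, 1, 103)])
con.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                [(1, 1, 101, 10.0)])  # only product 101 actually sold

# Covered products with no matching sales row did not sell.
rows = con.execute("""SELECT c.product_key FROM coverage_fact c
                      LEFT JOIN sales_fact s
                        ON  c.date_key = s.date_key
                        AND c.store_key = s.store_key
                        AND c.product_key = s.product_key
                      WHERE s.product_key IS NULL""").fetchall()
print(rows)  # [(102,), (103,)]
```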
83. IBM Software Group | WebSphere software
Factless Fact Tables - Event Tracking
These tables are used for tracking an event.
Example: tracking student attendance
84. IBM Software Group | WebSphere software
Fact Constellation
• Fact constellation: multiple fact tables share dimension tables. Viewed as a collection of stars, it is therefore also called a galaxy schema or fact constellation.
85. IBM Software Group | WebSphere software
What is a Data Mart?
• A data mart is a decentralized subset of data, found either in a data warehouse or as a standalone subset, designed to support the unique business requirements of a specific decision-support system.
• Data marts have specific business-related purposes, such as measuring the impact of marketing promotions or measuring and forecasting sales performance.
(Diagram: an enterprise data warehouse feeding multiple data marts.)
86. IBM Software Group | WebSphere software
Data Marts - Main Features
• Low cost
• Controlled locally rather than centrally, conferring power on the user group
• Contain less information than the warehouse
• Rapid response
• More easily understood and navigated than an enterprise data warehouse
• Within the range of divisional or departmental budgets
88. IBM Software Group | WebSphere software
Advantages of a Data Mart over a Data Warehouse
• Typically a single subject area and fewer dimensions
• Limited feeds
• Very quick time to market (30-120 days to pilot)
• Quick impact on bottom-line problems
• Focused user needs
• Limited scope
• Optimum model for DW construction
• Demonstrates ROI
• Allows prototyping
89. IBM Software Group | WebSphere software
Disadvantages of a Data Mart
• Does not provide an integrated view of business information
• Uncontrolled proliferation of data marts results in redundancy
• A large number of data marts is complex to maintain
• Scalability issues for large numbers of users and increased data volume
90. IBM Software Group | WebSphere software
Data Mart Types
• Embedded data marts are stored within the central DW; they can be stored relationally, as files, or as cubes.
• Dependent data marts are fed directly by the DW, sometimes supplemented with other feeds such as external data.
• Independent data marts are fed directly by external sources and do not use the DW.
93. IBM Software Group | WebSphere software
Why We Need an Operational Data Store
• To obtain a "system of record" that contains the best data that exists in a legacy environment, as a source of information
• "Best" here implies data that is:
  - Complete
  - Up to date
  - Accurate
  - In conformance with the organization's information model
94. IBM Software Group | WebSphere software
Operational Data Store - Insulated from OLTP
• ODS data resolves data integration issues
• Data is physically separated from the production environment to insulate production from the processing demands of reporting and analysis
• Access to current data is facilitated
(Diagram: the OLTP server feeds the ODS, which serves tactical analysis.)
95. IBM Software Group | WebSphere software
Operational Data Store - Data
• Detailed data: records of business events (e.g., order capture)
• Data from heterogeneous sources
• Does not store summary data
• Contains current data
97. IBM Software Group | WebSphere software
ODS - Benefits
• Integrates the data
• Synchronizes the structural differences in data
• High transaction performance
• Serves both the operational and DSS environments
• Transaction-level reporting on current data
(Diagram: flat files, e.g. 60,5.2,"JOHN" and 72,6.2,"DAVID", Excel files, and relational databases feeding the operational data store.)
98. IBM Software Group | WebSphere software
Operational Data Store - Update Schedule
• ODS data: updated daily or more frequently; detail data mostly covers the last 30 to 90 days; addresses operational needs
• Data warehouse data: updated weekly or less frequently; potentially infinite history; addresses strategic needs
100. IBM Software Group | WebSphere software
OLTP vs ODS vs DWH
• Data redundancy: OLTP is non-redundant within a system with unmanaged redundancy among systems; the ODS is somewhat redundant with operational databases; the DWH has managed redundancy
• Data stability: OLTP is dynamic; the ODS is somewhat dynamic; the DWH is static
• Data update: OLTP updates field by field; the ODS updates field by field; the DWH updates in controlled batches
• Data usage: OLTP is highly structured and repetitive; the ODS is somewhat structured with some analytics; the DWH is highly unstructured, heuristic or analytical
• Database size: OLTP is moderate; the ODS is moderate; the DWH is large to very large
• Database structure stability: OLTP is stable; the ODS is somewhat stable; the DWH is dynamic
101. IBM Software Group | WebSphere software
Star Schema Design
• A single fact table surrounded by denormalized dimension tables
• The fact table primary key is the composite of the foreign keys (the primary keys of the dimension tables)
• The fact table contains transaction-type information
• Many star schemas can exist in a data mart
• Easily understood by end users, though more disk storage is required
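As a concrete illustrative sketch of these points (a hypothetical retail schema, not from the deck), note the composite primary key built from the dimension foreign keys and the denormalized product hierarchy:

```python
# Minimal star-schema sketch: one fact table plus denormalized dimensions.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT, month_desc TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT,
                          brand_desc TEXT, category_desc TEXT);  -- denormalized hierarchy
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    quantity     INTEGER,
    amount       REAL,
    PRIMARY KEY (date_key, product_key, store_key)  -- composite of the foreign keys
);
""")
```

Queries then join the fact to each dimension on a single-column key, which is what makes star schemas easy for end users to understand and navigate.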
103. IBM Software Group | WebSphere software
Snowflake Schema
• A single fact table surrounded by normalized dimension tables
• Normalizes dimension tables to save data storage space
• Used when dimensions become very large
• Less intuitive, with slower performance due to additional joins
• You may want to use both approaches, especially if supporting multiple end-user tools
104. IBM Software Group | WebSphere software
Example of a Snowflake Schema
105. IBM Software Group | WebSphere software
Snowflake - Disadvantages
• Normalization of dimensions makes the schema harder for users to understand
• Decreases query performance because more joins are involved
• Dimension tables are normally much smaller than fact tables, so space savings may not justify snowflaking
106. IBM Software Group | WebSphere software
Data Acquisition
• Data extraction
• Data transformation
• Data loading
107. IBM Software Group | WebSphere software
Representative DW Tools
• ETL tools: ETI Extract, Informatica, IBM Visual Warehouse, Oracle Warehouse Builder
• OLAP servers: Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB
• OLAP tools: Oracle Express Suite, Business Objects, WebIntelligence, SAS, Cognos PowerPlay/Impromptu, KALIDO, MicroStrategy, Brio Query, MetaCube
• Data warehouses: Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server, Red Brick
• Data mining and analysis: SAS Enterprise Miner, IBM Intelligent Miner, SPSS/Clementine, TCS tools
108. IBM Software Group | WebSphere software
ETL Products
• Code-based ETL tools
• GUI-based ETL tools
109. IBM Software Group | WebSphere software
Code-Based ETL Tools
• SAS/ACCESS
• SAS Base
• Teradata ETL utilities:
  1. BTEQ
  2. TPump
  3. FastLoad
  4. MultiLoad
110. IBM Software Group | WebSphere software
GUI-Based ETL Tools
• Informatica
• DT/Studio
• DataStage
• Business Objects Data Integrator (BODI)
• Ab Initio
• Data Junction
• Oracle Warehouse Builder
• Microsoft SQL Server Integration Services
• IBM DB2 Warehouse Center
112. IBM Software Group | WebSphere software
Extraction Types
• Full extract
• Periodic/incremental extract
113. IBM Software Group | WebSphere software
Full Extract
The entire dataset is extracted from the source system and loaded into the data mart as new data, regardless of what changed since the last run.
114. IBM Software Group | WebSphere software
Incremental Extract (1)
Only the incremental data is extracted from the source system; the data mart already holds the existing data.
115. IBM Software Group | WebSphere software
Incremental Extract (2)
The incremental data consists of the new data and the changed data drawn from the source system, alongside the existing data in the data mart.
116. IBM Software Group | WebSphere software
Incremental Extract (3)
The new data is an incremental addition to the data mart, while the existing data is updated using the changed data.
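One common way to implement the incremental extract is a high-water-mark timestamp. The sketch below is illustrative (the table, column, and variable names are assumptions): rows changed since the last run are pulled, with new rows inserted and changed rows updating existing data.

```python
# Sketch of a timestamp-based incremental extract.
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "a@x.com", "2024-01-01"),
    (2, "b@x.com", "2024-01-10"),   # changed since the last extract
    (3, "c@x.com", "2024-01-12"),   # new since the last extract
])

last_extract = "2024-01-05"   # high-water mark saved by the previous run
data_mart = {1: "a@x.com"}    # existing data keyed by natural key

incremental = source.execute(
    "SELECT id, email FROM customers WHERE updated_at > ?", (last_extract,)).fetchall()

for cust_id, email in incremental:
    data_mart[cust_id] = email   # updates changed rows, inserts new ones

print(data_mart)  # {1: 'a@x.com', 2: 'b@x.com', 3: 'c@x.com'}
```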
119. IBM Software Group | WebSphere software
Types of Data Warehouse Loading
• Target update types:
  - Insert
  - Update
120. IBM Software Group | WebSphere software
Types of Data Warehouse Updates
Source data (new data and changed data) flows through data staging into the data warehouse, where it is applied as:
• Insert
• Full replace
• Selective replace
• Update plus retain history
• Update
• Point-in-time snapshots
121. IBM Software Group | WebSphere software
New Data and Point-in-Time Data Insert
New source data, or a point-in-time snapshot (e.g., monthly), is added to the existing data in the warehouse.
122. IBM Software Group | WebSphere software
Changed Data Insert
Changed source data is added to the existing data in the warehouse.
123. IBM Software Group | WebSphere software
Data Warehouse Life Cycle
Business requirements are mapped to the OLTP system, the OLTP system is reverse-engineered, and data sources are mapped; logical modeling follows, and the model is refined. ETL processes then populate the enterprise data warehouse (with external data storage as an additional input), and information access is provided through reporting tools, web browsers, OLAP, and mining.
124. IBM Software Group | WebSphere software
Project Life Cycle
• Software requirement specification
• High-level design (HLD)
• Low-level design (LLD)
• Development
• Unit testing
• System integration testing
• Peer review
• User acceptance testing
• Production
• Maintenance
126. IBM Software Group | WebSphere software
What is Metadata?
• Data about data and the processes
• Metadata is stored in a data dictionary and repository
• It insulates the data warehouse from changes in the schema of operational systems
• It identifies the contents and location of data in the data warehouse
127. IBM Software Group | WebSphere software
Why Do You Need Metadata?
• To share resources among users and tools
• To document the system
• Without metadata, the warehouse is not sustainable and its resources cannot be fully utilized
128. IBM Software Group | WebSphere software
The Role of Metadata in the Data Warehouse
Metadata enables data to become information, because with it you know what data you have, and you can trust it.
129. IBM Software Group | WebSphere software
Metadata Answers...
• How have business definitions and terms changed over time?
• How do product lines vary across organizations?
• What business assumptions have been made?
• How do I find the data I need?
• What is the original source of the data?
• How was this summarization created?
• What queries are available to access the data?
130. IBM Software Group | WebSphere software
Metadata Process
• Integrated with the entire process and data flow
  - Populated from beginning to end
  - Population begins at the design phase of the project
  - Dedicated resources throughout build and maintenance
• Metadata and system monitoring span every stage: design and mapping; extract, scrub, and transform; load, index, and aggregation; replication and data set distribution; access and analysis; resource scheduling and distribution
131. IBM Software Group | WebSphere software
Types of ETL Metadata
ETL metadata divides into:
• Technical metadata
• Operational metadata
132. IBM Software Group | WebSphere software
Classification of ETL Metadata
• Data warehouse metadata: descriptive information about the physical implementation details of the data warehouse
• Source metadata: information about the source data and the mapping of source data to data warehouse data
133. IBM Software Group | WebSphere software
ETL Metadata (continued)
• Transformations and integrations: comprehensive information about transformation and loading
• Processing information: information about the activities involved in processing the data, such as scheduling and archiving
• End-user information: information about user profiles and security
134. IBM Software Group | WebSphere software
ETL - Planning for the Movement
The following may be helpful when planning the data movement:
• Develop an ETL plan
• Specifications
• Implementation