The document introduces Oracle Data Integrator and Oracle GoldenGate as solutions for enterprise data integration. It discusses challenges with fragmented data silos and the need to improve data accessibility, reliability, and quality across systems. Oracle Data Integrator is presented as a solution for real-time enterprise data integration using an ELT approach. It can integrate data across various systems faster and with lower total cost of ownership compared to traditional ETL. Oracle GoldenGate enables real-time data replication and change data capture. Together, Oracle Data Integrator and Oracle GoldenGate provide a full suite for batch, incremental, and real-time data integration.
Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat... – Jade Global
Learn about the First and Only Automated Solution for Informatica to Oracle Data Integrator (ODI) conversion
Do you want to know:
“What” is Informatica vs ODI?
“Why” do you need to move to ODI?
“How” is the migration from Informatica to ODI possible?
Learn how you can achieve up to 90% automated conversion, up to 90% reduced implementation time, up to 50% cost savings and up to 5X productivity gain.
To know more, please visit: http://informaticatoodi.jadeglobal.com/
The AnalytiX DS – LiteSpeed Conversion® (ALC) solution runs as Software as a Service (SaaS) and provides a framework that automates the conversion of ETL tool platforms, enabling fast, automated conversion from one ETL platform to another.
The document compares ETL and ELT data integration processes. ETL extracts data from sources, transforms it, and loads it into a data warehouse. ELT loads extracted data directly into the data warehouse and performs transformations there. Key differences include that ETL is better for structured data and compliance, while ELT handles any size/type of data and transformations are more flexible but can slow queries. AWS Glue, Azure Data Factory, and SAP BODS are tools that support these processes.
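The ETL-versus-ELT contrast above can be sketched in a few lines. This is a minimal illustration, not either tool's API: `sqlite3` stands in for the warehouse, and the table and column names are invented. The only difference between the two paths is where the transformation runs.

```python
import sqlite3

# Toy source data: raw order rows (id, amount in cents).
source_rows = [(1, 1050), (2, 2500), (3, 999)]

# --- ETL: transform in the integration layer, then load the result.
transformed = [(oid, cents / 100.0) for oid, cents in source_rows]  # cents -> dollars
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE orders (id INTEGER, amount_usd REAL)")
etl_db.executemany("INSERT INTO orders VALUES (?, ?)", transformed)

# --- ELT: load raw data first, then transform inside the warehouse with SQL.
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE staging (id INTEGER, amount_cents INTEGER)")
elt_db.executemany("INSERT INTO staging VALUES (?, ?)", source_rows)
elt_db.execute(
    "CREATE TABLE orders AS SELECT id, amount_cents / 100.0 AS amount_usd FROM staging"
)

# Both paths produce the same warehouse table.
etl_result = etl_db.execute("SELECT id, amount_usd FROM orders ORDER BY id").fetchall()
elt_result = elt_db.execute("SELECT id, amount_usd FROM orders ORDER BY id").fetchall()
print(etl_result == elt_result)
```

In the ELT path the warehouse engine does the arithmetic, which is why ELT scales with the warehouse but can compete with query workloads for the same resources.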
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift – Amazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third party solutions, and how they met their business intelligence needs in record time.
The document discusses various concepts related to database design and data warehousing. It describes how DBMS minimize problems like data redundancy, isolation, and inconsistency through techniques like normalization, indexing, and using data dictionaries. It then discusses data warehousing concepts like the need for data warehouses, their key characteristics of being subject-oriented, integrated, and time-variant. Common data warehouse architectures and components like the ETL process, OLAP, and decision support systems are also summarized.
Transform Your Data Integration Platform From Informatica To ODI – Jade Global
Watch this webinar to learn why you should transform your data integration platform from Informatica to ODI. Join us for a live demo of the InfatoODI tool and learn how you can reduce your implementation time by up to 70% and increase your productivity by up to 5 times. For more information, please visit: http://informaticatoodi.jadeglobal.com/
Finding ways to make ETL loads faster is not always obvious. Moreover, there is a difference in how to tune OLAP vs OLTP databases. Some of the techniques learned through years of tuning EBS seem to make no effect on tuning a BI ETL. This presentation will discuss why this is the case, present some techniques on how to find the bottlenecks in your BI ETL jobs and some techniques to tune these slow SQL statements, improving the speed of nightly ETL jobs. Attendees will learn the steps to monitor ETLs, capture Problem SQL and gain knowledge to improve the overall ETL Performance.
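The "capture problem SQL" step described above can be approximated generically: time every statement an ETL job runs and rank them, so tuning effort goes to the worst offenders first. This is only a toy sketch with invented statements against `sqlite3`; real Oracle monitoring would use facilities such as AWR or `v$sql` rather than client-side timing.

```python
import sqlite3
import time

# A stand-in warehouse with some rows to scan.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (x INTEGER)")
db.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10000)])

# The statements an ETL job would run (invented for illustration).
statements = [
    "SELECT COUNT(*) FROM t",
    "SELECT a.x FROM t a JOIN t b ON a.x = b.x WHERE a.x < 10",
]

# Time each statement and keep (elapsed, sql) pairs.
timings = []
for sql in statements:
    start = time.perf_counter()
    db.execute(sql).fetchall()
    timings.append((time.perf_counter() - start, sql))

# The slowest statement is the first tuning candidate.
slowest = max(timings)[1]
print("slowest statement:", slowest)
```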
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod... – Hortonworks
Many enterprises are turning to Apache Hadoop to enable Big Data Analytics and reduce the costs of traditional data warehousing. Yet, it is hard to succeed when 80% of the time is spent on moving data and only 20% on using it. It’s time to swap the 80/20! The Big Data experts at Attunity and Hortonworks have a solution for accelerating data movement into and out of Hadoop that enables faster time-to-value for Big Data projects and a more complete and trusted view of your business. Join us to learn how this solution can work for you.
The document describes the software architecture of Informatica PowerCenter ETL product. It consists of 3 main components: 1) Client tools that enable development and monitoring. 2) A centralized repository that stores all metadata. 3) The server that executes mappings and loads data into targets. The architecture diagram shows the data flow from sources to targets via the server.
An introduction to SQL Server 2014 in-memory OLTP, known as Hekaton (XTP). I presented this session at the SQL User Group Melbourne some time back. Suitable for people who want to learn more about Hekaton.
This document summarizes the key points from a presentation on SQL Server 2016. It discusses in-memory and columnstore features, including performance gains from processing data in memory instead of on disk. New capabilities for real-time operational analytics are presented that allow analytics queries to run concurrently with OLTP workloads using the same data schema. Maintaining a columnstore index for analytics queries is suggested to improve performance.
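The columnstore idea mentioned above, that analytics queries get faster when data is laid out by column, can be shown with a toy model (the record shape is invented; a real engine adds compression and vectorized execution on top of the layout):

```python
from array import array

# Row store: each record keeps all its fields together
# (good for OLTP point lookups and single-row updates).
rows = [(i, f"customer-{i}", i % 100) for i in range(1000)]

# Column store: each column is a contiguous, type-homogeneous array
# (good for analytics scans, and compresses well).
ids = array("i", (r[0] for r in rows))
amounts = array("i", (r[2] for r in rows))

# An aggregate like SUM(amount) only needs one column.
row_sum = sum(r[2] for r in rows)  # touches every field of every row
col_sum = sum(amounts)             # touches one contiguous array
print(row_sum == col_sum)
```

Both layouts give the same answer; the columnar one simply reads far fewer bytes per analytic query, which is the source of the performance gains the presentation describes.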
The document discusses StreamHorizon's "adaptiveETL" platform for big data analytics. It highlights limitations of legacy ETL platforms and StreamHorizon's advantages, including massively parallel processing, in-memory processing, quick time to market, low total cost of ownership, and support for big data architectures like Hadoop. StreamHorizon is presented as an effective and cost-efficient solution for data integration and processing projects.
Organizations are now trying to make themselves much more operational with easy-to-interoperate data.
Data is the most important asset of any organization.
Data is the backbone of any report, and reports are the baseline on which all vital management decisions are taken.
Data coexists in different, often heterogeneous, data sources.
The document summarizes Microsoft's SQL Server 2005 Analysis Services (SSAS). It provides an overview of SSAS capabilities such as data mining algorithms, unified dimensional modeling, scalability features, and integrated manageability with SQL Server. It also describes demos of the OLAP and data mining capabilities and how SSAS can be deployed and managed for scalability, availability, and serviceability.
Any data source becomes an SQL query with all the power of Apache Spark. Querona is a virtual database that seamlessly connects any data source with Power BI, TARGIT, Qlik, Tableau, Microsoft Excel, or others. It lets you build your own universal data model and share it among reporting tools.
Querona does not create another copy of your data unless you want to accelerate your reports using its built-in execution engine, created for Big Data analytics. Just write a standard SQL query and let Querona consolidate data on the fly, use one of its execution engines, and accelerate processing no matter what kind and how many sources you have.
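The "consolidate on the fly, without another copy" idea behind such virtual databases can be sketched generically. This is not Querona's API; the two `sqlite3` databases, tables, and the `federated_revenue` helper are all invented to show a query spanning independent sources without materializing a combined dataset:

```python
import sqlite3

# Two independent "sources": a CRM database and a billing database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, total REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
)

def federated_revenue():
    """Answer a cross-source question on the fly, without copying either source."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(total) FROM invoices GROUP BY customer_id"
    )
    return {names[cid]: total for cid, total in totals}

print(federated_revenue())  # {'Acme': 150.0, 'Globex': 75.0}
```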
The document discusses how Oracle Database 11g can help lower IT costs through features like grid computing, high availability, storage optimization, and security. It provides examples of how Oracle RAC, Exadata, Automatic Storage Management, compression, and other 11g capabilities allow customers to consolidate servers and storage, improve performance, and reduce costs compared to alternative solutions. Overall the document promotes Oracle Database 11g as enabling lower costs through grid computing, optimized storage, high performance, and security.
6. real time integration with odi 11g & golden gate 11g & dq 11g 20101103 -... – Doina Draganescu
Oracle Data Integration provides high performance, real-time data integration across heterogeneous systems and data sources. It uses a declarative design approach to simplify and accelerate data integration projects. Oracle GoldenGate enables low-impact change data capture and real-time data replication between databases for real-time data warehousing. Using Oracle Data Integration and GoldenGate together provides a best-in-class solution for real-time data integration and warehousing.
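The low-impact change data capture mentioned above can be reduced to its core idea: record every source write as a change event, and keep replicas in sync by replaying the event log instead of re-copying the data. The sketch below is conceptual only; GoldenGate itself reads database redo/transaction logs rather than instrumenting writes, and all names here are invented.

```python
change_log = []
source = {}

def write(op, key, value=None):
    """Apply a write to the source and append a change event to the log."""
    if op == "DELETE":
        source.pop(key, None)
    else:  # INSERT or UPDATE
        source[key] = value
    change_log.append((op, key, value))

def replay(log):
    """Rebuild a replica purely from captured change events."""
    replica = {}
    for op, key, value in log:
        if op == "DELETE":
            replica.pop(key, None)
        else:
            replica[key] = value
    return replica

write("INSERT", 1, "alice")
write("INSERT", 2, "bob")
write("UPDATE", 1, "alicia")
write("DELETE", 2)
print(replay(change_log) == source)  # the replica converges to the source state
```

Because only the changes move, the source system sees little extra load, which is what makes this approach suitable for feeding a real-time data warehouse.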
What is a Data Warehouse and How Do I Test It? – RTTS
ETL Testing: A primer for Testers on Data Warehouses, ETL, Business Intelligence and how to test them.
Are you hearing and reading about Big Data, Enterprise Data Warehouses (EDW), the ETL Process and Business Intelligence (BI)? The software markets for EDW and BI are quickly approaching $22 billion, according to Gartner, and Big Data is growing at an exponential pace.
Are you being tasked to test these environments or would you like to learn about them and be prepared for when you are asked to test them?
RTTS, the Software Quality Experts, provided this groundbreaking webinar, based upon our many years of experience in providing software quality solutions for more than 400 companies.
You will learn the answer to the following questions:
• What is Big Data and what does it mean to me?
• What are the business reasons for building a Data Warehouse and for using Business Intelligence software?
• How do Data Warehouses, Business Intelligence tools and ETL work from a technical perspective?
• Who are the primary players in this software space?
• How do I test these environments?
• What tools should I use?
This slide deck is geared towards:
QA Testers
Data Architects
Business Analysts
ETL Developers
Operations Teams
Project Managers
...and anyone else who is (a) new to the EDW space, (b) wants to be educated in the business and technical sides and (c) wants to understand how to test them.
Bill Hayduk is the founder and CEO of QuerySurge, a software division that provides data integration and analytics solutions, with headquarters in New York; QuerySurge was founded in 1996 and has grown to serve Fortune 1000 customers through partnerships with technology companies and consulting firms. The document discusses the data and analytics marketplace and provides an overview of concepts like data warehousing, ETL, BI, data quality, data testing, big data, Hadoop, and NoSQL.
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL – Inside Analysis
The Briefing Room with Dr. Robin Bloor and Splice Machine
Live Webcast August 11, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e1b33c9d45b178e13784b4a971a4c1349
The ETL process was born out of necessity, and for decades it has been the glue between data sources and target applications. But as data
growth soars and increased competition demands real-time data, standard ETL has become brittle and often unmanageable. Scaling up resources can do the trick, but it’s very costly and only a matter of time before the processes hit another bottleneck. When outmoded ETL stands in the way of real-time analytics, it might be time to consider a completely new approach.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he explains how modern, data-driven architectures must adopt an equally capable data integration strategy. He’ll be briefed by Rich Reimer of Splice Machine, who will discuss how his company solves ETL performance issues and enables real-time analytics and reports on big data. He will show that by leveraging the scale-out power of Hadoop and the in-memory speed of Spark, users can bring both analytical and operational systems together, eventually performing transformations only when needed.
Visit InsideAnalysis.com for more information.
This document discusses ETL (Extract, Transform, Load) processes in data warehousing. It defines ETL as extracting data from source systems, transforming it (e.g. performing calculations and conforming data), and loading it into a data warehouse or target storage. The document notes that 70-80% of BI/DW projects rely on reliable ETL processes and that the ETL market is large and growing. It provides examples of how to implement ETL using scripting, ETL tools, or custom coding and outlines key features of ETL tools.
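The "transform" step described above (performing calculations and conforming data) is where most ETL effort goes. A minimal sketch, with invented field names and mapping rules: inconsistent source values are conformed to a standard code, and a derived column is computed before loading.

```python
# Conforming map: messy source spellings -> standard country codes (invented).
COUNTRY_CODES = {"usa": "US", "u.s.": "US", "germany": "DE", "deutschland": "DE"}

def transform(record):
    """Conform the country field and derive a total, leaving the input intact."""
    out = dict(record)
    out["country"] = COUNTRY_CODES.get(record["country"].strip().lower(), "??")
    out["total"] = round(record["qty"] * record["unit_price"], 2)  # derived column
    return out

# Extracted source rows, as they might arrive from two different systems.
extracted = [
    {"country": "USA", "qty": 3, "unit_price": 9.99},
    {"country": "Deutschland ", "qty": 1, "unit_price": 24.5},
]
loaded = [transform(r) for r in extracted]  # ready to load into the warehouse
print(loaded[0]["country"], loaded[0]["total"])
```

Whether this logic lives in a script, an ETL tool's mapping, or custom code (the three implementation routes the document lists), the shape of the work is the same.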
SQL 2012 provides several new capabilities to improve availability, speed, and analytics including AlwaysOn for high availability, columnstore indexes for faster query performance, and PowerPivot and Crescent for self-service business intelligence. It also features improvements to management and monitoring tools, upgrades, and support for big data solutions like Hadoop. While SQL 2008 meets current needs, SQL 2012 aims to support larger environments and includes performance enhancements that can benefit organizations of any size.
What's new in Oracle Database 12c release 12.1.0.2 – Connor McDonald
This document provides an overview of new features in Oracle Database 12c Release 1 (12.1.0.2). It discusses Oracle Database In-Memory for accelerating analytics, improvements for developers like support for JSON and RESTful services, capabilities for accessing big data using SQL, enhancements to Oracle Multitenant for database consolidation, and other performance improvements. The document also briefly outlines features like Oracle Rapid Home Provisioning, Database Backup Logging Recovery Appliance, and Oracle Key Vault.
The document describes the results of a proof of concept test comparing the performance of Oracle Database 12c In-Memory on Oracle SPARC M7 servers versus Intel servers. The test loaded and populated database tables with the TPC-H schema and benchmark and measured query throughput and response times under increasing user load. The SPARC M7 servers achieved significantly higher query throughput and lower response times compared to the Intel servers, demonstrating the benefits of the dedicated acceleration engines built into the SPARC M7 chip for in-memory processing.
A training course for advanced Oracle developers and administrators, dedicated to the best-known and most widespread tool on the planet for developing, testing, managing, and optimizing databases and database applications: Toad from Dell Software.
The document summarizes features and capabilities of Oracle Database including:
- Support for structured and unstructured data types including images, XML, and multimedia.
- Tools for managing growth of data and enabling innovation with different data types.
- Self-managing capabilities that help liberate DBAs from resource management tasks.
- Features for high performance, availability, security and compliance at lower costs.
Flink in Zalando's world of Microservices – ZalandoHayley
Apache Flink Meetup at Zalando Technology, May 2016
By Javier Lopez & Mihail Vieru, Zalando
In this talk we present Zalando's microservices architecture and introduce Saiki – our next generation data integration and distribution platform on AWS. We show why we chose Apache Flink to serve as our stream processing framework and describe how we employ it for our current use cases: business process monitoring and continuous ETL. We then have an outlook on future use cases.
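The continuous ETL mentioned above differs from batch ETL in that each event is transformed and emitted as it arrives. A pure-Python stand-in for the idea (event fields are invented; Flink runs the same filter/map shape distributed and fault-tolerant):

```python
def source(events):
    """Stand-in for a streaming source, e.g. a Kafka topic in a real deployment."""
    yield from events

def transform(stream):
    """Continuous ETL: filter and map each event as it arrives."""
    for event in stream:
        if event["status"] == "ok":  # filter out bad events
            yield {"id": event["id"], "cents": event["amount"] * 100}  # map/convert

events = [
    {"id": 1, "status": "ok", "amount": 5},
    {"id": 2, "status": "error", "amount": 7},
    {"id": 3, "status": "ok", "amount": 2},
]
sink = list(transform(source(events)))
print(sink)
```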
Similar to CERN_DIS_ODI_OGG_final_oracle_golde.pptx
The Building Blocks of QuestDB, a Time Series Database – javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
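The late and unordered data problem mentioned above is easy to state with a toy model: rows arrive out of timestamp order, but queries need them sorted. A minimal sketch of order-on-ingest (the `bisect` insert here is O(n) per row; an engine like QuestDB achieves the same invariant with far cheaper merge strategies):

```python
import bisect

series = []  # invariant: kept sorted by timestamp

def ingest(ts, value):
    """Insert a (timestamp, value) row at its sorted position,
    so late-arriving rows still land in timestamp order."""
    bisect.insort(series, (ts, value))

# Row with timestamp 20 arrives after the row with timestamp 30.
for ts, v in [(10, 1.0), (30, 3.0), (20, 2.0)]:
    ingest(ts, v)

print([ts for ts, _ in series])  # [10, 20, 30]
```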
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... – Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Analysis insight about a Flyball dog competition team's performance – roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
2. 2
IT Obstacles to Unifying Information
What is it costing you to unify your data?
[Slide diagram: heterogeneous sources (OLTP and ODS systems; data warehouses and data marts; Oracle, PeopleSoft, Siebel, and SAP applications; custom apps; files; Excel; XML) feed enterprise performance, custom reporting, packaged applications, and business intelligence/analytics through a patchwork of data federation, data warehousing, custom data marts, hand-coded SQL, Java, batch scripts, data hubs, data migration, and data replication. The result: fragmented data silos, slow performance, poor data quality, and out-of-sync OLAP systems. What's the cost?]
4. 4
Oracle Data Integration
The solution for enterprise-wide real-time data
Dramatically improve the accessibility, reliability, and quality of critical data across enterprise systems
[Slide diagram: mission-critical systems and data (web services, databases, distributed systems, legacy systems, OLAP systems, OLTP systems) are connected through data access, real-time data, and data quality capabilities to business intelligence, performance management, data warehouses, MDM, and SOA.]
5. 5
Oracle Data Integration
The solution for enterprise-wide real-time data
Dramatically improve the accessibility, reliability, and quality of critical data across enterprise systems
[Slide diagram: the same landscape with the products named: ODI EE, Oracle GoldenGate, and Enterprise Data Quality connect mission-critical systems and data (databases, distributed systems, legacy systems, OLAP systems, OLTP systems) to business intelligence, performance management, data warehouses, MDM, and SOA.]
7. 7
Why Does ODI Win?
ODI is Faster
• Fastest E-LT bulk/batch performance
• Faster real-time integration (sub-second trickle) with CDC, replication, and SOA infrastructure
• Faster project setup, design, and delivery
ODI is Simpler
• Simpler setup, configuration, management, and monitoring
• Simpler mapping through declarative SQL interfaces
• Simpler deployment with fewer hardware devices
• Simpler extensibility with Knowledge Module code templates
ODI Saves Money (Lower TCO, Higher ROI)
• Lower hardware and energy costs with the E-LT architecture
• Less time wasted on unnecessary ETL mappings, scripting, and complex training
• Less overhead integrating with applications, SOA, and management software
8. 8
ODI Saves Money
E-LT Runs on Existing Servers with Shared Administration
[Slide diagram: conventional ETL architecture runs Extract → Transform → Load through a dedicated ETL server; the next-generation E-LT architecture runs Extract → Load → Transform, pushing the transformation into the source and target databases.]
Typical: Separate ETL Server
• Proprietary ETL engine
• Expensive manual parallel tuning
• High costs for a standalone server
ODI: No New Servers
• Lower cost: leverages existing compute resources and partitions workload efficiently
• Efficient: exploits the database optimizer
• Fast: exploits native bulk load and other database interfaces
• Scalable: scales as you add processors to source or target
• Manageable: unified Enterprise Manager
Benefits
• Better hardware leverage
• Easier to manage and lower cost
• Simple tuning and linear scalability
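The E-LT pattern described above, loading raw rows first and then transforming them with set-based SQL inside the target database rather than row by row in a separate engine, can be sketched in a few lines. The snippet below is an illustrative stand-in using Python's built-in sqlite3 module, not ODI itself; the table and column names are invented:

```python
# Minimal E-LT sketch: Extract + Load into a staging table, then Transform
# with one set-based SQL statement executed by the database engine itself.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Extract + Load: source rows land in a staging table as-is.
cur.execute("CREATE TABLE stg_orders (order_id INT, amount REAL, status TEXT)")
source_rows = [(1, 100.0, "SHIPPED"), (2, 50.0, "CANCELLED"), (3, 75.0, "SHIPPED")]
cur.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

# Transform: a single set-based statement aggregates inside the database,
# instead of a row-by-row flow through an external ETL server.
cur.execute("CREATE TABLE agg_sales (status TEXT, total REAL)")
cur.execute("""
    INSERT INTO agg_sales
    SELECT status, SUM(amount) FROM stg_orders
    WHERE status = 'SHIPPED'
    GROUP BY status
""")

print(cur.execute("SELECT * FROM agg_sales").fetchall())  # [('SHIPPED', 175.0)]
```

Because the transform is one SQL statement, the database optimizer plans it, which is the "exploits the database optimizer" point in the bullet list above.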
9. 9
ODI is Faster
Up to 7 TB per hour of real-world data loading and complex transformations
ODI E-LT (on Exadata or any DW)
• ODI scales with the database: loads increase linearly as the DW scales
• ODI runs on relational technologies; no ETL hardware required, and no new hardware as data sets grow
• ODI processes are used only during integration runs; the databases remain continually available for OLTP, BI, DW, etc.
• Common administration, monitoring, and management
• All the benefits of rapid tools-based ETL development
Conventional ETL
• As data sets grow, more hardware ($$) is needed to scale
• ETL parallel optimization and design ($$$) depend heavily on the resources available to the ETL environment
• Sources, integrations, and targets must be designed to match the processing power of the ETL environment: source flat files split to match the number of ETL engine CPUs, the integration grid sized to match the number of ETL engine CPUs, and target partitions and tablespaces laid out to match the number of ETL engine CPUs
• ETL engine hardware is used only for ETL and cannot be utilized for OLTP, BI, DW, etc.
• Hardware is not co-located, and comes from multiple vendors
• Different management, monitoring, and administration from the database and BI infrastructure ($$)
10. 10
“Old Style” ETL
• Monolithic and expensive environments
• Fragile, hard to manage
• Difficult to tune or optimize
[Slide diagram: data sources flow through Extract → Transform → Load into a staging area, then Lookups/Calcs → Transform → Load into production, all inside ETL engine(s) that require big hardware and heavy parallel tuning. Around them sit an ETL metadata server, lookup data, CDC hub(s), a near-real-time capture agent, and separate management and admin servers for the development, QA, and system environments: a monolithic data-streaming architecture.]
11. 11
Modern Data Integration
• Lightweight, inexpensive environments – agents
• Resilient, easy to manage – non-invasive
• Easy to optimize and tune – uses DBMS power
[Slide diagram: sources feed the staging and production schemas through bulk data movement and data transformation orchestrated by ODI, with OGG providing near-real-time and true real-time feeds. Set-based SQL transforms are typically faster; SQL loads inside the database are always faster; flexible options exist for real-time data streams.]
12. 12
Best Data Integration for Exadata
Top Performance, Smallest Footprint
[Slide diagram: Oracle GoldenGate delivers non-invasive real-time transaction feeds (tx1 … tx4) from source tables (EMP, DEPT), while Oracle Data Integrator and Oracle Enterprise DQ handle batch feeds, incremental updates, and in-database transformations via E-LT into the dimensional model (FACT and DIM tables).]
• Run ODI, EDQ, and OGG directly on Exadata
• Support data feeds at any latency
• Non-invasive source capture
• Most cost-effective, high-performance Exadata data loading
13. 13
ODI is Simpler
Speed Project Delivery and Time to Market with ODI
• Development productivity: 40% efficiency gains
• Environment setup (e.g., BI Apps): 33-50% less complex
Setup comparison: 10 setup steps, 3 servers, and 7 connections, versus 7 setup steps, 1 server, and 3 connections with ODI.
14. 14
Traditional Procedural ETL
Traditional ETL row-to-row complexity
[Slide: Informatica Mapping Designer screenshots of flow-based procedural mappings. Sources (CUSTOMER, LOV_PRODUCT, PRODUCT, ORDER_LINES, ORDERS, FISCAL_CALENDAR, COUNTRY) pass through source qualifiers (SQ_*), sorters (SORT1-SORT5), joins (JOIN1-JOIN5), lookup keys, transforms (Transform0-Transform10), and an aggregator into AGG_SALES, with separate UPDATE and INSERT paths. One or a related group of such flow-based procedural ETL mappings (two samples shown) is replaced by one declarative ODI interface plus a selection among existing Knowledge Modules.]
15. 15
Traditional Procedural ETL
Traditional ETL row-to-row complexity
[Slide: the same Informatica Mapping Designer screenshots as the previous slide, showing one or a related group of flow-based procedural ETL mappings.]
Flow generation is AUTOMATIC, written by ODI directly!
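The contrast between hand-drawn procedural flows and a generated flow can be made concrete with a toy generator: the developer supplies only a declarative description of the mapping, and the tool writes the set-based SQL. This is a hypothetical sketch, not ODI's actual metadata format or code generator; every name in the mapping dictionary is invented:

```python
# Toy declarative-mapping-to-SQL generator: the "what" is declared as data,
# the "how" (the flow) is emitted automatically.
def generate_insert_select(mapping):
    """Render one INSERT..SELECT statement from a declarative mapping."""
    cols = ", ".join(mapping["columns"])
    exprs = ", ".join(mapping["expressions"])
    sql = (f"INSERT INTO {mapping['target']} ({cols}) "
           f"SELECT {exprs} FROM {mapping['source']}")
    if mapping.get("filter"):
        sql += f" WHERE {mapping['filter']}"
    if mapping.get("group_by"):
        sql += f" GROUP BY {mapping['group_by']}"
    return sql

# Hypothetical sample metadata, loosely mirroring the AGG_SALES example
# on the slide.
mapping = {
    "target": "AGG_SALES",
    "columns": ["PRODUCT_ID", "TOTAL_AMOUNT"],
    "source": "ORDERS JOIN ORDER_LINES ON ORDERS.ID = ORDER_LINES.ORDER_ID",
    "expressions": ["ORDER_LINES.PRODUCT_ID", "SUM(ORDER_LINES.AMOUNT)"],
    "filter": "ORDERS.STATUS = 'SHIPPED'",
    "group_by": "ORDER_LINES.PRODUCT_ID",
}
print(generate_insert_select(mapping))
```

The point of the sketch: the joins, filters, and aggregates live in one declaration, so changing the flow means changing data, not redrawing boxes and arrows.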
16. 16
You describe how the relational infrastructure where ODI runs is laid out, and ODI builds the flow for a specific load automatically.
Topology Module in ODI
The Topology module describes all the information about the technologies on which the E-LT projects run: the definitions of the specific technologies in use, the physical details of how to access each server (which user and password to connect with), and which schemas, users, or databases are involved in the jobs. The developer works only with a logical reference to the servers.
17. 17
Pluggable Knowledge Modules Architecture
[Slide diagram: each step of an integration flow is handled by a Knowledge Module type:]
• Reverse: reverse-engineer metadata
• Journalize: read from CDC sources
• Load: from sources to staging tables
• Check: constraints before load (rejects routed to error tables)
• Integrate: transform and move to target tables
• Service: expose data and transformation services as web services
Declarative Mapping + Knowledge Modules = Generated Code
• 120+ KMs out of the box (e.g., SAP/R3, Siebel, DB2 Export/Import, JMS queues, Oracle SQL*Loader, Teradata TPump/MultiLoad, Type II SCD, Oracle MERGE, Oracle Web Services): tailor to existing best practices, ease administration work, reduce cost of ownership
• Customizable and extensible
[Diagram: the KM interpreter combines a KM's meta code with metadata to produce the executed code.]
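The "KM's meta code + metadata = executed code" idea on this slide is, at its core, template substitution. The sketch below illustrates that with Python's string.Template; the template text, table names, and column list are invented for illustration and are far simpler than a real Knowledge Module:

```python
# Hedged sketch of the Knowledge Module pattern: a reusable code template
# plus model metadata yields the code that actually runs.
from string import Template

# "KM meta code": a generic integration step with placeholders.
km_template = Template(
    "INSERT INTO $target ($columns) SELECT $columns FROM $staging"
)

# Metadata as captured by reverse-engineering the model (sample values).
metadata = {
    "target": "TARGET_SALES",
    "staging": "STG_SALES",
    "columns": "ORDER_ID, AMOUNT",
}

# "Executed code": the statement generated for this specific mapping.
executed_code = km_template.substitute(metadata)
print(executed_code)
```

Swapping the template (say, from a plain INSERT to a MERGE-based Type II SCD load) changes the generated code for every mapping that uses it, which is why the slide stresses customizability.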
18. 18
Technical and business metadata: jobs, their transformations, schedules, data definition language, and more are managed in a unique, centralized way.
Central monitoring and logging: job executions are verified and audited.
The graphical environment lets you describe jobs as complex as needed, assembled from simple steps in the same declarative-design style.
The E-LT agent writes auditing information for each job execution back to the repository, reporting the generated code, warnings, and any database errors that occur.
19. 19
Oracle Data Integration
The solution for enterprise-wide real-time data
[Slide: the same product-landscape diagram as slide 5 (ODI EE, Oracle GoldenGate, Enterprise Data Quality), shown again to introduce the Oracle GoldenGate section.]
20. 20
Oracle GoldenGate provides low-impact capture, routing, transformation,
and delivery of transactional data across heterogeneous environments in
real time
Key Differentiators:
Non-intrusive, low-impact, sub-second latency
Open, modular architecture - Supports
heterogeneous sources and targets
Maintains transactional integrity - Resilient
against interruptions and failures
Oracle GoldenGate Overview
Performance
Flexible and Extensible
Reliable
21. 21
Oracle GoldenGate Use Cases
Enterprise-wide Solution for Real-Time Data Needs
[Slide diagram: Oracle GoldenGate performs log-based, real-time change data capture from heterogeneous source systems and feeds:]
• Real-time BI: ODS and EDW targets (with ETL downstream)
• Query offloading: a fully active reporting database
• Data distribution: SOA/EDA and global data centers with fully active distributed databases
• Active-active high availability
• Zero-downtime migration and upgrades to a new DB/OS/HW/App
Outcomes: reduce costs, lower risks, achieve operational excellence.
22. 22
Advantages of Oracle GoldenGate Architecture
Reduced Overhead and TCO
• Captures once, delivers to many targets for different uses
• Non-invasive, log-based capture
• Moves only committed data, reducing bandwidth needs
High Performance with Reliability
• Sub-second latency even with high data volumes
• Preserves transaction integrity
• Ensures data recoverability
Flexibility and Ease of Use
• Decoupled, modular architecture
• Supports heterogeneous sources and targets, and different latency needs
• Coexists and integrates with ELT/ETL and messaging solutions
28. 28
How Oracle GoldenGate Works
[Slide diagram: Capture → Trail → Pump → (TCP/IP over LAN/WAN or the Internet) → Trail → Delivery, from source Oracle and non-Oracle database(s) to target Oracle and non-Oracle database(s). The flow can also run bi-directionally.]
• Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs.
• Trail: stages and queues data for routing.
• Pump: distributes data for routing to target(s).
• Route: data is compressed and encrypted for routing to target(s).
• Delivery: applies data with transaction integrity, transforming the data as required.
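The capture/trail/pump/delivery stages can be mimicked as a toy pipeline. This is plain illustrative Python, not GoldenGate code, and the log format is invented; the detail worth noticing is that only committed transactions ever reach the trail, which is why GoldenGate moves less data than log shipping:

```python
# Toy CDC pipeline in the spirit of capture -> trail -> pump -> delivery.
from collections import deque

# A fake transaction log; tx 2 never commits and must be filtered out.
log = [("begin", 1), ("insert", 1, {"id": 10}), ("commit", 1),
       ("begin", 2), ("insert", 2, {"id": 11})]

def capture(log):
    """Read the transaction log; emit only committed transactions."""
    open_tx, committed = {}, deque()
    for rec in log:
        op, tx = rec[0], rec[1]
        if op == "begin":
            open_tx[tx] = []
        elif op == "commit":
            committed.extend(open_tx.pop(tx))   # tx reaches the trail
        else:
            open_tx[tx].append(rec)             # buffered until commit
    return committed                            # the source trail

def pump(source_trail):
    """Route trail records toward the target (here: just forward them)."""
    return deque(source_trail)

def delivery(target_trail, target):
    """Apply records to the target store."""
    for op, tx, row in target_trail:
        if op == "insert":
            target.append(row)

target = []
delivery(pump(capture(log)), target)
print(target)  # only tx 1's row arrives; uncommitted tx 2 is dropped
```

In the real product each stage is a separate, decoupled process connected by persisted trail files, which is what makes the architecture resilient to interruptions between stages.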
29. 29
GoldenGate Checkpointing
• Capture, Pump, and Delivery each save their position to a checkpoint file so they can recover in case of failure.
[Slide diagram: the source database log holds interleaved transactions (begin/insert/update/delete/commit for TX 1-4). Capture tracks the start of the oldest open (uncommitted) transaction plus its current read position, and writes commit-ordered records to the source trail (TX 2, then TX 3). The Pump checkpoints its read position in the source trail and its write position in the commit-ordered target trail (here containing TX 2); Delivery checkpoints its read position in the target trail as it applies transactions to the target database.]
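Checkpoint-based recovery, the mechanism this slide describes, can be sketched as follows. The JSON file format, function names, and trail records below are invented for illustration, not GoldenGate's actual checkpoint format:

```python
# Sketch of checkpoint-based recovery: a delivery process records the last
# trail position it applied, so after a crash it resumes from the
# checkpoint instead of re-applying from the start.
import json
import os
import tempfile

trail = ["tx2-insert", "tx2-commit", "tx3-insert", "tx3-commit"]
ckpt_path = os.path.join(tempfile.mkdtemp(), "delivery.ckpt")

def save_checkpoint(position):
    with open(ckpt_path, "w") as f:
        json.dump({"position": position}, f)

def load_checkpoint():
    if not os.path.exists(ckpt_path):
        return 0
    with open(ckpt_path) as f:
        return json.load(f)["position"]

def deliver(trail, applied):
    pos = load_checkpoint()
    for i in range(pos, len(trail)):
        applied.append(trail[i])    # apply the record to the target
        save_checkpoint(i + 1)      # persist position after each apply

applied = []
deliver(trail[:2], applied)  # first run sees two records, then "crashes"
deliver(trail, applied)      # restart resumes at position 2: no re-apply
print(applied)  # ['tx2-insert', 'tx2-commit', 'tx3-insert', 'tx3-commit']
```

Each record is applied exactly once across the two runs, which is the recoverability guarantee the slide is illustrating.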
31. 31
Zero-Downtime Oracle Upgrade Implementation Steps
Example: 9i to 11g, cross-platform (9i on Solaris to 11g on Linux)
1. Start the Oracle GoldenGate Capture module on the 9i source.
2.-4. Perform the initial load of the new 11g target database (E-LT, flat files, JDBC, native database loaders, export/import of tablespaces, etc.).
5. Start the Oracle GoldenGate Delivery module at the target, with collision detection enabled.
6. Start Oracle GoldenGate Capture at the 11g target.
7. Start Oracle GoldenGate Delivery at the 9i source (the old source, kept for contingency).
32. 32
Databases O/S and Platforms
Oracle GoldenGate Capture:
Oracle
DB2 for v 9.7
Microsoft SQL Server
Sybase ASE
Teradata
Enscribe
SQL/MP
SQL/MX
MySQL
JMS message queues
Oracle GoldenGate Delivery:
All listed above, plus:
TimesTen, DB2 for iSeries
Exadata, Netezza, Greenplum, and HP
Neoview
Linux
Sun Solaris
Windows 2000, 2003, XP
HP NonStop
HP-UX
HP OpenVMS
IBM AIX
IBM z Series
zLinux
Oracle GoldenGate 11g: Heterogeneity
NEW
NEW
NEW
NEW