Data warehousing is an architectural model that gathers data from various sources into a single unified data model for analysis purposes. It consists of extracting data from operational systems, transforming it, and loading it into a database optimized for querying and analysis. This allows organizations to integrate data from different sources, maintain historical views of data, and perform flexible analysis without impacting transaction systems. While implementing and maintaining a data warehouse carries significant costs, the benefits include a single access point for all organizational data and systems optimized for analysis and decision making.
2. Introduction
• The Past and The Problem
• What is a Data Warehouse?
• Components of a Data Warehouse
• OLAP, Metadata, Data Mining
• Getting the Data in
• Benefits vs. Costs
• Conclusion & Questions
3. The Past and The Problem
• Only had scattered transactional systems in the organization – data spread among different systems
• Transactional systems were not designed for decision support analysis
• Data constantly changes on transactional systems
• Lack of historical data
• Often resources were taxed with both needs on the same systems
4. The Past and The Problem
• Operational databases are designed to record transactions from daily operations; they are optimized to efficiently update or create individual records
• A database for analysis, on the other hand, needs to be geared toward flexible requests or queries (ad hoc, statistical analysis)
5. What is a Data Warehouse?
Data warehousing is an architectural model designed to gather data from various sources into a single unified data model for analysis purposes.
6. What Is a Data Warehouse?
The term was introduced in 1990 by William Inmon.
A managed database in which the data is:
• Subject Oriented
• Integrated
• Time Variant
• Non Volatile
7. Subject Oriented
• Organized around major subject areas in the enterprise (Sales, Inventory, Financial, etc.)
• Only includes data which is used in the decision making processes
• Elements used for transactional processing are removed
8. Integrated
• Data from different sources are brought together and consolidated
• The data is cleaned and made consistent
Example – bank systems using different codes for the same concept (see the sketch below):
Loan Department – COMM
Transactional System – C
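Below is a minimal Python sketch of this consolidation step; the record layouts, source names, and code mapping are hypothetical illustrations of the bank example above, not part of the presentation.

# Each source system encodes "commercial customer" differently.
loan_dept_records = [{"customer_id": 101, "cust_type": "COMM"}]
transactional_records = [{"customer_id": 102, "cust_type": "C"}]

# Conform every source-specific code to one warehouse-standard code.
CODE_MAP = {
    ("loan", "COMM"): "COMMERCIAL",
    ("txn", "C"): "COMMERCIAL",
}

def conform(records, source):
    for rec in records:
        yield {**rec, "cust_type": CODE_MAP[(source, rec["cust_type"])]}

integrated = list(conform(loan_dept_records, "loan")) + list(conform(transactional_records, "txn"))
print(integrated)  # both records now carry cust_type == "COMMERCIAL"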
9. Time Variant
• Data in a Data Warehouse contains both current and historical information
• Operational Systems contain only current data
Systems typically retain data:
Operational Systems – 60 to 90 days
Data Warehouse – 5 to 10 years
10. Non Volatile
• Operational systems have continually changing data
• Data Warehouses continually absorb current data and integrate it with existing data (aggregate or summary tables)
An example of volatile data would be an account balance at a bank (see the sketch below).
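A minimal sketch of the difference, assuming hypothetical table layouts: the operational system overwrites the balance in place, while the warehouse appends dated snapshots and never updates them.

from datetime import date

# Operational system: one row per account, updated in place (volatile).
operational = {"acct_1": 500.00}
operational["acct_1"] = 350.00  # a withdrawal overwrites the old balance

# Warehouse: one row per account per day, appended and never changed (non-volatile).
warehouse = [
    {"acct": "acct_1", "as_of": date(2024, 6, 1), "balance": 500.00},
    {"acct": "acct_1", "as_of": date(2024, 6, 2), "balance": 350.00},
]

# History is preserved, so analysis can track the balance over time.
for row in warehouse:
    print(row["as_of"], row["balance"])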
11. What Is a Data Warehouse?
• Not a product – it is a process
• A combination of hardware and software
• The concept of a Data Warehouse is not new, but the technology that allows it is
12. What Is a Data Warehouse?
Can often be set up as one VLDB (Very Large Database) or a collection of subject areas called Data Marts.
There are now tools which “unify” these Data Marts and make them appear as a single database.
13. What Is a Data Warehouse?
[Flow diagram] Transformation of Data to Information:
Transaction Processing → Cleansing & Normalization → Relational Warehouse → SQL Reporting → Exploration / Analysis
Along this flow, raw data becomes information.
14. Components of a Data Warehouse
Four general components:
• Hardware
• DBMS – Database Management System
• Front End Access Tools
• Other Tools
In all components scalability is vital. Scalability is the ability to grow as your data and processing needs increase.
15. Components of a Data Warehouse – Hardware
• Power – number of processors, memory, I/O bandwidth, and speed of the bus
• Availability – redundant equipment
• Disk Storage – speed and enough storage for the loaded data set
• Backup Solution – automated, able to allow for incremental backups and archiving older data
16. Components of a Data Warehouse – DBMS
• Physical storage capacity of the DBMS
• Loading, indexing, and processing speed
• Availability
• Handle your data needs
• Operational integrity, reliability, and manageability
17. Components of a Data Warehouse – Front End & Other Tools
• Query Tools (SQL & GUI based)
• Report Writers
• Metadata Repositories
• OLAP (Online Analytical Processing)
• Data Mining Products
18. Components of a Data Warehouse – Metadata Repositories
Metadata is data about data. Users and developers often need a way to find information on the data they use (see the sketch below). Information can include:
• Source system(s) of the data, contact information
• Related tables or subject areas
• Programs or processes which use the data
• Population rules (update or insert, and how often)
• Status of the Data Warehouse’s processing and condition
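As a rough illustration, a metadata repository entry might capture the items above in a structured record; this dataclass and its fields are a hypothetical sketch, not any specific product's schema.

from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    table_name: str
    source_systems: list       # where the data originates
    contact: str               # who to ask about the data
    related_tables: list = field(default_factory=list)
    population_rule: str = ""  # e.g. "insert nightly" or "update weekly"

sales = TableMetadata(
    table_name="fact_sales",
    source_systems=["POS system", "e-commerce"],
    contact="dw-team@example.com",
    related_tables=["dim_product", "dim_store"],
    population_rule="insert nightly at 02:00",
)
print(sales)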
19. Components of a Data Warehouse – OLAP Tools
OLAP – Online Analytical Processing. It works by aggregating detail data and viewing it by dimensions (see the sketch below).
• Gives the ability to “drill down” into the detail data
• Decision support analysis tool
• Multidimensional DB focusing on retrieval of precalculated data
• Ends the “big reports” with large amounts of detailed data
• These tools are often graphical and can run on a “thin client” such as a web browser
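A minimal sketch of roll-up and drill-down over a tiny fact table. The dimensions follow the grill-sales example in the speaker notes (Organization, Time, Product); the sales figures are hypothetical.

from collections import defaultdict

facts = [
    {"org": "East", "year": 2000, "product": "Gas Grills", "sales": 120},
    {"org": "East", "year": 2000, "product": "Charcoal Grills", "sales": 80},
    {"org": "West", "year": 2000, "product": "Gas Grills", "sales": 150},
    {"org": "West", "year": 2001, "product": "Gas Grills", "sales": 170},
]

def roll_up(rows, dims):
    # Aggregate (consolidate) sales up to the requested dimensions.
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)

print(roll_up(facts, ["year"]))         # rolled up: total sales per year
print(roll_up(facts, ["year", "org"]))  # drilled down: per year and region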
20. Components of a Data Warehouse – Data Mining
• Answers the questions you didn’t know to ask
• Analyzes great amounts of data (usually contained in a Data Warehouse) and looks for trends in the data
• Technology now allows us to do this better than in the past
21. Components of a Data Warehouse – Data Mining
• Most famous example is the Huggies–Heineken case
• Used in the retail sector to analyze buying habits
• Used in financial areas to detect fraud
• Used in the stock market to find trends
• Used in scientific research
• Used in national security
22. Getting the Data In
• Data will come from multiple databases and files within the organization
• Can also come from outside sources
• Examples: weather reports, demographic information by ZIP code
23. Getting the Data In
Three steps:
1. Extraction Phase
2. Transformation Phase
3. Loading Phase
24. Getting the Data In
Extraction Phase:
• Source systems export data via files, or populate the warehouse directly when the databases can “talk” to each other
• The data is transferred to the Data Warehouse server and placed into some sort of staging area
25. Getting the Data In
Transformation Phase:
• Takes data and turns it into a form that is suitable for insertion into the warehouse (see the sketch below)
• Combines related data
• Removes redundancies
• Common codes (Commercial Customer)
• Spelling mistakes (Lozenges)
• Consistency (PA, Pa, Penna, Pennsylvania)
• Formatting (Addresses)
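A minimal sketch of this cleansing work, using the Pennsylvania consistency example above; the record layout and synonym table are hypothetical.

STATE_SYNONYMS = {"PA": "Pennsylvania", "Pa": "Pennsylvania",
                  "Penna": "Pennsylvania", "Pennsylvania": "Pennsylvania"}

raw_rows = [
    {"customer": "Acme", "state": "PA"},
    {"customer": "Acme", "state": "Penna"},  # same customer, redundant row
    {"customer": "Bmart", "state": "Pa"},
]

def transform(rows):
    seen = set()
    for row in rows:
        clean = {**row, "state": STATE_SYNONYMS[row["state"]]}  # consistency
        key = (clean["customer"], clean["state"])
        if key not in seen:  # remove redundancies
            seen.add(key)
            yield clean

print(list(transform(raw_rows)))  # two rows, both standardized to "Pennsylvania"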
26. Getting the Data In
Loading Phase:
• Places the cleaned data into the DBMS in its final, usable form
• Compares data from the source systems and the Data Warehouse (see the sketch below)
• Documents the load information for the users
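A minimal sketch of a load with a row-count check and a load record for the users, using Python's built-in sqlite3 as a stand-in DBMS; the table and names are hypothetical.

import sqlite3
from datetime import datetime

cleaned = [("Acme", "Pennsylvania"), ("Bmart", "Pennsylvania")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dim_customer (name TEXT, state TEXT)")
con.executemany("INSERT INTO dim_customer VALUES (?, ?)", cleaned)

# Compare data from the source extract and the Data Warehouse.
(loaded,) = con.execute("SELECT COUNT(*) FROM dim_customer").fetchone()
assert loaded == len(cleaned), "row counts differ between source and warehouse"

# Document the load information for the users.
print(f"{datetime.now():%Y-%m-%d %H:%M} loaded {loaded} rows into dim_customer")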
28. Benefits
• Creates a single access point for all data
• The system is optimized and designed specifically for analysis
• Access data without impacting the operational systems
• Users can access the data directly without direct help from the IT department
29. Costs
• Cost of implementation & maintenance (hardware, software, and staffing)
• Lack of compatibility between components
• Data from many sources are hard to combine; data integrity issues
• Bad designs and practices can lead to costly failures
30. Conclusion
• What is a Data Warehouse?
• Components of a Data Warehouse
• How the Data Gets In
• OLAP, Metadata, and Data Mining
• Benefits vs. Costs
Welcome
Honor to give back to PSU
Resume
Announce Informal Rules
Review Slide
Last Bullet - Resources being taxed is the lead in to the next slide
Operational Systems – Many Small Writes,
Data Warehouse – Heavy “Off” hour loads – Also can have a “trickle feed”
Go over part that is in pink. - The only thing for them to remember
Word “warehouse” – Reminds me of the end of Indiana Jones, when they had the Ark and put it in the large government warehouse never to be seen again. It implies storage.
Data Warehouse is not only for business. Military, Intelligence, and Scientific Needs:
9/11 & Human Genome Project
Lead in to next 4 slides
Examples of what is removed are control fields and timestamps needed for transactional processing.
Additionally, elements could be too small for analysis purposes – e.g., a retail environment recording the opening of a cash register drawer.
Example with Bank is Commercial Customers
Goal is for the data from the different sources to work seamlessly together.
Operational systems usually need to purge some of this data in order for lookups to be efficient.
Additional Hardware Costs as well
Depending on the time of day the balance can change. Difficult to make decisions.
DW – The data is usually checked and finalized. Once the data is finalized it should not change.
Corrections of errors do happen after the fact, but they should be communicated to the users.
Despite marketing claims, can’t just buy a Data Warehouse
Collection of tools and processes
Improvement in Processing and Lower costs for this technology
Data Marts can be divided up by subject areas or geographically
Data Marts tend to be small departmental specific
Geographic division is an example of Distributed System.
Users don’t know where the data is coming from. In many cases, they don’t need to. However there should be documentation that the users can get to.
Give an example: 20,000 – is that data?
What does it mean? Sales In The West of BBQ Grills in 2000?
Now if I say Wilt Chamberlain – Something completely different comes to mind.
Data Warehousing is a step taking low level transactional data and turning it into useful information.
Growth is important, to switch to another solution can be very costly.
Data – More years of data from more subject areas
Processing – Load times, Number of users, Query or request times.
Power – Requires powerful machines, such as a mainframe or clusters of UNIX servers that can work together
Storage – Not only disk, there is tape and optical
Data Needs – No longer just values and text: searchable pictures, fingerprints, video, audio, documents. Someday DNA.
Operational integrity – Enforce its rules – Security and “Atomic Transactions”
Reliability - Can it recover from failure quickly and easily
Manageability – Ability to do the day to day tasks with little or no effort.
Last 3 we will go into more detail
Query Tools – Not uncommon to have different tools for different types of users – Power Users
Report Writers – Prepare and distribute to many different mediums (email, web, wireless, voice).
Can be electronic and centrally stored also.
Value of metadata is sometimes overlooked – cost of man-hours.
Column names alone are not enough. Field named “Date”
Organizations without this sometimes spend a lot of time researching this.
Changes in Data – Impact is Known
Example: White Board:
Dimensions represent the core components of a business plan and often relate to departmental functions. Typical standard dimensions are Time, Accounts, Product Line, Market, and Division. Dimensions are static in most databases; database dimensions rarely change over the life of the application.
Dimensions
Organization – East,West
Time – 2000, 2001
Product – Gas Grills, Charcoal Grills
Technology Will Only Improve Data Mining
Better connectivity is eliminating the need for raw files
Let users know what data got updated
Very High Maintenance:
Staffing & Roles
DBAs, Data Architects, Hardware/OS Specialists, Data Auditors, Support for Front End Tools, Documentation, Security