The document discusses Extract, Transform, Load (ETL) processes in data warehousing. It describes the three stages of ETL - extraction of data from source systems, transformation of data to prepare it for loading, and loading of data into the data warehouse. It provides details on different types of extraction, transformation techniques including selection, splitting/joining, conversion and enrichment, and loading strategies such as full refresh, incremental refresh, and trickle feed. Diagrams depict the ETL cycle and data movement from sources to the data warehouse.
Lecture 16
1. Ahsan Abdullah
Data Warehousing
Lecture-16
Extract Transform Load (ETL)
Virtual University of Pakistan

Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan1010@yahoo.com
3. Putting the pieces together

[Figure: data flow through the tiers of a data warehouse architecture]
- Data sources (Tier 0): operational databases, semistructured sources, www data, archived data
- Extract, Transform, Load (ETL)
- Data Warehouse Server (Tier 1): data warehouse, data marts, metadata
- OLAP Servers (Tier 2): MOLAP, ROLAP
- Clients (Tier 3): query/reporting tools, analysis, data mining, serving IT users and business users
4. The ETL Cycle

EXTRACT: The process of reading data from different sources.

TRANSFORM: The process of transforming the extracted data from its original state into a consistent state so that it can be placed into another database.

LOAD: The process of writing the data into the target database.

[Figure: the ETL cycle]
- EXTRACT from: MIS systems (Acct, HR), legacy systems, other indigenous applications (COBOL, VB, C++, Java), archived data, www data
- TRANSFORM and CLEANSE in temporary data storage
- LOAD into the data warehouse, which feeds OLAP
5. ETL Processing

ETL consists of independent yet interrelated steps:
- Extracts from source systems
- Data movement
- Data cleansing
- Data transformation
- Data loading
- Index maintenance
- Statistics collection

It is important to look at the big picture. Data acquisition time may also include backup: backup is a major task, since this is a DWH, not a cube.

Note: backup comes after the other elements, following statistics collection.
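The steps above can be sketched end to end as a toy pipeline. This is a minimal illustration, not a production design; the table and column names (`sales`, `fact_sales`) are hypothetical, and in-memory SQLite stands in for the source OLTP system and the warehouse.

```python
import sqlite3

def extract(src):
    """Extract: read rows from the (hypothetical) source OLTP table."""
    return src.execute("SELECT id, name, amount FROM sales").fetchall()

def cleanse(rows):
    """Cleanse: drop rows with missing amounts."""
    return [r for r in rows if r[2] is not None]

def transform(rows):
    """Transform: normalize names to a consistent upper-case form."""
    return [(i, n.strip().upper(), a) for i, n, a in rows]

def load(dwh, rows):
    """Load into the warehouse fact table, then maintain the index."""
    dwh.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    dwh.execute("CREATE INDEX IF NOT EXISTS ix_fact_name ON fact_sales(name)")
    dwh.commit()

# Stand-in source system with one dirty row (missing amount).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, " alice ", 10.0), (2, "bob", None), (3, "carol", 5.0)])

dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE fact_sales (id INTEGER, name TEXT, amount REAL)")
load(dwh, transform(cleanse(extract(src))))
print(dwh.execute("SELECT name FROM fact_sales ORDER BY id").fetchall())
# [('ALICE',), ('CAROL',)]
```

The dirty row is dropped at the cleansing step, so only two normalized rows reach the fact table.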
6. Overview of Data Extraction

- The first step of ETL, followed by many more.
- Source systems for extraction are typically OLTP systems.
- A very complex task, for a number of reasons:
  - Source systems are often very complex and poorly documented.
  - Data has to be extracted not once, but a number of times.
- The process design depends on:
  - Which extraction method to choose?
  - How to make the extracted data available for further processing?
7. Types of Data Extraction

Logical Extraction
- Full Extraction
- Incremental Extraction

Physical Extraction
- Online Extraction
- Offline Extraction
- Legacy vs. OLTP
8. Logical Data Extraction

Full Extraction
- The data is extracted completely from the source system.
- No need to keep track of changes.
- Source data is made available as-is, without any additional information.

Incremental Extraction
- Data is extracted after a well-defined point/event in time.
- A mechanism is used to reflect/record the temporal changes in data (column or table).
- Sometimes entire tables are off-loaded from the source system into the DWH.
- Can have significant performance impacts on the data warehouse server.
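A common way to record the "well-defined point in time" is a timestamp watermark: each run pulls only rows changed since the previous run. A minimal sketch, assuming a `customers` table with an `updated_at` change-tracking column (both names are hypothetical):

```python
import sqlite3

def incremental_extract(conn, last_watermark):
    """Pull only the rows changed since the previous extraction run."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ?",
        (last_watermark,),
    ).fetchall()
    # The new watermark is the latest change time seen in this batch.
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return rows, new_watermark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Ibrahim", "2024-01-01"),
    (2, "Abdullah", "2024-03-15"),
])

rows, wm = incremental_extract(conn, "2024-02-01")
print(len(rows), wm)   # 1 2024-03-15 -- only the row changed after the watermark
```

The returned watermark is persisted between runs; this is what makes repeated extraction cheap compared to a full extraction.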
9. Physical Data Extraction…

Online Extraction
- Data is extracted directly from the source system.
- May access source tables through an intermediate system.
- The intermediate system is usually similar to the source system.

Offline Extraction
- Data is NOT extracted directly from the source system; instead it is staged explicitly outside the original source system.
- The data is either already structured or was created by an extraction routine.
- Some of the prevalent structures are:
  - Flat files
  - Dump files
  - Redo and archive logs
  - Transportable tablespaces
10. Physical Data Extraction: Legacy vs. OLTP

- Data is moved from the source system.
- A copy is made of the source system data.
- A staging area is used for performance reasons.
15. Data Transformation Basic Tasks: Conversion (Example-1)

- Convert common data elements into a consistent form, e.g. name and address.
- Translate dissimilar codes into a standard code.

Name formats:

Field format                Field data
First-Family-Title          Muhammad Ibrahim Contractor
Family-Title-comma-First    Ibrahim Contractor, Muhammad
Family-comma-First-Title    Ibrahim, Muhammad Contractor

Code translation:

Natl. ID      NID
National ID   NID

One value, "Flat No. 2", recorded in many dissimilar forms that must all map to a single standard code:
F/NO-2, F-2, FL.NO.2, FL.2, FL/NO.2, FL-2, FLAT-2, FLAT#, FLAT,2, FLAT-NO-2, FL-NO.2, FLAT No. 2
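Such translations are typically table- or rule-driven. A minimal sketch; the canonical forms chosen here ("NID", "FLAT-2") follow the slide, but the rule itself is an illustrative assumption:

```python
import re

# Dissimilar codes mapped to one standard code.
CODE_MAP = {"natl. id": "NID", "national id": "NID"}

def standardize_code(value):
    """Translate a known dissimilar code to its standard form."""
    return CODE_MAP.get(value.strip().lower(), value)

def standardize_flat_no(value):
    """Collapse the many spellings of 'Flat No. 2' to one canonical form."""
    m = re.search(r"(\d+)", value)            # pull out the unit number
    return f"FLAT-{m.group(1)}" if m else value

variants = ["F/NO-2", "FL.NO.2", "FLAT No. 2", "FLAT-NO-2"]
print({standardize_flat_no(v) for v in variants})   # {'FLAT-2'}
print(standardize_code("National ID"))              # NID
```

A real conversion step would also handle variants with no digit at all (e.g. "FLAT#"), which this sketch passes through unchanged.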
16. Ahsan Abdullah
16
Data representation change
EBCIDIC to ASCII
Operating System Change
Mainframe (MVS) to UNIX
UNIX to NT or XP
Data type change
Program (Excel to Access), database format (FoxPro to
Access).
Character, numeric and date type.
Fixed and variable length.
Data Transformation Basic Tasks: ConversionData Transformation Basic Tasks: Conversion
Example-2Example-2
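Python ships codecs for common EBCDIC code pages, so a representation change like EBCDIC-to-ASCII can be a simple decode/encode round trip. A small sketch using `cp500` (IBM's EBCDIC International code page):

```python
# Encode a string into EBCDIC bytes (code page 500).
ebcdic_bytes = "ETL".encode("cp500")
print(ebcdic_bytes.hex())        # c5e3d3 -- not the ASCII bytes 45544c

# Converting the representation back is just a decode.
ascii_text = ebcdic_bytes.decode("cp500")
print(ascii_text)                # ETL
```

In practice the right code page depends on the mainframe's locale (cp037, cp500, cp1047, ...), so the codec name here is a stand-in.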
19. Data Transformation Basic Tasks: Enrichment (Example)

- Data elements are mapped from source tables and files to destination fact and dimension tables.
- Default values are used in the absence of source data.
- Fields are added for unique keys and time elements.

Input data:
HAJI MUHAMMAD IBRAHIM, GOVT. CONT.
K. S. ABDULLAH & BROTHERS,
MAMOOJI ROAD, ABDULLAH MANZIL
RAWALPINDI, Ph 67855

Parsed data:
First Name: HAJI MUHAMMAD
Family Name: IBRAHIM
Title: GOVT. CONT.
Firm: K. S. ABDULLAH & BROTHERS
Firm Location: ABDULLAH MANZIL
Road: MAMOOJI ROAD
Phone: 051-67855
City: RAWALPINDI
Code: 46200
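Note that the parsed record contains fields that were not in the input at all: the dial-code prefix "051-" and the postal code "46200" come from reference data keyed on the city. A rough sketch of that enrichment step; the lookup tables are illustrative assumptions, not real reference data:

```python
# Reference data used to enrich parsed records (assumed values).
DIAL_CODES = {"RAWALPINDI": "051"}
POSTAL_CODES = {"RAWALPINDI": "46200"}

def enrich(record):
    """Add a dial-code prefix and a postal code based on the parsed city."""
    city = record["city"]
    record["phone"] = f"{DIAL_CODES.get(city, '')}-{record['phone']}"
    record["code"] = POSTAL_CODES.get(city)   # None if the city is unknown
    return record

parsed = {"family_name": "IBRAHIM", "city": "RAWALPINDI", "phone": "67855"}
print(enrich(parsed))
# {'family_name': 'IBRAHIM', 'city': 'RAWALPINDI',
#  'phone': '051-67855', 'code': '46200'}
```

The parsing of the free-text name and address lines themselves is a much harder problem, usually handled by dedicated name/address standardization tools.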
20. Aspects of Data Loading Strategies

Need to look at:
- Data freshness
- System performance
- Data volatility

Data freshness
- Very fresh data: low update efficiency.
- Historical data: high update efficiency.
- Always trade-offs in the light of the goals.

System performance
- Availability of staging table space.
- Impact on query workload.

Data volatility
- Ratio of new to historical data.
- High percentages of data change (batch update).
21. Three Loading Strategies

Once we have transformed data, there are three primary loading strategies:
- Full data refresh, with BLOCK INSERT or "block slamming" into an empty table.
- Incremental data refresh, with BLOCK INSERT or "block slamming" into existing (populated) tables.
- Trickle/continuous feed, with constant data collection and loading using row-level insert and update operations.
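The three strategies can be contrasted in a few lines of SQL. A minimal sketch using in-memory SQLite (the `fact` table is hypothetical; the trickle case uses upsert syntax, which requires SQLite 3.24+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, val TEXT)")

def full_refresh(rows):
    """Empty the table, then block-insert the whole batch."""
    conn.execute("DELETE FROM fact")
    conn.executemany("INSERT INTO fact VALUES (?, ?)", rows)

def incremental_refresh(rows):
    """Block-insert new rows into the already-populated table."""
    conn.executemany("INSERT INTO fact VALUES (?, ?)", rows)

def trickle_feed(row):
    """Row-level insert-or-update as each record arrives."""
    conn.execute(
        "INSERT INTO fact VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET val = excluded.val", row)

full_refresh([(1, "a"), (2, "b")])
incremental_refresh([(3, "c")])
trickle_feed((1, "a2"))            # updates the existing row 1
print(conn.execute("SELECT * FROM fact ORDER BY id").fetchall())
# [(1, 'a2'), (2, 'b'), (3, 'c')]
```

Real warehouse loaders use dedicated bulk-load paths (which is what makes "block slamming" fast) rather than row-by-row INSERT statements, but the logical distinction between the three strategies is the same.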