Data Warehousing and Machine Learning
Tom Donoghue
Data Warehouse
• Business-user-friendly stories about past events (including near real time)
• Designed to support decision making
• Serves digested answers in grouped and aggregated form
• More meaningful to the business than raw operational data, and therefore more valuable
• Ingests data from disparate sources, which must be merged to enable business-friendly queries
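As a minimal sketch of the last two points, assume two hypothetical operational extracts (a CRM customer list and a billing table) that identify the same customers in different formats. The Python below conforms the keys, merges the sources and serves a grouped, aggregated answer. All names and figures are illustrative and do not come from the source.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems that use different
# field names and customer-ID formats for the same customers.
crm_customers = [
    {"cust_id": "C-001", "region": "North"},
    {"cust_id": "C-002", "region": "South"},
    {"cust_id": "C-003", "region": "North"},
]
billing_invoices = [
    {"customer": "001", "amount": 120.0},
    {"customer": "002", "amount": 80.0},
    {"customer": "001", "amount": 30.0},
    {"customer": "003", "amount": 50.0},
]

def conform_id(raw_id: str) -> str:
    """Map both source formats onto one conformed key ('C-001' and '001' -> '001')."""
    return raw_id[2:] if raw_id.startswith("C-") else raw_id

def revenue_by_region(customers, invoices):
    """Merge the two sources on the conformed key, then group by region and sum."""
    region_of = {conform_id(c["cust_id"]): c["region"] for c in customers}
    totals = defaultdict(float)
    for inv in invoices:
        region = region_of.get(conform_id(inv["customer"]), "Unknown")
        totals[region] += inv["amount"]
    return dict(totals)

print(revenue_by_region(crm_customers, billing_invoices))
```

The business user sees only the aggregated answer per region; the key-conforming step is the kind of merge work the warehouse does once so that every query does not have to repeat it.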
Data Warehouse Definition
• A consolidating bolt-on to existing operational systems
• Structured data associated with a specific user base and a specific set of
predefined business queries
• The data schema is predefined and structured to facilitate both regular and ad hoc queries
• Populating the data warehouse requires multiple ETL (extract, transform, load) processes designed in advance
• Halts the proliferation of reports
O'Leary (2014)
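The idea of a predefined schema serving both regular and ad hoc queries can be sketched with Python's built-in `sqlite3` module. The star schema below (one fact table keyed to two dimension tables), the table names and the figures are all hypothetical assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical star schema, fixed in advance: one fact table, two dimensions.
SCHEMA = """
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);
"""

def build_warehouse() -> sqlite3.Connection:
    """Create the schema in memory and load a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240101, 2024, 1), (20240401, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Books"), (2, "Games")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(20240101, 1, 100.0), (20240101, 2, 40.0),
                      (20240401, 1, 60.0), (20240401, 2, 90.0)])
    return conn

# A predefined (regular) business query: revenue by quarter.
REVENUE_BY_QUARTER = """
SELECT d.year, d.quarter, SUM(f.revenue)
FROM fact_sales f JOIN dim_date d USING (date_key)
GROUP BY d.year, d.quarter ORDER BY d.year, d.quarter
"""

# An ad hoc question asked against the same, unchanged schema.
REVENUE_BY_CATEGORY = """
SELECT p.category, SUM(f.revenue)
FROM fact_sales f JOIN dim_product p USING (product_key)
GROUP BY p.category ORDER BY p.category
"""

conn = build_warehouse()
print(conn.execute(REVENUE_BY_QUARTER).fetchall())
print(conn.execute(REVENUE_BY_CATEGORY).fetchall())
```

Because the structure is settled before any data arrives, new questions usually become new `SELECT` statements rather than changes to the schema.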
Data Warehouse Basic Architecture
[Figure: Operational Data Sources (three Source Data stores) → Data Preparation (ETL Staging Area) → Data Warehouse → Business Queries (Business Users)]
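The flow in this architecture can be sketched as three small functions. This is an illustrative Python outline under assumed inputs: the source names, field names and cleansing rule are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows from three operational sources; each uses its own format.
SOURCES = {
    "shop":   [{"sku": "b1", "qty": "2", "price": "10.00"}],
    "web":    [{"SKU": "B1", "quantity": 1, "unit_price": 10.0}],
    "outlet": [{"sku": "g7", "qty": "3", "price": "5.50"},
               {"sku": None, "qty": "1", "price": "1.00"}],  # incomplete row
}

def extract(sources):
    """Copy raw rows into a staging area, tagging each with its source system."""
    return [(name, row) for name, rows in sources.items() for row in rows]

def transform(staged):
    """Conform field names and types; drop rows that fail a basic completeness check."""
    clean = []
    for source, row in staged:
        sku = row.get("sku") or row.get("SKU")
        qty = row.get("qty", row.get("quantity"))
        price = row.get("price", row.get("unit_price"))
        if not sku or qty is None or price is None:
            continue  # a real pipeline would log rejected rows for review
        clean.append((source, sku.upper(), int(qty), float(price)))
    return clean

def load(rows, conn):
    """Write conformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(source TEXT, sku TEXT, qty INTEGER, unit_price REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCES)), conn)

# Business queries read from the warehouse, not from the operational systems.
print(conn.execute("SELECT sku, SUM(qty * unit_price) FROM fact_sales "
                   "GROUP BY sku ORDER BY sku").fetchall())
```

Keeping the staging step separate means the operational systems are read once per load, and any cleansing rule lives in one place rather than in every report.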
Data Warehouse Requirements
• Organisational data is easy to access
• Information is presented consistently
• Adaptive and resilient to change
• Secure
• Serves as a base for improved decision making
• Accepted by the business community
(Kimball, 2002)
Machine Learning
• A data warehouse provides historical information for decision making
• Machine learning uses algorithms that process features in the data to learn patterns and make predictions about outcomes
• Typical tasks: image recognition, classification, forecasting, anomaly detection
• Learning is either supervised (training data labelled with the desired outcome) or unsupervised (unlabelled data; the model finds structure unaided)
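To make "learning from historical warehouse data" concrete, here is a deliberately simple anomaly-detection sketch. It "learns" normal behaviour as the mean and standard deviation of past values and flags new values whose z-score exceeds a threshold. This is a basic statistical technique rather than one of the algorithms named in this deck; the data and the threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Learn a simple model of normal behaviour: the mean and sample standard deviation."""
    return mean(history), stdev(history)

def is_anomaly(value, baseline, threshold=3.0):
    """Flag a value whose z-score exceeds the (assumed) threshold."""
    mu, sigma = baseline
    return abs(value - mu) / sigma > threshold

# Hypothetical daily order counts pulled from the warehouse.
history = [100, 102, 98, 101, 99, 103, 97, 100]
baseline = fit_baseline(history)
print([v for v in [101, 140, 96] if is_anomaly(v, baseline)])
```

The threshold trades missed anomalies against false alarms; in practice it would be tuned against known incidents.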
Machine Learning - Supervised
• A predictive model is trained on a labelled training data set and its performance is evaluated
• The model is tuned to improve performance
• The model is then run against a separate test data set whose labels are withheld from the model; its predictions are compared with those labels to evaluate how well it identifies the correct label
• Examples:
• k-Nearest Neighbours
• Linear and Logistic Regression
• Decision Trees
• Support Vector Machines
(Lantz, 2015)
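k-Nearest Neighbours, the first example listed, is small enough to sketch in plain Python. The model is trained on labelled data, then scored on test points whose labels are used only for evaluation. The two-feature data and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Share of test points whose withheld label the model predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Hypothetical labelled data: (features, label).
train = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
         ((8.0, 8.0), "high"), ((8.5, 9.0), "high"), ((9.0, 8.0), "high")]
test = [((1.2, 1.4), "low"), ((8.8, 8.6), "high"), ((6.0, 6.0), "high")]

print(accuracy(train, test, k=3))
```

k-NN has no separate fitting step: "training" is storing the labelled examples, and all the work happens at prediction time.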
Machine Learning - Unsupervised
• The training data set is unlabelled
• A descriptive model is trained to uncover structure in the data and then evaluated on its performance
• Examples:
• Clustering - k-Means
• Association Rules
• Natural Language Processing
(Lantz, 2015)
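k-Means, the clustering example above, can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The unlabelled points, the initial centroids (simply the first and last points) and the fixed iteration count are assumptions; library implementations such as scikit-learn's `KMeans` use smarter initialisation (k-means++ by default) and a convergence test.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with a fixed number of iterations."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabelled data with two natural groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [points[0], points[-1]])
print(centroids)
```

No labels are supplied: the two groups emerge from distances alone, which is what makes this a descriptive rather than a predictive model.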
Machine Learning: An Extension to Data Warehousing
• Much of the hard work of cleansing and transforming the data has already been done
• Ask the business question: what is the objective? Is it descriptive or predictive?
• Does the data contain the desired features?
• Is further data transformation required?
• Which ML algorithm is best suited to answering the question?
• Take an iterative approach, assessing and evaluating model performance
• Present the solution
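The iterative assess-and-evaluate step can be sketched as model selection: try several settings, score each on held-out validation data, and keep the best. The sketch below reuses a minimal k-NN classifier and compares hypothetical candidate values of k on one-feature data containing a deliberately mislabelled point. In practice a separate test set, untouched during selection, would give the final performance estimate.

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Majority vote among the k nearest labelled neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of points in `data` whose label is predicted correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

def select_k(train, validation, candidates):
    """Score each candidate k on validation data; keep the best (first wins ties)."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical one-feature data; (4.0,) is a deliberately mislabelled, noisy point.
train = [((1.0,), "low"), ((2.0,), "low"), ((3.0,), "low"),
         ((7.0,), "high"), ((8.0,), "high"), ((9.0,), "high"),
         ((4.0,), "high")]
validation = [((4.2,), "low"), ((3.8,), "low"), ((8.5,), "high")]

print(select_k(train, validation, [1, 3, 5]))
```

Here k = 1 is misled by the noisy point while larger k votes it down, which is the kind of difference the iterative evaluation is meant to surface.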
References
• Kimball, R., Ross, M., Thornthwaite, W., Mundy, J. and Becker, B. (2008) The Data Warehouse Lifecycle Toolkit. 2nd edn. Indianapolis: Wiley Publishing, Inc.
• Lantz, B. (2015) Machine Learning with R. 2nd edn. Birmingham: Packt.
• O'Leary, D. E. (2014) ‘Embedding AI and Crowdsourcing in the Big Data Lake’, IEEE Intelligent Systems, 29(5), pp. 70-73.