ME-438
AI AND INTERNET OF THINGS
ELECTIVE COURSE
NED University of Engineering & Technology
THIS WEEK
• Data Acquisition in Machine Learning
• Data Acquisition Techniques and Tools
AI and Internet of Things
DR. HAIDER ALI 2
DATA ACQUISITION IN MACHINE LEARNING
DATA ACQUISITION
“Data acquisition is the process of sampling signals that
measure real-world physical conditions and converting the
resulting samples into digital numeric values that a computer
can manipulate.”
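The definition above can be illustrated with a minimal sketch of sampling and quantization. The sensor signal, the 0 to 5 V input range, the 8-bit resolution, and the function names are all assumptions made for illustration, not part of the lecture.

```python
import math

def sample_and_quantize(signal, sample_rate_hz, duration_s, n_bits, v_min, v_max):
    """Sample a continuous signal and map each sample to an integer code."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / (levels - 1)  # voltage represented by one code step
    n_samples = round(sample_rate_hz * duration_s)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling instant in seconds
        v = min(max(signal(t), v_min), v_max)  # clip to the converter's input range
        codes.append(round((v - v_min) / step))
    return codes

# Hypothetical sensor: a 5 Hz sine wave centered at 2.5 V with 2 V amplitude
def sensor_voltage(t):
    return 2.5 + 2.0 * math.sin(2 * math.pi * 5 * t)

codes = sample_and_quantize(sensor_voltage, sample_rate_hz=100, duration_s=0.2,
                            n_bits=8, v_min=0.0, v_max=5.0)
print(codes[:5])
```

Each integer in `codes` is a digital numeric value that a computer can store and process, which is the end product the definition describes.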
LIFE-CYCLE OF A MACHINE LEARNING PROJECT
The life-cycle of a Machine Learning project consists of the following stages:
1. Defining the project objective: Identifying the business problem, converting it into a
statistical problem, and then into an optimization problem.
2. Data Acquisition or Collection: Acquiring and merging the data from all the appropriate
sources.
3. Data Exploration and Pre-processing: Cleaning and preprocessing the data to make it
homogeneous, and performing exploratory data analysis and statistical analysis to understand the
relationships between the variables.
4. Feature Engineering: Creating new features based on empirical relationships and selecting
significant variables using dimensionality reduction techniques.
5. Model Building: Selecting the appropriate ML algorithms and training the model on the
dataset to identify the patterns.
6. Execution & Model Validation: Running the model and validating it, including
fine-tuning its parameters.
7. Deployment: Delivering the results of the ML process in a business-usable form. Models
are deployed to enterprise apps, systems, and data stores.
8. Interpretation, Data Visualization, and Documentation: Interpreting, visualizing, and
communicating the model insights, documenting the modeling process for reproducibility, and
creating the model monitoring and maintenance plan.
DATA ACQUISITION IN MACHINE LEARNING
• Collection and Integration of the data
• Formatting
• Labeling
COLLECTION AND INTEGRATION OF THE DATA
Data is extracted from various sources that are usually stored in different places, so
multiple datasets need to be combined before they can be used. The acquired data is
typically in raw format and not suitable for immediate consumption and analysis.
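A minimal sketch of combining two raw sources into one record per entity is shown here. The CSV export, the JSON feed, the field names, and the `integrate` helper are hypothetical and serve only to illustrate the idea.

```python
import csv
import io
import json

# Hypothetical raw sources: a CSV export and a JSON feed, both keyed by machine_id
csv_source = io.StringIO("machine_id,temperature_c\nM1,71.5\nM2,68.0\n")
json_source = (
    '[{"machine_id": "M1", "vibration_mm_s": 2.4},'
    ' {"machine_id": "M3", "vibration_mm_s": 0.9}]'
)

def integrate(csv_file, json_text, key="machine_id"):
    """Combine records from both sources into one record per key value."""
    merged = {}
    for row in csv.DictReader(csv_file):
        row["temperature_c"] = float(row["temperature_c"])  # CSV values arrive as strings
        merged.setdefault(row[key], {}).update(row)
    for row in json.loads(json_text):
        merged.setdefault(row[key], {}).update(row)
    return merged

records = integrate(csv_source, json_source)
for machine_id in sorted(records):
    print(records[machine_id])
```

Machines that appear in only one source end up with missing fields, which is one reason the combined data still needs cleaning before analysis.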
FORMATTING
• Prepare or organize the datasets as per the analysis requirements.
LABELING
• After gathering the data, it must be labeled. For
example, in a factory application, one would label the
images of components as defective or not defective.
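The factory example could be sketched as follows. The file names, the inspection log, and the label encoding (1 for defective, 0 for not defective) are assumptions made for illustration.

```python
# Hypothetical inspection log: component IDs that an inspector marked as defective
defective_ids = {"C002", "C004"}

# Hypothetical image files, each named after the component it shows
image_files = ["C001.png", "C002.png", "C003.png", "C004.png"]

def label_images(files, defective):
    """Return (filename, label) pairs, where 1 = defective and 0 = not defective."""
    labeled = []
    for name in files:
        component_id = name.rsplit(".", 1)[0]  # strip the file extension
        labeled.append((name, 1 if component_id in defective else 0))
    return labeled

labels = label_images(image_files, defective_ids)
for name, label in labels:
    print(name, label)
```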
THE DATA ACQUISITION PROCESS
The process of data acquisition involves searching for datasets that
can be used to train Machine Learning models. This is
not a simple task. There are various approaches to acquiring data; here they are
grouped into three main categories:
• Data Discovery
• Data Augmentation
• Data Generation
DATA DISCOVERY
The first approach to acquiring data is data discovery. It is a
key step when indexing, sharing, and searching for new
datasets available on the web and in data lakes.
It can be broken into two steps: sharing and searching.
First, the data must be labeled or indexed and published
for sharing, using one of the many collaborative systems available for
this purpose. The published datasets can then be searched.
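The sharing and searching steps can be sketched with a toy in-memory catalog. The `DatasetCatalog` class, its methods, and the dataset names are hypothetical; real collaborative systems are far more elaborate.

```python
from collections import defaultdict

class DatasetCatalog:
    """Toy catalog: datasets are published with tags, then found by keyword."""

    def __init__(self):
        self._index = defaultdict(set)  # keyword -> names of matching datasets
        self._datasets = {}             # name -> description

    def publish(self, name, description, tags):
        """Sharing step: register a dataset and index its tags and description words."""
        self._datasets[name] = description
        for word in list(tags) + description.split():
            self._index[word.lower()].add(name)

    def search(self, query):
        """Searching step: return datasets that match every word in the query."""
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(matches)

catalog = DatasetCatalog()
catalog.publish("bearing-vibration", "Vibration recordings from motor bearings",
                ["vibration", "motor"])
catalog.publish("weld-images", "Labeled images of weld defects", ["images", "defects"])
print(catalog.search("vibration motor"))
```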
DATA AUGMENTATION
The next approach to data acquisition is data
augmentation. To augment means to make something greater
by adding to it; in the context of data acquisition, we
enrich the existing data by adding more
external data. In deep learning and machine learning, it is common to use
pre-trained models and embeddings to increase the number of
features to train on.
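As a rough sketch of enriching existing records with external features, the example below attaches embedding values to each record. The three-dimensional embedding table, the field names, and the `augment` helper are invented for illustration; a real pre-trained embedding would be loaded from an external model.

```python
# Hypothetical pre-trained embeddings: word -> 3-dimensional vector
PRETRAINED = {
    "crack": [0.9, 0.1, 0.0],
    "scratch": [0.7, 0.3, 0.1],
    "ok": [0.0, 0.1, 0.9],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback vector for words missing from the table

def augment(records, embeddings, text_field="note"):
    """Return copies of the records with embedding values added as new features."""
    augmented = []
    for rec in records:
        vector = embeddings.get(rec[text_field].lower(), UNKNOWN)
        enriched = dict(rec)  # copy, so the original record is left untouched
        for i, value in enumerate(vector):
            enriched[f"emb_{i}"] = value
        augmented.append(enriched)
    return augmented

records = [{"part": "C001", "note": "Crack"}, {"part": "C002", "note": "dent"}]
rows = augment(records, PRETRAINED)
for row in rows:
    print(row)
```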
DATA GENERATION
As the name suggests, the data is generated. If we do not have enough data and
no external data is available, the option is to generate the datasets
manually or automatically.
Instead of collecting and labeling large datasets, there are several techniques
for generating synthetic data that has similar properties to real data. Synthetic
data has major advantages, including reduced cost, higher accuracy in data
labeling (because the labels in synthetic data are already known), scalability (it
is easy to create vast amounts of simulated data), and variety. Synthetic data
can be used to create data samples for edge cases that do not frequently occur
in the real world.
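A minimal sketch of automatic generation is given below, assuming a made-up vibration feature whose distribution differs between defective and non-defective parts. The distributions, rates, and names are illustrative assumptions, not measured values.

```python
import random

def generate_synthetic(n_samples, defect_rate=0.5, seed=0):
    """Generate labeled samples; each label is known because we choose it ourselves."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    samples = []
    for _ in range(n_samples):
        defective = rng.random() < defect_rate
        # Assumed distributions: defective parts vibrate more on average
        mean = 4.0 if defective else 1.5
        vibration = max(0.0, rng.gauss(mean, 0.5))
        samples.append({"vibration_mm_s": vibration, "label": int(defective)})
    return samples

data = generate_synthetic(1000, defect_rate=0.5)

# Edge cases that are rare in the real world can be generated in any quantity
rare = generate_synthetic(200, defect_rate=1.0, seed=1)
print(len(data), len(rare))
```

Because the generator assigns the label before drawing the feature, no separate labeling pass is needed, which is the labeling advantage described above.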
DATA ACQUISITION TECHNIQUES AND TOOLS
The major tools and techniques for data acquisition are:
1. Data Warehouses and ETL
2. Data Lakes and ELT
3. Cloud Data Warehouse providers
DATA WAREHOUSES AND ETL
A data warehouse is a type of database used for storing and managing
large amounts of data. It is designed to support querying and
analysis, and organizations often use it for business
intelligence and decision-making. Data warehouses typically store data
from multiple sources, such as operational databases, transactional systems,
and external sources. The data is usually brought in through an Extract, Transform,
and Load (ETL) process, in which it is transformed before it is loaded. The warehouse
can then execute complex queries efficiently, which allows organizations to gain insights into
their data and make informed decisions.
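ETL stands for Extract, Transform, Load: data is cleaned before it reaches the warehouse. The sketch below uses an in-memory SQLite database as a stand-in for a warehouse; the raw rows, table name, and function names are hypothetical.

```python
import sqlite3

# Hypothetical raw rows extracted from an operational system
raw_rows = [
    ("M1", "2024-01-01", " 71.5 "),
    ("M2", "2024-01-01", "n/a"),
    ("M1", "2024-01-02", "73.0"),
]

def extract():
    """Extract: read rows from the source system (here, an in-memory list)."""
    return list(raw_rows)

def transform(rows):
    """Transform: parse temperatures and drop readings that cannot be parsed."""
    clean = []
    for machine_id, day, temp in rows:
        try:
            clean.append((machine_id, day, float(temp.strip())))
        except ValueError:
            continue  # skip unparseable readings such as "n/a"
    return clean

def load(conn, rows):
    """Load: write the already-cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(machine_id TEXT, day TEXT, temperature_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
avg = warehouse.execute(
    "SELECT AVG(temperature_c) FROM readings WHERE machine_id = 'M1'"
).fetchone()[0]
print(avg)
```

Only cleaned, typed rows are stored, so analytical queries can run directly against the table.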
DATA LAKES AND ELT
A data lake is a storage repository with the capacity to store large amounts of
structured, semi-structured, and unstructured data. It can store
images, videos, audio recordings, and PDF files, and it allows faster ingestion
of new data.
Unlike data warehouses, data lakes store everything, are more flexible, and
follow the Extract, Load, and Transform (ELT) approach. The data is loaded first
and is transformed only when required, so it is processed
later as per the requirements.
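The ELT ordering can be sketched as follows, again with an in-memory SQLite database standing in for the storage layer. The table, the sample rows, and the cleaning rule in the query are assumptions made for illustration.

```python
import sqlite3

lake = sqlite3.connect(":memory:")

# Load: store the records exactly as received, with no cleaning
lake.execute("CREATE TABLE raw_readings (machine_id TEXT, day TEXT, temperature TEXT)")
lake.executemany(
    "INSERT INTO raw_readings VALUES (?, ?, ?)",
    [
        ("M1", "2024-01-01", " 71.5 "),
        ("M2", "2024-01-01", "n/a"),
        ("M1", "2024-01-02", "73.0"),
    ],
)

# Transform: applied later, at query time, only for the analysis that needs it
rows = lake.execute(
    """
    SELECT machine_id, AVG(CAST(TRIM(temperature) AS REAL))
    FROM raw_readings
    WHERE TRIM(temperature) GLOB '[0-9]*'
    GROUP BY machine_id
    """
).fetchall()
print(rows)
```

The raw table still holds every record, including the unparseable one, so a different analysis can apply a different transformation later.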
CLOUD DATA WAREHOUSE PROVIDERS
A cloud data warehouse is a cloud-hosted service that collects, organizes, and
stores data. Cloud data warehouses are quicker and cheaper to set up because
no physical hardware needs to be procured. Providers include:
• Amazon Redshift
• Snowflake
• Google BigQuery
• IBM Db2 Warehouse
• Microsoft Azure Synapse
• Oracle Autonomous Data Warehouse
• SAP Data Warehouse Cloud
• Yellowbrick Data
• Teradata Integrated Data Warehouse
THANK YOU
