Unit 2: AI Project Cycle
Class IX
Unit – 2: AI Project Cycle
Sub-units:
• Introduction: Introduction to AI Project Cycle
• Problem Scoping: Understanding Problem Scoping and Sustainable Development Goals
• Data Acquisition: Simplifying Data Acquisition
• Data Exploration: Visualising Data
• Modelling: Developing AI Models
• Evaluation: Proper Testing of AI Model
Suppose you are making a card for your mother’s
birthday. What steps are you going to follow?
1. Look for some cool greeting card ideas from different sources. You might go online and
check out some videos, or you may ask someone who has knowledge about it.
2. After finalising the design, you would make a list of things that are required to make
this card.
3. You will check if you have the material with you or not. If not, you could go and get all
the items required, ready for use.
4. Once you have everything with you, you would start making the card.
5. If you make a mistake somewhere in the card which cannot be rectified, you will
discard it and start remaking it.
6. Once the greeting card is made, you would gift it to your mother.
AI Project Cycle
The AI Project Cycle provides us with an appropriate
framework which can lead us towards the goal of
our AI Project.
https://www.youtube.com/watch?v=EZdpZEMPTe0
https://www.youtube.com/watch?v=V7QzWen9Odk
Problem Scoping
Identifying a problem and having a vision to solve
it is what Problem Scoping is all about.
The 4Ws Problem Canvas
The 4Ws Problem canvas helps in identifying the key elements related to the problem.
Who? Stakeholders are the people who face this problem and would
benefit from the solution.
What? What is the nature of the problem, i.e., the issue or need to be addressed?
Where? What is the context or situation in which the problem arises?
Why? Why is this problem worth solving, and how would the solution benefit the stakeholders?
Problem Statement Template
Our [stakeholder(s)] ________________________ (Who)
have a problem that [issue, problem, need] ________________________ (What)
when/while [context, situation] ________________________ (Where)
An ideal solution would [benefit of solution for them] ________________________ (Why)
After filling in the 4Ws Problem Canvas, you need to summarize all the cards into
one template. The Problem Statement Template condenses all the key points into
a single template so that, whenever there is a need to look back at the basis of
the problem in the future, we can refer to the Problem Statement Template and
understand its key elements.
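The filled template can also be captured in a simple data structure. A minimal Python sketch (the stakeholders and problem details below are hypothetical examples, not from the text):

```python
# A hypothetical filled-in Problem Statement Template, one key per "W" card.
problem_statement = {
    "Who":   "daily commuters in our city",            # [stakeholder(s)]
    "What":  "waste time stuck in traffic jams",       # [issue, problem, need]
    "Where": "during morning and evening rush hours",  # [context, situation]
    "Why":   "would save time and reduce stress",      # [benefit of solution]
}

def summarise(ps):
    """Join the four cards into one readable problem statement."""
    return (f"Our {ps['Who']} have a problem that they {ps['What']} "
            f"{ps['Where']}. An ideal solution {ps['Why']}.")

print(summarise(problem_statement))
```

Filling the dictionary and calling `summarise` produces the one-line statement the template describes.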
Data Acquisition
What is data?
Data can be a piece of information or facts and statistics collected together for reference
or analysis. Whenever we want an AI project to be able to predict an output, we need to
train it first using data.
Example: If you want to make an artificially intelligent system which can predict the
salary of an employee based on his previous salaries, you would feed the data of his
previous salaries into the machine. This is the data with which the machine is
trained. Once it is ready, it will predict the next salary efficiently. The previous salary
data here is known as the Training Data, while the next salary prediction data set is known
as the Testing Data.
For any AI project to be efficient, the training data should be authentic and relevant to
the problem statement scoped.
Data Features
Data features refer to the type of data you want
to collect. In our previous example, data features
would be salary amount, increment percentage,
increment period, bonus, etc.
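Continuing the salary example, each record can be laid out with its data features as named fields. A small illustrative sketch (the numbers are made up):

```python
# Each row is one record; the keys are the data features we chose to collect.
records = [
    {"salary": 30000, "increment_pct": 10, "increment_period_months": 12, "bonus": 2000},
    {"salary": 33000, "increment_pct": 10, "increment_period_months": 12, "bonus": 2500},
    {"salary": 36300, "increment_pct": 12, "increment_period_months": 12, "bonus": 3000},
]

# The feature names are simply the keys common to every record.
features = sorted(records[0].keys())
print(features)  # ['bonus', 'increment_pct', 'increment_period_months', 'salary']
```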
Data Sources
Data Authenticity
Sometimes, you use the internet and try to acquire data for your project from
some random websites. Such data might not be authentic, as its accuracy
cannot be proved. Due to this, it becomes necessary to find a reliable source
of data from which authentic information can be taken. At the same
time, we should keep in mind that the data we collect is open-sourced
and not someone’s property. Extracting private data can be an offence. One of
the most reliable and authentic sources of information is the set of open-source
websites hosted by the government. These government portals have general
information collected in a suitable format which can be downloaded and used
wisely.
Some of the open-sourced Govt. portals are: data.gov.in, india.gov.in
Structural Classification
Classification of data can also be done on the basis of structure. Data from any source and in any form has a definite
structure; the only point of difference is the way the data is organised, i.e., whether or not it has been organised
according to some predefined rules or ideas.
Based on structures, data can be classified into three types:
Structured data: This type of data has a predefined data model, and it is organised in a predefined manner. Earlier,
structures of data were quite simple, and they were often known before the data model was designed, and therefore,
data was generally stored in a tabular form of relational databases. Train schedules, mark sheets of students from a
particular class are some of the common examples of this form of data.
Unstructured data: This type of data does not have any predefined structure. It can take any form. Most of the data
in the world exists in this form. Videos, audio files, presentations, emails, documents, etc. are the best examples of
unstructured data.
Semi-structured data: This type of data has the qualities of both structured as well as
unstructured data. This data is not organised logically. Nevertheless, it has some sort of markers and
tags that give it an identifiable structure.
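A common example of a semi-structured format is JSON: the data is not a rigid table, but its markers ("keys") give it an identifiable shape. A small sketch using Python's standard library:

```python
import json

# Semi-structured data: no fixed table layout, but keys and brackets act as
# the markers/tags that give it structure.
raw = '{"name": "Asha", "marks": {"maths": 92, "science": 88}, "hobbies": ["chess", "music"]}'

record = json.loads(raw)            # parse the markers into a Python object
print(record["name"])               # Asha
print(record["marks"]["maths"])     # 92
```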
Other data classification
Time-stamped data: This type of data has a time order in it, which defines its sequence. The time order can be
according to some event time, i.e., when the data was collected or processed. This data acquires real meaning with
behavioural data, as it helps in forming an accurate representation of actions over time. It assists
scientists in making predictions with next-best-action style models.
Machine data: Systems or programs mechanically generate this type of data. Lists of phone calls made, logging
details on a computer, data in emails, etc. are some examples of machine data. The importance of this data lies
in the fact that it contains valuable real-time, time-stamped records of user behaviour, activities, or actions.
Spatiotemporal data: This type of data has both location and time information, i.e., the time when an event was
captured or collected along with the location where the event capture or collection took place.
Open data: This type of data is freely available for everyone to use. Open data is not restricted through copyrights,
patents, control, etc.
Real-time data: This type of data is available as soon as an event takes place.
BIG DATA
The term Big Data refers to data that does not fit into standard relational databases,
such as Oracle, SQL Server, MS Access, etc. The amount of big data is so large that
traditional databases are unable to capture, manage, and process it.
Basic features of big data
 It is continuously created by humans and machines.
 It includes structured, semi-structured, and unstructured data.
 It is collected from varying sources.
 Its size can vary from a few terabytes to zettabytes.
https://www.youtube.com/watch?v=bAyrObl7TYE
Importance of Big Data
Big data is integral from the point of view of AI.
 Machine learning depends on big data.
 Evaluation of big data allows us to identify patterns. It helps us understand the reasons for the
sequencing of certain things.
 Big data can help in making predictions and forming plans based on such predictions.
 Big data is used for finding answers, solving problems, and achieving goals.
 Big data helps in finding better and more complete answers.
 The more complete answers help in finding better and varying approaches for dealing with the same
problems.
Three Parameters of Big Data
Big data is generally defined based on three parameters. These parameters are also known as the three V’s of Big Data.
Volume: Big Data is characterised by high volumes of low-density, unstructured data (though it can also contain
structured and semi-structured data), for example, a Twitter data feed or data from sensor-enabled equipment. The
volume of this data can range from terabytes to petabytes.
Velocity: Velocity means the speed at which the data is received. This can range from traditional batches of data to
real-time data. These days, the internet-enabled devices, especially the Internet of Things, have made it possible to
receive large volumes of real-time data.
Variety: Traditionally, the data was structured data, i.e., it was possible to fit this data neatly into relational databases.
The big data, on the other hand, is primarily made up of unstructured and semi-structured data, which requires
additional processing before it can be used.
Data Exploration
Data Exploration refers to techniques and tools that are used for identifying important patterns and trends.
To analyse the data, you need to visualise it in some user-friendly format so that you can:
• Quickly get a sense of the trends, relationships and patterns contained within the data.
• Define strategy for which model to use at a later stage.
• Communicate the same to others effectively. To visualise data, we can use various types of visual
representations.
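As a minimal illustration of visual representation using only the standard library, even a tiny text-based bar chart can reveal a trend at a glance (real projects would use the tools listed later; the numbers here are made up):

```python
# A toy text bar chart: one '#' per unit of value, to spot a trend quickly.
monthly_sales = {"Jan": 4, "Feb": 7, "Mar": 10, "Apr": 8}  # made-up data

def bar_chart(data):
    lines = []
    for label, value in data.items():
        lines.append(f"{label}: {'#' * value} ({value})")
    return "\n".join(lines)

print(bar_chart(monthly_sales))
```

The rising-then-falling bar lengths make the pattern visible without reading a single number.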
https://www.youtube.com/watch?v=QE7bC9kLptk
Types of Weak AI Systems
For a better understanding of Data Exploration, we need to understand the types of weak AI
systems that are commonly used today:
1. Heuristics or rule-based: User-defined rules are used in these systems for making selections, e.g. a
system that groups individuals on a demographic basis.
2. Brute force: These systems use decision trees for analysing every possible option. AI-based
chess games use these systems for analysing every possible move to find the best approach. A
brute force system can only be effective at one thing.
3. Neural networks: These systems are designed to mimic our brain. This is also known as Deep
Learning, since there is a depth of layers in the network. Such systems can improve their learning
based on data feedback. They need a substantial amount of data for learning, and
learning one thing does not help them learn other things.
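A heuristic or rule-based system of the kind described in point 1 can be sketched as nothing more than user-defined conditions, e.g. grouping people by age band (the bands below are illustrative, not from the text):

```python
# Rule-based grouping on a demographic attribute (age), using fixed rules.
def age_group(age):
    if age < 13:
        return "child"
    elif age < 20:
        return "teenager"
    elif age < 60:
        return "adult"
    else:
        return "senior"

print([age_group(a) for a in [8, 15, 35, 70]])
# ['child', 'teenager', 'adult', 'senior']
```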
Visualising Data
Data visualisation is a standard term that we can use for any graphic that helps us in understanding or
getting new insights from the data. The two most basic Data Visualisation forms are graphs and charts.
The visualisation becomes difficult when the data we receive is complex. The first step in simplifying
this complexity is segregating the parameters from the features of these parameters: the
parameters are the elements of the data, and the features are the characteristics of
these elements.
The second step is evaluating the different parameters and finding out which
parameters have the most impact on the project goals or objectives. This is called a
classification strategy. It is similar to the human brain, which can focus on only a few of the
essential parameters while handling multiple classification tasks subconsciously.
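One simple way to rank parameters by their impact on a goal is to measure each one's correlation with that goal. A sketch with made-up numbers, computing the Pearson correlation by hand (this is just one possible measure of "impact", not the only one):

```python
# Rank each parameter by the strength of its Pearson correlation with the target.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

target = [10, 20, 30, 40]                   # the project goal (made up)
params = {
    "hours_studied": [1, 2, 3, 4],          # moves perfectly in step with the target
    "shoe_size":     [7, 7, 8, 7],          # mostly noise
}

ranked = sorted(params, key=lambda p: abs(pearson(params[p], target)), reverse=True)
print(ranked)  # 'hours_studied' ranks first
```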
https://www.highcharts.com/
https://www.tableau.com/
https://www.youtube.com/watch?v=YaGqOPxHFkc
Data Visualisation Tools
Microsoft Excel: Microsoft Excel provides a variety of graphs and smart objects for data visualisation. Nevertheless, it is
limited by the fact that it requires structured data to work with; it will not work with big data.
Tableau: This software is regarded as the grandmaster of data visualisation. It has a customer base of nearly 60,000
accounts at the last count. It is simple to use and can create interactive visualisations unmatched by its rivals. It
is capable of handling large, frequently updated data sets.
QlikView: This is the second major player in the data visualisation market. It boasts of more than 40,000 customers in more
than 100 countries. QlikView is known for its capability of customisation and extensive features.
FusionCharts: This is a charting and data visualising software, which uses JavaScript. It is capable of producing nearly 100
different types of charts.
Datawrapper: This is one of the favourites of the media industry. Its strength lies in its ability to use CSV data for creating
charts and maps.
Microsoft Power BI: Power BI is a data visualisation tool offered by Microsoft. It has a free version that can be downloaded
and used by anyone.
Google Data Studio: This is a data visualisation tool offered by Google, and it is a part of the Google Marketing Platform.
https://www.youtube.com/watch?v=MiiANxRHSv4
https://www.youtube.com/watch?v=YaGqOPxHFkc
Modelling
Modelling is the process in which different models based on the visualised data are
created and checked for their advantages and disadvantages.
• To make a machine learning model, there are two approaches: the Learning Based
Approach and the Rule Based Approach.
• The Learning Based Approach is based on machine learning, i.e., the machine gains
experience from the data fed to it.
• Machine Learning
• Machine learning is a subset of Artificial Intelligence (AI) which provides
machines the ability to learn automatically and improve from experience without
being explicitly programmed for it.
Training Set vs Testing Set
• Use: The Training Set is used for training the model; the Testing Set is used for testing the model after it is trained.
• Size: The Training Set is a lot bigger than the Testing Set and constitutes about 70% to 80% of the data; the Testing Set is smaller and constitutes about 20% to 30%.
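The 70–80% / 20–30% split can be sketched in a few lines; an 80/20 split on a shuffled toy dataset is assumed here:

```python
import random

# Split a dataset into ~80% training data and ~20% testing data.
data = list(range(100))        # 100 made-up records
random.seed(42)                # fixed seed so the split is repeatable
random.shuffle(data)           # shuffle so both sets are representative

cut = int(len(data) * 0.8)     # the 80% mark
training_set, testing_set = data[:cut], data[cut:]

print(len(training_set), len(testing_set))  # 80 20
```

Shuffling before cutting matters: without it, a sorted dataset would put all of one kind of record in the training set and another kind in the testing set.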
Rule Based Approach
Datasets
Dataset is a collection of related sets of Information that is composed of separate elements but can be manipulated
by a computer as a unit.
In the Rule Based Approach we deal with two divisions of the dataset:
1. Training Data: a subset used to train the model.
2. Testing Data: a subset used to test the trained model.
Rule Based Approach
Decision Tree
It is a rule-based AI model which helps the machine in predicting
what an element is with the help of various decisions (or rules) fed to it.
• The beginning point of any Decision Tree is known as its Root.
• It then forks into two different paths or conditions: Yes or No.
• The forks or diversions are known as the Branches of the tree.
• The branches either lead to another question or to a final
decision, which is known as a Leaf.
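A rule-based decision tree like the one described can be written directly as nested conditions. A toy sketch (the fruit-guessing rules are illustrative, not from the text):

```python
# A tiny hand-written decision tree: a root question, Yes/No branches,
# and leaf decisions.
def guess_fruit(is_round, is_red):
    if is_round:                # root: "Is it round?"
        if is_red:              # branch: "Is it red?"
            return "apple"      # leaf
        return "orange"         # leaf
    return "banana"             # leaf

print(guess_fruit(True, True))    # apple
print(guess_fruit(False, False))  # banana
```

Each `if` is a fork in the tree; each `return` is a leaf where the machine stops asking questions and makes its prediction.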
