Unit 2: AI Project Cycle
Class IX
Unit – 2: AI Project Cycle
Sub-units:
• Introduction: Introduction to AI Project Cycle
• Problem Scoping: Understanding Problem Scoping and Sustainable Development Goals
• Data Acquisition: Simplifying Data Acquisition
• Data Exploration: Visualising Data
• Modelling: Developing AI Models
• Evaluation: Proper Testing of AI Model
Suppose you are making a card for your mother’s
birthday. What steps are you going to follow?
1. Look for some cool greeting card ideas from different sources. You might go online and
check out some videos, or you may ask someone who has knowledge about it.
2. After finalising the design, you would make a list of things that are required to make
this card.
3. You will check if you have the material with you or not. If not, you could go and get all
the items required, ready for use.
4. Once you have everything with you, you would start making the card.
5. If you make a mistake somewhere in the card which cannot be rectified, you will
discard it and start remaking it.
6. Once the greeting card is made, you would gift it to your mother.
AI Project Cycle
The AI Project Cycle provides us with an appropriate
framework which can lead us towards the goal of
our AI Project.
https://www.youtube.com/watch?v=EZdpZEMPTe0
https://www.youtube.com/watch?v=V7QzWen9Odk
Problem Scoping
Identifying a problem and having a vision to solve
it is what Problem Scoping is all about.
The 4Ws Problem Canvas
The 4Ws Problem canvas helps in identifying the key elements related to the problem.
Who? Stakeholders are the people who face this problem and would
benefit from the solution.
What? What is the nature of the problem, i.e., the issue or need to be addressed?
Where? What is the context or situation in which the problem arises?
Why? Why is this problem worth solving, and how would the solution benefit the stakeholders?
Problem Statement Template
Our [stakeholder(s)] ________________________ (Who)
have a problem that [issue, problem, need] ________________________ (What)
when/while [context, situation] ________________________ (Where)
An ideal solution would [benefit of solution for them] ________________________ (Why)
After filling in the 4Ws Problem Canvas, you need to summarize all the cards into
one template. The Problem Statement Template condenses all the key points into
a single template so that, whenever there is a need to look back at the basis of
the problem in the future, we can refer to the Problem Statement Template and
understand its key elements.
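The filled template can also be captured in a simple data structure. A minimal Python sketch (the stakeholders and problem details below are hypothetical examples, not from the text):

```python
# A hypothetical filled-in Problem Statement Template, one key per "W" card.
problem_statement = {
    "Who":   "daily commuters in our city",            # [stakeholder(s)]
    "What":  "waste time stuck in traffic jams",       # [issue, problem, need]
    "Where": "during morning and evening rush hours",  # [context, situation]
    "Why":   "would save time and reduce stress",      # [benefit of solution]
}

def summarise(ps):
    """Join the four cards into one readable problem statement."""
    return (f"Our {ps['Who']} have a problem that they {ps['What']} "
            f"{ps['Where']}. An ideal solution {ps['Why']}.")

print(summarise(problem_statement))
```

Filling the dictionary and calling `summarise` produces the one-line statement the template describes.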
Data Acquisition
What is data?
Data can be a piece of information or facts and statistics collected together for reference
or analysis. Whenever we want an AI project to be able to predict an output, we need to
train it first using data.
Example: If you want to make an artificially intelligent system which can predict the
salary of an employee based on his previous salaries, you would feed the data of his
previous salaries into the machine. This is the data with which the machine is
trained. Once it is ready, it will predict the next salary efficiently. The previous salary
data here is known as the Training Data, while the next salary prediction data set is known
as the Testing Data.
For any AI project to be efficient, the training data should be authentic and relevant to
the problem statement scoped.
Data Features
Data features refer to the type of data you want
to collect. In our previous example, data features
would be salary amount, increment percentage,
increment period, bonus, etc.
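Continuing the salary example, each record can be laid out with its data features as named fields. A small illustrative sketch (the numbers are made up):

```python
# Each row is one record; the keys are the data features we chose to collect.
records = [
    {"salary": 30000, "increment_pct": 10, "increment_period_months": 12, "bonus": 2000},
    {"salary": 33000, "increment_pct": 10, "increment_period_months": 12, "bonus": 2500},
    {"salary": 36300, "increment_pct": 12, "increment_period_months": 12, "bonus": 3000},
]

# The feature names are simply the keys common to every record.
features = sorted(records[0].keys())
print(features)  # ['bonus', 'increment_pct', 'increment_period_months', 'salary']
```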
Data Sources
Data Authenticity
Sometimes, you use the internet and try to acquire data for your project from
some random websites. Such data might not be authentic, as its accuracy
cannot be proved. Due to this, it becomes necessary to find a reliable source
of data from which authentic information can be taken. At the same
time, we should keep in mind that the data we collect is open-sourced
and not someone’s property. Extracting private data can be an offence. One of
the most reliable and authentic sources of information is the set of open-source
websites hosted by the government. These government portals have general
information collected in a suitable format which can be downloaded and used
wisely.
Some of the open-sourced Govt. portals are: data.gov.in, india.gov.in
Structural Classification
Classification of data can also be done on the basis of structure. Data from any source and in any form has a definite
structure; the only point of difference is the way the data is organised, i.e., whether or not it has been organised
according to some predefined rules or ideas.
Based on structures, data can be classified into three types:
Structured data: This type of data has a predefined data model, and it is organised in a predefined manner. Earlier,
structures of data were quite simple, and they were often known before the data model was designed, and therefore,
data was generally stored in a tabular form of relational databases. Train schedules, mark sheets of students from a
particular class are some of the common examples of this form of data.
Unstructured data: This type of data does not have any predefined structure. It can take any form. Most of the data
in the world exists in this form. Videos, audio files, presentations, emails, documents, etc. are the best examples of
unstructured data.
Semi-structured data: This type of data has the qualities of both structured as well as
unstructured data. This data is not organised logically. Nevertheless, it has some sort of markers and
tags that give it an identifiable structure.
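A common example of a semi-structured format is JSON: the data is not a rigid table, but its markers ("keys") give it an identifiable shape. A small sketch using Python's standard library:

```python
import json

# Semi-structured data: no fixed table layout, but keys and brackets act as
# the markers/tags that give it structure.
raw = '{"name": "Asha", "marks": {"maths": 92, "science": 88}, "hobbies": ["chess", "music"]}'

record = json.loads(raw)            # parse the markers into a Python object
print(record["name"])               # Asha
print(record["marks"]["maths"])     # 92
```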
Other data classification
Time-stamped data: This type of data has a time order in it, which defines its sequence. The time order can be
according to some event time, i.e., when the data was collected or processed. This data acquires real meaning with
behavioural data, as it helps in forming an accurate representation of actions over time. It assists
scientists in making predictions with next-best-action style models.
Machine data: Systems or programs mechanically generate this type of data. Lists of phone calls made, logging
details on a computer, data in emails, etc. are some examples of machine data. The importance of this data lies
in the fact that it contains valuable real-time, time-stamped records of user behaviour, activities, or actions.
Spatiotemporal data: This type of data has both location and time information, i.e., the time when an event was
captured or collected along with the location where the event capture or collection took place.
Open data: This type of data is freely available for everyone to use. Open data is not restricted through copyrights,
patents, control, etc.
Real-time data: This type of data is available as soon as an event takes place.
BIG DATA
The term Big Data refers to data that does not fit into standard relational databases,
such as Oracle, SQL Server, MS Access, etc. The amount of big data is so large that
traditional databases are unable to capture, manage, and process it.
Basic features of big data
 It is continuously created by humans and machines.
 It includes structured, semi-structured, and unstructured data.
 It is collected from varying sources.
 Its size can vary from a few terabytes to zettabytes.
https://www.youtube.com/watch?v=bAyrObl7TYE
Importance of Big Data
Big data is integral from the point of view of AI.
 Machine learning depends on big data.
 Evaluation of big data allows us to identify patterns. It helps us understand the reasons for the
sequencing of certain things.
 Big data can help in making predictions and forming plans based on such predictions.
 Big data is used for finding answers, solving problems, and achieving goals.
 Big data helps in finding better and more complete answers.
 The more complete answers help in finding better and varying approaches for dealing with the same
problems.
Three Parameters of Big Data
Big data is generally defined based on three parameters. These parameters are also known as the three V’s of Big Data.
Volume: Big Data is characterised by high volumes of low-density, unstructured data (though it can also contain
structured and semi-structured data), for example, a Twitter data feed or data from sensor-enabled equipment. The
volume of this data can range from terabytes to petabytes.
Velocity: Velocity means the speed at which the data is received. This can range from traditional batches of data to
real-time data. These days, the internet-enabled devices, especially the Internet of Things, have made it possible to
receive large volumes of real-time data.
Variety: Traditionally, the data was structured data, i.e., it was possible to fit this data neatly into relational databases.
The big data, on the other hand, is primarily made up of unstructured and semi-structured data, which requires
additional processing before it can be used.
Data Exploration
Data Exploration refers to techniques and tools that are used for identifying important patterns and trends.
To analyse the data, you need to visualise it in some user-friendly format so that you can:
• Quickly get a sense of the trends, relationships and patterns contained within the data.
• Define strategy for which model to use at a later stage.
• Communicate the same to others effectively. To visualise data, we can use various types of visual
representations.
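As a minimal illustration of visual representation using only the standard library, even a tiny text-based bar chart can reveal a trend at a glance (real projects would use the tools listed later; the numbers here are made up):

```python
# A toy text bar chart: one '#' per unit of value, to spot a trend quickly.
monthly_sales = {"Jan": 4, "Feb": 7, "Mar": 10, "Apr": 8}  # made-up data

def bar_chart(data):
    lines = []
    for label, value in data.items():
        lines.append(f"{label}: {'#' * value} ({value})")
    return "\n".join(lines)

print(bar_chart(monthly_sales))
```

The rising-then-falling bar lengths make the pattern visible without reading a single number.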
https://www.youtube.com/watch?v=QE7bC9kLptk
Types of Weak AI Systems
For a better understanding of Data Exploration, we need to understand the types of weak AI
systems that are commonly used today:
1. Heuristics or rule-based: User-defined rules are used in these systems for making selections, e.g. a
system that groups individuals on a demographic basis.
2. Brute force: These systems use decision trees for analysing every possible option. AI-based
chess games use these systems for analysing every possible move to find the best approach. A
brute force system can only be effective at one thing.
3. Neural networks: These systems are designed to mimic our brain. This is also known as Deep
Learning, since there is a depth of layers in the network. Such systems can improve their learning
based on data feedback. They need a substantial amount of data for learning, and
learning one thing does not help them learn other things.
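A heuristic or rule-based system of the kind described in point 1 can be sketched as nothing more than user-defined conditions, e.g. grouping people by age band (the bands below are illustrative, not from the text):

```python
# Rule-based grouping on a demographic attribute (age), using fixed rules.
def age_group(age):
    if age < 13:
        return "child"
    elif age < 20:
        return "teenager"
    elif age < 60:
        return "adult"
    else:
        return "senior"

print([age_group(a) for a in [8, 15, 35, 70]])
# ['child', 'teenager', 'adult', 'senior']
```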
Visualising Data
Data visualisation is a standard term that we can use for any graphic that helps us in understanding or
getting new insights from the data. The two most basic Data Visualisation forms are graphs and charts.
The visualisation becomes difficult when the data we receive is complex. The first step in simplifying
this complexity is segregating the parameters from the features of these parameters: the
parameters are the elements of the data, and the features are the characteristics of
these elements.
The second step is evaluating the different parameters and finding out which
parameters have the most impact on the project goals or objectives. This is called a
classification strategy. It is similar to the human brain, which can focus on only a few of the
essential parameters while handling multiple classification tasks subconsciously.
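One simple way to rank parameters by their impact on a goal is to measure each one's correlation with that goal. A sketch with made-up numbers, computing the Pearson correlation by hand (this is just one possible measure of "impact", not the only one):

```python
# Rank each parameter by the strength of its Pearson correlation with the target.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

target = [10, 20, 30, 40]                   # the project goal (made up)
params = {
    "hours_studied": [1, 2, 3, 4],          # moves perfectly in step with the target
    "shoe_size":     [7, 7, 8, 7],          # mostly noise
}

ranked = sorted(params, key=lambda p: abs(pearson(params[p], target)), reverse=True)
print(ranked)  # 'hours_studied' ranks first
```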
https://www.highcharts.com/
https://www.tableau.com/
https://www.youtube.com/watch?v=YaGqOPxHFkc
Data Visualisation Tools
Microsoft Excel: Microsoft Excel provides a variety of graphs and smart objects for data visualisation. Nevertheless, it is
limited by the fact that it requires structured data to work with; it will not work with big data.
Tableau: This software is regarded as the grandmaster of data visualisation. It has a customer base of nearly 60,000
accounts at the last count. It is simple to use and can create interactive visualisations unmatched by its rivals. It
is capable of handling large, frequently updated data sets.
QlikView: This is the second major player in the data visualisation market. It boasts of more than 40,000 customers in more
than 100 countries. QlikView is known for its capability of customisation and extensive features.
FusionCharts: This is a charting and data visualising software, which uses JavaScript. It is capable of producing nearly 100
different types of charts.
Datawrapper: This is one of the favourites of the media industry. Its strength lies in its ability to use CSV data for creating
charts and maps.
Microsoft Power BI: Power BI is a data visualisation tool offered by Microsoft. It has a free version that can be downloaded
and used by anyone.
Google Data Studio: This is a data visualisation tool offered by Google, and it is a part of the Google Marketing Platform.
https://www.youtube.com/watch?v=MiiANxRHSv4
https://www.youtube.com/watch?v=YaGqOPxHFkc
Modelling
Modelling is the process in which different models based on the visualised data are
created and checked for their advantages and disadvantages.
• To make a machine learning model, there are two approaches: the Learning Based
Approach and the Rule Based Approach.
• The Learning Based Approach is based on machine learning, i.e., the machine gains
experience from the data fed to it.
• Machine Learning
• Machine learning is a subset of Artificial Intelligence (AI) which provides
machines the ability to learn automatically and improve from experience without
being explicitly programmed for it.
Training Set vs Testing Set
• Use: The Training Set is used for training the model; the Testing Set is used for testing the model after it is trained.
• Size: The Training Set is a lot bigger than the Testing Set and constitutes about 70% to 80% of the data; the Testing Set is smaller and constitutes about 20% to 30%.
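The 70–80% / 20–30% split can be sketched in a few lines; an 80/20 split on a shuffled toy dataset is assumed here:

```python
import random

# Split a dataset into ~80% training data and ~20% testing data.
data = list(range(100))        # 100 made-up records
random.seed(42)                # fixed seed so the split is repeatable
random.shuffle(data)           # shuffle so both sets are representative

cut = int(len(data) * 0.8)     # the 80% mark
training_set, testing_set = data[:cut], data[cut:]

print(len(training_set), len(testing_set))  # 80 20
```

Shuffling before cutting matters: without it, a sorted dataset would put all of one kind of record in the training set and another kind in the testing set.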
Rule Based Approach
Datasets
Dataset is a collection of related sets of Information that is composed of separate elements but can be manipulated
by a computer as a unit.
In the Rule Based Approach we deal with two divisions of the dataset:
1. Training Data: a subset used to train the model.
2. Testing Data: a subset used to test the trained model.
Rule Based Approach
Decision Tree
It is a rule-based AI model which helps the machine in predicting
what an element is with the help of various decisions (or rules) fed to it.
• The beginning point of any Decision Tree is known as its Root.
• It then forks into two different paths or conditions: Yes or No.
• The forks or diversions are known as the Branches of the tree.
• The branches either lead to another question or to a final
decision, which is known as a Leaf.
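A rule-based decision tree like the one described can be written directly as nested conditions. A toy sketch (the fruit-guessing rules are illustrative, not from the text):

```python
# A tiny hand-written decision tree: a root question, Yes/No branches,
# and leaf decisions.
def guess_fruit(is_round, is_red):
    if is_round:                # root: "Is it round?"
        if is_red:              # branch: "Is it red?"
            return "apple"      # leaf
        return "orange"         # leaf
    return "banana"             # leaf

print(guess_fruit(True, True))    # apple
print(guess_fruit(False, False))  # banana
```

Each `if` is a fork in the tree; each `return` is a leaf where the machine stops asking questions and makes its prediction.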
