Python tool to data analysis and artificial intelligence
1. Python tool to Data Analysis and
Artificial Intelligence
MD Aksam V K
Data Scientist, GrayMatter Software
Services Pvt Ltd., Bengaluru, Karnataka,
India
Ramco Institute of Technology
11/27/2018 1
2. Data
• Data values: date-time, numeric, string
In time     Out time
10:04:00    '11:03:11'
12:04:00    '21:03:11'
Product     Quantity
Banana      4
Apple       3
Days        Words
DAY 1       'Plant', 'Animal'
DAY 2       'Flower'
time_subs.py Num_opr.py
https://github.com/mdaksamvk/Python-tool-to-data-analysis-and-Artificial-intelligence
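The three value types on this slide can be handled with the standard library alone. The sketch below is illustrative only and is not taken from time_subs.py or Num_opr.py; the helper name `duration` and the assumption that both times fall on the same day are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def duration(in_time, out_time):
    """Return out_time - in_time as a timedelta (hypothetical helper).

    Assumes both times are on the same day and strips the stray
    quote characters that appear around the 'Out time' values.
    """
    start = datetime.strptime(in_time.strip("'"), TIME_FORMAT)
    end = datetime.strptime(out_time.strip("'"), TIME_FORMAT)
    return end - start


# Date-time values: the In time / Out time table from the slide.
records = [("10:04:00", "'11:03:11'"), ("12:04:00", "'21:03:11'")]
durations = [duration(in_time, out_time) for in_time, out_time in records]

# Numeric values: the Product / Quantity table.
quantities = {"Banana": 4, "Apple": 3}
total_quantity = sum(quantities.values())

# String values: the Days / Words table.
words = {"DAY 1": ["Plant", "Animal"], "DAY 2": ["Flower"]}
word_counts = {day: len(day_words) for day, day_words in words.items()}

for (in_time, out_time), elapsed in zip(records, durations):
    print(in_time, out_time, elapsed)
print(total_quantity, word_counts)
```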
4. Variable types
Data exploration, cleaning   Visualization   Python with application
Read file (CSV, Excel)       Univariate      SQL: MySQL
Variables                    Bivariate       NoSQL: MongoDB
                             Multivariate    Web app: Flask
                             Cross table
data_exp.py
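A minimal sketch of the exploration steps named on this slide (read a CSV, clean a column, then univariate, bivariate and cross-table summaries). It is not the content of data_exp.py; the CSV text, the column names and the variable names are made up for illustration, and only the standard library is used.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical CSV content standing in for a file on disk.
CSV_TEXT = """product,colour,quantity
Banana,yellow,4
Apple,red,3
Apple,green,5
Banana,yellow,2
"""

# Read: each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Clean: csv yields strings, so convert the numeric column.
for row in rows:
    row["quantity"] = int(row["quantity"])

# Univariate: summarise one variable on its own.
quantities = [row["quantity"] for row in rows]
mean_quantity = statistics.mean(quantities)

# Bivariate: total quantity per product.
quantity_by_product = {}
for row in rows:
    product = row["product"]
    quantity_by_product[product] = quantity_by_product.get(product, 0) + row["quantity"]

# Cross table: count of rows for each (product, colour) pair.
cross_table = Counter((row["product"], row["colour"]) for row in rows)

print(mean_quantity, quantity_by_product, dict(cross_table))
```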
5. NoSQL vs. SQL
• A SQL database, such as Microsoft SQL Server, MySQL, or Oracle Database, uses a fixed
schema. That rigid schema also makes it relatively easy to perform aggregations on the
data, for instance by way of JOINs.
• With NoSQL, data can be stored in a schema-less or free-form fashion. Any data can be
stored in any record.
Document databases (e.g. CouchDB, MongoDB). Inserted data is stored in the form of free-
form JSON structures or “documents,” where the data could be anything from integers to
strings to freeform text. There is no inherent need to specify what fields, if any, a document
will contain.
Key-value stores (e.g. Redis, Riak). Free-form values—from simple integers or strings to
complex JSON documents—are accessed in the database by way of keys.
Wide column stores (e.g. HBase, Cassandra). Data is stored in columns instead of rows as in a
conventional SQL system. Any number of columns (and therefore many different types of
data) can be grouped or aggregated as needed for queries or data views.
Graph databases (e.g. Neo4j). Data is represented as a network or graph of entities and their
relationships, with each node in the graph a free-form chunk of data.
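The contrast above can be sketched in a few lines. The first half uses Python's built-in sqlite3 module as a stand-in for a SQL server and shows a JOIN with an aggregation over a fixed schema. The second half uses a plain dict of JSON-serialisable documents as a stand-in for a document or key-value store; it is not the MongoDB or Redis API. Table names, keys and fields are hypothetical.

```python
import json
import sqlite3

# SQL side: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (product_id INTEGER NOT NULL, quantity INTEGER NOT NULL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Banana"), (2, "Apple")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 4), (2, 3), (1, 2)])

# Aggregation by way of a JOIN: total quantity ordered per product.
totals = conn.execute(
    """
    SELECT p.name, SUM(o.quantity)
    FROM orders AS o
    JOIN products AS p ON p.id = o.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
conn.close()

# NoSQL side (stand-in): values are looked up by key, and each
# document may carry a different set of fields.
documents = {
    "doc1": {"name": "Banana", "quantity": 4},
    "doc2": {"name": "Apple", "tags": ["fruit", "red"], "note": "free-form text"},
}
restored = json.loads(json.dumps(documents))  # documents round-trip as JSON
field_sets = {key: sorted(doc) for key, doc in restored.items()}

print(totals)
print(field_sets)
```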
6. Dataframe
e.g. [1, 2] is a 1x2 matrix
IT         Maths    #   Denoted by
element    column   1   m
variable   rows     2   n
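The 1x2 example can be checked in plain Python. In this sketch a nested list stands in for a dataframe, which is an assumption, since the slide does not name a library; the helpers `shape` and `transpose` are hypothetical.

```python
def shape(matrix):
    """Return (number of rows, number of columns) of a nested list."""
    rows = len(matrix)
    columns = len(matrix[0]) if rows else 0
    return rows, columns


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


row_vector = [[1, 2]]                  # one row, two columns
column_vector = transpose(row_vector)  # two rows, one column

print(shape(row_vector), shape(column_vector))
```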
7. Data Analysis pipeline
Process                  Components
Data preprocessing       Read data, data exploration with visualizations
Data analysis            Classes/objects/functions/modules, if-else conditions, loops, iterators, lambda, etc.
Results visualization    Plots
if_for.py
mod_fun.py
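The pipeline stages can be sketched end to end with the language features listed in the table. This is not the content of if_for.py or mod_fun.py; the data, the function names and the thresholds are invented for illustration, and a text bar chart stands in for a real plot.

```python
def read_data():
    """Preprocessing stand-in: return hard-coded records instead of reading a file."""
    return [("Banana", 4), ("Apple", 3), ("Cherry", 0)]


def classify(quantity):
    """If-else condition: map a quantity to a stock label (thresholds are arbitrary)."""
    if quantity == 0:
        return "out of stock"
    elif quantity < 4:
        return "low"
    else:
        return "ok"


def analyse(records):
    """Loop over the records and label each product."""
    result = {}
    for name, quantity in records:
        result[name] = classify(quantity)
    return result


records = read_data()
status = analyse(records)

# Lambda: keep only the products that are in stock.
in_stock = list(filter(lambda record: record[1] > 0, records))

# Iterator: step through the records one at a time.
record_iter = iter(records)
first = next(record_iter)

# Results visualization stand-in: a text bar chart.
for name, quantity in records:
    print(f"{name:<8}{'#' * quantity}")
```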
8. ARTIFICIAL INTELLIGENCE
1. Label encoding
2. Single-variable regression
3. Logistic regression
4. Naive Bayes classifier
5. Support vector machine
6. k-means clustering
• Artificial neural network
• Random forest
• Ensemble learning
naive_bayes.py
label_encoder.py
logistic_regression.py
SVM_income_classifier.py
kmeans.py
regressor_singlevar.py
9. Label encoding
Sex Encode value
Male 0
Female 1
Colour Encode value
red 1
blue 2
green 3
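The two tables above can be reproduced with a small encoder. This is a pure-Python sketch, not the content of label_encoder.py: it assigns codes in order of first appearance, with a hypothetical `start` parameter so the colour codes can begin at 1 as on the slide. Library encoders may number labels differently, for example in sorted label order starting at 0.

```python
def fit_label_encoder(values, start=0):
    """Map each distinct label to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = start + len(mapping)
    return mapping


def encode(values, mapping):
    """Replace each label with its integer code."""
    return [mapping[value] for value in values]


def decode(codes, mapping):
    """Invert the mapping to recover the original labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]


sex_mapping = fit_label_encoder(["Male", "Female", "Male"])
colour_mapping = fit_label_encoder(["red", "blue", "green", "blue"], start=1)

encoded_colours = encode(["green", "red", "blue"], colour_mapping)
decoded_colours = decode(encoded_colours, colour_mapping)

print(sex_mapping, colour_mapping, encoded_colours, decoded_colours)
```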
10. Logistic regression
• Logistic regression is a classification algorithm. It is used to predict a binary
outcome (1 / 0, Yes / No, True / False) given a set of independent variables.
• It predicts the probability of occurrence of an event by fitting the data to a
logistic function (the inverse of the logit).
logistic_regression.py
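The idea of turning a linear score into a probability and then into a binary label can be sketched without any library. This is not the content of logistic_regression.py: it fits a single-variable model by plain batch gradient descent, and the data (hours studied against pass/fail), the learning rate, the epoch count and the 0.5 threshold are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def predict_proba(x, weight, bias):
    """Probability that the outcome is 1 for input x."""
    return sigmoid(weight * x + bias)


def predict(x, weight, bias, threshold=0.5):
    """Binary outcome: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


# Hypothetical data: hours studied and whether the exam was passed.
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = fit_logistic(hours, passed)
print(predict_proba(1.0, weight, bias), predict_proba(4.0, weight, bias))
```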
11. Regression
Linear regressor performance:
Mean absolute error = 0.59
Mean squared error = 0.49
Median absolute error = 0.51
Explained variance score = 0.86
R2 score = 0.86
New mean absolute error = 0.59
regressor_singlevar.py
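The five scores in this report follow directly from the residuals between the true and predicted values. The sketch below computes them in pure Python; it is not the content of regressor_singlevar.py, the sample values are made up, and the function name `regression_metrics` is hypothetical.

```python
import statistics


def regression_metrics(y_true, y_pred):
    """Compute common regression scores from true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    mean_true = statistics.mean(y_true)
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mean_absolute_error": sum(abs_errors) / n,
        "mean_squared_error": ss_res / n,
        "median_absolute_error": statistics.median(abs_errors),
        "explained_variance": 1 - statistics.pvariance(errors) / statistics.pvariance(y_true),
        "r2": 1 - ss_res / ss_tot,
    }


# Illustrative values only; these are not the data behind the slide.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```

Explained variance and R2 coincide when the residuals average to zero, which would be consistent with both being reported as 0.86 above.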