Python Tools for Data Analysis and Artificial Intelligence

Md Aksam VK

Invited talk on Python for data science at Ramco Institute of Technology.

Python Tools for Data Analysis and Artificial Intelligence
MD Aksam V K
Data Scientist, GrayMatter Software Services Pvt Ltd., Bengaluru, Karnataka, India
Ramco Institute of Technology
11/27/2018 1

Data
• Data values: date-time, numeric, string

In time     Out time
10:04:00    '11:03:11'
12:04:00    '21:03:11'

Product     Quantity
Banana      4
Apple       3

Days        Words
DAY 1       'Plant', 'Animal'
DAY 2       'Flower'

Scripts: time_subs.py, Num_opr.py
https://github.com/mdaksamvk/Python-tool-to-data-analysis-and-Artificial-intelligence
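time_subs.py and Num_opr.py are in the linked repository and are not reproduced here. The following is a minimal standard-library sketch, assuming those scripts subtract clock times and do arithmetic on numeric values; the names duration, quantities and words are illustrative only. It parses the In/Out times with datetime.strptime, subtracts them to get a timedelta, sums the quantities, and counts the words recorded per day.

```python
from datetime import datetime, timedelta

# Date-time values: parse the clock times, then subtract them.
TIME_FMT = "%H:%M:%S"


def duration(in_time: str, out_time: str) -> timedelta:
    """Elapsed time between two clock times on the same day."""
    return datetime.strptime(out_time, TIME_FMT) - datetime.strptime(in_time, TIME_FMT)


# Numeric values: ordinary arithmetic.
quantities = {"Banana": 4, "Apple": 3}
total_quantity = sum(quantities.values())

# String values: the words recorded on each day.
words = {"DAY 1": ["Plant", "Animal"], "DAY 2": ["Flower"]}
word_count = {day: len(ws) for day, ws in words.items()}

if __name__ == "__main__":
    print(duration("10:04:00", "11:03:11"))  # 0:59:11
    print(duration("12:04:00", "21:03:11"))  # 8:59:11
    print(total_quantity)                     # 7
    print(word_count)                         # {'DAY 1': 2, 'DAY 2': 1}
```

Note that the quoted Out time values are strings until they are parsed; subtraction only works once both sides are datetime objects.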

Basic Data Structures
Dimension   Homogeneous      Heterogeneous
1-D         Atomic vector    List
2-D         Matrix           Dataframe
n-D         Array
• Lists
• Tuples
• Dictionaries
Script: data_structure.py
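A short sketch of the three built-in containers listed above; data_structure.py itself is not shown, so the variable names and values here are illustrative only. The Atomic vector / Matrix / Array terminology in the table comes from R; in plain Python, lists and tuples can hold mixed types, and a homogeneous matrix is usually approximated by a list of equal-length lists (or a NumPy array).

```python
# Lists: ordered and mutable; items may be of different types.
fruits = ["Banana", "Apple"]
fruits.append("Mango")

# Tuples: ordered but immutable; handy for fixed-size records.
record = ("Banana", 4)
name, qty = record  # tuple unpacking

# Dictionaries: key -> value mappings.
stock = {"Banana": 4, "Apple": 3}
stock["Mango"] = 6

# A homogeneous 2-D "matrix" as a list of equal-length numeric rows.
matrix = [[1, 2], [3, 4]]
n_rows, n_cols = len(matrix), len(matrix[0])

# A heterogeneous 1-D list.
mixed = [1, "two", 3.0]
```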

Variable types
• Data exploration and cleaning: reading files (CSV, Excel), variables
• Visualization: univariate, bivariate, multivariate, cross table
• Python with applications: SQL (MySQL), NoSQL (MongoDB), web app (Flask)
Script: data_exp.py
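data_exp.py is not shown on the slide. Below is a minimal standard-library sketch of the exploration steps above, assuming a small CSV file; the CSV contents are hypothetical and inlined so the example runs as-is. In practice pandas is the usual tool (pandas.read_csv to load the file, pandas.crosstab for the cross table).

```python
import csv
import io
from collections import Counter
from statistics import mean, median

# Hypothetical CSV contents, inlined so the sketch runs without a file.
RAW = """sex,colour,quantity
Male,red,4
Female,blue,3
Female,red,5
Male,green,2
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Univariate: summarise one variable at a time.
qty = [int(r["quantity"]) for r in rows]
summary = {"mean": mean(qty), "median": median(qty), "min": min(qty), "max": max(qty)}
colour_counts = Counter(r["colour"] for r in rows)

# Bivariate: a cross table (contingency table) of two categorical variables,
# stored as counts keyed by (sex, colour).
crosstab = Counter((r["sex"], r["colour"]) for r in rows)

print(summary)
print(colour_counts)
print(crosstab)
```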

NoSQL vs. SQL
• SQL databases such as Microsoft SQL Server, MySQL, and Oracle Database use a schema. This rigid schema makes it relatively easy to perform aggregations on the data, for instance by way of JOINs.
• With NoSQL, data can be stored in a schema-less or free-form fashion: any data can be stored in any record.
  - Document databases (e.g. CouchDB, MongoDB). Inserted data is stored as free-form JSON structures or "documents", where the data could be anything from integers to strings to free-form text. There is no inherent need to specify what fields, if any, a document will contain.
  - Key-value stores (e.g. Redis, Riak). Free-form values, from simple integers or strings to complex JSON documents, are accessed in the database by way of keys.
  - Wide-column stores (e.g. HBase, Cassandra). Data is organized by column (grouped into column families) rather than in the fixed rows of a conventional SQL table. Any number of columns (and therefore many different types of data) can be grouped or aggregated as needed for queries or data views.
  - Graph databases (e.g. Neo4j). Data is represented as a network or graph of entities and their relationships, with each node in the graph a free-form chunk of data.
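The contrast can be sketched with Python's built-in sqlite3 module standing in for MySQL, and plain dictionaries serialised with json standing in for a document store such as MongoDB. Both stand-ins are assumptions for illustration (this is not the MongoDB driver API), and the table names and sales figures are hypothetical.

```python
import json
import sqlite3

# SQL: the schema is declared up front; JOIN + GROUP BY aggregate across tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sale (product_id INTEGER REFERENCES product(id), quantity INTEGER);
INSERT INTO product VALUES (1, 'Banana'), (2, 'Apple');
INSERT INTO sale VALUES (1, 4), (2, 3), (1, 2);
""")
totals = dict(con.execute("""
    SELECT p.name, SUM(s.quantity)
    FROM sale AS s JOIN product AS p ON p.id = s.product_id
    GROUP BY p.name
""").fetchall())
con.close()

# Document style: each record is a free-form JSON document, so two documents
# in the same collection need not share the same fields.
documents = [
    {"name": "Banana", "quantity": 4},
    {"name": "Apple", "quantity": 3, "tags": ["fruit", "red"]},
]
stored = [json.dumps(doc) for doc in documents]

print(totals)
print(stored)
```

The SQL side rejects rows that do not fit the declared columns, while the document side accepts the extra "tags" field on one record without any schema change.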

Dataframe
e.g. [1, 2] is a 1 x 2 matrix: 1 row and 2 columns.
IT (data)   Maths    Count in [1, 2]   Denoted by
element     row      1                 m
variable    column   2                 n
An m x n matrix has m rows and n columns.
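A dataframe can be sketched as a dictionary of equal-length columns: each key is a variable (column) and each position across the columns is one element (row), so its shape is (m, n). The helper shape and the column names are hypothetical; in pandas the same information is available as DataFrame.shape.

```python
from typing import Any


def shape(frame: dict[str, list[Any]]) -> tuple[int, int]:
    """Return (m, n): m rows (elements) by n columns (variables)."""
    lengths = {len(column) for column in frame.values()}
    if len(lengths) > 1:
        raise ValueError("all columns of a dataframe must have the same length")
    m = lengths.pop() if lengths else 0
    return m, len(frame)


# [1, 2] as a dataframe: one element (row) and two variables (columns).
one_by_two = {"x1": [1], "x2": [2]}

# Columns may differ in type (heterogeneous), while each column holds one type.
stock = {"Product": ["Banana", "Apple"], "Quantity": [4, 3]}

print(shape(one_by_two))  # (1, 2)
print(shape(stock))       # (2, 2)
```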

Data Analysis pipeline
• Data preprocessing: read data, explore it with visualizations
• Data analysis: classes/objects, functions, modules, if-else conditions, loops, iterators, lambdas, etc.
• Results visualization: plots
Scripts: if_for.py, mod_fun.py
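if_for.py and mod_fun.py are not reproduced here. The sketch below, with hypothetical names (Item, restock_flag) and a hypothetical restock threshold, shows the analysis-stage constructs the slide lists: a class, a function, an if-else condition, a for loop, an explicit iterator, and a lambda.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One row of data: a product and its quantity (a simple class/object)."""
    product: str
    quantity: int


def restock_flag(item: Item, threshold: int = 4) -> str:
    """Function with an if-else condition; the threshold is hypothetical."""
    if item.quantity < threshold:
        return "restock"
    else:
        return "ok"


items = [Item("Banana", 4), Item("Apple", 3)]

# for loop over the items
flags = {}
for item in items:
    flags[item.product] = restock_flag(item)

# explicit iterator: iter() and next()
iterator = iter(items)
first = next(iterator)

# lambda used as a sort key
by_quantity = sorted(items, key=lambda it: it.quantity)

print(flags)
print(first)
print([it.product for it in by_quantity])
```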

ARTIFICIAL INTELLIGENCE
1. Label encoding
2. Single-variable regression
3. Logistic regression
4. Naive Bayes classifier
5. Support vector machine
6. k-means clustering
• Artificial neural networks
• Random forests
• Ensemble learning
Scripts: naive_bayes.py, label_encoder.py, logistic_regression.py, SVM_income_classifier.py, kmeans.py, regressor_singlevar.py

Label encoding
Sex       Encoded value
Male      0
Female    1

Colour    Encoded value
red       1
blue      2
green     3
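Label encoding replaces each category with an integer code. The minimal sketch below reproduces the two tables above; fit_encoding, encode and decode are hypothetical helper names, and the codes follow the order in which the categories are listed, starting from a chosen offset. scikit-learn's LabelEncoder behaves differently: it sorts the classes and numbers them from 0, so it would give Female = 0, Male = 1 and blue = 0, green = 1, red = 2.

```python
def fit_encoding(categories, start=0):
    """Assign consecutive integer codes to distinct categories, in the given order."""
    codes = {}
    for category in categories:
        if category not in codes:
            codes[category] = start + len(codes)
    return codes


def encode(values, codes):
    """Replace each category with its integer code."""
    return [codes[v] for v in values]


def decode(encoded, codes):
    """Map integer codes back to the original categories."""
    inverse = {code: category for category, code in codes.items()}
    return [inverse[c] for c in encoded]


sex_codes = fit_encoding(["Male", "Female"], start=0)
colour_codes = fit_encoding(["red", "blue", "green"], start=1)

print(sex_codes)                                 # {'Male': 0, 'Female': 1}
print(colour_codes)                              # {'red': 1, 'blue': 2, 'green': 3}
print(encode(["green", "red"], colour_codes))    # [3, 1]
```

Because the codes are ordered integers, a model may read an order into nominal categories such as colour; one-hot encoding is a common alternative for such features.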

Logistic regression
• Logistic regression is a classification algorithm. It is used to predict a binary outcome (1/0, Yes/No, True/False) from a set of independent variables.
• It predicts the probability that an event occurs by modelling the log-odds (logit) of that probability as a linear function of the independent variables; equivalently, the probability is the logistic (sigmoid) function of that linear combination.
Script: logistic_regression.py
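A from-scratch sketch of the idea above for a single input variable: p(y = 1 | x) = 1 / (1 + exp(-(w*x + b))), fitted by batch gradient descent on the mean log loss. The toy data, learning rate and epoch count are assumptions chosen for illustration, and this is not necessarily how logistic_regression.py does it.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.4, epochs=10_000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # d(log loss)/d(logit) for one sample
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else class 0."""
    return int(sigmoid(w * x + b) >= threshold)


# Hypothetical toy data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)
print([predict(x, w, b) for x in hours])
```

Because these toy classes are perfectly separable, the fitted weights keep growing the longer training runs; adding regularisation, as most library implementations do by default, keeps them finite.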

Regression
Linear regressor performance:
• Mean absolute error = 0.59
• Mean squared error = 0.49
• Median absolute error = 0.51
• Explained variance score = 0.86
• R² score = 0.86
• New mean absolute error = 0.59
Script: regressor_singlevar.py
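A hand-computed sketch of single-variable least-squares regression and the five metrics listed above. The metric names mirror scikit-learn's mean_absolute_error, mean_squared_error, median_absolute_error, explained_variance_score and r2_score, but the code and its hypothetical data are illustrative and are not the code that produced the numbers on the slide.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def regression_report(y_true, y_pred):
    """The five metrics listed on the slide, computed by hand."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    y_bar, e_bar = mean(y_true), mean(errors)
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    var_errors = sum((e - e_bar) ** 2 for e in errors) / n
    return {
        "mean_absolute_error": mean(abs(e) for e in errors),
        "mean_squared_error": ss_res / n,
        "median_absolute_error": median(abs(e) for e in errors),
        "explained_variance": 1 - var_errors / (ss_tot / n),
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical data with a roughly linear trend.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
slope, intercept = fit_line(xs, ys)
report = regression_report(ys, [slope * x + intercept for x in xs])
for metric, value in report.items():
    print(f"{metric} = {value:.2f}")
```

Explained variance and R² coincide whenever the residuals average to zero, which is always the case for a least-squares line with an intercept evaluated on its own training data; on held-out data the two can differ.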

Naive Bayes classifier
Script: naive_bayes.py
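The slide gives no details, so this is a sketch under one common assumption: Gaussian naive Bayes, in which each class has a prior and each feature is modelled by an independent normal distribution within the class. A point is assigned to the class with the largest log prior plus summed log likelihoods. The data are hypothetical, and the small eps added to each variance only guards against division by zero (scikit-learn's GaussianNB has a similar var_smoothing parameter, defined relative to the largest feature variance).

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, eps=1e-9):
    """Per class: prior probability and per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),                     # prior P(class)
            [mean(c) for c in columns],             # feature means
            [pvariance(c) + eps for c in columns],  # feature variances
        )
    return model


def predict_gaussian_nb(model, row):
    """Class maximising log P(class) + sum of log N(x_i; mean_i, var_i)."""
    def log_joint(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, mu, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return score
    return max(model, key=log_joint)


# Hypothetical two-feature data with two classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))
```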

k-means clustering
Script: kmeans.py
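A minimal sketch of Lloyd's algorithm for k-means: alternately assign every point to its nearest centroid and move each centroid to the mean of its points, stopping when the centroids no longer change. Using the first k points as initial centroids is a simplifying assumption (library implementations such as scikit-learn's KMeans default to k-means++ initialisation), and the points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```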

Support vector machine
Script: SVM_income_classifier.py
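SVM_income_classifier.py is not reproduced here, and the income data it presumably uses is not shown. The sketch below trains a linear SVM on hypothetical two-dimensional toy data (labels +1 and -1) by sub-gradient descent on the L2-regularised hinge loss, in the style of the Pegasos algorithm. It is an illustration only: library implementations such as scikit-learn's SVC and LinearSVC use different solvers, and SVC also supports kernels.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Linear SVM by Pegasos-style sub-gradient descent on the L2-regularised hinge loss.

    A constant 1.0 is appended to each input so the last weight acts as a bias
    (it is regularised too, a common simplification). Points are visited in a
    fixed order instead of being sampled at random, so runs are reproducible.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss active: step towards the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical toy data: label +1 for one group, -1 for the other.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```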
