AUTOMATED DATA ANALYSIS
  WITH PYTHON (PART II)



        S.Anand@Gramener.com
DO WE FOLLOW PEP8?
author          repo              filename                       errno name                                                     count
Michael0x2a     axe               interpreter-parsing_rules.py   E231 missing whitespace after ','                               14177
egirault        googleplay        api-googleplay_pb2.py          E121 continuation line indentation is not a multiple of four    12953
mdwrigh2        pyice             parsetab.py                    E231 missing whitespace after ','                                5452
steviesteveo    projecteuler      euler22.py                     E231 missing whitespace after ','                                5162
xiongchiamiov   Mirage            mirage.py                      W191 indentation contains tabs                                   4593
Ariel           team              Ariel-compiler-Sintactic.py    E231 missing whitespace after ','                                4489
albertz         PyCParser         cparser.py                     W191 indentation contains tabs                                   3041
pombredanne     PyCParser         cparser.py                     W191 indentation contains tabs                                   3025
cshen           PyCParser         cparser.py                     W191 indentation contains tabs                                   3025
bohr            PyCParser         cparser.py                     W191 indentation contains tabs                                   2988
fj              PyCParser         cparser.py                     W191 indentation contains tabs                                   1863
steviesteveo    projecteuler      euler42.py                     E231 missing whitespace after ','                                1786
steviesteveo    projecteuler      wordlist.py                    E231 missing whitespace after ','                                1785
aixp            pycoco            Core.py                        E111 indentation is not a multiple of four                       1760
mdoege          3NewsFeed         newsfeed.py                    W191 indentation contains tabs                                   1738
mdoege          3NewsFeed         newsfeed.py                    E101 indentation contains mixed spaces and tabs                  1738
ebegoli         EMLPy             w3c_ir_assertions.py           W191 indentation contains tabs                                   1650
AvsPmod         AvsPmod           AvsP.py                        E501 line too long (80 > 79 characters)                          1598
chikuzen        AvsPmod           AvsP.py                        E501 line too long (80 > 79 characters)                          1598
tweetr          python            twitter-twitterapi.py          E101 indentation contains mixed spaces and tabs                  1507
duartebarbosa   googletranslate   Languages.py                   E101 indentation contains mixed spaces and tabs                  1422
duartebarbosa   googletranslate   Languages.py                   W191 indentation contains tabs                                   1422
nrub            python            twitter-twitterapi.py          E101 indentation contains mixed spaces and tabs                  1297
danudey         python            twitter-twitterapi.py          E101 indentation contains mixed spaces and tabs                  1278
idan            python            twitter-twitterapi.py          E101 indentation contains mixed spaces and tabs                  1278
import sys
import pandas as pd

data = pd.read_csv(sys.argv[1])
# Total count per violation name, smallest first
print(data.groupby('name')['count'].sum().sort_values())
tab after keyword                                    1
blank lines found after function decorator           2
tab after operator                                  28
tab before keyword                                  31
unexpected indentation                              41
expected an indented block                          78
multiple spaces after keyword                      120
...
blank line contains whitespace                    40543
no spaces around keyword / parameter equals       41858
indentation is not a multiple of four             44109
missing whitespace around operator                47286
indentation contains mixed spaces and tabs        52633
line too long (80 > 79 characters)                78201
missing whitespace after ','                      91612
indentation contains tabs                        168842
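
The aggregation above can be tried without the full CSV. A minimal sketch, assuming a handful of rows copied from the table shown earlier; the real input file is much larger, so these totals will not match the output above.

```python
import io
import pandas as pd

# A few rows copied from the table above; the full CSV is not reproduced here.
sample = io.StringIO("""author,repo,filename,errno,name,count
mdwrigh2,pyice,parsetab.py,E231,"missing whitespace after ','",5452
xiongchiamiov,Mirage,mirage.py,W191,indentation contains tabs,4593
albertz,PyCParser,cparser.py,W191,indentation contains tabs,3041
aixp,pycoco,Core.py,E111,indentation is not a multiple of four,1760
""")

data = pd.read_csv(sample)

# Total occurrences of each violation, smallest first
totals = data.groupby('name')['count'].sum().sort_values()
print(totals)
```

Selecting the `count` column before summing keeps the text columns out of the aggregation.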
LET’S TAKE MARKS
DIST_CODE        DOB      Day Caste    B/G Med Cond Total SCHOOL_NAME                      Kannada English Hindi Maths Science Social
CHIKKABALLAPUR   13-Jul-95 Thu ST      G   K   N    111 PRIYADHARSHINI HIGH SCHOOL               46       7   10     30       8    10
GADAG            09-Feb-95 Thu OTHERS B    E   N    458 LOYALA HIGH SCHOOL GADAG                 86     69    52     70      90    91
MANGALORE        27-Oct-95 Fri OTHERS B    K   N    390 GOVT.HIGH SCHOOL KOKKADA                105     35    65     76      67    42
BELGAUM          15-Jun-95 Thu ST      B   M   N    151 MADYAMIKA VIDYALAYA BELAVATTI                   14           23      25    26
MADHUGIRI        11-Sep-95 Mon OTHERS B    K   N    240 SRI KALIDASA VIDYAVARDHAKA H.S.          57     35    35     48      30    35
KOLAR            08-May-95 Mon OTHERS B    E   N    363 DR.AMBEDKAR HIGH SCHOOL                  57     63    60     61      62    60
BIJAPUR          24-May-95 Wed OTHERS B    K   N    451 LOYOLA HIGH SCHOOL STATION BACK          90     51    87     79      81    63
UDUPI            05-Feb-96 Mon SC      B   K   N    239 GOVT JUNIOR COLLEGE BAILOOR              54     30    65     30      30    30
BANGALORE NORTH 20-Oct-95 Fri OTHERS G     E   N    530 ST MARY'S HIGH SCHOOL NO 1 T                    92           78      69    77
GULBARGA         03-Jan-95 Tue OTHERS G    K   N    397 GOVERNMENT HIGH SCHOOL ANDOLA,           96     47    61     65      67    61
BELGAUM          10-May-94 Tue CAT-1   B   K   N    111 GOVERNMENT HIGH SCHOOL SULEBHAVI         21     35     9     22      18     6
BIJAPUR          10-Jul-95 Mon OTHERS B    K   N    380 H G P U COLLEGE SINDAGI BIJAPUR          87     43    69     65      60    56
CHIKODI          25-Apr-95 Tue OTHERS B    K   N    408 GOVERNMENT HIGH SCHOOL                   94     54    85     47      63    65
SHIMOGA          18-Dec-95 Mon SC      G   K   N    215 SAHYADRI HIGH SCHOOL SHIMOGA             44     35    40     31      30    35
BIJAPUR          18-Nov-93 Thu SC      B   K   N    157 TILAGUL HIGH SCHOOL TILAGUL              29     12    35     20      31    30
KOLAR            26-Sep-93 Sun SC      B   K   N    237 GOVERNMENT HIGH SCHOOL MEDIHAL           55     30    37     30      38    47
KOPPAL           01-Jun-93 Tue OTHERS B    K   N    254 GOVERNMENT HIGH SCHOOL HIRE              38     42    37     53      49    35
CHIKKABALLAPUR   21-Apr-96 Sun OTHERS B    K   N    251 GOVT. HIGH SCHOOL KADALAVENI             77     40    53     40      26    15
CHIKODI          25-Nov-95 Sat OTHERS B    M   N    477 ARUN SHAMARAO PATIL HIGH SCHOOL                 70           80      66    77
BELGAUM          16-Feb-95 Thu OTHERS G    U   N    307 BEGUM LATIFA GIRLS HIGH SCHOOL                  44            9      50    56
import sys
import pandas as pd

data = pd.read_csv(sys.argv[1])
# Average of every numeric column per district, lowest total first
print(data.groupby('DIST_CODE').mean(numeric_only=True).sort_values('TOTAL_MARKS'))

              TOTAL_MARKS    Kannada    ... Social Science
DIST_CODE
BIDAR          245.018650   56.594794   ...   40.368867
YADGIR         285.778553   63.193738   ...   48.891916
MADHUGIRI      291.869219   73.725051   ...   43.854291
...
CHIKODI        354.548775   79.675186   ...   58.088485
SIRSI          356.859926   82.086493   ...   56.168686
UDUPI          358.532346   82.697818   ...   50.479084
[Figure: one panel per subject: Kannada, English, Hindi, Maths, Science, Social Science]
HOW DO WE GENERALISE?
Groups (Dimensions)   Things you can group by
                      Place, Categories, Attributes
                      string, datetime, int

Numbers (Metrics)     Things you can measure
                      Sizes, Values, Growth, Frequencies
                      float, int
category   title                                                    kJ   rate
dairy      Activia Pouring Natural Yogurt 1X950g                   216   0.21
dairy      Activia Pouring Strawberry Yogurt 1X950g                250   0.21
dairy      Activia Pouring Vanilla Yogurt 1X950g                   263   0.21
icecream   Almondy Daim 400G                                     1804    0.75
icecream   Almondy Toblerone 400G                                1850      0.5
cereals    Alpen 10 Pack Lite Summer Fruits Cereal Bars 210G     1222     1.57
cereals    Alpen 10Pk Fruit Nut And Chocolate Cereal Bars 290G   1812     1.14
cereals    Alpen Coconut And Chocolate Cereal Bars 5Pk 145G      1863     1.24
cereals    Alpen Fruit And Nut With Chocolate Cereal Bar 5X29g   1812     1.24
cereals    Alpen High Fruit 650G                                 1439      0.4
cereals    Alpen Light Bars Chocolate And Orange 5X21g           1246     1.71
cereals    Alpen Light Chocolate And Fudge Bar 5X21g             1264     1.71
cereals    Alpen Light Sultana & Apple Bars 5Pk 105G              1197    1.71
cereals    Alpen Light Summer Fruits Bars 5Pk 105G               1222     1.71
cereals    Alpen No Added Sugar 1.3Kg                            1488    0.31
cereals    Alpen No Added Sugar 560G                             1488    0.46
cereals    Alpen Original 1.5Kg                                  1509    0.27
cereals    Alpen Original Muesli 750G                            1509    0.35
cereals    Alpen Raspberry And Yoghurt Cereal Bars5x29g          1748     1.24
cereals    Alpen Strawberry With Yoghurt Cereal Bar 5X29g        1756     1.24
dairy      Alpro Natural Yofu 500G                                       0.28
dairy      Alpro Raspberry Vanilla Yofu 4X125g                           0.35
dairy      Alpro Strawberry And Fof Soya Yofu 4X125g                     0.35
dairy      Alpro Vanilla Yofu 500G                                       0.28




  Which categories of food are light? Which are inexpensive?
import sys
import pandas as pd

data = pd.read_csv(sys.argv[1])
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

>>> groups
Index([category, title], dtype=object)

>>> numbers
Index([kJ, rate], dtype=object)
import sys
import pandas as pd

data = pd.read_csv(sys.argv[1])
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

for group in groups:
    # Average of every numeric column within each value of this group
    ave = data.groupby(group).mean(numeric_only=True)
    for num in numbers:
        print(ave.sort_values(num, ascending=False))
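
To see the group/number split and the loop body on something small, here is a sketch that assumes six rows copied from the groceries table above. The inline CSV and the name `by_kj` are only for illustration.

```python
import io
import pandas as pd

# Six rows copied from the groceries table; one kJ value is missing there too.
sample = io.StringIO("""category,title,kJ,rate
dairy,Activia Pouring Natural Yogurt 1X950g,216,0.21
dairy,Alpro Natural Yofu 500G,,0.28
icecream,Almondy Daim 400G,1804,0.75
icecream,Almondy Toblerone 400G,1850,0.5
cereals,Alpen High Fruit 650G,1439,0.4
cereals,Alpen Original 1.5Kg,1509,0.27
""")
data = pd.read_csv(sample)

# Same dtype test as in the slides: float columns are numbers, the rest are groups
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

# One pass of the loop, for group='category' and num='kJ'
ave = data.groupby('category')[list(numbers)].mean()
by_kj = ave.sort_values('kJ', ascending=False)
print(by_kj)
```

In this sample `kJ` is read as float only because one value is missing. A fully populated column of whole numbers would be read as int and, under this dtype test, would be treated as a group rather than a number.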
LET’S APPLY THIS
       MARKS
      TRAINS
      CRICKET
[Figure: three panels comparing the distributions of Afghanistan’s s/r and Australia’s s/r on an axis from 55 to 75]
  Top: difference is large compared to the spread. High probability that s/r is different.
  Middle: average probability that s/r is different.
  Bottom: low probability that s/r is different.
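
One common way to put a number on "difference compared to the spread" is a t-statistic: the gap between two means divided by its standard error. The sketch below uses Welch's form and only the standard library. The samples are invented for illustration and are not the cricket data behind the figure.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch's t: difference of means divided by its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical strike rates, invented purely for illustration.
# Both pairs have means of 60 and 70; only the spread differs.
tight_a = [59, 60, 61, 60, 60]
tight_b = [69, 70, 71, 70, 70]
wide_a = [40, 80, 55, 65, 60]
wide_b = [50, 90, 65, 75, 70]

t_tight = abs(t_statistic(tight_a, tight_b))
t_wide = abs(t_statistic(wide_a, wide_b))
print(t_tight, t_wide)
```

The same 10-point gap gives a much larger statistic when the spread is small, which is what makes a real difference more probable in that case.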
WELCOME TO STATS 201
    scipy.stats.mstats.ttest_ind
    scipy.stats.mstats.f_oneway
import sys
import pandas as pd
from scipy.stats.mstats import f_oneway

data = pd.read_csv(sys.argv[1])
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

for group in groups:
    grouped = data.groupby(group)
    ave = grouped.mean(numeric_only=True)
    for num in numbers:
        # One array of non-missing values per group
        samples = [vals.dropna().values for _, vals in grouped[num]]
        F, prob = f_oneway(*samples)
        print(prob)
        print(ave.sort_values(num, ascending=False))
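
For intuition about what a one-way ANOVA such as `f_oneway` measures, here is a plain-Python sketch of the F statistic: the variance between group means divided by the variance within groups. It is an illustration, not scipy's implementation, and it stops short of the probability, which comes from the F distribution. The kJ values are copied from the groceries table above; the function name `f_statistic` is hypothetical.

```python
from statistics import mean


def f_statistic(*samples):
    """One-way ANOVA F: between-group variance over within-group variance."""
    k = len(samples)                      # number of groups
    n = sum(len(s) for s in samples)      # total number of observations
    grand = mean([x for s in samples for x in s])
    between = sum(len(s) * (mean(s) - grand) ** 2 for s in samples) / (k - 1)
    within = sum((x - mean(s)) ** 2 for s in samples for x in s) / (n - k)
    return between / within


# kJ values for a few products per category, from the groceries table
dairy = [216, 250, 263]
icecream = [1804, 1850]
cereals = [1222, 1812, 1863]

F = f_statistic(dairy, icecream, cereals)
print(F)
```

A large F means the group averages differ by much more than the values vary inside each group, so the probability that the grouping is irrelevant is low.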
LET’S APPLY THIS
     GROCERIES
      CRICKET
      TRAINS
import sys
import pandas as pd
from scipy.stats.mstats import f_oneway

data = pd.read_csv(sys.argv[1])
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

for group in groups:
    grouped = data.groupby(group)
    ave = grouped.mean(numeric_only=True)
    for num in numbers:
        # One array of non-missing values per group
        samples = [vals.dropna().values for _, vals in grouped[num]]
        F, prob = f_oneway(*samples)
        # How far the best group's average sits above the overall average
        improvement = (ave[num].max() /
                       data[num].mean() - 1)
        print(improvement, prob)
        # print(ave.sort_values(num, ascending=False))
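
The `improvement` figure is the best group's average divided by the overall average, minus one. A small worked example, assuming only the `rate` values of five products copied from the groceries table; the dictionary layout is just for illustration.

```python
from statistics import mean

# rate values for two categories, copied from the groceries table above
rates = {
    'dairy': [0.21, 0.21, 0.21],
    'icecream': [0.75, 0.5],
}

# Average per group, and average over all rows regardless of group
group_means = {name: mean(vals) for name, vals in rates.items()}
overall = mean([v for vals in rates.values() for v in vals])

# Best group average relative to the overall average
improvement = max(group_means.values()) / overall - 1
print(round(improvement, 2))
```

Here the icecream average (0.625) is about 66% above the overall average (0.376). Reading that number alongside the probability from the F-test shows whether a large gap is also a reliable one.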
LET’S APPLY THIS
     GROCERIES
      CRICKET
       MARKS
      TRAINS
Hypotheses          Data         Insight




  Data            Autolysis




           TAKE ANY DATASET
         THROW IT AT A PROGRAM
             GET INSIGHTS
DIRECTIONS
CROSS TABULATIONS
  CORRELATIONS
    OUTLIERS
      HULLS
A data analytics and visualisation company



We handle terabyte-size data   via non-traditional analytics   and visualise it in real-time.




                                  We’re recruiting




                               S.Anand@Gramener.com

Automated data analysis with Python

  • 1. AUTOMATED DATA ANALYSIS WITH PYTHON (PART II) S.Anand@Gramener.com
  • 2. DO WE FOLLOW PEP8?
  • 3.
      author         repo             filename                      errno name                                                     count
      Michael0x2a    axe              interpreter-parsing_rules.py  E231  missing whitespace after ','                             14177
      egirault       googleplay       api-googleplay_pb2.py         E121  continuation line indentation is not a multiple of four  12953
      mdwrigh2       pyice            parsetab.py                   E231  missing whitespace after ','                              5452
      steviesteveo   projecteuler     euler22.py                    E231  missing whitespace after ','                              5162
      xiongchiamiov  Mirage           mirage.py                     W191  indentation contains tabs                                 4593
      Ariel          team             Ariel-compiler-Sintactic.py   E231  missing whitespace after ','                              4489
      albertz        PyCParser        cparser.py                    W191  indentation contains tabs                                 3041
      pombredanne    PyCParser        cparser.py                    W191  indentation contains tabs                                 3025
      cshen          PyCParser        cparser.py                    W191  indentation contains tabs                                 3025
      bohr           PyCParser        cparser.py                    W191  indentation contains tabs                                 2988
      fj             PyCParser        cparser.py                    W191  indentation contains tabs                                 1863
      steviesteveo   projecteuler     euler42.py                    E231  missing whitespace after ','                              1786
      steviesteveo   projecteuler     wordlist.py                   E231  missing whitespace after ','                              1785
      aixp           pycoco           Core.py                       E111  indentation is not a multiple of four                     1760
      mdoege         3NewsFeed        newsfeed.py                   W191  indentation contains tabs                                 1738
      mdoege         3NewsFeed        newsfeed.py                   E101  indentation contains mixed spaces and tabs                1738
      ebegoli        EMLPy            w3c_ir_assertions.py          W191  indentation contains tabs                                 1650
      AvsPmod        AvsPmod          AvsP.py                       E501  line too long (80 > 79 characters)                        1598
      chikuzen       AvsPmod          AvsP.py                       E501  line too long (80 > 79 characters)                        1598
      tweetr         python           twitter-twitterapi.py         E101  indentation contains mixed spaces and tabs                1507
      duartebarbosa  googletranslate  Languages.py                  E101  indentation contains mixed spaces and tabs                1422
      duartebarbosa  googletranslate  Languages.py                  W191  indentation contains tabs                                 1422
      nrub           python           twitter-twitterapi.py         E101  indentation contains mixed spaces and tabs                1297
      danudey        python           twitter-twitterapi.py         E101  indentation contains mixed spaces and tabs                1278
      idan           python           twitter-twitterapi.py         E101  indentation contains mixed spaces and tabs                1278
  • 4.
        import sys
        import pandas as pd
        data = pd.read_csv(sys.argv[1])
  • 5.
        import sys
        import pandas as pd
        data = pd.read_csv(sys.argv[1])
        print data.groupby('name').sum().sort('count')

        tab after keyword                                  1
        blank lines found after function decorator         2
        tab after operator                                28
        tab before keyword                                31
        unexpected indentation                            41
        expected an indented block                        78
        multiple spaces after keyword                    120
        ...
        blank line contains whitespace                 40543
        no spaces around keyword / parameter equals    41858
        indentation is not a multiple of four          44109
        missing whitespace around operator             47286
        indentation contains mixed spaces and tabs     52633
        line too long (80 > 79 characters)             78201
        missing whitespace after ','                   91612
        indentation contains tabs                     168842
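The slide's one-liner uses Python 2's `print` statement and `DataFrame.sort`, which current pandas has replaced with `sort_values`. A minimal sketch of the same tally for Python 3 follows. The inline rows are a made-up stand-in for the PEP8 CSV (only the `name` and `count` columns are assumed), and selecting `count` before summing is a choice made here so that text columns are left out of the sum.

```python
import pandas as pd

# Hypothetical stand-in rows with the same columns as the PEP8 CSV.
data = pd.DataFrame({
    "name": [
        "indentation contains tabs",
        "missing whitespace after ','",
        "indentation contains tabs",
        "tab after keyword",
    ],
    "count": [3041, 5452, 4593, 1],
})

# Total violations per message, smallest first (as on the slide).
totals = data.groupby("name")["count"].sum().sort_values()
print(totals)
```

With a real file, the `DataFrame` literal would be replaced by `pd.read_csv(sys.argv[1])` as on the slide.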
  • 7.
      DIST_CODE DOB Day Caste B/G Med Cond Total SCHOOL_NAME Kannada English Hindi Maths Science Social
      CHIKKABALLAPUR 13-Jul-95 Thu ST G K N 111 PRIYADHARSHINI HIGH SCHOOL 46 7 10 30 8 10
      GADAG 09-Feb-95 Thu OTHERS B E N 458 LOYALA HIGH SCHOOL GADAG 86 69 52 70 90 91
      MANGALORE 27-Oct-95 Fri OTHERS B K N 390 GOVT.HIGH SCHOOL KOKKADA 105 35 65 76 67 42
      BELGAUM 15-Jun-95 Thu ST B M N 151 MADYAMIKA VIDYALAYA BELAVATTI 14 23 25 26
      MADHUGIRI 11-Sep-95 Mon OTHERS B K N 240 SRI KALIDASA VIDYAVARDHAKA H.S. 57 35 35 48 30 35
      KOLAR 08-May-95 Mon OTHERS B E N 363 DR.AMBEDKAR HIGH SCHOOL 57 63 60 61 62 60
      BIJAPUR 24-May-95 Wed OTHERS B K N 451 LOYOLA HIGH SCHOOL STATION BACK 90 51 87 79 81 63
      UDUPI 05-Feb-96 Mon SC B K N 239 GOVT JUNIOR COLLEGE BAILOOR 54 30 65 30 30 30
      BANGALORE NORTH 20-Oct-95 Fri OTHERS G E N 530 ST MARY'S HIGH SCHOOL NO 1 T 92 78 69 77
      GULBARGA 03-Jan-95 Tue OTHERS G K N 397 GOVERNMENT HIGH SCHOOL ANDOLA, 96 47 61 65 67 61
      BELGAUM 10-May-94 Tue CAT-1 B K N 111 GOVERNMENT HIGH SCHOOL SULEBHAVI 21 35 9 22 18 6
      BIJAPUR 10-Jul-95 Mon OTHERS B K N 380 H G P U COLLEGE SINDAGI BIJAPUR 87 43 69 65 60 56
      CHIKODI 25-Apr-95 Tue OTHERS B K N 408 GOVERNMENT HIGH SCHOOL 94 54 85 47 63 65
      SHIMOGA 18-Dec-95 Mon SC G K N 215 SAHYADRI HIGH SCHOOL SHIMOGA 44 35 40 31 30 35
      BIJAPUR 18-Nov-93 Thu SC B K N 157 TILAGUL HIGH SCHOOL TILAGUL 29 12 35 20 31 30
      KOLAR 26-Sep-93 Sun SC B K N 237 GOVERNMENT HIGH SCHOOL MEDIHAL 55 30 37 30 38 47
      KOPPAL 01-Jun-93 Tue OTHERS B K N 254 GOVERNMENT HIGH SCHOOL HIRE 38 42 37 53 49 35
      CHIKKABALLAPUR 21-Apr-96 Sun OTHERS B K N 251 GOVT. HIGH SCHOOL KADALAVENI 77 40 53 40 26 15
      CHIKODI 25-Nov-95 Sat OTHERS B M N 477 ARUN SHAMARAO PATIL HIGH SCHOOL 70 80 66 77
      BELGAUM 16-Feb-95 Thu OTHERS G U N 307 BEGUM LATIFA GIRLS HIGH SCHOOL 44 9 50 56
  • 8.
        import sys
        import pandas as pd
        data = pd.read_csv(sys.argv[1])
        print data.groupby('DIST_CODE').mean().sort('TOTAL_MARKS')

                   TOTAL_MARKS    Kannada  ...  Social Science
        DIST_CODE
        BIDAR       245.018650  56.594794  ...       40.368867
        YADGIR      285.778553  63.193738  ...       48.891916
        MADHUGIRI   291.869219  73.725051  ...       43.854291
        ...
        CHIKODI     354.548775  79.675186  ...       58.088485
        SIRSI       356.859926  82.086493  ...       56.168686
        UDUPI       358.532346  82.697818  ...       50.479084
  • 9.
  • 10. KANNADA ENGLISH HINDI MATHS SCIENCE SOCIAL SCIENCE
  • 11. HOW DO WE GENERALISE?
  • 12. Groups: things you can group by (dimensions). Place, categories, attributes. Types: string, datetime, int.
         Numbers: things you can measure (metrics). Sizes, values, growth, frequencies. Types: float, int.
  • 13.
      category  title                                                   kJ  rate
      dairy     Activia Pouring Natural Yogurt 1X950g                  216  0.21
      dairy     Activia Pouring Strawberry Yogurt 1X950g               250  0.21
      dairy     Activia Pouring Vanilla Yogurt 1X950g                  263  0.21
      icecream  Almondy Daim 400G                                     1804  0.75
      icecream  Almondy Toblerone 400G                                1850  0.5
      cereals   Alpen 10 Pack Lite Summer Fruits Cereal Bars 210G     1222  1.57
      cereals   Alpen 10Pk Fruit Nut And Chocolate Cereal Bars 290G   1812  1.14
      cereals   Alpen Coconut And Chocolate Cereal Bars 5Pk 145G      1863  1.24
      cereals   Alpen Fruit And Nut With Chocolate Cereal Bar 5X29g   1812  1.24
      cereals   Alpen High Fruit 650G                                 1439  0.4
      cereals   Alpen Light Bars Chocolate And Orange 5X21g           1246  1.71
      cereals   Alpen Light Chocolate And Fudge Bar 5X21g             1264  1.71
      cereals   Alpen Light Sultana & Apple Bars 5Pk 105G             1197  1.71
      cereals   Alpen Light Summer Fruits Bars 5Pk 105G               1222  1.71
      cereals   Alpen No Added Sugar 1.3Kg                            1488  0.31
      cereals   Alpen No Added Sugar 560G                             1488  0.46
      cereals   Alpen Original 1.5Kg                                  1509  0.27
      cereals   Alpen Original Muesli 750G                            1509  0.35
      cereals   Alpen Raspberry And Yoghurt Cereal Bars5x29g          1748  1.24
      cereals   Alpen Strawberry With Yoghurt Cereal Bar 5X29g        1756  1.24
      dairy     Alpro Natural Yofu 500G                                     0.28
      dairy     Alpro Raspberry Vanilla Yofu 4X125g                         0.35
      dairy     Alpro Strawberry And Fof Soya Yofu 4X125g                   0.35
      dairy     Alpro Vanilla Yofu 500G                                     0.28

      Which categories of food are light? Which are inexpensive?
  • 14.
        import sys
        import pandas as pd
        data = pd.read_csv(sys.argv[1])
        groups = data.dtypes[data.dtypes != float].index
        numbers = data.dtypes[data.dtypes == float].index

        >>> groups
        Index([category, title], dtype=object)
        >>> numbers
        Index([kJ, rate], dtype=object)
  • 15.
        import sys
        import pandas as pd
        data = pd.read_csv(sys.argv[1])
        groups = data.dtypes[data.dtypes != float].index
        numbers = data.dtypes[data.dtypes == float].index
        for group in groups:
            ave = data.groupby(group).mean()
            for num in numbers:
                print ave.sort(num, ascending=False)
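The generalised loop can be tried without a CSV file. The sketch below is a Python 3 adaptation, not the slide's code: it keeps the slide's dtype test for splitting columns into groups and numbers, but uses `sort_values` (the current replacement for `sort`) and averages only the numeric columns. The rows are hypothetical values shaped like the groceries table.

```python
import pandas as pd

# Hypothetical rows shaped like the groceries table (category, title, kJ, rate).
data = pd.DataFrame({
    "category": ["dairy", "dairy", "icecream", "cereals", "cereals"],
    "title": ["Yogurt A", "Yogurt B", "Ice cream A", "Bars A", "Muesli A"],
    "kJ": [216.0, 250.0, 1804.0, 1222.0, 1509.0],
    "rate": [0.21, 0.21, 0.75, 1.57, 0.35],
})

# Float columns are treated as metrics; everything else as dimensions.
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

for group in groups:
    # Average only the numeric columns within each group.
    ave = data.groupby(group)[list(numbers)].mean()
    for num in numbers:
        # Highest average first, one table per (group, metric) pair.
        print(ave.sort_values(num, ascending=False))
```

Grouping by `title` is rarely informative because titles are nearly unique, which is one reason a significance test is a useful next filter.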
  • 16.
  • 17. LET’S APPLY THIS MARKS TRAINS CRICKET
  • 18. Afghanistan’s s/r vs. Australia’s s/r (three panels, each on an axis from 55 to 75)
         Difference is large compared to the spread: high probability that s/r is different
         Average probability that s/r is different
         Low probability that s/r is different
  • 19. WELCOME TO STATS 201 scipy.stats.mstats.ttest_ind scipy.stats.mstats.f_oneway
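As a rough illustration of what these two functions report, the sketch below runs both on made-up strike rates (the numbers are invented for the example, not taken from the cricket data). A small p-value suggests the group means differ by more than the spread would explain.

```python
import numpy as np
from scipy.stats.mstats import f_oneway, ttest_ind

# Made-up strike rates; the values are illustrative only.
team_a = np.array([58.0, 60.0, 62.0, 59.0, 61.0])
team_b_far = np.array([70.0, 72.0, 71.0, 69.0, 73.0])   # clearly higher mean
team_b_near = np.array([59.0, 61.0, 60.0, 62.0, 58.0])  # same mean as team_a

# Two groups: t-test on the difference of means.
t_far, p_far = ttest_ind(team_a, team_b_far)
t_near, p_near = ttest_ind(team_a, team_b_near)

# Two or more groups: one-way ANOVA.
F, p_anova = f_oneway(team_a, team_b_far, team_b_near)

print(float(p_far), float(p_near), float(p_anova))
```

The first p-value is very small and the second is close to 1, matching the "high probability" and "low probability" cases of the previous slide.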
  • 20.
        import sys
        import pandas as pd
        from scipy.stats.mstats import f_oneway
        data = pd.read_csv(sys.argv[1])
        groups = data.dtypes[data.dtypes != float].index
        numbers = data.dtypes[data.dtypes == float].index
        for group in groups:
            grouped = data.groupby(group)
            ave = grouped.mean()
            for num in numbers:
                # One-way ANOVA across the groups' values for this metric
                F, prob = f_oneway(*grouped[num].values)
                print prob
                print ave.sort(num, ascending=False)
  • 21. LET’S APPLY THIS GROCERIES CRICKET TRAINS
  • 22.
  • 23.
        import sys
        import pandas as pd
        from scipy.stats.mstats import f_oneway
        data = pd.read_csv(sys.argv[1])
        groups = data.dtypes[data.dtypes != float].index
        numbers = data.dtypes[data.dtypes == float].index
        for group in groups:
            grouped = data.groupby(group)
            ave = grouped.mean()
            for num in numbers:
                F, prob = f_oneway(*grouped[num].values)
                # How far the best group's average sits above the overall average
                improvement = (ave[num].max() / data[num].mean() - 1)
                print improvement, prob
                # print ave.sort(num, ascending=False)
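A Python 3 sketch of the same idea, under stated assumptions: the data frame is a hypothetical stand-in for the marks data, `scipy.stats.f_oneway` is swapped in for the masked-array version on the slide, and missing values are dropped explicitly before the test.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical marks data: one dimension (district) and one metric (total).
data = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "total": [240.0, 250.0, 245.0, 350.0, 360.0, 355.0, 290.0, 300.0, 295.0],
})

groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

for group in groups:
    grouped = data.groupby(group)
    for num in numbers:
        # One array of values per group, with missing values dropped.
        samples = [values.dropna().to_numpy() for _, values in grouped[num]]
        F, prob = f_oneway(*samples)
        # How far the best group's average sits above the overall average.
        improvement = grouped[num].mean().max() / data[num].mean() - 1
        print(group, num, improvement, prob)
```

Sorting the printed pairs by `prob` and then by `improvement` would surface the group and metric combinations most worth a closer look.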
  • 24. LET’S APPLY THIS GROCERIES CRICKET MARKS TRAIN
  • 25. Hypotheses Data Insight Data Autolysis TAKE ANY DATASET THROW IT AT A PROGRAM GET INSIGHTS
  • 26. DIRECTIONS: CROSS TABULATIONS, CORRELATIONS, OUTLIERS, HULLS
  • 27. A data analytics and visualisation company. We handle terabyte-size data via non-traditional analytics and visualise it in real-time. We’re recruiting: S.Anand@Gramener.com