5. import sys
import pandas as pd
data = pd.read_csv(sys.argv[1])
# Total the counts for each name, listed from least to most frequent
print(data.groupby('name').sum().sort_values('count'))
tab after keyword 1
blank lines found after function decorator 2
tab after operator 28
tab before keyword 31
unexpected indentation 41
expected an indented block 78
multiple spaces after keyword 120
...
blank line contains whitespace 40543
no spaces around keyword / parameter equals 41858
indentation is not a multiple of four 44109
missing whitespace around operator 47286
indentation contains mixed spaces and tabs 52633
line too long (80 > 79 characters) 78201
missing whitespace after ',' 91612
indentation contains tabs 168842
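The slide 5 script can be tried without a CSV file. This is a minimal sketch that assumes the real file holds one row per warning per checked file, with `name` and `count` columns. The rows below are made up. It uses `sort_values`, because the older `DataFrame.sort` has been removed from pandas.

```python
import pandas as pd

# Made-up rows standing in for the CSV passed as sys.argv[1]
data = pd.DataFrame({
    'name': ['indentation contains tabs', 'line too long (80 > 79 characters)',
             'indentation contains tabs', 'tab after keyword',
             'line too long (80 > 79 characters)'],
    'count': [120, 40, 300, 1, 15],
})

# Same pipeline as the slide: total per name, rarest warning first
totals = data.groupby('name').sum().sort_values('count')
print(totals)
```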
7. DIST_CODE DOB Day Caste B/G Med Cond Total SCHOOL_NAME Kannada English Hindi Maths Science Social
CHIKKABALLAPUR 13-Jul-95 Thu ST G K N 111 PRIYADHARSHINI HIGH SCHOOL 46 7 10 30 8 10
GADAG 09-Feb-95 Thu OTHERS B E N 458 LOYALA HIGH SCHOOL GADAG 86 69 52 70 90 91
MANGALORE 27-Oct-95 Fri OTHERS B K N 390 GOVT.HIGH SCHOOL KOKKADA 105 35 65 76 67 42
BELGAUM 15-Jun-95 Thu ST B M N 151 MADYAMIKA VIDYALAYA BELAVATTI 14 23 25 26
MADHUGIRI 11-Sep-95 Mon OTHERS B K N 240 SRI KALIDASA VIDYAVARDHAKA H.S. 57 35 35 48 30 35
KOLAR 08-May-95 Mon OTHERS B E N 363 DR.AMBEDKAR HIGH SCHOOL 57 63 60 61 62 60
BIJAPUR 24-May-95 Wed OTHERS B K N 451 LOYOLA HIGH SCHOOL STATION BACK 90 51 87 79 81 63
UDUPI 05-Feb-96 Mon SC B K N 239 GOVT JUNIOR COLLEGE BAILOOR 54 30 65 30 30 30
BANGALORE NORTH 20-Oct-95 Fri OTHERS G E N 530 ST MARY'S HIGH SCHOOL NO 1 T 92 78 69 77
GULBARGA 03-Jan-95 Tue OTHERS G K N 397 GOVERNMENT HIGH SCHOOL ANDOLA, 96 47 61 65 67 61
BELGAUM 10-May-94 Tue CAT-1 B K N 111 GOVERNMENT HIGH SCHOOL SULEBHAVI 21 35 9 22 18 6
BIJAPUR 10-Jul-95 Mon OTHERS B K N 380 H G P U COLLEGE SINDAGI BIJAPUR 87 43 69 65 60 56
CHIKODI 25-Apr-95 Tue OTHERS B K N 408 GOVERNMENT HIGH SCHOOL 94 54 85 47 63 65
SHIMOGA 18-Dec-95 Mon SC G K N 215 SAHYADRI HIGH SCHOOL SHIMOGA 44 35 40 31 30 35
BIJAPUR 18-Nov-93 Thu SC B K N 157 TILAGUL HIGH SCHOOL TILAGUL 29 12 35 20 31 30
KOLAR 26-Sep-93 Sun SC B K N 237 GOVERNMENT HIGH SCHOOL MEDIHAL 55 30 37 30 38 47
KOPPAL 01-Jun-93 Tue OTHERS B K N 254 GOVERNMENT HIGH SCHOOL HIRE 38 42 37 53 49 35
CHIKKABALLAPUR 21-Apr-96 Sun OTHERS B K N 251 GOVT. HIGH SCHOOL KADALAVENI 77 40 53 40 26 15
CHIKODI 25-Nov-95 Sat OTHERS B M N 477 ARUN SHAMARAO PATIL HIGH SCHOOL 70 80 66 77
BELGAUM 16-Feb-95 Thu OTHERS G U N 307 BEGUM LATIFA GIRLS HIGH SCHOOL 44 9 50 56
12. Groups (dimensions): things you can group by, such as place, categories and attributes. Types: string, datetime, int.
Numbers (metrics): things you can measure, such as sizes, values, growth and frequencies. Types: float, int.
13. category  title                                                kJ    rate
    dairy     Activia Pouring Natural Yogurt 1X950g                216   0.21
    dairy     Activia Pouring Strawberry Yogurt 1X950g             250   0.21
    dairy     Activia Pouring Vanilla Yogurt 1X950g                263   0.21
    icecream  Almondy Daim 400G                                    1804  0.75
    icecream  Almondy Toblerone 400G                               1850  0.5
    cereals   Alpen 10 Pack Lite Summer Fruits Cereal Bars 210G    1222  1.57
    cereals   Alpen 10Pk Fruit Nut And Chocolate Cereal Bars 290G  1812  1.14
    cereals   Alpen Coconut And Chocolate Cereal Bars 5Pk 145G     1863  1.24
    cereals   Alpen Fruit And Nut With Chocolate Cereal Bar 5X29g  1812  1.24
    cereals   Alpen High Fruit 650G                                1439  0.4
    cereals   Alpen Light Bars Chocolate And Orange 5X21g          1246  1.71
    cereals   Alpen Light Chocolate And Fudge Bar 5X21g            1264  1.71
    cereals   Alpen Light Sultana & Apple Bars 5Pk 105G            1197  1.71
    cereals   Alpen Light Summer Fruits Bars 5Pk 105G              1222  1.71
    cereals   Alpen No Added Sugar 1.3Kg                           1488  0.31
    cereals   Alpen No Added Sugar 560G                            1488  0.46
    cereals   Alpen Original 1.5Kg                                 1509  0.27
    cereals   Alpen Original Muesli 750G                           1509  0.35
    cereals   Alpen Raspberry And Yoghurt Cereal Bars5x29g         1748  1.24
    cereals   Alpen Strawberry With Yoghurt Cereal Bar 5X29g       1756  1.24
    dairy     Alpro Natural Yofu 500G                                    0.28
    dairy     Alpro Raspberry Vanilla Yofu 4X125g                        0.35
    dairy     Alpro Strawberry And Fof Soya Yofu 4X125g                  0.35
    dairy     Alpro Vanilla Yofu 500G                                    0.28
Which categories of food are light? Which are inexpensive?
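Both questions can be answered with `groupby`. This is a minimal sketch using a handful of rows copied from the table above. It assumes `rate` is a price-like figure, so a lower average means cheaper; the table itself does not say what `rate` measures. The Alpro row has no kJ value, so `read_csv` stores it as NaN, which is also what makes `kJ` a float column.

```python
import io

import pandas as pd

# A few rows from the slide 13 table; the Alpro row has a blank kJ
csv = io.StringIO("""category,title,kJ,rate
dairy,Activia Pouring Natural Yogurt 1X950g,216,0.21
dairy,Activia Pouring Strawberry Yogurt 1X950g,250,0.21
dairy,Activia Pouring Vanilla Yogurt 1X950g,263,0.21
dairy,Alpro Natural Yofu 500G,,0.28
icecream,Almondy Daim 400G,1804,0.75
icecream,Almondy Toblerone 400G,1850,0.5
cereals,Alpen High Fruit 650G,1439,0.4
cereals,Alpen Original 1.5Kg,1509,0.27
""")
data = pd.read_csv(csv)

# Average energy and rate per category; the missing kJ is skipped by mean()
ave = data.groupby('category')[['kJ', 'rate']].mean()

lightest = ave['kJ'].sort_values()    # lowest average kJ first
cheapest = ave['rate'].sort_values()  # lowest average rate first (assumed: lower = cheaper)
print(lightest)
print(cheapest)
```

On this small subset dairy comes out both lightest and cheapest; the full table may rank the categories differently.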
14. import sys
import pandas as pd
data = pd.read_csv(sys.argv[1])
# Float columns are numbers (metrics); all other columns are groups (dimensions)
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index
>>> groups
Index(['category', 'title'], dtype='object')
>>> numbers
Index(['kJ', 'rate'], dtype='object')
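pandas also has `select_dtypes`, which can produce the same split as the boolean mask above. A small sketch on two rows from slide 13; note that `np.floating` also matches `float32`, which the `== float` test does not, so the two agree only for ordinary `float64` data such as this.

```python
import numpy as np
import pandas as pd

# Two rows from the slide 13 table
data = pd.DataFrame({
    'category': ['dairy', 'icecream'],
    'title': ['Activia Pouring Natural Yogurt 1X950g', 'Almondy Daim 400G'],
    'kJ': [216.0, 1804.0],
    'rate': [0.21, 0.75],
})

# The slide's rule: float64 columns are numbers, everything else is a group
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index

# The same split with select_dtypes (np.floating matches any float width)
numbers_alt = data.select_dtypes(include=[np.floating]).columns
groups_alt = data.select_dtypes(exclude=[np.floating]).columns
print(list(groups_alt), list(numbers_alt))
```

With either rule, an all-integer column (for example `kJ`, had none of its cells been blank) is classed as a group rather than a number, which fits slide 12's note that `int` can be either.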
15. import sys
import pandas as pd
data = pd.read_csv(sys.argv[1])
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index
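for group in groups:
    # Average every numeric column for each value of this group
    ave = data.groupby(group)[list(numbers)].mean()
    for num in numbers:
        # Highest average first
        print(ave.sort_values(num, ascending=False))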
18. [Figure: Afghanistan’s s/r vs Australia’s s/r on an axis from 55 to 75, in three panels]
Difference is large compared to the spread: high probability that the s/r is different.
Average probability that the s/r is different.
Low probability that the s/r is different.
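The idea in the figure can be checked with a two-sample t-test. This minimal sketch uses `scipy.stats.ttest_ind` on made-up s/r readings: all three samples have the same spread, but one gap is large relative to that spread and the other is small. The p-value is the probability of seeing a gap at least this large if the two true means were equal, so a small p-value corresponds to a high probability that the s/r values differ.

```python
from scipy.stats import ttest_ind

# Made-up s/r readings, all with the same spread
baseline = [60, 61, 62, 63, 64]
far = [70, 71, 72, 73, 74]             # mean 10 higher: large gap vs. spread
near = [60.5, 61.5, 62.5, 63.5, 64.5]  # mean 0.5 higher: small gap vs. spread

# p-value: chance of a gap at least this large if the true means were equal
_, p_far = ttest_ind(baseline, far)
_, p_near = ttest_ind(baseline, near)
print(p_far)   # tiny: the s/r values very likely differ
print(p_near)  # large: no real evidence of a difference
```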
19. WELCOME TO STATS 201
scipy.stats.mstats.ttest_ind
scipy.stats.mstats.f_oneway
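Both functions take one array per group, and the groups may differ in size. A minimal sketch with `f_oneway`, using kJ values from the slide 13 table split by category (only a few rows, so not a full-table result). The `mstats` version is written for masked arrays; on plain data like this it should agree with `scipy.stats.f_oneway`.

```python
from scipy.stats import f_oneway as f_oneway_plain
from scipy.stats.mstats import f_oneway as f_oneway_masked

# kJ values from slide 13, one list per category (a subset of the rows)
dairy = [216, 250, 263]
cereals = [1439, 1509, 1222]
icecream = [1804, 1850]

# One-way ANOVA across the three categories
F_plain, p_plain = f_oneway_plain(dairy, cereals, icecream)
F_masked, p_masked = f_oneway_masked(dairy, cereals, icecream)
print(F_plain, p_plain)
print(F_masked, p_masked)
```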
20. import sys
import pandas as pd
from scipy.stats.mstats import f_oneway
data = pd.read_csv(sys.argv[1])
groups = data.dtypes[data.dtypes != float].index
numbers = data.dtypes[data.dtypes == float].index
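for group in groups:
    grouped = data.groupby(group)
    # Average of each numeric column per group value
    ave = grouped[list(numbers)].mean()
    for num in numbers:
        # One array of values per group value, e.g. the kJ of each dairy item
        samples = [values.dropna().to_numpy() for _, values in grouped[num]]
        # One-way ANOVA: prob is the chance of group means differing this much if they were all equal
        F, prob = f_oneway(*samples)
        print(prob)
        print(ave.sort_values(num, ascending=False))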
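Printing inside the loop makes the results hard to compare. Below is a minimal sketch of a hypothetical helper, `group_probabilities`, that collects one ANOVA p-value per (group, number) pair into a DataFrame sorted from most to least significant. It swaps in `scipy.stats.f_oneway` for the `mstats` version and drops missing values first. It also skips groupings where every group holds a single row, such as grouping the food table by `title`, because ANOVA then has no within-group spread to compare against.

```python
import io

import pandas as pd
from scipy.stats import f_oneway


def group_probabilities(data):
    """Hypothetical helper: ANOVA p-value for every (group column, number column) pair."""
    # Same split as the slides: float columns are numbers, the rest are groups
    groups = data.dtypes[data.dtypes != float].index
    numbers = data.dtypes[data.dtypes == float].index
    rows = []
    for group in groups:
        grouped = data.groupby(group)
        for num in numbers:
            # One array per group value, with missing entries removed
            samples = [values.dropna() for _, values in grouped[num]]
            samples = [s for s in samples if len(s) > 0]
            # Need two or more groups, and at least one group with 2+ values
            if len(samples) < 2 or sum(len(s) for s in samples) == len(samples):
                continue
            _, prob = f_oneway(*samples)
            rows.append({'group': group, 'number': num, 'prob': prob})
    # Smallest p-value (strongest evidence of a difference) first
    return pd.DataFrame(rows, columns=['group', 'number', 'prob']).sort_values('prob')


# A few rows from the slide 13 table
csv = io.StringIO("""category,title,kJ,rate
dairy,Activia Pouring Natural Yogurt 1X950g,216,0.21
dairy,Activia Pouring Strawberry Yogurt 1X950g,250,0.21
dairy,Activia Pouring Vanilla Yogurt 1X950g,263,0.21
dairy,Alpro Natural Yofu 500G,,0.28
icecream,Almondy Daim 400G,1804,0.75
icecream,Almondy Toblerone 400G,1850,0.5
cereals,Alpen High Fruit 650G,1439,0.4
cereals,Alpen Original 1.5Kg,1509,0.27
""")
result = group_probabilities(pd.read_csv(csv))
print(result)
```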
27. A data analytics and visualisation company
We handle terabyte-size data via non-traditional analytics and visualise it in real-time.
We’re recruiting
S.Anand@Gramener.com