Thursday, 28 August 2025
@PyDataVenice #22 #Meetup #PyData
in person and via streaming
at 19:00
Venice
Alessandra Bilardi
Data / Automation Specialist
@ Corley Cloud
Exploratory Data Analysis
on a Kaggle competition
#DataAnalysis
#Workshop
Promoters of PyData Venice #22
Agenda
Speech / Workshop
Initiatives
Upcoming meetups
Networking
Exploratory Data Analysis
Alessandra Bilardi
Data & Automation Specialist @ Corley Cloud
● AWS User Group Venezia leader
● CoderDojo mentor
● PyData Venice leader
● PyVenice co-founder
alessandra.bilardi@gmail.com
@abilardi
bilardi
Summary
The basics of EDA
Python libraries for EDA
The basics of EDA
Form: Post-meetup networking?
1. [online|in person]
2. [yes|no]
Form: What is the first step in analyzing a data file?
1. [one word]
Form: Which is better, a pie chart or a bar chart?
1. [pie|bar]
Time for Some Theory: human perception
1. position, common scale ➢ scatter plot, point map
2. position, non-aligned ➢ scatter plot matrix
3. size, length ➢ bar chart, histogram
4. orientation, angle ➢ pie chart, gradient lines
5. size, area ➢ tree map, bubble chart
6. size, volume & brightness ➢ 3D, heat map
7. color ➢ color hue scales
http://euclid.psych.yorku.ca/www/psy6135/papers/ClevelandMcGill1984.pdf
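The pie-or-bar question can be made concrete with a little arithmetic. The sketch below is not from the talk: it takes four hypothetical category counts and computes how each chart would encode them, as slice angles for a pie chart (rank 4 in the list) and as bar lengths on a common scale for a bar chart (rank 3). The function names and the data are assumptions made for illustration.

```python
def pie_angles(values):
    """Angle in degrees of each pie slice (angle encoding)."""
    total = sum(values)
    return [360 * v / total for v in values]


def bar_lengths(values, width=40):
    """Bar length in characters, scaled so the largest value fills `width`."""
    top = max(values)
    return [round(width * v / top) for v in values]


def text_bar_chart(labels, values, width=40):
    """Render a plain-text bar chart, one line per category."""
    lines = []
    for label, value, n in zip(labels, values, bar_lengths(values, width)):
        lines.append(f"{label:<6} {'#' * n} {value}")
    return "\n".join(lines)


# Hypothetical counts, chosen to be close to each other on purpose.
labels = ["A", "B", "C", "D"]
values = [23, 25, 27, 25]

print(text_bar_chart(labels, values))
print([round(a, 1) for a in pie_angles(values)])
```

With values this close, the slices differ by only a few degrees, while the bars end at visibly different positions on the same baseline. That matches the ordering on the slide, where length ranks above angle.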
Python libraries for EDA
EDA tools
tool             correlation         distribution      missing values    version   contributors
YData profiling  table & heatmap *   histogram         bar & heatmap     4.16.1    116
dabl             scatter plot        histogram         ---               0.3.2     25
klib             table & heatmap **  line / histogram  bar & heatmap     1.3.1     6
PyGWalker        calculated field    drag & drop       calculated field  0.4.9.15  23
* also interactions
** also single feature heatmap
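The three middle columns of the table (correlation, distribution, missing data) name the checks that each tool automates. As a minimal sketch of what those checks compute, here they are done by hand with the standard library only. The CSV content, column names, and helper functions are invented for illustration and are not part of the workshop material.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical data, invented for illustration (an empty cell is a missing value).
RAW = """age,income,city
25,30000,Venice
32,42000,Mestre
47,,Venice
51,68000,
38,50000,Padua
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
columns = list(rows[0].keys())


def missing_counts(rows, columns):
    """Missing data: number of empty cells per column."""
    return {c: sum(1 for r in rows if r[c] == "") for c in columns}


def histogram(values, bin_width):
    """Distribution: count of values per bin, keyed by the bin's lower edge."""
    return Counter((int(v) // bin_width) * bin_width for v in values)


def pearson(xs, ys):
    """Correlation: Pearson coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Only rows where both cells are present can enter the correlation.
pairs = [(float(r["age"]), float(r["income"]))
         for r in rows if r["age"] != "" and r["income"] != ""]
ages, incomes = zip(*pairs)

print("missing:", missing_counts(rows, columns))
print("age histogram:", dict(histogram([r["age"] for r in rows], 10)))
print("corr(age, income):", round(pearson(ages, incomes), 3))
```

The tools in the table wrap the same three ideas in plots and reports, so a hand-rolled version like this is mainly useful for checking that a report says what you think it says.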
EDA tools
tool             correlation         distribution      missing values    year of birth  contributors
YData profiling  table & heatmap *   histogram         bar & heatmap     2016           116
dabl             scatter plot        histogram         ---               2019           25
klib             table & heatmap **  line / histogram  bar & heatmap     2020           6
PyGWalker        calculated field    drag & drop       calculated field  2023           23
* also interactions
** also single feature heatmap
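For orientation, each of the four tools can be started from a pandas DataFrame with a line or two of Python. The sketch below uses the entry points as documented in each project's README at the time of writing. They are an assumption about the installed versions and may differ in other releases, so check the documentation before relying on them. The wrapper function names and the `EDA_RUNNERS` mapping are hypothetical. Imports are kept inside the functions so that only the tool you actually call needs to be installed.

```python
def profile_with_ydata(df, out_path="report.html"):
    """Write a full HTML profiling report (pip install ydata-profiling)."""
    from ydata_profiling import ProfileReport
    ProfileReport(df, title="EDA report").to_file(out_path)


def plot_with_dabl(df, target_col):
    """Plot features against a target column (pip install dabl)."""
    import dabl
    dabl.plot(df, target_col=target_col)


def plot_with_klib(df):
    """Draw correlation, distribution and missing-value plots (pip install klib)."""
    import klib
    klib.corr_plot(df)
    klib.dist_plot(df)
    klib.missingval_plot(df)


def explore_with_pygwalker(df):
    """Open the interactive drag-and-drop explorer (pip install pygwalker)."""
    import pygwalker as pyg
    return pyg.walk(df)


# Hypothetical lookup from the tool names in the table to the wrappers above.
EDA_RUNNERS = {
    "YData profiling": profile_with_ydata,
    "dabl": plot_with_dabl,
    "klib": plot_with_klib,
    "PyGWalker": explore_with_pygwalker,
}
```

In a notebook, a call such as `EDA_RUNNERS["klib"](df)` would then run the klib plots on a DataFrame `df` that you have already loaded.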
Questions?
Exploratory Data Analysis
on a Kaggle competition
Summary
Competition description
__init__ phase
Competition description
Form: What do you need before you start doing EDA?
1. [one word]
__init__ phase
Telegram: PyDataVe
Workshop material
● https://github.com/pydata-venice/PyDataVE
Questions?
Initiatives: workshop feedback
Initiatives: venice.pydata.org
Initiatives: pydata-venice.github.io
Upcoming meetups
● Friday, 5 September: PyVenice, San Donà di Piave
● Thursday, 30 October: PyData Venice, Mestre
● Tuesday 9 to Thursday 11 December: PyData Global, Online
● Thursday, 18 December: PyData Venice, Mestre
Proposals
Thanks for listening.

Data Analysis Workshop and Kaggle Competition - 2025-08-28
