
Python Visualisation for Data Science



My personal usage of Python Visualisation Libraries for Data Science

Published in: Data & Analytics

  1. Data Vis for Data Science
     Usage of Python Visualisation Libraries
     Amit Kapoor, @amitkaps
  2. Data Science Pipeline
     - Frame: Problem definition
     - Acquire: Data ingestion
     - Refine: Data wrangling
     - Transform: Feature creation
     - Explore: Feature selection
     - Model: Model creation & assessment
     - Insight: Solution communication
  3. Role of Visualisation
     - Frame: Structuring (issue tree, hypotheses)
     - Acquire: Loading (progress, errors)
     - Refine: Profiling (missing values, outliers)
     - Transform: Univariate & Bivariate Vis (1D, 2D)
     - Explore: Multi Dimensional Vis (3D ... ND)
     - Model: Model Vis (predictions, errors, models)
     - Insight: Vis Comm (chart, narrative, dashboard)
  4. Understanding Visualisation
     - Domain & Task Layer, e.g. Tabular Data for EDA
     - Data Layer, e.g. Data Types, Transformation
     - Visual Layer, e.g. Encoding, Marks, Coordinate
     - Annotation Layer, e.g. Labels, Ticks, Titles
     - Interaction Layer, e.g. Filtering, Highlighting, Selection
  5. Python Visualisation Libraries
     - Matplotlib
     - Pandas built-in plotting
     - ggpy
     - Altair
     - Seaborn
     - Plotly
     - Bokeh
     - HoloViews
     - VisPy
     - Lightning
     - pygg
  6. Choosing a Visualisation Library
     - Ease of Learning: How hard is the API?
     - Coverage: How many graphic types can it cover?
     - Approach: Is it Charting or Grammar based?
     - Documentation: How easy is it to make basic graphs?
     - Community Support: How hard is it to make complex graphs?
  7. Notes in Circulation
       year | type  | denom | value | money   | number
       ---- | ----- | ----- | ----- | ------- | ------
       1977 | Notes | 0001  | 1     | 2.72    | 2.720
       1977 | Notes | 1000  | 1000  | 0.55    | 0.001
       1977 | Notes | 0002  | 2     | 1.48    | 0.740
       1977 | Notes | 0050  | 50    | 9.95    | 0.199
       ...  | ...   | ...   | ...   | ...     | ...
       2015 | Notes | 0500  | 500   | 7853.75 | 15.708
       2015 | Notes | 0001  | 1     | 3.09    | 3.090
       2015 | Notes | 0010  | 10    | 320.15  | 32.015
       2015 | Notes | 1000  | 1000  | 6325.68 | 6.326
  8. Use Pandas for Base Plotting
       # Loading Data
       import pandas as pd
       notes = pd.read_csv('notes.csv')

       # Data Transformation: one row per year, one column per denomination
       notes_wide = pd.pivot_table(data=notes, index="year",
                                   columns="denom", values="money")

       # Plotting
       notes_wide.plot(kind="line")
  9. Use Matplotlib for Annotation
       # Basic Styling
       import matplotlib.pyplot as plt
       plt.rcParams['figure.figsize'] = (9, 6)
       plt.style.use('ggplot')

       # Plotting
       notes_wide.plot(kind="line")

       # Adding Annotation
       plt.ylabel('Value INR Bns')
       plt.title('Notes in Circulation')
  10. Ideally use ggplot like R
        from ggplot import *
        ggplot(notes, aes(x='year', y='money', color='denom')) + \
            geom_line()
  11. Use Altair for Grammar Visualisation
        from altair import Chart
        Chart(notes).mark_line().encode(
            x='year:N',
            y='money',
            color='denom'
        )
  12. Personal Usage
      - Use Pandas for base plotting and time series
      - Use Matplotlib for matrices and customisation
      - Use Seaborn for 1D & 2D statistical graphs, especially for categorical variables
      - Use IPython Widgets for model interaction
      - Use Datashader for Big Data Visualisation
      - Experimenting with Altair
  13. What about interactivity?
      - Watch out for Altair: interaction will be built in soon
      - Use Bokeh for web-based interactive dashboards, but it requires learning a different API
      - Use Plotly for creating fully interactive charts. Integration with Matplotlib is available.
  14. Get in touch with me
      Amit Kapoor, @amitkaps, amitkaps.com
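
The Pandas and Matplotlib slides can be tried end to end without the original notes.csv, which the deck does not include. The sketch below is only an illustration under stated assumptions: it rebuilds a tiny dataset from the eight rows visible on the "Notes in Circulation" slide, pivots it from long to wide as on the Pandas slide, and wraps the plotting and annotation steps in a helper. The names SAMPLE_CSV and plot_notes are hypothetical and are not part of the deck.

```python
import io

import pandas as pd

# Assumption: only the eight rows shown on the slide (1977 and 2015);
# the real notes.csv covers more years and is not reproduced here.
SAMPLE_CSV = """year,type,denom,value,money,number
1977,Notes,0001,1,2.72,2.720
1977,Notes,1000,1000,0.55,0.001
1977,Notes,0002,2,1.48,0.740
1977,Notes,0050,50,9.95,0.199
2015,Notes,0500,500,7853.75,15.708
2015,Notes,0001,1,3.09,3.090
2015,Notes,0010,10,320.15,32.015
2015,Notes,1000,1000,6325.68,6.326
"""

# Read denom as a string so the zero-padded codes ("0001") survive loading.
notes = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"denom": str})

# Long to wide: one row per year, one column per denomination.
# Denominations absent in a given year become NaN.
notes_wide = notes.pivot_table(index="year", columns="denom", values="money")


def plot_notes(wide):
    """Hypothetical helper: line chart of the wide table with a label and title."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend; drop this line in a notebook
    import matplotlib.pyplot as plt  # noqa: F401  (needed by DataFrame.plot)

    ax = wide.plot(kind="line", figsize=(9, 6))
    ax.set_ylabel("Value INR Bns")
    ax.set_title("Notes in Circulation")
    return ax
```

Calling plot_notes(notes_wide) returns a Matplotlib Axes that can be saved with ax.figure.savefig(...). With only two years in the sample, each denomination has at most two points, so the chart is a smoke test of the workflow and not a reproduction of the slide's figure.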
