PyCon mini Hiroshima 2018

PyCon mini Hiroshima 2018: Introduction to Data Science in Python (Visualization ver.)

  1. Introduction to Data Science in Python (Visualization ver.)
      廻船孝行 (KAISEN Takayuki), PyCon mini Hiroshima 2018
      Contact: ksn0215@gmail.com
      Ver. 3.1
      Repository: https://github.com/ksnt/pycon_hiro_2018
  2. Overview
      1. Geographic Data Analysis
      2. Interactive Graph and Application
      3. Discussion
      Development environment: Ubuntu 17.10, Python 3.6.3, pip 10.0.1
  3. Notice
      ・ I was injured in an accident.
      ・ Some words are hard for me to pronounce.
      ・ Thank you in advance for your cooperation!
  4. 0. WHO ARE YOU?
      ⚫ Favorite languages
        ⚪ Python, R, Scala
      ⚫ Interests
        ⚪ SNA, CSS, CMC, ABM, complex systems, data science, ...
      ⚫ Python conference attendance
        ⚪ PyCon mini JP (2010?)
        ⚪ PyCon JP 2011
        ⚪ Tokyo.SciPy (2011?)
      ⚫ Love online learning
      ⚫ Oct 2017 - Hiroshima
      Freeman, L. (2004). The Development of Social Network Analysis: A Study in the Sociology of Science:
        1. Social network analysis is motivated by a structural intuition based on ties linking social actors,
        2. It is grounded in systematic empirical data,
        3. It draws heavily on graphic imagery, and
        4. It relies on the use of mathematical and/or computational models.
  5. Takeaways: All I Am Telling You
      1. It is incredibly easy to make use of geographic data with Python (Folium).
      2. It is incredibly easy to develop data-driven web applications with Python (Plotly and Dash).
  6. 1. Geographic Data Analysis  2. Interactive Graph and Application  3. Discussion
  7. How to Use Geographic Data
      GIS = Geographic Information System
      "A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present spatial or geographic data." (Wikipedia)
      YOU DON'T HAVE TO USE GIS!
      Reference: "Python Geospatial Development, Third Edition", Chapter 2
  8. How to Make Use of Geographic Data in Python
      ・ ArcGIS, QGIS (PyQGIS)
      ・ GeoPandas
      ・ Folium (Leaflet)
      ・ Geopy
      ・ And so forth… (reference: https://qiita.com/aimof/items/b4e4551d27abaf5bb258)
      ⚪ No GIS knowledge needed
      ⚪ Easy to create web maps
  9. How to Install Folium
      $ pip install folium
      or
      $ conda install -c conda-forge folium
  10. Data & Visualization (1)
      https://nbviewer.jupyter.org/github/ksnt/Predictor-of-blights-in-Detroit/blob/master/Final_Report_1.1.ipynb HERE!
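  11. Data & Visualization (2)
      import pandas as pd

      # crime_data is the crime DataFrame loaded earlier in the linked notebook.
      # Put the latitude/longitude columns side by side
      # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0).
      df = pd.concat([crime_data['LAT'], crime_data['LON']], axis=1)
      df = df.reset_index(drop=True)
      # Round coordinates to 3 decimal places
      df['LAT'] = df['LAT'].round(3)
      df['LON'] = df['LON'].round(3)
      df.head()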
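The cleaning step on slide 11 can also be sketched without pandas, which makes its two effects explicit: rows without usable coordinates are dropped and the remaining coordinates are rounded to three decimals. This is a minimal sketch that assumes each raw record is a dict with 'LAT' and 'LON' keys, mirroring the column names on the slide. The helper names and the sample rows are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Tuple


def to_float(value) -> Optional[float]:
    """Convert a raw cell to float; return None if it is missing, malformed, or NaN."""
    try:
        result = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(result) else result


def clean_coordinates(records: Iterable[dict], ndigits: int = 3) -> List[Tuple[float, float]]:
    """Return (lat, lon) tuples rounded to `ndigits` decimal places.

    Each record is a dict with 'LAT' and 'LON' keys (a hypothetical stand-in
    for one row of crime_data). Rows without usable coordinates are dropped.
    """
    points = []
    for rec in records:
        lat = to_float(rec.get("LAT"))
        lon = to_float(rec.get("LON"))
        if lat is None or lon is None:
            continue  # skip rows with missing or unparsable coordinates
        points.append((round(lat, ndigits), round(lon, ndigits)))
    return points


# Hypothetical sample rows, for illustration only
sample = [
    {"LAT": "42.36371", "LON": "-83.09218"},
    {"LAT": None, "LON": "-83.1"},
    {"LAT": 42.3314, "LON": -83.0458},
]
print(clean_coordinates(sample))  # → [(42.364, -83.092), (42.331, -83.046)]
```

The resulting list of (lat, lon) tuples has the same shape as the `data` list the heat-map slide builds.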
  12. Data & Visualization (3)
      import folium
      from folium.plugins import HeatMap
      from IPython.display import HTML

      # Create a map centered on Detroit
      map_detroit = folium.Map(location=(42.40, -83.01))
      # HeatMap takes a list of (lat, lon) points
      data = list(zip(df['LAT'], df['LON']))
      HeatMap(data, radius=9).add_to(map_detroit)
      # Save the map as an HTML file and embed it in the notebook
      map_detroit.save('crimes.html')
      HTML('<iframe width="800" height="500" frameborder="0" allowfullscreen src="./crimes.html"></iframe>')
      https://nbviewer.jupyter.org/github/ksnt/Predictor-of-blights-in-Detroit/blob/master/Final_Report_1.1.ipynb
  13. Data & Visualization (4)
      How to visualize geographical data with Folium:
      1. Data cleaning
      2. Create a map
      3. Put the data into the map! e.g. HeatMap(data, radius=9).add_to(map_detroit)
      (4. Save the map as an HTML file)
      (5. Display the HTML file)
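Rounding to three decimals in step 1 effectively snaps nearby incidents onto a grid, so many rows share the same rounded coordinates. One possible extension, which the slides do not show, is to count those duplicates and pass each location with a weight, since Folium's `HeatMap` accepts points of the form `[lat, lon]` or `[lat, lon, weight]`. Below is a stdlib-only sketch of the counting step. The function name and sample points are hypothetical.

```python
from collections import Counter


def weighted_points(points):
    """Collapse duplicate (lat, lon) pairs into [lat, lon, count] triples.

    The output has the [lat, lon, weight] shape that folium.plugins.HeatMap
    accepts. The most frequent locations come first.
    """
    counts = Counter(points)
    return [[lat, lon, n] for (lat, lon), n in counts.most_common()]


# Hypothetical rounded coordinates, for illustration only
pts = [(42.364, -83.092), (42.331, -83.046), (42.364, -83.092)]
print(weighted_points(pts))  # → [[42.364, -83.092, 2], [42.331, -83.046, 1]]
```

With Folium installed, step 3 could then become `HeatMap(weighted_points(data), radius=9).add_to(map_detroit)`. Whether weighting produces a more readable map depends on the data.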
  14. 1. Geographic Data Analysis  2. Interactive Graph and Application  3. Discussion
  15. Data Visualization in Python
      ⚫ seaborn: cool
      ⚫ bokeh: cool, interactive
      ⚫ plotly: cool, interactive, reactive
      ⚫ matplotlib: standard
      ⚫ PixieDust (IBM?): interactive, …?
      ⚫ mpld3: cool, interactive
      >>> plotly.__version__
      '2.4.1'
      >>> dash.__version__
      '0.21.0'
  16. How to Use Plotly and Dash
      $ pip install plotly
      $ pip install dash
      $ python
      >>> import plotly
      >>> import dash
  17. Interactive Graph for Scatter Plot (Optional)
      https://nbviewer.jupyter.org/gist/ksnt/340910aae39670202e4f790213e7afdc
  18. Interactive Graph for Bar Plot
      import pandas as pd
      import plotly
      import plotly.graph_objs as go

      # Enable offline plotting inside a Jupyter notebook
      plotly.offline.init_notebook_mode(connected=True)

      # You have to prepare this data yourself! (Global Terrorism Database file)
      df2 = pd.read_excel('globalterrorismdb_0616dist.xlsx', header=0)
      # Number of incidents per country, 20 most frequent
      top20 = df2["country_txt"].value_counts().head(20)
      data = [go.Bar(x=top20.index, y=top20)]
      layout = go.Layout(
          title="Top 20 Frequency of Terrorism Incidents 1970 - 2015",
          xaxis={"title": "Country"},
          yaxis={"title": "Occurrence of terrorism"},
      )
      fig = go.Figure(data=data, layout=layout)  # prepare the plot with Plotly
      plotly.offline.iplot(fig, filename='basic-bar')  # offline (not online) figure
      https://nbviewer.jupyter.org/gist/ksnt/eb8ac99dd69ecc5dc5774bf673977ceb
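On slide 18, `value_counts()` followed by taking the first 20 entries counts how often each country appears and keeps the most frequent ones. Those two parallel sequences become the bar chart's `x` and `y`. The sketch below shows the same computation with `collections.Counter`, assuming the country names are already in a plain list. The function name and sample values are hypothetical, and tied counts may come out in a different order than with pandas.

```python
from collections import Counter


def top_n_counts(values, n=20):
    """Return parallel lists (labels, counts) for the n most frequent values.

    Mirrors series.value_counts().head(n): sorted by descending count.
    """
    pairs = Counter(values).most_common(n)
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]
    return labels, counts


countries = ["Iraq", "Pakistan", "Iraq", "India", "Iraq", "Pakistan"]  # hypothetical
x, y = top_n_counts(countries, n=2)
print(x, y)  # → ['Iraq', 'Pakistan'] [3, 2]
```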
  19. Interactive Graph for Time Series Plot
      layout = go.Layout(
          title="Occurrence of Terrorism Incidents",
          xaxis={"title": "Year"},
          yaxis={"title": "Occurrence of terrorism"},
      )
      # Number of incidents per year in Iraq, sorted by year
      iraq_incidents = df2[df2["country_txt"] == "Iraq"]
      iraq_incidents_count = iraq_incidents['iyear'].value_counts().sort_index()
      iraq = go.Scatter(
          x=iraq_incidents_count.index,
          y=iraq_incidents_count,
          name="Iraq",
          line=dict(color="black"),
          opacity=0.8,  # lowercase: Plotly has no "Opacity" property
      )
      year = [i for i in range(1970, 2016)]
      # The other traces (us, pakistan, ...) are built the same way as iraq (not shown on the slide)
      data = [iraq, us, pakistan, india, afghanistan, colombia, peru, phil, el, uk,
              turkey, spain, sri, somalia, nigeria, algeria, france, yemen, lebanon]
      fig = go.Figure(data=data, layout=layout)
      # Hide the plot.ly link, the Plotly logo, and the "send data to cloud" button
      plotly.offline.iplot(fig, show_link=False,
                           config={"displaylogo": False,
                                   "modeBarButtonsToRemove": ["sendDataToCloud"]})
      https://nbviewer.jupyter.org/gist/ksnt/eb8ac99dd69ecc5dc5774bf673977ceb
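`value_counts().sort_index()` on slide 19 only returns years in which at least one incident occurred. A line plot then connects the surrounding points directly, so years with zero incidents silently disappear. The slide defines `year = [i for i in range(1970, 2016)]` but does not show how it is used. Filling missing years with zero over that range is one plausible use, though that is an assumption. A stdlib sketch follows. The function name and sample years are hypothetical.

```python
from collections import Counter


def yearly_counts(incident_years, first=1970, last=2015):
    """Count incidents per year, including years with zero incidents.

    Returns (years, counts) covering every year from `first` to `last`
    inclusive, like value_counts().sort_index() but with gaps filled with 0.
    """
    counts = Counter(incident_years)
    years = list(range(first, last + 1))
    return years, [counts.get(year, 0) for year in years]


years, counts = yearly_counts([1971, 1971, 1973], first=1970, last=1974)
print(years)   # → [1970, 1971, 1972, 1973, 1974]
print(counts)  # → [0, 2, 0, 1, 0]
```

The two lists could be passed as `x` and `y` of a `go.Scatter` trace so every country's line spans the same years.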
  20. Additional Example: Monte Carlo Simulation (Optional)
      https://nbviewer.jupyter.org/gist/ksnt/101a44cc21b0eb990f96dc1d640dbd42
  21. Plotly Dash
      "Dash is Shiny for Python" (Chris Parmer, "Dash: Shiny for Python", https://youtu.be/5BAthiN0htc)
      "Dash is a Python framework for building web applications."
      ・ Flask
      ・ React.js
      ・ Ideal for building data visualization apps
  22. Monte Carlo Simulator (1)
      https://montecarlo-dash-app.herokuapp.com/
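  23. Monte Carlo Simulator (2)
      Count up these points! Points = {(x, y)}, with x, y ~ U(-1, 1), cover the square around a circle of radius r = 1:
        Number of all points → area of the square, S(□) = 4
        Number of points in the circle → area of the circle, S(○) = π
      so
        $\frac{N_{\text{all}}}{N_{\text{circle}}} \approx \frac{S(\square)}{S(\bigcirc)} = \frac{4}{\pi}, \qquad \text{i.e.} \qquad \pi \approx 4 \, \frac{N_{\text{circle}}}{N_{\text{all}}}$
      Reference: Krauth, W. (2006). Statistical Mechanics: Algorithms and Computations (Vol. 13). OUP Oxford.
      A useful article on Monte Carlo simulation (in Japanese): 福島 孝治 (Koji Hukushima), "The Frontline of the Monte Carlo Method: How to Integrate by Rolling Dice" (モンテカルロ法の前線 サイコロを振って積分する方法), https://www.smapip.is.tohoku.ac.jp/~smapip/2003/tutorial/presentation/koji-hukushima.pdf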
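The estimate on slide 23 translates almost line for line into code. This is a minimal stdlib sketch, not the code behind the linked app. It draws points uniformly from the square [-1, 1] × [-1, 1], counts how many land inside the unit circle, and multiplies that fraction by 4. The function name and the optional `seed` parameter are illustrative choices.

```python
import random


def estimate_pi(n_points: int, seed=None) -> float:
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1].

    The fraction of points inside the unit circle approximates
    S(circle) / S(square) = pi / 4.
    """
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    rng = random.Random(seed)  # seeded generator for reproducible runs
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points


print(estimate_pi(100_000, seed=0))  # roughly 3.14
```

The statistical error of this estimator shrinks like 1/√n, so each additional correct digit costs about 100 times more points.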
  24. Monte Carlo Simulator (3)
      https://montecarlo-dash-app.herokuapp.com/
  25. Monte Carlo Simulator (4)
      https://gist.github.com/ksnt/ccd88b6f63081e6d2d11f0daf6d0bc4e
      ⚫ Import libraries
      ⚫ Create the app and its underlying Flask server:
          app = dash.Dash()
          server = app.server
          server.secret_key = os.environ.get('secret_key', 'secret')
      ⚫ View:
          app.layout = html.Div(...)  # WRITE LAYOUT
      ⚫ Controller:
          @app.callback(...)  # WRITE OPERATION
      ⚫ Run:
          if __name__ == '__main__':
              app.run_server(debug=True)  # Dash >= 2: app.run(debug=True)
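In the skeleton on slide 25, the `@app.callback` function acts as the controller. It receives values from components in the layout and returns new content, such as a figure for a graph. A Dash graph's `figure` can be a plain dict with `data` and `layout` keys, so the callback body can be written and tested as an ordinary function before it is wired into the app. This is a sketch under stated assumptions: the function name, the component ids in the docstring, and the two-trace design are hypothetical and are not taken from the linked gist.

```python
import random


def monte_carlo_figure(n_points: int, seed: int = 0) -> dict:
    """Build a Plotly-style figure dict of points inside vs. outside the unit circle.

    Hypothetical wiring in a Dash app (not shown on the slide):
        @app.callback(Output('mc-graph', 'figure'), [Input('n-slider', 'value')])
        def update(n):
            return monte_carlo_figure(n)
    """
    rng = random.Random(seed)
    inside_x, inside_y, outside_x, outside_y = [], [], [], []
    for _ in range(n_points):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside_x.append(x)
            inside_y.append(y)
        else:
            outside_x.append(x)
            outside_y.append(y)
    # pi ~ 4 * (points in circle) / (all points); undefined for zero points
    pi_hat = 4 * len(inside_x) / n_points if n_points else float("nan")
    return {
        "data": [
            {"type": "scatter", "mode": "markers", "x": inside_x, "y": inside_y, "name": "inside"},
            {"type": "scatter", "mode": "markers", "x": outside_x, "y": outside_y, "name": "outside"},
        ],
        "layout": {"title": f"pi estimate: {pi_hat:.4f} (n = {n_points})"},
    }
```

Keeping the computation in a plain function separates the controller logic from Dash itself, which makes it easy to check without running a server.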
  26. Python vs R as Data Visualization Tools
      Tool    | Speed | Extensibility                  | Price                                                                 | Packages/Libraries (for data analysis)
      Python  | ○     | ◎                              | Free                                                                  | ML
      R       | △     | △ (OOP: S3, S4, R5 (>= 2.12))  | Free                                                                  | Statistics
      Tableau | ?     | △?                             | ¥18,000 / ¥51,000 / ¥102,000 per year; ¥0 for students and teachers  | For EDA?
      (◎ = very good, ○ = good, △ = fair, ? = unclear)
      Others: Kibana, Superset, Redash, Metabase, Splunk, KNIME, Google Charts, etc.; Julia, MATLAB, Scilab, Octave, etc.
  27. 3 – 2ε. Welcome to the Plotly Dash Document Translation Project!
      https://goo.gl/wnjHA6 HERE!
  28. 3 – ε. I am looking for new opportunities!
      ・ Place: anywhere
      ・ Salary & benefits: negotiable
      ・ I like: international, diverse, Python, data analysis
      ・ Feel free to talk to me!
  29. 1. Geographic Data Analysis  2. Interactive Graph and Application  3. Discussion
  30. Discussion
