20160708 Python as a Data Processing Platform (Sapporo)

Slides from the talk given at Sapporo Tech Bar [July 8 @ Sapporo], hosted by Insight Technology. http://www.db-tech-showcase.com/events-seminars/db-tech-salon/20160708_sapporo_tech_bar

Published in: Software

  1. 1. Python Sky
  2. 2. • • Python 2000 (**) • db tech showcase MongoDB • • FB: Ryuji Tamagawa • Twitter : tamagawa_ryuji
  3. 3. 2015
  4. 4. 2016
  5. 5. • Python • Python
  6. 6. • Python • • Python • NumPy, SciPy, matplotlib, Pandas • Python • scikit-learn • TensorFlow • Python IPython, Jupyter Notebook, Spyder, Visual Studio • Python • Python • Pandas • Spark - PySpark DataFrame API • matplotlib
  7. 7. Part 1 : Python
  8. 8. Python • • Google Guido Google Google 1 • NumPy, SciPy, matplotlib → Pandas • • -2000 Linux -2010 Web Trac Google
  9. 9. Python • • • • → •
  10. 10. Python • • pyODBC • Web WSGI
  11. 11. Python • 2.x 3.x 32bit 64bit 64bit • 2.x • 3.x 3 • 2.x 3.x
  12. 12. • Ruby? • R? • Java? • Scala?
  13. 13. Python • Python 'CPython' JIT PyPy JVM Jython .NET IronPython • CPython • CPython 2 • C • processing PySpark
  14. 14. Python • Python • 1 Linux Mac OS Python Python Mac • Python pip 3.x Python 2.7.9 2.x Python pip Linux Python pip yum apt • Python Anaconda Python conda • python 2016
 http://qiita.com/y__sama/items/5b62d31cb7e6ed50f02c
  15. 15. NumPy, SciPy, matplotlib, Pandas • • NumPy SciPy • Pandas Pandas Pandas NumPy • Anaconda Python
  16. 16. Python • scikit-learn http://scikit-learn.org/stable/
  17. 17. Python • TensorFlow 
 Python
  18. 18. Python 
 IPython Jupyter, … IDE Spyder, Rodeo Visual Studio, PyCharm, PyDev
  19. 19. • • GUI IDLE • OK
  20. 20. • IPython • • • Anaconda • pip
  21. 21. 
 • Jupyter Notebook • Python • IPython Notebook Python • Apache Zeppelin http://zeppelin.apache.org
  22. 22. IDE • R RStudio • IDE • • 2 Spyder Rodeo • Spyder
  23. 23. • • Visual Studio • Eclipse PyDev • PyCharm •
  24. 24. Part 2 : Python
  25. 25. 1 1.2 1000000L Python2 'abc' u' ' Python2 [1, 2, 3, 'foo', 'bar', 'foo'] (1, 2, 3, 'foo', 'bar', 'foo') {'k1': 'value1', 'k2': 'value2'} set([1, 2, 3, 'foo', 'bar'])
  26. 26. • • • split s = 'foo, bar, baz' items = s.split(',') print(items[0]) print(items[-1]) print(items[0][-2:])
  27. 27. • list comprehension • dictionary comprehension • lambda map, reduce, filter sList = ['foo', 'bar', 'baz'] lList = [len(s) for s in sList] lList = map(lambda s: len(s), sList) lDict = {s: len(s) for s in sList}
  28. 28. Pandas • Pandas • matplotlib / seaborn • NumPy SciPy Python • Pandas + matplotlib OK Pandas NumPy NumPy / SciPy
  29. 29. Pandas • Pandas DataFrame • R • RDB 2 • index Series Columns Columns Series Series SeriesIndex
  30. 30. Pandas I/O • CSV JSON RDB Excel • column • RDB • import pandas as pd df = pd.read_csv(<filename>) df = pd.read_json(<filename>) df.to_csv(<filename>) df.to_excel(<filename>) # df.to_clipboard()
  31. 31. • http://sinhrks.hatenablog.com/entry/2015/01/28/073327 0 1 import pandas as pd df['nValue'] = df['value'] / sum(df['value'])
 id value color
 sapporo 43 red
 osaka 42 pink
 matsumoto 40 green
 id value color nValue
 sapporo 43 red 0.344
 osaka 42 pink 0.336
 matsumoto 40 green 0.32
 Python
  32. 32. Spark - PySpark DataFrame API • Python • Spark PySpark findspark Spark • Python Spark API DataFrame API • Spark Pandas Spark PySpark Spark
 (diagram: three Spark nodes and one driver node)
  33. 33. matplotlib / seaborn • • Python NumPy / Pandas • Jupyter Notebook Spyder
  34. 34. Questions?
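
The string-splitting and comprehension snippets on slides 26 and 27 use Python 2 syntax. As a minimal sketch, the same ideas might look like this in Python 3. The variable names are illustrative, and the `strip()` call, the `reduce` sum, and the `filter` on words starting with "b" are additions that do not appear on the slides.

```python
from functools import reduce  # reduce is not a builtin in Python 3

# Slide 26: split a comma-separated string, then index and slice the parts.
s = 'foo, bar, baz'
# strip() is an addition here: it drops the space that follows each comma.
items = [part.strip() for part in s.split(',')]
first, last = items[0], items[-1]
tail = items[0][-2:]  # last two characters of the first item

# Slide 27: comprehensions and the functional helpers.
s_list = ['foo', 'bar', 'baz']
lengths_comp = [len(x) for x in s_list]             # list comprehension
lengths_map = list(map(lambda x: len(x), s_list))   # map returns an iterator in Python 3
length_by_word = {x: len(x) for x in s_list}        # dictionary comprehension
total_length = reduce(lambda acc, n: acc + n, lengths_comp, 0)
b_words = list(filter(lambda x: x.startswith('b'), s_list))

print(first, last, tail)
print(lengths_comp, length_by_word, total_length, b_words)
```

The list comprehension and the `map` call produce the same values; the comprehension is usually the more readable choice when the transformation is a single expression.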

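
The Pandas I/O calls on slide 30 and the normalized column on slide 31 can be combined into one short sketch. It assumes pandas is installed, reads the three rows shown on slide 31 from an in-memory CSV string instead of a file, and uses `id` as the index; the in-memory source and the index choice are assumptions made for illustration.

```python
import io

import pandas as pd

# The three rows shown on slide 31, supplied as CSV text instead of a file.
csv_text = """id,value,color
sapporo,43,red
osaka,42,pink
matsumoto,40,green
"""

# read_csv accepts a path or any file-like object; index_col='id' is an assumption.
df = pd.read_csv(io.StringIO(csv_text), index_col='id')

# Divide each value by the column total so the new column sums to 1.
df['nValue'] = df['value'] / df['value'].sum()

# to_csv() with no path returns the CSV text instead of writing a file.
round_trip = df.to_csv()

print(df)
```

Note that the writers (`to_csv`, `to_excel`, `to_clipboard`) are methods of the DataFrame, while the readers (`read_csv`, `read_json`) are functions of the `pd` module.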