Python on Science? Yes, We can.

Keynote at Python Nordeste 2014 01/05/2014

Published in: Science, Technology

  1. 1. Python on Science? Yes we can! Marcel Caraciolo (@marcelcaraciolo): CTO of Genomika Diagnósticos, scientist, MSc in Computer Science and Data Analysis, has worked with Python for 7 years, interested in mobile education, machine learning and dataaaaa! Current president of the PythonBrazil Association. Recife, Brazil - http://aimotion.blogspot.com
  2. 2. About me: Creator of several scientific Python packages, including crab (recsys), benchy and now biopandas. Until last year, Chief Scientist at Atepassar.com (e-learning social network). Co-founder and instructor at PyCursos, teaching Python online, including the famous Masanori’s Python for Zoooommbiesss!! Interested in Python, mobile, e-learning, machine learning and now my newly acquired skill: bioinformatics!!!
  3. 3. 2014, new challenges! http://www.genomika.com.br
  4. 4. Where it all began…
  5. 5. March 2013 https://github.com/marcelcaraciolo/Geo-Friendship-Visualization
  6. 6. What you already know…
  7. 7. What you already know…
  8. 8. What you already know…
  9. 9. What you already know…
  10. 10. Putting Science back in Comp Science: Much of the software stack is for systems programming (C++, Java, .NET, ObjC, web, etc.). Complex numbers? Vectorized primitives? The software stack for scientists is not as helpful as it should be. FORTRAN or C/C++ is still where many scientists end up.
  11. 11. High Performance with Big Data
  12. 12. Packages for data analysis and visualization
  13. 13. Syntax - Gets out of your way !!
  14. 14. Community Driven
  15. 15. Ready for web applications
  16. 16. Which is the better data analysis language? R or Python? Quora, http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python “Python is good for data cleanup, R for statistical models”
  17. 17. Which is the better data analysis language? R or Python? Quora, http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python “Python is good for data cleanup, R for statistical models” “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”
  18. 18. Which is the better data analysis language? R or Python? Quora, http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python “Python is good for data cleanup, R for statistical models” “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch” “You’re running a scientific simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
  19. 19. “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
  20. 20. Numba: just-in-time compiler to LLVM through @decorators numba.pydata.org
  21. 21. Numba: just-in-time compiler to LLVM through @decorators* numba.pydata.org * aka fast, easy
  22. 22. Basic packages for data analysis and visualisation
  23. 23. NumPy: The foundation of the Python Data Analysis stack
  24. 24. NumPy: array oriented
  25. 25. DEMO (.)
  26. 26. Matplotlib: 2D plotting library
  27. 27. DEMO (…)
  28. 28. pandas: Python Data Analysis toolkit built on NumPy
  29. 29. DEMO
  30. 30. Several projects have been built on top of pandas: https://github.com/kjordahl/geopandas
  31. 31. Several projects have been built on top of pandas: https://github.com/wrobstory/vincent (D3 with Vega)
  32. 32. Several projects have been built on top of pandas: http://statsmodels.sourceforge.net/ (statistics, regression plots)
  33. 33. Scikit-learn: Python toolkit for machine learning
  34. 34. DEMO (….)
  35. 35. Applied Science with Python!
  36. 36. Hurricane Detector using GFS Data Colorado State University Minwoo Lee http://conference.scipy.org/scipy2011/slides/lee_hurricane_prediction.pdf
  37. 37. biological data analysis with python
  38. 38. http://www.astropy.org/
  39. 39. http://scikit-image.org/
  40. 40. NLTK http://www.nltk.org/
  41. 41. IPython: Interactive Python
  42. 42. DEMO (..)
  43. 43. DEMO
  44. 44. DEMO WITH WAKARI + DEMO WITH GALLERY NBVIEWER (….)
  45. 45. 12. Anaconda Distribution packages Anaconda: pulls it all together
  46. 46. 12. Anaconda Distribution packages https://store.continuum.io/cshop/anaconda/
      $ conda list
      $ conda search
      $ conda install <package-name>
      $ conda create -n numpy16 ipython-notebook numpy=1.6
      $ source activate numpy16
      $ source deactivate
  47. 47. https://binstar.org/
      $ conda install binstar
      $ conda build <recipe-dir>
      $ conda config --add channels https://conda.binstar.org/username
      $ binstar login
  48. 48. How do you show your scientific app to the world? One alternative: Yhat
  49. 49. How do you show your scientific app to the world? One alternative: Yhat + Heroku
  50. 50. DEMO http://blog.yhathq.com/posts/digit-recognition-with-node-and-python.html
  51. 51. Who is using Scientific Python ?
  52. 52. Tools for scientific development
  53. 53. Tools for scientific development
  54. 54. Getting Started
      http://stackoverflow.com/questions/9555635/open-source-enthought-python-alternative
      http://fonnesbeck.github.io/ScipySuperpack/ Recent builds of fundamental Python scientific computing packages for OS X
      https://code.google.com/p/pythonxy/ Scientific-oriented Python distribution based on Qt and Spyder
      https://store.continuum.io/cshop/anaconda/ Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing!
  55. 55. Scientific Python also has several events! http://conference.scipy.org/
  56. 56. …and library community discussion groups http://www.scipy.org/scipylib/mailing-lists.html https://groups.google.com/forum/#!forum/pydata https://lists.sourceforge.net/lists/listinfo/scikit-learn-general biopython@biopython.org
  57. 57. …and library community discussion groups http://pyscience-brasil.wikidot.com/ https://groups.google.com/forum/#!forum/pyscience-brasil
  58. 58. …and library community discussion groups http://pycursos.com/biblioteca/computacao-cientifica/?filter=lesson
  59. 59. Scientific Community is f**ing amazing! https://www.enthought.com/products/pyxll/ PyXLL - Python for Excel Solution
  60. 60. What am I working on? Clinical sequencing. Flask + Python: web dashboard. Python + matplotlib + scipy + numpy: sequencing biological and log databases. biopandas (in progress) and parallel sequencing workflow approaches.
  61. 61. What am I working on? biopandas https://github.com/genomika/biopandas
  62. 62. Challenges: Reproducible Research
  63. 63. Challenges: “A rule of thumb among biotechnology venture capitalists is that half of published research cannot be replicated”
  64. 64. Challenges: How do we replicate research today?
  65. 65. Challenges: How do we replicate research today? collaborate on
  66. 66. Challenges: How do we replicate research today? collaborate on data analysis
  67. 67. How do we collaborate today?
  68. 68. ??????
  69. 69. Project-based interaction
  70. 70. Project-based interaction wakari.io: browser-based Python & Linux environment
  71. 71. Present in 22 of the 26 states
  72. 72. Python on Science? Yes we can! Marcel Caraciolo (@marcelcaraciolo): CTO of Genomika Diagnósticos, scientist, MSc in Computer Science and Data Analysis, has worked with Python for 7 years, interested in mobile education, machine learning and dataaaaa! Current president of the PythonBrazil Association. Recife, Brazil - http://aimotion.blogspot.com
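
Slides 10 and 24 ask about complex numbers and vectorized primitives and describe NumPy as array oriented, but the live demos are not part of this transcript. The following is only a minimal sketch of what those points might look like in practice. The variable names and sample values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Complex numbers are a built-in Python type, no extra library needed.
z = complex(3, 4)
magnitude = abs(z)  # modulus of 3 + 4j

# Array-oriented style: one expression is applied to every element,
# with no explicit Python loop.
celsius = np.array([0.0, 25.0, 100.0])
fahrenheit = celsius * 9 / 5 + 32

# The same vectorized primitives also work on complex-valued arrays.
signal = np.array([1 + 1j, 2 - 2j, -3 + 0j])
# Squared magnitude of each element, computed as z * conj(z).
powers = (signal * np.conj(signal)).real

print(magnitude)
print(fahrenheit)
print(powers)
```

The loop-free expressions are the point here: the arithmetic is written once and NumPy applies it across the whole array in compiled code, which is the style the rest of the stack (pandas, scikit-learn) builds on.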
