Python on Science?
Marcel Caraciolo
@marcelcaraciolo	

CTO of Genomika Diagnósticos, scientist, MSc in Computer Science and Data Analysis.
Has worked with Python for 7 years; interested in mobile education, machine learning and dataaaaa!
Current president of Association PythonBrazil.

Recife, Brazil - http://aimotion.blogspot.com
Yes we can!
About me
Creator of several scientific Python packages, including crab
(recsys), benchy and now biopandas
Until last year, Chief Scientist at Atepassar.com (e-learning social network)
Co-founder and instructor at PyCursos, teaching Python online,
including the famous Masanori’s Python for Zoooommbiesss!!
Interested in Python, mobile, e-learning, machine learning and now
my newly acquired skill: bioinformatics!!!
2014, new challenges!
http://www.genomika.com.br
Where it all began…
March 2013
https://github.com/marcelcaraciolo/Geo-Friendship-Visualization
What you already know…
Putting Science back in Comp Science
Much of the software stack is for systems programming
C++, Java, .NET, ObjC, web, etc.
Complex numbers? Vectorized primitives?
The software stack for scientists is not as helpful as it should be
FORTRAN, C/C++ is still where many scientists end up
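As a rough illustration of the two features asked about above, the sketch below shows Python's built-in complex type and a vectorized operation through NumPy. The variable names and sample values are made up for the example.

```python
import numpy as np

# Complex numbers are a built-in type; no extra library is needed.
z = complex(3, 4)                 # same value as the literal 3 + 4j
magnitude = abs(z)                # modulus of the complex number

# Vectorized primitives: one expression operates on the whole array,
# with no explicit Python loop.
samples = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative values
squared = samples ** 2

# NumPy functions also accept complex arrays.
roots = np.sqrt(np.array([-1, -4], dtype=complex))
```

The loop over the array elements happens inside NumPy's compiled code, which is the main reason array expressions like these are preferred over element-by-element Python loops.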
High Performance with Big Data
Packages for data analysis and visualization
Syntax - Gets out of your way !!
Community Driven
Ready for web applications
Which is the better data analysis language? R or Python?
Quora, http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python
“Python is good for data cleanup, R for statistical models”
“R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”
“You’re running a scientific simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
“You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”
Numba
just-in-time compiler to LLVM through
@decorators*
numba.pydata.org	
  
* aka easy, fast.
Basic packages for data analysis and visualisation
NumPy: The foundation of the
Python Data Analysis stack
NumPy: Array oriented
DEMO (.)
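The original demo is not part of this text, so the following is only a sketch of what "array oriented" tends to mean in practice: axis-wise reductions, broadcasting and boolean masks. The numbers are invented.

```python
import numpy as np

# Hypothetical measurements: 3 samples (rows) x 4 features (columns).
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [3.0, 6.0, 9.0, 12.0]])

# Reduce along an axis: one mean per column.
col_means = data.mean(axis=0)

# Broadcasting: the 1-D row of means is subtracted from every row.
centered = data - col_means

# Boolean mask: select every element above a threshold.
mask = data > 5.0
large_values = data[mask]
```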
Matplotlib
2D plotting library
DEMO (…)
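In place of the live demo, here is a small sketch of a 2D plot with Matplotlib. The curves, labels and the choice of the non-interactive "Agg" backend are assumptions made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.legend()

# Render to an in-memory PNG instead of writing a file to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

In an interactive session, `plt.show()` would normally replace the in-memory save.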
pandas
Python data analysis toolkit built on NumPy
DEMO
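The demo itself is not reproduced here. As a hedged stand-in, this sketch shows a typical pandas workflow on a made-up table: drop missing values, then aggregate by group. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sequencing runs, for illustration only.
runs = pd.DataFrame({
    "sample": ["A", "A", "B", "B", "C"],
    "coverage": [30.0, 34.0, 28.0, None, 41.0],
})

# Data cleanup: drop the rows where coverage is missing.
clean = runs.dropna(subset=["coverage"])

# Split-apply-combine: mean coverage per sample.
mean_coverage = clean.groupby("sample")["coverage"].mean()
```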
Several projects have been built on top of pandas
https://github.com/kjordahl/geopandas
https://github.com/wrobstory/vincent
D3 with Vega
http://statsmodels.sourceforge.net/
Statistics, regression plots
Scikit-learn
Python toolkit for machine learning
DEMO (….)
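A minimal sketch of the usual scikit-learn workflow (split, fit, score), not the demo from the talk. The dataset (iris) and the model (logistic regression) are choices made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data; the fixed seed keeps the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every scikit-learn estimator follows the same fit / predict / score API.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
```

Swapping in another estimator usually means changing only the line that constructs `model`.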
Applied Science with Python!
Hurricane Detector using GFS Data
Colorado State University
Minwoo Lee
http://conference.scipy.org/scipy2011/slides/lee_hurricane_prediction.pdf
Biological data analysis with Python
http://www.astropy.org/
http://scikit-image.org/
NLTK
http://www.nltk.org/
IPython:
Interactive Python
DEMO (..)
DEMO
DEMO WITH WAKARI +
DEMO WITH GALLERY NBVIEWER (….)
Anaconda Distribution packages
Anaconda: pulls it all together
https://store.continuum.io/cshop/anaconda/
$ conda list
$ conda search
$ conda install <package-name>
$ conda create -n numpy16 ipython-notebook numpy=1.6
$ source activate numpy16
$ source deactivate
https://binstar.org/
$ conda install binstar
$ conda build <recipe-dir>
$ conda config --add channels https://conda.binstar.org/username
$ binstar login
How do you show your scientific app to the world?
One alternative: Yhat + Heroku
DEMO
http://blog.yhathq.com/posts/digit-recognition-with-node-and-python.html
Who is using Scientific Python?
Tools for scientific development
Getting Started
http://stackoverflow.com/questions/9555635/open-source-enthought-python-alternative
http://fonnesbeck.github.io/ScipySuperpack/
Recent builds of fundamental Python scientific computing packages for OS X
https://code.google.com/p/pythonxy/
Scientific-oriented Python Distribution based on Qt and Spyder
https://store.continuum.io/cshop/anaconda/
Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and
scientific computing
Scientific Python also has several events!
http://conference.scipy.org/
…and library community discussion groups
http://www.scipy.org/scipylib/mailing-lists.html
https://groups.google.com/forum/#!forum/pydata
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
biopython@biopython.org
http://pyscience-brasil.wikidot.com/
https://groups.google.com/forum/#!forum/pyscience-brasil
http://pycursos.com/biblioteca/computacao-cientifica/?filter=lesson
Scientific Community is f**ing amazing!
https://www.enthought.com/products/pyxll/
PyXLL - Python for Excel Solution
Clinical Sequencing
Flask + Python - Web dashboard
Python + matplotlib + scipy + numpy - Sequencing biological and log databases
biopandas in the works and parallel sequencing workflow approaches
What am I working on?
biopandas
https://github.com/genomika/biopandas
Challenges
Reproducible Research
“A rule of thumb among biotechnology venture capitalists is that half of published research cannot be replicated”
How do we replicate research today?
How do we collaborate on data analysis today?
How do we collaborate today?
??????
Project-based interaction
wakari.io: Browser-based Python & Linux environment
Present in 22 of the 26 states
Python on Science? Yes, we can.