Why Python (for Statisticians)

Python and R are becoming the de facto tools for data scientists. What is Python, who is using it, and why should a statistician care?

Published in: Technology

  1. Why Python? (For Stats People) @__mharrison__ © 2013
  2. About Me ● 12+ years Python ● Worked in Data Analysis, HA, Search, Open Source, BI, and Storage ● Author of multiple Python books
  3. Book
  4. Book: Treading on Python Volume 1, meant to make people proficient in Python quickly
  5. Why Python?
  6. General Purpose Language: “I’d rather do math in a general-purpose language than do general-purpose programming in a math language.” John D. Cook
  7. Who's Using Python? ● Startups (on HN) ● Data Scientists (Strata) ● Big Companies
  8. Who ● Google ● NASA ● ILM ● Red Hat ● Finance ● Instagram ● Pinterest ● YouTube ● ...
  9. Open Source: free in both senses of the word
  10. Batteries Included ● Text ● Network ● JSON ● Command Line ● Files ● XML
  11. Large Community: PyPI (Python Package Index) ● Web ● Database ● GUI ● Scientific ● Network Programming ● Games
  12. Large Community ● User Groups ● PyLadies ● Conferences
  13. Local ● utahpython.org - 2nd Thurs. 7pm ● Utah Open Source Conference
  14. Tooling ● Editors ● Testing ● Profiling ● Debugging ● Documentation
  15. Optimizes for Programmer Time: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” Donald Knuth
  16. Executable Pseudocode
      function quicksort(array)
          if length(array) ≤ 1
              return array  // an array of zero or one elements is already sorted
          select and remove a pivot element pivot from 'array'  // see '#Choice of pivot' below
          create empty lists less and greater
          for each x in array
              if x ≤ pivot then append x to less
              else append x to greater
          return concatenate(quicksort(less), list(pivot), quicksort(greater))  // two recursive calls
      http://en.wikipedia.org/wiki/Quicksort
  17. Executable Pseudocode
      >>> def quicksort(array):
      ...     if len(array) <= 1:
      ...         return array
      ...     # Integer division keeps the index an int on Python 3 as well as Python 2.
      ...     # Note that pop() removes the pivot from the caller's list.
      ...     pivot = array.pop(len(array) // 2)
      ...     lt = []
      ...     gt = []
      ...     for item in array:
      ...         if item < pivot:
      ...             lt.append(item)
      ...         else:
      ...             gt.append(item)
      ...     return quicksort(lt) + [pivot] + quicksort(gt)
  18. But... Python has Timsort: optimized for real-world data (it takes advantage of inherent order) and written in C. (Stolen by Java, Android, and Octave)
  19. Multi-paradigm Language ● Imperative ● Object Oriented ● Functional
  20. Imperative
      >>> def sum(items):
      ...     total = 0
      ...     for item in items:
      ...         total = total + item
      ...     return total
      >>> sum([2, 4, 8])
      14
  21. OO
      >>> class Summer:
      ...     def __init__(self):
      ...         self.items = []
      ...     def add_item(self, item):
      ...         self.items.append(item)
      ...     def sum(self):
      ...         return sum(self.items)
      >>> s = Summer()
      >>> s.add_item(2)
      >>> s.add_item(3)
      >>> s.sum()
      5
  22. Functional
      >>> import operator
      >>> from functools import reduce  # required on Python 3; also available on Python 2.6+
      >>> sum = lambda x: reduce(operator.add, x)
      >>> sum([4, 8, 22])
      34
  23. Why Not Python?
  24. Slow: sometimes you have to optimize. Good C integration.
  25. If it ain't broke, don't fix it: don't replace existing solutions for fun
  26. R has more depth, though Python is catching up in some areas
  27. Going Forward
  28. IPython Notebook ● Notebook w/ integrated graphs
  29. Libraries ● NumPy - matrix math ● SciPy - scientific libraries ● scipy.stats - stats ● statsmodels - modeling ● pandas - dataframe ● matplotlib - graphing ● scikit-learn - ml
  30. That's all. Questions? Tweet me. For beginning Python secrets see Treading on Python Volume 1. @__mharrison__ http://hairysun.com
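
A note on slides 17 and 18: the quicksort on slide 17 removes the pivot from the list it is given, so the caller's data changes as a side effect. The following is a minimal sketch of a variant that works on a copy and is checked against the built-in sorted() (the Timsort implementation slide 18 refers to). The name quicksort_copy and the sample data are hypothetical and not from the talk.

```python
def quicksort_copy(items):
    """Return a new sorted list; the caller's sequence is left untouched."""
    items = list(items)  # work on a copy so pop() below cannot affect the caller
    if len(items) <= 1:
        return items
    pivot = items.pop(len(items) // 2)  # middle element as pivot
    lt = [x for x in items if x < pivot]   # strictly smaller than the pivot
    ge = [x for x in items if x >= pivot]  # equal or larger (keeps duplicates)
    return quicksort_copy(lt) + [pivot] + quicksort_copy(ge)


data = [5, 3, 8, 1, 3, 9]
result = quicksort_copy(data)

# The built-in should agree with the hand-written version.
print(result == sorted(data))
```

In everyday work the built-in sorted() or list.sort() is likely the better choice; the hand-written version is mainly useful for showing how closely Python tracks the pseudocode.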

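
A note on slides 15 and 24: before optimizing, it usually helps to measure. The sketch below uses the standard-library timeit module to time the imperative loop from slide 20 (renamed loop_sum here, a hypothetical name, so it does not shadow the built-in) against the built-in sum(), which is implemented in C in CPython. The list size and repeat count are arbitrary assumptions, and the timings will vary by machine.

```python
import timeit


def loop_sum(items):
    """Imperative sum, as on slide 20, under a name that does not shadow sum()."""
    total = 0
    for item in items:
        total = total + item
    return total


values = list(range(100000))

# Time 20 runs of each approach; the lambdas defer the call until timeit runs it.
loop_seconds = timeit.timeit(lambda: loop_sum(values), number=20)
builtin_seconds = timeit.timeit(lambda: sum(values), number=20)

print("pure-Python loop: %.4f s" % loop_seconds)
print("built-in sum():   %.4f s" % builtin_seconds)
```

Both approaches return the same total; only the time taken differs, which is the kind of evidence worth having before rewriting anything for speed.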