License: CC Attribution License


- 1. The Artful Business of Data Mining: Computational Statistics with Open Source Tools (Wednesday 20 March 13)
- 2. David Coallier, @davidcoallier
- 3. Data Scientist at Engine Yard (.com)
- 4. Find Data
- 5. Clean Data
- 6. Analyse Data?
- 7. Analyse Data
- 8. Question Data
- 9. Report Findings
- 10. Data Scientist
- 11. Data Janitor
- 12. Actual Tasks
- 13. “If your model is elegant, it’s probably wrong”
- 14. “The Times they are a-Changing” (Bob Dylan)
- 15. Python & R
- 16. SciPy: http://www.scipy.org
- 17. scipy.stats
- 18. scipy.stats: Descriptive Statistics
- 19. from scipy.stats import describe
      s = [1, 2, 1, 3, 4, 5]
      print(describe(s))
- 20. scipy.stats: Probability Distributions
- 21. Example: Poisson Distribution
- 22. $f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \ge 0$
- 23. from scipy.stats import poisson
      p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)
- 24. print(p.mean())
      print(p.sum())
      ...
- 25. NumPy: http://www.numpy.org/
- 26. NumPy: Linear Algebra
- 27. $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
- 28. import numpy as np
      x = np.array([[1, 0], [0, 1]])
      val, vec = np.linalg.eig(x)  # eig returns (eigenvalues, eigenvectors)
      np.linalg.eigvals(x)
- 29. >>> np.linalg.eig(x)
      (array([ 1., 1.]), array([[ 1., 0.], [ 0., 1.]]))
- 30. Matplotlib: Python Plotting
- 31. statsmodels: Advanced Statistics Modeling
- 32. NLTK: Natural Language Tool Kit
- 33. scikit-learn: Machine Learning
- 34. from sklearn import tree
      X = [[0, 0], [1, 1]]
      Y = [0, 1]
      clf = tree.DecisionTreeClassifier()
      clf = clf.fit(X, Y)
      >>> clf.predict([[2., 2.]])
      array([1])
- 35. PyBrain: ... Machine Learning
- 36. PyMC: Bayesian Inference
- 37. Pattern: Web Mining for Python
- 38. NetworkX: Study Networks
- 39. MILK: MOAR machine LEARNING!
- 40. Pandas: easy-to-use data structures
- 41. from pandas import DataFrame
      x = DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}])
      print(x[x["age"] > 20].count())
      print(x[x["age"] > 20].mean())
- 42. R
- 43. RStudio: The IDE
- 44. lubridate and zoo: Dealing with Dates...
- 45. yy/mm/dd, mm/dd/yy, YYYY-mm-dd HH:MM:ss TZ, yy-mm-dd, 1363784094.513425, yy/mm, different timezone
- 46. reshape2: Reshape your Data
- 47. ggplot2: Visualise your Data
- 48. RCurl, RJSONIO: Find more Data
- 49. Hmisc: Miscellaneous useful functions
- 50. forecast: Can you guess?
- 51. garch and rugarch
- 52. quantmod: Statistical Financial Trading
- 53. xts: Extensible Time Series
- 54. igraph: Study Networks
- 55. maptools: Read & View Maps
- 56. map("state", region = c(row.names(USArrests)),
          col = cm.colors(16, 1)[floor(USArrests$Rape / max(USArrests$Rape) * 28)],
          fill = T)
- 57. Storage
- 58. Oppose “big” Data
- 59. “Learn how to sample”
- 60. Experiments
- 61. What Do You Want to Answer?
- 62. Understand Your Audience
- 63. Scientific Reporting
- 64. Busy-ness: Time is money
- 65. Public Visualisation
- 66. Best Visualisation, Bad Data
- 67. Best Forecasting models... Bad Visualisation
- 70. Seanchaí
- 72. Feelit
- 76. “Don’t be scared of bar charts.”
- 77. Mathematical Statistics, Engineering, Business, Economics, Curiosity
- 78. davidcoallier.github.com, @davidcoallier on Twitter
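
Slide 22 gives the Poisson probability mass function and slide 23 evaluates it through `scipy.stats`. As a minimal sketch of what that evaluation computes, the formula can be written with the standard library alone. The helper name `poisson_pmf` is made up for this illustration and is not part of SciPy.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Same inputs as slide 23: counts [1, 2, 3, 4, 1, 2, 3] with lambda = 2.
counts = [1, 2, 3, 4, 1, 2, 3]
probs = [poisson_pmf(k, 2) for k in counts]
for k, p in zip(counts, probs):
    print(k, round(p, 4))

# Summing over all k >= 0 should give 1; 50 terms is plenty for lambda = 2.
total = sum(poisson_pmf(k, 2) for k in range(50))
print(total)
```

Note that the values on slide 23 are probabilities of individual counts, so their sum over a handful of counts need not be 1; only the sum over every possible count is.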
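
Slide 45 lists the kinds of date strings that turn up in real data. The talk points at R's lubridate and zoo for this; the following is only a rough Python sketch of the same chore using the standard library. The function name `parse_timestamp` and the layout labels are assumptions made for this example. Layouts such as yy/mm/dd and mm/dd/yy cannot be told apart from the text alone, so the sketch asks the caller to name the layout rather than guessing.

```python
from datetime import datetime, timezone

# strptime patterns for some of the layouts on slide 45.
# The keys are labels invented for this example.
FORMATS = {
    "yy/mm/dd": "%y/%m/%d",
    "mm/dd/yy": "%m/%d/%y",
    "yy-mm-dd": "%y-%m-%d",
    "YYYY-mm-dd HH:MM:ss": "%Y-%m-%d %H:%M:%S",
}

def parse_timestamp(text, layout):
    """Parse text using a named layout, or as Unix epoch seconds."""
    if layout == "epoch":
        # Epoch seconds are counted from 1970-01-01 UTC, so attach UTC explicitly.
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    return datetime.strptime(text, FORMATS[layout])

print(parse_timestamp("13/03/20", "yy/mm/dd"))
print(parse_timestamp("03/20/13", "mm/dd/yy"))
print(parse_timestamp("1363784094.513425", "epoch"))
```

The named layouts return naive datetimes (no timezone attached), which is exactly the "different timezone" trap the slide mentions; a real pipeline would need to record the source timezone as well.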
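
Slides 58 and 59 argue for sampling instead of reaching for "big" data. The slides do not say which sampling method the speaker had in mind; reservoir sampling is one common technique, sketched here under that assumption. It draws a fixed-size uniform sample from a source too large to hold in memory. The function name `reservoir_sample` is hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a kept item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

rng = random.Random(0)   # fixed seed so the run is repeatable
rows = range(1_000_000)  # stand-in for a large data source
subset = reservoir_sample(rows, 1000, rng)
print(len(subset), sum(subset) / len(subset))
```

The sample mean should land near the population mean of about 500,000, which is the point of the slide: a thousand well-chosen rows can answer many questions that a million rows would.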

