Scikit-Learn - Or why I joined an open source software project

  1. Scikit-Learn (or why I joined an open source software project). Gilles Louppe, Dept. of EE & CS, & GIGA-R, Université de Liège, Belgium. October 30, 2013.
  2. Publishing scientific software matters [1]. Software is a central part of modern scientific discovery. Software developed in one field can often be applied to advance a different field if the underlying mathematics is common. The public availability of code is a cornerstone of the scientific method.
     [1] C. Pradal et al., "Publishing scientific software matters", 2013.
  3. "If it's not open and verifiable by others, it's not science, or engineering, or whatever it is you call what we do." [2]
     [2] V. Stodden, "The scientific method in practice".
  4. As a young PhD student full of illusions... I wanted to write useful scientific software, for myself and for others.
  5. Leverage existing software: ...but I didn't want to reinvent the wheel!
  6. ...and then I joined an OSS project: scikit-learn, an open source machine learning library in Python, providing classical and well-established algorithms:
       - Supervised and unsupervised algorithms
       - Model evaluation and selection
       - Data processing and feature engineering
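Slide 6 names model evaluation, model selection and data processing as core features. A minimal sketch of how they fit together, assuming a recent scikit-learn release (the `sklearn.model_selection` module postdates this 2013 talk) and using the bundled iris dataset and an arbitrary grid of `C` values purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Data processing + model chained into a single estimator:
# features are standardized, then passed to a support vector classifier.
pipe = make_pipeline(StandardScaler(), SVC())

# Model evaluation: accuracy on each of 5 cross-validation folds.
scores = cross_val_score(pipe, X, y, cv=5)
print("Mean CV accuracy: %.3f" % scores.mean())

# Model selection: try several values of SVC's C parameter.
# make_pipeline names each step after its lowercased class name, hence "svc__C".
search = GridSearchCV(pipe, param_grid={"svc__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print("Best C:", search.best_params_["svc__C"])
```

Because the pipeline is itself an estimator, the scaler is refit inside each fold, so no information from the validation folds leaks into preprocessing.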
  7. Collaborative development
  8. Software quality matters: peer-reviewed and well-tested code.
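What "well-tested" means in practice is easiest to see in a unit test. The following is a hypothetical pytest-style sketch (the test names and the synthetic data from `make_classification` are illustrative assumptions, not tests from scikit-learn's own suite) checking two properties an estimator should have: fixing `random_state` makes a random forest reproducible, and `predict` returns one known label per sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def test_random_forest_is_reproducible():
    # Same data and same random_state must give identical predictions.
    X, y = make_classification(n_samples=100, n_features=5, random_state=0)
    pred_a = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y).predict(X)
    pred_b = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y).predict(X)
    np.testing.assert_array_equal(pred_a, pred_b)


def test_predict_returns_one_label_per_sample():
    # predict must output exactly one label per input row, drawn from the training labels.
    X, y = make_classification(n_samples=100, n_features=5, random_state=0)
    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
    y_pred = clf.predict(X)
    assert y_pred.shape == (100,)
    assert set(np.unique(y_pred)) <= set(np.unique(y))


if __name__ == "__main__":
    # pytest would discover these automatically; running the file directly also works.
    test_random_forest_is_reproducible()
    test_predict_returns_one_label_per_sample()
    print("all tests passed")
```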
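  9. Simple and consistent API (in each snippet, X_train, y_train and X_test stand for data loaded beforehand):

        from sklearn.ensemble import RandomForestClassifier
        clf = RandomForestClassifier()
        clf.fit(X_train, y_train)      # learn from the training data
        y_pred = clf.predict(X_test)   # predict targets for unseen data

  10. Simple and consistent API: switching to a support vector machine changes only the import and the constructor.

        from sklearn.svm import SVC
        clf = SVC()
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)

  11. Simple and consistent API: the same two calls work for regression (LassoCV picks its regularization strength by cross-validation).

        from sklearn.linear_model import LassoCV
        clf = LassoCV()
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)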
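The three snippets above differ only in the import and the constructor. A minimal runnable sketch of that point, assuming a recent scikit-learn release (`return_X_y=True` and `sklearn.model_selection.train_test_split` postdate this 2013 talk) and using the bundled iris (classification) and diabetes (regression) datasets as stand-ins for the data; `fit_and_predict` is a hypothetical helper written for illustration:

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def fit_and_predict(estimator, X, y):
    """Hypothetical helper: split the data, fit, and predict.

    Works unchanged for any scikit-learn estimator, because they all
    expose the same fit/predict interface.
    """
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    estimator.fit(X_train, y_train)
    return y_test, estimator.predict(X_test)


# Classification data (3 iris species) and regression data (disease progression).
X_iris, y_iris = load_iris(return_X_y=True)
X_diab, y_diab = load_diabetes(return_X_y=True)

results = {}
for name, estimator, X, y in [
    ("RandomForestClassifier", RandomForestClassifier(random_state=0), X_iris, y_iris),
    ("SVC", SVC(), X_iris, y_iris),
    ("LassoCV", LassoCV(), X_diab, y_diab),
]:
    # Only the constructor differs from one iteration to the next.
    results[name] = fit_and_predict(estimator, X, y)
    print(name, "predicted", len(results[name][1]), "test samples")
```

This uniformity is what makes generic tools such as cross-validation and grid search possible: they only need to call `fit` and `predict`.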
  12. Side effect 1: learn and improve your skills
       - Strict programming practices
       - Software management (release cycle, git, etc.)
       - Teamwork
  13. Side effect 2: people might start using your software
       - In research
       - In industry
  14. Side effect 3: you get to meet interesting people (and eat pizzas!)
  15. Start with small contributions...
  16. Publish and share your research code. Join an open source software project.
  17. Questions?
