Fast data mining flow prototyping using IPython Notebook

Big data analysis requires fast prototyping of the data mining process to gain insight into the data. In these slides, the author shows how to use IPython Notebook to sketch code for each data mining stage and make quick observations.

Transcript of "Fast data mining flow prototyping using IPython Notebook"

  1. Fast data mining flow prototyping using IPython Notebook. 2013/01/31. Jimmy Lai, r97922028 [at] ntu.edu.tw
  2. Outline
     1. Workflow for data mining
     2. What IPython Notebook provides
     3. Exemplified by text classification
     4. Demo code and Notebook usage
  3. Workflow for data mining
     • Traditional programming workflow:
       – Edit -> Compile -> Run
     • Data mining workflow:
       – Execute -> Explore
       – Consists of many data processing stages, and we may try different methods in each stage.
       – Stages: data parsing, feature extraction, feature selection, model training, model prediction, post-processing, etc.
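The Execute -> Explore idea on slide 3 can be sketched in plain Python: each stage is a small function, the early stages run once, and a later stage is retried with different settings. This is only an illustrative sketch. The toy data, the function names (`parse`, `extract_features`, `select_features`) and the `min_docs` threshold are assumptions made for this example and do not come from the slides or the demo code.

```python
# Hypothetical toy data in "label: text" form (not from the slides).
RAW = [
    "sports: the team won the match",
    "sports: a great match for the team",
    "tech: the new chip is fast",
    "tech: a fast new laptop chip",
]

def parse(lines):
    """Data parsing: split each 'label: text' line into (label, tokens)."""
    records = []
    for line in lines:
        label, text = line.split(":", 1)
        records.append((label.strip(), text.split()))
    return records

def extract_features(records):
    """Feature extraction: represent each document as its set of tokens."""
    return [(label, set(tokens)) for label, tokens in records]

def select_features(examples, min_docs):
    """Feature selection: keep tokens that occur in at least min_docs documents."""
    doc_counts = {}
    for _, feats in examples:
        for feat in feats:
            doc_counts[feat] = doc_counts.get(feat, 0) + 1
    keep = {feat for feat, count in doc_counts.items() if count >= min_docs}
    return [(label, feats & keep) for label, feats in examples]

# Run the early stages once and keep their results around,
# much like executing the first notebook cells a single time.
records = parse(RAW)
examples = extract_features(records)

# Then re-run only the later stage with different settings.
vocab_sizes = {}
for min_docs in (1, 2, 3):
    selected = select_features(examples, min_docs)
    vocab = set().union(*(feats for _, feats in selected))
    vocab_sizes[min_docs] = len(vocab)
    print("min_docs =", min_docs, "vocabulary size =", len(vocab))
```

In a notebook, each function and the final loop would typically live in separate cells, so changing the selection threshold only requires re-executing the last cell.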
  4. What IPython Notebook provides
     • Interactive web IDE
       – Displays rich data such as matplotlib plots and LaTeX math symbols
       – Code cells for sketching
       – Execute pieces of code in arbitrary order
       – Browser interface for programming remotely
       – Easy to present code and execution results as HTML or PDF
     • IPython Notebook makes sketching data analysis easy.
  5. Demo code and Notebook usage
     • Demo code: ipython_demo directory in https://bitbucket.org/noahsark/slideshare
     • IPython Notebook:
       – Install: $ pip install ipython
       – Execution (under the ipython_demo directory): $ ipython notebook --pylab=inline
       – Open the notebook in a browser, e.g. http://127.0.0.1:8888
  6. IPython Notebook interface
  7. Exemplified by text classification
     • Text classification on the newsgroup dataset.
     • Dataset:
       – Built into sklearn.datasets
       – Each article belongs to one of the 20 groups
     • Goal: classify each article into one of the newsgroup names.
     • Experiment: feature generation using different n-gram parameters.
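To make the n-gram experiment on slide 7 concrete, here is a minimal standard-library sketch of how the feature set changes with the n-gram range. The transcript does not show the demo's actual feature extraction code, so this is a stand-in rather than the author's method: the helper `word_ngrams`, its `ngram_range` parameter and the sample sentence are all hypothetical.

```python
from collections import Counter

def word_ngrams(text, ngram_range):
    """Count word n-grams for every n in the inclusive range (min_n, max_n)."""
    tokens = text.lower().split()
    min_n, max_n = ngram_range
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical sentence standing in for a newsgroup article.
article = "the peace talks resumed after the peace talks stalled"

# Compare how many distinct features each n-gram setting produces.
feature_counts = {}
for ngram_range in [(1, 1), (1, 2), (2, 2)]:
    features = word_ngrams(article, ngram_range)
    feature_counts[ngram_range] = len(features)
    print(ngram_range, "->", len(features), "distinct features")
```

Wider n-gram ranges capture more word-order context but enlarge the feature space, which is the kind of trade-off such an experiment would explore.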
  8. talk.politics.mideast: example article
  10. Sample result of feature extraction
  11. Table of experiment setups
  13. Experiment result
  15. Observation from plots