4. WHAT'S DATA MINING
• Science: probability, statistics, graph theory etc.
• Techniques: clustering, classification, regression, prediction etc.
• A way to think about this world.
On textbooks
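To make "clustering" from the list above concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional numbers, using only the Python standard library. The function name `kmeans_1d`, its parameters, and the sample numbers are all made up for illustration; they are not from the slides, and a real analysis would more likely use a library such as scikit-learn.

```python
import random


def kmeans_1d(values, k, iterations=20, seed=0):
    """Toy k-means on a list of numbers; returns the k centers, sorted."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            sum(g) / len(g) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sorted(centers)


# Hypothetical data: two obvious groups, one near 1 and one near 9.5.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers = kmeans_1d(data, k=2)
print(centers)
```

The two returned centers should sit close to the means of the two groups. Classification and regression differ in that they learn from labeled examples, whereas clustering, as here, gets no labels at all.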
10. HANDS-ON PRACTICE
• Tools to Facilitate Your Data Analysis
• Commercial
• SAS
• IBM SPSS
• Matlab etc.
• Free/Open Source
• RapidMiner + Weka
• R (my favorite)
• Python + SciPy + scikit-learn
• Hadoop/Spark etc.
12. HANDS-ON PRACTICE
• RapidMiner (ad-free)
• A Java-based IDE for ML, data mining, text mining etc.
• Modular design, graphical interface, no coding required
• Complete process logic: data ETL, visualization, modeling, prediction, reports etc.
• Growing extension market
• CLI and API for other programs
• Call functions of Weka and R
Download: http://www.rapidminer.com/
13. HANDS-ON PRACTICE
• StoneFlakes
• StoneFlakes.csv: flake attribute information
• annotation.csv: inventory properties
Formatted: http://io.hsiamin.com/data/StoneFlakes.tar.gz
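For those who prefer a script to a GUI, a first look at a file like StoneFlakes.csv might go roughly as sketched below, using only Python's standard `csv` module. The helper names `parse_cell` and `load_rows`, the `?` missing-value marker, and the in-memory sample with its column names are assumptions for illustration; the actual layout of the files in the archive may differ.

```python
import csv
import io


def parse_cell(cell, missing="?"):
    """Turn one CSV cell into a float, None (missing), or the original text."""
    cell = cell.strip()
    if cell == "" or cell == missing:
        return None
    try:
        return float(cell)
    except ValueError:
        return cell  # keep non-numeric text as-is


def load_rows(handle, missing="?"):
    """Read a CSV with a header line into a list of dicts, one per row."""
    reader = csv.reader(handle)
    header = next(reader)
    return [
        {name: parse_cell(cell, missing) for name, cell in zip(header, fields)}
        for fields in reader
    ]


# Made-up stand-in for the real file; columns and values are hypothetical.
sample = io.StringIO("id,length,width\na,1.5,2.0\nb,?,3.0\n")
rows = load_rows(sample)
print(rows)
```

With the archive unpacked, the same function could be pointed at the real file via `open("StoneFlakes.csv", newline="")`, after checking how that file actually marks missing values.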