The "Introduction to Machine Learning" material covers the fundamentals of machine learning and data science through a series of key concepts. Starting with data and information, it describes the various types of data and the distinctions between large data, high-dimensional data, and big data. It covers the stages of machine learning, such as data preparation, training, and testing, with practical application examples. Terminology such as samples and features is introduced, and metrics such as the confusion matrix are used to evaluate model performance. Learning curves, supervised learning, and the differences between classification and regression are also briefly explained. The material also includes basic techniques such as data cleaning (handling missing values and duplicates) and data transformation (changing data types). Finally, data visualization is introduced through examples involving scatter plots, histograms, and other plots that aid the understanding and interpretation of data. Overall, this material establishes a foundation for understanding machine learning concepts and acquiring practical skills in manipulating and analyzing data.
2. Main Topics
• Data and Information
• Types of Data
• Definition of Machine Learning
• Large Data, High-Dimensional Data, and Big Data
• Learning Curves
• Supervised & Unsupervised Learning
• Confusion Matrix
• Data Preparation
4. Data and Information
• Data: a collection of facts; not organized; without context.
• Information: data that has been processed; organized; helps support decision-making.
5. Types of Data
1. Quantitative Data
• Continuous
Continuous data can take any value within a range. Its values are measured rather than counted.
Examples: height, weight, time, distance, etc.
• Discrete
Discrete data takes distinct, countable values.
Examples: the population of a village, the number of students in a class, student exam scores, etc.
6. Types of Data (cont.)
2. Qualitative Data
• Structured
Structured data is data that can be represented in a tabular format.
Example:
• Unstructured
Data with no proper format.
Examples: emails, server logs, etc.
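The difference between structured and unstructured data can be illustrated with a minimal Python sketch. The records and the log line below are hypothetical and were made up for this example. Only the standard library is used.

```python
import csv
import io

# Structured: every record has the same named columns, so it fits a table.
table_text = "name,age\nAni,21\nBudi,23\n"  # hypothetical records
rows = list(csv.DictReader(io.StringIO(table_text)))
for row in rows:
    print(row["name"], row["age"])

# Unstructured: free text such as a server log line has no fixed columns.
log_line = "2023-01-01 12:00:00 server restarted after timeout"  # hypothetical log entry
tokens = log_line.split()
print(tokens)
```

The structured rows can be addressed by column name, while the log line first has to be parsed before any field can be extracted from it.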
7. Definition of Machine Learning
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without being explicitly programmed (Betta Mahesh, 2018).
Machine learning is a subset, or branch, of AI that creates mathematical models from historical data. Machine learning algorithms are centered around probability, linear algebra, optimization techniques, and statistical theories. They are widely used in industry for prediction and automation (codecrucks.com, 2023).
8. Definition of Machine Learning (cont.)
(Jake VanderPlas, 2017)
Features --> Traits, distinctive attributes, individual measurable properties, and characteristics
Target --> Class or outcome
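As a rough illustration of features and target, the sketch below stores each sample as one row of feature values and keeps the target in a separate list. The feature names, the measurements, and the class labels are all hypothetical and do not come from the slides.

```python
# Each row of X is one sample; each column is one feature.
feature_names = ["height_cm", "weight_kg"]  # hypothetical features
X = [
    [30.0, 4.0],
    [60.0, 25.0],
    [25.0, 3.5],
]
# y holds the target (class) for the sample in the same position of X.
y = ["cat", "dog", "cat"]  # hypothetical classes

for sample, label in zip(X, y):
    described = dict(zip(feature_names, sample))
    print(described, "->", label)
```

Keeping features and target in separate, index-aligned containers mirrors the common convention of a feature matrix plus a target vector.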
9. Large Data, High-Dimensional Data, and Big Data
Large Data
Data that is large in terms of its number of samples (its quantity).
High-Dimensional Data
A dataset whose features can number in the thousands.
Big Data
Data with many features and many samples that meets the 3V criteria (Volume, Velocity, and Variety).
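The distinction between many samples and many features can be sketched by looking at the shape of a dataset. The helper `dataset_shape` and the two toy datasets are hypothetical, and their sizes were chosen only for illustration. The 3V criteria for Big Data cannot be read off the shape alone, so they are not modeled here.

```python
def dataset_shape(rows):
    """Return (n_samples, n_features) for a dataset stored as a list of rows."""
    n_samples = len(rows)
    n_features = len(rows[0]) if rows else 0
    return n_samples, n_features

# Large data: many samples, few features.
large = [[i, i * 2] for i in range(100_000)]
# High-dimensional data: few samples, thousands of features.
high_dim = [[0.0] * 5_000 for _ in range(10)]

print(dataset_shape(large))     # (100000, 2)
print(dataset_shape(high_dim))  # (10, 5000)
```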
10. Training and Testing
Why split the dataset?
• Good amount of accuracy
• Robust ML model
• Real-world problems
Training
The model learns patterns, relationships, etc. from the features in the dataset.
Testing
Predicts the class or outcome of unseen data.
Which is better?
• 80% training data and 20% test data
• 90% training data and 10% test data
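The two split ratios above can be sketched with a small helper. This is a minimal illustration using only the standard library. The function name `split_dataset`, its parameters, and the toy data are assumptions made for this example, not part of the original material.

```python
import random

def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a test_ratio share as test data."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    split_at = len(shuffled) - n_test
    return shuffled[:split_at], shuffled[split_at:]

data = list(range(100))  # hypothetical dataset of 100 samples
train_80, test_20 = split_dataset(data, test_ratio=0.2)
train_90, test_10 = split_dataset(data, test_ratio=0.1)
print(len(train_80), len(test_20))  # 80 20
print(len(train_90), len(test_10))  # 90 10
```

Shuffling before splitting avoids a test set made up only of the last rows. A larger training share gives the model more data to learn from, while a larger test share gives a steadier estimate of performance on unseen data.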
11. Learning Curves
A learning curve is a plot of model learning
performance over experience or time (Jason
Brownlee, 2016).
1. Overfitting
(Train > Test)
12. Learning Curves (cont.)
2. Underfitting
(Train and Test both low)
13. Learning Curves (cont.)
3. Good fit
(Train = Test)
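The three cases can be sketched as a simple rule on the final training and test scores. The sketch assumes that a higher score is better and uses the usual reading in which underfitting means both scores are low. The function `diagnose_fit` and its thresholds `low` and `gap` are hypothetical values chosen for illustration, not values taken from the slides.

```python
def diagnose_fit(train_score, test_score, low=0.7, gap=0.05):
    """Label a model from its final training and test scores (higher is better).

    `low` and `gap` are illustrative thresholds, not values from the slides.
    """
    if train_score < low and test_score < low:
        return "underfitting"  # the model performs poorly on both sets
    if train_score - test_score > gap:
        return "overfitting"   # training score clearly exceeds test score
    return "good fit"          # training and test scores are close

print(diagnose_fit(0.99, 0.75))  # overfitting
print(diagnose_fit(0.55, 0.53))  # underfitting
print(diagnose_fit(0.91, 0.90))  # good fit
```

In practice the whole curve is inspected rather than only the final point, so a rule like this is a rough first check at best.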