The document discusses data mining primitives, languages, and system architectures. It describes the key components of a typical data mining system including the data mining engine, database/data warehouse, knowledge base, and graphical user interface. It also covers data mining primitives such as task-relevant data specification, background knowledge, and interestingness measures. Data mining query languages and different system architectures are also summarized.
The document provides an overview of databases and data modeling. It introduces key concepts such as database management systems (DBMS), database design phases, database actors and workers, and advantages of the DBMS approach over traditional file processing. Some key points covered include the self-describing nature of databases, program-data independence, multiple views of data, transaction processing, roles of database administrators and designers, and how DBMSs provide controlled data redundancy and security. An example university database structure is also presented.
This document discusses various types of computer storage media including primary, secondary, and tertiary storage. It focuses on disk storage devices, describing how data is organized on disks using tracks, sectors, cylinders, and blocks. Disk addressing and the hardware mechanisms involved in reading and writing data to disks are explained. Factors that influence disk access time like seek time and rotational latency are covered. The document also briefly discusses tape storage devices and their sequential access nature.
Part1 of SQL Tuning Workshop - Understanding the OptimizerMaria Colgan
Part 1 of a 5 part SQL Tuning workshop, This presentation covers the history of the Oracle Optimizer and explains the first thing the Optimizer does when it receives a SQL statements, which is to transform the SQL statement in order to open up additional access paths.
Dokumen tersebut membahas tentang cache memory, yang merupakan memori sementara untuk mempercepat akses data oleh processor. Ia menjelaskan prinsip kerja, karakteristik, tujuan, ukuran, dan jenis pemetaan cache memory yaitu pemetaan langsung, asosiatif, dan asosiatif set.
The document discusses data mining primitives, languages, and system architectures. It describes the key components of a typical data mining system including the data mining engine, database/data warehouse, knowledge base, and graphical user interface. It also covers data mining primitives such as task-relevant data specification, background knowledge, and interestingness measures. Data mining query languages and different system architectures are also summarized.
The document provides an overview of databases and data modeling. It introduces key concepts such as database management systems (DBMS), database design phases, database actors and workers, and advantages of the DBMS approach over traditional file processing. Some key points covered include the self-describing nature of databases, program-data independence, multiple views of data, transaction processing, roles of database administrators and designers, and how DBMSs provide controlled data redundancy and security. An example university database structure is also presented.
This document discusses various types of computer storage media including primary, secondary, and tertiary storage. It focuses on disk storage devices, describing how data is organized on disks using tracks, sectors, cylinders, and blocks. Disk addressing and the hardware mechanisms involved in reading and writing data to disks are explained. Factors that influence disk access time like seek time and rotational latency are covered. The document also briefly discusses tape storage devices and their sequential access nature.
Part1 of SQL Tuning Workshop - Understanding the OptimizerMaria Colgan
Part 1 of a 5 part SQL Tuning workshop, This presentation covers the history of the Oracle Optimizer and explains the first thing the Optimizer does when it receives a SQL statements, which is to transform the SQL statement in order to open up additional access paths.
Dokumen tersebut membahas tentang cache memory, yang merupakan memori sementara untuk mempercepat akses data oleh processor. Ia menjelaskan prinsip kerja, karakteristik, tujuan, ukuran, dan jenis pemetaan cache memory yaitu pemetaan langsung, asosiatif, dan asosiatif set.
De l'upsert sur des fichiers Parquet ? Retrouver l'état de mes données de mercredi dernier ? Des transactions ACID sur mon datalake ? C'est désormais possible avec DeltaLake, la nouvelle librairie de Databricks.
This document discusses distributed databases and client-server architectures. It begins by outlining distributed database concepts like fragmentation, replication and allocation of data across multiple sites. It then describes different types of distributed database systems including homogeneous, heterogeneous, federated and multidatabase systems. Query processing techniques like query decomposition and optimization strategies for distributed queries are also covered. Finally, the document discusses client-server architecture and its various components for managing distributed databases.
We have described the normalization (first normal form, second normal form ..... upto fifth normal form) in simple and easy to understand language.
We at BIWHIZ are committed to equip you with the hottest skills of BI, Analytics, Big Data, Database and Data Science.
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]Usman Tariq
In this PPT, you will learn:
• About data modeling and why data models are important
• About the basic data-modeling building blocks
• What business rules are and how they influence database design
• How the major data models evolved
• About emerging alternative data models and the needs they fulfill
• How data models can be classified by their level of abstraction
Author: Carlos Coronel | Steven Morris
Dokumen tersebut membahas tentang algoritma k-means clustering. K-means clustering adalah salah satu metode clustering non-hirarki yang mengelompokkan data menjadi satu atau lebih cluster dengan menentukan nilai centroid awal secara acak lalu menghitung jarak antara data dan centroid untuk mengelompokkannya ke cluster mana. Algoritma k-means melakukan iterasi dengan menghitung centroid baru sampai posisi data tidak berubah lagi.
Exploring Oracle Multitenant in Oracle Database 12cZohar Elkayam
This document discusses Oracle Multitenant in Oracle Database 12c. It provides an overview of the multitenant architecture including containers, benefits such as lower costs and easier provisioning, and impacts such as shared redo logs and one character set across PDBs. It also covers deployment including creating a CDB, provisioning new PDBs from the seed database, plugging in non-CDBs, and cloning PDBs.
DBMSArchitecture_QueryProcessingandOptimization.pdfChristalin Nelson
1. Scan the table to filter rows based on the predicate(s) in the WHERE clause.
2. For each qualifying row, add it to the result set.
3. Once the entire table has been scanned, the result set contains only the selected rows matching the predicate(s).
The document discusses database management systems (DBMS). It covers:
1. The evolution of DBMS from monolithic to client/server architectures.
2. Data models including conceptual, logical, and physical models. Conceptual models are independent of implementation while physical models describe how data is stored.
3. Database schemas define the structure, while instances represent the current data state. Schemas can evolve independently of instances.
Manajemen memori dalam linux terbagi menjadi memori fisik dan virtual, memori fisik dibagi menjadi 3 zona dan menggunakan 2 teknik alokasi, sedangkan memori virtual berfungsi untuk meningkatkan efisiensi sistem dengan mengatur ruang alamat dan membentuk halaman yang dibutuhkan proses.
- Delta Lake is an open source project that provides ACID transactions, schema enforcement, and time travel capabilities to data stored in data lakes such as S3 and ADLS.
- It allows building a "Lakehouse" architecture where the same data can be used for both batch and streaming analytics.
- Key features include ACID transactions, scalable metadata handling, time travel to view past data states, schema enforcement, schema evolution, and change data capture for streaming inserts, updates and deletes.
Transaction Management in Database Management SystemChristalin Nelson
The document discusses transaction processing and transaction management. It introduces concepts like transactions, concurrency control, recovery, and system logs. Transactions are logical units of work that must follow the ACID properties - Atomicity, Consistency, Isolation, and Durability. Concurrency control is needed to prevent issues like lost updates, dirty reads, and other problems from concurrent transaction execution. Recovery techniques use undo and redo operations logged in a system log to recover from failures.
The document provides an overview of data modeling and conceptual database design using the Entity-Relationship model. It discusses key concepts like entity types, attributes, relationships, and constraints. It also presents a sample COMPANY database application and initial ER design with entities for employees, departments, projects, and dependents along with attributes and relationships.
Backup and Disaster Recovery (BDR) combines data backup and disaster recovery solutions that work cohesively to ensure uptime and maximize productivity. Backup and Disaster recovery solutions are made to be simple and effective. Not only can a BDR plan save you money, it will ensure that you will survive if or when disaster strikes.
Servicing Dallas/Fort Worth, Abilene, Lubbock or Midland/Odessa and surrounding areas.
The Most Trusted In-Memory database in the world- AltibaseAltibase
This document provides an overview of an in-memory database company and its product capabilities. It discusses the company's history and growth, the changing data landscape driving demand for real-time analytics, and how the company's in-memory and hybrid database technologies provide extremely fast transaction processing, high availability, scalability, and flexibility for deploying on-premise or in the cloud. Example customer use cases and implementations are described to demonstrate how the database has helped organizations tackle challenges of high volume data processing and analytics.
Dokumen tersebut membahas beberapa metode untuk mengukur reliabilitas tes, yaitu metode bentuk paralel, tes ulang, belah dua, dan rumus-rumus untuk menghitung koefisien reliabilitasnya seperti Spearman-Brown, Flanagan, Rulon, KR-20, KR-21, dan Hoyt. Metode belah dua melibatkan pembagian butir soal menjadi dua bagian untuk menghitung reliabilitas masing-masing bagian dan keseluru
Dokumen tersebut membahas tentang mesin pembelajaran dan algoritmanya. Ia menjelaskan konsep dasar mesin pembelajaran yaitu mempelajari pola dari data latihan untuk membentuk hipotesis. Algoritma Find-S dijelaskan sebagai metode dasar untuk memperoleh hipotesis berdasarkan kesamaan nilai atribut pada data latihan. Keterbatasan Find-S juga disebutkan apabila data latihan tidak konsisten atau bias.
De l'upsert sur des fichiers Parquet ? Retrouver l'état de mes données de mercredi dernier ? Des transactions ACID sur mon datalake ? C'est désormais possible avec DeltaLake, la nouvelle librairie de Databricks.
This document discusses distributed databases and client-server architectures. It begins by outlining distributed database concepts like fragmentation, replication and allocation of data across multiple sites. It then describes different types of distributed database systems including homogeneous, heterogeneous, federated and multidatabase systems. Query processing techniques like query decomposition and optimization strategies for distributed queries are also covered. Finally, the document discusses client-server architecture and its various components for managing distributed databases.
We have described the normalization (first normal form, second normal form ..... upto fifth normal form) in simple and easy to understand language.
We at BIWHIZ are committed to equip you with the hottest skills of BI, Analytics, Big Data, Database and Data Science.
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]Usman Tariq
In this PPT, you will learn:
• About data modeling and why data models are important
• About the basic data-modeling building blocks
• What business rules are and how they influence database design
• How the major data models evolved
• About emerging alternative data models and the needs they fulfill
• How data models can be classified by their level of abstraction
Author: Carlos Coronel | Steven Morris
Dokumen tersebut membahas tentang algoritma k-means clustering. K-means clustering adalah salah satu metode clustering non-hirarki yang mengelompokkan data menjadi satu atau lebih cluster dengan menentukan nilai centroid awal secara acak lalu menghitung jarak antara data dan centroid untuk mengelompokkannya ke cluster mana. Algoritma k-means melakukan iterasi dengan menghitung centroid baru sampai posisi data tidak berubah lagi.
Exploring Oracle Multitenant in Oracle Database 12cZohar Elkayam
This document discusses Oracle Multitenant in Oracle Database 12c. It provides an overview of the multitenant architecture including containers, benefits such as lower costs and easier provisioning, and impacts such as shared redo logs and one character set across PDBs. It also covers deployment including creating a CDB, provisioning new PDBs from the seed database, plugging in non-CDBs, and cloning PDBs.
DBMSArchitecture_QueryProcessingandOptimization.pdfChristalin Nelson
1. Scan the table to filter rows based on the predicate(s) in the WHERE clause.
2. For each qualifying row, add it to the result set.
3. Once the entire table has been scanned, the result set contains only the selected rows matching the predicate(s).
The document discusses database management systems (DBMS). It covers:
1. The evolution of DBMS from monolithic to client/server architectures.
2. Data models including conceptual, logical, and physical models. Conceptual models are independent of implementation while physical models describe how data is stored.
3. Database schemas define the structure, while instances represent the current data state. Schemas can evolve independently of instances.
Manajemen memori dalam linux terbagi menjadi memori fisik dan virtual, memori fisik dibagi menjadi 3 zona dan menggunakan 2 teknik alokasi, sedangkan memori virtual berfungsi untuk meningkatkan efisiensi sistem dengan mengatur ruang alamat dan membentuk halaman yang dibutuhkan proses.
- Delta Lake is an open source project that provides ACID transactions, schema enforcement, and time travel capabilities to data stored in data lakes such as S3 and ADLS.
- It allows building a "Lakehouse" architecture where the same data can be used for both batch and streaming analytics.
- Key features include ACID transactions, scalable metadata handling, time travel to view past data states, schema enforcement, schema evolution, and change data capture for streaming inserts, updates and deletes.
Transaction Management in Database Management SystemChristalin Nelson
The document discusses transaction processing and transaction management. It introduces concepts like transactions, concurrency control, recovery, and system logs. Transactions are logical units of work that must follow the ACID properties - Atomicity, Consistency, Isolation, and Durability. Concurrency control is needed to prevent issues like lost updates, dirty reads, and other problems from concurrent transaction execution. Recovery techniques use undo and redo operations logged in a system log to recover from failures.
The document provides an overview of data modeling and conceptual database design using the Entity-Relationship model. It discusses key concepts like entity types, attributes, relationships, and constraints. It also presents a sample COMPANY database application and initial ER design with entities for employees, departments, projects, and dependents along with attributes and relationships.
Backup and Disaster Recovery (BDR) combines data backup and disaster recovery solutions that work cohesively to ensure uptime and maximize productivity. Backup and Disaster recovery solutions are made to be simple and effective. Not only can a BDR plan save you money, it will ensure that you will survive if or when disaster strikes.
Servicing Dallas/Fort Worth, Abilene, Lubbock or Midland/Odessa and surrounding areas.
The Most Trusted In-Memory database in the world- AltibaseAltibase
This document provides an overview of an in-memory database company and its product capabilities. It discusses the company's history and growth, the changing data landscape driving demand for real-time analytics, and how the company's in-memory and hybrid database technologies provide extremely fast transaction processing, high availability, scalability, and flexibility for deploying on-premise or in the cloud. Example customer use cases and implementations are described to demonstrate how the database has helped organizations tackle challenges of high volume data processing and analytics.
Dokumen tersebut membahas beberapa metode untuk mengukur reliabilitas tes, yaitu metode bentuk paralel, tes ulang, belah dua, dan rumus-rumus untuk menghitung koefisien reliabilitasnya seperti Spearman-Brown, Flanagan, Rulon, KR-20, KR-21, dan Hoyt. Metode belah dua melibatkan pembagian butir soal menjadi dua bagian untuk menghitung reliabilitas masing-masing bagian dan keseluru
Dokumen tersebut membahas tentang mesin pembelajaran dan algoritmanya. Ia menjelaskan konsep dasar mesin pembelajaran yaitu mempelajari pola dari data latihan untuk membentuk hipotesis. Algoritma Find-S dijelaskan sebagai metode dasar untuk memperoleh hipotesis berdasarkan kesamaan nilai atribut pada data latihan. Keterbatasan Find-S juga disebutkan apabila data latihan tidak konsisten atau bias.
Teks tersebut menjelaskan berbagai konsep statistika dasar seperti rata-rata (mean), modus, median, dan kuartil untuk data tunggal maupun berkelompok beserta contoh perhitungannya.
Secara umum logika fuzzy sugeno adalah suatu logika yang digunakan untuk menghasilkan keputusan tunggal/crisp saat defuzzyfikasi, penggunaannya tergantung dari domain masalah yang terjadi
Estetika Humanisme Diskusi Modul Part Ke-7.pdfHendroGunawan8
Anger management adalah belajar mengenali tanda-tanda pada diri saat marah dan mengambil tindakan yang “sehat” dalam meluapkan kemarahan.
Secara sederhana, dapat diartikan bahwa anger management adalah mengendalikan rasa marah, bukan mencegah atau menahan rasa marah.
Estetika Humanisme Diskusi Video Sesi Ke-7.pdfHendroGunawan8
Anger Management adalah suatu kemampuan atau teknik untuk melakukan tindakan mengatur pikiran, perasaan, nafsu amarah dengan cara yang tepat dan posistif serta dapat diterima di lingkungan, sehingga dapat mencegah sesuatu yang buruk atau merugikan diri sendiri dan orang lain.
Jaringan VOIP Ringkasan Modul Pertemuan Ke-6.pdfHendroGunawan8
Cisco Unified Communications (UC) adalah sistem komunikasi berbasis IP yang mengintegrasikan produk dan aplikasi suara, video, data, dan mobilitas. Ini memungkinkan komunikasi yang lebih efektif dan aman dan dapat mengubah cara kita berkomunikasi
Di dalam pengolahan citra, sebuah citra sering dilakukan proses penapisan (image filtering) untuk memperoleh citra sesuai dengan tujuan yang diinginkan.
Diskusi Modul Sistem Pakar Sesi Ke-6 - Salin.pdfHendroGunawan8
Metode Fuzzy Mamdani merupakan salah satu bagian dari Fuzzy Inference System yang berguna untuk penarikan kesimpulan atau suatu keputusan terbaik dalam permasalahan yang tidak pasti
Mindfulness adalah sikap berkesadaran penuh akan peristiwa yang sedang dijalani saat ini, dengan penuh perhatian, memiliki tujuan yang jelas, dan tanpa menghakimi.
Logika Fuzzy pertama kali dikembangkan oleh Lotfi A. Zadeh pada tahun 1965. Teori ini banyak diterapkan di berbagai bidang, antara lain representasi pikiran manusia ke dalam suatu sistem. Banyak alasan mengapa penggunaan logika Fuzzy ini sering dipergunakan antara lain, konsep logika Fuzzy yang mirip dengan konsep berpikir manusia. Sistem Fuzzy dapat merepresentasikan pengetahuan manusia ke dalam bentuk matematis dengan lebih menyerupai cara berpikir manusia ke dalam bentuk matematis. Selain itu, informasi berupa pengetahuan dan pengalaman mempunyai peranan penting dalam mengenali perilaku sistem di dunia nyata.
Kecerdasan emosional (Emotional Intelligence ) merupakan konsep baru yang dikembangkan oleh Daniel Goleman dalam karyanya pada tahun 1995 berjudul “Emotional Intelligence”.
Modul Ajar Matematika Kelas 11 Fase F Kurikulum MerdekaFathan Emran
Modul Ajar Matematika Kelas 11 SMA/MA Fase F Kurikulum Merdeka - abdiera.com. Modul Ajar Matematika Kelas 11 SMA/MA Fase F Kurikulum Merdeka. Modul Ajar Matematika Kelas 11 SMA/MA Fase F Kurikulum Merdeka. Modul Ajar Matematika Kelas 11 SMA/MA Fase F Kurikulum Merdeka. Modul Ajar Matematika Kelas 11 SMA/MA Fase F Kurikulum Merdeka.
Workshop "CSR & Community Development (ISO 26000)"_di BALI, 26-28 Juni 2024Kanaidi ken
Dlm wktu dekat, Pelatihan/WORKSHOP ”CSR/TJSL & Community Development (ISO 26000)” akn diselenggarakan di Swiss-BelHotel – BALI (26-28 Juni 2024)...
Dgn materi yg mupuni & Narasumber yg kompeten...akn banyak manfaat dan keuntungan yg didpt mengikuti Pelatihan menarik ini.
Boleh jga info ini👆 utk dishare_kan lgi kpda tmn2 lain/sanak keluarga yg sekiranya membutuhkan training tsb.
Smga Bermanfaat
Thanks Ken Kanaidi
1. 1
Machine Learning
Latihan Pertemuan 5
Supervised Learning: Naїve Bayes
Klasifikasi
Merupakan proses pembelajaran suatu fungsi tujuan yang memetakkan tiap himpunan
atribut X ke satu dari label kelas Y yang didefinisikan sebelumnya.
Model Klasifikasi
• Pemodelan deskriptif:
Model klasifikasi yang dapat berfungsi sebagai alat penjelasan untuk membedakan
objek-objek dalam kelas-kelas yang berbeda.
• Pemodelan Preditif
Model klasifikasi yang dapat digunakan untuk memprediksi label kelas yang tidak
diketahui pada suatu objek/record.
Klasifikasi Memerlukan Training Set
• Klasifikasi adalah proses pembelajaran secara terbimbing (supervised learning)
• Untuk melakukan klasifikasi, dibutuhkan training set sebagai data pembelajaran
• Setiap sampel dari training set memiliki atribut dan class label
Dua Tahapan Klasifikasi
1. Learning (training):
Pembelajaran menggunakan data training(unruk Naїve Bayessian Classifier, nilai
probabilitas dihitung dalam proses pembelajaran)
2. Testing:
Menguji model menggunakan data testing
Akurasi
• Akurasi =
Jumlah Prediksi Yang Benar
Jumlah Prediksi Keseluruhan
• Error =
Jumlah Prediksi Yang Salah
Jumlah Prediksi Keseluruhan
Teori Bayesian
P(H/X) =
𝑷(𝑿|𝑯) 𝑷(𝑯)
𝑷(𝑿)
Input
X
(Atribute Set)
Output
Y
(Class Label)
Classification
Model
(Fungsi Tujuan)
(1)
2. 2
➢ P(H/X): yaitu peluang hipotesa H berdasarkan kondisi X.
➢ X: data sampel dengan klas (label) yang tidak diketahui.
➢ H: merupakan hipotesa bahwa X adalah data dengan klas (label) C.
➢ P(H): peluang dari hipotesa H.
➢ P(X): adalah peluang dari X yang diamati.
➢ P(X|H): peluang X, berdasarkan kondisi pada hipotesa H
Naїve Bayessian Classifier
▪ Adalah metode classifier yang berdasarkan probabilitas dan Teorema Bayesian dengan
asumsi bahwa setiap variabel X bersifat bebas (independence).
▪ Dengan kata lain, Naїve Bayesian Classifier mengasumsikan bahwa keberadaan
sebuah atribut (variabel) tidak ada kaitannya dengan keberadaan atribut (variabel)
yang lain.
Naїve Bayesian Classifier
▪ Karena asumsi atribut tidak saling terkait (conditionally independent), maka:
P(X|𝑪𝒊) = ∏ 𝑷(𝒙𝒌|𝑪𝒊
𝒏
𝒌=𝟏 )
▪ Bila P(X|𝐶𝑖) dapat diketahui melalui perhitungan di atas, maka klas (label) dari data
sampel X adalah klas (label) yang memiliki P(X|𝐶𝑖) * P(𝐶𝑖) maksimum.
Latihan
Naї𝒗𝒆 𝑩𝒂𝒚𝒆𝒔
• Tentukan class label dari X
• X = (Outlook = Rainy, Temperature = Cool, Humidity = High, Windy = False)
Tabel 1. Dataset
ID OUTLOOK TEMPERATUR HUMIDITY WINDY PLAY
1 Sunny Hot High FALSE NO
2 Sunny Hot High TRUE NO
3 Cloudy Hot High FALSE YES
4 Rainy Mild High FALSE YES
5 Rainy Cool Normal FALSE YES
6 Rainy Cool Normal TRUE YES
7 Cloudy Cool Normal TRUE YES
8 Sunny Mild High FALSE NO
9 Sunny Cool Normal FALSE YES
10 Rainy Mild Normal FALSE YES
11 Sunny Mild Normal TRUE YES
12 Cloudy Mild High TRUE YES
13 Cloudy Hot Normal FALSE YES
14 Rainy Mild High TRUE NO
Clas:
C1: Play: YES
C2: Play: NO
(2)
3. 3
Bila data baru yang belum memiliki class sebagai berikut:
X = (Outlook = Rainy, Temperature = Cool, Humidity = High, Windy = False)
Hitung P(Xk|𝑪𝒊) untuk setiap Class i
Tabel 2. Dataset
ID OUTLOOK TEMPERATUR HUMIDITY WINDY PLAY
1 Rainy Mild High FALSE YES
2 Rainy Cool Normal FALSE YES
3 Rainy Cool Normal TRUE YES
4 Rainy Mild Normal FALSE YES
5 Rainy Mild High TRUE NO
6 Sunny Hot High FALSE NO
7 Sunny Hot High TRUE NO
8 Sunny Mild High FALSE NO
9 Sunny Cool Normal FALSE YES
10 Sunny Mild Normal TRUE YES
11 Cloudy Hot High FALSE YES
12 Cloudy Cool Normal TRUE YES
13 Cloudy Mild High TRUE YES
14 Cloudy Hot Normal FALSE YES
• P (Outlook = Rainy | Play = YES) =>
4
10
= 0,4
• P (Outlook = Rainy | Play = NO ) =>
1
4
= 2,5
Hitung P(Xk|𝑪𝒊) untuk setiap class i
Tabel 3. Dataset
ID OUTLOOK TEMPERATUR HUMIDITY WINDY PLAY
1 Rainy Cool Normal FALSE YES
2 Rainy Cool Normal TRUE YES
3 Cloudy Cool Normal TRUE YES
4 Sunny Cool Normal FALSE YES
5 Rainy Mild High FALSE YES
6 Sunny Mild High FALSE NO
7 Rainy Mild Normal FALSE YES
8 Sunny Mild Normal TRUE YES
9 Cloudy Mild High TRUE YES
10 Rainy Mild High TRUE NO
11 Sunny Hot High FALSE NO
12 Sunny Hot High TRUE NO
13 Cloudy Hot High FALSE YES
14 Cloudy Hot Normal FALSE YES
• P (Temperatur = Cool | Play = YES) =>
4
10
= 0,4
4. 4
• P (Temperatur = Cool | Play = NO ) =>
0
4
= 0
Hitung P(Xk|𝑪𝒊) untuk setiap class i
Tabel 4. Dataset
ID OUTLOOK TEMPERATUR HUMIDITY WINDY PLAY
1 Rainy Cool Normal FALSE YES
2 Rainy Cool Normal TRUE YES
3 Cloudy Cool Normal TRUE YES
4 Sunny Cool Normal FALSE YES
5 Rainy Mild Normal FALSE YES
6 Sunny Mild Normal TRUE YES
7 Cloudy Hot Normal FALSE YES
8 Sunny Hot High FALSE NO
9 Sunny Hot High TRUE NO
10 Cloudy Hot High FALSE YES
11 Rainy Mild High FALSE YES
12 Sunny Mild High FALSE NO
13 Cloudy Mild High TRUE YES
14 Rainy Mild High TRUE NO
• P (Humidity = High | Play = YES) =>
3
10
= 0,3
• P (Humidity = High | Play = NO ) =>
4
4
= 1
Hitung P(Xk|𝑪𝒊) untuk setiap class i
Tabel 5. Dataset
ID OUTLOOK TEMPERATUR HUMIDITY WINDY PLAY
1 Sunny Hot High TRUE NO
2 Rainy Cool Normal TRUE YES
3 Cloudy Cool Normal TRUE YES
4 Sunny Mild Normal TRUE YES
5 Cloudy Mild High TRUE YES
6 Rainy Mild High TRUE NO
7 Sunny Hot High FALSE NO
8 Cloudy Hot High FALSE YES
9 Rainy Mild High FALSE YES
10 Rainy Cool Normal FALSE YES
11 Sunny Mild High FALSE NO
12 Sunny Cool Normal FALSE YES
13 Rainy Mild Normal FALSE YES
14 Cloudy Hot Normal FALSE YES
• P (Windy = False | Play = YES) =>
6
10
= 0,6
5. 5
• P (Windy = False | Play = NO ) =>
2
4
= 0,5
P(X|𝑪𝒊) = ∏ 𝑷(𝒙𝒌|
𝒏
𝒌=𝟏 𝑪𝒊)
Tabel 6. Tabel class
Hitung P(Xk | 𝑪𝒊) untuk setiap class I
P (Outlook = Rainy | Play = YES) =>
4
10
0,4
P (Outlook = Rainy | Play = NO ) =>
1
4
2,5
P (Temperatur = Cool | Play = YES) =>
4
10
0,4
P (Temperatur = Cool | Play = NO ) =>
0
4
0
P (Humidity = High | Play = YES) =>
3
10
0,3
P (Humidity = High | Play = NO ) =>
4
4
1
P (Windy = False | Play = YES) =>
6
10
0,6
P (Windy = False | Play = NO ) =>
2
4
0,5
Hitung P(X|𝐶𝑖) untuk setiap class:
P(X|PLAY =” YES”)
0,4 x 0,4 x 0,3 x 0,6 = 0,029 –
P(X|PLAY =”NO”)
2,5 x 0 x 1 x 0,5 = 0
P(X|𝑪𝒊) * P(𝑪𝒊):
• P(X | PLAY = “YES”) * P( PLAY = “YES”)
0,29 * (
10
14
) = 0,207
• P(X | PLAY = “NO”) * P( PLAY = “NO”)
0 * (
4
14
) = 0
X memiliki class “PLAY = YES”
Karena
P(X|PLAY=”YES”) memiliki nilai maksimum pada perhitungan di atas.
6. 6
Naїve Bayesian: Summary
✓ Kekuatan:
- Mudah diimplementasikan.
- Memberikan hasil yang baik untuk banyak kasus.
✓ Klemehan:
- Harus mengasumsi bahwa antar fitur tidak terkait (independent) dalam realita,
keterkaitan itu ada.
- Keterkaitan tersebut tidak dapat dimodelkan oleh Naїve Bayesian Classifier.
Referenisi
Syahid Abdullah, S. M. (2023). Machine Learning. Dalam S. M. Syahid Abdullah, Supervised
Learning: Naїve Bayes (hal. 1 - 21). Jakarta: Informatika UNSIA.
Website: https://www.slideshare.net/HendroGunawan8/machine-learning-diskusi-4docx