Decision Tree, Entropy
Md Saeed Siddik
Khaza Moinuddin Mazumder
Decision Tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences.
A decision tree is a flow-chart-like structure in which:
- each internal node represents a test on an attribute,
- each branch represents an outcome of that test, and
- each leaf node represents a class label (the decision taken after evaluating the attributes along the path).
A minimal code sketch of this structure follows below.
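To make the structure concrete, here is a minimal Python sketch (my own illustration, not from the slides; the Node class and classify function are hypothetical names):

```python
# Illustrative sketch only: a flow-chart-like tree of attribute tests and class labels.
class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # internal node: the attribute to test
        self.branches = branches or {}  # test outcome -> child Node
        self.label = label              # leaf node: the class label (decision)

def classify(node, example):
    # Follow the branch matching the example's value for each tested attribute.
    while node.label is None:
        node = node.branches[example[node.attribute]]
    return node.label
```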
Components of a DT
A decision tree consists of 3 types of nodes:
1. Decision nodes
2. Chance nodes
3. End nodes
Types of variables in DT
Four types of tree can be generated from a variable:
- Terminal
- Both on the left side
- Both on the right side
- Separated on both sides
[Figure: small tree diagrams illustrating each shape.]
Decision Table
Evidence   Action   Author    Thread   Length
e1         skip     known     new      long
e2         read     unknown   new      short
e3         skip     unknown   old      long
e4         skip     known     old      long
e5         read     known     new      short
e6         skip     known     old      long
Decision Tree
[Figure: decision tree for the table above. Root node: Author; the known branch tests Length (long leads to skip, short leads to read), while the unknown branch tests Thread (new leads to read, old leads to skip).]
Decision
- Known ∧ Long ⇒ Skip
- Known ∧ Short ⇒ Read
- Unknown ∧ New ⇒ Read
- Unknown ∧ Old ⇒ Skip
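The table and tree above can be encoded directly; the sketch below is my own illustration (the dictionary keys and the predict helper are hypothetical names), and the final assert checks that the tree's rules reproduce every row of the table:

```python
# The six training examples from the decision table (illustrative encoding).
examples = [
    {"author": "known",   "thread": "new", "length": "long",  "action": "skip"},
    {"author": "unknown", "thread": "new", "length": "short", "action": "read"},
    {"author": "unknown", "thread": "old", "length": "long",  "action": "skip"},
    {"author": "known",   "thread": "old", "length": "long",  "action": "skip"},
    {"author": "known",   "thread": "new", "length": "short", "action": "read"},
    {"author": "known",   "thread": "old", "length": "long",  "action": "skip"},
]

def predict(e):
    # Apply the tree: test Author first, then Length (known) or Thread (unknown).
    if e["author"] == "known":
        return "skip" if e["length"] == "long" else "read"
    return "read" if e["thread"] == "new" else "skip"

assert all(predict(e) == e["action"] for e in examples)
```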
Entropy
Entropy is a measure of the uncertainty in a random variable.
The term entropy usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message.
Given a random variable V with values v_k, the entropy of V is defined by

$$H(V) = -\sum_k P(v_k)\,\log_2 P(v_k)$$
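A direct translation of this definition into Python (a minimal sketch; the entropy function name is mine):

```python
import math

def entropy(probabilities):
    # H(V) = -sum_k P(v_k) * log2 P(v_k); zero-probability values contribute nothing.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(entropy([1 / 6] * 6))  # ~2.585 bits (fair die)
```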
Entropy Measurement Unit
- bit
  - {0, 1}
  - logarithmic unit, based on 2
- nat
  - also known as nit or nepit
  - logarithmic unit, based on e
  - 1 nat = 1.44 bits = 0.434 ban
- ban
  - also known as hartley or dit (short for decimal digit)
  - logarithmic unit, based on 10
  - introduced by Alan Turing and I. J. Good
  - 1 ban = 3.32 bits = 2.30 nats
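These conversion factors are just change-of-base identities, which can be checked quickly:

```python
import math

# Change-of-base identities behind the conversion factors above.
print(1 / math.log(2))   # ~1.4427 bits per nat
print(1 / math.log(10))  # ~0.4343 bans per nat
print(math.log2(10))     # ~3.3219 bits per ban
print(math.log(10))      # ~2.3026 nats per ban
```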
Entropy of a Boolean Variable
Given a Boolean random variable that is true with probability q and false with probability (1 - q), its entropy is

$$B(q) = -\bigl(q \log_2 q + (1-q)\log_2(1-q)\bigr)$$
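A minimal sketch of this formula (the function name B follows the slide's notation):

```python
import math

def B(q):
    # Entropy of a Boolean variable that is true with probability q.
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

print(B(0.5))   # 1.0 bit: maximum uncertainty
print(B(0.99))  # ~0.081 bits: almost certain
```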
Entropy for p + n examples
If we have p + n examples, where p are positive and n are negative, the entropy of the goal attribute is

$$B\!\left(\frac{p}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} \;-\; \frac{n}{p+n}\log_2\frac{n}{p+n}$$
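For example, in the six-example decision table above there are p = 2 read and n = 4 skip examples (my own count from the table), so

$$B\!\left(\tfrac{2}{6}\right) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918 \text{ bits}$$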
Remainder
The expected entropy (EH), or remainder, left after testing attribute A, whose values split the examples into branches k = 1, 2, ..., d, is:

$$Remainder(A) = \sum_{k=1}^{d} \frac{p_k + n_k}{p + n}\; B\!\left(\frac{p_k}{p_k + n_k}\right)$$
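As a worked sketch using the decision table above (my own counts: splitting on Author gives a known branch with 1 read / 3 skip and an unknown branch with 1 read / 1 skip):

```python
import math

def B(q):
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def remainder(branches, total):
    # branches: list of (p_k, n_k) counts per branch; total: p + n over all examples.
    return sum((pk + nk) / total * B(pk / (pk + nk)) for pk, nk in branches)

# Splitting the table above on Author: known -> 1 read / 3 skip, unknown -> 1 read / 1 skip.
print(remainder([(1, 3), (1, 1)], 6))  # ~0.874 bits
```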
Information Gain (IG)
In decision tree learning, the information gain of an attribute A is the expected reduction in entropy obtained by testing A. (The same term is also used for the Kullback-Leibler divergence, a non-symmetric measure of the difference between two probability distributions P and Q.)

$$Gain(A) = B\!\left(\frac{p}{p+n}\right) - Remainder(A)$$
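Continuing the worked example (values I computed from the table above, not stated on the slides):

$$Gain(Author) = B\!\left(\tfrac{2}{6}\right) - Remainder(Author) \approx 0.918 - 0.874 = 0.044 \text{ bits}$$

so knowing the Author alone removes very little uncertainty about the Action.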
Calculate the root
Choose the attribute with the highest information gain as the root of the tree; a full worked sketch follows below.
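Putting everything together, a minimal end-to-end sketch (my own code, reusing the toy table; the function and attribute names are illustrative) that computes the gain of each attribute and picks the root; on this particular table Length happens to have the largest gain:

```python
import math
from collections import defaultdict

# (author, thread, length, action) rows from the decision table above.
examples = [
    ("known",   "new", "long",  "skip"),
    ("unknown", "new", "short", "read"),
    ("unknown", "old", "long",  "skip"),
    ("known",   "old", "long",  "skip"),
    ("known",   "new", "short", "read"),
    ("known",   "old", "long",  "skip"),
]
attributes = {"Author": 0, "Thread": 1, "Length": 2}

def B(q):
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def gain(attr_index):
    p = sum(1 for e in examples if e[3] == "read")
    n = len(examples) - p
    counts = defaultdict(lambda: [0, 0])  # attribute value -> [p_k, n_k]
    for e in examples:
        counts[e[attr_index]][0 if e[3] == "read" else 1] += 1
    rem = sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in counts.values())
    return B(p / (p + n)) - rem

for name, idx in attributes.items():
    print(name, round(gain(idx), 3))  # Author ~0.044, Thread ~0.459, Length ~0.918
root = max(attributes, key=lambda a: gain(attributes[a]))
print("root:", root)
```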