This document discusses machine learning concepts including representation, evaluation, and optimization. It provides examples of document classification using supervised learning. Key points include:
- Representation refers to the model or classifier used, such as a decision tree. Evaluation measures how good the classifier is, such as using accuracy on a test set. Optimization searches for the best representation according to the evaluation.
- In document classification, documents are represented as vectors of word counts and the goal is to predict a label like topic. Accuracy on labeled test documents is used to evaluate classifiers.
- Decision trees are a way to represent conditional rules for classification. They are learned using an algorithm that selects attributes to split on based on information gain, reducing uncertainty at each split.
Supervised learning: Types of Machine Learning, by Libya Thomas
This document discusses machine learning concepts including supervised and unsupervised learning, prediction, diagnosis, and discovery. It provides examples of using naive Bayes classifiers for spam filtering and digit recognition. For spam filtering, it shows how to represent emails as bags-of-words and learn word probabilities from labeled training emails. It also discusses issues with overfitting and the need for smoothing techniques like Laplace smoothing when estimating probabilities. For digit recognition, it outlines representing images as feature vectors over pixel values and using a naive Bayes model to classify images.
- A high-level overview of artificial intelligence
- The importance of predictions across different domains of life
- Big (text) data
- Competition as a discovery process
- Domain-general learning
- Computer vision and natural language processing
- Elements of a machine learning system
- A hierarchy of problem classes
- Data collection
- The purpose of a model
- Logistic loss function
- Likelihood, log likelihood and maximum likelihood
- Ockham's Razor
- Intelligence as sequence prediction
- Building blocks of neural networks: neurons, weights and layers
- Logistic regression as a neural network
- Sigmoid function
- A look at backpropagation
- Gradient descent
- Convolutional neural networks
- Max-pooling
- Deep neural networks
This document provides an overview of game theory and its applications to neural networks. It begins by discussing deductive and inductive reasoning, and how algorithms like weighted majority and gradient descent can be understood through the lens of game theory. Specifically, it notes that gradient descent achieves low regret when viewed as playing against an adversarial environment. It then discusses how neural networks achieve superhuman performance despite being non-convex problems, which required decades of engineering tweaks. Finally, it suggests game theory can provide insights into modeling populations of neural networks or "experts" that distribute knowledge effectively.
A paper review. This presentation introduces Abductive Commonsense Reasoning, a paper published at ICLR 2020. The authors use commonsense knowledge to generate plausible hypotheses. They introduce a new dataset, 'ART', and propose new models for the 'aNLI' and 'aNLG' tasks using BERT and GPT.
This document summarizes key concepts from the CS 221 lecture on machine learning. It discusses supervised learning techniques like Naive Bayes classification, linear regression, perceptrons, and SVMs. It also covers unsupervised learning through k-nearest neighbors and discusses challenges like overfitting, generalization, and the curse of dimensionality.
This document discusses knowledge representation in artificial intelligence. It begins by discussing what AI is and some of the underlying assumptions of AI techniques. It then discusses how knowledge representation captures generalizations, is understood by people, can be modified, and is useful in different situations. It provides examples of knowledge representation using tic-tac-toe and magic squares. It also discusses representing facts, reasoning, the frame problem, predicate logic, and approaches to knowledge representation.
The document discusses classification, which is a type of supervised learning where models are used to predict categorical class labels. It covers classification processes including model construction using a training set and model usage to classify future objects. Specific classification algorithms covered include decision trees, naive Bayes, neural networks, and support vector machines. Evaluation metrics for classification methods such as accuracy, speed, and interpretability are also discussed.
Computational Biology, Part 4: Protein Coding Regions, by butest
The document discusses different machine learning approaches for supervised classification and sequence analysis. It describes several classification algorithms like k-nearest neighbors, decision trees, linear discriminants, and support vector machines. It also discusses evaluating classifiers using cross-validation and confusion matrices. For sequence analysis, it covers using position-specific scoring matrices, hidden Markov models, cobbling, and family pairwise search to identify new members of protein families. It compares the performance of these different machine learning methods on sequence analysis tasks.
This document provides an introduction to machine learning and inductive inference. It discusses what machine learning is, common learning tasks like concept learning and function learning, different data representations, and example applications such as knowledge discovery and building adaptive systems. The course will cover generalizing from specific examples to broader concepts through inductive inference and different learning approaches.
Static code analysis is a useful technique for finding bugs in code and proving their absence. Existing industrial tools sacrifice precision for scalability, which leads to false errors being reported and real bugs being missed.
I will describe a new way to perform accurate static analysis (ASA) of smart contracts in order to identify bugs and prove their absence before the code is deployed. ASA guarantees that all bugs are reported and that all reported errors are real. ASA operates on bytecode programs, which makes it possible to check the code even when the source is not available.
Scalability of the method is guaranteed by verifying each of the contracts with respect to the requirements of other contracts.
Data Science Interview Questions | Data Science Interview Questions And Answe..., by Simplilearn
This video on Data science interview questions will take you through some of the most popular questions that you face in your Data science interviews. It’s simply impossible to ignore the importance of data and our capacity to analyze, consolidate, and contextualize it. Data scientists are relied upon to fill this need, but there is a serious dearth of qualified candidates worldwide. If you’re moving down the path to be a data scientist, you need to be prepared to impress prospective employers with your knowledge. In addition to explaining why data science is so important, you’ll need to show that you're technically proficient with Big Data concepts, frameworks, and applications. So, here we discuss the list of most popular questions you can expect in an interview and how to frame your answers.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. The data scientist is the pinnacle rank in an analytics organization. Glassdoor has ranked data scientist first in the 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with python certification training course. With Simplilearn’s Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques. Those who complete the course will be able to:
1. Gain an in-depth understanding of data science processes, data wrangling, data exploration, data visualization, hypothesis building, and testing. You will also learn the basics of statistics.
2. Install the required Python environment and other auxiliary tools and libraries.
3. Understand the essential concepts of Python programming such as data types, tuples, lists, dicts, basic operators, and functions.
4. Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
5. Perform scientific and technical computing using the SciPy package and its sub-packages such as Integrate, Optimize, Statistics, IO and Weave.
6. Perform data analysis and manipulation using data structures and tools provided in the Pandas package.
7. Gain expertise in machine learning using the Scikit-Learn package.
Learn more at www.simplilearn.com/big-data-and-analytics/python-for-data-science-training
Linear algebra and probability (Deep Learning chapters 2 & 3), by Yan Xu
Linear algebra and probability concepts are summarized in 3 sentences:
Scalars, vectors, matrices, and tensors are introduced as the basic components of linear algebra. Common linear algebra operations like transpose, addition, and multiplication are described. Probability concepts such as random variables, probability distributions, moments, and the central limit theorem are covered to lay the foundation for understanding deep learning techniques.
This document discusses classification and clustering techniques used in search engines. It covers classification tasks like spam detection, sentiment analysis, and ad classification. Naive Bayes and support vector machines are described as common classification approaches. Features, feature selection, and evaluation metrics for classifiers are also summarized.
The document discusses machine learning concepts and algorithms. It provides definitions of learning, discusses different types of learning problems including supervised, unsupervised, and reinforcement learning. It also covers key machine learning topics like generalization error, empirical risk minimization, probably approximately correct learning, model selection, and overfitting vs underfitting.
This document provides an overview of front-end development concepts including HTML, CSS, and JavaScript. It discusses the anatomy of HTML tags and common tags to know. It also reviews CSS selectors, the box model, positioning, and other CSS concepts like floats and clears. Finally, it covers basic JavaScript data types, variables, functions, and control flow structures like if/else statements and loops.
A primer in Data Analysis. To substantiate the concepts, I presented Python code in the form of an ipython notebook (not included - get in touch for these, email and twitter are on the last slide).
The talk starts by describing general data analysis (and skills required). I then speak about computing descriptive statistics and explain the details of two types of predictive models (simple linear regression and naive Bayes classifiers). We build examples using both predictive models using python (Pandas and Matplotlib).
Genetic algorithms (GAs) are optimization algorithms inspired by Darwinian evolution. They use techniques like mutation, crossover, and selection to evolve solutions to problems iteratively. The document provides examples to illustrate how GAs work, including finding a binary number and fitting a polynomial to data points. GAs initialize a population of random solutions, then improve it over generations by keeping the fittest solutions and breeding them using crossover and mutation to produce new solutions, until finding an optimal or near-optimal solution.
Probability theory provides a framework for quantifying and manipulating uncertainty. It allows optimal predictions given incomplete information. The document outlines key probability concepts like sample spaces, events, axioms of probability, joint/conditional probabilities, and Bayes' rule. It also covers important probability distributions like binomial, Gaussian, and multivariate Gaussian. Finally, it discusses optimization concepts for machine learning like functions, derivatives, and using derivatives to find optima like maxima and minima.
This document demonstrates how to simulate experimental data in Excel and R to gain insights into study design and statistical analysis. It shows how to generate random normal distributions to represent two groups, with and without an effect added, and then perform t-tests on the simulated data. Running many such simulations allows understanding of false positive rates, statistical power for different sample sizes, and other statistical properties before collecting real data. The key benefits of simulation include anticipating study design issues, clarifying optimal analysis methods, and performing power analyses to determine appropriate sample sizes.
The document discusses object detection using convolutional neural networks and Region-based Convolutional Neural Networks (R-CNNs). It provides an overview of object detection and classification tasks, as well as a history of approaches using hand-crafted features with SVMs and more recent deep learning methods. Region-based Convolutional Neural Networks were able to achieve state-of-the-art results on object detection benchmarks by proposing regions, extracting features from CNNs, and classifying each region.
The document discusses machine learning and genetic algorithms. It provides definitions of machine learning as the study of processes that lead to self-improvement of machine performance through experience. It also discusses different types of learning including supervised learning, unsupervised learning, and reinforcement learning. The document then explains genetic algorithms as evolutionary algorithms that use operations like mutation and crossover to evolve solutions to problems over multiple generations.
This document summarizes Robert Fry's presentation on computation and design of autonomous intelligent systems. It outlines a computational theory of intelligence based on defining questions and answers within a system. Key points include:
- Intelligent systems acquire information to make decisions to achieve goals.
- Questions are defined as sets of possible answers. Boolean algebra is used to represent questions and assertions.
- Probability and entropy theories are derived from this logical framework.
- A simple protozoan system is used to illustrate how a system maps information to decisions.
- Neural computation is modeled using this theory, with neurons posing questions and making optimal decisions.
- Hebbian learning allows neural systems to adapt optimally via dual-matching.
This document provides an overview of machine learning concepts. It defines machine learning as creating computer programs that improve with experience. Supervised learning uses labeled training data to build models that can classify or predict new examples, while unsupervised learning finds patterns in unlabeled data. Examples of machine learning applications include spam filtering, recommendation systems, and medical diagnosis. The document also discusses important machine learning techniques like k-nearest neighbors, decision trees, regularization, and cross-validation.
Are you better than a coin toss? - Richard Warburton & John Oliver (jClarity), by jaxLondonConference
Presented at JAX London 2013
So you’re a big data and distributed systems “expert”, you’ve collected 500 billion data points, thrown it into sci-lib-of-the-week, you’re using Hadoop, backing onto those cool AWS GPU instances, let it grind away for days and it's spit out the answer to life the universe and everything. But is it really better than a coin toss? How do you validate whether your data analysis algorithm works? Are you learning a solution to your problems or just the data you already have? What problems can you encounter when analysing your data?
The document discusses XUnit testing and its limitations. It argues that XUnit tests do too much by combining test execution with setup/teardown logic. Generative testing is proposed as an alternative where tests are automatically generated from the domain definition to find edge cases. However, verification is ultimately undecidable due to Rice's theorem, and testing can only improve confidence rather than prove correctness. Functional programming is suggested as a way to constrain the problem domain and make code easier to reason about and test.
The document discusses machine learning techniques. It describes how machines can learn from examples, through experience and adaptation. It evaluates methods for acquiring and representing knowledge, including decision trees, neural networks and genetic algorithms. While machine learning techniques have benefits like learning from experience and generalizing, they also have drawbacks such as not knowing whether the learned knowledge is completely correct.
This document discusses graphs and graph analytics. It begins by defining what a graph is as G = (V,E) where V is a set of vertices and E is a set of edges. It then discusses real world examples of graphs like the web, social networks, and communication logs. It covers various graph analytics tasks like structural analysis to compute metrics like degree and centrality, traversals to find minimum spanning trees and maximum flow, and pattern matching to find subgraphs that match a given pattern. It also discusses different languages that can be used to express patterns over graph data like SPARQL, Datalog, and SQL.
The document discusses gradient descent and stochastic gradient descent algorithms for machine learning. It explains that gradient descent aims to minimize a cost function by iteratively updating model parameters in the direction of steepest descent. Stochastic gradient descent processes minibatches or individual training examples on each step rather than the full dataset, allowing for parallelization. Regularization terms like L1 and L2 norms are also discussed for preventing overfitting by enforcing constraints on model weights.
This document discusses various statistical concepts related to analyzing results from multiple studies, including effect sizes, p-values, meta-analysis, and issues that can arise such as publication bias, multiple hypothesis testing, and heterogeneity. It provides examples of how effect sizes are calculated, how meta-analyses combine results across studies, and statistical methods that can help address problems like multiple comparisons, including Bonferroni corrections and false discovery rate analysis. The document cautions that when analyzing large datasets, there is a risk of finding spurious correlations due to chance that may have no predictive value.
The document discusses NoSQL databases and related systems. It provides a table comparing over 20 systems across features such as scale, indexing, transactions, joins/analytics, and data model. The key points are that scale is the primary motivation for NoSQL databases, which often sacrifice consistency in order to achieve high availability and partitioning across many servers. This relates to Brewer's CAP theorem, which states it is impossible to have all three of consistency, availability, and partitioning simultaneously in a distributed system.
The document discusses the concept of scalability in computing. Operationally, scalability now means being able to utilize thousands of inexpensive computers. Formally, in the past scalability meant algorithms with polynomial time complexity, while now it means logarithmic time complexity, allowing data to be processed more efficiently as data sizes increase. The document provides examples of finding matching DNA sequences and word frequency analysis to illustrate how distributed and parallel algorithms can improve scalability.
This document discusses data models and databases. It begins by explaining the components of a data model: structures, constraints, and operations. It then defines what a database is as a collection of organized information for efficient retrieval. The document goes on to discuss relational databases and their advantages like sharing data, enforcing data models, scaling to large datasets, and flexibility. It covers concepts like declarative query languages, views, indexes, and using databases for analytics like matrix operations and experiment design.
Global Situational Awareness of A.I. and where it's headed, by vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Learn SQL from basic queries to advanced queries, by manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W..., by Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
1. WHERE WE ARE
Informatics
• management, manipulation, integration
• emphasis on scale, some emphasis on tools
Analytics
• statistical estimation and prediction
• machine learning, data mining
Visualization
• communication and presentation
Bill Howe, UW
2. WHAT IS MACHINE LEARNING?
“Systems that automatically learn programs from data”
Teaching a computer about the world
[Domingos 2012]
[Mark Dredze]
3. WHAT’S THE DIFFERENCE BETWEEN STATISTICS
AND MACHINE LEARNING?
Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science 16(3), 2001
One view: emphasis on stochastic models of nature.
The other view: find a function that predicts y from x; no model of nature implied or needed.
4. TOY EXAMPLE
[Witten]
Hypothesis: we only play when it's sunny? No.
Hypothesis: we don't play if it's rainy and windy? No.
Goal: Predict when we play.
6. TERMINOLOGY
Supervised Learning (“Training”)
• We are given examples of inputs and associated outputs
• We learn the relationship between them
Unsupervised Learning (sometimes:“Mining”)
• We are given inputs, but no outputs
• unlabeled data
• Learn the “latent” labels
• Ex: Clustering, dimension reduction
7. EXAMPLE: DOCUMENT CLASSIFICATION
“The Falcons trounced the Saints on Sunday” → Sports
“The Mars Rover discovered organic molecules on Sunday” → Science
How do we set this up? What are the rows and columns of our decision table?
8. EXAMPLE: CONSTRUCTING THE DOCUMENT MATRIX
d1: Romeo and Juliet.
d2: Juliet: O happy dagger!
d3: Romeo died by dagger.
d4: “Live free or die”, that's New Hampshire's motto.
d5: Did you know, New-Hampshire is in New-England.
dagger die new-hampshir free happi live new-england motto romeo juliet
d1: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
d2: [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
d3: [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
d4: [0, 1, 1, 1, 0, 1, 0, 1, 0, 0]
d5: [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
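To make the construction concrete, here is a minimal Python sketch of the same bag-of-words matrix; the `docs` tokens are hand-stemmed to match the slide's vocabulary, with stop words already dropped:

```python
# Bag-of-words document matrix over a fixed, stemmed vocabulary.
vocab = ["dagger", "die", "new-hampshir", "free", "happi",
         "live", "new-england", "motto", "romeo", "juliet"]

# Tokens hand-stemmed to match the slide's vocabulary.
docs = {
    "d1": ["romeo", "juliet"],
    "d2": ["juliet", "happi", "dagger"],
    "d3": ["romeo", "die", "dagger"],
    "d4": ["live", "free", "die", "new-hampshir", "motto"],
    "d5": ["new-hampshir", "new-england"],
}

# One row per document, one column per vocabulary term.
matrix = {name: [tokens.count(term) for term in vocab]
          for name, tokens in docs.items()}

for name, row in matrix.items():
    print(name, row)   # e.g. d1 [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```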
9. EXAMPLE: DOCUMENT CLASSIFICATION
Supervised Learning Problem
• A human assigns a topic label to each document in a corpus
• The algorithm learns how to predict the label
Unsupervised Learning Problem
• No labels are given
• Discover groups of similar documents
10. LEARNING = THREE CORE COMPONENTS
Representation
Evaluation
Optimization
Pedro Domingos, A Few Useful Things to Know about Machine Learning, CACM 55(10), 2012
11. LEARNING = THREE CORE COMPONENTS
Representation
• What exactly is your classifier?
• A hyperplane that separates the two classes?
• A decision tree?
• A neural network?
Evaluation
Optimization
12. LEARNING = THREE CORE COMPONENTS
Representation
Evaluation
• How do we know if a given classifier is good or bad?
• # of errors on some test set?
• Precision and recall?
• Squared error?
• Likelihood?
Optimization
13. LEARNING = THREE CORE COMPONENTS
Representation
Evaluation
Optimization
• How do you search among all the alternatives?
• Greedy search?
• Gradient descent?
17. A VERY NAÏVE CLASSIFIER
pclass sex age sibsp parch fare cabin embarked
Does the new data point x* exactly match a previous point xi?
If so, assign it to the same class as xi
Otherwise, just guess.
This is the “rote” classifier
18. A MINOR IMPROVEMENT
Does the new data point x* match a set of previous points xi on some specific attribute?
If so, take a vote to determine class.
Example: If most females survived, then assume every female survives
But there are lots of possible rules like this.
And an attribute can have more than two values.
If most people under 4 years old survive, then assume everyone under 4 survives
If most people with 1 sibling survive, then assume everyone with 1 sibling survives
How do we choose?
19. IF sex = ‘female’ THEN survive = yes
ELSE IF sex = ‘male’ THEN survive = no
confusion matrix
no yes <-- classified as
468 109 | no
81 233 | yes
Not bad!
(468 + 233) / (468 + 109 + 81 + 233) = 79% correct (and 21% incorrect)
21. IF pclass=‘1’ THEN survive=yes
ELSE IF pclass=‘2’ THEN survive=yes
ELSE IF pclass=‘3’ THEN survive=no
confusion matrix
no yes <-- classified as
372 119 | no
177 223 | yes
a bit worse…
(372 + 223) / (372 + 119 + 223 + 177) = 67% correct (and 33% incorrect)
22. 1-RULE
For each attribute A:
  For each value V of that attribute, create a rule:
    1. count how often each class appears
    2. find the most frequent class, c
    3. make a rule "if A=V then Class=c"
  Calculate the error rate of this rule
Pick the attribute whose rules produce the lowest error rate
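A compact Python sketch of 1R as described above, assuming each row is a dict of attribute values plus a class label:

```python
from collections import Counter, defaultdict

def one_rule(rows, attributes, label):
    """1R: map each value of each attribute to its majority class;
    keep the attribute whose value->class rules make the fewest errors."""
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[label]] += 1   # class counts per value
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in by_value.values())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best   # (error count, attribute, {value: predicted class})
```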
24. HOW FAR CAN WE GO?
IF pclass=‘1’ AND sex=‘female’ THEN survive=yes
IF pclass=‘2’ AND sex=‘female’ THEN survive=yes
IF pclass=‘3’ AND sex=‘female’ AND age < 4 THEN survive=yes
IF pclass=‘3’ AND sex=‘female’ AND age >= 4 THEN survive=no
IF pclass=‘2’ AND sex=‘male’ THEN survive=no
IF pclass=‘3’ AND sex=‘male’ THEN survive=no
IF pclass=‘1’ AND sex=‘male’ AND age < 5 THEN survive=yes
…
25. SEQUENTIAL COVERING
Initialize R to the empty set
for each class C {
  while D is nonempty {
    Construct one rule r that correctly classifies some instances in D
    that belong to class C and does not incorrectly classify any non-C instances
    Add rule r to ruleset R
    Remove from D all instances correctly classified by r
  }
}
return R
src: Alvarez
26. SEQUENTIAL COVERING: FINDING NEXT RULE FOR CLASS C
src: Alvarez
Initialize A as the set of all attributes over D
while r incorrectly classifies some non-C instances of D {
  write r as ant(r) => C
  for each attribute-value pair (a=v), where a belongs to A and v is a value of a,
    compute the accuracy of the rule ant(r) and (a=v) => C
  let (a*=v*) be the attribute-value pair of maximum accuracy over D;
  in case of a tie, choose the pair that covers the greatest number of instances of D
  update r by adding (a*=v*) to its antecedent: r = ( ant(r) and (a*=v*) ) => C
  remove the attribute a* from the set A: A = A - {a*}
}
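A Python sketch of the two loops above (sequential covering plus the general-to-specific rule search); `rows` is again assumed to be a list of dicts with a class label, and ties are broken as on the slide, preferring the pair that covers the most instances:

```python
def rule_accuracy(rows, conditions, target, label):
    """Accuracy of 'conditions => target' over the rows it covers."""
    covered = [r for r in rows if all(r[a] == v for a, v in conditions)]
    if not covered:
        return 0.0, covered
    return sum(r[label] == target for r in covered) / len(covered), covered

def learn_one_rule(rows, attributes, target, label):
    """Greedily add the (attribute = value) test with the highest accuracy
    until no non-target row is covered (or attributes run out)."""
    conditions, remaining = [], set(attributes)
    while True:
        acc, covered = rule_accuracy(rows, conditions, target, label)
        if (acc == 1.0 and covered) or not remaining:
            return conditions, covered
        best = None
        for a in remaining:
            for v in {r[a] for r in rows}:
                cand_acc, cand_cov = rule_accuracy(
                    rows, conditions + [(a, v)], target, label)
                if cand_cov and (best is None
                                 or (cand_acc, len(cand_cov)) > best[0]):
                    best = ((cand_acc, len(cand_cov)), (a, v))
        conditions.append(best[1])
        remaining.discard(best[1][0])

def sequential_covering(rows, attributes, classes, label):
    """Learn rules one class at a time, removing covered rows after each rule."""
    ruleset, data = [], list(rows)
    for c in classes:
        while any(r[label] == c for r in data):
            conds, covered = learn_one_rule(data, attributes, c, label)
            if not covered:
                break                      # no discriminating rule found
            ruleset.append((conds, c))
            data = [r for r in data if r not in covered]
    return ruleset
```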
27. STRATEGIES FOR LEARNING EACH RULE
General-to-Specific
• Start with an empty rule
• Add constraints to eliminate negative examples
• Stop when only positives are covered
Specific-to-General (not shown)
• Start with a rule that identifies a single random instance
• Remove constraints in order to cover more positives
• Stop when further generalization results in covering negatives
28. CONFLICTS
If more than one rule is triggered
• Choose the “most specific” rule
• Use domain knowledge to order rules by priority
29. RECAP
Representation
• A set of rules: IF…THEN conditions
Evaluation
• coverage: # of data points that satisfy conditions
• accuracy = # of correct predictions / coverage
Optimization
• Build rules by finding conditions that maximize accuracy
One rule is easy to interpret, but a complex set of rules probably isn't.
30. HOW FAR CAN WE GO?
We might consider grouping redundant conditions:
IF pclass=‘1’ THEN
  IF sex=‘female’ THEN survive=yes
  IF sex=‘male’ AND age < 5 THEN survive=yes
IF pclass=‘2’ THEN
  IF sex=‘female’ THEN survive=yes
  IF sex=‘male’ THEN survive=no
IF pclass=‘3’ THEN
  IF sex=‘male’ THEN survive=no
  IF sex=‘female’ THEN
    IF age < 4 THEN survive=yes
    IF age >= 4 THEN survive=no
A decision tree
31. HOW FAR CAN WE GO?
[Decision-tree diagram: the root splits on sex, with further splits on pclass (1, 2, 3) and age (<=4 vs. >4) leading to survive / not survive leaves. Every path from the root is a rule.]
33. ASIDE ON ENTROPY
Consider two sequences of coin flips:
HHTHTTHHHHTTHTHTHTTTT…
TTHHTTHTHTTTTHHHTHTTT…
How much information do we get after flipping each coin once?
We want some function “Information” that satisfies:
Information1&2(p1p2) = Information1(p1) + Information2(p2)
This is satisfied by I(x) = -log2(px).
Expected Information = “Entropy”:
H(X) = E[I(X)] = sum_x px I(x) = -sum_x px log2(px)
35. EXAMPLE: ROLLING A DIE
p1 = 1/6, p2 = 1/6, p3 = 1/6, …, p6 = 1/6
Entropy = -sum_i pi log2(pi) = -6 × (1/6) log2(1/6) ≈ 2.58
36. EXAMPLE: ROLLING A WEIGHTED DIE
p1 = 0.1, p2 = 0.1, p3 = 0.1, …, p6 = 0.5
Entropy = -5 × (0.1 log2 0.1) - 0.5 log2 0.5 ≈ 2.16
The weighted die is less unpredictable than a fair die (2.16 < 2.58 bits)
37. HOW UNPREDICTABLE IS YOUR DATA?
342/891 survivors in titanic training set:
-(342/891) log2(342/891) - (549/891) log2(549/891) = 0.96
Say there were only 50 survivors:
-(50/891) log2(50/891) - (841/891) log2(841/891) = 0.31
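These entropy figures are easy to check with a few lines of Python; the function below is a direct transcription of H = -sum p log2(p):

```python
import math

def entropy(probs):
    """H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))           # fair die: ~2.58 bits
print(entropy([0.1] * 5 + [0.5]))   # weighted die: ~2.16 bits
print(entropy([342/891, 549/891]))  # titanic survival: ~0.96 bits
print(entropy([50/891, 841/891]))   # hypothetical 50 survivors: ~0.31 bits
```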
38. BACK TO DECISION TREES
Which attribute do we choose at each level?
The one with the highest information gain
• The one that reduces the unpredictability the most
39. Before: 14 records, 9 are “yes”:
-(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94
If we choose outlook:
overcast: 4 records, 4 are “yes” → entropy 0
rainy: 5 records, 3 are “yes” → entropy 0.97
sunny: 5 records, 2 are “yes” → entropy 0.97
Expected new entropy: (4/14)(0.0) + (5/14)(0.97) + (5/14)(0.97) = 0.69
outlook temperature humidity windy play
overcast cool normal TRUE yes
overcast hot high FALSE yes
overcast hot normal FALSE yes
overcast mild high TRUE yes
rainy cool normal TRUE no
rainy mild high TRUE no
rainy cool normal FALSE yes
rainy mild high FALSE yes
rainy mild normal FALSE yes
sunny hot high FALSE no
sunny hot high TRUE no
sunny mild high FALSE no
sunny cool normal FALSE yes
sunny mild normal TRUE yes
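A runnable sketch that reproduces the information-gain numbers from this weather table (the printed results assume the corrected humidity and windy counts given below):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum(p * log2 p) over the class frequencies in labels."""
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def information_gain(rows, attr, label="play"):
    before = entropy([r[label] for r in rows])
    after = 0.0
    for v in {r[attr] for r in rows}:          # partition by attribute value
        subset = [r[label] for r in rows if r[attr] == v]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

cols = ["outlook", "temperature", "humidity", "windy", "play"]
table = """overcast cool normal TRUE yes
overcast hot high FALSE yes
overcast hot normal FALSE yes
overcast mild high TRUE yes
rainy cool normal TRUE no
rainy mild high TRUE no
rainy cool normal FALSE yes
rainy mild high FALSE yes
rainy mild normal FALSE yes
sunny hot high FALSE no
sunny hot high TRUE no
sunny mild high FALSE no
sunny cool normal FALSE yes
sunny mild normal TRUE yes"""
rows = [dict(zip(cols, line.split())) for line in table.splitlines()]

for attr in cols[:-1]:
    print(attr, round(information_gain(rows, attr), 2))
# outlook 0.25, temperature 0.03, humidity 0.15, windy 0.05
```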
40. (same weather table as slide 39)
Before: 14 records, 9 are “yes” → entropy 0.94
If we choose temperature:
cool: 4 records, 3 are “yes” → entropy 0.81
hot: 4 records, 2 are “yes” → entropy 1.0
mild: 6 records, 4 are “yes” → entropy 0.92
Expected new entropy: (4/14)(0.81) + (4/14)(1.0) + (6/14)(0.92) = 0.91
41. (same weather table as slide 39)
Before: 14 records, 9 are “yes” → entropy 0.94
If we choose humidity:
normal: 7 records, 6 are “yes” → entropy 0.59
high: 7 records, 3 are “yes” → entropy 0.99
Expected new entropy: (7/14)(0.59) + (7/14)(0.99) = 0.79
42. (same weather table as slide 39)
Before: 14 records, 9 are “yes” → entropy 0.94
If we choose windy:
FALSE: 8 records, 6 are “yes” → entropy 0.81
TRUE: 6 records, 3 are “yes” → entropy 1.0
Expected new entropy: (8/14)(0.81) + (6/14)(1.0) = 0.89
43. (same weather table as slide 39)
Before: 14 records, 9 are “yes” → entropy 0.94
Information gain per attribute:
outlook: 0.94 - 0.69 = 0.25 ← highest gain
temperature: 0.94 - 0.91 = 0.03
humidity: 0.94 - 0.79 = 0.15
windy: 0.94 - 0.89 = 0.05
45. BUILDING A DECISION TREE (ID3 ALGORITHM)
Assume attributes are discrete
• Discretize continuous attributes
Choose the attribute with the highest Information
Gain
Create branches for each value of attribute
Examples partitioned based on selected attributes
Repeat with remaining attributes
Stopping conditions
• All examples assigned the same label
• No examples left
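A sketch of ID3 along the lines of the slide, with the entropy and information-gain helpers repeated so the snippet stands alone (discrete attributes only, as assumed above):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def information_gain(rows, attr, label):
    before = entropy([r[label] for r in rows])
    after = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == v]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

def id3(rows, attributes, label):
    """Grow a tree: stop on a pure node or when attributes run out,
    otherwise branch on the highest-information-gain attribute."""
    labels = [r[label] for r in rows]
    if len(set(labels)) == 1:                 # all examples share one label
        return labels[0]
    if not attributes:                        # majority vote at a mixed leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, label))
    branches = {}
    for v in {r[best] for r in rows}:         # one branch per attribute value
        subset = [r for r in rows if r[best] == v]
        branches[v] = id3(subset, [a for a in attributes if a != best], label)
    return (best, branches)
```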
46. PROBLEMS
Expensive to train
Prone to overfitting
• Drive to perfection on training data, bad on test data
• Pruning can help: remove or aggregate subtrees that provide little discriminatory power (C4.5)
47. C4.5 EXTENSIONS
Continuous Attributes
outlook temperature humidity windy play
overcast cool 60 TRUE yes
overcast hot 80 FALSE yes
overcast hot 63 FALSE yes
overcast mild 81 TRUE yes
rainy cool 58 TRUE no
rainy mild 90 TRUE no
rainy cool 54 FALSE yes
rainy mild 92 FALSE yes
rainy mild 59 FALSE yes
sunny hot 90 FALSE no
sunny hot 89 TRUE no
sunny mild 90 FALSE no
sunny cool 60 FALSE yes
sunny mild 62 TRUE yes
48. outlook temperature humidity windy play
rainy mild 54 FALSE yes
overcast hot 58 FALSE yes
overcast cool 59 TRUE yes
rainy cool 60 FALSE yes
overcast mild 60 TRUE yes
overcast hot 62 FALSE yes
rainy mild 63 TRUE no
sunny cool 80 FALSE yes
rainy mild 81 FALSE yes
sunny mild 89 TRUE yes
sunny hot 90 FALSE no
rainy cool 90 TRUE no
sunny hot 90 TRUE no
sunny mild 92 FALSE no
Consider every possible binary partition; choose the partition with the highest gain.
Split at humidity <= 62: the six lowest records are all “yes”, so E(6/6) = 0.0; the other eight records have 3 “yes” and 5 “no”, so E(3/8, 5/8) = 0.95.
Expected entropy = (6/14)(0.0) + (8/14)(0.95) = 0.54
Split at humidity <= 89: the ten lowest records have 9 “yes” and 1 “no”, so E(9/10, 1/10) = 0.47; the four highest records are all “no”, so E(4/4) = 0.0.
Expected entropy = (10/14)(0.47) + (4/14)(0.0) = 0.33
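A sketch of this numeric-split search for one attribute, trying a threshold between each pair of adjacent distinct values; run on the humidity column above, the 0.54 and 0.33 partitions fall out of this search:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def best_numeric_split(rows, attr, label="play"):
    """Try a binary split between each pair of adjacent distinct values
    and keep the threshold with the lowest expected entropy."""
    n = len(rows)
    values = sorted({r[attr] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r[label] for r in rows if r[attr] <= t]
        right = [r[label] for r in rows if r[attr] > t]
        expected = (len(left)/n) * entropy(left) + (len(right)/n) * entropy(right)
        if best is None or expected < best[0]:
            best = (expected, t)
    return best   # (expected entropy, threshold), e.g. ~(0.33, 89.5) here
```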
49. WHERE WE ARE
Supervised learning and classification problems
• Predict a class label based on other attributes
Rules
• To start, we guessed simple rules that might explain the data
• But the relationships are complex, so we need to automate
• The 1-rule algorithm
• A sequential cover algorithm for sets of rules with complex conditions
• But: Sets of rules are hard to interpret
Decision trees
• Each path from the root to a leaf is a rule; easy to interpret
• Use entropy to choose best attribute at each node
• Extensions for numeric attributes
• But: Decision Trees are prone to overfitting
50. OVERFITTING
What if the knowledge and data we have are not sufficient to
completely determine the correct classifier? Then we run the risk of
just hallucinating a classifier (or parts of it) that is not grounded in
reality, and is simply encoding random quirks in the data.
This problem is called overfitting, and is the bugbear of machine
learning. When your learner outputs a classifier that is 100% accurate
on the training data but only 50% accurate on test data, when in fact it
could have output one that is 75% accurate on both, it has overfit.
Pedro Domingos, A Few Useful Things to Know About Machine Learning, CACM
55(10), 2012
54. Is the model able to generalize? Can it deal
with unseen data, or does it overfit the data?
Test on hold-out data:
• split data to be modeled in training and test set
• train the model on training set
• evaluate the model on the training set
• evaluate the model on the test set
• difference between the fit on training data and test
data measures the model’s ability to generalize
slide src: Frank Keller
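A minimal sketch of this hold-out procedure in Python (my illustration; the helper name `holdout_split` is mine, and `records` is the weather table from the earlier sketch):

```python
import random

def holdout_split(data, test_fraction=0.3, seed=0):
    """Shuffle a dataset and split it into training and test portions."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = holdout_split(records)
# Train on `train`, then compare accuracy on `train` vs. `test`:
# a large gap indicates the model overfits rather than generalizes.
```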
55. [Figure: bias/variance illustration, src: Domingos 2012]
Underfitting: high bias, low variance
Overfitting: low bias, high variance
56. EVALUATION
Division into training and test sets
• Fixed
• Leave out random N% of the data
• k-fold Cross-Validation
• Split the data into k folds without replacement
• Leave-One-Out Cross-Validation
• Special case where k = n
• Related: the Bootstrap
• Generate new training sets by sampling with replacement
57. LEAVE-ONE-OUT CROSS-VALIDATION (LOOCV)
For each training example $(x_i, y_i)$:
Train a classifier with all training data except $(x_i, y_i)$
Test the classifier’s accuracy on $(x_i, y_i)$
LOOCV accuracy = average of all $n$ accuracies
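A minimal generic LOOCV sketch in Python; `train_fn` and `predict_fn` are hypothetical callables standing in for whatever learner is used, and each example is assumed to be a tuple with the label last:

```python
def loocv_accuracy(data, train_fn, predict_fn):
    """Leave-one-out cross-validation: train n times, each time holding
    out a single example, and average the n single-example accuracies."""
    hits = 0
    for i in range(len(data)):
        held_out = data[i]
        rest = data[:i] + data[i + 1:]
        model = train_fn(rest)                      # hypothetical trainer
        hits += predict_fn(model, held_out[:-1]) == held_out[-1]
    return hits / len(data)
```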
59. EVALUATION: ACCURACY ISN’T ALWAYS ENOUGH
How do you interpret 90% accuracy?
• You can’t; it depends on the problem
Need a baseline:
• Base Rate
• Accuracy of trivially predicting the most-frequent class
• Random Rate
• Accuracy of making a random class assignment
• Might apply prior knowledge to assign random distribution
• Naïve Rate
• Accuracy of some simple default or pre-existing model
• Ex: “All females survived”
61. ACCURACY
Confusion matrix:

                   Predicted Positive   Predicted Negative
True Positive             a                     b
True Negative             c                     d

Accuracy = $\frac{a + d}{a + b + c + d}$
precision = $\frac{a}{a + c}$
recall = $\frac{a}{a + b}$
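A small Python sketch computing these metrics from confusion-matrix counts (illustrative numbers, not data from the slides):

```python
def metrics(a, b, c, d):
    """Accuracy, precision and recall from confusion-matrix counts:
    a = actual +/predicted +, b = actual +/predicted -,
    c = actual -/predicted +, d = actual -/predicted -."""
    return {
        "accuracy":  (a + d) / (a + b + c + d),
        "precision": a / (a + c),
        "recall":    a / (a + b),
    }

print(metrics(a=40, b=10, c=5, d=45))
# {'accuracy': 0.85, 'precision': 0.888..., 'recall': 0.8}
```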
62. ROC PLOT
“Receiver Operating Characteristic”
• Historical term from WW2
• Used to measure the accuracy of radar operators
Same confusion matrix as above:

                   Predicted Positive   Predicted Negative
True Positive             a                     b
True Negative             c                     d

Accuracy = $\frac{a + d}{a + b + c + d}$
sensitivity = $\frac{a}{a + b}$
$1 - \text{specificity} = 1 - \frac{d}{c + d} = \frac{c}{c + d}$
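An ROC curve plots sensitivity against 1 − specificity as the decision threshold varies. A minimal sketch of that sweep (the helper name `roc_points` is mine; labels are 1 for positive, 0 for negative, and both classes are assumed present):

```python
def roc_points(scores, labels):
    """Sweep a decision threshold over classifier scores and collect
    (1 - specificity, sensitivity) points for an ROC plot."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points
```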
64. THE BOOTSTRAP
Given a dataset of size N
Draw N samples with replacement to create a new
dataset
Repeat ~1000 times
You now have ~1000 sample datasets
• All drawn from the same population
• You can compute ~1000 sample statistics
• You can interpret these as repeated experiments, which is exactly
what the frequentist perspective calls for
Very elegant use of computational resources
Efron 1979
66. THE BOOTSTRAP
Example:
Generate 1000 samples and
1000 linear regressions
You want a 90% confidence
interval for the slope?
Just take the 5th percentile
and the 95th percentile!
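A minimal percentile-bootstrap sketch in Python (my illustration; for the slope example you would refit a regression on each resample, while this sketch uses the mean for brevity, and the data values are made up):

```python
import random

def bootstrap_ci(data, statistic, reps=1000, lo=5, hi=95, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic each time, and read the CI off the percentiles."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(statistic([rng.choice(data) for _ in range(n)])
                   for _ in range(reps))
    return stats[int(reps * lo / 100)], stats[int(reps * hi / 100)]

# e.g. a 90% confidence interval for the mean of a small sample:
print(bootstrap_ci([2.1, 3.4, 2.9, 5.0, 4.2, 3.8],
                   statistic=lambda xs: sum(xs) / len(xs)))
```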
67. ENSEMBLES: COMBINING CLASSIFIERS
Can a set of weak classifiers be combined to derive a
strong classifier? Yes!
Average results from different models
Why?
• Better classification performance than individual classifiers
• More resilience to noise
Why not?
• Time consuming
• Models become difficult to explain
“Wisdom of the (simulated) crowd”
Freund, Schapire 1995
68. BAGGING
Draw N bootstrap samples
Retrain the model on each sample
Average the results
• Regression: Averaging
• Classification: Majority vote
Works great for overfit models
• Decreases variance without changing bias
• Doesn’t help much with underfit/high-bias models, which are largely insensitive to the training data
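A minimal bagging sketch in Python; `train_fn` and `predict_fn` are hypothetical callables for the underlying learner:

```python
import random
from collections import Counter

def bagged_predict(data, train_fn, predict_fn, x, n_models=25, seed=0):
    """Bagging sketch: train n_models on bootstrap resamples of `data`
    and classify `x` by majority vote among them."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in range(len(data))]
        model = train_fn(sample)            # hypothetical trainer
        votes.append(predict_fn(model, x))  # hypothetical predictor
    return Counter(votes).most_common(1)[0][0]
```

For regression you would average the `n_models` predictions instead of taking a majority vote, as the slide notes.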
69. BOOSTING
Instead of selecting data points randomly with the bootstrap,
favor the misclassified points
Initialize the weights
Repeat:
Resample with respect to weights
Retrain the model
Recompute weights
70. For each step $t$:
• $D_t(i)$: the weights, i.e. the probability of selecting example $i$ in the sample
• $h_t$: the classifier trained at step $t$ using a sample drawn according to $D_t$
• $x_i, y_i$: the $i$th example and its label
• $\epsilon_t = \sum_{i:\,h_t(x_i) \neq y_i} D_t(i)$: the sum of weights for misclassified examples
• $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$: the odds of misclassifying
• $D_{t+1}(i) = \beta_t D_t(i)$: adjust weights down for correctly classified examples, then normalize to make sure the weights sum to 1
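A minimal sketch of one such reweighting step in Python (my illustration of the update above; it assumes $\epsilon_t < 0.5$, i.e. a better-than-chance weak learner):

```python
def boost_weights(weights, correct):
    """One AdaBoost-style reweighting step. `weights` is D_t;
    `correct[i]` is True if h_t classified example i correctly."""
    eps = sum(w for w, ok in zip(weights, correct) if not ok)
    beta = eps / (1 - eps)            # odds of misclassifying
    new = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]   # normalize so the weights sum to 1

D = [1 / 4] * 4                       # uniform initial weights on 4 examples
print(boost_weights(D, [True, True, True, False]))
# -> [1/6, 1/6, 1/6, 1/2]: the misclassified example's weight grows
```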
71. RANDOM FOREST ALGORITHM
Repeat k times:
• Draw a bootstrap sample from the dataset
• Train a decision tree on it; until the tree is maximum size:
  • Choose the next leaf node
  • Select m attributes at random from the p available
  • Pick the best attribute/split as usual
• Measure the out-of-bag error
  • Evaluate against the samples that were not selected in the bootstrap
  • Provides measures of strength (inverse error rate), correlation between trees (which increases the forest error rate), and variable importance
Make a prediction by majority vote among the k trees
Breiman 2001
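The step that distinguishes a random forest from plain bagging is the m-of-p attribute sampling at each split. A minimal sketch, reusing `information_gain` and the (index, name) attribute pairs from the earlier sketches (the helper name is mine):

```python
import random

def random_forest_split(records, attributes, m, rng=random):
    """Consider only m randomly chosen attributes at this split, then
    pick the best of those by information gain."""
    candidates = rng.sample(attributes, min(m, len(attributes)))
    return max(candidates, key=lambda a: information_gain(records, a[0]))
```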
72. RANDOM FORESTS:VARIABLE IMPORTANCE
Key Idea: If you scramble the values of a variable and
the accuracy of your tree doesn’t change much, then the
variable isn’t very important
Measure the error increase
Random Forests are more difficult to interpret than
single trees; understanding variable importance helps
• Ex: Medical applications can’t typically rely on black box solutions
Breiman 2001
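A minimal permutation-importance sketch in Python (my illustration of the scrambling idea above; `accuracy_fn` is a hypothetical callable, and each data row is assumed to be a tuple):

```python
import random

def permutation_importance(model, data, attr_index, accuracy_fn, seed=0):
    """Scramble one attribute's column and measure how much accuracy
    drops; a small drop means the variable isn't very important."""
    rng = random.Random(seed)
    baseline = accuracy_fn(model, data)
    column = [r[attr_index] for r in data]
    rng.shuffle(column)                        # scramble the variable
    scrambled = [r[:attr_index] + (v,) + r[attr_index + 1:]
                 for r, v in zip(data, column)]
    return baseline - accuracy_fn(model, scrambled)
```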
73. GINI COEFFICIENT
Entropy captured an intuition for “impurity”
• We want to choose attributes that split records into pure
classes
The Gini coefficient measures inequality
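For concreteness (my addition, not from the slides): the impurity measure tree learners typically use is the closely related Gini impurity of a node's class distribution. With class proportions $p_1, \dots, p_K$:

$$\mathrm{Gini}(p_1, \dots, p_K) = 1 - \sum_{k=1}^{K} p_k^2$$

It is 0 for a pure node and largest when the classes are evenly mixed, so splits are chosen to reduce it, just as with entropy.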
74. RANDOM FORESTS ON BIG DATA
Easy to parallelize
• Trees are built independently
Handles “small n big p” problems naturally
• Only a small random subset of the p attributes is considered at
each split, and variable importance identifies the ones that matter
75. SUMMARY: DECISION TREES AND FORESTS
Representation
• Decision Trees
• Sets of decision trees with majority vote
Evaluation
• Accuracy
• Random forests: out-of-bag error
Optimization
• Information Gain or Gini Index to measure impurity and
select best attributes
77. NEAREST NEIGHBORS INTUITION
The last document I saw that mentioned “Falcons” and
“Saints” was about Sports, so I’ll classify this document
as about Sports too
78. NEAREST NEIGHBOR CHOICES
k nearest neighbors – how do we choose k?
• Benefits of a small k? Benefits of a large k?
Similarity function
• Euclidean distance? Cosine similarity?
• Large k: biased towards popular labels; ignores outliers
• Small k: fast
• Cosine: favors dominant components
• Euclidean: difficult to interpret with sparse data, and high-dimensional data is always sparse
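A minimal k-nearest-neighbors sketch in Python (my illustration; the toy feature vectors and labels are made up):

```python
import math
from collections import Counter

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(train, x, k=3, distance=euclidean):
    """Classify x by majority vote among its k nearest neighbors.
    `train` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(train, key=lambda item: distance(item[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((1.0, 1.0), "sports"), ((1.2, 0.9), "sports"),
         ((5.0, 4.8), "politics")]
print(knn_classify(train, (1.1, 1.0)))   # -> "sports"
```

Swapping `distance` for a cosine-based function changes the notion of "nearest", which is exactly the similarity-function choice discussed above.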
80. REGRESSION-BASED METHODS
Trees and rules are designed for categorical
data
• Numerical data can be discretized, but this
introduces another decision
Discriminative models
82. EVALUATION: MEAN SQUARED ERROR

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(H_i - H'_i)^2$

where:
• $n$: number of instances in the data set
• $H_i$: quantity observed in the data set
• $H'_i$: quantity predicted by the model
83. ASIDE ON ERRORS AND NORMS

$\sum_i (H_i - H'_i)$ : errors cancel out; usually not what you want (not a norm)

$\sum_i |H_i - H'_i|$ : the L1 norm; asserts that 1 error of 7 units is as bad as 7 errors of 1 unit each

$\sum_i (H_i - H'_i)^2$ : the (squared) L2 norm; asserts that 1 error of 7 units is as bad as 49 errors of 1 unit each

$\frac{1}{n}\sum_i (H_i - H'_i)^2$ : average squared error per data point; useful when comparing methods that filter the data differently
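A small Python sketch comparing these error summaries on the same residuals, illustrating the 7-units-vs-seven-errors intuition (my illustration; the numbers are made up):

```python
def error_summaries(observed, predicted):
    """Compute the four error summaries above for the same residuals."""
    errs = [h - hp for h, hp in zip(observed, predicted)]
    return {
        "plain sum (errors cancel)":  sum(errs),
        "L1 (sum of absolute errors)": sum(abs(e) for e in errs),
        "squared L2 (sum of squares)": sum(e ** 2 for e in errs),
        "MSE (mean squared error)":    sum(e ** 2 for e in errs) / len(errs),
    }

# One error of 7 units vs. seven errors of 1 unit each:
print(error_summaries([7, 0, 0, 0, 0, 0, 0], [0] * 7))  # L1 = 7, L2 = 49
print(error_summaries([1] * 7, [0] * 7))                 # L1 = 7, L2 = 7
```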