Monte Carlo simulations and discrete event simulations embrace randomness and uncertainty by sampling from probability distributions. They allow modeling of complex systems, and predictions about them, where outcomes depend on probabilistic events. Discrete event simulation can estimate the probabilities of outcomes in a sports tournament by simulating matchups as probabilistic coin flips. Bootstrap resampling techniques estimate properties of sample estimates by resampling the observed data with replacement, and the block bootstrap preserves correlation structures. The biased bootstrap resamples non-uniformly to estimate distributions that incorporate additional information.
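Both ideas can be sketched in a few lines of Python. The snippet below is illustrative only: the two-round bracket, the fixed win probability, and the toy data are assumptions, not content from the original slides.

```python
import random

def simulate_bracket(win_prob, n_sims=10_000, seed=0):
    # Hypothetical model: a favorite must win two independent
    # "coin flip" matches (semifinal, then final), each with win_prob.
    rng = random.Random(seed)
    wins = sum(
        1 for _ in range(n_sims)
        if rng.random() < win_prob and rng.random() < win_prob
    )
    return wins / n_sims

def bootstrap_mean_ci(data, n_boot=2_000, alpha=0.05, seed=0):
    # Resample the observed data with replacement, then read the
    # confidence interval off the empirical distribution of means.
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(data, k=len(data))) / len(data)
        for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]
```

With win_prob = 0.7 the simulated tournament-win probability converges to about 0.49 (0.7 squared), which is exactly the kind of estimate these methods trade exact arithmetic for when brackets get large.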
This document discusses game theory and its application to decision making in sports. It covers key concepts in game theory like zero-sum games, Nash equilibria, and mixed strategies. Examples discussed include an NFL run-vs-pass game and penalty kicks in soccer. These examples show how game theory can provide insights into optimal strategic decisions in sports.
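For a 2x2 zero-sum game like run-vs-pass, the mixed-strategy equilibrium can be computed from the indifference condition. The payoff numbers below (yards gained for each play/defense pairing) are hypothetical, chosen only to illustrate the calculation:

```python
def row_mix_2x2(a, b, c, d):
    # Row payoffs [[a, b], [c, d]] in a zero-sum game with no saddle
    # point: play row 1 with probability p chosen so the column
    # player is indifferent between its two responses.
    p = (d - c) / ((a - b) + (d - c))
    return p, 1 - p

# Hypothetical offense payoffs: run vs run-D = 2, run vs pass-D = 6,
# pass vs run-D = 8, pass vs pass-D = 3 (illustrative yards).
p_run, p_pass = row_mix_2x2(2, 6, 8, 3)
```

At this mix the offense's expected gain is the same whichever defense is called, which is what makes the strategy unexploitable.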
The document discusses modelling and evaluation in machine learning. It defines what models are and how they are selected and trained for predictive and descriptive tasks. Specifically, it covers:
1) Models represent raw data in meaningful patterns and are selected based on the problem and data type, like regression for continuous numeric prediction.
2) Models are trained by fitting parameters to optimize an objective function, and their quality is then evaluated. Cross-validation is used to evaluate models.
3) Predictive models predict target values like classification to categorize data or regression for continuous targets. Descriptive models find patterns without targets for tasks like clustering.
4) Model performance can be affected by underfitting if the model is too simple or overfitting if it is too complex.
Introduction, Terminology and concepts, Introduction to statistics, Central tendencies and distributions, Variance, Distribution properties and arithmetic, Samples/CLT, Basic machine learning algorithms, Linear regression, SVM, Naive Bayes
This document provides an overview of the data science process for predicting the NBA MVP winner for the upcoming season. It discusses framing the question, collecting relevant stats data from basketball-reference.com, cleaning and formatting the data, exploring it with Python libraries like Pandas and NumPy, building and evaluating decision tree and random forest models, and discussing ways to improve the model's performance, such as modifying feature selection.
Decision trees classify instances by starting at the root node and moving through the tree recursively according to attribute tests at each node, until a leaf node determining the class label is reached. They work by splitting the training data into purer partitions based on the values of predictor attributes, using an attribute selection measure like information gain to choose the splitting attributes. The resulting tree can be pruned to avoid overfitting and reduce error on new data.
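The attribute-selection step can be made concrete with an entropy-based information-gain calculation. The toy rows and labels below are illustrative, not taken from the original deck:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label list, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Gain = entropy before the split minus the size-weighted
    # entropy of each partition produced by the split.
    total = entropy(labels)
    n = len(rows)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return total - remainder
```

An attribute that separates the classes perfectly yields gain equal to the starting entropy; an attribute whose partitions mirror the overall label mix yields gain zero, so it would never be chosen as a split.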
The document provides an overview of key concepts related to estimation in statistics, including:
- Estimation involves using sample data to estimate unknown population parameters. Common estimators include the sample mean, proportion, and standard deviation.
- There are two main types of estimates - point estimates and interval estimates. Point estimates are single values while interval estimates specify a range.
- The process of estimation involves identifying the parameter, selecting a random sample, choosing an estimator, and calculating the estimate.
- Estimates can differ from the true population value due to sampling error and non-sampling error. Bias occurs when the expected value of the estimate differs from the true parameter value.
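The point-versus-interval distinction can be sketched in a few lines; the snippet uses the normal-approximation interval, which assumes a reasonably large sample, and the data values are illustrative:

```python
import math
import statistics

def mean_ci(sample, z=1.96):
    # Point estimate (sample mean) plus an approximate 95%
    # normal-theory interval: xbar +/- z * standard error.
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return xbar, (xbar - z * se, xbar + z * se)
```

The width of the interval shrinks like 1/sqrt(n), which is the sampling-error part of the story; bias, by contrast, does not shrink with more data.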
An introduction to machine learning and statistics (Spotle.ai)
This document provides an overview of machine learning and predictive modeling. It begins by describing how predictive models can be used in various domains like healthcare, finance, telecom, and business. It then discusses the differences between machine learning and predictive modeling, noting that machine learning aims to allow machines to learn autonomously using feedback mechanisms, while predictive modeling focuses on building statistical models to predict outcomes. The document also uses examples like Microsoft's Tay chatbot to illustrate how machine learning systems can be exposed to real-world data to continuously learn and improve. It concludes by explaining how predictive analytics fits within machine learning as the starting point to build initial predictive models and continuously monitor and refine them.
Online learning & adaptive game playing (Saeid Ghafouri)
The document discusses online learning and adaptive game playing. It defines online learning as processing data sequentially in a streaming fashion to train machine learning models. This allows learning from large datasets that cannot fit in memory or when data is continuously generated. Common applications include recommendations, fraud detection, and portfolio management. The document also discusses how reinforcement learning differs from online learning in having a goal of optimizing rewards through a sequence of actions rather than predicting single outputs. It describes early implementations of adaptive game playing using algorithms like naive Bayes, Markov decision processes, and n-grams on the game of rock-paper-scissors before discussing a more complex fighting game implementation.
The document discusses evaluation metrics and methodologies for comparing machine learning models. It introduces key metrics like accuracy, precision, recall and the confusion matrix. It emphasizes the importance of comparing models using statistical tests to determine if performance differences are significant, such as paired t-tests and McNemar's test. Cross-validation is presented as a method for estimating out-of-sample performance and comparing multiple learners on the same data splits.
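The metrics named above all fall out of the four confusion-matrix counts. A minimal sketch (counts chosen for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    # Confusion-matrix counts: true/false positives, false/true negatives.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many are right
    recall = tp / (tp + fn)      # of actual positives, how many are found
    return accuracy, precision, recall
```

Comparing two models on these numbers alone is exactly what the statistical tests guard against: the counts come from one finite sample, so the differences carry sampling noise.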
Lecture 3 for the AI course in a university (Cao Minh Tu)
This document provides an overview of standard deviation and z-scores. It begins by listing the key learning objectives which are to describe the importance of variation in distributions, understand how to calculate standard deviation, describe what a z-score is and how to calculate them, and learn the Greek letters for mean and standard deviation. It then provides explanations and examples of how to calculate and interpret standard deviation as a measure of variation, how to convert values to z-scores based on the mean and standard deviation, and the importance of ensuring distributions are normal before using these statistical techniques. It emphasizes understanding the concepts rather than just memorizing formulas.
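The standardization step described above is a one-liner per value; the numbers below are illustrative, and the sketch uses the population standard deviation as the scale:

```python
import statistics

def z_scores(values):
    # z = (x - mean) / standard deviation: how many standard
    # deviations each value sits from the center of its distribution.
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]
```

As the deck stresses, interpreting a z-score as a percentile only makes sense when the underlying distribution is approximately normal.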
R - what do the numbers mean? #RStats This is the presentation for my demo at Orlando Live60 AILIve. We go through statistics interpretation with examples.
UNIT 1 Machine Learning [KCS-055] (1).pptx (RohanPathak30)
Machine learning is a form of artificial intelligence that allows systems to learn from data and improve automatically without being explicitly programmed. The process of learning begins with observations or data that are used to identify patterns and make better decisions. There are three main types of machine learning: supervised learning where the system is trained by labeled examples, unsupervised learning where the system finds hidden patterns in unlabeled data, and reinforcement learning where the system learns from interaction with its environment through rewards and punishments. Key developments in machine learning history include the perceptron in the 1950s, backpropagation in the 1970s, and boosting algorithms in the 1990s.
This document provides an overview of machine learning concepts from the first lecture of an introduction to machine learning course. It discusses what machine learning is, examples of tasks that can be solved with machine learning, and key concepts like supervised vs. unsupervised learning, hypothesis spaces, searching hypothesis spaces, generalization, and model complexity.
Lecture related to machine learning. Here you can read multiple things.
Use machine learning techniques to predict sporting events. Learn about how sports betting works and how to apply predictive analytics to gain a potential edge.
Introduction to machine learning. Basics of machine learning. Overview of machine learning. Linear regression. logistic regression. cost function. Gradient descent. sensitivity, specificity. model selection.
Data mining and machine learning techniques like classification and clustering are increasingly being used to extract useful information from large datasets. Data mining helps provide better customer service and aids scientists in hypothesis formation by analyzing patterns in data from various sources like business transactions, sensor networks, and scientific experiments. Classification algorithms such as decision trees can be applied to datasets containing attributes for individuals and a target variable to predict, like credit worthiness, to build a predictive model. Clustering algorithms like K-means group unlabeled data into clusters without a predefined target variable to discover hidden patterns in the data.
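K-means can be sketched with Lloyd's algorithm in plain Python. This is a simplified illustration: it seeds centroids deterministically from evenly spaced points rather than randomly, and the data are made up:

```python
def kmeans(points, k, n_iters=20):
    # Lloyd's algorithm: assign each point to its nearest centroid,
    # then move each centroid to the mean of its cluster; repeat.
    # Deterministic seeding (evenly spaced points) keeps this sketch simple.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```

On two well-separated blobs the centroids settle on the blob means after a couple of iterations, recovering the hidden grouping without any labels.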
Network analysis methods can be used for sports analytics applications like team and lineup ranking. SportsNetRank ranks teams based on their win-loss network using PageRank centrality. LinNet evaluates lineups based on their matchup network using network embeddings. It learns latent representations of lineups using node2vec and predicts outcomes of new lineup matchups. LinNet outperforms adjusted plus-minus and PageRank in predicting unseen lineup matchups, with probabilities well calibrated and Brier scores around 0.19. Substitution networks also show potential for explaining team performance. Further work could optimize network embeddings and model lineup ability curves.
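The PageRank-on-a-win-network idea can be illustrated with plain power iteration. This is a generic sketch of the technique, not the SportsNetRank implementation itself; edges run from loser to winner so that rank flows toward teams that beat strong opponents:

```python
def pagerank(edges, n, d=0.85, n_iters=50):
    # Power iteration on a directed graph of (loser, winner) edges.
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    r = [1 / n] * n
    for _ in range(n_iters):
        nxt = [(1 - d) / n] * n
        for u, v in edges:
            nxt[v] += d * r[u] / out_deg[u]
        # Undefeated teams have no out-edges; spread their mass uniformly.
        dangling = sum(r[u] for u in range(n) if out_deg[u] == 0)
        r = [x + d * dangling / n for x in nxt]
    return r
```

In a three-team example where team 0 beats both rivals and team 1 beats team 2, the ranking comes out 0 > 1 > 2, even though teams 1 and 2 differ only in whom they lost to.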
Chapter 4 Classification in data sience .pdf (AschalewAyele2)
This document discusses data mining tasks related to predictive modeling and classification. It defines predictive modeling as using historical data to predict unknown future values, with a focus on accuracy. Classification is described as predicting categorical class labels based on a training set. Several classification algorithms are mentioned, including K-nearest neighbors, decision trees, neural networks, Bayesian networks, and support vector machines. The document also discusses evaluating classification performance using metrics like accuracy, precision, recall, and a confusion matrix.
Statistical Learning and Model Selection (1).pptx (rajalakshmi5921)
This document discusses statistical learning and model selection. It introduces statistical learning problems, statistical models, the need for statistical modeling, and issues around evaluating models. Key points include: statistical learning involves using data to build a predictive model; a good model balances bias and variance to minimize prediction error; cross-validation is described as the ideal procedure for evaluating models without overfitting to the test data.
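The mechanics of cross-validation reduce to partitioning the index set. A minimal sketch of contiguous k-fold splitting (for shuffled or i.i.d. data; ordered data would need shuffling or blocking first):

```python
def kfold_indices(n, k):
    # Split indices 0..n-1 into k contiguous folds; each fold serves
    # once as the held-out test set while the rest form the training set.
    folds = []
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

Because every observation is held out exactly once, averaging the k test scores estimates out-of-sample error without ever fitting to the test data, which is the property the document highlights.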
In this tutorial, we will learn the following topics -
+ Voting Classifiers
+ Bagging and Pasting
+ Random Patches and Random Subspaces
+ Random Forests
+ Boosting
+ Stacking
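Two of the topics above fit in a few lines each. The sketch below shows hard voting (majority label across models) and the bootstrap-sampling step that bagging uses to diversify its models; both are generic illustrations with made-up data:

```python
import random
from collections import Counter

def majority_vote(predictions):
    # Hard voting: `predictions` holds one prediction list per model;
    # return the most common label for each instance.
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

def bagging_samples(data, n_models, seed=0):
    # Bagging trains each model on its own bootstrap sample
    # (drawn with replacement, same size as the original data).
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_models)]
```

Pasting differs from bagging only in sampling without replacement, and a random forest is essentially bagging over decision trees with an extra layer of per-split feature randomness.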
This presentation covers the following topics:
Types of Problems Solved Using Artificial Intelligence Algorithms
Problem categories
Classification Algorithms
Naive Bayes
Example: A person playing golf
Decision Tree
Random Forest
Logistic Regression
Support Vector Machine
K Nearest Neighbors
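In the spirit of the golf example in the list above, here is a tiny categorical naive Bayes classifier with Laplace smoothing. The weather rows and labels are invented stand-ins for the deck's dataset:

```python
from collections import Counter

def naive_bayes_predict(rows, labels, query):
    # Score each class by prior * product of smoothed per-feature
    # likelihoods, assuming features are conditionally independent.
    n = len(labels)
    class_counts = Counter(labels)
    scores = {}
    for c, cc in class_counts.items():
        score = cc / n  # class prior
        for j, value in enumerate(query):
            vals = {r[j] for r in rows}  # distinct values of feature j
            match = sum(1 for r, y in zip(rows, labels) if y == c and r[j] == value)
            score *= (match + 1) / (cc + len(vals))  # Laplace smoothing
        scores[c] = score
    return max(scores, key=scores.get)
```

The smoothing term keeps a never-seen feature/class pair from zeroing out the whole product, which is the classic failure mode of the unsmoothed estimator.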
The document discusses machine learning and various machine learning techniques. It defines machine learning as using data and experience to acquire models and modify decision mechanisms to improve performance. The document outlines different types of machine learning including supervised learning (using labeled data), unsupervised learning (using only unlabeled data), and reinforcement learning (where an agent takes actions and receives rewards or punishments). It provides examples of classification problems and discusses decision tree learning as a supervised learning method, including how decision trees are constructed and potential issues like overfitting.
This document provides an overview of requirements documentation and modeling techniques. It discusses guidelines for writing requirements, such as using standard templates and natural language. Requirements documents establish what a system should do and provide validation. The document also discusses use case modeling and defines actors, flows of events, and extensions. It provides an example case study of an ATM banking system and describes associated use cases. Finally, it discusses principles of modeling like abstraction and partitioning, as well as modeling techniques like object-oriented and functional modeling.
The document provides requirements for developing software for an automated teller machine (ATM) banking system. It outlines the key functions of the system, including validating customer cards and PINs, performing withdrawal, balance inquiry, and funds transfer transactions, and maintaining transaction records. Diagrams are included in appendices to illustrate use cases, system objects, and data flow. The software is intended to enable customers to securely conduct basic banking activities from distributed ATMs connected to a central server.
SRS 2 requiremenr engineering in computer.pptubaidullah75790
This document discusses quality attributes of requirements documents and software requirements specifications (SRS). It outlines what should be included in an SRS, such as functional and non-functional requirements, as well as what should not be included. The document then describes key quality attributes an SRS should have, such as being correct, unambiguous, complete, verifiable, consistent, understandable, modifiable, traced, traceable, design independent, annotated, concise and organized. Examples are provided for some attributes.
Requirements management is the process of managing changes to system requirements. This lecture discusses reasons for changing requirements and how to manage them. Requirements cannot be effectively managed without traceability between requirements and other project artifacts. Requirements will inevitably change for reasons like evolving stakeholder needs, environmental changes, and technical issues. There are both stable and volatile requirements, with volatile types including mutable, emergent, consequential, and compatibility requirements.
This document discusses requirements documents, also known as software requirements specifications (SRS). It explains that an SRS is used to formally communicate system requirements to stakeholders. The document outlines what an SRS typically includes, such as user requirements, system constraints, and technical definitions. It also describes common SRS sections based on the IEEE standard, such as an introduction, general description, and specific requirements. Finally, it notes that the structure of an SRS depends on factors like the system type and organizational practices.
3. School of Computing & Information
Complex Predictions
• Prediction models give us a prediction under the assumption that we know the inputs to the model with certainty
• However, in many cases we are not certain about the inputs
– Measurement errors
– Output from other models
– Inputs to the model are realized in the future
– …
4. Complex Predictions
• In other cases we do not even have a closed-form solution for the quantity we want to predict
– E.g., the probability of a team winning a sports league
• In all these cases point estimates cannot account for the randomness and uncertainty associated with the phenomenon being described
5. Monte Carlo Simulations
• Monte Carlo simulations embrace the randomness and uncertainty expressed through probabilities for the inputs
– Uncertainty propagation mechanism
• A Monte Carlo simulation – in its simplest form – is an iterative process of sampling from a probability distribution
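A minimal sketch of this idea, assuming a toy points model whose two inputs (pace and offensive efficiency, with made-up distributions) are uncertain; sampling the inputs repeatedly propagates their uncertainty to the output:

```python
import random

random.seed(42)

# Hypothetical model: predicted points = pace * efficiency / 100.
# The inputs are uncertain, so we sample them from assumed distributions
# and propagate that uncertainty to the model output.
def predicted_points(pace, efficiency):
    return pace * efficiency / 100.0

samples = []
for _ in range(10_000):
    pace = random.gauss(98.0, 3.0)         # possessions per game (assumed)
    efficiency = random.gauss(110.0, 5.0)  # points per 100 possessions (assumed)
    samples.append(predicted_points(pace, efficiency))

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
print(f"mean={mean:.1f}, std={var ** 0.5:.1f}")
```

Instead of a single point estimate, we get a whole distribution for the prediction.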
6. Discrete Event Simulation
• Many times the operation of a system can be abstracted to a series of discrete events taking place with some probability
– Special case of Monte Carlo simulations
• Each event changes the state of the system
• The dependencies of these events can be overly complicated, so a formal treatment might not be possible
– However, these discrete events can be simulated several times in order to obtain an estimate of the state probabilities
7. Discrete Event Simulation
• A sports tournament is a series of matchups
• The outcome of each matchup is associated with a given probability
• The “state of the league” (e.g., who gets into the playoffs, who gets a bye week in the playoffs, who wins the championship, etc.) could be estimated probabilistically by combining the probabilities of individual matchups
– Tedious
– Solution: discrete event simulation (DES)
8. Discrete Event Simulation
• The smallest unit of a DES is the event
• In the case of a tournament this event corresponds to a matchup
– The building block for the DES is simulating a single event
• The single event simulation resembles a coin flip
– Biased coin
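The single-event building block can be sketched as a biased coin flip; the 43% value below is just an illustrative win probability:

```python
import random

def simulate_matchup(p_win, rng=random):
    """Simulate a single matchup as a biased coin flip.

    p_win -- pre-computed probability that the team of interest wins.
    Returns True if that team wins this simulated game.
    """
    return rng.random() < p_win

# Flip the biased coin many times; the win fraction approaches p_win
random.seed(0)
wins = sum(simulate_matchup(0.43) for _ in range(100_000))
print(wins / 100_000)  # close to 0.43
```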
9. Discrete Event Simulation
Spurs rating: +6.4
Warriors rating: +11.2
Home edge: 3 (Spurs at home)
Projected point differential: 3 + 6.4 − 11.2 = −1.8
Projected Spurs win probability: 43%

10. Discrete Event Simulation
[Figure: the unit interval from 0 to 1 with a mark at 0.43; outcomes to the left of the mark count as a Spurs win, outcomes to the right as a Warriors win]
Imagine that you draw a line intersecting the interval vertically with your eyes closed
– Why with closed eyes?
If you repeat this process 1,000 times, how many times do you expect the Spurs to win?
11. Discrete Event Simulation
[Flowchart of the simulation loop:]
Simulate upcoming matchups → Update standings, playoff pairings, etc.
– Tournament not over? Simulate the next matchups
– Tournament over? Store the result
– Number of simulations not reached? Start a new simulated tournament
Final probabilities are computed from the stored results
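The loop above can be sketched for a hypothetical tiny round-robin league; the ratings and the rating-to-probability mapping are illustrative assumptions, not a fitted model:

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical league: every pair of teams meets once
ratings = {"A": 6.4, "B": 11.2, "C": 2.0, "D": -3.5}

def win_prob(r_a, r_b):
    # Assumed rating-to-probability mapping; a real one would be fitted
    return 1.0 / (1.0 + 10 ** (-(r_a - r_b) / 15.0))

def simulate_season():
    wins = Counter()
    teams = list(ratings)
    for i, a in enumerate(teams):
        for b in teams[i + 1:]:
            p = win_prob(ratings[a], ratings[b])
            wins[a if random.random() < p else b] += 1
    # The "state of the league" we store: the season champion
    # (ties broken by rating for simplicity)
    return max(wins, key=lambda t: (wins[t], ratings[t]))

n_sims = 10_000
titles = Counter(simulate_season() for _ in range(n_sims))
for team, count in titles.most_common():
    print(f"{team}: {count / n_sims:.3f}")
```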
12. Discrete Event Simulations
• How can we obtain the probability of each event (i.e., of each matchup)?
• We can use team ratings
– Team ratings provide an estimate for the final point difference of a matchup
– How do we translate this to a probability?
13. Discrete Event Simulations
• Hal Stern in his seminal work “On the Probability of Winning a Football Game” showed that the difference between the final point margin of a game and the point spread follows a normal distribution with mean 0 and standard deviation 13.86
– Stern’s study focused on Vegas’ point spread
– If one uses Vegas point spreads, then the probability that the favorite (by p points) wins is simply Φ(p / 13.86), where Φ is the cumulative distribution function of the standard normal distribution
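Stern's rule can be applied directly; here the standard normal CDF is computed through `math.erf` so no external library is needed:

```python
import math

def favorite_win_prob(spread, sigma=13.86):
    """Probability that the favorite (by `spread` points) wins, under
    Stern's result that margin - spread ~ N(0, sigma)."""
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))

print(favorite_win_prob(7.0))  # a touchdown favorite
```

A pick'em game (spread 0) comes out to exactly 0.5, as it should.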
14. Discrete Event Simulations
• We do not need to obtain Vegas’ spreads
– We can use our own regression-based rating method
– Will the difference between the point margin and our prediction follow N(0, 13.86)?
• Most probably not
• But we can examine this on past data
– Most probably it will still be a normal distribution, but with a different variance
15. Monte Carlo Simulations
• Sometimes we need to model a sequence of discrete events that are probabilistic in nature
– Maybe the best example is modeling the winner of a sports competition
• What we will need to know is (i) the probability of each event and (ii) the sequence of events
16. Final Four Simulation
• Let’s consider a simple discrete event simulation case – the Olympic Games Final Four
• The discrete events are the outcomes of each single game
– These probabilities can be obtained through specific predictive models
• We also know how teams are going to match up in the future
17. Final Four Simulation
• For each semifinal we flip a biased coin to decide the winner
– The bias is based on the pre-computed probabilities for each game
• Based on the outcome of our simulated semifinals we simulate the corresponding final
• We repeat the process several times and keep track of how many times each team won
18. Final Four Simulation
• An unbiased coin has a 50-50 chance of landing heads or tails
– How do we simulate a biased coin?
[Figure: the unit interval split into a segment of length π and a segment of length 1−π]
If we sample from a uniform random distribution between 0 and 1, the probability of getting a number in the interval [0, π] is exactly π (the length of that interval).
19. Final Four Simulation
            Gold   Silver   Bronze
Australia   0.21   0.21     0.29
France      0.22   0.32     0.18
Slovenia    0.18   0.28     0.29
USA         0.39   0.19     0.24
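The whole procedure can be sketched end to end. The pairwise win probabilities below are illustrative assumptions, not the model behind the table above, and only the gold-medal tally is tracked for brevity:

```python
import random
from collections import Counter

random.seed(7)

# Hypothetical pairwise win probabilities (probability the first team wins)
p_win = {
    ("USA", "Australia"): 0.65, ("France", "Slovenia"): 0.55,
    ("USA", "France"): 0.60, ("USA", "Slovenia"): 0.62,
    ("Australia", "France"): 0.48, ("Australia", "Slovenia"): 0.50,
}

def play(a, b):
    """Biased coin flip for one game: return (winner, loser)."""
    p = p_win[(a, b)] if (a, b) in p_win else 1.0 - p_win[(b, a)]
    return (a, b) if random.random() < p else (b, a)

gold, n_sims = Counter(), 20_000
for _ in range(n_sims):
    w1, l1 = play("USA", "Australia")    # semifinal 1
    w2, l2 = play("France", "Slovenia")  # semifinal 2
    champion, _ = play(w1, w2)           # final between semifinal winners
    gold[champion] += 1

for team, count in gold.most_common():
    print(f"{team}: gold in {count / n_sims:.2f} of simulations")
```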
20. Bootstrap
• Monte Carlo and DES are based on random sampling of known distributions for the parameters/variables of the system
– The simulations simply allow us to propagate this uncertainty to the output
• What if we want to identify the distribution of a sample estimate but we only have a sample of observations?
– E.g., assume that we are interested in the average points scored by the Celtics per game
21. Bootstrap
• We have a sample from the first 50 games of the season
• We could make an assumption for the distribution of the data and use Maximum Likelihood Estimates and the corresponding standard errors
– A normal distribution seems like an assumption one could readily make, since it has been the case in many other situations (possibly unjustified)
• A better option is to use the bootstrap
22. Bootstrap
• Estimate properties of an estimator through resampling with replacement
– Assumption: the observed data is a random sample of the original population
• Typically we have only one sample – of n points – observed for our variable of interest
– We can obtain a sample estimate (e.g., for the mean) but we cannot estimate the distribution of this estimator
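The resampling-with-replacement idea can be sketched as follows, using a hypothetical points-per-game sample (the lecture's actual Celtics data is not reproduced here):

```python
import random

random.seed(3)

# Hypothetical points-per-game observations
sample = [100, 110, 114, 110, 98, 102, 95, 108, 99, 104,
          97, 111, 103, 96, 106, 101, 109, 94, 105, 100]

def bootstrap_means(data, n_boot=5_000):
    """Resample `data` with replacement n_boot times and return the
    mean of each bootstrap sample."""
    n = len(data)
    return [sum(random.choices(data, k=n)) / n for _ in range(n_boot)]

means = bootstrap_means(sample)
point_estimate = sum(sample) / len(sample)
grand = sum(means) / len(means)
se = (sum((m - grand) ** 2 for m in means) / (len(means) - 1)) ** 0.5
print(f"sample mean = {point_estimate:.2f}, bootstrap SE = {se:.2f}")
```

The spread of `means` is an empirical estimate of the sampling distribution of the mean, obtained without any distributional assumption.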
23. Bootstrap Illustration
• Points scored by the Celtics during the first 50 games
– What is the average number of points scored?
• We can obtain a sample estimate (98.4 points)
• However, we do not know the distribution of this estimator
– Resampling with replacement will allow us to learn more about the estimator
25. Bootstrap Illustration
Original Sample
Drawing with replacement builds the first bootstrap sample one point at a time – {100}, then {100, 110}, then {100, 110, 114}, and so on – until it reaches the size of the original sample:
X1 = {100, 110, 114, 110, 98}   First Bootstrap Sample
…
XB = {98, 101, 95, 79, 100}   B-th Bootstrap Sample
30. Bootstrap Illustration
• Through the bootstrap we can identify the distribution of an estimator and use it for our simulations if needed
• For multidimensional data with correlations, block bootstrap can be used
31. Block Bootstrap
• Data might exhibit correlations
– Time series
– Spatial data
– Clustered data
– …
• Block bootstrap attempts to replicate the correlation structure in the bootstrapped samples
– Instead of resampling single data points, blocks of data are resampled
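A minimal sketch of the moving-block variant, assuming an illustrative autocorrelated series (an AR(1)-style process standing in for game-by-game scoring where form carries over):

```python
import random

random.seed(5)

# Hypothetical autocorrelated series, purely illustrative
series, prev = [], 100.0
for _ in range(60):
    prev = 100.0 + 0.6 * (prev - 100.0) + random.gauss(0, 4)
    series.append(prev)

def moving_block_bootstrap(data, block_len=5, rng=random):
    """Build one bootstrap replicate by concatenating randomly chosen
    contiguous blocks, preserving short-range correlation structure."""
    n = len(data)
    blocks_needed = -(-n // block_len)  # ceiling division
    out = []
    for _ in range(blocks_needed):
        start = rng.randrange(0, n - block_len + 1)
        out.extend(data[start:start + block_len])
    return out[:n]  # trim to the original sample size

replicate = moving_block_bootstrap(series)
print(len(replicate), sum(replicate) / len(replicate))
```

The block length is a tuning choice: long enough to capture the correlation, short enough to leave many distinct blocks to draw from.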
33. Biased Bootstrap
• Sampling in the bootstrap is uniform
– That is, every observation has the same probability of being picked
• Sometimes – depending on the application – we might want to resample the observations in a non-uniform way
34. Biased Bootstrap Example
• The points-scored-per-game sample includes games against opponents of variable strength
• However, 100 points scored against the top defensive team is not the same as 100 points scored against the bottom defensive team
• If we want to estimate the distribution of the average number of points the Celtics score against teams similar to their next opponent, we should use a biased bootstrap
35. Biased Bootstrap Example
• Let’s assume that the Celtics’ next opponent has a defensive rating of −4 points (i.e., they allow 4 points less than an average defense)
• How can we use this information to get an estimate for the average points to be scored by the Celtics?
• Biased bootstrap based on the defensive rating
37. Biased Bootstrap Example
• Performances against teams with a defensive rating similar to our next opponent’s will be sampled more aggressively
• Obviously one can use more than one variable to calculate the bias term
– E.g., for simulating future matchups one might need to control for both offensive and defensive ratings, home-vs-away games, etc.
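One way to sketch this: weight each observation by the similarity of its opponent's defensive rating to the next opponent's. The data, the Gaussian kernel, and its bandwidth are all illustrative assumptions; any monotone similarity weighting would follow the same pattern:

```python
import math
import random

random.seed(11)

# Hypothetical (points scored, opponent defensive rating) pairs;
# a negative rating means a stronger defense
games = [(112, 3.0), (98, -5.0), (105, 0.0), (95, -6.5), (118, 6.0),
         (101, -3.0), (108, 2.0), (99, -4.5), (115, 5.0), (103, -1.0)]

next_opp_rating = -4.0  # the upcoming opponent allows 4 points below average
bandwidth = 2.0         # assumed kernel width; a tuning choice

# Gaussian-kernel weights: games against similar defenses get sampled more
weights = [math.exp(-((r - next_opp_rating) ** 2) / (2 * bandwidth ** 2))
           for _, r in games]

def biased_bootstrap_mean(n_boot=5_000):
    means = []
    for _ in range(n_boot):
        resample = random.choices(games, weights=weights, k=len(games))
        means.append(sum(p for p, _ in resample) / len(resample))
    return sum(means) / len(means)

uniform_mean = sum(p for p, _ in games) / len(games)
biased_mean = biased_bootstrap_mean()
print(f"uniform mean = {uniform_mean:.1f}, biased mean = {biased_mean:.1f}")
```

Because strong-defense games dominate the weights here, the biased estimate lands below the uniform average, as expected.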
38. Why does bootstrap work?
• The bootstrap almost looks like magic!
• The way traditional inferential statistics works is that we have a population and we randomly sample a set of points to infer the statistic of interest
• Ideally we would take several samples from the population and calculate the statistic of interest for each sample
– Estimate the variability of the statistic
39. Why does bootstrap work?
• Getting several samples from the population is not practical/realistic
• Solution 1 (inferential statistics): make assumptions about the shape of the population
• Solution 2 (bootstrap statistics): use the information from the (single) population sample that you have
– The sample that we have is a (smaller) population itself, with the same shape as the original population
40. Why does bootstrap work?
• In this case resampling with replacement simulates the generation of multiple samples from the original population
– Replacing the sampled data points retains the shape of the original population
• The sample we have is the best information – and in fact the only information – we have about the population, and the bootstrap takes maximum advantage of it
41. Bootstrap and Sample Size
• The only assumption that the bootstrap method makes is that the sample is representative of the population
– Therefore, if we have a very small sample (e.g., 4 points) then the bootstrap method itself will be limited
– It can still be applied, but the corresponding estimates might be far from the true mean
• The same, though, will be true for the sample mean obtained from the small sample as well
42. Bootstrap and Sample Size
• There is no restriction on the size of the bootstrap samples
– Typically we choose them to have the same size as the original sample
• With regards to the number of bootstrap samples to obtain, this is in some sense similar to the number of Monte Carlo simulations
– Rule of thumb: the more the better
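The effect of the number of bootstrap samples can be explored with a short experiment on a hypothetical sample: as B grows, the standard-error estimate settles down.

```python
import random

random.seed(21)

sample = [100, 110, 114, 110, 98, 102, 95, 108, 99, 104]  # hypothetical data

def bootstrap_se(data, n_boot):
    """Standard error of the mean estimated from n_boot bootstrap resamples."""
    n = len(data)
    means = [sum(random.choices(data, k=n)) / n for _ in range(n_boot)]
    grand = sum(means) / n_boot
    return (sum((m - grand) ** 2 for m in means) / (n_boot - 1)) ** 0.5

# Re-running with a small B gives noticeably different answers each time;
# with a large B the estimate is stable
for n_boot in (100, 1_000, 10_000):
    print(n_boot, round(bootstrap_se(sample, n_boot), 3))
```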