Maximum likelihood estimation (MLE) is a technique for estimating parameters in a probabilistic model based on observed data. MLE finds the parameter values that maximize the likelihood function, or the probability of obtaining the observed data given the parameters. This involves writing the log likelihood function, taking its derivative with respect to the parameters, and solving for the parameter values that set the derivative to zero. MLE was demonstrated for Bernoulli, normal, Poisson, and Markov chain models using both theoretical examples and observed data. In practice, MLE provides a principled approach for learning probability distributions from samples.
2. Overview
• Our recipe for learning
– Step 0: form a scientific hypothesis
– Step 1: collect data (samples × features)
– Step 2: pick the appropriate probability distribution, with parameter θ
– Step 3: estimate the parameter θ
3. Coin Flip Example
• Our recipe for learning
– Step 0: I got a new coin. What is the probability of head?
– Step 1: collect data (samples × feature: the outcome of each flip)
– Step 2: pick the appropriate probability distribution, with parameter θ
– Step 3: estimate the parameter θ
4. Weight Example
• Our recipe for learning
– Step 0: What is the distribution of weights of people in Pittsburgh?
– Step 1: collect data (samples × feature: each person's weight)
– Step 2: pick the appropriate probability distribution, with parameter θ
– Step 3: estimate the parameter θ
6. Statistical Inference, Learning: a Formal Set Up
• Given a sample X1,...,Xn ∼ F, how do we learn our probability model?
– Statistical model F: a set of distributions
– Parametric model: an F that can be parameterized by a finite number of parameters,
F = { p(x | θ) : θ ∈ Θ },
where θ = (θ1, ..., θk) ∈ Θ is the vector of unknown parameters.
7. Parametric Models: Coin Flip
Ex) Probability of head p of a coin
Bernoulli distribution
Parameter: θ = p ∈ [0, 1], the probability of head
Parametric model: F = { P(X = x | p) = p^x (1 − p)^(1−x), x ∈ {0, 1} : p ∈ [0, 1] }
8. Parametric Models: Gene Expression
Ex) Learn the probability distribution of the expression levels of gene A
Normal distribution
Parameters: θ = (µ, σ²), the mean and variance
Parametric model: F = { p(x | µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) : µ ∈ ℝ, σ² > 0 }
9. Which Parametric Model?
• Discrete or continuous?
• Univariate or multivariate features?
• Prior knowledge
ex) Count data in intervals: Poisson distribution
ex) real-valued, bell-shaped: Normal distribution
• Evidence from exploratory analysis of data
ex) relationship between mean and variance: Poisson vs negative binomial distribution
10. Overview
• Our recipe for learning
– Step 0: form a scientific hypothesis
– Step 1: collect data (samples × features)
– Step 2: pick the appropriate probability distribution, with parameter θ
– Step 3: estimate the parameter θ
11. Maximum Likelihood Estimation
• The most commonly used method for parametric estimation
• MLE for probability distributions we have seen so far
• MLE for complex state-of-the-art probability models
12. Parametric Models Meet Data
• Ingredient 1: a choice of parametric model F for a random variable X ~ F
• Ingredient 2: data D of n samples, each modeled as a random variable: X1, ..., Xn
• Recipe: Maximum likelihood estimation (MLE)
– Pick a member in the set F that maximizes the “likelihood of data D”
14. Formally, Parametric Estimation, Point Estimation
• Select f ∈ F that best describes data
• Select parameter 𝜃 that best describes data
• Inference: estimate 𝜃 given data
• Maximum likelihood estimate: the parameter 𝜃 estimated from data using MLE
15. MLE as an Optimization Problem
• How can we select parameter 𝜃 that best describes data?
• Maximum likelihood estimation as an optimization
– Likelihood: Score function, for scoring how well a candidate parameter 𝜃 describes data
– Maximize the likelihood: find the parameter 𝜃 that maximizes the score function
16. “Likelihood of Data”
• Assume X1,...,Xn are i.i.d. random variables, representing samples from a distribution
P(X|𝜃) in F. Then, the “likelihood of data” is defined as the probability of data D
P(D | θ) = P(X1, ..., Xn | θ) = ∏_{i=1}^n P(Xi | θ)
17. “Likelihood of Data”
• Assume X1,...,Xn are i.i.d. random variables, representing samples from a distribution
P(X|𝜃) in F.
• The likelihood function, also called the likelihood of data, is given as
L_n(θ) = ∏_{i=1}^n P(Xi | θ)
• L_n(θ) is a function of the parameter θ, given the data X1, ..., Xn
18. “Log Likelihood of Data”
• The log likelihood function is
l_n(θ) = log L_n(θ) = ∑_{i=1}^n log P(Xi | θ)
• Why log likelihood instead of likelihood? The log turns the product into a sum, which is easier to differentiate, and it avoids numerical underflow when multiplying many small probabilities (see the sketch below).
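To see the underflow point concretely, here is a minimal sketch in Python (the sample size and parameter value are made up for illustration, not from the slides): the raw likelihood of a few thousand Bernoulli samples rounds to exactly 0.0 in double precision, while the log likelihood stays finite.

```python
import numpy as np

# With n = 2000 Bernoulli(0.6) samples, each likelihood term is 0.6 or 0.4,
# so the raw product underflows double precision, but the sum of logs is fine.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.6, size=2000)
p = 0.6
likelihood = np.prod(np.where(x == 1, p, 1 - p))
log_likelihood = np.sum(np.where(x == 1, np.log(p), np.log(1 - p)))
print(likelihood)      # 0.0 -- underflow: the true value is around 1e-585
print(log_likelihood)  # a finite number around -1346 (varies with the sample)
```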
19. MLE as an Optimization Problem
• How can we select parameter 𝜃 that best describes data?
• Maximum likelihood estimation as an optimization
– Likelihood: Score function, for scoring how well a candidate parameter 𝜃 describes data
– Maximize the likelihood: find the parameter 𝜽 that maximizes the score function
20. Maximum Likelihood Estimation
• Maximum likelihood estimator
θ̂_n = argmax_θ L_n(θ) = argmax_θ l_n(θ)
• The two maximizers coincide because log is a monotonically increasing (and concave) function; monotonicity is what makes maximizing L_n and l_n equivalent.
21. MLE as an Optimization Problem
• How can we select parameter 𝜃 that best describes data?
• Maximum likelihood estimation as an optimization
– Likelihood: Score function, for scoring how well a candidate parameter 𝜃 describes data
– Maximize the likelihood: find the parameter 𝜽 that maximizes the score function
23. MLE, Examples
• Let's perform MLE
– Bernoulli distribution
• You have a coin with unknown probability of head, p
• You flip the coin 10 times and get H, H, T, T, H, T, T, H, T, H
• What is your estimate of the probability of head p?
– Normal distribution
• You have a gene, gene A, whose mean expression level and variance are unknown
• You collect the expression measurements of gene A for 7 individuals
• How would you estimate the mean and variance?
24. MLE: Bernoulli
• Let X1,...,Xn ∼ Bernoulli(p). Find a maximum likelihood estimate of p.
• Step 1: Write down the log likelihood of data
L_n(p) = ∏_{i=1}^n p^{Xi} (1 − p)^(1−Xi)
l_n(p) = (∑_i Xi) log p + (n − ∑_i Xi) log(1 − p)
• Steps 2–3: Differentiate, set to zero, and solve: dl_n(p)/dp = (∑_i Xi)/p − (n − ∑_i Xi)/(1 − p) = 0 gives p̂ = (1/n) ∑_i Xi, the sample proportion of heads (a symbolic check follows below).
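As a quick symbolic check of this derivation, a minimal sketch using sympy (writing s as shorthand for the number of heads ∑_i Xi; the symbol names are illustrative, not notation from the slides):

```python
import sympy as sp

# Differentiate the Bernoulli log likelihood l_n(p) = s*log(p) + (n-s)*log(1-p),
# set the derivative to zero, and solve for p.
p, n, s = sp.symbols("p n s", positive=True)
log_lik = s * sp.log(p) + (n - s) * sp.log(1 - p)
print(sp.solve(sp.Eq(sp.diff(log_lik, p), 0), p))  # [s/n]: proportion of heads
```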
26. MLE: Bernoulli Distribution with Observed Data
• Let X ∼ Bernoulli(p).
• Flipped the coin 10 times and got
H, H, T, T, T, H, H, T, T, H
• Step 1: Write down the log likelihood of data (5 heads and 5 tails)
L(p; D) = p^5 (1 − p)^5
l(p; D) = 5 log p + 5 log(1 − p)
• Maximizing gives p̂ = 5/10 = 0.5, as computed in the sketch below.
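A minimal sketch of the same computation in Python; the closed-form MLE is just the proportion of heads among the observed flips:

```python
# Encode the observed flips (H = 1, T = 0) and compute the Bernoulli MLE.
flips = ["H", "H", "T", "T", "T", "H", "H", "T", "T", "H"]
x = [1 if f == "H" else 0 for f in flips]
p_hat = sum(x) / len(x)  # sample proportion of heads
print(p_hat)             # 0.5
```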
28. MLE: Normal Distribution
• Let X1,...,Xn ∼ N(µ, σ²). Find a maximum likelihood estimate of µ, σ².
• Step 1: Write down the log likelihood of data
L_n(µ, σ²) = ∏_{i=1}^n (1/√(2πσ²)) exp(−(Xi − µ)²/(2σ²))
l_n(µ, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (Xi − µ)²
• Steps 2–3: Setting the partial derivatives to zero gives µ̂ = (1/n) ∑_i Xi and σ̂² = (1/n) ∑_i (Xi − µ̂)².
31. MLE: Normal Distribution with Observations
• Let X ∼ N(µ, σ²). Find a maximum likelihood estimate of µ, σ².
• Data with 10 samples
10, 12, 9, 14, 8, 11, 7, 6, 10.5, 12.5
• Step 1: Write down the log likelihood of data (n = 10)
L(µ, σ²; D) = ∏_{i=1}^{10} (1/√(2πσ²)) exp(−(Xi − µ)²/(2σ²))
l(µ, σ²; D) = −5 log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{10} (Xi − µ)²
• Plugging in the data: µ̂ = 100/10 = 10 and σ̂² = 57.5/10 = 5.75 (verified in the sketch below).
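A minimal sketch of the computation in Python; note that the MLE of σ² divides by n, not by n − 1:

```python
import numpy as np

# Normal MLE on the 10 observations: sample mean and (biased, 1/n) variance.
data = np.array([10, 12, 9, 14, 8, 11, 7, 6, 10.5, 12.5])
mu_hat = data.mean()                        # MLE of µ
sigma2_hat = ((data - mu_hat) ** 2).mean()  # MLE of σ², divides by n
print(mu_hat, sigma2_hat)                   # 10.0 5.75
```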
32. MLE: Poisson Distribution
• Let X1,...,Xn ∼ Poisson(λ). Find a maximum likelihood estimate of λ.
• Step 1: Write down the log likelihood of data
L_n(λ) = ∏_{i=1}^n λ^{Xi} e^{−λ} / Xi!
l_n(λ) = (∑_i Xi) log λ − nλ − ∑_i log(Xi!)
• Steps 2–3: Setting dl_n(λ)/dλ = (∑_i Xi)/λ − n = 0 gives λ̂ = (1/n) ∑_i Xi, the sample mean (a numerical check follows below).
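Since the closed form is just the sample mean, a numerical optimizer applied to the negative log likelihood should agree. A minimal sketch with made-up count data (the counts, bounds, and helper name are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

counts = np.array([3, 0, 2, 1, 4, 2, 2, 1])  # hypothetical interval counts

def neg_log_lik(lam):
    # -l_n(λ) = -(Σ Xi) log λ + nλ + Σ log(Xi!), with log(Xi!) = gammaln(Xi + 1)
    return -(counts.sum() * np.log(lam)
             - len(counts) * lam
             - gammaln(counts + 1).sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(res.x, counts.mean())  # both approximately 1.875
```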
34. Markov Model
• Joint distribution of all binary random variables X1, ..., XT factorizes as
P(X1, ..., XT) = P(X1) ∏_{k=2}^T P(Xk | Xk−1)
• A Markov model is defined by
– P(X1): the initial distribution
– P(Xk | Xk−1): the transition probabilities, identical for k = 2, ..., T
35. MLE: Markov Chain
• Let X1,...,XT follow a Markov chain. Find a maximum likelihood estimate of the Markov chain parameters θ.
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
36. MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(X1)? The three sequences start with 1, 1, 0, so P̂(X1 = 1) = 2/3 and P̂(X1 = 0) = 1/3.
37. MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(Xk | Xk−1 = 0)? The sequences contain 6 transitions out of state 0, of which 4 go to 1, so P̂(Xk = 1 | Xk−1 = 0) = 4/6 = 2/3.
38. MLE: Markov Chain, Intuition
• Assume three sequences of observations D
1, 0, 0, 1, 1
1, 0, 1, 1, 0
0, 1, 0, 0, 1
• What is your estimate of P(Xk | Xk−1 = 1)? The sequences contain 6 transitions out of state 1, of which 2 go to 1, so P̂(Xk = 1 | Xk−1 = 1) = 2/6 = 1/3.
39. MLE: Markov Chain
• Let X1,...,XT follow a Markov chain. Find a maximum likelihood estimate of the Markov chain parameters θ.
• Assume three sequences of observations D: X1^s, ..., XT^s for s = 1, 2, 3 (superscripts index the sequences)
40. MLE: Markov Chain
• Step 1: Write down the log likelihood of data
L(θ) = ∏_{s=1}^3 [ P(X1^s) ∏_{k=2}^T P(Xk^s | Xk−1^s) ]
l(θ) = ∑_{s=1}^3 [ log P(X1^s) + ∑_{k=2}^T log P(Xk^s | Xk−1^s) ]
• Maximizing gives the count-based estimates from the intuition slides: normalized counts of initial states and of transitions (see the sketch below).
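A minimal sketch of the Markov-chain MLE on the three observed sequences; the estimates are normalized counts, matching the intuition slides:

```python
import numpy as np

# Markov-chain MLE: normalize initial-state counts for P(X1) and
# transition counts for P(Xk | Xk-1).
sequences = [
    [1, 0, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]

init_counts = np.zeros(2)
trans_counts = np.zeros((2, 2))  # trans_counts[a, b] = #(a -> b)
for seq in sequences:
    init_counts[seq[0]] += 1
    for prev, cur in zip(seq, seq[1:]):
        trans_counts[prev, cur] += 1

p_init = init_counts / init_counts.sum()
p_trans = trans_counts / trans_counts.sum(axis=1, keepdims=True)
print(p_init)   # [1/3, 2/3]
print(p_trans)  # P(next | prev=0) = [1/3, 2/3], P(next | prev=1) = [2/3, 1/3]
```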
43. MLE in Practice
• Follow the recipe and work out the MLE on paper, given a probability model
• Write a program to compute a maximum likelihood estimate of the parameters from data
– Load data into memory
– For the normal distribution, compute µ̂ = sample mean, σ̂² = sample variance (dividing by n)
– For the Bernoulli distribution, compute p̂ = proportion of successes
• Simple but important learning principle!
44. Maximum Likelihood Estimation as an Optimization Problem
• To find an MLE of the parameter 𝜃, we solve the following optimization problem
θ̂_n = argmax_θ l_n(θ)
• In our examples for the Bernoulli and univariate/multivariate normal distributions
– l_n(θ) is a concave function: a single global maximum
– a closed-form solution exists
45. Maximum Likelihood Estimation as an Optimization Problem
• For some models, MLE is easy
• In general, for more complex probability models, performing MLE is not always easy
– l_n(θ) is a non-concave function with multiple local maxima
– a closed-form solution does not exist; we need to rely on iterative optimization methods
46. Other Probability Models and MLE: Neural Networks
• Deep neural nets for modeling P(Y|X)
• Optimization criterion for learning the model: MLE!
• No closed form solution for the parameter estimates:
– Rely on iterative optimization methods (a minimal sketch follows below)
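As an illustration (a minimal sketch with synthetic data, not the deck's example): logistic regression, the simplest network-style model of P(Y | X), fit by gradient ascent on the log likelihood, since no closed-form maximizer exists.

```python
import numpy as np

# Fit P(Y=1|X) = sigmoid(X·w) by gradient ascent on the log likelihood l(w).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + 0.3 * rng.normal(size=200) > 0).astype(float)

w = np.zeros(2)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted P(Y=1 | X, w)
    w += 0.01 * X.T @ (y - p)         # gradient of l(w) is X^T (y - p)
print(w)  # points in the direction of the true weights [2, -1]
```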
47. Summary
• Maximum likelihood estimation is the most commonly used technique for learning a parametric probability model from data
• MLE, recipe
– Write down the log likelihood
– Differentiate the log likelihood with respect to the parameters
– Set the above to zero and solve for the parameters
• MLE for
– Bernoulli distribution
– Univariate/multivariate normal distributions
– Poisson distribution
– Markov chains