This document contains lecture notes for a pattern recognition course taught by Dr. Mostafa Gadal-Haqq at Ain Shams University. The notes cover mathematical foundations of pattern recognition including probability theory, statistics, and mathematical notations. Specifically, the notes define concepts like random variables, probability distributions, expected values, variance, and conditional probability. They also provide examples of applying these concepts to problems involving events, outcomes, and data modeling. The document concludes by noting that the next lecture will cover Bayesian decision theory.
CSC446: Pattern Recognition (LN3)
1. CSC446 : Pattern Recognition
Prof. Dr. Mostafa G. M. Mostafa
Faculty of Computer & Information Sciences
Computer Science Department
AIN SHAMS UNIVERSITY
Lecture Note 3:
Mathematical Foundations
Appendix, Pattern Classification and PRML
2. CSC446 : Pattern Recognition
Readings: Chapter 1 in Bishop’s PRML
Data Modeling (Regression)
3. Learning: Data Modeling
• Assume we have examples of pairs (x, y) and we want to learn the mapping F: X → Y to predict y for future values of x.
• Example target function: \( y(x) = \sin(2\pi x) \)
4. Polynomial Curve Fitting
• Problem: There exist many possible mapping functions F: X → Y. Which one should we choose?
• We could choose the one that minimizes the error:
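The error function itself is not reproduced in this transcript. A minimal sketch of the standard choice, the sum-of-squares error from Chapter 1 of Bishop's PRML, written in this document's (x, y) notation:

\( E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( F(x_n; \mathbf{w}) - y_n \right)^2 \)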
5. Polynomial Curve Fitting
• Fitting different polynomials (models) to the data:
\( y(x) = w_0 \qquad\qquad y(x) = w_0 + w_1 x \)
6. Polynomial Curve Fitting
• Fitting different polynomials (models) to the data:
\( y(x) = w_0 + w_1 x + w_2 x^2 \qquad\qquad y(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_8 x^8 \)
7. Overfitting
• At M = 9, we get zero training error, but the highest testing error.
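A minimal Python sketch of this experiment (the noise level, sample sizes, and random seed are illustrative assumptions, not from the slides): fitting polynomials of increasing order M to noisy samples of y(x) = sin(2πx) and comparing training error against error on held-out data reproduces the overfitting behaviour described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.2):
    """Noisy samples of the target function y(x) = sin(2*pi*x)."""
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, noise, size=n)
    return x, y

x_train, y_train = make_data(10)    # small training set, as in the slides
x_test, y_test = make_data(100)     # held-out data exposes overfitting

for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, y_train, deg=M)   # least-squares polynomial fit
    rmse_train = np.sqrt(np.mean((np.polyval(w, x_train) - y_train) ** 2))
    rmse_test = np.sqrt(np.mean((np.polyval(w, x_test) - y_test) ** 2))
    print(f"M={M}: train RMSE = {rmse_train:.3f}, test RMSE = {rmse_test:.3f}")
```

With 10 training points, the M = 9 fit passes through nearly every sample (training error close to zero) while its test error blows up, matching the slide.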
8. Effect of Data Size
• As the number of data samples N increases, we get closer to the real data model, even with a higher-order polynomial (here M = 9 in both cases).
9. Performance Evaluation
• Generalization error is the true error over the population of examples we would like to optimize.
– The sample mean only approximates it.
• Two ways to assess the generalization error:
• Theoretical: the Law of Large Numbers
– gives statistical bounds on the difference between the true and sample mean errors.
• Practical: use a separate data set with m data samples to test the model:
(Mean) test error \( = \frac{1}{m} \sum_{i=1}^{m} \left( F(x_i; \mathbf{w}) - y_i \right)^2 \)
10. Assignment 1
1. Derive an equation for estimating the parameters w from the sample data for the cases M = 1 and M = 2.
2. Use such equations to draw a relation between w and E(w) for each M. Use the estimated values of w as the middle values of the w range.
11. CSC446 : Pattern Recognition
Readings: Appendix A
Probability & Statistics
12. 1- Probability Theory
• Randomness:
– We call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.
• Probability:
– The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions.
– Probability is the long-term relative frequency.
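As a quick illustration of probability as long-term relative frequency, here is a minimal simulation sketch (the seed and sample sizes are arbitrary choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=100_000)   # fair coin: 0 = tails, 1 = heads

# The relative frequency of heads settles near 0.5 as the number of
# repetitions grows.
for n in (10, 100, 1_000, 100_000):
    print(f"n = {n:>6}: relative frequency of heads = {flips[:n].mean():.4f}")
```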
13. 1- Probability Theory
• Discrete random variables:
– Let \( x \in X \), where the sample space \( X = \{v_1, v_2, \ldots, v_m\} \).
– We denote by \( p_i \) the probability that \( x = v_i \):
\( p_i = \Pr\{ x = v_i \}, \quad i = 1, \ldots, m, \)
• where the \( p_i \) must satisfy the following two conditions:
\( p_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{m} p_i = 1 \)
14. 1- Probability Theory
• Equally likely outcomes:
“Equally likely outcomes are outcomes that
have the same probability of occurring.”
• Examples:
– Rolling a fair die
– Tossing a fair coin
• P(x) is a “Uniform Distribution”
15. 1- Probability Theory
• Equally likely outcomes:
• If we have ten identical balls numbered from 0 to 9 in a box, find the probability of randomly drawing a ball with a number divisible by 3:
– The event space (desired outcomes): A = {3, 6, 9}.
– The sample space (possible outcomes): S = {0, 1, 2, . . . , 9}.
• Since the drawing is at random, each outcome is equally likely to occur, i.e.: P(0) = P(1) = P(2) = … = P(9) = 1/10.
• P(A) = {number of outcomes in A} / {number of outcomes in S} = 3/10 = 0.3
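A one-line check of this example by enumeration (illustrative only; the event set is copied from the slide):

```python
S = range(10)           # sample space: balls numbered 0..9
A = [3, 6, 9]           # event: number divisible by 3, as on the slide
print(len(A) / len(S))  # equally likely outcomes -> P(A) = 0.3
```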
16. 1- Probability Theory
• Biased outcomes (non-uniform dist.):
“Biased outcomes are outcomes that have different probabilities of occurring.”
• Examples:
– Rolling an unfair die
– Tossing an unfair coin
• P(x) is a “Non-uniform Distribution”
17. 1- Probability Theory
• Biased outcomes (non-uniform dist.):
• A biased coin, twice as likely to come up tails as heads, is tossed twice:
– What is the probability that at least one head occurs?
• Solution:
– Sample space = {HH, HT, TH, TT}
– P(H = head) = 1/3, P(T = tail) = 2/3
– Sample-point probabilities for the event:
• P(HT) = 1/3 × 2/3 = 2/9    P(HH) = 1/3 × 1/3 = 1/9
• P(TH) = 2/3 × 1/3 = 2/9    P(TT) = 2/3 × 2/3 = 4/9
– Answer: P(HH) + P(HT) + P(TH) = 1/9 + 2/9 + 2/9 = 5/9 ≈ 0.56
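A minimal sketch verifying this result by enumerating the four outcomes (the variable names are illustrative):

```python
from itertools import product

p = {"H": 1/3, "T": 2/3}                  # biased coin: tails twice as likely
total = 0.0
for toss in product("HT", repeat=2):      # HH, HT, TH, TT
    if "H" in toss:                       # event: at least one head
        total += p[toss[0]] * p[toss[1]]  # the two tosses are independent
print(total)                              # 0.555... = 5/9
```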
18. 1- Probability Theory
• Probability and Language
• What's the probability of a random word (from a random dictionary page) being a verb?
\( P(\text{drawing a verb}) = \frac{\#\,\text{of ways to get a verb}}{\text{all words}} \)
• Solution:
• All words: just count all the words in the dictionary.
• # of ways to get a verb: the number of words which are verbs!
• If a dictionary has 50,000 entries, and 10,000 are verbs, then:
• P(Verb) = 10,000/50,000 = 1/5 = 0.20
19. 1- Probability Theory
• Conditional Probability
– A way to reason about the outcome of an experiment based on partial information:
• In a word-guessing game, the first letter of the word is a “t”. How likely is it that the second letter is an “h”?
• How likely is it that a person has a disease, given that a medical test was negative?
• A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?
• I saw your friend; how likely is it that I will see you?
20. 1- Probability Theory
• Conditional Probability
• Let A and B be events.
• p(B|A) = the probability of event B occurring given event A occurs.
• Definition:
\( P(A \mid B) = \frac{P(A, B)}{P(B)} \)
Note: \( P(A, B) = P(A \mid B) \cdot P(B) \)
Also: \( P(A, B) = P(B, A) \)
21. 1- Probability Theory
• Conditional Probability
• One of the following 30 items is chosen at random.
• What is P(X), the probability that it is an X?
• What is P(X|red), the probability that it is an X given that it
is red?
22. 1- Probability Theory
• Statistically Independent events
– Variables x and y are said to be statistically independent if and only if:
\( P(x, y) = P(x)\, P(y) \)
– That is, knowing the value of x does not give us any additional knowledge about the possible value of y.
23. 1- Probability Theory
• Marginal Probability
• Conditional Probability
• Joint Probability
25. 1- Probability Theory
• The Rules of Probability:
• Sum Rule: \( p(X) = \sum_{Y} p(X, Y) \)
• Product Rule: \( p(X, Y) = p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y) \)
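A minimal numeric sketch of these rules (the joint table below is made up purely for illustration):

```python
import numpy as np

# Made-up joint distribution P(X, Y) with X in {0, 1, 2} (rows), Y in {0, 1} (cols).
P_xy = np.array([[0.10, 0.20],
                 [0.25, 0.15],
                 [0.05, 0.25]])
assert np.isclose(P_xy.sum(), 1.0)    # a valid joint distribution

P_x = P_xy.sum(axis=1)                # sum rule: P(X) = sum_Y P(X, Y)
P_y_given_x = P_xy / P_x[:, None]     # conditional: P(Y|X) = P(X, Y) / P(X)

# Product rule: P(X, Y) = P(Y|X) P(X), recovered exactly.
assert np.allclose(P_y_given_x * P_x[:, None], P_xy)
print("P(X) =", P_x)
```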
26. 1- Probability Theory
• Bayes' Theorem:
\( p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)} \)
where
\( p(X) = \sum_{Y} p(X \mid Y)\, p(Y) \)
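A small worked example applying Bayes' theorem to the medical-test question raised on the conditional-probability slide (all numbers below are assumed purely for illustration):

```python
# Worked Bayes'-rule example for the earlier medical-test question.
p_disease = 0.01              # assumed prior P(D): 1% prevalence
p_neg_given_disease = 0.05    # assumed P(neg | D): the test misses 5% of cases
p_neg_given_healthy = 0.90    # assumed P(neg | not D)

# Denominator via the sum rule: p(neg) summed over the two health states.
p_neg = (p_neg_given_disease * p_disease
         + p_neg_given_healthy * (1.0 - p_disease))

# Posterior via Bayes' theorem.
p_disease_given_neg = p_neg_given_disease * p_disease / p_neg
print(f"P(disease | negative test) = {p_disease_given_neg:.5f}")  # ~0.00056
```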
27. 1- Probability Theory
• Probability mass function, P(x):
\( P(x) \ge 0 \quad \text{and} \quad \sum_{x \in X} P(x) = 1 \)
– The cumulative distribution of a density p(x) is:
\( P(x \le z) = \int_{-\infty}^{z} p(x)\, dx \)
28. 2- Statistics
• Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data.
• The best way of looking at data is to draw its histogram (frequency distribution).
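A minimal sketch of drawing a histogram in Python (the data here is simulated, purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # simulated measurements

plt.hist(data, bins=30, edgecolor="black")        # frequency distribution
plt.xlabel("value")
plt.ylabel("frequency")
plt.show()
```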
29. 2- Statistics
• Univariate Gaussian/Normal Density:
– A density that is analytically tractable
– Continuous density
– A lot of processes are asymptotically Gaussian
\( p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \qquad \int_{-\infty}^{\infty} p(x)\, dx = 1 \)
where:
μ = mean (or expected value) of x
σ² = squared deviation, or variance
30. 2- Statistics
• Univariate Gaussian/Normal Density
p(u) ~ N(0,1)
31. 2- Statistics
• Multivariate Normal Density
– The multivariate normal density in d dimensions is:
\( p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t\, \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right] \)
where:
x = (x1, x2, …, xd)t = the multivariate random variable
μ = (μ1, μ2, …, μd)t = the mean vector
Σ = the d×d covariance matrix; |Σ| and Σ⁻¹ are its determinant and inverse, respectively.
32. 2- Statistics
• Multivariate Density: Statistically Independent
– If xi and xj are statistically independent, then σij = 0.
– In this case, p(x) reduces to the product of the univariate normal densities for the components of x. That is, if p(xi) ~ N(xi | µi, σi):
\( p(\mathbf{x}) = p(x_1, x_2, \ldots, x_d) = p(x_1)\, p(x_2) \cdots p(x_d) = \prod_{i=1}^{d} p(x_i) \)
33. 2- Statistics
• Multivariate Normal Density
– From the multivariate normal density, the loci of points of constant density are hyperellipsoids for which the quadratic form (x−µ)t Σ⁻¹(x−µ) is constant.
– The quantity
\( r^2 = (\mathbf{x} - \boldsymbol{\mu})^t\, \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \)
is sometimes called the squared Mahalanobis distance from x to µ.
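A minimal sketch evaluating the multivariate normal density and the squared Mahalanobis distance in d = 2 dimensions (the mean, covariance, and query point are made up for illustration):

```python
import numpy as np

mu = np.array([1.0, 2.0])                  # assumed mean vector
Sigma = np.array([[2.0, 0.5],              # assumed covariance matrix
                  [0.5, 1.0]])
x = np.array([2.0, 1.0])                   # query point

Sigma_inv = np.linalg.inv(Sigma)
diff = x - mu
r2 = diff @ Sigma_inv @ diff               # squared Mahalanobis distance

d = len(mu)
norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5)
p_x = norm_const * np.exp(-0.5 * r2)       # multivariate normal density

print(f"r^2 = {r2:.4f}, p(x) = {p_x:.6f}")
```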
35. 2- Statistics
Expected values:
• The expected value, mean, or average of the random variable x is defined by:
\( \mu = E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx \)
• If f(x) is any function of x, the expected value of f is defined by:
\( E[f(x)] = \int_{-\infty}^{\infty} f(x)\, p(x)\, dx \)
36. 2- Statistics
Expected values:
• The second moment of x is defined by:
\( E[x^2] = \int_{-\infty}^{\infty} x^2\, p(x)\, dx \)
• The variance of x is defined by:
\( \sigma^2 = E[(x - \mu)^2] = E[x^2] - \mu^2 \)
where σ is the standard deviation of x.
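A minimal closing sketch: sample estimates of the expected value and variance converge to the true parameters of the generating Gaussian (the parameters and sample size below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=100_000)

mean_est = x.mean()                      # estimates E[x] = mu
second_moment = np.mean(x ** 2)          # estimates E[x^2]
var_est = second_moment - mean_est ** 2  # Var(x) = E[x^2] - mu^2

print(f"mean ~ {mean_est:.3f} (true {mu}), var ~ {var_est:.3f} (true {sigma**2})")
```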