To download slides:
http://www.intelligentmining.com/category/knowledge-base/
These are my notes for a presentation I did internally at IM. It covers both the multinomial and multi-variate Bernoulli event models in Naive Bayes text classification.
Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group (http://www.meetup.com/NYC-Predictive-Analytics/) on Dec. 10, 2009.
2. Outline
- Quick refresher on the Naïve Bayes text classifier
- Event models
  - Multinomial event model
  - Multi-variate Bernoulli event model
- Performance characteristics
- Why are they important?
3. Naïve Bayes Text Classifier
- Supervised learning: a labeled data set is used for training, and the trained model classifies unlabeled data
- Easy to implement and highly scalable
- It is often the first thing to try
- Successful cases: email spam filtering, news article categorization, product classification, social content categorization
5. N.B. Text Classifier - Classifying
To classify a document (here d1 = w1 w3, with label y unknown), pick the class with the "maximum posterior probability":

$\mathrm{Argmax}_c \; P(Y = c \mid w_1, w_3)$
$= \mathrm{Argmax}_c \; P(w_1, w_3 \mid Y = c)\,P(Y = c)$
$= \mathrm{Argmax}_c \; P(Y = c) \prod_n P(w_n \mid Y = c)$

[Figure: the per-word likelihoods $P(w_1 \mid y=0)$, $P(w_1 \mid y=1)$, $P(w_3 \mid y=0)$, $P(w_3 \mid y=1)$ feed the two class scores being compared: $P(Y=0)\,P(w_1 \mid Y=0)\,P(w_3 \mid Y=0)$ vs. $P(Y=1)\,P(w_1 \mid Y=1)\,P(w_3 \mid Y=1)$.]
6. Bayes' Theorem

$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$

(posterior = likelihood × class prior / evidence)

Generative learning algorithms model the "likelihood" P(X|Y) and the "class prior" P(Y), then make predictions based on Bayes' theorem.
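As a worked illustration with made-up numbers (not from the slides): suppose 30% of documents are spam, and a given word appears in 60% of spam documents but only 5% of the rest. Then

$P(\mathrm{spam} \mid \mathrm{word}) = \frac{0.6 \times 0.3}{0.6 \times 0.3 + 0.05 \times 0.7} = \frac{0.18}{0.215} \approx 0.84$

where the denominator is the evidence P(X), expanded over both classes.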
7. Naïve Bayes Text Classifier

$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$, with $Y \in \{0,1\}$ and $d = w_1, w_6, \ldots, w_n$

Apply Bayes' theorem:

$P(Y = 0 \mid w_1, w_6, \ldots, w_n) = \frac{P(w_1, w_6, \ldots, w_n \mid Y = 0)\,P(Y = 0)}{P(w_1, w_6, \ldots, w_n)}$

$P(Y = 1 \mid w_1, w_6, \ldots, w_n) = \frac{P(w_1, w_6, \ldots, w_n \mid Y = 1)\,P(Y = 1)}{P(w_1, w_6, \ldots, w_n)}$
8. Naïve Bayes Text Classifier

To find the most likely class label for d:

$\mathrm{Argmax}_c \; P(Y = c \mid w_1, w_6, \ldots, w_n)$
$= \mathrm{Argmax}_c \; \frac{P(w_1, w_6, \ldots, w_n \mid Y = c)\,P(Y = c)}{P(w_1, w_6, \ldots, w_n)}$
$= \mathrm{Argmax}_c \; P(w_1, w_6, \ldots, w_n \mid Y = c)\,P(Y = c)$

(The evidence $P(w_1, w_6, \ldots, w_n)$ is the same for every class, so it drops out of the Argmax.)

How do we estimate the likelihood?
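A minimal sketch of this decision rule in Python, with hand-set toy probabilities (all numbers are hypothetical, not estimated from data):

priors = {0: 0.7, 1: 0.3}              # P(Y=c)
likelihood = {                          # P(w|Y=c), made-up values
    0: {"w1": 0.20, "w3": 0.10},
    1: {"w1": 0.05, "w3": 0.40},
}

def classify(doc_words):
    # Unnormalized posterior: likelihood * prior. The shared evidence
    # term P(w1,...,wn) is omitted since it does not change the Argmax.
    scores = {c: priors[c] for c in priors}
    for c in priors:
        for w in doc_words:
            scores[c] *= likelihood[c][w]
    return max(scores, key=scores.get)

print(classify(["w1", "w3"]))  # 0: 0.7*0.20*0.10 = 0.014 beats 1: 0.3*0.05*0.40 = 0.006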
9. Estimate Likelihood P(X|Y)

How do we estimate the likelihood $P(w_1, w_6, \ldots, w_n \mid Y = c)$?

Naïve Bayes' assumption: assume that the words ($w_n$) are conditionally independent given y.

$P(w_1, w_6, \ldots, w_n \mid Y = c)$
$= P(w_1 \mid Y = c) \times P(w_6 \mid Y = c) \times \cdots \times P(w_n \mid Y = c)$
$= \prod_{i \in n} P(w_i \mid Y = c)$   ← the different event models define this term differently

or equivalently, in log space, $\sum_{i \in n} \log P(w_i \mid Y = c)$
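Why the product is computed as a sum of logs in practice: multiplying many small per-word probabilities underflows double precision, while the log-space sum stays finite. A small demonstration (the probability value is arbitrary):

import math

p = 1e-5                   # a typical small word likelihood
n = 100                    # word occurrences in a document
print(p ** n)              # 0.0 -- the direct product underflows
print(n * math.log(p))     # -1151.29... -- the log-space sum is stable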
10. Differences of Two Models

Both models define the per-word term in $\prod_{i \in n} P(w_i \mid Y = c)$:

Multinomial event model:
$P(w_i \mid Y = c) = \frac{tf_{w_i,c}}{|c|}$
where $tf_{w_i,c}$ is the term frequency of $w_i$ in class c and the denominator is the sum of all term frequencies in class c.

Multi-variate Bernoulli event model:
$P(w_i \mid Y = c) = \frac{df_{w_i,c}}{N_c}$
where $df_{w_i,c}$ is the document frequency of $w_i$ in class c and $N_c$ is the number of docs in class c. When $w_i$ does not exist in d, use $1 - P(w_i \mid Y = c)$.
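A hedged sketch of how the two estimates differ on the same toy corpus (unsmoothed, so zero counts yield zero probabilities; Laplace smoothing is covered in the appendix; corpus and words are made up):

from collections import Counter

# Toy labeled corpus: (document tokens, class label)
corpus = [
    (["free", "win", "win"], 1),
    (["free", "meeting"],    1),
    (["meeting", "notes"],   0),
    (["notes", "notes"],     0),
]

def multinomial_likelihood(word, c):
    # tf(word, c) / sum of all term frequencies in class c
    tf, total = Counter(), 0
    for doc, y in corpus:
        if y == c:
            tf.update(doc)
            total += len(doc)
    return tf[word] / total

def bernoulli_likelihood(word, c):
    # df(word, c) / number of docs in class c
    docs = [set(doc) for doc, y in corpus if y == c]
    return sum(word in d for d in docs) / len(docs)

print(multinomial_likelihood("win", 1))  # 2/5 = 0.4 (repeats count)
print(bernoulli_likelihood("win", 1))    # 1/2 = 0.5 (presence only)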
12. Comparison of Two Models

Multinomial: $P(w_i \mid Y = c) = \frac{tf_{w_i,c}}{|c|}$
$d = \{w_1, w_2, \ldots, w_n\}$, n: # of words in d, $w_n \in \{1, 2, 3, \ldots, |V|\}$

$\mathrm{Argmax}_c \sum_{i \in n} \log P(w_i \mid Y = c) + \log\!\left(\frac{|c|}{|D|}\right)$

Multi-variate Bernoulli: $P(w_i \mid Y = c) = \frac{df_{w_i,c}}{N_c}$
$d = \{w_1, w_2, \ldots, w_n\}$, n: # of words in vocabulary |V|, $w_n \in \{0,1\}$

$\mathrm{Argmax}_c \sum_{i \in n} \log\!\left(P(w_i \mid Y = c)^{w_i}\,(1 - P(w_i \mid Y = c))^{1-w_i}\right) + \log\!\left(\frac{|c|}{|D|}\right)$
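To make the two decision rules concrete, here is a hedged sketch scoring one toy document under both; the probabilities are hand-set (not estimated) to keep it self-contained. Note the multinomial sums over word occurrences while the Bernoulli sums over the whole vocabulary:

import math

vocab = ["free", "win", "meeting", "notes"]
prior = {0: 0.5, 1: 0.5}                # |c| / |D|
p_mult = {                              # tf-based P(w|c), made-up values
    0: {"free": 0.05, "win": 0.05, "meeting": 0.45, "notes": 0.45},
    1: {"free": 0.40, "win": 0.40, "meeting": 0.15, "notes": 0.05},
}
p_bern = {                              # df-based P(w|c), made-up values
    0: {"free": 0.10, "win": 0.10, "meeting": 0.90, "notes": 0.90},
    1: {"free": 0.90, "win": 0.50, "meeting": 0.50, "notes": 0.10},
}

doc = ["free", "win", "win"]

def score_multinomial(c):
    # Sum over word *occurrences* in the document (repeats count).
    return sum(math.log(p_mult[c][w]) for w in doc) + math.log(prior[c])

def score_bernoulli(c):
    # Sum over the *vocabulary*: presence uses p, absence uses 1 - p.
    present = set(doc)
    s = math.log(prior[c])
    for w in vocab:
        p = p_bern[c][w]
        s += math.log(p) if w in present else math.log(1 - p)
    return s

print(max(prior, key=score_multinomial))  # 1 with these toy numbers
print(max(prior, key=score_bernoulli))    # 1 with these toy numbers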
13. Multinomial

[Figure: one multinomial distribution over the vocabulary per class. For each word position in the doc, class Y=0 emits words w1 w2 w3 w4 .. wn with probabilities A%, and class Y=1 emits them with probabilities B%.]
14. Multi-variate Bernoulli

[Figure: one Bernoulli variable per vocabulary word, evaluated for each word in the doc. A word W that does exist in the doc is scored with A% under Y=1 and 1−A% under Y=0; a word W' that does not exist in the doc is scored with B% under Y=1 and 1−B% under Y=0.]
15. Performance Characteristics
- Multinomial tends to perform better than multi-variate Bernoulli in most text classification tasks, especially when the vocabulary size |V| >= 1K
- Multinomial performs better on data sets with large variance in document length
- Multi-variate Bernoulli can have an advantage on dense data sets
- Non-text features can be added as additional Bernoulli variables; however, they should not be added to the vocabulary in the multinomial model
16. Why are they interesting?
- Certain data points are more suitable for one event model than the other.
- Examples:
  - Web page text + "Domain" + "Author"
  - Social content text + "Who"
  - Product name / description + "Brand"
- We can create a Naïve Bayes classifier that combines event models
- Most importantly, try both on your data set
18. Appendix
- Laplace Smoothing
- Generative vs. Discriminative Learning
- Multinomial Event Model
- Multi-variate Bernoulli Event Model
- Notation
19. Laplace Smoothing

Multinomial:

$P(w_i \mid Y = c) = \frac{tf_{w_i,c} + 1}{|c| + |V|}$

Multi-variate Bernoulli:

$P(w_i \mid Y = c) = \frac{df_{w_i,c} + 1}{N_c + 2}$
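A minimal sketch of both smoothed estimators (function names are mine, for illustration). The +1/+|V| term acts as one phantom occurrence of every vocabulary word in class c; the +1/+2 term acts as one phantom doc containing the word and one not containing it, so no probability is ever exactly 0 or 1:

def smoothed_multinomial(tf_wc, class_term_total, vocab_size):
    # (tf + 1) / (|c| + |V|)
    return (tf_wc + 1) / (class_term_total + vocab_size)

def smoothed_bernoulli(df_wc, n_docs_in_class):
    # (df + 1) / (Nc + 2)
    return (df_wc + 1) / (n_docs_in_class + 2)

# An unseen word no longer gets probability 0:
print(smoothed_multinomial(0, 5, 4))  # 1/9 ≈ 0.111
print(smoothed_bernoulli(0, 2))       # 1/4 = 0.25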
20. Generative vs. Discriminative

Discriminative learning algorithm:
Try to learn $P(Y \mid X)$ directly, or try to map input X to labels Y directly.

Generative learning algorithm:
Try to model $P(X \mid Y)$ and $P(Y)$ first, then use Bayes' theorem to find

$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$
21. Multinomial Event Model

$P(w_i \mid Y = c) = \frac{tf_{w_i,c}}{|c|}$
$d = \{w_1, w_2, \ldots, w_n\}$, n: # of words in d, $w_n \in \{1, 2, 3, \ldots, |V|\}$

$\mathrm{Argmax}_c \; P(w_1, w_6, \ldots, w_n \mid Y = c)\,P(Y = c)$
$= \mathrm{Argmax}_c \sum_{i \in n} \log\!\left(\frac{tf_{w_i,c}}{|c|}\right) + \log\!\left(\frac{|c|}{|D|}\right)$
22. Multi-variate Bernoulli Event Model

$P(w_i \mid Y = c) = \frac{df_{w_i,c}}{N_c}$
$d = \{w_1, w_2, \ldots, w_n\}$, n: # of words in vocabulary |V|, $w_n \in \{0,1\}$

$\mathrm{Argmax}_c \; P(w_1, w_6, \ldots, w_n \mid Y = c)\,P(Y = c)$
$= \mathrm{Argmax}_c \sum_{i \in n} \log\!\left(\left(\frac{df_{w_i,c}}{N_c}\right)^{w_i} \left(1 - \frac{df_{w_i,c}}{N_c}\right)^{1-w_i}\right) + \log\!\left(\frac{|c|}{|D|}\right)$
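Tying the two appendix derivations together, a compact end-to-end sketch that trains both event models from counts (with the Laplace smoothing above) and classifies a new document; the corpus and words are made up for illustration:

import math
from collections import Counter

corpus = [(["free", "win", "win"], 1), (["free", "meeting"], 1),
          (["meeting", "notes"], 0), (["notes", "notes"], 0)]
vocab = sorted({w for doc, _ in corpus for w in doc})
classes = sorted({y for _, y in corpus})
n_docs = len(corpus)

def classify(doc, model):
    best, best_score = None, -math.inf
    for c in classes:
        docs_c = [d for d, y in corpus if y == c]
        score = math.log(len(docs_c) / n_docs)      # log(|c|/|D|)
        if model == "multinomial":
            tf = Counter(w for d in docs_c for w in d)
            total = sum(tf.values())
            for w in doc:                            # sum over word occurrences
                score += math.log((tf[w] + 1) / (total + len(vocab)))
        else:                                        # multi-variate Bernoulli
            for w in vocab:                          # sum over the vocabulary
                p = (sum(w in d for d in docs_c) + 1) / (len(docs_c) + 2)
                score += math.log(p) if w in doc else math.log(1 - p)
        if score > best_score:
            best, best_score = c, score
    return best

print(classify(["free", "win"], "multinomial"))  # 1 on this toy corpus
print(classify(["free", "win"], "bernoulli"))    # 1 on this toy corpus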
23. Notation
- d: a document
- w: a word in a document
- X: observed data attributes
- Y: class label
- |V|: number of terms in the vocabulary
- |D|: number of docs in the training set
- |c|: number of docs in class c
- Nc: number of docs in class c (Bernoulli model)
- tf: term frequency
- df: document frequency
24. References
- McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: Learning for Text Categorization: Papers from the AAAI Workshop, AAAI Press (1998) 41–48. Technical Report WS-98-05.
- Metsis, V., Androutsopoulos, I., Paliouras, G.: Spam filtering with Naive Bayes – which Naive Bayes? In: Third Conference on Email and Anti-Spam (CEAS) (2006).
- Schneider, K.: On word frequency information and negative evidence in Naive Bayes text classification. In: España for Natural Language Processing (EsTAL) (2004).