Classifying text on iOS without
Core ML: how and why?
Viacheslav Volodko
SMS Filter
• Filters SMS spam
• Freemium model
• ML-based checks on the server side
• 4 localizations:
– Ukrainian
– English
– German
– Russian
Why Language Detection?
1. It's a preliminary step in SMS spam detection.
2. We can't claim to filter spam in languages we don't know.
NLLanguageRecognizer
NLLanguageRecognizer.dominantLanguage(for: "Hello, how are you doing?")?.name
// English
NLLanguageRecognizer.dominantLanguage(for: "Привіт, як твої справи")?.name
// Українська
NLLanguageRecognizer.dominantLanguage(for: "Привет, как твои дела?")?.name
// Русский
NLLanguageRecognizer.dominantLanguage(for: "Hallo, wie geht es dir?")?.name
// Deutsch
let realWorldSMS =
"""
VITAEMO Kompiuternum vidbirom na nomer,vipav
pryz:AUTO-MAZDA SX-5
Detali:
+38(095)857-58-64
abo na saiti:
www.mir-europay.com.ua
"""
NLLanguageRecognizer.dominantLanguage(for: realWorldSMS)?.name
// Hrvatski
Why not NSStringTransform?
let detransliteratedString =
realWorldSMS.applyingTransform(StringTransform.latinToCyrillic, reverse: false) ?? ""
// ВИТАЕМО Компиутернум видбиром на номер,випав
// прыз:АУТО-МАЗДА СКС-5
// Детали:
// +38(095)857-58-64
// або на саити:
// ууу.мир-еуропаы.цом.уа
NLLanguageRecognizer.dominantLanguage(for: detransliteratedString)?.name
// русский
Why not Detransliteration?
1. Get a language-aware transliterator
2. Transliterate the text into Ukrainian and Russian
3. Predict the language of the original and the transliterated texts
4. Pick the language with the highest probability
[Diagram: language labels Ukrainian, Russian, None, English, Bulgarian]
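The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `LanguageGuess`, `bestGuess`, and the `transliterators` / `predict` closures are hypothetical names invented here, not part of NaturalLanguage or of the app's code.

```swift
// Hypothetical type for illustration; not part of any Apple framework.
struct LanguageGuess {
    let language: String
    let probability: Double
}

/// Runs a predictor on the original text and on each transliterated variant,
/// then returns the single most probable guess (steps 1-4 above).
func bestGuess(for text: String,
               transliterators: [(String) -> String],
               predict: (String) -> [LanguageGuess]) -> LanguageGuess? {
    // Candidates: the original text plus one variant per transliterator.
    let candidates = [text] + transliterators.map { $0(text) }
    // Collect every hypothesis and keep the one with the highest probability.
    return candidates
        .flatMap(predict)
        .max { $0.probability < $1.probability }
}
```

The sketch only covers the selection logic; whether the approach works depends entirely on how good the transliterators are.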
Why not Detransliteration?
let ukrainianTranslitText = "Privit, jak tvoji spravy?"
let detransliteredUkrText = ukrainianTranslitText
.applyingTransform(StringTransform.latinToCyrillic, reverse: false) ?? ""
// Привит, йак твойи справы?




let englishText = "Hello, how are you doing?"
let detransliteredEngText = englishText
.applyingTransform(StringTransform.latinToCyrillic, reverse: false) ?? ""
// Хелло, хоу аре ыоу доинг?
So what now?
Language detection = Text classification
Let’s use Core ML + Create ML
1. Text classification models included:
• maximum entropy model
• conditional random field
2. It’s a ready-made solution
Core ML + Create ML
Prepare dataset
func testPreprocessText() {
// GIVEN
let text = """
Вітаємо, dear@friend.com!
Ми заборгували вам 5.00 гривень,
і хотіли б повернути їх до 21.03.2019.
Зателефонуйте нам на +38 (012) 345-67-89
або відвідайте example.com, щоб дізнатись деталі!
"""
// WHEN
let preprocessedText = testedPreprocessor.preprocessedText(for: text)
// THEN
XCTAssertEqual(preprocessedText,
"Вітаємо Ми заборгували вам гривень і хотіли б повернути їх " +
"Зателефонуйте нам на або відвідайте щоб дізнатись деталі")
}
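A preprocessor of the kind the test above exercises could look roughly like this. It is a minimal sketch, not the `testedPreprocessor` from the slides: `simplePreprocessedText(for:)` is a hypothetical helper, and its token rules (drop anything containing `@`, an inner dot, or a digit) are assumptions chosen for brevity, so its output is not guaranteed to match the expected string in that test word for word.

```swift
import Foundation

/// Keeps only plain words: drops e-mail addresses, URLs, numbers, dates and
/// phone-number fragments, and strips punctuation around the remaining tokens.
func simplePreprocessedText(for text: String) -> String {
    let edgeCharacters = CharacterSet.punctuationCharacters.union(.symbols)
    var words: [String] = []
    for token in text.components(separatedBy: .whitespacesAndNewlines) {
        // Remove leading/trailing punctuation such as "," "!" "(" "+".
        let trimmed = token.trimmingCharacters(in: edgeCharacters)
        if trimmed.isEmpty { continue }
        // "@" or an inner "." suggests an e-mail address or a URL.
        if trimmed.contains("@") || trimmed.contains(".") { continue }
        // Any digit suggests an amount, a date or a phone number.
        if trimmed.rangeOfCharacter(from: .decimalDigits) != nil { continue }
        words.append(trimmed)
    }
    return words.joined(separator: " ")
}
```

Dropping such tokens keeps the classifier from learning features that carry no language signal.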
Training Core ML model
public struct DatasetItem {
let text: String
let label: String
}
public protocol Dataset {
var items: [DatasetItem] { get }
}
public static func trainCoreMLClassifier(with preprocessor: Preprocessor,
on dataset: Dataset) throws -> MLTextClassifier {
let data: [String: MLDataValueConvertible] = [
"text": dataset.items.map { preprocessor.preprocessedText(for: $0.text) },
"label": dataset.items.map { $0.label },
]
let trainingDataTable = try MLDataTable(dictionary: data)
let mlClassifier = try MLTextClassifier(trainingData: trainingDataTable,
textColumn: "text",
labelColumn: "label")
return mlClassifier
}
Using CoreML Model
public func predictedLabel(for string: String) -> String? {
    // Build the model input and run the prediction; return nil if either step fails.
    guard let input = try? MLDictionaryFeatureProvider(dictionary: ["text": string]),
          let prediction = try? mlModel.prediction(from: input) else {
        return nil
    }
    return prediction.featureValue(for: "label")?.stringValue
}
let language = predictedLabel(for: "Hello, how are you?")
// en
Evaluating CoreML Model
Dataset split: train data 80%, test data 20%
Evaluating CoreML Model
func testAccuracy() {
// GIVEN
let preprocessor = TrivialPreprocessor()
let (trainDataset, testDataset) =
self.testDatasets.languagesDataset.splitTestDataset(startPersentage: 0.8,
endPersentage: 1.0)
let classifier = CoreMLClassifier.train(with: preprocessor, on: trainDataset)
// WHEN
let testResults = classifier.test(on: testDataset)
// THEN
XCTAssertGreaterThan(testResults.accuracy, 1.0)
// failed: ("0.9463667820069204") is not greater than ("1.0")
}
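The split and the accuracy figure used in the test above can be sketched as follows. Both functions are hypothetical stand-ins written for illustration; they are not the `splitTestDataset` or `test(on:)` implementations from the slides, whose code is not shown.

```swift
/// Splits `items` so that the slice between the two fractions becomes the
/// test set and everything else becomes the train set.
func split<T>(_ items: [T], testFrom start: Double, to end: Double) -> (train: [T], test: [T]) {
    let lower = Int((Double(items.count) * start).rounded())
    let upper = Int((Double(items.count) * end).rounded())
    let test = Array(items[lower..<upper])
    let train = Array(items[..<lower]) + Array(items[upper...])
    return (train, test)
}

/// Accuracy = correctly classified samples / all samples.
func accuracy(predicted: [String], expected: [String]) -> Double {
    precondition(predicted.count == expected.count)
    guard !expected.isEmpty else { return 0 }
    let correct = zip(predicted, expected).filter { pair in pair.0 == pair.1 }.count
    return Double(correct) / Double(expected.count)
}
```

In practice the dataset would usually be shuffled before splitting so that both parts contain every language.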
Cross Validation
[Figure: the dataset (Ukrainian, English, Russian, German) is split into 80% train data and 20% test data. At each cross-validation step a different slice of every language's samples serves as test data and the rest as train data.]
Cross Validation
func testCrossvalidateAdvancedPreprocessor() {
// GIVEN
let dataset = testDatasets.languagesDataset
// WHEN
let results =
CoreMLClassifier.crossValidate(on: dataset,
with: AdvancedPreprocessor())
// THEN
XCTAssertGreaterThan(results.accuracy, 1.0)
// failed: ("0.9661251296232285") is not greater than ("1.0")
}
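The idea behind `crossValidate` can be sketched generically as below. This is an assumption-laden illustration, not the author's implementation: `crossValidate(_:folds:evaluate:)` is a hypothetical free function, the `evaluate` closure stands in for "train on one part, measure accuracy on the other", and the sketch does not stratify by language.

```swift
/// Generic k-fold cross-validation: each fold is used once as test data
/// while the remaining folds form the train data; returns the mean score.
func crossValidate<Sample>(_ samples: [Sample],
                           folds k: Int,
                           evaluate: (_ train: [Sample], _ test: [Sample]) -> Double) -> Double {
    precondition(k > 1 && samples.count >= k)
    var scores: [Double] = []
    for fold in 0..<k {
        // Integer arithmetic keeps the folds contiguous and non-overlapping.
        let lower = samples.count * fold / k
        let upper = samples.count * (fold + 1) / k
        let test = Array(samples[lower..<upper])
        let train = Array(samples[..<lower]) + Array(samples[upper...])
        scores.append(evaluate(train, test))
    }
    return scores.reduce(0, +) / Double(k)
}
```

Averaging over all folds gives a steadier estimate than a single 80/20 split, because every sample is used for testing exactly once.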
func testAccuracy() {
// GIVEN
let testDataset = testDatasets.languagesDataset
// WHEN
let results = testedClassifier.test(on: testDataset)
// THEN
XCTAssertGreaterThan(results.accuracy, 1.0)
// failed: ("0.8022435526772291") is not greater than ("1.0")
}
CoreML vs NLLanguageRecognizer
NLLanguageRecognizer: 80.2% 👎
Core ML: 96.6% 👍
HAPPY END
What could go wrong?
RAM Problem
• CoreML model file size: 556 KB
• Loading it breaks the 6 MB RAM limit
Memory-Mapped File
• A memory-mapped file is a segment of virtual memory
that has been assigned a direct byte-for-byte correlation
with some portion of a file or file-like resource.
• Google’s FlatBuffers library
• https://google.github.io/flatbuffers/
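To make the memory-mapping idea concrete, here is a minimal sketch that uses plain Foundation `Data` with the `.alwaysMapped` reading option rather than FlatBuffers. The file layout (a flat array of `Float` weights) and the helper names `writeWeights`, `mapFile` and `weight(at:in:)` are assumptions made up for this example.

```swift
import Foundation

/// Writes illustrative Float weights to a temporary binary file.
func writeWeights(_ weights: [Float]) throws -> URL {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent("weights-\(UUID().uuidString).bin")
    let bytes = weights.withUnsafeBufferPointer { Data(buffer: $0) }
    try bytes.write(to: url)
    return url
}

/// Maps the file into virtual memory instead of copying it onto the heap;
/// the OS loads pages only when the corresponding bytes are touched.
func mapFile(at url: URL) throws -> Data {
    try Data(contentsOf: url, options: .alwaysMapped)
}

/// Reads the weight at `index` straight from the mapped bytes.
/// Offsets are multiples of 4, so the load stays aligned.
func weight(at index: Int, in data: Data) -> Float {
    data.withUnsafeBytes {
        $0.load(fromByteOffset: index * MemoryLayout<Float>.stride, as: Float.self)
    }
}
```

A format like FlatBuffers adds a schema on top of the same idea, so structured data can be read in place without a parsing step.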
Core ML + Memory-Mapped file
Building our own classifier
• Max Entropy
• Conditional random field
• Naive Bayes 👉
• Decision Tree
• Many others
Naive Bayes classifier
• Based on Bayes’ theorem:
\[ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \]
Thomas Bayes, 1701-1761
Naive Bayes classifier
Text samples: \( D = \{d_1, d_2, \dots, d_m\} \)
Features (words): \( F = \{f_1, f_2, \dots, f_q\} \)
Classes (languages): \( C = \{c_1, c_2, \dots, c_r\} \)
Naive Bayes classifier
\[
\begin{aligned}
C_{\max} &= \operatorname*{argmax}_{c \in C} P(c \mid d)
          = \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)} \\
         &= \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c)
          = \operatorname*{argmax}_{c \in C} \ln\bigl(P(d \mid c)\,P(c)\bigr)
\end{aligned}
\]
Naive Bayes classifier
Assumptions:
• Order of words does not matter
• Probabilities of words are independent:
\[ P(f_i \cap f_j \mid c) = P(f_i \mid c)\,P(f_j \mid c) \]
Naive Bayes classifier
\[
\begin{aligned}
C_{\max} &= \operatorname*{argmax}_{c \in C} \ln\bigl(P(d \mid c)\,P(c)\bigr) \\
         &= \operatorname*{argmax}_{c \in C} \ln\bigl(P(f_1, f_2, \dots, f_n \mid c)\,P(c)\bigr) \\
         &= \operatorname*{argmax}_{c \in C} \ln\Bigl(P(c) \prod_{i=1}^{n} P(f_i \mid c)\Bigr) \\
         &= \operatorname*{argmax}_{c \in C} \Bigl(\ln P(c) + \sum_{i=1}^{n} \ln P(f_i \mid c)\Bigr)
\end{aligned}
\]
Naive Bayes classifier
$$P(f_i \mid c_j) = \frac{\mathrm{count}(f_i, c_j)}{\sum_{k=1}^{q} \mathrm{count}(f_k, c_j)}$$
Laplace smoothing:
$$P(f_i \mid c_j) = \frac{\mathrm{count}(f_i, c_j) + z}{\sum_{k=1}^{q} \mathrm{count}(f_k, c_j) + zq}$$
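To make the smoothing formula concrete, here is a minimal Swift sketch. The function name `smoothedProbability`, its parameters, and the toy counts are assumptions for illustration and do not come from the deck; `q` is taken to be the number of distinct words in the vocabulary and `z` the smoothing constant.

```swift
import Foundation

/// Laplace-smoothed estimate of P(f_i | c_j).
/// `counts` maps each word seen in class c_j to its count,
/// `q` is the vocabulary size and `z` the smoothing constant.
func smoothedProbability(of feature: String,
                         counts: [String: Int],
                         vocabularySize q: Int,
                         z: Double = 1.0) -> Double {
    let featureCount = Double(counts[feature] ?? 0)
    let totalCount = Double(counts.values.reduce(0, +))
    return (featureCount + z) / (totalCount + z * Double(q))
}

// Hypothetical toy counts for one class, with an assumed vocabulary of 4 words.
let ukCounts = ["Вітаю": 1, "вас": 2]
let seen = smoothedProbability(of: "вас", counts: ukCounts, vocabularySize: 4)     // (2 + 1) / (3 + 4)
let unseen = smoothedProbability(of: "Hello", counts: ukCounts, vocabularySize: 4) // (0 + 1) / (3 + 4)
print(seen, unseen)
```

The point of the added `z` is that a word never seen in a class still gets a small non-zero probability, so its logarithm stays finite.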
Naive Bayes classifier
Building model:
typealias Model = [String: [String: Int]]
var model: Model = [
    "uk": [
        "Вітаю": 1,
        "вас": 2,
        ...
    ],
    "en": [
        "Hello": 1,
        "dear": 2,
        ...
    ]
    ...
]
Naive Bayes classifier
Building model:
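// Count how often each word occurs in the training texts of each label.
for label in labels {
    for text in trainTextsForLabel[label] ?? [] {
        let words = preprocessor.preprocess(text: text)
        for word in words {
            // Default subscripts create missing entries instead of failing on nil.
            model[label, default: [:]][word, default: 0] += 1
        }
    }
}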
Naive Bayes classifier
Predicting label of text:
1. Preprocess text:
«Зателефонуйте нам на +38 (012) 345-67-89» → «Зателефонуйте нам на»
2. Split into words:
["Зателефонуйте", "нам", "на"]
3. Calculate the probability of each word for each label:
["uk": ["Зателефонуйте": 0.84,
        "нам": 0.1,
        "на": 0.1],
 "ru": ["Зателефонуйте": 0.0,
        "нам": 0.1,
        "на": 0.1], ...]
Naive Bayes classifier
Predicting label of text:
4. Calculate the probability of each label:
[
    "uk": -180.3,
    "ru": -234.5,
    "en": -2004.3,
    ...
]
5. Return the label with the max probability:
"uk"
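The prediction steps can be sketched in Swift roughly as follows. This is not the deck's implementation: `predictLabel`, the `priors` parameter, the uniform-prior fallback, and the toy model are assumptions for illustration. The score of each label is the sum of logarithms from the argmax formula, with Laplace-smoothed word probabilities.

```swift
import Foundation

typealias Model = [String: [String: Int]]

/// Returns the label with the highest log-score, or nil for an empty model.
/// `priors` holds P(c) per label; `z` is the Laplace smoothing constant.
func predictLabel(for words: [String],
                  model: Model,
                  priors: [String: Double],
                  z: Double = 1.0) -> String? {
    // q: number of distinct words across all labels.
    let vocabulary = Set(model.values.flatMap { $0.keys })
    let q = Double(vocabulary.count)

    var scores: [String: Double] = [:]
    for (label, counts) in model {
        let total = Double(counts.values.reduce(0, +))
        // ln P(c); labels without a prior are assumed to be equally likely.
        var score = log(priors[label] ?? (1.0 / Double(model.count)))
        for word in words {
            let count = Double(counts[word] ?? 0)
            score += log((count + z) / (total + z * q))
        }
        scores[label] = score
    }
    return scores.max { $0.value < $1.value }?.key
}

// Hypothetical toy model in the shape shown on the "Building model" slide.
let toyModel: Model = [
    "uk": ["Вітаю": 1, "вас": 2],
    "en": ["Hello": 1, "dear": 2],
]
let label = predictLabel(for: ["Вітаю", "вас"], model: toyModel, priors: [:])
print(label ?? "none")
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on longer texts, which is why the scores come out as large negative numbers.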
Naive Bayes classifier
Cross Validation:
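func testCrossvalidate() {
    // GIVEN
    let dataset = self.testDatasets.testDataset
    // WHEN
    let results = NaiveBayesClassifier.crossValidate(on: dataset,
                                                     with: TrivialPreprocessor())
    // THEN
    // Accuracy is a fraction in 0...1, so a threshold of 1.0 could never pass;
    // 0.9 is an assumed threshold below the observed value.
    XCTAssertGreaterThan(results.accuracy, 0.9)
    // 0.9782382220164371
}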
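The deck does not show how `crossValidate(on:with:)` works internally. As a rough illustration of the general idea, here is a k-fold sketch under stated assumptions: the generic `crossValidate(samples:folds:evaluate:)` function, the round-robin fold assignment without shuffling, and the majority-label toy "classifier" are all hypothetical.

```swift
/// Splits `samples` into `k` folds; each fold serves once as the test set
/// while the remaining folds form the training set. Returns the mean accuracy.
func crossValidate<Sample>(samples: [Sample],
                           folds k: Int,
                           evaluate: (_ train: [Sample], _ test: [Sample]) -> Double) -> Double {
    precondition(k > 1 && samples.count >= k, "need k > 1 and at least k samples")
    var accuracies: [Double] = []
    for fold in 0..<k {
        // Sample i belongs to fold i % k (no shuffling in this sketch).
        let test = samples.enumerated().filter { $0.offset % k == fold }.map { $0.element }
        let train = samples.enumerated().filter { $0.offset % k != fold }.map { $0.element }
        accuracies.append(evaluate(train, test))
    }
    return accuracies.reduce(0, +) / Double(k)
}

// Toy usage: a "classifier" that always predicts the most frequent training label.
let sampleLabels = ["uk", "uk", "uk", "en", "uk", "uk"]
let accuracy = crossValidate(samples: sampleLabels, folds: 3) { train, test in
    let counts = Dictionary(grouping: train, by: { $0 }).mapValues { $0.count }
    let majority = counts.max { $0.value < $1.value }?.key
    let correct = test.filter { $0 == majority }.count
    return Double(correct) / Double(test.count)
}
print(accuracy)
```

Because every sample is used for testing exactly once, the averaged accuracy is less sensitive to one lucky or unlucky train/test split.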
Accuracy:
NL Language Recognizer  80.2% 👎
Core ML                 96.6% 👍
Naive Bayes             97.8% 👍
Objective-C Wrapper Framework
Naive Bayes + FlatBuffers
Schema file → FlatBuffers schema compiler → C++ file
Naive Bayes + FlatBuffers
var model: Model = [
"uk": [
"Вітаю": 1,
"вас": 2,
...
],
"en": [
"Hello": 1,
"dear": 2,
...
]
...
]
schema.fbs:
namespace flatcollections;
table StringIntDictionary {
entries:[StringIntDictionaryEntry];
}
table StringIntDictionaryEntry {
key:string (key);
value:int64;
}
root_type StringIntDictionary;
FlatBuffers: Create Dictionary
#import "schema_generated.h"
#include <vector>
using namespace flatbuffers;
using namespace flatcollections;

@property (nonatomic, copy) NSDictionary<NSString *, NSNumber *> *dictionary;

- (NSData *)serialize {
    // 1. Create a builder with a 10 MB initial buffer (it grows on demand)
    FlatBufferBuilder builder(1024 * 1024 * 10);
    // 2. Iterate NSDictionary keys and values, converting them into
    //    flatcollections::StringIntDictionaryEntry structures
    std::vector<Offset<StringIntDictionaryEntry>> entries;
    for (NSString *key in self.dictionary.allKeys) {
        int64_t value = (int64_t)[self.dictionary objectForKey:key].integerValue;
        auto entry = CreateStringIntDictionaryEntryDirect(builder,
                                                          key.UTF8String,
                                                          value);
        entries.push_back(entry);
    }
    // 3. Create flatcollections::StringIntDictionary;
    //    entries are sorted by key so they can be looked up by binary search
    auto vector = builder.CreateVectorOfSortedTables(&entries);
    auto dictionary = CreateStringIntDictionary(builder, vector);
    // 4. Return flatbuffer as NSData
    builder.Finish(dictionary);
    NSData *data = [NSData dataWithBytes:builder.GetBufferPointer()
                                  length:builder.GetSize()];
    return data;
}
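Storing the entries sorted by their `key` field is what makes dictionary-style lookup possible without building a hash table in memory: a reader can binary-search the vector in place. The Swift sketch below illustrates that idea only; the `Entry` struct and `lookup` function are hypothetical stand-ins, not FlatBuffers API, and Swift's string ordering is used here where FlatBuffers compares key bytes.

```swift
/// Hypothetical stand-in for one StringIntDictionaryEntry.
struct Entry {
    let key: String
    let value: Int64
}

/// Binary search over entries sorted ascending by `key`.
func lookup(_ key: String, in entries: [Entry]) -> Int64? {
    var low = 0
    var high = entries.count - 1
    while low <= high {
        let mid = low + (high - low) / 2
        if entries[mid].key == key {
            return entries[mid].value
        } else if entries[mid].key < key {
            low = mid + 1
        } else {
            high = mid - 1
        }
    }
    return nil
}

// Toy data, sorted with the same ordering that the search uses.
let entries = [("Hello", 1), ("dear", 2), ("you", 3)]
    .map { Entry(key: $0.0, value: Int64($0.1)) }
    .sorted { $0.key < $1.key }

print(lookup("dear", in: entries) ?? -1)
```

Each lookup costs O(log n) comparisons and touches only a few entries, which fits well with a buffer that is memory-mapped rather than fully loaded.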
FlatBuffers: using Dictionary
#import "MMStringIntDictionary.h"
#import "schema_generated.h"
using namespace flatcollections;

@interface MMStringIntDictionary ()
@property (nonatomic, unsafe_unretained) const StringIntDictionary *dict;
@end

@implementation MMStringIntDictionary

- (instancetype)initWithFileURL:(NSURL *)fileURL
                          error:(NSError *__autoreleasing *)error {
    // Memory-map the file instead of reading it fully into RAM.
    NSData *data = [NSData dataWithContentsOfURL:fileURL
                                         options:NSDataReadingMappedAlways
                                           error:error];
    if (nil == data) {
        return nil;
    }
    return [self initWithData:data];
}
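For readers working in Swift, the same memory-mapped read can be expressed with Foundation's `Data(contentsOf:options:)` and the `.alwaysMapped` option. The sketch below is an illustration only: the `loadMapped` helper and the temporary file name are assumptions, and the four bytes stand in for a real serialized model.

```swift
import Foundation

/// Loads a file as memory-mapped `Data`, so its contents are paged in on demand
/// rather than copied into RAM up front.
func loadMapped(_ url: URL) throws -> Data {
    return try Data(contentsOf: url, options: .alwaysMapped)
}

// Hypothetical usage: write a small file to a temporary location and map it back.
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("model-sketch.bin")
let bytes = Data([0x01, 0x02, 0x03, 0x04])
try bytes.write(to: url)
let mapped = try loadMapped(url)
print(mapped.count)
```

Whichever language is used, the mapped data object has to stay alive for as long as any pointer into the buffer is in use.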
Naive Bayes + FlatBuffers
var model: Model = [
"uk": [
"Вітаю": 1,
"вас": 2,
...
],
"en": [
"Hello": 1,
"dear": 2,
...
]
...
]
// Before:
typealias Model = [String: [String: Int]]
// After:
typealias Model = [String: MMStringIntDictionary]
Results
                           Accuracy   Fits in 6 MB RAM   Overall
NL Language Recognizer     ❌ 80.2%   ✅                 👎
Core ML                    ✅ 96.6%   ❌                 👎
Naive Bayes + FlatBuffers  ✅ 97.8%   ✅                 👍
Core ML
Pros:
• Dramatically simple
• Reliable
• Fast
Cons:
• No flexibility
• Limited ML tasks/algorithms
Machine learning
• Not rocket science in 2019
• Great competitive advantage
• Must-have skill for software engineers in the future
Thanks
Viacheslav Volodko
killobatt@gmail.com
t.me/killobatt
Attributions:
1. Create ML Docs: https://developer.apple.com/documentation/createml/creating_a_text_classifier_model
2. Naive Bayes Classifier: https://habr.com/ru/post/184574/
3. FlatBuffers: https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html
Code samples:
github.com/killobatt/TextClassification

Classifying text on iOS without CoreML: how and why?

  • 1. Classifying text on iOS without CoreML: how and why? Viacheslav Volodko
  • 2. SMS Filter • Filters SMS spam • Freemium model • ML-based checks on the server side • 4 localizations: Ukrainian, English, German, Russian
  • 3. Why Language Detection? 1. It’s a preliminary step in SMS spam detection. 2. We can’t claim we filter spam for languages we don’t know.
  • 4. NLLanguageRecognizer
      NLLanguageRecognizer.dominantLanguage(for: "Hello, how are you doing?")?.name
      // English
      NLLanguageRecognizer.dominantLanguage(for: "Привіт, як твої справи")?.name
      // Українська (Ukrainian)
      NLLanguageRecognizer.dominantLanguage(for: "Привет, как твои дела?")?.name
      // Русский (Russian)
      NLLanguageRecognizer.dominantLanguage(for: "Hallo, wie geht es dir?")?.name
      // Deutsch (German)

      let realWorldSMS =
      """
      VITAEMO Kompiuternum vidbirom na nomer,vipav
      pryz:AUTO-MAZDA SX-5
      Detali:
      +38(095)857-58-64
      abo na saiti:
      www.mir-europay.com.ua
      """
      NLLanguageRecognizer.dominantLanguage(for: realWorldSMS)?.name
      // Hrvatski (Croatian)
  • 5. Why not NSStringTransform?
      let detransliteratedString =
          realWorldSMS.applyingTransform(StringTransform.latinToCyrillic, reverse: false) ?? ""
      // ВИТАЕМО Компиутернум видбиром на номер,випав
      // прыз:АУТО-МАЗДА СКС-5
      // Детали:
      // +38(095)857-58-64
      // або на саити:
      // ууу.мир-еуропаы.цом.уа
      NLLanguageRecognizer.dominantLanguage(for: detransliteratedString)?.name
      // русский (Russian)
  • 6. Why not Detransliteration? 1. Get a language-aware transliterator 2. Transliterate the text into Ukrainian and Russian 3. Make a language prediction for the original and the transliterated texts 4. Pick the language with the highest probability (diagram labels: Ukrainian, Russian, None, English, Bulgarian)
  • 7. Why not Detransliteration?
      let ukrainianTranslitText = "Privit, jak tvoji spravy?"
      let detransliteredUkrText = ukrainianTranslitText
          .applyingTransform(StringTransform.latinToCyrillic, reverse: false) ?? ""
      // Привит, йак твойи справы?

      let englishText = "Hello, how are you doing?"
      let detransliteredEngText = englishText
          .applyingTransform(StringTransform.latinToCyrillic, reverse: false) ?? ""
      // Хелло, хоу аре ыоу доинг?
  • 8. So what now? Language detection = Text classification
  • 9. Let’s use Core ML + Create ML 1. Text classification models included: • maximum entropy model • conditional random field 2. It’s a ready-made solution
  • 10. Core ML + Create ML
  • 11. Prepare dataset
      func testPreprocessText() {
          // GIVEN
          // Ukrainian sample containing an e-mail, an amount, a date, a phone number and a URL.
          let text = """
          Вітаємо, dear@friend.com! Ми заборгували вам 5.00 гривень, і хотіли б повернути їх до 21.03.2019. Зателефонуйте нам на +38 (012) 345-67-89 або відвідайте example.com, щоб дізнатись деталі!
          """
          // WHEN
          let preprocessedText = testedPreprocessor.preprocessedText(for: text)
          // THEN
          XCTAssertEqual(preprocessedText,
                         "Вітаємо Ми заборгували вам гривень і хотіли б повернути їх " +
                         "Зателефонуйте нам на або відвідайте щоб дізнатись деталі")
      }
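The slide only shows the test for the preprocessor, not its implementation. The following is a minimal sketch of what such a step could look like, assuming a purely regex-and-letters approach; `SimplePreprocessor` and its rules are hypothetical and are not the author's `testedPreprocessor` (for instance, this sketch would keep the word "до" that the slide's expected string omits).

```swift
import Foundation

/// Hypothetical preprocessor (an assumption, not the deck's implementation):
/// keeps only words that consist of letters.
struct SimplePreprocessor {
    func preprocessedText(for text: String) -> String {
        var cleaned = text
        // Drop e-mail-like tokens first, then any token with an inner dot
        // (domains such as example.com, decimals, dotted dates).
        for pattern in ["\\S+@\\S+", "\\S+\\.\\S+"] {
            cleaned = cleaned.replacingOccurrences(of: pattern,
                                                   with: " ",
                                                   options: .regularExpression)
        }
        // Replace every remaining non-letter (digits, punctuation, newlines) with a space.
        let letters: [Character] = cleaned.map { $0.isLetter ? $0 : " " }
        // Collapse runs of spaces into single separators.
        return String(letters).split(separator: " ").joined(separator: " ")
    }
}

let sample = "Hello, bob@example.com! Call +1 (234) 567-89 or visit example.com today."
print(SimplePreprocessor().preprocessedText(for: sample))
// prints "Hello Call or visit today"
```

Stripping such tokens keeps the classifier's vocabulary focused on words, which is the information a language detector needs.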
  • 12. Training Core ML model
      import CreateML

      public struct DatasetItem {
          let text: String
          let label: String
      }

      public protocol Dataset {
          var items: [DatasetItem] { get }
      }

      public static func trainCoreMLClassifier(with preprocessor: Preprocessor,
                                               on dataset: Dataset) throws -> MLTextClassifier {
          // One column of preprocessed texts and one column of language labels.
          let data: [String: MLDataValueConvertible] = [
              "text": dataset.items.map { preprocessor.preprocessedText(for: $0.text) },
              "label": dataset.items.map { $0.label },
          ]
          let trainingDataTable = try MLDataTable(dictionary: data)
          let mlClassifier = try MLTextClassifier(trainingData: trainingDataTable,
                                                  textColumn: "text",
                                                  labelColumn: "label")
          return mlClassifier
      }
  • 13. Using CoreML Model
      public func predictedLabel(for string: String) -> String? {
          // Return nil if the input cannot be built or the prediction fails.
          guard let input = try? MLDictionaryFeatureProvider(dictionary: ["text": string]),
                let prediction = try? mlModel.prediction(from: input) else { return nil }
          return prediction.featureValue(for: "label")?.stringValue
      }

      let language = predictedLabel(for: "Hello, how are you?")
      // en
  • 14. Evaluating CoreML Model (diagram: Dataset split into Train data 80% and Test data 20%)
  • 15. Evaluating CoreML Model
      func testAccuracy() {
          // GIVEN
          let preprocessor = TrivialPreprocessor()
          let (trainDataset, testDataset) = self.testDatasets.languagesDataset
              .splitTestDataset(startPersentage: 0.8, endPersentage: 1.0)
          let classifier = CoreMLClassifier.train(with: preprocessor, on: trainDataset)
          // WHEN
          let testResults = classifier.test(on: testDataset)
          // THEN
          XCTAssertGreaterThan(testResults.accuracy, 1.0)
          // failed: ("0.9463667820069204") is not greater than ("1.0")
      }
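The test above relies on a hold-out split and an accuracy figure whose implementations are not shown in the deck. A minimal sketch of both, assuming the dataset is a plain array and labels are strings, might look like this; `holdOutSplit` and `accuracy` are hypothetical names, not the author's `splitTestDataset(startPersentage:endPersentage:)` or `test(on:)`.

```swift
/// Hypothetical helper: the slice between the two fractions becomes the
/// test set, everything else is training data.
func holdOutSplit<T>(_ items: [T], testStart: Double, testEnd: Double) -> (train: [T], test: [T]) {
    precondition(0 <= testStart && testStart <= testEnd && testEnd <= 1,
                 "fractions must satisfy 0 <= testStart <= testEnd <= 1")
    // Round so that floating-point noise cannot shift a boundary by one item.
    let lower = Int((Double(items.count) * testStart).rounded())
    let upper = Int((Double(items.count) * testEnd).rounded())
    let test = Array(items[lower..<upper])
    let train = Array(items[..<lower]) + Array(items[upper...])
    return (train: train, test: test)
}

/// Share of predictions that match the expected labels.
func accuracy(predicted: [String], expected: [String]) -> Double {
    precondition(predicted.count == expected.count, "label arrays must have equal length")
    guard !expected.isEmpty else { return 0 }
    let correct = zip(predicted, expected).filter { pair in pair.0 == pair.1 }.count
    return Double(correct) / Double(expected.count)
}

let parts = holdOutSplit(Array(1...10), testStart: 0.8, testEnd: 1.0)
print(parts.test)   // prints "[9, 10]"
```

In practice the items would usually be shuffled before splitting, so that one language does not end up concentrated in the test slice.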
  • 16. Cross Validation (diagram: Dataset split into Train data 80% and Test data 20%, across Ukrainian, English, Russian and German samples)
  • 19. Cross Validation, step 2 (chart over Ukrainian, English, Russian and German, axis ticks 0 to 120; segments: Train Data, Test Data, Train Data)
  • 20. Cross Validation
      func testCrossvalidateAdvancedPreprocessor() {
          // GIVEN
          let dataset = testDatasets.languagesDataset
          // WHEN
          let results = CoreMLClassifier.crossValidate(on: dataset, with: AdvancedPreprocessor())
          // THEN
          XCTAssertGreaterThan(results.accuracy, 1.0)
          // failed: ("0.9661251296232285") is not greater than ("1.0")
      }
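The cross-validation slides move the test window across the dataset so that every part is held out once. A minimal sketch of that idea, assuming contiguous folds over an array, is shown below; `kFoldSplits` and `crossValidatedAccuracy` are hypothetical helpers, not the deck's `crossValidate(on:with:)`.

```swift
/// Hypothetical helper: splits `items` into `k` contiguous folds;
/// each fold serves as the test set exactly once.
func kFoldSplits<T>(_ items: [T], k: Int) -> [(train: [T], test: [T])] {
    precondition(k > 1 && k <= items.count, "k must be in 2...items.count")
    var splits: [(train: [T], test: [T])] = []
    for fold in 0..<k {
        let start = fold * items.count / k
        let end = (fold + 1) * items.count / k
        let test = Array(items[start..<end])
        let train = Array(items[..<start]) + Array(items[end...])
        splits.append((train: train, test: test))
    }
    return splits
}

/// Averages the per-fold score returned by the caller-supplied `evaluate` closure,
/// which is expected to train on `train` and report accuracy on `test`.
func crossValidatedAccuracy<T>(_ items: [T], k: Int,
                               evaluate: (_ train: [T], _ test: [T]) -> Double) -> Double {
    let scores = kFoldSplits(items, k: k).map { evaluate($0.train, $0.test) }
    return scores.reduce(0, +) / Double(scores.count)
}

let folds = kFoldSplits(Array(1...10), k: 5)
print(folds[1].test)   // prints "[3, 4]"
```

With `k = 5` each fold holds out 20% of the data, which matches the 80/20 proportion on the earlier slides.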
  • 21. CoreML vs NLLanguageRecognizer
      func testCrossvalidateAdvancedPreprocessor() {
          // GIVEN
          let dataset = testDatasets.languagesDataset
          // WHEN
          let results = CoreMLClassifier.crossValidate(on: dataset, with: AdvancedPreprocessor())
          // THEN
          XCTAssertGreaterThan(results.accuracy, 1.0)
          // failed: ("0.9661251296232285") is not greater than ("1.0")
      }

      func testAccuracy() {
          // GIVEN
          let testDataset = testDatasets.languagesDataset
          // WHEN
          let results = testedClassifier.test(on: testDataset)
          // THEN
          XCTAssertGreaterThan(results.accuracy, 1.0)
          // failed: ("0.8022435526772291") is not greater than ("1.0")
      }

      NLLanguageRecognizer: 80.2% 👎 • Core ML: 96.6% 👍
  • 23. What could go wrong?
  • 24. RAM Problem • CoreML model file size: 556 KB • Loading it breaks the 6 MB RAM limit
  • 25. Memory-Mapped File • A memory-mapped file is a segment of virtual memory that has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource. • Google’s FlatBuffers library • https://google.github.io/flatbuffers/
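To make the memory-mapping idea concrete, here is a minimal sketch that uses Foundation's `Data(contentsOf:options:)` with `.alwaysMapped`. The `loadMapped` helper and the placeholder file are assumptions for illustration; the deck does not show how its model file is opened, and no FlatBuffers API is used here.

```swift
import Foundation

/// Hypothetical helper: opens a file as memory-mapped `Data`
/// instead of copying its whole contents into RAM.
func loadMapped(_ url: URL) throws -> Data {
    // `.alwaysMapped` asks Foundation to map the file into virtual memory;
    // bytes are typically paged in only when they are accessed.
    return try Data(contentsOf: url, options: .alwaysMapped)
}

// Usage: write a small placeholder "model" file, then read a slice of it.
let modelURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("model-\(UUID().uuidString).bin")
do {
    try Data((0..<64).map { UInt8($0) }).write(to: modelURL)
    let mapped = try loadMapped(modelURL)
    print(Array(mapped.prefix(4)))   // prints "[0, 1, 2, 3]"
    try FileManager.default.removeItem(at: modelURL)
} catch {
    print("I/O failed: \(error)")
}
```

A format such as FlatBuffers fits this access pattern because its fields can be read in place from the buffer, without first unpacking the file into separate in-memory objects.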
  • 26. Core ML + Memory-Mapped file
  • 27. Building our own classifier • Max Entropy • Conditional random field • Naive Bayes • Decision Tree • Many others 👉
  • 28. Naive Bayes classifier • Based on Bayes’ theorem (Thomas Bayes, 1701-1761): $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
  • 29. Naive Bayes classifier • Text samples: $D = \{d_1, d_2, \dots, d_m\}$ • Features (words): $F = \{f_1, f_2, \dots, f_q\}$ • Classes (languages): $C = \{c_1, c_2, \dots, c_r\}$
  • 30. Naive Bayes classifier
      $$\begin{aligned}
      C_{\max} &= \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)} \\
               &= \arg\max_{c \in C} P(d \mid c)\,P(c) = \arg\max_{c \in C} \ln\bigl(P(d \mid c)\,P(c)\bigr)
      \end{aligned}$$
  • 31. Naive Bayes classifier • Assumptions: • Order of words does not matter • Probabilities of words are independent: $P(f_i, f_j \mid c) = P(f_i \mid c)\,P(f_j \mid c)$
  • 32. Naive Bayes classifier Cmax = argmax c✏C ln(P(d|c)P(c)) (1) <latexit sha1_base64="MM40NlZJsOXbxp9nSyy8/iYqQu4=">AAACR3icdVDPT9swGHXKgC78WIEjF2vVpPZSOUW0cECq6IVjJ62A1JTKcb4Wq46T2Q6iCvnvuHDlxr/AhQPTtOPc0Ak2sSdZenrvff7sFySCa0PIg1Na+rC8slr+6K6tb2x+qmxtn+o4VQz6LBaxOg+oBsEl9A03As4TBTQKBJwF0+7cP7sCpXksv5lZAsOITiQfc0aNlUaVC9cPYMJlBt/TQsrd7iiL6HWOj7CfytDOgsmYD4nmIpa4m+OMqgkuItgXspb1auENq/dqrJ7XXR9k+HrZqFIljUOyv9ckmDS8/Var3ZoTj7QPPew1SIEqWqA3qtz7YczSCKRhgmo98Ehihnaj4UxA7vqphoSyKZ3AwFJJI9DDrOghx1+sEuJxrOyRBhfq24mMRlrPosAmI2ou9b/eXHzPG6RmfDDMuExSA5K9LBqnApsYz0vFIVfAjJhZQpni9q2YXVJFmbH1ubaEPz/F/yenzYZnm/narHaOF3WU0S76jGrIQ23UQSeoh/qIoVv0iJ7RD+fOeXJ+Or9eoiVnMbOD/kLJ+Q0OXbH/</latexit><latexit sha1_base64="MM40NlZJsOXbxp9nSyy8/iYqQu4=">AAACR3icdVDPT9swGHXKgC78WIEjF2vVpPZSOUW0cECq6IVjJ62A1JTKcb4Wq46T2Q6iCvnvuHDlxr/AhQPTtOPc0Ak2sSdZenrvff7sFySCa0PIg1Na+rC8slr+6K6tb2x+qmxtn+o4VQz6LBaxOg+oBsEl9A03As4TBTQKBJwF0+7cP7sCpXksv5lZAsOITiQfc0aNlUaVC9cPYMJlBt/TQsrd7iiL6HWOj7CfytDOgsmYD4nmIpa4m+OMqgkuItgXspb1auENq/dqrJ7XXR9k+HrZqFIljUOyv9ckmDS8/Var3ZoTj7QPPew1SIEqWqA3qtz7YczSCKRhgmo98Ehihnaj4UxA7vqphoSyKZ3AwFJJI9DDrOghx1+sEuJxrOyRBhfq24mMRlrPosAmI2ou9b/eXHzPG6RmfDDMuExSA5K9LBqnApsYz0vFIVfAjJhZQpni9q2YXVJFmbH1ubaEPz/F/yenzYZnm/narHaOF3WU0S76jGrIQ23UQSeoh/qIoVv0iJ7RD+fOeXJ+Or9eoiVnMbOD/kLJ+Q0OXbH/</latexit><latexit sha1_base64="MM40NlZJsOXbxp9nSyy8/iYqQu4=">AAACR3icdVDPT9swGHXKgC78WIEjF2vVpPZSOUW0cECq6IVjJ62A1JTKcb4Wq46T2Q6iCvnvuHDlxr/AhQPTtOPc0Ak2sSdZenrvff7sFySCa0PIg1Na+rC8slr+6K6tb2x+qmxtn+o4VQz6LBaxOg+oBsEl9A03As4TBTQKBJwF0+7cP7sCpXksv5lZAsOITiQfc0aNlUaVC9cPYMJlBt/TQsrd7iiL6HWOj7CfytDOgsmYD4nmIpa4m+OMqgkuItgXspb1auENq/dqrJ7XXR9k+HrZqFIljUOyv9ckmDS8/Var3ZoTj7QPPew1SIEqWqA3qtz7YczSCKRhgmo98Ehihnaj4UxA7vqphoSyKZ3AwFJJI9DDrOghx1+sEuJxrOyRBhfq24mMRlrPosAmI2ou9b/eXHzPG6RmfDDMuExSA5K9LBqnApsYz0vFIVfAjJhZQpni9q2YXVJFmbH1ubaEPz/F/yenzYZnm/narHaOF3WU0S76jGrIQ23UQSeoh/qIoVv0iJ7RD+fOeXJ+Or9eoiVnMbOD/kLJ+Q0OXbH/</latexit><latexit 
sha1_base64="MM40NlZJsOXbxp9nSyy8/iYqQu4=">AAACR3icdVDPT9swGHXKgC78WIEjF2vVpPZSOUW0cECq6IVjJ62A1JTKcb4Wq46T2Q6iCvnvuHDlxr/AhQPTtOPc0Ak2sSdZenrvff7sFySCa0PIg1Na+rC8slr+6K6tb2x+qmxtn+o4VQz6LBaxOg+oBsEl9A03As4TBTQKBJwF0+7cP7sCpXksv5lZAsOITiQfc0aNlUaVC9cPYMJlBt/TQsrd7iiL6HWOj7CfytDOgsmYD4nmIpa4m+OMqgkuItgXspb1auENq/dqrJ7XXR9k+HrZqFIljUOyv9ckmDS8/Var3ZoTj7QPPew1SIEqWqA3qtz7YczSCKRhgmo98Ehihnaj4UxA7vqphoSyKZ3AwFJJI9DDrOghx1+sEuJxrOyRBhfq24mMRlrPosAmI2ou9b/eXHzPG6RmfDDMuExSA5K9LBqnApsYz0vFIVfAjJhZQpni9q2YXVJFmbH1ubaEPz/F/yenzYZnm/narHaOF3WU0S76jGrIQ23UQSeoh/qIoVv0iJ7RD+fOeXJ+Or9eoiVnMbOD/kLJ+Q0OXbH/</latexit> = argmax c✏C ln P(c) nY i=1 P(fi|c) <latexit sha1_base64="fDhEjbdSV57hQuIFeozo0kmi2HU=">AAACUXicdVDLbhMxFL0ZXmXKI5QlG4sIqWwiT1CTdlGpajcsg0TaSpkw8njuJFY99mB7KiJ3fpFFWfEfbFiAcNIgHoIjWTo657588loK6yj93Ilu3b5z997W/Xj7wcNHj7tPdk6tbgzHCddSm/OcWZRC4cQJJ/G8NsiqXOJZfnGy8s8u0Vih1Vu3rHFWsbkSpeDMBSnrLuI0x7lQHt83a6mNySFJG1WEJnSep1hbIbUiJy3xzMxJxT60hKRS+fEuf9mmtdFF5sVh0r7zqg1imYmrYMQpquLX2Kzbo/0DuvdqQAntJ3vD4Wi4IgkdHSQk6dM1erDBOOtep4XmTYXKccmsnSa0drNwghNcYhjfWKwZv2BznAaqWIV25teJtORFUApSahOecmSt/t7hWWXtsspDZcXcwv7trcR/edPGlfszL1TdOFT8ZlHZSOI0WcVLCmGQO7kMhHEjwq2EL5hh3IU84xDCz5+S/5PTQT8JybwZ9I6ON3FswTN4DruQwAiO4DWMYQIcPsIX+AbfO586XyOIopvSqLPpeQp/INr+AaJxtPk=</latexit><latexit sha1_base64="fDhEjbdSV57hQuIFeozo0kmi2HU=">AAACUXicdVDLbhMxFL0ZXmXKI5QlG4sIqWwiT1CTdlGpajcsg0TaSpkw8njuJFY99mB7KiJ3fpFFWfEfbFiAcNIgHoIjWTo657588loK6yj93Ilu3b5z997W/Xj7wcNHj7tPdk6tbgzHCddSm/OcWZRC4cQJJ/G8NsiqXOJZfnGy8s8u0Vih1Vu3rHFWsbkSpeDMBSnrLuI0x7lQHt83a6mNySFJG1WEJnSep1hbIbUiJy3xzMxJxT60hKRS+fEuf9mmtdFF5sVh0r7zqg1imYmrYMQpquLX2Kzbo/0DuvdqQAntJ3vD4Wi4IgkdHSQk6dM1erDBOOtep4XmTYXKccmsnSa0drNwghNcYhjfWKwZv2BznAaqWIV25teJtORFUApSahOecmSt/t7hWWXtsspDZcXcwv7trcR/edPGlfszL1TdOFT8ZlHZSOI0WcVLCmGQO7kMhHEjwq2EL5hh3IU84xDCz5+S/5PTQT8JybwZ9I6ON3FswTN4DruQwAiO4DWMYQIcPsIX+AbfO586XyOIopvSqLPpeQp/INr+AaJxtPk=</latexit><latexit 
 $= \arg\max_{c \in C} \ln \big( P(f_1, f_2, \ldots, f_n \mid c)\, P(c) \big)$
 $= \arg\max_{c \in C} \Big( \ln P(c) + \sum_{i=1}^{n} \ln P(f_i \mid c) \Big)$
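The logarithmic form matters in practice because a product of many small probabilities underflows in floating point, while a sum of logarithms stays finite. A minimal sketch of that effect; the word count and the probability value are made-up numbers chosen only for illustration:

```swift
import Foundation

// Hypothetical input: 500 words, each with P(f_i | c) = 0.001.
let wordProbabilities = [Double](repeating: 0.001, count: 500)

// Multiplying raw probabilities underflows: the true value (1e-1500) is far
// below the smallest positive Double, so the product collapses to 0.
let product = wordProbabilities.reduce(1.0) { $0 * $1 }

// Summing logarithms keeps the score finite and comparable between labels.
let logScore = wordProbabilities.reduce(0.0) { $0 + log($1) }

print(product)   // 0.0
print(logScore)  // roughly -3453.88
```

Since ln is monotonic, the label that maximizes the log score is the same one that maximizes the original product.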
  • 33. Naive Bayes classifier $P(f_i \mid c_j) = \frac{\mathrm{count}(f_i, c_j)}{\sum_{k=1}^{q} \mathrm{count}(f_k, c_j)}$
 Laplace smoothing: $P(f_i \mid c_j) = \frac{\mathrm{count}(f_i, c_j) + z}{\sum_{k=1}^{q} \mathrm{count}(f_k, c_j) + zq}$
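The smoothed estimate can be sketched as a small Swift function. This assumes the usual reading of the formula, with q as the number of distinct words in the vocabulary and z as the smoothing constant; the function and parameter names are hypothetical, not taken from the talk's code:

```swift
/// Laplace-smoothed estimate of P(word | label).
/// - count: occurrences of the word in the label's training texts
/// - totalCount: total word occurrences in that label (the sum in the denominator)
/// - q: vocabulary size (assumed meaning of q in the formula)
/// - z: smoothing constant (assumed meaning of z; 1 is a common choice)
func smoothedProbability(count: Int, totalCount: Int, vocabularySize q: Int, z: Double = 1.0) -> Double {
    return (Double(count) + z) / (Double(totalCount) + z * Double(q))
}

// A word never seen in this label still gets a small non-zero probability,
// so a single unknown word no longer forces the whole score to ln(0).
let unseen = smoothedProbability(count: 0, totalCount: 10, vocabularySize: 5)  // 1/15
let seen = smoothedProbability(count: 3, totalCount: 10, vocabularySize: 5)    // 4/15
```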
  • 34. Naive Bayes classifier Building model:
 typealias Model = [String: [String: Int]]
 var model: Model = [
     "uk": [ "Вітаю": 1, "вас": 2, ... ],
     "en": [ "Hello": 1, "dear": 2, ... ]
     ...
 ]
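  • 35. Naive Bayes classifier Building model:
 // Count how often each preprocessed word occurs in the training texts of each label.
 for label in labels {
     for text in trainTextsForLabel[label] ?? [] {
         let words = preprocessor.preprocess(text: text)
         for word in words {
             // Dictionary subscripts return optionals; `default:` creates missing entries.
             model[label, default: [:]][word, default: 0] += 1
         }
     }
 }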
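A self-contained version of the counting loop might look like the sketch below. The `preprocess` function is a hypothetical stand-in for the talk's preprocessor (it only lowercases and splits on non-letters), and the training texts are toy data:

```swift
typealias Model = [String: [String: Int]]

// Hypothetical stand-in for the preprocessor: lowercase, then split on non-letters.
func preprocess(_ text: String) -> [String] {
    return text.lowercased()
        .split(whereSeparator: { !$0.isLetter })
        .map(String.init)
}

// Builds per-label word counts from a dictionary of label -> training texts.
func train(on trainTextsForLabel: [String: [String]]) -> Model {
    var model: Model = [:]
    for (label, texts) in trainTextsForLabel {
        for text in texts {
            for word in preprocess(text) {
                model[label, default: [:]][word, default: 0] += 1
            }
        }
    }
    return model
}

let model = train(on: [
    "en": ["Hello dear friend", "Hello again"],
    "uk": ["Вітаю вас"],
])
// model["en"]?["hello"] is 2, model["uk"]?["вітаю"] is 1
```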
  • 36. Naive Bayes classifier Predicting label of text:
 Input: «Зателефонуйте нам на +38 (012) 345-67-89» (Ukrainian: "Call us at +38 (012) 345-67-89")
 1. Preprocess the text: «Зателефонуйте нам на»
 2. Split it into words: ["Зателефонуйте", "нам", "на"]
 3. Calculate the probability of each word for each label: ["uk": ["Зателефонуйте": 0.84, "нам": 0.1, "на": 0.1], "ru": ["Зателефонуйте": 0.0, "нам": 0.1, "на": 0.1], ...]
  • 37. Naive Bayes classifier Predicting label of text:
 4. Calculate the score (log-probability) of each label: [ "uk": -180.3, "ru": -234.5, "en": -2004.3, ... ]
 5. Return the label with the max score: "uk"
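The prediction steps can be sketched in Swift as follows. This is an illustration under stated assumptions rather than the talk's implementation: the prior P(c) is taken to be uniform (the word-count model alone does not store document counts), Laplace smoothing uses z = 1, and the function names and the toy model are hypothetical:

```swift
import Foundation

typealias Model = [String: [String: Int]]

/// Returns ln P(c) + sum of ln P(f_i | c) for every label, with Laplace smoothing.
func scores(for words: [String], model: Model, z: Double = 1.0) -> [String: Double] {
    // q: number of distinct words across all labels.
    let vocabulary = Set(model.values.flatMap { $0.keys })
    let q = Double(vocabulary.count)
    // Assumption: uniform prior over labels.
    let logPrior = -log(Double(model.count))

    var result: [String: Double] = [:]
    for (label, counts) in model {
        let total = Double(counts.values.reduce(0, +))
        var score = logPrior
        for word in words {
            let count = Double(counts[word, default: 0])
            score += log((count + z) / (total + z * q))
        }
        result[label] = score
    }
    return result
}

/// Picks the label with the highest log score.
func predictLabel(for words: [String], model: Model) -> String? {
    return scores(for: words, model: model).max { $0.value < $1.value }?.key
}

// Toy model with made-up counts.
let model: Model = [
    "uk": ["зателефонуйте": 3, "нам": 2, "на": 2],
    "ru": ["позвоните": 3, "нам": 2, "на": 2],
]
let label = predictLabel(for: ["зателефонуйте", "нам", "на"], model: model)
// label is "uk": only the Ukrainian counts contain "зателефонуйте"
```

The scores are negative because they are logarithms of probabilities below 1; only their relative order matters.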
  • 38. Naive Bayes classifier Cross Validation:
 func testCrossvalidate() {
     // GIVEN
     let dataset = self.testDatasets.testDataset
     // WHEN
     let results = NaiveBayesClassifier.crossValidate(on: dataset, with: TrivialPreprocessor())
     // THEN
     // Accuracy never exceeds 1.0, so this assertion always fails;
     // the failure message reports the measured value: 0.9782382220164371
     XCTAssertGreaterThan(results.accuracy, 1.0)
 }
 Accuracy:
 NL Language Recognizer | 80.2% 👎
 Core ML                | 96.6% 👍
 Naive Bayes            | 97.8% 👍
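The slide does not show how `crossValidate` works internally. A common scheme is k-fold cross-validation, sketched below under that assumption. The `Sample` type, the function signature, and the toy majority-vote classifier are all hypothetical and only serve to make the example runnable:

```swift
/// A labeled sample (hypothetical type for this sketch).
struct Sample {
    let text: String
    let label: String
}

/// k-fold cross-validation: each fold is held out once for testing while
/// the remaining samples are used for training. Returns the overall accuracy.
func crossValidate(
    _ samples: [Sample],
    folds k: Int,
    train: ([Sample]) -> (String) -> String
) -> Double {
    precondition(k > 1 && samples.count >= k, "need k > 1 and at least k samples")
    var correct = 0
    for fold in 0..<k {
        // Sample i belongs to fold (i % k).
        let testSet = samples.enumerated().filter { $0.offset % k == fold }.map { $0.element }
        let trainSet = samples.enumerated().filter { $0.offset % k != fold }.map { $0.element }
        let predict = train(trainSet)
        correct += testSet.filter { predict($0.text) == $0.label }.count
    }
    return Double(correct) / Double(samples.count)
}

// Toy "classifier": always predicts the most frequent training label.
func majorityClassifier(_ trainSet: [Sample]) -> (String) -> String {
    var counts: [String: Int] = [:]
    for sample in trainSet { counts[sample.label, default: 0] += 1 }
    let majority = counts.max { $0.value < $1.value }?.key ?? ""
    return { _ in majority }
}

let samples = [
    Sample(text: "Hello", label: "en"),
    Sample(text: "Hi there", label: "en"),
    Sample(text: "Good morning", label: "en"),
    Sample(text: "Привіт", label: "uk"),
    Sample(text: "See you", label: "en"),
    Sample(text: "Thanks", label: "en"),
]
let accuracy = crossValidate(samples, folds: 3, train: majorityClassifier)
// 5 of 6 samples are predicted correctly; only the single "uk" sample is missed
```

Because every sample is tested exactly once by a model that never saw it during training, the resulting accuracy is a fairer estimate than one measured on the training data.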
  • 39. Naive Bayes + FlatBuffers Schema File → FlatBuffers schema compiler → C++ File → Objective-C Wrapper → Framework
  • 40. Naive Bayes + FlatBuffers
 var model: Model = [
     "uk": [ "Вітаю": 1, "вас": 2, ... ],
     "en": [ "Hello": 1, "dear": 2, ... ]
     ...
 ]
 schema.fbs:
 namespace flatcollections;
 table StringIntDictionary {
     entries:[StringIntDictionaryEntry];
 }
 table StringIntDictionaryEntry {
     key:string (key);
     value:int64;
 }
 root_type StringIntDictionary;
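In the schema, the `(key)` attribute marks the field by which entries can be sorted and then found with a binary search instead of a hash table. The sketch below illustrates that lookup idea in plain Swift. It does not use the FlatBuffers library; the `Entry` type and both functions are hypothetical, and keys are compared byte by byte as an assumption about how serialized string keys are ordered:

```swift
/// One dictionary entry, mirroring StringIntDictionaryEntry from the schema.
struct Entry {
    let key: String
    let value: Int64
}

/// Byte-wise ordering of UTF-8 keys (assumed ordering for this sketch).
func precedes(_ lhs: String, _ rhs: String) -> Bool {
    return lhs.utf8.lexicographicallyPrecedes(rhs.utf8)
}

/// Binary search over entries that are already sorted by key.
func lookup(_ key: String, in sortedEntries: [Entry]) -> Int64? {
    var low = 0
    var high = sortedEntries.count - 1
    while low <= high {
        let mid = low + (high - low) / 2
        let candidate = sortedEntries[mid].key
        if candidate.utf8.elementsEqual(key.utf8) {
            return sortedEntries[mid].value
        } else if precedes(candidate, key) {
            low = mid + 1
        } else {
            high = mid - 1
        }
    }
    return nil
}

let entries = [
    Entry(key: "dear", value: 2),
    Entry(key: "Hello", value: 1),
    Entry(key: "world", value: 5),
].sorted { precedes($0.key, $1.key) }

let hello = lookup("Hello", in: entries)      // 1
let missing = lookup("missing", in: entries)  // nil
```

A sorted array needs no extra index structure, which suits a read-only model that is serialized once and only queried afterwards.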
  • 41. FlatBuffers: Create Dictionary
 #import "schema_generated.h"
 using namespace flatbuffers;
 using namespace flatcollections;

 @property (nonatomic, copy) NSDictionary<NSString *, NSNumber *> *dictionary;

 - (NSData *)serialize {
     // 1. Create a builder with a 10 MB initial buffer (the buffer itself is heap-allocated)
     FlatBufferBuilder builder(1024 * 1024 * 10);
     // 2. Iterate NSDictionary keys and values, converting them into
     //    flatcollections::StringIntDictionaryEntry structures
     std::vector<Offset<StringIntDictionaryEntry>> entries;
     for (NSString *key in self.dictionary.allKeys) {
         int64_t value = (int64_t)[self.dictionary objectForKey:key].integerValue;
         auto entry = CreateStringIntDictionaryEntryDirect(builder, key.UTF8String, value);
         entries.push_back(entry);
     }
     // 3. Create flatcollections::StringIntDictionary with entries sorted by key
     auto vector = builder.CreateVectorOfSortedTables(&entries);
     auto dictionary = CreateStringIntDictionary(builder, vector);
     // 4. Finish the buffer and return a copy of it as NSData
     builder.Finish(dictionary);
     NSData *data = [NSData dataWithBytes:builder.GetBufferPointer() length:builder.GetSize()];
     return data;
 }
  • 42. FlatBuffers: Using Dictionary
 #import "MMStringIntDictionary.h"
 #import "schema_generated.h"
 using namespace flatcollections;

 @interface MMStringIntDictionary ()
 @property (nonatomic, unsafe_unretained) const StringIntDictionary *dict;
 @end

 @implementation MMStringIntDictionary
 - (instancetype)initWithFileURL:(NSURL *)fileURL error:(NSError *__autoreleasing *)error {
     NSData *data = [NSData dataWithContentsOfURL:fileURL options:NSDataReadingMappedAlways error:error];
     if (nil == data) { return nil; }
     return [self initWithData:data];
 }
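The `NSDataReadingMappedAlways` option asks Foundation to memory-map the file instead of copying it into a heap buffer, so pages are loaded on demand as they are accessed. That is presumably what lets a large model stay within a tight memory budget. A minimal Swift sketch of the same read; the temporary file and its four-byte contents are placeholders for a real serialized model:

```swift
import Foundation

/// Reads a file as memory-mapped Data; `.alwaysMapped` is the Swift
/// spelling of NSDataReadingMappedAlways.
func loadMapped(_ url: URL) throws -> Data {
    return try Data(contentsOf: url, options: .alwaysMapped)
}

// Placeholder file standing in for the serialized model.
let fileURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("model-\(UUID().uuidString).bin")
let original = Data([0x01, 0x02, 0x03, 0x04])
try original.write(to: fileURL)

// The mapped Data behaves like any other Data value.
let mapped = try loadMapped(fileURL)
print(mapped.count)  // 4
```

Since FlatBuffers reads values directly from the serialized bytes without a separate parsing step, a mapped buffer can be queried as is.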
  • 43. Naive Bayes + FlatBuffers
 var model: Model = [
     "uk": [ "Вітаю": 1, "вас": 2, ... ],
     "en": [ "Hello": 1, "dear": 2, ... ]
     ...
 ]
 Before: typealias Model = [String: [String: Int]]
 After: typealias Model = [String: MMStringIntDictionary]
  • 44. Results
 Approach                  | Accuracy | Fits in 6 MB RAM | Overall
 NL Language Recognizer    | ❌ 80.2% | ✅               | 👎
 Core ML                   | ✅ 96.6% | ❌               | 👎
 Naive Bayes + FlatBuffers | ✅ 97.8% | ✅               | 👍
  • 45. Core ML
 Pros:
 • Dramatically simple
 • Reliable
 • Fast
 Cons:
 • No flexibility
 • Limited ML tasks/algorithms
  • 46. Machine learning • Not rocket science in 2019 • A great competitive advantage • A must-have skill for software engineers in the future
  • 47. Thanks Viacheslav Volodko killobatt@gmail.com t.me/killobatt
 Attributions:
 1. Create ML Docs: https://developer.apple.com/documentation/createml/creating_a_text_classifier_model
 2. Naive Bayes Classifier: https://habr.com/ru/post/184574/
 3. FlatBuffers: https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html
 Code samples: github.com/killobatt/TextClassification