Applications of the PMP. Cell Formation in Group Technology (SSA KPI)
AACIMP 2010 Summer School lecture by Dmitry Krushinsky. "Applied Mathematics" stream. "The p-Median Problem and Its Applications" course. Part 5.
More info at http://summerschool.ssa.org.ua
Probabilistic Matrix Factorization (PMF)
Bayesian Probabilistic Matrix Factorization (BPMF) using
Markov Chain Monte Carlo (MCMC)
BPMF using MCMC – Overall Model
BPMF using MCMC – Gibbs Sampling
A lecture organized by the Barmej group @parmg_sa
https://www.meetup.com/parmg_sa/events/238339639/
In Riyadh, at the Badir incubator headquarters, on 20 Jumada al-Akhirah 1438 AH, corresponding to March 18, 2017.
The slides include a condensed explanation of Transformers and their advantages compared to CNNs and RNNs. The presentation begins with a brief explanation of Transformers. Then, the advantages and disadvantages of Transformers relative to CNNs and RNNs are discussed. The attention mechanism is presented next, followed by an illustration of the architecture described in the paper "Attention Is All You Need."
Paper Study: Melding the data decision pipeline (ChenYiHuang5)
Melding the data decision pipeline: Decision-Focused Learning for Combinatorial Optimization from AAAI2019.
I derived the math equations myself and arrived at the same results as the two cited CMU papers [Donti et al. 2017, Amos et al. 2017] by applying the same derivation procedure.
[GAN by Hung-yi Lee] Part 1: General introduction of GAN (NAVER Engineering)
Generative Adversarial Network and its Applications on Speech and Natural Language Processing, Part 1.
Speaker: Hung-yi Lee (Professor, National Taiwan University)
Date: July 2018
Generative adversarial network (GAN) is a new idea for training models, in which a generator and a discriminator compete against each other to improve generation quality. Recently, GAN has shown amazing results in image generation, and a wide variety of new ideas, techniques, and applications have been developed based on it. Although there are only a few successful cases so far, GAN has great potential to be applied to text and speech generation to overcome limitations of conventional methods.
In the first part of the talk, I will give an introduction to GAN and provide a thorough review of this technology. In the second part, I will focus on the applications of GAN to speech and natural language processing. I will demonstrate the applications of GAN to voice conversion, unsupervised abstractive summarization, and sentiment-controllable chat-bots. I will also talk about research directions towards unsupervised speech recognition with GAN.
"Stochastic Optimal Control and Reinforcement Learning", invited to speak at the Nonlinear Dynamic Systems class taught by Prof. Frank Chong-woo Park, Seoul National University, December 4, 2019.
GDC2019 - SEED - Towards Deep Generative Models in Game Development (Electronic Arts / DICE)
Deep learning is becoming ubiquitous in Machine Learning (ML) research, and it's also finding its place in industry-related applications. Specifically, deep generative models have proven incredibly useful at generating and remixing realistic content from scratch, making them a very appealing technology in the field of AI-enhanced content authoring. As part of this year's Machine Learning Tutorial at the Game Developers Conference 2019 (GDC), Jorge Del Val from SEED will cover in an accessible manner the fundamentals of deep generative modeling, including some common algorithms and architectures. He will also discuss applications to game development and explore some recent advances in the field.
Attendees will gain a basic understanding of the fundamentals of generative models and how to implement them, and will grasp potential applications in the field of game development to inspire their work and companies. This talk does not require a mathematical or machine learning background, although previous knowledge of either is beneficial.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The following experiments adjust the underlying vector primitives:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group ("MCG") expects demand and the evolution of supply to be shaped by institutional investment rotating out of offices and into work from home ("WFH"), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
While competitive headwinds remain, as represented by the recent second bankruptcy filing of Sungard, which blames "COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services", the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel data center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead Prasad and Procure.FYI's Co-Founder.
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). The hybrid approach, on the other hand, runs certain primitives (e.g., sumAt, multiply) in sequential mode.
1. 2016 Spring Intern
@ Treasure Data
2016/4/3 - 2016/6/17
Part 1: Field-Aware Factorization Machines
Part 2: Kernelized Passive-Aggressive
Part 3: ChangeFinder
2. whoami
Sotaro Sugimoto (杉本 宗太郎)
• U. Tokyo B.S. Physics (2016)
• Georgia Tech M.S. Computational Science & Engineering (2016-2018)
• https://github.com/L3Sota
Facebook (Look for the dog)
3. What will this talk be about?
• Model-based Predictors
• “Reading the future”
• Estimating the value of an important variable
• Determining whether or not some action will occur
• Statistical Anomaly Detection
• The computer monitors a resource and tells us when “something unnatural”
happens
4. Part 1: Field-Aware Factorization Machines
• What we want to achieve
• SVM to FFM and everything in between
• What’s a Field?
• Pros and Cons
5. FFM: what we want to achieve
• Prediction: Data goes in, predictions come out
• CTR
• Shopping recommendations
ŷ = 𝜙(𝒙)   (ŷ: prediction result; 𝜙: prediction function; 𝒙: input vector)
• Regression & Classification
• Regression: Results are real-valued (𝑦, ŷ ∈ ℝ)
• Classification: Results are binary (𝑦, ŷ ∈ {0,1} and 𝑦, ŷ ∈ {±1} are common)
6. Click-Through Rate (CTR) Prediction
• Will user X click my ad? What percentage of users will click my ad? ->
Find the probability that a target of an ad will click through.
Input:
• User ID
• Past ads clicked
• Past conversions made
• Mouse movements
• Favorite websites
Output:
• Whether or not a click-through will occur by user X during a particular session
• Classification
7. Shopping Recommendations
• Will user X buy this product? What products would this user like to see
next? -> Predict the rating that the user would give to unseen items.
Input:
• User ID
• Past items looked at
• Past items bought
• Past items rated
• Mouse movements
• Favorite product categories
Output:
• Expected ratings for each item (i.e. a list of recommended items when ordered by
rating from highest to lowest)
• Regression
• This is not to say that you can’t make a similar classification problem
10. FM’s Roots
• FM is a generalized model.
The point of FM was to combine Linear Classification…
• Support Vector Machines (SVM)
…with Matrix-based Approaches.
• Singular Value Decomposition (SVD)
• Matrix Factorization (MF)
11. Support Vector Machines
• Classification
1. Find a plane splitting category 1 from category 2 (H2, H3)
2. Maximize the distance from both categories (H3)
3. New data can be classified with this plane
Image from Wikipedia: https://commons.wikimedia.org/wiki/File:Svm_separating_hyperplanes_(SVG).svg
12. Support Vector Machines
• Calculation specifics
• Plane is denoted by a vector 𝒘 (the normal vector)
• The prediction function is given by 𝜙(𝒙) = ⟨𝒘, 𝒙⟩ − 𝑏.
• ⟨⋅,⋅⟩ is the inner product.
• When using a kernel, the function becomes 𝜙(𝒙) = Σᵢ 𝛼ᵢ 𝐾(𝒙, 𝒙ᵢ) − 𝑏
• e.g. d-dimensional polynomial kernel: 𝜙(𝒙) = Σᵢ 𝛼ᵢ(⟨𝒙, 𝒙ᵢ⟩ + 1)^𝑑 − 𝑏
• New data can be classified with 𝑠𝑔𝑛(⟨𝒘, 𝒙⟩ − 𝑏) ∈ {−1, +1}
Image originally from Wikipedia, modified: https://commons.wikimedia.org/wiki/File:Normal_vectors2.svg
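The classification step above can be sketched in a few lines; the hyperplane parameters w and b here are made up, standing in for a trained SVM:

```python
import numpy as np

def svm_classify(w, b, x):
    # New data is classified by which side of the plane it falls on:
    # sgn(<w, x> - b) in {-1, +1}.
    return 1 if w @ x - b >= 0 else -1

w, b = np.array([1.0, 1.0]), 0.5   # hypothetical trained parameters
print(svm_classify(w, b, np.array([1.0, 1.0])))    # 1
print(svm_classify(w, b, np.array([-1.0, -1.0])))  # -1
```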
13. FFM’s Roots
• FM is a generalized model.
The point of FM was to combine Linear Classification…
• Support Vector Machines (SVM)
…with Matrix-based Approaches.
• Singular Value Decomposition (SVD)
• Matrix Factorization (MF)
14. Matrix-based
approaches
The difference between SVD and MF (besides the diagonal matrix S) is that MF ignores zero entries in the matrix during factorization, which tends to improve performance.
Image from Qiita: http://qiita.com/wwacky/items/b402a1f3770bee2dd13c
16. Factorization Machines
• No easy geometric representation
• The prediction function is given by
𝜙(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ + Σᵢ₌₁ⁿ Σⱼ₌ᵢ₊₁ⁿ ⟨𝒗ᵢ, 𝒗ⱼ⟩ 𝑥ᵢ𝑥ⱼ.
• Interactions between components are implicitly modeled with factorized vectors
• For each 𝑥ᵢ, define a vector 𝒗ᵢ with 𝐹 < 𝑛 dimensions.
• ⟨𝒗ᵢ, 𝒗ⱼ⟩ is used instead of 𝑤ᵢ,ⱼ. Recall Poly2 is 𝜙₂(𝒙) = 𝜙₁(𝒙) + Σᵢ<ⱼⁿ 𝑤ᵢ,ⱼ 𝑥ᵢ𝑥ⱼ.
• But wait…
• This is 𝑂(𝐹𝑛²)
18. Factorization Machines
Substitute in the previous calculations:
𝜙(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ + ½ Σₖ₌₁^𝐹 [ (Σᵢ₌₁ⁿ 𝑣ᵢ,ₖ 𝑥ᵢ)² − Σᵢ₌₁ⁿ 𝑣ᵢ,ₖ² 𝑥ᵢ² ]
Works wonders on sparse data!
• Factorization allows implicit interaction modeling, i.e. we can infer interaction strengths from similar data
• Factorization vectors only depend on one data point, so calculations are 𝑂(𝐹𝑛).
• In fact, with a sparse representation the complexity is 𝑶(𝑭𝒎), where 𝑚 is the average number of non-zero components.
But wait…
• Not as useful for dense data (use SVM for dense data classifications)
𝜙(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ + Σᵢ₌₁ⁿ Σⱼ₌ᵢ₊₁ⁿ ⟨𝒗ᵢ, 𝒗ⱼ⟩ 𝑥ᵢ𝑥ⱼ
(term-by-term costs: 𝑂(1), 𝑂(𝑛), 𝑂(𝐹𝑛))
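As a quick sanity check on the factorization trick, here is a minimal NumPy sketch (not Hivemall's implementation; all weights are random, for illustration only) comparing the naive O(Fn²) double loop against the O(Fn) factorized form:

```python
import numpy as np

def fm_predict_naive(w0, w, V, x):
    # Direct double loop over feature pairs: O(F * n^2).
    out = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            out += (V[i] @ V[j]) * x[i] * x[j]
    return out

def fm_predict_fast(w0, w, V, x):
    # Factorized pairwise term:
    # 1/2 * sum_k [ (sum_i v_ik x_i)^2 - sum_i v_ik^2 x_i^2 ]  ->  O(F * n)
    s = V.T @ x                    # shape (F,): sum_i v_ik x_i
    s2 = (V ** 2).T @ (x ** 2)     # shape (F,): sum_i v_ik^2 x_i^2
    return w0 + w @ x + 0.5 * float(np.sum(s * s - s2))

rng = np.random.default_rng(0)
n, F = 6, 3                        # toy sizes
w0, w = 0.5, rng.normal(size=n)
V = rng.normal(size=(n, F))        # one F-dimensional v_i per feature
x = rng.normal(size=n)
print(abs(fm_predict_naive(w0, w, V, x) - fm_predict_fast(w0, w, V, x)) < 1e-9)
# True
```

Both functions compute the same 𝜙(𝒙); only the factorized form scales to large sparse inputs.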
19. Field-Aware Factorization Machines
• A more powerful FM
• The prediction function is given by
𝜙(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ + Σᵢ<ⱼⁿ ⟨𝒗ᵢ,β, 𝒗ⱼ,α⟩ 𝑥ᵢ𝑥ⱼ.
• Wait, what changed?
• There is an additional subscript on 𝒗, known as the field.
• Note: The constant and linear terms remain the same.
21. Field-Aware Factorization Machines (cont.)
𝜙(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ + Σᵢ<ⱼⁿ ⟨𝒗ᵢ,β, 𝒗ⱼ,α⟩ 𝑥ᵢ𝑥ⱼ
• We specify a 𝒗 based on the current feature 𝑖 of the input vector 𝒙 and the field β of the other feature 𝑗.
• In other words, for each pair of features (𝑖, 𝑗) we specify two vectors 𝒗: one where we use the field α of 𝑖 (i.e. 𝒗ⱼ,α), and another where we use the field β of 𝑗 (i.e. 𝒗ᵢ,β).
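A sketch of the FFM prediction function above in NumPy (illustrative only, not Hivemall's code; the sizes, field assignments, and weights are made up). V[i, f] is the latent vector 𝒗 of feature i used when it interacts with a feature from field f; note the pairwise term stays a double loop because it cannot be factorized as in FM:

```python
import numpy as np

def ffm_predict(w0, w, V, field, x):
    # w0: global bias; w: linear weights; field[j]: field id of feature j.
    out = w0 + w @ x
    n = len(x)
    # Field-aware pairwise interactions: feature i uses its vector for
    # j's field, and feature j uses its vector for i's field.
    for i in range(n):
        for j in range(i + 1, n):
            out += (V[i, field[j]] @ V[j, field[i]]) * x[i] * x[j]
    return out

rng = np.random.default_rng(1)
n, num_fields, F = 5, 3, 4          # toy sizes, chosen arbitrarily
field = [0, 1, 2, 2, 1]             # e.g. user / movie / genre / genre / price
w0, w = 0.0, rng.normal(size=n)
V = rng.normal(size=(n, num_fields, F))
x = rng.normal(size=n)
print(ffm_predict(w0, w, V, field, x))
```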
22. Worked Example: 1 Data Point
• Sotaro went to see Zootopia!
• I haven’t actually seen Zootopia yet.
• Let’s guess what his rating will be. -> Regression
Field | Abbrev. | Feature | Abbrev. | Value
Users | u | L3Sota | s | 1
Movies | m | Zootopia | z | 1
Genre | g | Comedy | c | 1
Genre | g | Drama | d | 1
Price | pp | Price | p | 1200
23. Linear Model
Field | Abbrev. | Feature | Abbrev. | Value
Users | u | L3Sota | s | 1
Movies | m | Zootopia | z | 1
Genre | g | Comedy | c | 1
Genre | g | Drama | d | 1
Price | pp | Price | p | 1200
𝜙₁(𝒙) = 𝑤₀ + 𝑤_s 𝑥_s + 𝑤_z 𝑥_z + 𝑤_c 𝑥_c + 𝑤_d 𝑥_d + 𝑤_p 𝑥_p
= 𝑤₀ + 1𝑤_s + 1𝑤_z + 1𝑤_c + 1𝑤_d + 1200𝑤_p
• A single vector is sufficient to hold all the weights.
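Plugging numbers in makes the linear model concrete; the weights below are hypothetical (only the feature values come from the table above):

```python
# Hypothetical learned weights; only the feature values (the 1s and the
# 1200-yen price) come from the worked example's table.
w0 = 0.1
w = {"s": 0.4, "z": 0.7, "c": 0.2, "d": -0.1, "p": -0.0002}
x = {"s": 1, "z": 1, "c": 1, "d": 1, "p": 1200}

# phi_1(x) = w0 + sum_i w_i x_i
phi1 = w0 + sum(w[f] * v for f, v in x.items())
print(phi1)  # 0.1 + 0.4 + 0.7 + 0.2 - 0.1 - 0.24 ≈ 1.06
```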
27. Pros and Cons: FFM
• Pros
• Higher prediction accuracy (i.e. the model is more expressive than FM)
• Cons
• 𝑂(𝐹𝑓𝑚) computation complexity (𝑓: number of fields)
𝜙(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ + Σᵢ<ⱼⁿ ⟨𝒗ᵢ,β, 𝒗ⱼ,α⟩ 𝑥ᵢ𝑥ⱼ
where 𝛽 is the field of 𝑗 and 𝛼 is the field of 𝑖
• Can’t split the inner product into two independent sums! -> Double loop
• FM was 𝑂(𝐹𝑚).
• Data structures need to understand the field of each component (feature) in
the input vector. -> More memory consumption
28. Status of FFM within Hivemall
• Pull request merged (#284)
• https://github.com/myui/hivemall/pull/284
• Will probably be in next release(?)
• train_ffm(array<string> x, double y[, const string options])
• Trains the internal FFM model using a (sparse) vector x and target y.
• Training uses Stochastic Gradient Descent (SGD).
• ffm_predict(m.model_id, m.model, data.features)
• Calculates a prediction from the given FFM model and data vector.
• The internal FFM model is referenced as ffm_model m
29. Part 2: Kernelized Passive-Aggressive
• What we want to achieve
• Quite Similar to SVM
• Pros and Cons
30. KPA: What we want to achieve
• Prediction: Same as FFM
• Regression & Classification: Same as FFM
• Passive-Aggressive uses a linear model -> similar to Support Vector Machines
31. Quite Similar to SVM
• SVM model is 𝜙_SVM(𝒙) = ⟨𝒘, 𝒙⟩ − 𝑏
• Passive-Aggressive model is 𝜙_PA(𝒙) = ⟨𝒘, 𝒙⟩ − 𝑏
• Additionally, PA uses a margin 𝜖, which has different
meanings for classification and regression.
What’s the difference?
• Passive-Aggressive models don’t update
their weights when a new data point is correctly
classified/a new data point is within the
regression range.
• PA is an online algorithm (real-time learning)
• SVM generally uses batch learning
Classification
Regression
Images and equations from slides at http://ttic.uchicago.edu/~shai/ppt/PassiveAggressive.ppt
32. But That’s Regular Passive-Aggressive
What’s Kernelized PA, then?
• Kernelization means instead of using 𝜙_PA(𝒙) = ⟨𝒘, 𝒙⟩ − 𝑏, we introduce a kernel function 𝐾(𝒙, 𝒙ᵢ) which increases the expressiveness of the algorithm, i.e. 𝜙_KPA(𝒙) = Σᵢ 𝛼ᵢ 𝐾(𝒙, 𝒙ᵢ).
• This is geometrically interpreted as mapping each data point into a corresponding point in a higher-dimensional space.
• In our case we used a polynomial kernel (of degree 𝑑 with constant 𝑐) which can be expressed as follows:
𝐾(𝒙, 𝒙ᵢ) = (⟨𝒙, 𝒙ᵢ⟩ + 𝑐)^𝑑
• E.g. when 𝑑 = 2, 𝐾(𝒙, 𝒙ᵢ) = ⟨𝒙, 𝒙ᵢ⟩² + 2𝑐⟨𝒙, 𝒙ᵢ⟩ + 𝑐²
• This gives us a model of higher degree, i.e. a model that has interactions between features!
• Note: The same methods can be used to make a Kernelized SVM too!
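The kernelized PA idea can be sketched as follows. This is an illustrative PA-I-style classifier written from the standard Passive-Aggressive update rule (not the Hivemall implementation); on XOR-like data, which no linear model can separate, the degree-2 polynomial kernel lets it fit:

```python
import numpy as np

def poly_kernel(a, b, c=1.0, d=2):
    # Polynomial kernel K(x, x_i) = (<x, x_i> + c)^d
    return (np.dot(a, b) + c) ** d

class KernelizedPA:
    """Sketch of a kernelized Passive-Aggressive binary classifier (PA-I)."""
    def __init__(self, C=1.0, kernel=poly_kernel):
        self.C, self.kernel = C, kernel
        self.alphas, self.sv = [], []   # support coefficients and vectors

    def decision(self, x):
        # phi_KPA(x) = sum_i alpha_i K(x, x_i)
        return sum(a * self.kernel(s, x) for a, s in zip(self.alphas, self.sv))

    def fit_one(self, x, y):            # online update, y in {-1, +1}
        # Passive: no update if the margin is already >= 1.
        loss = max(0.0, 1.0 - y * self.decision(x))
        if loss > 0.0:
            # Aggressive: add x as a support vector with coefficient tau*y.
            tau = min(self.C, loss / self.kernel(x, x))
            self.alphas.append(tau * y)
            self.sv.append(np.asarray(x, dtype=float))

# XOR-style labels: +1 when the coordinates agree, -1 when they differ.
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), 1)]
model = KernelizedPA()
for _ in range(10):                     # a few online passes
    for x, y in data:
        model.fit_one(np.array(x, float), y)
print([1 if model.decision(np.array(x, float)) >= 0 else -1 for x, _ in data])
# [1, -1, -1, 1]
```

The "passive" branch is what distinguishes PA from SVM training: correctly classified points inside the margin trigger no update at all.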
33.
Model | Regression? | Model Order | Categories | Model Equation
Linear Model | N | 1 | 1 | 𝜙₁(𝒙) = 𝑤₀ + Σᵢ₌₁ⁿ 𝑤ᵢ𝑥ᵢ
Poly2 Model | Y | 2 | 1 | 𝜙₂(𝒙) = 𝜙₁(𝒙) + Σᵢ<ⱼⁿ 𝑤ᵢ,ⱼ 𝑥ᵢ𝑥ⱼ
SVM | N | 1 | 1 | 𝜙_SVM(𝒙) = ⟨𝒘, 𝒙⟩ − 𝑏 = 𝜙₁(𝒙)
Kernelized SVM | N | n | 1 | 𝜙_K-SVM(𝒙) = Σᵢ₌₁ⁿ 𝛼ᵢ 𝐾(𝒙, 𝒙ᵢ) − 𝑏
SVD | Y | 2 | 2 | 𝜙_SVD(𝒙) = 𝜙₁(𝒙) + Σᵢ<ⱼⁿ Σₚ₁,ₚ₂ 𝑈ᵢ,ₚ₁ 𝑆ₚ₁,ₚ₂ 𝐼ₚ₂,ⱼ 𝑥ᵢ𝑥ⱼ
MF | Y | 2 | 2 | 𝜙_MF(𝒙) = 𝜙₁(𝒙) + Σᵢ<ⱼⁿ Σₚ 𝑈ᵢ,ₚ 𝐼ₚ,ⱼ 𝑥ᵢ𝑥ⱼ
FM | Y | n | n | 𝜙_FM(𝒙) = 𝜙₁(𝒙) + Σᵢ<ⱼ ⟨𝒗ᵢ, 𝒗ⱼ⟩ 𝑥ᵢ𝑥ⱼ
FFM | Y | 2 (n) | n | 𝜙_FFM(𝒙) = 𝜙₁(𝒙) + Σᵢ<ⱼ ⟨𝒗ᵢ,β, 𝒗ⱼ,α⟩ 𝑥ᵢ𝑥ⱼ
(Equation terms: 𝑤₀ is the global bias, 𝑤ᵢ𝑥ᵢ the item/user bias, and the final sum the pairwise interactions.)
35. Pros and Cons: KPA
• Pros
• A higher order model generally means better classification/regression results
• Cons
• A polynomial kernel of degree 𝑑 generally has a computational complexity of 𝑂(𝑛^𝑑)
• However, this can be avoided, especially where input is sparse!
36. Status of Kernelized Passive-Aggressive in
Hivemall
• KPA for classification is complete
• Also includes modified PA algorithms PA-I and PA-II in kernelized form
• i.e. KPA-I, KPA-II
• No pull request yet
• https://github.com/L3Sota/hivemall/tree/feature/kernelized_pa
• Didn’t get around to writing the pull request
• Code has been reviewed.
• Includes options for faster processing of the kernel, such as Kernel
Expansion and Polynomial Kernel with Inverted Indices (PKI)
• Don’t ask me why it’s not called PKII
37. Part 3: ChangeFinder
• What we want to achieve
• How ChangeFinder Works
• What ChangeFinder can and can’t do
40. ChangeFinder: what we want to achieve
• Anomaly/Change-Point Detection: Data goes in, anomalies come out
• What’s the difference? -> Lone outliers are detected as anomalies, and long-lasting/permanent changes in behavior are detected as change-points.
• Anomalies: Performance statistics (98th percentile response time, CPU usage)
go in; momentary dips in performance (anomalies) may be signs of network
or processing bottlenecks.
• Change-Points: Activity (port 135 traffic, SYN requests, credit card usage) goes
in; explosive increases in activity (change-points) may be signs of an attack
(virus, flood, identity theft).
41. How ChangeFinder Works
Anomaly Detection:
1. We assume the data follows a pattern and attempt to model it.
2. The current model 𝜃ₜ gives a probability distribution 𝑝(⋅ | 𝜃ₜ) for the next data point, i.e. the probability that 𝑥ₜ₊₁ ∈ [𝑎, 𝑏] is ∫ₐᵇ 𝑝(𝑥ₜ₊₁ | 𝜃ₜ) 𝑑𝑥.
3. Once the next datum arrives, we can calculate a score from the probability distribution:
𝑆𝑐𝑜𝑟𝑒(𝑥ₜ₊₁) = −log 𝑝(𝑥ₜ₊₁ | 𝜃ₜ)
4. If the score is greater than a preset threshold, an anomaly has been
detected.
42. How ChangeFinder Works
Change-Point Detection:
1. We assume the running mean of the anomaly scores
𝑦ₜ = (1/𝑊) Σᵢ₌₁^𝑊 𝑆𝑐𝑜𝑟𝑒(𝑥ₜ₋ᵢ)
follows a pattern and attempt to model it.
2. The current model 𝜙ₜ gives a probability distribution 𝑝(⋅ | 𝜙ₜ) for the next score, i.e. the probability that 𝑦ₜ₊₁ ∈ [𝑎, 𝑏] is ∫ₐᵇ 𝑝(𝑦ₜ₊₁ | 𝜙ₜ) 𝑑𝑦.
3. Once the next datum arrives, we can calculate a score from the probability distribution:
𝑆𝑐𝑜𝑟𝑒(𝑦ₜ₊₁) = −log 𝑝(𝑦ₜ₊₁ | 𝜙ₜ)
4. If the score is greater than a preset threshold, a change-point has been
detected.
43. How ChangeFinder Works
1. We assume an 𝑛-degree autoregressive model 𝜃ₜ = (𝝁, 𝐴ᵢ, 𝜺ₜ):
𝒙ₜ = 𝝁 + Σᵢ₌₁ⁿ 𝐴ᵢ(𝒙ₜ₋ᵢ − 𝝁) + 𝜺ₜ
• 𝝁: The average of the model
• 𝐴ᵢ: The model matrices, which determine how previous data affects the next data point
• 𝜺ₜ: A normally distributed error term following 𝒩(0, Σ)
AR model example graphs obtained from http://paulbourke.net/miscellaneous/ar/
44. How ChangeFinder Works
2. Given the parameters of the model, we calculate an estimate for the next data point:
𝒙̂ₜ = 𝝁̂ + Σᵢ₌₁ⁿ 𝐴̂ᵢ(𝒙ₜ₋ᵢ − 𝝁̂)
• Hats denote “statistically estimated value”
3. We then receive a new input 𝒙ₜ, and calculate the estimation error 𝒙ₜ − 𝒙̂ₜ. Assuming the model parameters are (mostly) correct, this expression evaluates to 𝜺ₜ, which we know is distributed according to 𝒩(0, Σ).
45. How ChangeFinder Works
4. We can therefore calculate the score as
𝑆𝑐𝑜𝑟𝑒(𝒙ₜ) = −log 𝑝(𝒙ₜ | 𝜃ₜ)
= −(1/𝑑) log[ (2π)^(−𝑑/2) |Σ|^(−1/2) exp( −½ (𝒙ₜ − 𝝁)ᵀ Σ⁻¹ (𝒙ₜ − 𝝁) ) ]
• Our estimate of the model is never perfect, so we should update the model parameters each time a new data point comes in!
• We also need to update the model parameters whenever we encounter a change-point, since the series has completely changed behavior.
5. After calculating the score for 𝒙ₜ, we assume that 𝒙ₜ follows the same time series and update our model parameter estimates 𝜃ₜ = (𝝁, 𝐴ᵢ, 𝜺ₜ).
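Steps 1–5 can be sketched in one dimension. This is a heavily simplified illustration, not the real algorithm: an online Gaussian with a forgetting factor stands in for the SDAR-fitted AR model, and each point is scored by its negative log-likelihood under the current model before the model is updated. The parameter names and the forgetting factor r are illustrative:

```python
import math

def anomaly_scores(xs, r=0.05):
    # Online model: mean/variance with discounting (forgetting factor r),
    # a crude stand-in for the SDAR-estimated AR model used by ChangeFinder.
    mu, var = 0.0, 1.0
    scores = []
    for x in xs:
        # Score(x_t) = -log p(x_t | current model), scored BEFORE updating.
        scores.append(0.5 * math.log(2 * math.pi * var)
                      + (x - mu) ** 2 / (2 * var))
        # Discounted updates of the model parameters.
        mu = (1 - r) * mu + r * x
        var = (1 - r) * var + r * (x - mu) ** 2
    return scores

series = [0.0] * 100
series[50] = 10.0                    # inject a lone outlier
scores = anomaly_scores(series)
print([i for i, s in enumerate(scores) if s > 5.0])  # [50]
```

ChangeFinder then runs the same machinery a second time over the running mean of these scores, which is what separates long-lasting change-points from lone outliers like the one above.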
46. What ChangeFinder can and can’t do
• ChangeFinder can detect anomalies and change-points.
• ChangeFinder can adapt to slowly changing data without sending
false positives.
• ChangeFinder can be adjusted to be more/less sensitive.
• Window size, Forgetfulness, Detection Threshold
• ChangeFinder can’t distinguish an infinitely large anomaly from a
change-point.
• ChangeFinder can’t detect small change-points.
• ChangeFinder can’t correctly detect anything at the beginning of the
dataset.
56. Status of ChangeFinder within Hivemall
• No pull request yet
• https://github.com/L3Sota/hivemall/tree/feature/cf_sdar_focused
• Mostly complete but some issues remain with detection accuracy, esp. at
higher dimensions
• cf_detect(array<double> x[, const string options])
• ChangeFinder expects input one data point (one vector) at a time, and
automatically learns from the data in the order provided while returning
detection results.
57. How was Interning?
• Educational
• Eclipse
• Maven
• Java
• Contributing to an existing project
• Inspiring
• Cool people doing cool stuff, and I get to join in
• Critical
• Next steps: Code more! Get more experience!
• Shifting from “doing what I’m told” to “thinking about what the next step is”
Editor's Notes
Order: output, input, function phi
Internally can be called regression (probability of clicking)
e.g. will the user click the item, will the user buy the item
To explain what FFM does, we need to explain what FM does.