This document discusses online advertising and techniques for fitting large-scale models to advertising data. It outlines batch and online algorithms for logistic regression, including parallelizing existing batch algorithms and stochastic gradient descent. The document also discusses using alternating direction method of multipliers and follow the proximal regularized leader to fit models to large datasets across multiple machines. It provides examples of how major companies like LinkedIn and Facebook implement hybrid online-batch algorithms at large scale.
A review of the paper “Ad Click Prediction: a View from the Trenches”
The paper discusses predicting ad click-through rates (CTR), a massive-scale learning problem central to the multi-billion-dollar online advertising industry.
Presented by Mazen & Arzam in the Data Intensive Computing class at KTH, Stockholm, Sweden.
Link of the paper: http://research.google.com/pubs/pub41159.html
2. Outline
● Introduction to Online Advertising
● Handling Real Data
– Data Engineering
– Model Matrix
– Enhancing the Computation Speed of R
● Fitting Models to Large-Scale Data
– Batch Algorithms – Parallelizing Existing Algorithms
– Online Algorithms – SGD, FTPRL and Learning Rate Schemes
● Display Advertising Challenge
8. Why Is Online Advertising Growing?
● Wide reach
● Target oriented
● Quick conversion
● Highly informative
● Cost-effective
● Easy to use
● Measurable
“Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” (attributed to John Wanamaker)
9. How Do We Measure Online Ads?
● User behavior on the internet is trackable.
– We know who watches the ad.
– We know who buys the product.
● We collect data for measurement.
11. Performance-based advertising
● Pricing Model
– Cost-Per-Mille (CPM)
– Cost-Per-Click (CPC)
– Cost-Per-Action (CPA) or Cost-Per-Order (CPO)
12. To Improve Profit
● Display the ad with the highest Click-Through Rate (CTR) × CPC, or Conversion Rate (CVR) × CPO
● Estimating the probability of a click (conversion) is the central problem
– Rule Based
– Statistical Modeling (Machine Learning)
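The pricing slide above can be made concrete with a tiny sketch (all numbers invented): rank the candidate ads by expected revenue per impression, CTR × CPC.

```python
# Toy example (all numbers invented): pick the ad with the highest
# expected revenue per impression, CTR * CPC.
ads = [
    {"id": "A", "ctr": 0.020, "cpc": 1.00},   # expected revenue 0.020
    {"id": "B", "ctr": 0.050, "cpc": 0.30},   # expected revenue 0.015
    {"id": "C", "ctr": 0.010, "cpc": 2.50},   # expected revenue 0.025
]
best = max(ads, key=lambda ad: ad["ctr"] * ad["cpc"])
print(best["id"])  # → C
```

Note that the low-CTR ad C still wins here because its CPC is high, which is exactly why CTR estimation alone is not enough.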
13. System
[Diagram: the website issues an ad request, the recommendation system answers with the ad to deliver, and a log server records the outcomes to feed model fitting, both batch and online.]
14. Rule Based
● Let the advertiser select the target group
15. Statistical Modeling
● We log the displays and collect the responses
● Features
– Ad
– Channel
– User
16. Features of Ad
● Ad type
– Text
– Figure
– Video
● Ad Content
– Fashion
– Health
– Game
19. Real Features
Zhang, Weinan; Yuan, Shuai; Wang, Jun; Shen, Xuehua. Real-Time Bidding Benchmarking with iPinYou Dataset.
20. Know How vs. Know Why
● We usually do not study the reasons behind a high CTR
● A small improvement in accuracy implies a large improvement in profit
● Predictive Analysis
21. Data
● School
– Static
– Cleaned
– Public
● Commercial
– Dynamic
– Contains errors
– Private
23. Data Engineering with R
http://wush978.github.io/REngineering/
● Automation of R Jobs
– Convert R script to command line application
– Learn modern tools such as jenkins
● Connections between multiple machines
– Learn ssh
● Logging
– Linux tools: bash redirection, tee
– R package: logging
● R Error Handling
– try, tryCatch
24. Characteristic of Data
● Rare Event
● Large Number of Categorical Features
– Binning Numerical Features
● Features are highly correlated
● Some features occur frequently, some occur rarely
25. Common Statistical Model for CTR
● Logistic Regression
● Gradient Boosted Regression Tree
– Check xgboost
26. Logistic Regression
P(Click | x) = 1 / (1 + e^(−w^T x)) = σ(w^T x)
● Linear relationship with the features
– Fast prediction
– (Relatively) fast fitting
● Usually fit the model with L2 regularization
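As a quick illustration of the formula (in Python rather than the deck's R, with made-up weights): when x is a sparse binary vector, w^T x reduces to summing the weights of the active features.

```python
# Sketch (made-up weights): logistic-regression CTR prediction when x is
# a sparse binary vector -- w^T x is just the sum of the active weights.
import math

def predict_ctr(w, active_idx):
    z = sum(w[i] for i in active_idx)      # w^T x for binary x
    return 1.0 / (1.0 + math.exp(-z))      # sigma(w^T x)

w = [0.4, -1.2, 0.1, 0.7]
print(round(predict_ctr(w, [0, 3]), 3))    # sigma(0.4 + 0.7) → 0.75
```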
27. How large is the data?
● Instances: 10^9
● Binary features: 10^5
28. Subsampling
● Sampling is useful for:
– Data exploration
– Code testing
● Sampling might harm the accuracy (profit)
– Rare event
– Some features occur frequently and some occur rarely
● We do not subsample data so far
29. Sampling
● Olivier Chapelle, et al. Simple and scalable response prediction for display advertising.
32. Dense Matrix
● 10^9 instances
● 10^5 binary features
● 10^14 elements for model matrix
● Size: 4 * 10^14 bytes
– 400 TB
● In-memory access is about 10^3 times faster than on-disk
33. R and Large Scale Data
● R cannot handle large-scale data
● R consumes a lot of memory
36. Sparse Matrix
● The number of non-zeros can be estimated from the number of categorical variables:
m ∼ 10^9, n ∼ 10^5, k ∼ 10^1 × 10^9
– Dense matrix: 4 × 10^14 bytes
– List (triplet): 12 × 10^9 bytes
– Compressed: 12 × 10^9 bytes, or 8 × 10^9 + 4 × 10^5 bytes
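The comparison can be reproduced at toy scale (tiny made-up dimensions, using scipy's CSR as the compressed layout): the sparse format stores only the non-zeros plus index arrays.

```python
# Tiny stand-in for the 1e9 x 1e5 case: a one-hot design matrix with 10
# active features per instance, stored dense vs. compressed (CSR).
import numpy as np
from scipy import sparse

m, n, per_row = 1000, 500, 10
rng = np.random.default_rng(1)
rows = np.repeat(np.arange(m), per_row)
cols = rng.integers(0, n, size=m * per_row)
X = sparse.csr_matrix(
    (np.ones(m * per_row, dtype=np.float32), (rows, cols)), shape=(m, n))

dense_bytes = m * n * 4                    # 4-byte floats, every element
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
print(dense_bytes, sparse_bytes)           # the sparse layout is far smaller
```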
37. Sparse Matrix
● Sparse matrix is useful for:
– Large amount of categorical data
– Text Analysis
– Tag Analysis
40. Advanced tips: package Rcpp
● C/C++ uses memory more efficiently
● Rcpp provides an easy interface between R and C/C++
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP XTv(S4 m, NumericVector v, NumericVector retval) {
  // ... compute X^T v from the sparse slots of m (m@i, m@p, m@x) into retval ...
  return retval;
}
41. Two Approaches to Fitting Logistic Regression to Large-Scale Data
● Batch Algorithm
– Optimize the log-likelihood globally
● Online Algorithm
– Optimize the loss function instance by instance
42. Batch Algorithm
Negative log-likelihood:
f(w | (x_1, y_1), …, (x_m, y_m)) = Σ_{t=1}^{m} [−y_t log σ(w^T x_t) − (1 − y_t) log(1 − σ(w^T x_t))]
Gradient descent:
w_{t+1} = w_t − η ∇f(w_t)
Each update requires scanning all the data.
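A minimal sketch of the batch update on synthetic data (Python, invented sizes and learning rate): every iteration scans all m instances to form the gradient.

```python
# Sketch of batch gradient descent for logistic regression on synthetic
# data; each step is one full pass over the data.
import numpy as np

rng = np.random.default_rng(0)
m, d = 500, 3
X = rng.normal(size=(m, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (1 / (1 + np.exp(-X @ w_true)) > rng.uniform(size=m)).astype(float)

def sigma(z):
    return 1 / (1 + np.exp(-z))

def nll(w):                               # negative log-likelihood f(w)
    z = X @ w
    return np.sum(np.logaddexp(0, z) - y * z)

w, eta = np.zeros(d), 1.0 / m
loss0 = nll(w)
for _ in range(200):
    grad = X.T @ (sigma(X @ w) - y)       # full pass over the data
    w -= eta * grad
print(nll(w) < loss0)                     # → True
```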
43. Parallelizing an Existing Batch Algorithm
Row-wise partition:
(X_1; X_2) v = (X_1 v; X_2 v)
(v_1^T v_2^T) (X_1; X_2) = v_1^T X_1 + v_2^T X_2
● We can split the data by instances across several machines
● The matrix-vector multiplications can then be parallelized
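The two identities above can be checked numerically; this sketch (invented sizes) splits X by rows into two hypothetical machines and recombines the per-block results.

```python
# Numerical check of the row-wise partition identities, with X split
# across two "machines".
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
v = rng.normal(size=4)     # right-multiplied vector
u = rng.normal(size=6)     # left-multiplied vector, split like the rows

X1, X2 = X[:3], X[3:]
u1, u2 = u[:3], u[3:]

Xv = np.concatenate([X1 @ v, X2 @ v])    # (X1; X2) v = (X1 v; X2 v)
XTu = X1.T @ u1 + X2.T @ u2              # (u1 u2)(X1; X2) = u1 X1 + u2 X2
print(np.allclose(Xv, X @ v), np.allclose(XTu, X.T @ u))  # → True True
```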
44. Frameworks for Parallelization
● Hadoop
– Slow for iterative algorithms
– Fault tolerant
– Good for many machines
● MPI
– Fast for iterative algorithms if the data fits in memory
– No fault tolerance
– Good for several machines
46. R Package: pbdMPI
● Easy to install (on ubuntu)
– sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev
– install.packages("pbdMPI")
● Easy to develop (compared to Rmpi)
47. R Package: pbdMPI
library(pbdMPI)
init()                                          # start the MPI communicator
.rank <- comm.rank()                            # this process's rank
filename <- sprintf("%d.csv", .rank)            # each rank reads its own shard
data <- read.csv(filename)
target <- reduce(sum(data$value), op = "sum")   # global sum, gathered on rank 0
finalize()
48. Parallelize Algorithm with pbdMPI
● Implement functions required for optimization with pbdMPI
– optim requires f and g (the gradient of f)
– nlminb requires f, g, and H (the Hessian of f)
– tron requires f, g, and Hs (H multiplied by a given vector s)
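A Python analogue of this recipe (scipy's minimize standing in for optim/nlminb/tron, with synthetic data): supply the objective f and its gradient g to the optimizer.

```python
# Fit L2-regularized logistic regression by handing f and its gradient g
# to an off-the-shelf optimizer; all data here are synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.uniform(size=200) < 0.5).astype(float)
lam = 1.0                                  # L2 regularization strength

def f(w):                                  # regularized negative log-likelihood
    z = X @ w
    return np.sum(np.logaddexp(0, z) - y * z) + 0.5 * lam * w @ w

def g(w):                                  # its gradient
    p = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (p - y) + lam * w

res = minimize(f, np.zeros(3), jac=g, method="L-BFGS-B")
print(res.success)  # → True
```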
49. Some Tips of Optimization
● Take care of the stopping criteria
– A relative threshold might be enough
● Save the coefficients during the iterations and print the values of f and g (e.g. with the <<- operator)
– You can then stop the iteration at any time
– Monitor the convergence
51. LinkedIn Way
Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013
● Too much data to fit on a single machine
– Billions of observations, millions of features
● A naive approach
– Partition the data and run logistic regression on each partition
– Take the mean of the learned coefficients
– Problem: not guaranteed to converge to the single-machine model!
● Alternating Direction Method of Multipliers (ADMM)
– Boyd et al. 2011 (based on earlier work from the 70s)
52. ADMM
For each node k, the data and the coefficients are different. The consensus problem is

minimize Σ_{k=1}^{K} f_k(w_k) + λ₂‖w‖₂²   subject to w_k = w ∀k

and ADMM iterates the updates

w_k^{t+1} = argmin_{w_k} f_k(w_k) + (ρ/2)‖w_k − w^t + u_k^t‖₂²
w^{t+1} = argmin_w λ₂‖w‖₂² + (ρ/2) Σ_{k=1}^{K} ‖w_k^{t+1} − w + u_k^t‖₂²
u_k^{t+1} = u_k^t + w_k^{t+1} − w^{t+1}
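A minimal numpy sketch of these consensus updates, with ridge regression in place of the logistic loss so that each local w_k-update has a closed form (with a logistic f_k the local step would call an inner solver); all data and hyperparameters are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
K, lam, rho = 4, 0.1, 1.0
Xs = [rng.normal(size=(50, 3)) for _ in range(K)]
ys = [Xk @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=50) for Xk in Xs]

w = np.zeros(3)
uk = [np.zeros(3) for _ in range(K)]
for _ in range(500):
    # local updates: argmin 0.5||X_k w_k - y_k||^2 + (rho/2)||w_k - w + u_k||^2
    wk = [np.linalg.solve(Xk.T @ Xk + rho * np.eye(3), Xk.T @ yk + rho * (w - u))
          for Xk, yk, u in zip(Xs, ys, uk)]
    # consensus update: argmin lam||w||^2 + (rho/2) sum_k ||w_k - w + u_k||^2
    w = rho * sum(a + u for a, u in zip(wk, uk)) / (2 * lam + K * rho)
    # dual updates
    uk = [u + a - w for u, a in zip(uk, wk)]

# single-machine ridge solution on the pooled data, for comparison
X_all = np.vstack(Xs)
y_all = np.concatenate(ys)
w_star = np.linalg.solve(X_all.T @ X_all + 2 * lam * np.eye(3), X_all.T @ y_all)
```

Unlike the naive averaging, the consensus iterate w converges to the pooled-data solution w_star.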
55. Our Remarks on ADMM
● ADMM reduces the communication between the nodes
● In our environment, the communication overhead is affordable
– So ADMM does not improve the performance of our system
56. Online Algorithm
Stochastic Gradient Descent (SGD):

f(w | y_t, x_t) = −y_t log(σ(wᵀx_t)) − (1−y_t) log(1−σ(wᵀx_t))
w_{t+1} = w_t − η ∇f(w_t | y_t, x_t)

● Choose an initial value and a learning rate
● Randomly shuffle the instances in the training set
● Scan the data and update the coefficients
– Repeat until an approximate minimum is obtained
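The loop above in a minimal numpy sketch on synthetic logistic data (learning rate and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
w_true = np.array([1.5, -2.0, 0.0, 1.0])
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

w = np.zeros(4)                       # initial value
eta = 0.1                             # learning rate
for epoch in range(5):
    order = rng.permutation(len(y))   # randomly shuffle the instances
    for t in order:                   # scan the data, one update per instance
        p = 1 / (1 + np.exp(-(X[t] @ w)))
        w -= eta * (p - y[t]) * X[t]  # gradient of the per-instance loss
```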
57. SGD to Follow The Proximal
Regularized Leader
H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent:
Equivalence Theorems and L1 Regularization. AISTATS 2011
w_{t+1} = w_t − η_t ∇f(w_t | y_t, x_t)
        = argmin_w ∇f(w_t | y_t, x_t)ᵀ w + (1/(2η_t)) (w − w_t)ᵀ(w − w_t)

Let g_t = ∇f(w_t | y_t, x_t) and g_{1:t} = Σ_{i=1}^{t} g_i. FTPRL instead solves

w_{t+1} = argmin_w g_{1:t}ᵀ w + t λ₁‖w‖₁ + (λ₂/2) Σ_{i=1}^{t} ‖w − w_i‖₂²
58. Regret of SGD
H. Brendan McMahan and Matthew Streeter. Adaptive Bound Optimization for
Online Convex Optimization. COLT 2010
Regret := Σ_{t=1}^{T} f_t(w_t) − min_w Σ_{t=1}^{T} f_t(w)

A global learning rate achieves the regret bound O(D M √T), where
D is the L2 diameter of the feasible set and
M is the L2 bound of g
59. Regret of SGD
H. Brendan McMahan and Matthew Streeter. Adaptive Bound Optimization for
Online Convex Optimization. COLT 2010
Per-coordinate learning rate:

η_{t,i} = α / (β + √(Σ_{s=1}^{t} g_{s,i}²))

achieves the regret bound O(√T · n^{(1−γ)/2}), where n is the dimension of w.
If w ∈ [−0.5, 0.5]ⁿ, then D = √n.
Assume P(x_{t,i} = 1) ∼ i^{−γ} for some γ ∈ [1, 2).
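Plugging this per-coordinate rate into the SGD loop gives an AdaGrad-style update; a short numpy sketch on synthetic data (α and β are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
w_true = np.array([1.0, -1.0, 2.0, 0.0])
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

alpha, beta = 0.5, 1.0
w = np.zeros(4)
g2 = np.zeros(4)                 # per-coordinate running sum of squared gradients
for t in rng.permutation(len(y)):
    p = 1 / (1 + np.exp(-(X[t] @ w)))
    g = (p - y[t]) * X[t]
    g2 += g ** 2
    w -= alpha / (beta + np.sqrt(g2)) * g   # eta_{t,i} = alpha / (beta + sqrt(sum g^2))
```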
60. Comparison of Learning Rate Schemes
Xinran He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook.
ADKDD 2014.
61. Google KDD 2013, FTPRL
H. Brendan McMahan, et al. Ad Click Prediction: a View from the Trenches. KDD
2013.
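The per-coordinate FTRL-Proximal update from the cited paper can be sketched as follows; this is a dense numpy rendition on synthetic data with illustrative hyperparameters, whereas the paper works per non-zero feature on sparse inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
w_true = np.array([2.0, -2.0, 0.0, 0.0, 1.0, 0.0])
y = (rng.uniform(size=2000) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

alpha, beta, l1, l2 = 0.1, 1.0, 1.0, 1.0
z = np.zeros(6)   # accumulated "adjusted" gradients
n = np.zeros(6)   # accumulated squared gradients
for t in range(len(y)):
    # lazy closed-form minimizer of the FTRL objective; l1 induces exact zeros
    w = np.where(np.abs(z) <= l1, 0.0,
                 -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2))
    p = 1 / (1 + np.exp(-(X[t] @ w)))
    g = (p - y[t]) * X[t]
    sigma = (np.sqrt(n + g ** 2) - np.sqrt(n)) / alpha
    z += g - sigma * w
    n += g ** 2
```

The per-coordinate rate lives in the (β + √n_i)/α term, and the λ₁ threshold on z_i is what produces sparse coefficients.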
62. Some Remarks on FTPRL
● FTPRL is a general optimization framework.
– We used it successfully to fit a neural network
● The per-coordinate learning rate greatly improves the convergence on our data
– SGD also works with a per-coordinate learning rate
● The "Proximal" part decreases the accuracy, but introduces sparsity
63. Implementation of FTPRL in R
● I am not aware of any existing implementation of online optimization in R
● The algorithm is simple: just write it with a for loop.
– The overhead of a loop is small in C/C++ compared to R
● I implemented the algorithm in https://github.com/wush978/BridgewellML/tree/r-pkg
– Call for users
– Contact me if you want to try it
65. Batch vs. Online
Olivier Chapelle, et al. Simple and Scalable Response Prediction for Display
Advertising.
● Batch algorithms
– Optimize the likelihood function to a high accuracy once they are in a good neighborhood of the optimal solution
– Quite slow in reaching that neighborhood
– Straightforward to generalize to a distributed environment
● Online algorithms (mini-batch)
– Optimize the likelihood to a rough precision quite fast
– Need only a handful of passes over the data
– Tricky to parallelize
66. Criteo Inc.: Hybrid of Online and Batch
● Each node makes one online pass over its local data using adaptive gradient updates.
● Average the local weights and use the average as the initial value for L-BFGS.
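A hypothetical single-process sketch of this hybrid scheme (synthetic data; scipy's L-BFGS-B stands in for the batch solver):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
w_true = np.array([1.0, -1.0, 2.0, 0.0])
y = (rng.uniform(size=600) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def online_pass(Xp, yp, alpha=0.5, beta=1.0):
    # one adaptive-gradient (AdaGrad-style) pass over a local partition
    w, g2 = np.zeros(Xp.shape[1]), np.zeros(Xp.shape[1])
    for t in range(len(yp)):
        p = 1 / (1 + np.exp(-(Xp[t] @ w)))
        g = (p - yp[t]) * Xp[t]
        g2 += g ** 2
        w -= alpha / (beta + np.sqrt(g2)) * g
    return w

parts = [(X[:200], y[:200]), (X[200:400], y[200:400]), (X[400:], y[400:])]
w0 = np.mean([online_pass(Xp, yp) for Xp, yp in parts], axis=0)  # averaged local weights

def f(w):
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z)

def g(w):
    p = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (p - y)

res = minimize(f, w0, jac=g, method="L-BFGS-B")  # batch refinement, warm-started
```

The online pass gets cheaply into a good neighborhood; L-BFGS then does the high-accuracy refinement that batch methods are good at.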
67. Facebook
Xinran He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook.
ADKDD 2014.
● Decision Tree (Batch) for Feature Transforms
● Logistic Regression (Online)
73. Display Advertising Challenge
● https://www.kaggle.com/c/criteo-display-ad-challenge
● 7 × 10^7 instances
● 13 integer features and 26 categorical features with about 3 × 10^7 levels
● We finished 9th out of 718 teams
– We fit a neural network (2-layer logistic regression) to the data with FTPRL and dropout
74. Dropout in SGD
Geoffrey E. Hinton, et al. Improving neural networks by preventing co-adaptation
of feature detectors. CoRR 2012
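One common way to combine dropout with an SGD loop is to drop input features at random during each update and fold the keep probability into the weights at prediction time; a hypothetical numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = (rng.uniform(size=1000) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

keep = 0.8                               # probability of keeping each feature
w = np.zeros(5)
for t in rng.permutation(len(y)):
    mask = (rng.uniform(size=5) < keep).astype(float)
    x = X[t] * mask                      # randomly drop features for this update
    p = 1 / (1 + np.exp(-(x @ w)))
    w -= 0.1 * (p - y[t]) * x

w_pred = keep * w                        # scale weights at prediction time
```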
75. Tools of Large-scale Model Fitting
● Almost all of the top-10 competitors implemented the algorithms themselves
– There is no dominant tool for large-scale model fitting
● The winner used only 20 GB of memory. See
https://github.com/guestwalk/kaggle-2014-criteo
● For a single machine, there are some good machine learning libraries
– LIBLINEAR for linear models (a student from the lab finished no. 1)
– xgboost for gradient boosted regression trees (the author finished no. 12)
– Vowpal Wabbit