The document proposes adding a middle layer of coverage patterns to the existing bipartite model of Adwords to better utilize ad space and reach more potential consumers. Coverage patterns are mined from query logs grouped by concept taxonomy to capture related tail keywords with low competition. Advertisers are matched to coverage patterns based on required impressions to exhaust budgets. Experiments on AOL query data show the approach with coverage patterns increases the number of advertisements per session and sessions per advertisement, indicating better use of ad space and more diverse viewers reached.
3. Introduction
Search engines have become the starting point of most web transactions.
This gives them a significantly large user base, making search engines an ideal avenue for businesses to reach potential consumers.
In the knowledge-sharing meta-model of search engines, advertising has become a major source of revenue for their sustenance.
According to IAB standards, sponsored search is the most dominant form of online advertising, covering almost 43% of the entire market.
4. Introduction: Sponsored Search
The model of search engine advertising is more popularly known as Adwords.
When a user queries a search engine, a list of search results and sponsored results (advertisements) is displayed.
Advertisers bid on search keywords and pay the search engine according to the Pay Per Click (PPC) model to display their advertisements on pages for queries containing the desired keywords.
5. Introduction: Problem Statement
Search keywords follow a long-tail frequency distribution, with:
- a small but fat head of highly frequent keywords
- a long but thin tail of less frequent keywords
During keyword auctions there is high competition for head keywords, while there is little to no competition for tail keywords.
This leads to underutilization of the ad space of a large number of tail keywords.
It also means neglecting a diverse set of potential consumers who could be captured by targeting tail keywords.
7. Background: Model of Adwords
The present model is considered as online bipartite graph matching, with advertisers as one disjoint set and incoming queries as the other.
When a new query comes in, it is matched to a set of advertisers.
The advertisers are then ranked and their ads are displayed in that ranked order.
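The bipartite flow above can be sketched as a tiny online loop. This is an illustration only: the "most remaining budget" tie-break and all names are assumptions, not the deck's actual ranking (real Adwords ranks by bid and ad quality).

```python
# Minimal sketch of the online bipartite step: each incoming query is
# matched to advertisers bidding on it, one winner is picked, and the
# winner's remaining budget decreases.
def serve(queries, advertisers):
    shown = []
    for q in queries:
        eligible = [name for name, info in advertisers.items()
                    if q in info["keywords"] and info["budget"] > 0]
        if eligible:
            # assumption: break ties toward the most remaining budget
            winner = max(eligible, key=lambda name: advertisers[name]["budget"])
            advertisers[winner]["budget"] -= 1
            shown.append((q, winner))
    return shown

ads = {"A": {"keywords": {"shoes"}, "budget": 2},
       "B": {"keywords": {"shoes", "boots"}, "budget": 1}}
print(serve(["shoes", "boots", "shoes", "shoes"], ads))
# [('shoes', 'A'), ('boots', 'B'), ('shoes', 'A')]
```

Note that the fourth query goes unserved once both budgets are exhausted, which is exactly the budget-constrained behavior the deck is concerned with.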
9. Background: Coverage Patterns - Central Idea
The basic idea of coverage patterns is inspired by the set cover problem in set theory.
Given a universe U and a family S of subsets of U, a cover is a subfamily C ⊆ S of sets whose union is U.
Using the same notion, coverage patterns aim to identify items that cover a certain percentage of the entire data.
A key point is that coverage patterns aim at identifying items that usually do "not" occur together, in contrast to frequent patterns, which identify items that occur together.
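For intuition, the classic greedy approximation to set cover looks like this (a minimal sketch; the universe and subsets are invented):

```python
# Greedy set cover: repeatedly pick the subset that covers the most
# still-uncovered elements of the universe U.
def greedy_set_cover(universe, subsets):
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            break  # remaining elements are not coverable
        cover.append(best)
        uncovered -= best
    return cover

U = range(1, 11)
S = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {6, 7, 8, 9, 10}, {1, 10}]
print(greedy_set_cover(U, S))
# [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]
```

Coverage patterns carry the same "cover the universe with few sets" intuition over to webpages and transactions.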
10. Background: Coverage Patterns - Notations
Let W be the set of webpages of a website, W = {w1, w2, …, wN}.
Let D be the set of transactions from the click-stream data, D = {T1, T2, …}, where each T ⊆ W.
X is defined as a pattern of webpages such that X ⊆ W, X = {wp, wq, …}.
Twi denotes the set of transactions containing the webpage wi, and its cardinality is denoted |Twi|.
13. Background: Coverage Patterns
A pattern is interesting if it has a high CS and a low OR.
A high CS value indicates a larger number of visitors, and a low OR value means less repetition amongst the visitors.
A pattern is said to be interesting if CS(X) > minCS, OR(X) < maxOR and RF(wi) > minRF for each wi in X.
14. Background: Coverage Patterns - Example
Dataset:
Assume minRF = 0.2, minCS = 0.3 and maxOR = 0.5.
|Ta| is 5, |Tb| is 7 and |Tf| is 1. So RF is 0.5 for a, 0.7 for b and 0.1 for f.
Since RF(f) = 0.1 < 0.2 (minRF), f is removed. RF(a) = 0.5 > 0.2 and RF(b) = 0.7 > 0.2, so a and b are retained.
{b, a} is a candidate pattern. (Items in a pattern are ordered by decreasing RF.)
The coverage set for {b, a} is {1,2,3,4,5,6,7,8,9,10} and |CSet{b,a}| is 10.
Hence, CS = 10/10 = 1 > 0.3 (minCS).
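The example numbers can be reproduced in code. The per-transaction contents below are assumptions (the dataset slide is an image); only the counts |Ta| = 5, |Tb| = 7, |Tf| = 1 and the thresholds come from the slide, and the OR definition follows the usual coverage-pattern formulation (overlap of the last item's transactions with the coverage of the preceding items):

```python
# Assumed transaction contents reproducing the slide's counts:
# b appears in 7 transactions, a in 5, f in 1 (10 transactions total).
T = {
    1: {'b'}, 2: {'b'}, 3: {'b'}, 4: {'b'}, 5: {'b'},
    6: {'a', 'b'}, 7: {'a', 'b', 'f'}, 8: {'a'}, 9: {'a'}, 10: {'a'},
}

def tids(item):  # transactions containing `item`
    return {t for t, items in T.items() if item in items}

def rf(item):    # relative frequency of a single item
    return len(tids(item)) / len(T)

def cs(pattern):  # coverage support: fraction of transactions covered
    return len(set().union(*(tids(i) for i in pattern))) / len(T)

def overlap_ratio(pattern):  # pattern ordered by decreasing RF
    covered = set().union(*(tids(i) for i in pattern[:-1]))
    last = tids(pattern[-1])
    return len(covered & last) / len(last)

print(rf('a'), rf('b'), rf('f'))   # 0.5 0.7 0.1
print(cs(['b', 'a']))              # 1.0 > minCS = 0.3
print(overlap_ratio(['b', 'a']))   # 0.4 < maxOR = 0.5
```

So under these assumed transactions {b, a} passes all three thresholds, matching the slide's conclusion.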
16. Proposed Approach: Basic Idea
Because of the nature of the distribution of search keywords, there is very little competition for tail keywords. As a result, there are few or no advertisers for such keywords.
We noticed that if we could combine such keywords into groups such that these keyword groups have a certain number of visitors, we could utilize the ad space of such keywords.
To perform the grouping of search keywords, we employ the notion of a query taxonomy to group semantically similar words.
These groups are then mined from the logs in the form of coverage patterns.
17. Proposed Model
We propose to add a middle layer of coverage patterns to the bipartite model of Adwords.
In the proposed model, incoming queries are first matched to a coverage pattern using the concept taxonomy.
The coverage pattern is then matched to a set of advertisers.
The advertisers are then ranked.
19. Architecture Comparison
Step: Modification with respect to the bipartite architecture
- Analyze Query: This step remains the same, except that the subconcept of the query is also retrieved.
- Retrieve Relevant Ads from the Matching: The advertisers who have been matched to the coverage pattern containing the subconcept of the query are retrieved. (The matching between coverage patterns and advertisers is explained later.)
- Bidding: Stays the same.
- Ranking Advertisers: Stays the same.
21. Coverage Pattern and Advertisers Matching
Coverage pattern and advertiser matching is the most important phase in the architecture.
It is divided into four steps:
a. Converting Query Logs to Concept Transactions
b. Extraction of Coverage Patterns
c. Estimation of the Number of Impressions for Advertisers
d. Matching Coverage Patterns and Advertisers
Each step is explained in the following slides.
22. Step 1: Converting Query Logs to Concept Transactions
One key point to note is that web query logs cannot be directly mined for coverage patterns because of the large vocabulary size (even when we only consider English).
To generalize the coverage pattern mining, we propose to use a three-level concept taxonomy to classify queries into a pair of concept and subconcept.
23. Step 1: Converting Query Logs to Concept Transactions (contd.)
Using the same taxonomy, we convert the web query logs into concept transactions using query classification techniques.
To define a transaction, we consider a session boundary of 30 minutes in the query logs for each user.
Sample Sessions
Converted Concept Transactions
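The 30-minute session boundary can be sketched as follows. The (user, timestamp, concept) record format is an assumption for illustration; only the 30-minute boundary comes from the slide.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Split each user's query stream into sessions using a 30-minute
# inactivity boundary; each session becomes one concept transaction
# (a set of concepts).
def sessionize(log, gap=timedelta(minutes=30)):
    by_user = defaultdict(list)
    for user, ts, concept in sorted(log):
        by_user[user].append((ts, concept))
    transactions = []
    for events in by_user.values():
        current, last_ts = set(), None
        for ts, concept in events:
            if last_ts is not None and ts - last_ts > gap:
                transactions.append(current)  # gap exceeded: close session
                current = set()
            current.add(concept)
            last_ts = ts
        if current:
            transactions.append(current)
    return transactions

log = [
    ("u1", datetime(2006, 3, 1, 10, 0), "Biology"),
    ("u1", datetime(2006, 3, 1, 10, 10), "Chemistry"),
    ("u1", datetime(2006, 3, 1, 11, 0), "Physics"),  # 50 min gap: new session
    ("u2", datetime(2006, 3, 1, 9, 0), "Technology"),
]
print(sessionize(log))
```

Here u1's third query falls outside the 30-minute window, so it opens a new transaction; the result is three concept transactions.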
24. Step 2: Extraction of Coverage Patterns
● Coverage patterns are extracted from the converted concept transactions.
● One key point to note is that coverage patterns mine unique visitors, while the standard models of advertising are based on either impressions or clicks.
● So, we convert a coverage pattern's coverage into a number of impressions as follows:
● For the above example, we consider the concept of Science, with Agriculture, Biology, Chemistry, Environment, Physics and Technology as its subconcepts.
● The transaction size is also assumed to be 1000.
● NOTE: We also rank the coverage patterns in ascending order of their CS - OR parameter.
25. Step 3: Estimating Required Impressions for Advertisers
In Adwords, advertisers create an ad campaign for their website.
In an ad campaign, a daily budget and a bid are specified on the keywords that they choose to bid upon.
Using the CTR, bid and daily budget values, we calculate the number of impressions it will take to exhaust the budget of an advertiser using the following identity:
The table above shows details of nine advertisers who bid upon the concept of Science.
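The identity itself appears as an image on the slide; under standard PPC accounting it can be reconstructed (as an assumption) like this: an impression costs bid × CTR in expectation, so exhausting a daily budget takes budget / (bid × CTR) impressions.

```python
# Reconstructed identity (assumption, not taken verbatim from the deck):
#     impressions = daily_budget / (bid * CTR)
# since each impression costs bid * CTR in expectation under PPC.
def required_impressions(daily_budget, bid, ctr):
    return daily_budget / (bid * ctr)

# Hypothetical advertiser: $100/day budget, $0.50 bid, 2% CTR.
print(round(required_impressions(100.0, 0.50, 0.02)))  # 10000
```

This puts advertisers in the same unit (impressions) as the coverage of a pattern, which is what the matching step below relies on.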
26. Step 4: Matching Advertisers to Coverage Patterns
With coverage patterns and advertisers in the same unit (number of impressions), we can create a matching between the two.
The matching can be termed a MANY-TO-ONE matching between coverage patterns and advertisers because a coverage pattern covers multiple keywords from different nodes in the taxonomy.
The matching algorithm has a relaxation parameter ε to perform faster.
The algorithm loops over coverage patterns and then advertisers, and a coverage pattern is allocated to an advertiser if the following condition is satisfied:
Ad.Impressions - CP.Coverage < ε × Ad.Impressions
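The allocation loop might be sketched as below. The field names, the one-pattern-per-advertiser bookkeeping, and the input ordering are assumptions; the deck specifies only the acceptance condition.

```python
# Sketch of the many-to-one allocation loop: a pattern is given to an
# advertiser when its coverage (in impressions) is within the eps
# relaxation of the advertiser's required impressions:
#     Ad.Impressions - CP.Coverage < eps * Ad.Impressions
def match(patterns, advertisers, eps=0.1):
    allocation = {}
    free = list(advertisers)  # advertisers not yet allocated a pattern
    for cp in patterns:       # patterns assumed pre-sorted by CS - OR
        for ad in free:
            if ad["impressions"] - cp["coverage"] < eps * ad["impressions"]:
                allocation[cp["name"]] = ad["name"]
                free.remove(ad)
                break
    return allocation

patterns = [{"name": "CP1", "coverage": 9500}, {"name": "CP2", "coverage": 4000}]
ads = [{"name": "A1", "impressions": 10000}, {"name": "A2", "impressions": 4200}]
print(match(patterns, ads))  # {'CP1': 'A1', 'CP2': 'A2'}
```

A larger ε accepts looser matches, so the loop terminates after examining fewer advertisers per pattern, which is the "perform faster" trade-off mentioned above.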
28. Experiments: Dataset
We performed a comparative study of the bipartite model of Adwords with and without the coverage patterns layer.
We used the AOL search query dataset to run the experiments.
We took the four most popular categories of queries to run our experiments.
Query Dataset for the four most popular categories
29. Experiments: Performance Metrics
1. Number of Advertisements per Session (AS), the ratio of the Sum of Unique Advertisements of all Sessions (SUAS) to the Number of Sessions with Advertisements (NSA), indicates the utilization of a session. A higher value of AS indicates better use of ad space.
AS = SUAS / NSA
2. An increase in diversity among the viewers of the advertisements was also observed. To indicate this, we compute Sessions per Advertisement (SA), the ratio of the Number of Advertisements of all Sessions (NAS) to the Number of Advertisements (NA). A higher value of this metric implies more unique eyeballs, increasing the chances of the advertisement being clicked by diverse users.
SA = NAS / NA
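The two metrics can be computed from session data as follows. The session sets are invented; the code follows the SUAS/NSA/NA definitions above, with NAS equal to SUAS when each session stores unique ads.

```python
# Computing AS and SA from session data (illustrative sessions).
# Each session is the set of unique advertisements shown in it.
def ad_metrics(sessions):
    with_ads = [s for s in sessions if s]    # sessions that showed ads
    suas = sum(len(s) for s in with_ads)     # unique ads summed per session
    nsa = len(with_ads)                      # sessions with advertisements
    na = len(set().union(*with_ads))         # distinct advertisements (NA)
    return suas / nsa, suas / na             # AS, SA

sessions = [{"ad1", "ad2"}, {"ad1"}, {"ad2", "ad3"}, {"ad1"}]
as_metric, sa_metric = ad_metrics(sessions)
print(as_metric, sa_metric)  # 1.5 2.0
```

With the coverage-patterns layer, more (session, ad) pairs are created from tail-keyword sessions, which pushes both AS and SA up, as the result graphs show.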
30. Experiments: Results - Utilization of Ad Space
Graphs show a comparison of the bipartite Adwords system with and without the Coverage Patterns layer.
31. Experiments: Results - Diversity
Graphs show a comparison of the bipartite Adwords system with and without the Coverage Patterns layer.
33. Related Work
Most works on Adwords target algorithms to optimize different aspects of the system, including revenue, welfare and the display of ads.
Another aspect targeted in Adwords is bidding scenarios. Several studies have touched upon how to increase revenue in a dynamic bidding scenario when only partial information about the system is available.
In this paper, we have proposed an architectural solution to use the ad space of the search keywords. Bidding strategies and budget optimization can be placed on top of it.
35. Conclusions and Future Work
In this paper, an architectural solution is proposed for Adwords to utilize the ad space of tail keywords. The proposed approach also shows considerable improvement with respect to diversity in the reach of advertisements.
We plan to investigate the coverage patterns approach with respect to different taxonomies. We believe a hybrid taxonomy would be best for the Adwords architecture.
We also plan to expand the boundaries of user exploration in searching beyond search sessions. We plan to extract user goals and model the transactions from them.