Prognosis - An Approach
to Predictive Analytics
Abstract
Prediction is a statement made about the future, an anticipatory
vision or perception. This White Paper discusses the emergence
of technology that enables precise predictions in varied fields,
and the application of exploratory and normative methods to
augment decision making.
Forecasting is primarily based on mining historical data sets,
extracting hidden patterns and transforming them into valuable
information through a process of classification, clustering,
regression and association rule learning.
The white paper describes Impetus’ implementation of
Behavioral Targeting for the ad world. Behavioral targeting is a
widely accepted statistical machine learning approach that helps
select the most relevant ads to display to a web user, based on
that user's historical data.
Impetus Technologies Inc.
www.impetus.com
WHITE PAPER
Table of Contents
Introduction
  Large scale data analytics
  Algorithms for forecasting and prediction
Behavioral Targeting
  Advantages and threats
  Industry impact
  Generic approach to BT problem solving
Large scale implementation of BT
  Poisson’s Linear Regression
  Implementing BT using Poisson’s Linear Regression
    1. Data Preparation
    2. Model Training
    3. Model Evaluation
Summary
Introduction
A prediction is a statement about the way things will happen in the future, often
but not always based on experience or knowledge. Prediction is necessary to
allow plans to be made about possible developments. Large corporations invest
heavily in this kind of activity to help focus attention on possible events, risks
and business opportunities. Such work brings together all available past and
current data, as a basis to develop reasonable expectations about the future.
The basic idea behind any such algorithm is to gather large volumes of
behavioral data that describe the historical series of events, actions and
behavior of the entity in question. This data is fed into machines and run
through machine learning algorithms to derive models. The models serve as the
basis for predictions, i.e. given input criteria, the models infer the expected
behavior of the entity.
The application of prediction algorithms has gained prominence in a wide range
of fields such as finance (stock market predictions), insurance (predicting life
expectancy), science (weather forecasting, predicting natural disasters), medical
science (treating developmental disabilities), marketing (behavioral targeting)
and many more.
Typically, with predictions, there is a huge amount of historical data, time is of
the essence and there is always a current activity happening that impacts the
future. In many cases, freshness of data is a key factor and plays a major role in
forecasting the future course of action. In other instances, the entire data set
has equal relevance and contributes to determining the future.
Large scale data analytics
Projects related to prediction and forecasting involve a huge increase in
the amount of data that must not only be stored but also processed quickly and
efficiently. These challenges are at once daunting and an exciting chance to
use data to create a positive impact.
Often, there is an immediate need to analyze the data at hand, to discover
patterns, reveal threats, monitor critical systems, and make decisions about the
direction the organization should take. Several constraints are always present:
the need to implement new analytics quickly enough to capitalize on new data
sources, limits on the scope of development efforts, and the pressure to expand
mission capability without an increase in budgets. For many of these
applications, the large data processing stack (which includes the simplified
programming model Map-Reduce, distributed file systems, semi-structured
stores, and integration components, all running on commodity class hardware),
has opened up a new avenue for scaling out efforts and enabling analytics that
were impossible in previous architectures. This new ecosystem has been found
to be remarkably versatile at handling various types of data and classes of
analytics.
Perhaps the most exciting benefit, however, from moving to these highly
scalable architectures is that after the immediate issues have been solved, often
with a system that can handle today’s requirements and scale up to 10x or
more, new analytics and capabilities can be developed, evaluated and
integrated easily. This is owing to the speed and ease of Map-Reduce, Pig, Hive,
and other technologies. More than ever, the large-scale data analysis software
stack is proving to be a platform for innovation.
Algorithms for forecasting and prediction
There are several classes of statistical algorithms that are well suited to
these kinds of problems, which involve trend analysis, pattern generation
and artificial intelligence based predictions. Some of the most common ones
are:
• Conjoint Analysis – Expert opinion and Delphi surveys
• Quantitative – Statistical, suited to predicting trends, e.g. Poisson’s
  Linear Regression, exponential smoothing
• Qualitative – Subjective, providing a range of possible outcomes, e.g.
  the Bayesian approach
• Statistical combination – A mix of quantitative and qualitative
  techniques, e.g. Quasi Bayes
Behavioral Targeting
Behavioral targeting (BT) leverages historical user behavior to select the most
relevant ads to display. State-of-the-art BT derives a linear Poisson
regression model from fine-grained user behavioral data and predicts
click-through rate (CTR) from user history.
Behavioral targeting is an application of modern statistical machine learning
methods to online advertising. But unlike other computational advertising
techniques, BT does not primarily rely on contextual information such as query
(‘sponsored search’) and web page (‘content match’). Instead, BT learns from
past user behavior, especially the implicit feedback (i.e., ad clicks) to match the
best ads to users.
This gives BT broader applicability, for example to graphical display ads, or
at least makes it a valuable user dimension complementary to other contextual
advertising techniques. In today's practice, behaviorally targeted advertising
inventory comes in the form of some kind of demand-driven taxonomy. Two
hierarchical examples are ‘Finance, Investment’ and ‘Technology, Consumer
Electronics, Cellular Telephones’. Within a category of interest, a BT model derives a
relevance score for each user from past activity. Should the user appear online
during a targeting time window, the ad serving system will qualify this user (to
be shown an ad in this category) if the score is above a certain threshold. One
de facto measure of relevance is CTR, and the threshold is predetermined in
such a way that both a desired level of relevance (measured by the cumulative
CTR of a collection of targeted users) and the volume of targeted ad impressions
(also called reach) can be achieved.
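The trade-off between relevance and reach described above can be illustrated with a minimal sketch. It is not the production logic: the function name `pick_threshold`, the tuple layout and the sample numbers are all assumptions made for illustration. The sketch ranks users by score and returns the lowest threshold at which the cumulative CTR of the targeted users still meets a desired level, which gives the largest reach under that constraint.

```python
def pick_threshold(users, min_ctr):
    """Return (threshold, reach, cumulative_ctr) for the largest targeted set
    whose cumulative CTR is at least min_ctr, or None if no set qualifies.

    users: list of (score, clicks, impressions) tuples (hypothetical layout).
    """
    # Rank users from most to least relevant.
    ranked = sorted(users, key=lambda u: u[0], reverse=True)
    best = None
    clicks = impressions = 0
    for score, c, v in ranked:
        clicks += c
        impressions += v
        if impressions == 0:
            continue
        ctr = clicks / impressions
        if ctr >= min_ctr:
            # Lowering the threshold to this score still meets the CTR goal.
            best = (score, impressions, ctr)
    return best


# Hypothetical users: (relevance score, observed clicks, observed impressions)
users = [(0.9, 3, 10), (0.7, 2, 10), (0.4, 1, 20), (0.1, 0, 40)]
result = pick_threshold(users, min_ctr=0.2)
```

With these sample numbers, targeting users scoring 0.7 or above reaches 20 impressions at a cumulative CTR of 0.25; lowering the threshold further would drop the cumulative CTR below the 0.2 goal.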
The impact of behavioral targeting can be negative if consumers feel annoyed or
threatened by the use of their ‘personal’ data. However, as demonstrated by
Amazon, when personal information and technology enhance the online
experience, there is less risk of a negative response.
Advantages and threats
Ad targeting and behavioral analysis offer many advantages, but it is also
important to look at the downsides and surface the threats they pose. Some of
the advantages that can be seen right away are:
• Reaching the right audience at the right time (of the day, week or life
  stage), with clear behavioral assumptions
• Standing out in a cluttered category
• Reaching target audiences when ‘context’ inventory is sold out
  (reaching the same target in alternative content)
• Avoiding the high cost of entry in desired content (reaching the same
  target in alternative content at lower cost)
• Tailoring the message to behavioral patterns to make it more relevant
As mentioned earlier, there are some downsides to BT:
• Achieving high reach is difficult. Within extremely targeted segments,
  the potential universe available may be very limited, and only a limited
  number of sites may currently allow behavioral targeting.
• Inconsistencies within segment classifications. The definition of a
  ‘common’ behavioral segment may differ by publisher (e.g., a job seeker
  searching Monster.com is not the same as a job seeker reading a job-related
  article on iVillage). Also, as the technology is cookie enabled, it suffers
  the usual issues of cookie stability and data accuracy.
• Ultimate issue of behavioral targeting clutter. Other advertisers within
  the same vertical will compete in the same space/segments. This is a
  future issue rather than a current one, but in time today's positives in
  cost, clutter and inventory availability will become challenges (as seen
  in paid search). In the future, as targeting matures and advertisers have
  measurable results, historical data will be a key indicator of which
  assumptions work. This will provide optimization insights. Collecting and
  analyzing response data generated from different segments are important
  prerequisites for success.
Industry impact
Behavioral targeting, as a concept, has wide acceptance in the industry.
Indicated below are some use-cases where it is being successfully implemented
as a tool for predicting user behavior:
• Ad targeting and predicting the buying behavior of users
• Relationship building
• Audience targeting
• Presidential candidates using BT to target persuasion
• Treatment of mental disorders and developmental disabilities
BT and BT-based solutions are being used across a wide range of fields to
predict and forecast behavior in order to increase reach, accessibility,
and revenue.
Generic approach to BT problem solving
• Data mining extracts hidden patterns from data and transforms them into
  valuable information, using computing power to apply knowledge
  discovery methodologies.
• It applies knowledge discovery and prediction through a process of
  classification, clustering, regression and association rule learning.
• The value of the information depends on the collection of indicative and
  representative data.
• Cookies for behavioral advertising usually contain text that uniquely
  identifies the browser so that advertisers or ad networks can recognize
  the same Internet user across different Web sites or multiple areas on
  the same site.
Large Scale Implementation of BT
Poisson’s Linear Regression
This is a statistical method used to calculate the probability of an event,
given the rate of occurrence of the event in disjoint timeframes. It is suited
to analyzing outcomes that are non-negative counts.
Poisson’s Linear Regression works well where the input data is sparse, i.e.
results are valid for rare events. It can model rare events both when every
subject is followed for the same length of time and when subjects have
different lengths of follow-up.
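As a small worked illustration of the idea, the sketch below computes Poisson probabilities for a count whose expected value is a linear function of the input features. The linear form of the mean is an assumption inferred from the name "Linear Poisson Regression"; the weights, feature values and function names are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Probability of observing count y when the expected count is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def linear_poisson_mean(weights, features):
    """Expected count as a linear function of sparse features (assumed form).

    weights:  dict mapping feature index -> non-negative weight
    features: dict mapping feature index -> event count
    """
    return sum(weights.get(j, 0.0) * x for j, x in features.items())


weights = {0: 0.02, 1: 0.10}   # hypothetical model weights
features = {0: 5, 1: 1}        # e.g. 5 page views and 1 search
lam = linear_poisson_mean(weights, features)   # 0.02 * 5 + 0.10 * 1
p_no_click = poisson_pmf(0, lam)
p_one_click = poisson_pmf(1, lam)
```

Because the expected count here is small (about 0.2), most of the probability mass sits on zero clicks, which is the rare-event regime the text refers to.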
Implementing BT using Poisson’s Linear Regression
Behavioral targeting can be effectively implemented using the Poisson’s Linear
Regression algorithm, as it maps well to the nature of input data and the kind of
predictions that organizations are looking at.
The algorithm is summarized in a flow chart:
[Figure: flow chart of the algorithm]
Impetus Technologies implemented behavioral targeting using the Poisson’s
Linear Regression algorithm. The algorithm was deployed on the Hadoop
ecosystem. The entire algorithm was decomposed into individual steps. Each
step was implemented as a Hadoop M/R job and the jobs were run
sequentially using the Oozie workflow engine. The results of the
implementation were models for different categories. These models were
stored in the HBase data store and later consumed for analytics and behavioral
predictions. The steps involved in the implementation are explained
below:
1. Data Preparation
In this preprocessing step, the data fields of interest were extracted from raw
data feeds, thus reducing the size of the data.
The raw data described user behavior with respect to one or more ads. It
included ad clicks, ad views, page views, searches, organic clicks and overture
clicks.
1. The raw data came from the user base
2. The system stored the raw data in HDFS
3. The raw data was sent to the data preparation module, which
undertook the following:
a. Aggregated event counts over a configurable period of time, to
further shrink the data size
b. Merged counts into a single entry with <cookie, time-period>
as the unique key
The data preparation module comprised two M/R jobs: Feature-Extractor and
Feature-Generator
1.1 Feature-Extractor
Input - Raw data feeds
Output - <cookie:time-period:feature-Type:feature-Name, feature-Count>
1.2 Feature-Generator
Input - <cookie:time-period:feature-Type:feature-Name, feature-Count>
Output - <cookie:time-period, feature-Type:feature-Name, feature-Count ...>
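The two data preparation jobs can be sketched in a few lines of single-process Python. This is only an illustration of the key and value shapes listed above, not the Hadoop implementation; the raw event layout `(cookie, timestamp, feature type, feature name)` and the function names are assumptions.

```python
from collections import defaultdict


def feature_extractor(raw_events, period_seconds):
    """Count events per (cookie, time period, feature type, feature name)."""
    counts = defaultdict(int)
    for cookie, timestamp, ftype, fname in raw_events:
        period = timestamp // period_seconds   # configurable aggregation period
        counts[(cookie, period, ftype, fname)] += 1
    return dict(counts)


def feature_generator(extracted):
    """Merge counts into a single entry keyed by (cookie, time period)."""
    merged = defaultdict(dict)
    for (cookie, period, ftype, fname), count in extracted.items():
        merged[(cookie, period)][(ftype, fname)] = count
    return dict(merged)


# Hypothetical raw feed: (cookie, timestamp in seconds, feature type, feature name)
raw = [
    ("c1", 10, "page", "sports"),
    ("c1", 20, "page", "sports"),
    ("c1", 30, "ad", "ad42"),
    ("c1", 4000, "search", "phones"),
    ("c2", 15, "ad", "ad42"),
]
entries = feature_generator(feature_extractor(raw, period_seconds=3600))
```

Five raw events shrink to three entries, one per cookie and hour, which is the size reduction the preprocessing step aims for.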
2. Model Training
This step fitted the linear Poisson regression model to the preprocessed data
and involved the following:
1. Feature selection
2. Generation of training examples
3. Model weight initialization
4. Multiplicative recurrence to converge model weights
2.1 Poisson-Entity-Dictionary
This step mainly performed feature selection and inverted indexing. It did
this by counting entity frequency in terms of touching cookies (the number
of cookies in which the entity appeared) and selecting the most frequent
entities in the given feature space.
Output - Hashmap of <entityType:featureName, featureIndex> (inverted
index) for all entity types
An entity referred to the name (unique identifier) of an event (e.g. an ad
id, a space id for a page, or a query). An entity was different from a
feature, since the latter was uniquely identified by the <featureType,
featureName> pair.
In the context of BT, there were three types of entities: ad, page and
search.
The Poisson entity dictionary included three M/R jobs:
PoissonEntityUnit, PoissonEntitySum, and PoissonEntityHash
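A compact sketch of the dictionary step follows. It collapses the three M/R jobs into one in-memory function, so it shows only the logic (count distinct cookies per entity, keep the most frequent per entity type, assign indices), not how the work was distributed. The function name, the `top_n` parameter and the input layout are assumptions.

```python
from collections import defaultdict


def build_entity_dictionary(entries, top_n):
    """Select the top_n most frequent features per entity type and index them.

    entries: dict (cookie, period) -> {(entity_type, name): count}
    Frequency is the number of distinct cookies touching the entity.
    """
    touching = defaultdict(set)
    for (cookie, _period), feats in entries.items():
        for feature in feats:
            touching[feature].add(cookie)

    by_type = defaultdict(list)
    for (etype, name), cookies in touching.items():
        by_type[etype].append((len(cookies), name))

    index = {}
    for etype in sorted(by_type):
        # Most-touched first; ties broken by name to keep the result stable.
        ranked = sorted(by_type[etype], key=lambda t: (-t[0], t[1]))
        for _freq, name in ranked[:top_n]:
            index[(etype, name)] = len(index)
    return index


# Hypothetical prepared entries, in the shape produced by data preparation
entries = {
    ("c1", 0): {("page", "sports"): 2, ("ad", "ad42"): 1},
    ("c2", 0): {("ad", "ad42"): 1, ("ad", "ad7"): 3},
    ("c3", 0): {("page", "news"): 1, ("page", "sports"): 1},
}
dictionary = build_entity_dictionary(entries, top_n=1)
```

Note that `ad7` is dropped even though its raw count is highest: selection is by number of touching cookies, not by event volume.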
2.2 Poisson-Feature-Vector
This generated training examples (feature vectors) that were directly
used later by model initialization and multiplicative recurrence.
It used a sparse data structure for feature vectors, because the dense
form would be populated primarily with zeros. Behavioral count data is
very sparse by nature: for a given user in a given time period, activity
involves only a limited number of events. Impetus used a pair of arrays
of the same length to represent a feature vector or a target vector: an
integer array for feature indices and a float array for values (float to
allow for decayed counts), so that each array position gave a
<feature, value> pair.
Feature selection and inverted indexing: With the feature space
selected by PoissonEntityDictionary, in this step Impetus discarded
the unselected events from the training data on the feature (input
variable) side. On the target (response variable) side, Impetus had the
option of using all features or only the selected features when
categorizing them into target event counts.
With the inverted index built by PoissonEntityDictionary,
from the PoissonFeatureVector step onwards Impetus
referenced an original feature name by its index. The same idea was
also applied to cookies, since the cookie value itself was irrelevant to
modeling.
Several pre-computations were performed at this stage:
1. Impetus further aggregated feature counts into a time window
with a size larger than or equal to the resolution from data
preparation.
2. Counts were decayed over time using a configurable factor.
3. A causal approach was used to generate examples. (The causal
approach collects features temporally before targets, while the
non-causal approach generates targets and features from the
same period of history.)
4. Impetus used binary representation (serialized objects in Java)
and data compression (SequenceFile with BLOCK compression
in the Hadoop framework) for feature vectors.
Data structure for the feature vector
• int[targetLength] targetIndexArray
• float[targetLength] targetValueArray
• int[inputLength] inputIndexArray
• float[inputLength] inputValueArray
Input - <cookie:timeperiod, featureType:featureName:featureCount ...>
Output - <cookieIndex, featureVector>
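The paired-array layout listed above can be mirrored in a short sketch. The original used Java arrays; the Python class below is an illustrative stand-in, and the names `FeatureVector`, `to_sparse` and `make_feature_vector` are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureVector:
    """Sparse example: parallel index/value arrays for targets and inputs."""
    target_index: List[int]
    target_value: List[float]
    input_index: List[int]
    input_value: List[float]


def to_sparse(counts):
    """Convert {index: count} into parallel index/value arrays, skipping zeros."""
    items = sorted((i, float(v)) for i, v in counts.items() if v != 0)
    return [i for i, _ in items], [v for _, v in items]


def make_feature_vector(target_counts, input_counts):
    """Build a FeatureVector from dictionaries of target and input counts."""
    t_idx, t_val = to_sparse(target_counts)
    i_idx, i_val = to_sparse(input_counts)
    return FeatureVector(t_idx, t_val, i_idx, i_val)


# Hypothetical example: one target event, three candidate inputs (one is zero)
fv = make_feature_vector({3: 1}, {10: 2, 7: 0, 2: 5})
```

Only non-zero entries are stored, so memory grows with a user's actual activity rather than with the size of the feature space.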
Target counts were collected from a sliding time window, and feature
counts were aggregated (possibly with decay) from a time period preceding
the target window. The size of the sliding window was kept relatively
small, because a large window effectively discarded many <feature,
target> co-occurrences within that window. For example, the following
setup yielded superior long term models:
a. A target window of size one day
b. Sliding over a one week period
c. Preceded by a four week feature window (also sliding
along with the target window)
The algorithm included the following:
1. For each cookie, Impetus cached all the event count data.
2. It sorted events by time, forming an event stream for this
particular cookie covering the entire time period of interest.
3. Impetus pre-computed the boundaries of the sliding window. Four
boundaries were specified: featureBegin,
featureEnd, targetBegin and targetEnd.
Separating featureEnd and targetBegin allowed a
gap window in between, which was necessary to emulate
possible latency in online prediction.
4. The company maintained three iterators on the event stream,
referencing the previous featureBegin, the current
featureBegin, and targetBegin. It used one pair of
TreeMap objects (i.e. inputMap and targetMap) to hold the
features and targets of a feature vector as the data was being
processed.
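The window boundaries described in the steps above can be sketched as follows. For clarity the sketch rescans the event stream for every window position instead of maintaining three iterators, so it shows the boundary arithmetic only and is not an efficient implementation. The event layout `(day, name, count)`, the function name and all parameter names other than the four boundaries are assumptions.

```python
from collections import Counter


def sliding_examples(events, first_target_day, last_target_day,
                     feature_days, target_days=1, gap_days=0,
                     target_names=frozenset()):
    """Yield (boundaries, features, targets) for each window position.

    events: iterable of (day, name, count); day is an integer day number.
    All windows are half-open intervals [begin, end).
    """
    events = sorted(events)   # event stream ordered by time
    for target_begin in range(first_target_day, last_target_day + 1):
        target_end = target_begin + target_days
        feature_end = target_begin - gap_days   # gap emulates serving latency
        feature_begin = feature_end - feature_days
        features, targets = Counter(), Counter()
        for day, name, count in events:
            if feature_begin <= day < feature_end:
                features[name] += count
            elif target_begin <= day < target_end and name in target_names:
                targets[name] += count
        yield ((feature_begin, feature_end, target_begin, target_end),
               dict(features), dict(targets))


# Hypothetical event stream for one cookie
events = [
    (1, "page:sports", 2),
    (2, "search:phones", 1),
    (3, "click:ad42", 1),
    (4, "page:sports", 1),
    (5, "click:ad42", 2),
]
examples = list(sliding_examples(
    events, first_target_day=3, last_target_day=5,
    feature_days=2, target_names={"click:ad42"}))
```

The setup mentioned in the text (a one-day target window sliding over one week, preceded by a four-week feature window) would correspond to `target_days=1`, `feature_days=28` and a seven-day range of target days.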
2.3 Poisson-Initializer
This step initialized the model weights (coefficients of the regressors) by
scanning the training data once.
k: index of target variables
j: index of features or input variables
i: index of examples
a unigram(j) is one occurrence of feature j
a bigram(k,j) is one co-occurrence of target k and feature j
The basic idea was bigram-based initialization: allocate the weight w(k,j)
as a normalized number of co-occurrences of (k,j).
The output of PoissonInitializer was an initialized weight
matrix of dimensionality number of targets by number of features.
1. Impetus distributed the computation of counting the bigrams by
a composite key <k,j> and effectively pre-computed total bigram
counts of all examples before the final stage.
2. The M/R framework provided a single key data structure. In
order to distribute <k,j>, Impetus needed an efficient function
to transform a composite key (two integers) into a single key and
recover the composite key when needed.
bigramKey(k,j) = a long integer obtained by a bitwise left
shift of k by 32 bits, followed by a bitwise OR with j
3. The Impetus team cached the output of the first mapper, which
emitted <bigramKey, bigramCount>.
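The composite key transformation is simple enough to show directly. The sketch below packs and unpacks the key as described and uses it to count bigrams; it assumes both indices are non-negative and fit in 32 bits. The function names and the example layout are illustrative, and the normalization of counts into weights is left out because the text does not give its exact form.

```python
from collections import Counter

MASK_32 = 0xFFFFFFFF


def bigram_key(k, j):
    """Pack target index k and feature index j into one 64-bit key."""
    return (k << 32) | (j & MASK_32)


def split_bigram_key(key):
    """Recover the composite key (k, j) from a packed key."""
    return key >> 32, key & MASK_32


def count_bigrams(examples):
    """Count co-occurrences of each target k with each feature j.

    examples: iterable of (target_indices, feature_indices) (assumed layout).
    """
    counts = Counter()
    for targets, features in examples:
        for k in targets:
            for j in features:
                counts[bigram_key(k, j)] += 1
    return counts


key = bigram_key(3, 7)   # (3 << 32) | 7
examples = [([0], [1, 2]), ([0, 1], [2])]
counts = count_bigrams(examples)
```

A single integer key lets the framework partition and sort bigram counts without a custom composite key type, and the pair can be recovered with one shift and one mask.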
2.4 Poisson-Multiplicative
This step updated the model weights by scanning the training data
iteratively. It utilized a highly effective multiplicative recurrence.
Computing the normalizer (the Poisson mean) involved taking the dot product
of the previous weight vector and a feature vector (the input portion).
Input - <cookieIndex, featureVector>
Output - updated wk for all k
1. Impetus represented the model weight matrix as K dense
weight vectors (arrays) of length J, where K was the number of
targets and J the number of features.
2. Using weight vectors was more scalable in terms of memory
footprint than a matrix representation, but it raised challenges in
disk I/O. Impetus addressed this problem via in-memory caching.
Caching the weight vectors was not the solution; the trick was to
cache input examples. After caching, Impetus maintained a
hashmap that recorded all relevant targets for the cached feature
vectors and provided constant-time lookup from target index
to array index: Map<targetIndex, arrayIndex>.
3. Impetus also used Hadoop's distributed cache, which copied the
requested files from HDFS to the slave nodes before the task
was executed. It copied the files only once per job for each task
tracker, and the copy was shared by the M/R tasks on that node.
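The text does not spell out the recurrence itself. As an illustration only, the sketch below uses the standard multiplicative update that follows from maximizing a Poisson likelihood with a linear mean, for a single target: each weight is scaled by the ratio of observed to expected counts attributed to its feature. That choice of update, the function name and the toy data are assumptions, and the sketch runs in memory rather than as M/R jobs.

```python
def poisson_multiplicative_step(w, examples):
    """One multiplicative update of the weight vector for a single target.

    w:        list of non-negative weights, one per feature
    examples: list of (features, y); features is {feature index: value}
    """
    numer = [0.0] * len(w)
    denom = [0.0] * len(w)
    for features, y in examples:
        # Poisson mean of this example under the previous weights.
        lam = sum(w[j] * x for j, x in features.items())
        for j, x in features.items():
            denom[j] += x
            if lam > 0:
                numer[j] += y * x / lam
    # Features never seen in the data keep their previous weight.
    return [w[j] * numer[j] / denom[j] if denom[j] > 0 else w[j]
            for j in range(len(w))]


# Toy data generated from weights [2, 3]: counts are 2, 3 and 2 + 3
examples = [({0: 1}, 2), ({1: 1}, 3), ({0: 1, 1: 1}, 5)]
first = poisson_multiplicative_step([1.0, 1.0], examples)

w = [1.0, 1.0]
for _ in range(50):
    w = poisson_multiplicative_step(w, examples)
```

On this toy data the weights move from `[1.0, 1.0]` toward `[2, 3]`. Because every factor in the update is non-negative, weights that start non-negative stay non-negative, which keeps the predicted counts valid.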
3. Model Evaluation
This step tested the trained model on a test data set. The main tasks were:
1. Predicting expected target counts (clicks and views)
2. Scoring (CTR)
3. Ranking scores of a test set
4. Calculating and reporting performance metrics such as CTR lift and area
under the ROC curve.
This component contained three sequential steps:
3.1 Poisson-Feature-Vector-Eval
This step was identical to Poisson-Feature-Vector, with the following
differences:
• There was no need to keep the summary statistics used for
training, such as the total count of examples and the feature and
target unigrams.
• Decay was typically necessary in generating test data, since
it enabled efficient incremental prediction as new events
flowed in, while exponentially diminishing the obsolete long
history.
• Sampling and heuristic-based robot filtering were not
applied when generating test data.
• Impetus could remove examples without a target
from the test dataset, since these records did not impact
the performance, no matter how the model predicted
them. However, examples with targets were kept, even
those without any inputs. This was because these records
(‘new users’) had to be scored by the model in production
and hence had a non-trivial impact on the performance.
• Impetus categorized target counts either from the entire
feature space or from the selected space, depending on the
learning goal.
• The size of the sliding window was configured to be
approximately the same as the ad serving cycle in
production, and the size of the gap window imitated the
latency between the last seen events and the next ad serving in
production.
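The incremental decay mentioned above can be illustrated with a small sketch. It assumes a simple exponential scheme in which the running totals are multiplied by a configurable factor before each new period's counts are added; the function name and the factor of 0.5 are illustrative, not taken from the implementation.

```python
def decay_update(previous, new_counts, decay):
    """Fold one new period of counts into exponentially decayed totals.

    previous, new_counts: {feature index: value}; decay is in (0, 1].
    """
    updated = {j: v * decay for j, v in previous.items()}
    for j, c in new_counts.items():
        updated[j] = updated.get(j, 0.0) + c
    return updated


# Three hypothetical days of counts for one cookie
state = {}
for day_counts in [{0: 4}, {0: 2, 1: 1}, {1: 3}]:
    state = decay_update(state, day_counts, decay=0.5)
```

Only the running totals need to be kept between periods, so new events can be folded in without rereading the full history, while older activity fades exponentially.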
3.2 Poisson-Predictor
Input - <cookieIndex, featureVector>
Output - <cookieIndex, predictedActualtarget[2 x numTarget]>
This step took the dot product of a weight vector and a feature vector as the
predicted target count (a continuous variable). To predict the expected
number of ad clicks and views in all categories for an example i, the
algorithm needed to read the weight vectors of all targets, as converged
by Poisson-Multiplicative.
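The prediction itself is a sparse dot product per target, which the following sketch shows. The target names, weights and feature values are hypothetical, and the function name is an assumption.

```python
def predict_targets(weight_vectors, features):
    """Predicted count for every target: dot(weight vector, feature vector).

    weight_vectors: {target name: dense list of weights}
    features:       {feature index: value} (sparse input portion)
    """
    return {
        target: sum(w[j] * x for j, x in features.items())
        for target, w in weight_vectors.items()
    }


# Hypothetical converged weights for the click and view targets of one category
weights = {
    "sports_click": [0.0, 0.5, 0.25],
    "sports_view": [1.0, 2.0, 4.0],
}
predicted = predict_targets(weights, {1: 2.0, 2: 4.0})
```

Iterating over the sparse features rather than the dense weight vector keeps the cost proportional to the user's activity, not to the size of the feature space.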
3.3 Poisson-Evaluator
Input - <cookieIndex, predictedActualtarget[2 x numTarget]>
Output - performance metrics, per category and overall reports
This step scored each testing example by dividing its predicted clicks by
its predicted views and applying Laplacian smoothing. It then sorted all
examples by score and finally computed and reported the performance
metrics. The performance metrics included:
• The number of winning categories over certain benchmarks
• Cumulative CTR
• CTR lift
• Area under the ROC curve
• Summary stats
It generated both per-category and overall performance reports.
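A minimal sketch of the scoring and ranking steps follows. Several details are assumptions rather than facts from the implementation: the smoothing pseudo-counts `alpha` and `beta`, the definition of CTR lift as the CTR of the top-ranked fraction divided by the overall CTR, and the tuple layout of the examples. The area under the ROC curve is not computed here.

```python
def smoothed_ctr(pred_clicks, pred_views, alpha=1.0, beta=100.0):
    """Laplace-smoothed CTR score; alpha and beta are assumed pseudo-counts."""
    return (pred_clicks + alpha) / (pred_views + beta)


def evaluate(examples, top_fraction=0.5):
    """Rank examples by score and report CTR of the top fraction vs. overall.

    examples: list of (pred_clicks, pred_views, actual_clicks, actual_views).
    """
    ranked = sorted(examples, key=lambda e: smoothed_ctr(e[0], e[1]),
                    reverse=True)
    top = ranked[:max(1, int(len(ranked) * top_fraction))]
    overall_ctr = sum(e[2] for e in ranked) / sum(e[3] for e in ranked)
    top_ctr = sum(e[2] for e in top) / sum(e[3] for e in top)
    return {"overall_ctr": overall_ctr,
            "top_ctr": top_ctr,
            "ctr_lift": top_ctr / overall_ctr}


# Hypothetical test examples for one category
examples = [
    (2.0, 20.0, 3, 20),
    (0.5, 50.0, 1, 40),
    (1.0, 10.0, 2, 20),
    (0.1, 30.0, 0, 20),
]
report = evaluate(examples)
```

Smoothing keeps examples with very few predicted views from dominating the ranking, since their raw click-to-view ratios would otherwise be unstable.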
Summary
As explained above, prediction is a statement made about the future. A very
popular area of application that has flourished in recent times is Behavioral
targeting (BT). BT is defined as a large scale machine learning problem that
leverages historical user behavior to select the most relevant ads to display. The
process basically involves mining historical data sets and extracting hidden
patterns (trends) to predict user interests.
Major IT giants like Yahoo, Google and Amazon have used Behavioral Targeting
and achieved major gains in terms of reach and CTR increase. There are several
implementations of BT that employ various statistical algorithms and processes
to extract the behavioral traits of the users in question.
The input to the BT engine is a historical sequence of the activities undertaken
by users over the Internet. These activities include ad clicks, ad views, page
views, search queries and search clicks. As users browse the Internet they
unknowingly leave a trail of footprints in terms of visited pages, ads, cookies,
etc. These footprints reveal a lot about their personality traits. BT leverages
these subtle inputs and, without intruding on the privacy of the users, draws
their personality sketch. Based on these inferences, advertisers are able to
target their audience and show them relevant ads.
Impetus applied the Poisson’s Linear Regression algorithm for its
implementation. This was deployed on the Hadoop environment using chained
MapReduce jobs run as an Oozie workflow.
About Impetus
Impetus Technologies is a leading provider of Big Data solutions for the
Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data
and create new business insights across their enterprises.
Website: www.bigdata.impetus.com | Email: bigdata@impetus.com
© 2013 Impetus Technologies,
Inc. All rights reserved. Product
and company names mentioned
herein may be trademarks of
their respective companies.
May 2013

More Related Content

What's hot

Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modellinglalit Lalitm7225
 
Application of predictive analytics
Application of predictive analyticsApplication of predictive analytics
Application of predictive analyticsPrasad Narasimhan
 
Predictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningPredictive Analytics: Advanced techniques in data mining
Predictive Analytics: Advanced techniques in data miningSAS Asia Pacific
 
¿Como los modelos predictivos cambian los negocios?
¿Como los modelos predictivos cambian los negocios?¿Como los modelos predictivos cambian los negocios?
¿Como los modelos predictivos cambian los negocios?Fabricio Quintanilla
 
Stock market trend prediction using k nearest neighbor(knn) algorithm
Stock market trend prediction using k nearest neighbor(knn) algorithmStock market trend prediction using k nearest neighbor(knn) algorithm
Stock market trend prediction using k nearest neighbor(knn) algorithmVenkat Projects
 
Predictive analytics in mobility
Predictive analytics in mobilityPredictive analytics in mobility
Predictive analytics in mobilityEktimo
 
Stock market analysis
Stock market analysisStock market analysis
Stock market analysisSruti Jain
 
Predictive Analytics: An Executive Primer
Predictive Analytics: An Executive PrimerPredictive Analytics: An Executive Primer
Predictive Analytics: An Executive PrimerRyan Withop
 
Stock market prediction using data mining
Stock market prediction using data miningStock market prediction using data mining
Stock market prediction using data miningShivakumarSoppannavar
 
Prediction of stock market index using genetic algorithm
Prediction of stock market index using genetic algorithmPrediction of stock market index using genetic algorithm
Prediction of stock market index using genetic algorithmAlexander Decker
 
IRJET- Future Stock Price Prediction using LSTM Machine Learning Algorithm
IRJET-  	  Future Stock Price Prediction using LSTM Machine Learning AlgorithmIRJET-  	  Future Stock Price Prediction using LSTM Machine Learning Algorithm
IRJET- Future Stock Price Prediction using LSTM Machine Learning AlgorithmIRJET Journal
 
A Comparison of Stock Trend Prediction Using Accuracy Driven Neural Network V...
A Comparison of Stock Trend Prediction Using Accuracy Driven Neural Network V...A Comparison of Stock Trend Prediction Using Accuracy Driven Neural Network V...
A Comparison of Stock Trend Prediction Using Accuracy Driven Neural Network V...idescitation
 
Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive ModelDKALab
 
Neural networks in business forecasting
Neural networks in business forecastingNeural networks in business forecasting
Neural networks in business forecastingAmir Shokri
 

What's hot (20)

Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modelling
 
Application of predictive analytics
Application of predictive analyticsApplication of predictive analytics
Application of predictive analytics
 
Predictive modelling
Predictive modellingPredictive modelling
Predictive modelling
 
Predictive Analytics: Advanced techniques in data mining
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Predictive Analytics Approach for Behavioral Targeting

  • 1. Prognosis - An Approach to Predictive Analytics Abstract Prediction is a statement made about the future, an anticipatory vision or perception. This White Paper discusses the emergence of technology that enables precise predictions in varied fields, and the application of exploratory and normative methods to augment decision making. Forecasting is primarily based on mining historical data sets, extracting hidden patterns and transforming them into valuable information through a process of classification, clustering, regression and association rule learning. The white paper talks about Impetus’ implementation of Behavioral Targeting for the ad world. This is a widely accepted, statistical machine learning algorithm that helps select the most relevant ads to be displayed to a web user based on their historical data. Impetus Technologies Inc. www.impetus.com White Paper
  • 2. Prognosis – An Approach to Predictive Analytics 2 Table of Contents
Introduction
Large scale data analytics
Algorithms for forecasting & prediction
Behavioral Targeting
Advantages and threats
Industry impact
Generic Approach to BT problem solving
Large scale implementation of BT
Poisson’s Linear Regression
Implementing BT using Poisson’s Linear Regression
1. Data Preparation
2. Model Training
3. Model Evaluation
Summary
  • 3. Prognosis – An Approach to Predictive Analytics 3 Introduction A prediction is a statement about the way things will happen in the future, often but not always based on experience or knowledge. Prediction is necessary to allow plans to be made about possible developments. Large corporations invest heavily in this kind of activity to help focus attention on possible events, risks and business opportunities. Such work brings together all available past and current data, as a basis to develop reasonable expectations about the future. The basic idea behind any such algorithm is to gather gigantic behavioral data that describes the historical series of events/actions/behavior of the entity in question. This data is fed into machines and run through complex machine learning algorithms to derive models. The models serve as the basis for predictions, i.e. based on input criteria the models infer the expected behavior of the entity. The application of prediction algorithms has gained prominence in a wide range of fields such as finance (stock market predictions), insurance (predicting life expectancy), science (weather forecasting, predicting natural disasters), medical science (treating developmental disabilities), marketing (behavioral targeting) and many more. Typically, with predictions, there is a huge amount of historical data, time is of the essence and there is always a current activity happening that impacts the future. In many cases, freshness of data is a key factor and plays a major role in forecasting the future course of action. In other instances, the entire data set has equal relevance and contributes to determining the future. Large scale data analytics Projects related to future predictions and forecasting point to a huge increase in the amount of data that must not only be stored but processed quickly and efficiently. These challenges are at once a daunting and exciting chance to use data to create a positive impact. 
Often, there is an immediate need to analyze the data at hand, to discover patterns, reveal threats, monitor critical systems, and make decisions about the direction the organization should take. Several constraints are always present: the need to implement new analytics quickly enough to capitalize on new data sources, limits on the scope of development efforts, and the pressure to expand mission capability without an increase in budgets. For many of these applications, the large data processing stack (which includes the simplified programming model Map-Reduce, distributed file systems, semi-structured stores, and integration components, all running on commodity class hardware),
  • 4. Prognosis – An Approach to Predictive Analytics 4 has opened up a new avenue for scaling out efforts and enabling analytics that were impossible in previous architectures. This new ecosystem has been found to be remarkably versatile at handling various types of data and classes of analytics. Perhaps the most exciting benefit, however, from moving to these highly scalable architectures is that after the immediate issues have been solved, often with a system that can handle today’s requirements and scale up to 10x or more, new analytics and capabilities can be developed, evaluated and integrated easily. This is owing to the speed and ease of Map-Reduce, Pig, Hive, and other technologies. More than ever, the large-scale data analysis software stack is proving to be a platform for innovation. Algorithms for forecasting and prediction There are several classes of statistical algorithms that are well suited for these kinds of problems, which are associated with trend analysis, pattern generation and artificial intelligence based predictions. Some of the most common ones are: Conjoint Analysis – Expert opinion and Delphi surveys Quantitative – Statistical, suited to predicting trends e.g. Poisson’s Linear regression, Exponential smoothing Qualitative – Subjective, providing a range of possible outcomes, e.g. the Bayesian approach Statistical combination – A mix of quantitative and qualitative techniques e.g. Quasi Bayes Behavioral Targeting Behavioral targeting (BT) leverages historical user behavior to select the most relevant ads to display. The state-of-the-art of BT derives a Linear Poisson Regression model from fine-grained user behavioral data and predicts click- through rate (CTR) from user history. Behavioral targeting is an application of modern statistical machine learning methods to online advertising. 
But unlike other computational advertising techniques, BT does not primarily rely on contextual information such as query (‘sponsored search’) and web page (‘content match’). Instead, BT learns from past user behavior, especially the implicit feedback (i.e., ad clicks) to match the best ads to users.
  • 5. Prognosis – An Approach to Predictive Analytics 5 This makes BT enjoy a broader applicability such as graphical display ads, or at least a valuable user dimension complementary to other contextual advertising techniques. In today's practice, behaviorally targeted advertising inventory comes in the form of some kind of demand-driven taxonomy. Hierarchical examples are Finance, Investment and Technology, Consumer Electronics, and Cellular Telephones. Within a category of interest, a BT model derives a relevance score for each user from past activity. Should the user appear online during a targeting time window, the ad serving system will qualify this user (to be shown an ad in this category) if the score is above a certain threshold. One de facto measure of relevance is CTR, and the threshold is predetermined in such a way that both a desired level of relevance (measured by the cumulative CTR of a collection of targeted users) and the volume of targeted ad impressions (also called reach) can be achieved. The impact of behavioral targeting can be negative if consumers feel annoyed or threatened by the use of their ‘personal’ data. However, as demonstrated by Amazon, when personal information and technology enhance the online experience, there is less risk of a negative response. Advantages and threats There are a lot of advantages attributed to ad targeting and behavioral analysis, but at the same time it is also important to look at the downsides and surface the threats posed by them. 
Some of the advantages that can be seen right away are: Reaching the right audience at the right time (of the day, week or life stage), with clear behavioral assumptions Standing out in a cluttered category Reaching target audiences when ‘context’ inventory is sold out (reaching same target in alternative content) High cost of entry in desired content (reaching the same target in alternative content with lower costs) Tailoring message to behavioral patterns to make it more relevant As mentioned earlier, there are some downsides to BT: Achieving high reach is difficult. Within extremely targeted segments, the potential universe available may be very limited and there may be a limit to the sites currently allowing behavioral targeting. Inconsistencies within segment classifications. The definition of ‘common’ behavioral segment may differ by publisher (e.g., job seeker searching Monster.com not the same job seeker as reading job-related article on iVillage). Also, as the technology is cookie enabled, it suffers the usual issues of cookie stability and data accuracy.
  • 6. Prognosis – An Approach to Predictive Analytics 6 Ultimate issue of behavioral targeting clutter. Other advertisers within the same vertical will compete in the same space/segments. This is currently a future issue but in time, cost, clutter and inventory availability positives will become challenges (as seen in paid search). In the future, as targeting matures and advertisers have measurable results, historical data will be a key indicator of which assumptions work. This will provide optimization insights. Collecting and analyzing response data generated from different segments are important prerequisites for success. Industry impact Behavioral targeting, as a concept, has wide acceptance in the industry. Indicated below are some use-cases where it is being successfully implemented as a tool for predicting user behavior: Ad Targeting and Predicting the buying behavior of users Relationship building Audience targeting Presidential candidates using BT to target persuasion Treatment of mental disorders and developmental disabilities There is a vast horizon where BT, or BT based solutions are being used to successfully predict/forecast behavior in order to increase reach, accessibility, and revenue. Generic approach to BT problem solving Data mining involves extracting hidden patterns from data to transform it into valuable information using computer power to apply knowledge discovery methodologies. It applies knowledge discovery and prediction through a process of classification, clustering, regression and association rule learning. The value of the information depends on the collection of indicative and representative data. Cookies for behavioral advertising usually contain text that uniquely identifies the browser so that advertisers or ad networks can recognize the same Internet user across different Web sites or multiple areas on the same site.
  • 7. Prognosis – An Approach to Predictive Analytics 7 Large Scale Implementation of BT Poisson’s Linear Regression This is a statistical method used to calculate the probability of an event, given the rate of occurrence of the event in disjoint timeframes, suited for analyzing outcomes that have positive values. Poisson’s Linear Regression works really well where the input data is sparse i.e. results are valid for rare events. It can model rare events when everyone is followed for the same length of time, or when people have different length of follow ups. Implementing BT using Poisson’s Linear Regression Behavioral targeting can be effectively implemented using the Poisson’s Linear Regression algorithm, as it maps well to the nature of input data and the kind of predictions that organizations are looking at.
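As a minimal sketch of the model described on this slide, the snippet below computes a Poisson probability for an event count and, following the later Poisson-Predictor step, takes the expected count as the dot product of a weight vector and a feature vector. The function names, weights and feature values are hypothetical and chosen only for illustration; they are not taken from the Impetus implementation.

```python
import math


def poisson_pmf(count: int, mean: float) -> float:
    """Probability of observing `count` events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** count / math.factorial(count)


def linear_poisson_mean(weights, features):
    """Linear Poisson model: the expected count is the dot product w . x."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical numbers, for illustration only.
weights = [0.02, 0.10, 0.01]   # one weight per feature
features = [3.0, 1.0, 4.0]     # e.g. page views, searches, ad views in the feature window

mean_clicks = linear_poisson_mean(weights, features)  # 0.06 + 0.10 + 0.04
p_no_click = poisson_pmf(0, mean_clicks)               # probability of zero clicks
p_one_click = poisson_pmf(1, mean_clicks)              # probability of exactly one click
```

With a small expected count, most of the probability mass sits on zero events, which is why the model suits sparse click data.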
  • 8. Prognosis – An Approach to Predictive Analytics 8 The algorithm is explained by a flow chart. [Figure: flow chart of the algorithm] Impetus Technologies implemented behavioral targeting using the Poisson’s Linear Regression algorithm and deployed it on the Hadoop ecosystem. The algorithm was decomposed into individual steps. Each step was implemented as a Hadoop M/R job, and the jobs were run sequentially using the Oozie workflow engine. The results of the implementation were models for different categories. These models were stored in the HBase data store and later consumed for analytics and behavioral predictions. The steps involved in the implementation are explained below: 1. Data Preparation In this preprocessing step, the data fields of interest were extracted from raw data feeds, thus reducing the size of the data.
  • 9. Prognosis – An Approach to Predictive Analytics 9 Raw data was related to user behavior with respect to one or more ads. It included ad clicks, ad views, page views, searches, organic clicks and overture clicks. 1. The raw data came from the user base 2. The system stored the raw data in HDFS 3. The raw data was sent to the data preparation module, which undertook the following: a. Aggregated event counts over a configurable period of time, to further shrink the data size b. Merged counts into a single entry with <cookie, time-period> as unique key c. It included two M/R jobs: Feature-Extractor and Feature-Generator 1.1 Feature-Extractor Input - Raw data feeds Output - <cookie:time-period:feature-Type:feature-Name, feature-Count> 1.2 Feature-Generator Input - <cookie:time-period:feature-Type:feature-Name, feature-Count> Output - <cookie:time-period, feature-Type:feature-Name, feature-Count ...> 2. Model Training This step fitted the Linear Poisson Regression model to the preprocessed data and involved the following: 1. Feature selection 2. Generation of training examples 3. Model weights initialization 4. Multiplicative recurrence to converge model weights 2.1 Poisson-Entity-Dictionary This step mainly performed feature selection and inverted indexing. It counted entity frequency in terms of touching cookies and selected the most frequent entities in the given feature space. Output - Hashmap of <entityType:featureName, featureIndex> (inverted index) for all entity types
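The two data preparation jobs can be sketched in plain Python as follows. This is only an in-memory illustration of the key/value shapes listed above, not the Hadoop M/R code; the event tuple layout, the function names and the one-day period are assumptions made for the example.

```python
from collections import defaultdict


def feature_extractor(raw_events, period_seconds):
    """Count events per <cookie, time-period, feature-type, feature-name> key."""
    counts = defaultdict(int)
    for cookie, timestamp, feature_type, feature_name in raw_events:
        period = timestamp // period_seconds  # bucket the event into a time period
        counts[(cookie, period, feature_type, feature_name)] += 1
    return dict(counts)


def feature_generator(extracted):
    """Merge all counts of one <cookie, time-period> into a single entry."""
    merged = defaultdict(dict)
    for (cookie, period, feature_type, feature_name), count in extracted.items():
        merged[(cookie, period)][(feature_type, feature_name)] = count
    return dict(merged)


# Hypothetical raw feed: (cookie, unix timestamp, feature type, feature name).
raw_events = [
    ("c1", 10, "page", "sports"),
    ("c1", 20, "page", "sports"),
    ("c1", 30, "ad_click", "ad42"),
    ("c1", 90000, "search", "tickets"),
    ("c2", 15, "ad_view", "ad42"),
]

extracted = feature_extractor(raw_events, period_seconds=86400)  # daily buckets
entries = feature_generator(extracted)
```

In a real M/R job the first function corresponds to a map plus a summing reduce, and the second to a re-keying on <cookie, time-period>.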
  • 10. Prognosis – An Approach to Predictive Analytics 10 An entity referred to the name (unique identifier) of an event (e.g. an ad id, a space Id for page, or a query). The Entity was different from the feature since the latter was uniquely identified by the <featureType, featureName> pair. In the context of BT, there were three types of entities—ad, page and search The Poisson entity dictionary included three M/R jobs— PoissonEntityUnit, PoissonEntitySum, and PoissonEntityHash 2.2 Poisson-Feature-Vector This generated training examples (feature vectors) that were directly used later by model initialization and multiplicative recurrence. It used a sparse data structure (populated primarily with zeros) for feature vectors. Behavioral count data is very sparse by nature. For a given user, in a given time period, his or her activity only involves a limited number of events. Impetus used a pair of arrays of the same length to represent a feature vector or a target vector—an Integer type for feature and float type for value (float type for possible decaying), with an array index giving a <feature, value> pair. Feature Selection and inverted indexing: - With the feature space selected from PoissonEntityDictionary, in this step, Impetus discarded the unselected events from the training data in the feature (input variable) side. On the target (response variable) side Impetus took the option of using all features or only selected features to categorize them into target event counts. With the inverted index built from PoissonEntityDictionary, from the PoissonFeatureVector step and onwards, Impetus referenced an original feature name by its index. The same idea was also applied to cookies, since the cookie field was irrelevant. Several pre-computations were performed at this stage: - 1. Impetus further aggregated feature counts into a time window, with a size larger than or equal to the resolution from data preparation. 2. 
Decayed counts over time using a configurable factor. 3. Used the causal approach to generate examples. (The causal approach collects features temporally before targets, while the non-causal approach generates targets and features from the same period of history.)
  • 11. Prognosis – An Approach to Predictive Analytics 11 4. Impetus used binary representation (serialized objects in Java) and data compression (SequenceFile with BLOCK compression in the Hadoop framework) for feature vectors. Data structure for the feature vector: int[targetLength] targetIndexArray; float[targetLength] targetValueArray; int[inputLength] inputIndexArray; float[inputLength] inputValueArray. Input - <cookie:timeperiod, featureType:featureName:featureCount ...> Output - <cookieIndex, featureVector> Target counts were collected from a sliding time window, and feature counts were aggregated (possibly with decay) from a time period preceding the target window. The size of the sliding window was kept relatively small for the following reasons: 1. A large window effectively discarded many <features, targets> co-occurrences within that window. For example, the following setup yielded superior long term models: a. A target window of size one day b. Sliding over a one week period c. Preceded by a four week feature window (also sliding along with the target window) The algorithm included the following: 1. For each cookie, Impetus cached all the event count data. 2. It sorted events by time, forming an event stream of this particular cookie covering the entire time period of interest. 3. Impetus pre-computed boundaries of the sliding window. Four boundaries were specified: featureBegin, featureEnd, targetBegin and targetEnd. Separating featureEnd and targetBegin allowed a gap window in between, which was necessary to emulate possible latency in online prediction.
  • 12. Prognosis – An Approach to Predictive Analytics 12 4. The company maintained three iterators on the event stream, referencing the previous featureBegin, the current featureBegin, and targetBegin. It used one pair of TreeMap objects (i.e. inputMap and targetMap) to hold features and targets of a feature vector as the data was being processed. 2.3 Poisson-Initializer This step initialized the model weights (coefficients of the regressors) by scanning the training data once. Notation: k is the index of target variables; j is the index of features or input variables; i is the index of examples; a unigram(j) is one occurrence of feature j; a bigram(k,j) is one co-occurrence of target k and feature j. The basic idea of this bigram-based initialization was to set the weight w(k,j) to a normalized number of co-occurrences of (k,j). The output of PoissonInitializer was an initialized weight matrix of dimensionality number of targets by number of features. 1. Impetus distributed the computation of counting the bigrams by a composite key <k,j> and effectively pre-computed total bigram counts of all examples before the final stage. 2. The M/R framework provided a single key data structure. In order to distribute <k,j>, Impetus needed an efficient function to transform a composite key (two integers) into a single key and recover the composite key when needed: bigramKey(k,j) = a long integer obtained by shifting k left by 32 bits and then bitwise ORing the result with j. 3. The Impetus team cached the output of the first mapper, which emitted <bigramKey, bigramCount>. 2.4 Poisson-Multiplicative This step updated the model weights by scanning the training data iteratively, using a highly effective multiplicative recurrence. Computing the normalizer (the Poisson mean) involved taking the dot product of the previous weight vector and a feature vector (the input portion). Input - <cookieIndex, featureVector> Output - updated weight vector w(k) for all k
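The composite key and the weight update described on this slide can be sketched as follows. The key packing follows the text directly (shift k left by 32 bits, OR with j). The white paper does not state the exact recurrence, so `multiplicative_step` uses one standard multiplicative update for a linear Poisson model, w(k,j) ← w(k,j) · Σ_i (y_ik / λ_ik) x_ij / Σ_i x_ij with λ_ik the dot product of w(k) and x_i, as an assumption. Counting a bigram as the product of target count and feature count is likewise an assumption, and all function names and data are hypothetical.

```python
from collections import Counter

MASK_32 = 0xFFFFFFFF  # lower 32 bits hold the feature index j


def bigram_key(k, j):
    """Pack target index k and feature index j (both in [0, 2**32)) into one integer."""
    return (k << 32) | (j & MASK_32)


def split_bigram_key(key):
    """Recover the composite key (k, j) from the packed key."""
    return key >> 32, key & MASK_32


def count_bigrams(examples):
    """Sum co-occurrence counts of (target k, feature j) over all examples."""
    counts = Counter()
    for targets, inputs in examples:
        for k, y in targets.items():
            for j, x in inputs.items():
                counts[bigram_key(k, j)] += y * x  # assumed definition of the count
    return counts


def multiplicative_step(weights, examples, eps=1e-12):
    """One multiplicative update of all weight vectors (assumed recurrence, see lead-in)."""
    feature_totals = Counter()
    for _, inputs in examples:
        for j, x in inputs.items():
            feature_totals[j] += x
    updated = {}
    for k, w_k in weights.items():
        ratio_sums = Counter()
        for targets, inputs in examples:
            y = targets.get(k, 0.0)
            if y == 0:
                continue  # examples without target k add nothing to the numerator
            mean = sum(w_k.get(j, 0.0) * x for j, x in inputs.items())  # Poisson mean
            for j, x in inputs.items():
                ratio_sums[j] += y / max(mean, eps) * x
        updated[k] = {
            j: (w * ratio_sums[j] / feature_totals[j]) if feature_totals[j] > 0 else w
            for j, w in w_k.items()
        }
    return updated


# Hypothetical sparse examples: (target counts by k, feature counts by j).
examples = [({0: 1.0}, {5: 2.0, 7: 1.0}), ({0: 2.0, 1: 1.0}, {5: 1.0})]
bigram_counts = count_bigrams(examples)
```

A useful property of this form of update is that weights stay non-negative and stop changing once the predicted mean matches the observed count.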
  • 13. Prognosis – An Approach to Predictive Analytics 13 1. Impetus represented the model weight matrix as K dense weight vectors (arrays) of length J, where K was the number of targets and J the number of features. 2. Using weight vectors was more scalable in terms of memory footprint than a matrix representation, but it raised challenges in disk I/O. Impetus addressed this problem via in-memory caching. Caching weight vectors was not the solution; the trick was to cache input examples. After caching, Impetus maintained a hashmap that recorded all relevant targets for the cached feature vectors and provided constant-time lookup from target index to array index: Map<targetIndex, arrayIndex>. 3. Impetus also used Hadoop's distributed cache, which copied the requested files from HDFS to the slave nodes before the task was executed. It copied the files only once per job for each task tracker, and the copy was shared by M/R tasks. 3. Model Evaluation This step tested the trained model on a test data set. The main tasks were: 1. Predicting expected target counts (clicks and views) 2. Scoring (CTR) 3. Ranking scores of a test set 4. Calculating and reporting performance metrics such as CTR lift and area under ROC curve. This component contained three sequential steps: 3.1 Poisson-Feature-Vector-Eval This step was identical to Poisson-Feature-Vector, except for the following. There was no need to keep the summary statistics for training, such as total count of examples, feature unigrams and target unigrams. Decay was typically necessary in generating test data, since it enabled efficient incremental prediction as new events flowed in, while diminishing the obsolete long history exponentially. Sampling and heuristic-based robot filtering were not applied to generate test data. Impetus could remove those examples without a target from the test dataset, since these records did not impact the performance, no matter how the model predicted them. However, examples with targets were kept, even those without any inputs. This was because these records (‘new users’) had to be scored by the model in production and hence had a non-trivial impact on the performance.
  • 14. Prognosis – An Approach to Predictive Analytics 14 Impetus categorized target counts either from the entire feature space or from the selected space, depending on the learning goal. The size of the sliding window was configured to be approximately the same as the ad serving cycle in production, and the size of the gap window imitated the latency between the last seen events and the next ad serving in production. 3.2 Poisson-Predictor Input - <cookieIndex, featureVector> Output - <cookieIndex, predictedActualtarget[2 x numTarget]> This step took the dot product of a weight vector and a feature vector as the predicted target count (a continuous variable). To predict the expected number of ad clicks and views in all categories for an example i, the algorithm needed to read the weight vectors of all targets converged from Poisson-Multiplicative. 3.3 Poisson-Evaluator Input - <cookieIndex, predictedActualtarget[2 x numTarget]> Output - performance metrics, per-category and overall reports This step scored each testing example by dividing its predicted clicks by predicted views and applying Laplacian smoothing. It then sorted all examples by score and finally computed and reported the performance metrics. The performance metrics included: the number of winning categories over certain benchmarks; cumulative CTR; CTR lift; area under ROC curve; summary stats. It generated reports for both per-category results and overall performance.
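The scoring and ranking done by Poisson-Evaluator can be sketched as below. The white paper does not give the smoothing constants, so the form (clicks + alpha * prior) / (views + alpha), the default values of `alpha` and `prior_ctr`, the tuple layout and the function names are all assumptions for illustration. CTR lift is taken here as the actual CTR of the top-ranked fraction divided by the actual CTR of the whole test set, and the sketch assumes the selected examples have at least one actual view.

```python
def smoothed_ctr(pred_clicks, pred_views, alpha=1.0, prior_ctr=0.01):
    """Predicted CTR with Laplacian smoothing (assumed form and constants)."""
    return (pred_clicks + alpha * prior_ctr) / (pred_views + alpha)


def ctr_lift(examples, top_fraction, alpha=1.0, prior_ctr=0.01):
    """Rank examples by smoothed score, then compare top-segment CTR with overall CTR.

    Each example is (predicted clicks, predicted views, actual clicks, actual views).
    """
    ranked = sorted(
        examples,
        key=lambda e: smoothed_ctr(e[0], e[1], alpha, prior_ctr),
        reverse=True,
    )
    cutoff = max(1, int(len(ranked) * top_fraction))  # number of targeted users
    top = ranked[:cutoff]
    top_ctr = sum(e[2] for e in top) / sum(e[3] for e in top)           # cumulative CTR
    overall_ctr = sum(e[2] for e in ranked) / sum(e[3] for e in ranked)
    return top_ctr, overall_ctr, top_ctr / overall_ctr


# Hypothetical test set, for illustration only.
test_examples = [
    (0.30, 10.0, 1, 10),
    (0.05, 10.0, 0, 10),
    (0.50, 10.0, 1, 10),
    (0.02, 10.0, 0, 10),
]

top_ctr, overall_ctr, lift = ctr_lift(test_examples, top_fraction=0.5)
```

Moving `top_fraction` trades reach against relevance, which mirrors the thresholding of relevance scores described earlier in the paper.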
  • 15. Prognosis – An Approach to Predictive Analytics 15 Summary As explained above, prediction is a statement made about the future. A popular area of application that has flourished in recent times is behavioral targeting (BT). BT is a large scale machine learning problem that leverages historical user behavior to select the most relevant ads to display. The process involves mining historical data sets and extracting hidden patterns (trends) to predict user interests. Major IT companies like Yahoo, Google and Amazon have used behavioral targeting and achieved major gains in reach and CTR. There are several implementations of BT that employ various statistical algorithms and processes to extract the behavioral traits of the users in question. The input to the BT engine is a historical sequence of the activities undertaken by users over the Internet. These activities include ad clicks, ad views, page views, search queries and search clicks. As users browse the Internet they unknowingly leave a trail of footprints in terms of visited pages, ads, cookies, etc. These footprints reveal a lot about their personality traits. BT uses these subtle inputs to draw a personality sketch of each user without intruding on their privacy. Based on these inferences, advertisers are able to target their audience and show them relevant ads. Impetus applied the Poisson’s Linear Regression algorithm for its implementation. This was deployed on the Hadoop environment using chained Map-Reduce jobs as an Oozie workflow. About Impetus Impetus Technologies is a leading provider of Big Data solutions for the Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data and create new business insights across their enterprises. Website: www.bigdata.impetus.com | Email: bigdata@impetus.com © 2013 Impetus Technologies, Inc. All rights reserved. 
Product and company names mentioned herein may be trademarks of their respective companies. May 2013