Pointlogic Analysis Data Fusion


Summary

A combined study of print reach and internet reach could be created by setting up a single survey that measures both. However, this is not
cost-efficient. An alternative is to complement the print survey with information about internet reach. This is done with a mathematical
technique that uses overlapping information, i.e. information that is present in both analyses. This technique is called data fusion.

Data fusion combines the information of two analyses by using the information they have in common. We used one data fusion technique to generate
combined print and internet data. This technique is related to a well-known statistical technique called "imputation", which is used to fill in
missing data in a dataset. In essence, the print survey can be seen as a survey in which some data (the internet behavior) is missing.

The data fusion method consists of two sequential steps. First, econometric models are estimated on the respondent data of the internet
survey. Second, these models are applied to the respondents of the print survey. Since the dataset contains a large number of websites, both
steps are carried out fully automatically.
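
The two steps can be illustrated with a minimal sketch. It is not Pointlogic's implementation: the "model" here is simply the average visiting probability within each cell of the shared variables, standing in for the econometric models, and all names (`estimate_models`, `apply_models`, `news.example`, `magazine_a`, the respondent records) are hypothetical.

```python
from collections import defaultdict


def estimate_models(internet_respondents, shared_keys, websites):
    """Step 1: per website, average visiting probability per cell of shared variables."""
    sums = {site: defaultdict(float) for site in websites}
    counts = defaultdict(int)
    for r in internet_respondents:
        cell = tuple(r[k] for k in shared_keys)
        counts[cell] += 1
        for site in websites:
            sums[site][cell] += r[site]
    return {
        site: {cell: sums[site][cell] / counts[cell] for cell in counts}
        for site in websites
    }


def apply_models(models, print_respondents, shared_keys):
    """Step 2: add an imputed visiting probability per website to each print respondent."""
    fused = []
    for r in print_respondents:
        cell = tuple(r[k] for k in shared_keys)
        row = dict(r)
        for site, table in models.items():
            # None marks a cell that was never observed in the internet survey.
            row[site] = table.get(cell)
        fused.append(row)
    return fused


# Illustrative, made-up records: "age" and "sex" are the overlapping variables.
internet = [
    {"age": "18-34", "sex": "f", "news.example": 0.4, "sport.example": 0.0},
    {"age": "18-34", "sex": "f", "news.example": 0.2, "sport.example": 0.1},
    {"age": "35+", "sex": "m", "news.example": 0.0, "sport.example": 0.5},
]
print_panel = [
    {"age": "18-34", "sex": "f", "magazine_a": 0.7},
    {"age": "35+", "sex": "m", "magazine_a": 0.1},
]

shared = ["age", "sex"]
models = estimate_models(internet, shared, ["news.example", "sport.example"])
fused = apply_models(models, print_panel, shared)
```

After the second step each print respondent carries both the original reading probability and imputed visiting probabilities, which is what makes combined reach computable from a single file.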

Published in: Business


Transcript

  • 1. A Pointlogic White Paper Data Fusion: Combining Multiple Analysis Susanne Hartog-Buijtenhek enabling smart decisions www.pointlogic.com
  • 2. Preface
    A lot of money is spent on advertising every year. Advertisers need to know what the pay-off of their advertising will be, and therefore how many people will see an advertisement (that is, how many people will be reached). Several respondent surveys are available to meet this need for information. For example, the reach of magazines and newspapers is measured by print surveys. In a print survey, a so-called "reading probability" is available for every respondent, and this reading probability serves as the basis for computing reach. In contrast, the reach of websites is measured by an internet survey that tracks the behavior of internet respondents. The results are used to compute the probability that a respondent visits a certain website in a certain period. The resulting data is published by independent agencies and serves as the currency in the market.
    Advertisers show an increasing demand for combined reach figures. This results from the increased use of several media in a single advertising campaign. Moreover, publishers of print media often have an accompanying website. Hence the question: who is reached both by an advertisement in a magazine or newspaper and by an advertisement on the internet?
    Data fusion
    A combined study of print reach and internet reach could be created by setting up a single survey that measures both. However, this is not cost-efficient. An alternative is to complement the print survey with information about internet reach. This is done with a mathematical technique that uses overlapping information, i.e. information that is present in both analyses. This technique is called data fusion.
    Data fusion combines the information of two analyses by using the information they have in common. We used one data fusion technique to generate combined print and internet data. This technique is related to a well-known statistical technique called "imputation", which is used to fill in missing data in a dataset. In essence, the print survey can be seen as a survey in which some data (the internet behavior) is missing.
    The data fusion method consists of two sequential steps. First, econometric models are estimated on the respondent data of the internet survey. Second, these models are applied to the respondents of the print survey. Since the dataset contains a large number of websites, both steps are carried out fully automatically.
  • 3. The estimation of models
    The first step in data fusion is estimating econometric models that explain the internet behavior of the respondents in the internet analysis. Estimating models on internet data is not a straightforward process, because website reach data has a specific character. For most websites, most respondents have a visiting probability equal to zero. Of the respondents with a visiting probability greater than zero, a substantial number still have a very small probability, close to zero. This structure rules out a standard regression model, so a more sophisticated model has to be chosen.
    The overlapping information can be used for the explanatory variables. However, the visiting probabilities of different websites are also highly correlated with each other. If only the overlapping information is taken into account, this correlation between websites is ignored. By including other websites as explanatory variables, the correlation between websites can be captured. The challenge in this method lies in applying the models: when a model for one website is applied to the respondents of a print survey, the information about the other websites is still missing. The answer to this problem is an iterative technique called the Gibbs sampler, which is discussed under "Applying models".
    The models need to be estimated for over 300 websites, each with different characteristics. Given the large number of candidate variables, this cannot be done manually. We have therefore developed an automated estimation procedure that selects relevant explanatory variables per website, based on their correlation and partial correlation with the website's visiting probability.
    Applying models
    After the models have been estimated, they have to be applied. Before the models are actually applied, a starting value is created for every respondent and every website. These starting values are the basis of the Gibbs sampler implementation. The models from the internet survey also have other websites as explanatory variables; the starting values make it possible to use those websites as explanatory variables while the models are applied. The models are then applied iteratively. In every iteration the current value for each website is updated and the updated value is carried through into the models for the other websites. For convergence, it is important that the number of websites included in each model is limited.
    When applying the models, one could choose to impute the expected value per respondent. However, in order to preserve the variance in visiting probabilities, a better alternative is to impute random draws from each respondent's probability distribution as given by the models.
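
The iterative application step can be sketched as follows. This is an illustration under strong simplifying assumptions, not the white paper's procedure: visits are reduced to a yes/no indicator, each website has a logistic model with made-up coefficients, and the names `MODELS`, `visit_probability`, `gibbs_impute`, `site_a`, `site_b` and `reads_magazine` are hypothetical.

```python
import math
import random

# Hypothetical, already-estimated models. Each website is explained by one
# overlapping variable (reads_magazine) and by the other website.
MODELS = {
    "site_a": {"intercept": -2.0, "reads_magazine": 1.5, "site_b": 1.0},
    "site_b": {"intercept": -1.0, "reads_magazine": 0.2, "site_a": 1.0},
}


def visit_probability(site, respondent, state):
    """Probability of a visit, given the respondent and the current values of other sites."""
    coefs = MODELS[site]
    z = coefs["intercept"]
    for name, coef in coefs.items():
        if name == "intercept":
            continue
        # Website regressors come from the current imputed state,
        # overlapping variables from the print respondent's own record.
        value = state[name] if name in state else respondent[name]
        z += coef * value
    return 1.0 / (1.0 + math.exp(-z))


def gibbs_impute(respondent, rng, sweeps=50):
    """Start from a starting value per website, then redraw each website in turn."""
    state = {site: 0 for site in MODELS}  # starting values
    for _ in range(sweeps):
        for site in MODELS:
            p = visit_probability(site, respondent, state)
            # Impute a random draw rather than the expected value p.
            state[site] = 1 if rng.random() < p else 0
    return state


rng = random.Random(42)
readers = [gibbs_impute({"reads_magazine": 1}, rng) for _ in range(500)]
non_readers = [gibbs_impute({"reads_magazine": 0}, rng) for _ in range(500)]
share_readers = sum(r["site_a"] for r in readers) / len(readers)
share_non_readers = sum(r["site_a"] for r in non_readers) / len(non_readers)
```

Because every website is redrawn from a model that sees the latest values of the other websites, the imputed values retain the correlation between websites as well as the variation between respondents, which imputing the expected value would flatten.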
  • 4. Results
    The results of the data fusion technique have been extensively validated. This was possible because a group of respondents was present in both studies¹. Having both the true values and the imputed values for these respondents provides an unbiased way to validate the models' results, which were remarkably positive.
    Qualitative validations are at least as important as quantitative validations. From a mathematical point of view, the results cover the average reach as well as the variance of the unbiased estimators, and the comparison on the overlapping respondents provides a further validation of the method used. Unfortunately, this does not automatically mean that the analysis is accepted by the market. Other validations are necessary for general acceptance. Two such validations were carried out: a review of the significant explanatory variables used in the models, and a review of the final combined reach data. The variables used simply need to be "logical". Most important, however, is whether publishers recognize the final overlap and find it plausible. Some publishers strive to make the overlap as small as possible and therefore aim their website at a different audience. Others aim for as large an overlap as possible, for example by placing a reference to the website in the magazine. Whether the final overlap is recognized is an important part of acceptance and hence of validation.
    Future
    To conclude, the column-wise data fusion method provides very good results for combined print-internet reach. The method will be applied semi-annually, on the basis of new print and internet analyses, to generate a combined analysis file. These results make it desirable to test the methodology on other analyses. The method can quite easily be extended with new model formulations, which can then be used to determine combined reach with other media.
    ¹ Both analyses come from the same agency.
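
A quantitative check on the overlapping respondents could look like the sketch below. The function name `validate` and all numbers are made up for illustration; the white paper does not publish its validation figures or the exact statistics it compared.

```python
from statistics import mean, pvariance


def validate(true_values, imputed_values):
    """Compare imputed with true visiting probabilities for overlap respondents."""
    errors = [abs(t - i) for t, i in zip(true_values, imputed_values)]
    return {
        "mean_true": mean(true_values),
        "mean_imputed": mean(imputed_values),
        "var_true": pvariance(true_values),
        "var_imputed": pvariance(imputed_values),
        "mean_abs_error": mean(errors),
    }


# Illustrative values for four respondents present in both studies.
true_probs = [0.0, 0.0, 0.1, 0.5]
drawn = [0.0, 0.1, 0.0, 0.5]        # imputed by random draws
expected_only = [0.15] * 4          # imputed by the overall expected value

report_drawn = validate(true_probs, drawn)
report_expected = validate(true_probs, expected_only)
```

In this toy example both imputations reproduce the average, but only the random draws keep a variance comparable to that of the true values, which is the reason given under "Applying models" for imputing draws rather than expected values.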
  • 5. About Pointlogic | enabling smart decisions
    Founded in 1992 by Peter Kloprogge and Sjoerd Mostert - with offices in New York, London, Frankfurt, Sydney, Amsterdam, and Rotterdam - Pointlogic combines cutting-edge research, advanced mathematical modeling, and flexible software tools to enable our clients to make smart decisions. Pointlogic works together with clients, applying fresh, analytical thinking to problems. We then use powerful mathematical modeling to generate insight into clients' choices. And then, most importantly, we deliver concrete, software-based solutions that clients can both implement and distribute across internal and partner networks.
    For more information about any of Pointlogic's products or for press inquiries please contact Nicole Alexander: Office: 212-683-2330 E-Mail: alexander@pointlogic.com