This document outlines the approach taken by Proved.co to calculate scores and metrics for concepts. It involves weighting survey responses to match census demographics, calculating raw scores based on relevance, word of mouth, value for money and uniqueness, and normalizing the raw scores on a 0-100 scale against benchmarks from over 1,000 previous concepts. The normalized scores and metrics show how well a concept performed compared to other concepts tested.
2. Introduction
For each concept, proved.co calculates:
— A score showing overall concept performance
— Five key metrics showing the concept's strengths, weaknesses and areas for improvement
This document outlines proved.co's approach to score and metrics calculation, as well as the background and framework behind it.
3. Architecture
[Architecture diagram: the front-end handles Collection (questionnaire and survey distribution, backed by a database); the back-end handles Computation (analytics, backed by a database); a Dashboard provides Visualization.]
4. Computation
— Weighting: RIM procedure; individual weights for each respondent to fit age and gender proportions to census data
— Calculation: raw scores, i.e. direct results of applying statistical formulae to the questionnaire data collected
— Normalization: normalized scores, i.e. raw scores rescaled to 0..100 using benchmarks
These three stages produce the Proved Scores & Metrics.
6. Weighting
— Two target variables:
  — Gender: males and females (2 targets)
  — Age: 18-29, 30-49, 50+ (3 targets)
— Random Iterative Method (RIM):
  — Data is weighted by gender: gender weighting factors are calculated and applied (Iteration 1)
  — Data is weighted by age: age weighting factors are multiplied by the gender factors from Iteration 1 and applied (Iteration 2)
  — Data is re-weighted by gender: gender weighting factors are multiplied by Iteration 2's and applied (Iteration 3)
  — Iterations continue until all targets are met or precision changes by no more than 1%, while weight factors stay within the [0.25; 4] limits.
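The iterative procedure above can be sketched as follows. This is an illustrative sketch of RIM (raking) weighting, not Proved.co's production code; the function names, respondent format, and target dictionaries are our own simplifications.

```python
# Sketch of RIM (raking) weighting: alternate gender and age passes
# until weights stabilize, keeping each weight within [0.25; 4].

def rim_weights(respondents, targets, max_iter=50, tol=0.01,
                limits=(0.25, 4.0)):
    """Iteratively adjust per-respondent weights so the weighted
    gender and age proportions converge to census targets.

    respondents: list of dicts, e.g. {"gender": "m", "age": "18-29"}
    targets: dict of dicts, e.g. {"gender": {"m": 0.49, "f": 0.51}, ...}
    """
    weights = [1.0] * len(respondents)

    def adjust(attr):
        # Current weighted share of each category of this variable.
        total = sum(weights)
        shares = {cat: sum(w for r, w in zip(respondents, weights)
                           if r[attr] == cat) / total
                  for cat in targets[attr]}
        for i, r in enumerate(respondents):
            cat = r[attr]
            if shares.get(cat):
                factor = targets[attr][cat] / shares[cat]
                # Clamp each weight to the allowed [0.25; 4] range.
                weights[i] = min(max(weights[i] * factor, limits[0]),
                                 limits[1])

    for _ in range(max_iter):
        prev = list(weights)
        adjust("gender")  # gender pass (Iterations 1, 3, 5, ...)
        adjust("age")     # age pass (Iterations 2, 4, 6, ...)
        # Stop when no weight changes by more than 1% over a full pass.
        if all(abs(w - p) / p < tol for w, p in zip(weights, prev)):
            break
    return weights
```

After convergence, the weighted gender and age shares of the sample match the census targets, which can be verified against the table on the next slide.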
7. Weighting targets

            United Kingdom    USA
Males            49%          49%
Females          52%          52%
18-29 YO         17%          23%
30-49 YO         38%          36%
50+ YO           47%          43%

UK figures are based on 2012 census data; USA figures on 2011 census data.
8. Special notes
Weighting does not apply to:
— Self-service plans, i.e. samples from the client's contact lists and river samples
— Audiences targeted outside age and gender, e.g. moms or car owners
10. Framework
— Proved.co is a project of Bojole (UK) Ltd, a traditional market research company with eight years of concept testing experience
— At the moment of proved.co's development, Bojole had norms for 1,228 concept tests:
  — Raw data, i.e. more than 250,000 completed questionnaires
  — Corroboration data, i.e. post-tests, ranking data, and instrumental variables
11. Corroboration
— Post-tests: market data for launched concepts; available for a limited number of concepts; the most reliable corroboration
— Ranking data: max-diff ratings for sets of 30-90 concepts; available for 1,116 concepts; quite reliable corroboration
— Instrumental variables: overall liking, purchase intent, etc.; available for all 1,228 concepts; questionable reliability
12. Score modeling
— Bojole decided to develop a single score which best represents the overall performance of a concept under test
— Bojole used iterative regression modeling to determine:
  — Variables to include in the score calculation
  — The score formula
[Diagram: corroboration variables modeled against all available scaled diagnostic variables, e.g. relevance, uniqueness, word of mouth, …]
13. Score modeling
— The following set of variables and formula coefficients has been determined:

Concept relevance            High impact
Concept's word of mouth      Mid impact
Concept's value for money    Mid impact
Concept uniqueness           Low impact
14. Raw score calculations
On the individual level:
— Weighted sum of concept relevance, word of mouth, value for money and uniqueness
— Weighting coefficients reflect the score modeling described above
On the aggregate level:
— Weighted average of individual scores
— Weights reflect fit to the age/gender proportions of the target population
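The two levels above can be sketched as a weighted sum followed by a weighted average. The impact coefficients below are hypothetical placeholders illustrating the high/mid/mid/low pattern from the previous slide, not Proved.co's actual modeled values.

```python
# Hypothetical coefficients: relevance (high impact), word of mouth and
# value for money (mid impact), uniqueness (low impact).
COEFFS = {"relevance": 0.40, "word_of_mouth": 0.25,
          "value_for_money": 0.25, "uniqueness": 0.10}

def individual_raw_score(answers):
    """Weighted sum of the four diagnostic ratings for one respondent."""
    return sum(COEFFS[k] * answers[k] for k in COEFFS)

def aggregate_raw_score(all_answers, rim_weights):
    """RIM-weighted average of individual scores, so the aggregate
    reflects the age/gender proportions of the target population."""
    return (sum(w * individual_raw_score(a)
                for w, a in zip(rim_weights, all_answers))
            / sum(rim_weights))
```

For example, with equal RIM weights the aggregate is the plain mean of individual scores; up-weighting an under-represented respondent pulls the aggregate toward that respondent's score.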
16. Framework
— We believe any concept test result is useful only in context, i.e. against benchmarks
— Thus, we normalize the raw score and each raw metric to a 0..100 scale representing its performance against benchmarks
— As a small extra benefit, 0..100 scores are easier to read and compare
17. Benchmarks
— We store for each idea:
  — Description and sample
  — All calculated raw metrics and scores, i.e. raw benchmarks
  — All normalized metrics and scores, i.e. scaled benchmarks
— The list is updated with each new computation
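One way to sketch the per-idea benchmark record described above; the field names are our own illustration, not Proved.co's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkRecord:
    description: str                            # concept description
    sample: str                                 # sample description
    raw: dict = field(default_factory=dict)     # raw metrics and score
    scaled: dict = field(default_factory=dict)  # normalized metrics and score

# The benchmark list grows with each new computation.
benchmarks = []
benchmarks.append(BenchmarkRecord("Example concept", "UK nat rep sample",
                                  raw={"score": 3.7},
                                  scaled={"score": 61.0}))
```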
18. Benchmarks for idea score
[Chart: the distribution of raw idea scores is close to normal, and thus can be used for sensible 0..100 scaling.]
19. Normalization
— First, we calculate the average (avg) and standard deviation (sdev) of the distribution of all raw benchmarks
— Then we calculate the normalized value, measuring the deviation of the raw score/metric (rvalue) from the average (avg) in standard deviations (sdev), i.e. z = (rvalue − avg) / sdev
— The normalized score shows how a given concept (or its metric) benchmarks against the whole distribution of other concepts in our database
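The steps above can be sketched as follows. The z-score step is taken directly from the description; mapping z through the normal CDF to obtain a 0..100 value is our assumption, since the deck states only that the raw score distribution is close to normal, not the exact rescaling function.

```python
import math

def normalize(rvalue, raw_benchmarks):
    """Rescale a raw score/metric to 0..100 against the distribution
    of raw benchmarks from previously tested concepts."""
    n = len(raw_benchmarks)
    avg = sum(raw_benchmarks) / n
    sdev = math.sqrt(sum((b - avg) ** 2 for b in raw_benchmarks) / n)
    z = (rvalue - avg) / sdev
    # Normal CDF maps z into (0, 1); times 100 gives a 0..100 score
    # readable as "performs better than this share of benchmarks".
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

A concept scoring exactly at the benchmark average gets 50; concepts above or below the average move toward 100 or 0 respectively.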