Open Creativity Scoring Tutorial

Open Creativity Scoring:
Freeware for Education Scientists
Denis Dumas, Peter Organisciak, & Selcuk Acar
European Association for Research on Learning and Instruction
Thessaloniki, Greece; August 25, 2023

Overview of Demonstration
• Background in creativity assessment
• Navigating the OCS
• Semantic distance models
  • Adults
  • Children
  • Elaboration
• Large language modeling
  • English
  • Other languages

Creativity Assessment Background

Why assess creativity?
• Creativity assessment is an important alternative to other more typical
forms of educational testing
• Creativity correlates positively and moderately with other important measures
  • Intelligence, math achievement, reading comprehension, etc.
• But highly creative students do not always score highly on other
measures
• So creativity assessment is a way for more students to demonstrate
their strengths and potential
• Meaningfully predicts creative activities and achievement in the real
world

Bottleneck in creativity assessment
• Some areas of psychology rely on self-report, but we can’t
  • People struggle to self-report their own creativity
• Some types of performance assessments use easy-to-score
multiple-choice items, but not ours
  • Creativity requires participants to generate their own responses
  • We collect open-ended and ill-structured responses from participants
  • Quantifying psychological attributes from these data typically requires human raters
• As a result, psychological research on creativity has been slow and
expensive

Example of a common task
• The Alternate Uses Task is one common creativity assessment
• Tasks participants with generating multiple unusual ways to utilize a common everyday object
• Example stem: Think of as many creative uses for a PENCIL as you can in two minutes.
• Example responses from elementary students:
• Use rubber bands and make a slingshot
• Climb a building with it
• Style your hair and keep it there
• Break it down to mini pencils for your dolls
• Make a pencil castle
• Use them like chopsticks

Possible solution: Automatic scoring
• Since the late 1960s, researchers have been seeking ways to
automate the creativity assessment process using computers instead
of humans
  • Paulus & Renzulli (1968) is the first attempt we know of
  • But the use of punch cards meant that the process was not any faster than using humans!
• In the last 15 years, more advanced computational models have
made this far more feasible
  • Kevin Dunbar and his students were the first to use text-mining to operationalize originality as semantic distance (beginning in 2009)
  • Multiple labs joined this effort, and today many studies have used text-mining-based originality scores

What is the OCS?
• Open Creativity Scoring (OCS) is cutting-edge freeware designed to automate the process of
scoring creativity assessment responses
  • Currently gets about 250 unique users a month
  • 60% in the USA and 40% elsewhere
  • Has scored more than 300 thousand responses!
• Hosted on a server at the University of Denver
• Funded by our ongoing grant from the U.S. Department of Education, Institute of
Education Sciences
• Continually being updated and improved
• Find it here: https://openscoring.du.edu/

Navigating the Open Creativity Scoring Website

OCS landing page
• A short description of creativity
assessment and the OCS
• Three main links:
• About
• Score with semantic models
• Score with AI

The ‘About’ tab on the OCS
• Gives information about our team and funding sources
• Describes the process of semantic distance calculation
• Has some information about large language modeling
• Gives some citation information

OCS: Two ways to score your data
• Semantic distance modeling is an unsupervised approach
using text-mining models for both children’s and adults’
responses
• Scoring with AI is a supervised approach that uses large
language models (LLMs), which have been fine-tuned on
thousands of example responses along with human ratings
• We’re going to explain each of these functions separately
today

Scoring with Semantic Distance

How does semantic distance work?
• Uses a corpus of billions of word co-occurrences to determine how
distant two terms are
• Represents terms as vectors within semantic space, and determines
their distance via the angle between the vectors
• Semantic distance is a theoretically-based operationalization of the
originality of an idea
• Does not require supervision or training based on human ratings
• It correlates moderately (~.25) with human ratings of creativity
• Has been used in many studies of creativity
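
To make the "angle between the vectors" idea concrete, here is a minimal sketch. It assumes the distance is computed as one minus the cosine of the angle, which is a common convention in text-mining; whether the OCS uses exactly this formula is an assumption. The three-dimensional vectors are invented for illustration and are not taken from any real corpus.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle) between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only (real models use hundreds of dimensions).
toy_vectors = {
    "hammer": [0.9, 0.1, 0.0],
    "nail": [0.8, 0.2, 0.1],
    "paperweight": [0.1, 0.7, 0.6],
}

near = cosine_distance(toy_vectors["hammer"], toy_vectors["nail"])
far = cosine_distance(toy_vectors["hammer"], toy_vectors["paperweight"])
print(f"hammer vs nail: {near:.3f}")
print(f"hammer vs paperweight: {far:.3f}")
```

In this toy space, the response that points in a different direction from the prompt gets the larger distance, which is the sense in which semantic distance stands in for originality.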

Semantic distance page
• Set up with example responses for a Hammer
• You can type in anything with the format:
• PROMPT, RESPONSE
• The ‘Submit’ button calculates your semantic distances
• ‘Export’ downloads your output as a spreadsheet for
later analysis
• If you have a lot of data, it is easier to upload your data
file directly
• Use the example file to help you format your file,
because a particular format is needed
• Two columns for prompts and responses
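
If your responses live in another program, a short script can write them out in a two-column layout. This is a sketch only: the header names `prompt` and `response` are assumptions, and the example file on the OCS site remains the authority on the exact format. The second response below is invented to show why a CSV writer is safer than joining strings by hand.

```python
import csv
import io

def to_two_column_csv(pairs, header=("prompt", "response")):
    """Write (prompt, response) pairs as CSV text with one pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)  # header names are an assumption, check the example file
    for prompt, response in pairs:
        writer.writerow([prompt.strip(), response.strip()])
    return buffer.getvalue()

pairs = [
    ("pencil", "Use them like chopsticks"),
    ("pencil", "A tiny flagpole, with a sticker as the flag"),  # invented response
]
csv_text = to_two_column_csv(pairs)
print(csv_text)

# Reading the text back confirms that the comma inside a response stays in one cell.
rows = list(csv.reader(io.StringIO(csv_text)))
```

The `csv` module quotes any response that contains a comma, so a response is never split into extra columns.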

Semantic distance options
• ‘Stoplisting’ removes very common words from the
responses
• Common words tend to create noise in scoring
• ‘Term weighting’ augments the influence of rare words in
a response
• Rarer words tend to contain more information
• ‘Exclude target words’ deletes the prompt word (e.g.,
Hammer) from the response
• When participants repeat the prompt word, it lowers
the semantic distance
• ‘Normalize originality’ puts the semantic distance on an
easier-to-interpret 5-point scale
• Otherwise, the semantic distance scores are
decimals
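
The first three options can be pictured as text-processing steps applied before the distance is computed. The sketch below is an illustration, not the OCS code: the stoplist is a tiny invented sample, the weights are made-up numbers standing in for a real rarity weighting, and the function names are hypothetical.

```python
import re

# A tiny illustrative stoplist; real stoplists are much longer.
STOPLIST = {"a", "an", "and", "the", "to", "of", "it", "as", "in", "for"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def preprocess(prompt, response, stoplist=True, exclude_target=True):
    """Apply 'Stoplisting' and 'Exclude target words' to a response."""
    tokens = tokenize(response)
    if stoplist:
        tokens = [t for t in tokens if t not in STOPLIST]
    if exclude_target:
        target = set(tokenize(prompt))
        tokens = [t for t in tokens if t not in target]
    return tokens

def weighted_mean_vector(tokens, vectors, weights):
    """'Term weighting': average word vectors, letting rarer words count more."""
    known = [t for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a vector")
    total = sum(weights.get(t, 1.0) for t in known)
    dims = len(vectors[known[0]])
    return [
        sum(weights.get(t, 1.0) * vectors[t][i] for t in known) / total
        for i in range(dims)
    ]

tokens = preprocess("hammer", "Use the hammer as a doorstop")
vectors = {"use": [1.0, 0.0], "doorstop": [0.0, 1.0]}  # invented toy vectors
weights = {"use": 1.0, "doorstop": 3.0}                # invented rarity weights
response_vector = weighted_mean_vector(tokens, vectors, weights)
print(tokens, response_vector)
```

With weighting on, the response vector is pulled toward the rarer word, which is why rare words have more influence on the final score.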

Children’s and adults’ responses
• Default corpus is ‘Global Vectors for Word
Representation’, a very well-studied and massive
corpus meant to represent adults’ use of English
• But this drop-down menu can switch to a corpus
designed for kids, which was built and validated in our lab
• This ‘kids corpus’ is based on thousands of children’s
books, TV shows, and Simple English Wikipedia
• We recommend it for children under 13, or for any
participant not yet engaging with language at an adult
level
• Changing the corpus changes the scores by varying
amounts and in different directions, depending on the
response

Elaboration
• How detailed and wordy a response is (i.e., Elaboration)
can change its score on the OCS
• So, calculate elaboration for each response, and consider it
as a covariate in your later analysis
• Four ways to calculate:
• Whitespace is a simple word count
• ‘Idf’ is ‘inverse document frequency’, which counts the
words and then up-weights the rarer words and down-
weights the common words
• Stoplist counts all the words except for the very
common ones
• ‘Pos’ is ‘part of speech’: it only counts the nouns and
verbs and excludes others
• We recommend IDF-based elaboration scores
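
Three of the four elaboration counts are easy to sketch in a few lines. The code below is an illustration under assumptions: it uses the textbook IDF formula log(N / df) over a small invented reference collection, and the OCS may use a different variant or corpus. The part-of-speech count is left out because it needs a tagger.

```python
import math

def whitespace_elaboration(response):
    """Simple word count: split on whitespace."""
    return len(response.split())

def stoplist_elaboration(response, stoplist):
    """Count every word except the very common ones."""
    return sum(1 for w in response.lower().split() if w not in stoplist)

def idf_weights(documents):
    """Compute log(N / df) per word over a reference collection."""
    n = len(documents)
    df = {}
    for doc in documents:
        for w in set(doc.lower().split()):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / count) for w, count in df.items()}

def idf_elaboration(response, idf, default=0.0):
    """Sum of IDF weights: rare words add more, common words add less."""
    return sum(idf.get(w, default) for w in response.lower().split())

# Invented mini-collection standing in for a real reference corpus.
reference = ["make a pencil castle", "use a pencil", "climb a building"]
idf = idf_weights(reference)

response = "Make a pencil castle"
print(whitespace_elaboration(response))
print(stoplist_elaboration(response, {"a", "the"}))
print(round(idf_elaboration(response, idf), 3))
```

Whichever count you choose, it can then enter your later analysis as a covariate alongside the originality score.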

Scoring with Artificial Intelligence (Ocsai)

How does Ocsai work?
• Built on the Generative Pre-trained Transformer (GPT)-3 and a text-to-text transformer
model (T5).
• Fine-tuned and specifically trained to score the originality of responses based on tens
of thousands of responses and human ratings
  • Labs all around the world donated their data for this training
• Our team pays the company that runs GPT-3 a small amount for every response that
Ocsai scores
• Correlates very strongly (>.80) with human ratings of originality
• Can be used alone as a replacement for human raters, or as an additional rater
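
If you want to check this kind of agreement on your own data, a Pearson correlation between automated scores and human ratings is one simple option. This is a sketch, assuming you have one score of each kind per response; the function name is hypothetical, and the numbers in the usage lines are invented toy values, not results from Ocsai.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)

# Invented toy ratings on a 5-point scale, for illustration only.
human_ratings = [1.0, 2.5, 3.0, 4.5, 2.0]
automated_scores = [1.5, 2.0, 3.5, 4.0, 2.5]
print(round(pearson_r(human_ratings, automated_scores), 2))
```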

Ocsai page
• Set up the same way as the semantic distance
scoring page
• The example prompt is ‘PANTS’, but you can type
anything in
• As before, uploading a file is also available
• Options (e.g., Stoplisting) for processing the text are no
longer needed
• Elaboration scores are less relevant, but whitespace
word counts are available
• Originality scores are given on a 5-point scale

Ocsai models
• GPT-3 is available in multiple versions based on the size
and complexity of the model
• The models are named after famous thinkers (Ada,
Babbage, Curie, and Da Vinci)
• In alphabetical order, the models get bigger and more
expensive to use
• We make the A, B, and C models available for free
• If you want to use Da Vinci, let us know…and maybe chip
in ;-)

Ocsai in other languages
• Semantic distance models on the OCS are built for English
responses only
• But Ocsai can handle responses in other languages
• Aleksandra Zielinska and Maciej Karwowski have demonstrated
that it works for Polish
  • See example responses for a BRICK (in Polish: Cegła)
• Either use a simple translation tool to initially translate the
responses and then import them into Ocsai, or put them in
directly without translation
• Correlations over .80 with human raters, even with untranslated
responses

Ocsai via API
• API connections allow you to integrate the OCS into
whatever other program you use for analysis and data
collection (e.g., Qualtrics, R)
• Can allow for instant feedback to students about the
creativity of their ideas
• Thanks to Pier Luc de Chantal for supporting the API
• If you want to use the API, go to:
https://openscoring.du.edu/docs
  • Or email us for a little explanation
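
As a rough picture of what an API connection involves, the sketch below builds a request URL for one response without sending it. Everything except the base URL is a placeholder: the endpoint path, the parameter names (`prompt`, `input`, `model`), and the model value are hypothetical, so take the real ones from the documentation page above.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit

BASE_URL = "https://openscoring.du.edu/"  # from the slide

def build_request_url(endpoint, prompt, response, model):
    """Build a GET URL; parameter names here are hypothetical placeholders."""
    query = urlencode({"prompt": prompt, "input": response, "model": model})
    return urljoin(BASE_URL, endpoint) + "?" + query

url = build_request_url(
    "HYPOTHETICAL-ENDPOINT",      # placeholder, see the API docs
    prompt="pants",
    response="cut them up & make a flag",
    model="HYPOTHETICAL-MODEL",   # placeholder, see the API docs
)
print(url)

# Decoding the query shows the free-text response survives URL encoding intact.
decoded = parse_qs(urlsplit(url).query)
```

Encoding matters because participants' responses are free text: spaces, ampersands, and non-English characters all need escaping before they go into a URL.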

Ocsai: Always getting better
• As our research continues, we keep updating the OCS, and especially
Ocsai
• If you’re interested in this work, please consider using OCS for your own
work
• And also consider donating some data to augment our training dataset,
which will continue to improve Ocsai’s reliability and validity
• Currently Ocsai is being used with both children and adults, as well as for
additional types of creativity assessments
• Help us make it better and stay tuned for more features!

Thank you very much.
Questions?
