Open Creativity Scoring: Freeware for Education Scientists
Denis Dumas, Peter Organisciak, & Selcuk Acar
European Association for Research on Learning and Instruction
Thessaloniki, Greece; August 25, 2023
Overview of Demonstration
• Background in creativity assessment
• Navigating the OCS
• Semantic distance models
Adults
Children
Elaboration
• Large language modeling
English
Other languages
Creativity Assessment Background
Why assess creativity?
• Creativity assessment is an important alternative to other more typical
forms of educational testing
• Correlates positively and moderately with other important measures
Intelligence, Math achievement, Reading comprehension, etc.
• But, highly creative students do not always score highly on other
measures
• So creativity assessment is a way for more students to demonstrate
their strengths and potential
• Meaningfully predicts creative activities and achievement in the real
world
Bottleneck in creativity assessment
• Some areas of psychology rely on self-report...but we can’t
People struggle to self-report their own creativity
• Some types of performance assessments use easy-to-score
multiple-choice items…but not ours
Creativity requires participants to generate their own responses
We collect open-ended and ill-structured responses from participants
Quantifying psychological attributes from these data typically requires
human raters
• Psychological research on creativity has been slow and
expensive
Example of a common task
• The Alternate Uses Task is one common creativity assessment
• Tasks participants with generating multiple unusual ways to utilize a common everyday object
• Example stem: Think of as many creative uses for a PENCIL as you can in two minutes.
• Example responses from elementary students:
• Use rubber bands and make a slingshot
• Climb a building with it
• Style your hair and keep it there
• Break it down to mini pencils for your dolls
• Make a pencil castle
• Use them like chopsticks
Possible solution: Automatic scoring
• Since the late 1960s, researchers have been seeking ways to
automate the creativity assessment process using computers instead
of humans
Paulus & Renzulli (1968) is the first attempt we know of
But the use of punch cards meant that the process was not any faster than
using humans!
• In the last 15 years, more advanced computational models have
vastly improved this possibility
Kevin Dunbar and his students were the first to use text-mining to
operationalize originality as semantic distance (beginning in 2009)
Multiple labs joined this effort, and today many studies have used text-
mining-based originality scores
What is the OCS?
• Open Creativity Scoring is cutting-edge freeware designed to automate the process of
scoring creativity assessment responses
Gets about 250 unique users a month currently
60% in the USA and 40% elsewhere
Has scored more than 300,000 responses!
• Hosted on a server at the University of Denver
• Funded by our ongoing grant from the U.S. Department of Education's Institute of
Education Sciences
• Continually being updated and improved
• Find it here: https://openscoring.du.edu/
Navigating the Open Creativity Scoring Website
OCS landing page
• A short description of creativity
assessment and the OCS
• Three main links:
• About
• Score with semantic models
• Score with AI
The ‘About’ tab on the OCS
• Gives information about our team and funding sources
• Describes the process of semantic distance calculation
• Has some information about large language modeling
• Gives some citation information
OCS: Two ways to score your data
•Semantic distance modeling is an unsupervised approach
using text-mining models for both children's and adults'
responses
•Scoring with AI is a supervised approach that uses large
language models (LLMs), which have been fine-tuned on
thousands of example responses along with human ratings
•We’re going to explain each of these functions separately
today
Scoring with Semantic Distance
How does semantic distance work?
• Uses a corpus of billions of word co-occurrences to determine how
distant two terms are
• Represents terms as vectors within semantic space, and determines
their distance via the angle between the vectors
• Semantic distance is a theoretically-based operationalization of the
originality of an idea
• Does not require supervision or training based on human ratings
• It correlates moderately (~.25) with human ratings of creativity
• Has been used in many studies of creativity
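To make the angle-based idea concrete, here is a minimal sketch. It assumes semantic distance is computed as 1 minus the cosine of the angle between two term vectors, which is a common operationalization but is an assumption here, not a statement of the exact OCS formula. The three-dimensional vectors are invented toy values; real models use vectors with hundreds of dimensions learned from a corpus.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_distance(u, v):
    """Assumed definition: 1 - cosine similarity (0 = same direction)."""
    return 1.0 - cosine_similarity(u, v)

# Toy vectors, invented for illustration only (not taken from any corpus).
toy_vectors = {
    "hammer": [0.9, 0.1, 0.0],
    "nail": [0.8, 0.2, 0.1],
    "paperweight": [0.1, 0.9, 0.3],
}

near = semantic_distance(toy_vectors["hammer"], toy_vectors["nail"])
far = semantic_distance(toy_vectors["hammer"], toy_vectors["paperweight"])
print(f"hammer vs. nail:        {near:.3f}")
print(f"hammer vs. paperweight: {far:.3f}")
```

With these toy values, the typical association (nail) sits closer to the prompt than the unusual one (paperweight), which is the pattern semantic distance is meant to capture.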
Semantic distance page
• Set up with example responses for a Hammer
• You can type in anything with the format:
• PROMPT, RESPONSE
• The ‘Submit’ button calculates your semantic distances
• ‘Export’ downloads your output as a spreadsheet for
later analysis
• If you have a lot of data, it is easier to upload your data
file directly
• Use the example file to help you format your file,
because a particular format is needed
• Two columns for prompts and responses
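As a rough sketch of preparing an upload file, the snippet below writes prompt/response pairs as two comma-separated columns. Whether the site expects a header row, and the example responses other than the chopsticks one, are assumptions made for illustration; the example file on the site remains the reference for the exact format.

```python
import csv
import io

# Illustrative prompt/response pairs; only the third response comes from the slides.
rows = [
    ("hammer", "use it as a paperweight"),
    ("hammer", "crack open a coconut"),
    ("pencil", "use them like chopsticks"),
]

def to_two_column_csv(pairs):
    """Return CSV text with one 'prompt,response' row per pair (no header row)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for prompt, response in pairs:
        writer.writerow([prompt, response])
    return buffer.getvalue()

csv_text = to_two_column_csv(rows)
print(csv_text)
```

Using the csv module rather than joining strings by hand means a response that itself contains a comma is quoted automatically, so it still lands in a single column.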
Semantic distance options
• ‘Stoplisting’ removes very common words from the
responses
• Common words tend to create noise in scoring
• ‘Term weighting’ augments the influence of rare words in
a response
• Rarer words tend to contain more information
• ‘Exclude target words’ deletes the prompt word (e.g.,
Hammer) from the response
• When participants repeat the prompt word, it lowers
the semantic distance
• ‘Normalize originality’ puts the semantic distance on an
easier-to-interpret 5-point scale
• Otherwise, the semantic distance scores are
decimals
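The options above can be sketched roughly as follows. This is an illustration under stated assumptions, not the OCS implementation: the stoplist is a tiny made-up one, and the linear rescaling to a 1-5 range is one plausible way to normalize a decimal score, chosen here only for demonstration. Term weighting is left out of this sketch.

```python
# Tiny illustrative stoplist (an assumption; real stoplists are much longer).
STOPLIST = {"a", "an", "the", "it", "as", "to", "with", "and", "of"}

def preprocess(prompt, response, stoplist=True, exclude_target=True):
    """Lowercase, tokenize on whitespace, then apply the selected options."""
    tokens = [t.strip(".,!?;:") for t in response.lower().split()]
    tokens = [t for t in tokens if t]
    if stoplist:
        # 'Stoplisting': drop very common words.
        tokens = [t for t in tokens if t not in STOPLIST]
    if exclude_target:
        # 'Exclude target words': drop the prompt word itself.
        tokens = [t for t in tokens if t != prompt.lower()]
    return tokens

def normalize_to_five_point(distance, low=0.0, high=1.0):
    """Assumed normalization: clip to [low, high], then rescale linearly to 1-5."""
    clipped = min(max(distance, low), high)
    return 1.0 + 4.0 * (clipped - low) / (high - low)

print(preprocess("hammer", "Use the hammer as a doorstop."))
print(normalize_to_five_point(0.5))
```

Removing the repeated prompt word before scoring matters because a response that echoes "hammer" would otherwise be pulled toward the prompt and look less original than it is.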
Children’s and adults’ responses
• Default corpus is ‘Global Vectors for Word
Representation’, a very well-studied and massive
corpus meant to represent adults’ use of English
• But this drop-down menu can switch to a corpus
designed for kids, which was built and validated in our lab
• This ‘kids corpus’ is based on thousands of children’s
books, TV shows, and Simple English Wikipedia.
• We recommend it for children under 13, or for any
participant not yet engaging with language at an adult
level
• Changing the corpus changes the scores by varying
amounts and in different directions, depending on the
response
Elaboration
• How detailed and wordy a response is (i.e., Elaboration)
can change its score on the OCS
• So, calculate elaboration for each response, and consider it
as a covariate in your later analysis
• Four ways to calculate:
• Whitespace is a simple word count
• ‘Idf’ is ‘inverse document frequency’, which counts the
words and then up-weights the rarer words and down-
weights the common words
• Stoplist counts all the words except for the very
common ones
• ‘Pos’ is ‘part of speech’: it only counts the nouns and
verbs and excludes others
• We recommend IDF-based elaboration scores
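Three of the four elaboration measures can be sketched in a few lines; the part-of-speech version is omitted because it needs a tagger. This is a simplified illustration, not the OCS code: the stoplist is a small made-up one, and the IDF weights are computed from the example responses themselves rather than from a large reference corpus, which is an assumption made to keep the sketch self-contained.

```python
import math

# Small illustrative stoplist (an assumption, not the list used by the OCS).
STOPLIST = {"a", "an", "the", "and", "it", "with", "them", "like"}

def whitespace_elaboration(response):
    """Simple word count: split on whitespace."""
    return len(response.split())

def stoplist_elaboration(response, stoplist=STOPLIST):
    """Count all words except the very common ones."""
    return sum(1 for t in response.lower().split() if t not in stoplist)

def idf_weights(documents):
    """Inverse document frequency, log(N / df), for every term in the documents."""
    n = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) for term, df in doc_freq.items()}

def idf_elaboration(response, weights):
    """Sum of IDF weights; unseen terms are treated as maximally rare."""
    unseen = max(weights.values())
    return sum(weights.get(t, unseen) for t in response.lower().split())

# Example responses taken from the PENCIL slide earlier in the deck.
responses = [
    "Use rubber bands and make a slingshot",
    "Climb a building with it",
    "Make a pencil castle",
    "Use them like chopsticks",
]
weights = idf_weights(responses)
for r in responses:
    print(r, whitespace_elaboration(r), stoplist_elaboration(r),
          round(idf_elaboration(r, weights), 2))
```

Whichever measure is used, the resulting number can be kept as a column next to the originality score and entered as a covariate in later analysis.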
Scoring with Artificial Intelligence (Ocsai)
How does Ocsai work?
• Built on Generative Pre-trained Transformer 3 (GPT-3) and the Text-to-Text Transfer
Transformer (T5) model.
• Fine-tuned and specifically trained to score the originality of responses based on tens
of thousands of responses and human ratings
Labs all around the world donated their data for this training
• Our team pays the company that runs GPT-3 a small amount for every response that
Ocsai scores
• Correlates very strongly (>.80) with human ratings of originality
• Can be used alone as a replacement for human raters, or as an additional rater
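If you want to check agreement between automated scores and your own human raters, a Pearson correlation is the usual starting point. The sketch below implements it by hand; the two rating lists are made-up numbers used only to show the call, not results from Ocsai or from any study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up ratings on a 5-point scale, for illustration only.
human = [1.0, 2.0, 2.5, 4.0, 4.5]
automated = [1.5, 1.5, 3.0, 3.5, 5.0]

r = pearson_r(human, automated)
print(f"r = {r:.2f}")
```

When automated scores are used as an additional rater rather than a replacement, the same function can be applied to each pair of raters to see how the automated scores compare with human-to-human agreement.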
Ocsai page
• Set up the same way as for semantic distance
scoring
• The example prompt is ‘PANTS’, but you can type
anything in
• As before, uploading a file is also available
• Options (e.g., Stoplisting) for processing the text are no
longer needed
• Elaboration scores are less relevant, but whitespace
word counts are available
• Originality scores are given on a 5-point scale
Ocsai models
• GPT-3 is available in multiple versions based on the size
and complexity of the model
• The models are named after famous thinkers (Ada,
Babbage, Curie, and Da Vinci)
• In alphabetical order, the models get bigger and more
expensive to use
• We make the A, B, and C models available for free
• If you want to use Da Vinci, let us know…and maybe chip
in ;-)
Ocsai in other languages
• Semantic distance models on the OCS are built for English
responses only
• But Ocsai can handle responses in other languages
• Aleksandra Zielinska and Maciej Karwowski have demonstrated
that it works for Polish
See example responses for a BRICK (in Polish: Cegła)
• Either use a simple translation tool to initially translate the
responses and then import them into Ocsai, or put them in
directly without translation
• Correlations over .80 with human raters, even with untranslated
responses
Ocsai via API
• API connections allow you to integrate the OCS into
whatever other program you use for analysis and data
collection (e.g., Qualtrics, R)
• Can allow for instant feedback to students about the
creativity of their ideas
• Thanks to Pier Luc de Chantal for supporting the API
• If you want to use the API go to:
https://openscoring.du.edu/docs
Or email us for a little explanation
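As a very rough sketch of what calling a scoring service from your own code might look like, the snippet below only builds a request URL and does not contact the server. The route `/score`, the parameter names `model`, `prompt`, and `response`, and the model name are all hypothetical placeholders invented for this illustration; the real routes and parameters are described at https://openscoring.du.edu/docs.

```python
from urllib.parse import urlencode

BASE_URL = "https://openscoring.du.edu"  # site address given on the slide
HYPOTHETICAL_PATH = "/score"             # placeholder, NOT the documented route

def build_scoring_url(prompt, response, model, path=HYPOTHETICAL_PATH):
    """Build a GET URL for one prompt/response pair.

    The query parameter names are placeholders; check the API docs
    for the names the service actually expects.
    """
    query = urlencode({"model": model, "prompt": prompt, "response": response})
    return f"{BASE_URL}{path}?{query}"

url = build_scoring_url("brick", "build a tiny bookshelf", model="example-model")
print(url)
```

Keeping URL construction in one small function makes it easy to swap in the documented route and parameters once you have read the API docs, without touching the rest of your analysis code.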
Ocsai: Always getting better
• As our research continues, we keep updating the OCS, and especially
Ocsai
• If you’re interested in this work, please consider using OCS for your own
work
• And also consider donating some data to augment our training dataset,
which will continue to improve Ocsai’s reliability and validity
• Currently Ocsai is being used with both children and adults, as well as for
additional types of creativity assessments
• Help us make it better and stay tuned for more features!
Thank you very much.
Questions?
