2019 CC Global Summit
Creative Commons Search team
Led by Kriti Godey, Director of Engineering
Jane Park, Director of Product & Research
State of CC Search
"First Image of a Black Hole" by European Southern Observatory is licensed under CC BY 2.0
Agenda
‣ Current state of CC Search and team (6 members!)
‣ Vision & Strategy
‣ How it all works
‣ What’s next + how you can get involved
‣ Demo
‣ Q&A and Discussion
Current State of CC Search
• 300 million images, 19 providers
• Images of: art objects, graphic art & designs, flowers, science, much
(but not all) of Flickr, initial set of CC0 3D designs
• “One-click” attribution tools: rich text, HTML embed
• Redesign: cleaner home page, improved navigation & filters
• User feedback and usage analytics
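A minimal sketch of what a "one-click" attribution tool could generate, following the credit format used on the title slide ("Title" by Creator is licensed under License). The function names and the example URLs are hypothetical, for illustration only; this is not the CC Search implementation.

```python
from html import escape


def attribution_text(title, creator, license_name):
    """Plain-text credit line in the form used on the slides."""
    return f'"{title}" by {creator} is licensed under {license_name}'


def attribution_html(title, creator, license_name, work_url, license_url):
    """HTML embed: the title links to the work, the license name to its deed.

    All values are escaped so that a title containing & or < cannot
    break the surrounding page.
    """
    return (
        f'<a href="{escape(work_url)}">"{escape(title)}"</a> '
        f"by {escape(creator)} is licensed under "
        f'<a href="{escape(license_url)}">{escape(license_name)}</a>'
    )


print(attribution_text(
    "First Image of a Black Hole",
    "European Southern Observatory",
    "CC BY 2.0",
))
```

The HTML variant takes the same fields plus two URLs, so both outputs can be produced from a single catalog record.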
CC Search Vision in 2016
“to build a ‘front door’ to the Commons with the ultimate goal to find and
index all 1.1 billion CC licensed works on the web”
CC Search Vision in 2019
“CC Search is a leading tool for creators looking to discover and reuse free
resources with greater ease and confidence”
KRITI GODEY
Director of Engineering
CC Search Team
SOPHINE CLACHAR
Data Engineer
CC Search Team
ALDEN PAGE
Software Engineer
CC Search Team
BRENO FERREIRA
Front End Engineer
CC Search Team
SARAH PEARSON
Product Counsel
CC Search Team
JANE PARK
Director of Product & Research
CC Search Team
CC Search Vision in 2019
“CC Search is a leading tool for creators looking to discover and reuse free
resources with greater ease and confidence”
2019 Vision & Strategy
[Diagram: Product, Strategy (hypotheses) and Vision (true north) as layers; the "Change" labels read "optimization" and "pivot or persevere"]
• Product: CC Search, CC API, CC Catalog
• Strategy (hypotheses)
‣ Value hypothesis: Users are motivated by ease of reuse of free resources to come to CC Search
‣ Growth hypothesis: Viral engine of growth through attribution displayed prominently in all reuses
• Vision (true north): CC Search is a leading tool for creators looking to discover and reuse free
resources with greater ease and confidence
How we expect reuse to happen
[Diagram labels: Catalog; Default CC Search Front End; Curation on CC Search Front End; CC Search integrated in other sites/software via API; Download; Reuse somewhere out there; Reuse on the discoverable web (open & non-open); Attribution Story; Journey; Impact]
"click" by katie cowden is licensed under CC BY-NC-ND 2.0 
Current Product Stack
CC Catalog
CC Catalog API
CC Search Front End
CC Catalog
• CC Catalog is a growing collection of 300 million Creative Commons
works from ~20 different sources, collected by CC Data Engineer
Sophine Clachar
• That’s 288 million more works and 14 more data sources than we had
last year
How do we find CC content?
Common Crawl
‣ Useful for discovery: finding websites with lots of CC-licensed works
‣ We can use Common Crawl to catalog works without even visiting the
source website.
Bespoke API integrations
‣ We write scripts to search for Creative Commons content on your website
and scrape it in bulk. Examples: Flickr, Thingiverse, Cleveland Museum of Art
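Both discovery routes need to recognize a Creative Commons license when they see one. A minimal sketch of that step, assuming the license is identified by its canonical creativecommons.org URL; the function name `parse_cc_license` is hypothetical and this is not necessarily how CC Catalog does it.

```python
import re
from urllib.parse import urlparse

# Canonical paths look like /licenses/by-nc-nd/2.0/ or /publicdomain/zero/1.0/
_CC_PATH = re.compile(r"^/(?:licenses|publicdomain)/([a-z\-]+)/(\d+\.\d+)")


def parse_cc_license(url):
    """Return (license_code, version) for a CC license URL, else None."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host != "creativecommons.org":
        return None
    match = _CC_PATH.match(parts.path.lower())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_cc_license("https://creativecommons.org/licenses/by-nc-nd/2.0/"))
# → ('by-nc-nd', '2.0')
```

Counting how many pages on a domain link to such URLs is one plausible way to spot websites with lots of CC-licensed works.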
The near future for CC Catalog
• Wikimedia Commons, Europeana, NYPL and Cleveland Library coming
soon
• Open Textbooks by the end of the year
CC Catalog API
• A simple open-source interface for searching hundreds of millions of
Creative Commons works in a few hundred milliseconds
• Responsible for filtering and ranking works. Which license has the user
selected? Which images have 404’d due to link rot? What’s the best result
set for a given query?
• A framework for indexing new Creative Commons content as we acquire it
in CC Catalog
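The filtering and ranking responsibilities above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the `Work` record, its field names and the precomputed `relevance` score are hypothetical, and the real API delegates this work to a search engine.

```python
from dataclasses import dataclass


@dataclass
class Work:
    title: str
    license: str      # license code, e.g. "by", "by-nc", "cc0"
    link_ok: bool     # False once the source URL has 404'd
    relevance: float  # query relevance, assumed to be computed elsewhere


def filter_and_rank(works, allowed_licenses):
    """Keep live works under the selected licenses, best match first."""
    kept = [w for w in works if w.license in allowed_licenses and w.link_ok]
    return sorted(kept, key=lambda w: w.relevance, reverse=True)


works = [
    Work("Nebula", "by", True, 0.4),
    Work("Galaxy", "by-nc", True, 0.9),      # dropped: license not selected
    Work("Black hole", "by", True, 0.8),
    Work("Lost photo", "by", False, 0.95),   # dropped: link rot
]
print([w.title for w in filter_and_rank(works, {"by", "cc0"})])
# → ['Black hole', 'Nebula']
```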
Current status
• The API is open for developers to experiment with: visit
https://api.creativecommons.engineering
• Terms of use and rate limits
• If you find it useful, please contact us and tell us about your use case
Technical stuff
• Deployment and infrastructure automation
• Back-end server language
• Infrastructure host
• Search engine
• Database
Near term search improvements
• Using popularity-based ranking that favors the works that have been
linked to the most
• Using AI to automatically tag images
• Collecting more metadata about images
• Smarter querying using boosting
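A rough sketch of how field boosting and a popularity signal might combine into one score. The boost weights, the `inbound_links` field and the logarithmic damping are all assumptions chosen for illustration, not the ranking the team shipped.

```python
import math

# Hypothetical boosts: a match in the title counts more than one in the tags.
FIELD_BOOSTS = {"title": 3.0, "tags": 2.0, "creator": 1.0}


def boosted_score(query, work):
    """Boosted term matches, scaled by a damped popularity factor."""
    terms = query.lower().split()
    text_score = 0.0
    for field, boost in FIELD_BOOSTS.items():
        words = work.get(field, "").lower().split()
        text_score += boost * sum(words.count(term) for term in terms)
    # log1p keeps heavily linked works from swamping textual relevance.
    popularity = 1.0 + math.log1p(work.get("inbound_links", 0))
    return text_score * popularity


unlinked = {"title": "black hole", "tags": "space", "inbound_links": 0}
linked = {"title": "black hole", "tags": "space", "inbound_links": 100}
print(boosted_score("hole", unlinked))  # → 3.0
print(boosted_score("hole", linked) > boosted_score("hole", unlinked))  # → True
```

Multiplying rather than adding means a work with no textual match scores zero however popular it is.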
The future of the API — content discovery
• The “Push API” - instead of us manually finding CC works, trusted institutions can send
them to us!
• Wilder, pie-in-the-sky idea: we crawl your works the instant you publish them on the
internet
‣ The user embeds a CC button image next to their work
‣ We track the image to find out where the work is located, dispatch a crawler to pick
up machine-readable attribution information (ccREL), and add it to the catalog
‣ Glaring issues: misattribution, vandalism, privacy, moderation, lack of tools for
generating ccREL
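The crawler step in the pie-in-the-sky idea could start from something like the sketch below. It covers only the simplest piece of ccREL, the `rel="license"` link, and uses the standard-library HTML parser; the class and function names are hypothetical, and a real crawler would also need the RDFa attribution properties plus the moderation safeguards listed above.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collects the href of every <a> or <link> marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel can hold several space-separated tokens, or be missing.
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if tag in ("a", "link") and "license" in rels and href:
            self.license_urls.append(href)


def find_license_urls(html):
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.license_urls


page = (
    '<p>Photo by Ana. '
    '<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">'
    'CC BY 4.0</a></p>'
)
print(find_license_urls(page))
# → ['https://creativecommons.org/licenses/by/4.0/']
```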
CC Search Front End
A web application designed to facilitate discovery and reuse of
CC-licensed works, providing users with a friendly UI driven by the CC
Catalog API.
How does it work?
[Diagram: Vue.js app; CC Catalog API; CC Search Front End]
Legal considerations
• Copyright
• images hosted elsewhere
• embedding as © use?
• Attribution and licensing information
• wrong info
• insufficient info
• Future decisions: updating & verifying catalog, dealing with takedowns
Validated Learning
Fancy way of saying that everything we build is for our users,
that we will learn about how and why they use CC Search,
and that we incorporate that learning back into the product.
Usage data
• Graylog dashboard for CC Catalog API usage
• Google Analytics
‣ Attribution button clicks
‣ GA referral links
‣ HTML embed
User feedback & usability testing
• User feedback survey
• Image reuse survey
• GitHub repos
• #cc-usability channel on CC Slack, #feedbackfridays
• support-search@creativecommons.org
• Monthly usability tests
User research to come
• User research centered on “Creators making new works using existing free
content”
‣ Creators making designs, imagery and art works (commercial or independent)
‣ Creators illustrating a text or text-based resource (blog, journalistic articles,
educational/academic texts or presentations)
‣ Creators making a video
• User research for adding open texts, e.g. “Educators seeking access to free
textbooks in one place”
CC Search — Agile Development Process
• 2019 Roadmap broken down by quarter
• Each quarter broken down into two-week sprints
• Twice-weekly “stand-ups”
• Releases of CC Search and the API every Thursday
• Community feedback on Fridays (#feedbackfriday)
• Everything tracked openly on GitHub
How you can get involved
CC Search Development Community
• Two ways to join the developer community
‣ Help us improve the CC Catalog, API, and CC Search
‣ Integrate the CC Catalog API into your own projects (experimental)
• We’ve had a lot of amazing community contributions already
Old CC Search layout
New CC Search layout
CC Search Development Community
• Visit the CC Open Source website: https://creativecommons.github.io
‣ List of CC projects
‣ Contribution guidelines, including how to find things to contribute to
‣ CC Technical blog
‣ Project ideas for new CC-related products
Development Community: Google Summer of Code
• CC will have five students working with us full-time this summer via
Google Summer of Code, on projects including:
‣ CC Catalog data visualization
‣ CC Search browser extension
‣ CC WordPress plugin updates, including CC Catalog API integration
• Follow along on #cc-developers and #cc-gsoc on Slack
Development Community: Open Source Session
Come to our Open Source at Creative Commons session
tomorrow at 1:30 PM in the New Delhi room!
Join a monthly usability test
• Email jane@creativecommons.org if you’re interested, and I’ll add you
to the queue
• Join the #cc-usability channel on CC Slack and provide feedback on
#feedbackfridays
2019 ROADMAP (subject to change)
Timeline: Start of Q1 (January), End of Q1 (March), Global Summit, End of Q2 (June), End of Q3 (September), End of Q4 (December)
✓ Catalog 325 million works
✓ Ship product vision & strategy
✓ Ship CC API strategy
✓ Ship developer documentation
✓ Ship “Attribution in Frame” MVP
✓ Determine metrics & set up for validated learning
✓ Complete QA sprint to ship CC Search 1.0
✓ Ship CC Search 1.0 as default
✓ Soft launch CC API
✓ Make CC Search accessible
✓ User research for open texts
✓ Run usability tests
✓ GSoC (usability)
✓ Integrate open texts
✓ User research for open audio
✓ ID API partners
✓ Run usability tests
✓ Branding for CC Search
✓ GSoC (usability)
✓ Integrate open audio
✓ Prototype API partner integration
✓ Run usability tests
✓ GSoC prototype integration
Q&A
Discussion
1. Why do you use CC Search? (e.g. to find images for a blog post)
2. Is CC Search useful for this purpose?
3. How can it be better? e.g. How can we make the tool more relevant to
your region?
4. What collections would you like to see in CC Search? Can you help
connect us to those people?
