Creating a New Way to Search - CNI Fall 2017

Earlier this year, JSTOR Labs, an experimental product development group at JSTOR, released Text Analyzer, a new way to search in which users upload their own document to find similar articles on the same topics. Scholars can upload near-finished manuscripts as a way to complete a literature review, and students can enter a few pages of a work-in-progress paper to find the scholarship they'll need to finish it. Text Analyzer uses natural language processing to figure out what the uploaded document is "about" and then recommends articles and chapters in JSTOR about the same topics. Since its release, the JSTOR Labs team has worked with Columbia University Libraries to encourage the tool's usage and to explore possible applications of the tool. In this session, we will demonstrate the tool and the technology that powers it, share reactions of students and scholars who have used it, and reflect upon the challenges in driving adoption of a new kind of search when users are accustomed to a single manner of interaction. We will also propose applications for this technology beyond the JSTOR corpus. These possibilities include augmenting other current library systems, for example by using a common infrastructure to create a discovery layer and to aggregate institutional repositories.
www.jstor.org/analyze
http://labs.jstor.org

Published in: Education
  1. 1. CNI Fall 2017 Creating a New Way to Search @abhumphreys Alex Humphreys, JSTOR Labs @wilderbach Barbara Rockenbach, Columbia Libraries
  2. 2. ITHAKA is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. JSTOR is a not-for-profit digital library of academic journals, books, and primary sources. Ithaka S+R is a not-for-profit research and consulting service that helps academic, cultural, and publishing communities thrive in the digital environment. Portico is a not-for-profit preservation service for digital publications, including electronic journals, books, and historical collections. Artstor provides 2+ million high-quality images and digital asset management software to enhance scholarship and teaching.
  3. 3. JSTOR Labs works with partner publishers, libraries and labs to create tools for researchers, teachers and students that are immediately useful – and a little bit magical.
  4. 4. WHAT’S A TEXT ANALYZER?
  5. 5. LET’S JUST START WITH A DEMO www.jstor.org/analyze
  6. 6. WHAT’S IT GOOD FOR?
  7. 7. SCHOLARS DOING LITERATURE REVIEWS
  8. 8. FINDING KEYWORDS IN UNFAMILIAR FIELDS https://publish.illinois.edu/commonsknowledge/2017/04/04/spotlight-jstor-labs-text-analyzer/
  9. 9. ESL RESEARCHERS FINDING KEYWORDS
  10. 10. HOW’D YOU DO THAT?
  11. 11. THREE STEPS FOR EACH SEARCH
      1. Extract text • From many textual formats (PDF, Word, HTML, etc.) • OCR, if needed (e.g. a picture of a page in a magazine)
      2. Identify terms • Topics: JSTOR Thesaurus & an LDA Topic Model • Entities: Alchemy (Watson), OpenCalais, Stanford, Apache • TF-IDF to select 5 terms
      3. Generate results • “OR” search • Relevance ranked based on “equalizer”
  12. 12. TOPIC MODEL • Labeled LDA topic model • Model trained using documents selected from Wikipedia and JSTOR • Using OSS MALLET tool • Current version of model includes approximately 11,000 topics • Each topic represents a distribution of word probabilities
      Top words from some sample topics:
      Redistricting: redistricting district congressional minority political majority house legislative racial gerrymandering court republican plan electoral districting seat representative black voter democrat partisan election democratic representation line supreme legislature drawn control population voting drawing policy texas draw map claim boundary following commission outcome shaw race census legal principle creation decision create finding elect lublin polarization optimal elected composition affect member measure vote gain previous legislator geographic southern section every approach controlled round note gerrymander reapportionment compactness decennial bipartisan constitutional find substantive california roll competitive county competition party requirement federal north post redrawn incumbent criterion consequence likely formal safe delegation georgia justice influence shotts equal favor might scholar equality south power law judicial bias king carolina call according voss baker panel professor rule mandate creating increased determine constraint politics argue standard redis grofman reno cain redrawing margin share ing tricting decrease congress geographical requires simple held critic empirical david niemi perverse latino analyze examine debate rather impact next provides give balance affected subsequent possible take practice community robbins constitution computer evenly fraction constituent illinois supporter shape responsiveness typically various proposed despite either focus conclusion african opportunity redistrict mcdonald white numerous test statewide percent suggests thus choice largely develop decade conclude fact four reached
      Congressional districts: district congressional congress house representative member federal districting seat majority plan representation population congressman apportionment elected court president washington columbia legislative census party interest political gerrymandering redistricting home thomas affect every black democrat dis foley carolina find reapportionment constituency supreme constitution voting geographic active dinner responsiveness south force john gingrich legislature equal membership neighborhood testimony north james service decennial constituent passed boundary law creation firm charles spending congruent election politically addition april contact proportion con assistant position following york land unconstitutional resident miller voter pledge stephen city official minority respective mainland kentucky post clause better divisor perimeter yao secretary republican senate moderate congruence map county grant senior drawing portion speaker feature decision professor became gerrymander swain trict leapfrog federalist partisan senator vote captain compelling lucas candidate race create harm require fourth shape you traditional purpose shaped concern people shaw historical simply policy henry david allocation vetoed arkansas smiley serra carl volunteer politician budget burden electoral leaf education reduced principle proximity november significant just represented second gathered fiorina representa gressional glazer apportion gerrymandered boris bronx issn rank redrawing twice refused eliminates provincial jefferson returned witness campaign fletcher georgia empirically personnel size maximize half reserve read demographic percent contrary required determining throughout …
  13. 13. WHAT’S NEXT? • Ongoing improvements to algorithm • API releasing this week to beta partners • Article recommendations…
  14. 14. A/B tests on JSTOR for article recommendations
  15. 15. WHAT ARE WE STILL LEARNING?
  16. 16. Is this a feature, a product or a service?* * See: https://scholarlykitchen.sspnet.org/2015/01/27/when-is-a-feature-a-product-and-a-product-a-business/
  17. 17. Embedded widget showing related/recommended content in Columbia IR or JSTOR. (Prototype only)
  18. 18. What does it take to change researcher behavior?
  19. 19. Thank you Alex Humphreys Director, JSTOR Labs ITHAKA labs.jstor.org @abhumphreys alex.humphreys@ithaka.org
  20. 20. APPENDIX (OPEN IN CASE OF NO INTERNET CONNECTION)
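
Slide 11 says TF-IDF is used to select 5 terms, which are then combined in an "OR" search. The following is a minimal sketch of that idea only, not the JSTOR Labs implementation: the function names `tokenize`, `top_terms` and `or_query`, the whitespace tokenizer, the smoothed IDF formula and the quoted query syntax are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only (assumed tokenizer)."""
    return [w for w in text.lower().split() if w.isalpha()]


def top_terms(document, corpus, k=5):
    """Return the k terms of `document` with the highest TF-IDF score against `corpus`."""
    doc_tokens = tokenize(document)
    if not doc_tokens:
        return []
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    corpus_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for s in corpus_sets if term in s)
        # Smoothed IDF (an assumed variant) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def or_query(terms):
    """Join terms into an "OR" query string (hypothetical query syntax)."""
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    corpus = [
        "the court ruled on the case",
        "the election was held in the state",
        "the map of the state",
    ]
    document = "gerrymandering gerrymandering redistricting census districts court map the"
    print(or_query(top_terms(document, corpus)))
```

Terms that are frequent in the uploaded document but rare in the background corpus rise to the top, which is why a common word such as "the" drops out of the five selected terms in this toy example.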
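
Slide 12 notes that each topic represents a distribution of word probabilities. The sketch below illustrates what that means in practice by scoring a text against two toy topics. It is not Labeled LDA inference and does not use MALLET: the probabilities in `TOPICS` are invented, the topic vocabularies are truncated to five words borrowed from the sample lists on the slide, and `score_topics`, `best_topic` and the `floor` value are hypothetical names and choices.

```python
import math
from collections import Counter

# Toy topics: the words come from the slide's sample lists,
# but the probabilities are invented for illustration.
TOPICS = {
    "Redistricting": {
        "redistricting": 0.30, "district": 0.25, "gerrymandering": 0.20,
        "court": 0.15, "map": 0.10,
    },
    "Congressional districts": {
        "district": 0.30, "congressional": 0.25, "congress": 0.20,
        "house": 0.15, "representative": 0.10,
    },
}


def score_topics(text, topics, floor=1e-6):
    """Average log-probability of the text's words under each topic.

    Words missing from a topic get the small `floor` probability (an assumption)
    so that the logarithm is always defined.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no words")
    scores = {}
    for label, dist in topics.items():
        loglik = sum(n * math.log(dist.get(w, floor)) for w, n in counts.items())
        scores[label] = loglik / total
    return scores


def best_topic(text, topics):
    """Return the label of the topic under which the text is most probable."""
    scores = score_topics(text, topics)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(best_topic("gerrymandering court map district", TOPICS))
```

A production model would infer a mixture of topics per document rather than a single best label; the single-label scoring here is only meant to show how word probabilities make one topic a better fit than another.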
