
Reimagining the Digital Monograph: Improving the Discovery and Use of Scholarly Ebooks



Monographs are increasingly making the print-to-digital shift that journals started twenty years ago, but many online platforms for monographs arguably do not take full advantage of the digital environment. In October 2016, JSTOR Labs, an experimental platform development group at JSTOR, convened a group of scholars, librarians, and publishers to unpack the design issues around the presentation of digital monographs. The group proposed a set of principles for reimagining the presentation of monographs in order to improve the user experience and increase the value of ebooks to scholars and students. This talk introduces these principles, which are also outlined in a white paper, and addresses discovery, evaluation, and interoperability challenges of the current scholarly ebook landscape. The presentation includes a demonstration of a new, open-source prototype that the JSTOR Labs group has designed: a topic-based navigational aid for monographs called "Topicgraph," and a deep dive into the topic modeling and natural language processing tools that power it. Last, the presentation included audience-participation voting on four potential follow-on projects. These slides show the results of that voting.

Published in: Education


  1. 1. REIMAGINING THE DIGITAL MONOGRAPH • DPLAfest 2017, April 21, 2017 • Ron Snyder (@rdsnyderjr), JSTOR Labs • Alex Humphreys (@abhumphreys), JSTOR Labs
  2. 2. ITHAKA is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. JSTOR is a not-for-profit digital library of academic journals, books, and primary sources. Ithaka S+R is a not-for-profit research and consulting service that helps academic, cultural, and publishing communities thrive in the digital environment. Portico is a not-for-profit preservation service for digital publications, including electronic journals, books, and historical collections. Artstor provides 2+ million high-quality images and digital asset management software to enhance scholarship and teaching.
  3. 3. JSTOR Labs works with partner publishers, libraries and labs to create tools for researchers, teachers and students that are immediately useful – and a little bit magical.
  4. 4. labs.jstor.org/monograph REIMAGINING THE MONOGRAPH
  5. 5. HOW WE DID IT
  6. 6. WORKING RAPIDLY Can we improve the experience and value of long-form scholarship? • Aug-Sep: User Research • Oct: Workshop • Nov: Build Prototype • Dec: Release Paper/Prototype
  7. 7. TALKING TO USERS
  8. 8. WORKING WITH THE COMMUNITY
  9. 9. ITERATING, ITERATING, ITERATING, ITERATING, ITERA
  10. 10. TOPICGRAPH
  11. 11. TOPICGRAPH (labs.jstor.org/topicgraph) • Understand at a glance the topics covered in a book. • Jump straight to pages about topics you’re researching.
  12. 12. WHY THESE TOPICS? AND, WHERE DID THEY COME FROM? • The topics used by Topicgraph are based on a controlled vocabulary containing more than 40,000 terms. These terms represent concepts (no entities, currently) found in the JSTOR corpus. • The controlled vocabulary (JSTOR Thesaurus) was constructed from 20 thesauri obtained from various sources, including ERIC, MeSH, and NASA • The thesaurus was developed in collaboration with Access Innovations (creator of the Data Harmony tools) and subject matter experts • Key branches in the thesaurus are reviewed and corrected by subject matter experts
  13. 13. JSTOR THESAURUS
  14. 14. WHY THESE TOPICS? AND, WHERE DID THEY COME FROM? • Human-curated tagging rules have been developed for each concept in the JSTOR Thesaurus, enabling concepts to be extracted from unstructured text • All documents in the JSTOR corpus have been tagged with thesaurus concepts using a rules-based indexer • This tagged corpus is then used to select training documents for building an LDA topic model • The LDA topic model enables us to identify latent topics found in text, in addition to those explicitly identified with the human-generated rules
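The tagging and training-document selection described on slide 14 can be sketched roughly as follows. This is a minimal illustration in Python, not the Data Harmony rule format or the JSTOR indexer: the `RULES` table, the phrase-matching approach, and the function names are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical rules: each thesaurus concept maps to phrases that signal it.
# Real curated rules are likely richer (context, exclusions, and so on).
RULES = {
    "Redistricting": ["redistricting", "redrawing district"],
    "Gerrymandering": ["gerrymander"],
}


def compile_rules(rules):
    """Turn each phrase into a case-insensitive pattern anchored at a word start."""
    return {
        concept: [re.compile(r"\b" + re.escape(p), re.IGNORECASE) for p in phrases]
        for concept, phrases in rules.items()
    }


def tag(text, compiled):
    """Return the set of concepts whose rules match the text."""
    return {
        concept
        for concept, patterns in compiled.items()
        if any(p.search(text) for p in patterns)
    }


def select_training_docs(corpus, compiled):
    """Group document ids by tagged concept, as candidate training sets."""
    by_concept = defaultdict(list)
    for doc_id, text in corpus.items():
        for concept in sorted(tag(text, compiled)):
            by_concept[concept].append(doc_id)
    return dict(by_concept)
```

Each concept's document list could then serve as labeled training input for the topic model; how the documents are actually sampled is not stated in the slides.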
  15. 15. THESAURUS TAGGER RULE BUILDER
  16. 16. TOPIC MODEL • Labeled LDA Topic model • Model trained using documents selected from JSTOR corpus with tagged thesaurus concepts • Using OSS Mallet tool • Current version of model includes approximately 11,000 topics • Each topic represents a distribution of word probabilities [Figure: Top words from some sample topics. Two word lists are shown: "Redistricting" (redistricting, district, congressional, minority, political, majority, house, legislative, racial, gerrymandering, and so on) and "Congressional districts" (district, congressional, congress, house, representative, member, federal, districting, seat, majority, and so on).]
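To make "a distribution of word probabilities" on slide 16 concrete, here is a small Python sketch. It does not use Mallet; the word counts are invented for illustration, and `normalize` and `top_words` are hypothetical helper names.

```python
def normalize(counts):
    """Convert raw word counts for one topic into probabilities that sum to 1."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def top_words(topic, n):
    """Return the n most probable words, breaking ties alphabetically."""
    ranked = sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


# Invented counts for a topic resembling the slide's "Redistricting" example.
redistricting = normalize({"redistricting": 5, "district": 3, "congressional": 2})
```

Calling `top_words(redistricting, 2)` lists the two highest-probability words, which is the kind of ranking the slide's word lists display.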
  17. 17. TEXT SEGMENTATION • Document text is recursively split into roughly equal-sized chunks until a maximum size threshold is reached • Chunks are aligned on chapter and page boundaries • Topic inference is then performed on all text chunks at all levels [Figure: sample text chunks from Partisan Gerrymandering and the Construction of American Democracy by Erik J. Engstrom, including a page headed "Electoral Competition and Critical Elections 121"]
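One way to picture the recursive segmentation on slide 17 is the sketch below. It assumes a binary split at the page boundary closest to the midpoint by word count; the slides do not say how many parts each split produces or how chapter boundaries are weighed, so `Chunk`, `split`, and the halving rule are illustrative assumptions only.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    first_page: int  # index of the first page in the chunk (inclusive)
    last_page: int   # index of the last page in the chunk (inclusive)
    words: int       # total words in the chunk
    children: list = field(default_factory=list)


def split(page_words, start, end, max_words):
    """Recursively split pages [start, end) until a chunk fits max_words.

    Cuts only fall on page boundaries; a single page is never split further.
    """
    total = sum(page_words[start:end])
    node = Chunk(start, end - 1, total)
    if total <= max_words or end - start <= 1:
        return node
    # Assumed rule: cut at the page boundary that best balances the halves.
    running, best_cut, best_diff = 0, start + 1, None
    for i in range(start, end - 1):
        running += page_words[i]
        diff = abs(total - 2 * running)
        if best_diff is None or diff < best_diff:
            best_cut, best_diff = i + 1, diff
    node.children = [
        split(page_words, start, best_cut, max_words),
        split(page_words, best_cut, end, max_words),
    ]
    return node


def all_chunks(node):
    """Yield every chunk at every level, parents before children."""
    yield node
    for child in node.children:
        yield from all_chunks(child)


def leaves(node):
    """Return only the smallest chunks, in page order."""
    return [c for c in all_chunks(node) if not c.children]
```

Because `all_chunks` walks every level of the tree, topic inference could be run on whole-book, intermediate, and leaf chunks alike, matching the slide's "all text chunks at all levels".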
  18. 18. TOPIC PROFILES • Using the topic inferencer, topic profiles are generated for each chunk identifying the top topics and weights • The top N topics from these leaf chunks are then used in the generation of the topic graph visualizations [Figure: the sample text chunks from Partisan Gerrymandering and the Construction of American Democracy, annotated with example topic profiles (topic - weight): Redistricting - 153, Gerrymandering - 113, Delegation of authority - 65, Congressional voting - 45, Democracy - 13; Gerrymandering - 66, Democracy - 47, Politicians - 1; Redistricting - 128, Gerrymandering - 38, Malapportionment - 28, Electoral districts - 4; Redistricting - 241, Gerrymandering - 6, Minority voters - 1; Redistricting - 131, Minority voters - 46, Congressional districts - 4]
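The profile step on slide 18 might look something like this in Python. The topic names and weights are borrowed from the slide's example, but their grouping into two chunks is assumed, and summing weights across leaf chunks to pick book-level topics is a guess at the aggregation, not the documented method.

```python
from collections import Counter


def top_topics(weights, n):
    """Return the n highest-weighted (topic, weight) pairs for one chunk."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def book_level_topics(leaf_profiles, n):
    """Pick n topics for the visualization by summing weights over leaf profiles."""
    totals = Counter()
    for profile in leaf_profiles:
        for topic, weight in profile:
            totals[topic] += weight
    return [topic for topic, _ in top_topics(totals, n)]


# Hypothetical inferencer output for two leaf chunks.
leaf_weights = [
    {"Redistricting": 153, "Gerrymandering": 113, "Democracy": 13},
    {"Gerrymandering": 66, "Democracy": 47, "Politicians": 1},
]
profiles = [top_topics(w, 2) for w in leaf_weights]
```

Passing `profiles` to `book_level_topics` yields the short list of topics that would each get a chart, with the per-chunk weights supplying the values along the length of the book.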
  19. 19. TOPIC VISUALIZATIONS • Various topic visualization approaches were considered for the topic profiles, including heat maps, tree maps, streamgraphs, and simple line and area charts • Based on user testing, we selected an approach that used multiple D3 area charts [Figure: An early D3 Streamgraph prototype]
  21. 21. WHAT’S NEXT? • Topicgraph webapp code is available for forking and customization: https://github.com/JSTOR-Labs/topicgraph • Elements of Topicgraph have been incorporated into the new-ish JSTOR Labs Text Analyzer tool • Directions and further work on the Topicgraph POC tool will in large part be determined by the community • JSTOR Labs Text Analyzer: App – https://www.jstor.org/analyze, Video – https://youtu.be/JTO859YCxDI
  22. 22. THE REIMAGINED MONOGRAPH
  23. 23. WHITE PAPER • Currently released as a draft for comment • Describes the project, process & prototype • Includes 12 principles to consider when reimagining the monograph • labs.jstor.org/monograph
  24. 24. TWELVE PRINCIPLES 1. The importance of great writing is a given. 2. The ideal digital monograph should allow different kinds of readers to navigate it in different ways. 3. Readers should be given better tools to assess the content of scholarly books quickly and efficiently. 4. Readers should be able to navigate more quickly to the portion of the book they are interested in. 5. Readers should be given better capabilities for situating a book within the larger scholarly conversation. 6. Readers should be able to ‘flip’ between sections of a digital monograph as easily as they can in a print book. 7. In an ideal world, readers would be able to work simultaneously with both a print and digital edition. 8. Books should be able to ‘travel’ easily from device to device. 9. Readers should be able to interact with and mark up digital books. 10. Readers should be able to interact with books in collaborative environments. 11. Ideally, digital book collections and aggregations would offer serendipitous discovery, the “library stacks” effect. 12. Digital scholarly book files should be open and flexible.
  28. 28. “The reimagined monograph – whatever that ultimately means – will not be built in a single step, or by a single organization.”
  29. 29. WHAT SHOULD THE NEXT STEP BE?
  30. 30. DEMOCRACY IN ACTION! Vote here: PollEv.com/jstorlabs
  31. 31. THE “BOOK AS GATEWAY”
  32. 32. THE “BOOK DASHBOARD”
  33. 33. THE “SCHOLARLY READER”
  34. 34. THE “CITATION MIXER”
  35. 35. Thank you • Alex Humphreys, @abhumphreys, alex.humphreys@ithaka.org • Ron Snyder, @rdsnyderjr, ronald.snyder@ithaka.org • http://labs.jstor.org
  36. 36. APPENDIX (OPEN IN CASE OF NO INTERNET CONNECTION)
