LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall 2016

Poster slides from Pratt Institute's Knowledge Organization final presentations, Fall 2016.

Published in: Data & Analytics

  1. 1. Archiving Social Media: Twitter
     Stefanie Hew, Shazia Naderi, Kristiana Wesloh
     Fall 2016 | LIS 653-01 | Professor Cristina Pattuelli

     Library of Congress + Twitter
     • In April 2010, the Library signed an agreement with Twitter providing the Library with all public tweets from the company's inception through the date of the agreement (2006-10)
     • The Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis on the same terms

     Value of Archiving Twitter
     • Primary method of communication and creative expression
     • Supplementing and supplanting traditional print media
     • Provides future researchers access to a fuller picture of today's cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes

     Figure 1.
     Figure 2. Examples of influential hashtags on Twitter.

     How to Access Twitter Data
     • Twitter provides a streaming API that gives raw access to Twitter's global stream of data
       » Creates a "back door" into the current activity on Twitter
       » Only goes back about a week and does not allow for searches of historical tweets
     • Different volumes of streaming data
       » From 1% of tweets to the Firehose, which opens access to 100% of tweets
       » Firehose requires special permission to access
     • Twitter does not reveal how the 1% sample is selected

     Challenges of Using Twitter Data
     Twitter Restrictions (Tweet IDs only)
     • Twitter Developer Policy only allows you to distribute or allow download of Tweet IDs and/or User IDs
     • You may provide export via non-automated means of up to 50,000 public Tweets and/or User Objects per user of your Service, per day
     • Can "hydrate" Tweet IDs from previous datasets
     Skills Needed
     • Understanding of technology
     Limitations
     • Social media collections within web archives tend to be event-driven and limited to selected platforms, pages or user accounts
     • Social media platforms protect the algorithms used to generate the allowed sample size
     • Without knowing the algorithm used, researchers cannot verify that the sample does not contain any misrepresentation
     • Only a certain amount of data can be requested, to prevent excessive data access
     Context
     • Individual tweets are limited in length and contain very little information
     • This complicates the intelligibility of the content at a later time
     Storage
     • Sufficient storage space is needed
     • Can't store on third-party cloud services due to Twitter restrictions
     No established standards and best practices

     Library of Congress Update 2013
     First Objective (was to be completed in 2013)
     • Acquire and preserve the 2006-10 archive
     • Establish a secure and sustainable process for receiving and preserving the daily, ongoing stream of tweets
     • Create a structure for organizing the entire archive by date
     Second Objective
     • Confronting and working around the technology challenges to make the archive accessible to researchers in a useful way
     Progress
     • Archive is at approximately 170 billion tweets!
     • Has not yet provided researchers access to the archive
     • A single search of just the fixed 2006-2010 archive on the Library's systems could take 24 hours
       » Limits the number of possible searches
       » Requires an extensive infrastructure of hundreds if not thousands of servers
     • Working to develop a basic level of access that can be implemented while archival access technologies catch up

     Software to Use Twitter API
     • TwapperKeeper (before 2011)
     • Twap
     • Social Feed Manager
     • Twarc
     • TAGS: Twitter Archiving Google Sheet
     • Twitter Capture and Analysis Toolset
     • Netlytic

     References
     Developer Agreement and Policy. (2016). Retrieved October 22, 2016, from Twitter: https://dev.twitter.com/overview/terms/agreement-and-policy
     Felt, M. (2016, January-June). Social Media and the Social Sciences: How Researchers Employ Big Data Analytics. Big Data & Society: 1-15. DOI: 10.1177/2053951716645828
     Firehose. (2016). Retrieved October 22, 2016, from Twitter: https://dev.twitter.com/streaming/firehose
     Library of Congress. (2013, Jan.). Update on the Twitter Archive at the Library of Congress. Retrieved from https://www.loc.gov/today/pr/2013/files/twitter_report_2013jan.pdf
     Risse, T., Peters, W., Senellart, P., and Maynard, D. (2014). Documenting Contemporary Society by Preserving Relevant Information from Twitter. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.337.9643
     Thomson, S. D. (2016). Preserving social media. Digital Preservation Coalition Technology Watch Report 16-01 February 2016. Retrieved from http://dx.doi.org/10.7207/twr16-01

     Images
     Figure 1: https://about.twitter.com/en-gb/company/brand-assets; https://upload.wikimedia.org/wikipedia/commons/4/41/US-LibraryOfCongress-Logo.svg
     Figure 2: https://twitter.com
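The Tweet ID restrictions on the Twitter poster imply two routine calculations for anyone sharing or rebuilding a dataset: splitting a list of Tweet IDs into batches before "hydrating" them, and estimating how many days a manual export would take under the 50,000-per-day limit. The following is a minimal sketch of both. The 50,000 figure comes from the poster; the batch size of 100 is an assumption that should be checked against the current API documentation; the function names and sample IDs are hypothetical; and no request is actually sent to Twitter.

```python
from math import ceil
from typing import Iterator, List

# Per-user, per-day export limit as quoted on the poster.
DAILY_EXPORT_LIMIT = 50_000
# Assumption: number of IDs sent per lookup request. Verify against current API docs.
HYDRATION_BATCH_SIZE = 100


def batch_ids(tweet_ids: List[str], size: int = HYDRATION_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of tweet_ids, each holding at most `size` IDs."""
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]


def days_to_export(total_tweets: int, daily_limit: int = DAILY_EXPORT_LIMIT) -> int:
    """Return the number of whole days needed to export total_tweets under daily_limit."""
    return ceil(total_tweets / daily_limit)


if __name__ == "__main__":
    sample_ids = [str(1000 + i) for i in range(250)]  # made-up IDs for illustration
    print([len(batch) for batch in batch_ids(sample_ids)])
    print(days_to_export(1_200_000))
```

In practice a tool such as Twarc, listed on the poster, handles the hydration requests themselves; the sketch only shows the bookkeeping around them.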
  2. 2. TANGIBLE, INTANGIBLE, DIGITAL, EPHEMERAL: Towards a Unified Heritage Classification Scheme
     BY MARC CASTELLANI, RACHEL EGAN, CORMAC FITZGERALD, AND DANA LACHENMAYER | PRATT INSTITUTE, LIS 653-02 FALL 2016

     Fig. 1 Overlapping Fields of Cultural Heritage

     Cultural Heritage Metadata Structures
     Metadata standards often start as schemas developed by a specialized community in order to enable more accurate item description. Is there a structure that unites all heritage fields?

     1. United Nations Educational, Scientific, and Cultural Organization. Convention for the Safeguarding of the Intangible Cultural Heritage (29 September-17 October 2003). Retrieved from http://www.unesco.org/culture/ich/en/convention
     2. Zeng, M., & Qin, J. (2016). Metadata (2nd ed.). Chicago: ALA Neal-Schuman, p. 42.
     3. Baca, M., Harpring, P., Ward, J., & Beecroft, A. (Eds.). (2014). Metadata standards crosswalk. The Getty Research Institute. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intrometadata/crosswalks.html
  3. 3. Knowledge Organization Practices of Indigenous People
     Anna Holbert & Leslie To
     LIS 653 Knowledge Organization | Fall 2016 | Dr. Cristina Pattuelli | Pratt Institute School of Information

     Focus of Research
     Although the Library of Congress Classification system is an incredible resource for organizing information, it lacks specificity and is inherently biased against non-western cultures and information. The Brian Deer Classification and Maori Subject Headings are two examples of indigenous librarian innovation and of the need for flexibility and openness in knowledge organization.

     Maori Subject Headings
     In 1998, the Maori Subject Headings Working Party formed and went on to develop subject headings in the Maori language. The first group of headings was published in 2005. MSH utilizes controlled vocabularies within the Library of Congress structure. These unique implementations focus on relationships, which are central constructs of Maori culture, rather than on rigid hierarchies. By introducing a bilingual thesaurus, MSH provides narrower and more specific search results, improving and increasing access for users and researchers. There are more than 500 subject headings in use today.

     Brian Deer Classification
     Created in 1974 by Brian Deer, it reflected a First Nations epistemological framework and appropriate language. Rather than working within an existing framework, the scheme was developed from scratch. BDCS provided a foundation for institutions to create tailored classification schemes. Libraries could have a First Nations/Inuit/Métis focus without everything being classified under one subject call number. As a member of the First Nations, Deer better understood the subtleties and worldview, making the resulting classification more accessible.

     Main References:
     Gilman, I. (2006). From marginalization to accessibility: Classification of indigenous materials. Faculty Scholarship (PUL), 6.
  4. 4. Crowdsourcing in Libraries, Archives, and Museums
     Meg Edison, Karalyn Mark, Katrina Rink, Clair Rock
     LIS 653-01 + Knowledge Organization + Professor Cristina Pattuelli + Fall 2016

     What is crowdsourcing?
     + Outsourcing work to a crowd
     + Often involves "microtasks," or tasks not easily accomplished by a computer
     + Began as a money-making business tool, but quickly expanded to volunteer work

     Current Projects

     Smithsonian Transcription Center
     + An ongoing crowdsourcing project launched in June 2013 by the Smithsonian Institution
     + Invites anyone with internet access to contribute transcriptions of a variety of documents provided by 14 of the Smithsonian's libraries, archives, and museums
     + Contributions enable the materials to be text-searchable

     NYPL Labs
     + Began to initiate crowdsourcing projects in 2011
     + "What's on the Menu" enlists volunteers in the transcription of historical menus
     + "Map Rectifier" prompts amateur cartographers to digitally align ("rectify") historical maps from the NYPL's collections to match today's precise maps
     + Contributions culminated in the open access release of NYPL's entire public domain Digital Collection in 2016

     Selected References
     + Brabham, D. (2013). Concepts, Theories, and Cases of Crowdsourcing. In Crowdsourcing (pp. 1-40). MIT Press. Retrieved from http://www.jstor.org/stable/j.ctt5hhk3m.7
     + Ellis, Sally (2014). A History of Collaboration, a Future in Crowdsourcing: Positive Impacts of Cooperation on British Librarianship. International Journal of Libraries & Information Services. 1-10. Retrieved from http://www.crowdconsortium.org/wp-content/uploads/A-History-of-Collaboration-a-Future-in-Crowdsourcing-Positive-Impacts-of-Cooperation-on-British-Librarianship.pdf
     + Oomen, Johan and Lora Aroyo (2011). Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges. Proceedings of the 5th International Conference on Communities and Technologies. 138-149. Retrieved from http://dl.acm.org/citation.cfm?id=2103373

     Scribe
     + API-based software built entirely around crowdsourcing, used to gather information from large databases of handwritten documents
     + Uses 3 simple steps to gather information:
       + Mark
       + Transcribe
       + Verify
     + Created and sponsored by NYPL Labs and Zooniverse
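The Mark, Transcribe, Verify steps described for Scribe can be illustrated with a small data model in which a marked region collects volunteer transcriptions and is considered verified once enough volunteers agree. This is only a sketch of the general idea under stated assumptions: it is not Scribe's actual code or API, the class and method names are hypothetical, and the two-vote agreement threshold is an arbitrary choice made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MarkedRegion:
    """Hypothetical model of a page region that a volunteer has marked."""
    region_id: str
    transcriptions: List[str] = field(default_factory=list)

    def transcribe(self, text: str) -> None:
        """Record one volunteer's transcription, collapsing runs of whitespace."""
        self.transcriptions.append(" ".join(text.split()))

    def verify(self, min_votes: int = 2) -> Optional[str]:
        """Return the most common transcription if at least min_votes volunteers agree."""
        if not self.transcriptions:
            return None
        text, votes = Counter(self.transcriptions).most_common(1)[0]
        return text if votes >= min_votes else None


if __name__ == "__main__":
    region = MarkedRegion("menu-page-1-line-3")  # Mark: made-up region identifier
    for entry in ["Roast  beef", "Roast beef", "Roast beet"]:  # Transcribe
        region.transcribe(entry)
    print(region.verify())  # Verify
```

A simple majority count like this treats transcriptions that differ in case or punctuation as disagreements; a real project would need to decide how much normalization is acceptable for its materials.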
