This document provides an overview of ChemXSeer, a digital library and search engine for chemistry research. ChemXSeer was built with Penn State's SeerSuite toolkit, which is designed for building specialized search engines and digital libraries. It indexes the full text of research papers and extracts additional metadata such as author information, tables, figures, chemical names, and formulas. Users can run targeted searches over these fields, for example for specific authors, tables containing experimental data, figures with plotted results, or chemical structures. The system aims to improve on general search engines by preserving additional scholarly context during indexing and search. Future work areas include integrating 3D chemical graph searching and semantic chemical data from sources such as spectra and experimental methods.
20140113 q uchemxseerseminar
1. ChemXSeer:
Digital library tools, features,
and crawling characteristics
Edward A. Fox
Professor, Computer Science, Virginia Tech
Blacksburg, VA 24061 USA
fox@vt.edu http://fox.cs.vt.edu
and
Sagnik Ray Choudhury
Ph.D. Student, College of Information Science and
Technology, Penn State, USA
szr163@ist.psu.edu
13 Jan. 2014 -- QU Library, Doha, Qatar
3. Sponsored by Qatar University Library
HTTP://qnl.qa
HTTP://WWW.QU.EDU.QA/
Funding provided through the ELISQ project:
Electronic Library Institute - SeerQ
HTTP://WWW.VT.EDU/
HTTP://WWW.PSU.EDU/
HTTP://WWW.TAMU.EDU/
4. Acknowledgments
• Dr. Mazen Hasna, VP and Chief Academic Officer,
Qatar University
• Dr. Rashid Alammari, Dean, College of Engineering,
Qatar University
• Dr. Moumen Hasnah, Director of Academic Research,
Qatar University
• Dr. Imad Bachir, Qatar University Library Director
• Prof. Sebti Foufou, Head of Department of Computer
Science and Engineering, Qatar University
• Prof. Ramazan Kahraman, Head of the Department of
Chemical Engineering, Qatar University
5. Additional Thanks
QScience – providing collection:
Christopher J. Leonard, Editorial Director
Paul Coyne, CTO
US National Science Foundation
(recent and current grants to Fox):
• IIS-1319578
• IIS-0916733
• DUE-0840719
• OCI-1032677
• plus those to PSU, TAMU
7. Introduction
• Digital libraries have emerged since 1991.
• Now each major publisher has its own
digital library; many others exist too.
• Related systems include:
• Institutional repositories, e.g., at QU
• Content & courseware management systems
• Research and development funding of
hundreds of millions of dollars has led to
powerful tailored systems, such as for
chemical information.
12. ELISQ – Electronic Library Institute – SeerQ – Project Team
Qatar University, Qatar:
Mohammed Samaka (Ph.D., Co-Lead PI)
Sumaya Ali S A Al-Maadeed (Ph.D., PI)
Myrna Tabet
Asad Nafees
Tahseena Moideen
Qatar National Library, Qatar:
Claudia Lux (PI)
Krishna RoyChowdhury
Postdoc - TBA
Virginia Tech, USA:
Edward Fox (Ph.D., Lead-PI)
Tarek Kanan
Penn. State University, USA:
C. Lee Giles (Ph.D., PI)
Sagnik Ray Choudhury
Texas A&M, USA:
Richard Furuta (Ph.D., PI)
Hamed Alhoori
Consultants:
John Impagliazzo (Ph.D., Key Investigator)
Susan Lukesh (Ph.D.)
Carole Thompson
This project was made possible by NPRP Grant # 4-029-1-007 from
the Qatar National Research Fund (a member of Qatar Foundation).
13. ELISQ Project (1 of 2)
Project Objectives/Aims
A. Research and prototype digital library systems and
infrastructure for Qatar, focusing initially on Qatari
information related to government and scholarly
activities.
Leverage the crawling engine from Penn State's SeerSuite software
infrastructure, and extend it beyond its current focus on English to
support Arabic-English collections, and to cover a broad range of
scholarly disciplines, and all types of government information.
14. ELISQ Project (2 of 2)
Project Objectives/Aims (continued)
B. Research and build the digital library community in
Qatar, supporting digital library use, services,
collection development, tailored systems, and
advancing toward a Knowledge Society.
Study scholarly activities, and engage in community building in
Qatar, so DLs can be tailored to specific domains and to the unique
needs of Qatar. Through workshops, a consulting center at the
proposed Institute, and collaborative efforts with libraries and
museums in Qatar, we will identify particular needs and uses, and
tailor collections, systems, and services, to lead toward the Qatari
Knowledge Society.
16. Crawler (Heritrix)
(for search engines & Web archives)
• A Web crawler starts with a list of URLs to visit,
called the seeds.
• On those pages, it identifies all the hyperlinks
• adds them to the list of URLs to visit
• recursively visits pages pointed to
• according to a set of policies.
• Prioritizes its downloads – some pages change often.
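The crawl loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Heritrix itself (Heritrix is a Java crawler): the in-memory `TOY_WEB` graph, the `toy_fetch` helper, and the single "stay on one site" policy are all assumptions made for the example, and download prioritization is not modeled.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "http://other.example/x"],
    "http://example.org/b": ["http://example.org/"],
    "http://other.example/x": [],
}


def toy_fetch(url):
    """Stand-in for downloading a page and extracting its hyperlinks."""
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, allowed_prefix, max_pages=100):
    """Breadth-first crawl from the seeds; returns URLs in visit order."""
    frontier = deque(seeds)  # the list of URLs still to visit
    seen = set(seeds)        # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # Policy (assumed): only follow links inside the allowed site.
            if link.startswith(allowed_prefix) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://example.org/"], toy_fetch, "http://example.org/")
print(order)
```

A production crawler would replace the first-in-first-out queue with a priority queue so that frequently changing pages are revisited sooner.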
17. Selected SeerSuite Instantiations
• CiteSeerx
• http://citeseerx.ist.psu.edu
• A scientific literature digital library and search engine
• ChemXSeer
• http://chemxseer.ist.psu.edu
• Portal for researchers in environmental chemistry
integrating the scientific literature with experimental,
analytical, and simulation results and tools
• ArchSeer
• http://archseer.ist.psu.edu/
• Archeology literature
• TableSeer
18. CiteSeerX
http://citeseerx.ist.psu.edu
• CiteSeerX crawls researcher homepages on the web for scholarly papers, formerly in
computer science
• Converts PDF to text
• Automatically extracts OAI metadata and other data
• Automatic citation indexing, links to cited documents, creation of
document page, author disambiguation
• Software open source – can be used to build other such tools
• 3 M documents
• Millions of files
• 60 M citations
• 3 to 6 M authors
• 2 to 4 M hits per day
• 100K documents added monthly
• 800K individual users
• Several Tbytes
21. SeerSuite
• Tool kit used to build search engines and digital libraries
• CiteSeerX , MyCiteSeerX , ChemXSeer, ArchSeer, AlgoSeer,
AckSeer, BizSeer, CSSeer, CollabSeer, RefSeer, GrantSeer,
SeerSeer, YouSeer, etc.
• Built on commercial grade open source tools (Solr/Lucene)
• Penn State expertise – automated specialized metadata
extraction
• Supports research in
• Indexing and search
• Data mining & structures
• Information and knowledge extraction
• Social networks: Name/entity disambiguation
• Scientometrics/informetrics
• Systems engineering
• User interface design (HCI = human-computer interaction)
• Software engineering and management
22. SeerSuite is not Google
• Metadata (as in library catalogs) as well as content
• Sets of collections, rather than the Web as a whole
• Provided by a curator (e.g., publisher, museum)
• Provided by user submissions
• Or collected by focused ‘crawling’
• Tailored services, rather than the same for everyone
• Browsing using categories, preserving, adding value
• Based on studying user requirements, e.g., chemists
• Working with entities, rather than just words
• Citations, tables, figures, names, chemical formulae
• Using knowledge bases, machine learning, artificial intelligence
24. Search Engine and Repository for eChemistry
C. Lee Giles, Prasenjit Mitra, Karl Mueller, Levent Bolelli, Xiaonan Lu, Saurabh Kataria, Ying
Liu, Anuj Jaiswal, Kun Bai, Bingjun Sun, Isaac Councill, James Z. Wang, James Kubicki,
Barbara Garrison, William Brouwer, Joel Bandstra, Qingzhao Tan, Juan Pablo Ramirez
Fernandez, Madian Khabsa, Hung-Hsuan Chen, Sagnik Ray Choudhury
Chemistry, Computer Sciences and Engineering, Geosciences, Information Sciences and
Technology
Pennsylvania State University, University Park, PA, USA
Past funding: NSF Cyberinfrastructure Chemistry, Microsoft
Current Support: Dow Chemical
http://chemxseer.ist.psu.edu
25. Talk Overview
● Challenges and Motivation.
● Functionalities
– Fulltext Search
– Author Search
– Table Search
– Figure Search
– Expertise Search
– Chemical Name and Formula Tagging
– Chemical Name and Formula Search
● Summary.
34. ChemXSeer Figure/Plot Data Extraction and
Search
Numerical data in scientific publications are often found in figures.
No search engine allows searching on figures and their data in chemical documents.
Tools that automate the data extraction from figures and allow
search on them can provide the following:
• Increased understanding of the key concepts of papers.
• Data for automatic comparative analyses.
• Regeneration of figures in different contexts.
• Search for documents with figures containing specific experiment results.
X. Lu et al., JCDL 2006; Ray Choudhury et al., JCDL 2013, ICDAR 2013
36. ChemXSeer Name and Formula Extraction
and Search
• Extraction and search of chemical names and formulae in scientific
documents has been shown to be very useful.
• Extraction and search on chemical names is hard:
– Many chemical molecules are created every day, so any dictionary-based name
recognizer will eventually fail.
– Names need to be segmented to get semantically meaningful sub-terms such as
“methyl”, “ethyl” and “alcohol” from “methylethyl alcohol”.
• Identifying formulae is hard:
– “… YSI 5301, Yellow Springs, OH, USA …” (non-formula)
– “… such as hydroxyl radical OH, superoxide O2- …” (formula)
• For searching, formulae cannot be treated as text.
– Domain knowledge (formula identification)
– Structural knowledge (substructure finding and search)
B. Sun et al., WWW 2007, WWW 2008, TOIS
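To illustrate why formulae cannot be treated as plain text, the sketch below parses a flat formula string into element counts, so that two differently written strings for the same composition compare equal. It is an assumption-laden toy, not the ChemXSeer method: the `element_counts` helper is hypothetical, it handles neither parentheses nor charges, and it does not check tokens against the periodic table.

```python
import re
from collections import Counter

# One capital letter, an optional lowercase letter, then an optional count.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
FLAT_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def element_counts(formula):
    """Count atoms per element in a flat formula such as 'C2H5OH'."""
    if not FLAT_FORMULA.fullmatch(formula):
        raise ValueError(f"not a flat formula: {formula!r}")
    counts = Counter()
    for symbol, number in ELEMENT_TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


# As text the two strings differ, but their compositions are identical.
print("C2H5OH" == "C2H6O")
print(element_counts("C2H5OH") == element_counts("C2H6O"))
```

Note that `element_counts("OH")` succeeds for the "OH" in "Yellow Springs, OH, USA" just as it does for the hydroxyl radical, which is why formula identification also needs the surrounding context.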
37. Chemical Entity Extraction and Tagging
● Name tagging
– Each chemical name can be a phrase
– Example
● "... Determination of lactic acid and ..."
● "... insecticide promecarb (3-isopropyl-5-methylphenyl
methylcarbamate) acts against ..."
● Formula tagging
– Each formula is a single term
– Example
● "... such as hydroxyl radical OH, superoxide ..."
– Non-formula example
● "... YSI 5301, Yellow Springs, OH, USA ..."
● Tagging examples
– Name tagging:
"... of <name-type>lactic acid</name-type> and ..."
– Formula tagging:
"... radical <formula-type>OH</formula-type>, superoxide ..."
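The inline tag format shown above can be produced from character spans with a small helper. This is only a sketch of the output markup: the `tag_spans` function is hypothetical, the sample sentence is stitched together from the slide's fragments, and the spans are supplied by hand rather than predicted by a tagger.

```python
def tag_spans(text, spans):
    """Wrap each (start, end, tag) character span of text in <tag>...</tag>."""
    pieces = []
    cursor = 0
    for start, end, tag in sorted(spans):
        if start < cursor:
            raise ValueError("spans must not overlap")
        pieces.append(text[cursor:start])                   # untagged text
        pieces.append(f"<{tag}>{text[start:end]}</{tag}>")  # tagged entity
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hand-picked spans for illustration only.
sentence = "Determination of lactic acid and hydroxyl radical OH"
name_start = sentence.index("lactic acid")
spans = [
    (name_start, name_start + len("lactic acid"), "name-type"),
    (len(sentence) - 2, len(sentence), "formula-type"),
]
tagged = tag_spans(sentence, spans)
print(tagged)
```

A name span may cover several words, while a formula span covers a single term, matching the distinction drawn on the slide.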
38. Online Chemical Entity Tagger
● We have an open source chemical name and formula
tagger and a web based interface for evaluation.
● The interface takes a PDF file as input and returns the text of the
PDF with names or formulas tagged.
39. Online Chemical Entity Tagger: Chemical Name
Tagging Example
● Results on a sample PDF.
● Some chemical formulas are erroneously identified as chemical names (loss
of precision).
● High recall (most chemical names are identified).
40. Online Chemical Entity Tagger: Chemical
Formula Tagging Example
● Results on a sample PDF.
● Some chemical formulas not identified (loss of recall).
● High precision (words identified as formula are actual formulas)
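The trade-off on these two slides follows from the standard definitions of precision and recall. The sketch below computes both for a tagger's output against a reference set; the two sets are invented purely for illustration and are not results from the ChemXSeer tagger.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted items against the true items."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical data: four formulas in the document, two found by the tagger.
actual_formulas = {"OH", "O2-", "H2O", "CO2"}
predicted_formulas = {"OH", "H2O"}

precision, recall = precision_recall(predicted_formulas, actual_formulas)
print(precision, recall)
```

Here everything the tagger marked is a real formula (high precision), but half of the formulas were missed (lower recall), which is the pattern described for formula tagging.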
41. Chemical Name Indexing and Search
• Index schemes:
– Which tokens to index?
– Indexing all subsequences generates a very large index.
– “but” is a morpheme in “butane”, but not in “nembutal”.
● Segmentation-based index scheme
– Used for indexing chemical names.
– First segment a chemical name hierarchically, and then index the
substring at each node if it is frequent.
– acetaldoxime -> aldoxime -> oxime.
– A search for “oxime” returns all three, depending on the ranking function.
– This cannot be done in usual text search.
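A toy version of the segmentation-based index makes the idea concrete. In this sketch the hierarchical segmentations are written by hand in the hypothetical `SEGMENTATIONS` table; the real system derives them automatically and indexes a segment only if it is frequent, and neither step is modeled here.

```python
from collections import defaultdict

# Hypothetical segmentations: each name maps to the segments found at the
# nodes of its hierarchy (assumed for illustration, not ChemXSeer output).
SEGMENTATIONS = {
    "acetaldoxime": ["acetaldoxime", "aldoxime", "oxime"],
    "aldoxime": ["aldoxime", "oxime"],
    "oxime": ["oxime"],
    "butane": ["butane", "but", "ane"],
    "nembutal": ["nembutal"],  # "but" is not a morpheme here, so not indexed
}


def build_index(segmentations):
    """Build an inverted index from each segment to the names containing it."""
    index = defaultdict(set)
    for name, segments in segmentations.items():
        for segment in segments:
            index[segment].add(name)
    return index


def search(index, query):
    """Return all indexed names for a segment, in alphabetical order."""
    return sorted(index.get(query, set()))


INDEX = build_index(SEGMENTATIONS)
print(search(INDEX, "oxime"))
print(search(INDEX, "but"))
```

Because only meaningful segments are indexed, a query for "but" finds "butane" without also matching "nembutal", which a plain substring search could not avoid.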
43. Expert Recommendation - CiteSeerX
http://seerseer.ist.psu.edu (new version CSSeer)
Built on top of millions of papers in CiteSeerX.
A similar system was developed for Dow Chemical.
Can find experts in “polymer chemistry” or the expertise of “Linus Pauling”.
Finds an expert based on their publications.
Many approaches:
• Keyphrases
• Citations
• Download count
• Affiliation
Treeratpituk, Chen, JCDL’13
44. Future Work
Lots of interesting work to do! Few computer/machine
learning scientists involved.
• Acquisitions - more documents, data, knowledge
• Chemical 3D graph search
• Fundamental chemical graph representation analysis
• Table data storage and access
• Figure search and data extraction and access
• New data and feature search
  • spectra, experimental methods, instrumentation
• New documents: 400K PubMed
• Semantic chemical graphs
• Expert/collaborator search
• Search integration of all features