A Java prototype that processes the result set of pre-downloaded data (from a database) and allows a user to claim his or her publications from a ranked list.
ACS 248th Paper 146: VIVO/ScientistsDB Integration into Eureka (Stuart Chalk)
Development of plugins for access to researchers identified in VIVO on the ScientistsDB website. Also developed a plugin to access Elasticsearch from within Eureka.
Presentation on the use of the Eureka Research Workbench to store data and scientific workflow information. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)
SciFinder and its Utility in Drug Discovery (Alichy Sowmya)
SciFinder Scholar® is a Z39.50, Windows-based interface that provides easy access to the rich and diverse scientific information contained in the CAS databases, including Chemical Abstracts from 1907 onwards. SciFinder Scholar (SFS) is an elegant search interface to six core chemistry-related databases, five of which are produced by CAS itself.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... (Stuart Chalk)
Scientists are looking for ways to leverage web 2.0 technologies in the research laboratory and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks... (Stuart Chalk)
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
The Center for Expanded Data Annotation and Retrieval (CEDAR) has developed a suite of tools and services that allow scientists to create and publish metadata describing scientific experiments. Using these tools and services—referred to collectively as the CEDAR Workbench—scientists can collaboratively author metadata and submit them to public repositories. A key focus of our software is semantically enriching metadata with ontology terms. The system combines emerging technologies, such as JSON-LD and graph databases, with modern software development technologies, such as microservices and container platforms. The result is a suite of user-friendly, Web-based tools and REST APIs that provide a versatile end-to-end solution to the problems of metadata authoring and management. This talk presents the architecture of the CEDAR Workbench and focuses on the technology choices made to construct an easily usable, open system that allows users to create and publish semantically enriched metadata in standard Web formats.
How to make your published data findable, accessible, interoperable and reusablePhoenix Bioinformatics
Seminar Presentation for PMB Department, UC Berkeley for Love Data Week. Subject is how to prepare publications and associated data sets for maximum reuse.
The metadata about scientific experiments are crucial for finding, reproducing, and reusing the data that the metadata describe. We present a study of the quality of the metadata stored in BioSample—a repository of metadata about samples used in biomedical experiments managed by the U.S. National Center for Biomedical Technology Information (NCBI). We tested whether 6.6 million BioSample metadata records are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the analyzed metadata. The BioSample metadata field names and their values are not standardized or controlled—15% of the metadata fields use field names not specified in the BioSample data dictionary. Only 9 out of 452 BioSample-specified fields ordinarily require ontology terms as values, and the quality of these controlled fields is better than that of uncontrolled ones, as even simple binary or numeric fields are often populated with inadequate values of different data types (e.g., only 27% of Boolean values are valid). Overall, the metadata in BioSample reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The aberrancies in the metadata are likely to impede search and secondary use of the associated datasets.
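The kind of requirement checking described in this study can be illustrated with a minimal sketch. The field names, type rules, and records below are hypothetical stand-ins, not the actual BioSample data dictionary or pipeline; the point is simply validating field values against declared types and flagging unknown field names.

```python
# Hypothetical data dictionary: field name -> expected value type.
RULES = {
    "host_taxid": int,       # numeric field
    "is_tumor": "boolean",   # binary field
}

VALID_BOOLEANS = {"true", "false", "yes", "no"}

def is_valid(field, value):
    """Return True if `value` satisfies the declared rule for `field`."""
    rule = RULES.get(field)
    if rule is None:
        return False  # field name not in the data dictionary
    if rule == "boolean":
        return value.strip().lower() in VALID_BOOLEANS
    try:
        rule(value)  # e.g. int("9606") succeeds, int("human") raises
        return True
    except ValueError:
        return False

# Toy metadata records; the second shows the kinds of inadequate
# values (wrong data type, free text in a binary field) the study found.
records = [
    {"host_taxid": "9606", "is_tumor": "yes"},
    {"host_taxid": "human", "is_tumor": "not applicable"},
]
invalid = [(f, v) for rec in records for f, v in rec.items()
           if not is_valid(f, v)]
```

Running this flags both values in the second record, mirroring how only a fraction of binary fields in the study held valid values.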
We describe current work in federating data from institutional research profiling systems – providing single-point access to substantial numbers of investigators through concept-driven search, visualization of the relationships among those investigators, and the ability to interlink systems into a single information ecosystem.
iAuthor.cn: ORCID China Services and International Identifier for Researchers (jianyongzhang)
iAuthor is a China registry tool for ORCID, developed by the National Science Library, providing ORCID China services and international identifiers for researchers.
A Generic Scientific Data Model and Ontology for Representation of Chemical Data (Stuart Chalk)
The current movement toward openness and sharing of data is likely to have a profound effect on the speed of scientific research and the complexity of questions we can answer. However, a fundamental problem with currently available datasets (and their metadata) is heterogeneity in terms of implementation, organization, and representation.
To address this issue we have developed a generic scientific data model (SDM) to organize and annotate raw and processed data, and the associated metadata. This paper will present the current status of the SDM, its implementation in JSON-LD, and the associated scientific data model ontology (SDMO). Example usage of the SDM to store data from a variety of sources will be discussed, along with future plans for the work.
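To make the JSON-LD implementation concrete, here is a minimal sketch of what an SDM-style record could look like. The context URL, type, and property names are illustrative placeholders, not the actual SDMO vocabulary.

```python
import json

# Hypothetical JSON-LD record in the spirit of the SDM described above.
# The "sdm" namespace and property names are assumptions for illustration.
record = {
    "@context": {
        "sdm": "https://example.org/sdm#",  # placeholder namespace
        "property": "sdm:property",
        "value": "sdm:value",
        "unit": "sdm:unit",
    },
    "@type": "sdm:Datum",
    "property": "melting point",
    "value": 114.8,
    "unit": "degree Celsius",
}

# Serialize and re-parse: JSON-LD is plain JSON plus the @context,
# so standard JSON tooling round-trips it without loss.
doc = json.dumps(record, indent=2)
parsed = json.loads(doc)
```

The design point is that the `@context` maps short keys to ontology terms, so the same record serves both plain-JSON consumers and linked-data tooling.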
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using... (Stuart Chalk)
Recently, the US government has mandated that publicly funded scientific research data be made freely available in a usable form, allowing integration of the data into other systems. While this mandate has been articulated, existing publications and new papers (PDFs) still do not provide accessible data, meaning that their usefulness is limited without human intervention.
This presentation outlines our efforts to extract scientific data from PDF files, using the PDFToText software and regular expressions (regex), and process it into a form that structures the data and its context (metadata). Extracted data is processed (cleaned, normalized), organized, and inserted into a contextually developed MySQL database. The data and metadata can then be output using a generic JSON-LD based scientific data model (SDM) under development in our laboratory.
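The rule-based extraction step can be sketched as follows. This is an illustration of the general approach (regex rules over PDF-extracted text, then cleaning and structuring), not the presenters' actual rules; the sentence and the pattern are invented for the example, and a literal string stands in for PDFToText output.

```python
import re

# Text as it might come out of pdftotext (simulated here as a literal).
text = "The solubility of compound 3 was 4.2 g/100 mL at 25 C."

# Hypothetical extraction rule: capture the numeric value, its unit,
# and the measurement temperature from a solubility statement.
pattern = re.compile(r"solubility .*? was ([\d.]+)\s*(g/100 mL) at (\d+)\s*C")

m = pattern.search(text)

# Clean and structure the captured strings into typed fields, the kind
# of record that would then be inserted into the database.
datum = {
    "quantity": "solubility",
    "value": float(m.group(1)),          # normalized: string -> float
    "unit": m.group(2),
    "temperature_celsius": int(m.group(3)),
}
```

In a full pipeline each rule like this would target one class of statement, with the structured records accumulated into the database and later exported via the JSON-LD data model.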
The availability of high-quality metadata is key to facilitating discovery in the large variety of scientific datasets that are increasingly becoming publicly available. However, despite the recent focus on metadata, the diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality. There is a pressing need for a metadata representation format that provides strong interoperation capabilities together with robust semantic underpinnings. In this talk, we describe such a format, together with open-source Web-based tools that support the acquisition, search, and management of metadata. We outline an initial evaluation using metadata from a variety of biomedical repositories.
Scientific Units in the Electronic Age (Stuart Chalk)
Scientists have standardized on the SI unit system since the late 1700s. While much work has been done over the years to refine and redefine the system, little has formally been done to standardize the representation of SI units in electronic systems.
This paper will present a summary of current efforts toward electronic representation of scientific units in text, XML, and RDF, an analysis of needs for current computer/network systems, and an outline of future work.
Data and Donuts: how to write a data management plan (C. Tobin Magle)
Good data management practices are becoming increasingly important in the digital age. Because we now have the technology to freely share research data and also because funding agencies want to do more with decreasing research funds, many funding agencies and journals require authors and grantees to share their research data. To provide training in this area, Tobin Magle, the Morgan Library's Cyberinfrastructure Facilitator, is putting on a series of data management workshops called "Data and Donuts". The first session of Data and Donuts will discuss the importance of data management and how to write a data management plan.
This session covers topics related to data archiving and sharing. This includes data formats, metadata, controlled vocabularies, preservation, archiving and repositories.
The scientific and economic value of research data is enormous. To ensure successful subsequent usage, the scientific community needs efficient access to the data, the data have to be reliable and persistent, and their quality has to be assured.
One solution to these preconditions is to apply the techniques of today’s scientific publishing to research data. Besides its publication in a data repository together with some metadata, the data should undergo a transparent public peer-review using a publication platform.
The presentation discusses two approaches. On the one hand, the data can be the basis for a research article and undergoes a review parallel to the review of the manuscript. The data is then a reviewed supplement to a scientific publication. On the other hand, the data itself can be the subject of a publication whose quality is then assured by peers.
The presentation provides practical experience, especially with the latter strategy, realized through an established open access journal.
Slides for a presentation to GSE faculty on using research and citation tools: RefWorks, EndNote, Zotero, and Mendeley. For links view here: http://gseqt.wordpress.com/workshops/research-tools/
Data Publishing Workflows with Dataverse (Micah Altman)
By: Mercè Crosas, Director of Data Science at the Institute for Quantitative Social Science (IQSS) at Harvard University
The Dataverse software provides multiple workflows for data publishing to support a wide range of data policies and practices established by journals, as well as data sharing needs from various research communities. This talk will describe these workflows from the user experience and from the system's technical implementation.
This talk was presented as part of the Information Science Brown Bag talks, hosted by the Program on Information Science. (See http://drmaltman.wordpress.com)
Bearcat Search: Implementing Federated Searching at the Newman Library (Newman Library)
A presentation by Michael Waldman, Lisa Ellis, Stephen Francoeur, Joseph Hartnett, and Rita Ormsby at the Teaching & Technology Conference, 28 March 2008, Baruch College, New York, NY
Open Annotation Rollout, Manchester, 2013-06-25
See also PPTX version with Notes: http://www.slideshare.net/soilandreyes/2013-0624annotatingr-osopenannotationmeeting
Open Annotation Rollout, Manchester, 2013-06-25
See also PDF version: http://www.slideshare.net/soilandreyes/2013-0624annotatingr-osopenannotationmeeting-23289491
Scaling Recommendations, Semantic Search, & Data Analytics with Solr (Trey Grainger)
This presentation is from the inaugural Atlanta Solr Meetup held on 2014/10/21 at Atlanta Tech Village.
Description: CareerBuilder uses Solr to power their recommendation engine, semantic search, and data analytics products. They maintain an infrastructure of hundreds of Solr servers, holding over a billion documents and serving over a million queries an hour across thousands of unique search indexes. Come learn how CareerBuilder has integrated Solr into their technology platform (with assistance from Hadoop, Cassandra, and RabbitMQ) and walk through api and code examples to see how you can use Solr to implement your own real-time recommendation engine, semantic search, and data analytics solutions.
Speaker: Trey Grainger is the Director of Engineering for Search & Analytics at CareerBuilder.com and is the co-author of Solr in Action (2014, Manning Publications), the comprehensive example-driven guide to Apache Solr. His search experience includes handling multi-lingual content across dozens of markets/languages, machine learning, semantic search, big data analytics, customized Lucene/Solr scoring models, data mining and recommendation systems. Trey is also the Founder of Celiaccess.com, a gluten-free search engine, and is a frequent speaker at Lucene and Solr-related conferences.
Scott Edmunds' talk on Big Data Publishing at the "What Bioinformaticians need to know about digital publishing beyond the PDF" workshop at ISMB 2013, July 22nd 2013.
An academic project developed for the Web Information Retrieval class at Sapienza Università di Roma, Master's Degree in Computer Engineering, A.Y. 2019-20.
The goal of the project is to rank computer scientists by their influence using Google's PageRank and the HITS algorithm.
For further information you can check our repository:
https://github.com/LucaTomei/Computer_Scientists
Members of the project:
Luca Tomei
Andrea Aurizi
Daniele Iacomini
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Me... (Yongyao Jiang)
MUDROD - Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access
Talk about Exploring the Semantic Web, and particularly Linked Data, and the Rhizomer approach. Presented August 14th 2012 at the SRI AIC Seminar Series, Menlo Park, CA
Keynote speech - Carole Goble - Jisc Digital Festival 2015 (Jisc)
Carole Goble is a professor in the school of computer science at the University of Manchester.
In this keynote, Carole offered her insights into research data management and data centres.
RARE and FAIR Science: Reproducibility and Research Objects (Carole Goble)
Keynote at JISC Digifest 2015 on Reproducibility and Research Objects in Scholarly Communication
Includes hidden slides. All material, except possibly the IT Crowd screengrab, is reusable.
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E... (Spark Summit)
Elasticsearch provides native integration with Apache Spark through ES-Hadoop. However, especially during development, it is at best cumbersome to have Elasticsearch running on a separate machine/instance. Leveraging Spark Cluster with Elasticsearch Inside, it is possible to run an embedded instance of Elasticsearch in the driver node of a Spark cluster. This opens up new opportunities to develop cutting-edge applications. One such application is Dataset Search.
Oscar will give a demo of a Dataset Search Engine built on Spark Cluster with Elasticsearch Inside. Motivation is that once Elasticsearch is running on Spark it becomes possible and interesting to have the Elasticsearch in-memory instance join an (existing) Elasticsearch cluster. And this in turn enables indexing of Datasets that are processed as part of Data Pipelines running on Spark. Dataset Search and Data Management are R&D topics that should be of interest to Spark Summit East attendees who are looking for a way to organize their Data Lake and make it searchable.
VIZ-VIVO: Towards Visualizations-driven Linked Data Navigation (Muhammad Javed)
Paper published in VOILA 2016, a workshop co-located with ISWC.
Abstract: Scholars@Cornell is a new project of Cornell University Library (CUL) that provides linked data and novel visualizations of the scholarly record. Our goal is to enable easy discovery of explicit and latent patterns that can reveal high-impact research areas, the dynamics of scholarly collaboration, and expertise of faculty and researchers. We describe VIZ-VIVO, an extension for the VIVO framework that enables end-user exploration of a scholarly knowledge-base through a configurable set of data-driven visualizations. Unlike systems that provide web pages of researcher profiles using lists and directory-style metaphors, our work explores the power of visual metaphors for navigating a rich semantic network of scholarly data modeled with the VIVO-ISF ontology. We produce dynamic web pages using D3 visualizations and bridge the user experience layer with the underlying semantic triple-store layer. Our selection of visual metaphors enables end users to start with the big picture of scholarship and navigate to individual faculty and researchers within a macro visual context. The D3-enabled interactive environment can guide the user through a sea of scholarly data depending on the questions the user wishes to answer. In this paper, we discuss our process for selection, design, and development of an initial set of visualizations as well as our approach to the underlying technical architecture. By engaging an initial set of pilot partners we are evaluating the use of these data-driven visualizations by multiple stakeholders, including faculty, students, librarians, administrators, and the public.
Scholars@Cornell: Visualizing the Scholarship Data (Muhammad Javed)
Short paper published in IEEE Visualizations in Practice workshop. Phoenix, AZ.
A new project of CUL is Scholars@Cornell, a data and visualization service built upon VIVO's semantic, linked data knowledge-base that represents the record of scholarship produced by Cornell faculty and researchers. While adhering to the VIVO ontology, our work on Scholars@Cornell helps move VIVO forward in the technology areas that require a looser coupling of backend and frontend technologies. One key question we set out to answer was: how can visual mediation help users navigate the rich semantic data that represent the scholarship recorded in the VIVO knowledge-base? Can visualizations be used to make the content more consumable and answer the questions that cannot easily be answered by browsing list views?
VIVO: A Community-driven Research Information Management System: Challenges a... (Muhammad Javed)
Presentation given at NISO Virtual Conference.
"Research Information Systems: The Connections Enabling Collaboration".
https://www.niso.org/events/2017/08/research-information-systems-connections-enabling-collaboration
Scholars@Cornell: Visualizing the scholarly record (Muhammad Javed)
As stewards of the scholarly record, Cornell University Library is developing a data and visualization service known as Scholars@Cornell with the goal of improving the visibility of Cornell research and enabling discovery of explicit and latent patterns of scholarly collaboration. We provide aggregate views of data where dynamic visualizations become the entry points into a rich graph of knowledge that can be explored interactively to answer questions such as: Who are the experts in what areas? Which departments collaborate with each other? What are patterns of interdisciplinary research? And more. Key components of the system are Symplectic Elements to provide automated citation feeds from external sources such as Web of Science, the Scholars "Feed Machine" that performs automated data curation tasks, and the VIVO semantic linked data store. The new "VIZ-VIVO" component bridges the chasm between the back-end of semantically rich data with a front-end user experience that takes advantage of new developments in the world of dynamic web visualizations. We will demonstrate a set of D3 visualizations that leverage relationships between people (e.g., faculty), their affiliations (e.g., academic departments), and published research outputs (e.g., journal articles by subject area). We will discuss our results with two of the initial pilot partners at Cornell University, the School of Engineering and the Johnson School of Management.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
Chatty Kathy: Enhancing Physical Activity Among Older Adults
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices that have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) avoids duplicate computations and can also reduce iteration time. Road networks often contain chains that can be short-circuited before the PageRank computation to improve performance, since the final ranks of chain nodes are easily calculated; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
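One of the ingredients above, skipping computation on vertices that have already converged, can be sketched as a plain pull-style power iteration that freezes a vertex once its rank change drops below a tolerance. This is only an illustration of that single optimization, not the STICD algorithm, and it assumes a graph with no dangling nodes:

```java
import java.util.*;

public class PageRankSkip {
    /**
     * Pull-style PageRank that marks a vertex converged once its rank
     * change falls below tol, and skips recomputing it afterwards.
     * Assumes no dangling nodes (every vertex has at least one out-link).
     *
     * @param in     in[v] = vertices u with an edge u -> v
     * @param outDeg out-degree of each vertex
     */
    static double[] pagerank(int[][] in, int[] outDeg,
                             double d, double tol, int maxIter) {
        int n = in.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        boolean[] converged = new boolean[n];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean allDone = true;
            double[] next = rank.clone();
            for (int v = 0; v < n; v++) {
                if (converged[v]) continue;          // skip settled vertices
                double sum = 0.0;
                for (int u : in[v]) sum += rank[u] / outDeg[u];
                next[v] = (1.0 - d) / n + d * sum;
                if (Math.abs(next[v] - rank[v]) < tol) converged[v] = true;
                else allDone = false;
            }
            rank = next;
            if (allDone) break;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 3-cycle 0 -> 1 -> 2 -> 0: by symmetry every rank is 1/3.
        int[][] in = { {2}, {0}, {1} };
        int[] outDeg = { 1, 1, 1 };
        System.out.println(Arrays.toString(pagerank(in, outDeg, 0.85, 1e-9, 100)));
    }
}
```

Freezing ranks this way trades a little accuracy (the frozen value stops moving) for skipped work, which is why production implementations pair it with the other techniques listed above.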
Open Harvester - Search publications for a researcher from CrossRef, PubMed and DBLP
1. OpenHarvester
Muhammad Javed, Ph.D.
Ontology Engineer
Tech Lead (Scholars@Cornell)
Cornell University Library
2. A Java prototype that processes the result set of pre-downloaded data (from a database) and allows a researcher to claim their publications from a ranked list.
3. Three reasons why I did this preliminary work:
Reason 1: Symplectic Elements does not "search" publications, in either CrossRef or EPubmed Central (supplementary sources).
Reason 2: Adrenaline in my veins to explore what data can be harvested using open citation APIs.
Reason 2.1: Now that I can access the data, can I harvest publications for a researcher?
Reason 3: To understand what data is required to successfully find publications for a researcher.
4. Limitations:
1. Works on pre-downloaded data.
   Step 1: Search the database and download the result set.
   Step 2: Process the result set and harvest the researcher's publications.
2. No handling of name diacritics.
3. Name handling requires some more tweaks.
4. And more…
5. Workflow (diagram):
(1) USER PROFILE: Name (String), with variants Dean B. Krafft, D. Krafft, Dean Blackmar Krafft, Dean Krafft.
(2) Search the name in the author list (Dean OR Krafft).
(3) Result Set: a set of publications.
(4) Ranking: a ranked list of publications.
(5) List Review (from top): the list of my publications.
(6) Claim a Publication.
(7) Update Profile.
(8) Re-Ranking (based on the updated profile).
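The search step (find the name in the author list, Dean OR Krafft) can be illustrated as a token-overlap test. This is a minimal sketch of that idea, not the prototype's actual matching logic:

```java
import java.util.*;

public class AuthorSearch {
    // A publication matches if any token of any name variant (e.g. "Dean"
    // OR "Krafft") appears among the tokens of its author names.
    static boolean matches(List<String> authors, List<String> nameVariants) {
        Set<String> wanted = new HashSet<>();
        for (String v : nameVariants)
            for (String t : v.toLowerCase().split("[\\s.]+"))
                if (!t.isEmpty()) wanted.add(t);
        for (String author : authors)
            for (String t : author.toLowerCase().split("[\\s.]+"))
                if (wanted.contains(t)) return true;
        return false;
    }

    public static void main(String[] args) {
        List<String> variants = List.of("Dean B. Krafft", "D. Krafft");
        System.out.println(matches(List.of("D. Krafft", "S. Warner"), variants)); // true
        System.out.println(matches(List.of("C. Lagoze"), variants));              // false
    }
}
```

Single-letter tokens like "B" make this deliberately loose, which is exactly why the result set is then ranked and reviewed from the top rather than trusted outright.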
6. User Profile:
1. NameVariants
2. Start/EndYear
3. Affiliations
4. Co-authors
5. Subject Areas
6. Identifiers (personal)
7. Identifiers (publications)
8. and more…
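The profile fields listed above can be carried by a simple container class. The class and field names below are assumptions for illustration, not the prototype's actual code:

```java
import java.util.*;

// Illustrative container for the user-profile fields on the slide.
public class UserProfile {
    List<String> nameVariants = new ArrayList<>();
    Integer startYear;                                       // earliest publication year
    Integer endYear;                                         // latest publication year
    List<String> affiliations = new ArrayList<>();
    List<String> coAuthors = new ArrayList<>();
    List<String> subjectAreas = new ArrayList<>();
    Map<String, String> personalIdentifiers = new HashMap<>();  // e.g. "orcid" -> id
    List<String> publicationIdentifiers = new ArrayList<>();    // e.g. claimed DOIs
}
```

Each claimed publication can append to these lists, which is what makes the later re-ranking step possible.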
7. Two-Step Process
java -jar CrossrefDataDownloader.jar Dean+Krafft CROSSREF
(arguments: search string, output base-folder)
Step 1: Download Data. Downloads the result set and saves the files in a folder named "Dean+Krafft".
Step 2: Process Result Set. Processes the result-set files from the folder "Dean+Krafft".
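The downloader's internals are not shown on the slides. As a sketch of what Step 1 might do, the public CrossRef REST API exposes a `/works` endpoint with a `query.author` parameter; the class below only builds the query URL and the output folder named after the search string, and everything in it is my assumption about how such a downloader could work, not CrossrefDataDownloader itself:

```java
import java.nio.file.Path;

// Hypothetical sketch of Step 1: construct the CrossRef query URL and the
// folder the result-set pages would be saved into (named after the search
// string, as the slide describes). Fetching is left out; it could be done
// with java.net.http.HttpClient.
public class CrossrefStep1 {
    static String buildUrl(String searchString, int rows) {
        return "https://api.crossref.org/works?query.author="
                + searchString + "&rows=" + rows;
    }

    public static void main(String[] args) {
        String searchString = "Dean+Krafft";
        System.out.println(buildUrl(searchString, 100));
        Path folder = Path.of(searchString);   // Step 2 reads from this folder
        System.out.println("saving into folder: " + folder);
    }
}
```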
15. The same workflow, now driven by the full USER PROFILE:
• Name (String): Dean B. Krafft, D. Krafft, Dean Blackmar Krafft, Dean Krafft
• Affiliation: Cornell University Library; Department of Computer Science; Cornell University
• Co-Authors: Simeon Warner, Carl Lagoze, …
• Year Range: 1978 (start) to 2016 (end)
• Subject: Computer Science; Library & Info. Sci.
Steps: (1) USER PROFILE; (2) search the name in the author list (Dean OR Krafft); (3) Result Set: a list of publications; (4) Ranking: a ranked list of publications; (5) List Review (from top): the list of my publications; (6) Claim a Publication; (7) Update Profile; (8) Re-Ranking (based on the updated profile).
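The re-ranking step can be illustrated as re-scoring each candidate against the grown profile: after a claim adds new co-authors or widens the year range, running the same scoring function over the result set reorders it. The weights and field choices below are assumptions for the sketch, not the prototype's actual ranking function:

```java
import java.util.*;

public class ReRank {
    // Toy score: +1 per author shared with the profile's co-author list,
    // +0.5 if the publication year falls inside the profile's year range.
    static double score(Set<String> pubAuthors, int pubYear,
                        Set<String> profileCoAuthors, int startYear, int endYear) {
        double s = 0.0;
        for (String a : pubAuthors)
            if (profileCoAuthors.contains(a)) s += 1.0;
        if (pubYear >= startYear && pubYear <= endYear) s += 0.5;
        return s;
    }

    public static void main(String[] args) {
        Set<String> coAuthors = new HashSet<>(List.of("Simeon Warner", "Carl Lagoze"));
        // One shared co-author (+1.0) and an in-range year (+0.5):
        System.out.println(score(Set.of("Simeon Warner", "Dean Krafft"),
                                 2010, coAuthors, 1978, 2016)); // 1.5
    }
}
```

Sorting the result set by this score descending gives the ranked list the user reviews from the top.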