Slides from talk for ICZN/SHNH symposium in honour of Charles Sherborn: "Anchoring Biodiversity Information: From Sherborn to the 21st Century and Beyond"
Current metadata landscape in the library world - Getaneh Alemu
This workshop was presented at MTSR-2017 (Nov. 27, 2017) in Tallinn, Estonia (http://www.mtsr-conf.org/index.php/programme). The workshop aims to put the current metadata landscape in libraries in context, with particular emphasis on emerging theory/principles and best practices covering:
• The theory of enriching and filtering
• Metadata enriching through RDA (Hands on - The RDA Toolkit and implementation of RDA at Southampton Solent University)
• Metadata filtering through FRBR (practical issues that cataloguers face in FRBRising their catalogue)
• Metadata management (metadata quality, authority control and subject headings)
• Metadata systems, tools and applications (practical issues of e-books and database cataloguing)
Getting onboard the data training: How librarians fit in - Diane Clark
Academic librarians' roles and responsibilities are evolving and expanding into the area of data: how to manage, share, access and preserve it. Training on how to discuss data with faculty and researchers was the focus of upskilling a cohort of academic librarians.
Semantic Linking & Retrieval for Digital Libraries - Stefan Dietze
An overview of recent works on entity linking and retrieval in large corpora, specifically bibliographic data. The works address both traditional Linked Data and knowledge graphs as well as data extracted from Web markup, such as the Web Data Commons.
Of Libraries and Labs: Effecting User-Driven Innovation - Alex Humphreys
JSTOR has launched a new Labs team charged with
partnering with libraries and scholars to build innovative
tools for research and teaching. The JSTOR Labs team has
successfully used ‘flash builds’ – high-intensity, short-burst,
user-driven development efforts – in order to bring an idea
from conception to a working, user-delighting prototype in
as little as a week. In this talk the presenter will describe
the approach to flash builds, highlight the partnerships,
skills, tools and content that help to innovate, and suggest
ways that libraries can adopt these methods to support
innovation and the digital humanities.
Integrating with others: Stable VIVO URIs for local authority records; linkin... - Violeta Ilik
Integrating with others: Stable VIVO URIs for local authority records; linking to VIAF; ORCID organizational identifiers; W3C Dataset ontology work by Melissa Haendel & Violeta Ilik, VIVO Implementation Fest, Durham NC, March 20, 2014
The Project TIER Dataverse: Archiving and Sharing Replicable Student Research... - datascienceiqss
Richard Ball and Norm Medeiros will demonstrate how Dataverse is used within their Project TIER (Teaching Integrity in Empirical Economics) initiative to organize and showcase student work for transparency and reproducibility. Richard and Norm will discuss the prospect of extending Dataverse to serve as a resource for the Project TIER network of institutions and instructors.
What do MARC, RDF, and OWL have in common? - Violeta Ilik
It is understood that in the current library ecosystem, catalogers must be willing to adapt to the new semantic web environment while keeping in mind the crucial library mission – providing efficient access to information. How could catalogers transform their jobs in order to enable library users to retrieve information more effectively in the age of the semantic web?
Researchers have argued that catalogers have the fundamental skills to successfully work with and repurpose the metadata originally created for use in traditional library systems by utilizing various programming languages. In the new environment their jobs will require new tools and new systems but the basic skills of organization of information, knowledge of commonly used access points, and an ever-growing knowledge of information technology systems will still be the same. This presentation will stress the role of catalogers in bringing the data silos down, merging, augmenting, and creating interoperable data that can be used not just in library-specific systems, but in various other systems. Catalogers’ indispensable knowledge of controlled vocabularies, authority aggregators, metadata creation, metadata reuse, taxonomies, and data stores makes it all possible.
We will demonstrate how catalogers’ knowledge can be leveraged to design an institutional repository and/or a researcher profiling system, create semantic web compliant data, create ontologies, utilize unique identifiers, and (re)use data from legacy systems.
Re-imagining the role of Institutional Repository in Open Scholarship - Leslie Chan
Keynote at the OpenAIRE and COAR Joint Conference Open Access: Movement to Reality, Putting the Pieces Together. Acropolis Museum, Athens, Greece, May 21-22, 2014
Presentation to CRC Mental Health Early Career Researcher Workshop, Melbourne 29.11.17 for @andsdata.
Workshop title: A by-product of scientific training: We're all a little bit biased.
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit... - OpenAIRE
Presentation at the OpenAIRE-COAR Conference: "Open Access Movement to Reality: Putting the Pieces Together", Athens - May 21-22, 2014.
Re-imagining the role of institutional repositories in open scholarship, by Leslie Chan - Senior Lecturer in the Department of Social Sciences at the University of Toronto Scarborough.
ContentMining for France and Europe; Lessons from 2 years in UK - petermurrayrust
I have spent 2 years carrying out Content Mining (aka Text and Data Mining) in the UK under the 2014 "Hargreaves" exception. This talk was given in Paris, to ADBU, after France had passed the Digital Republic law (loi pour une République numérique). I illustrate what worked, what did not, and why, and offer ideas to France and Europe.
Presentation to the J. Craig Venter Institute, Dec. 2014 - Mark Wilkinson
This is largely a compilation of various other talks that I have posted here - a summary of the past 3+ years of work on SADI/SHARE. It includes the (now well-worn!!) slides about SHARE, as well as some of the more contemporary stuff about how we extended GALEN clinical classes with richer semantic descriptions, and then used them to do automated clinical phenotype analysis. Also includes the slide-deck related to automated Measurement Unit conversion (related to our work on semantically representing Framingham clinical risk assessment rules)
So... for anyone who regularly follows my uploads, there isn't much "new" in here, but at least it's all in one place now! :-)
Ontologies for baby animals and robots From "baby stuff" to the world of adul... - Aaron Sloman
In contrast with ontology developers concerned with a symbolic or digital environment (e.g. the internet), I draw attention to some features of our 3-D spatio-temporal environment that challenge young humans and other intelligent animals and will also challenge future robots. Evolution provides most animals with an ontology that suffices for life, whereas some animals, including humans, also have mechanisms for substantive ontology extension based on results of interacting with the environment. Future human-like robots will also need this. Since pre-verbal human children and many intelligent non-human animals, including hunting mammals, nest-building birds and primates can interact, often creatively, with complex structures and processes in a 3-D environment, that suggests (a) that they use ontologies that include kinds of material (stuff), kinds of structure, kinds of relationship, kinds of process (some of which are process-fragments composed of bits of stuff changing their properties, structures or relationships), and kinds of causal interaction and (b) since they don't use a human communicative language they must use information encoded in some form that existed prior to human communicative languages both in our evolutionary history and in individual development. Since evolution could not have anticipated the ontologies required for all human cultures, including advanced scientific cultures, individuals must have ways of achieving substantive ontology extension. The research reported here aims mainly to develop requirements for explanatory designs. The attempt to develop forms of representation, mechanisms and architectures that meet those requirements will be a long term research project.
Open Data in a Big Data World: easy to say, but hard to do? - LEARN Project
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Enabling transparency and efficiency in the research landscape
Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Department of Medical Informatics and Epidemiology, Oregon Health & Science University
Force11: Enabling transparency and efficiency in the research landscape - mhaendel
Presented at the Feb 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
DRUGS New agreement to tackle pharmaceutical pollution p.1 - AlyciaGold776
DRUGS New agreement to tackle pharmaceutical pollution p.164
WORLD VIEW Vaccination the best way to measure health care p.165
DUNG OVER Rolling beetles fooled by look-alike seeds p.167
Let’s think about cognitive bias
The human brain’s habit of finding what it wants to find is a key problem for research. Establishing robust methods to avoid such bias will make results more reproducible.
“Ever since I first learned about confirmation bias I’ve been seeing it everywhere.” So said British author and broadcaster Jon Ronson in So You’ve Been Publicly Shamed (Picador, 2015).
You will see a lot of cognitive bias in this week’s Nature. In a series of articles, we examine the impact that bias can have on research, and the best ways to identify and tackle it. One enemy of robust science is our humanity: our appetite for being right, and our tendency to find patterns in noise, to see supporting evidence for what we already believe is true, and to ignore the facts that do not fit.
The sources and types of such cognitive bias, and the fallacies they produce, are becoming more widely appreciated. Some of the problems are as old as science itself, and some are new: the IKEA effect, for example, describes a cognitive bias among consumers who place artificially high value on products that they have built themselves. Another common fallacy in research is the Texas sharp-shooter effect: firing off a few rounds and then drawing a bull’s eye around the bullet holes. And then there is asymmetrical attention: carefully debugging analyses and debunking data that counter a favoured hypothesis, while letting evidence in favour of the hypothesis slide by unexamined.
Such fallacies sound obvious and easy to avoid. It is easy to think that they only affect other people. In fact, they fall naturally into investigators’ blind spots (see page 182).
Advocates of robust science have repeatedly warned against cognitive habits that can lead to error. Although such awareness is essential, it is insufficient. The scientific community needs concrete guidance on how to manage its all-too-human biases and avoid the errors they cause.
That need is particularly acute in statistical data analysis, where some of the best-established methods were developed in a time before data sets were measured in terabytes, and where choices between techniques offer abundant opportunity for errors. Proteomics and genomics, for example, crunch millions of data points at once, over thousands of gene or protein variants. Early work was plagued by false positives, before the spread of techniques that could account for the myriad hypotheses that such a data-rich environment could generate.
Although problems persist, these fields serve as examples of communities learning to recognize and curb their mistakes. Another example is the venerable practice of double-blind studies. But more effort is needed, particularly in what some have called evidence- ...
Open Access and Research Communication: The Perspective of Force11 - Maryann Martone
Presentation at the National Federation of Advanced Information Services Workshop: Open Access to Published Research: Current Status and Future Directions, Philadelphia, PA USA November 22, 2013
Rare (and emergent) disciplines in the light of science studies - Andrea Scharnhorst
Andrea Scharnhorst. Insights from TD1210. presentation given at Exploratory Workshop “Integrating the stake of rare disciplines at the European level” COST, Brussels, September 9, 2015
data management, information management, data, big data, personal organization, organization, file management, scientific research, research, project management, data security, file naming conventions, data management plan,
Presented at the Oregon Special Libraries Association workshop: Customizing Data Delivery to Target Audiences.
Join ORSLA Members and friends for a half-day workshop on Customizing Data Delivery to Target Audiences.
When: Friday, September 27, 1pm – 4pm with a no-host happy hour to follow.
Architectural Heritage Center (701 SE Grand Avenue, Portland, http://www.visitahc.org/)
You’ve done your research and have a pile of information. Learn how to turn that information into an impressive research package by profiling your audience (clients, faculty, students, management), assessing their needs and learning styles, and creating meaningful charts and info-graphics guaranteed to impress.
Schedule:
1-2pm – Reece Dano: Ethnographic Research: Going Beyond WHAT and Getting to WHY
2-3pm – Temese Szalai: Delivering Insights, Not Information
3-4pm – Jackie Wirz: Fundamentals of Data Visualization: Creating Beautiful, Elegant and Descriptive Visual Displays
After 4pm – Happy hour at [to be announced]
Powered by Libraries: Leveraging Libraries for Semantic Web and Linked Open D... - Jackie Wirz, PhD
While data is the cornerstone of scientific research, traditional mechanisms of research assessment overlook these data outputs, instead focusing solely on publications. However, publications are just the tip of the iceberg: in reality, science is based on a complex landscape of research data and activities, which can be published or shared beyond traditional journal articles. Information scientists and software engineers are now working to relate and make accessible all of this data for research networking, research evaluation, resource sharing, and hypothesis discovery. Furthermore, federal funding agencies are increasing their requirements for data sharing and data standardization. Researchers, though, are often largely unaware of data standardization efforts and tools to access shared data.
In order to deal with this onslaught of data, standards, and tools, universities are asking libraries to play an increasing role in information management strategies. This includes training, data housing, and dissemination of information about tool resources. Libraries are at a key intersection between the research community and semantic engineers, and are increasingly hiring specialists with a research background to provide data modeling, curation, and scientific information dissemination services. As a result, libraries have been working closely with the research community to build and integrate semantic tools into the entire research cycle. Librarians can help researchers understand ways to interpret and share their data, and use tools to query the large amounts of existing data.
6. Who is FORCE11?
Publishers
Library and Information scientists
Policy makers
Tool builders
Funders
Scholars
Social Science
Humanities
Science
Free to join!
8. How does OHSU fit in?
We won $1K to find out.
Today | Discuss data-research cycle, reproducibility, and communication of findings
Later | Data playground with researchers:
Your data needs
Identify the material and services you need
Get paid $50
38. "Your metadata should make your data understandable to others… without your involvement" - Anne Gilliland
Metadata
47. Data standards can help with reproducibility
Average of ~50% of resources were not identifiable
Vasilevsky et al., 2013 PeerJ 1:e148
www.force11.org/node/4463 biosharing.org/bsg-000532
48. Data Analysis Pipeline Reproducibility Platforms
RESOURCES
www.wf4ever-project.org runmycode.org galaxyproject.org/
49. Are you aware of data standards in your field?
@OHSU, 72% said no or didn’t know!
50. Data Standards
Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning.
www.usgs.gov/datamanagement/plan/datastandards.php
51. Types of data standards
Reporting guidelines
Terminology Artifacts (includes ontologies)
Exchange Formats
Can be used together
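As a rough illustration of how the three types of standard can be used together, the sketch below checks one record against all three. It is only a minimal sketch under stated assumptions: JSON stands in for the exchange format, and the required field list (REQUIRED_FIELDS), the tissue vocabulary (TISSUE_TERMS) and the function name check_record are hypothetical and do not come from any real standard.

```python
import json

# Hypothetical reporting guideline: fields every record must report.
REQUIRED_FIELDS = {"sample_id", "organism", "tissue", "collected_on"}

# Hypothetical controlled vocabulary (terminology artifact) for "tissue".
TISSUE_TERMS = {"liver", "heart", "brain"}


def check_record(raw: str) -> list[str]:
    """Return a list of problems found in one JSON-encoded record."""
    try:
        record = json.loads(raw)  # exchange format: the record must parse as JSON
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    # Reporting guideline: every required field must be present.
    for field in sorted(REQUIRED_FIELDS - set(record)):
        problems.append(f"missing field: {field}")
    # Terminology: the tissue value must come from the agreed vocabulary.
    tissue = record.get("tissue")
    if tissue is not None and tissue not in TISSUE_TERMS:
        problems.append(f"unknown tissue term: {tissue}")
    return problems
```

A record that parses, reports every required field and uses a vocabulary term yields an empty list; anything else yields one message per problem, which can be handed back to the person who produced the data.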
57. Ontologies as a tool for unification
Disease-Phenotype databases
Disease phenotype ontology
Expression data
Gene function data
Cell and tissue ontology
GO annotations
ontologies
58. Ontologies classify data in multiple ways
For example, there are many useful ways to classify organism parts:
its parts and their arrangement
its relation to other structures
what is it: part of; connected to; adjacent to, overlapping?
its shape
its function
its developmental origins
its species or clade
its evolutionary history
Cajal 1915, “Accept the view that nothing in nature is useless, even from the human point of view.”
http://www.boloncol.com/images/stories/boletin19/cajal16.jpg
59. Ontologies aid candidate gene identification for undiagnosed diseases
Human Disease: PFEIFFER SYNDROME
Most similar mouse model: CD1.Cg-Fgfr2tm4Lni/H
Mouse phenotypes:
shortened head MP:0000435
malocclusion MP:0000120
ocular hypertelorism MP:0001300
short maxilla MP:0000097
premature suture closure MP:0000081
Human phenotypes:
Brachyturricephaly HP:0000244
Hypoplasia of the maxilla HP:0000327
Dental crowding HP:0000678
Hypertelorism HP:0000316
Coronal craniosynostosis HP:0004440
Cross-species Phenotype:
premature suture closure
maxilla hypoplasia
malocclusion
shortened head
ocular hypertelorism
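The cross-species comparison on this slide can be sketched very roughly as set overlap. This is only an illustration, not the method behind the slide: the term IDs are copied from the slide, but the pairing of each ID with a shared cross-species concept is an assumption inferred from the slide layout, and plain Jaccard overlap is swapped in for whatever semantic similarity measure was actually used. The names HUMAN_TO_CONCEPT, MOUSE_TO_CONCEPT and jaccard are hypothetical.

```python
# Term IDs are copied from the slide. The ID-to-concept pairing below is an
# assumption made for illustration, not a curated ontology mapping.
HUMAN_TO_CONCEPT = {
    "HP:0000244": "shortened head",
    "HP:0000327": "maxilla hypoplasia",
    "HP:0000678": "malocclusion",
    "HP:0000316": "ocular hypertelorism",
    "HP:0004440": "premature suture closure",
}
MOUSE_TO_CONCEPT = {
    "MP:0000435": "shortened head",
    "MP:0000097": "maxilla hypoplasia",
    "MP:0000120": "malocclusion",
    "MP:0001300": "ocular hypertelorism",
    "MP:0000081": "premature suture closure",
}


def jaccard(human_terms, mouse_terms):
    """Share of cross-species concepts that both phenotype profiles have in common."""
    human = {HUMAN_TO_CONCEPT[t] for t in human_terms if t in HUMAN_TO_CONCEPT}
    mouse = {MOUSE_TO_CONCEPT[t] for t in mouse_terms if t in MOUSE_TO_CONCEPT}
    union = human | mouse
    # Two empty profiles share nothing, so score them 0.0 rather than divide by zero.
    return len(human & mouse) / len(union) if union else 0.0
```

Scoring a patient profile against every mouse model this way and ranking the results gives a crude version of the "most similar mouse model" lookup; the gene altered in the top-ranked model then becomes a candidate for the human disease.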
64. What to do with data?
Storage
Back up in multiple locations:
Local hard drive
Removable storage
Shared Network
Cloud server
Versioning
File name versioning
Dropbox
Version control software:
CVS
SVN
Git
Publication
Data sharing repositories:
Local repository
Domain specific
Generic public repository
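The storage and file name versioning advice above can be sketched as a short script. This is a minimal sketch, assuming the backup locations are ordinary folders (a local drive, a mounted removable drive or a network share); the "_v001" naming pattern and the function names are illustrative choices, not a convention taken from the slides.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum used to confirm that a copy matches the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def versioned_name(path: Path, version: int) -> str:
    """Build a versioned file name, e.g. results.csv -> results_v003.csv."""
    return f"{path.stem}_v{version:03d}{path.suffix}"


def back_up(source: Path, destinations: list[Path], version: int) -> list[Path]:
    """Copy source into every destination folder under a versioned name."""
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / versioned_name(source, version)
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

For anything beyond a handful of files, version control software such as Git handles the versioning side more reliably than file names do; the checksum comparison is still a useful habit for backups.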
70. Thinking Beyond the PDF
Raw Science Small publications Self-publishing
Datasets
Code
Experimental design
Argument or passage
Blogging
Microblogging
Comments & Reviews
Annotations
Single figure publications
Nanopublications
MH: A grass roots effort to accelerate the pace and nature of scholarly communications and e-scholarship through technology, education, and community
Why 11? We were born in 2011
MH: Force11 is comprised of a diversity of participants to best aid in the redefinition of scholarly communication
MH: (Un)conference where stakeholders came together as equals to discuss issues
Incubator for change
What would you do to change scholarly communication if you had $1K?
M. Haendel award winner
MH: why we are here today, and how all of you can help.
JW: put the slide 2 here perhaps? This still has too much text; slide 2 is much less intimidating.
JW
RC: The traditional model of scientific communication is fairly straightforward. Successful research is shared via presentations and papers, after data is collected and analyzed.
RC: This model is slow, even when considered within the context of electronic journals. A recent study clocked the average timeframe from submission to publication for biomedical journals at over 9 months: http://www.openaccesspublishing.org/2013/09/06/the-publishing-delay-in-scholarly-peer-reviewed-journals/.
RC: The traditional model is also very formalized in respect to when in the research cycle the science is shared (well after the study has taken place), how it is disseminated (peer-reviewed articles), to whom one is communicating (most often scientists in your specialized field) and how impact is measured (citation counts to articles).
RC: Finally, it is unilateral in that it doesn’t faciliate dynamic, real-time, interaction between scientists outside of the society meeting or conference. Nor does it further coversation between scientists and the public.
The Internet has had a profound affect on science and scientific communication, wherein the traditional model I just described is being reimainged, ulitmately in the perusuit of advancing the scientific process. But the traditional model of scholarly communication stills dominates how many scientists manage and share their research and data.
RC: The volume of literature has exploded since the first online journals were launched in the late 1980s. Today, virtually all science journals are online. There are over 28,000 active peer-reviewed journals, publishing nearly 2 million articles per year, with a new paper published every 20 seconds. This is a huge industry, with revenues of about 10 billion/year.
50% of new research is freely available online either immediately or within 12 months of publication. But the other 50% lives behind high paywalls, limiting the scope of science available to potential readers (human and computer; scientists and the public).
Infographic: http://www.sciencemag.org/site/special/scicomm/infographic.jpg
RC: We have also seen a proliferation of new publishing modes and models. This includes a variety of open access publishers and journals with new peer-review models, such as open and post-publication peer review, and new economic models, wherein authors, funders, and libraries are sharing the cost of publication. New modes include but are not limited to self-publication and social media, such as science blogs and Twitter, and data sharing via public repositories.
Communication is also occurring at more points across the research cycle. For instance, ideas are shared and developed via online conversations on blogs and Twitter. Code and data are being released as they are built and recorded via open lab notebooks. This activity complements and feeds the traditional products of research – papers and presentations.
RC: However, if scientists don’t thoughtfully and actively manage their research products in this new system, the advantages are minimized and all this new stuff becomes noise.
Data. Complex…
JW
JW
JW
JW
JW: This is raw crystallography data, collected at OHSU. This is visual, high-resolution data.
The image then gets integrated along the spots, transforming the image into a series of mathematical values.
The “model” we are used to seeing is actually a mathematical representation of how well a model (the sticks) fits into the mathematical distillation of the raw image. This looks static, but is actually a best representation along one axis of the data (which is to say, confidence levels).
Crystallography boils down to solving the “phase problem”, which can be done two ways: by brute force (holy hell!), or by using an existing model as a jumping-off point. The latter is the fastest and most efficient way of solving structures, and is, in fact, what I did to solve this structure. I got the previously published data from pdb.org, which is also where I deposited my data.
The point of this is threefold: 1) data comes in many shapes and forms, 2) data transforms, and 3) data helps inform more data.
Ask them to think about what type of data they deal with/generate. Give a couple minutes.
Ask if they have additional data types that they brainstormed
JW: yes, need this slide if we are to cover the examples listed later. Also, we are eventually getting to alt metrics, which means the third quadrant; therefore, important to cover here.
JW
JW
Add metadata not only to your experimental results, but also your process during research, such as resources, protocols, etc.
Ways to apply metadata to every moving part of your research
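One lightweight way to attach metadata to both results and process is a "sidecar" file stored next to each data file. The sketch below is only an illustration under assumptions: the `.meta.json` suffix and the field names (protocol, reagents, instrument) are hypothetical and not a standard named in this talk.

```python
import json
from pathlib import Path


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata as '<data file name>.meta.json' beside the data file."""
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar


# Hypothetical example: describe the process, not only the result.
example_metadata = {
    "description": "Western blot, replicate 1",
    "protocol": "protocols/western_blot_v2.txt",   # assumed local path
    "reagents": ["primary antibody (see reagent log)"],
    "instrument": "example imager",
    "collected_on": "2014-01-31",
}
```

Calling `write_sidecar(Path("blot_01.tif"), example_metadata)` would produce `blot_01.tif.meta.json` in the same folder, so the description travels with the file when it is copied or deposited.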
JW
NV: The literature was the place we would go to find protocols, information about techniques, and resources/reagents.
Assuming you got to your relevant paper, look at the methods section: is there enough info there for you to be able to reuse/reproduce the experiment/technique?
NV: For example, if you look in the materials and methods section for an antibody used in a western blot, oftentimes the name is reported, along with the vendor and vendor’s location.
Say here that the authors met the journal standards, but that they really aren’t sufficient.
NV: However, there are often several antibodies generated against one target, so how do you know which one works in this assay?
Need to report catalog numbers…
NV: Alternatively, report the Antibody Registry (AR) ID.
A permanent identifier that stays with the Ab, even as it changes vendors or its catalog # changes.
Similar to GenBank, but for antibodies.
Most resources can be reported more specifically than publisher guidelines require; those guidelines are not intended to support reproducibility.
NV: An area with poor data standards shows poor reproducibility. Here we showed how irreproducible many studies were simply due to lack of specificity in the resources used in the experiments. We therefore developed guidelines to support resource reporting, and these are now in effect in a number of journals, with more to come.
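The reporting gap described above (name and vendor given, catalog number or registry ID missing) can be made concrete with a small record check. This is a sketch under assumptions: the class, its field names, and the placeholder values are hypothetical, and the rule "catalog number or registry ID" is one reading of the notes, not the published guideline itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AntibodyReport:
    """What a methods section says about one antibody (hypothetical structure)."""
    target: str
    vendor: str
    vendor_location: str
    catalog_number: Optional[str] = None
    registry_id: Optional[str] = None  # permanent identifier, if one was reported

    def missing_identifiers(self) -> List[str]:
        """List the identifying fields that were left out of the report."""
        missing = []
        if not self.catalog_number:
            missing.append("catalog_number")
        if not self.registry_id:
            missing.append("registry_id")
        return missing

    def is_uniquely_identifiable(self) -> bool:
        """Name plus vendor is not enough; require at least one identifier."""
        return bool(self.catalog_number or self.registry_id)
```

A report built only from name, vendor, and location, as in the example on the slide, fails `is_uniquely_identifiable()`, while one that also carries a catalog number or registry ID passes.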
OHSU participates in the Reproducibility Initiative, aimed at developing policies and tools to aid scientific reproducibility.
Some bioinformatics tools to aid reproducibility are Workflow4Ever and RunMyCode.org.
Outcomes from data standards: Reproducibility and data reuse
Place urls in separate document, not on slide
www.scienceexchange.com/reproducibility
www.wf4ever-project.org
runmycode.org
Bioinformatics workflow standards such as Workflow4Ever and RunMyCode have been developed to help with standardization and sharing of scientific workflows and code.
Workflow4Ever
RunMyCode is a repository where people can share or reuse code that is associated with scientific publications.
For data manipulations, here is an example of tools that can help with reproducibility.
MH: Yes 28.0%, No 26.9%, I don't know 45.1%
175 respondents answered the question
MH
Put URL in supporting document. Too distracting here.
http://www.usgs.gov/datamanagement/plan/datastandards.php
MH: each type serves a different purpose:
Reporting guidelines serve to ensure that a minimum of metadata is reported, so that someone else can know what your data is about.
Terminology artifacts allow some of the data to be structured for reuse and interoperability. Think of these as interoperability handles.
Exchange formats provide the syntax for the data structure, and further enable data integration and mashup.
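The three types of standard can be illustrated together on one toy record. Everything specific here is an assumption made for illustration: the required-field checklist stands in for a reporting guideline, the `EX:` identifiers stand in for a terminology artifact, and JSON stands in for an exchange format. None of these are real community standards.

```python
import json
from typing import Dict, List

# Hypothetical minimum checklist standing in for a reporting guideline.
REQUIRED_FIELDS = ("organism", "tissue", "assay")

# Hypothetical controlled vocabulary standing in for a terminology artifact.
TERM_IDS = {"mouse": "EX:0001", "liver": "EX:0002", "western blot": "EX:0003"}


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Reporting guideline: which minimum fields are absent or empty?"""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def annotate(record: Dict[str, str]) -> Dict[str, Dict[str, object]]:
    """Terminology artifact: pair each free-text value with a term identifier."""
    return {
        field: {"label": value, "term_id": TERM_IDS.get(value.strip().lower())}
        for field, value in record.items()
    }


def export(record: Dict[str, str]) -> str:
    """Exchange format: serialize the annotated record in a shared syntax."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("Missing required fields: " + ", ".join(missing))
    return json.dumps(annotate(record), indent=2, sort_keys=True)
```

The term identifiers are the "interoperability handles": two labs that write "Mouse" and "mouse" end up with the same identifier, so their records can be merged.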
MH: which one to use? Need a solution to help identify the right standard, contribute to and/or extend existing ones to best support community reproducibility and reuse
MH: Both of these resources provide a survey of data standards of all three types:
Reporting Guidelines
Terminology Artifacts (includes ontologies)
Exchange Formats
BioSharing has a biology focus; CDISC has a clinical focus.
There are others; these are just two resources.
Takeaway: there are different standards, and no standard meets everyone’s needs.
NV: this is transition back to melissa
MH: Reusing data is not as easy as dumpster diving. You don’t always know that a coke can or a keyboard key can be a critical data element.
JW: Oh. My. God.
MH:
Slide from Chris Mungall
Ontologies provide the handle by which data from different databases and of different types can be linked and integrated for maximal biological knowledge
Do we need this slide?
JW: Maybe not IN the deck, but at the back. If somebody asks what an ontology is during the Q&A, we can bring it up. I did this all the time for my seminars – always have extra slides at the back end for potential questions.
MH: ontologies, unlike a file system, allow data to be classified in many different ways using logic and standardized identifiers
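The contrast with a file system can be shown with a toy example: a file sits in one folder, whereas an ontology term can have several parents at once. The terms and `EX:` identifiers below are made up for illustration and do not come from any real ontology.

```python
from typing import Dict, Set

# Hypothetical "is a" relations; note that one term has two parents.
IS_A: Dict[str, Set[str]] = {
    "EX:hepatocyte": {"EX:epithelial_cell", "EX:liver_cell"},
    "EX:epithelial_cell": {"EX:cell"},
    "EX:liver_cell": {"EX:cell"},
    "EX:cell": set(),
}


def ancestors(term: str, is_a: Dict[str, Set[str]] = IS_A) -> Set[str]:
    """Collect every class a term belongs to by following all parent links."""
    found: Set[str] = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, ()))
    return found
```

Data tagged with `EX:hepatocyte` is therefore retrievable by a query for epithelial cells and by a query for liver cells, which a single folder hierarchy cannot offer.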
MH:
When data is encoded using ontologies, it can allow mashup in novel ways. Here, we are using clinical phenotype data and comparing it with model organism phenotype data to identify candidate genes for undiagnosed human diseases.
JW: Please let me clean up the original image. The pixelated borders are driving me nuts, and the human head has some white pixels that can very easily and quickly be cleaned up!
MH: those pesky data sharing mandates, what are they really for?
Does dumping my data into a data repository with no metadata or use of standards really help?
Answer: no, it doesn’t. If you want your data to be a first-class citizen as a scholarly product that can be cited and actually be reused, then you need to go a bit further.
Need to add links to policies
Transition- how can I meet data sharing requirements, and actually make my data reusable?
ANSWER: Just like any experiment or quality statistical approach, you need to plan ahead.
There are tools to help. The library can help too.
FigShare
Dryad
Data.gov
MH: add link
Want people to come to library to help with archiving/data publication
Where can you keep your data? Does it have sensitive info? Yes/no
Does it need to be archived?
Make decision tree for one on one meetings
What does this mean? It means storing or performing analyses on (many times) unsecured shared servers that may exist anywhere in the world.
Why should you care? Tools like Dropbox and Google Docs are research efficiency lifesavers, but come with an IP risk as well as a risk of sharing PHI data.
Similarly, Amazon cloud servers and genomics data analysis platforms are all too easy to set up or use, and can lead to PHI data being leaked.
MH:
Example: DOIs for publications, data, or other research product
doi: 10.1371/journal.pbio.1001339
A URI will resolve to a single location on the web
URIs for people
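As a small illustration of how a DOI becomes a resolvable web address, the sketch below turns the DOI on this slide into a URL through the public doi.org resolver. The function name and the input clean-up rules are assumptions made for this example.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn 'doi: 10.xxxx/suffix' or a bare DOI into a resolver URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not cleaned.startswith("10."):
        raise ValueError(f"Not a DOI: {doi!r}")
    # Keep the slash between prefix and suffix; escape anything unusual.
    return "https://doi.org/" + quote(cleaned, safe="/")


print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
```

The resolver redirects the resulting URL to wherever the publisher currently hosts the item, which is what keeps the citation stable when the hosting location changes.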
RC
Scientific output and potential impact is more complex, dynamic, and diverse than peer-reviewed papers. Actively managing your research footprint – which includes of course your data – can positively affect your scientific impact.
MH – I updated a bit..
MELISSA
Robin add better title? Needs cleanup still
Grab info for NIH
Melissa also talk about NSF biosketch and how everything you create speaks to you as a scientist- make it citable!
End with your scholarly footprint – lead into breakouts
JW
JW
MH: Should add links to libguide, library pages etc.