A Linked Data Approach to Interoperability between Biomedical Resource Invento...Trish Whetzel
Overview of Resource Representation Coordination efforts to coordinate the representation of resources from Biositemaps, eagle-i, and the Neuroscience Information Framework.
Horizon 2020: Outline of a Pilot for Open Research Data LIBER Europe
The European Commission is developing an Open Data Pilot. This pilot will look at research data generated in projects funded under the Horizon 2020 framework, with the aim of stimulating the data-sharing culture among researchers and facilitating both the re-use of information and data-driven science.
As organisations with a strong interest in Open Data, OpenAIRE, LIBER and COAR have assessed the current situation and made recommendations for an effective Open Data Pilot.
Presentation by Sally Rumsey, The Bodleian Libraries, University of Oxford at Science and Engineering South (SES) Event - Helping Researchers Manage their Data - Friday 9th May 2014 held at Imperial College London
This presentation was provided by Mark Laufersweiler of the University of Oklahoma, during part one of the NISO two-part webinar "Labor and Capacity for Research Data Management," which was held on March 11, 2020.
Monica Omodei RDAP11 Policy-based Data Management; ASIS&T
Monica Omodei, Australian National Data Service; Policy-based Data Management; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Overview of the world of geospatial metadata, and the role of the EDINA service GoGeo in creating, saving, and discovering it. Presented on 19 June 2014 by Tony Mathys in Aberdeen, Scotland.
Challenges in Enabling Mixed Media Scholarly Research with Multi-Media Data i...roelandordelman.nl
Presentation at the Digital Humanities 2018 Conference, Mexico City, on the development of the Media Suite, an online research environment that facilitates scholarly research using large multimedia collections maintained at archives, libraries and knowledge institutions. The Media Suite unlocks the data on the collection level, item level, and segment level, provides tools that are aligned with the scholarly primitives (discovery, annotation, comparison, linking), and has a 'workspace' for storing personal mixed media collections and annotations and for running advanced analyses using Jupyter Notebooks and NLP tools.
See the notes for the narrative that goes with the slides.
An analysis and characterization of DMPs in NSF proposals from the University...Megan O'Donnell
Beginning in July 2011, the University of Illinois at Urbana-Champaign Library, working in conjunction with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of Data Management Plans (DMPs) in newly submitted National Science Foundation (NSF) grant proposals. The DMP became a required element in all NSF proposals beginning on January 18, 2011. This analysis was undertaken to provide the Illinois campus and library with detailed information on the DMPs being submitted by Illinois researchers. In particular, the analysis allows us to categorize the grant applicants’ proposed DMP data storage venues and data reuse mechanisms, and provides us with data on the use of DMP templates developed by the University of Illinois Library.
RDAP 15: Research Data Integration in the Purdue Libraries ASIS&T
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Lisa Zilinski, Data Specialist, Carnegie Mellon University
Amy Barton, Metadata Specialist, Purdue
Tao Zhang, Digital User Experience Specialist, Purdue
Line Pouchard, Computational Science Information Specialist, Purdue
Pete E. Pascuzzi, Molecular Biosciences Information Specialist, Purdue
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Research Data Management in GLAM: Managing Data for Cultural Heritage Sarah Anna Stewart
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
Overview of the UKRDDS pilot project at the University of Edinburgh, which employed PhD interns to validate metadata about research data created by University of Edinburgh researchers and held in local RDM service solutions. This was presented at IASSIST in June 2016, Bergen, Norway.
This presentation was provided by Andrew K. Pace of OCLC, during the 13th Annual NISO-BISG forum "Interoperability: From Silos to An Ecosystem," held on June 24, 2020.
In order to be reused, research data must be discoverable.
The EPSRC Research Data Expectations require research organisations to maintain a data catalogue to record metadata about research data generated by EPSRC-funded research projects.
Universities are increasingly making research data assets available through repositories or other data portals.
The requirement for a UK research data discovery service has grown as universities become more involved in RDM and capacity develops.
This presentation starts with basic information about the Social Science Data Archives, then introduces the complexity and diversity of the research data field. Participants can learn about the Open Data project in Slovenia, the research lifecycle and the research data lifecycle. It concludes with roles and responsibilities in the research data lifecycle.
The event was one of the Foster Cessda trainings for doctoral students.
Videos: http://videolectures.net/adptecaj2015_ljubljana/
Related link: https://www.fosteropenscience.eu/event/cessda-research-data-management-open-data-doctoral-training-series-research-data-management
Open Science, Open Data: towards a new transparent and reproducible ecosystem LIBER Europe
Presented at the Preforma Open Source Workshop 8 April 2016
As a library membership organization, LIBER works on addressing Open Science barriers. Standardisation of file formats can really help in overcoming some of these barriers: it enables us to process and preserve data in a controlled way, it helps ensure that outputs are really open and accessible in the long term, and it improves interoperability of new tools and services. Making sure data is stored in a controlled way and can be (re)used today and in the future is an important element in Open Science. We see this as not only a technical challenge but also a social one: awareness, trust and community building are needed in order to ensure uptake of these standards. Libraries therefore have a valuable role to play in the development of good research data management throughout all phases of the Open Data lifecycle.
Stuart Macdonald talks about the Research Data Management programme at the University of Edinburgh Data Library, delivered at the ADP Workshop for Librarians: Open Research Data in Social Sciences and Humanities (ADP), Ljubljana, Slovenia, 18 June 2014
Presentation on behalf of the SA Weather Service presented during SA National Science Week - The harsh realities of climate change, 29 July to 2 August 2019.
Presented at a NeDICC (Network of Data and Information Curation Communities) meeting, 14 March 2019, CSIR, and at the University of Pretoria and the Carnegie Corporation of New York Capstone Conference, 24-29 March 2019, Kieviets Kroon.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to keep evolving, facilitated by institutional investment rotating out of offices amid the shift to work from home (“WFH”), and by the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow more than 3.6x by value by 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Workshop 4: Open Science & Open Data for Librarians/Ina Smith
1. Workshop 4:
Open Science & Open Data for Librarians
24 April 2018 14:00 – 17:30
XXIII SCECSAL Conference, Entebbe, Uganda
24 April 2018
2. Programme
14:00 – 14:30 Introduction to Open Science/Open Data
14:30 – 15:00 Data informing the library profession
15:00 – 15:40 Data in support of research
15:40 – 16:00 Health Break
16:00 – 17:00 Working with data – tools & applications
17:00 – 17:30 Towards a data strategy for your library & institution
4. Data Stakeholders
• Governments (policy)
• Institutions (policy & strategy)
• Research Offices (reporting, impact)
• Researchers (collecting data in an ethical and
trusted way so that it can be re-used)
• Statisticians (processing, analysing and visualising
data)
• System engineers (to maintain a network and
allow for data to be digitally transmitted)
• Librarians (managing and organizing the data, and
making sure it is digitally preserved for the
foreseeable future)
5. Why Librarians as Data Partners?
• Information standards
• Organizational skills
• Setting up file structures (organizing
information)
• Knowledge of workflows
• Knowledge of collection management
• Describing data using established metadata
schemes & controlled vocabulary
• Collection curation/preservation
6. Role of Librarians
• Advocate for transparency, openness in research,
access to data
• Initiate conversation on Open Science/Open Data
policy & strategy, and implement it
• Develop own data skills (data skills but also
informed on copyright, licensing, citation)
• Increase visibility of research data
• Manage & register trusted data repositories
• Recommend trusted data repositories
• Promote & support proper research data
management planning among researchers
7. Data Skills for Librarians (1)
• Data terminology
• Unix-style command line interface, allowing librarians to
efficiently work with directories and files, and find and manipulate
data
• Cleaning and enhancing data in OpenRefine and spreadsheets
• Git version control system and the GitHub collaboration tool
• Web scraping and extracting data from websites
• Scientific writing in useful, powerful, and open mark-up
languages such as LaTeX, XML, and Markdown
• Formulating and managing citation data, publication lists, and
bibliographies in open formats such as BibTeX, JSON, XML and
using open source reference management tools such as JabRef
and Zotero
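As a rough illustration of moving citation data between the open formats named above, the sketch below turns one JSON record into a BibTeX entry. The record, its field names, and the `to_bibtex` helper are all invented for this example; real exports from JabRef or Zotero carry more fields and handle escaping that this sketch ignores.

```python
import json


def to_bibtex(record):
    """Render one citation record (a dict) as a BibTeX @article entry.

    Hypothetical helper: only a handful of fields are handled and no
    special characters are escaped.
    """
    fields = ["author", "title", "journal", "year"]
    lines = [f"@article{{{record['key']},"]
    for name in fields:
        if name in record:  # skip fields the record does not have
            lines.append(f"  {name} = {{{record[name]}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder record, invented for illustration only.
raw = '{"key": "example2018", "author": "Example, A.", "title": "A placeholder title", "year": "2018"}'
entry = to_bibtex(json.loads(raw))
print(entry)
```

Because both formats are plain text, the same record can be kept under version control and regenerated whenever the source data changes.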
8. Data Skills for Librarians (2)
• Transforming metadata documenting research outputs into open plain
text formats for easy reuse in research information systems in support of
funder compliance mandates and institutional reporting
• Scholarly identity with ORCID and managing reputation with ORCID-enabled
scholarly sharing platforms such as ScienceOpen
• Authorship, contributorship, and copyright ownership in collaborative
research projects
• Demonstrating best practices in attribution, acknowledgement, and
citation, particularly for non-traditional research outputs (software,
datasets)
• Identifying reputable Open Access publications and Open
Institutional/Open Data repositories
• Scholarly annotation and open peer review
• Investigating and managing copyright status of a work, and evaluating
conditions for Fair Use
10. Types of data
• Government data
• Communication data (mobile phones)
• Internet data
• Statistical data
• Research data (social & natural sciences)
• Discipline specific
• And more …
14. Open Science Defined
“Open Science is the practice of science in such a
way that others can collaborate and contribute,
where research data, lab notes and other
research processes are freely available, under
terms that enable reuse, redistribution and
reproduction of the research and its
underlying data and methods.” - FOSTER Project,
funded by the European Commission
17. Original Research Data Lifecycle image from University of California, Santa Cruz
http://guides.library.ucsc.edu/datamanagement/
[Figure: research data lifecycle, annotated with the labels Repositories, Tools, Plan, and Policy & Infrastructure]
21. Fears Researchers Experience
• Getting scooped
• Time & effort by researcher
• Someone else finding a path-breaking application
of the data that researcher hasn’t considered
• Fear of problems/errors in the measurement
process being exposed
• Confidentiality/privacy of respondents - ethics
clearance
• Intellectual Property Rights – signed away, little
understanding, no IP in place
22. • When should research data be open?
• When should research data be closed?
30. Data Repositories vs Social Media
• Social media sites/3rd party software:
• Connect researchers sharing interests
• Marketing data
• Sites, and the data on them, belong to third parties
• Repository:
• Supports export/harvesting of metadata
• Offers long-term preservation
• Non-profit – no advertisements
• Uses open standards and protocols
• Copyright
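One widely used open protocol for metadata harvesting is OAI-PMH. The slide does not name a protocol, so treating it as the example here is an assumption. The sketch below only builds a harvesting request URL; the base URL is a placeholder and no network call is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optionally restrict the harvest to one set (collection).
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real repository.
url = list_records_url("https://repository.example.org/oai")
print(url)
```

A harvester would fetch this URL and parse the XML response. A social media site typically offers no comparable standard export.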
32. Register Data Initiatives
• re3data.org
https://www.re3data.org/
• Open Data Barometer
https://opendatabarometer.org/
• Global Open Data Index
https://index.okfn.org/
• African Open Science Platform
http://africanopenscience.org.za/
• Dataverse … and more …
38. Working with Data
• Using R, Python, ggplot and more …
• Collection e.g. Survey
• Normalisation & Cleaning e.g. OpenRefine
• Analysis
• Visualisation
• Preservation
• Mining
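The normalisation and cleaning step can be sketched in a few lines of Python. This is not how OpenRefine works internally. It is a minimal stand-in that collapses whitespace, drops empty answers, and removes duplicates that differ only in letter case. The survey responses are invented for illustration.

```python
def clean_values(values):
    """Collapse whitespace, drop blanks, and remove case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for value in values:
        text = " ".join(value.split())  # trim and collapse runs of whitespace
        if not text:
            continue  # skip empty responses
        key = text.casefold()  # compare without regard to letter case
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)  # keep the first spelling encountered
    return cleaned


# Invented survey responses with typical entry inconsistencies.
responses = ["  Uganda", "uganda ", "South  Africa", "", "Kenya", "SOUTH AFRICA"]
print(clean_values(responses))  # ['Uganda', 'South Africa', 'Kenya']
```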
42. Data Mining
• Set of methods to analyse data from various
dimensions and perspectives, finding previously
unknown hidden patterns, classifying and grouping
the data and summarizing the identified
relationships
The tasks of data mining are twofold:
• Predictive: use known features to predict
unknown or future values of the same or another
feature
• Descriptive: find interesting,
human-interpretable patterns that describe the
data
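The two tasks can be contrasted on a toy dataset. In the sketch below, the descriptive step groups records and summarises them, and the predictive step fits a straight line by least squares and extrapolates one year ahead. The numbers and field names are invented for illustration, and a straight-line fit is only one simple choice of predictive method.

```python
from collections import defaultdict
from statistics import mean

# Invented counts of datasets deposited per discipline and year.
records = [
    {"discipline": "health", "year": 2015, "datasets": 2},
    {"discipline": "health", "year": 2016, "datasets": 4},
    {"discipline": "health", "year": 2017, "datasets": 6},
    {"discipline": "agriculture", "year": 2015, "datasets": 1},
    {"discipline": "agriculture", "year": 2016, "datasets": 1},
    {"discipline": "agriculture", "year": 2017, "datasets": 4},
]


def describe(rows):
    """Descriptive task: average deposits per discipline."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["discipline"]].append(row["datasets"])
    return {name: mean(values) for name, values in groups.items()}


def fit_line(xs, ys):
    """Predictive task: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


summary = describe(records)
health = [r for r in records if r["discipline"] == "health"]
slope, intercept = fit_line(
    [r["year"] for r in health], [r["datasets"] for r in health]
)
prediction = slope * 2018 + intercept  # extrapolate one year past the data

print(summary)
print(prediction)
```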
46. Self- & Lifelong Learning
• Bachelor of Science in Data Science, Sol Plaatje University (South Africa)
• Coursera Data Science
• Coursera Research Data Management and Sharing
• Foster Open Science Courses
• Masters Program in Biodiversity Informatics, Prof Jean Ganglo, University of Abomey-Calavi (Benin)
• MANTRA for Researchers
• MANTRA for Librarians
• Agricultural Information Management Standards (AIMS)
• Author Carpentry
• Data Carpentry
• Library Carpentry
• WDS Training Resources
• UCT eResearch
59. Awareness – start the conversation
• To begin ….
• What data repositories? Which data type? Which
metadata standards?
• Data web page
• Market services re data support
• Meet with stakeholders at institution
• Form a committee to implement strategy, policy,
etc.
• Implement Research Data Management Plans
• Implement Institutional Data Repository
61. Thank you
Ina Smith
Project Manager, African Open Science Platform Project, Academy of
Science of South Africa (ASSAf)
ina@assaf.org.za
Susan Veldsman
Director, Scholarly Publishing Programme, Academy of Science of
South Africa (ASSAf)
susan@assaf.org.za
Visit http://africanopenscience.org.za