This presentation was provided by Stephanie Labou of the University of California, San Diego, during part two of the NISO two-part webinar "Building Data Science Skills: Strategic Support for the Work, Part Two," which was held on March 18, 2020.
This presentation was provided by Mark Laufersweiler of the University of Oklahoma, during part one of the NISO two-part webinar "Labor and Capacity for Research Data Management," which was held on March 11, 2020.
The document summarizes the role and challenges of research data management (RDM) information professionals from the perspective of a library practitioner. It discusses how RDM professionals educate researchers on topics like data management planning and repositories, consult on issues like workflows and publishing, and curate data to ensure findability, understandability and reuse. However, navigating relationships with different university offices, building shared understanding of technical concepts, and managing expectations with limited resources present challenges. Key principles for RDM professionals include keeping researchers central, considering future data re-users, and contributing to communities of practice. Ongoing gaps include supporting restricted and large data as well as developing actionable policies and training new professionals.
This presentation was provided by Julie Goldman of Harvard University, during part two of the NISO two-part webinar "Building Data Science Skills: Strategic Support for the Work, Part Two," which was held on March 18, 2020.
This presentation was provided by Courtney R. Butler of the Federal Reserve Bank of Kansas City, during part two of the NISO two-part webinar "Building Data Science Skills: Strategic Support for the Work, Part Two," which was held on March 18, 2020.
This document discusses best practices for content delivery platforms to support artificial intelligence projects. It recommends that platforms (1) accept that they do not have all the data needed and should integrate third-party sources, (2) provide consistent tagging of content, (3) offer a lightweight programmatic interface, (4) embrace allowing large amounts of content to be taken offline for analysis, and (5) enable complex filtering and selection of data. The document also suggests platforms could consider offering preprocessed datasets or AI tools as new products.
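Two of the recommendations above — a lightweight programmatic interface and complex filtering — can be pictured with a minimal sketch. The endpoint URL and every parameter name below are hypothetical, invented purely for illustration; a real platform would define its own API surface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only -- not a real service.
BASE_URL = "https://platform.example.org/api/v1/content"

def build_query(subject=None, year_from=None, year_to=None,
                content_type=None, page_size=100):
    """Compose a filtered content query as a plain URL string,
    the kind of lightweight interface the document recommends."""
    params = {"page_size": page_size}
    if subject:
        params["subject"] = subject
    if year_from:
        params["year_from"] = year_from
    if year_to:
        params["year_to"] = year_to
    if content_type:
        params["type"] = content_type
    return f"{BASE_URL}?{urlencode(params)}"

url = build_query(subject="oceanography", year_from=2010,
                  year_to=2020, content_type="article")
print(url)
```

The point of the sketch is that a consumer can express a complex selection (subject, date range, content type) in one plain HTTP request, which is what makes bulk offline analysis practical.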
This presentation was provided by Carolyn Hansen of the University of Cincinnati during the NISO Training Thursday event, Metadata and the IR, held on Thursday, February 23, 2017.
Slides | Research data literacy and the library (Colleen DeLory)
Slides from the Dec. 8, 2016 Library Connect webinar "Research data literacy and the library" with Sarah Wright, Christian Lauersen and Anita de Waard. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=226043
With big data research all the rage, how are librarians being asked to engage with data? As big data research takes off across Business, Science, and the Humanities, librarians need to understand big data and the issues around its storage and curation. How can it be made accessible? What tools and resources are required to use and analyze big data? In this webinar, panelists Caroline Muglia and Jill Parchuck share how big data is being used on their campuses and how they, as librarians, are supporting the sourcing and storage of this data.
Big Data & DS Analytics for PAARL aims to help library participants relate Big Data and Data Science applications to library services. The speaker discusses Big Data concepts like the 3 V's of volume, velocity, and variety. Library data resources and analytics challenges are presented. Opportunities for libraries in Big Data include expertise in metadata, assessment, and collaboration. Building a Big Data culture requires openness, investment, training, and data sharing standards. Data governance differs from data management. Machine learning and social listening are explored as examples. Trends in data science domains and tools are shared.
The liaison librarian: connecting with the qualitative research lifecycle (Celia Emmelhainz)
A discussion of user needs in anthropology and ways in which academic liaison librarians could support the lifecycle of qualitative research in a holistic way.
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Jared Lyle, ICPSR
Jennifer Doty, Emory University
Joel Herndon, Duke University
Libbie Stephenson, University of California, Los Angeles
Slides | Targeting the librarian’s role in research services (Library_Connect)
Slides from the Nov. 8, 2016 Library Connect webinar "Targeting the librarian’s role in research services" with Nina Exner, Amanda Horsman and Mark Reed. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=223121
February 18, 2014 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Capacity Building: Leveraging existing library networks to take on research data
Heidi Imker, Director of the Research Data Service, University of Illinois at Urbana-Champaign
This presentation was provided by Rebecca Springer of Ithaka S+R, during part one of the NISO two-part webinar "Labor and Capacity for Research Data Management," which was held on March 11, 2020.
Rscd 2017 BoF: Data lifecycle data skills for libs (SusanMRob)
This document discusses the data skills required of librarians and presents a matrix of factors that influence these skills, including the librarian's role, the data lifecycle services provided by the library, and the research intensity of the institution. It notes the wide range of possible data-related skills and acknowledges that no individual can master all of them, emphasizing the need for librarians to work as a team with complementary skills. The document also examines questions around how librarians can become more involved in data science and what their future roles may be in supporting data-intensive research.
This document summarizes research data support services at Tufts University. It discusses the context at Tufts including relevant support organizations. It describes collaborations between the libraries, technology services, and research centers to provide data management resources like the Tufts Data Lab, a data management team, and Carpentries data workshops. Ongoing work includes developing guidance on data storage, a centralized support website, and expanding the use of the Dataverse sharing platform.
SLIDES | 12 time-saving tips for research support (Library_Connect)
The document provides 25 tips for using various tools to work smart, work together, and stay up-to-date as a researcher. The tips include creating a document library, downloading and marking up documents, using an electronic lab notebook, joining a research ecosystem, setting alerts, following researchers, analyzing search results, and more. The overall message is that new tools can help researchers organize the growing amount of data, connect with collaborators, and maintain novelty in their work.
February 18, 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Building Best Practices in Research Data Management: Tisch Library’s Initiatives
Regina F. Raboin, Science Research and Instruction Librarian/Data Management Services Group Coordinator, Tisch Library, Tufts University
1) The document provides tips for good research data management (RDM), including file management, naming, versioning, formats, documentation, storage, and addressing common questions.
2) It emphasizes the importance of RDM for identifying, locating, understanding, and reusing data effectively, as well as satisfying funder requirements.
3) Good RDM practices such as consistent naming, versioning, and use of open formats make data more accessible for collaboration, analysis, and preservation.
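The naming and versioning practices summarized above can be made concrete with a tiny sketch. The file-name pattern below is one reasonable convention, not a standard prescribed by the presentation: project, a hyphenated description, an ISO date, and a zero-padded version number, all of which sort cleanly in a file listing.

```python
from datetime import date

def data_filename(project, description, version, ext="csv", when=None):
    """Build a consistent, sortable data file name of the form
    <project>_<description>_<YYYY-MM-DD>_v<NN>.<ext>.
    This pattern is an illustrative convention, not a mandated one."""
    when = when or date.today()
    desc = description.lower().replace(" ", "-")  # no spaces in names
    return f"{project}_{desc}_{when.isoformat()}_v{version:02d}.{ext}"

name = data_filename("birdsurvey", "site counts", 3,
                     when=date(2015, 2, 18))
print(name)  # birdsurvey_site-counts_2015-02-18_v03.csv
```

Encoding the date and version in the name itself keeps successive versions distinguishable even after files are copied out of their original folder structure.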
This presentation was provided by Carly Strasser of the Chan Zuckerberg Initiative during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee... (ASIS&T)
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Christie Wiley, University of Illinois Urbana-Champaign
This presentation was provided by Anne Washington of the University of Houston during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
Day in the life of a data librarian [presentation for ANU 23Things group] (Jane Frazier)
This document summarizes the job responsibilities and career path of a data librarian. It describes how the librarian draws on skills from traditional librarianship, metadata work, digital curation, software development and research to support data management and sharing. The librarian's current role involves developing metadata standards, providing training and consultancy to researchers, and engaging with colleagues both within and outside their organization to improve data services. The document suggests aspiring data librarians learn new technologies, describe their skills to potential employers, and stay active developing their expertise through conferences and online resources.
This document provides an overview of data librarianship presented by Kimberly Silk. It defines data librarianship and the role of data librarians in supporting data management, metadata, and teaching data use. The presentation covers basic data terminology, common data sources like government surveys and international organizations, challenges around big and open data, tools for data analysis and discovery like Dataverse, and examples of data visualizations.
This presentation was delivered as part of a Digital Humanities workshop in Medieval Studies at the University of Toronto. Its aim was to engage with digital humanists in the area of data management and start a conversation about what good data management means (from collection to preservation). Included is a data management checklist for DH projects.
How to crack Big Data and Data Science roles (UpXAcademy)
How to crack Big Data and Data Science roles is the flagship event of UpX Academy. This slide deck was used for the event on 10th Sept, which was attended by hundreds of participants globally.
Feb. 2016 Demystifying Digital Humanities - Workshop 3 (Paige Morgan)
Slides from Demystifying Digital Humanities Workshop 3: Data Wrangling: Programming on the Whiteboard -- taught at the University of Miami Libraries in February, 2016
1) The document discusses big data and data science, defining big data using the three Vs of volume, velocity, and variety to characterize high amounts of diverse data sources.
2) Data science is presented as a combination of techniques from fields like mathematics, computer science, and statistics to extract knowledge from data.
3) Successful data scientists require a diverse skillset that includes quantitative skills, technical skills, skepticism, collaboration, and knowledge from multiple disciplines.
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta... (Maninda Edirisooriya)
Exploratory Data Analytics (EDA) is a discipline concerned with data pre-processing, manual data summarization, and visualization; it is an early phase of data processing. This was one of the lectures of a full course taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
The document provides an overview of research data management and the importance of avoiding a "DATApocalypse" or data disaster. It discusses the definition of research data, why data management is important, questions to consider, best practices for data management planning, documentation, and long-term preservation. The goal is to help researchers and institutions properly manage data to enable sharing and preservation, as required by most major funders.
NYC Open Data Meetup: ThoughtWorks chief data scientist talk (Vivian S. Zhang)
This document summarizes a presentation on data science consulting. It discusses:
1) The Agile Analytics group at ThoughtWorks, which does data science consulting projects using probabilistic modeling, machine learning, and big data technologies.
2) Two case studies are described, including developing a machine learning model to improve matching of healthcare product data and using logistic regression for retail recommendation systems.
3) The origins and future of the field are discussed, noting that while not entirely new, data science has grown due to improvements in technology, programming languages, and libraries that have increased productivity and driven new career opportunities in the field.
On Friday, September 25th, Devin Hopps led us through a presentation introducing Big Data and how technology has evolved to harness its power.
The document discusses the growth of data and the field of data science. It begins by noting the large amounts of data being generated daily by various sources like web/e-commerce transactions, social networks, and scientific projects. It then discusses some of the challenges of big data including volume, velocity, and variety. The document provides an overview of the multidisciplinary nature of data science and the skills required of data scientists. It also summarizes different approaches to and job roles in data science.
This document provides an introduction to a course on big data analytics. It discusses the characteristics of big data, including large scale, variety of data types and formats, and fast data generation speeds. It defines big data as data that requires new techniques to manage and analyze due to its scale, diversity and complexity. The document outlines some of the key challenges in handling big data and introduces Hadoop and MapReduce as technologies for managing large datasets in a scalable way. It provides an overview of what topics will be covered in the course, including programming models for Hadoop, analytics tools, and state-of-the-art research on big data technologies and optimizations.
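The MapReduce programming model mentioned above can be sketched in miniature without Hadoop at all. The classic word-count example below simulates the three phases — map, shuffle, reduce — in plain Python; the real framework distributes exactly this pattern across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the grouped values -- here, sum the counts."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data needs new techniques",
        "data science needs big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["data"])  # 3
```

Because each map call touches only one document and each reduce call only one key, both phases parallelize naturally, which is what lets the model scale to data too large for a single machine.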
Many of us data science and business analytics practitioners perform research and analysis for decision makers on a regular basis. The deliverable of such analysis often results in a Power Point presentation, and/or a model that needs to be productionalized. The code used to produce the analysis also needs to be considered a deliverable.
Many of us perform analysis without reproducibility in mind. With the increasing democratization of data, it is becoming more and more important for people that may not have scientific training to be able to create analysis that can be picked up by somebody else who can then reproduce your results. That, and creating reproducible research is just solid science.
We are going to spend an evening walking through the various tools available to create reproducible research on Big Data. You will get introduced to the Tidyverse of R packages and how to use them. We will discuss the ins and outs of various notebook technologies like Jupyter and Zeppelin. You will have an opportunity to learn how to get up and running with R and Spark, and the various options you have to learn on real clusters instead of just your local environment. There will also be a quick introduction to source control and the various options you have around using Git.
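The reproducibility theme above comes down to one habit that applies in any language: pin every source of randomness. A minimal sketch (in Python rather than R, purely for illustration) shows a bootstrap estimate that yields the identical answer on every run because its random generator is explicitly seeded.

```python
import random

def bootstrap_mean(data, n_resamples=1000, seed=42):
    """Estimate the mean of `data` by bootstrap resampling.
    Seeding a private generator makes the whole analysis
    reproducible run-to-run and machine-to-machine."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)

data = [2.1, 3.5, 4.0, 2.8, 3.3]
first = bootstrap_mean(data)
second = bootstrap_mean(data)
print(first == second)  # True: same seed, same result
```

Recording the seed alongside the code and the input data is what lets someone else pick up an analysis and regenerate exactly the numbers in the deliverable.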
The theme of the evening will be “getting started”. We will go over various training resources and show you the optimal path to go from zero to master. Some commentary will be provided around the current state of the job market and intel from the front lines of the data science language wars. This is a large topic and the evening will be fairly dynamic and responsive to the needs of the audience.
Bob Wakefield has spent the better part of 16 years building data systems for many organizations across various industries. He has been running Hadoop in a lab environment for 3 years. He is the principal of Mass Street Analytics, LLC, a boutique data consultancy. Mass Street is a Hortonworks Consultant Partner and a Confluent Partner.
In his spare time, he likes to work on an equity investment application that combines various sources of information to automatically arrive at investing decisions. When he is not doing that, you’ll find him flying his A-10 simulator. Full CV can be found here: https://www.linkedin.com/in/bobwakefieldmba/
“Filling the digital preservation gap”: an update from the Jisc Research Data ... (Jenny Mitcham)
This document summarizes the findings of the Jisc Research Data Spring project at the University of York and Hull which investigated how Archivematica could be used to provide digital preservation for research data. The project tested Archivematica, explored how it handles different file formats and research data, and identified ways to improve Archivematica and integrate it into research data management workflows. The next phases will develop Archivematica further and implement proof of concepts at York and Hull to preserve research data using Archivematica.
The term "life cycle" refers to the series of stages or phases that an organism, system, or product goes through from its beginning to its end. It is a concept that can be applied to various contexts, such as biology, ecology, business, technology, and project management. Here are a few examples of life cycles:
Biological Life Cycle: In biology, the life cycle refers to the sequence of stages that an organism undergoes from birth to reproduction and eventually death. This can include processes like birth or germination, growth and development, reproduction, and death.
Product Life Cycle: The product life cycle describes the stages a product goes through from its introduction to the market until its eventual decline. These stages typically include introduction, growth, maturity, and decline. Companies monitor the product life cycle to make strategic decisions regarding marketing, production, and product development.
Project Life Cycle: The project life cycle outlines the stages involved in the management and execution of a project. These stages typically include initiation, planning, execution, monitoring and control, and closure. Each phase has specific activities and deliverables, ensuring that the project progresses in a structured and organized manner.
Ecological Life Cycle: Ecological life cycles refer to the stages that ecosystems or species go through over time. This can involve the growth and decline of populations, adaptation to environmental changes, and interactions within the ecosystem.
Human Life Cycle: The human life cycle encompasses the different stages of development and growth that individuals go through from birth to death. This includes infancy, childhood, adolescence, adulthood, and eventually old age.
Understanding life cycles is important as it provides insight into the processes and changes that occur within various systems. It allows for better planning, decision-making, and adaptation to ensure sustainable growth, effective management, and optimal utilization of resources throughout the life cycle.
Databases, Web Services and Tools For Systems Immunology (Yannick Pouliot)
This document provides an overview of databases, web services, tools, and computing resources needed for systems immunology. It discusses the importance of having a clear hypothesis, statistical understanding, large datasets from different levels of biology, software tools, programming expertise, and computing power. Specific databases, tools, and programming languages discussed include ImmPort, Stanford's HIMC database, MySQL, GenePattern, Galaxy, Weka, R, Perl, Python, and Amazon Cloud computing. The document provides recommendations and resources for learning statistics, data mining, programming languages, and using cloud computing resources.
Dorothea Salo gave a presentation on various "open" movements and how they relate to libraries. She discussed open source software, open standards, open access, open data, and open notebook science. For each topic, she explained what is being opened, how it is opened through things like licensing and standards, and why libraries should care about supporting these movements. The overall goals were to disambiguate jargon, explain her role in promoting open access, and suggest opportunities for libraries to participate in and support open initiatives.
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder, Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
This presentation was provided by Rebecca Benner, Ph.D., of the American Society of Anesthesiologists, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
This presentation was provided by Steph Pollock of The American Psychological Association’s Journals Program, and Damita Snow, of The American Society of Civil Engineers (ASCE), for the initial session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session One: 'Setting Expectations: a DEIA Primer,' was held June 6, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the closing segment of the NISO training series "AI & Prompt Design." Session Eight: Limitations and Potential Solutions, was held on May 23, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the seventh segment of the NISO training series "AI & Prompt Design." Session 7: Open Source Language Models, was held on May 16, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the sixth segment of the NISO training series "AI & Prompt Design." Session Six: Text Classification with LLMs, was held on May 9, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the fifth segment of the NISO training series "AI & Prompt Design." Session Five: Named Entity Recognition with LLMs, was held on May 2, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the fourth segment of the NISO training series "AI & Prompt Design." Session Four: Structured Data and Assistants, was held on April 25, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the third segment of the NISO training series "AI & Prompt Design." Session Three: Beginning Conversations, was held on April 18, 2024.
This presentation was provided by Kaveh Bazargan of River Valley Technologies, during the NISO webinar "Sustainability in Publishing." The event was held April 17, 2024.
This presentation was provided by Dana Compton of the American Society of Civil Engineers (ASCE), during the NISO webinar "Sustainability in Publishing." The event was held April 17, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, during the second segment of the NISO training series "AI & Prompt Design." Session Two: Large Language Models, was held on April 11, 2024.
This presentation was provided by Teresa Hazen of the University of Arizona, Geoff Morse of Northwestern University. and Ken Varnum of the University of Michigan, during the Spring ODI Conformance Statement Workshop for Libraries. This event was held on April 9, 2024
This presentation was provided by William Mattingly of the Smithsonian Institution, during the opening segment of the NISO training series "AI & Prompt Design." Session One: Introduction to Machine Learning, was held on April 4, 2024.
This presentation was provided by William Mattingly of the Smithsonian Institution, for the eight and final session of NISO's 2023 Training Series on Text and Data Mining. Session eight, "Building Data Driven Applications" was held on Thursday, December 7, 2023.
This presentation was provided by William Mattingly of the Smithsonian Institution, for the seventh session of NISO's 2023 Training Series on Text and Data Mining. Session seven, "Vector Databases and Semantic Searching" was held on Thursday, November 30, 2023.
This presentation was provided by William Mattingly of the Smithsonian Institution, for the sixth session of NISO's 2023 Training Series on Text and Data Mining. Session six, "Text Mining Techniques" was held on Thursday, November 16, 2023.
This presentation was provided by William Mattingly of the Smithsonian Institution, for the fifth session of NISO's 2023 Training Series on Text and Data Mining. Session five, "Text Processing for Library Data" was held on Thursday, November 9, 2023.
This presentation was provided by Todd Carpenter, Executive Director, during the NISO webinar on "Strategic Planning." The event was held virtually on November 8, 2023.
Labou "Data Science and the Library at UC San Diego"
1. DATA SCIENCE AND THE LIBRARY AT UC SAN DIEGO
STEPHANIE LABOU
DATA SCIENCE LIBRARIAN
MARCH 18, 2020
NISO WEBINAR
2. SOME CONTEXT
• University of California San Diego
• R1 university
• ~39,000 students
• Data Science Librarian = Data Librarian +
• I don’t have a library degree or any previous library experience
• But I do have lots of experience
3. TODAY’S TOPIC
“This roundtable discussion will focus on the ongoing need for information professionals to be well-versed in data science skills in order to successfully support the work of students, scholars and other professionals.”
“…additional tools or support are needed for information professionals as they extract, wrangle, analyze and present data?”
5. LET’S TALK SEMANTICS
• Data science = artificial intelligence (AI), deep learning, machine learning (ML), neural networks, high performance computing (HPC)
• Data science = data cleaning and manipulation, using code to automate data tasks, data at “big enough” scale
7. MY ROLE
• Questions about:
• 1) Looking for specific data about X
• What does data science – and other domains leveraging data science methodologies – need? Data!
• 2) Have data – now what?
• Makes up the vast majority of support
8. THE COMMON THREAD: COMPUTING
• Questions about using data in compute-heavy ways
• Reading in and formatting data in R/Python
• Working with non-traditional data formats
• API access, web scraping
• Access to additional resources for large (TB) datasets
• Data & GIS Lab
• Using other platforms related to coding, like GitHub, Jupyter
9. WHAT SKILLS DO I NEED FOR THIS?
• Data life cycle 101 (find, manage, analyze, preserve, etc.)
• For data science support, need knowledge of at least one programming language
• Concepts transfer between languages
• My path: self-taught!
• Cons: this is the long and rocky path
• Pros: forced early on to develop excellent problem-solving skills
10. DO WE ALL NEED TO LEARN “DATA SCIENCE”?
• In my opinion: no (but it depends)
• What are the support needs?
• Knowing “enough” goes a long way
• A handful of functions for a subset of topics (mostly data cleaning and manipulation in an automated platform) goes a long way
• More important to know where to find help, think through how to approach a problem
11. SO WHAT SHOULD WE DO?
• Skilling up existing employees
• Library Carpentry, etc.
• “Know just enough to be dangerous”
• Hiring non-library for new/adapted roles
• Aka, my experience
• In-the-field skillset is valuable; higher level of support
• Outsourcing – collaborating with other groups on campus
• IT, other computing groups
13. EXAMPLE PROJECTS
• Things we’ve done
• Python scripts to automate parts of metadata ingest into system
• OpenRefine for metadata cleaning
• What we’d like to do
• Automate scraping DataCite
• Perhaps APIs?
14. GUIDING PRINCIPLES
• Look for problems where data science methodologies could be the solution
• Could this manual process be automated? (coding)
• Could we better assess our metrics? (analytics)
• Could we better display this info for findability? (visualization)
• Not “fancy solution in search of a problem”
• Data science for the sake of data science is just more work for everyone
15. OTHER POPULAR TOPICS
• Collections as data
• Making existing collections more accessible for data science topics
• Text mining, natural language processing, etc.
• Data as collections
• Once again: What does data science need? Data!
• Data collections/guides as high value, high use
16. LESSONS LEARNED
• Adaptability/flexibility
• Software changes but best practices remain (and get better)
• This is a natural fit for the library!
• Building infrastructure today that will handle tomorrow’s needs
• Collaboration is key
• Within-library and campus partners
Hello, my name is Stephanie Labou and today I’m going to be talking with you about data science and the library at UC San Diego.
I want to start with some background information, since it will help contextualize my perspective.
I’m at UC San Diego, which is a large R1 university – meaning we are doctoral granting with very high research activity. We have a medical school and a business school and about 39,000 students.
UCSD had a Data Librarian for decades, but the position was recently rebranded as Data Science Librarian. I’ve been here almost two years and I’m the inaugural “data science librarian”.
I do want to mention: I don’t have a library degree and this job is my first job working in a library. But! I do have a master’s degree and before this job, I worked for 3 years as a data manager and research assistant with a large interdisciplinary environmental research group. So I have lots of experience working with data and – crucially – with scientific programming.
So this is today’s topic. I wanted to highlight a few phrases because they guided how I put together this talk. I want to focus on two in particular:
First, how can information professionals be well-versed in data science skills in order to fill a support role, and second, what tools do information professionals need to work with data moving forward?
I’ve split these into the two components – outward vs inward – because I think the skill sets are complementary, but not necessarily the same.
To start, I want to take a step back because all this is predicated on the term “data science”. So what is “data science”?
Well, it’s got a lot going on. You’ve maybe seen one of these data science Venn diagrams, and you can see that data science is a large, often poorly defined, multidisciplinary, and ever-evolving field.
https://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html
To be blunt, there’s a lot of hype about data science. I see two flavors of data science: the “hard core” stuff like AI and deep learning and the aspects of data science that are really computer science and engineering. And then there’s the rest of data science – which, to be clear, I love! – which I think of more as modern data literacy skills. This is being able to work with data. You may have heard the statistic that 80% of a data scientist’s work is data cleaning, and it’s true.
I don’t mind the term “data science” for these kinds of skills. I think they’re incredibly important and they are, in a real sense, the “science of data”. This is what I’m going to focus on in this talk.
I wanted to touch on this briefly because I also think the term “data science” has a lot of baggage. It can intimidate people. It can be stressful to think “oh no, now I’ve got to learn machine learning and AI and it seems like such a big leap”. And I think being clear about what we mean when we say “how are libraries using data science, what should librarians know about data science” is not just helpful, but necessary in terms of setting expectations and goals.
So let’s start by talking about data science as a support area – how can we support students and researchers.
For this, I’ll talk about my role specifically. What kinds of questions does the data science librarian get?
Well, a lot of them are about finding data. Because what do data science methodologies need? Data!
For the other type of question, it’s really about how do we extend the support we provide for data, to data science.
And the common thread here is computing.
These questions – coming from pretty much all disciplines on campus – are about using data in compute-heavy ways. They’re about working with data in R or Python, or working with non-traditional data formats like JSON or HTML or netCDF, which was mentioned in the webinar last week – formats that are new to people or disciplines used to working with spreadsheets. It’s questions about API use and web scraping – about accessing and leveraging the massive amounts of data that are available out there.
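As a minimal sketch of what that “non-traditional formats” support often looks like in practice, here is how a nested JSON payload (the kind an API returns) can be flattened into spreadsheet-style rows with just the standard library. The record names and fields below are illustrative, not from any real UCSD system.

```python
import json

# A hypothetical API response: nested JSON of the kind a spreadsheet
# user would find unfamiliar (names and fields are illustrative).
raw = '''
{"records": [
  {"id": "r1", "meta": {"title": "Kelp survey", "year": 2018}},
  {"id": "r2", "meta": {"title": "Plankton counts", "year": 2019}}
]}
'''

def flatten(payload: str) -> list[dict]:
    """Turn the nested records into flat rows, ready for a CSV or data frame."""
    data = json.loads(payload)
    return [
        {"id": rec["id"], "title": rec["meta"]["title"], "year": rec["meta"]["year"]}
        for rec in data["records"]
    ]

rows = flatten(raw)
print(rows[0]["title"])  # Kelp survey
```

The same flattening idea carries over to R (e.g., with jsonlite) – the concept, not the syntax, is what transfers.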
It also means that sometimes, a traditional laptop isn’t going to cut it. The library at UCSD has a Data & GIS Lab and I see it as filling the space between “I can do what I need on my laptop” and “I need a supercomputer”. This is for large – even TB – datasets and our computers have more memory and processing power than a laptop. So sometimes the “support” is providing hardware, as well as software and software help. It is this niche that’s not computer science, but is about reproducible research, research automation, scientific programming.
Which means I also spend a lot of time talking about best practices and how to get started with these things, because usually people aren’t coming at this with a background in computer science. Often, they’ve never taken a single programming class, or maybe they’ve taken a quick bootcamp.
Ok! So, considering that that’s my purview, what skills do I need to do this?
Obviously, I needed to have a strong grasp of the entire data life cycle.
It has also been crucial for me that I had deep knowledge of at least one programming language. I consider myself quite experienced with the software program R, and I know enough to be dangerous in Python and Stata. But, a lot of that additional platform knowledge entails a lot of Googling – I know what I want to do, and I understand the conceptual framework or order of operations to get there, in a programming sense, but I may not know the exact syntax. But I can Google that.
And I’m self-taught when it comes to programming. I took a stats class in grad school that used R, but everything else I learned on-the-job in my previous position. There’s nothing like getting thrown in the deep end to make you learn fast. Plus, learning project-by-project also meant that I picked up the skills that were most useful first, rather than starting from fundamentals. Whether this was a good thing is debatable – I know I’m missing some building blocks of basic computer science knowledge – but it rarely causes problems.
So, do we all need to learn data science to be able to provide data science support, in this sense?
Well, probably not, but it depends on what level of support you want to be able to provide, which in turn depends on what your patron needs are. We’re a large, STEM-heavy campus, with a data science institute and major, so it makes sense for us to be able to provide this deep level of support, which includes code support from someone who is not only conversant, but experienced with at least one programming language.
But, being conversant is often enough. The important thing is the conceptual framework of how to approach a problem. For instance, if I need to present data, how can I get data from format and structure of type X to type Y? Breaking this down into steps: first, I need to convert my character dates to date formats; then, I want to have columns of ABC, which are currently in row format; and so on. Thinking through the workflow is what helps the most. And this is something that comes from experience working in a programming language, but a little goes a long way. Long vs wide data, the concept of grouping data, etc. – a lot of basic data literacy, but in this programming/data science context.
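The two steps just described – character dates to real dates, then long data to wide – can be sketched in a few lines of plain Python. The site/measure data below is invented purely to illustrate the workflow; in practice this is the kind of thing pandas or R’s tidyverse does in one call.

```python
from datetime import datetime

# Long-format rows with character dates, as they might arrive from a
# fieldwork spreadsheet export (column names and values are illustrative).
long_rows = [
    {"site": "A", "date": "03/18/2020", "measure": "temp", "value": 14.2},
    {"site": "A", "date": "03/18/2020", "measure": "salinity", "value": 33.1},
    {"site": "B", "date": "03/19/2020", "measure": "temp", "value": 13.7},
]

# Step 1: convert character dates to real date objects.
for row in long_rows:
    row["date"] = datetime.strptime(row["date"], "%m/%d/%Y").date()

# Step 2: pivot long to wide -- one row per (site, date), one column per measure.
wide = {}
for row in long_rows:
    key = (row["site"], row["date"])
    wide.setdefault(key, {})[row["measure"]] = row["value"]

print(wide[("A", datetime(2020, 3, 18).date())])
# {'temp': 14.2, 'salinity': 33.1}
```

The point isn’t the syntax – it’s that once you can decompose the problem into these steps, Googling the exact function in any language is the easy part.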
And for other librarians, I would say that this is just one more reference area. As a reference area, it’s more about knowing where/what to search: knowing the names of some common platforms, maybe the names of some packages, and where to find help.
So, this is the big question and again, depends: we don’t all need to learn everything, but it is helpful to know a bit. And the level of support your organization can provide will increase with more in-house knowledge.
The first option is, of course, skilling up existing employees. I just finished saying that we don’t all need to become full-fledged data scientists, and I believe it! But, I do think having the exposure to it can definitely help provide more in-depth support for patrons. This is a popular topic, not just within data science, but across all domains. So how can libraries scale up their support of data, to support for data science?
The Carpentries organization – an international organization with free curriculum online – has a curriculum for Library Carpentry, which is these kinds of skills for librarians. It covers a variety of different platforms for automating tasks, including R and Python, as well as command line and others, as well as how to get into that mindset of working with data at scale in an automated fashion. We’ve run one of these workshops here recently and it was quite popular.
Another option is my experience: bringing a non-library trained professional in. It has been an amazing fit for me – libraries is clearly where I belong – but I know this isn’t always a popular option. But until MLIS curriculum changes to incorporate more of this – and if it should is another conversation; no one program can prepare one person with every skill – this is a definite option to hit the ground running with providing a deeper level of support. I had that in-the-field skillset so I can work with students and faculty at a higher level, from recommending specific packages, to teaching workshops, to reviewing code.
The third option I see is outsourcing these duties to another group, likely one on campus. This may be the IT group, a research facilitator group, or some other unit. This does assume, though, that these groups (a) exist and (b) are willing to take this on, which may not be the case. I’m fortunate that at UCSD we have a vibrant campus-wide collaboration for all things data science and we can each tackle the component that fits most naturally. For instance, central IT runs what is in essence an online virtual machine that students can use to run GPU-intensive calculations. And honestly, I’m relieved that that’s not something I need to worry about – it exists and it’s not my department. But it does exist!
Ok, so moving on to data science within the library. This is less about supporting patrons and more about using data science to enhance our own work.
There’s been a lot of talk about data science methodologies in libraries, but – and correct me if I’m wrong – I haven’t yet seen more than case studies from university libraries. That is, no one has yet gone 100% data science and revamped their entire workflows. But, please let me know if I’m wrong here!
What I have seen other groups do, and what our library has done, is implement “data science” methodologies for certain projects. So for example we’ve created Python scripts to automate parts of metadata ingest into our internal system for certain projects. The cataloguers seem quite keen on using OpenRefine, which is an open source platform for automated data cleaning.
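To make the OpenRefine-style cleaning concrete, here is a toy sketch of its “fingerprint” clustering idea – normalizing case and whitespace so that variant spellings of the same creator collapse onto one key. The names and the `cluster_key` helper are illustrative, not our actual ingest code.

```python
import re

# Toy metadata records with the kinds of inconsistencies OpenRefine's
# clustering catches (values are illustrative, not from our catalog).
creators = ["Labou, Stephanie", "labou,  stephanie ", "LABOU, STEPHANIE", "Smith, Jo"]

def cluster_key(value: str) -> str:
    """A simple fingerprint key: strip, lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", value.strip().lower())

clusters = {}
for name in creators:
    clusters.setdefault(cluster_key(name), []).append(name)

# Keys with more than one variant are candidates to merge into one heading.
dupes = {k: v for k, v in clusters.items() if len(v) > 1}
print(dupes)
# {'labou, stephanie': ['Labou, Stephanie', 'labou,  stephanie ', 'LABOU, STEPHANIE']}
```

OpenRefine does this interactively, with more sophisticated keying methods, but the underlying idea is this simple.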
We’ve talked about how we would like to figure out how to automate scraping DataCite and maybe leverage API capabilities more than we currently do.
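For the DataCite piece, a hedged sketch: DataCite exposes a public REST API at api.datacite.org, and harvesting a repository’s DOI metadata is mostly a matter of building the query and parsing the JSON:API response. The repository id below is a placeholder, and the sample response is trimmed to the shape relevant here.

```python
import json
from urllib.parse import urlencode

# Sketch of harvesting DOI metadata from the DataCite REST API.
BASE = "https://api.datacite.org/dois"

def build_query(client_id: str, page_size: int = 100) -> str:
    """Build the request URL for one page of a repository's DOIs."""
    return BASE + "?" + urlencode({"client-id": client_id, "page[size]": page_size})

def extract_titles(payload: str) -> list[str]:
    """Pull titles out of a DataCite JSON:API response body."""
    doc = json.loads(payload)
    return [
        t["title"]
        for rec in doc.get("data", [])
        for t in rec["attributes"].get("titles", [])
    ]

# A trimmed response of the shape DataCite returns (values illustrative).
sample = '{"data": [{"attributes": {"titles": [{"title": "Ocean temperature dataset"}]}}]}'
print(extract_titles(sample))  # ['Ocean temperature dataset']
```

An actual harvester would fetch `build_query(...)` with requests or urllib and page through results, but the parsing above is where the library-specific logic lives.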
We’re definitely still in the early stages – we have some folks in-house who know Python and can do these types of things, and I’ve also had some of the students in the Data & GIS labs work on some Python scripts for us as well. I’d love to get some data science majors hired in at some point, but that’s for the future and for specific projects.
The main take away, I think, from what we’ve done so far, is that using data science in libraries should be a case of using new methodologies because they solve a problem faster or better. Not because they’re flashy and everyone is talking about it and the higher ups are suddenly expecting us to “do data science”. It’s really been about automating manual processes, or talking about how we could better assess our own metrics, or better display information. It is targeted and it is on a case-by-case basis by people who feel comfortable with it and are eager to learn and implement these projects.
And from other libraries and universities, I’ve seen some great “collections as data” projects, which entail making existing collections more accessible for data science topics. So using a sprinkle of data science and FAIR principles – findable, accessible, interoperable, and reusable – to make existing collections more likely to be used for these data science types of projects. Especially for natural language processing or text mining, this is a huge opportunity.
I also think we should be thinking a lot about data as collections. I said it before and I’ll say it again – what do data science methodologies need, in every domain? They need data! We have a research data collection here at UCSD, as well as data we have purchased or licensed, and I want to see those collections, and their guides, as high value and high use, even more so than they currently are. This is a chance for reuse of data to really speed up in certain fields and I want the library to be at the center of it.
I want to close with a few takeaways.
Adaptability and flexibility will be key, in terms of data science and libraries moving forward. There’s a lot of hype and focus on specific platforms, or languages, or packages, or what have you, but these things will absolutely change. What won’t change is best practices. How to manage data and code and information. How to structure data and projects. Where to find data, how to cite it, when to reuse or not reuse data. These are all areas that absolutely fall within the library’s purview and where the library can excel. This is modern data literacy and by focusing on those aspects, the library can not only remain relevant, but provide much-needed guidance.
This also fits when talking about infrastructure: hardware, software, and people. We’ll have to adapt and we’ll have to grow. It can be scary. But it can also be a good opportunity. So planning forward when talking about what spaces we want, what staff we want, and what hardware/software capacity and capability we want.
Finally, collaboration. I’m the point of contact for data and data science, but I work closely with our subject librarians, other groups in the library, and our library IT. I work with campus IT, our research facilitators, and the Halıcıoğlu Data Science Institute here. Data science is way too big – not just in terms of TB scale and hardware, but also in that it’s now in basically every domain across campus – for any one person or group to handle. Remember the Venn diagram? It’s got a lot going on! Collaboration is key, and a big part of the reason my position in particular has been so successful is that there was a network of people I could work with, which has been invaluable.
Thank you very much for listening. I’d be happy to take questions now and you can of course email me with questions about this or anything else related to data science. My email is here: slabou@ucsd.edu.
Thanks again!