Jake Carlson, Jon Jeffryes, Brian Westra and Sarah Wright
Data Information Literacy: Multiple Paths to a Single Goal
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future (ASIS&T)
Wendy A. Kozlowski, Dianne Dietrich, Gail Steinhart and Sarah Wright
Cornell University Library, Ithaca, NY
Research Data in eCommons @ Cornell: Present and Future
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data, more specifically to provide an open database platform for data ranging from the raw data produced by experimental measurements or models, through intermediate manipulations, to the final published results. The sheer amount of data involved creates some unique software-technical challenges. One of these challenges is addressed in the part of the study presented here, namely to characterize scientific data (with the initial focus being detailed chemistry data from the combustion kinetics community) so that efficient searches can be made. This characterization is formalized as schemas of tags and keywords describing the data, together with ontologies describing the relationships between data types and between the characterizations themselves. These will be translated into metadata tags connected to the data points within a non-relational database of data for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from only making available data accessible to the community to also enabling the community itself to contribute its own data with minimal effort. This will involve, for example, the concept of the ‘electronic lab notebook’ and the availability of extensive concept extraction tools, primarily from the chemical informatics field.
Poster RDAP13: A Workflow for Depositing to a Research Data Repository: A Cas... (ASIS&T)
Betsy Gunia, David Fearon, Benjamin Brosius, Tim DiLauro
JHU Data Management Services
Johns Hopkins University Sheridan Libraries
A Workflow for Depositing to a Research Data Repository: A Case Study for Archiving Publication Data
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema... (ASIS&T)
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Erica M. Johns, Jon Corson-Rikert, Huda J. Khan, Dean B. Krafft and Matthew S. Mayernik
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Keynote Address: Data Management Plan Requirements at the US Department of Energy
Laura J. Biven, Ph.D., Senior Science and Technology Advisor, Office of the Deputy Director for Science Programs, Office of Science, US Department of Energy
Integration of research literature and data (InFoLiS) (Philipp Zumstein)
Talk at CNI 2015 Spring Membership Meeting in Seattle on April 14th, 2015, see http://www.cni.org/events/membership-meetings/upcoming-meeting/spring-2015/
Abstract: The goal of the InFoLiS project is to connect research data and publications. Links between data and literature are created automatically by means of text mining and made available as Linked Open Data (LOD) for seamless integration into different retrieval systems. This enables scientists to directly access information about corresponding research data in a literature information system, and, vice versa, it is possible to directly find different interpretations and analyses in the literature of the same research data. In our talk, we will describe our methods for generating the links and give insight into the Linked Data infrastructure including the services we are currently building. Most importantly, we will detail how our solutions can be used by other institutions and invite all interested participants to discuss with us their ideas and thoughts on the requirements for these services to ensure broad interoperability with existing systems and infrastructures. InFoLiS is a joint project by the GESIS – Leibniz Institute for the Social Sciences, Cologne, Mannheim University Library, and Mannheim University supported by a grant from the DFG – German Research Foundation.
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Jared Lyle, ICPSR
Jennifer Doty, Emory University
Joel Herndon, Duke University
Libbie Stephenson, University of California, Los Angeles
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This session is meant to be a guided, step-by-step session that will follow the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Using data management plans as a research tool: an introduction to the DART Project
Amanda L. Whitmire, Ph.D., Assistant Professor, Data Management Specialist, Oregon State University Libraries & Press
RDAP14: Collaboration and tension between institutions and units providing da... (ASIS&T)
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
David Minor, University of California, San Diego
Amanda Whitmire, Oregon State University
Stephanie Wright, University of Washington
Lisa Zilinski, Purdue University
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Capacity Building: Leveraging existing library networks to take on research data
Heidi Imker, Director of the Research Data Service, University of Illinois at Urbana-Champaign
RDAP14: Building a data management and curation program on a shoestring budget (ASIS&T)
Research Data Access and Preservation Summit, 2014
San Diego, CA
Margaret Henderson
Director, Research Data Management
Virginia Commonwealth University
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Learning to Curate Research Data
Jennifer Doty, Research Data Librarian, Emory Center for Digital Scholarship, Emory University, Robert W. Woodruff Library
Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain (www.euro-basin.eu)
NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Enabling transparency and efficiency in the research landscape
Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Department of Medical Informatics and Epidemiology, Oregon Health & Science University
RDAP 16: DMPs and Public Access: Agency and Data Service Experiences (ASIS&T)
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Outline for Panel 5, "DMPs and Public Access: Agency and Data Service Experiences"
Panel Lead:
Margaret Henderson, Virginia Commonwealth University
RDAP14: Policy Recommendations for Institutions to Serve as Trustworthy Stewa... (ASIS&T)
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
J. Steven Hughes
NASA Jet Propulsion Laboratory
Robert R. Downs
Center for International Earth Science Information Network (CIESIN), Columbia University
David Giaretta
Alliance for Permanent Access
RDAP14: DataONE: Data Observation Network for Earth (ASIS&T)
Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014
Amber E. Budden, Director for Community Engagement and Outreach, DataONE, University of New Mexico
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
RDAP 15: Lessons Learned from the Data Information Literacy Project (ASIS&T)
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Part of “Developing Data Literacy Programs: Working with Faculty, Graduate Students and Undergraduates”
Jake Carlson, Research Data Services Manager, University of Michigan
"Undergrad ecologists aren't learning data management" - ESA 2013 (Carly Strasser)
Presentation for Ecological Society of America 2013 Meeting in Minneapolis, MN on 6 August 2013. Results published in Ecosphere doi: 10.1890/ES12-00139.1
About the Webinar
Big data is being collected at a rate that is surpassing traditional analytical methods due to the constantly expanding ways in which data can be created and mined. Faculty in all disciplines are increasingly creating and/or incorporating big data into their research, and institutions are creating repositories and other tools to manage it all. There are many challenges to effectively managing and curating this data, challenges that are both similar to and different from those of managing document archives. Libraries can assume, and are assuming, a key role in making this information more useful, visible, and accessible, for example by creating taxonomies, designing metadata schemes, and systematizing retrieval methods.
Our panelists will talk about their experience with big data curation, best practices for research data management, and the tools used by libraries as they take on this evolving role.
Organizational Implications of Data Science Environments in Education, Resear... (Victoria Steeves)
Data science (DS) poses key organizational challenges for academic institutions. DS is a multidisciplinary field that includes a range of research methodologies and fields of inquiry. DS as a domain is interested in many of the same issues as libraries: data access and curation, reproducibility, the value of ontologies, and open scholarship. At the same time, identifying opportunities to collaborate and deploy unified services can be challenging. The Data Science Environment (DSE) program, co-funded by the Gordon and Betty Moore and Alfred P. Sloan foundations, provides resources to help universities develop collaborations between researchers, develop tools in DS, and create new career paths for data scientists. Working groups within the DSE focus on reproducibility, career paths, education/training, research methods, space issues, and software/tools. This program has introduced new opportunities for libraries to explore how to engage with this community and consider how to bring the expertise in the DS community to bear on library missions and goals. In this panel, program members from each of the three partner universities, the University of Washington, New York University and the University of California, Berkeley, consider the research questions of the DSE and the organizational impact of these groups in the University as a whole and for the libraries specifically. The panel will employ a case-study presentation model framed through three lenses: the role of data sciences in information science, the potential career paths for data scientists in libraries, and the potential amplification of information services (e.g. data curation, institutional repositories, scholarly publishing).
CNI Program: Talk Description: https://www.cni.org/topics/digital-curation/organizational-implications-of-data-science-environments-in-education-research-and-research-management-in-libraries
Video of Talk--Vimeo: https://vimeo.com/149713097
Video of Talk--YouTube: https://www.youtube.com/watch?v=L0G9JsPMEXY
Slides | Research data literacy and the library (Colleen DeLory)
Slides from the Dec. 8, 2016 Library Connect webinar "Research data literacy and the library" with Sarah Wright, Christian Lauersen and Anita de Waard. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=226043
Slides | Research data literacy and the library (Library_Connect)
Slides from the Dec. 8, 2016 Library Connect webinar "Research data literacy and the library" with Christian Lauersen, Sarah J. Wright and Anita de Waard. See the full webinar at: http://libraryconnect.elsevier.com/library-connect-webinars?commid=226043
Education
Advanced Technologies and Data Management Practices in Environmental Science: Lessons from Academia
REBECCA R. HERNANDEZ, MATTHEW S. MAYERNIK, MICHELLE L. MURPHY-MARISCAL, AND MICHAEL F. ALLEN
Environmental scientists are increasing their capitalization on advancements in technology, computation, and data management. However, the extent of that capitalization is unknown. We analyzed the survey responses of 434 graduate students to evaluate the understanding and use of such advances in the environmental sciences. Two-thirds of the students had not taken courses related to information science and the analysis of complex data. Seventy-four percent of the students reported no skill in programming languages or computational applications. Of the students who had completed research projects, 26% had created metadata for research data sets, and 29% had archived their data so that it was available online. One-third of these students used an environmental sensor. The results differed according to the students' research status, degree type, and university type. Changes may be necessary in the curricula of university programs that seek to prepare environmental scientists for this technologically advanced and data-intensive age.
Keywords: data life cycle, data repository, education, environmental sensors, eScience
With the advent of recent technological and computational advances, scientists are using increasing numbers of in situ environmental sensors, model simulations, crowdsourcing tasks, and embedded networked systems that enable environmental studies to incorporate various spatiotemporal scales and to produce unprecedented amounts of data (Porter et al. 2005, Benson et al. 2010). Such technologies and an increasing interest in synthesis studies of environmental phenomena have made data valuable beyond their immediate use (Peters et al. 2008). The flood of data that digital technologies produce (Hey and Trefethen 2003) underscores the urgency of a rapid adoption of pertinent skills and best practices by environmental scientists in the proper management of data sets. Studies in which such preparedness in the environmental sciences is evaluated are absent; however, academic institutions may play a role in imparting the relevant knowledge and skills to the next generation of scientists.
As electronic devices become smaller and cheaper and as complementary computer power grows and applications increase in efficiency, scientists at all career stages are finding technology useful for addressing topics from global epidemics to climate change. Such integration has transformed both the experimental techniques and the solitary working platforms known by predecessors in the field in the not-so-distant past (Nature 2003). But the use of technology and interdisciplinary collaborations often necessitates analytical tools for the integration and analysis of large and heterogeneous data sets. In a survey of a distributed seminar course fo.
PREDICTING SUCCESS: AN APPLICATION OF DATA MINING TECHNIQUES TO STUDENT OUTCOMES (IJDKP)
This project examines the effectiveness of applying machine learning techniques to the realm of college student success, specifically with the intent of discovering and identifying those student characteristics and factors that show the strongest predictive capability with regard to successful graduation. The student data examined consists of first-time freshmen and transfer students who matriculated at California State University San Marcos in the period of Fall 2000 through Fall 2010 and who either graduated successfully or discontinued their education. Operating on over 30,000 student observations, random forests are used to determine the relative importance of the student characteristics, with genetic algorithms used to perform feature selection and pruning. To improve the machine learning algorithm, cross-validated hyperparameter tuning was also implemented. Overall predictive strength is relatively high as measured by the Matthews Correlation Coefficient, and both intuitive and novel features which provide support for the learning model are explored.
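The abstract above reports predictive strength using the Matthews Correlation Coefficient (MCC). As a rough illustration of how that metric can be computed for a binary outcome such as graduated versus discontinued, here is a minimal sketch in plain Python. The function names and the sample labels are hypothetical, invented for this example; they are not taken from the study or its data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented labels: 1 = graduated, 0 = discontinued.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
mcc = matthews_corrcoef(*counts)
print(counts, round(mcc, 4))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it uses all four cells of the confusion matrix, which makes it a reasonable choice when the two outcome classes are unevenly sized.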
Improving Student Achievement with New Approaches to Data (ecatalst)
Presentation by John Whitmer to WASC Academic Resource Conference on April 11, 2013.
The CSU Data Dashboard seeks to improve student achievement by monitoring on-track indicators so that institutional leaders can better understand not only which milestones students are failing to reach, but why they are not reaching them. It can also help campuses to design interventions or policy changes to increase student success and to gauge the impact of interventions.
Academic technologies collect highly detailed student usage data. How can this data be used to understand and predict student performance, especially of at-risk students? This presentation will discuss research on a high-enrollment undergraduate course exploring the relationship between LMS activity, student background characteristics, current enrollment information, and student achievement.
RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability) (ASIS&T)
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 2, Sustainability
Presenter:
Margaret Henderson, Virginia Commonwealth University
Panel Leads:
Kristin Briney, University of Wisconsin-Milwaukee & Erica Johns, Cornell University
RDAP 16: Sustainability of data infrastructure: The history of science scienc... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 2, Sustainability
Presenter:
Kristin Eschenfelder, University of Wisconsin-Madison
Panel Leads:
Kristin Briney, University of Wisconsin-Milwaukee & Erica Johns, Cornell University
RDAP 16: Perspective on DMPs, Funders and Public Access (Panel 5: DMPs and Pu... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 5, "DMPs and Public Access: Agency and Data Service Experiences"
Presenter:
Jonathan Petters, Johns Hopkins University
Panel Lead:
Margaret Henderson, Virginia Commonwealth University
RDAP 16: DMPs and Public Access: An NIH Perspective (Panel 5, DMPs and Public... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 5, "DMPs and Public Access: Agency and Data Service Experiences"
Presenter:
Lisa Federer, National Institutes of Health
Panel Lead:
Margaret Henderson, Virginia Commonwealth University
RDAP 16: If I could turn back time: Looking back on 2+ years of DMP consultin... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 5, "DMPs and Public Access: Agency and Data Service Experiences"
Presenter:
Angi Ogier, Virginia Tech University
Panel Lead:
Margaret Henderson, Virginia Commonwealth University
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access) - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 5, "DMPs and Public Access: Agency and Data Service Experiences"
Presenter:
Laura J. Biven, US Department of Energy
Panel Lead:
Margaret Henderson, Virginia Commonwealth University
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Poster session (Wednesday, May 4)
Presenters:
Amy Koshoffer, University of Cincinnati
Eric J. Tepe, University of Cincinnati
RDAP 16 Poster: Interpreting Local Data Policies in Practice - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Poster session (Wednesday, May 4)
Presenters:
Line Pouchard, Purdue University
Donna Ferullo, Purdue University
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Poster session (Wednesday, May 4)
Presenters:
Jan Cheetham, University of Wisconsin-Madison
Wendy Kozlowski, Cornell University
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Poster session (Wednesday, May 4)
Presenter:
Caitlin Bakker, University of Minnesota
RDAP 16 Lightning: Spreading the love: Bringing data management training to s... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Tina Griffin, University of Illinois at Chicago
RDAP 16 Lightning: RDM Discussion Group: How'd that go? - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Margaret Janz, Temple University
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Christie Wiley, University of Illinois Urbana-Champaign
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Sara Mannheimer, Montana State University
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Matthew Spitzer, Center for Open Science
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Ana Van Gulick, Carnegie Mellon University
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Megan N. O'Donnell, Iowa State University
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Lightning Rounds (Thursday, May 5)
Presenter:
Betty Rozum, Utah State University
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 4, "Measuring Up: How Are We Defining Success for Research Data Services?"
Presenter:
Yasmeen Shorish, James Madison University
RDAP 16: How do we know where to grow? Assessing Research Data Services at th... - ASIS&T
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 4, "Measuring Up: How Are We Defining Success for Research Data Services?"
Presenter:
Jake Carlson, University of Michigan
Poster RDAP13: Data information literacy multiple paths to a single goal
1. Purdue University
Purdue e-Pubs
Libraries Faculty and Staff Presentations Purdue Libraries
1-1-2013
Data Information Literacy: Multiple Paths to a Single Goal
Jake Carlson
Purdue University, jakecarlson@purdue.edu
Sarah Wright
Cornell University, sjw256@cornell.edu
Brian Westra
University of Oregon, bwestra@uoregon.edu
Jon Jeffryes
University of Minnesota - Twin Cities, jeffryes@umn.edu
Follow this and additional works at: http://docs.lib.purdue.edu/lib_fspres
Part of the Library and Information Science Commons
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact epubs@purdue.edu for
additional information.
Recommended Citation
Carlson, Jake; Wright, Sarah; Westra, Brian; and Jeffryes, Jon, "Data Information Literacy: Multiple Paths to a Single Goal" (2013).
Libraries Faculty and Staff Presentations. Paper 12.
http://docs.lib.purdue.edu/lib_fspres/12
2. Data Information Literacy: Multiple Paths to a Single Goal
Jake Carlson, Purdue University; Jon Jeffryes, University of Minnesota; Brian Westra, University of Oregon; Sarah Wright, Cornell University
Objective
What skills will graduate students need to be successful in managing, working with and curating their research data? This poster reports on initial results from a two-year project funded by the Institute of Museum and Library Services (IMLS) that is centered on exploring this question.
Methods
The project is comprised of five teams (each made up of a data services librarian, a subject or information literacy specialist and a faculty researcher) from four institutions. Each team conducted environment scans of the discipline and conducted interviews of their faculty partner and his graduate students. Using this knowledge each team developed an educational program tailored to their specific discipline and local practices.
Findings
• All 12 competencies were seen as important by faculty and graduate students.
• Lack of formal training in data management
• Lack of formal policies in the research team
• Self-directed learning through trial and error
• Focus on data mechanics and local, immediate needs over deeper concepts or application outside of the lab.
Note: Due to the small size and use of convenience sampling these findings cannot be generalized beyond this project.
Teams: Cornell University; University of Minnesota; University of Oregon; Purdue University #1; Purdue University #2
Discipline: Natural Resources; Civil Engineering; Ecology; Electrical & Computer Engineering; Ag & Bio Engineering
Identified Needs: Data Sharing; Databases; Metadata; Data Ownership; Long-term Access; Cultures of Practice; Metadata; Closing Out a Grant; Documentation & Organization; Transfer of Responsibility; Data Sharing Protocols; Metadata
Response: Mini-Course; Readings & Seminar; Online Course; Embedded Librarianship; Workshops
Outcomes: Faculty Engagement; Application of Best Practices; Completed DMP; Improved Data Practices; Awareness of Tools, Resources, and Best Practices; Raised Awareness; Resources for Evaluating Student Work; Refine Existing SOPs on Data; Checklist for Enforcing SOPs
Data Information Literacy Model
Project Personnel: Jake Carlson (PI), Camille Andrews, Marianne Stowell Bracke, Michael Fosmire, Jon Jeffryes, Lisa Johnston, Megan Sapp Nelson, Dean Walton, Brian Westra, Sarah Wright.
Image Credits:
Photos: “School of Fish” by Tom Weilenmann: http://www.flickr.com/photos/tom_weilenmann/51673288/; “Roosevelt Dam Bridge” by Al_HikesAZ: http://www.flickr.com/photos/alanenglish/466658759/; “Sunshine Sparkling on the Prairie Grass” by Carol VanHook: /librariesrock/3613075606/; “Code Obfuscation - Part 2: Obfuscating Data Structures” by Sonia Gupta: http://palizine.plynt.com/issues/2005Sep/code-obfuscation-continued/; “SEPACisco” from the Water Quality Field Station: http://www.agry.purdue.edu/water/fieldstn/photogallery/SEPACisco.jpg.
Graphics: “User Experience” stencils by Todd Zazelenchuk & Elizabeth Boling: http://www.userfocus.co.uk/resources/omnigraffle.html
http://datainfolit.org