Data Management Lab: Session 3 slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Week 1 lecture for High School Bioinformatics course; covers why we need to use computers in biology, what bioinformatics/computational biology is, an introduction to machine learning, and examples from current research
RDAP 15: The Role of Assessment in Research Data Services (ASIS&T)
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Amanda Whitmire, Jake Carlson, Patricia Hswe, Susan Wells Parham, Lizzy Rolando and Brian Westra
“Using assessment of NSF data management plans to enable evidence-based evolution of research data services”
Travis Weller, Amalia Monroe-Gulick
“Evaluating Research Needs by Methodology: Assessment at the University of Kansas”
Kathleen Fear, Data Librarian, University of Rochester
“Where’s the data? Assessing researcher compliance with publisher requirements for data sharing”
Expert panel on industrialising microbiomics - with Unilever (Eagle Genomics)
A panel of experts, including Dr Barry Murphy, Microbiomics Science Lead at Unilever, Dr Craig McAnulla, Senior Consultant for Bioinformatics, and Dr Yasmin Alam-Faruque, Scientific Data Manager/Biocurator, discuss first-hand experience and views on how to get better insights faster from microbiome data.
3 data normalization (2014 lab tutorial) (Dmitry Grapov)
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
META DATA QUALITY CONTROL ARCHITECTURE IN DATA WAREHOUSING (IJCSEIT Journal)
Quality is a key concept in every analysis and in computing applications. Today we gather large volumes of information, store them in multidimensional form in data warehouses, and then analyze the data to support precise decision making in various fields. Studies have shown that much of this data is not useful for analysis because of quality problems caused by improper data handling techniques. This paper seeks a solution that builds data quality into the foundation of data repositories and avoids quality anomalies at the metadata level. It also proposes a new model of metadata architecture.
Are Your Students Ready for Lab?
11/5/2015
Presenters: Bill Heslop and Tony Baldwin, Directors and Co-founders, Learning Science Ltd.
LabSkills is an online program that prepares students for their lab sessions through assignments in OWLv2, the leading online learning system for Chemistry. LabSkills makes it easy for you to require students to complete laboratory preparation prior to attending lab with demonstrations, interactive simulations, and quizzes. The newest version of LabSkills PreLabs is an enhanced course with 10 new techniques, plus new mobile-compatible simulations. LabSkills content is easy to assign and is automatically graded. LabSkills is currently used by schools and universities in more than 30 countries worldwide. In this webinar, you will learn how to get your students: engaged with practical work, prepared when they get to the lab, confident in performing the experiments, and using the time in the lab effectively.
Corporate Data Quality Management Research and Services Overview (Boris Otto)
This presentation provides an overview of the research and services portfolio of the Business Engineering Institute (BEI) St. Gallen in the field of corporate data quality management (CDQM). CDQM comprises topics such as data governance, data quality measurement, master data management, and data architecture management. At the core of the research and service portfolio is the Competence Center Corporate Data Quality (CC CDQ). The CC CDQ is a consortium research project at the Institute of Information Management at the University of St. Gallen (IWI-HSG). Partner companies come from various industry and service sectors.
How big is big data? We have moved a long way since storing files on floppy disks. Our guest speaker, Matt LeMay, explores big data at a human scale. Find out more about this topic at our upcoming Lab this September in Singapore.
Exploratory Analysis in the Data Lab - Team-Sport or for Nerds only? (Harald Erb)
Talk held at the DOAG 2016 conference (2016.doag.org/de/home) discussing a data lab concept, including an architecture blueprint, collaboration, and tool examples based on Oracle solutions such as Oracle Big Data Discovery (in combination with Jupyter Notebook).
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga... (Jürgen Ambrosi)
Data is the new capital: like financial capital, it is a resource that must be managed, collected, and kept secure, but it must also be invested by organizations that want to gain a competitive advantage. Data is not a new resource, but only today, for the first time, is it available in abundance along with the technologies needed to maximize its return, just as electricity was a laboratory curiosity for a long time until it was made available to the masses and completely changed the face of modern industry. That is why accelerating this change requires an innovative approach to executing Big Data initiatives: an analytics laboratory as a catalyst for innovation (the Data Lab). In this webinar on Oracle technologies, we will use our usual approach of storytelling based on use cases and concrete experiences.
Spring 2014 Data Management Lab: Session 1 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Knowledge discovery is the process of extracting knowledge from large amounts of data. The quality of the knowledge generated by the knowledge discovery process greatly affects the quality of the resulting decisions. Existing data must be qualified and tested to ensure that knowledge discovery processes can produce knowledge or information that is useful and feasible; this matters for strategic decision making in an organization. A data warehouse is created by combining multiple operational databases and external data, a process that is very vulnerable to incomplete, inconsistent, and noisy data. Data mining provides a mechanism to correct these deficiencies before the data is finally stored in the data warehouse. This research presents a technique to improve the quality of information in the data warehouse.
Machine Learning for Predictive Data Analysis in Clinical Research (ClinosolIndia)
Machine learning (ML) techniques have the potential to revolutionize predictive data analysis in clinical research by enabling researchers to uncover insights, make informed decisions, and develop more personalized treatment approaches. Here's how machine learning can be applied to predictive data analysis in clinical research.
Data Cleaning and Validation: Best Practices for Data Integrity (ClinosolIndia)
Data cleaning and validation are critical processes to ensure the integrity, accuracy, and reliability of clinical data. These best practices can help maintain data quality and enhance the validity of research outcomes:
Define Data Cleaning and Validation Procedures Early: Establish clear data cleaning and validation procedures as part of the study protocol or data management plan. Define data validation rules, data range checks, and data cleaning criteria upfront to ensure consistency and adherence to predefined standards.
Use Electronic Data Capture (EDC) Systems: Implement EDC systems that offer built-in data validation checks, range validations, and skip patterns. EDC systems can prevent certain types of errors during data entry and facilitate real-time validation as data is collected.
Develop Data Validation Checks: Create automated validation checks to identify discrepancies, outliers, missing data, and inconsistencies. These checks can include cross-field validations, data range validations, and logical validations based on predefined rules.
Standardize Data Entry: Enforce standardized data entry formats and units to minimize variability and errors. Provide clear instructions to data entry personnel to ensure consistent and accurate data collection.
Implement Double Data Entry and Review: For critical data points, consider implementing a double data entry process where data is entered by two independent personnel. Any discrepancies between the two entries are flagged for resolution. A third reviewer can adjudicate discrepancies if necessary.
A simplified approach for quality management in data warehouse (IJDKP)
Data warehousing is continuously gaining importance as organizations realize the benefits of decision-oriented databases. However, the stumbling block to this rapid development is data quality issues at various stages of data warehousing. Quality can be defined as a measure of excellence or a state free from defects. Users appreciate quality products, and the available literature suggests that many organizations have significant data quality problems with substantial social and economic impacts. A metadata-based quality system is introduced to manage the quality of data in a data warehouse. The approach is used to analyze the quality of a data warehouse system by checking the expected values of quality parameters against the actual values. The proposed approach is supported by a metadata framework that can store additional information to analyze the quality parameters whenever required.
Data Management Lab: Data mapping exercise instructions (IUPUI)
Spring 2014 Data Management Lab: Session 1 Data mapping exercise instructions (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Optimize Your Healthcare Data Quality Investment: Three Ways to Accelerate Ti... (Health Catalyst)
Healthcare organizations increasingly rely on data to inform strategic decisions. This growing dependence makes ensuring data across the organization is fit for purpose more critical than ever. Decision-making challenges associated with pandemic-driven urgency, variety of data, and lack of resources have further highlighted the critical importance of healthcare data quality and prompted more focus and investment. However, many data quality initiatives are too narrow in focus and reactive in nature or take longer than expected to demonstrate value. This leaves organizations unprepared for future events, like COVID-19, that require a rapid enterprise-wide analytic response.
What are some actionable ways you can help your organization guard against the data quality challenges uncovered this past year and better prepare to respond in the future? Join Taylor Larsen, Director of Data Quality for Health Catalyst, to learn more.
What You’ll Learn
- How data profiling and data quality assessments, in combination with your data catalog, can increase data quality transparency, expedite root cause analysis, and close data quality monitoring gaps.
- How to leverage AI to reduce data quality monitoring configuration and maintenance time and improve accuracy.
- How defining data quality based on its measurable utility (i.e., data represents information that supports better decisions) can provide a scalable way to ensure data are fit for purpose and avoid cost outstripping return.
DataONE Education Module 01: Why Data Management? (DataONE)
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
Automating Data Science over a Human Genomics Knowledge Base (Vaticle)
# Automating Data Science over a Human Genomics Knowledge Base
Radouane Oudrhiri, the CTO of Eagle Genomics, will talk about how Eagle Genomics is building a platform for automating data science over a human genomics knowledge base. Rad will dive into the architecture of Eagle Genomics and also discuss how Grakn serves as the knowledge base foundation of the system. Rad will also give a brief history of databases, semantic expressiveness, and how Grakn fits into the big picture.
# Radouane Oudrhiri, CTO, Eagle Genomics
Radouane has extensive experience in leading world-class software and data-intensive system developments in industries ranging from telecom to healthcare, nuclear, automotive, and financial services. Radouane is a Lean/Six Sigma Master Black Belt specialising in high-tech, IT, and software engineering, and he is recognised as a leader and early adopter of Lean/Six Sigma and DFSS for IT and software. He is a fellow of the Royal Statistical Society (RSS) and a member of the ISO Technical Committee (TC69: Applications of Statistical Methods), where he is co-author of the Lean & Six Sigma standard (ISO 18404) as well as the new standard under development (Design for Six Sigma). He is also part of the newly formed international group on Big Data (nominated by BSI as the UK representative/expert). Radouane has also been chair of the working group on Measurement Systems for Automated Processes/Systems within the ISPE (International Society for Pharmaceutical Engineering).
How do you assess the quality and reliability of data sources in data analysi... (Soumodeep Nanee Kundu)
**Assessing the Quality and Reliability of Data Sources in Data Analysis**
Data is often referred to as the lifeblood of data analysis. It forms the foundation upon which decisions are made, insights are drawn, and actions are taken. However, not all data is created equal. The quality and reliability of data sources are paramount to the success of data analysis efforts. In this essay, we will explore the intricate process of assessing data quality and reliability, touching on the methods, considerations, and best practices to ensure the data used in the analysis is trustworthy and fit for purpose.
Enhancing Data Quality in Clinical Trials: Best Practices and Quality Control... (ClinosolIndia)
Ensuring data quality is crucial in clinical trials to generate reliable and valid results. High-quality data allows for accurate analysis, interpretation, and decision-making regarding the safety and efficacy of investigational products. Here are some best practices and quality control measures to enhance data quality in clinical trials:
Standardized Data Collection: Implement standardized data collection procedures, including the use of case report forms (CRFs) or electronic data capture (EDC) systems. Clearly define data elements, variables, and measurement scales to minimize inconsistencies and errors in data entry.
Training and Education: Provide comprehensive training to investigators, site staff, and data entry personnel on the protocol, data collection procedures, and Good Clinical Practice (GCP) guidelines. Training ensures understanding and adherence to the study requirements, leading to accurate and consistent data collection.
Source Data Verification (SDV): Perform source data verification to compare data recorded in the CRFs or EDC systems with the original source documents (e.g., medical records, laboratory reports). This process helps identify discrepancies, errors, or missing data, ensuring data accuracy and integrity.
Data Management Plan: Develop a robust data management plan that outlines procedures for data collection, handling, storage, and analysis. The plan should include data validation checks, query resolution processes, and data reconciliation between different data sources.
Electronic Data Capture (EDC) Systems: Utilize EDC systems to facilitate real-time data capture, improve data accuracy, and streamline data management processes. EDC systems often have built-in data validation checks, range checks, and skip patterns to minimize data entry errors.
Data Quality in Test Automation: Navigating the Path to Reliable Testing (Knoldus Inc.)
"Data Quality in Test Automation: Navigating the Path to Reliable Testing" delves into the crucial role of data quality within the realm of test automation. It explores strategies and methodologies for ensuring reliable testing outcomes by addressing challenges related to the accuracy, completeness, and consistency of test data. The discussion encompasses techniques for managing, validating, and optimizing data sets to enhance the effectiveness and efficiency of automated testing processes, ultimately fostering confidence in the reliability of software systems.
Ethical Principles for the All Data Revolution (Melissa Moody)
A presentation by Stephanie Shipp, from the Research Highlights session at the 2019 Women in Data Science Charlottesville Conference. Hosted by the UVA Data Science Institute.
Everything related to CDM. Importance of CDM, Flow Activities in Clinical Trials, Data Management Plan, Database Designing, Data Management tools, Essential Characters of the database, Standard Global Dictionaries, Data Review and Validation, Query Generation, Database Lock, Technology in CDM, and Professionals of CDM.
Similar to Data Management Lab: Session 3 Slides (20)
LITA’s Altmetrics and Digital Analytics Interest Group is proud to present Heather Coates, Richard Naples, and Lauren Collister in our second free webinar of the season. Heather will introduce the concept of altmetrics with a quick "Altmetrics 101," Richard will discuss the Smithsonian's implementation of Altmetric, and Lauren will share the University of Pittsburgh's experience with Plum Analytics.
Gather evidence to demonstrate the impact of your research (IUPUI)
This workshop is the 3rd in a series of 4 titled "Maximize your impact" offered by the IUPUI University Library Center for Digital Scholarship. Faculty must provide strong evidence of impact in order to achieve promotion and tenure. Having strong evidence in year 5 is made easier by strategic dissemination early in your tenure track. In this hands-on workshop, we will introduce key sources of evidence to support your case, demonstrate strategies for gathering this evidence, and provide a variety of examples. These sources include citation metrics, article level metrics, and altmetrics as indicators of impact to support your narrative of excellence.
An introduction to open science for the Library Journal webcast Case Studies for Open Science on February 9, 2016.
http://lj.libraryjournal.com/2016/01/webcasts/case-studies-for-open-science/
Academics must provide evidence to demonstrate the impact and outcomes of their scholarly work. This webinar, presented by librarians, will help faculty explore various forms of documentary evidence to support their case for excellence. Sponsored by the IUPUI Office of Academic Affairs.
Note: The webinar included demonstrations of Web of Science & Scopus, which the slides do not reflect.
Teaching data management in a lab environment (IASSIST 2014) (IUPUI)
Equipping researchers with the skills to effectively utilize data in the global data ecosystem requires proficiency with data literacies and electronic resource management. This is a valuable opportunity for libraries to leverage existing expertise and infrastructure to address a significant gap in data literacy education. This session will describe a workshop for developing core skills in data literacy. In light of the significant gap between common practice and effective strategies emerging from specific research communities, we incorporated elements of a lab format to build proficiency with specific strategies. The lab format is traditionally used for training procedural skills in a controlled setting, which is also appropriate for teaching many daily data management practices. The focus of the curriculum is to teach data management strategies that support data quality, transparency, and re-use. Given the variety of data formats and types used in health and social sciences research, we adopted a skills-based approach that transcends particular domains or methodologies. Attendees applied selected strategies using a combination of their own research projects and a carefully defined case study to build proficiency.
Objectives: To explore potential collaborations between academic libraries and Clinical Translational Science Award (CTSA)-funded institutes with respect to data management training and support.
Methods: The National Institutes of Health CTSAs have established a well-funded, crucial infrastructure supporting large-scale collaborative biomedical research. This infrastructure is also valuable for smaller, more localized research projects. While infrastructure and corresponding support are often available for large, well-funded projects, these services have generally not been extended to smaller projects. This is a missed opportunity on both accounts. Academic libraries providing data services can leverage CTSA-based resources, while CTSA-funded institutes can extend their reach beyond large biomedical projects to serve the long tail of research data.
Results: A year-long series of conversations with the Indiana CTSI Data Management Team resulted in resource sharing, consensus building about key issues in data management, provision of expert feedback on a data management training curriculum, and several avenues for future collaborations.
Conclusions: Data management training for graduate students and early career researchers is a vital area of need that would benefit from the combined infrastructure and expertise of translational science institutes and academic libraries. Such partnerships can leverage the instructional, preservation, and access expertise in academic libraries, along with the storage, security, and analytical expertise in translational science institutes to improve the management, protection, and access of valuable research data.
Data sharing promotes many goals of the NIH research endeavor. It is particularly important for unique data that cannot be readily replicated. Data sharing allows scientists to expedite the translation of research results into knowledge, products, and procedures to improve human health. Do you know what a data sharing plan should include? Are you aware of common practices and standards for data sharing? Do you know what services are available to help share your data responsibly? This workshop will begin to address these questions. Q&A will follow the presentation. Anyone interested in or planning to apply for NIH funding should attend. Note: The NIH data-sharing policy applies to applicants seeking $500,000 or more in direct costs in any year of the proposed research.
Data Management Lab: Session 4 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Data Management Lab: Session 4 Review Outline (IUPUI)
Data Management Lab: Session 4 Review Outline (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Data Management Lab: Session 3 Data Entry Best Practices (IUPUI)
Data Management Lab: Session 3 Data Entry Best Practices (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Data Management Lab: Session 3 Data Coding Best Practices (IUPUI)
Data Management Lab: Session 3 Data Coding Best Practices (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Spring 2014 Data Management Lab: Session 2 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
4. Data Integrity
1. Data have integrity if they have been maintained without unauthorized alteration or destruction.
2. Data integrity is data that has a complete or whole structure.
(http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Data_integrity.html)
5. Data Quality
• Fitness for use (depends on context of your questions)
• Data quality is the most important aspect of data management
• Ensured by
– Sufficient resources and expertise
– Paying close attention to the design of data collection instruments
– Creating appropriate entry, validation, and reporting processes
– Ongoing QC processes
– Understanding the data collected
Chapman, 2005
Dept of Biostatistics – Data Management, IUSM
6. Data Quality Standards
• Check data for its logical consistency.
• Check data for reasonableness.
• Ensure adherence to sound estimation methodologies.
• Ensure adherence to monetary submission standards for stolen and recovered property.
• Ensure that other statistical edit functions are processed within established parameters.
FBI: http://www.fbi.gov/about-us/cjis/ucr/data_quality_guidelines
Dept of Biostatistics – Data Management, IUSM
7. Data Entry and Manipulation
• Strategies for preventing errors from entering a dataset
• Activities to ensure quality of data before collection
• Activities that involve monitoring and maintaining the quality of data during the study
8. Data Entry and Manipulation
• Define & enforce standards
◦ Formats
◦ Codes
◦ Measurement units
◦ Metadata
• Assign responsibility for data quality
◦ Be sure assigned person is educated in QA/QC
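One way to make "define & enforce standards" concrete is to write the agreed formats, codes, and units down as a small machine-readable field specification and check incoming records against it. The sketch below is illustrative only and is not part of the lab materials; the field names, codes, and limits are hypothetical, and it assumes Python is available.
# Minimal sketch: encode agreed standards (format, codes, units) as a field
# specification, then flag rows that violate them before entry is accepted.
# All field names and rules here are hypothetical examples.
import re

FIELD_SPEC = {
    "participant_id": {"type": str,   "format": r"^P\d{4}$"},      # e.g. P0042
    "sex":            {"type": str,   "codes": {"1", "2", "9"}},    # 1=male, 2=female, 9=unknown
    "weight_kg":      {"type": float, "min": 20.0, "max": 300.0},   # measurement unit: kilograms
}

def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one data record."""
    problems = []
    for field, spec in FIELD_SPEC.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
            continue
        if "format" in spec and not re.match(spec["format"], str(value)):
            problems.append(f"{field}: '{value}' does not match the required format")
        if "codes" in spec and str(value) not in spec["codes"]:
            problems.append(f"{field}: '{value}' is not an allowed code")
        if "min" in spec and float(value) < spec["min"]:
            problems.append(f"{field}: {value} is below the minimum {spec['min']}")
        if "max" in spec and float(value) > spec["max"]:
            problems.append(f"{field}: {value} is above the maximum {spec['max']}")
    return problems

print(check_record({"participant_id": "P0042", "sex": "3", "weight_kg": 72.5}))
# -> ["sex: '3' is not an allowed code"]
Keeping the specification in one place also makes it easy to hand off responsibility for data quality: the designated person maintains the specification rather than ad hoc rules scattered across files.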
9. Quality Assurance v. Control
• QA: set of processes, procedures, and activities that are initiated prior to data collection to ensure the expected level of quality will be reached and data integrity will be maintained.
• QC: a system for verifying and maintaining a desired level of quality in a product or service.
http://c2.com/cgi/wiki?QualityAssuranceIsNotQualityControl
10. Quality Assurance in Practice
• CRF (data collection instrument) review & validation
• System/process testing & validation
• Training, education, communication of a team
• Standard Operating Procedures, Standard Operating Guidelines
• Site audits
Dept of Biostatistics – Data Management, IUSM
11. Quality Control in Practice
• Set of processes, procedures, and activities associated with monitoring, detection, and action during and after data collection.
• Examples:
– Errors in individual data fields
– Systematic errors
– Violation of protocol
– Staff performance issues
– Fraud or scientific misconduct
Dept of Biostatistics – Data Management, IUSM
12. Activity
Define data quality standards for the following variables:
• Age
• Height
• BMI
• Life satisfaction scale
• Number of close friends
Don’t forget to upload this to Box.
Suggested file name “Data Quality Standards”
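If you want to go one step beyond writing your standards in prose, the activity variables above can also be expressed directly as range checks. This is an illustrative Python sketch, not a prescribed answer for the activity; the numeric limits shown are placeholder assumptions that each study would set for itself.
# Illustrative only: one possible way to encode data quality standards for the
# activity variables as range checks. The numeric limits are assumptions.
STANDARDS = {
    "age":               (18, 100),   # years; study-specific eligibility range
    "height_cm":         (100, 230),  # centimetres
    "bmi":               (12, 70),    # kg/m^2
    "life_satisfaction": (1, 7),      # e.g. a 1-7 Likert-type scale
    "n_close_friends":   (0, 50),     # count; upper bound flags implausible values
}

def out_of_range(variable: str, value: float) -> bool:
    low, high = STANDARDS[variable]
    return not (low <= value <= high)

sample = {"age": 34, "height_cm": 1.72, "bmi": 24.1, "life_satisfaction": 9, "n_close_friends": 3}
for var, val in sample.items():
    if out_of_range(var, val):
        print(f"Check {var}: value {val} is outside {STANDARDS[var]}")
# Flags a height recorded in metres instead of centimetres and an impossible
# life satisfaction score.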
13. References
1. Department of Biostatistics – Data Management Team, Indiana University School of Medicine (2013). Data Management including REDCap. (provided via email)
2. Chapman, A. D. 2005. Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. ISBN 87-92020-03-8. http://www.gbif.org/resources/2829
3. DataONE Education Module: Data Quality Control and Assurance. DataONE. From http://www.dataone.org/sites/all/documents/L05_DataQualityControlAssurance.pptx
18. Activity
Draft data collection instrument
See document “DataMgmtLab-Spr14-CollectionCodingEntry_EX”
Don’t forget to upload this to Box.
Suggested file name “Data Collection Tool”
19. References
1. Brosh, A. 2010. Boyfriend doesn’t have ebola. Probably. http://hyperboleandahalf.blogspot.com/2010/02/boyfriend-doesnt-have-ebola-probably.html
23. Goals of Data Entry
• Publishable results!
– Valid data that are organized to support smooth analysis
• Easy to import into analytical program
• Minimize manipulations and errors
• Has a logical [data] structure
25. Activity
Draft data coding scheme for data entry
• Review data entry best practices document in Box
Don’t forget to upload this to Box.
Suggested file name “Coding Scheme”
26. References
1. DataONE Education Module: Data Entry and Manipulation. DataONE. From http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx
2. Tilmes, C. (2011). Data Management 101 for the Earth Scientist, presented at the AGU Workshop. From http://wiki.esipfed.org/index.php/2011AGUworkshop
3. Scott, T. (2012). Guidelines to Data Collection and Data Entry, Vanderbilt CRC Research Skills Workshop Series. From http://www.mc.vanderbilt.edu/gcrc/workshop_files/2012-09-07.pdf
29. Data Entry and Manipulation
Data Contamination
• Process or phenomenon, other than the one of interest, that affects the variable value
• Erroneous values
CC image by Michael Coghlan on Flickr
30. Data Entry and Manipulation
• Errors of Commission
o Incorrect or inaccurate data entered
o Examples: malfunctioning instrument, mistyped data
• Errors of Omission
o Data or metadata not recorded
o Examples: inadequate documentation, human error, anomalies in the field
CC image by Nick J Webb on Flickr
31. Data Entry and Manipulation
• Double entry
◦ Data keyed in by two independent people
◦ Check for agreement with computer verification
• Record a reading of the data and transcribe from the recording
• Use text-to-speech program to read data back
CC image by weskriesel on Flickr
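The "double entry" check on this slide can be automated with a few lines of code that compare the two independently keyed files cell by cell. A minimal sketch with pandas follows; it assumes both files contain the same records and column names, and the file names and the record_id key are hypothetical, not from the lab materials.
# Minimal double-entry comparison sketch (assumes pandas is installed and that
# entry1.csv / entry2.csv are the two independently keyed versions of the data,
# with identical record IDs and column names).
import pandas as pd

first  = pd.read_csv("entry1.csv").set_index("record_id").sort_index()
second = pd.read_csv("entry2.csv").set_index("record_id").sort_index()

# compare() returns only the cells where the two entries disagree,
# labelled "self" (first entry) and "other" (second entry).
disagreements = first.compare(second)

if disagreements.empty:
    print("The two entries agree on every field.")
else:
    print(f"{len(disagreements)} record(s) with discrepancies to resolve:")
    print(disagreements)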
32. Data Entry and Manipulation
• Design data storage well
◦ Minimize the number of times items must be entered repeatedly
◦ Use consistent terminology
◦ Atomize data: one cell per piece of information
• Document changes to data
◦ Avoids duplicate error checking
◦ Allows undo if necessary
33. Data Entry and Manipulation
• Make sure data line up in proper columns
• No missing, impossible, or anomalous values
• Perform statistical summaries
CC image by chesapeakeclimate on Flickr
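A quick screening pass like the one on this slide is easy to script. The sketch below is illustrative only; the file name study_data.csv and the record_id column are placeholder assumptions. It checks for missing values and prints statistical summaries that make impossible or anomalous values stand out.
# Illustrative screening pass: missing values plus statistical summaries.
# File and column names are placeholders for your own dataset.
import pandas as pd

df = pd.read_csv("study_data.csv")

# Missing values per column
print("Missing values per column:")
print(df.isna().sum())

# Statistical summaries: min/max quickly expose impossible values
# (e.g. a negative age or a height of 0).
print(df.describe(include="all"))

# Duplicate identifiers often indicate rows that did not line up correctly.
print("Duplicate IDs:", df["record_id"].duplicated().sum())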
34. Data Entry and Manipulation
• Look for outliers
◦ Outliers are extreme values for a variable given the statistical model being used
◦ The goal is not to eliminate outliers but to identify potential data contamination
[scatter plot illustrating an outlying value]
35. Data Entry and Manipulation
• Methods to look for outliers
◦ Graphical
• Normal probability plots
• Regression
• Scatter plots
◦ Maps
◦ Subtract values from mean
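The last method listed above ("subtract values from mean") is essentially a z-score screen, and the graphical checks can be produced with a few lines as well. Below is an illustrative Python sketch; the column name, sample values, and the cut-off are assumptions, and (as the previous slide notes) the flagged points are candidates for review, not deletion.
# Illustrative outlier screen: deviation from the mean (z-scores) plus a scatter plot.
# Assumes pandas and matplotlib are installed; "value" is a placeholder column name.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"value": [12, 14, 13, 15, 14, 13, 55, 12, 14, 13]})

# Subtract values from the mean and scale by the standard deviation.
z = (df["value"] - df["value"].mean()) / df["value"].std()
candidates = df[z.abs() > 2.5]   # the cut-off is a judgment call; review, don't delete
print(candidates)

# Graphical check: a simple scatter plot makes the extreme point obvious.
plt.scatter(df.index, df["value"])
plt.xlabel("observation")
plt.ylabel("value")
plt.show()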
36. Data Entry and Manipulation
• Data contamination is data affected by a process or factor other than the one of interest, resulting in altered data values
• Data error types: commission or omission
• Quality assurance and quality control are strategies for
◦ preventing errors from entering a dataset
◦ ensuring data quality for entered data
◦ monitoring and maintaining data quality throughout the project
• Identify and enforce quality assurance and quality control measures throughout the Data Life Cycle
37. Discussion
Using the Data Review Checklist, evaluate the HBSC codebook “DataMgmtLab-Spr14_DataReviewChecklist_EX”
What screening & cleaning procedures were used?
38. Data Entry and Manipulation
1. D. Edwards, in Ecological Data: Design, Management and Processing, WK Michener and JW Brunt, Eds. (Blackwell, New York, 2000), pp. 70-91. Available at www.ecoinformatics.org/pubs
2. R. B. Cook, R. J. Olson, P. Kanciruk, L. A. Hook, Best practices for preparing ecological data sets to share and archive. Bull. Ecol. Soc. Amer. 82, 138-141 (2001).
3. A. D. Chapman, “Principles of Data Quality: Report for the Global Biodiversity Information Facility” (Global Biodiversity Information Facility, Copenhagen, 2004). Available at http://www.gbif.org/communications/resources/print-and-online-resources/download-publications/bookelets/
39. References
1. Cook, 2013, NACP Best Data Management Practices Workshop. From http://daac.ornl.gov/NACP_AIM_2013/04_data_management_cook_2013.02.03.ppt
2. Simmhan, Y. L., Plale, B., & Gannon, D. (2005). A survey of data provenance in e-Science. SIGMOD Record, 34(3), 31-36. From http://www.sigmod.org/publications/sigmod-record/0509/p31-special-sw-section-5.pdf
3. Ram, S. (2012). Emerging Role of Social Media in Data Sharing and Management. From http://www.slideshare.net/INSITEUA/provenance-management-to-enable-data-sharing
42. Choose your tools wisely
• Documents
• Excel
• Access
• SPSS, Minitab
• Mathematica, MATLAB, Scilab
• SAS, Stata
• R
• MapReduce
• NVivo, Atlas.ti, Dedoose, HyperRESEARCH, etc.
http://www.dataone.org/all-software-tools
43. Data Formats; Version 1.0
Overview
• Spreadsheets are amazingly flexible, and are commonly used for data collection, analysis and management
• Spreadsheets are seldom self-documenting, and seldom well-documented
• Subtle (and not so subtle) errors are easily introduced during entry, manipulation and analysis
• Spreadsheet conventions – often ad hoc and evolutionary – may change or be applied inconsistently
• Spreadsheet file formats are proprietary and thus generally unacceptable for long-term archival purposes
44. Data Entry and Manipulation
Spreadsheets:
• Great for charts, graphs, calculations
• Flexible about cell content type: cells in the same column can contain numbers or text
• Lack record integrity (can sort a column independently of all others)
• Easy to use – but harder to maintain as complexity and size of data grows
Databases:
• Easy to query to select portions of data
• Data fields are typed – for example, only integers are allowed in integer fields
• Columns cannot be sorted independently of each other
• Steeper learning curve than a spreadsheet
45. NACP Best Data Management Practices, February 3, 2013
5. Preserve information (cont)
• Use a scripted language to process data
– R Statistical package (free, powerful)
– SAS
– MATLAB
• Processing scripts are records of processing
– Scripts can be revised, rerun
• Graphical User Interface-based analyses may seem easy, but don’t leave a record
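The slide recommends R, SAS, or MATLAB, but the point holds in any scripted language. Below is an illustrative Python sketch of a processing step written as a script so that the transformation itself becomes part of the record; the file and column names are placeholders, not from the lab. Because the script can be re-read and rerun, it also serves as part of the provenance and audit trail discussed on the next slide.
# Illustrative processing script: every step is written down, so the script
# itself documents how the derived file was produced and can be rerun later.
# Input/output file and column names are placeholders.
import pandas as pd

RAW_FILE   = "raw_measurements.csv"
CLEAN_FILE = "derived_measurements_v2.csv"

df = pd.read_csv(RAW_FILE)

# Step 1: drop records flagged as test entries during collection.
df = df[df["is_test"] == 0]

# Step 2: convert height from centimetres to metres for analysis.
df["height_m"] = df["height_cm"] / 100

# Step 3: write the derived dataset; the raw file is never modified.
df.to_csv(CLEAN_FILE, index=False)
print(f"Wrote {len(df)} records to {CLEAN_FILE}")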
46. Provenance, Audit Trails, etc.
• “…information that helps determine the derivation history of a data product, starting from its original sources.” (Simmhan et al, 2005)
– Ancestral data products from which the data evolved
– Process of transformation of these ancestral data products
• Uses: data quality, audit trail, replication recipe, attribution, informational
47. More Considerations
• Field names & descriptions
• Structured entry
• Validation
• Record integrity
• Missing data
• Data/field types
• File types: common, open documented standard
• Output required for analysis and visualization
48. Demonstration & Discussion
Run [analysis] in Excel and Stata.
Compare output.
• What features does Stata have that Excel does not?
• How do these features support provenance and data integrity?
49. References
1. DataONE Education Module: Data Entry and Manipulation. DataONE. From http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx