Submit Search
Upload
Computing Workflows for Biologists: An Overview
•
0 likes
•
266 views
T
tracykteal
Follow
Talk based on paper from Shade and Teal on Computing Workflows for Biologists: A Roadmap
Read less
Read more
Data & Analytics
Report
Share
Report
Share
1 of 20
Download now
Download to read offline
Recommended
Responsible conduct of research: Data Management
Responsible conduct of research: Data Management
C. Tobin Magle
Data processing and analysis final
Data processing and analysis final
Akul10
Database Engine
Database Engine
prashanthbabu07
Introduction to Data Science
Introduction to Data Science
Caserta
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
Duncan Hull
Technical Presentation
Technical Presentation
Naito Watanabe
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
MLconf
Data Structure Assignment Help
Data Structure Assignment Help
Lesa Cote
Recommended
Responsible conduct of research: Data Management
Responsible conduct of research: Data Management
C. Tobin Magle
Data processing and analysis final
Data processing and analysis final
Akul10
Database Engine
Database Engine
prashanthbabu07
Introduction to Data Science
Introduction to Data Science
Caserta
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
Duncan Hull
Technical Presentation
Technical Presentation
Naito Watanabe
Josh Wills, MLconf 2013
Josh Wills, MLconf 2013
MLconf
Data Structure Assignment Help
Data Structure Assignment Help
Lesa Cote
Lecture 1 introduction
Lecture 1 introduction
Abirami A
The Data Analysis Workflow
The Data Analysis Workflow
JonathanEarley3
Presentation on data preparation with pandas
Presentation on data preparation with pandas
AkshitaKanther
Machine Learning using Big data
Machine Learning using Big data
Vaibhav Kurkute
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
Machine Learning in the age of Big Data
Machine Learning in the age of Big Data
Daniel Sârbe
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
Sri Ambati
Data preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
Data Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptx
charlslabarda
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
Ilkay Altintas, Ph.D.
Data Management: Tips & Tools
Data Management: Tips & Tools
Stephanie Wright
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Rachel Berryman
Building Data Scientists
Building Data Scientists
Mitch Sanders
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptx
CarolineRebeccaD
Data Management for librarians
Data Management for librarians
C. Tobin Magle
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Maninda Edirisooriya
The Art of Requesting Data from IT
The Art of Requesting Data from IT
Brad Adams
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217
lyarmey
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Adaryl "Bob" Wakefield, MBA
Introduction to Bioinformatics
Introduction to Bioinformatics
Leighton Pritchard
Best practices data collection
Best practices data collection
Sherry Lake
More Related Content
What's hot
Lecture 1 introduction
Lecture 1 introduction
Abirami A
The Data Analysis Workflow
The Data Analysis Workflow
JonathanEarley3
Presentation on data preparation with pandas
Presentation on data preparation with pandas
AkshitaKanther
Machine Learning using Big data
Machine Learning using Big data
Vaibhav Kurkute
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
Machine Learning in the age of Big Data
Machine Learning in the age of Big Data
Daniel Sârbe
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
Sri Ambati
What's hot
(7)
Lecture 1 introduction
Lecture 1 introduction
The Data Analysis Workflow
The Data Analysis Workflow
Presentation on data preparation with pandas
Presentation on data preparation with pandas
Machine Learning using Big data
Machine Learning using Big data
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Machine Learning in the age of Big Data
Machine Learning in the age of Big Data
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
Similar to Computing Workflows for Biologists: An Overview
Data preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
Data Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptx
charlslabarda
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
Ilkay Altintas, Ph.D.
Data Management: Tips & Tools
Data Management: Tips & Tools
Stephanie Wright
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Rachel Berryman
Building Data Scientists
Building Data Scientists
Mitch Sanders
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptx
CarolineRebeccaD
Data Management for librarians
Data Management for librarians
C. Tobin Magle
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Maninda Edirisooriya
The Art of Requesting Data from IT
The Art of Requesting Data from IT
Brad Adams
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217
lyarmey
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Adaryl "Bob" Wakefield, MBA
Introduction to Bioinformatics
Introduction to Bioinformatics
Leighton Pritchard
Best practices data collection
Best practices data collection
Sherry Lake
Datascience methodology
Datascience methodology
ArunakumariAkula1
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
Carly Strasser
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
Digital Transformation EXPO Event Series
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
Vivian S. Zhang
Hadoop for Data Science
Hadoop for Data Science
Donald Miner
Similar to Computing Workflows for Biologists: An Overview
(20)
Data preprocessing using Machine Learning
Data preprocessing using Machine Learning
Data Processing DOH Workshop.pptx
Data Processing DOH Workshop.pptx
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
Data Management: Tips & Tools
Data Management: Tips & Tools
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
Building Data Scientists
Building Data Scientists
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Management for librarians
Data Management for librarians
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
Lecture 3 - Exploratory Data Analytics (EDA), a lecture in subject module Sta...
The Art of Requesting Data from IT
The Art of Requesting Data from IT
CSU-ACADIS_dataManagement101-20120217
CSU-ACADIS_dataManagement101-20120217
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Introduction to Bioinformatics
Introduction to Bioinformatics
Best practices data collection
Best practices data collection
Datascience methodology
Datascience methodology
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
Lean Analytics: How to get more out of your data science team
Lean Analytics: How to get more out of your data science team
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
Hadoop for Data Science
Hadoop for Data Science
More from tracykteal
Carpentries nasa meeting_2018-10-30
Carpentries nasa meeting_2018-10-30
tracykteal
Data and Software Carpentry Science Gateways webinar 2017-05-10
Data and Software Carpentry Science Gateways webinar 2017-05-10
tracykteal
Data Carpentry NSBE Informational Webinar
Data Carpentry NSBE Informational Webinar
tracykteal
Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24
tracykteal
Data carpentry run-a-workshop
Data carpentry run-a-workshop
tracykteal
Data carpentry instructor-onboarding
Data carpentry instructor-onboarding
tracykteal
Data carpentry ndic-2015-05-05
Data carpentry ndic-2015-05-05
tracykteal
More from tracykteal
(7)
Carpentries nasa meeting_2018-10-30
Carpentries nasa meeting_2018-10-30
Data and Software Carpentry Science Gateways webinar 2017-05-10
Data and Software Carpentry Science Gateways webinar 2017-05-10
Data Carpentry NSBE Informational Webinar
Data Carpentry NSBE Informational Webinar
Data carpentry replicathon_2017-03-24
Data carpentry replicathon_2017-03-24
Data carpentry run-a-workshop
Data carpentry run-a-workshop
Data carpentry instructor-onboarding
Data carpentry instructor-onboarding
Data carpentry ndic-2015-05-05
Data carpentry ndic-2015-05-05
Recently uploaded
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
Lars Albertsson
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Boston Institute of Analytics
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
Neil Barnes
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
ffjhghh
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
Suhani Kapoor
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
Stephen266013
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
sapnasaifi408
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
atducpo
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Social Samosa
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
TanveerAhmed817946
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
makika9823
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
Suhani Kapoor
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
Boston Institute of Analytics
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
dajasot375
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Rachmat Ramadhan H
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
Anupama Kate
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Sapana Sha
Recently uploaded
(20)
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Computing Workflows for Biologists: An Overview
1.
Compu&ng Workflows for Biologists Based on: Shade & Teal, Compu&ng Workflows for Biologists: A Roadmap, PLOS Biology Data Carpentry data organiza&on lessons
2.
• How many people here plan to analyze data with a computer in their work? • Are you working with other people on this analysis? •
Do other people need to understand your analysis? • Do you need to remember and understand your analysis?
3.
Elements of compu&ng • How data was generated (metadata) • Data •
Data cleaning steps • Data analysis steps • Final plots and charts
4.
Data! • Keep raw data raw • Use meaningful names •
Organize your data so computers can read it
5.
Keep raw data raw • What is raw data? • Why should I leave it alone?
6.
Use meaningful names
7.
Organize your data so computers can read it (let’s talk about spreadsheets) hTp://www.datacarpentry.org/spreadsheet-ecology-lesson/00-intro.html … also avoid formaZng errors
8.
Organizing data in spreadsheets The cardinal rules of using spreadsheet programs for data: • Put all your variables in columns - the thing you're measuring, like 'weight' or 'temperature'. • Put each observa/on in its own row. •
Don't combine mul/ple pieces of informa/on in one cell. Some&mes it just seems like one thing, but think if that's the only way you'll want to be able to use or sort that data. • Leave the raw data raw - don't mess with it! • Export the cleaned data to a text based format like CSV. This ensures that anyone can use the data, and is the format required by most data repositories.
9.
10.
FormaZng problems hTp://www.datacarpentry.org/spreadsheet- ecology-lesson/02-common-mistakes.html
11.
A Roadmap for the Compu&ng Biologist • Consider the overarching goals of the analysis • Adopt an Itera&ve, Branching PaTern to Systema&cally Explore Op&ons •
Reproducibility Checkpoints • Taking Notes for Computa&onal Analysis • Shared Responsibility: The Team Approach to Reproducibility and Data Management Shade and Teal, Compu&ng Workflows for Biologists: A Roadmap hTp://journals.plos.org/plosbiology/ar&cle?id=10.1371/journal.pbio.1002303
12.
Consider the Overarching Goals of the Analysis • Working to address a given hypothesis will mo&vate different analysis strategies than conduc&ng data explora&on
13.
Reproducibility Checkpoints Reproducibility checkpoints are places in a workflow devoted to scru&nizing its integrity - the workflow (or step in the workflow) can be seamlessly used (it doesn’t crash halfway or return error messages) - the outcomes are consistent and validated across mul&ple, iden&cal itera&ons -
results should make biological sense
14.
Adopt an Itera/ve, Branching PaFern to Systema/cally Explore Op/ons
15.
Taking Notes for Computa/onal Analysis • Take notes like you would for experimental work • Comment code •
Use version control (Github/Gitlab)
16.
What needs to go in notes: - Soiware versions used - Descrip&on of what the soiware is doing/goal of that step -
Brief notes on devia&ons from default op&ons - Workflows can include different soiware (e.g., PANDAseq to QIIME to R), and should also include all “formaZng steps” needed to move between tools hopefully you don’t need to manually format too much; avoid if possible
17.
Shared Responsibility: The Team Approach to Reproducibility and Data Management We posit that integrity in computa&onal analysis of biological data is enhanced if there is a sense of shared responsibility for ensuring reproducible workflows. Research teams that work together to develop and debug code, perform internal reproducibility checkpoints for each other, and generally hold one another accountable for high-quality results likely will enjoy a low manuscript retrac&on rate, high level of confidence in their results, and strong sense of collabora&on. You, your lab mates and PI need to value the &me it takes to do analyses reproducibly and correctly
18.
Shared responsibility • Shared storage and workspace can facilitate access to all group data • Using version control repositories can provide access to code and documenta&on (Github, Dropbox) •
SeZng expecta&ons for ‘reproducibility checkpoints’ (team “hackathons”: open-computer group mee&ngs dedicated to analysis) • Paper reviews • Looking for help/support outside the lab (bioinforma&cs or user groups, office hours, StackOverflow)
19.
Looking for help hTps://github.com/mblmicdiv/course2016/ blob/master/bioinfo-resources.md You are not alone Survey responses
20.
Exercise hFp:///nyurl.com/mbl-workflows
Download now