#BeyondTheBench
#BECareer2013
#CurrentExchange

ORGANIZERS:

SPONSORS:
Establishing your online presence
Robert Aboukhalil
Why?
You’re being Googled
#1: LinkedIn
Why LinkedIn?
•  Online CV + networking
•  Recruiters use LinkedIn
•  Find jobs posted on LinkedIn
•  Apply to jobs
www.linkedin.com/pub/robert-aboukhalil/84/a648df/
#2: Facebook
#3: Twitter
#4: Your website
Step 1: Wordpress.com
Step 2: themeforest.net
Step 3: Have an awesome portfolio
Now what?
A language all scientists should know
How R helped me look at billions of genotypes and how it can
help you too
Mitchell Bekritsky
WSBS Graduate Student
What is R?
•  Language for statistical
analysis, data manipulation
and graphics
•  Open source
•  Flexible language
•  Powerful built-in functions
•  Strong user community
•  Publication quality graphs
•  Free!

Graphic from http://blenditbayes.blogspot.com/2013/06/visualising-crime-hotspots-in-england_25.html
Who uses R?

Source: http://www.revolutionanalytics.com/what-is-open-source-r/companies-using-r.php
What is R used for?
•  Movie recommendations

•  Clinical drug development

•  Credit risk analysis

•  News graphics

•  Tailoring online advertising

•  Modeling oil spills

•  Predicting economic activity

•  Predicting election outcomes

Graphic from http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
But I’m a biologist…
How R helped me see my data
•  First time looking at microsatellite genotypes
•  How many microsatellites differ from reference genome?
•  By how much?
Problems:
–  Lots of data (4.7 million genotypes)
–  Complex information
–  Too big for Excel
–  No good graphics in Excel either
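A minimal sketch of what such a first look might involve in R. The data are simulated, and the column names (ref_len, obs_len, diff) and the mix of differences are assumptions made for illustration, not the data or code from the talk.

```r
# Simulated stand-in for a genotype table (the real one had ~4.7 million rows).
set.seed(1)
n <- 10000
genotypes <- data.frame(
  ref_len = sample(10:40, n, replace = TRUE)  # reference allele length (bp)
)

# Assumed mix: most loci match the reference, a minority differ by a few bp.
shift <- sample(c(0, -2, 2, -4, 4), n, replace = TRUE,
                prob = c(0.90, 0.04, 0.04, 0.01, 0.01))
genotypes$obs_len <- genotypes$ref_len + shift

# Difference from the reference, in base pairs.
genotypes$diff <- genotypes$obs_len - genotypes$ref_len

# How many differ from the reference, and by how much?
pct_differ  <- 100 * mean(genotypes$diff != 0)
diff_counts <- table(genotypes$diff[genotypes$diff != 0])

# A quick (possibly ugly) first graph of the non-zero differences.
barplot(diff_counts,
        xlab = "Difference from reference (bp)",
        ylab = "Number of genotypes")
```

Swapping the simulated data frame for a real table read with read.table() or read.csv() would leave the rest of the sketch unchanged.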
One of my first graphs in R
Lessons learned about my data
•  Lots of microsatellites differ
from reference by a little bit
•  Thousands differ by ± 20 bp
•  8.27% of all microsatellites
differ from reference (~400k)
Lessons learned about my graph
•  This is a terrible graph
A bad R graph is better than no R graph
Bad graphs helped me
•  Understand my data better
•  Improve my analyses
•  Improve how I communicate
my data
•  R has incredible flexibility for
graphing—if you can dream it,
you can probably build it
My best R graphs make one point clearly without clutter
For example…
How R saved my thesis
•  Processing lots of sequencing
data in hundreds of people
•  Too many people and
processes to monitor all steps
of pipeline by eye while data
was being processed
Sanity check
•  After data processing, did the data
look bi-allelic?
No!!	
  
Troubleshooting using R
•  People don’t actually have massive deletions and amplifications
•  My pipeline was deleting files because of a bug, which would
remove large chunks of chromosomes
•  Thanks to R, I found people where this had happened, tracked
down the bug, and didn’t report massive CNVs in autistic children
Side note
•  If it looks too good to be true, it probably is
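A sanity check of this kind might be sketched as follows. Everything here is hypothetical: the per-person locus counts are simulated, the names are invented, and the "below half the median" threshold is an arbitrary choice, not the check used in the talk.

```r
# Simulated number of genotyped loci per person and chromosome.
set.seed(2)
people <- paste0("person", 1:50)
chroms <- paste0("chr", 1:5)
counts <- matrix(rpois(length(people) * length(chroms), lambda = 1000),
                 nrow = length(people),
                 dimnames = list(people, chroms))

# Mimic the bug: one person lost most of one chromosome.
counts["person7", "chr3"] <- 200

# Compare each count to the median for that chromosome across all people.
chrom_median <- apply(counts, 2, median)
ratio <- sweep(counts, 2, chrom_median, "/")

# Flag anything below half the typical count (arbitrary threshold).
flagged  <- which(ratio < 0.5, arr.ind = TRUE)
suspects <- data.frame(person = rownames(counts)[flagged[, "row"]],
                       chrom  = colnames(counts)[flagged[, "col"]])
print(suspects)
```

Comparing each person against the cohort median means no one has to watch every step of the pipeline by eye; the outliers identify themselves.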
R helped me build a better genotyper
•  Some non-reference alleles
aren’t covered well
•  Leads to incorrect genotype
calls
Problem
•  How do I develop a smarter
genotyper and know that it
works?
[Figure: chr19:54772760, A repeat, reference length 8. Scatter plot of 10 bp allele coverage (y-axis, 0 to 100) against 8 bp allele coverage (x-axis, 0 to 100), colored by genotype: 10|-1, 10|10, 8|-1, 8|10, 8|8]
Modeling genotypes in R
•  Built a model for biased
genotypes in R
•  Model helped me build a more
accurate genotyper
•  When applied to real data,
clear improvements
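The talk does not describe the model itself. As a stand-in, here is a small simulation under an assumed binomial model: at a heterozygous 8|10 locus, each read comes from the 10 bp allele with probability p_long, and a naive caller reports that allele only if it has at least min_reads reads. All names and parameter values are made up.

```r
# Assumed model, not the one from the talk: reads at a heterozygous 8|10
# locus come from the 10 bp allele with probability p_long.
set.seed(3)
n_loci <- 5000   # simulated heterozygous loci
depth  <- 30     # total reads per locus

simulate_miss_rate <- function(p_long, min_reads = 5) {
  long_reads <- rbinom(n_loci, size = depth, prob = p_long)
  # Fraction of loci where the naive caller would miss the 10 bp allele.
  mean(long_reads < min_reads)
}

unbiased <- simulate_miss_rate(0.5)  # both alleles covered equally
biased   <- simulate_miss_rate(0.2)  # 10 bp allele under-covered

cat("Missed 10 bp allele, unbiased:", unbiased, "\n")
cat("Missed 10 bp allele, biased:  ", biased, "\n")
```

Simulating data where the true genotype is known is one way to check whether a change to a genotyper actually reduces wrong calls.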
R finds de novo mutations for me
•  >300 million genotypes
•  How do I find de novo mutations in all that data?

R to the rescue!
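A toy version of the search, assuming a table with one row per locus and two allele lengths each for mother, father and child. The table layout, column names and values are invented for illustration; a real search would also need to account for coverage and genotype quality.

```r
# Toy trio genotypes (allele lengths in bp); all values are invented.
trios <- data.frame(
  locus    = c("locA", "locB", "locC", "locD"),
  mother_1 = c(8, 12, 20, 15), mother_2 = c(8, 14, 20, 15),
  father_1 = c(8, 12, 22, 15), father_2 = c(10, 12, 22, 17),
  child_1  = c(8, 12, 20, 15), child_2  = c(10, 16, 22, 17)
)

# An allele counts as "new" if it appears in neither parent.
is_new <- function(child, m1, m2, f1, f2) {
  child != m1 & child != m2 & child != f1 & child != f2
}

# A locus is a candidate if either of the child's alleles is new.
candidate <- with(trios,
  is_new(child_1, mother_1, mother_2, father_1, father_2) |
  is_new(child_2, mother_1, mother_2, father_1, father_2))

de_novo <- trios[candidate, ]
print(de_novo$locus)
```

Because the comparison is vectorized, the same few lines apply to a table with millions of rows, memory permitting.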
What R has done for me
Data mining
•  Finding de novo mutations
•  Quality control for my data
Data manipulation
•  Converting raw read counts to genotypes
Data simulation and modeling
•  Finding ways to improve my genotyper
Data visualization
R has extensive support for biologists
Bioconductor is an incredible resource for biological analyses in R
•  Microarrays
•  Differential expression (DESeq, edgeR, cummeRbund)
•  Gene models
•  Flow cytometry (flowCore, flowStats, flowViz)
•  Interacting with Ensembl, Cosmic, Gramene, etc. (biomaRt)
Installing R
•  R can be downloaded from r-project.org
•  R runs on PCs, Macs and
Linux computers
•  The R project website has an
R manual to get you started
Working in R
Native R interface can be hard to
work with
•  Lots of windows
•  Difficult to keep things
organized
RStudio interface
•  All your variables, help pages,
script windows and consoles
in one place
•  Highlights R code for easier
programming
•  Tabbed windows for multiple
scripts
•  History saves all previous
commands, plot history saves
all previous plots
•  Find it at rstudio.com
Learning R
Many online tutorials
•  R has its own introduction
•  Statistics Using R with Biological Examples
Take interesting data, use it to explore R
•  Plot, graph, use statistical tests
Ask someone who knows R
•  Getting started is pretty easy
•  Learn what you need when you need it
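One possible first session, using ToothGrowth, a small data set that ships with R (guinea pig tooth length by vitamin C supplement type and dose). The choice of data set and test is only an example of "plot, graph, use statistical tests".

```r
# ToothGrowth is built into R, so nothing needs to be downloaded.
data(ToothGrowth)
str(ToothGrowth)

# Plot: tooth length for each supplement type.
boxplot(len ~ supp, data = ToothGrowth,
        xlab = "Supplement", ylab = "Tooth length")

# Statistical test: do the two supplements differ in mean tooth length?
result <- t.test(len ~ supp, data = ToothGrowth)
print(result$p.value)
```

Typing ?t.test or ?boxplot at the console opens the built-in help page for each function.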
Thanks!!
The Bioscience Enterprise Club is dedicated to helping CSHL’s science research
professionals and alumni cultivate and leverage their cross-disciplinary skill sets and
expertise to transition into diverse careers.
Current Exchange is CSHL’s very own student-run magazine. We feature articles about
science aimed at a general audience. Check out our inaugural issue at
issuu.com/currentexchange
Send your articles to raboukha@cshl.edu by November 5, 2013

Beyond The Bench Workshops
