#BeyondTheBench
#BECareer2013
#CurrentExchange

Establishing your online presence
Robert Aboukhalil
Why?
You’re being Googled
#1: LinkedIn
Why LinkedIn?
•  Online CV + networking
•  Recruiters use LinkedIn
•  Find jobs posted on LinkedIn
•  Apply to jobs
www.linkedin.com/pub/robert-aboukhalil/84/a648df/
#2: Facebook
#3: Twitter
#4: Your website
Step 1: Wordpress.com
Step 2: themeforest.net
Step 3: Have an awesome portfolio
Now what?
A language all scientists should know
How R helped me look at billions of genotypes and how it can
help you too
Mitchell Bekritsky
WSBS Graduate Student
What is R?
•  Language for statistical
analysis, data manipulation
and graphics
•  Open source
•  Flexible language
•  Powerful built-in functions
•  Strong user community
•  Publication quality graphs
•  Free!

Graphic from http://blenditbayes.blogspot.com/2013/06/visualising-crime-hotspots-in-england_25.html
Who uses R?

Source: http://www.revolutionanalytics.com/what-is-open-source-r/companies-using-r.php
What is R used for?
•  Movie recommendations

•  Clinical drug development

•  Credit risk analysis

•  News graphics

•  Tailoring online advertising

•  Modeling oil spills

•  Predicting economic activity

•  Predicting election outcomes

Graphic from http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html
But I’m a biologist…
How R helped me see my data
•  First time looking at microsatellite genotypes
•  How many microsatellites differ from reference genome?
•  By how much?
Problems:
–  Lots of data (4.7 million genotypes)
–  Complex information
–  Too big for Excel
–  No good graphics in Excel either
One of my first graphs in R
Lessons learned about my data
•  Lots of microsatellites differ
from reference by a little bit
•  Thousands differ by ± 20 bp
•  8.27% of all microsatellites
differ from reference (~400k)
Lessons learned about my graph
•  This is a terrible graph
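A first look like the one described above can be sketched in a few lines of R. The data here are simulated and the column names (`ref_length`, `obs_length`) are hypothetical, since the talk does not show the real genotype table; the last line only checks the arithmetic quoted on the slide.

```r
# Simulated stand-in for a genotype table (the real data are not shown in the talk)
set.seed(1)
n <- 10000
ref_length <- sample(10:40, n, replace = TRUE)  # hypothetical reference lengths (bp)

# Assumption: most calls match the reference, a minority differ by a few bp
offset <- sample(c(0, -4, -2, 2, 4), n, replace = TRUE,
                 prob = c(0.92, 0.02, 0.02, 0.02, 0.02))
genotypes <- data.frame(ref_length = ref_length,
                        obs_length = ref_length + offset)

# How many microsatellites differ from the reference, and by how much?
genotypes$diff <- genotypes$obs_length - genotypes$ref_length
frac_diff <- mean(genotypes$diff != 0)
diff_counts <- table(genotypes$diff[genotypes$diff != 0])

# A first, rough graph of the differences
barplot(diff_counts,
        xlab = "Difference from reference (bp)",
        ylab = "Number of microsatellites")

# Arithmetic from the slide: 8.27% of 4.7 million genotypes
n_differ <- 0.0827 * 4.7e6  # about 388,690, i.e. roughly 400k
```

Even a rough bar plot like this answers both questions at once: how many loci differ, and by how much.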
A bad R graph is better than no R graph
Bad graphs helped me
•  Understand my data better
•  Improve my analyses
•  Improve how I communicate
my data
•  R has incredible flexibility for
graphing—if you can dream it,
you can probably build it
My best R graphs make one point clearly without clutter
For example…
How R saved my thesis
•  Processing lots of sequencing
data in hundreds of people
•  Too many people and
processes to monitor all steps
of pipeline by eye while data
was being processed
Sanity check
•  After data processing did data
look bi-allelic?
No!!	
  
Troubleshooting using R
•  People don’t actually have massive deletions and amplifications
•  My pipeline was deleting files because of a bug, which would
remove large chunks of chromosomes
•  Thanks to R, I found people where this had happened, tracked
down the bug, and didn’t report massive CNVs in autistic children
Side note
•  If it looks too good to be true, it probably is
R helped me build a better genotyper
•  Some non-reference alleles
aren’t covered well
•  Leads to incorrect genotype
calls
Problem
•  How do I develop a smarter
genotyper and know that it
works?
[Figure: chr19:54772760, A repeat, reference length 8. 10 bp allele coverage plotted against 8 bp allele coverage (both axes 0 to 100), with points labeled by genotype: 10|-1, 10|10, 8|-1, 8|10, 8|8]
Modeling genotypes in R
•  Built a model for biased
genotypes in R
•  Model helped me build a more
accurate genotyper
•  When applied to real data,
clear improvements
R finds de novo mutations for me
•  >300 million genotypes
•  How do I find de novo mutations in all that data?

R to the rescue!
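The talk does not show how the search was done. One simplified way to flag candidates, assuming genotypes are stored as allele lengths per family trio, is to look for child alleles that appear in neither parent. The table and its column names below are made up for illustration, and a real pipeline would also need to account for coverage and genotyping error.

```r
# Hypothetical trio genotypes: allele lengths (bp) at four loci
trios <- data.frame(
  locus    = c("locusA", "locusB", "locusC", "locusD"),
  father_1 = c(8, 12, 10, 15), father_2 = c(8, 14, 10, 15),
  mother_1 = c(8, 12, 10, 15), mother_2 = c(10, 12, 12, 17),
  child_1  = c(8, 12, 10, 15), child_2  = c(10, 16, 12, 17),
  stringsAsFactors = FALSE
)

# TRUE when a child allele is absent from all four parental alleles
not_in_parents <- function(child, f1, f2, m1, m2) {
  child != f1 & child != f2 & child != m1 & child != m2
}

# Flag loci where either child allele is not seen in the parents
trios$candidate <- with(trios,
  not_in_parents(child_1, father_1, father_2, mother_1, mother_2) |
  not_in_parents(child_2, father_1, father_2, mother_1, mother_2))

de_novo <- trios[trios$candidate, ]
print(de_novo$locus)
```

Because the comparison is vectorized, the same few lines apply unchanged to a much larger table, as long as it fits in memory.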
What R has done for me
Data mining
•  Finding de novo mutations
•  Quality control for my data
Data manipulation
•  Converting raw read counts to genotypes
Data simulation and modeling
•  Finding ways to improve my genotyper
Data visualization
R has extensive support for biologists
Bioconductor is an incredible resource for biological analyses in R
•  Microarrays
•  Differential expression (DESeq, edgeR, cummeRbund)
•  Gene models
•  Flow cytometry (flowCore, flowStats, flowViz)
•  Interacting with Ensembl, Cosmic, Gramene, etc. (biomaRt)
Installing R
•  R can be downloaded from r-project.org
•  R runs on PCs, Macs and
Linux computers
•  The R project website has an
R manual to get you started
Working in R
Native R interface can be hard to
work with
•  Lots of windows
•  Difficult to keep things
organized
RStudio interface
•  All your variables, help pages,
script windows and consoles
in one place
•  Highlights R code for easier
programming
•  Tabbed windows for multiple
scripts
•  History saves all previous
commands, plot history saves
all previous plots
•  Find it at rstudio.com
Learning R
Many online tutorials
•  R has its own introduction
•  Statistics Using R with Biological Examples
Take interesting data, use it to explore R
•  Plot, graph, use statistical tests
Ask someone who knows R
•  Getting started is pretty easy
•  Learn what you need when you need it
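As a starting point for "plot, graph, use statistical tests", here is a minimal sketch using `iris`, a dataset that ships with R. The choice of dataset and of a t-test is only an illustration, not something taken from the talk.

```r
# Load a dataset that ships with R
data(iris)

# Summarize one measurement
summary(iris$Sepal.Length)

# Plot two measurements against each other, colored by species
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")

# Statistical test: do two species differ in sepal length?
setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
result <- t.test(setosa, versicolor)
print(result$p.value)
```

Swapping in your own data frame and column names is usually the quickest way to learn what you need next.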
Thanks!!
The Bioscience Enterprise Club is dedicated to helping CSHL’s science research
professionals and alumni cultivate and leverage their cross-disciplinary skill sets and
expertise to transition into diverse careers.
Current Exchange is CSHL’s very own student-run magazine. We feature articles about
science aimed at a general audience. Check out our inaugural issue at issuu.com/currentexchange
Send your articles to raboukha@cshl.edu by November 5, 2013
  

Beyond The Bench Workshops
