RAPID PROTOTYPING DATA PRODUCTS USING SHINY
rstudio::conf 2018
2018-02-02
SPEAKER PROFILE
TANYA CASHORALI
@TANYACASH21
[Timeline graphic: 2004, 2005, 2006, 2007, 2012, 2013, 2014, 2015]
HUMANS ARE NOT FORTUNE TELLERS
• Missing data
• Outliers
• Nonlinearity
• Collinearity
• Delimiters!!
Of course I knew there wouldn’t be enough data in Oglala Lakota County when I wrote the 25-page requirements doc!
WE’RE NOT BUILDING ASTON MARTINS
“Laugh at perfection. It’s boring and keeps you from being done.”
THE DONE MANIFESTO
• http://www.manifestoproject.it/bre-pettis-and-kio-stark/
• https://www.bakadesuyo.com/2015/09/impostor-syndrome/
“Pretending you know what you’re doing is almost the same as knowing what you are doing, so just accept that you know what you’re doing even if you don’t and do it.”
There are three states of being:
1. Not knowing
2. Action
3. Completion.
CASE STUDIES
1.6 BILLION DOCUMENTS
Problem: enable scientists to quickly query 1.6 billion “documents” (SNP + phenotype combinations) and filter them by significance and other criteria.
CUSTOM RMONGO PACKAGE
• The RMongo package, built in Scala, did not support authentication for MongoDB 3.0
• So we built an RJMongo package using Java = ACTION!
• That same issue, originally reported in June 2015, still isn’t resolved
PERFORMANCE?
# Custom server-side filter: fetch only the requested page from MongoDB.
# qs, recordCount, and dataFromMongo() are not shown on the slide.
action <- dataTableAjax(session, result, rownames = FALSE,
  filter = function(data, params) {
    q <- params
    data <- dataFromMongo(qs, q$search, q$start, q$length, q$column, q$order)
    list(
      draw = as.integer(q$draw),
      recordsTotal = recordCount,
      recordsFiltered = recordCount,
      data = unname(as.matrix(data)),
      DT_rows_all = 5
    )
  })

# Table widget that pulls its rows from the Ajax URL registered above.
# Note: in DT, escape and filter are arguments of datatable() itself;
# here they are passed inside options, as on the original slide.
widget <- datatable(result,
  rownames = FALSE,
  class = 'display cell-border compact',
  selection = 'none',
  options = list(
    ajax = list(url = action),
    scrollX = TRUE, serverSide = TRUE, stateSave = TRUE,
    escape = FALSE, filter = FALSE, processing = TRUE,
    language = list(processing = "<img src='spin.gif'>"),
    columnDefs = list(list(targets = c(0:4, 6:25), sortable = FALSE)),
    order = list(list(5, 'asc'))
  )
)
* https://www.rdocumentation.org/packages/DT/versions/0.2/topics/dataTableAjax
To improve query performance… dataTableAjax() to the rescue!
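The filter function passed to dataTableAjax() receives the paging parameters sent by the browser and must return one page of rows. As a rough sketch of that contract, assuming an in-memory data frame in place of MongoDB (page_response and snps are made-up names, not from the talk):

```r
# Hypothetical helper (not from the slides): build the reply DataTables expects
# from a server-side source, returning only the requested page of rows.
page_response <- function(df, params) {
  total <- nrow(df)
  start <- as.integer(params$start)    # zero-based offset of the first row
  len   <- as.integer(params$length)   # page size; -1 means "all rows"
  if (len < 0) len <- total

  # Row indices for this page; empty when the offset is past the end
  rows <- if (len == 0 || start >= total) {
    integer(0)
  } else {
    seq(start + 1, min(start + len, total))
  }
  page <- df[rows, , drop = FALSE]

  list(
    draw            = as.integer(params$draw),  # echoed back to the client
    recordsTotal    = total,
    recordsFiltered = total,                    # no search filtering in this sketch
    data            = unname(as.matrix(page))
  )
}

# Usage with a small toy table standing in for the database
snps <- data.frame(snp = paste0("rs", 1:25), p_value = (1:25) / 100)
resp <- page_response(snps, list(draw = "3", start = "10", length = "10"))
```

Only the ten requested rows are converted and returned, which is the point of server-side processing: the browser never holds the full table.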
FIRST VERSION
“Accept that everything is a draft. It helps to get done.”
CURRENT PRODUCT
LET’S ADD 2.5 BILLION MORE!
• Single-node cluster with 512 GB of RAM
• Current data size: ~3 terabytes in JSON format
“Done is the engine of more.”
CMR API
Problem: API access to data from Centre for Medicines Research (CMR) International, which provides pharmaceutical industry metrics and trends analysis.
Issues:
• Clunky API
• Tons of parameter combinations, with results returned in aggregate
• Time-consuming
• IT dumped some of the data
• Slow
• Poor usability in their GUI (filters are clunky)
• Ineffective visualizations
• Data extracts contained limited detail and were difficult to use
CMR API
The first iteration was just ggplots, iterating with the client on which parameters were necessary (they don’t need thousands of indications)
AUTHENTICATION (PYTHON! GASP!)
“The point of being done is not to finish but to get other things done.”
HOW IT WORKS
• auth.py: get_token(), called from R via reticulate
• cmr_api.R: fetch_data(token, endpt, params)
• server.R, ui.R: the Shiny app
“Once you’re done you can throw it away.”
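The flow on this slide could look roughly like the sketch below. Only the names auth.py, get_token(), and fetch_data(token, endpt, params) come from the slide; the function bodies, the base URL, the bearer-token header, and the JSON reply format are assumptions made for illustration.

```r
# Sketch only: bodies, base URL, and the bearer-token header are assumptions.

# Load the Python helper and return get_token() as an R-callable function.
load_get_token <- function(path = "auth.py") {
  py_env <- new.env()
  reticulate::source_python(path, envir = py_env)
  py_env$get_token
}

# Call one API endpoint with the token and parse a JSON reply.
fetch_data <- function(token, endpt, params,
                       base_url = "https://cmr.example.com/api") {  # placeholder URL
  resp <- httr::GET(
    url = paste0(base_url, endpt),
    query = params,
    httr::add_headers(Authorization = paste("Bearer", token))
  )
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage, e.g. in server.R (needs auth.py and network access; the endpoint
# and parameter are hypothetical):
# get_token <- load_get_token()
# trends <- fetch_data(get_token(), "/metrics", list(year = 2017))
```

Keeping the Python call behind one small R function means the rest of the Shiny app never needs to know that authentication happens in Python.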
CURRENT PRODUCT
DRUG MANUFACTURING
• Many combinations of raw materials in specific order used to create final drug substance
• Time-consuming
• Costly
• One problematic substance = lost batches = millions of dollars
• Single user was running 100s of SQL queries manually
Throw out massive requirements docs
NETWORKD3
“People without dirty hands are wrong. Doing something makes you right.”
FIRST VERSION – CORE FUNCTIONALITY
“There is no editing stage.”
DETAILS COME LATER
“Failure counts as done. So do mistakes.”
SHINY AND D3 COMMUNICATION
server.R:
  session$sendCustomMessage(type = "jsondata", var_json)
www/main.js:
  Shiny.addCustomMessageHandler("jsondata", function (message) {
    if (typeof message !== 'undefined') {
      var json_data = JSON.parse(message);
      initTree(json_data.left);
      initSide(json_data.right);
    }
  });
ui.R:
  tags$script(src = "main.js")
• http://myinspirationinformation.com/visualisation/d3-js/integrating-d3-js-into-r-shiny/
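On the R side, var_json has to be a JSON string with left and right entries, because the handler calls JSON.parse(message) and then reads json_data.left and json_data.right. A minimal sketch of building such a payload with jsonlite follows; build_payload and the toy tree contents are made up for illustration and are not from the talk.

```r
library(jsonlite)

# Hypothetical helper: bundle the two trees into one JSON string.
# unclass() drops jsonlite's "json" class so the value is an ordinary
# character string, which the handler can pass to JSON.parse().
build_payload <- function(left, right) {
  unclass(toJSON(list(left = left, right = right), auto_unbox = TRUE))
}

# Toy data standing in for the real trees
var_json <- build_payload(
  left  = list(name = "drug substance",
               children = list(list(name = "raw material A"))),
  right = list(name = "batches",
               children = list(list(name = "batch 1")))
)

# Inside the Shiny server function, where `session` is available:
# session$sendCustomMessage(type = "jsondata", var_json)
```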
“FINAL” PRODUCT
Previously: 6 months and full team to identify problematic substance
Now: 1-2 users and 1 day to identify problematic substance
OVERVIEW OF RAPID PROTOTYPING PROCESS
IF WE WERE MAKING DONUTS
THANK YOU
Patrick Brophy
Daron Carlson
Mike Fitzpatrick
Roland Zhou
Olivia Brode-Roger
Rajesh Mikkilineni
Jason Tetrault
Marianna Foos

Editor's Notes

  1. R 2005 story
  2. Number 1 of the done manifesto
  3. Single nucleotide polymorphisms, frequently called SNPs (pronounced “snips”), are the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, called a nucleotide. For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide thymine (T) in a certain stretch of DNA. SNPs occur normally throughout a person’s DNA. They occur once in every 300 nucleotides on average, which means there are roughly 10 million SNPs in the human genome. Most commonly, these variations are found in the DNA between genes.
  4. We had authentication issues with RMongo and MongoDB 3.0. The package was built in Scala, so we rebuilt it in Java. The issue still wasn’t resolved 1 year later (I reported it in June 2015; it is still open today).
  5. It is basically an implementation of server-side processing for DataTables in R. We also set up auth using the company’s single sign-on.
  6. Full web dev team would take much longer
  7. 2 years later! It is still being used, and they want to expand on it. The Shiny infrastructure is already there, though.
  8. What are the latest trends in R&D productivity across the industry? What are the key factors that influence R&D productivity? How do different companies compare — with the industry, with competitors? What are the latest trends in industry pipeline volumes, cycle times and success rates – by therapeutic area and granular indications? What are the most effective and useful metrics for measuring and comparing R&D productivity across the global pharmaceutical industry? Are the timelines and success rates by therapy area being experienced by my company competitive with the rest of the industry and what are the drivers for above or below average performance?
  9. Add more charts
  10. Fastest way to get the data, python auth code example in their docs
  11. Refactor, not throw away
  12. networkD3 wasn’t enough; we needed more customization
  13. Needed a bi-directional tree. Colors showed up that the client didn’t know existed!
  14. Send a custom message to the front end. The handler listens for custom messages of type “jsondata”, then takes the contents of the message and assigns them to a JavaScript variable, in this case json_data.