#MDBW17
Jane Uyvova
Sr. Solutions Architect, MongoDB
USING R FOR ADVANCED
ANALYTICS WITH MONGODB
LEARNING OBJECTIVES
01 Aggregation Framework
How to design your MongoDB schema and utilize the aggregation framework for data preparation and enrichment.
02 R Connectors
How to connect R to your MongoDB environment. Understand the connectors available and be able to choose the right deployment topology for production.
03 Analytical Patterns
How to recognize the analytical patterns required and apply MongoDB and R together to deliver key insights in your organization.
DATA VS INSIGHT
big data is not valuable
insight is valuable
time-to-insight is critical
source of competitive advantage
Why R?
Language + Platform
• The most popular data science environment
• Wide variety of statistical and graphical techniques
• Open source / highly extensible
Community
• 2+ M users
• Taught in most universities
• Thriving user groups worldwide
Ecosystem
• 10,000+ contributed packages
• Rich application & platform integration
• Finance, Genetics, Social Sciences, Geospatial &
Why MongoDB?
Community + Ecosystem
• The most popular NoSQL database
• 20M+ downloads, 3500+ customers
• Open source
Speed
• Analytics against live operational systems
• Flexible schema to capture ALL data
• More / new / changing data types
Innovation
• Advanced capabilities beyond K-V
• Many industries and use cases
• Extreme developer productivity
AGGREGATION FRAMEWORK
Aggregation: $match $group $sort $limit
Left Outer Join: $lookup
Geospatial: $geoNear
Text Search & Collation: $meta
Faceted Navigation: $facet $bucket
Graph Processing: $graphLookup
Map Reduce
MONGODB AND R TOGETHER: USE CASES
Multi-genre analytics
churn analysis
fraud detection
drug discovery via mining genomes at scale
sentiment analysis
geospatial analysis
customer segmentation
predictive failure & maintenance
USE CASE 1: GENOME-WIDE ASSOCIATION ANALYSIS
GENOMICS 101
GENOME-WIDE ASSOCIATION STUDIES
Examination of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWASs typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases.
A Manhattan plot is a popular graphical method for visualizing results from high-dimensional data analysis, such as a genome-wide association study, in which p-values, Z-scores, or test statistics are plotted on a scatter plot against their genomic position. Manhattan plots are used for visualizing potential regions of interest in the genome that are associated with a phenotype.
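To make the Manhattan plot construction concrete, here is a minimal base-R sketch. It assumes a data frame with columns CHR, BP, and P (the column names manhattanly looks for by default) and uses simulated p-values, not HapMap results. The dashed line marks the conventional genome-wide significance threshold of 5e-8.

```r
set.seed(42)  # simulated data only; these are not HapMap results
n_per_chr <- 200
gwas <- data.frame(
  CHR = rep(1:5, each = n_per_chr),
  BP  = unlist(lapply(1:5, function(i) sort(sample.int(1e6, n_per_chr)))),
  P   = runif(5 * n_per_chr)
)

# Offset each chromosome so positions run left to right along one x axis
chr_len <- tapply(gwas$BP, gwas$CHR, max)
offset  <- c(0, cumsum(as.numeric(chr_len))[-length(chr_len)])
gwas$pos   <- gwas$BP + offset[gwas$CHR]
gwas$score <- -log10(gwas$P)  # small p-values become tall points

# Alternate colors by chromosome, as Manhattan plots usually do
plot(gwas$pos, gwas$score,
     col = c("grey30", "steelblue")[gwas$CHR %% 2 + 1],
     pch = 20, xlab = "Genomic position", ylab = "-log10(p)")
abline(h = -log10(5e-8), lty = 2)  # conventional genome-wide significance line
```

With real data, the same two derived columns (cumulative position and -log10 of the p-value) are what an interactive package such as manhattanly plots.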
DATA SET: HAPMAP
HapMap3: release 2 of genome-wide SNP genotyping, using 1,115 DNA samples from 11 human populations and a total of 1.6M SNPs
The p-values, z-scores, and effect sizes used are taken from relevant Prostate Cancer and Breast Cancer studies in the GWAS Catalog
Annotation information (nearest gene and distance to nearest gene) was obtained from the UCSC genome annotation database using
GENOME-WIDE ASSOCIATION STUDY:
DATA EXPLORATION
# load the packages used below
library(mongolite)
library(manhattanly)
# connect to the hapmap collection in the gene_annotation database
hapmap <- mongo("hapmap", db = "gene_annotation", url = "mongodb://localhost:27017")
# count the number of records
hapmap$count('{}')
# read all the data back into an R data frame
hapmap_data <- hapmap$find('{}')
# create a Manhattan plot for exploration
manhattanly(hapmap_data, snp = "SNP", gene = "GENE")
GENOME-WIDE ASSOCIATION STUDY:
ANNOTATION
# use Bioconductor libraries to annotate the genome: find nearest gene and distance to it
library(BSgenome.Hsapiens.UCSC.hg19)
genome <- BSgenome.Hsapiens.UCSC.hg19
seqlengths(genome)
chr5_sequence <- genome$chr5
# query MongoDB to zoom into the chromosomal region of interest
hapmap_range <- hapmap$find('{ "CHR": { "$gt": 3, "$lt": 8 } }')
# write the selected range to a newline-delimited JSON file, which can be loaded back into MongoDB
jsonlite::stream_out(hapmap_range, file("hapmap_range.json"))
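The same chromosomal zoom can also be expressed as an aggregation pipeline, so that MongoDB summarizes the region before anything reaches R. The sketch below only builds the pipeline as a JSON string. The helper name build_chr_summary_pipeline is hypothetical, and the field P (p-value) is an assumption alongside the CHR field used on the slide.

```r
# Hypothetical helper: builds a $match / $group / $sort / $limit pipeline.
# Field names CHR and P are assumptions about the collection's documents.
build_chr_summary_pipeline <- function(chr_min, chr_max, top_n = 5) {
  sprintf('[
  {"$match": {"CHR": {"$gt": %d, "$lt": %d}}},
  {"$group": {"_id": "$CHR", "n_snps": {"$sum": 1}, "min_p": {"$min": "$P"}}},
  {"$sort": {"min_p": 1}},
  {"$limit": %d}
]', as.integer(chr_min), as.integer(chr_max), as.integer(top_n))
}

pipeline <- build_chr_summary_pipeline(3, 8)
cat(pipeline, "\n")

# With a live mongolite connection (not run here):
# hapmap$aggregate(pipeline)
```

Grouping on the server returns one row per chromosome rather than every SNP, which matters when the collection holds on the order of a million documents.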
WHAT ELSE MIGHT YOU WANT TO KNOW?
MY GENOMICS: DNA SEQUENCE, REGION MAP, CLINICAL SIGNIFICANCE
RICH DOCUMENTS FOR GENOMIC DATA
Document sections: DNA SEQUENCE, REGION MAP, CLINICAL SIGNIFICANCE
Callouts: FIELDS; STRING; TYPED FIELD VALUES; FIELDS CAN CONTAIN AN ARRAY OF SUB-DOCUMENTS
SCHEMA DESIGN FOR GENOMIC ANALYSIS
{HapMap}
‒ EMBED: SNP, annotations, populations, p-values, effect size, phenotype, study
‒ $lookup: genes, mRNA, clinical conditions
‒ $graphLookup: gene & disease associations
‒ $bucket/$facet: clinical conditions, phenotypes
‒ $text search: scientific articles
‒ $geoNear: populations, epidemiology
‒ mapReduce: depth of coverage, Bayesian estimation, etc.
Collections: {My DNA}, {Clinical Condition}, {Gene}, {Scientific Study}, {HapMap}, {Sequence}
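As a rough illustration of how the join stages in this design could be combined, the sketch below assembles a $lookup stage and a $graphLookup stage into one pipeline string. Every collection and field name here (genes, symbol, gene_disease, related_genes, gene) is hypothetical; only the stage syntax comes from MongoDB itself.

```r
# All collection and field names below are hypothetical placeholders.
lookup_stage <- paste0(
  '{"$lookup": {"from": "genes", "localField": "GENE", ',
  '"foreignField": "symbol", "as": "gene_info"}}'
)
graph_stage <- paste0(
  '{"$graphLookup": {"from": "gene_disease", "startWith": "$GENE", ',
  '"connectFromField": "related_genes", "connectToField": "gene", ',
  '"as": "associations", "maxDepth": 2}}'
)

# A pipeline is a JSON array of stages, applied in order
pipeline <- paste0("[", paste(c(lookup_stage, graph_stage), collapse = ", "), "]")
cat(pipeline, "\n")

# With a live mongolite connection (not run here):
# hapmap$aggregate(pipeline)
```

The embedded fields stay in the {HapMap} document, while reference data that changes independently (genes, disease associations) is joined in only when a query needs it.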
USE CASE 2: VEHICLE SITUATIONAL AWARENESS
DATA SET: CHICAGO OPEN DATA
311 Service Requests
Abandoned Buildings
Potholes
Tree Trimming
Sanitation Code Complaints
Abandoned Vehicles
Garbage Carts
Tree Debris
Street Lights Out
Transportation
Street Closures
Red Light Camera Violations
Speed Camera Violations
Transportation Department Permits
Public Right-of-Way Use Permits
Events
Special Events Permits
Public Building Commission
Public Parks
VEHICLE SITUATIONAL AWARENESS:
DATA EXPLORATION
# import 311 service request data into mongo
mongoimport -d chicago -c street_closures
# create geoindex
# read data into R
mdb <- mongo("street_closures", db = "chicago", url = "mongodb://localhost:27017")
VEHICLE SITUATIONAL AWARENESS PLOT
# calculate all geopoints for city alerts
# get_points(), hyatt_map, and hyatt_marker are assumed to be defined earlier in the session
# (a radius of 0.0001567865 radians is about 1 km on the Earth's surface)
points <- get_points('{"location":{"$geoWithin":{"$centerSphere":[[-87.622772,41.887694], 0.0001567865]}}}')
# plot Hyatt Regency marker and all points on the map
hyatt_map +
  geom_point(
    aes(x = lons, y = lats),
    color = "red",
    alpha = 0.1,
    size = 2,
    data = points) +
  hyatt_marker
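The $centerSphere radius in the query above is given in radians, not kilometers. A minimal sketch of the conversion follows, assuming the 6378.1 km equatorial Earth radius that MongoDB's documentation uses. The helper names km_to_radians and center_sphere_query are hypothetical.

```r
earth_radius_km <- 6378.1  # equatorial radius, as used in MongoDB's documentation

# Hypothetical helper: distance on the sphere's surface -> angle in radians
km_to_radians <- function(km) km / earth_radius_km

# Hypothetical helper: builds a $geoWithin / $centerSphere query string
# for a field named "location" (the field name used on the slide)
center_sphere_query <- function(lon, lat, radius_km) {
  sprintf('{"location":{"$geoWithin":{"$centerSphere":[[%.6f,%.6f],%.10f]}}}',
          lon, lat, km_to_radians(radius_km))
}

# A 1 km circle around the coordinates from the slide
query <- center_sphere_query(-87.622772, 41.887694, 1)
cat(query, "\n")
```

Writing the radius as a distance and converting it in code makes it easier to widen or narrow the search area than editing the radian value by hand.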
VEHICLE SITUATIONAL AWARENESS
DENSITY MAP
# use stat_density2d from ggplot2 to estimate contours from discrete samples
hyatt_map +
  geom_density2d(
    data = points,
    aes(x = lons, y = lats),
    size = 0.3) +
  stat_density2d(
    data = points,
    aes(x = lons, y = lats,
        fill = ..level.., alpha = ..level..),
    size = 0.01,
    bins = 16,
    geom = "polygon") +
  scale_fill_gradient(low = "green", high = "red") +
  scale_alpha(range = c(0, 0.3), guide = FALSE)
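For readers who want to see the estimate behind the density layer, ggplot2's 2D density stat is built on MASS::kde2d. The sketch below calls it directly on simulated coordinates scattered around the slide's center point. The points are random draws, not Chicago data, and the grid size and spread are arbitrary choices.

```r
library(MASS)  # ships with standard R distributions

set.seed(1)  # simulated coordinates only; not Chicago open data
lons <- rnorm(500, mean = -87.6228, sd = 0.005)
lats <- rnorm(500, mean = 41.8877, sd = 0.005)

# Kernel density estimate evaluated on a 50 x 50 grid
dens <- kde2d(lons, lats, n = 50)

# Contour lines of the estimated density, roughly what the bins argument controls
contour(dens$x, dens$y, dens$z, nlevels = 16,
        xlab = "Longitude", ylab = "Latitude")
```

The result is a grid of density values (dens$z) over longitude and latitude; the map layers draw contours or filled polygons over that grid.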
WHAT ELSE MIGHT YOU WANT TO KNOW?
TRAFFIC, CITY DATA, WEATHER
RICH DOCUMENTS FOR GEOSPATIAL ANALYSIS
Document sections: WEATHER, ALERTS, ROUTES
Callouts: FIELDS; TYPED FIELD VALUES; REFERENCE OBJECT IDS; ARRAY OF SUB-DOCUMENTS
SCHEMA DESIGN FOR GEOSPATIAL & IOT
{MyTrip}
‒ EMBED: routes, service requests, public safety alerts
‒ $lookup: traffic, weather, bus routes, bikeshare
‒ $graphLookup: transportation network
‒ $text search: Twitter / social media
‒ $geoNear: restaurants, services, alerts
‒ mapReduce: analyze most efficient routes
Collections: {Weather}, {Twitter}, {Public Safety Alerts}, {Traffic}, {MyTrip}, {BikeShare}
ON SCALABILITY
From single machines to distributed computing
WHERE DOES SPARK FIT IN?
USING SPARK TO PARALLELIZE & SCALE R
SparkR
distributed data frame implementation
supports selection, filtering, and aggregation on large datasets
supports distributed machine learning using MLlib
SparkDataFrame
distributed & optimized collection of data organized into named columns
can be constructed from a wide array of sources, including MongoDB
SparkSession
entry point into SparkR, which connects your R program to a Spark cluster. You can use a SparkSession object to write data to MongoDB, read data from MongoDB, and perform SQL operations.
# import data into SparkR
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource",
              database = "chicago", collection = "three_eleven")
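A fuller sketch of the session setup around that read might look like the following. It assumes a local MongoDB instance, a Spark 2.x installation with SPARK_HOME set, and connector coordinates for Scala 2.11; all three are assumptions to adjust for your environment. The function name run_spark_example is hypothetical, and nothing runs unless SPARK_HOME is configured.

```r
# Assumptions: local MongoDB, Spark 2.x with SPARK_HOME set, and a connector
# build matching the Spark/Scala version in use.
mongo_uri <- "mongodb://127.0.0.1/chicago.three_eleven"
connector <- "org.mongodb.spark:mongo-spark-connector_2.11:2.0.0"

# Hypothetical wrapper so the steps can be read top to bottom
run_spark_example <- function() {
  library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

  # The SparkSession is the entry point; the connector reads its URIs from config
  sparkR.session(
    appName = "chicago-311",
    sparkConfig = list(spark.mongodb.input.uri = mongo_uri,
                       spark.mongodb.output.uri = mongo_uri),
    sparkPackages = connector
  )

  # Load the collection as a SparkDataFrame and inspect the inferred schema
  df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource")
  printSchema(df)

  # Register a temporary view so the data can be queried with SQL
  createOrReplaceTempView(df, "three_eleven")
  n <- collect(sql("SELECT COUNT(*) AS n FROM three_eleven"))

  sparkR.session.stop()
  n
}

# Only attempt the run when a Spark installation is configured
if (nzchar(Sys.getenv("SPARK_HOME"))) print(run_spark_example())
```

Setting the input URI once in the session configuration is an alternative to passing database and collection on each read.df call.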
MONGODB & SPARK ARCHITECTURE
BRING IT ALL TOGETHER
open standards
high adoption
iterative workflow -> fail fast
don’t throw ANY data away
scale out
WHAT WILL YOU SOLVE?
REFERENCES & THANKS
MongoDB 3.4 - https://docs.mongodb.com/manual/
Aggregation Framework - https://docs.mongodb.com/manual/aggregation/
MongoDB Compass - https://docs.mongodb.com/compass/current/
Mongo Spark Connector 2.0 - https://docs.mongodb.com/spark-connector/current/r-api/
Mongolite - https://github.com/jeroen/mongolite
Mongolite demo - https://github.com/ajdavis/three-eleven-mongolite-demo/blob/master/mongolite-demo.R
RStudio - https://www.rstudio.com/
Visualization: plot.ly - https://plot.ly/
manhattanly - http://sahirbhatnagar.com/manhattanly/
ggplot2 - https://cran.r-project.org/web/packages/ggplot2/
HapMap3 - ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/
Bioconductor - http://bioconductor.org/
City of Chicago - https://data.cityofchicago.org/
Special Thanks to Jeroen Ooms and A. Jesse Jiryu Davis
THANK YOU!
QUESTIONS?
jane.uyvova@mongodb.com
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

Webinar: Using R for Advanced Analytics with MongoDB

  • 1. #MDBW17 Jane Uyvova Sr. Solutions Architect, MongoDB USING R FOR ADVANCED ANALYTICS WITH MONGODB
  • 2. LEARNING OBJECTIVES
      ‒ 01 Aggregation Framework: how to design your MongoDB schema and use the aggregation framework for data preparation and enrichment.
      ‒ 02 R Connectors: how to connect R to your MongoDB environment, understand the connectors available, and choose the right deployment topology for production.
      ‒ 03 Analytical Patterns: how to recognize the analytical patterns required and apply MongoDB and R together to deliver key insights in your organization.
  • 3. DATA VS INSIGHT: big data is not valuable; insight is valuable; time-to-insight is critical; source of competitive advantage
  • 4. Why R?
      Slide labels: Language + Platform, Community, Ecosystem
      • The most popular data science environment
      • Wide variety of statistical and graphical techniques
      • Open source / highly extensible
      • 2+ M users
      • Taught in most universities
      • Thriving user groups worldwide
      • 10,000+ contributed packages
      • Rich application & platform integration
      • Finance, Genetics, Social Sciences, Geospatial &
  • 5. Why MongoDB?
      Slide labels: Community + Ecosystem, Speed, Innovation
      • The most popular NoSQL database
      • 20M+ downloads, 3500+ customers
      • Open source
      • Analytics against live operational systems
      • Flexible schema to capture ALL data
      • More / new / changing data types
      • Advanced capabilities beyond K-V
      • Many industries and use cases
      • Extreme developer productivity
  • 6. AGGREGATION FRAMEWORK
      ‒ Capabilities: Geospatial, Text Search & Collation, Aggregation, Left Outer Join, Graph Processing, Faceted Navigation, Map Reduce
      ‒ Stages and operators: $match, $group, $sort, $limit, $lookup, $geoNear, $meta, $facet, $bucket, $graphLookup
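The stages named on this slide can be chained into a single pipeline that runs on the server. The following is a minimal sketch using mongolite's `aggregate()` method. The `hapmap` collection and `gene_annotation` database come from the later slides, while the field names `CHR` and `P`, the 0.05 threshold, and the helper names `build_pipeline` and `summarise_by_chromosome` are assumptions made for illustration.

```r
# Build a $match -> $group -> $sort -> $limit pipeline as a JSON string.
# Field names (CHR, P) and the default threshold are illustrative assumptions.
build_pipeline <- function(p_max = 0.05, top_n = 5) {
  sprintf('[
    {"$match": {"P": {"$lt": %s}}},
    {"$group": {"_id": "$CHR", "n_snps": {"$sum": 1}, "min_p": {"$min": "$P"}}},
    {"$sort":  {"n_snps": -1}},
    {"$limit": %d}
  ]', format(p_max, scientific = FALSE), as.integer(top_n))
}

# Run the pipeline on the server, so only the summarised rows come back to R.
# Requires the mongolite package and a running MongoDB instance.
summarise_by_chromosome <- function(url = "mongodb://localhost:27017/gene_annotation") {
  hapmap <- mongolite::mongo("hapmap", url = url)
  hapmap$aggregate(build_pipeline())
}
```

Calling `summarise_by_chromosome()` would return a data frame with one row per chromosome, which is usually far smaller than the raw collection.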
  • 7. MONGODB AND R TOGETHER: USE CASES
      Multi-genre analytics: churn analysis; fraud detection; drug discovery via mining genomes at scale; sentiment analysis; geospatial analysis; customer segmentation; predictive failure & maintenance
  • 8. USE CASE 1: GENOME- WIDE ASSOCIATION ANALYSIS
  • 12. GENOME-WIDE ASSOCIATION STUDIES
      A genome-wide association study (GWAS) examines a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a trait. GWASs typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits such as major human diseases.
      A Manhattan plot is a popular graphical method for visualizing results from high-dimensional data analysis such as a GWAS: p-values, Z-scores, or test statistics are plotted on a scatter plot against their genomic position. Manhattan plots are used to visualize potential regions of interest in the genome that are associated with a phenotype.
  • 13. DATA SET: HAPMAP
      ‒ HapMap3: release 2 of genome-wide SNP genotyping using 1,115 DNA samples from 11 human populations, with 1.6M SNPs in total.
      ‒ The p-values, z-scores, and effect sizes used are taken from relevant prostate cancer and breast cancer studies using the GWAS Catalog.
      ‒ Annotation information (nearest gene and distance to nearest gene) was obtained from the UCSC genome annotation database using
  • 14. GENOME-WIDE ASSOCIATION STUDY: DATA EXPLORATION
      library(mongolite)    # MongoDB client for R
      library(manhattanly)  # interactive Manhattan plots
      # load data from mongo into R: connect to the hapmap collection
      hapmap <- mongo("hapmap", url = "mongodb://localhost:27017/gene_annotation")
      # count the number of records
      hapmap$count('{}')
      # read all the data back into an R data frame
      hapmap_data <- hapmap$find('{}')
      # create a Manhattan plot for exploration
      manhattanly(hapmap_data, snp = "SNP", gene = "GENE")
  • 15. GENOME-WIDE ASSOCIATION STUDY: ANNOTATION
      # use Bioconductor libraries to annotate the genome: find the nearest gene and the distance to it
      library(BSgenome.Hsapiens.UCSC.hg19)
      genome <- BSgenome.Hsapiens.UCSC.hg19
      seqlengths(genome)
      chr5 <- genome$chr5
      # query MongoDB to zoom into the chromosomal region of interest (chromosomes 4 to 7)
      hapmap_range <- hapmap$find('{ "CHR": { "$gt": 3, "$lt": 8 } }')
      # write the subset back into mongo as its own collection
      # (find() returns a data frame, which has no export method)
      hapmap_range_coll <- mongo("hapmap_range", url = "mongodb://localhost:27017/gene_annotation")
      hapmap_range_coll$insert(hapmap_range)
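Reading whole documents into R becomes expensive at the 1.6M SNPs mentioned for HapMap3, so it can help to push both the filter and the projection to the server. The sketch below assumes the field names `CHR`, `BP`, `P`, `SNP`, and `GENE` (the column names manhattanly looks for by default); `chr_range_query` and `fetch_region` are hypothetical helper names, while `query`, `fields`, and `limit` are arguments of mongolite's `find()`.

```r
# Build a range query on the chromosome field (exclusive bounds, as on the slide).
chr_range_query <- function(lo, hi) {
  sprintf('{"CHR": {"$gt": %d, "$lt": %d}}', as.integer(lo), as.integer(hi))
}

# Fetch only the columns needed for plotting.
# Requires the mongolite package and a running MongoDB instance.
fetch_region <- function(lo, hi,
                         url = "mongodb://localhost:27017/gene_annotation",
                         limit = 0) {
  hapmap <- mongolite::mongo("hapmap", url = url)
  hapmap$find(
    query  = chr_range_query(lo, hi),
    fields = '{"_id": 0, "CHR": 1, "BP": 1, "P": 1, "SNP": 1, "GENE": 1}',
    limit  = limit  # 0 means no limit
  )
}
```

For example, `fetch_region(3, 8, limit = 1000)` would return at most 1,000 rows from chromosomes 4 to 7, which is handy for checking a plot before loading the full region.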
  • 16. WHAT ELSE MIGHT YOU WANT TO KNOW? My genomics: DNA sequence, region map, clinical significance
  • 17. RICH DOCUMENTS FOR GENOMIC DATA
      ‒ Fields can contain an array of sub-documents
      ‒ Typed field values
      ‒ String fields
      ‒ Diagram labels: DNA sequence, region map, clinical significance
  • 18. SCHEMA DESIGN FOR GENOMIC ANALYSIS {HapMap}
      ‒ EMBED: SNP, annotations, populations, p-values, effect size, phenotype, study
      ‒ $lookup: genes, mRNA, clinical conditions
      ‒ $graphLookup: gene & disease associations
      ‒ $bucket/$facet: clinical conditions, phenotypes
      ‒ $text search: scientific articles
      ‒ $geoNear: populations, epidemiology
      ‒ mapReduce: depth of coverage, Bayesian estimation, etc.
      Collections: {My DNA} {Clinical Condition} {Gene} {Scientific Study} {HapMap} {Sequence}
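The `$lookup` entry in this schema can be illustrated with a left outer join from SNP documents to a gene collection. This is only a sketch under stated assumptions: the `genes` collection, its `symbol` field, the output field `gene_info`, and the helper names are hypothetical, and `CHR` and `GENE` are assumed field names in `hapmap`.

```r
# Pipeline: keep one chromosome, then attach matching gene documents.
# "genes", "symbol", and "gene_info" are hypothetical names for illustration.
lookup_pipeline <- function(chr) {
  sprintf('[
    {"$match": {"CHR": %d}},
    {"$lookup": {
      "from": "genes",
      "localField": "GENE",
      "foreignField": "symbol",
      "as": "gene_info"
    }}
  ]', as.integer(chr))
}

# Requires the mongolite package and a running MongoDB instance.
annotate_snps <- function(chr, url = "mongodb://localhost:27017/gene_annotation") {
  hapmap <- mongolite::mongo("hapmap", url = url)
  hapmap$aggregate(lookup_pipeline(chr))
}
```

Because `$lookup` is a left outer join, SNPs without a matching gene document are kept and simply carry an empty `gene_info` array.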
  • 19. USE CASE 2: VEHICLE SITUATIONAL AWARENESS
  • 20. DATA SET: CHICAGO OPEN DATA
      ‒ 311 Service Requests: Abandoned Buildings, Potholes, Tree Trimming, Sanitation Code Complaints, Abandoned Vehicles, Garbage Carts, Tree Debris, Street Lights Out
      ‒ Transportation: Street Closures, Red Light Camera Violations, Speed Camera Violations, Transportation Department Permits, Public Right-of-Way Use Permits
      ‒ Events: Special Events Permits, Public Building Commission, Public Parks
  • 21. VEHICLE SITUATIONAL AWARENESS: DATA EXPLORATION
      # import 311 service request data into mongo (shell command)
      mongoimport -d chicago -c street_closures
      # create geoindex
      # read data into R
      mdb <- mongo("street_closures", url = "mongodb://localhost:27017/chicago")
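The slide mentions creating a geo index without showing the call. One way to do it from R is mongolite's `index()` method with a JSON index specification, sketched below. The field name `location` is taken from the query on the next slide and is assumed to hold GeoJSON points; `geo_index_spec` and `create_geo_index` are hypothetical helper names.

```r
# JSON specification for a 2dsphere index on a GeoJSON field.
geo_index_spec <- function(field = "location") {
  sprintf('{"%s": "2dsphere"}', field)
}

# Create the index and return the collection's index listing.
# Requires the mongolite package and a running MongoDB instance.
create_geo_index <- function(url = "mongodb://localhost:27017/chicago") {
  mdb <- mongolite::mongo("street_closures", url = url)
  mdb$index(add = geo_index_spec())
}
```

A `$geoWithin` query runs without a geospatial index, though an index usually speeds it up; `$geoNear` requires one.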
  • 22. VEHICLE SITUATIONAL AWARENESS PLOT
      # get_points(), hyatt_map, and hyatt_marker are not defined on the slide
      # calculate all geopoints for city alerts
      points <- get_points('{"location":{"$geoWithin":{"$centerSphere":[[-87.622772,41.887694], 0.0001567865]}}}')
      # plot Hyatt Regency marker and all points on the map
      hyatt_map +
        geom_point(aes(x = lons, y = lats), color = "red", alpha = 0.1, size = 2, data = points) +
        hyatt_marker
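The radius passed to `$centerSphere` is in radians: 0.0001567865 is about 1 km divided by an Earth radius of 6378.1 km. The sketch below shows that conversion and one possible shape for the `get_points()` helper, which the slide does not define. It assumes each document stores a GeoJSON point in `location` with `coordinates` as `[longitude, latitude]`; the helper bodies and the name `within_query` are illustrative, not the presenter's code.

```r
EARTH_RADIUS_KM <- 6378.1  # equatorial radius, an assumed constant

# Convert a distance in kilometres to radians for $centerSphere.
km_to_radians <- function(km) km / EARTH_RADIUS_KM

# Build a $geoWithin/$centerSphere query around a longitude/latitude pair.
within_query <- function(lon, lat, km) {
  sprintf('{"location": {"$geoWithin": {"$centerSphere": [[%.6f, %.6f], %.10f]}}}',
          lon, lat, km_to_radians(km))
}

# Hypothetical get_points(): run the query and flatten coordinates into columns.
# Requires the mongolite package and a running MongoDB instance.
get_points <- function(query, url = "mongodb://localhost:27017/chicago") {
  mdb <- mongolite::mongo("street_closures", url = url)
  docs <- mdb$find(query, fields = '{"_id": 0, "location.coordinates": 1}')
  coords <- docs$location$coordinates
  # coordinates may arrive as a list of c(lon, lat) vectors or as a matrix
  m <- if (is.matrix(coords)) coords else do.call(rbind, coords)
  data.frame(lons = m[, 1], lats = m[, 2])
}
```

With these helpers, `get_points(within_query(-87.622772, 41.887694, 1))` would request the same 1 km circle as the query on the slide.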
  • 23. VEHICLE SITUATIONAL AWARENESS DENSITY MAP
      # use stat_density2d from ggplot2 to estimate contours from discrete samples
      hyatt_map +
        geom_density2d(data = points, aes(x = lons, y = lats), size = 0.3) +
        stat_density2d(data = points,
                       aes(x = lons, y = lats, fill = ..level.., alpha = ..level..),
                       size = 0.01, bins = 16, geom = "polygon") +
        scale_fill_gradient(low = "green", high = "red") +
        scale_alpha(range = c(0, 0.3), guide = FALSE)
  • 24. WHAT ELSE MIGHT YOU WANT TO KNOW? Traffic, city data, weather
  • 26. SCHEMA DESIGN FOR GEOSPATIAL & IOT {MyTrip}
      ‒ EMBED: routes, service requests, public safety alerts
      ‒ $lookup: traffic, weather, bus routes, bikeshare
      ‒ $graphLookup: transportation network
      ‒ $text search: Twitter / social media
      ‒ $geoNear: restaurants, services, alerts
      ‒ mapReduce: analyze most efficient routes
      Collections: {Weather} {Twitter} {Public Safety Alerts} {Traffic} {MyTrip} {BikeShare}
  • 28. WHERE DOES SPARK FIT IN? From single machines to distributed computing
  • 29. USING SPARK TO PARALLELIZE & SCALE R
      ‒ SparkR: distributed data frame implementation; supports selection, filtering, and aggregation on large datasets; supports distributed machine learning using MLlib.
      ‒ SparkDataFrame: distributed & optimized collection of data organized; can be constructed from a wide array of sources, including MongoDB.
      ‒ SparkSession: entry point into SparkR, which connects your R program to a Spark cluster. You can use a SparkSession object to write data to MongoDB, read data from MongoDB, and perform SQL operations.
      # import data into SparkR
      df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource", database = "chicago", collection = "three_eleven")
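The `read.df()` call on this slide needs a SparkSession that already knows where MongoDB is. The following sketch shows one way to set that up from R. The connection URI, the `three-eleven` application name, the connector coordinates `org.mongodb.spark:mongo-spark-connector_2.11:2.0.0`, and the helper names are assumptions; the connector build has to match your Spark and Scala versions.

```r
# Connector settings for the session. The default URI is an assumption and
# points at the chicago database and three_eleven collection from the slide.
mongo_spark_config <- function(uri = "mongodb://localhost:27017/chicago.three_eleven") {
  list(spark.mongodb.input.uri = uri, spark.mongodb.output.uri = uri)
}

# Start a session and load the collection as a SparkDataFrame.
# Requires the SparkR package, a Spark installation, and a running MongoDB.
load_three_eleven <- function() {
  SparkR::sparkR.session(
    appName       = "three-eleven",
    sparkConfig   = mongo_spark_config(),
    sparkPackages = "org.mongodb.spark:mongo-spark-connector_2.11:2.0.0"
  )
  df <- SparkR::read.df("", source = "com.mongodb.spark.sql.DefaultSource")
  SparkR::printSchema(df)  # inspect the schema inferred from the collection
  df
}
```

Keeping the configuration in its own function makes it easy to point the same code at a different deployment without touching the loading logic.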
  • 30. MONGODB & SPARK ARCHITECTURE
  • 31. BRING IT ALL TOGETHER: open standards; high adoption; iterative workflow -> fail fast; don’t throw ANY data away; scale out. WHAT WILL YOU SOLVE?
  • 32. REFERENCES & THANKS
      ‒ MongoDB 3.4: https://docs.mongodb.com/manual/
      ‒ Aggregation Framework: https://docs.mongodb.com/manual/aggregation/
      ‒ MongoDB Compass: https://docs.mongodb.com/compass/current/
      ‒ Mongo Spark Connector 2.0: https://docs.mongodb.com/spark-connector/current/r-api/
      ‒ Mongolite: https://github.com/jeroen/mongolite and https://github.com/ajdavis/three-eleven-mongolite-demo/blob/master/mongolite-demo.R
      ‒ RStudio: https://www.rstudio.com/
      ‒ Visualization, plot.ly: https://plot.ly/
      ‒ Visualization, manhattanly: http://sahirbhatnagar.com/manhattanly/
      ‒ Visualization, ggplot2: https://cran.r-project.org/web/packages/ggplot2/
      ‒ HapMap3: ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/
      ‒ Bioconductor: http://bioconductor.org/
      ‒ City of Chicago: https://data.cityofchicago.org/
      Special thanks to Jeroen Ooms and A. Jesse Jiryu Davis