JSON and MongoDB in R
PhillyR x Philadelphia MongoDB User Group - May 2019
“the most ambitious crossover event in history”
Many thanks to our sponsors
Introduction to R
❖ Turing complete
❖ High-level
❖ Functional (at its heart)
❖ 1-indexed
❖ Everything is an object
Language Features
For a more technical and rigorous introduction to the R language, read Hadley Wickham’s (new) Advanced R: https://adv-r.hadley.nz/
❖ R has the typical data types you normally find in other languages
Boolean: T, F, TRUE, FALSE
Integer: 1L, 1.2e1L, 0xDEADL
Double (floating point): 3.14, 1.23e1, 0xDEADBEEF
Character: "a", 'b', "c", 'd'
Complex: 1+2i
Raw (for binary data): 00 12 34
Variables
❖ Variable assignment by reference
Variables
❖ Copy-on-modify (aka immutability)
❖ aka “R is slow”
Variables
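A minimal sketch of the copy-on-modify behavior named above, using only base R. The variable names are illustrative and do not come from the original slides.

```r
# Both names initially refer to the same vector.
x <- c(1, 2, 3)
y <- x

# Modifying y makes R copy the vector first; x is left untouched.
y[1] <- 10

print(x)  # 1 2 3
print(y)  # 10 2 3
```

From the user's point of view the value bound to x never changes, which is why this behaves like immutability.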
❖ x and y are both vectors.
❖ Think of a vector as being composed of scalars of the same type (integer, double, boolean, or character) ≈ array of primitives in other languages
❖ A scalar is a vector of length 1
❖ A vector of length 1 ≈ primitive in Python
❖ i.e. [1] ≈ 1 (in R, c(1) == 1)
Data structure - vectors
** Very over-simplified and crude (and incorrect) explanations / comparisons, meant to prime you for the upcoming slides on R → JSON. Things are a lot more subtle. If you love computer science concepts and want to learn more, seriously take a look at the Advanced R book.
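The points about vectors can be checked directly in base R. This is a small sketch; the names x, y, and z are placeholders chosen for illustration.

```r
x <- 1            # a "scalar" is just a vector of length 1
y <- c(1, 2, 3)   # a longer vector of the same type

is.vector(x)        # TRUE
length(x)           # 1
identical(c(1), 1)  # TRUE: c(1) and 1 are the same value
typeof(y)           # "double"

# All elements of a vector share one type, so mixing types coerces them.
z <- c(1, "a", TRUE)
typeof(z)           # "character"
```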
❖ Just as a variable name points to a value, each element of a vector can point to a value; a vector whose elements do this is a list
❖ ≈ array of variables ?! **
Data structure - lists
** These are just approximations / alternative explanations. RTARB (Read The Advanced R Book)!
❖ This allows you to have heterogeneous values (different types) for each element of a list
❖ The “variable name” concept applies here
❖ Note that here we use an equals sign instead of an arrow
Data structure - lists
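A short sketch of a list holding values of different types, with each element named using an equals sign inside list(). The list `person` and its contents are made up for illustration.

```r
# Each element has its own name and its own type.
person <- list(
  name   = "Ada",        # character vector of length 1
  scores = c(90.5, 88),  # double vector of length 2
  member = TRUE          # logical vector of length 1
)

names(person)                   # "name" "scores" "member"
types <- sapply(person, typeof)
types                           # character, double, logical

# Elements are retrieved by name, much like variables.
person$scores
person[["name"]]
```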
❖ An element in a list can be anything, even another list.
Data structure - lists
(inner) List with two elements, each a vector of different size and data type
Nested elements in a list are easily accessible by indexing sequentially
Vector of length 1
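The nested structure can be sketched as follows. The element names and values mirror those shown in the JSON view of this structure elsewhere in the deck; the variable name `nested` is an assumption.

```r
nested <- list(
  outerElem1 = list(
    innerElem1 = c(1, 2, 3),
    innerElem2 = c("A", "B", "C", "D", "E")
  ),
  outerElem2 = "This is complicated but flexible"
)

# Index sequentially: outer element, then inner element, then position.
nested$outerElem1$innerElem2[4]         # "D"
nested[["outerElem1"]][["innerElem1"]]  # 1 2 3

# The second outer element is a character vector of length 1.
length(nested$outerElem2)               # 1
```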
Data structure - lists
❖ Alternative way of looking at this complex structure
{
  "outerElem1" : … ,
  "outerElem2" : "This is complicated but flexible"
}
{
  "innerElem1" : [1, 2, 3],
  "innerElem2" : ["A", "B", "C", "D", "E"]
}
We shall call this curly bracket-y format – JSON!
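As a rough illustration, the jsonlite package can print such a nested list in this curly-bracket format and read it back. The variable names are assumptions. With default settings the length-1 string is written as a one-element array rather than a bare string.

```r
library(jsonlite)

nested <- list(
  outerElem1 = list(
    innerElem1 = c(1, 2, 3),
    innerElem2 = c("A", "B", "C", "D", "E")
  ),
  outerElem2 = "This is complicated but flexible"
)

# Serialize the list; pretty = TRUE adds line breaks and indentation.
json <- toJSON(nested, pretty = TRUE)
json

# Parse the JSON text back into an R list.
back <- fromJSON(json)
str(back)
```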
R - JSON “rules”
❖ A named list becomes a JSON object
{
  "outerElem1" : … ,
  "outerElem2" : …
}
❖ An unnamed list becomes a JSON array whose elements are arrays
[
  [ … ],
  [ … ]
]
❖ Anything that is / can be named → { "name" : <<value>> }
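A small sketch of these two rules using jsonlite with its default settings. The list contents are made up for illustration.

```r
library(jsonlite)

# Named list -> JSON object; each vector becomes an array.
named <- toJSON(list(a = 1, b = "x"))
named    # {"a":[1],"b":["x"]}

# Unnamed list -> JSON array of arrays.
unnamed <- toJSON(list(c(1, 2), c("x", "y")))
unnamed  # [[1,2],["x","y"]]
```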
R - JSON “rules”
❖ R data types are intuitively converted
Booleans: T, F, TRUE, FALSE → [true, false, true, false]
Integers: 1L, 1.2e1L, 0xDEADL → [1, 12, 57005]
Double (floating point): 3.14, 1.23e1, 0xDEADBEEF → [3.14, 12.3, 3735928559]
Character: "a", 'b', "c", "d" → ["a", "b", "c", "d"]
Complex: 1+2i → ??
Raw (for binary data): 00 12 34 → ??
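These conversions can be tried with jsonlite. This is a sketch under default settings. For the two "??" cases, toJSON() has `complex` and `raw` arguments that control the encoding; by default complex numbers are written as strings and raw vectors as base64 text.

```r
library(jsonlite)

bools <- toJSON(c(T, F, TRUE, FALSE))
bools   # [true,false,true,false]

ints <- toJSON(c(1L, 1.2e1L, 0xDEADL))
ints    # [1,12,57005]

chars <- toJSON(c("a", 'b', "c", "d"))
chars   # ["a","b","c","d"]

# Types with no native JSON counterpart are encoded as text by default.
cplx <- toJSON(1+2i)
cplx
rawv <- toJSON(as.raw(c(0x00, 0x12, 0x34)))
rawv
```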
R - JSON problems
❖ Should an R vector of length 1 be a JSON array?
R object → JSON object: "a" → "a" or ["a"]
❖ JSON to R conversion is more troubling!
JSON object → R object: "a" or ["a"] → "a"
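One way to see the ambiguity is to try it with jsonlite. This is a sketch; the field names x and y are placeholders. toJSON() writes a length-1 vector as an array unless told otherwise, and fromJSON() maps both JSON forms to the same R value.

```r
library(jsonlite)

# R -> JSON: array by default, bare value on request.
as_array  <- toJSON("a")                     # ["a"]
as_scalar <- toJSON("a", auto_unbox = TRUE)  # "a"

# unbox() marks a single element as a scalar.
mixed <- toJSON(list(x = unbox("a"), y = "a"))
mixed   # {"x":"a","y":["a"]}

# JSON -> R: both forms come back as the same length-1 character vector.
from_scalar <- fromJSON('{"x":"a"}')$x
from_array  <- fromJSON('{"x":["a"]}')$x
identical(from_scalar, from_array)  # TRUE
```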
R - JSON problems
❖ In R, there are NA and NULL values for different types of missingness. How should these be represented in JSON?
❖ Conversely, how do you represent JSON null as an R object?
❖ How do you represent more complex R objects, like complex, raw, factor, Date, and POSIXt?
❖ How do you represent higher-dimensional R objects, like matrix and data.frame?
❖ How do you represent other metadata associated with complex R objects, like factor levels or row names for a data.frame?
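A sketch of how jsonlite answers a few of these questions. The values are made up for illustration; the `na` argument is set explicitly here so the output does not depend on type-specific defaults.

```r
library(jsonlite)

# NA can be written either as JSON null or as the string "NA".
na_null   <- toJSON(c(1, NA), na = "null")    # [1,null]
na_string <- toJSON(c(1, NA), na = "string")  # [1,"NA"]

# JSON null inside an array comes back as NA ...
from_array <- fromJSON('[1, null]')
# ... while a null object field comes back as NULL.
from_field <- fromJSON('{"a": null}')$a

# Factors and Dates are written as strings; factor levels are not kept.
fct <- toJSON(factor(c("lo", "hi")))  # ["lo","hi"]
dat <- toJSON(as.Date("2019-05-01"))  # ["2019-05-01"]
```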
R - JSON conversion
using library(jsonlite)
❖ toJSON(…) to convert an R object into JSON
❖ fromJSON(…) to convert JSON (represented as an R character string) into an R object
❖ Automatic conversion of complex R objects with consistent default rules. These can be overridden if necessary
- By default, R vectors are always converted to JSON arrays.
- Complex R & JSON objects are mapped in an R-user-friendly way
❖ See vignette https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
❖ library(rjson) also exists, but library(jsonlite) is more widely used today due to more consistent rules and better maintenance.
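A sketch of the default mapping for two-dimensional objects. The data frame `df` is a made-up example. A data.frame is written as an array of row objects, and a matrix as an array of row arrays.

```r
library(jsonlite)

df <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)

# data.frame -> one JSON object per row.
df_json <- toJSON(df)
df_json   # [{"id":1,"label":"a"},{"id":2,"label":"b"}]

# Reading it back yields a data.frame with the same columns.
back <- fromJSON(df_json)
class(back)  # "data.frame"

# matrix -> array of arrays, one inner array per row.
m_json <- toJSON(matrix(1:4, nrow = 2))
m_json    # [[1,3],[2,4]]
```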
Why use R and JSON?
❖ JSON is widely used as a language- / technology-agnostic data transfer format
❖ Use library(httr) to query an HTTP API that returns results as JSON, or library(opencpu) / library(plumber) to productionize R code as an HTTP API
❖ NoSQL databases often use a JSON-like data format for transferring data between the DB server and your R session.
❖ Using a MongoDB database is facilitated by library(jsonlite) and library(mongolite).
Demo
library(mongolite)
library(tidyverse)
# Free book! https://jeroen.github.io/mongolite/
# Look for the sample collection "listingsAndReviews" in the "sample_airbnb" database
m <- mongo(
  db = "sample_airbnb",
  collection = "listingsAndReviews",
  url = "mongodb+srv://phillyr:risawesome@phillyr-djozr.azure.mongodb.net/test?retryWrites=true",
  verbose = TRUE
)
# How many documents? i.e. SELECT COUNT(*) FROM listingsAndReviews
m$count('{}')
# Query only one, i.e. SELECT * FROM listingsAndReviews LIMIT 1
oneTrueListing <- m$find(fields = '{}', limit = 1)
# The result is automatically a data.frame
class(oneTrueListing)
colnames(oneTrueListing)
# tibblify to view data easily
(oneTrueListing <- tibble::as_tibble(oneTrueListing))
# Use iterate to get 1 value as JSON (bypassing the automatic conversion to data.frame)
findOne_asJSON <- m$iterate()
oneTrueListing_json <- findOne_asJSON$json(1)
# Pretty-print the JSON
jsonlite::prettify(oneTrueListing_json)
# Let's remove summary, space, description, neighborhood_overview, and notes because they are really long texts
jsonlite::prettify(
  m$iterate(
    query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
    fields = '{"summary" : false, "space" : false, "description" : false, "neighborhood_overview" : false, "notes" : false }',
    limit = 1)$json(1)
)
# Some of the fields are "complex". Let's explore
simpleListing <- m$find(
  query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
  fields = '{"summary" : false, "space" : false, "description" : false, "neighborhood_overview" : false, "notes" : false }'
)
# What is the class of each column in the data.frame?
sapply(simpleListing, function(x) {paste(class(x), collapse = "/")})
# Which columns are not vectors?
colnames(simpleListing)[!sapply(simpleListing, is.vector)]
# Example of a nested document
jsonlite::prettify(
  m$iterate(
    query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
    fields = '{"_id" : true, "beds" : true, "price": true, "images" : true }',
    limit = 1)$json(1)
)
# Watch what happens to "price" and "images"
(nestedObjects <- m$find(
  query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
  fields = '{"_id" : true, "beds" : true, "price": true, "images" : true }'
))
class(nestedObjects$images)
nestedObjects$images
# Flattens non-recursively, leading to a 4-col tibble with the "images" column being a data.frame
as_tibble(nestedObjects)
sapply(as_tibble(nestedObjects), function(x) {paste(class(x), collapse = "/")})
# What if the value is an array? (e.g. "amenities")
class(simpleListing$amenities)
(nestedArray <- m$find(
  query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
  fields = '{"_id" : true, "beds" : true, "price": true, "images" : true, "amenities" : true }'
))
class(nestedArray$amenities)
nestedArray$amenities
# Flattens non-recursively, leading to a 5-col tibble with the "images" column being a data.frame,
# and "amenities" as a list
as_tibble(nestedArray)
sapply(as_tibble(nestedArray), function(x) {paste(class(x), collapse = "/")})
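The demo builds its query strings with sprintf(). As an alternative sketch, the same JSON text can be produced from R lists with jsonlite, which also escapes special characters. The helper `id_query()` and the id value "10006546" are hypothetical and not part of mongolite or the original demo. The resulting strings are plain JSON text of the kind the demo passes as `query` and `fields`.

```r
library(jsonlite)

# Hypothetical helper: build an {"_id": ...} query string from an R value.
id_query <- function(id) {
  as.character(toJSON(list(`_id` = id), auto_unbox = TRUE))
}

q <- id_query("10006546")  # placeholder id, for illustration only
q   # {"_id":"10006546"}

# A projection built the same way as the demo's "fields" strings.
f <- as.character(
  toJSON(list(`_id` = TRUE, beds = TRUE, price = TRUE), auto_unbox = TRUE)
)
f   # {"_id":true,"beds":true,"price":true}

# Quotes inside a value are escaped, which sprintf() would not do.
id_query('a"b')
```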
