JSON and MongoDB in R
PhillyR x Philadelphia MongoDB User Group - May 2019
“the most ambitious crossover event in history”
Many thanks to our sponsors
Introduction to R
❖ Turing complete
❖ High-level
❖ Functional (at its heart)
❖ 1-indexed
❖ Everything is an object
Language Features
For a more technical and rigorous introduction to the R language, read Hadley Wickham’s (new) Advanced R: https://adv-r.hadley.nz/
❖ R has the typical data types you normally find in other languages
Boolean: T, F, TRUE, FALSE
Integer: 1L, 1.2e1L, 0xDEADL
Double (floating point): 3.14, 1.23e1, 0xDEADBEEF
Character: “a”, ‘b’, “c”, ‘d’
Complex: 1+2i
Raw (for binary data): 00 12 34
Variables
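One way to check these types in an R session is typeof(). This is a small sketch rather than part of the original deck; the name literal_types is only illustrative.

```r
# typeof() reports the storage type of each literal from the slide.
literal_types <- c(
  logical   = typeof(TRUE),
  integer   = typeof(0xDEADL),       # the L suffix forces integer storage
  double    = typeof(0xDEADBEEF),    # too large for an integer, stored as double
  character = typeof('b'),           # single and double quotes are equivalent
  complex   = typeof(1+2i),
  raw       = typeof(as.raw(c(0x00, 0x12, 0x34)))
)
print(literal_types)
```

Note that T and F are ordinary variables bound to TRUE and FALSE and can be reassigned, so TRUE and FALSE are the safer choice in scripts.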
❖ Variable assignment by reference
Variables
❖ Copy-on-modify (aka immutability)
❖ aka “R is slow”
Variables
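Copy-on-modify can be observed with two names bound to the same vector. A minimal sketch, using the names x and y from the slides with made-up values:

```r
x <- c(1, 2, 3)
y <- x         # no copy yet: both names refer to the same values
y[1] <- 100    # modifying through y makes R copy the values first

print(x)       # x is unchanged
print(y)       # only y sees the modification
```

The copy is what keeps x safe from changes made through y, and it is also the source of the "R is slow" reputation when large objects are modified repeatedly.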
❖ x and y are both vectors.
❖ Think of a vector as being composed of scalars of the same type (integer, double, boolean, or character) ≈ an array of primitives in other languages
❖ A scalar is a vector of length 1
❖ A vector of length 1 ≈ a primitive in Python
❖ i.e. [1] ≈ 1 (in R, c(1) == 1)
Data structure - vectors
** Very oversimplified and crude (and incorrect) explanations / comparisons, meant to prime you for the upcoming slides on R → JSON.
Things are a lot more subtle. If you love computer science concepts and want to learn more, seriously take a look at the Advanced R book.
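The "scalar is a vector of length 1" idea can be checked directly. A short sketch with illustrative values (the name mixed is not from the slides):

```r
x <- 1
y <- c(1, 2, 3)

is.vector(x)         # a bare number is already a vector
length(x)            # ... of length 1
identical(c(1), x)   # wrapping it in c() changes nothing

# All elements of a vector share one type, so mixing types coerces them.
mixed <- c(1, "a", TRUE)
typeof(mixed)        # everything became character
```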
❖ Just as a variable name points to a value, elements of a vector can point to values, but in that case the vector is a list
❖ ≈ array of variables ?! **
Data structure - lists
** These are just approximations / alternative explanations. RTARB (Read The Advanced R Book)!
❖ This allows you to have heterogeneous values (different types) for each element of a list
❖ The “variable name” concept applies here
❖ Note that here we use an equal sign instead of an arrow
Data structure - lists
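A small sketch of a named list with elements of different types. The element names and values here are made up for illustration and do not come from the slides:

```r
# Names are given with "=" inside list(), not with the "<-" arrow.
listing <- list(name = "Cozy loft", beds = 2L, superhost = TRUE)

str(listing)          # shows one type per element
listing$beds          # access by name with $
listing[["name"]]     # or with [[ ]]
```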
❖ An element in a list can be anything, even another list.
Data structure - lists
(inner) List with two elements, each with a vector of different size and data type
Nested elements in a list are easily accessible by indexing sequentially
Vector of length 1
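The slide images are not part of this transcript, so the following is only a reconstruction of what such a nested list might look like, borrowing the element names from the JSON view shown in the deck:

```r
nested <- list(
  outerElem1 = list(                      # (inner) list with two elements
    innerElem1 = c(1, 2, 3),
    innerElem2 = c("A", "B", "C", "D", "E")
  ),
  outerElem2 = "This is complicated but flexible"   # vector of length 1
)

# Nested elements are reached by indexing one level at a time.
nested$outerElem1$innerElem2[4]
nested[["outerElem1"]][["innerElem1"]]
length(nested$outerElem2)
```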
Data structure - lists
{
“outerElem1” : … ,
“outerElem2” : “This is complicated but flexible”
}
❖ Alternative way of looking at this complex structure
{
“innerElem1” : [1, 2, 3],
“innerElem2” : [“A”, “B”, “C”, “D”, “E”]
}
We shall call this curly bracket-y format – JSON!
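To see the curly bracket-y view produced by R itself, the nested list can be passed to jsonlite's toJSON(). This is a sketch that assumes the jsonlite package is installed and rebuilds the list from the element names on the slide:

```r
library(jsonlite)

nested <- list(
  outerElem1 = list(
    innerElem1 = c(1, 2, 3),
    innerElem2 = c("A", "B", "C", "D", "E")
  ),
  outerElem2 = "This is complicated but flexible"
)

# pretty = TRUE adds line breaks and indentation for reading.
json <- toJSON(nested, pretty = TRUE)
print(json)

# Parsing the JSON text gives back a nested list.
back <- fromJSON(json)
```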
R - JSON “rules”
❖ A named list becomes a JSON object
{
“outerElem1” : … ,
“outerElem2” : …
}
❖ An unnamed list becomes a JSON array of array elements
[
[ … ],
[ … ]
]
❖ Anything that is / can be named → { “name” : <<value>> }
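These two rules can be tried with jsonlite (assumed installed). The list contents below are made up for illustration:

```r
library(jsonlite)

named   <- toJSON(list(a = 1, b = "x"))      # named list
unnamed <- toJSON(list(c(1, 2), c(3, 4)))    # unnamed list of vectors

print(named)     # {"a":[1],"b":["x"]}
print(unnamed)   # [[1,2],[3,4]]
```

Each value inside the object is itself an array because every R vector, even one of length 1, is written as a JSON array by default.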
R - JSON “rules”
❖ R data types are intuitively converted
Booleans: T, F, TRUE, FALSE → [true, false, true, false]
Integers: 1L, 1.2e1L, 0xDEADL → [1, 12, 57005]
Double (floating point): 3.14, 1.23e1, 0xDEADBEEF → [3.14, 12.3, 3735928559]
Character: “a”, ‘b’, “c”, “d” → [“a”, “b”, “c”, “d”]
Complex: 1+2i → ??
Raw (for binary data): 00 12 34 → ??
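The conversions in this table can be reproduced with jsonlite (assumed installed). In this sketch the variable names are illustrative, and the complex and raw results are printed without a stated expectation because they depend on jsonlite's encoding options:

```r
library(jsonlite)

json_lgl <- toJSON(c(TRUE, FALSE, TRUE, FALSE))
json_int <- toJSON(c(1L, 12L, 57005L))
json_chr <- toJSON(c("a", 'b', "c", "d"))

print(json_lgl)   # [true,false,true,false]
print(json_int)   # [1,12,57005]
print(json_chr)   # ["a","b","c","d"]

# Complex and raw have no direct JSON counterpart, so jsonlite has to
# pick an encoding for them; print the results to see what it chose.
print(toJSON(1+2i))
print(toJSON(as.raw(c(0x00, 0x12, 0x34))))
```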
R - JSON problems
❖ Should R vector of length 1 be a JSON array?
R object “a” → JSON “a” or [“a”]?
❖ JSON to R conversion is more troubling!
JSON “a” or [“a”] → R object “a”
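jsonlite (assumed installed) exposes this choice through auto_unbox and unbox(). A short sketch of both directions, with illustrative variable names:

```r
library(jsonlite)

boxed   <- toJSON("a")                              # ["a"]
unboxed <- toJSON("a", auto_unbox = TRUE)           # "a"
mixed   <- toJSON(list(x = unbox("a"), y = "a"))    # {"x":"a","y":["a"]}

# Going the other way, both JSON forms become the same R object.
from_scalar <- fromJSON('"a"')
from_array  <- fromJSON('["a"]')
identical(from_scalar, from_array)
```

Because the two JSON forms collapse to the same R value, a round trip from JSON to R and back cannot always reproduce the original text.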
R - JSON problems
❖ In R, there are NA and NULL values for different types of missingness. How would you represent these in JSON?
❖ Conversely, how do you represent JSON null as an R object?
❖ How do you represent more complex R objects, like complex, raw, factor, Date, and POSIXt?
❖ How do you represent higher-dimensional R objects, like matrix and data.frame?
❖ How do you represent other metadata associated with complex R objects, like factor levels or row names for a data.frame?
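Some of jsonlite's default answers to these questions can be inspected directly. A sketch assuming jsonlite is installed; all values and variable names are made up for illustration:

```r
library(jsonlite)

# Missing values: print to see how NA is written for each vector type.
print(toJSON(c(1, NA)))
print(toJSON(c("a", NA)))

# JSON null inside a numeric array comes back as NA.
with_null <- fromJSON('[1, null, 3]')

# Factors and Dates are written as strings; factor levels are not kept.
json_fct  <- toJSON(factor(c("lo", "hi")))      # ["lo","hi"]
json_date <- toJSON(as.Date("2019-05-01"))      # ["2019-05-01"]

# A data.frame becomes an array of row objects.
json_df <- toJSON(data.frame(x = 1:2, y = c("a", "b")))
print(json_df)   # [{"x":1,"y":"a"},{"x":2,"y":"b"}]
```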
R - JSON conversion
using library(jsonlite)
❖ toJSON(…) to convert an R object into JSON
❖ fromJSON(…) to convert JSON (represented as an R character string) into an R object
❖ Automatic conversion of complex R objects with consistent default rule settings. These can be overridden if necessary
- By default, R vectors are always converted to JSON arrays.
- Complex R & JSON objects are mapped in an R-user-friendly way
❖ See the vignette https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
❖ library(rjson) also exists, but library(jsonlite) is more widely used today due to more consistent rules and better maintenance.
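A round trip with a small data.frame shows the default mapping and one way to override it. A sketch assuming jsonlite is installed; the data.frame is made up:

```r
library(jsonlite)

df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)

by_row <- toJSON(df)                          # default: one object per row
by_col <- toJSON(df, dataframe = "columns")   # override: one array per column

print(by_row)   # [{"x":1,"y":"a"},{"x":2,"y":"b"}]
print(by_col)   # {"x":[1,2],"y":["a","b"]}

# fromJSON() turns the array of row objects back into a data.frame.
back <- fromJSON(by_row)
class(back)
```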
Why use R and JSON?
❖ JSON is widely used as a language- and technology-agnostic data transfer format
❖ Use library(httr), library(opencpu), library(plumber) to query an HTTP API that returns results as JSON, or to productionize R code as an HTTP API
❖ NoSQL databases often use a JSON-like data format for transferring data between the DB server and your R session.
❖ Using a MongoDB database is facilitated by library(jsonlite) and library(mongolite).
Demo
library(mongolite)
library(tidyverse)

# Free book! https://jeroen.github.io/mongolite/
# Look for the sample collection "listingsAndReviews" in the "sample_airbnb" database
m <- mongo(
  db = "sample_airbnb",
  collection = "listingsAndReviews",
  url = "mongodb+srv://phillyr:risawesome@phillyr-djozr.azure.mongodb.net/test?retryWrites=true",
  verbose = TRUE
)

# How many documents? i.e. SELECT COUNT(*) FROM listingsAndReviews
m$count('{}')

# Query only one, i.e. SELECT * FROM listingsAndReviews LIMIT 1
oneTrueListing <- m$find(fields = '{}', limit = 1)

# The result is automatically a data.frame
class(oneTrueListing)
colnames(oneTrueListing)

# tibblify to view data easily
(oneTrueListing <- tibble::as_tibble(oneTrueListing))

# Use iterate to get 1 value as JSON (bypassing automatic conversion to data.frame)
findOne_asJSON <- m$iterate()
oneTrueListing_json <- findOne_asJSON$json(1)

# Pretty-print the JSON
jsonlite::prettify(oneTrueListing_json)
# Let's remove summary, space, description, neighborhood_overview, and notes because they are really long texts
jsonlite::prettify(
  m$iterate(
    query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
    fields = '{"summary" : false, "space" : false, "description" : false, "neighborhood_overview" : false, "notes" : false }',
    limit = 1
  )$json(1)
)

# Some of the fields are "complex". Let's explore
simpleListing <- m$find(
  query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
  fields = '{"summary" : false, "space" : false, "description" : false, "neighborhood_overview" : false, "notes" : false }'
)

# What is the class of each column in the data.frame?
sapply(simpleListing, function(x) {paste(class(x), collapse = "/")})

# Which columns are not vectors?
colnames(simpleListing)[!sapply(simpleListing, is.vector)]
# Example of nested document
jsonlite::prettify(
  m$iterate(
    query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
    fields = '{"_id" : true, "beds" : true, "price": true, "images" : true }',
    limit = 1
  )$json(1)
)

# Watch what happens to "price" and "images"
(nestedObjects <- m$find(
  query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
  fields = '{"_id" : true, "beds" : true, "price": true, "images" : true }'
))
class(nestedObjects$images)
nestedObjects$images

# flattens non-recursively, leading to 4-col tibble with "images" column being a data.frame
as_tibble(nestedObjects)
sapply(as_tibble(nestedObjects), function(x) {paste(class(x), collapse = "/")})
# What if the value was an array? (e.g. "amenities")
class(simpleListing$amenities)
(nestedArray <- m$find(
  query = sprintf('{ "_id": "%s" }', oneTrueListing$`_id`),
  fields = '{"_id" : true, "beds" : true, "price": true, "images" : true, "amenities" : true }'
))
class(nestedArray$amenities)
nestedArray$amenities

# flattens non-recursively, leading to 5-col tibble with "images" column being a data.frame,
# and "amenities" as a list
as_tibble(nestedArray)
sapply(as_tibble(nestedArray), function(x) {paste(class(x), collapse = "/")})
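
The nested-column behaviour can also be explored without a database connection. The following is a sketch, not part of the original demo: it parses two made-up documents with jsonlite (field names and values are invented, loosely shaped like the listings) and then spreads the nested "images" column with jsonlite::flatten().

```r
library(jsonlite)

# Two hypothetical documents with a nested object and an array field.
docs <- '[
  {"_id": "1", "beds": 2,
   "images": {"thumbnail_url": "", "picture_url": "https://example.com/1.jpg"},
   "amenities": ["TV", "Wifi"]},
  {"_id": "2", "beds": 1,
   "images": {"thumbnail_url": "", "picture_url": "https://example.com/2.jpg"},
   "amenities": ["Kitchen"]}
]'

listings <- fromJSON(docs)
class(listings$images)      # the nested object becomes a data.frame column
class(listings$amenities)   # the arrays become a list column

# flatten() turns the nested data.frame into prefixed top-level columns.
flat <- flatten(listings)
colnames(flat)
```

The list column for "amenities" is left as it is by flatten(); only nested data.frame columns are spread out.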
