SlideShare a Scribd company logo
1 of 51
Download to read offline
O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
Working with deeply nested documents in Apache Solr
Anshum Gupta, Alisa Zhila
IBM Watson
3
Anshum Gupta
• Apache Lucene/Solr committer and PMC member
• Search guy @ IBM Watson.
• Interested in search and related stuff.
• Apache Lucene since 2006 and Solr since 2010.
4
Alisa Zhila
• Apache Lucene/Solr supporter :)
• Natural Language Processing technologies @ IBM Watson
• Interested in search and related stuff
5
Agenda
• Hierarchical Data/Nested Documents
• Indexing Nested Documents
• Querying Nested Documents
• Faceting on Nested Documents
Hierarchical Documents
7
• Social media comments, Email threads,
Annotated data - AI
• Relationship between documents
• Possibility to flatten
Need for nested data
EXAMPLE: Blog Post with Comments
Peter Navarro outlines the Trump economic plan
Tyler Cowen, September 27, 2016 at 3:07am
Trump proposes eliminating America’s $500 billion
trade deficit through a combination of increased
exports and reduced imports.
1 Ray Lopez September 27, 2016 at 3:21 am
I’ll be the first to say this, but the analysis is flawed.
{negative}
2 Brian Donohue September 27, 2016 at 9:20 am
The math checks out. Solid.
{positive}
examples from http://marginalrevolution.com
8
• Can not flatten, need to retain context
• Relationship between documents
• Get all 'positive comments' to 'posts about
Trump' -- IMPOSSIBLE!!!
Nested Documents
EXAMPLE: Data Flattening
Title: Peter Navarro outlines the Trump economic plan
Author: Tyler Cowen
Date: September 27, 2016 at 3:07am
Body: Trump proposes eliminating America’s $500 billion
trade deficit through a combination of increased exports and
reduced imports.
Comment_authors: [Ray Lopez, Brian Donohue]
Comment_dates: [September 27, 2016 at 3:21 am,
September 27, 2016 at 9:20 am]
Comment_texts: ["I’ll be the first to say this, but the analysis is
flawed.", "The math checks out. Solid."]
Comment_sentiments: [negative, positive]
9
• Can not flatten, need to retain context
• Relationship between documents
• Get all 'positive comments' to 'posts about
Trump' -- POSSIBLE!!! (stay tuned)
Nested Documents
EXAMPLE: Hierarchical Documents
Type: Post
Title: Peter Navarro outlines the Trump economic plan
Author: Tyler Cowen
Date: September 27, 2016 at 3:07am
Body: Trump proposes eliminating America’s $500 billion
trade deficit through a combination of increased exports and
reduced imports.
Type: Comment
Author: Ray Lopez
Date: September 27, 2016 at 3:21 am
Text: I’ll be the first to say this, but the analysis is flawed.
Sentiment: negative
Type: Comment
Author: Brian Donohue
Date: September 27, 2016 at 9:20 am
Text: The math checks out. Solid.
Sentiment: positive
10
• Blog Post Data with Comments and Replies
from http://marginalrevolution.com (cured)
• 2 posts, 2-3 comments per post, 0-3 replies
per comment
• Extracted keywords & sentiment data
• 4 levels of "nesting"
• Too big to show on slides
• Data + Scripts + Demo Queries:
• https://github.com/alisa-ipn/solr-
revolution-2016-nested-demo
Running Example
Indexing Nested Documents
12
• Nested XML
• JSON Documents
• Add _childDocument_ tags for all children
• Pre-process field names to FQNs
• Lose information, or add that as meta-data during pre-processing
• JSON Document endpoint (6x only) - /update/json/docs
• Field name mappings
• Child Document splitting - Enhanced support coming soon.
Sending Documents to Solr
13
solr-6.2.1$ bin/post -c demo-xml ./data/example-data.xml
Sending Documents to Solr: Nested XML
<add>
<doc>
<field name="type">post</field>
<field name="author"> "Alex Tabarrok"</field>
<field name="title">"The Irony of Hillary Clinton’s Data Analytics"</
field>
<field name="body">"Barack Obama’s campaign adopted data but
Hillary Clinton’s campaign has been molded by data from birth."</field>
<field name="id">"12015-24204"</field>
<doc>
<field name="type">comment</field>
<field name="author">"Todd"</field>
<field name="text">"Clinton got out data-ed and out organized in
2008 by Obama. She seems at least to learn over time, and apply the
lessons learned to the real world."</field>
<field name="sentiment">"positive"</field>
<field name="id">"29798-24171"</field>
<doc>
<field name="type">reply</field>
<field name="author">"The Other Jim"</field>
<field name="text">"No, she lost because (1) she is thoroughly
detested person and (2) the DNC decided Obama should therefore
win."</field>
<field name="sentiment">"negative"</field>
<field name="id">"29798-21232"</field>
</doc>
</doc>
</doc>
</add>
14
• Add _childDocument_ tags for all children
• Pre-process field names to FQNs
• Lose information, or add that as meta-data during pre-processing
solr-6.2.1$ bin/post -c demo-solr-json ./data/small-example-data-solr.json -format solr
Sending Documents to Solr: JSON Documents
[{ "path": "1.posts",
"id": "28711",
"author": "Alex Tabarrok",
"title": "The Irony of Hillary Clinton’s Data Analytics",
"body": "Barack Obama’s campaign adopted data but Hillary Clinton’s campaign
has been molded by data from birth.",
"_childDocuments_": [
{
"path": "2.posts.comments",
"id": "28711-19237",
"author": "Todd",
"text": "Clinton got out data-ed and out organized in 2008 by Obama. She
seems at least to learn over time, and apply the lessons learned to the real world.",
"sentiment": "positive",
"_childDocuments_": [
{
"path": "3.posts.comments.replies",
"author": "The Other Jim",
"id": "28711-12444",
"sentiment": "negative",
"text": "No, she lost because (1) she is thoroughly detested person and
(2) the DNC decided Obama should therefore win."
}]}]}]
15
• JSON Document endpoint (6x only) - /update/json/docs
• Field name mappings
• Child Document splitting - Enhanced support coming soon.
solr-6.2.1$ curl 'http://localhost:8983/solr/gettingstarted/update/json/docs?
split=/|/posts|/posts/comments|/posts/comments/replies&commit=true' --data-
binary @small-example-data.json -H ‘Content-type:application/json'
NOTE: All documents must contain a unique ID.
Sending Documents to Solr: JSON Endpoint
16
• Update Request Processors don’t work with nested documents
• Example:
• UUID update processor does not auto-add an id for a child document.
• Workaround:
• Take responsibility at the client layer to handle the computation for nested
documents.
• Change the update processor in Solr to handle nested documents.
Update Processors and Nested Documents
17
• The entire block needs reindexing
• Forgot to add a meta-data field that might be useful? Complete reindex
• Store everything in Solr IF
• it’s too expensive to reconstruct the doc from original data source
• No access to data anymore e.g. streaming data
Re-Indexing Your Documents
18
• Various ways to index nested documents
• Need to re-index entire block
Nested Document Indexing Summary
Let’s ask some interesting questions
20
{
"path":["4.posts.comments.replies.keywords"],
"text":["Trump"]},
{
"path":["3.posts.comments.keywords"],
"text":["Trump"]},
{
"path":["2.posts.keywords"],
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"text":["Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports."],
"path":["1.posts"]},
{
"text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."],
"path":["2.posts.comments"]}
Easy question first
Find all documents that mention Trump
q=text:Trump
21
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."],
"path":["2.posts.comments"]},
{
"text":["No one goes to Clinton rallies while tens of thousands line up to see Trump, data-mining leads to a fantasy view of the World."],
"path":["2.posts.comments"]}
Returning certain types of documents
Find all comments and replies that mention Trump
q=(path:2.posts.comments OR path:3.posts.comments.replies) AND text:Trump
Recipe:
At the data pre-processing stage, add a field that indicates document type
and also its path in the hierarchy (-- stay tuned):
22
{
"path":["3.posts.comments.keywords"],
"sentiment":["positive"],
"text":["Hillary"]},
{
"path":["4.posts.comments.replies.keywords"],
"sentiment":["negative"],
"text":["Hillary"]},
{
"path":["2.posts.keywords"],
"text":["Hillary"]}
Returning similar type from different level
Find all keywords that are Hillary
q=path:*.keywords AND text:Hillary
Recipe:
Use wild-cards in the field that stores the hierarchy path
Cross-Level Querying
24
{
"path":["3.posts.comments.keywords"],
"sentiment":["positive"],
"text":["Hillary"]},
{
"path":["4.posts.comments.replies.keywords"],
"sentiment":["negative"],
"text":["Hillary"]},
{
"path":["2.posts.keywords"],
"text":["Hillary"]}
Recap so far...
Find all keywords that are Hillary
q=path:*.keywords AND text:Hillary
We're querying precisely for documents
which we provide a search condition for
Query
Level 3
Result
Level 3
Query
Level 4
Result
Level 4
Query
Level 2
Result
Level 2
25
Returning parents by querying children:
Block Join Parent Query
Find all comments whose keywords detected positive sentiment towards Hillary
q={!parent which="path:2.posts.comments"}path:3.posts.comments.keywords AND text:Hillary AND sentiment:positive
Query
Level 3
Result
Level 2
{
"author":["Brian Donohue"],
"text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering,
but he was actually able to find his feet and score some points."],
"path":["2.posts.comments"]},
{
"author":["Todd"],
"text":["Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to
learn over time, and apply the lessons learned to the real world."],
"path":["2.posts.comments"]}
26
{
"sentiment":["negative"],
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"sentiment":["neutral"],
"text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S.
asset values?"],
"path":["3.posts.comments.replies"]},
{
"sentiment":["positive"],
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"]}
Returning children by querying parents:
Block Join Child Query
Find replies to negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies
Query
Level 2
Result
Level 3
27
{
"sentiment":["negative"],
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"sentiment":["neutral"],
"text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S.
asset values?"],
"path":["3.posts.comments.replies"]},
{
"sentiment":["positive"],
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"]}
Returning children by querying parents:
Block Join Child Query
Find replies to negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies
Query
Level 2
Result
Level 3
Block Join Child Query + Filtering Query
A bit counterintuitive and non-symmetrical to the BJPQ
28
{
"path":["4.posts.comments.replies.keywords"],
"id":"17413-13550",
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"],
"id":"17413-66188"},
{
"path":["3.posts.comments.keywords"],
"id":"12413-12487",
"text":["Hillary"]},
{
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"],
"id":"12413-10998"}
Returning all document's descendants
Block Join Child Query
Find all descendants of negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative
Query
Level 2
Results
Level 3
Results
Level 4
29
Returning all document's descendants
Block Join Child Query
Find all descendants of negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative
Query
Level 2
Results
Level 3
Results
Level 4
{
"path":["4.posts.comments.replies.keywords"],
"id":"17413-13550",
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"],
"id":"17413-66188"},
{
"path":["3.posts.comments.keywords"],
"id":"12413-12487",
"text":["Hillary"]},
{
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"],
"id":"12413-10998"}
Issue: no grouping by parent
What if we want to bring the whole sub-structure?
30
Find all negative comments and return them with all their descendants
q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.*]
Query
Level 2
Result
Level 2
sub-
hierarchy
Returning document with all descendants:
ChildDocTransformer
{
"sentiment":["negative"],
"text":["I’ll be the first to say this, but the analysis is flawed."],
"path":["2.posts.comments"],
"_childDocuments_":[
{
"path":["4.posts.comments.replies.keywords"],
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"path":["4.posts.comments.replies.keywords"],
"text":["U.S."]},
{
"text":["So then I guess he will also eliminate the current account surplus? What
will happen to U.S. asset values?"],
"path":["3.posts.comments.replies"]}
]
},
...
Issue: the "sub-hierarchy" is flat
• Returns all descendant documents along with the queried document
• flattens the sub-hierarchy
• Workarounds:
• Reconstruct the document using path ("path":["3.posts.comments.replies"])
information in case you want the entire subtree (result post-processing)
• use childFilter in case you want a specific level
31
“This transformer returns all descendant documents of each parent document matching your query in
a flat list nested inside the matching parent document." (ChildDocTransformer cwiki)
Returning document with all descendants:
ChildDocTransformer
32
Find all negative comments and return them with all replies to them
q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.*
childFilter=path:3.posts.comments.replies]
{
"sentiment":["negative"],
"text":["I’ll be the first to say this, but the analysis is flawed."],
"path":["2.posts.comments"],
"_childDocuments_":[
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is
funnier."],
"path":["3.posts.comments.replies"]},
{
"text":["So then I guess he will also eliminate the current account surplus? What
will happen to U.S. asset values?"],
"path":["3.posts.comments.replies"]}
]
},
...
Returning document with specific descendants:
ChildDocTransformer + childFilter
Query
Level 2:comments
Result
Level 2:comments
+ Level 3:replies
33
Find all negative comments and return them with all their descendants that mention Trump
q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.* childFilter=text:Trump]
{
"sentiment":["negative"],
"text":["I’ll be the first to say this, but the analysis is flawed."],
"path":["2.posts.comments"],
"_childDocuments_":[
{
"path":["4.posts.comments.replies.keywords"],
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is
funnier."],
"path":["3.posts.comments.replies"]}
]
},
...
Returning document with queried descendants:
ChildDocTransformer + childFilter
Query
Level 2:comments
Result
Level 2:comments
+ sub-levels
Issue: cannot use boolean expressions in childFilter query
34
Cross-Level Querying Mechanisms:
• Block Join Parent Query
• Block Join Children Query
• ChildDocTransformer
Good points:
• overlapping & complementary features
• good capabilities of querying direct ancestors/descendants
• possible to query on siblings of different type
Drawbacks:
• need for data-preprocessing for better querying flexibility
• limited support of querying over non-directly related branches (overcome with graphs?)
• flattening nested data (additional post-processing is needed for reconstruction)
Nested Document Querying Summary
Faceting on Nested Documents
36
• Solr allows faceting on nested documents!
• Two mechanisms for faceting:
• Faceting with JSON Facet API (since Solr 5.3)
• Block Join Faceting (since Solr 5.5)
Faceting on Nested Documents
37
q=path:2.posts.comments AND sentiment:positive&
json.facet={
most_liked_authors : {
type: terms,
field: author,
domain: { blockParent : "path:1.posts"}
}}
Faceting on parents by descendants
JSON Facet API: Parent Domain
Count authors of the posts that received positive comments
"most_liked_authors":{
"buckets":[
{
"val":"Alex Tabarrok",
"count":1},
{
"val":"Tyler Cowen",
"count":1}
]
}
Query
Level 2
Facet
Level 1
38
Faceting on descendants by ancestors
JSON Facet API: Child Domain
Distribution of keywords that appear in comments and replies by the top-level posts
Query
Level 1
Facet
Descendant
Levels
"top_keywords":{
"buckets":[{
"val":"hillary",
"count":4,
"counts_by_posts":2},
{
"val":"trump",
"count":3,
"counts_by_posts":2},
{
"val":"dnc",
"count":1,
"counts_by_posts":1},
{
"val":"obama",
"count":2,
"counts_by_posts":1},
{
"val":"u.s",
"count":1,
"counts_by_posts":1}
]}
39
q=path:1.posts&rows=0&
json.facet={
filter_by_child_type :{
type:query,
q:"path:*comments*keywords",
domain: { blockChildren : "path:1.posts" },
facet:{
top_keywords : {
type: terms,
field: text,
sort: "counts_by_posts desc",
facet: {
counts_by_posts: "unique(_root_)"
}}}}}
Faceting on descendants by ancestors
JSON Facet API: Child Domain
Distribution of keywords that appear in comments and replies by the top-level posts
Query
Level 1
Facet
Descendant
Levels
40
Faceting on descendants by top-level ancestor
JSON Facet API: Child Domain
Distribution of keywords that appear in comments and replies by the top-level posts
Query
Level 1
Facet
Descendant
Levels
Issue: only the top-ancestor gets the unique "_root_" field by default
q=path:1.posts&rows=0&
json.facet={
filter_by_child_type :{
type:query,
q:"path:*comments*keywords",
domain: { blockChildren : "path:1.posts" },
facet:{
top_keywords : {
type: terms,
field: text,
sort: "counts_by_posts desc",
facet: {
counts_by_posts: "unique(_root_)"
}}}}}
41
q=path:2.posts.comments&rows=0&
json.facet={
filter_by_child_type :{
type:query,
q:"path:*comments*keywords",
domain: { blockChildren : "path:2.posts.comments" },
facet:{
top_keywords : {
type: terms,
field: text,
sort: "counts_by_comments desc",
facet: {
counts_by_comments: "unique(2.posts.comments-id)"
}}}}}
Faceting on descendants by intermediate ancestors
JSON Facet API: Child Domain + unique fields
Distribution of keywords that appear in comments and replies by the comments
Query
Level 2
Facet
Descendant
Levels
At pre-processing, introduce unique fields for each level
42
Faceting on descendants by intermediate ancestors
JSON Facet API: Child Domain + unique fields
Distribution of keywords that appear in comments and replies by the comments
Query
Level 2
Facet
Descendant
Levels
"top_keywords":{
"buckets":[{
"val":"Hillary",
"count":4,
"counts_by_comments":3},
{
"val":"Trump",
"count":3,
"counts_by_comments":3},
{
"val":"DNC",
"count":1,
"counts_by_comments":1},
{
"val":"Obama",
"count":2,
"counts_by_comments":1},
{
"val":"U.S.",
"count":1,
"counts_by_comments":1}
]}
Now let's try the same using Block Join Faceting
44
• Experimental Feature
• Needs to be turned on explicitly in solrconfig.xml
More info: https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting
Block Join Faceting
45
bjqfacet?q={!parent which=path:2.posts.comments}
path:*.comments*keywords&rows=0&facet=true&child.facet.field=text
Faceting on descendants by ancestors #2:
Block Join Faceting on Children Domain
Distribution of keywords that appear in comments and replies by the comments
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
Query
Level 2
Facet
Descendant
Levels
46
bjqfacet?q={!parent which=path:2.posts.comments}
path:*.comments*keywords&rows=0&facet=true&child.facet.field=text
Faceting on descendants by ancestors #2:
Block Join Faceting on Children Domain
Distribution of keywords that appear in comments and replies by the comments
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
Query
Level 2
Facet
Descendant
Levels
bjqfacet request handler instead of query
47
Output Comparison
Block Join Facet JSON Facet API
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
"top_keywords":{
"buckets":[{
"val":"Hillary",
"count":4,
"counts_by_comments":3},
{
"val":"Trump",
"count":3,
"counts_by_comments":3},
{
"val":"DNC",
"count":1,
"counts_by_comments":1},
{
"val":"Obama",
"count":2,
"counts_by_comments":1},
{
"val":"U.S.",
"count":1,
"counts_by_comments":1}
]}
Distribution of keywords that appear in comments and replies by the comments
48
Output Comparison
Block Join Facet JSON Facet API
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
"top_keywords":{
"buckets":[{
"val":"Hillary",
"count":4,
"counts_by_comments":3},
{
"val":"Trump",
"count":3,
"counts_by_comments":3},
{
"val":"DNC",
"count":1,
"counts_by_comments":1},
...
Distribution of keywords that appear in comments and replies by the comments
Output is sorted in alphabetical
order. It cannot be changed
facet:{
top_keywords : {
...
sort: "counts_by_comments desc"
}}}
49
JSON Facet API:
• Experimental - but more mature
• More developed and established feature
• bulky JSON syntax
• faceting on children by non-top level ancestors requires introducing unique branch
identifiers similar to "_root_" on each level
Block Join Facet:
• Experimental feature
• Lacks controls: sorting, limit...
• traditional query-style syntax
• proper handling of faceting on children by non-top level ancestors
Hierarchical Faceting Summary
50
• Returning hierarchical structure
• JSON facet rollups is in the works - SOLR-8998
• Graph querying might replace a lot of functionalities of cross-level querying - No
distributed support right now.
• There’s more but the community would love to have more people involved!
Community Roadmap
Thank you!
Anshum Gupta anshum@apache.org | @anshumgupta
Alisa Zhila alisa.zhila@gmail.com
https://github.com/alisa-ipn/solr-revolution-2016-nested-demo

More Related Content

What's hot

Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Juan Sequeda
 
Why Java Needs Hierarchical Data
Why Java Needs Hierarchical DataWhy Java Needs Hierarchical Data
Why Java Needs Hierarchical DataMarakana Inc.
 
JSON-LD for RESTful services
JSON-LD for RESTful servicesJSON-LD for RESTful services
JSON-LD for RESTful servicesMarkus Lanthaler
 
It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011Ross Singer
 
Semantic Web and Schema.org
Semantic Web and Schema.orgSemantic Web and Schema.org
Semantic Web and Schema.orgrvguha
 
WTF is Semantic Web?
WTF is Semantic Web?WTF is Semantic Web?
WTF is Semantic Web?milesw
 
Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Aadil Hussaini
 
Google and google scholar
Google and google scholarGoogle and google scholar
Google and google scholarJoelle Pitts
 
Effective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorialEffective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorialJaclyn Lee Parrott
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteNeo4j
 
Web data from R
Web data from RWeb data from R
Web data from Rschamber
 
Open Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceOpen Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceDawn Foster
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked DataRuben Verborgh
 
The Future is Federated
The Future is FederatedThe Future is Federated
The Future is FederatedRuben Verborgh
 
Scraping talk public
Scraping talk publicScraping talk public
Scraping talk publicNesta
 
Open Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMOpen Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMDawn Foster
 
DBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern FragmentsDBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern FragmentsRuben Verborgh
 

What's hot (19)

Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011
 
Why Java Needs Hierarchical Data
Why Java Needs Hierarchical DataWhy Java Needs Hierarchical Data
Why Java Needs Hierarchical Data
 
JSON-LD for RESTful services
JSON-LD for RESTful servicesJSON-LD for RESTful services
JSON-LD for RESTful services
 
It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011
 
Semantic Web and Schema.org
Semantic Web and Schema.orgSemantic Web and Schema.org
Semantic Web and Schema.org
 
WTF is Semantic Web?
WTF is Semantic Web?WTF is Semantic Web?
WTF is Semantic Web?
 
Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Mis 510 cyber analytics project report
Mis 510 cyber analytics project report
 
Google and google scholar
Google and google scholarGoogle and google scholar
Google and google scholar
 
Effective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorialEffective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorial
 
Google Hack
Google HackGoogle Hack
Google Hack
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
Intro to Neo4j - Nicole White
 
Web data from R
Web data from RWeb data from R
Web data from R
 
Open Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceOpen Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice Conference
 
Google as a Hacking Tool
Google as a Hacking ToolGoogle as a Hacking Tool
Google as a Hacking Tool
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked Data
 
The Future is Federated
The Future is FederatedThe Future is Federated
The Future is Federated
 
Scraping talk public
Scraping talk publicScraping talk public
Scraping talk public
 
Open Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMOpen Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEM
 
DBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern FragmentsDBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern Fragments
 

Viewers also liked

Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...Lucidworks
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
What's New in Apache Solr 4.10
What's New in Apache Solr 4.10What's New in Apache Solr 4.10
What's New in Apache Solr 4.10Anshum Gupta
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsAnshum Gupta
 
Managing a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsManaging a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsAnshum Gupta
 
Webinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionWebinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionLucidworks
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studyCharlie Hull
 
Solr as your search and suggest engine karan nangru
Solr as your search and suggest engine   karan nangruSolr as your search and suggest engine   karan nangru
Solr as your search and suggest engine karan nangruIndicThreads
 
Adapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan CodinaAdapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan Codinalucenerevolution
 
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill lucenerevolution
 
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchBattle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchSematext Group, Inc.
 
Proposal for nested document support in Lucene
Proposal for nested document support in LuceneProposal for nested document support in Lucene
Proposal for nested document support in LuceneMark Harwood
 
Facettensuche mit Lucene und Solr
Facettensuche mit Lucene und SolrFacettensuche mit Lucene und Solr
Facettensuche mit Lucene und SolrThomas Koch
 
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...Lucidworks
 
Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?Jodok Batlogg
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and RecommendersLucidworks
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Lucidworks
 
Webinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceWebinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceLucidworks
 

Viewers also liked (20)

Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
What's New in Apache Solr 4.10
What's New in Apache Solr 4.10What's New in Apache Solr 4.10
What's New in Apache Solr 4.10
 
it's just search
it's just searchit's just search
it's just search
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of Collections
 
Managing a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsManaging a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIs
 
Webinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionWebinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with Fusion
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearch
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
Solr as your search and suggest engine karan nangru
Solr as your search and suggest engine   karan nangruSolr as your search and suggest engine   karan nangru
Solr as your search and suggest engine karan nangru
 
Adapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan CodinaAdapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan Codina
 
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill
 
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchBattle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
 
Proposal for nested document support in Lucene
Proposal for nested document support in LuceneProposal for nested document support in Lucene
Proposal for nested document support in Lucene
 
Facettensuche mit Lucene und Solr
Facettensuche mit Lucene und SolrFacettensuche mit Lucene und Solr
Facettensuche mit Lucene und Solr
 
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
 
Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
 
Webinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceWebinar: Fusion for Business Intelligence
Webinar: Fusion for Business Intelligence
 

Similar to Working with deeply nested documents in Apache Solr

Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsJoe Drumgoole
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamMikkel Flindt Heisterberg
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamLetsConnect
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in DocumentsMongoDB
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosConceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosMongoDB
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
How Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the DotsHow Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the Dotsjexp
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked datavafopoulos
 
Vizwik part 3 data
Vizwik part 3 dataVizwik part 3 data
Vizwik part 3 dataVizwik
 
So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?kristinferrier
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012exponential-inc
 
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdfLESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdfJoshCasas1
 

Similar to Working with deeply nested documents in Apache Solr (20)

Webofdata
WebofdataWebofdata
Webofdata
 
Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in Documents
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity Stream
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity Stream
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in Documents
 
Gray_Compass99.ppt
Gray_Compass99.pptGray_Compass99.ppt
Gray_Compass99.ppt
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysis
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosConceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
How Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the DotsHow Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the Dots
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Make Your Data Searchable With Solr in 25 Minutes
Make Your Data Searchable With Solr in 25 MinutesMake Your Data Searchable With Solr in 25 Minutes
Make Your Data Searchable With Solr in 25 Minutes
 
Amazon DynamoDB Design Workshop
Amazon DynamoDB Design WorkshopAmazon DynamoDB Design Workshop
Amazon DynamoDB Design Workshop
 
Vizwik part 3 data
Vizwik part 3 dataVizwik part 3 data
Vizwik part 3 data
 
So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
 
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdfLESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
 

More from Anshum Gupta

SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsAnshum Gupta
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Anshum Gupta
 
Solr security frameworks
Solr security frameworksSolr security frameworks
Solr security frameworksAnshum Gupta
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyondAnshum Gupta
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Ease of use in Apache Solr
Ease of use in Apache SolrEase of use in Apache Solr
Ease of use in Apache SolrAnshum Gupta
 

More from Anshum Gupta (8)

SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIs
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015
 
Solr security frameworks
Solr security frameworksSolr security frameworks
Solr security frameworks
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyond
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scale
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Ease of use in Apache Solr
Ease of use in Apache SolrEase of use in Apache Solr
Ease of use in Apache Solr
 

Recently uploaded

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Working with deeply nested documents in Apache Solr

  • 1. O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
  • 2. Working with deeply nested documents in Apache Solr Anshum Gupta, Alisa Zhila IBM Watson
  • 3. 3 Anshum Gupta • Apache Lucene/Solr committer and PMC member • Search guy @ IBM Watson. • Interested in search and related stuff. • Apache Lucene since 2006 and Solr since 2010.
  • 4. 4 Alisa Zhila • Apache Lucene/Solr supporter :) • Natural Language Processing technologies @ IBM Watson • Interested in search and related stuff
  • 5. 5 Agenda • Hierarchical Data/Nested Documents • Indexing Nested Documents • Querying Nested Documents • Faceting on Nested Documents
  • 7. 7 • Social media comments, Email threads, Annotated data - AI • Relationship between documents • Possibility to flatten Need for nested data EXAMPLE: Blog Post with Comments Peter Navarro outlines the Trump economic plan Tyler Cowen, September 27, 2016 at 3:07am Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports. 1 Ray Lopez September 27, 2016 at 3:21 am I’ll be the first to say this, but the analysis is flawed. {negative} 2 Brian Donohue September 27, 2016 at 9:20 am The math checks out. Solid. {positive} examples from http://marginalrevolution.com
  • 8. 8 • Can not flatten, need to retain context • Relationship between documents • Get all 'positive comments' to 'posts about Trump' -- IMPOSSIBLE!!! Nested Documents EXAMPLE: Data Flattening Title: Peter Navarro outlines the Trump economic plan Author: Tyler Cowen Date: September 27, 2016 at 3:07am Body: Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports. Comment_authors: [Ray Lopez, Brian Donohue] Comment_dates: [September 27, 2016 at 3:21 am, September 27, 2016 at 9:20 am] Comment_texts: ["I’ll be the first to say this, but the analysis is flawed.", "The math checks out. Solid."] Comment_sentiments: [negative, positive]
  • 9. 9 • Can not flatten, need to retain context • Relationship between documents • Get all 'positive comments' to 'posts about Trump' -- POSSIBLE!!! (stay tuned) Nested Documents EXAMPLE: Hierarchical Documents Type: Post Title: Peter Navarro outlines the Trump economic plan Author: Tyler Cowen Date: September 27, 2016 at 3:07am Body: Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports. Type: Comment Author: Ray Lopez Date: September 27, 2016 at 3:21 am Text: I’ll be the first to say this, but the analysis is flawed. Sentiment: negative Type: Comment Author: Brian Donohue Date: September 27, 2016 at 9:20 am Text: The math checks out. Solid. Sentiment: positive
  • 10. 10 • Blog Post Data with Comments and Replies from http://marginalrevolution.com (cured) • 2 posts, 2-3 comments per post, 0-3 replies per comment • Extracted keywords & sentiment data • 4 levels of "nesting" • Too big to show on slides • Data + Scripts + Demo Queries: • https://github.com/alisa-ipn/solr- revolution-2016-nested-demo Running Example
  • 12. 12 • Nested XML • JSON Documents • Add _childDocument_ tags for all children • Pre-process field names to FQNs • Lose information, or add that as meta-data during pre-processing • JSON Document endpoint (6x only) - /update/json/docs • Field name mappings • Child Document splitting - Enhanced support coming soon. Sending Documents to Solr
  • 13. 13 solr-6.2.1$ bin/post -c demo-xml ./data/example-data.xml Sending Documents to Solr: Nested XML <add> <doc> <field name="type">post</field> <field name="author"> "Alex Tabarrok"</field> <field name="title">"The Irony of Hillary Clinton’s Data Analytics"</ field> <field name="body">"Barack Obama’s campaign adopted data but Hillary Clinton’s campaign has been molded by data from birth."</field> <field name="id">"12015-24204"</field> <doc> <field name="type">comment</field> <field name="author">"Todd"</field> <field name="text">"Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to learn over time, and apply the lessons learned to the real world."</field> <field name="sentiment">"positive"</field> <field name="id">"29798-24171"</field> <doc> <field name="type">reply</field> <field name="author">"The Other Jim"</field> <field name="text">"No, she lost because (1) she is thoroughly detested person and (2) the DNC decided Obama should therefore win."</field> <field name="sentiment">"negative"</field> <field name="id">"29798-21232"</field> </doc> </doc> </doc> </add>
  • 14. 14 • Add _childDocument_ tags for all children • Pre-process field names to FQNs • Lose information, or add that as meta-data during pre-processing solr-6.2.1$ bin/post -c demo-solr-json ./data/small-example-data-solr.json -format solr Sending Documents to Solr: JSON Documents [{ "path": "1.posts", "id": "28711", "author": "Alex Tabarrok", "title": "The Irony of Hillary Clinton’s Data Analytics", "body": "Barack Obama’s campaign adopted data but Hillary Clinton’s campaign has been molded by data from birth.", "_childDocuments_": [ { "path": "2.posts.comments", "id": "28711-19237", "author": "Todd", "text": "Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to learn over time, and apply the lessons learned to the real world.", "sentiment": "positive", "_childDocuments_": [ { "path": "3.posts.comments.replies", "author": "The Other Jim", "id": "28711-12444", "sentiment": "negative", "text": "No, she lost because (1) she is thoroughly detested person and (2) the DNC decided Obama should therefore win." }]}]}]
  • 15. 15 • JSON Document endpoint (6x only) - /update/json/docs • Field name mappings • Child Document splitting - Enhanced support coming soon. solr-6.2.1$ curl 'http://localhost:8983/solr/gettingstarted/update/json/docs? split=/|/posts|/posts/comments|/posts/comments/replies&commit=true' --data- binary @small-example-data.json -H ‘Content-type:application/json' NOTE: All documents must contain a unique ID. Sending Documents to Solr: JSON Endpoint
  • 16. 16 • Update Request Processors don’t work with nested documents • Example: • UUID update processor does not auto-add an id for a child document. • Workaround: • Take responsibility at the client layer to handle the computation for nested documents. • Change the update processor in Solr to handle nested documents. Update Processors and Nested Documents
  • 17. 17 • The entire block needs reindexing • Forgot to add a meta-data field that might be useful? Complete reindex • Store everything in Solr IF • it’s too expensive to reconstruct the doc from original data source • No access to data anymore e.g. streaming data Re-Indexing Your Documents
  • 18. 18 • Various ways to index nested documents • Need to re-index entire block Nested Document Indexing Summary
  • 19. Let’s ask some interesting questions
  • 20. 20 { "path":["4.posts.comments.replies.keywords"], "text":["Trump"]}, { "path":["3.posts.comments.keywords"], "text":["Trump"]}, { "path":["2.posts.keywords"], "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "text":["Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports."], "path":["1.posts"]}, { "text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."], "path":["2.posts.comments"]} Easy question first Find all documents that mention Trump q=text:Trump
  • 21. 21 { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."], "path":["2.posts.comments"]}, { "text":["No one goes to Clinton rallies while tens of thousands line up to see Trump, data-mining leads to a fantasy view of the World."], "path":["2.posts.comments"]} Returning certain types of documents Find all comments and replies that mention Trump q=(path:2.posts.comments OR path:3.posts.comments.replies) AND text:Trump Recipe: At the data pre-processing stage, add a field that indicates document type and also its path in the hierarchy (-- stay tuned):
  • 24. 24 { "path":["3.posts.comments.keywords"], "sentiment":["positive"], "text":["Hillary"]}, { "path":["4.posts.comments.replies.keywords"], "sentiment":["negative"], "text":["Hillary"]}, { "path":["2.posts.keywords"], "text":["Hillary"]} Recap so far... Find all keywords that are Hillary q=path:*.keywords AND text:Hillary We're querying precisely for documents which we provide a search condition for Query Level 3 Result Level 3 Query Level 4 Result Level 4 Query Level 2 Result Level 2
  • 25. 25 Returning parents by querying children: Block Join Parent Query Find all comments whose keywords detected positive sentiment towards Hillary q={!parent which="path:2.posts.comments"}path:3.posts.comments.keywords AND text:Hillary AND sentiment:positive Query Level 3 Result Level 2 { "author":["Brian Donohue"], "text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."], "path":["2.posts.comments"]}, { "author":["Todd"], "text":["Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to learn over time, and apply the lessons learned to the real world."], "path":["2.posts.comments"]}
  • 26. 26 { "sentiment":["negative"], "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "sentiment":["neutral"], "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]}, { "sentiment":["positive"], "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"]} Returning children by querying parents: Block Join Child Query Find replies to negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies Query Level 2 Result Level 3
  • 27. 27 { "sentiment":["negative"], "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "sentiment":["neutral"], "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]}, { "sentiment":["positive"], "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"]} Returning children by querying parents: Block Join Child Query Find replies to negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies Query Level 2 Result Level 3 Block Join Child Query + Filtering Query A bit counterintuitive and non-symmetrical to the BJPQ
  • 28. 28 { "path":["4.posts.comments.replies.keywords"], "id":"17413-13550", "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"], "id":"17413-66188"}, { "path":["3.posts.comments.keywords"], "id":"12413-12487", "text":["Hillary"]}, { "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"], "id":"12413-10998"} Returning all document's descendants Block Join Child Query Find all descendants of negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative Query Level 2 Results Level 3 Results Level 4
  • 29. 29 Returning all document's descendants Block Join Child Query Find all descendants of negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative Query Level 2 Results Level 3 Results Level 4 { "path":["4.posts.comments.replies.keywords"], "id":"17413-13550", "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"], "id":"17413-66188"}, { "path":["3.posts.comments.keywords"], "id":"12413-12487", "text":["Hillary"]}, { "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"], "id":"12413-10998"} Issue: no grouping by parent What if we want to bring the whole sub-structure?
  • 30. 30 Find all negative comments and return them with all their descendants q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.*] Query Level 2 Result Level 2 sub- hierarchy Returning document with all descendants: ChildDocTransformer { "sentiment":["negative"], "text":["I’ll be the first to say this, but the analysis is flawed."], "path":["2.posts.comments"], "_childDocuments_":[ { "path":["4.posts.comments.replies.keywords"], "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "path":["4.posts.comments.replies.keywords"], "text":["U.S."]}, { "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]} ] }, ... Issue: the "sub-hierarchy" is flat
  • 31. • Returns all descendant documents along with the queried document • flattens the sub-hierarchy • Workarounds: • Reconstruct the document using path ("path":["3.posts.comments.replies"]) information in case you want the entire subtree (result post-processing) • use childFilter in case you want a specific level 31 “This transformer returns all descendant documents of each parent document matching your query in a flat list nested inside the matching parent document." (ChildDocTransformer cwiki) Returning document with all descendants: ChildDocTransformer
  • 32. 32 Find all negative comments and return them with all replies to them q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.* childFilter=path:3.posts.comments.replies] { "sentiment":["negative"], "text":["I’ll be the first to say this, but the analysis is flawed."], "path":["2.posts.comments"], "_childDocuments_":[ { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]} ] }, ... Returning document with specific descendants: ChildDocTransformer + childFilter Query Level 2:comments Result Level 2:comments + Level 3:replies
  • 33. 33 Find all negative comments and return them with all their descendants that mention Trump q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.* childFilter=text:Trump] { "sentiment":["negative"], "text":["I’ll be the first to say this, but the analysis is flawed."], "path":["2.posts.comments"], "_childDocuments_":[ { "path":["4.posts.comments.replies.keywords"], "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]} ] }, ... Returning document with queried descendants: ChildDocTransformer + childFilter Query Level 2:comments Result Level 2:comments + sub-levels Issue: cannot use boolean expressions in childFilter query
  • 34. 34 Cross-Level Querying Mechanisms: • Block Join Parent Query • Block Join Children Query • ChildDocTransformer Good points: • overlapping & complementary features • good capabilities of querying direct ancestors/descendants • possible to query on siblings of different type Drawbacks: • need for data-preprocessing for better querying flexibility • limited support of querying over non-directly related branches (overcome with graphs?) • flattening nested data (additional post-processing is needed for reconstruction) Nested Document Querying Summary
  • 35. Faceting on Nested Documents
  • 36. 36 • Solr allows faceting on nested documents! • Two mechanisms for faceting: • Faceting with JSON Facet API (since Solr 5.3) • Block Join Faceting (since Solr 5.5) Faceting on Nested Documents
  • 37. 37 q=path:2.posts.comments AND sentiment:positive& json.facet={ most_liked_authors : { type: terms, field: author, domain: { blockParent : "path:1.posts"} }} Faceting on parents by descendants JSON Facet API: Parent Domain Count authors of the posts that received positive comments "most_liked_authors":{ "buckets":[ { "val":"Alex Tabarrok", "count":1}, { "val":"Tyler Cowen", "count":1} ] } Query Level 2 Facet Level 1
  • 38. 38 Faceting on descendants by ancestors JSON Facet API: Child Domain Distribution of keywords that appear in comments and replies by the top-level posts Query Level 1 Facet Descendant Levels "top_keywords":{ "buckets":[{ "val":"hillary", "count":4, "counts_by_posts":2}, { "val":"trump", "count":3, "counts_by_posts":2}, { "val":"dnc", "count":1, "counts_by_posts":1}, { "val":"obama", "count":2, "counts_by_posts":1}, { "val":"u.s", "count":1, "counts_by_posts":1} ]}
  • 39. 39 q=path:1.posts&rows=0& json.facet={ filter_by_child_type :{ type:query, q:"path:*comments*keywords", domain: { blockChildren : "path:1.posts" }, facet:{ top_keywords : { type: terms, field: text, sort: "counts_by_posts desc", facet: { counts_by_posts: "unique(_root_)" }}}}} Faceting on descendants by ancestors JSON Facet API: Child Domain Distribution of keywords that appear in comments and replies by the top-level posts Query Level 1 Facet Descendant Levels
  • 40. 40 Faceting on descendants by top-level ancestor JSON Facet API: Child Domain Distribution of keywords that appear in comments and replies by the top-level posts Query Level 1 Facet Descendant Levels Issue: only the top-ancestor gets the unique "_root_" field by default q=path:1.posts&rows=0& json.facet={ filter_by_child_type :{ type:query, q:"path:*comments*keywords", domain: { blockChildren : "path:1.posts" }, facet:{ top_keywords : { type: terms, field: text, sort: "counts_by_posts desc", facet: { counts_by_posts: "unique(_root_)" }}}}}
  • 41. 41 q=path:2.posts.comments&rows=0& json.facet={ filter_by_child_type :{ type:query, q:"path:*comments*keywords", domain: { blockChildren : "path:2.posts.comments" }, facet:{ top_keywords : { type: terms, field: text, sort: "counts_by_comments desc", facet: { counts_by_comments: "unique(2.posts.comments-id)" }}}}} Faceting on descendants by intermediate ancestors JSON Facet API: Child Domain + unique fields Distribution of keywords that appear in comments and replies by the comments Query Level 2 Facet Descendant Levels At pre-processing, introduce unique fields for each level
  • 42. 42 Faceting on descendants by intermediate ancestors JSON Facet API: Child Domain + unique fields Distribution of keywords that appear in comments and replies by the comments Query Level 2 Facet Descendant Levels "top_keywords":{ "buckets":[{ "val":"Hillary", "count":4, "counts_by_comments":3}, { "val":"Trump", "count":3, "counts_by_comments":3}, { "val":"DNC", "count":1, "counts_by_comments":1}, { "val":"Obama", "count":2, "counts_by_comments":1}, { "val":"U.S.", "count":1, "counts_by_comments":1} ]}
  • 43. Now let's try the same using Block Join Faceting
  • 44. 44 • Experimental Feature • Needs to be turned on explicitly in solrconfig.xml More info: https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting Block Join Faceting
  • 45. 45 bjqfacet?q={!parent which=path:2.posts.comments} path:*.comments*keywords&rows=0&facet=true&child.facet.field=text Faceting on descendants by ancestors #2: Block Join Faceting on Children Domain Distribution of keywords that appear in comments and replies by the comments "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } Query Level 2 Facet Descendant Levels
  • 46. 46 bjqfacet?q={!parent which=path:2.posts.comments} path:*.comments*keywords&rows=0&facet=true&child.facet.field=text Faceting on descendants by ancestors #2: Block Join Faceting on Children Domain Distribution of keywords that appear in comments and replies by the comments "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } Query Level 2 Facet Descendant Levels bjqfacet request handler instead of query
  • 47. 47 Output Comparison Block Join Facet JSON Facet API "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } "top_keywords":{ "buckets":[{ "val":"Hillary", "count":4, "counts_by_comments":3}, { "val":"Trump", "count":3, "counts_by_comments":3}, { "val":"DNC", "count":1, "counts_by_comments":1}, { "val":"Obama", "count":2, "counts_by_comments":1}, { "val":"U.S.", "count":1, "counts_by_comments":1} ]} Distribution of keywords that appear in comments and replies by the comments
  • 48. 48 Output Comparison Block Join Facet JSON Facet API "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } "top_keywords":{ "buckets":[{ "val":"Hillary", "count":4, "counts_by_comments":3}, { "val":"Trump", "count":3, "counts_by_comments":3}, { "val":"DNC", "count":1, "counts_by_comments":1}, ... Distribution of keywords that appear in comments and replies by the comments Output is sorted in alphabetical order. It cannot be changed facet:{ top_keywords : { ... sort: "counts_by_comments desc" }}}
  • 49. 49 JSON Facet API: • Experimental - but more mature • More developed and established feature • bulky JSON syntax • faceting on children by non-top level ancestors requires introducing unique branch identifiers similar to "_root_" on each level Block Join Facet: • Experimental feature • Lacks controls: sorting, limit... • traditional query-style syntax • proper handling of faceting on children by non-top level ancestors Hierarchical Faceting Summary
  • 50. 50 • Returning hierarchical structure • JSON facet rollups is in the works - SOLR-8998 • Graph querying might replace a lot of functionalities of cross-level querying - No distributed support right now. • There’s more but the community would love to have more people involved! Community Roadmap
  • 51. Thank you! Anshum Gupta anshum@apache.org | @anshumgupta Alisa Zhila alisa.zhila@gmail.com https://github.com/alisa-ipn/solr-revolution-2016-nested-demo