SlideShare a Scribd company logo
1 of 51
Download to read offline
O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
Working with deeply nested documents in Apache Solr
Anshum Gupta, Alisa Zhila
IBM Watson
3
Anshum Gupta
• Apache Lucene/Solr committer and PMC member
• Search guy @ IBM Watson.
• Interested in search and related stuff.
• Apache Lucene since 2006 and Solr since 2010.
4
Alisa Zhila
• Apache Lucene/Solr supporter :)
• Natural Language Processing technologies @ IBM Watson
• Interested in search and related stuff
5
Agenda
• Hierarchical Data/Nested Documents
• Indexing Nested Documents
• Querying Nested Documents
• Faceting on Nested Documents
Hierarchical Documents
7
• Social media comments, Email threads,
Annotated data - AI
• Relationship between documents
• Possibility to flatten
Need for nested data
EXAMPLE: Blog Post with Comments
Peter Navarro outlines the Trump economic plan
Tyler Cowen, September 27, 2016 at 3:07am
Trump proposes eliminating America’s $500 billion
trade deficit through a combination of increased
exports and reduced imports.
1 Ray Lopez September 27, 2016 at 3:21 am
I’ll be the first to say this, but the analysis is flawed.
{negative}
2 Brian Donohue September 27, 2016 at 9:20 am
The math checks out. Solid.
{positive}
examples from http://marginalrevolution.com
8
• Can not flatten, need to retain context
• Relationship between documents
• Get all 'positive comments' to 'posts about
Trump' -- IMPOSSIBLE!!!
Nested Documents
EXAMPLE: Data Flattening
Title: Peter Navarro outlines the Trump economic plan
Author: Tyler Cowen
Date: September 27, 2016 at 3:07am
Body: Trump proposes eliminating America’s $500 billion
trade deficit through a combination of increased exports and
reduced imports.
Comment_authors: [Ray Lopez, Brian Donohue]
Comment_dates: [September 27, 2016 at 3:21 am,
September 27, 2016 at 9:20 am]
Comment_texts: ["I’ll be the first to say this, but the analysis is
flawed.", "The math checks out. Solid."]
Comment_sentiments: [negative, positive]
9
• Can not flatten, need to retain context
• Relationship between documents
• Get all 'positive comments' to 'posts about
Trump' -- POSSIBLE!!! (stay tuned)
Nested Documents
EXAMPLE: Hierarchical Documents
Type: Post
Title: Peter Navarro outlines the Trump economic plan
Author: Tyler Cowen
Date: September 27, 2016 at 3:07am
Body: Trump proposes eliminating America’s $500 billion
trade deficit through a combination of increased exports and
reduced imports.
Type: Comment
Author: Ray Lopez
Date: September 27, 2016 at 3:21 am
Text: I’ll be the first to say this, but the analysis is flawed.
Sentiment: negative
Type: Comment
Author: Brian Donohue
Date: September 27, 2016 at 9:20 am
Text: The math checks out. Solid.
Sentiment: positive
10
• Blog Post Data with Comments and Replies
from http://marginalrevolution.com (cured)
• 2 posts, 2-3 comments per post, 0-3 replies
per comment
• Extracted keywords & sentiment data
• 4 levels of "nesting"
• Too big to show on slides
• Data + Scripts + Demo Queries:
• https://github.com/alisa-ipn/solr-
revolution-2016-nested-demo
Running Example
Indexing Nested Documents
12
• Nested XML
• JSON Documents
• Add _childDocument_ tags for all children
• Pre-process field names to FQNs
• Lose information, or add that as meta-data during pre-processing
• JSON Document endpoint (6x only) - /update/json/docs
• Field name mappings
• Child Document splitting - Enhanced support coming soon.
Sending Documents to Solr
13
solr-6.2.1$ bin/post -c demo-xml ./data/example-data.xml
Sending Documents to Solr: Nested XML
<add>
<doc>
<field name="type">post</field>
<field name="author"> "Alex Tabarrok"</field>
<field name="title">"The Irony of Hillary Clinton’s Data Analytics"</
field>
<field name="body">"Barack Obama’s campaign adopted data but
Hillary Clinton’s campaign has been molded by data from birth."</field>
<field name="id">"12015-24204"</field>
<doc>
<field name="type">comment</field>
<field name="author">"Todd"</field>
<field name="text">"Clinton got out data-ed and out organized in
2008 by Obama. She seems at least to learn over time, and apply the
lessons learned to the real world."</field>
<field name="sentiment">"positive"</field>
<field name="id">"29798-24171"</field>
<doc>
<field name="type">reply</field>
<field name="author">"The Other Jim"</field>
<field name="text">"No, she lost because (1) she is thoroughly
detested person and (2) the DNC decided Obama should therefore
win."</field>
<field name="sentiment">"negative"</field>
<field name="id">"29798-21232"</field>
</doc>
</doc>
</doc>
</add>
14
• Add _childDocument_ tags for all children
• Pre-process field names to FQNs
• Lose information, or add that as meta-data during pre-processing
solr-6.2.1$ bin/post -c demo-solr-json ./data/small-example-data-solr.json -format solr
Sending Documents to Solr: JSON Documents
[{ "path": "1.posts",
"id": "28711",
"author": "Alex Tabarrok",
"title": "The Irony of Hillary Clinton’s Data Analytics",
"body": "Barack Obama’s campaign adopted data but Hillary Clinton’s campaign
has been molded by data from birth.",
"_childDocuments_": [
{
"path": "2.posts.comments",
"id": "28711-19237",
"author": "Todd",
"text": "Clinton got out data-ed and out organized in 2008 by Obama. She
seems at least to learn over time, and apply the lessons learned to the real world.",
"sentiment": "positive",
"_childDocuments_": [
{
"path": "3.posts.comments.replies",
"author": "The Other Jim",
"id": "28711-12444",
"sentiment": "negative",
"text": "No, she lost because (1) she is thoroughly detested person and
(2) the DNC decided Obama should therefore win."
}]}]}]
15
• JSON Document endpoint (6x only) - /update/json/docs
• Field name mappings
• Child Document splitting - Enhanced support coming soon.
solr-6.2.1$ curl 'http://localhost:8983/solr/gettingstarted/update/json/docs?
split=/|/posts|/posts/comments|/posts/comments/replies&commit=true' --data-
binary @small-example-data.json -H ‘Content-type:application/json'
NOTE: All documents must contain a unique ID.
Sending Documents to Solr: JSON Endpoint
16
• Update Request Processors don’t work with nested documents
• Example:
• UUID update processor does not auto-add an id for a child document.
• Workaround:
• Take responsibility at the client layer to handle the computation for nested
documents.
• Change the update processor in Solr to handle nested documents.
Update Processors and Nested Documents
17
• The entire block needs reindexing
• Forgot to add a meta-data field that might be useful? Complete reindex
• Store everything in Solr IF
• it’s too expensive to reconstruct the doc from original data source
• No access to data anymore e.g. streaming data
Re-Indexing Your Documents
18
• Various ways to index nested documents
• Need to re-index entire block
Nested Document Indexing Summary
Let’s ask some interesting questions
20
{
"path":["4.posts.comments.replies.keywords"],
"text":["Trump"]},
{
"path":["3.posts.comments.keywords"],
"text":["Trump"]},
{
"path":["2.posts.keywords"],
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"text":["Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports."],
"path":["1.posts"]},
{
"text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."],
"path":["2.posts.comments"]}
Easy question first
Find all documents that mention Trump
q=text:Trump
21
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."],
"path":["2.posts.comments"]},
{
"text":["No one goes to Clinton rallies while tens of thousands line up to see Trump, data-mining leads to a fantasy view of the World."],
"path":["2.posts.comments"]}
Returning certain types of documents
Find all comments and replies that mention Trump
q=(path:2.posts.comments OR path:3.posts.comments.replies) AND text:Trump
Recipe:
At the data pre-processing stage, add a field that indicates document type
and also its path in the hierarchy (-- stay tuned):
22
{
"path":["3.posts.comments.keywords"],
"sentiment":["positive"],
"text":["Hillary"]},
{
"path":["4.posts.comments.replies.keywords"],
"sentiment":["negative"],
"text":["Hillary"]},
{
"path":["2.posts.keywords"],
"text":["Hillary"]}
Returning similar type from different level
Find all keywords that are Hillary
q=path:*.keywords AND text:Hillary
Recipe:
Use wild-cards in the field that stores the hierarchy path
Cross-Level Querying
24
{
"path":["3.posts.comments.keywords"],
"sentiment":["positive"],
"text":["Hillary"]},
{
"path":["4.posts.comments.replies.keywords"],
"sentiment":["negative"],
"text":["Hillary"]},
{
"path":["2.posts.keywords"],
"text":["Hillary"]}
Recap so far...
Find all keywords that are Hillary
q=path:*.keywords AND text:Hillary
We're querying precisely for documents
which we provide a search condition for
Query
Level 3
Result
Level 3
Query
Level 4
Result
Level 4
Query
Level 2
Result
Level 2
25
Returning parents by querying children:
Block Join Parent Query
Find all comments whose keywords detected positive sentiment towards Hillary
q={!parent which="path:2.posts.comments"}path:3.posts.comments.keywords AND text:Hillary AND sentiment:positive
Query
Level 3
Result
Level 2
{
"author":["Brian Donohue"],
"text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering,
but he was actually able to find his feet and score some points."],
"path":["2.posts.comments"]},
{
"author":["Todd"],
"text":["Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to
learn over time, and apply the lessons learned to the real world."],
"path":["2.posts.comments"]}
26
{
"sentiment":["negative"],
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"sentiment":["neutral"],
"text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S.
asset values?"],
"path":["3.posts.comments.replies"]},
{
"sentiment":["positive"],
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"]}
Returning children by querying parents:
Block Join Child Query
Find replies to negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies
Query
Level 2
Result
Level 3
27
{
"sentiment":["negative"],
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"sentiment":["neutral"],
"text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S.
asset values?"],
"path":["3.posts.comments.replies"]},
{
"sentiment":["positive"],
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"]}
Returning children by querying parents:
Block Join Child Query
Find replies to negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies
Query
Level 2
Result
Level 3
Block Join Child Query + Filtering Query
A bit counterintuitive and non-symmetrical to the BJPQ
28
{
"path":["4.posts.comments.replies.keywords"],
"id":"17413-13550",
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"],
"id":"17413-66188"},
{
"path":["3.posts.comments.keywords"],
"id":"12413-12487",
"text":["Hillary"]},
{
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"],
"id":"12413-10998"}
Returning all document's descendants
Block Join Child Query
Find all descendants of negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative
Query
Level 2
Results
Level 3
Results
Level 4
29
Returning all document's descendants
Block Join Child Query
Find all descendants of negative comments
q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative
Query
Level 2
Results
Level 3
Results
Level 4
{
"path":["4.posts.comments.replies.keywords"],
"id":"17413-13550",
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"],
"id":"17413-66188"},
{
"path":["3.posts.comments.keywords"],
"id":"12413-12487",
"text":["Hillary"]},
{
"text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see
a fantasy in person?"],
"path":["3.posts.comments.replies"],
"id":"12413-10998"}
Issue: no grouping by parent
What if we want to bring the whole sub-structure?
30
Find all negative comments and return them with all their descendants
q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.*]
Query
Level 2
Result
Level 2
sub-
hierarchy
Returning document with all descendants:
ChildDocTransformer
{
"sentiment":["negative"],
"text":["I’ll be the first to say this, but the analysis is flawed."],
"path":["2.posts.comments"],
"_childDocuments_":[
{
"path":["4.posts.comments.replies.keywords"],
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."],
"path":["3.posts.comments.replies"]},
{
"path":["4.posts.comments.replies.keywords"],
"text":["U.S."]},
{
"text":["So then I guess he will also eliminate the current account surplus? What
will happen to U.S. asset values?"],
"path":["3.posts.comments.replies"]}
]
},
...
Issue: the "sub-hierarchy" is flat
• Returns all descendant documents along with the queried document
• flattens the sub-hierarchy
• Workarounds:
• Reconstruct the document using path ("path":["3.posts.comments.replies"])
information in case you want the entire subtree (result post-processing)
• use childFilter in case you want a specific level
31
“This transformer returns all descendant documents of each parent document matching your query in
a flat list nested inside the matching parent document." (ChildDocTransformer cwiki)
Returning document with all descendants:
ChildDocTransformer
32
Find all negative comments and return them with all replies to them
q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.*
childFilter=path:3.posts.comments.replies]
{
"sentiment":["negative"],
"text":["I’ll be the first to say this, but the analysis is flawed."],
"path":["2.posts.comments"],
"_childDocuments_":[
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is
funnier."],
"path":["3.posts.comments.replies"]},
{
"text":["So then I guess he will also eliminate the current account surplus? What
will happen to U.S. asset values?"],
"path":["3.posts.comments.replies"]}
]
},
...
Returning document with specific descendants:
ChildDocTransformer + childFilter
Query
Level 2:comments
Result
Level 2:comments
+ Level 3:replies
33
Find all negative comments and return them with all their descendants that mention Trump
q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.* childFilter=text:Trump]
{
"sentiment":["negative"],
"text":["I’ll be the first to say this, but the analysis is flawed."],
"path":["2.posts.comments"],
"_childDocuments_":[
{
"path":["4.posts.comments.replies.keywords"],
"text":["Trump"]},
{
"text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is
funnier."],
"path":["3.posts.comments.replies"]}
]
},
...
Returning document with queried descendants:
ChildDocTransformer + childFilter
Query
Level 2:comments
Result
Level 2:comments
+ sub-levels
Issue: cannot use boolean expressions in childFilter query
34
Cross-Level Querying Mechanisms:
• Block Join Parent Query
• Block Join Children Query
• ChildDocTransformer
Good points:
• overlapping & complementary features
• good capabilities of querying direct ancestors/descendants
• possible to query on siblings of different type
Drawbacks:
• need for data-preprocessing for better querying flexibility
• limited support of querying over non-directly related branches (overcome with graphs?)
• flattening nested data (additional post-processing is needed for reconstruction)
Nested Document Querying Summary
Faceting on Nested Documents
36
• Solr allows faceting on nested documents!
• Two mechanisms for faceting:
• Faceting with JSON Facet API (since Solr 5.3)
• Block Join Faceting (since Solr 5.5)
Faceting on Nested Documents
37
q=path:2.posts.comments AND sentiment:positive&
json.facet={
most_liked_authors : {
type: terms,
field: author,
domain: { blockParent : "path:1.posts"}
}}
Faceting on parents by descendants
JSON Facet API: Parent Domain
Count authors of the posts that received positive comments
"most_liked_authors":{
"buckets":[
{
"val":"Alex Tabarrok",
"count":1},
{
"val":"Tyler Cowen",
"count":1}
]
}
Query
Level 2
Facet
Level 1
38
Faceting on descendants by ancestors
JSON Facet API: Child Domain
Distribution of keywords that appear in comments and replies by the top-level posts
Query
Level 1
Facet
Descendant
Levels
"top_keywords":{
"buckets":[{
"val":"hillary",
"count":4,
"counts_by_posts":2},
{
"val":"trump",
"count":3,
"counts_by_posts":2},
{
"val":"dnc",
"count":1,
"counts_by_posts":1},
{
"val":"obama",
"count":2,
"counts_by_posts":1},
{
"val":"u.s",
"count":1,
"counts_by_posts":1}
]}
39
q=path:1.posts&rows=0&
json.facet={
filter_by_child_type :{
type:query,
q:"path:*comments*keywords",
domain: { blockChildren : "path:1.posts" },
facet:{
top_keywords : {
type: terms,
field: text,
sort: "counts_by_posts desc",
facet: {
counts_by_posts: "unique(_root_)"
}}}}}
Faceting on descendants by ancestors
JSON Facet API: Child Domain
Distribution of keywords that appear in comments and replies by the top-level posts
Query
Level 1
Facet
Descendant
Levels
40
Faceting on descendants by top-level ancestor
JSON Facet API: Child Domain
Distribution of keywords that appear in comments and replies by the top-level posts
Query
Level 1
Facet
Descendant
Levels
Issue: only the top-ancestor gets the unique "_root_" field by default
q=path:1.posts&rows=0&
json.facet={
filter_by_child_type :{
type:query,
q:"path:*comments*keywords",
domain: { blockChildren : "path:1.posts" },
facet:{
top_keywords : {
type: terms,
field: text,
sort: "counts_by_posts desc",
facet: {
counts_by_posts: "unique(_root_)"
}}}}}
41
q=path:2.posts.comments&rows=0&
json.facet={
filter_by_child_type :{
type:query,
q:"path:*comments*keywords",
domain: { blockChildren : "path:2.posts.comments" },
facet:{
top_keywords : {
type: terms,
field: text,
sort: "counts_by_comments desc",
facet: {
counts_by_comments: "unique(2.posts.comments-id)"
}}}}}
Faceting on descendants by intermediate ancestors
JSON Facet API: Child Domain + unique fields
Distribution of keywords that appear in comments and replies by the comments
Query
Level 2
Facet
Descendant
Levels
At pre-processing, introduce unique fields for each level
42
Faceting on descendants by intermediate ancestors
JSON Facet API: Child Domain + unique fields
Distribution of keywords that appear in comments and replies by the comments
Query
Level 2
Facet
Descendant
Levels
"top_keywords":{
"buckets":[{
"val":"Hillary",
"count":4,
"counts_by_comments":3},
{
"val":"Trump",
"count":3,
"counts_by_comments":3},
{
"val":"DNC",
"count":1,
"counts_by_comments":1},
{
"val":"Obama",
"count":2,
"counts_by_comments":1},
{
"val":"U.S.",
"count":1,
"counts_by_comments":1}
]}
Now let's try the same using Block Join Faceting
44
• Experimental Feature
• Needs to be turned on explicitly in solrconfig.xml
More info: https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting
Block Join Faceting
45
bjqfacet?q={!parent which=path:2.posts.comments}
path:*.comments*keywords&rows=0&facet=true&child.facet.field=text
Faceting on descendants by ancestors #2:
Block Join Faceting on Children Domain
Distribution of keywords that appear in comments and replies by the comments
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
Query
Level 2
Facet
Descendant
Levels
46
bjqfacet?q={!parent which=path:2.posts.comments}
path:*.comments*keywords&rows=0&facet=true&child.facet.field=text
Faceting on descendants by ancestors #2:
Block Join Faceting on Children Domain
Distribution of keywords that appear in comments and replies by the comments
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
Query
Level 2
Facet
Descendant
Levels
bjqfacet request handler instead of query
47
Output Comparison
Block Join Facet JSON Facet API
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
"top_keywords":{
"buckets":[{
"val":"Hillary",
"count":4,
"counts_by_comments":3},
{
"val":"Trump",
"count":3,
"counts_by_comments":3},
{
"val":"DNC",
"count":1,
"counts_by_comments":1},
{
"val":"Obama",
"count":2,
"counts_by_comments":1},
{
"val":"U.S.",
"count":1,
"counts_by_comments":1}
]}
Distribution of keywords that appear in comments and replies by the comments
48
Output Comparison
Block Join Facet JSON Facet API
"facet_fields":{
"text":[
"dnc",1,
"hillary",3,
"obama",1,
"trump",3,
"u.s",1
]
}
"top_keywords":{
"buckets":[{
"val":"Hillary",
"count":4,
"counts_by_comments":3},
{
"val":"Trump",
"count":3,
"counts_by_comments":3},
{
"val":"DNC",
"count":1,
"counts_by_comments":1},
...
Distribution of keywords that appear in comments and replies by the comments
Output is sorted in alphabetical
order. It cannot be changed
facet:{
top_keywords : {
...
sort: "counts_by_comments desc"
}}}
49
JSON Facet API:
• Experimental - but more mature
• More developed and established feature
• bulky JSON syntax
• faceting on children by non-top level ancestors requires introducing unique branch
identifiers similar to "_root_" on each level
Block Join Facet:
• Experimental feature
• Lacks controls: sorting, limit...
• traditional query-style syntax
• proper handling of faceting on children by non-top level ancestors
Hierarchical Faceting Summary
50
• Returning hierarchical structure
• JSON facet rollups is in the works - SOLR-8998
• Graph querying might replace a lot of functionalities of cross-level querying - No
distributed support right now.
• There’s more but the community would love to have more people involved!
Community Roadmap
Thank you!
Anshum Gupta anshum@apache.org | @anshumgupta
Alisa Zhila alisa.zhila@gmail.com
https://github.com/alisa-ipn/solr-revolution-2016-nested-demo

More Related Content

What's hot

Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Juan Sequeda
 
Why Java Needs Hierarchical Data
Why Java Needs Hierarchical DataWhy Java Needs Hierarchical Data
Why Java Needs Hierarchical DataMarakana Inc.
 
JSON-LD for RESTful services
JSON-LD for RESTful servicesJSON-LD for RESTful services
JSON-LD for RESTful servicesMarkus Lanthaler
 
It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011Ross Singer
 
Semantic Web and Schema.org
Semantic Web and Schema.orgSemantic Web and Schema.org
Semantic Web and Schema.orgrvguha
 
WTF is Semantic Web?
WTF is Semantic Web?WTF is Semantic Web?
WTF is Semantic Web?milesw
 
Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Aadil Hussaini
 
Google and google scholar
Google and google scholarGoogle and google scholar
Google and google scholarJoelle Pitts
 
Effective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorialEffective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorialJaclyn Lee Parrott
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteNeo4j
 
Web data from R
Web data from RWeb data from R
Web data from Rschamber
 
Open Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceOpen Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceDawn Foster
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked DataRuben Verborgh
 
The Future is Federated
The Future is FederatedThe Future is Federated
The Future is FederatedRuben Verborgh
 
Scraping talk public
Scraping talk publicScraping talk public
Scraping talk publicNesta
 
Open Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMOpen Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMDawn Foster
 
DBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern FragmentsDBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern FragmentsRuben Verborgh
 

What's hot (19)

Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011Consuming Linked Data 4/5 Semtech2011
Consuming Linked Data 4/5 Semtech2011
 
Why Java Needs Hierarchical Data
Why Java Needs Hierarchical DataWhy Java Needs Hierarchical Data
Why Java Needs Hierarchical Data
 
JSON-LD for RESTful services
JSON-LD for RESTful servicesJSON-LD for RESTful services
JSON-LD for RESTful services
 
It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011
 
Semantic Web and Schema.org
Semantic Web and Schema.orgSemantic Web and Schema.org
Semantic Web and Schema.org
 
WTF is Semantic Web?
WTF is Semantic Web?WTF is Semantic Web?
WTF is Semantic Web?
 
Mis 510 cyber analytics project report
Mis 510 cyber analytics project report Mis 510 cyber analytics project report
Mis 510 cyber analytics project report
 
Google and google scholar
Google and google scholarGoogle and google scholar
Google and google scholar
 
Effective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorialEffective and efficient google searching power point tutorial
Effective and efficient google searching power point tutorial
 
Google Hack
Google HackGoogle Hack
Google Hack
 
Intro to Neo4j - Nicole White
Intro to Neo4j - Nicole WhiteIntro to Neo4j - Nicole White
Intro to Neo4j - Nicole White
 
Web data from R
Web data from RWeb data from R
Web data from R
 
Open Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice ConferenceOpen Source Community Metrics LibreOffice Conference
Open Source Community Metrics LibreOffice Conference
 
Google as a Hacking Tool
Google as a Hacking ToolGoogle as a Hacking Tool
Google as a Hacking Tool
 
Sustainable queryable access to Linked Data
Sustainable queryable access to Linked DataSustainable queryable access to Linked Data
Sustainable queryable access to Linked Data
 
The Future is Federated
The Future is FederatedThe Future is Federated
The Future is Federated
 
Scraping talk public
Scraping talk publicScraping talk public
Scraping talk public
 
Open Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEMOpen Source Community Metrics for FOSDEM
Open Source Community Metrics for FOSDEM
 
DBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern FragmentsDBpedia's Triple Pattern Fragments
DBpedia's Triple Pattern Fragments
 

Viewers also liked

Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...Lucidworks
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
What's New in Apache Solr 4.10
What's New in Apache Solr 4.10What's New in Apache Solr 4.10
What's New in Apache Solr 4.10Anshum Gupta
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsAnshum Gupta
 
Managing a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsManaging a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsAnshum Gupta
 
Webinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionWebinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionLucidworks
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchRafał Kuć
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studyCharlie Hull
 
Solr as your search and suggest engine karan nangru
Solr as your search and suggest engine   karan nangruSolr as your search and suggest engine   karan nangru
Solr as your search and suggest engine karan nangruIndicThreads
 
Adapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan CodinaAdapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan Codinalucenerevolution
 
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill lucenerevolution
 
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchBattle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchSematext Group, Inc.
 
Proposal for nested document support in Lucene
Proposal for nested document support in LuceneProposal for nested document support in Lucene
Proposal for nested document support in LuceneMark Harwood
 
Facettensuche mit Lucene und Solr
Facettensuche mit Lucene und SolrFacettensuche mit Lucene und Solr
Facettensuche mit Lucene und SolrThomas Koch
 
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...Lucidworks
 
Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?Jodok Batlogg
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and RecommendersLucidworks
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Lucidworks
 
Webinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceWebinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceLucidworks
 

Viewers also liked (20)

Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
Working with Deeply Nested Documents in Apache Solr: Presented by Anshum Gupt...
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
What's New in Apache Solr 4.10
What's New in Apache Solr 4.10What's New in Apache Solr 4.10
What's New in Apache Solr 4.10
 
it's just search
it's just searchit's just search
it's just search
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of Collections
 
Managing a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIsManaging a SolrCloud cluster using APIs
Managing a SolrCloud cluster using APIs
 
Webinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with FusionWebinar: Building Conversational Search with Fusion
Webinar: Building Conversational Search with Fusion
 
Battle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearchBattle of the giants: Apache Solr vs ElasticSearch
Battle of the giants: Apache Solr vs ElasticSearch
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
Solr as your search and suggest engine karan nangru
Solr as your search and suggest engine   karan nangruSolr as your search and suggest engine   karan nangru
Solr as your search and suggest engine karan nangru
 
Adapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan CodinaAdapting Alax Solr to Compare different sets of documents - Joan Codina
Adapting Alax Solr to Compare different sets of documents - Joan Codina
 
The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill The Seven Deadly Sins of Solr - By Jay Hill
The Seven Deadly Sins of Solr - By Jay Hill
 
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. ElasticsearchBattle of the Giants Round 2 - Apache Solr vs. Elasticsearch
Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
 
Proposal for nested document support in Lucene
Proposal for nested document support in LuceneProposal for nested document support in Lucene
Proposal for nested document support in Lucene
 
Facettensuche mit Lucene und Solr
Facettensuche mit Lucene und SolrFacettensuche mit Lucene und Solr
Facettensuche mit Lucene und Solr
 
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
Automotive Information Research Driven by Apache Solr: Presented by Mario-Lea...
 
Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?Warum 'ne Datenbank, wenn wir Elasticsearch haben?
Warum 'ne Datenbank, wenn wir Elasticsearch haben?
 
Webinar: Search and Recommenders
Webinar: Search and RecommendersWebinar: Search and Recommenders
Webinar: Search and Recommenders
 
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
Downtown SF Lucene/Solr Meetup: Developing Scalable User Search for PlayStati...
 
Webinar: Fusion for Business Intelligence
Webinar: Fusion for Business IntelligenceWebinar: Fusion for Business Intelligence
Webinar: Fusion for Business Intelligence
 

Similar to Working with deeply nested documents in Apache Solr

Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsJoe Drumgoole
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamMikkel Flindt Heisterberg
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamLetsConnect
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in DocumentsMongoDB
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosConceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosMongoDB
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
How Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the DotsHow Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the Dotsjexp
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked datavafopoulos
 
Vizwik part 3 data
Vizwik part 3 dataVizwik part 3 data
Vizwik part 3 dataVizwik
 
So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?kristinferrier
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012exponential-inc
 
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdfLESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdfJoshCasas1
 

Similar to Working with deeply nested documents in Apache Solr (20)

Webofdata
WebofdataWebofdata
Webofdata
 
Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in Documents
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity Stream
 
Mikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity StreamMikkel Heisterberg - An introduction to developing for the Activity Stream
Mikkel Heisterberg - An introduction to developing for the Activity Stream
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in Documents
 
Gray_Compass99.ppt
Gray_Compass99.pptGray_Compass99.ppt
Gray_Compass99.ppt
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysis
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentosConceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
Conceptos básicos. seminario web 3 : Diseño de esquema pensado para documentos
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
How Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the DotsHow Graphs Help Investigative Journalists to Connect the Dots
How Graphs Help Investigative Journalists to Connect the Dots
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Make Your Data Searchable With Solr in 25 Minutes
Make Your Data Searchable With Solr in 25 MinutesMake Your Data Searchable With Solr in 25 Minutes
Make Your Data Searchable With Solr in 25 Minutes
 
Amazon DynamoDB Design Workshop
Amazon DynamoDB Design WorkshopAmazon DynamoDB Design Workshop
Amazon DynamoDB Design Workshop
 
Vizwik part 3 data
Vizwik part 3 dataVizwik part 3 data
Vizwik part 3 data
 
So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?
 
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012
 
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdfLESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
LESSON 1- MICROSOFT ACCESS CREATING DATABASE.pdf
 

More from Anshum Gupta

SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsAnshum Gupta
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Anshum Gupta
 
Solr security frameworks
Solr security frameworksSolr security frameworks
Solr security frameworksAnshum Gupta
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyondAnshum Gupta
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scaleAnshum Gupta
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Ease of use in Apache Solr
Ease of use in Apache SolrEase of use in Apache Solr
Ease of use in Apache SolrAnshum Gupta
 

More from Anshum Gupta (8)

SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIs
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015Understanding the Solr security framework - Lucene Solr Revolution 2015
Understanding the Solr security framework - Lucene Solr Revolution 2015
 
Solr security frameworks
Solr security frameworksSolr security frameworks
Solr security frameworks
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyond
 
Deploying and managing Solr at scale
Deploying and managing Solr at scaleDeploying and managing Solr at scale
Deploying and managing Solr at scale
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Ease of use in Apache Solr
Ease of use in Apache SolrEase of use in Apache Solr
Ease of use in Apache Solr
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Working with deeply nested documents in Apache Solr

  • 1. O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
  • 2. Working with deeply nested documents in Apache Solr Anshum Gupta, Alisa Zhila IBM Watson
  • 3. 3 Anshum Gupta • Apache Lucene/Solr committer and PMC member • Search guy @ IBM Watson. • Interested in search and related stuff. • Apache Lucene since 2006 and Solr since 2010.
  • 4. 4 Alisa Zhila • Apache Lucene/Solr supporter :) • Natural Language Processing technologies @ IBM Watson • Interested in search and related stuff
  • 5. 5 Agenda • Hierarchical Data/Nested Documents • Indexing Nested Documents • Querying Nested Documents • Faceting on Nested Documents
  • 7. 7 • Social media comments, Email threads, Annotated data - AI • Relationship between documents • Possibility to flatten Need for nested data EXAMPLE: Blog Post with Comments Peter Navarro outlines the Trump economic plan Tyler Cowen, September 27, 2016 at 3:07am Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports. 1 Ray Lopez September 27, 2016 at 3:21 am I’ll be the first to say this, but the analysis is flawed. {negative} 2 Brian Donohue September 27, 2016 at 9:20 am The math checks out. Solid. {positive} examples from http://marginalrevolution.com
  • 8. 8 • Can not flatten, need to retain context • Relationship between documents • Get all 'positive comments' to 'posts about Trump' -- IMPOSSIBLE!!! Nested Documents EXAMPLE: Data Flattening Title: Peter Navarro outlines the Trump economic plan Author: Tyler Cowen Date: September 27, 2016 at 3:07am Body: Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports. Comment_authors: [Ray Lopez, Brian Donohue] Comment_dates: [September 27, 2016 at 3:21 am, September 27, 2016 at 9:20 am] Comment_texts: ["I’ll be the first to say this, but the analysis is flawed.", "The math checks out. Solid."] Comment_sentiments: [negative, positive]
  • 9. 9 • Can not flatten, need to retain context • Relationship between documents • Get all 'positive comments' to 'posts about Trump' -- POSSIBLE!!! (stay tuned) Nested Documents EXAMPLE: Hierarchical Documents Type: Post Title: Peter Navarro outlines the Trump economic plan Author: Tyler Cowen Date: September 27, 2016 at 3:07am Body: Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports. Type: Comment Author: Ray Lopez Date: September 27, 2016 at 3:21 am Text: I’ll be the first to say this, but the analysis is flawed. Sentiment: negative Type: Comment Author: Brian Donohue Date: September 27, 2016 at 9:20 am Text: The math checks out. Solid. Sentiment: positive
  • 10. 10 • Blog Post Data with Comments and Replies from http://marginalrevolution.com (cured) • 2 posts, 2-3 comments per post, 0-3 replies per comment • Extracted keywords & sentiment data • 4 levels of "nesting" • Too big to show on slides • Data + Scripts + Demo Queries: • https://github.com/alisa-ipn/solr- revolution-2016-nested-demo Running Example
  • 12. 12 • Nested XML • JSON Documents • Add _childDocument_ tags for all children • Pre-process field names to FQNs • Lose information, or add that as meta-data during pre-processing • JSON Document endpoint (6x only) - /update/json/docs • Field name mappings • Child Document splitting - Enhanced support coming soon. Sending Documents to Solr
  • 13. 13 solr-6.2.1$ bin/post -c demo-xml ./data/example-data.xml Sending Documents to Solr: Nested XML <add> <doc> <field name="type">post</field> <field name="author"> "Alex Tabarrok"</field> <field name="title">"The Irony of Hillary Clinton’s Data Analytics"</ field> <field name="body">"Barack Obama’s campaign adopted data but Hillary Clinton’s campaign has been molded by data from birth."</field> <field name="id">"12015-24204"</field> <doc> <field name="type">comment</field> <field name="author">"Todd"</field> <field name="text">"Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to learn over time, and apply the lessons learned to the real world."</field> <field name="sentiment">"positive"</field> <field name="id">"29798-24171"</field> <doc> <field name="type">reply</field> <field name="author">"The Other Jim"</field> <field name="text">"No, she lost because (1) she is thoroughly detested person and (2) the DNC decided Obama should therefore win."</field> <field name="sentiment">"negative"</field> <field name="id">"29798-21232"</field> </doc> </doc> </doc> </add>
  • 14. 14 • Add _childDocument_ tags for all children • Pre-process field names to FQNs • Lose information, or add that as meta-data during pre-processing solr-6.2.1$ bin/post -c demo-solr-json ./data/small-example-data-solr.json -format solr Sending Documents to Solr: JSON Documents [{ "path": "1.posts", "id": "28711", "author": "Alex Tabarrok", "title": "The Irony of Hillary Clinton’s Data Analytics", "body": "Barack Obama’s campaign adopted data but Hillary Clinton’s campaign has been molded by data from birth.", "_childDocuments_": [ { "path": "2.posts.comments", "id": "28711-19237", "author": "Todd", "text": "Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to learn over time, and apply the lessons learned to the real world.", "sentiment": "positive", "_childDocuments_": [ { "path": "3.posts.comments.replies", "author": "The Other Jim", "id": "28711-12444", "sentiment": "negative", "text": "No, she lost because (1) she is thoroughly detested person and (2) the DNC decided Obama should therefore win." }]}]}]
  • 15. 15 • JSON Document endpoint (6x only) - /update/json/docs • Field name mappings • Child Document splitting - Enhanced support coming soon. solr-6.2.1$ curl 'http://localhost:8983/solr/gettingstarted/update/json/docs? split=/|/posts|/posts/comments|/posts/comments/replies&commit=true' --data- binary @small-example-data.json -H ‘Content-type:application/json' NOTE: All documents must contain a unique ID. Sending Documents to Solr: JSON Endpoint
  • 16. 16 • Update Request Processors don’t work with nested documents • Example: • UUID update processor does not auto-add an id for a child document. • Workaround: • Take responsibility at the client layer to handle the computation for nested documents. • Change the update processor in Solr to handle nested documents. Update Processors and Nested Documents
  • 17. 17 • The entire block needs reindexing • Forgot to add a meta-data field that might be useful? Complete reindex • Store everything in Solr IF • it’s too expensive to reconstruct the doc from original data source • No access to data anymore e.g. streaming data Re-Indexing Your Documents
  • 18. 18 • Various ways to index nested documents • Need to re-index entire block Nested Document Indexing Summary
  • 19. Let’s ask some interesting questions
  • 20. 20 { "path":["4.posts.comments.replies.keywords"], "text":["Trump"]}, { "path":["3.posts.comments.keywords"], "text":["Trump"]}, { "path":["2.posts.keywords"], "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "text":["Trump proposes eliminating America’s $500 billion trade deficit through a combination of increased exports and reduced imports."], "path":["1.posts"]}, { "text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."], "path":["2.posts.comments"]} Easy question first Find all documents that mention Trump q=text:Trump
  • 21. 21 { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."], "path":["2.posts.comments"]}, { "text":["No one goes to Clinton rallies while tens of thousands line up to see Trump, data-mining leads to a fantasy view of the World."], "path":["2.posts.comments"]} Returning certain types of documents Find all comments and replies that mention Trump q=(path:2.posts.comments OR path:3.posts.comments.replies) AND text:Trump Recipe: At the data pre-processing stage, add a field that indicates document type and also its path in the hierarchy (-- stay tuned):
  • 24. 24 { "path":["3.posts.comments.keywords"], "sentiment":["positive"], "text":["Hillary"]}, { "path":["4.posts.comments.replies.keywords"], "sentiment":["negative"], "text":["Hillary"]}, { "path":["2.posts.keywords"], "text":["Hillary"]} Recap so far... Find all keywords that are Hillary q=path:*.keywords AND text:Hillary We're querying precisely for documents which we provide a search condition for Query Level 3 Result Level 3 Query Level 4 Result Level 4 Query Level 2 Result Level 2
  • 25. 25 Returning parents by querying children: Block Join Parent Query Find all comments whose keywords detected positive sentiment towards Hillary q={!parent which="path:2.posts.comments"}path:3.posts.comments.keywords AND text:Hillary AND sentiment:positive Query Level 3 Result Level 2 { "author":["Brian Donohue"], "text":["Hillary was impressive, for sure, and Trump spent time spluttering and floundering, but he was actually able to find his feet and score some points."], "path":["2.posts.comments"]}, { "author":["Todd"], "text":["Clinton got out data-ed and out organized in 2008 by Obama. She seems at least to learn over time, and apply the lessons learned to the real world."], "path":["2.posts.comments"]}
  • 26. 26 { "sentiment":["negative"], "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "sentiment":["neutral"], "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]}, { "sentiment":["positive"], "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"]} Returning children by querying parents: Block Join Child Query Find replies to negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies Query Level 2 Result Level 3
  • 27. 27 { "sentiment":["negative"], "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "sentiment":["neutral"], "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]}, { "sentiment":["positive"], "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"]} Returning children by querying parents: Block Join Child Query Find replies to negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative&fq=path:3.posts.comments.replies Query Level 2 Result Level 3 Block Join Child Query + Filtering Query A bit counterintuitive and non-symmetrical to the BJPQ
  • 28. 28 { "path":["4.posts.comments.replies.keywords"], "id":"17413-13550", "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"], "id":"17413-66188"}, { "path":["3.posts.comments.keywords"], "id":"12413-12487", "text":["Hillary"]}, { "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"], "id":"12413-10998"} Returning all document's descendants Block Join Child Query Find all descendants of negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative Query Level 2 Results Level 3 Results Level 4
  • 29. 29 Returning all document's descendants Block Join Child Query Find all descendants of negative comments q={!child of="path:2.posts.comments"}path:2.posts.comments AND sentiment:negative Query Level 2 Results Level 3 Results Level 4 { "path":["4.posts.comments.replies.keywords"], "id":"17413-13550", "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"], "id":"17413-66188"}, { "path":["3.posts.comments.keywords"], "id":"12413-12487", "text":["Hillary"]}, { "text":["Agreed why spend time data-mining for a fantasy view of the world , when instead you can see a fantasy in person?"], "path":["3.posts.comments.replies"], "id":"12413-10998"} Issue: no grouping by parent What if we want to bring the whole sub-structure?
  • 30. 30 Find all negative comments and return them with all their descendants q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.*] Query Level 2 Result Level 2 sub- hierarchy Returning document with all descendants: ChildDocTransformer { "sentiment":["negative"], "text":["I’ll be the first to say this, but the analysis is flawed."], "path":["2.posts.comments"], "_childDocuments_":[ { "path":["4.posts.comments.replies.keywords"], "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "path":["4.posts.comments.replies.keywords"], "text":["U.S."]}, { "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]} ] }, ... Issue: the "sub-hierarchy" is flat
  • 31. • Returns all descendant documents along with the queried document • flattens the sub-hierarchy • Workarounds: • Reconstruct the document using path ("path":["3.posts.comments.replies"]) information in case you want the entire subtree (result post-processing) • use childFilter in case you want a specific level 31 “This transformer returns all descendant documents of each parent document matching your query in a flat list nested inside the matching parent document." (ChildDocTransformer cwiki) Returning document with all descendants: ChildDocTransformer
  • 32. 32 Find all negative comments and return them with all replies to them q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.* childFilter=path:3.posts.comments.replies] { "sentiment":["negative"], "text":["I’ll be the first to say this, but the analysis is flawed."], "path":["2.posts.comments"], "_childDocuments_":[ { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]}, { "text":["So then I guess he will also eliminate the current account surplus? What will happen to U.S. asset values?"], "path":["3.posts.comments.replies"]} ] }, ... Returning document with specific descendants: ChildDocTransformer + childFilter Query Level 2:comments Result Level 2:comments + Level 3:replies
  • 33. 33 Find all negative comments and return them with all their descendants that mention Trump q=path:2.posts.comments AND sentiment:negative&fl=*,[child parentFilter=path:2.* childFilter=text:Trump] { "sentiment":["negative"], "text":["I’ll be the first to say this, but the analysis is flawed."], "path":["2.posts.comments"], "_childDocuments_":[ { "path":["4.posts.comments.replies.keywords"], "text":["Trump"]}, { "text":["LOL. I enjoyed Trump during last night’s stand-up bit, but this is funnier."], "path":["3.posts.comments.replies"]} ] }, ... Returning document with queried descendants: ChildDocTransformer + childFilter Query Level 2:comments Result Level 2:comments + sub-levels Issue: cannot use boolean expressions in childFilter query
  • 34. 34 Cross-Level Querying Mechanisms: • Block Join Parent Query • Block Join Children Query • ChildDocTransformer Good points: • overlapping & complementary features • good capabilities of querying direct ancestors/descendants • possible to query on siblings of different type Drawbacks: • need for data-preprocessing for better querying flexibility • limited support of querying over non-directly related branches (overcome with graphs?) • flattening nested data (additional post-processing is needed for reconstruction) Nested Document Querying Summary
  • 35. Faceting on Nested Documents
  • 36. 36 • Solr allows faceting on nested documents! • Two mechanisms for faceting: • Faceting with JSON Facet API (since Solr 5.3) • Block Join Faceting (since Solr 5.5) Faceting on Nested Documents
  • 37. 37 q=path:2.posts.comments AND sentiment:positive& json.facet={ most_liked_authors : { type: terms, field: author, domain: { blockParent : "path:1.posts"} }} Faceting on parents by descendants JSON Facet API: Parent Domain Count authors of the posts that received positive comments "most_liked_authors":{ "buckets":[ { "val":"Alex Tabarrok", "count":1}, { "val":"Tyler Cowen", "count":1} ] } Query Level 2 Facet Level 1
  • 38. 38 Faceting on descendants by ancestors JSON Facet API: Child Domain Distribution of keywords that appear in comments and replies by the top-level posts Query Level 1 Facet Descendant Levels "top_keywords":{ "buckets":[{ "val":"hillary", "count":4, "counts_by_posts":2}, { "val":"trump", "count":3, "counts_by_posts":2}, { "val":"dnc", "count":1, "counts_by_posts":1}, { "val":"obama", "count":2, "counts_by_posts":1}, { "val":"u.s", "count":1, "counts_by_posts":1} ]}
  • 39. 39 q=path:1.posts&rows=0& json.facet={ filter_by_child_type :{ type:query, q:"path:*comments*keywords", domain: { blockChildren : "path:1.posts" }, facet:{ top_keywords : { type: terms, field: text, sort: "counts_by_posts desc", facet: { counts_by_posts: "unique(_root_)" }}}}} Faceting on descendants by ancestors JSON Facet API: Child Domain Distribution of keywords that appear in comments and replies by the top-level posts Query Level 1 Facet Descendant Levels
  • 40. 40 Faceting on descendants by top-level ancestor JSON Facet API: Child Domain Distribution of keywords that appear in comments and replies by the top-level posts Query Level 1 Facet Descendant Levels Issue: only the top-ancestor gets the unique "_root_" field by default q=path:1.posts&rows=0& json.facet={ filter_by_child_type :{ type:query, q:"path:*comments*keywords", domain: { blockChildren : "path:1.posts" }, facet:{ top_keywords : { type: terms, field: text, sort: "counts_by_posts desc", facet: { counts_by_posts: "unique(_root_)" }}}}}
  • 41. 41 q=path:2.posts.comments&rows=0& json.facet={ filter_by_child_type :{ type:query, q:"path:*comments*keywords", domain: { blockChildren : "path:2.posts.comments" }, facet:{ top_keywords : { type: terms, field: text, sort: "counts_by_comments desc", facet: { counts_by_comments: "unique(2.posts.comments-id)" }}}}} Faceting on descendants by intermediate ancestors JSON Facet API: Child Domain + unique fields Distribution of keywords that appear in comments and replies by the comments Query Level 2 Facet Descendant Levels At pre-processing, introduce unique fields for each level
  • 42. 42 Faceting on descendants by intermediate ancestors JSON Facet API: Child Domain + unique fields Distribution of keywords that appear in comments and replies by the comments Query Level 2 Facet Descendant Levels "top_keywords":{ "buckets":[{ "val":"Hillary", "count":4, "counts_by_comments":3}, { "val":"Trump", "count":3, "counts_by_comments":3}, { "val":"DNC", "count":1, "counts_by_comments":1}, { "val":"Obama", "count":2, "counts_by_comments":1}, { "val":"U.S.", "count":1, "counts_by_comments":1} ]}
  • 43. Now let's try the same using Block Join Faceting
  • 44. 44 • Experimental Feature • Needs to be turned on explicitly in solrconfig.xml More info: https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting Block Join Faceting
  • 45. 45 bjqfacet?q={!parent which=path:2.posts.comments} path:*.comments*keywords&rows=0&facet=true&child.facet.field=text Faceting on descendants by ancestors #2: Block Join Faceting on Children Domain Distribution of keywords that appear in comments and replies by the comments "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } Query Level 2 Facet Descendant Levels
  • 46. 46 bjqfacet?q={!parent which=path:2.posts.comments} path:*.comments*keywords&rows=0&facet=true&child.facet.field=text Faceting on descendants by ancestors #2: Block Join Faceting on Children Domain Distribution of keywords that appear in comments and replies by the comments "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } Query Level 2 Facet Descendant Levels bjqfacet request handler instead of query
  • 47. 47 Output Comparison Block Join Facet JSON Facet API "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } "top_keywords":{ "buckets":[{ "val":"Hillary", "count":4, "counts_by_comments":3}, { "val":"Trump", "count":3, "counts_by_comments":3}, { "val":"DNC", "count":1, "counts_by_comments":1}, { "val":"Obama", "count":2, "counts_by_comments":1}, { "val":"U.S.", "count":1, "counts_by_comments":1} ]} Distribution of keywords that appear in comments and replies by the comments
  • 48. 48 Output Comparison Block Join Facet JSON Facet API "facet_fields":{ "text":[ "dnc",1, "hillary",3, "obama",1, "trump",3, "u.s",1 ] } "top_keywords":{ "buckets":[{ "val":"Hillary", "count":4, "counts_by_comments":3}, { "val":"Trump", "count":3, "counts_by_comments":3}, { "val":"DNC", "count":1, "counts_by_comments":1}, ... Distribution of keywords that appear in comments and replies by the comments Output is sorted in alphabetical order. It cannot be changed facet:{ top_keywords : { ... sort: "counts_by_comments desc" }}}
  • 49. 49 JSON Facet API: • Experimental - but more mature • More developed and established feature • bulky JSON syntax • faceting on children by non-top level ancestors requires introducing unique branch identifiers similar to "_root_" on each level Block Join Facet: • Experimental feature • Lacks controls: sorting, limit... • traditional query-style syntax • proper handling of faceting on children by non-top level ancestors Hierarchical Faceting Summary
  • 50. 50 • Returning hierarchical structure • JSON facet rollups is in the works - SOLR-8998 • Graph querying might replace a lot of functionalities of cross-level querying - No distributed support right now. • There’s more but the community would love to have more people involved! Community Roadmap
  • 51. Thank you! Anshum Gupta anshum@apache.org | @anshumgupta Alisa Zhila alisa.zhila@gmail.com https://github.com/alisa-ipn/solr-revolution-2016-nested-demo