The document discusses different approaches for modeling relationships between documents in Lucene and Elasticsearch, including index-time and query-time joins. Index-time joins use block indexing to store related documents together, allowing nested queries, filters and facets in Elasticsearch. Query-time joins do not require block indexing and collect matching child documents during query execution. Both Lucene and Elasticsearch support various parent-child and field-based query-time join approaches.
Azure Site Recovery - BC/DR - Migrations & assessments in 60 minutes!Johan Biere
Gain high level understanding of various challenges faced by organizations in planning their Migration and BC/DR Strategy for applications in Azure and Hybrid.
Learn about the capabilities that make Azure the ideal destination for your applications, data, and infrastructure. You will get clear, scenario-based guidance on how to approach your technical migration & innovation journeys. Understand how to migrate different on-premises applications to Azure, including moving them to Azure IaaS Using ASR with Hyper-V Assessments & Agentless migration.
Creating a Single Source of Truth: Leverage all of your data with powerful an...Looker
With a centralized data store, the entire spectrum of analytics is at your fingertips. Using Looker & Segment, you can collect, store and analyze everything from click-stream and event data to transactional and behavioral data in your data warehouse.
Some of the topics this webinar will include:
-The advantages of a centralized data warehouse with Segment Warehouses
-Creating a data model to get your company on the same page with Looker Blocks
-Putting it all together: Best practices for making your data accessible to your end users
These are slides from an introductory session for Microsoft Azure done at IIT Sri Lanka giving the students hands-on exposure to Microsoft Azure. Introducing them to Azure App Service and Azure Functions.
Azure Site Recovery - BC/DR - Migrations & assessments in 60 minutes!Johan Biere
Gain high level understanding of various challenges faced by organizations in planning their Migration and BC/DR Strategy for applications in Azure and Hybrid.
Learn about the capabilities that make Azure the ideal destination for your applications, data, and infrastructure. You will get clear, scenario-based guidance on how to approach your technical migration & innovation journeys. Understand how to migrate different on-premises applications to Azure, including moving them to Azure IaaS Using ASR with Hyper-V Assessments & Agentless migration.
Creating a Single Source of Truth: Leverage all of your data with powerful an...Looker
With a centralized data store, the entire spectrum of analytics is at your fingertips. Using Looker & Segment, you can collect, store and analyze everything from click-stream and event data to transactional and behavioral data in your data warehouse.
Some of the topics this webinar will include:
-The advantages of a centralized data warehouse with Segment Warehouses
-Creating a data model to get your company on the same page with Looker Blocks
-Putting it all together: Best practices for making your data accessible to your end users
These are slides from an introductory session for Microsoft Azure done at IIT Sri Lanka giving the students hands-on exposure to Microsoft Azure. Introducing them to Azure App Service and Azure Functions.
Azure SQL Database now has a Managed Instance, for near 100% compatibility for lifting-and-shifting applications running on Microsoft SQL Server to Azure. Contact me for more information.
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al) or the data lake (AWS S3 et al). There are pro’s and con’s for each approach. While the data warehouse will give you strong data management with analytics, they don’t do well with semi-structured and unstructured data with tightly coupled storage and compute, not to mention expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Microsoft Azure Platform-as-a-Service (PaaS)Chris Dufour
Azure is Microsoft’s cloud computing platform made up of a growing collection of integrated services: compute, storage, data, networking and apps.
Azure is the only major cloud platform ranked by Gartner as an industry leader for both Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). This powerful combination of managed and unmanaged services lets you build, deploy and manage applications in any way you like for unmatched productivity.
In this talk we will take a look at Microsoft’s cloud strategy and see how you can leverage PaaS in your environment.
Azure Training | Microsoft Azure Tutorial | Microsoft Azure Certification | E...Edureka!
This tutorial on Azure Training will give you a kick start in the Azure Cloud Environment, we will be creating a real life use case from scratch in this Azure Training Tutorial. Stay Tuned! In this Microsoft Azure Training you will understand:
1) What is Cloud?
2) Why Microsoft Azure?
3) What is Microsoft Azure?
4) Launching Services in Azure
5) Demo
6) Azure Pricing
Azure SQL Database (SQL DB) is a database-as-a-service (DBaaS) that provides nearly full T-SQL compatibility so you can gain tons of benefits for new databases or by moving your existing databases to the cloud. Those benefits include provisioning in minutes, built-in high availability and disaster recovery, predictable performance levels, instant scaling, and reduced overhead. And gone will be the days of getting a call at 3am because of a hardware failure. If you want to make your life easier, this is the presentation for you.
Adelaide Global Azure Bootcamp 2018 - Azure 101Balabiju
A one day session that covers all the foundation of Azure services.
Microsoft Cloud Overview - IaaS, PaaS and SaaS
• Microsoft Azure Resource Manager (ARM)
• Microsoft Azure Storage
• Microsoft Azure Virtual Machines
• Microsoft Azure Identity
• Microsoft Azure Backup
This document is a collage document cut & paste from the original SCSM document published by Microsoft. I only took what needed for the document to be completed. As I was developing a System Center Service Manager Sizing on Hardware and Software, the actual BOM is listed at the bottom of the document with design guidelines.
Launching a startup isn't easy. At each stage of scaling - from founding to product-market fit, from product-market fit to hyper growth, and from hyper growth to maturity - entrepreneurs face unique challenges. Greylock Partners hosted an event, called Greyscale, focused on these challenges at each stage. In the opening keynote, Jerry Chen of Greylock Partners discusses the state of enterprise software after the first quarter of 2016. He summarizes the private and public markets, M&A activity, and explains how this climate affects the startup environment.
Introduction to Oracle Cloud Infrastructure ServicesKnoldus Inc.
Oracle Cloud Infrastructure is a set of complementary cloud services that enable you to build and run a wide range of applications and services in a highly available hosted environment. Oracle Cloud Infrastructure (OCI) offers high-performance compute capabilities (as physical hardware instances) and storage capacity in a flexible overlay virtual network that is securely accessible from your on-premises network.
Searching Relational Data with Elasticsearchsirensolutions
Second Galway Data Meetup, 29th April 2015
Elasticsearch was originally developed for searching flat documents. However, as real world data is inherently more complex, e.g., nested json data, relational data, interconnected documents and entities, Elasticsearch quickly evolves to support more advanced search scenarios. In this presentation, we will review existing features and plugins to support such scenarios, discuss their advantages and disadvantages, and understand which one is more appropriate for a particular scenario.
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
The volume of data that we are working with is growing every day, the size of data is pushing us to find new intelligent solutions for problem’s put in front of us. Elasticsearch server has proved it self as an excellent full text search solution for big volume’s of data.
Azure SQL Database now has a Managed Instance, for near 100% compatibility for lifting-and-shifting applications running on Microsoft SQL Server to Azure. Contact me for more information.
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
Today’s data-driven companies have a choice to make – where do we store our data? As the move to the cloud continues to be a driving factor, the choice becomes either the data warehouse (Snowflake et al) or the data lake (AWS S3 et al). There are pro’s and con’s for each approach. While the data warehouse will give you strong data management with analytics, they don’t do well with semi-structured and unstructured data with tightly coupled storage and compute, not to mention expensive vendor lock-in. On the other hand, data lakes allow you to store all kinds of data and are extremely affordable, but they’re only meant for storage and by themselves provide no direct value to an organization.
Enter the Open Data Lakehouse, the next evolution of the data stack that gives you the openness and flexibility of the data lake with the key aspects of the data warehouse like management and transaction support.
In this webinar, you’ll hear from Ali LeClerc who will discuss the data landscape and why many companies are moving to an open data lakehouse. Ali will share more perspective on how you should think about what fits best based on your use case and workloads, and how some real world customers are using Presto, a SQL query engine, to bring analytics to the data lakehouse.
Microsoft Azure Platform-as-a-Service (PaaS)Chris Dufour
Azure is Microsoft’s cloud computing platform made up of a growing collection of integrated services: compute, storage, data, networking and apps.
Azure is the only major cloud platform ranked by Gartner as an industry leader for both Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). This powerful combination of managed and unmanaged services lets you build, deploy and manage applications in any way you like for unmatched productivity.
In this talk we will take a look at Microsoft’s cloud strategy and see how you can leverage PaaS in your environment.
Azure Training | Microsoft Azure Tutorial | Microsoft Azure Certification | E...Edureka!
This tutorial on Azure Training will give you a kick start in the Azure Cloud Environment, we will be creating a real life use case from scratch in this Azure Training Tutorial. Stay Tuned! In this Microsoft Azure Training you will understand:
1) What is Cloud?
2) Why Microsoft Azure?
3) What is Microsoft Azure?
4) Launching Services in Azure
5) Demo
6) Azure Pricing
Azure SQL Database (SQL DB) is a database-as-a-service (DBaaS) that provides nearly full T-SQL compatibility so you can gain tons of benefits for new databases or by moving your existing databases to the cloud. Those benefits include provisioning in minutes, built-in high availability and disaster recovery, predictable performance levels, instant scaling, and reduced overhead. And gone will be the days of getting a call at 3am because of a hardware failure. If you want to make your life easier, this is the presentation for you.
Adelaide Global Azure Bootcamp 2018 - Azure 101Balabiju
A one day session that covers all the foundation of Azure services.
Microsoft Cloud Overview - IaaS, PaaS and SaaS
• Microsoft Azure Resource Manager (ARM)
• Microsoft Azure Storage
• Microsoft Azure Virtual Machines
• Microsoft Azure Identity
• Microsoft Azure Backup
This document is a collage document cut & paste from the original SCSM document published by Microsoft. I only took what needed for the document to be completed. As I was developing a System Center Service Manager Sizing on Hardware and Software, the actual BOM is listed at the bottom of the document with design guidelines.
Launching a startup isn't easy. At each stage of scaling - from founding to product-market fit, from product-market fit to hyper growth, and from hyper growth to maturity - entrepreneurs face unique challenges. Greylock Partners hosted an event, called Greyscale, focused on these challenges at each stage. In the opening keynote, Jerry Chen of Greylock Partners discusses the state of enterprise software after the first quarter of 2016. He summarizes the private and public markets, M&A activity, and explains how this climate affects the startup environment.
Introduction to Oracle Cloud Infrastructure ServicesKnoldus Inc.
Oracle Cloud Infrastructure is a set of complementary cloud services that enable you to build and run a wide range of applications and services in a highly available hosted environment. Oracle Cloud Infrastructure (OCI) offers high-performance compute capabilities (as physical hardware instances) and storage capacity in a flexible overlay virtual network that is securely accessible from your on-premises network.
Searching Relational Data with Elasticsearchsirensolutions
Second Galway Data Meetup, 29th April 2015
Elasticsearch was originally developed for searching flat documents. However, as real world data is inherently more complex, e.g., nested json data, relational data, interconnected documents and entities, Elasticsearch quickly evolves to support more advanced search scenarios. In this presentation, we will review existing features and plugins to support such scenarios, discuss their advantages and disadvantages, and understand which one is more appropriate for a particular scenario.
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
The volume of data that we are working with is growing every day, the size of data is pushing us to find new intelligent solutions for problem’s put in front of us. Elasticsearch server has proved it self as an excellent full text search solution for big volume’s of data.
Nested and Parent/Child Docs in ElasticSearchBeyondTrees
A key part of the architecture of RefWorks Flow, a new document workflow tool for researchers, is an ElasticSearch cluster used for citation canonicalization. We will present our findings of how to use the "nested" type and parent-child relations in ElasticSearch to do complex where-clause queries in an efficient way
One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions: Are foreign keys permissible, or is it better to represent one-to-many relations withing a single document? Are join tables necessary, or is there another technique for building out many-to-many relationships? What level of denormalization is appropriate? How do my data modeling decisions affect the efficiency of updates and queries? In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
This is a presentation on CouchDB that I gave at the New York PHP User Group. I talked about the basics of CouchDB, its JSON documents, its RESTful API, writing and querying MapReduce views, using CouchDB from within PHP, and scaling.
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g...MongoDB
Este es el cuarto seminario web de la serie Conceptos básicos, en la que se realiza una introducción a la base de datos MongoDB. Este seminario se ve en la compatibilidad con índices de texto libre y geoespaciales.
Modeling JSON data for NoSQL document databasesRyan CrawCour
Modeling data in a relational database is easy, we all know how to do it because that's what we've always been taught; But what about NoSQL Document Databases?
Document databases take (much) of what you know and flip it upside down. This talk covers some common patterns for modeling data and how to approach things when working with document stores such as Azure DocumentDB
1. Document relations
Martijn van Groningen
@mvgroningen
Tuesday, November 6, 12
2. Overview
• Background
• Document relations with joining.
• Various solutions in Lucene and Elasticsearch
Tuesday, November 6, 12
3. Background - Lucene model
• Lucene is document based.
• Lucene doesn’t store information about relations
between documents.
• Data often holds relations.
• Good free text search over relational data.
Tuesday, November 6, 12
4. Background - Common solutions
• Compound documents.
• May result in documents with many fields.
• Subsequent searches.
• May cause a lot of network overhead.
• Non Lucene based approach:
• Use Lucene in combination with a relational database.
Tuesday, November 6, 12
5. Background - Example
• Product
• Name
• Description
• Product-item
• Color
• Size
• Price
Tuesday, November 6, 12
6. Background - Example
• Compound Product & Product-items document.
• Each product-item has its own field prefix.
Tuesday, November 6, 12
7. Background - Other solutions
• Lucene offers solutions to have a 'relational' like
search.
• Joining
• Result grouping
• Elasticsearch builds on top of the joining
capabilities.
• These solutions aren't naturally supported.
Tuesday, November 6, 12
9. Joining
• Join support available since Lucene 3.4
• Not a SQL join!
• Two distinct joining types:
• Index time join
• Query time join
• Joining provides a solution to handle document
relations.
Tuesday, November 6, 12
10. Joining - What is out there?
• Index time:
• Lucene’s block join implementation.
• Elasticsearch’s nested filter, query and facets.
• Built on top of the Lucene’s block join support.
• Query time:
• Lucene’s query time join utility.
• Solr’s join query parser.
• Elasticsearch’s various parent-child queries and filters.
Tuesday, November 6, 12
11. Index time join
And nested documents.
Tuesday, November 6, 12
13. Joining - Block indexing
• Atomically adding documents.
• A block of documents.
• Each document gets sequentially assigned Lucene
document id.
• IndexWriter#addDocuments(docs);
Tuesday, November 6, 12
14. Joining - Block indexing
• Index doesn't record blocks.
• App is responsible for identifying block documents.
• Segment merging doesn’t re-order documents in a
segment.
• Adding athe whole block.block requires you to
reindex
document to a
Tuesday, November 6, 12
15. Joining - block join query
• Parent is the last document in a block.
Tuesday, November 6, 12
18. Block join - ToChildBlockJoinQuery
• Parent filter marks the parent documents.
• Child query is executed in the parent space.
Tuesday, November 6, 12
19. Block join & Elasticsearch
• In Elasticsearch exposed as nested objects.
• Documents are constructed as JSON.
• JSON’s nested structure works nicely with block
indexing.
• Elasticsearchoftakes nested documents. and also
keeps track the
care of block indexing
Tuesday, November 6, 12
20. Elasticsearch’s nested support
• Support for a nested type in mappings.
• Nested query.
• Nested filter.
• Nested facets.
Tuesday, November 6, 12
21. Nested type
• The nested types enables Lucene’s block indexing.
index
curl -XPUT 'localhost:9200/products' -d '{
type "mappings" : {
"product" : {
"properties" : {
Nested offers "offers" : { "type" : "nested" }
}
}
}
}'
Tuesday, November 6, 12
25. Query time join
and parent & child relations.
Tuesday, November 6, 12
26. Query time joining
• Documents are joined during query time.
• More expensive, but more flexible.
• Two types of query time joins:
• Parent child joining.
• Field based joining.
Tuesday, November 6, 12
27. Lucene’s query time join
• Query time joining is executed in two phases.
• Field based joining:
• ‘from’ field
• ‘to’ field
• Doesn’t require block indexing.
Tuesday, November 6, 12
28. Query time join - JoinUtil
• Firstthe documents thatthe terms in the fromField
for
phase collects all
match with the original
query.
• The second the collected termsdocumentsprevious
match with
phase returns the
from the
that
phase in the toField.
• One public method:
• JoinUtil#createJoinQuery(...)
Tuesday, November 6, 12
31. Joining - JoinUtil
Join utility
• Result will contain one products.
• Possible to do ‘join’ across indices.
Tuesday, November 6, 12
32. Elasticsearch’s query time join
• A parent child solution.
• Not related to Lucene’s query time join.
• Support consists out of:
• The _parent field.
• The top_children query.
• The has_parent & has_child filter & query.
• Scoped facets.
Tuesday, November 6, 12
33. The _parent field
• Points to the parent type.
• Mapping attribute to be define on the child type.
curl -XPUT 'localhost:9200/products' -d '{
"mappings" : {
"offer" : {
"_parent" : {
"type" : "product"
}
}
}
}'
• Elasticsearch uses the _parent field to build an id
cache.
• Makes parent/child queries & filters fast.
Tuesday, November 6, 12
34. Indexing parent & child documents
• Parent document:
curl -XPOST 'localhost:9200/products/product/1' -d '{
"name" : "Polo shirt",
"description" : "Made of 100% cotton"
}' The id of the parent
document. Also used for
• Child documents: routing.
curl -XPOST 'localhost:9200/products/offer?parent=1' -d '{
"color" : "red",
"size" : "s",
"price" : 999
}'
curl -XPOST 'localhost:9200/products/offer?parent=1' -d '{
"color" : "blue",
"red",
"size" : "s",
"m",
"price" : 999
1099
}'
Tuesday, November 6, 12
35. The ‘top_children’ query
curl -XPOST 'localhost:9200/products/_search' -d '{
"query" : {
Child type
"top_children" : {
"type" : "offer",
"query" : {
"term" : { Child query
"size" : "m"
}
},
"score" : "sum" Score mode
}
}
}'
• Internally the in order to getpotentially executed
several times
child query is
enough parent hits.
Tuesday, November 6, 12
36. The ‘has_child’ query
curl -XPOST 'localhost:9200/products/_search' -d '{ Child type
"query" : {
"has_child" : {
"type" : "offer",
"query" : {
"term" : {
"size" : "m"
Child query
}
}
}
}
}'
• Doesn’tdoc. Works as a filter. into the matching
parent
map the child scores
• The has_parent query matches child document
instead.
Tuesday, November 6, 12
38. Conclusion
• Block join & nested object are fast and efficient,
but lack flexibility.
• Query time and parent child join are flexible at the
cost of performance and memory.
• Field based query time joining is the most flexible.
• Parent child based joining is the fastest.
• Faceting in combination with document relations
gives a nice analytical view.
Tuesday, November 6, 12