DOCUMENT LEVEL SECURITY IN SEARCH BASED APPLICATIONS

Rajani Maski, Senior Software Engineer
Agenda
• Introduction to Search Based Applications
• Requirement Analysis of Document Level Security
• Access Control Lists
• Multiple Solutions
• Summary
Search Based Applications
• A search based application is a software application that uses a search engine platform as its core infrastructure for accessing and reporting on information.
• E-commerce web applications and content management systems are typical examples of search based applications.
Overview of Search Based System
[Diagram, top to bottom: User Authentication System; Search Based Application Server; Unified Data Layer; data sources (Archives, Documents, Emails, File Server)]
Authentication
• The user is authenticated before being given access to the application.
Application
• Presents a full-fledged user interface.
• Lets users perform operations such as uploading documents, sending emails and searching.
Unified Data Layer
• Search server
• Indexes content across the sources
• Retrieves data at very high speed
Data Storage
• Large volumes of data drawn from different repositories
So Far, So Good! What’s the problem?
Common access to the Unified Data Layer. How is this a threat?
[Diagram, top to bottom: User Authentication System; Search Based Application; Unified Data Layer; data sources (Archives, Documents, Emails, File Servers)]
Consider a Sample Use Case
User A:
- Logs in to the application.
- Performs a search with keywords such as ‘Pay Slips’, ‘Personal’ or ‘appraisal’.
[Screenshot: sample search results for “appraisal”, with the unauthorized results marked]
Observations
Relevant search results: [Correct]
- User A received results relevant to the search query, such as exact matches, ‘more like this’ keywords and synonym keywords.
Unauthorized search results: [Wrong]
- Some of the retrieved results were documents that User A was not authorized to view.
How are we doing with this?
Threats:
• Exposure of other users’ confidential documents
• Access to unauthorized information
Problem Definition
• To develop a search platform where every user has access only to the documents he/she is authorized to view.
• To ensure that confidential data uploaded to the system is not globally searchable unless it is intended to be globally accessible.

How can we achieve this?
Solution
Maintain an Access Control List mapped to each document object.

Access Control List?
Access Control List
• Access controls are security features that control how users [subjects] and documents [objects] communicate and interact with one another.
• Subject: an active entity [user] that requests access to an object [document].
• Object: a passive entity [document] that contains information.
[Diagram: Subject (user) interacts with Object (document)]
Data Model
Let’s first understand the data model of a search engine.
How are documents stored in a search engine? With a document-oriented approach.

Relational rows:

ID    Name  Agent   Units
1167  Alec  Miller  570
3424  Kiwi  reds    340
5612  Reh   Mo’s    664

No.  Place  ID
1    NY     1167
2    NJ     1167
3    CA     1167

The same record as a single document, Alec_1167:
{_id:"1167",
Name:"Ale C",
Agent:"Miller",
Place:"NY, NJ, CA",
Units:570}
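The step from related rows to one flat document can be sketched as follows. This is a minimal illustration, assuming the record on this slide comes from two related tables; the table layout, the lower-case column names and the helper `to_document` are hypothetical and not part of the original deck.

```python
# Hypothetical relational rows, modelled on the example record 1167.
accounts = [
    {"id": "1167", "name": "Alec", "agent": "Miller", "units": 570},
]
places = [
    {"account_id": "1167", "place": "NY"},
    {"account_id": "1167", "place": "NJ"},
    {"account_id": "1167", "place": "CA"},
]


def to_document(account, place_rows):
    """Merge one account row and its related place rows into one flat document."""
    own_places = [p["place"] for p in place_rows if p["account_id"] == account["id"]]
    return {
        "_id": account["id"],
        "Name": account["name"],
        "Agent": account["agent"],
        "Place": own_places,  # becomes a multi-valued field in the search index
        "Units": account["units"],
    }


doc = to_document(accounts[0], places)
print(doc)
```

The point of the sketch is that the search engine stores one self-contained object per record, so no join is needed at query time.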
Indexing and Storing Document Object
• User A uploads a document into the system
• Metadata and text are extracted
• The document is converted to a flat structure
• The flat structure is input to the search engine
[Diagram: Document → Metadata Extract → Search Engine → Document Saved]
• We missed capturing something!
• What did we miss?
– User information for each document:
• Who uploaded the document?
• With whom did the user share it?
[Diagram: Document → Metadata Extract → Search Engine → Document Saved]
• How do we maintain this information?
– With an access control list for each document object.
Conventional Solution
• Access Control Lists for each user.
• At the time of search:
– Retrieve the search results,
– check each document against the user’s authorization, and
– finally return the results.
[Diagram: Search Engine → Security Filter Each Document → Return Results to User]
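The three steps of the conventional solution can be sketched as below. This is only an in-memory illustration: the toy `search` function, the `acl` mapping and the document ids are hypothetical stand-ins for a real search engine and ACL store.

```python
def search(index, term):
    """Toy search: return ids of documents whose text contains the term."""
    return [doc_id for doc_id, text in index.items() if term.lower() in text.lower()]


def secure_search(index, acl, user, term):
    """Retrieve hits first, then keep only those the user is authorized to see."""
    hits = search(index, term)
    return [doc_id for doc_id in hits if user in acl.get(doc_id, set())]


# Hypothetical sample data.
index = {
    "d1": "Appraisal form for User A",
    "d2": "Appraisal letter for User B",
    "d3": "Holiday calendar",
}
acl = {"d1": {"userA"}, "d2": {"userB"}, "d3": {"userA", "userB"}}

results = secure_search(index, acl, "userA", "appraisal")
print(results)  # ['d1']
```

Every hit is checked individually, so the cost of the security check grows with the number of matches rather than with the number of results shown to the user.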
Multiple Solutions
Access Control Models
The solutions depend on the access control model we choose.

Two important types of access control models:
1. Non-Discretionary Access Control (Role Based)
2. Discretionary Access Control (DAC)
1. Non-Discretionary (Role Based)
Definition:
• Non-discretionary ACL uses an administered set of rules to determine how users and documents interact.
• It is referred to as non-discretionary because assigning a user to a role is unavoidable.
[Diagram: roles (Sales, Manager, Super User) mapped to document sets (Sales Documents, Marketing Documents, Engineering Documents, Admin Documents)]
Solution For Role Based ACL - Type 1
For a system that has
• roles defined at design time and a static ACL set on each document,
• we choose “Early Binding with ACL bound to Document Objects”.
In such systems,
• Document objects include a multi-valued role-Ids field that lists the role IDs with access to the document.

Index Time
Document 1
role-Ids: ["1", "2", "3"]
Document 2
role-Ids: ["2", "3"]
[Diagram: Documents with ACLs]
Continued…
At the time of search,
• The user’s role ID should be appended to the user’s search query.
• Solr’s filter query feature and its caching give an efficient solution for this kind of ACL. This approach is called ‘Early Binding’.
[Diagram: Query Request + User Role-Id (Early Binding) → SolrJ Client → Query Response]
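A minimal sketch of how the role restriction might be attached to a request as a Solr filter query (`fq`). The field name `role_ids` and the helper `build_query_params` are assumptions made for illustration, and the sketch assumes role IDs are simple tokens that need no escaping.

```python
from urllib.parse import urlencode


def build_query_params(user_query, role_ids, acl_field="role_ids"):
    """Return request parameters that restrict results to the user's roles."""
    if not role_ids:
        # Never fall back to an unrestricted query for a user without roles.
        raise ValueError("user has no roles; refusing to build an unrestricted query")
    clause = " OR ".join(f'"{role}"' for role in sorted(role_ids))
    return {"q": user_query, "fq": f"{acl_field}:({clause})"}


params = build_query_params("appraisal", {"2", "3"})
print(params["fq"])       # role_ids:("2" OR "3")
print(urlencode(params))  # URL-encoded form, ready to append to a select request
```

Keeping the restriction in `fq` rather than in `q` leaves relevance scoring untouched and lets the same role filter be cached and reused across different user queries.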
Solution For Role Based ACL - Type 2
For systems that have
• roles which often change, the data is normalized by segregating access control information into different tables.
• This approach is called ‘Early Binding with Externalized ACL’.
In such systems:
• Role IDs are not attached to the document object.
• Instead they are stored in separate tables with a foreign key relation.
• Pseudo joins are used at the time of search.
[Diagram: Document1 (D1) linked to its row in the ACL table]

Doc ID   Role-Ids
D1       1, 2, 3, N
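One way to express such a pseudo join is Solr's join query parser, used as a filter query. The sketch below only builds the filter string; the ACL core name `acl`, its fields `doc_id` and `role_ids`, and the helper `build_join_filter` are hypothetical, and whether a cross-core join is available depends on the Solr version and deployment.

```python
def build_join_filter(role_ids, acl_core="acl", from_field="doc_id", to_field="id"):
    """Build a join filter: ACL rows matching the roles, joined to documents by id."""
    if not role_ids:
        raise ValueError("user has no roles; refusing to build an unrestricted query")
    clause = " OR ".join(f'"{role}"' for role in sorted(role_ids))
    return (
        f"{{!join from={from_field} to={to_field} fromIndex={acl_core}}}"
        f"role_ids:({clause})"
    )


join_fq = build_join_filter({"1", "2"})
print(join_fq)
```

Because the role assignments live outside the document, a role change only touches the ACL rows and does not require reindexing the documents themselves.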
2. Discretionary Access Control
Definition:
• Discretionary: the document owner has the authority to control access to the document.
• A system that enables the document owner to specify the set of users with access to a set of documents.
[Diagram: Owner specifies the users/groups who can access the Object]
Solution for Discretionary ACL - Type 1
For a system that has
• frequent changes in the ACL, and
• an ACL defined for each user and document pair,
• we choose ‘Late Binding Approach with Externalized ACL’.
In such systems,
• The ACL is a 2D matrix with users along its rows and documents along its columns.

Users    Doc1  Doc2  ...  Doc N
User A   1     1          1
User B   0     1          1
...
User M

Encoded values: 0 = no access, 1 = access
M: number of users, N: number of documents
Continued…
For implementation, the ACL matrix can be represented as an array of bits.

Users   Doc1  Doc2  Doc N
UserA   1     1     1
UserB   0     1     1

[1] 111
[2] 011

This compact representation improves search efficiency and reduces memory overhead.
Example
Consider:
• The search system holds at most 5 documents, with document IDs {1, 2, 3, 4, 5}
• There are at most 2 users, with IDs {1, 2}
• User 1 has access to documents {1, 2, 3}: 1 1 1 0 0
• User 2 has access to documents {1, 2, 3, 4, 5}: 1 1 1 1 1
• ACL matrix and array representation:

User   1 2 3 4 5
1      1 1 1 0 0
2      1 1 1 1 1

[1] 11100
[2] 11111
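The example above can be reproduced with a few lines of code. This sketch stores each user's row as a string of '0' and '1' characters for readability; the helper names `acl_row` and `has_access` are hypothetical, and a real implementation would more likely use a packed bit set.

```python
def acl_row(allowed_doc_ids, max_doc_id):
    """Encode one user's access as a bit string, one position per document id."""
    return "".join(
        "1" if doc_id in allowed_doc_ids else "0"
        for doc_id in range(1, max_doc_id + 1)
    )


def has_access(acl, user_id, doc_id):
    """Look up a single bit: position doc_id - 1 in the user's row."""
    return acl[user_id][doc_id - 1] == "1"


acl = {
    1: acl_row({1, 2, 3}, 5),
    2: acl_row({1, 2, 3, 4, 5}, 5),
}
print(acl)                    # {1: '11100', 2: '11111'}
print(has_access(acl, 1, 4))  # False
print(has_access(acl, 2, 4))  # True
```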
Solr Implementation
Solution 1
• Solr has a PostFilter interface that can be implemented in a custom plugin.
• The filter provides a collector with a method called ‘collect()’.
• collect() is called for each document matched by the user’s search query.
– For each document, get the document ID from the field cache and apply the ACL using the bit array, e.g. 1 1 1 0 0.
• Code snippets: https://gist.github.com/rajanim/7197154
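The control flow of such a post filter can be imitated outside Solr. The class below is a plain stand-in, not Solr's API: `AclCollector`, the `doc_id_lookup` mapping (playing the role of the field cache) and the sample data are all hypothetical.

```python
class AclCollector:
    """Stand-in for a filtering collector: forwards only permitted documents."""

    def __init__(self, user_bits, doc_id_lookup, delegate):
        self.user_bits = user_bits          # e.g. "11100" for the current user
        self.doc_id_lookup = doc_id_lookup  # internal doc number -> document id
        self.delegate = delegate            # receives the documents that pass

    def collect(self, internal_doc):
        """Called once per document that matched the user's query."""
        doc_id = self.doc_id_lookup[internal_doc]
        if self.user_bits[doc_id - 1] == "1":
            self.delegate.append(doc_id)


# Hypothetical data: internal numbers 0..4 map to document ids 1..5.
lookup = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
passed = []
collector = AclCollector("11100", lookup, passed)
for internal_doc in [0, 3, 4]:  # documents matched by the query
    collector.collect(internal_doc)
print(passed)  # [1]
```

Only the documents that match the query are ever checked, which is why this late-binding check is placed after the main query rather than before it.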
Other Implementation Solution
Solution 2
• Using BitSet utilities:
• Get the bitset of documents matched by the search query from the search engine
• Get the user’s ACL bitset instance
• Obtain the intersection of the two bitsets [intersect(bitset other)]
[Figure: bitwise intersection of the query-match bitset and the user’s ACL bitset]
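The intersection step can be illustrated with Python integers used as bit sets, where bit i - 1 represents document id i. The document ids and the helpers `to_bitset` and `from_bitset` are assumptions made for this sketch, not the values from the slide's figure.

```python
def to_bitset(doc_ids):
    """Pack a collection of document ids into an integer, bit (id - 1) per document."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << (doc_id - 1)
    return bits


def from_bitset(bits):
    """Unpack an integer bit set back into a sorted list of document ids."""
    return [i + 1 for i in range(bits.bit_length()) if (bits >> i) & 1]


matched = to_bitset({1, 3, 4, 5})  # documents matched by the search query
user_acl = to_bitset({1, 2, 3})    # documents the user may see
allowed = matched & user_acl       # bitwise AND is the set intersection

print(from_bitset(allowed))  # [1, 3]
```

A single AND over whole machine words replaces a per-document check, which is where the efficiency of the bit set approach comes from.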
Solution for Discretionary ACL - Type 2
• For discretionary ACL systems that have a static ACL,
• we choose “Early Binding with ACL bound to Document Objects”.

In such systems,
• Document objects include a multi-valued user-id field that contains the list of user IDs with access to the document.
• The user-id field has to be indexed.
Continued…
• This solution requires the ACL and document data to be de-normalized into a flat structure.
[Diagram: at index time, parse the document and add the list of users who have access; at search time, send the query request with the user ID through the SolrJ client and return the query response]
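The de-normalization step can be sketched as follows: access grants kept as (user, document) pairs are folded into each document before indexing, and the search-time restriction becomes a simple filter on that field. The field name `user_ids`, the helpers and the sample data are hypothetical.

```python
def denormalize(documents, grants):
    """Attach to every document the sorted list of user ids allowed to see it."""
    by_doc = {}
    for user_id, doc_id in grants:
        by_doc.setdefault(doc_id, set()).add(user_id)
    return [dict(doc, user_ids=sorted(by_doc.get(doc["id"], ()))) for doc in documents]


def user_filter(user_id, acl_field="user_ids"):
    """Filter query restricting results to documents shared with this user."""
    return f'{acl_field}:"{user_id}"'


# Hypothetical sample data.
documents = [
    {"id": "d1", "title": "Appraisal 2013"},
    {"id": "d2", "title": "Holiday calendar"},
]
grants = [("userA", "d1"), ("userA", "d2"), ("userB", "d2")]

flat = denormalize(documents, grants)
print(flat[1])               # {'id': 'd2', 'title': 'Holiday calendar', 'user_ids': ['userA', 'userB']}
print(user_filter("userA"))  # user_ids:"userA"
```

The trade-off is that any change to a document's access list requires that document to be reindexed, which is why this variant suits systems whose ACLs are static.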
Summary
• Discretionary ACL with late binding is a complex model and requires extensive verification.
• Leverage Solr’s smart caching capability.
• Since an ACL always adds overhead, it has to be optimized to keep the added delay to a minimum.
Thank You

Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 

Recently uploaded (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

A Novel methodology for handling Document Level Security in Search Based Applications

  • 1. DOCUMENT LEVEL SECURITY IN SEARCH BASED APPLICATIONS Rajani Maski - Senior Software Engineer
  • 2. Agenda: Introduction to Search Based Applications; Requirement Analysis of Document Level Security; Access Control Lists; Multiple Solutions; Summary
  • 3. Search Based Applications: Search based applications are software applications in which a search engine platform is the core infrastructure for accessing and reporting information. E-commerce web applications and content management systems are typical examples.
  • 4. Overview of a Search Based System [Diagram layers: User Authentication System; Search Based Application Server; Unified Data Layer; Archives, Documents, Emails, File Server]
      - Authentication: the user is authenticated before being given access to the application.
      - Application: presents a full-fledged user interface and performs user operations such as uploading documents, sending emails and searching.
      - Unified Data Layer: the search server, which indexes content across the sources and retrieves data at very high speed.
      - Data Storage: the volume of data sourced from different repositories.
  • 5. So Far, So Good! What’s the problem?
  • 6. Common Access to the Unified Data Layer: How is this a threat? [Diagram layers: User Authentication System; Search Based Application; Unified Data Layer; Archives, Documents, Emails, File Servers]
  • 7. Consider a Sample Use Case: User A logs in to the application and performs a search with keywords such as ‘Pay Slips’, ‘Personal’ or ‘appraisal’. Sample results are demonstrated for “appraisal”.
  • 9. Observations
      - Relevant search results [Correct]: User A was returned relevant results for his query, such as exact matches, more-like-this keywords and synonym keywords.
      - Unauthorized search results [Wrong]: a few of the retrieved results were documents that he was not authorized to view.
      How are we doing with this?
      - Threats: exposure of other users’ confidential documents; access to unauthorized information.
  • 10. Problem Definition
      - To develop a search platform where every user has access only to the documents he/she is authorized to view.
      - To ensure that confidential data uploaded is not globally searchable unless it is intended to be globally accessible.
      How can we achieve this?
  • 11. Solution: Maintain an Access Control List mapped to each document object. What is an Access Control List?
  • 12. Access Control List
      - Access controls are security features that control how users [subjects] and documents [objects] communicate and interact with one another.
      - Subject: an active entity [user] that requests access to an object [document].
      - Object: a passive entity [document] that contains information.
      [Diagram labels: Subject, Interaction, Document Object]
  • 13. Data Model: Let’s first understand the data model of a search engine. How are documents stored in a search engine? With a document-oriented approach. Example document Alec_1167: {_id: "1167", Name: "Ale C", Agent: "Miller", Place: "NY, NJ, CA", Units: 570} [Figure: field values such as NY, NJ, CA, Alec, Miller and 570 shown alongside document id 1167; further entries 3424 Kiwi reds 340 and 5612 Reh Mo’s 664]
  • 14. Indexing and Storing a Document Object
      - User A uploads a document into the system.
      - Metadata and text are extracted.
      - The document is converted to a flat structure.
      - It is input to the search engine.
      [Diagram labels: Document, Metadata Extract, Search Engine, Document Saved]
  • 15. We missed capturing something! What did we miss? Capturing the user information for each document: who uploaded the document, and with whom did the user share it? How do we maintain this information? With an access control list on each document object. [Diagram labels: Document, Metadata Extract, Search Engine, Document Saved]
  • 16. Conventional Solution: Access control lists for each user. At the time of search: retrieve the search results, check each document against the user’s authorization, and finally return the results. [Diagram labels: Search Engine, Security Filter Each Document, Return Results to User]
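The conventional per-document check on slide 16 can be sketched as follows. This is only an illustration under assumptions: the function name `filter_authorized`, the document ids and the in-memory `acl` dictionary are hypothetical and do not come from the slides.

```python
def filter_authorized(results, user_id, acl):
    """Keep only the hits that user_id may view, preserving ranking order."""
    allowed = acl.get(user_id, set())  # unknown users get an empty ACL
    return [doc_id for doc_id in results if doc_id in allowed]


# Hypothetical data: ranked hits from the search engine and a per-user ACL.
ranked_hits = ["d7", "d2", "d9", "d4"]
acl = {"userA": {"d2", "d4"}, "userB": {"d7", "d2", "d9", "d4"}}

visible_to_a = filter_authorized(ranked_hits, "userA", acl)
```

Because every hit is checked after retrieval, the cost of this approach grows with the number of results returned by the search engine.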
  • 18. Access Control Models: Solutions depend on the access control model we choose. Two important types of access control model: 1. Non-Discretionary Access Control (Role Based) 2. Discretionary Access Control (DAC)
  • 19. 1. Non-Discretionary (Role Based)
      - Definition: non-discretionary ACL uses an administered set of rules to determine how users and documents interact.
      - It is referred to as non-discretionary because assigning a user to a role is unavoidable.
      [Diagram labels: Sales, Sales Documents, Marketing Documents, Manager, Engineering Documents, Admin Documents, Super User]
  • 20. Solution for Role Based ACL - Type 1: For a system that has roles defined at design time and a static ACL set on each document, we choose “Early Binding with ACL bound to Document Objects”. In such systems, each document object includes a multi-valued role-Ids field that lists the role ids with access to the document. [Diagram: at index time, documents are stored with their ACLs, e.g. Document 1 role-Ids: [“1”, “2”, “3”], Document 2 role-Ids: [“2”, “3”]]
  • 21. Continued… At the time of search, the user’s search query should be appended with the user’s role id. Solr’s filter query feature and its caching give an efficient solution for such ACL techniques. This is called the ‘Early Binding’ approach. [Diagram labels: Query Request with User Role-Id, SolrJ Client, Query Response]
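A minimal sketch of how the role ids might be attached to the request as a Solr filter query (`fq`). The field name `role_ids` and the helper `build_secured_params` are assumptions for illustration; the slides do not name the field, and the role ids are assumed to be plain integers that need no escaping.

```python
from urllib.parse import urlencode


def build_secured_params(user_query, role_ids, acl_field="role_ids"):
    """Return Solr request parameters with the ACL applied as a filter query."""
    if not role_ids:
        # A user without roles must not fall through to an unfiltered search.
        raise ValueError("user has no roles; refusing to build an unfiltered query")
    roles = " OR ".join(str(r) for r in sorted(role_ids))
    return {"q": user_query, "fq": "%s:(%s)" % (acl_field, roles)}


params = build_secured_params("appraisal", {2, 3})
query_string = urlencode(params)  # ready to append to a /select URL
```

Keeping the ACL in `fq` rather than in `q` leaves relevance scoring untouched and lets Solr cache the filter separately from the user's query.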
  • 22. Solution for Role Based ACL - Type 2: For systems whose roles change often, data is normalized by segregating access control information into separate tables. This approach is called ‘Early Binding with Externalized ACL’. In such systems, role ids are not attached to the document object; instead they are stored in separate tables with a foreign key relation, and pseudo joins are used at the time of search. [Diagram: Document1 (D1) linked to an ACL table row with Doc ID D1 and Role-Ids 1, 2, 3, N]
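One way to express such a pseudo join is Solr's join query parser, used as a filter query. The sketch below only builds the filter string; the core name `acl` and the field names `doc_id`, `id` and `role_id` are hypothetical, and whether `fromIndex` is usable depends on how the ACL data is deployed alongside the document index.

```python
def build_join_filter(role_ids, acl_core="acl", from_field="doc_id",
                      to_field="id", role_field="role_id"):
    """Build a join filter: ACL rows matching the roles select document ids."""
    if not role_ids:
        raise ValueError("user has no roles; refusing to build an empty filter")
    roles = " OR ".join(str(r) for r in sorted(role_ids))
    return "{!join from=%s to=%s fromIndex=%s}%s:(%s)" % (
        from_field, to_field, acl_core, role_field, roles)


join_fq = build_join_filter({3, 2})
```

With this layout a role change only touches the small ACL records, so the documents themselves do not need to be reindexed.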
  • 23. 2. Discretionary Access Control
      - Definition: discretionary means the document owner has the authority to control access to the document.
      - A system that enables the document owner to specify the set of users with access to a set of documents.
      [Diagram: Owner specifies the users/groups who can access the Object]
  • 24. Solution for Discretionary ACL - Type 1: For a system that has frequent changes in the ACL, with the ACL defined for each user and document, we choose the ‘Late Binding Approach with Externalized ACL’. In such systems, the ACL is a 2D matrix with users along its rows and documents along its columns. Encoded values: 0 = no access, 1 = access. M = number of users, N = number of documents.
        Users    Doc1  Doc2  Doc N
        User A   1     1     1
        User B   0     1     1
        User M
  • 25. Continued… For implementation, the ACL matrix can be represented as an array of bits, one entry per user.
        Users    Doc1  Doc2  Doc N   Bit array
        User A   1     1     1       [1] 111
        User B   0     1     1       [2] 011
      This compact representation improves search efficiency and reduces memory overhead.
  • 26. Example: Consider a search system with at most 5 documents, with document ids {1, 2, 3, 4, 5}, and 2 users with ids {1, 2}. User 1 has access to documents {1, 2, 3}; User 2 has access to documents {1, 2, 3, 4, 5}. ACL matrix and array representation:
        User   1  2  3  4  5   Bit array
        1      1  1  1  0  0   [1] 11100
        2      1  1  1  1  1   [2] 11111
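The example on slide 26 can be reproduced with a short sketch. The helper names `acl_row` and `can_view` are hypothetical, and a bit string is used here only for readability; a real implementation would more likely use a packed bit set.

```python
def acl_row(accessible_ids, max_docs):
    """Return one user's bit row: position i is '1' if document i+1 is accessible."""
    return "".join("1" if doc_id in accessible_ids else "0"
                   for doc_id in range(1, max_docs + 1))


def can_view(row, doc_id):
    """Check a single document id (1-based) against a user's bit row."""
    return row[doc_id - 1] == "1"


# Access sets taken from the slide: 5 documents, 2 users.
access = {1: {1, 2, 3}, 2: {1, 2, 3, 4, 5}}
acl = {user: acl_row(docs, 5) for user, docs in access.items()}
```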
  • 27. Solr Implementation - Solution 1: Solr has a PostFilter interface that can be implemented to develop a custom plugin. The filter’s collector has a method called collect(), which receives the documents matched by the user’s search query. For each matched document, get the document id from the field cache and apply the ACL using the bit array (e.g. 1 1 1 0 0). Code snippets: https://gist.github.com/rajanim/7197154
  • 28. Other Implementation - Solution 2: Using BitSet utilities. Get the bitset of documents matched by the search query from the search engine, get the user’s ACL bitset instance, and obtain the intersection of the two bitsets [intersect(bitset other)]. [Figure: bit rows 1 1 1 0 0, 1 1 1 1 1 and 0 0 1 0 0]
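The bitset intersection can be sketched with plain Python integers standing in for the BitSet utility. The helpers `to_bits` and `from_bits` and the sample query matches are assumptions for illustration; they are not the author's code from the gist, and bit 0 is taken to represent document id 1.

```python
def to_bits(doc_ids, max_docs):
    """Pack 1-based document ids into an integer bitset (bit 0 = document 1)."""
    bits = 0
    for doc_id in doc_ids:
        if not 1 <= doc_id <= max_docs:
            raise ValueError("document id out of range: %r" % doc_id)
        bits |= 1 << (doc_id - 1)
    return bits


def from_bits(bits):
    """Unpack an integer bitset into a sorted list of 1-based document ids."""
    doc_ids, doc_id = [], 1
    while bits:
        if bits & 1:
            doc_ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return doc_ids


user_acl = to_bits({1, 2, 3}, 5)        # user may view documents 1, 2 and 3
query_matches = to_bits({3, 4}, 5)      # hypothetical hits for the user's query
visible = from_bits(user_acl & query_matches)  # bitwise AND is the intersection
```

A single AND over the two bitsets replaces the per-document check, which is why this variant suits large result sets.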
  • 29. Solution for Discretionary ACL - Type 2: For discretionary ACL systems that have a static ACL, we choose “Early Binding with ACL bound to Document Objects”. In such systems, document objects include a multi-valued user-id field that contains the list of user ids with access to the document. The user-id field has to be indexed.
  • 30. Continued… This solution requires the ACL and document data to be de-normalized into a flat structure. [Diagram: Index Time: Parse Document, Add List of Users Who Have Access. Search Time: Query Request With User ID, SolrJ Client, Query Response]
  • 32. Summary
      - Discretionary ACL with the late binding solution is a complex model and requires extensive verification.
      - Leverage Solr’s smart caching capability.
      - Since ACL checks always add overhead, they have to be optimized to keep the added delay to a minimum.
  • 33. References:
      - searchhub.org/2012/02/22/custom-security-filtering-in-solr/
      - Peter Bailey, David Hawking, Brett Matson, “Secure Search in Enterprise Webs: Tradeoffs in Efficient Implementation for Document Level Security”
      - All in One Book (Shon Harris, 2005)
      - http://www.searchtechnologies.com/enterprise-search-document-levelsecurity.html
      - http://alvinalexander.com/java/jwarehouse/lucene/src/test/org/apache/lucene/search/TestFilteredQuery.java.shtml
      - https://github.com/Zvents/score_stats_component/blob/master/src/main/java/com/zvents/solr/components/ScoreStatsPostFilter.java