A Novel methodology for handling Document Level Security in Search Based Applications

Presented by Rajini Maski, Senior Software Engineer, Happiest Minds Technologies

An important problem with document search in any content management system (CMS) is the handling of permission-based search requests for each user. In this session, we present an algorithm and framework that allows the search engine to index both public and privileged documents plainly, without any early-binding overhead, thus enforcing document-level security policies only at the time of search. With our late-binding approach for ACLs (access control lists) and some custom components, we have achieved a reduction in search-time overhead. We will also discuss the order of complexity and execution time for the search overhead.

Published in: Technology, Business

  • 2. Agenda
      - Introduction to Search Based Applications
      - Requirement Analysis of Document Level Security
      - Access Control Lists
      - Multiple Solutions
      - Summary
  • 3. Search Based Applications
      - Search Based Applications are software applications in which a search engine platform is used as the core infrastructure for information access and reporting.
      - E-commerce web applications and content management systems are examples of search based applications.
  • 4. Overview of Search Based System
      [Diagram: User, Authentication System, Search Based Application Server, Unified Data Layer, and the data sources Archives, Documents, Emails, File Server]
      - Authentication: the user is authenticated before being given access to the application.
      - Application: presents a full-fledged user interface and performs user operations such as uploading documents, sending emails, and searching.
      - Unified Data Layer: the search server. It indexes content across the sources and retrieves data at very high speed.
      - Data Storage: the volume of data drawn from different repositories.
  • 5. So Far, So Good! What’s the problem?
  • 6. Common Access to Unified Data Layer
      [Diagram: User, Authentication System, Search Based Application, Unified Data Layer, and the data sources Archives, Documents, Emails, File Servers]
      How is this a threat?
  • 7. Consider a Sample Use Case
      User A:
      - Logs in to the application.
      - Performs a search operation with keywords such as 'Pay Slips', 'Personal', or 'appraisal'.
      Sample results are demonstrated for "appraisal".
  • 8. Search Results: Unauthorized Results
  • 9. Observations
      - Relevant search results [correct]: User A was returned relevant search results for his query, such as exact matches, "more like this" keywords, and synonym keywords.
      - Unauthorized search results [wrong]: a few of the retrieved results were documents he was not authorized to view.
      How are we doing with this?
      Threats:
      - Exposure of other users' confidential documents
      - Access to unauthorized information
  • 10. Problem Definition
      - To develop a search platform where every user has access only to those documents he/she is authorized to view.
      - To ensure that confidential uploaded data is not globally searchable unless it is intended to be globally accessible.
      How can we achieve this?
  • 11. Solution
      Maintain an Access Control List mapped to each document object.
      Access Control List?
  • 12. Access Control List
      - Access controls are security features that control how users [subjects] and documents [objects] communicate and interact with one another.
      - Subject: an active entity [user] that requests access to an object [document].
      - Object: a passive entity [document] that contains information.
      [Diagram: Subject interacts with Document Object]
  • 13. Data Model
      Let's first understand the data model of the search engine.
      How are documents stored in the search engine? Document-oriented approach.
      Sample document Alec_1167: {_id: "1167", Name: "Ale C", Agent: "Miller", Place: "NY, NJ, CA", Units: 570}
      [Figure: the sample document's values (NY, NJ, CA, Alec, Miller, 570) shown against document id 1167, alongside rows 3424 "Kiwi reds" 340 and 5612 "Reh Mo's" 664]
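One way to picture the document-oriented model on slide 13 is as a small inverted index over the sample document. The sketch below is only an illustration: the helper name `build_inverted_index` and the comma-splitting of `Place` are assumptions, not the search engine's actual analysis chain.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each (field, value) term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        doc_id = doc["_id"]
        for field, value in doc.items():
            if field == "_id":
                continue
            # Assumption: multi-valued fields arrive as comma-separated strings.
            values = value.split(",") if isinstance(value, str) else [value]
            for v in values:
                term = v.strip() if isinstance(v, str) else v
                index[(field, term)].add(doc_id)
    return index

# The sample document from the slide.
sample = {"_id": "1167", "Name": "Ale C", "Agent": "Miller",
          "Place": "NY, NJ, CA", "Units": 570}
index = build_inverted_index([sample])
```

Looking up `index[("Place", "NJ")]` then yields the ids of all documents located in NJ, which is the kind of term-to-document lookup a search engine answers quickly.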
  • 14. Indexing and Storing Document Object
      - User A uploads a document into the system.
      - Metadata and text are extracted.
      - The document is converted to a flat structure.
      - The flat structure is input to the search engine.
      [Diagram: Document, Metadata Extract, Search Engine, Document Saved]
  • 15. We missed capturing something!
      - What did we miss? Capturing the user information for each document:
          - Who uploaded the document?
          - With whom did the user share it?
      - How do we maintain this information? With an access control list for each document object.
      [Diagram: Document, Metadata Extract, Search Engine, Document Saved]
  • 16. Conventional Solution
      - Access Control Lists for each user.
      - At the time of search:
          - retrieve the search results,
          - check each document for the user's authorization, and
          - finally return the results.
      [Diagram: Search Engine, Security Filter Each Document, Return Results to User]
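The conventional flow on slide 16 can be sketched as a per-document check applied after retrieval. The function name, the shape of `acl` (document id mapped to a set of user ids), and the sample ids are all hypothetical; the point is only that the cost grows with the number of matched documents.

```python
def search_with_security_filter(matches, user_id, acl):
    """Return only the matched documents the user may view.

    matches: document ids returned by the search engine (hypothetical input).
    acl: dict mapping document id -> set of authorized user ids (assumed shape).
    """
    authorized = []
    for doc_id in matches:
        # One ACL lookup per matched document: linear in the result size.
        if user_id in acl.get(doc_id, set()):
            authorized.append(doc_id)
    return authorized

acl = {"d1": {"userA", "userB"}, "d2": {"userB"}, "d3": {"userA"}}
results = search_with_security_filter(["d1", "d2", "d3", "d4"], "userA", acl)
```

Documents with no ACL entry (here `d4`) are treated as not accessible, which is a deliberate fail-closed choice in this sketch.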
  • 17.  Multiple Solutions
  • 18. Access Control Models
      Solutions depend on the Access Control Model we choose. Two important types of Access Control Models:
      1. Non-Discretionary Access Control (Role Based)
      2. Discretionary Access Control (DAC)
  • 19. 1. Non-Discretionary (Role Based)
      Definition:
      - Non-Discretionary ACL uses an administered set of rules to determine how users and documents interact.
      - It is referred to as non-discretionary because assigning a user to a role is unavoidable.
      [Diagram labels: Sales, Manager, Super User; Sales Documents, Marketing Documents, Engineering Documents, Admin Documents]
  • 20. Solution for Role Based ACL - Type 1
      For a system that has:
      - roles defined at design time and a static ACL set on each document,
      we choose "Early Binding with ACL bound to Document Objects".
      In such systems:
      - Document objects include a multi-valued role-id field that contains the list of role-ids that have access to the document.
      [Diagram: at index time, documents are stored with their ACLs. Document 1 role-Ids: ["1", "2", "3"]; Document 2 role-Ids: ["2", "3"]]
  • 21. Continued…
      At the time of search:
      - The user's search query should be appended with the user's role id.
      - Solr's Filter Query feature and its caching techniques give the most efficient solution for such ACL techniques.
      This approach is called the 'Early Binding' approach.
      [Diagram: Query Request with User Role-Id (early binding), SolrJ Client, Query Response]
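As a minimal sketch of the early-binding search step, the snippet below builds Solr request parameters in which the user's roles go into the `fq` (filter query) parameter rather than into `q`. The field name `role_ids` and the helper `build_role_filtered_query` are assumptions for illustration; the deck only shows a multi-valued role-id field.

```python
from urllib.parse import urlencode

def build_role_filtered_query(user_query, role_ids, acl_field="role_ids"):
    """Build Solr request parameters that restrict results to the user's roles."""
    if not role_ids:
        # Fail closed: never send an unfiltered query for a user without roles.
        raise ValueError("user has no roles; refusing to build an unfiltered query")
    clause = " OR ".join(str(r) for r in sorted(role_ids))
    return {
        "q": user_query,
        # The filter is kept separate from q so Solr can cache it on its own.
        "fq": f"{acl_field}:({clause})",
    }

params = build_role_filtered_query("appraisal", {2, 3})
query_string = urlencode(params)
```

Keeping the role restriction in `fq` means users who share the same roles reuse the same cached filter, whatever they type into the search box.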
  • 22. Solution for Role Based ACL - Type 2
      For systems that have:
      - roles which often change; data is normalized by segregating access control information into different tables.
      This approach is called 'Early Binding with Externalized ACL'.
      In such systems:
      - Role-ids are not attached to the document object.
      - Instead, they are stored in different tables with a foreign key relation.
      - Pseudo joins are used at the time of search.
      [Diagram: Document1 (D1) linked to an ACL table]
      Doc ID | Role-Ids
      D1     | 1, 2, 3, N
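The externalized ACL on slide 22 can be pictured as a separate table of (document id, role id) rows that is joined against the search matches. The sketch below simulates that join in plain Python; the row layout and function names are assumptions, and in Solr itself the join query parser (`{!join from=... to=...}`) is one possible way to express a pseudo join.

```python
def docs_visible_to_roles(acl_rows, role_ids):
    """Resolve the externalized ACL table to the set of visible document ids."""
    return {doc_id for doc_id, role_id in acl_rows if role_id in role_ids}

def pseudo_join(matches, acl_rows, role_ids):
    """Keep only the matched documents that some role of the user can access."""
    visible = docs_visible_to_roles(acl_rows, role_ids)
    return [doc_id for doc_id in matches if doc_id in visible]

# Hypothetical ACL table: one row per (document, role) pair.
acl_rows = [("D1", 1), ("D1", 2), ("D1", 3), ("D2", 2), ("D2", 3)]
result = pseudo_join(["D1", "D2", "D3"], acl_rows, {1})
```

Because the roles live outside the document, changing a role assignment only touches the ACL table; the documents do not need to be re-indexed.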
  • 23. 2. Discretionary Access Control
      Definition:
      - Discretionary: the document owner has the authority to control access to the document.
      - A system that enables the document owner to specify the set of users with access to a set of documents.
      [Diagram: Owner specifies the users/groups who can access the Object]
  • 24. Solution for Discretionary ACL - Type 1
      For a system that has:
      - frequent changes in ACL, and
      - an ACL defined for each user and document pair,
      we choose 'Late Binding Approach with Externalized ACL'.
      In such systems:
      - The ACL is a 2D matrix with users along its rows and documents along its columns.
      - Encoded values: 0 = no access, 1 = access.
      - M = number of users, N = number of documents.
      Users  | Doc1 | Doc2 | Doc N
      User A | 1    | 1    | 1
      User B | 0    | 1    | 1
      User M |      |      |
  • 25. Continued…
      For implementation, the ACL matrix can be represented as an array of bits.
      Users | Doc1 | Doc2 | Doc N
      UserA | 1    | 1    | 1
      UserB | 0    | 1    | 1
      [1] 111
      [2] 011
      This compact representation improves search efficiency and reduces memory overhead.
  • 26. Example
      Consider:
      - The maximum number of documents in the search system is 5, with document ids {1, 2, 3, 4, 5}.
      - The maximum number of users is 2, with ids {1, 2}.
      - User 1 has access to documents {1, 2, 3}: 1 1 1 0 0
      - User 2 has access to documents {1, 2, 3, 4, 5}: 1 1 1 1 1
      ACL matrix and array representation:
      User | 1 | 2 | 3 | 4 | 5
      1    | 1 | 1 | 1 | 0 | 0
      2    | 1 | 1 | 1 | 1 | 1
      [1] 11100
      [2] 11111
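The example on slide 26 can be reproduced with a few lines that encode each user's access set as a bit string. The helper names are hypothetical, and a string of '0'/'1' characters is used only for readability; a real implementation would more likely pack the bits.

```python
def acl_bit_string(accessible_doc_ids, max_docs):
    """Encode a user's access as a bit string; position i-1 holds document id i."""
    return "".join(
        "1" if doc_id in accessible_doc_ids else "0"
        for doc_id in range(1, max_docs + 1)
    )

def has_access(bits, doc_id):
    """Check a single document id against a user's bit string."""
    return bits[doc_id - 1] == "1"

# Access sets from the slide: user id -> accessible document ids.
acl = {1: {1, 2, 3}, 2: {1, 2, 3, 4, 5}}
bit_arrays = {user: acl_bit_string(docs, 5) for user, docs in acl.items()}
```

With this encoding, `bit_arrays[1]` is `"11100"` and `bit_arrays[2]` is `"11111"`, matching the array representation on the slide, and an authorization check is a single index lookup.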
  • 27. Solr Implementation
      Solution 1
      - Solr has a PostFilter interface that can be extended to develop a custom plugin.
      - The post filter's collector has a method called 'collect()'.
      - collect() is called for the documents matched by the user's search query.
          - For each one, get the document id from the Field Cache and apply the ACL using the bit array (for example 1 1 1 0 0).
      - Code snippets: [not captured in this transcript]
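The original code snippets are not in the transcript, so the following is only a plain-Python stand-in for the idea on slide 27, not Solr's Java API and not the presenter's code. The class `AclPostFilter`, the `doc_id_lookup` dict (playing the role of the field cache), and the internal document numbers are all hypothetical.

```python
class AclPostFilter:
    """Stand-in for a post-filter collector: forwards only authorized documents."""

    def __init__(self, user_acl_bits, doc_id_lookup):
        self.user_acl_bits = user_acl_bits  # e.g. "11100" for user 1
        self.doc_id_lookup = doc_id_lookup  # internal doc number -> document id
        self.collected = []

    def collect(self, internal_doc):
        # Resolve the document id (the field cache's job in the slide).
        doc_id = self.doc_id_lookup[internal_doc]
        # Late binding: the ACL bit is checked only now, at search time.
        if self.user_acl_bits[doc_id - 1] == "1":
            self.collected.append(doc_id)

lookup = {0: 3, 1: 4, 2: 1}  # hypothetical internal numbering
post_filter = AclPostFilter("11100", lookup)
for internal_doc in (0, 1, 2):  # documents matched by the query
    post_filter.collect(internal_doc)
```

Here documents 3 and 1 pass the check and document 4 is dropped, since user 1's bit array is 11100.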
  • 28. Other Implementation Solution
      Solution 2
      - Use BitSet utilities.
      - Get the bitset of documents matched by the search query from the search engine.
      - Get the user's ACL bitset instance.
      - Obtain the intersection of the two bitsets [intersect(bitset other)].
      [Figure: bit rows 1 1 1 0 0, 1 1 1 1 1, and 0 0 1 0 0]
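The bitset approach on slide 28 reduces authorization to a single bitwise AND. In this sketch a Python integer stands in for a bitset class; the helper names and the sample sets are assumptions, with bit i-1 representing document id i.

```python
def to_bitset(doc_ids):
    """Pack document ids into an integer; bit i-1 is set for document id i."""
    mask = 0
    for doc_id in doc_ids:
        mask |= 1 << (doc_id - 1)
    return mask

def from_bitset(mask):
    """Unpack an integer bitset back into a sorted list of document ids."""
    ids = []
    position = 1
    while mask:
        if mask & 1:
            ids.append(position)
        mask >>= 1
        position += 1
    return ids

matched = to_bitset({3, 4})      # documents matched by the query (hypothetical)
user_acl = to_bitset({1, 2, 3})  # user 1's ACL from the earlier example
allowed = matched & user_acl     # intersection of the two bitsets
```

`from_bitset(allowed)` gives `[3]`: the whole result set is filtered in one operation instead of one check per document.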
  • 29. Solution for Discretionary ACL - Type 2
      - For discretionary ACL systems that have a static ACL,
      - we choose "Early Binding with ACL bound to Document Objects".
      In such systems:
      - Document objects include a multi-valued user-id field that contains the list of user-ids with access to the document.
      - The user-id field has to be indexed.
  • 30. Continued…
      - This solution requires the ACL and document data to be de-normalized into a flat structure.
      [Diagram: at index time, parse the document and add the list of users who have access; at search time, a Query Request with User ID goes through the SolrJ Client, which returns the Query Response]
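A rough sketch of the de-normalization step: at index time the owner and the users the document was shared with are folded into one multi-valued field on the flat document, and at search time the user's id is matched against that field. The field name `user_ids` and both helper functions are hypothetical.

```python
def denormalize(document, owner_id, shared_with):
    """Index time: copy the document and attach the users who may access it."""
    flat = dict(document)
    flat["user_ids"] = sorted({owner_id, *shared_with})
    return flat

def visible_to(indexed_docs, user_id):
    """Search time: keep the documents whose user_ids field contains the user."""
    return [d["id"] for d in indexed_docs if user_id in d["user_ids"]]

indexed = [
    denormalize({"id": "D1", "title": "Appraisal"}, "userA", ["userB", "userA"]),
    denormalize({"id": "D2", "title": "Pay Slip"}, "userB", []),
]
```

The trade-off is the one the deck names for static ACLs: lookups are simple, but any change in sharing means re-indexing the affected document.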
  • 31.  Summary
  • 32. Summary
      - Discretionary ACL with the late binding solution is a complex model and requires extensive verification.
      - Leverage Solr's smart caching capability.
      - Since an ACL always adds overhead, it has to be optimized to keep the added delay to a minimum.
  • 34. Thank You