CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room with LDAP, Context and Big Data


Michel Prompt, Chairman & CEO, Radiant Logic
There's a sea change coming in terms of scaling identity and access management. This session will look at what's next in directory technology, scalability, and possibility.

  1. HDAP: A Breakthrough in Directory Technology: Bringing Together LDAP, Context, and Big Data
  2. What We'll Cover Today • What Is HDAP? • Why HDAP? • Why even LDAP? • Evaluating the models for structured data • Hierarchical model and LDAP • The requirements/drivers for more scalability • Using Identity and Context Virtualization to build a Federated Identity Service (FID) • Why FID is essential • Powering a new use case: Contextual Search • How HDAP works/performance
  3. What is HDAP?
  4. A Next-Gen LDAP Directory Driven by Hadoop and Search Technology (7/15/2013) • This highly available version of LDAP offers better performance and increased scalability. • Now, you may be thinking: • LDAP is already very fast and scalable. • And who needs LDAP anyway? Shouldn’t we do as Ian Glazer says, and "kill IdM in order to save it"? • But HDAP goes beyond LDAP, delivering much more and doing it all much faster.
  5. Why HDAP?
  6. To Bring New Life to the Heart of IT: People and What They Do • Identity remains essential to IT because people are often at the center of activities. • While there are multiple use cases, one of the key functions of identity is to act as an integration point. • As such, identity management is at the center of application integration. • We need a way to store identities and their attributes, but is LDAP still relevant? • Do we really need a hierarchical system when the world is moving toward these models? • Path • Graph • Directed Graph • Relational
  7. Roadmap: The Role of Identity and Context Virtualization in the Technology Food Chain (Company Confidential)
  8. Are the Hierarchies of LDAP Still Necessary? • The Protocol • The Schema • The Storage: Hierarchy • Searching and Navigation: Traversing the Tree • Searching by Attributes • Navigation: one level or subtree. There are not many ways to navigate a tree: • First, you enumerate the children. • Then you repeat the process for each child node. • So you either believe that a hierarchical system is sufficient, or you don't. • The storage
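Slide 8 describes tree navigation as enumerating a node's children and then repeating that for each child. It also lists the two navigation modes, one level and subtree. Here is a minimal in-memory sketch of those modes, modeled on LDAP's base/one/sub search scopes. The `Node` class, the helper names, and the sample entries are hypothetical. A real LDAP server answers these searches from its indexes and does not walk the tree like this.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One directory entry: its RDN, multi-valued attributes, and child entries."""
    rdn: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def one_level(base):
    """LDAP 'one' scope: enumerate the immediate children of base."""
    return list(base.children)


def subtree(base):
    """LDAP 'sub' scope: base itself, then repeat the enumeration for each child."""
    result = [base]
    for child in base.children:
        result.extend(subtree(child))
    return result


def search(base, scope, attr, value):
    """Filter the entries in scope with a simple equality match on one attribute."""
    if scope == "base":
        candidates = [base]
    elif scope == "one":
        candidates = one_level(base)
    elif scope == "sub":
        candidates = subtree(base)
    else:
        raise ValueError(f"unknown scope: {scope}")
    return [n.rdn for n in candidates if value in n.attrs.get(attr, [])]


# Hypothetical sample tree.
root = Node("o=example", children=[
    Node("ou=people", children=[
        Node("uid=alice", {"l": ["Paris"]}),
        Node("uid=bob", {"l": ["Austin"]}),
    ]),
    Node("ou=groups"),
])

print(search(root, "sub", "l", "Paris"))  # ['uid=alice']
print(search(root, "one", "l", "Paris"))  # []
```

A one-level search from the root sees only `ou=people` and `ou=groups`, so it misses Alice. The subtree search reaches her by repeating the enumeration one level further down.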
  9. The World of Data • Structured (SQL) • Unstructured (Search)
  10. Structured Data: The Three Models and Their Respective Installed Bases • Relational (SQL Database) • Network/Graph (Graph Database) • Hierarchical (Hierarchical Database)
  11. The Three Models • These three models are similar in terms of what you can represent with them, but they are optimized for different functions. • Relational (SQL) is the most ubiquitous, for good reasons: • It is the most complete model and extremely flexible. • ACID properties make it great for capturing and updating information, and it's optimized for non-redundant writes. • But it's also slow to navigate and to perform ad hoc queries and searches. • Graphs and hierarchies belong to the same family; after all, trees are DAGs, or “directed acyclic graphs”: • Slow for write and update (no ACID properties in general) • Fast in navigation, ad hoc query, and search
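The claim that relational storage is "slow to navigate" is easier to see when a tree is stored relationally. A parent-to-children step is a single lookup, but walking a whole subtree needs a recursive query that adds one join step per level of depth. The sketch below uses Python's built-in `sqlite3` with a hypothetical `entry` table. It illustrates the extra machinery involved and is not a performance measurement.

```python
import sqlite3

# A directory tree stored relationally: one row per entry, linked by parent_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, parent_id INTEGER, rdn TEXT)")
conn.executemany(
    "INSERT INTO entry (id, parent_id, rdn) VALUES (?, ?, ?)",
    [(1, None, "o=example"), (2, 1, "ou=people"), (3, 2, "uid=alice"),
     (4, 2, "uid=bob"), (5, 1, "ou=groups")],
)


def children(entry_id):
    """One level down: a plain lookup on parent_id."""
    rows = conn.execute("SELECT rdn FROM entry WHERE parent_id = ? ORDER BY id", (entry_id,))
    return [r[0] for r in rows]


def subtree(entry_id):
    """Whole subtree: a recursive CTE that joins the table again for each level of depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id, rdn) AS (
            SELECT id, rdn FROM entry WHERE id = ?
            UNION ALL
            SELECT e.id, e.rdn FROM entry AS e JOIN sub ON e.parent_id = sub.id
        )
        SELECT rdn FROM sub ORDER BY id
        """,
        (entry_id,),
    )
    return [r[0] for r in rows]


print(children(1))  # ['ou=people', 'ou=groups']
print(subtree(2))   # ['ou=people', 'uid=alice', 'uid=bob']
```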
  12. Object/Entity, Attribute, Value/Keyword (diagram: an object with Attributes 1 to 4, each holding keyword/value pairs)
  13. Object, Relationship, Data Model (diagram labels: Object, Relationship)
  14. Network Data Model
  15. Hierarchical Data Model
  16. Relational Data Model (ERM, ORM, & UML): Tables/Entities/Objects & Relations
  17. From Graph to Functions to E/R
  18. From E/R to Semantic Model (diagram: Subject, Verb, Object)
  19. How the Models Stack Up • Relational: faster for write and update; slower for query, search, and navigation/traversal • Graph/Hierarchy: slower for write and update; faster for query, search, and navigation/traversal
  20. SQL is the Workhorse for Modern Data Management (diagram labels: Data Management, ETL, MDM/CDI, Data Warehouse, Analytics/BI, Search, Big Data, SQL Integration, Unstructured Data)
  21. LDAP is Key to Identity Management (diagram labels: Identity Management, Sync engine (ETL), Provisioning, Metadirectory (MDM), Analytics/SIEM, Search, Big Data (along with Web Services and SQL), Integration, LDAP, Virtualization)
  22. Why Should Identity Management Be Separate from the Rest of the Chain? (diagram labels: Identity Management, ETL, MDM/CDI, Data Warehouse, Analytics/BI, Search, Big Data (SIEM), Directory, Web Services, SQL, Integration)
  23. Identity and Context Virtualization Process
  24. Foundation for an Identity Service: Building a Global Virtual Identifier and Global Virtual Registry
  25. Solution: Building a Global List with No Duplicates
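Slides 24 and 25 call for a global virtual identifier and a global list with no duplicates, but they do not show how records are matched across sources. The minimal sketch below assumes that one shared attribute (`mail`) reliably identifies the same person in every source. Real deployments often need multi-attribute or fuzzy correlation rules. The source names, field names, and the `gid-N` identifier format are all hypothetical.

```python
from itertools import count


def build_global_list(sources, key):
    """Correlate records from several sources into one list with no duplicates.

    sources: source name -> list of records (dicts with a "local_id").
    key: the attribute assumed to identify the same person everywhere.
    """
    new_id = count(1)
    registry = {}  # normalized correlation value -> global entry
    for source_name, records in sources.items():
        for record in records:
            match = record[key].strip().lower()  # normalize before comparing
            entry = registry.get(match)
            if entry is None:  # first time this person is seen: mint a global ID
                entry = {"global_id": f"gid-{next(new_id)}", "links": []}
                registry[match] = entry
            entry["links"].append((source_name, record["local_id"]))
    return list(registry.values())


# Hypothetical sample data: Alice exists in both AD and an HR database.
sources = {
    "AD": [{"local_id": "CN=Alice", "mail": "alice@example.com"},
           {"local_id": "CN=Bob", "mail": "bob@example.com"}],
    "HR_SQL": [{"local_id": "emp-17", "mail": "Alice@Example.com "}],
}
global_list = build_global_list(sources, key="mail")
for entry in global_list:
    print(entry)
```

Each global entry keeps links back to the source records. The identities are therefore registered once without copying the underlying accounts.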
  26. Link Identity to Context, Regrouping Objects into Sentences and Sentences into Contexts
  27. Solution: Gather Attributes and Join Them to Build a Virtualized Global Profile
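Once the records for one person are linked, slide 27's "gather attributes and join them" step can be sketched as a merge with a source-precedence rule. "First source in the priority list wins" is an assumption made here for illustration; the slide does not say how conflicts are resolved. The data and names are hypothetical.

```python
def build_profile(links, source_data, precedence):
    """Join attributes from each linked source record into one global profile.

    links: (source, local_id) pairs that belong to the same person.
    source_data: source -> local_id -> attribute dict.
    precedence: sources in priority order; the first source to supply an attribute wins.
    """
    rank = {source: i for i, source in enumerate(precedence)}
    profile = {}
    for source, local_id in sorted(links, key=lambda link: rank[link[0]]):
        for attr, value in source_data[source][local_id].items():
            profile.setdefault(attr, value)  # keep the higher-priority value
    return profile


# Hypothetical sample data: HR is treated as authoritative for job title.
source_data = {
    "AD": {"CN=Alice": {"mail": "alice@example.com", "title": "Engineer"}},
    "HR_SQL": {"emp-17": {"title": "Senior Engineer", "costCenter": "R&D"}},
}
profile = build_profile([("AD", "CN=Alice"), ("HR_SQL", "emp-17")],
                        source_data, precedence=["HR_SQL", "AD"])
print(profile)
```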
  28. Integration and Cache/Storage Layer • A system made of two parts: • Integration layer based on virtualization • Storage layer (persistent cache): • LDAP (up to R1 v6.1) • HDAP (based on Hadoop/Lucene/Solr, v7.0)
  29. Why We Need a Federated Identity That’s Based on Virtualization and Stored in HDAP Directories
  30. The World of Access Keeps Expanding • App sourcing and hosting: SaaS apps, apps in public clouds, partner apps, apps in private clouds, on-premises enterprise apps • App access channels: enterprise computers, enterprise-issued devices, public computers, personal devices • User populations: employees, contractors, customers, partners, members
  31. The Challenges of Implementing an Enterprise IdP: How to Handle Different Internal Security Domains? (diagram labels: Federation, Cloud Apps, IdP, Authentication and SSO, Enterprise Identity Data Sources, Implementation (?))
  32. A Federated Identity Hub Manages Authentication and Attributes to Support the IdP (diagram labels: Identity Sources: AD Forest/Domain A, AD Forest/Domain B, Databases, Directories, Internal Enterprise Apps; IdP; Federation; Cloud Apps)
  33. Federated Identity Service and Provisioning (diagram labels: Legacy Applications (and respective stores), AD, Sun LDAP, Cloud Apps, LDAP/SQL/SPML, FID as reference store, SPML, SCIM, Internal Systems, External Systems)
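Slide 33 lists SCIM as one of the provisioning channels between the FID reference store and other systems. The sketch below maps a joined profile to a SCIM 2.0 core User resource, following the RFC 7643 schema URN and the `userName`, `name`, and `emails` attributes. The input attribute names (`uid`, `givenName`, `sn`, `mail`) are LDAP-style assumptions, and the mapping itself is illustrative. It is not RadiantOne's provisioning logic.

```python
import json

SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def to_scim_user(profile):
    """Map an LDAP-style profile dict to a SCIM 2.0 core User resource (hypothetical mapping)."""
    user = {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": profile["uid"],
        "name": {"givenName": profile.get("givenName"), "familyName": profile.get("sn")},
    }
    if "mail" in profile:
        # SCIM emails are multi-valued; mark the single known address as primary.
        user["emails"] = [{"value": profile["mail"], "type": "work", "primary": True}]
    return user


payload = to_scim_user({"uid": "alice", "givenName": "Alice", "sn": "Smith",
                        "mail": "alice@example.com"})
print(json.dumps(payload, indent=2))
```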
  34. Virtual View Based on Org Chart (diagram: Top Manager down through the Full Management Hierarchy)
  35. Virtual View Based on Location (levels: Country, State, City)
  36. Virtual View Based on Role, Location, and Territory (levels: Role, Location, Territory)
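Slides 35 and 36 show the same identities regrouped into different trees according to which attributes define the levels. A minimal sketch of that regrouping follows: it builds a nested tree from flat records, using one tree level per chosen attribute. An org-chart view (slide 34) would instead follow a manager attribute recursively. All names and sample records are hypothetical.

```python
from collections import defaultdict


def tree():
    """An auto-vivifying nested dict: missing keys become new subtrees."""
    return defaultdict(tree)


def build_view(records, levels):
    """Regroup flat identity records into a virtual hierarchy, one tree level per attribute."""
    root = tree()
    for rec in records:
        node = root
        for attr in levels:
            node = node[rec[attr]]
        node.setdefault("_members", []).append(rec["uid"])
    return root


def paths(node, prefix=()):
    """Flatten a view into slash-separated paths ending in each member's uid."""
    for key, child in node.items():
        if key == "_members":
            for uid in child:
                yield "/".join(prefix + (uid,))
        else:
            yield from paths(child, prefix + (key,))


# Hypothetical flat records, as they might be gathered from several backends.
records = [
    {"uid": "alice", "role": "Sales", "country": "US", "state": "CA", "city": "San Jose"},
    {"uid": "bob", "role": "Sales", "country": "US", "state": "TX", "city": "Austin"},
    {"uid": "chen", "role": "Support", "country": "US", "state": "CA", "city": "Novato"},
]

print(sorted(paths(build_view(records, ["country", "state", "city"]))))
print(sorted(paths(build_view(records, ["role", "state"]))))
```

Changing the `levels` list gives a different hierarchy over the same people, which is the idea behind having several virtual views.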
  37. New Use Case: Contextual Search
  38. Webster’s Definition of “Context” • Latin contextus, a joining together, past participle of contexere, “to weave together.” • 1. The parts of a sentence, paragraph, or discourse immediately next to or surrounding a specified word or passage and determining its exact meaning [to quote a remark out of context] (Language Representation) • 2. The whole situation, background, or environment relevant to a particular event, personality, creation, etc. (Perception)
  39. Trees as a Representation of Sentences
  40. Trees as a Way to Represent Sentences and Context
  41. Searching for HDAP on Google
  42. Diving into One Sentence from the Contextual Search Result
  43. Navigating the different sentences returned in the contextual search: Account The Great Outdoors purchased Order 21
  44. Navigating sentences returned in the search: SalesRep Nancy Davolio has account The Great Outdoors
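Slide 18 frames the semantic model as subject, verb, object, and slides 43 and 44 show two such sentences linked through a shared term. The minimal sketch below treats each sentence as a triple and indexes it in both directions. A search term then leads to every sentence around it, and following objects into subjects navigates from one sentence to the next. The data structures are illustrative assumptions, not how HDAP stores context.

```python
from collections import defaultdict

# Each "sentence" is a (subject, verb, object) triple, taken from slides 43 and 44.
sentences = [
    ("Nancy Davolio", "has account", "The Great Outdoors"),
    ("The Great Outdoors", "purchased", "Order 21"),
]

outgoing = defaultdict(list)  # subject -> [(verb, object)]
incoming = defaultdict(list)  # object -> [(subject, verb)]
for s, v, o in sentences:
    outgoing[s].append((v, o))
    incoming[o].append((s, v))


def context(term):
    """Every sentence that mentions term, as subject or as object."""
    found = [(term, v, o) for v, o in outgoing.get(term, [])]
    found += [(s, v, term) for s, v in incoming.get(term, [])]
    return found


def walk(term):
    """Navigate forward: each sentence's object becomes the next subject."""
    chain, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for v, o in outgoing.get(current, []):
            chain.append((current, v, o))
            stack.append(o)
    return chain


print(context("The Great Outdoors"))
print(walk("Nancy Davolio"))
```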
  45. HDAP: RadiantOne High-Availability LDAP Based on Lucene/ZooKeeper (Projects Related to Hadoop)
  46. Current Implementation of LDAP Servers is Based on B-Tree Indexing • An LDAP directory is a hierarchical database with this architecture: • A set of entries, indexed by a main index: the directory tree • A set of indexes to support attribute search (one per attribute) • The core technology over the last 10 years was to implement the tree as a set of B-tree indexes. B-trees can scale to hundreds of millions of entries. (diagram: Entries, B-Tree)
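Slide 46 describes a main index for the directory tree plus one index per attribute, implemented as B-trees. The sketch below uses Python's `bisect` on a sorted list as a stand-in for a B-tree. Both keep keys in order, so equality and prefix lookups become binary searches, but a real B-tree also keeps inserts cheap and is organized around disk pages, and this sketch does neither. Reversing the RDNs is one assumed way to make a subtree a contiguous key range. The DN handling is naive (it ignores escaped commas), and all names are hypothetical.

```python
import bisect


class SortedIndex:
    """Ordered key -> entry-id index; a sorted list standing in for a B-tree."""

    def __init__(self):
        self.keys, self.ids = [], []

    def insert(self, key, entry_id):
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.ids.insert(i, entry_id)

    def equal(self, key):
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.ids[lo:hi]

    def prefix(self, p):
        # All keys starting with p are contiguous, beginning at bisect_left(p).
        i, out = bisect.bisect_left(self.keys, p), []
        while i < len(self.keys) and self.keys[i].startswith(p):
            out.append(self.ids[i])
            i += 1
        return out


def tree_key(dn):
    """Reverse the RDNs so a parent's key is a prefix of its children's keys (naive parsing)."""
    return ",".join(reversed([rdn.strip().lower() for rdn in dn.split(",")]))


# Hypothetical entries: entry id -> (DN, attributes).
entries = {
    1: ("dc=example,dc=com", {}),
    2: ("ou=people,dc=example,dc=com", {}),
    3: ("uid=alice,ou=people,dc=example,dc=com", {"mail": "alice@example.com"}),
    4: ("uid=bob,ou=people,dc=example,dc=com", {"mail": "bob@example.com"}),
    5: ("ou=groups,dc=example,dc=com", {}),
}

dn_index, mail_index = SortedIndex(), SortedIndex()  # main tree index + one attribute index
for entry_id, (dn, attrs) in entries.items():
    dn_index.insert(tree_key(dn), entry_id)
    if "mail" in attrs:
        mail_index.insert(attrs["mail"], entry_id)


def subtree(base_dn):
    """The base entry plus every entry whose tree key extends it."""
    key = tree_key(base_dn)
    return dn_index.equal(key) + dn_index.prefix(key + ",")


print(subtree("ou=people,dc=example,dc=com"))  # [2, 3, 4]
print(mail_index.equal("bob@example.com"))      # [4]
```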
  47. From Lucene to Hadoop to ZooKeeper • Hadoop is an offshoot of the Lucene/Nutch project, which aimed to create an open-source search engine. • Lucene is the search and indexing part of the search engine. • Hadoop is the distributed storage (HDFS) and compute (batch-oriented MapReduce) engine, offering high throughput on large clusters of commodity servers. • Many components and sub-projects came out of the Hadoop project. • ZooKeeper is a low-level coordination service for managing configuration and cluster state (such as which node is the replication leader) across a large number of nodes in a Hadoop cluster.
  48. HDAP Architecture (diagram): Millions of Entries, Millions of Users • Node management: ZooKeeper for VDS as the Distributed Configuration Manager (add node, define new leader, swap in and swap out dynamically; manage configuration and state per node) • LDAP and other protocols: LDAP front-end components (BER encoding, etc.) plus XML/JSON/HTTP front ends for indexing and queries • Each node: one core per JVM, running VDS as a Java web app. VDS core: LDAP processing (add/update/delete), LDAP query processing and caching; VDS config and schema in XML (<fields>, <types>); distributed VDS + Lucene index on each node • Commits: soft commit (in memory) for near-real-time visibility; hard commit (flushed to disk) • Scale out: add more VDS nodes for faster queries and more documents • Replication (leader/followers, replicas 1 to n on a cluster of commodity servers): add more replicas (followers) for better throughput (queries/sec) and fault tolerance • Measured: 60,000 LDAP q/sec before the VDS LDAP front-end functions, 30,000 q/sec after
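Slide 48 combines leader/follower replication with two kinds of commit. In Solr, a soft commit makes changes searchable without flushing them to disk, and a hard commit flushes them. The toy model below shows both ideas: writes go to the leader and are forwarded to every follower, reads rotate across all replicas (the source of the added throughput), and a hard commit snapshots the visible state. Synchronous forwarding, the absence of failure handling, and all class and method names are simplifying assumptions. This is not how HDAP or SolrCloud replicate.

```python
import itertools


class Replica:
    def __init__(self, name):
        self.name = name
        self.visible = {}  # soft-committed state: searchable, held in memory only
        self.durable = {}  # hard-committed state: stands in for data flushed to disk

    def apply(self, dn, entry):
        self.visible[dn] = entry  # soft commit: searchable right away (near real time)

    def hard_commit(self):
        self.durable = dict(self.visible)  # snapshot standing in for a flush to disk


class Cluster:
    """Writes go through the leader to every follower; reads rotate over all replicas."""

    def __init__(self, n_followers):
        self.leader = Replica("leader")
        self.followers = [Replica(f"follower-{i}") for i in range(1, n_followers + 1)]
        self._readers = itertools.cycle([self.leader] + self.followers)

    def write(self, dn, entry):
        for replica in [self.leader] + self.followers:
            replica.apply(dn, entry)

    def read(self, dn):
        replica = next(self._readers)
        return replica.name, replica.visible.get(dn)


cluster = Cluster(n_followers=2)
cluster.write("uid=alice,ou=people", {"mail": "alice@example.com"})
for _ in range(3):
    print(cluster.read("uid=alice,ou=people"))
```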
  49. Initial Performance Tests (LDAP Search) • HDAP (VDS + Lucene), 10M entries • 1 node: 30k/sec; 2 nodes: 65k/sec; 3 nodes: 95k/sec; 4 nodes: 130k/sec; 5 nodes: 149k/sec • Google daily average load: 3 million q/minute, or 50,000 q/sec (chart: searches per second versus number of nodes, 1 to 5, with two unlabeled series)
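The throughput figures on slide 49 are easier to read as speedup over one node and per-node efficiency, where 1.0 would mean perfectly linear scaling. The short calculation below uses only the numbers on the slide.

```python
# LDAP searches per second from the slide (HDAP, 10M entries), keyed by node count.
throughput = {1: 30_000, 2: 65_000, 3: 95_000, 4: 130_000, 5: 149_000}

baseline = throughput[1]
for nodes, qps in throughput.items():
    speedup = qps / baseline       # relative to a single node
    efficiency = speedup / nodes   # 1.0 would be perfectly linear scaling
    print(f"{nodes} node(s): {qps:>7,} q/s  speedup {speedup:.2f}x  efficiency {efficiency:.2f}")

# The slide's comparison point: 3 million queries per minute, expressed per second.
google_qps = 3_000_000 / 60
print(f"Google daily average: {google_qps:,.0f} q/s")
```

On these figures, speedup is 4.97x at five nodes (efficiency 0.99). Efficiency sits slightly above 1.0 at two to four nodes, which the slide does not explain. Two nodes (65k/sec) already exceed the 50,000 q/sec comparison figure.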
  50. The Architecture of the RadiantOne Federated Identity Service • Acting as an abstraction layer between applications and the underlying identity silos, virtualization isolates applications from the complexity of backends. (diagram labels: Aggregation, Correlation, Integration, Virtualization by model; Populations A, B, and C; Groups; Roles; backends: LDAP, SQL, Web Services/SOA; Apps A to F; Contexts; Services: REST)
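Slide 50 describes virtualization as an abstraction layer that isolates applications from the backends. The sketch below shows the general pattern, with hypothetical adapters (`LdapSource`, `SqlSource`) behind one small interface and a single entry point that applications call without knowing which backend holds which attribute. The adapters work on data already fetched; they do not talk to a real LDAP server or database. The class names, method names, and "earlier source wins" rule are assumptions, not the RadiantOne API.

```python
from typing import Protocol


class IdentitySource(Protocol):
    """What the virtualization layer needs from any backend."""

    def lookup(self, uid: str) -> dict: ...


class LdapSource:
    """Hypothetical adapter over entries already read from an LDAP directory."""

    def __init__(self, entries: dict):
        self.entries = entries  # DN -> attributes

    def lookup(self, uid: str) -> dict:
        return self.entries.get(f"uid={uid},ou=people,dc=example,dc=com", {})


class SqlSource:
    """Hypothetical adapter over rows already read from a SQL table."""

    def __init__(self, rows: list):
        self.rows = rows

    def lookup(self, uid: str) -> dict:
        row = next((r for r in self.rows if r["login"] == uid), None)
        return {k: v for k, v in row.items() if k != "login"} if row else {}


class VirtualDirectory:
    """The single entry point applications see; the backends stay hidden behind it."""

    def __init__(self, sources: list[IdentitySource]):
        self.sources = sources

    def get_user(self, uid: str) -> dict:
        merged = {}
        for source in self.sources:
            for attr, value in source.lookup(uid).items():
                merged.setdefault(attr, value)  # earlier sources win on conflicts
        return merged


directory = VirtualDirectory([
    LdapSource({"uid=alice,ou=people,dc=example,dc=com": {"mail": "alice@example.com"}}),
    SqlSource([{"login": "alice", "department": "R&D"}]),
])
print(directory.get_user("alice"))
```

Adding or replacing a backend means writing one more adapter. The applications calling `get_user` do not change.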
  52. HDAP Uses a Key/Value System Based on Search Technology: Inverted Tree • Everything is automatically indexed in HDAP, so you can search the directory the same way you search Google. • An inverted tree is not necessarily balanced; some paths can be very shallow while others are very deep. (diagram: Inverted Tree)
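This sketch reads "inverted tree" as the inverted index that Lucene is built on. The sketch maps every term in every attribute value to the set of entries containing it, so search needs no per-attribute index to be declared up front. It answers a Google-style query as an AND over terms. The tokenizer is deliberately simplistic, and the class and its methods are hypothetical, not Lucene's API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real analyzer."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to the set of entry DNs whose attribute values contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, dn, attrs):
        # Index every value of every attribute: nothing is configured per attribute.
        for values in attrs.values():
            for value in values:
                for term in tokenize(value):
                    self.postings[term].add(dn)

    def search(self, query):
        # AND semantics: return the entries that contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("uid=ndavolio,ou=sales,dc=example,dc=com",
        {"cn": ["Nancy Davolio"], "title": ["Sales Representative"]})
idx.add("uid=jdoe,ou=sales,dc=example,dc=com",
        {"cn": ["Jane Doe"], "title": ["Sales Manager"]})
print(idx.search("nancy sales"))
```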
