Evaluating Role Mining Algorithms
SACMAT’09, June 3 - 5, 2009, Stresa, Italy.

Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang
@ CERIAS Research Center Department of Computer Science, Purdue University
Jorge Lobo @ IBM T.J. Watson Research Center
Presentation by Onur Yılmaz - onur@onuryilmaz.me
Outline
- Introduction
- Overview
- Role Mining Algorithms
- Evaluation Results
- Analysis
- Conclusion
- Future Work
Introduction

- Aim of the study
  - Comprehensive study to compare role mining algorithms
- What is presented?
  - Two new methods for generating datasets
  - Analysis of nine role mining algorithms
Introduction
Role Mining
- Using data mining techniques to discover roles from existing system configuration data
Overview

- Three key points:
  - Output of a role mining algorithm
  - Criteria to compare outputs of algorithms
  - Input datasets
Overview
Output of Role Mining Algorithm

Existing algorithms, grouped by their outputs:
- Class 1: Outputting prioritized roles
- Class 2: Outputting RBAC states
Overview
Output of Role Mining Algorithm
Class 1: Outputting prioritized roles
- Prioritized list of candidate roles, each of which is a set of permissions
- CompleteMiner and FastMiner

Candidate role generation (a set of candidate roles from the user-permission assignment data) -> Candidate role prioritization
Overview
Output of Role Mining Algorithm
Class 2: Outputting RBAC states
ρ = <User, Permission, UP>

RBAC state γ = <Roles, UserRoleAss, RolePermissionAss, RoleHierarchy, DirectUserPermissionAss>
Overview
Output of Role Mining Algorithm
Class 2: Outputting RBAC states
- Minimize some cost measure while finding the output RBAC state
  - e.g., number of roles, number of user assignments, etc.
Overview
Output of Role Mining Algorithm
Class 2: Outputting RBAC states
Weighted Structural Complexity (WSC)
Sums up the number of relationships in an RBAC state, with possibly different weights for each relationship.
Overview
Output of Role Mining Algorithm
Class 2: Outputting RBAC states
Weighted Structural Complexity (WSC)
Given a weight vector W = <w_r, w_u, w_p, w_h, w_d>:

wsc(γ, W) = w_r * |R| + w_u * |UA| + w_p * |PA| + w_h * |transitive_reduce(RH)| + w_d * |DUPA|

where R = Roles, UA = UserRoleAss, PA = RolePermissionAss, RH = RoleHierarchy, DUPA = DirectUserPermissionAss
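The WSC formula can be tried out on a small hand-made RBAC state. The following is only a minimal sketch: the dictionary layout of the state, the function names, and the example roles and users are assumptions made for illustration, and the role hierarchy is assumed to be acyclic.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_reduce(edges: Set[Edge]) -> Set[Edge]:
    """Drop every edge (a, c) that is implied by a longer path a -> ... -> c.

    Assumes the hierarchy is a directed acyclic graph.
    """
    children: Dict[str, Set[str]] = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)

    def reachable(start: str) -> Set[str]:
        # All nodes reachable from start by one or more edges.
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in children.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced: Set[Edge] = set()
    for a, c in edges:
        # (a, c) is redundant if another child b of a can still reach c.
        redundant = any(c in reachable(b) for b in children[a] if b != c)
        if not redundant:
            reduced.add((a, c))
    return reduced


def wsc(state: Dict[str, set], weights: Tuple[float, float, float, float, float]) -> float:
    """Weighted structural complexity of an RBAC state."""
    w_r, w_u, w_p, w_h, w_d = weights
    return (
        w_r * len(state["R"])
        + w_u * len(state["UA"])
        + w_p * len(state["PA"])
        + w_h * len(transitive_reduce(state["RH"]))
        + w_d * len(state["DUPA"])
    )


# Hypothetical example state (all names are made up).
example_state = {
    "R": {"r1", "r2", "r3"},
    "UA": {("alice", "r1"), ("bob", "r3")},
    "PA": {("r1", "read"), ("r2", "write"), ("r3", "admin")},
    "RH": {("r3", "r2"), ("r2", "r1"), ("r3", "r1")},  # (r3, r1) is implied
    "DUPA": {("carol", "audit")},
}

all_ones = wsc(example_state, (1, 1, 1, 1, 1))
roles_only = wsc(example_state, (1, 0, 0, 0, 0))
```

Setting all weights except w_r to zero turns the measure into a plain role count, which is one way to read "different weight vectors encode different mining objectives".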
Overview
Output of Role Mining Algorithm
Class 2: Outputting RBAC states
Weighted Structural Complexity (WSC)
- Different weight vectors encode different mining objectives and minimization goals
- HierarchicalMiner takes both a configuration ρ and a weight vector and aims at outputting an RBAC state with low WSC.
- Graph Optimization minimizes the number of edges
Overview
Output of Role Mining Algorithm
Class 1 vs Class 2 Algorithms
- RBAC states are easy to compare
- A list of candidate roles can be more useful in practice
  - An administrator examines the role mining results and determines whether to adopt some part of them.
  - In practice, what matters is whether role mining algorithms can suggest the best candidate roles.
Overview
Metrics for Comparing Algorithms

Two metrics:
- Complexity of the RBAC state
- Quality of roles
Overview
Metrics for Comparing Algorithms
Complexity of the RBAC state
Using WSC, measure how well each algorithm performs under a variety of mining objectives
Overview
Metrics for Comparing Algorithms
Quality of Roles
- For each weight vector W, evaluate the complexity of the optimal RBAC state using only the top k roles.
- Among the top k roles, how quickly do the mined roles cover the UP relation?
- Among the top k roles, how well do they «resemble» the original roles?
Overview
Input Data Type
Access Control Configuration
ρ = <User, Permission, UserPermissionRelation >
Overview
Input Data Type
Datasets from literature
Overview
Input Data Type
- Generated Datasets
  - Random Data Generator
  - Tree-Based Data Generator
  - ERBAC Data Generator
Overview
Input Data Type
Random Data Generator
Parameters: Number of Users, Number of Roles, Number of Permissions, Maximum Number of Roles for Users, Maximum Number of Permissions for Role
[Diagram: Users are linked to Roles and Roles to Permissions, yielding the User-Permission Assignment]
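A generator with these five parameters could look roughly like the sketch below. The slide does not specify the sampling details, so the uniform draws, the lower bound of one role per user and one permission per role, and every name in the code are assumptions.

```python
import random
from typing import Dict, Set, Tuple


def generate_random_config(
    n_users: int,
    n_roles: int,
    n_perms: int,
    max_roles_per_user: int,
    max_perms_per_role: int,
    seed: int = 0,
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (roles, user-permission assignment) for a random configuration."""
    rng = random.Random(seed)
    perms = [f"p{i}" for i in range(n_perms)]

    # Each role receives a random, non-empty set of permissions (assumption).
    roles: Dict[str, Set[str]] = {}
    for j in range(n_roles):
        size = rng.randint(1, min(max_perms_per_role, n_perms))
        roles[f"r{j}"] = set(rng.sample(perms, size))

    # Each user receives a random, non-empty set of roles (assumption) and
    # inherits the union of their permissions.
    role_names = sorted(roles)
    up: Dict[str, Set[str]] = {}
    for i in range(n_users):
        count = rng.randint(1, min(max_roles_per_user, n_roles))
        chosen = rng.sample(role_names, count)
        up[f"u{i}"] = set().union(*(roles[r] for r in chosen))
    return roles, up


original_roles, user_perms = generate_random_config(20, 5, 30, 3, 6)
```

Keeping the generated roles alongside the user-permission assignment makes it possible to check later how closely mined roles resemble the ones used for generation.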
Overview
Input Data Type
Tree-Based Data Generator
Steps: Randomly generate a tree -> Assign permissions to nodes in the tree -> Assign users to leaf nodes
Parameters: Number of Users, Number of Permissions, Height of Tree, Upper bound on number of children per node, Lower bound on number of children per node
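The three steps can be sketched as follows. This is an illustration under assumptions, not the generator from the study: permissions are placed on nodes uniformly at random, and a user assigned to a leaf is assumed to receive the permissions of every node on the path from the root to that leaf. All names are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

Node = Tuple[int, ...]  # a node is identified by its path from the root


def generate_tree_config(
    n_users: int,
    n_perms: int,
    height: int,
    min_children: int,
    max_children: int,
    seed: int = 0,
) -> Tuple[Dict[Node, Set[str]], Dict[str, Set[str]]]:
    """Return (permissions per tree node, user-permission assignment)."""
    rng = random.Random(seed)

    # Step 1: randomly generate a tree, level by level (min_children >= 1).
    levels = [[()]]
    for _ in range(height):
        next_level = []
        for node in levels[-1]:
            for child in range(rng.randint(min_children, max_children)):
                next_level.append(node + (child,))
        levels.append(next_level)
    nodes = [node for level in levels for node in level]
    leaves = levels[-1]

    # Step 2: assign each permission to one randomly chosen node (assumption).
    node_perms: Dict[Node, Set[str]] = {node: set() for node in nodes}
    for i in range(n_perms):
        node_perms[rng.choice(nodes)].add(f"p{i}")

    # Step 3: assign users to leaves; a user gets the permissions found on
    # the root-to-leaf path (assumption).
    up: Dict[str, Set[str]] = {}
    for i in range(n_users):
        leaf = rng.choice(leaves)
        perms: Set[str] = set()
        for depth in range(len(leaf) + 1):
            perms |= node_perms[leaf[:depth]]
        up[f"u{i}"] = perms
    return node_perms, up


tree_perms, tree_up = generate_tree_config(10, 12, 2, 2, 3)
```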
Overview
Input Data Type
ERBAC Data Generator
Parameters: Number of Users, Number of Business Roles, Number of Functional Roles, Number of Permissions, Maximum # of Business Roles, Maximum # of Functional Roles, Maximum # of Permissions
[Diagram with layers: Users, Business Roles, Functional Roles, Permissions]
Role Mining Algorithms

Class 1                  Class 2
CompleteMiner (CM)       ORCA
FastMiner (FM)           Graph Optimization (GO)
DynamicMiner (DM)        HP Role Minimization (HPr)
PairCount (PC)           HP Edge Minimization (HPe)
                         HierarchicalMiner (HM)
Role Mining Algorithms
CompleteMiner (CM)

Initial set of roles (from user permission sets) -> All possible intersections -> Prioritization of roles (based on number of exact matches) -> Candidate roles

Exponential time
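The CompleteMiner steps can be illustrated with a small sketch. It closes the set of distinct user permission sets under intersection and then ranks candidates by the number of users whose permission set matches a candidate exactly, as the slide states. The tie-breaking rule and all names are assumptions, and the published algorithm may prioritize differently in detail.

```python
from typing import Dict, FrozenSet, List, Set


def complete_miner(up: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Return candidate roles, highest priority first."""
    # Initial roles: the distinct, non-empty permission sets of the users.
    initial = {frozenset(ps) for ps in up.values() if ps}

    # Close the initial set under intersection. Repeatedly intersecting new
    # candidates with the initial roles yields every possible intersection.
    candidates = set(initial)
    frontier = set(initial)
    while frontier:
        new = set()
        for a in frontier:
            for b in initial:
                common = a & b
                if common and common not in candidates:
                    new.add(common)
        candidates |= new
        frontier = new

    def exact_matches(role: FrozenSet[str]) -> int:
        return sum(1 for ps in up.values() if frozenset(ps) == role)

    # Ties are broken by role size and then alphabetically (assumption).
    return sorted(candidates, key=lambda r: (-exact_matches(r), -len(r), sorted(r)))


# Hypothetical example configuration.
example_up = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"a", "b", "c"},
    "u4": {"c", "d", "e"},
}
ranked = complete_miner(example_up)
```

The closure step is what makes the running time exponential in the worst case, since the number of distinct intersections can grow exponentially with the number of initial roles.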
Role Mining Algorithms
FastMiner (FM)

Initial set of roles (from user permission sets) -> Only intersections between pairs of initial roles -> Prioritization of roles -> Candidate roles

O(n^2 m)
n: users, m: permissions
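The difference from CompleteMiner is that only pairwise intersections are taken. A minimal sketch of the candidate generation step follows; the prioritization step is left out because the slide does not say how FastMiner ranks candidates, and the function and variable names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def fast_miner_candidates(up: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Candidate roles: initial roles plus intersections of pairs of them."""
    initial = {frozenset(ps) for ps in up.values() if ps}
    candidates = set(initial)
    for a, b in combinations(initial, 2):
        common = a & b
        if common:
            candidates.add(common)
    return candidates


# Hypothetical example: {p3} is shared by all three users but is not the
# intersection of any single pair, so pairwise intersection misses it.
example_up = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},
    "u3": {"p1", "p3", "p4"},
}
candidates = fast_miner_candidates(example_up)
```

With n users there are at most n(n-1)/2 pairs, and each intersection touches at most m permissions, which matches the O(n^2 m) bound on the slide.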
Role Mining Algorithms
DynamicMiner (DM)

- CM and FM -> static prioritization (does not consider candidate roles that have already been chosen)

Initial set of roles (from user permission sets) -> All possible intersections -> Prioritization of roles -> Candidate roles with the highest priority first

O(n * |C| * min{n, m})
C: Set of candidate roles
Role Mining Algorithms
PairCount (PC)

- Newly proposed method
- CM -> Prioritization based on the number of exact matches
- In reality, multiple roles are assigned to a user
- Pair Count: the number of pairs of distinct users whose shared permissions are exactly the role's permission set P

PC(P) = |{ (u_i, u_j) | u_i ≠ u_j ∧ P(u_i) ∩ P(u_j) = P }|

O(n^2 m)
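The pair count can be computed in one pass over all pairs of users. In this sketch each unordered pair is counted once; counting ordered pairs, as a literal reading of the formula would, simply doubles every value. The names and the example data are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def pair_counts(up: Dict[str, Set[str]]) -> Counter:
    """Map each permission set P to the number of user pairs sharing exactly P."""
    counts: Counter = Counter()
    for (_, perms_a), (_, perms_b) in combinations(up.items(), 2):
        shared = frozenset(perms_a) & frozenset(perms_b)
        if shared:
            counts[shared] += 1
    return counts


# Hypothetical example configuration.
example_up = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"a", "b", "c"},
    "u4": {"c", "d", "e"},
}
counts = pair_counts(example_up)
# Candidate roles ordered by descending pair count.
by_pair_count = [role for role, _ in counts.most_common()]
```

Note that {b, c} is no user's exact permission set in this example, so it has zero exact matches, yet two pairs of users share exactly those permissions.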
Role Mining Algorithms
PairCount (PC)

Initial set of roles (from user permission sets) -> All possible intersections -> Prioritization of roles (based on Pair Counts) -> Candidate roles

O(n^2 m)
Role Mining Algorithms
ORCA

- Hierarchical clustering on permissions

Set of clusters of permissions -> Find the pair of clusters for which the number of users having both clusters' permissions is the largest -> Continue until one cluster remains or no user has the permissions in two clusters

O(m^2 n)
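The clustering loop can be sketched as below. It assumes that the selected pair of clusters is merged at each step, which is what hierarchical clustering implies, and it breaks ties by iteration order, whereas a real implementation may break them differently. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def orca_merges(up: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Return the merged permission clusters in the order they are created."""
    all_perms = set().union(*up.values())
    clusters = [frozenset({p}) for p in sorted(all_perms)]
    merges: List[FrozenSet[str]] = []

    def support(cluster: FrozenSet[str]) -> int:
        # Number of users holding every permission in the cluster.
        return sum(1 for ps in up.values() if cluster <= ps)

    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: support(pair[0] | pair[1]))
        merged = best[0] | best[1]
        if support(merged) == 0:
            break  # no user has the permissions of any two clusters
        clusters = [c for c in clusters if c not in best] + [merged]
        merges.append(merged)
    return merges


# Hypothetical example configuration.
example_up = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"a", "b", "c"},
    "u4": {"c", "d", "e"},
}
merges = orca_merges(example_up)
```

Because each permission ends up in exactly one branch of the hierarchy, roles with overlapping permissions are hard to represent with this approach.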
Role Mining Algorithms
HP Role Minimization (HPr)

- Minimal set of roles to cover the user-permission assignment relation

Select the next user u with the fewest uncovered permissions -> Find the pair <U(u), P(u)> -> This pair forms a «role» -> All user-permission assignments between U(u) and P(u) are removed

P(u): Permissions of user u
U(u): All users that have all the permissions of u

O(nm)
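The greedy loop can be sketched as follows. Two details are assumptions here: P(u) is taken to be the still-uncovered permissions of the selected user, and U(u) is computed against the original assignment rather than the remaining one. The names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Role = Tuple[FrozenSet[str], FrozenSet[str]]  # (users, permissions)


def hp_role_minimization(up: Dict[str, Set[str]]) -> List[Role]:
    """Greedily pick <U(u), P(u)> pairs until every assignment is covered."""
    remaining = {u: set(ps) for u, ps in up.items()}
    roles: List[Role] = []
    while any(remaining.values()):
        # User with the fewest uncovered permissions; ties broken by name.
        u = min((x for x in remaining if remaining[x]),
                key=lambda x: (len(remaining[x]), x))
        perms = frozenset(remaining[u])
        users = frozenset(x for x, ps in up.items() if perms <= ps)
        roles.append((users, perms))
        # Remove the assignments between U(u) and P(u).
        for x in users:
            remaining[x] -= perms
    return roles


# Hypothetical example configuration.
example_up = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"a", "b", "c"},
    "u4": {"c", "d", "e"},
}
mined = hp_role_minimization(example_up)
```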
Role Mining Algorithms
HP Edge Minimization (HPe)

- Finding an RBAC state with a minimal number of edges, called edge concentration
- Similar to the Graph Optimization algorithm, except that it does not create a role hierarchy

HPr -> Greedily improve the objective function (if two roles overlap in their permission or user sets -> restructuring) -> Converge

O(k^2 m)
k: number of iterations
Role Mining Algorithms
HierarchicalMiner (HM)

- Concept: <P, U> such that
  - U contains all the users that have all permissions in P,
  - P contains all the permissions that are shared by all users in U

Reduced family of concepts -> Remove a role if the RBAC state is improved -> Heuristically continue

Removing a role:
- Redistribution of users down the hierarchy
- Permissions up the hierarchy

Similar to Graph Optimization but uses the concept lattice.
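The two conditions in the definition of a concept can be checked directly. This sketch covers only the definition, not HierarchicalMiner itself, and it assumes non-empty user sets; the function names and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def users_with(up: Dict[str, Set[str]], perms: Iterable[str]) -> FrozenSet[str]:
    """All users that have every permission in perms."""
    wanted = frozenset(perms)
    return frozenset(u for u, ps in up.items() if wanted <= ps)


def shared_perms(up: Dict[str, Set[str]], users: Iterable[str]) -> FrozenSet[str]:
    """All permissions shared by every user in users (users must be non-empty)."""
    sets = [set(up[u]) for u in users]
    return frozenset(set.intersection(*sets))


def is_concept(up: Dict[str, Set[str]], perms: Iterable[str], users: Iterable[str]) -> bool:
    """True if <perms, users> satisfies both conditions of the definition."""
    perms, users = frozenset(perms), frozenset(users)
    return users_with(up, perms) == users and shared_perms(up, users) == perms


# Hypothetical example configuration.
example_up = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"a", "b", "c"},
    "u4": {"c", "d", "e"},
}
bc_is_concept = is_concept(example_up, {"b", "c"}, {"u1", "u2", "u3"})
b_is_concept = is_concept(example_up, {"b"}, {"u1", "u2", "u3"})
```

In the example, <{b}, {u1, u2, u3}> fails the second condition because those three users also share permission c.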
Evaluation Results
- For each dataset, each algorithm is ranked according to its ability to optimize the evaluation criteria
  - Ranks run from 1 to N
- Two metrics mentioned before:
  - Comparing Complexity of the RBAC States
  - Comparing Prioritized Role Quality
Evaluation Results
Comparing Complexity of the RBAC States

- Role Minimization
Evaluation Results
Comparing Complexity of the RBAC States

- Edge Concentration

HM has an advantage in this test because its roles are designed for a role hierarchy.
Evaluation Results
Comparing Complexity of the RBAC States

- Allowed Noise at Direct Assignments

The dataset contains errors that should not be covered by roles.
Evaluation Results
Comparing Complexity of the RBAC States

- Discovering Original Roles
  - Similarity of mined roles to original data
  - The metric used is the average maximal Jaccard similarity

HM: The top 40+ roles are more or less the ones generated
PC: Performed the worst, generating roles farthest from the original data
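One plausible reading of "average maximal Jaccard" is sketched below: for each original role, take the best Jaccard similarity against any mined role, then average over the original roles. The direction of the comparison and all names are assumptions, since the slide gives only the name of the metric.

```python
from typing import Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two permission sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_maximal_jaccard(original: List[Set[str]], mined: List[Set[str]]) -> float:
    """Mean over original roles of the best match among the mined roles."""
    return sum(max(jaccard(o, m) for m in mined) for o in original) / len(original)


# Hypothetical example: one role recovered exactly, one only half recovered.
original_roles = [{"a", "b"}, {"c", "d"}]
mined_roles = [{"a", "b"}, {"c"}]
score = average_maximal_jaccard(original_roles, mined_roles)
```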
Evaluation Results
Comparing Prioritized Role Quality

- Quality of WSC over k-roles
Evaluation Results
Comparing Prioritized Role Quality

- Quality of Coverage

How good is the algorithm at quickly covering the UP relation?
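Coverage over the top k roles can be measured with a simple curve. In this sketch a role is given to every user who holds all of its permissions, and coverage is the fraction of user-permission pairs explained so far; that assignment rule and the names are assumptions.

```python
from typing import Dict, Iterable, List, Set


def coverage_curve(up: Dict[str, Set[str]], ranked_roles: List[Iterable[str]]) -> List[float]:
    """Fraction of the UP relation covered after each of the top-k roles."""
    total = sum(len(ps) for ps in up.values())
    covered: Dict[str, Set[str]] = {u: set() for u in up}
    curve: List[float] = []
    for role in ranked_roles:
        role_perms = set(role)
        for u, ps in up.items():
            # A user can take the role only if it grants nothing extra.
            if role_perms <= ps:
                covered[u] |= role_perms
        curve.append(sum(len(c) for c in covered.values()) / total)
    return curve


# Hypothetical example configuration and role ranking.
example_up = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"a", "b", "c"},
    "u4": {"c", "d", "e"},
}
curve = coverage_curve(example_up, [{"b", "c"}, {"a", "b", "c"}, {"c", "d", "e"}])
```

A ranking that puts widely shared roles first makes the curve rise faster for small k.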
Analysis
- Algorithms that minimize the number of roles often generate RBAC states with a larger number of edges, resulting in increased complexity.
- GO generates large role hierarchies when the number of users is greater than the number of permissions.
- DM over-fits some of the roles to cover users and does not consider the entire resulting RBAC state.
- HM is computationally and memory intensive.
Conclusion

- Aim of the study
  - Comprehensive study to compare role mining algorithms
- What is presented?
  - Two new methods for generating datasets
  - Analysis of nine role mining algorithms
Future Work

- Handling data with attribute information
  - In addition to the user-permission data, attribute information may also be available.

- Handling noisy data
  - In some scenarios, the input user-permission data may contain noise.