Evaluating Role Mining Algorithms

SACMAT’09, June 3 - 5, 2009, Stresa, Italy.

Ian Molloy, Ninghui Li, Tiancheng Li, Ziqing Mao, Qihua Wang
@ CERIAS Research Center Department of Computer Science, Purdue University
Jorge Lobo @ IBM T.J. Watson Research Center
Presentation by Onur Yılmaz - onur@onuryilmaz.me

Outline
 Introduction
 Overview
 Role Mining Algorithms
 Evaluation Results
 Analysis
 Conclusion
 Future Work

Introduction

 Aim of the study
 Comprehensive study to compare role mining algorithms
 What is presented?
 Two new methods for generating datasets
 Analysis of nine role mining algorithms

Introduction
Role Mining
 Using data mining techniques to discover roles from existing system configuration data

Overview

 Three key points:
 Output of a role mining algorithm
 Criteria for comparing the outputs of algorithms
 Input datasets

Overview
Output of Role Mining Algorithm

Existing algorithms based on their outputs:
 Class 1: Outputting prioritized roles
 Class 2: Outputting RBAC states

Overview
Class 1: Outputting prioritized roles
 Prioritized list of candidate roles, each of which is a set of permissions
 CompleteMiner and Fast-Miner

Two steps:
 Candidate role generation: derive a set of candidate roles from the user-permission assignment data
 Candidate role prioritization: rank the candidate roles

Overview
Class 2: Outputting RBAC states
Input configuration: ρ = <User, Permission, UP>

Output RBAC state: γ = <Roles, UserRoleAss, RolePermissionAss, RoleHierarchy, DirectUserPermissionAss>

Overview
 Minimize some cost measure, such as the number of roles or the number of user assignments, while finding the output RBAC state

Overview
Weighted Structural Complexity (WSC)
Sums up the number of relationships in an RBAC state, with possibly different weights for each kind of relationship.

Overview
Given a weight vector W = <w_r, w_u, w_p, w_h, w_d>:

wsc(γ, W) = w_r ∗ |R| + w_u ∗ |UA| + w_p ∗ |PA| + w_h ∗ |transitive_reduce(RH)| + w_d ∗ |DUPA|
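The WSC formula can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the names `wsc` and `transitive_reduce` mirror the slide, while the set-of-pairs encoding of UA, PA, RH and DUPA is an assumption, and the reduction assumes the role hierarchy is acyclic.

```python
def transitive_reduce(edges):
    """Transitive reduction of an acyclic role hierarchy.

    `edges` is a set of (senior, junior) pairs. An edge is dropped when the
    junior role is still reachable from the senior role via another path.
    """
    edges = set(edges)
    children = {}
    for senior, junior in edges:
        children.setdefault(senior, set()).add(junior)

    def reachable(src, dst, skip):
        # Depth-first search from src to dst that ignores the edge `skip`.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            for nxt in children.get(node, ()):
                if (node, nxt) == skip or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {e for e in edges if not reachable(e[0], e[1], skip=e)}


def wsc(roles, ua, pa, rh, dupa, weights):
    """Weighted structural complexity of an RBAC state (hypothetical encoding)."""
    w_r, w_u, w_p, w_h, w_d = weights
    return (w_r * len(roles) + w_u * len(ua) + w_p * len(pa)
            + w_h * len(transitive_reduce(rh)) + w_d * len(dupa))
```

With this encoding, a weight vector such as (1, 0, 0, 0, 0) counts only roles, while (0, 1, 1, 1, 1) counts only edges.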

Overview
 Different weight vectors encode different mining objectives and
minimization goals
 HierarchicalMiner takes both a configuration ρ and a weight vector and
aims at outputting an RBAC state with low WSC.

 Graph optimization minimizes the number of edges

Overview
Class 1 vs Class 2 Algorithms
 RBAC states are easy to compare
 List of candidate roles can be more useful in practice

An administrator examines the role mining results and determines whether to adopt some part of them.
In practice, what matters is whether role mining algorithms can suggest the best candidate roles.

Overview
Metrics for Comparing Algorithms

Two metrics:
 Complexity of the RBAC state
 Quality of roles

Overview
Complexity of the RBAC state
Using WSC, how well each algorithm performs
under a variety of mining objectives

Overview
Quality of Roles
 For each weight vector W, evaluate the complexity of the optimal
RBAC state using only the top k roles.

 Among the top k roles, how quickly do the mined roles cover the UP
relation?
 Among the top k roles, how well do they «resemble» the original
roles?

Overview
Input Data Type
Access Control Configuration
ρ = <User, Permission, UserPermissionRelation >

Overview
Input Data Type
Datasets from literature

Overview
Input Data Type
 Generated Datasets
Random Data Generator
Tree-Based Data Generator
ERBAC Data Generator

Overview
Input Data Type
Random Data Generator
Permissions are assigned to roles and roles are assigned to users, which yields the user-permission assignment.

Parameters:
 Number of users
 Number of roles
 Number of permissions
 Maximum number of roles per user
 Maximum number of permissions per role
Overview
Input Data Type
Tree-Based Data Generator
Steps:
 Randomly generate a tree
 Assign permissions to nodes in the tree
 Assign users to leaf nodes

Parameters:
 Number of users
 Number of permissions
 Height of the tree
 Upper bound on the number of child nodes
 Lower bound on the number of child nodes
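The three steps can be sketched as follows. The slide does not say how a user's permissions follow from the tree, so this sketch assumes that a user placed at a leaf receives every permission on the path from that leaf to the root. The function name `generate_tree_config`, the `seed` argument and the uniform random choices are likewise assumptions, and `min_children` must be at least 1 so that every level has nodes.

```python
import random


def generate_tree_config(n_users, n_perms, height,
                         min_children, max_children, seed=0):
    """Return (parent, node_perms, up) for a randomly generated tree."""
    rng = random.Random(seed)

    # Step 1: grow the tree level by level; the root (node 0) has no parent.
    parent = {0: None}
    level, next_id = [0], 1
    for _ in range(height):
        new_level = []
        for node in level:
            for _ in range(rng.randint(min_children, max_children)):
                parent[next_id] = node
                new_level.append(next_id)
                next_id += 1
        level = new_level
    leaves = level

    # Step 2: assign every permission to a randomly chosen node.
    nodes = list(parent)
    node_perms = {node: set() for node in nodes}
    for p in range(n_perms):
        node_perms[rng.choice(nodes)].add(f"p{p}")

    # Step 3: place each user at a random leaf and collect the permissions
    # found on the way up to the root (assumed inheritance rule).
    up = set()
    for u in range(n_users):
        node = rng.choice(leaves)
        while node is not None:
            up.update((f"u{u}", p) for p in node_perms[node])
            node = parent[node]
    return parent, node_perms, up
```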

Overview
Input Data Type
ERBAC Data Generator
Entities in the generated data: Users, Business Roles, Functional Roles, Permissions

Parameters:
 Number of users
 Number of business roles
 Number of functional roles
 Number of permissions
 Maximum number of business roles
 Maximum number of functional roles
 Maximum number of permissions

Role Mining Algorithms

Class 1 (prioritized roles):
 CompleteMiner (CM)
 FastMiner (FM)
 DynamicMiner (DM)
 PairCount (PC)

Class 2 (RBAC states):
 ORCA
 Graph Optimization (GO)
 HP Role Minimization (HPr)
 HP Edge Minimization (HPe)
 HierarchicalMiner (HM)

CompleteMiner (CM)

Steps:
 Build the initial set of roles from the user permission sets
 Compute all possible intersections to obtain the candidate roles
 Prioritize the candidate roles based on the number of exact matches

Complexity: exponential time
FastMiner (FM)

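Steps:
 Build the initial set of roles from the user permission sets
 Compute only the intersections between pairs of initial roles to obtain the candidate roles
 Prioritize the candidate roles

Complexity: O(n^2 m), where n is the number of users and m is the number of permissions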
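The candidate generation step can be sketched as below. The function name `fast_miner` is hypothetical and prioritization is left out; the point is only that restricting the work to pairs of initial roles can miss roles that need three or more sets to intersect.

```python
from itertools import combinations


def fast_miner(user_perms):
    """Return candidate roles built from pairwise intersections only."""
    initial = {frozenset(ps) for ps in user_perms.values() if ps}
    candidates = set(initial)
    for a, b in combinations(initial, 2):
        common = a & b
        if common:
            candidates.add(common)
    return candidates
```

For three users holding {p1, p2, p3}, {p2, p3, p4} and {p1, p3, p4}, the pairwise intersections are {p2, p3}, {p1, p3} and {p3, p4}; the role {p3}, shared by all three, is not produced.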

DynamicMiner (DM)

 CM and FM use static prioritization, which does not consider candidate roles that have already been chosen

Steps:
 Build the initial set of roles from the user permission sets
 Compute all possible intersections
 Prioritize the roles, selecting the candidate role with the highest priority first

Complexity: O(n ∗ |C| ∗ min{n, m}), where C is the set of candidate roles

PairCount (PC)

 Newly proposed method

 CM prioritizes roles by the number of exact matches
 In reality, multiple roles are assigned to a user

 Pair count: the number of pairs of distinct users whose shared permissions are exactly the role P

PC(P) = | { (u_i, u_j) | u_i ≠ u_j ∧ P(u_i) ∩ P(u_j) = P } |

Complexity: O(n^2 m)

PairCount (PC)

Steps:
 Build the initial set of roles from the user permission sets
 Compute all possible intersections to obtain the candidate roles
 Prioritize the candidate roles based on pair counts

Complexity: O(n^2 m)
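The pair count of every permission set can be computed in one pass over all pairs of users, as in the sketch below. The name `pair_counts` is hypothetical, and the sketch counts unordered pairs; counting ordered pairs would double every value without changing the ranking.

```python
from collections import Counter
from itertools import combinations


def pair_counts(user_perms):
    """Map each shared permission set P to the number of user pairs
    whose permission sets intersect in exactly P."""
    counts = Counter()
    for (_, a), (_, b) in combinations(user_perms.items(), 2):
        shared = frozenset(a) & frozenset(b)
        if shared:
            counts[shared] += 1
    return counts
```

Candidate roles can then be ordered with `counts.most_common()`, which lists the permission sets shared by the most user pairs first.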

ORCA

 Hierarchical clustering on permissions

Steps:
 Start from a set of clusters of permissions
 Find the pair of clusters for which the number of users having both sets of permissions is the largest
 Continue until only one cluster remains or no user has permissions in two clusters

Complexity: O(m^2 n)
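The clustering loop can be sketched as follows. This is a simplified reading of the steps, not the ORCA tool itself: the name `orca_sketch` is hypothetical, each permission starts in its own cluster, the best pair is merged into one cluster (the usual move in hierarchical clustering, assumed here), and ties go to the first pair found.

```python
def orca_sketch(user_perms):
    """Return the merged permission clusters in the order they were formed."""
    all_perms = set().union(*user_perms.values())
    clusters = [frozenset([p]) for p in sorted(all_perms)]
    merges = []

    def members(cluster):
        # Users holding every permission in the cluster.
        return {u for u, ps in user_perms.items() if cluster <= set(ps)}

    while len(clusters) > 1:
        best, best_pair = 0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                shared_users = len(members(clusters[i] | clusters[j]))
                if shared_users > best:
                    best, best_pair = shared_users, (i, j)
        if best_pair is None:
            break  # no user has the permissions of any two clusters
        i, j = best_pair
        merged = clusters[i] | clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```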

HP Role Minimization (HPr)

 Minimal set of roles to cover the user-permission assignment
relation

Steps:
 Select the next user u with the fewest uncovered permissions
 Find the pair <U(u), P(u)>; this pair forms a «role»
 Remove all user-permission assignments between U(u) and P(u)

Where:
 P(u): the permissions of user u
 U(u): all users that have all the permissions of u

Complexity: O(nm)
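The greedy loop can be sketched as below. The name `hp_role_minimization` is hypothetical, and two details are assumptions: P(u) is taken to be the still uncovered permissions of u, and U(u) is determined from the users' full permission sets, so an assignment may be covered by more than one role.

```python
def hp_role_minimization(user_perms):
    """Greedily build (users, permissions) roles that cover the UP relation."""
    full = {u: set(ps) for u, ps in user_perms.items()}
    uncovered = {u: set(ps) for u, ps in user_perms.items()}
    roles = []
    while any(uncovered.values()):
        # Next user: fewest uncovered permissions (name breaks ties).
        u = min((x for x in uncovered if uncovered[x]),
                key=lambda x: (len(uncovered[x]), x))
        perms = frozenset(uncovered[u])
        users = frozenset(x for x in full if perms <= full[x])
        roles.append((users, perms))
        # Remove the assignments between U(u) and P(u).
        for x in users:
            uncovered[x] -= perms
    return roles
```

Each iteration fully covers at least the selected user, so the loop ends after at most n roles.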

HP Edge Minimization (HPe)

 Finding an RBAC state with a minimal number of edges, called edge concentration
 Similar to the Graph Optimization algorithm, except that it does not create a role hierarchy

Steps:
 Start from the output of HPr
 Greedily improve the objective function: if two roles overlap in their permission sets or user sets, restructure them
 Repeat until convergence

Complexity: O(k^2 m), where k is the number of iterations

HierarchicalMiner (HM)

 Concept: < P, U > such that
 U contains all the users that have all permissions in P,
 P contains all the permissions that are shared by all users in U

Steps:
 Start from a reduced family of concepts
 Remove a role if doing so improves the RBAC state
 Continue heuristically

Removing a role:
 Users are redistributed down the hierarchy
 Permissions are moved up the hierarchy

Similar to Graph Optimization, but uses the concept lattice.
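The concept definition can be made concrete with a short sketch that enumerates every concept with a non-empty permission set. It covers only the starting point of HM, not the pruning heuristic, and the name `concepts` and the dictionary input format are assumptions.

```python
def concepts(user_perms):
    """Return all (users, permissions) concepts with non-empty permissions."""
    sets = {u: frozenset(ps) for u, ps in user_perms.items()}

    # The permission side of every concept is an intersection of user
    # permission sets, so close those sets under intersection.
    intents = set(sets.values())
    frontier = set(intents)
    while frontier:
        new = {a & b for a in frontier for b in intents} - intents
        intents |= new
        frontier = new

    # The user side is everyone who holds all permissions in P.
    return {
        (frozenset(u for u, ps in sets.items() if p <= ps), p)
        for p in intents if p
    }
```

Each returned pair satisfies both conditions: the users are exactly those holding all of P, and P is exactly what those users share.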

Evaluation Results
 For each dataset, each algorithm is ranked from 1 to N according to its ability to optimize the evaluation criteria
 Two metrics mentioned before:
 Comparing Complexity of the RBAC States
 Comparing Prioritized Role Quality

Evaluation Results
Comparing Complexity of the RBAC States

 Role Minimization

Evaluation Results

 Edge Concentration

HM has an advantage in this test because its roles are designed for a role-hierarchy

Evaluation Results

 Allowed Noise at Direct Assignments

Dataset contains errors that should not be covered by roles.

Evaluation Results

 Discovering Original Roles
 Similarity of mined roles to original data
 The metric used is the average maximal Jaccard similarity

HM: the top 40+ roles are more or less the ones that were generated
PC: performed the worst, generating roles farthest from the original data
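One plausible reading of the average maximal Jaccard metric is sketched below: for each original role, take the best Jaccard similarity achieved by any mined role, then average over the original roles. The direction of the comparison and the function names are assumptions, since the slide names the metric without defining it.

```python
def jaccard(a, b):
    """Jaccard similarity of two permission sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_maximal_jaccard(original_roles, mined_roles):
    """Average, over original roles, of the best match among mined roles."""
    best = [max(jaccard(o, m) for m in mined_roles) for o in original_roles]
    return sum(best) / len(best)
```

A value of 1.0 means every original role was recovered exactly; lower values mean the closest mined roles only partly overlap the original ones.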

Evaluation Results
Comparing Prioritized Role Quality

 Quality of WSC over k-roles

Evaluation Results
Comparing Prioritized Role Quality

 Quality of Coverage

How good is the algorithm at quickly covering the UP relation?
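Coverage as a function of k can be sketched as follows. The definition used here is an assumption: a role may be given to a user only if the user holds all of its permissions, and coverage is the fraction of user-permission pairs explained by the first k roles. The name `coverage_curve` is hypothetical.

```python
def coverage_curve(user_perms, ranked_roles):
    """Return the covered fraction of the UP relation after each role."""
    total = sum(len(ps) for ps in user_perms.values())
    covered = set()
    curve = []
    for role in ranked_roles:
        role = frozenset(role)
        for u, ps in user_perms.items():
            if role <= set(ps):  # assign only where nothing extra is granted
                covered.update((u, p) for p in role)
        curve.append(len(covered) / total)
    return curve
```

A prioritization is better under this measure when its curve rises faster, that is, when fewer top-ranked roles are needed to reach a given fraction.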

Analysis
 Algorithms that minimize the number of roles often generate RBAC states
with a larger number of edges, resulting in increased complexity.
 GO generates large role hierarchies when the number of users is greater than the number of permissions.
 DM is over-fitting some of the roles to cover users, and does not consider
the entire resulting RBAC state.
 HM is computationally and memory intensive.

Conclusion

 Aim of the study
 Comprehensive study to compare role mining algorithms
 What is presented?
 Two new methods for generating datasets
 Analysis of nine role mining algorithms

Future Work

 Handling data with attribute information
 In addition to the user-permission data, attribute
information may also be available.

 Handling noisy data
 In some scenarios, the input user-permission data may contain noise.
