Hi, welcome to this Oracle OpenWorld session on Access at Scale for Hundreds of Millions of Users. My name is Venu Shastri and I am a Senior Product Manager in the Identity Management group with Oracle Fusion Middleware. I will be joined by my colleague, Selva Neelamegam, from the IDM Performance team.
Here is an overview of the agenda for this session. We will start with an overview of the key capabilities of the Oracle Access Management solution before we dive into architecture and deployment. We will learn how the Oracle Access Management solution supports deployment across multiple data centers, which is a typical requirement for large deployments supporting multi-million-user bases. We will also cover some of the best practices to keep in mind for such large deployments.

While this will give you a good idea of how we achieve this kind of scalability, I am sure many of you will also want to know our benchmark figures. We presented the results of our 250-million-user benchmark testing at the last OpenWorld, and our performance engineers have been busy all year optimizing further. My colleague Selva Neelamegam from the IDM Performance team will join me to present our findings and share the latest benchmark figures.

We will take some questions from the audience before we get into the next part of this session: the customer panel. We have an interesting panel today, with real-world customers who will share their experiences and the challenges of their own large deployments. All in all, this is a packed agenda with a lot of exciting material, so let's get started.
Before we dive into the solution itself, we need to ask ourselves: why is scalability so crucial? What is driving the demand? In other words, why are we having this session here today?

One of the most common use cases is providing access and SSO for large enterprises with a global workforce: hundreds of thousands of employees, contractors, and partners. Access management is a critical piece of their infrastructure, ensuring that their information is secure and that access policies are applied uniformly across the enterprise.

Over the last decade or so, massive internet deployments providing online services or e-commerce have become commonplace. These typically have a multi-million user base and need to be up 24x7, and they need to authenticate users accessing their sites to provide relevant services. In almost all these cases, access is absolutely mission critical. Authenticating and identifying the user is almost always the first, critical step, and any downtime of the access piece means loss of service or loss of business.

Adding to this demand is what we call the Device Multiplier Effect. Smartphones and tablets are ubiquitous, and corporate as well as consumer users assume that they will be able to access through these smart devices the same resources and services they could through their desktops. Where you had a million desktop requests earlier, you now have to add a million smartphone requests, a million tablet requests, and so on. All of this hits the same access infrastructure, increasing the load and the scalability requirements.

Finally, the explosion of social media through Facebook, Google, Twitter, and the like adds another dimension to the demand. It not only adds traffic but also creates the desire to tie a user's social identity to their identity on a particular site or service and provide a seamless single sign-on experience.
To cater to these ever-increasing demands on the access infrastructure, we here at Oracle created Oracle Access Management 11g. We believe this is one of the most comprehensive and scalable solutions in the market today. It goes beyond the usual point solutions in the market that address one specific access management requirement: we have taken a platform approach, so customers can be confident of meeting not just their current but also their future access infrastructure needs.
While designing the 11g Access solution, we kept large extranets in mind from a scaling perspective. Beyond this internet-level scalability, several other features introduced in 11g, including mobile security, seamless integration with social identities (like Facebook or Google), powerful fraud prevention capabilities, and lightweight user management via XE, make 11g Access the platform of choice for building the next-generation extranet.
From an architecture perspective, as some of you may know, the Oracle Access Management 11gR2 server infrastructure is built as a 100% Java solution, allowing us to leverage the scalability features of the Java platform. Out of the box it is deployed on an Oracle WebLogic cluster, which not only simplifies installation but also lets us take advantage of the clustering and scalability features of the underlying WebLogic platform. Oracle Coherence provides the high-performance distributed cache that keeps all the nodes of a cluster in sync.

We achieve horizontal scalability within a single data center by adding nodes to the cluster. This balances the load across multiple nodes and provides failover if one or more nodes go down. As the deployment size increases and we need to cater to a global user base, we need to scale beyond a single data center. Oracle Access Management supports a robust multi-data-center deployment model, allowing user sessions to transfer seamlessly from one DC to another.

Finally, I should mention that we have tuned and benchmarked this on the Oracle Exa platform and will be sharing the numbers with you later in this session.
This is a quick 101 on the deployment of the Access Management platform. We start with the Access Management cluster, where the Access Management admin server runs on top of the WebLogic admin server. This provides the console and acts as the Policy Administration Point, or PAP, to use the industry terminology. The AM runtime servers run on the managed nodes of the cluster and provide the Policy Decision Point for evaluating policies and returning access decisions: Allow or Deny.

The administrator creates and manages policies using the console, and these get stored in the policy store, an Oracle database. When end users try to access a resource, their requests are intercepted by the WebGates, which act as the Policy Enforcement Point, or PEP. These WebGates interact with the AM runtime servers. If the user is not yet authenticated, the AM server authenticates the user against the user store and, once successful, establishes a session for that user. It then reads the policies for the particular resource from the policy store and, based on the outcome of the policy evaluation, either allows or denies access to the resource. The entire transaction is recorded in the audit logs.
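To make the flow above concrete, here is a minimal, purely illustrative sketch of the PEP/PDP interaction: authenticate against a user store, establish a session, then evaluate a stored policy for a resource. All names (the dictionaries, `authenticate`, `is_access_allowed`) are hypothetical stand-ins, not actual Oracle Access Management APIs.

```python
# Hypothetical sketch of the PAP/PDP/PEP flow; not the real OAM API.

POLICY_STORE = {                       # policies created via the console (PAP)
    "/hr/payroll": {"allowed_roles": {"hr-admin"}},
    "/portal/home": {"allowed_roles": {"employee", "hr-admin"}},
}

SESSIONS = {}                          # session_id -> authenticated user record

def authenticate(user_store, username, password):
    """Check credentials against the user store and establish a session."""
    record = user_store.get(username)
    if record and record["password"] == password:
        session_id = f"sess-{username}"
        SESSIONS[session_id] = {"user": username, "roles": record["roles"]}
        return session_id
    return None

def is_access_allowed(session_id, resource):
    """Policy Decision Point: evaluate the stored policy -> Allow/Deny."""
    session = SESSIONS.get(session_id)
    if session is None:
        return False                   # unauthenticated: the PEP would redirect to login
    policy = POLICY_STORE.get(resource)
    if policy is None:
        return False                   # no policy: deny by default in this sketch
    return bool(session["roles"] & policy["allowed_roles"])

user_store = {"alice": {"password": "s3cret", "roles": {"employee"}}}
sid = authenticate(user_store, "alice", "s3cret")
print(is_access_allowed(sid, "/portal/home"))   # True
print(is_access_allowed(sid, "/hr/payroll"))    # False
```

In the real product the WebGate plays the enforcement role in front of the web server, and the decision is made by the AM runtime servers; the sketch only shows the logical division of responsibilities.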
Now, what happens when we add mobile clients to the mix? The server infrastructure remains the same: the Access Management cluster, the policy store, the user store, and so on. But we add mobile clients that interact with the server infrastructure via the Mobile SDK. This ensures that the same set of policies gets applied however you access the resource. It also means that the load of client requests on the server infrastructure increases, so your server infrastructure must scale up to cater to all these requests.
We scale this up within the data center by adding nodes to the cluster. All the nodes read from the same policy store and authenticate against the same user store. The policy enforcement points, or clients, whether webgates on your web servers, SDKs, or custom access clients, are spread across the enterprise. These have specific nodes of the cluster configured as their primary servers, and other nodes as secondary servers. They can be configured in different permutations based on expected load and application characteristics: the load is spread across multiple nodes, and if one or more nodes go down, your server infrastructure continues to function.
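The primary/secondary failover behavior described above can be sketched in a few lines. This is an assumption-laden illustration of the general pattern (try primaries in order, then secondaries), not the WebGate's actual connection logic or configuration schema.

```python
# Illustrative primary/secondary server selection by an access client.
# Server names and the health-check callback are hypothetical.

def pick_server(primary_servers, secondary_servers, is_up):
    """Return the first reachable primary; fall back to secondaries."""
    for server in primary_servers + secondary_servers:
        if is_up(server):
            return server
    raise RuntimeError("no Access Manager server reachable")

status = {"oam1": True, "oam2": False, "oam3": True}
# oam2 is this client's primary but is down, so it fails over to oam1.
print(pick_server(["oam2"], ["oam1", "oam3"], lambda s: status[s]))  # oam1
```

Spreading different clients' primary assignments across different nodes is what balances the load in normal operation.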
Customers can choose to deploy load balancers between the Access Manager components to simplify configuration by using virtual host names. However, there are certain constraints to keep in mind when you add a load balancer for managing OAP traffic:
- OAP connections are persistent and need to be kept open for a configurable duration, even while idle.
- WebGates need to be configured to recycle their connections proactively, before the load balancer terminates them.
- The load balancer should distribute the OAP connections uniformly across the active Access Manager servers.
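The second constraint is essentially a timing relationship: the agent's connection lifetime must end comfortably before the load balancer's idle timeout would silently drop the socket. A small sanity-check sketch, with illustrative parameter names rather than the actual WebGate or load-balancer setting names:

```python
# Sketch of the OAP timing constraint: agents must recycle persistent
# connections before the load balancer's idle timeout kills them.
# Parameter names are hypothetical, not real configuration keys.

def oap_timeouts_safe(agent_recycle_after_s, lb_idle_timeout_s, margin_s=30):
    """True if the agent recycles well before the LB would drop the socket."""
    return agent_recycle_after_s + margin_s <= lb_idle_timeout_s

print(oap_timeouts_safe(agent_recycle_after_s=3600, lb_idle_timeout_s=7200))  # True
print(oap_timeouts_safe(agent_recycle_after_s=7200, lb_idle_timeout_s=7200))  # False
```

If this invariant is violated, the WebGate can end up writing to a connection the load balancer has already torn down, which shows up as intermittent OAP errors.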
Coherence is the high-performance in-memory distributed caching layer, and it is seamlessly integrated with the solution; administrators do not have to configure or tweak Coherence. It keeps session data in sync across nodes, so a user can be transferred seamlessly and transparently from one node to another during their session.
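A toy stand-in shows why a shared session cache enables this transparency: every node reads from the same logical session map, so a request can land on any node mid-session. In this sketch a plain dictionary plays the role that Coherence's distributed cache plays in the real deployment; the class and names are illustrative only.

```python
# Toy stand-in for a distributed session cache: a shared map that
# every cluster node consults. In OAM this role is played by Coherence.

shared_sessions = {}   # logically one map, visible from every node

class Node:
    def __init__(self, name):
        self.name = name

    def handle(self, session_id):
        session = shared_sessions.get(session_id)
        return f"{self.name}: {'ok' if session else 'no session'}"

shared_sessions["s1"] = {"user": "alice"}
print(Node("node-a").handle("s1"))   # node-a: ok
print(Node("node-b").handle("s1"))   # node-b: ok  (same session, different node)
```

The point is that neither node "owns" the session; failover or load-balancer re-routing does not force a re-login.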
Here are some high-level points about the multi-data-center (MDC) deployment model that we support for Access Management. We support all three models: active-active, active-passive, and active-hot standby. The idea is to enable seamless user SSO as a user is transferred from one data center to the other, and to ensure that their session can continue without interruption.

It is important to note that for MDC, the WebLogic domain does not span data centers. Rather, we recommend separate but identical clusters in each data center. In fact, we recommend a master-clone configuration, where policy and configuration changes are made at only one data center, designated as the master, and are synchronized to the other, clone, data centers.

Administrators can also configure the session adoption policy to control the behavior when a user is transferred from one data center to the other and their session is adopted: whether the user should be forced to re-authenticate, whether the session in the previous data center should be invalidated, and whether the session data from the previous data center should be retrieved.
Let's see how this plays out at run time. Consider an MDC deployment with two data centers, where the one in New York is the master and changes are synchronized to the clone data center in London. During normal operations, User 1 from the US is routed by the global load balancer to the NYDC, due to geographical proximity, and the LDC is a standby for this user. Similarly, for User 2 in Europe, the LDC is the active DC and the NYDC is the standby. Both DCs are active at the same time, catering to different sets of users. The user's OAM ID cookie keeps track of which cluster the user has a session in.
Now, if the NYDC gets overloaded or goes completely down, User 1 is transferred to the LDC by the GLB. Based on the session adoption policy, the user is either challenged to re-authenticate or let through, and the LDC creates a new session for the user. The LDC cluster also makes back-channel OAP calls to the NYDC cluster to retrieve the remote session details and invalidate the remote session. Only in the case where the NYDC is completely down and inaccessible would the user potentially lose their session data, though the deployment itself would still continue to be operational.
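The session adoption decision just described can be sketched as a small function: consult the policy, optionally fetch the remote session over the back channel, and degrade gracefully if the remote DC is unreachable. The policy flags and function names here are hypothetical, not actual OAM settings.

```python
# Hedged sketch of session adoption when a user fails over to another DC.
# Policy keys ("force_reauth", etc.) are illustrative, not real settings.

def adopt_session(adoption_policy, fetch_remote_session):
    if adoption_policy.get("force_reauth"):
        return {"action": "challenge"}       # user must authenticate again
    try:
        remote = fetch_remote_session()      # back-channel OAP call to remote DC
    except ConnectionError:
        remote = None                        # remote DC down: session data lost
    return {"action": "create_local_session", "remote_data": remote}

policy = {"force_reauth": False, "invalidate_remote": True}
result = adopt_session(policy, lambda: {"user": "user1", "attrs": {}})
print(result["action"])   # create_local_session
```

Note the degraded path: even when the remote session data cannot be retrieved, a fresh local session is still created, matching the behavior described above where the deployment stays operational.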
This gives a more detailed picture of the MDC deployment, showing the WebGates as well as the AM cluster in each data center. The HTTP traffic is routed to the appropriate data center, and the WebGates in each DC interact with the AM cluster in the corresponding DC.
Again, customers can choose to configure local as well as global load balancers to route the OAP traffic. So in cases where a data center itself is operational but its AM cluster is completely down, the WebGates in that data center can fail over to the AM cluster in the second data center. These are just some of the high-level scalability use cases for MDC that we are touching on.
The idea is that you can use the MDC deployment to spread your access infrastructure and load across multiple data centers around the world, reducing network latency for geographically spread-out users and ensuring session continuity when users are transferred from one DC to another.
Finally, let's look at some best practices to keep in mind for large deployments. This list is by no means exhaustive; we recommend that you follow the EDG and high availability documentation on OTN.

It is important to model your resources correctly so you are not evaluating policies unnecessarily, for example by using Excluded instead of Anonymous. Caching at the agent level has been improved in the 11g deployment, and we recommend you leverage it to reduce latency. Slow network connections between the web, middleware, and data tiers are often the underlying problem and should be rectified. There are a number of default out-of-the-box settings in the agent as well as the server for connection management, caching, and so on; these should not be used as-is but tuned for your deployment. Following Maximum Availability Architecture patterns is highly recommended. Load balancers have their pros and cons; customers should leverage them where applicable to improve performance and manageability. Finally, a number of metrics as well as detailed diagnostics are available via the Oracle Data Monitoring Service and exposed via Enterprise Manager; administrators should use these proactively to address issues before they escalate.
With this, I hand it over to Selva who will be sharing the benchmark figures.
Overview and Key Capabilities of the solution
Let me take a few minutes to introduce the guests on our panel.
With Fusion Middleware, you can extend and maximize your existing technology investment with the same technologies used in Fusion Applications, including embedded analytics and social collaboration, and mobile and cloud computing. Oracle’s complete SOA platform lets your IT organization rapidly design, assemble, deploy, and manage adaptable business applications and—with Oracle’s business process management tools—even bring the task of modeling business processes directly to the business analysts. Oracle Business Intelligence foundation brings together all your enterprise data sources in a single, easy-to-use solution, delivering consistent insights whether it’s through ad hoc queries and analysis, interactive dashboards, scorecards, OLAP, or reporting. And, your existing enterprise applications can leverage the rich social networking capabilities and content sharing that users have come to expect in consumer software. Oracle Fusion Middleware is based on 100 percent open standards, so you aren’t locked into one deployment model when your business requirements change.