I have tried to summarize my experience from developing a large-scale IdP and to highlight typical problems that companies can face when they build their own IdP platforms.
2. Introduction
● 15 years in IT
● Security experience
● Architectures and management for ecosystems of large internet companies (~500M users)
● Identity Providers (IdP), Antifraud, SRE on a company-wide scale
● I climb rocks in my free time
3. Agenda
1. What is an Identity Provider (IdP)? Features and expectations.
2. First troubles.
3. Reliability and disaster recovery.
4. Battles with standards and complexity.
5. Safety and regulations.
6. Is there nirvana in the “clouds”?
7. Pessimistic conclusion.
6. Identity Provider
Stores and manages digital identities? Manages identities for principals?
● Authentication
○ SSO
○ Social Authentication
○ Federated Id
● Authorization
○ RBAC/ABAC
○ Roles & resources
○ Authorization in the Private Cloud and between microservices
● Profile access and management
○ Federated Id Provider
○ Domain profiles
● Security and privacy
○ Personal information management
○ Fraud prevention
● Support of multiple platforms
○ Mobile SDK
○ Middleware for programming languages and frameworks
● Documentation
○ API flows
7. Imagine a Journey…
From a small IdP implementing basic functions
to an IdP for a global company with millions of customers.
8. First troubles
Where do we get ideas about IdP architecture?
Maybe from an RFC, such as RFC 6749 (OAuth 2.0), section 1.2 “Protocol Flow”: https://datatracker.ietf.org/doc/html/rfc6749#section-1.2
9. First troubles
P (password hash) → SHA3-256(“x”) = 32 bytes
L (login) → 64 bytes
U = len(P) + len(L) = 96 bytes per user
128 GB RAM ≈ 1.3B users
64 GB RAM ≈ 650M users
[Figure: “My Super Application” sign-in form (Login, Password, Sign In) backed by a table with columns Id, Login, Password]
10. First troubles
We must build a distributed system.
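The sizing above can be re-derived with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes and ignores indexes, pointers and any other per-record overhead. The function name `users_per_ram` is made up for illustration.

```python
import hashlib

# P: a SHA3-256 digest is always 32 bytes, whatever the password length.
HASH_BYTES = hashlib.sha3_256(b"x").digest_size
LOGIN_BYTES = 64                          # L: fixed-width login field
RECORD_BYTES = HASH_BYTES + LOGIN_BYTES   # U = 96 bytes per user


def users_per_ram(ram_gb: int, record_bytes: int = RECORD_BYTES) -> int:
    """Upper bound on records that fit into ram_gb decimal gigabytes."""
    return (ram_gb * 10**9) // record_bytes


for ram_gb in (64, 128):
    print(f"{ram_gb} GB RAM -> ~{users_per_ram(ram_gb) / 1e6:.0f}M users")
```

The raw bound comes out near 667M and 1.33B users, which is consistent with the rounded 650M and 1.3B figures on the slide.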
11. CAP/PACELC
What should we sacrifice: C (“consistency”) or A (“availability”)?
Sacrificing “C”:
● Duplicate client IDs;
● Password update desynchronization;
Sacrificing “A”:
● Users can’t log in;
● Users can’t update profiles;
[Figure: CAP triangle with corners Availability, Consistency, Partition tolerance]
12. CAP/PACELC
● Credentials can be decoupled from profiles;
● Profile updates can be delayed or queued (they are not such frequent requests);
● Client balancing can handle some non-critical writes;
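A minimal sketch of the first two ideas, assuming two in-memory stand-ins for real databases. `CredentialStore`, `ProfileStore` and their methods are hypothetical names, not any product's API. Login keeps working from the credential store while profile writes are queued during a profile-store outage and replayed later.

```python
import hmac
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CredentialStore:
    """Stand-in for the strongly consistent credential storage."""
    hashes: dict = field(default_factory=dict)

    def set_hash(self, user_id: str, pw_hash: bytes) -> None:
        self.hashes[user_id] = pw_hash

    def verify(self, user_id: str, pw_hash: bytes) -> bool:
        stored = self.hashes.get(user_id)
        # Constant-time comparison of the stored and presented hashes.
        return stored is not None and hmac.compare_digest(stored, pw_hash)


@dataclass
class ProfileStore:
    """Stand-in for profile storage that tolerates delayed writes."""
    available: bool = True
    profiles: dict = field(default_factory=dict)
    pending: deque = field(default_factory=deque)

    def update(self, user_id: str, **attrs) -> str:
        if not self.available:
            self.pending.append((user_id, attrs))   # keep the write for later
            return "queued"
        self.profiles.setdefault(user_id, {}).update(attrs)
        return "applied"

    def flush(self) -> int:
        """Replay queued writes in order once the store is available again."""
        applied = 0
        while self.available and self.pending:
            user_id, attrs = self.pending.popleft()
            self.profiles.setdefault(user_id, {}).update(attrs)
            applied += 1
        return applied
```

In this sketch a profile outage never blocks `CredentialStore.verify`, which is the point of the decoupling.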
13. PACELC
If everything works fine, do we prefer “latency” or “consistency”?
User attribute   Latency   Consistency
Id               ?         +
Credentials      ?         +
Name             +         -
Picture          +         -
Address          +         ?
14. PACELC
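For the categories marked “+” under Consistency (Id, Credentials), you should have storage that supports strong consistency.
For the categories marked “+” under Latency (Name, Picture, Address), you need to reflect updates faster, but for a global service you can propagate changes in the home region.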
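One way to make this per-attribute choice explicit is a small policy table that the storage layer consults before each write. The sketch below is an assumption about how that could look. `Mode`, `POLICY` and `write_mode` are illustrative names, and the choice for Address is a guess because the slide leaves it open.

```python
from enum import Enum


class Mode(Enum):
    STRONG = "strong"      # synchronous write to strongly consistent storage
    EVENTUAL = "eventual"  # write in the home region, replicate asynchronously


POLICY = {
    "id": Mode.STRONG,
    "credentials": Mode.STRONG,
    "name": Mode.EVENTUAL,
    "picture": Mode.EVENTUAL,
    # The slide marks Address consistency as "?"; eventual is an assumption.
    "address": Mode.EVENTUAL,
}


def write_mode(attribute: str) -> Mode:
    """Unknown attributes fall back to the safer (slower) strong mode."""
    return POLICY.get(attribute.lower(), Mode.STRONG)
```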
15. Reliability and Disaster Recovery
From an architecture perspective, we already have:
1. Distributed database (or databases).
2. Backend on servers (containers).
3. Frontend in CDN.
We also have some basic business functions:
● Authentication
● Authorization
● Profile access and management
16. Reliability and Disaster Recovery
And if something is not working, you have:
● A massive headache
And
● Huge management pressure on the IdP team
● Business stopped
● Customers calling for help
● No conversions (CVR) on the registration page
● LTV falling (users really hate it when they can’t log in)
17. Reliability and Disaster Recovery
● Given the service complexity, you will need a 24x7 customer support team.
● IdP operations should be 100% automated: deployment, release, rollback.
● You should prevent collateral damage: separate tenants and SLBs, possibly down to the HW layer.
● The IdP must have an SLO and an SLA, and the business should respect this SLO.
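To make an availability SLO tangible for the business, it helps to convert it into allowed downtime. The targets below (99.9% and 99.99%) are illustrative assumptions, not figures from the talk, and `allowed_downtime_minutes` is a hypothetical helper.

```python
def allowed_downtime_minutes(slo: float, days: int = 30) -> float:
    """Error budget: minutes of full outage an availability SLO permits."""
    return (1.0 - slo) * days * 24 * 60


for target in (0.999, 0.9999):
    budget = allowed_downtime_minutes(target)
    print(f"SLO {target:.2%}: {budget:.2f} minutes per 30 days")
```

A 99.9% target leaves roughly 43 minutes per 30 days, and 99.99% leaves roughly 4 minutes, which is why fully automated rollback matters.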
18. Functional and per-User SLO Separation
Functional SLO separation:
1. Latency and availability depend on the business function.
2. Different microservices for different business functions.
3. Graceful degradation (or partial degradation).
[Figure: groups of AUTH and REG service instances, alongside a group with only AUTH instances]
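A toy illustration of functional separation with graceful degradation: each business function has its own pool, and a failure in registration (REG) returns an error only for registration while authentication (AUTH) keeps serving. `FunctionPool` and `Gateway` are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class FunctionPool:
    """One business function served by its own set of instances."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} pool is down")
        return f"{self.name} handled {request}"


class Gateway:
    def __init__(self, pools: Iterable[FunctionPool]):
        self.pools = {pool.name: pool for pool in pools}

    def dispatch(self, function: str, request: str) -> Tuple[int, str]:
        try:
            return 200, self.pools[function].handle(request)
        except RuntimeError:
            # Degrade only the failing function; other pools are untouched.
            return 503, f"{function} is temporarily unavailable"


gateway = Gateway([FunctionPool("AUTH"), FunctionPool("REG", healthy=False)])
print(gateway.dispatch("AUTH", "login"))
print(gateway.dispatch("REG", "signup"))
```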
19. Functional and per-User SLO Separation
User clustering:
1. Activity: profiles of the most active users can be cached or distributed across more nodes.
2. Geolocation: user profiles can be cached in regions with a physical presence.
3. Priorities defined by the business: priority for “high tier” users.
[Figure: cache in front of persistent storage]
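A sketch of activity- and priority-based clustering in a profile cache, assuming a `load_from_storage` callable that reads from persistent storage. “High tier” users are pinned and never evicted; everyone else shares an LRU cache, so the most active users tend to stay cached. `TieredProfileCache` is an illustrative name.

```python
from collections import OrderedDict
from typing import Callable


class TieredProfileCache:
    def __init__(self, capacity: int, load_from_storage: Callable[[str], dict]):
        self.capacity = capacity
        self.load = load_from_storage
        self.pinned = {}           # high-tier users, never evicted
        self.lru = OrderedDict()   # other users, least recently used first

    def pin(self, user_id: str) -> None:
        """Load a priority user once and keep the profile in memory."""
        self.pinned[user_id] = self.load(user_id)
        self.lru.pop(user_id, None)

    def get(self, user_id: str) -> dict:
        if user_id in self.pinned:
            return self.pinned[user_id]
        if user_id in self.lru:
            self.lru.move_to_end(user_id)   # mark as recently used
            return self.lru[user_id]
        profile = self.load(user_id)        # cache miss: go to storage
        self.lru[user_id] = profile
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)    # evict the least recently used
        return profile
```

Geolocation could be handled the same way by running one such cache per region.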
20. Battles with Standards and Complexity
Eventually you’ll need more features for users and the business:
1. Logout.
2. Single Sign-On (needed after any acquisition by a big company).
3. OpenID Federation (or Social Authentication).
4. Mobile device support.
5. Passwordless authentication.
6. A fraud prevention system.
21. Battles with Standards and Complexity
Logout
If the session is managed on the user side, we can rely only on the expiration time.
If the session is managed on the backend side, we can’t log out immediately and will have replication issues. If you’re using long-lived tokens, you must keep them in a “revocation” database until their expiration time, and you will bump into memory limits.
If the IdP is based on OpenID Connect ⇒ there are 3-4 drafts describing different kinds of “logout”.
https://medium.com/@robert.broeckelmann/openid-connect-logout-eccc73df758f
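The “revocation” database idea can be sketched as a map from token id to expiration time. An entry only needs to live until the token would have expired anyway, so purging expired entries is what keeps memory bounded. `RevocationList` is a hypothetical in-memory stand-in; a real deployment would also have to replicate it, which is where the issues mentioned above come from.

```python
import time
from typing import Optional


class RevocationList:
    def __init__(self) -> None:
        self._expires_at = {}   # token id -> expiration (Unix timestamp)

    def revoke(self, token_id: str, expires_at: float) -> None:
        self._expires_at[token_id] = expires_at

    def is_revoked(self, token_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expires_at = self._expires_at.get(token_id)
        # Past expiration the token is rejected by the normal expiry check,
        # so the revocation entry no longer matters.
        return expires_at is not None and expires_at > now

    def purge(self, now: Optional[float] = None) -> int:
        """Drop entries for tokens that have already expired."""
        now = time.time() if now is None else now
        expired = [t for t, exp in self._expires_at.items() if exp <= now]
        for token_id in expired:
            del self._expires_at[token_id]
        return len(expired)
```

The longer the token lifetime, the longer each entry stays, so memory use grows with both logout rate and token lifetime.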
22. Battles with Standards and Complexity
Single Sign On
A common situation when you have >2 businesses on different domains:
● Transitions within the ecosystem should be smooth
● One account base, one login and logout
Typical issues:
● Third-party cookie restrictions (for a legacy IdP)
● Overall complexity (for both SAML 2.0 and OIDC)
https://en.wikipedia.org/wiki/List_of_single_sign-on_implementations
23. Battles with Standards and Complexity
OpenID Federation (and Social Authentication)
Simple request: we want to register users faster and grow by using the social networks’ user base.
Complicated request: we want clients to log in using the credentials of their own domain.
Typical issues:
● There are too many social authentication providers
● Reliability problems
● Integration problems
https://en.wikipedia.org/wiki/List_of_single_sign-on_implementations
24. Battles with Standards and Complexity
Mobile devices support
Typical request: we have many mobile applications and would like to have SDKs and SSO.
Typical issues:
● There are 2 major mobile platforms, but Android is extremely fragmented
● Rooted devices
● Update issues
Android:
https://github.com/openid/AppAuth-Android
https://developer.chrome.com/docs/android/custom-tabs/
iOS:
https://developer.apple.com/documentation/safariservices/sfsafariviewcontroller
25. Battles with Standards and Complexity
Passwordless authentication
Problem: login/password is not safe enough (password reuse, constant password recovery requests)
Typical issues:
● Email as 2FA: the first user asset an attacker gains access to
● SMS as 2FA: SIM clones
● HOTP generators: too expensive or (surprise!) not safe
● TOTP generators: affected by time desync
https://arstechnica.com/information-technology/2011/06/rsa-finally-comes-clean-securid-is-compromised/
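The time desync problem for TOTP is usually softened by accepting codes from neighbouring time steps. Below is a minimal sketch of HOTP (RFC 4226) and TOTP (RFC 6238) with HMAC-SHA1 and such a tolerance window. The `window` parameter and the function names are choices made for this example, and the secret is the published RFC test secret.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP per RFC 4226: HMAC-SHA1 over the counter, dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30) -> str:
    """TOTP per RFC 6238: HOTP with the counter derived from the clock."""
    return hotp(secret, int(unix_time) // step)


def verify_totp(secret: bytes, code: str, unix_time: float,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from +/- `window` steps to tolerate clock drift."""
    counter = int(unix_time) // step
    return any(
        hmac.compare_digest(hotp(secret, counter + drift), code)
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )


SECRET = b"12345678901234567890"   # test secret from RFC 4226 / RFC 6238
code = totp(SECRET, 59)            # code generated on the client at t = 59 s
print(verify_totp(SECRET, code, 59 + 30))             # server clock 30 s ahead
print(verify_totp(SECRET, code, 59 + 30, window=0))   # no tolerance
```

A wider window tolerates more drift but also gives an attacker more valid codes to guess, so it is a trade-off rather than a fix.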
26. Battles with Standards and Complexity
Fraud Prevention System
Question: are registrations coming from real humans or not? Is it a normal login or a password brute-force attack?
Typical issues:
● Detection delays: account recovery happens after the compromise
● False positives
● Explanatory issues: why has the user been blocked?
● Additional complex development and continuous support
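The simplest brute-force signal is a sliding-window count of failed logins per account. The sketch below is deliberately naive, and its thresholds are arbitrary assumptions. A rule this simple also shows where false positives come from, since a user who forgot a password looks the same as an attacker. `FailedLoginMonitor` is a hypothetical name.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0):
        self.max_failures = max_failures
        self.window = window_seconds
        self._failures = defaultdict(deque)   # login -> failure timestamps

    def _trim(self, stamps: deque, now: float) -> None:
        # Forget failures that fell out of the sliding window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()

    def record_failure(self, login: str, now: float) -> None:
        stamps = self._failures[login]
        stamps.append(now)
        self._trim(stamps, now)

    def is_suspicious(self, login: str, now: float) -> bool:
        stamps = self._failures.get(login)
        if stamps is None:
            return False
        self._trim(stamps, now)
        return len(stamps) >= self.max_failures
```

Returning the rule that fired together with the decision would help with the explanatory issue listed above.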
27. Safety and regulations
The unavoidable problem of “personal data”
Magic keywords: GDPR, CCPA, HIPAA, APPI
Gigantic headache:
● Breach detection and reporting
● Right to be forgotten
● User consent management
28. Is there nirvana in the “clouds”?
After looking at that list of potential issues, one possible decision is: “let’s buy it!”
Pros:
● Savings of development resources;
● No headaches with standards;
● No problems with regulators;
● Time to market advantage.
29. Cons of “cloud” IdP
● Difficult to choose: hundreds of companies with similar offers;
● One observation: many of them are “enterprise”-oriented, and more than 100k users cause performance issues;
● They can fail, and your contracts will not compensate for reputational risks;
● There are always some trust issues (you have no access to IdP internals);
● Strong vendor lock-in; IdP migration is always troublesome;
● Sometimes you can’t get the integrations and plugins that you need.
30. Pessimistic conclusion
1. The IdP is a core part of the ecosystem: almost all business functions depend on it.
2. The IdP is a complex software product with multiple counterintuitive requirements.
3. The IdP is covered by multiple regulations and has a very wide range of potential security risks.
4. For a big company it is almost impossible to avoid IdP development; only some components can be bought or borrowed.
Troubles with Large Identity Providers
Agenda: we will discuss the problems you will experience if your identity platform grows to handle millions of users: security, performance, and reliability issues that impact different parts of the identity platform, and how to avoid them.
Introduction:
Currently I’m working on cross-department projects at Rakuten Group Inc., helping teams maintain infrastructure and ecosystem services (including identity providers). Previously I worked at Yandex as an architect/manager in Passport, where I was responsible for security improvements and mobile integrations. I have been doing different things in IT for 15 years.