Introduction to Tokenization 
Prepared by @nabeelxy 
8/28/2014
What is tokenization? 
• Replace a value with a surrogate value called a “token”
value → Tokenize → token
• Examples 
Value | Token | Comment
1344 6423 1231 1521 | aX73pQ43T1#+4oxT4 | Token consists of alphanumeric characters and symbols
1344 6423 1231 1521 | 3124224578918001 | Token consists of numeric values only
1344 6423 1231 1521 | aX73pQ43T1#+y1521 | Token replaces the first 12 digits with an alphanumeric value and keeps the last four digits
Properties of a Good Token 
• Format and length preserving 
• Some characteristics may be preserved (e.g. last four 
digits of CC#s) 
• Irreversible without some private information (i.e. 
given a token, it is difficult to find the value) 
• Distinguishable from the value 
– If the token is not distinguishable from the value, 
customers won’t be able to identify sensitive data and 
apply proper protection mechanisms; further, customers 
may inadvertently leak sensitive data, mistaking it for 
a token
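The properties above can be illustrated with a minimal sketch. It assumes a token that keeps the length and the last four digits of a card number, and it forces at least one letter into the replaced part so the token cannot be mistaken for a real card number. The function name `make_token` and the `keep_last` parameter are hypothetical, chosen only for this illustration.

```python
import secrets
import string

# Characters used for the replaced part of the token (an assumption for this sketch).
ALPHABET = string.ascii_letters + string.digits


def make_token(card_number: str, keep_last: int = 4) -> str:
    """Return a random token with the same length as the card number.

    The last `keep_last` digits are preserved; the rest is replaced by
    random alphanumeric characters containing at least one letter, so the
    token is distinguishable from an all-digit card number.
    """
    digits = card_number.replace(" ", "")
    n_random = len(digits) - keep_last
    if n_random < 1:
        raise ValueError("nothing left to replace")
    while True:
        prefix = "".join(secrets.choice(ALPHABET) for _ in range(n_random))
        if any(c.isalpha() for c in prefix):
            return prefix + digits[-keep_last:]


token = make_token("1344 6423 1231 1521")
print(token)  # random on every run; always 16 characters ending in 1521
```

Because the token is random, it cannot be reversed without the mapping kept by the tokenization system; storing that mapping is not shown here.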
What is de-tokenization? 
• The reverse process of finding the actual value 
from a token 
token → De-tokenize → value
Why tokenize? 
• Reduced risk due to limited exposure of 
sensitive information (sensitive information is 
centralized in one location and downstream 
apps work with tokens) 
• Reduced PCI scope (fewer nodes hold 
sensitive data) 
• Minimal changes to applications to support 
tokenization (tokenization is format and 
length preserving)
An Example – Tokenizing CC#s 
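[Figure: payment flow between the Point of Sale, reached over the Internet, and the merchant data center]
(1) Payment, CC: Point of Sale → Payment App
(2) Tokenize CC: Payment App → Tokenization System
(3) Tokenized CC: Tokenization System → Payment App
(4) Tokenized CC: → Customer Data Warehouse
(5) Tokenized CC: → Order Processing App, CRM App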
Single-use vs. Multi-use tokens 
Single-use token | Multi-use token
Usually used to represent a single transaction | Usually used to represent a unique value (for example, a CC#), usually used across multiple transactions
A given value may map to multiple tokens | Token maps to a unique value within the tokenization system
Short lived | Long lived
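The difference in mapping can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular product: the class `TokenStore` and its method names are hypothetical, tokens are random hex strings, and token expiry (the "short lived" property) is not modeled.

```python
import secrets


class TokenStore:
    """In-memory store that issues single-use and multi-use tokens."""

    def __init__(self):
        self._value_by_token = {}            # token -> original value
        self._multi_use_token_by_value = {}  # value -> its one multi-use token

    def _new_token(self) -> str:
        # Draw random tokens until one is unused.
        while True:
            token = secrets.token_hex(8)
            if token not in self._value_by_token:
                return token

    def single_use(self, value: str) -> str:
        """Issue a fresh token on every call: one value, many tokens."""
        token = self._new_token()
        self._value_by_token[token] = value
        return token

    def multi_use(self, value: str) -> str:
        """Always return the same token for the same value."""
        token = self._multi_use_token_by_value.get(value)
        if token is None:
            token = self._new_token()
            self._value_by_token[token] = value
            self._multi_use_token_by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


store = TokenStore()
card = "1344642312311521"
print(store.single_use(card) != store.single_use(card))  # True
print(store.multi_use(card) == store.multi_use(card))    # True
```

Either kind of token resolves to the same value through `detokenize`; only the number of tokens per value differs.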
How to Generate Tokens? 
• Use a mathematically reversible cryptographic 
function (e.g. Format Preserving Encryption) 
• Use a one-way non-reversible cryptographic 
function (e.g. a hash function such as SHA-2) 
• Static tables mapping values to random 
tokens (tokens are not mathematically 
derived from values)
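Two of the three approaches can be sketched with the Python standard library alone. The sketch below is illustrative only: the function names `hash_token` and `table_token` are hypothetical, the salt and the truncation to 16 hex characters are assumptions added for the example, and truncating a hash makes collisions possible. Format Preserving Encryption is not shown because the standard library has no implementation of it.

```python
import hashlib
import secrets


def hash_token(value: str, salt: bytes, length: int = 16) -> str:
    """One-way token: salted SHA-256 digest, truncated to `length` hex characters.

    The salt is an assumption of this sketch; an unsalted hash of a
    low-entropy value such as a card number could be guessed by brute force.
    """
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()[:length]


def table_token(value: str, table: dict) -> str:
    """Random numeric token of the same length, remembered in a mapping table.

    The token is not derived from the value; the table is the only link.
    """
    if value not in table:
        used = set(table.values())
        while True:
            candidate = "".join(
                secrets.choice("0123456789") for _ in range(len(value))
            )
            if candidate not in used and candidate != value:
                table[value] = candidate
                break
    return table[value]


salt = secrets.token_bytes(16)
mapping = {}
card = "1344642312311521"
print(hash_token(card, salt))      # same output whenever value and salt are the same
print(table_token(card, mapping))  # random, but stable once stored in the table
```

The hash-based token can only be matched, not reversed, whereas the table-based token can be de-tokenized by anyone who holds the table.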
Tokenization Process
De-tokenization Process
How to manage tokens? 
• Two options 
– In-house 
– Third-party service provider 
• In-house tokenization server 
– Company owns and operates the token system and token database 
– The token server stores the original sensitive data 
– Usually used by large companies that want to keep sensitive data in-house 
• Third-party tokenization server (TaaS – Tokenization as a Service) 
– Third-party service providers generate tokens and give them to companies 
– Usually used by small companies that do not want to hold the actual 
sensitive data 
– E.g. in CC transactions, the payment processor generates a token and 
gives only the token to the merchant for future reference (e.g. recurring 
fees, refunds) 
– Trade-off: the company sacrifices control and pays higher fees in 
exchange for convenience, reduced liability and cheaper PCI compliance
Tokenization vs. Encryption 
Tokenization | Encryption
Output is format and length preserving | Output is generally not format or length preserving (e.g. AES, RSA); exceptions are FPE (Format Preserving Encryption) and OPE (Order Preserving Encryption)
May or may not use encryption as the mapping function (could use a hash function or a static mapping table) | Encryption does not use tokenization internally
Output may or may not be reversible | Output is always reversible given the key
Regulatory compliance – PCI DSS | Regulatory compliance – Safe Harbor, HIPAA
A main use case is to reduce PCI scope by passing tokens to downstream applications | A main use case is to ensure the confidentiality of data at rest (even if the storage media is compromised or lost, attackers cannot see the actual data because they don’t have the keys)
How is Tokenization Currently Used 
in the Corporate Market? 
• Use tokenization to replace sensitive data such as 
CC# with random numbers (3rd method of 
tokenization mentioned earlier) 
• Keep the sensitive data encrypted in a database 
• Since tokens preserve the length and format, 
changes to applications are minimal 
• The sensitive data is exposed only when it is 
necessary; otherwise, apps work with the tokens
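The pattern described above can be sketched as a small vault: apps receive random tokens, and only callers on an allow-list may recover the real value. This is a simplified illustration; the class `Vault`, the caller names and the allow-list check are hypothetical, and encryption of the stored values is left out because it needs a cryptographic library beyond the Python standard library.

```python
import secrets


class Vault:
    """Maps random, length-preserving numeric tokens to card numbers."""

    def __init__(self, allowed_callers):
        self._values = {}                     # token -> card number
        self._allowed = set(allowed_callers)  # who may de-tokenize

    def tokenize(self, card_number: str) -> str:
        # Draw random digit strings until one is unused and differs from the value.
        while True:
            token = "".join(
                secrets.choice("0123456789") for _ in range(len(card_number))
            )
            if token not in self._values and token != card_number:
                break
        # A real system would keep this value encrypted at rest.
        self._values[token] = card_number
        return token

    def detokenize(self, token: str, caller: str) -> str:
        # Expose the sensitive value only to callers that need it.
        if caller not in self._allowed:
            raise PermissionError(f"{caller} may not de-tokenize")
        return self._values[token]


vault = Vault(allowed_callers={"payment_app"})
token = vault.tokenize("1344642312311521")
print(vault.detokenize(token, "payment_app"))  # 1344642312311521
```

A caller that is not on the allow-list, such as a CRM app, would get a `PermissionError` and would keep working with the token alone.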
References 
• PCI DSS Tokenization Guidelines, 2011
