Aadhaar at 5th_elephant_v3

Slides used in the talk at the Fifth Elephant Big Data conference by Dr. Pramod Varma and Regunath B

  1. Big Data at Aadhaar
     Dr. Pramod K Varma (pramod.uid@gmail.com, Twitter: @pramodkvarma)
     Regunath Balasubramaian (regunathb@gmail.com, Twitter: @RegunathB)
  2. Aadhaar at a Glance
  3. India
     • 1.2 billion residents
       – 640,000 villages, ~60% live under $2/day
       – ~75% literacy, <3% pay income tax, <20% banking
       – ~800 million mobile, ~200-300 million migrant workers
     • Govt. spends about $25-40 bn on direct subsidies
       – Residents have no standard identity document
       – Most programs are plagued by ghost and multiple identities, causing leakage of 30-40%
  4. Vision
     • Create a common “national identity” for every “resident”
       – Biometric backed identity to eliminate duplicates
       – “Verifiable online identity” for portability
     • Applications ecosystem using open APIs
       – Aadhaar enabled bank account and payment platform
       – Aadhaar enabled electronic, paperless KYC
  5. Aadhaar System
     • Enrolment
       – One time in a person’s lifetime
       – Minimal demographics
       – Multi-modal biometrics (Fingerprints, Iris)
       – 12-digit unique Aadhaar number assigned
     • Authentication
       – Verify “you are who you claim to be”
       – Open API based
       – Multi-device, multi-factor, multi-modal
  6. Architecture Principles
     • Design for scale
       – Every component needs to scale to large volumes
       – Millions of transactions and billions of records
       – Accommodate failure and design for recovery
     • Open architecture
       – Use of open standards to ensure interoperability
       – Allow the ecosystem to build libraries to standard APIs
       – Use of open-source technologies wherever prudent
     • Security
       – End to end security of resident data
       – Use of open source
       – Data privacy handling (API and data anonymization)
  7. Designed for Scale
     • Horizontal scalability for all components
       – “Open Scale-out” is the key
       – Distributed computing on commodity hardware
       – Distributed data store and data partitioning
       – Horizontal scaling of the “data store” is a must!
       – Use of the right data store for the right purpose
     • No single point of bottleneck for scaling
     • Asynchronous processing throughout the system
       – Allows loose coupling of the various components
       – Allows independent component level scaling
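
The slide calls for data partitioning but does not say how records are assigned to partitions. As a minimal sketch, assuming hash-based routing (a common choice, not necessarily the one Aadhaar uses), a record key can be mapped to a shard with a stable hash. The function name `shard_for` and the key format are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index in [0, shard_count).

    Assumption: plain hash-modulo routing. A stable hash (SHA-256) is used
    instead of Python's built-in hash(), which is randomised per process.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical record keys spread over 8 shards.
assignments = {f"record-{i}": shard_for(f"record-{i}", 8) for i in range(1000)}
used_shards = set(assignments.values())
```

Plain modulo routing moves most keys when the shard count changes; a system that adds shards often would more likely use consistent hashing or a lookup directory.
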
  8. Enrolment Volume
     • 600 to 800 million UIDs in 4 years
       – 1 million a day
       – 200+ trillion matches every day!!!
     • ~5 MB per resident
       – Maps to about 10-15 PB of raw data (2048-bit PKI encrypted!)
       – About 30 TB I/O every day
       – Replication and backup across DCs of about 5+ TB of incremental data every day
       – Lifecycle updates and new enrolments will continue forever
     • Additional process data
       – Several million events on average moving through async channels (some persistent and some transient)
       – Needing complete update and insert guarantees across data stores
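
Two of the figures on this slide can be checked with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes) and that each new enrolment is matched against every record already in the gallery. The gallery size of 200 million is an illustrative assumption chosen to show how a figure of the order of 200 trillion arises; it is not stated on the slide.

```python
MB = 10**6
TB = 10**12


def daily_incremental_bytes(enrolments_per_day: int, bytes_per_resident: int) -> int:
    """Raw data added per day if every enrolment packet has the same size."""
    return enrolments_per_day * bytes_per_resident


def daily_match_count(enrolments_per_day: int, gallery_size: int) -> int:
    """Biometric comparisons per day if each new enrolment is matched
    against every existing record (assumed 1:N de-duplication)."""
    return enrolments_per_day * gallery_size


incremental = daily_incremental_bytes(1_000_000, 5 * MB)
matches = daily_match_count(1_000_000, 200_000_000)  # gallery size is an assumption

print(f"incremental data per day: {incremental / TB} TB")
print(f"matches per day: {matches:,}")
```

At 1 million enrolments a day and ~5 MB each, the incremental volume comes to 5 TB a day, which is consistent with the "5+ TB of incremental data every day" on the slide.
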
  9. Authentication Volume
     • 100+ million authentications per day (10 hrs)
       – Possible high variance on peak and average
       – Sub-second response
       – Guaranteed audits
     • Multi-DC architecture
       – All changes need to be propagated from enrolment data stores to all authentication sites
     • Authentication request is about 4 KB
       – 100 million authentications a day
       – 1 billion audit records in 10 days (30+ billion a year)
       – 4 TB encrypted audit logs in 10 days
       – Audit write must be guaranteed
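
The audit figures follow from the daily volume. A quick check, assuming one audit record per authentication, about 4 KB stored per record, and decimal units (1 KB = 10^3 bytes); the average request rate is derived here and does not appear on the slide:

```python
KB = 10**3
TB = 10**12

AUTHS_PER_DAY = 100_000_000
ACTIVE_HOURS = 10
AUDIT_RECORD_BYTES = 4 * KB  # assumption: the audit record is about as large as the request

# Average load if requests were spread evenly over the 10 active hours.
average_per_second = AUTHS_PER_DAY / (ACTIVE_HOURS * 3600)

audit_records_10_days = AUTHS_PER_DAY * 10
audit_bytes_10_days = audit_records_10_days * AUDIT_RECORD_BYTES
audit_records_per_year = AUTHS_PER_DAY * 365

print(f"average rate: {average_per_second:.0f} requests/s")
print(f"audit records in 10 days: {audit_records_10_days:,}")
print(f"audit log size in 10 days: {audit_bytes_10_days / TB} TB")
print(f"audit records per year: {audit_records_per_year:,}")
```

The average works out to a little under 2,800 requests per second; given the high variance between peak and average noted on the slide, peak capacity would need to be sized well above that.
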
  10. Open APIs
     • Aadhaar Services
       – Core Authentication API and supporting Best Finger Detection, OTP Request APIs
       – New services being built on top
     • Aadhaar Open Standards for Plug-n-play
       – Biometric Device API
       – Biometric SDK API
       – Biometric Identification System API
       – Transliteration API for Indian Languages
  11. Implementation
  12. Patterns & Technologies
     • Principles
       – POJO based application implementation
       – Light-weight, custom application container
       – HTTP gateway for APIs
     • Compute Patterns
       – Data Locality
       – Distribute compute (within an OS process and across)
     • Compute Architectures
       – SEDA (Staged Event Driven Architecture)
       – Master-Worker(s) Compute Grid
     • Data Access types
       – High throughput streaming: bio-dedupe, analytics
       – High volume, moderate latency: workflow, UID records
       – High volume, low latency: auth, demo-dedupe, search, eAadhaar, KYC
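
In SEDA, processing is split into stages, each with its own input queue and worker pool, so that stages can be sized and scaled independently. The following is a minimal in-process sketch of that idea using only the standard library. The `Stage` class, the stage names and the handlers are hypothetical and are not taken from the Aadhaar code base.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


class Stage:
    """One SEDA stage: an input queue drained by a pool of worker threads."""

    def __init__(self, name, handler, workers=2, downstream=None):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self):
        for thread in self.threads:
            thread.start()

    def submit(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.submit(result)

    def stop(self):
        # One sentinel per worker; events queued earlier are handled first.
        for _ in self.threads:
            self.inbox.put(_STOP)
        for thread in self.threads:
            thread.join()


collected = []
store = Stage("store", collected.append, workers=1)
normalise = Stage("normalise", lambda e: e.strip().upper(), workers=2, downstream=store)

store.start()
normalise.start()
for name in [" asha ", "ravi", " meena"]:
    normalise.submit(name)

# Stop stages in pipeline order so each one drains before the next shuts down.
normalise.stop()
store.stop()
processed = sorted(collected)
```

Because each stage owns its queue and thread pool, a slow stage can be given more workers without touching the others. Replacing the in-memory queues with a message broker would extend the same structure across processes.
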
  13. Aadhaar Data Stores (Data consistency challenges..)
     • Solr cluster (sharded): all enrolment records/documents, selected demographics only
       – Low latency indexed read (documents per sec)
       – Low latency random search (documents per sec)
     • Mongo cluster (sharded): all enrolment records/documents, demographics + photo
       – Low latency indexed read (documents per sec)
       – High latency random search (seconds per read)
     • MySQL (sharded), Enrolment and UID master DB: all UID generated records (demographics only, track & trace, enrolment status)
       – Low latency indexed read (milliseconds per read)
       – High latency random search (seconds per read)
     • HBase (Region Servers 1 to 20): all enrolment biometric templates
       – High read throughput (MB per sec)
       – Low-to-medium latency read (milliseconds per read)
     • HDFS (Data Nodes 1 to 20): all raw packets
       – High read throughput (MB per sec)
       – High latency read (seconds per read)
     • NFS (LUN 1 to LUN 4): all archived raw packets
       – Moderate read throughput
       – High latency read (seconds per read)
  14. Aadhaar Architecture
     • Real-time monitoring using Events
     • Work distribution using SEDA & Messaging
     • Ability to scale within JVM and across
     • Recovery through check-pointing
     • Sync HTTP based Auth gateway
     • Protocol Buffers & XML payloads
     • Sharded clusters
     • Near real-time data delivery to warehouse
     • Nightly data-sets used to build dashboards, data marts and reports
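
The slide lists recovery through check-pointing without describing the mechanism. One simple way to picture it, offered as an assumption rather than a description of the actual system, is a worker that records its position after each item so that a restart resumes where the previous run stopped. The file layout and function names below are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def process_batch(items, path, handler, fail_at=None):
    """Process items from the last checkpoint; fail_at simulates a crash."""
    for i in range(load_checkpoint(path), len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        handler(items[i])
        save_checkpoint(path, i + 1)


done = []
items = ["a", "b", "c", "d"]
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    try:
        process_batch(items, checkpoint, done.append, fail_at=2)
    except RuntimeError:
        pass  # the first run stops part-way through the batch
    process_batch(items, checkpoint, done.append)  # the second run resumes
```

A crash between handling an item and saving the checkpoint would cause that item to be handled again on restart, so handlers in such a scheme need to be idempotent.
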
  15. Deployment Monitoring
  16. Learnings
     • Make everything API based
     • Everything fails (hardware, software, network, storage)
       – System must recover, retry transactions, and more or less self-heal
     • Security and privacy should not be an afterthought
     • Scalability does not come from one product
     • Open scale-out is the only way you should go
       – Heterogeneous, multi-vendor, commodity compute, growing in a linear fashion. Nothing else can adapt!
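
The "recover, retry transactions" learning can be illustrated with a small retry helper. This is a generic sketch assuming exponential backoff between attempts; the slides do not say which retry policy the system uses, and `retry` and `flaky_write` are hypothetical names.

```python
import time


def retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay * 2**attempt seconds between attempts and re-raises
    the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}
delays = []


def flaky_write():
    """Hypothetical operation that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record the delays instead of sleeping so the example runs instantly.
result = retry(flaky_write, attempts=3, sleep=delays.append)
```

Retrying is only safe when the operation is idempotent or the receiver can detect duplicates, which matters for writes that carry guarantees such as audit records.
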
  17. Thank You!
     Dr. Pramod K Varma (pramod.uid@gmail.com, Twitter: @pramodkvarma)
     Regunath Balasubramaian (regunathb@gmail.com, Twitter: @RegunathB)