Aadhaar at 5th_elephant_v3



Slides used in the talk at the Fifth Elephant Big Data conference by Dr. Pramod Varma and Regunath B








Usage Rights

© All Rights Reserved

  • @AnkitAnandPande These slides are about the technology behind the World's largest Identity platform. Cross posting your views on the purpose (stated or otherwise) of the programme is not of much use here. I am pretty sure they will get better response on other forums.
  • Aadhar is a Nazi program. Why are we celebrating it? Please refer my blog:
  • Here is the video of the talk : http://www.youtube.com/watch?v=08sq0y8V1sE
  • Is there any recorded video available?

Aadhaar at 5th_elephant_v3 Presentation Transcript

  • 1. Big Data at Aadhaar
      Dr. Pramod K Varma (pramod.uid@gmail.com, Twitter: @pramodkvarma)
      Regunath Balasubramaian (regunathb@gmail.com, Twitter: @RegunathB)
  • 2. Aadhaar at a Glance
  • 3. India
      • 1.2 billion residents
          – 640,000 villages; ~60% live on under $2/day
          – ~75% literacy; <3% pay income tax; <20% banking
          – ~800 million mobile; ~200-300 million migrant workers
      • Govt. spends about $25-40 bn on direct subsidies
          – Residents have no standard identity document
          – Most programs are plagued with ghost and multiple identities, causing leakage of 30-40%
  • 4. Vision
      • Create a common “national identity” for every “resident”
          – Biometric backed identity to eliminate duplicates
          – “Verifiable online identity” for portability
      • Applications ecosystem using open APIs
          – Aadhaar enabled bank account and payment platform
          – Aadhaar enabled electronic, paperless KYC
  • 5. Aadhaar System
      • Enrolment
          – One time in a person’s lifetime
          – Minimal demographics
          – Multi-modal biometrics (Fingerprints, Iris)
          – 12-digit unique Aadhaar number assigned
      • Authentication
          – Verify “you are who you claim to be”
          – Open API based
          – Multi-device, multi-factor, multi-modal
  • 6. Architecture Principles
      • Design for scale
          – Every component needs to scale to large volumes
          – Millions of transactions and billions of records
          – Accommodate failure and design for recovery
      • Open architecture
          – Use of open standards to ensure interoperability
          – Allow the ecosystem to build libraries to standard APIs
          – Use of open-source technologies wherever prudent
      • Security
          – End to end security of resident data
          – Use of open source
          – Data privacy handling (API and data anonymization)
  • 7. Designed for Scale
      • Horizontal scalability for all components
          – “Open Scale-out” is the key
          – Distributed computing on commodity hardware
          – Distributed data store and data partitioning
          – Horizontal scaling of “data store” a must!
          – Use of the right data store for the right purpose
      • No single point of bottleneck for scaling
      • Asynchronous processing throughout the system
          – Allows loose coupling of the various components
          – Allows independent component level scaling
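The "data partitioning" point on slide 7 can be illustrated with a minimal sketch of hash-based partitioning. This is only an illustration under assumptions: the slides do not say how Aadhaar assigns records to shards, and the names `shard_for`, `partition` and the `enrolment-N` keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard they hash to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical record keys, spread over four shards.
keys = [f"enrolment-{i}" for i in range(10_000)]
shards = partition(keys, 4)
sizes = [len(members) for members in shards.values()]
```

Because the hash is stable, any node can compute the owning shard for a key without a central lookup, which is one way to avoid a single bottleneck. A plain modulo scheme like this one moves most keys when the shard count changes, so a production system would likely need something more elaborate.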
  • 8. Enrolment Volume
      • 600 to 800 million UIDs in 4 years
          – 1 million a day
          – 200+ trillion matches every day!!!
      • ~5MB per resident
          – Maps to about 10-15 PB of raw data (2048-bit PKI encrypted!)
          – About 30 TB I/O every day
          – Replication and backup across DCs of about 5+ TB of incremental data every day
          – Lifecycle updates and new enrolments will continue forever
      • Additional process data
          – Several million events on average moving through async channels (some persistent and some transient)
          – Needing complete update and insert guarantees across data stores
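A short back-of-the-envelope check of the slide 8 figures, using only the numbers quoted there. The decimal unit definitions and the reading of "matches" as comparisons of each new enrolment against existing records are assumptions. The 10-15 PB total is not derived here, since the slide does not say what it includes.

```python
MB = 10**6   # assumption: decimal units
TB = 10**12

enrolments_per_day = 1_000_000      # "1 million a day"
bytes_per_resident = 5 * MB         # "~5MB per resident"
matches_per_day = 200 * 10**12      # "200+ trillion matches every day"

# New raw data arriving per day, before any replication.
daily_incremental_tb = enrolments_per_day * bytes_per_resident / TB

# Comparisons implied per new enrolment, if each one is matched
# against existing records (assumed interpretation of "matches").
matches_per_enrolment = matches_per_day // enrolments_per_day
```

Under these assumptions the daily intake works out to 5 TB, in line with the "5+ TB of incremental data every day" on the slide, and each new enrolment implies roughly 200 million comparisons.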
  • 9. Authentication Volume
      • 100+ million authentications per day (10 hrs)
          – Possible high variance on peak and average
          – Sub second response
          – Guaranteed audits
      • Multi-DC architecture
          – All changes need to be propagated from enrolment data stores to all authentication sites
      • Authentication request is about 4 K
          – 100 million authentications a day
          – 1 billion audit records in 10 days (30+ billion a year)
          – 4 TB encrypted audit logs in 10 days
          – Audit write must be guaranteed
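The slide 9 numbers can be cross-checked with simple arithmetic. This sketch assumes "4 K" means 4,000 bytes, that each authentication produces one audit record of about the request size, and that load is spread evenly over the 10-hour window (the slide itself warns that peaks may differ a lot from the average).

```python
auths_per_day = 100_000_000        # "100 million authentications a day"
window_seconds = 10 * 3600         # "(10 hrs)"
request_bytes = 4 * 1000           # assumption: "4 K" read as 4,000 bytes

# Average request rate if load were perfectly even across the window.
avg_per_second = auths_per_day / window_seconds

# Assumption: one audit record per authentication.
audit_records_10_days = auths_per_day * 10
audit_bytes_10_days = audit_records_10_days * request_bytes
audit_records_per_year = auths_per_day * 365
```

This gives an average of roughly 2,800 requests per second, 1 billion audit records and 4 TB of audit data in 10 days, and 36.5 billion records a year, consistent with the "30+ billion a year" on the slide.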
  • 10. Open APIs
      • Aadhaar Services
          – Core Authentication API and supporting Best Finger Detection, OTP Request APIs
          – New services being built on top
      • Aadhaar Open Standards for Plug-n-play
          – Biometric Device API
          – Biometric SDK API
          – Biometric Identification System API
          – Transliteration API for Indian Languages
  • 11. Implementation
  • 12. Patterns & Technologies
      • Principles
          – POJO based application implementation
          – Light-weight, custom application container
          – HTTP gateway for APIs
      • Compute Patterns
          – Data Locality
          – Distribute compute (within an OS process and across)
      • Compute Architectures
          – SEDA: Staged Event Driven Architecture
          – Master-Worker(s) Compute Grid
      • Data Access types
          – High throughput streaming: bio-dedupe, analytics
          – High volume, moderate latency: workflow, UID records
          – High volume, low latency: auth, demo-dedupe, search – eAadhaar, KYC
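As an illustration of the SEDA pattern named on slide 12, here is a minimal sketch in which each stage owns a queue and a small pool of worker threads and hands its output to the next stage. It is not the Aadhaar implementation (the slides describe a JVM-based system); the `Stage` class and the `parse`, `enrich` and `store` stages are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


class Stage:
    """One SEDA stage: an inbox queue drained by its own worker threads."""

    def __init__(self, name, handler, workers=2, downstream=None):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self):
        for thread in self.threads:
            thread.start()

    def submit(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.submit(result)

    def shutdown(self):
        # One sentinel per worker, queued behind all pending events.
        for _ in self.threads:
            self.inbox.put(_STOP)
        for thread in self.threads:
            thread.join()


results = []
results_lock = threading.Lock()


def parse(event):
    return event.strip().lower()


def enrich(name):
    return {"name": name, "length": len(name)}


def store(record):
    with results_lock:
        results.append(record)


store_stage = Stage("store", store, workers=1)
enrich_stage = Stage("enrich", enrich, workers=2, downstream=store_stage)
parse_stage = Stage("parse", parse, workers=2, downstream=enrich_stage)

stages = (parse_stage, enrich_stage, store_stage)
for stage in stages:
    stage.start()
for raw in ["  Alice ", "BOB", " Carol"]:
    parse_stage.submit(raw)
for stage in stages:  # drain upstream stages first
    stage.shutdown()

names = sorted(record["name"] for record in results)
```

Each stage can be given its own worker count, which is the sense in which the pattern allows independent component level scaling. In a distributed deployment the in-process queues would be replaced by a messaging layer.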
  • 13. Aadhaar Data Stores (Data consistency challenges..)
      • Solr cluster (shards 0, 2, 6, 9, a, d, f): all enrolment records/documents, selected demographics only
          – Low latency indexed read (documents per sec)
          – Low latency random search (documents per sec)
      • Mongo cluster (shards 1 to 5): all enrolment records/documents, demographics + photo
          – Low latency indexed read (documents per sec)
          – High latency random search (seconds per read)
      • MySQL (sharded), Enrolment UID master DB: all UID generated records (demographics only, track & trace, enrolment status)
          – Low latency indexed read (milli-seconds per read)
          – High latency random search (seconds per read)
      • HBase (Region Servers 1 .. 20): all enrolment biometric templates
          – High read throughput (MB per sec)
          – Low-to-medium latency read (milli-seconds per read)
      • HDFS (Data Nodes 1 .. 20): all raw packets
          – High read throughput (MB per sec)
          – High latency read (seconds per read)
      • NFS (LUN 1 to LUN 4): all archived raw packets
          – Moderate read throughput
          – High latency read (seconds per read)
  • 14. Aadhaar Architecture
      • Real-time monitoring using Events
      • Work distribution using SEDA & Messaging
      • Ability to scale within JVM and across
      • Recovery through check-pointing
      • Sync HTTP based Auth gateway
      • Protocol Buffers & XML payloads
      • Sharded clusters
      • Near real-time data delivery to warehouse
      • Nightly data-sets used to build dashboards, data marts and reports
  • 15. Deployment Monitoring
  • 16. Learnings
      • Make everything API based
      • Everything fails (hardware, software, network, storage)
          – System must recover, retry transactions, and more or less self-heal
      • Security and privacy should not be an afterthought
      • Scalability does not come from one product
      • Open scale out is the only way you should go.
          – Heterogeneous, multi-vendor, commodity compute, growing in linear fashion. Nothing else can adapt!
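The "recover, retry transactions" learning on slide 16 can be sketched as a retry helper with exponential backoff. This is a generic illustration, not the mechanism used in the Aadhaar system; the `retry` helper and the `flaky_write` operation are hypothetical.

```python
import time


def retry(operation, attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call operation until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # a real system would catch narrower error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


calls = {"count": 0}


def flaky_write():
    """Hypothetical operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "written"


delays = []
# Record the requested delays instead of actually sleeping.
outcome = retry(flaky_write, sleep=delays.append)
```

Retrying is only safe when the operation is idempotent or the receiver can detect duplicates; otherwise a retried transaction might be applied twice.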
  • 17. Thank You!
      Dr. Pramod K Varma (pramod.uid@gmail.com, Twitter: @pramodkvarma)
      Regunath Balasubramaian (regunathb@gmail.com, Twitter: @RegunathB)