Mongo for aadhaar

Usage Rights

© All Rights Reserved

    Mongo for aadhaar: Presentation Transcript

    • Search data store for the world's largest biometric identity system (Slide 1)
      Regunath Balasubramanian, regunathb@gmail.com, twitter @regunathb
      Shashikant Soni, soni.shashikant@gmail.com
      CONFIDENTIAL: For limited circulation only
    • India (Slide 2)
      ● 1.2 billion residents
        ● 640,000 villages; ~60% live on under $2/day
        ● ~75% literacy; <3% pay income tax; <20% banking
        ● ~800 million mobile; ~200-300 million migrant workers
      ● Govt. spends about $25-40B on direct subsidies
        ● Residents have no standard identity document
        ● Most programs are plagued by ghost and multiple identities, causing leakage of 30-40%
    • Aadhaar (Slide 3)
      ● Create a common ‘national identity’ for every ‘resident’
        ● Biometric-backed identity to eliminate duplicates
        ● ‘Verifiable online identity’ for portability
      ● Applications ecosystem using open APIs
        ● Aadhaar-enabled bank account and payment platform
        ● Aadhaar-enabled electronic, paperless KYC (Know Your Customer)
    • Search Requirements (Slide 4)
      ● Multi-attribute query like: name contains ‘regunath’ AND city = ‘bangalore’ AND address contains ‘J P Nagar’ AND YearOfBirth = ……
      ● Search 1.2B resident data with photo, history
        ● 35 KB average record size
      ● Response times in milliseconds
      ● Open scale out
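
To make the multi-attribute requirement concrete, here is a minimal sketch of the query semantics as an in-memory predicate. The field names (`name`, `city`, `address`, `yearOfBirth`), the function names, and the sample records are all assumptions made up for illustration; the deck does not show the real record schema, and the year value is left unset because the slide elides it.

```javascript
// Hypothetical record fields; the deck does not give the actual schema.
// Case-insensitive substring test used for the "contains" clauses.
function containsText(haystack, needle) {
  return String(haystack || "").toLowerCase().includes(needle.toLowerCase());
}

// Returns true only if the record satisfies every clause present in the query (AND).
function matchesQuery(record, query) {
  if (query.nameContains !== undefined && !containsText(record.name, query.nameContains)) {
    return false;
  }
  if (query.city !== undefined &&
      String(record.city || "").toLowerCase() !== query.city.toLowerCase()) {
    return false;
  }
  if (query.addressContains !== undefined &&
      !containsText(record.address, query.addressContains)) {
    return false;
  }
  if (query.yearOfBirth !== undefined && record.yearOfBirth !== query.yearOfBirth) {
    return false;
  }
  return true;
}

// Invented sample data, purely for demonstration.
const sampleResidents = [
  { name: "A Regunath", city: "Bangalore", address: "1st Cross, J P Nagar" },
  { name: "B Regunath", city: "Chennai", address: "2nd Street, J P Nagar" },
  { name: "C Kumar", city: "Bangalore", address: "3rd Main, J P Nagar" },
];

// The slide's example query, without the elided YearOfBirth clause.
const exampleQuery = { nameContains: "regunath", city: "bangalore", addressContains: "J P Nagar" };
const exampleMatches = sampleResidents.filter((r) => matchesQuery(r, exampleQuery));
console.log(exampleMatches.map((r) => r.name));
```

A linear scan like this obviously cannot meet millisecond response times at 1.2B records; it only pins down what a search index has to answer.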
    • Why MongoDB (Slide 5)
      ● Auto-sharding
      ● Replication
      ● Failover … Essentially an AP (slaveOk) data store in CAP parlance
      ● Evolving schema
      ● Map-Reduce for analysis
      ● Full text search
        ● Compound (or) multi-keys
    • Design (Slide 6)
      [Diagram: the Client App sends a query (Name=‘abcde’, Address=‘some place’, year: 1980) to the Search API, which (1) searches the Solr indexes (Name, Year, Address) and (2) reads the document { _id:123456789, name: ‘abcde’, year:1980, ….. } from MongoDB]
      ● Read/Search
        ● Sharded Solr indexes for search
        ● Keyed document read from MongoDB
      ● Write
        ● Eventual consistency (across data sources) driven by application
        ● Composite MongoDB-Solr app persistence handler
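
The two-step read and the composite write described on this slide can be sketched with in-memory stand-ins. This is not the authors' persistence handler: `CompositeStore`, its method names, the write order (document first, index second), and the `photo` field are assumptions for illustration. A `Map` stands in for the MongoDB collection and an array for the Solr index.

```javascript
// Illustrative stand-ins only: no MongoDB or Solr client is used here.
class CompositeStore {
  constructor() {
    this.documents = new Map(); // stand-in for MongoDB: full documents keyed by _id
    this.searchIndex = [];      // stand-in for Solr: searchable fields plus the id
  }

  // Write path: store the full document, then (re)index its searchable fields.
  // In a real deployment these are two separate systems, so the application has
  // to handle the window in which only one of them has been updated.
  save(doc) {
    this.documents.set(doc._id, doc);
    this.searchIndex = this.searchIndex.filter((entry) => entry.id !== doc._id);
    this.searchIndex.push({ id: doc._id, name: doc.name, year: doc.year, address: doc.address });
  }

  // Read path, step 1: the index answers the query with matching ids only.
  searchIds(criteria) {
    return this.searchIndex
      .filter((entry) => Object.keys(criteria).every((key) => entry[key] === criteria[key]))
      .map((entry) => entry.id);
  }

  // Read path, step 2: keyed reads fetch the full documents for those ids.
  search(criteria) {
    return this.searchIds(criteria)
      .map((id) => this.documents.get(id))
      .filter((doc) => doc !== undefined);
  }
}

const store = new CompositeStore();
store.save({ _id: 123456789, name: "abcde", year: 1980, address: "some place", photo: "<binary>" });
store.save({ _id: 2, name: "fghij", year: 1980, address: "elsewhere", photo: "<binary>" });

const found = store.search({ name: "abcde", year: 1980 });
console.log(found.map((doc) => doc._id));
```

Keeping only searchable fields in the index and the large payload in the keyed store is what lets each side scale on its own terms.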
    • Implementation and Deployment (Slide 7)
      ● Start - 4M records in 2 shards
        Current - 250M records in 8 shards (8 x ~2 TB x 3 replicas)
      ● Performance, Reliability & Durability
        ● SlaveOk
        ● getLastError, Write Concern: availability vs durability
          ● j = journaling
          ● w = nodes-to-write
      ● Replica-sets / Shards – how?
      [Diagram: two replica sets (RS 1, RS 2) of three members each, three config servers (Config 1, Config 2, Config 3), three routers; legend: Primary, Secondary, Arbiter]
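
The availability-versus-durability trade-off behind `w` and `j` can be illustrated by building the legacy `getLastError` command document. The helper name `getLastErrorCommand` and the chosen values are assumptions; in a mongo shell of that era the resulting document would be passed to `db.runCommand(...)` after a write, whereas current MongoDB releases take a write concern on the write itself.

```javascript
// Hypothetical helper: builds the command document only, it does not talk to a server.
//   w = number of nodes that must acknowledge the write
//   j = wait for the write to reach the on-disk journal
function getLastErrorCommand({ w = 1, j = false } = {}) {
  const command = { getLastError: 1, w: w };
  if (j) {
    command.j = true;
  }
  return command;
}

// Favors availability and latency: acknowledged by the primary only.
const fastWrite = getLastErrorCommand();

// Favors durability: journaled and acknowledged by two nodes before returning.
const durableWrite = getLastErrorCommand({ w: 2, j: true });

console.log(JSON.stringify(fastWrite));
console.log(JSON.stringify(durableWrite));
```

Higher `w` and `j: true` make each write slower and more likely to block when members are down, which is the cost of the stronger guarantee.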
    • Monitoring and Troubleshooting (Slide 8)
      ● Monitoring tools evaluated
        ● MMS
        ● munin
      ● Manual approach - daily ritual
        ● RS, DB, config, router - health and stats
      ● Problem analysis stats
        ● mongostat, iostat, currentOps, logs
        ● Client connections
      ● Stats for storage, shards addition
        ● Data file size
        ● Shard data distribution
        ● Replication
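
One check that fits the daily replica-set health ritual is replication lag. The sketch below computes it from an object shaped like the output of the mongo shell's `rs.status()` (a `members` array with `name`, `stateStr`, and `optimeDate`). The function name, host names, and timestamps are invented for illustration, and the deck does not say how the authors actually measured lag.

```javascript
// Computes, per secondary, how many seconds it trails the primary's last applied op.
// Returns null when no primary is present (for example during an election).
function replicationLagSeconds(status) {
  const primary = status.members.find((member) => member.stateStr === "PRIMARY");
  if (!primary) {
    return null;
  }
  const lagByMember = {};
  for (const member of status.members) {
    if (member.stateStr !== "SECONDARY") {
      continue; // arbiters hold no data, so they have no lag to report
    }
    lagByMember[member.name] =
      (primary.optimeDate.getTime() - member.optimeDate.getTime()) / 1000;
  }
  return lagByMember;
}

// Hypothetical status snapshot; in the mongo shell this would come from rs.status().
const sampleStatus = {
  set: "rs1",
  members: [
    { name: "rs1-a.example:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-01-01T00:00:10Z") },
    { name: "rs1-b.example:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-01-01T00:00:05Z") },
    { name: "rs1-c.example:27017", stateStr: "ARBITER" },
  ],
};

const lag = replicationLagSeconds(sampleStatus);
console.log(lag);
```

A secondary whose lag keeps growing is the one at risk of becoming too stale to catch up from the oplog.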
    • Key Learnings on MongoDB (Slide 9)
      ● Indexing 32 fields
        ● Compound indexes
        ● Multi-key indexes
          ● {…"indexes" : [{ "email":"john.doe@email.com", "phone":"123456789" }] }
          ● db.coll.find({ "indexes.email" : "john.doe@email.com" })
        ● Indexes use b-tree
        ● Many fields to index
        ● Performs well up to 1-2M documents
        ● Best if index fits in memory
      ● Data replication, RS failover
        ● Rollback when RS goes out of sync
          ● Manual restore (physical data copy)
          ● Restarting a very stale node
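
The multi-key pattern above relies on dot notation reaching into an array of subdocuments. As a rough illustration of that matching rule (a simplified model with plain equality only, not MongoDB's actual query engine), the hypothetical helper `matchesPath` below walks a dotted path and treats an array as "match if any element matches":

```javascript
// Simplified model of dot-notation equality matching; operators are not supported.
function matchesPath(doc, path, expected) {
  const walk = (value, keys) => {
    if (keys.length === 0) {
      // End of the path: an array matches if it contains the expected value.
      return Array.isArray(value) ? value.includes(expected) : value === expected;
    }
    if (Array.isArray(value)) {
      // Multi-key behavior: any element of the array may satisfy the rest of the path.
      return value.some((item) => walk(item, keys));
    }
    if (value === null || typeof value !== "object") {
      return false; // the path runs past a scalar or a missing field
    }
    return walk(value[keys[0]], keys.slice(1));
  };
  return walk(doc, path.split("."));
}

// Document shaped like the slide's example; extra fields are elided there.
const resident = {
  indexes: [{ email: "john.doe@email.com", phone: "123456789" }],
};

const byEmail = matchesPath(resident, "indexes.email", "john.doe@email.com");
const byPhone = matchesPath(resident, "indexes.phone", "000000000");
console.log(byEmail, byPhone);
```

Folding many searchable attributes into one array field means a single multi-key index on that array's subfields can serve lookups that would otherwise need one index per attribute.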
    • Questions? (Slide 10)
      Regunath Balasubramanian, regunathb@gmail.com, twitter @regunathb
      Shashikant Soni, soni.shashikant@gmail.com