
Is the elephant in the room

Published in: Technology, Business
  • A little bit of background. One of the Siemens employees attended my talk on Aadhaar in 2010 and wanted a repeat of the same for their employees. This one therefore is a mashup of technology trends that uses Aadhaar and Flipkart examples for illustration. This was the keynote address at Siemens TECh Days - Aug 2012, Bangalore.


  1. Is the Elephant in the room? Regunath B, regunathb@gmail.com, Twitter: @RegunathB
  2. Quick read: 1.8 million words? The story is about a battle between great kings and sons, with the principal characters being Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc. Source: The Gramener blog for visualizations – Analysis of the entire text contained in the Mahabharatha (http://blog.gramener.com/category/visualisations)
  3. Insights from Social Media. Source: ttwick Billionaires page (Bill Gates Twitter Social Media profile) (http://ttwick.com/blog/bill-gates-twitter-social-media/)
  4. Insights from Social Media. Source: Impact page of Satyamevjayate (http://www.satyamevjayate.in/impact/impact.php/)
  5. What is Big Data?
     ● Big Data challenges and opportunities arise when information in an enterprise demonstrates the following characteristics:
       – Volume: transaction data from enterprise systems (for example: financial transactions, orders)
       – Variety: structured and unstructured data (for example: customer contact, social media, biometrics)
       – Velocity: high information arrival rates (for example: application events, tagging, rating of content)
     ● Big Data opportunities arise when the enterprise is able to derive Value from the data characteristics defined above
  6. Food for thought... on theorems and laws
     ● Do hardware and technology trends affect your technology selection?
       – CPU, RAM and disk size double every 18-24 months [Moore's law]
       – Disk seek time remains nearly constant, at around 5% speed-up per year
     ● Data seek vs. data transfer
       – Software leverages one of the above, or a combination: B+ tree index, LSM tree index, "Fractal tree"
     ● CAP theorem effect
       – Ability to achieve only 2 of 3 properties of shared-data systems: data Consistency, system Availability and tolerance to network Partitions
     ● Bandwidth is the most scarce commodity in a Data Center
  7. Aadhaar Patterns & Technologies
     • Principles
       • POJO based application implementation
       • Light-weight, custom application container
       • HTTP gateway for APIs
     • Compute Patterns
       • Data Locality
       • Distribute compute (within an OS process and across)
     • Compute Architectures
       • SEDA – Staged Event Driven Architecture
       • Master-Worker(s) Compute Grid
     • Data Access types
       • High throughput streaming: bio-dedupe, analytics
       • High volume, moderate latency: workflow, UID records
       • High volume, low latency: auth, demo-dedupe, search – eAadhaar, KYC
  8. Aadhaar Architecture
     • Real-time monitoring using Events
     • Work distribution using SEDA & Messaging
     • Ability to scale within JVM and across
     • Recovery through check-pointing
     • Sync HTTP based Auth gateway
     • Protocol Buffers & XML payloads
     • Sharded clusters
     • Near Real-time data delivery to warehouse
     • Nightly data-sets used to build dashboards, data marts and reports
  9. Putting data to work at Aadhaar
  10. Deployment Monitoring
  11. Big Data at Flipkart
      ● Website traffic
        – Millions of page hits per day: product catalogs, item availability, promotions, search
        – Millions of active sessions and shopping carts
        – Latencies measured in low single-digit milliseconds
      ● Growing list of categories (Books, Mobiles, Toys, Personal, Home, Baby, Digital music...)
        – Electronic inventory: MP3, eBooks, movies
      ● New business models, newer channels
      ● Understanding users, user profiles, social media, experience
        – Terabytes of logs containing browsing behavior, data from multiple engagement channels
        – Recommendations based on millions of possible item matches and relevance algorithms
  12. Is the Elephant in the room?
      From Wikipedia: "Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored or goes unaddressed.
      Big Data opportunities and challenges are real and present. It is the Elephant in the room.
  13. Some takeaways from experience
      ● Make everything API based
      ● Everything fails (hardware, software, network, storage)
        – System must recover, retry transactions, and sort of self-heal
      ● Security and privacy should not be an afterthought
      ● Scalability does not come from one product
        – Watch out for solution and technology stereotyping
      ● Open scale out is the only way to go
        – Heterogeneous, multi-vendor, commodity compute, growing in a linear fashion. Nothing else can adapt!
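
The takeaway that "everything fails" and that a system must retry transactions can be illustrated with a small retry helper. This is only a sketch under stated assumptions, not how Aadhaar or Flipkart implement recovery: `TransientError`, `retry`, `make_flaky` and all the delay values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay * 2**n seconds (capped at max_delay, with random
    jitter) between tries. Only TransientError is retried; any other
    exception propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries


def make_flaky(failures):
    """Return an operation that fails `failures` times, then succeeds.

    Stands in for a call to an unreliable network, disk or service.
    """
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated outage")
        return state["calls"]  # number of calls it took to succeed

    return operation


if __name__ == "__main__":
    print("succeeded on call", retry(make_flaky(2)))
```

Retrying is only safe when the operation is idempotent, or when the receiver can detect duplicates; otherwise a retry after a lost acknowledgement may apply the same transaction twice.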

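Slide 6 contrasts data seek with data transfer. A back-of-the-envelope calculation shows why the distinction can drive the choice of index structure: an update-in-place B+ tree tends to pay for seeks, while an LSM tree batches writes into sequential transfers. The sketch below uses assumed figures (10 ms per seek, 100 MB/s sustained transfer, one million 1 KB records) that are illustrative only and do not come from the talk.

```python
def random_read_seconds(records, seek_ms):
    """Time to fetch `records` items when each one costs a disk seek."""
    return records * seek_ms / 1000.0


def sequential_read_seconds(total_bytes, transfer_mb_per_s):
    """Time to stream `total_bytes` from disk at a sustained transfer rate."""
    return total_bytes / (transfer_mb_per_s * 1_000_000)


# Assumed workload and disk figures (illustrative only, not from the talk).
RECORDS = 1_000_000          # records to touch
RECORD_BYTES = 1_000         # size of each record in bytes
SEEK_MS = 10.0               # assumed average seek time
TRANSFER_MB_PER_S = 100.0    # assumed sustained sequential throughput

seek_bound = random_read_seconds(RECORDS, SEEK_MS)
scan_bound = sequential_read_seconds(RECORDS * RECORD_BYTES, TRANSFER_MB_PER_S)

print(f"one seek per record : {seek_bound:,.0f} s")
print(f"one sequential scan : {scan_bound:,.0f} s")
```

Under these assumptions the seek-bound path takes 10,000 seconds against 10 seconds for the sequential scan. Because seek time improves far more slowly than capacity and transfer rate, that gap can be expected to widen rather than close.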