Why Data is Drowning the (IT) World?
Sanjeev Kumar
VP & MD, Informatica India


Transcript

  • 1. Why Data is Drowning the (IT) World? Sanjeev Kumar, VP & MD, Informatica India. Infovision 2012 Summit, October 2012
  • 3. Agenda
    • Why the Data Deluge?
    • Trends Affecting Data Growth
    • New Use-cases Enabled by Big Data
    • Trends Underlying Big Data
    • Building-blocks for Managing Big Data
    • Q&A
  • 4. Data is the New Plastic
  • 10. Where Are We? Computing Circa 2012!
    • Six decades into the Computer Revolution
    • Four decades since the invention of the microprocessor
    • Two decades into the rise of the modern Internet
    • Two billion people using broadband Internet
    • Major businesses and industries running on software and delivered as online services*
    * “Why Software Is Eating the World”, Marc Andreessen, WSJ, Aug 2011
  • 11. Trends: Exploding Data Volumes, “Big Data”
    • (Diagram axes: Relational vs. Complex, Unstructured; scale Kilo – Mega – Giga – Tera – Peta – Exa – Zetta – Yotta)
    • 2,500 exabytes of new information in 2012, with the Internet as the primary driver
    • The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
    • Source: IDC white paper sponsored by EMC, “As the Economy Contracts, the Digital Universe Expands”, May 2009
  • 18. Big Data Buzz!
    • 16 Big Data “V”s; the original 3: Volume, Variety & Velocity
    • 120+ Twitter accounts relating to Big Data
    • 9,000 job search results for “data scientists”
    • 70,000 Wikipedia “big data” hits per month
    • 2,000,000 PDFs from a search on “big data white paper”
    • 112,000,000 blog posts discussing big data
    • 1,350,000,000 Google results for “What is big data?”
    • Source: IBM, 2012
  • 19. Why Now? Exploding Data Volumes
    • Proliferation of web-connected devices
    • Increased consumption of digital content
    • Explosion in user-generated content
    • Internet of Things
  • 20. Trends: Changing Data Economics
    • Return on Byte (ROB) = value to be extracted from that byte / cost of storing that byte
    • (Diagram: High ROB vs. Low ROB)
  • 21. Trends: Data Seen as a Strategic Asset
    • Companies are leveraging data assets to:
      • Create new and differentiated products (e.g., product recommendation engines)
      • Increase revenues (e.g., optimize ad placement to improve click-through)
      • Improve customer satisfaction / retention (e.g., analyze CDRs for dropped calls)
    • “The sexy job in the next ten years will be statisticians. The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill.” (Hal Varian, Chief Economist, Google)
  • 22. Big Data in the Enterprise
  • 26. Why Now? Big Data Use-cases – User Behavior
    • Location & Proximity Tracking
      • GPS in operational apps, security analysis, navigation & social media
      • New business opportunities for sales and services in proximity
    • Ad Tracking
      • Dynamic changes in ad placement, color, size and wording
      • Improved click-through behavior
    • Social CRM
      • Text analytics on a huge array of unstructured social media
      • KPIs: share of voice, audience engagement, conversation reach, …
    • Causal Factor Discovery in Retail
      • Deviations based on competition, weather, promos, holidays, events
  • 30. Why Now? “Hadoop-able” Use-cases – Sensors
    • Building Sensors
      • Temperature, humidity, vibration and noise
      • Energy usage, security violations, failures in A/C, heating, plumbing
    • In-flight Aircraft Sensors
      • Variables on engines, hydraulics, fuel & electrical systems
      • Real-time adaptive control, fuel usage, part-failure prediction
    • Smart Utility Meters – Electric Grid
      • One reading per second per meter across the entire customer base
      • Dynamic load balancing on the grid, failure response, adaptive pricing
    • Mobile Cell Tower Networks
      • Analyze call detail records (CDRs) to optimize cell tower placement
      • Improved user experience and network monetization
  • 34. “Hadoop-able” Use-cases – Computing Deltas
    • Commercial Seed Gene Sequencing
      • Analyzing the sequence, identifying genes and gene families
      • Baseline reference for the larger cotton crop genome
    • Satellite Image Comparison
      • Overlay of images to create “hot spot” maps that show differences
      • Construction, destruction, changes due to disasters, encroachment
    • CAT Scan Comparison
      • Images taken as “slices” of the human body
      • Automatic diagnosis of medical issues and their prevalence
    • Document Similarity Testing
      • Latent semantic analysis: “documents that agree with my doc”
      • Threat discovery, sentiment analysis and opinion polls
  • 36. Big Data: Confluence of Big Transaction, Big Interaction and Big Data Processing
    • Big Transaction Data: Online Transaction Processing (OLTP); Online Analytical Processing (OLAP) & DW Appliances
    • Big Interaction Data: Social Media Data; Device Sensor Data; call detail records, image, clickstream data; scientific, genomic; machine/device
    • Big Data Processing
  • 37. Big Transaction Data: OLTP and Analytic Databases
    • Online Transaction Processing (OLTP): Oracle, DB2, Ingres, Informix, Sybase, SQLServer
    • Online Analytical Processing (OLAP) & DW Appliances: Teradata, Redbrick, Britton-Lee, EssBase, Sybase IQ, Netezza, Greenplum, DataAllegro, Asterdata, Vertica, Paraccel, Hana
  • 38. Big Transaction Data: Changing Economics of Computing, From Buy to Rent
    • (Diagram: mainframe, CRM, HR and custom applications)
  • 39. Big Interaction Data: Changing Role of Computing, From Transactions to Interactions
    • Social Media Data: social media, clickstream, image/text
    • Device Sensor Data
      • Scientific: genomic/pharma, medical
      • Machine/device: sensors/meters/RFID tags, CDR/mobile
  • 40. Big Interaction Data: From Operational Efficiency to Organizational Effectiveness
    • Business Management (relational transactions, 1970–current)
      • Business Analysis
      • Operational Automation
    • Brand Management (social interactions, 2008–current)
      • Sentiment Analysis
      • Proactive Customer Engagement
  • 41. Big Interaction Data: How Do You Leverage Device Sensor Data?
    • Geo Encoding
    • Cell-phone Towers
    • Medical Sensors
    • RFID Tags
    • Edge Networks
  • 42. Big Data Processing: Highly Scalable Processing of All Data
    • (Same diagram as slide 36)
  • 43. Big Data Processing: What Is Hadoop?
    • (Diagram layers: Scripting, SQL Query, Parallel, Persistence)
  • 44. Big Data Processing: What Does Hadoop Do?
    • Cost-effective scalability
      • Scale out on commodity hardware
    • Support for processing all data types
      • Structured, semi-structured and unstructured data
    • Extensibility
      • Open APIs to implement custom data processing logic
    • Hadoop challenges
      • Data movement into/out of Hadoop / HDFS
      • Requires specialized development skills: Java, Hive, Pig, etc.
  • 45. Ingest Data into HDFS
    • Support for over 100 different data sources
    • Integrated development environment with metadata and preview support
    • Perform any pre-processing needed before ingestion
    • Native HDFS source and target support
  • 46. Design and Execute Data Integration Logic on Hadoop
    • Design integration logic for Hadoop in a graphical, metadata-driven environment
    • Configure where the integration logic should run: Hadoop or native
  • 47. Design and Execute Data Quality on Hadoop: Big Data Cleansing, Dedup, Unstructured Parsing
    • Probabilistic or deterministic matching
    • Address validation and geocoding enrichment across 260 countries
    • Standardization and reference data management
    • Parsing of unstructured data/text fields of all data types (customer / product / social / logs)
    • DQ logic pushed down / run natively on Hadoop
    • (Diagram steps: Address Validation, Matching, Standardize, Parsing)
  • 48. Extract Data from HDFS and Hive
    • Extract from HDFS as a native source
    • Extract from Hive as a native source
    • Perform any post-processing needed after extraction
    • Persist and write Hadoop data into DW, HDFS or any target systems
  • 49. Processing Big Data: What Is Missing?
    • Support for graph/networked data
      • How does one visualize complex relationships?
    • Data with dynamic schemas
      • Do the current patterns scale to a very large number of columns?
    • Are mappings the right paradigm?
    • Ability to extract entities from unstructured data
  • 50. References
    • “Why Software Is Eating the World”, Marc Andreessen, WSJ, Aug 2011
    • “Evolving Role of EDW in Era of Big Data Analytics”, Ralph Kimball, Kimball Group, 2011
    • “Data Scientist: The Sexiest Job of the 21st Century”, Thomas H. Davenport & D.J. Patil, HBR, Sept 2012
    • “Newly Emerging Best Practices for Big Data”, Ralph Kimball, Kimball Group, Oct 2012
  • 51. Questions
  • 52. Informatica & Data: Verbs on Data – We do things to data!
    • INFA = Data + [ Archival | As a Service | Cleansing | Clustering | Consolidation | Conversion | De-duping | Exchange | Extraction | Federation | Hub | Identity | Integration | Life-cycle Management | Loading | Masking | Mastering | Matching | Migration | On Demand | Privacy | Profiling | Provisioning | Quality | Quality Assessment | Registry | Replication | Retirement | Services | Stewardship | Sub-setting | Synchronization | Test Management | Transformation | Validation | Virtualization | Warehousing ]
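
Slide 11 (“Trends: Exploding Data Volumes”) uses decimal (SI) prefixes, where each step from kilo to yotta is a factor of 1,000. The following minimal Python sketch, assuming decimal rather than binary (1,024-based) multiples, puts the slide’s figures on one scale and computes the growth implied between 800K petabytes and 1.2 zettabytes:

```python
# Decimal (SI) byte prefixes; each step up the ladder is a factor of 1,000.
# Assumption: decimal multiples, not binary (1,024-based) ones.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
BYTES_PER_UNIT = {name: 1000 ** (i + 1) for i, name in enumerate(PREFIXES)}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount between two decimal byte units."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


def growth_pct(old: float, new: float) -> float:
    """Percentage growth from old to new."""
    return (new - old) / old * 100


if __name__ == "__main__":
    universe_zb = convert(800_000, "peta", "zetta")  # 800K PB expressed in ZB
    print(f"800K PB = {universe_zb:.1f} ZB")
    print(f"2,500 EB = {convert(2_500, 'exa', 'zetta'):.1f} ZB")
    print(f"0.8 ZB -> 1.2 ZB is {growth_pct(universe_zb, 1.2):.0f}% growth")
```

Run as a script, it reports 800K PB as 0.8 ZB, 2,500 EB as 2.5 ZB, and the step from 0.8 ZB to 1.2 ZB as 50% growth.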
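
The Return on Byte formula on slide 20 is a plain ratio. The following sketch computes it for two hypothetical datasets. The names and figures are invented, and treating ROB ≥ 1 as “high” is an assumption made only to mirror the slide’s High ROB / Low ROB split. Because value and cost are totals over the same bytes, the ratio equals per-byte value over per-byte cost:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    extracted_value: float  # value expected from the data (hypothetical units, e.g. USD)
    storage_cost: float     # cost of storing the same data (same units)


def return_on_byte(ds: Dataset) -> float:
    """ROB = value extracted from the data / cost of storing it."""
    if ds.storage_cost <= 0:
        raise ValueError("storage cost must be positive")
    return ds.extracted_value / ds.storage_cost


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    datasets = [
        Dataset("clickstream", extracted_value=50_000, storage_cost=10_000),
        Dataset("old log archive", extracted_value=2_000, storage_cost=8_000),
    ]
    for ds in datasets:
        rob = return_on_byte(ds)
        # Assumed cut-off: ROB >= 1 means the data earns more than it costs to keep.
        label = "High ROB" if rob >= 1 else "Low ROB"
        print(f"{ds.name}: ROB = {rob:.2f} ({label})")
```

With these made-up numbers the clickstream scores 5.00 (High ROB) and the log archive 0.25 (Low ROB).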
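
Slide 21 lists product recommendation engines as one way to turn data into new products. The simplest form of the idea counts which products appear together in past orders. The following is a minimal co-occurrence sketch over hypothetical order data, not a description of any particular vendor’s engine:

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(orders: list[set[str]]) -> dict[str, Counter]:
    """Count, for each product, how often every other product shares an order with it."""
    co: dict[str, Counter] = {}
    for order in orders:
        # Sort so pair generation is deterministic regardless of set ordering.
        for a, b in combinations(sorted(order), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(co: dict[str, Counter], product: str, k: int = 2) -> list[str]:
    """Return up to k products most often bought together with `product`."""
    return [item for item, _ in co.get(product, Counter()).most_common(k)]


if __name__ == "__main__":
    # Hypothetical order history.
    orders = [
        {"phone", "case", "charger"},
        {"phone", "case"},
        {"phone", "headphones"},
        {"laptop", "mouse"},
    ]
    co = build_cooccurrence(orders)
    print(recommend(co, "phone", k=1))
```

Here “case” is recommended for “phone” because it shares the most orders with it. Production engines typically normalize these counts (for example, by product popularity) rather than using raw co-occurrence.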
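
Two of the metrics behind slide 26 (“Big Data Use-cases – User Behavior”) are plain ratios. Click-through rate is clicks divided by impressions. Share of voice is commonly defined as a brand’s mentions divided by all mentions in its category. The following sketch computes both from hypothetical counts; the ad variants, brands and numbers are invented:

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = clicks / impressions (0.0 when there were no impressions)."""
    return clicks / impressions if impressions else 0.0


def share_of_voice(mentions: dict[str, int], brand: str) -> float:
    """Share of voice = brand mentions / total mentions across all tracked brands."""
    total = sum(mentions.values())
    return mentions.get(brand, 0) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical ad variants differing in placement and color: (clicks, impressions).
    variants = {"A (top, blue)": (120, 10_000), "B (side, red)": (45, 9_000)}
    for name, (clicks, shown) in variants.items():
        print(f"variant {name}: CTR = {click_through_rate(clicks, shown):.2%}")
    best = max(variants, key=lambda v: click_through_rate(*variants[v]))
    print("serve more of:", best)

    # Hypothetical social-media mention counts for one product category.
    mentions = {"our_brand": 300, "competitor_x": 500, "competitor_y": 200}
    print(f"share of voice: {share_of_voice(mentions, 'our_brand'):.0%}")
```

With these numbers variant A (1.20% CTR) beats variant B (0.50%), and the brand holds a 30% share of voice.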
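
Slide 30 (“Hadoop-able” sensor use-cases) implies large volumes. At one reading per second, each meter produces 86,400 readings a day, and CDRs must be aggregated per tower before placement decisions can be made. The following is a minimal sketch of both. It assumes a hypothetical fleet of one million meters and a simplified, hypothetical CDR holding only a tower ID and a dropped-call flag; real CDRs carry many more fields:

```python
from collections import defaultdict
from typing import NamedTuple

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def meter_readings_per_day(num_meters: int, readings_per_second: int = 1) -> int:
    """Readings generated per day by a fleet of smart meters."""
    return num_meters * readings_per_second * SECONDS_PER_DAY


class CallRecord(NamedTuple):
    """Hypothetical, simplified call detail record."""
    tower_id: str
    dropped: bool


def dropped_call_rate_by_tower(records: list[CallRecord]) -> dict[str, float]:
    """Fraction of calls dropped on each tower."""
    calls: dict[str, int] = defaultdict(int)
    drops: dict[str, int] = defaultdict(int)
    for rec in records:
        calls[rec.tower_id] += 1
        drops[rec.tower_id] += rec.dropped  # a bool adds as 0 or 1
    return {tower: drops[tower] / calls[tower] for tower in calls}


if __name__ == "__main__":
    # Hypothetical fleet size of one million meters.
    print(f"{meter_readings_per_day(1_000_000):,} readings per day")

    cdrs = [
        CallRecord("T1", False), CallRecord("T1", True),
        CallRecord("T2", False), CallRecord("T2", False),
    ]
    for tower, rate in sorted(dropped_call_rate_by_tower(cdrs).items()):
        print(f"tower {tower}: {rate:.0%} dropped")
```

One million meters at one reading per second already yield 86,400,000,000 readings per day, which is why the slide groups this workload with Hadoop.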
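
The “Computing Deltas” use-cases on slide 34 come down to comparing two versions of something. The following sketch shows two toy instances on hypothetical inputs. The first takes a pixel-wise difference between two grayscale images and thresholds it into a hot-spot mask. The second computes cosine similarity between the word-count vectors of two documents. Latent semantic analysis, as named on the slide, would first project the term vectors into a lower-dimensional space with a truncated SVD. That step is omitted here, so this is plain bag-of-words similarity:

```python
import math
from collections import Counter

Image = list[list[int]]  # grayscale pixel values, 0-255


def hot_spots(before: Image, after: Image, threshold: int = 50) -> list[list[bool]]:
    """Mark pixels whose brightness changed by more than `threshold`."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between bag-of-words count vectors (no SVD step)."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    before = [[10, 10, 10], [10, 10, 10]]
    after = [[10, 200, 10], [10, 10, 90]]  # hypothetical: two regions changed
    for row in hot_spots(before, after):
        print("".join("#" if changed else "." for changed in row))

    print(cosine_similarity("big data on hadoop", "hadoop for big data"))
```

The two example documents share three of their four words, so their similarity is 0.75.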
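
Slides 43 and 44 describe Hadoop’s parallel processing layer. Its classic programming model, MapReduce, has three stages. A map phase emits key/value pairs. A shuffle groups the values by key. A reduce phase then aggregates each group. The following sketch imitates that flow in a single Python process with the well-known word-count example. It illustrates the programming model only and does not use Hadoop’s Java API or HDFS:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a cluster, input splits would be mapped in parallel on different nodes.
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data deluge"]))
    # {'big': 2, 'data': 2, 'deluge': 1}
```

Because mappers and reducers only see their own input split or key group, the framework can spread them across commodity machines. This is the “scale out” property listed on slide 44.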
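
Slide 47 lists standardization and deterministic matching among its data quality steps. The following sketch illustrates both on customer records. It assumes a hypothetical record layout (name, street, city) and a tiny, made-up abbreviation table. Real data quality tools use much richer reference data. Probabilistic matching would score fuzzy similarity instead of requiring the standardized keys to be identical:

```python
import re
from typing import NamedTuple

# Hypothetical reference data for standardizing street suffixes.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


class Customer(NamedTuple):
    name: str
    street: str
    city: str


def standardize(text: str) -> str:
    """Lower-case, strip punctuation, collapse whitespace and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def match_key(c: Customer) -> tuple[str, str, str]:
    """Deterministic match key: records with identical standardized fields are duplicates."""
    return standardize(c.name), standardize(c.street), standardize(c.city)


def dedup(customers: list[Customer]) -> list[Customer]:
    """Keep the first record seen for each match key."""
    seen: set[tuple[str, str, str]] = set()
    survivors = []
    for c in customers:
        key = match_key(c)
        if key not in seen:
            seen.add(key)
            survivors.append(c)
    return survivors


if __name__ == "__main__":
    records = [
        Customer("Asha Rao", "12 MG Rd.", "Bangalore"),
        Customer("asha  rao", "12 mg road", "BANGALORE"),
        Customer("Ravi Iyer", "4 Park St", "Pune"),
    ]
    for c in dedup(records):
        print(c)
```

The first two records collapse into one because “12 MG Rd.” and “12 mg road” both standardize to “12 mg road”.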