Hadoop India Summit, Feb 2011 - Informatica

Lightning talk by Sanjeev Kumar of Informatica India, presented at the Hadoop India Summit on Feb 16, 2011.

  • Map/Reduce implementation
    • Apache open source project, dominated by Yahoo
    • Two major components
      • HDFS: failure-resilient distributed file system
      • Map/Reduce: failure-resilient distributed computing framework
    • Scales to clusters of a thousand+ nodes
    • Used by Yahoo, Facebook etc.
  • This is the Informatica Corporate Presentation

Transcript

  • 1. Informatica & Big Data
    • Sanjeev Kumar, VP & MD, Informatica India
    • Apache Hadoop India Summit 2011
  • 2. Agenda
    • Big Data
    • Big Data in Enterprise
    • Informatica & Data
    • Informatica & Big Data
  • 3. Why “Big Data” Now? : Exploding Data Volumes
    • 2,500 exabytes of new information in 2012 with Internet as primary driver
    • Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year
    • Chart labels: Relational; Complex, Unstructured
    • Source: IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
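The figures on this slide mix petabytes and zettabytes, so a quick unit check helps when comparing them. The sketch below assumes decimal (SI) units, which the slide does not state, and simply derives what the quoted numbers imply. The variable names are illustrative only.

```python
# Unit check for the slide's figures, assuming decimal (SI) prefixes:
# 1 zettabyte = 1,000 exabytes = 1,000,000 petabytes.
PB_PER_ZB = 1_000_000

current_pb = 800_000       # "800K petabytes" quoted on the slide
projected_pb = 1_200_000   # "1.2 zettabytes" quoted on the slide, in PB
growth_last_year = 0.62    # "grew by 62% last year"

# Current size expressed in zettabytes.
current_zb = current_pb / PB_PER_ZB

# Growth rate implied by going from 800K PB to 1.2 ZB.
implied_growth = (projected_pb - current_pb) / current_pb

# Size one year earlier implied by 62% growth to 800K PB.
implied_prior_pb = current_pb / (1 + growth_last_year)

print(f"current size: {current_zb} ZB")
print(f"implied growth this year: {implied_growth:.0%}")
print(f"implied size a year earlier: {implied_prior_pb:,.0f} PB")
```

Under that assumption, 800K petabytes is 0.8 zettabytes, so the projection to 1.2 zettabytes corresponds to 50% growth.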
  • 4. Why Now? Exploding Data Volumes
    • Explosion in user-generated content
      • e.g. Blogs, Twitter, Facebook etc.
    • Proliferation of web-connected devices
      • Smartphone interactions with the web
    • Increased consumption of digital content
      • Netflix, HULU, Pandora etc.
    • Internet of things
      • Smart-grid and smart-meters
      • Machine-generated data via the web
  • 5. Why Now? : New Apps/Use-cases
    • Analyze customer/market sentiment
      • Text analytics on Social Media, blogs
    • Achieve Operational Efficiency
      • e.g. Analyze CDRs to optimize cell tower placements
    • Make Recommendations
      • Data mining on click-stream, purchase history
    • Predict the future
      • e.g. Flightcast predicts flight delays
  • 6. Big Data Challenges
    • Storage
      • Cost-effective Scalability: to multi-terabytes and petabytes
      • Non-traditional data models: complex, semi-structured data
    • Processing
      • Data mining, collaborative filtering for structured data
      • Text Analytics, classification etc. for unstructured data
    • Regulatory Compliance
      • Data Privacy / Masking
      • Data Archival
  • 7. Addressing Big Data Challenges
    • Storage
      • Parallel Databases
        • Greenplum(EMC), Vertica, AsterData
      • Distributed Key/Value Stores
        • HBase, Google’s BigTable, Amazon’s SimpleDB
      • Distributed File Systems
        • HDFS, GFS, ParAccel
    • Analytics
      • SQL with extensions
      • Map Reduce
      • Dataflow languages: Pig, Sawzall etc.
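As an illustration of the Map Reduce model listed under Analytics above, here is a minimal single-process sketch in Python. It is not the Hadoop API: `map_reduce`, `tower_mapper`, `count_reducer`, and the record fields `caller` and `tower_id` are hypothetical names chosen for this example, which loosely follows the CDR use case from slide 5 by counting calls per cell tower.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run a map phase, group values by key, then run a reduce phase.

    Everything happens in one process here; a real framework would
    distribute these phases across a cluster.
    """
    groups = defaultdict(list)
    # Map phase: each record may emit any number of (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: collapse all values that share a key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

def tower_mapper(cdr):
    # Emit one count per call, keyed by the (hypothetical) tower_id field.
    yield cdr["tower_id"], 1

def count_reducer(key, values):
    # Sum the per-call counts for one tower.
    return sum(values)

# Made-up call detail records, for illustration only.
cdrs = [
    {"caller": "A", "tower_id": "T1"},
    {"caller": "B", "tower_id": "T2"},
    {"caller": "C", "tower_id": "T1"},
]

calls_per_tower = map_reduce(cdrs, tower_mapper, count_reducer)
print(calls_per_tower)
```

The grouping step between the two phases is what a distributed framework performs as its shuffle; keeping the mapper and reducer as plain functions mirrors how the two user-supplied pieces stay independent of that machinery.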
  • 8. Hadoop Technology Stack
    • HDFS
    • HBase
    • Map/Reduce
    • Pig
    • Hive
    • Cascading
  • 9. Hadoop Momentum
    • Charts: Search Volume Index; News Reference Volume; Job Trends from Indeed.com
  • 10. Big Data in the Enterprise – Hadoop Usage
  • 11. Big Data in the Enterprise Case Studies: Hadoop World 2009
    • Yahoo!: Social Graph Analysis
    • VISA : Large Scale Transaction Analysis
    • China Mobile : Data Mining Platform for Telecom Industry
    • JP Morgan Chase : Data Processing for Financial Services
    • eHarmony : Matchmaking in the Hadoop Cloud
    • Rackspace: Cross Data Center Log Processing
    • Visible Technologies: Real-Time Business Intelligence
    • Booz Allen Hamilton : Protein Alignment using Hadoop
    • Slides and Videos at http://www.cloudera.com/hadoop-world-nyc
  • 12. Big Data in the Enterprise Case Studies: Hadoop World 2010
    • eBay: Hadoop at eBay
    • Twitter: The Hadoop Ecosystem at Twitter
    • General Electric : Sentiment Analysis powered by Hadoop
    • Yale University : MapReduce and Parallel Database Systems
    • AOL: AOL’s Data Layer
    • Facebook: HBase in Production
    • Bank of America: The Business of Big Data
    • StumbleUpon: Mixing Real-Time and Batch Processing
    • Raytheon : SHARD: Storing and Querying Large-Scale Data
    • More info at http://www.cloudera.com/company/press-center/hadoop-world-nyc/
  • 13. Agenda
    • Big Data
    • Big Data in Enterprise
    • Informatica & Data
    • Informatica & Big Data
  • 14. Informatica – Our Singular Mission: Enabling The Information Economy
    • We enable organizations to gain a competitive advantage from all their information assets to drive their top business imperatives
  • 15. Informatica – What We Do
    • Comprehensive, Unified, Open and Economical platform
    • Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
    • Application, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured, Database
  • 16. Informatica & Data: Verbs on Data – We do things to data!
    • INFA = Data + [
      • Archival | As a Service | Cleansing | Clustering | Consolidation |
      • Conversion | De-duping | Exchange | Extraction | Federation |
      • Hub | Identity | Integration | Life-cycle Management |
      • Loading | Masking | Mastering | Matching | Migration | On Demand |
      • Privacy | Profiling | Provisioning | Quality | Quality Assessment |
      • Registry | Replication | Retirement | Services | Stewardship |
      • Sub-setting | Synchronization | Test Management | Transformation |
      • Validation | Virtualization | Warehousing |
    • ]
  • 17. Informatica & Big Data
    • HDFS as a source and a target - Enable universal data connectivity for Hadoop developers
    • Enable Hadoop developers to leverage prebuilt Data Transformation and Data Quality logic
    • Lower the barrier to Hadoop-entry by using Informatica Developer as a development tool
    • Support virtualized access to data split across HDFS and (relational) data-warehouses
  • 18. Informatica & Hadoop – Big Picture
    • Hadoop Cluster: HDFS Data Node, HDFS Name Node, HDFS Job Tracker
    • Transformation Engine for custom data processing
    • Weblogs, Enterprise Applications, Databases, Semi-structured, Un-structured
    • BI, DW/DM
    • Metadata Repository
    • Graphical IDE for Hadoop Development
    • Enterprise Connectivity for Hadoop programs