Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Sanjeev Kumar
 

  • Map/Reduce implementation
    Apache Open Source Project: Yahoo dominated
    Two major components
    HDFS: Failure Resilient Distributed File System
    Map/Reduce: Failure Resilient Distributed Computing Framework
    Scales to thousand+ node clusters
    Used by Yahoo, Facebook etc.
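
The Map/Reduce model named in the note above can be illustrated with a minimal, single-process sketch. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and the sketch does not model distribution across nodes or failure resilience. It only shows the three logical steps: map each record to key/value pairs, group values by key, then reduce each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map and reduce functions would run on many nodes over blocks stored in HDFS; here everything runs in one process so the data flow is easy to follow.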

Presentation Transcript

  • Informatica & Big Data
    Sanjeev Kumar
    VP & MD, Informatica India
    Apache Hadoop India Summit 2011
  • Agenda
    Big Data
    Big Data in Enterprise
    Informatica & Data
    Informatica & Big Data
  • Why “Big Data” Now? : Exploding Data Volumes
    Complex, Unstructured
    Relational
    • 2,500 exabytes of new information in 2012 with Internet as primary driver
    • Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year
    Source: An IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
  • Why Now? Exploding Data Volumes
    Explosion in user-generated content
    e.g. Blogs, Twitter, Facebook etc.
    Proliferation of web-connected devices
    Smartphone interactions with the web
    Increased consumption of digital content
    Netflix, HULU, Pandora etc.
    Internet of things
    Smart-grid and smart-meters
    Machine-generated data via the web
  • Why Now? : New Apps/Use-cases
    Analyze customer/market sentiment
    Text analytics on Social Media, blogs
    Achieve Operational Efficiency
    e.g. Analyze CDRs to optimize cell tower placements
    Make Recommendations
    Data mining on click-stream, purchase history
    Predict the future
    e.g. Flightcast predicts flight delays
  • Big Data Challenges
    Storage
    Cost-effective Scalability: to multi-terabytes and petabytes
    Non-traditional data models: complex, semi-structured data
    Processing
    Data mining, collaborative filtering for structured data
    Text Analytics, classification etc. for unstructured data
    Regulatory Compliance
    Data Privacy / Masking
    Data Archival
  • Addressing Big Data Challenges
    Storage
    Parallel Databases
    Greenplum(EMC), Vertica, AsterData
    Distributed Key/Value Stores
    HBase, Google’s BigTable, Amazon’s SimpleDB
    Distributed File Systems
    HDFS, GFS, ParAccel
    Analytics
    SQL with extensions
    Map Reduce
    Dataflow Languages: Pig, Sawzall etc.
  • Hadoop Technology Stack
    Pig
    Hive
    Cascading
    ZooKeeper
    Map/Reduce
    HBase
    HDFS
  • Hadoop Momentum
    Job Trends from Indeed.com
    Search Volume Index
    News Reference Volume
  • Big Data in the Enterprise – Hadoop Usage
  • Big Data in the Enterprise – Case Studies: Hadoop World 2009
    Yahoo!: Social Graph Analysis
    VISA: Large Scale Transaction Analysis
    China Mobile: Data Mining Platform for Telecom Industry
    JP Morgan Chase: Data Processing for Financial Services
    eHarmony: Matchmaking in the Hadoop Cloud
    Rackspace: Cross Data Center Log Processing
    Visible Technologies: Real-Time Business Intelligence
    Booz Allen Hamilton: Protein Alignment using Hadoop
    Slides and Videos at http://www.cloudera.com/hadoop-world-nyc
  • Big Data in the Enterprise – Case Studies: Hadoop World 2010
    eBay: Hadoop at eBay
    Twitter: The Hadoop Ecosystem at Twitter
    General Electric: Sentiment Analysis powered by Hadoop
    Yale University: MapReduce and Parallel Database Systems
    AOL: AOL’s Data Layer
    Facebook: HBase in Production
    Bank of America: The Business of Big Data
    StumbleUpon: Mixing Real-Time and Batch Processing
    Raytheon: SHARD: Storing and Querying Large-Scale Data
    More info at http://www.cloudera.com/company/press-center/hadoop-world-nyc/
  • Agenda
    Big Data
    Big Data in Enterprise
    Informatica & Data
    Informatica & Big Data
  • Informatica – Our Singular Mission: Enabling the Information Economy
    We enable organizations to gain a competitive advantage from all their information assets to drive their top business imperatives
  • Informatica – What We Do: Comprehensive, Unified, Open and Economical platform
    Application
    Partner Data
    SWIFT
    NACHA
    HIPAA
    Cloud Computing
    Unstructured
    Database
    Complex Event Processing
    Data Warehouse
    Data Migration
    Test Data Management & Archiving
    Master Data Management
    Data Synchronization
    B2B Data Exchange
    Data Consolidation
    UltraMessaging
  • Informatica & Data
    Verbs on Data – We do things to data!
    INFA = Data + [
    Archival | As a Service | Cleansing | Clustering | Consolidation |
    Conversion | De-duping | Exchange | Extraction | Federation |
    Hub | Identity | Integration | Life-cycle Management |
    Loading | Masking | Mastering | Matching | Migration | On Demand |
    Privacy | Profiling | Provisioning | Quality | Quality Assessment |
    Registry | Replication | Retirement | Services | Stewardship |
    Sub-setting | Synchronization | Test Management | Transformation |
    Validation | Virtualization | Warehousing
    ]
  • Informatica & Big Data
    HDFS as a source and a target - Enable universal data connectivity for Hadoop developers
    Enable Hadoop developers to leverage prebuilt Data Transformation and Data Quality logic
    Lower the barrier to Hadoop entry by using Informatica Developer as a development tool
    Support virtualized access to data split across HDFS and (relational) data-warehouses
  • Informatica & Hadoop – Big Picture
    Enterprise Connectivity for Hadoop programs
    Weblogs
    Databases
    BI
    DW/DM
    Metadata Repository
    Graphical IDE for Hadoop Development
    Semi-structured
    Un-structured
    Enterprise Applications
    Transformation Engine for custom data processing
    Hadoop Cluster
    HDFS
    Job Tracker
    Name Node
    Data Node