Apache Hadoop India Summit 2011 talk "Informatica and Big Data" by Sanjeev Kumar
  • Speaker note (Hadoop): Map/Reduce implementation. Apache open source project, Yahoo dominated. Two major components: HDFS, a failure-resilient distributed file system, and Map/Reduce, a failure-resilient distributed computing framework. Scales to thousand+ node clusters. Used by Yahoo, Facebook, etc.
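The Map/Reduce model named in the note can be illustrated with a small in-process sketch. This is not Hadoop's API and runs on a single machine; it only shows the map, shuffle, and reduce phases. The record layout, the sample data, and the names `map_cdr`, `reduce_tower`, and `run_map_reduce` are hypothetical, and the call-detail-record scenario is borrowed from the use case listed on the "New Apps/Use-cases" slide.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller_id, tower_id, duration_seconds).
CDRS = [
    ("a", "tower-1", 60),
    ("b", "tower-2", 30),
    ("c", "tower-1", 90),
    ("d", "tower-3", 15),
    ("e", "tower-2", 45),
]


def map_cdr(record):
    """Map step: emit one (tower_id, duration) pair per record."""
    _caller, tower, duration = record
    yield tower, duration


def reduce_tower(tower, durations):
    """Reduce step: total the durations collected for one tower."""
    return tower, sum(durations)


def run_map_reduce(records, mapper, reducer):
    """Run mapper and reducer in-process, with a shuffle step in between."""
    # Shuffle: group every mapped value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce each key independently; keys are sorted for stable output.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


TOWER_LOAD = run_map_reduce(CDRS, map_cdr, reduce_tower)
print(TOWER_LOAD)
```

In a real cluster the map and reduce calls would run on different nodes and the framework would rerun failed tasks; here the grouping dictionary stands in for that shuffle.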
  • Transcript

    • 1. Informatica & Big Data
      • Sanjeev Kumar, VP & MD, Informatica India
      • Apache Hadoop India Summit 2011
    • 2. Agenda
      • Big Data
      • Big Data in Enterprise
      • Informatica & Data
      • Informatica & Big Data
    • 3. Why “Big Data” Now? : Exploding Data Volumes
      • Chart labels: Complex, Unstructured; Relational
      • 2,500 exabytes of new information in 2012 with Internet as primary driver
    • 4. Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year
      • Source: An IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
    • 5. Why Now? Exploding Data Volumes
      • Explosion in user-generated content
        • e.g. Blogs, Twitter, Facebook etc.
      • Proliferation of web-connected devices
        • Smartphone interactions with the web
      • Increased consumption of digital content
        • Netflix, HULU, Pandora etc.
      • Internet of things
        • Smart-grid and smart-meters
        • Machine-generated data via the web
    • 6. Why Now? : New Apps/Use-cases
      • Analyze customer/market sentiment
        • Text analytics on Social Media, blogs
      • Achieve Operational Efficiency
        • e.g. Analyze CDRs to optimize cell tower placements
      • Make Recommendations
        • Data mining on click-stream, purchase history
      • Predict the future
        • e.g. Flightcast predicts flight delays
    • 7. Big Data Challenges
      • Storage
        • Cost-effective Scalability: to multi-terabytes and petabytes
        • Non-traditional data models: complex, semi-structured data
      • Processing
        • Data mining, collaborative filtering for structured data
        • Text Analytics, classification etc. for unstructured data
      • Regulatory Compliance
        • Data Privacy / Masking
        • Data Archival
    • 8. Addressing Big Data Challenges
      • Storage
        • Parallel Databases: Greenplum (EMC), Vertica, AsterData
        • Distributed Key/Value Stores: HBase, Google’s BigTable, Amazon’s SimpleDB
        • Distributed File Systems: HDFS, GFS, ParAccel
      • Analytics
        • SQL with extensions
        • Map Reduce
        • DataFlow Languages: Pig, Sawzall etc.
    • 9. Hadoop Technology Stack
      • Diagram labels: Pig, Hive, Cascading, ZooKeeper, Map/Reduce, HBase, HDFS
    • 10. Hadoop Momentum
      • Chart labels: Job Trends from Indeed.com; Search Volume Index; News Reference Volume
    • 11. Big Data in the Enterprise – Hadoop Usage
    • 12. Big Data in the Enterprise – Case Studies: Hadoop World 2009
      • Yahoo!: Social Graph Analysis
      • VISA: Large Scale Transaction Analysis
      • China Mobile: Data Mining Platform for Telecom Industry
      • JP Morgan Chase: Data Processing for Financial Services
      • eHarmony: Matchmaking in the Hadoop Cloud
      • Rackspace: Cross Data Center Log Processing
      • Visible Technologies: Real-Time Business Intelligence
      • Booz Allen Hamilton: Protein Alignment using Hadoop
      • Slides and Videos at http://www.cloudera.com/hadoop-world-nyc
    • 13. Big Data in the Enterprise – Case Studies: Hadoop World 2010
      • eBay: Hadoop at eBay
      • Twitter: The Hadoop Ecosystem at Twitter
      • General Electric: Sentiment Analysis powered by Hadoop
      • Yale University: MapReduce and Parallel Database Systems
      • AOL: AOL’s Data Layer
      • Facebook: HBase in Production
      • Bank of America: The Business of Big Data
      • StumbleUpon: Mixing Real-Time and Batch Processing
      • Raytheon: SHARD: Storing and Querying Large-Scale Data
      • More info at http://www.cloudera.com/company/press-center/hadoop-world-nyc/
    • 14. Agenda
      • Big Data
      • Big Data in Enterprise
      • Informatica & Data
      • Informatica & Big Data
    • 15. Informatica – Our Singular Mission: Enabling the Information Economy
      • We enable organizations to gain a competitive advantage from all their information assets to drive their top business imperatives
    • 16. Informatica – What We Do: Comprehensive, Unified, Open and Economical platform
      • Diagram labels: Application; Partner Data; SWIFT; NACHA; HIPAA; …; Cloud Computing; Unstructured; Database; Complex Event Processing; Data Warehouse; Data Migration; Test Data Management & Archiving; Master Data Management; Data Synchronization; B2B Data Exchange; Data Consolidation; UltraMessaging
    • 17. Informatica & Data
      • Verbs on Data – We do things to data!
      • INFA = Data + [ Archival | As a Service | Cleansing | Clustering | Consolidation | Conversion | De-duping | Exchange | Extraction | Federation | Hub | Identity | Integration | Life-cycle Management | Loading | Masking | Mastering | Matching | Migration | On Demand | Privacy | Profiling | Provisioning | Quality | Quality Assessment | Registry | Replication | Retirement | Services | Stewardship | Sub-setting | Synchronization | Test Management | Transformation | Validation | Virtualization | Warehousing ]
    • 18. Informatica & Big Data
      • HDFS as a source and a target - Enable universal data connectivity for Hadoop developers
      • Enable Hadoop developers to leverage prebuilt Data Transformation and Data Quality logic
      • Lower the barrier to Hadoop-entry by using Informatica Developer as a development tool
      • Support virtualized access to data split across HDFS and (relational) data-warehouses
    • 19. Informatica & Hadoop – Big Picture
      • Diagram labels: Enterprise Connectivity for Hadoop programs; Weblogs; Databases; BI; DW/DM; Metadata Repository; Graphical IDE for Hadoop Development; Semi-structured; Un-structured; Enterprise Applications; Transformation Engine for custom data processing
      • Hadoop Cluster: HDFS, Job Tracker, Name Node, Data Node
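
The data-volume figures in transcript items 3 and 4 mix petabytes, exabytes, and zettabytes, which makes them hard to compare at a glance. The sketch below converts them to a common unit. It assumes decimal (SI) prefixes, which the slides do not state, and all variable names are illustrative.

```python
# Assumption: decimal (SI) prefixes, so
# 1 zettabyte = 1,000 exabytes = 1,000,000 petabytes.
PB_PER_EB = 1_000
PB_PER_ZB = 1_000_000

last_year_pb = 800_000                # "800K petabytes" (item 4)
this_year_pb = 12 * PB_PER_ZB // 10   # "1.2 zettabytes", kept as integer petabytes
new_info_2012_pb = 2_500 * PB_PER_EB  # "2,500 exabytes" (item 3)

# Growth implied by going from 800K PB to 1.2 ZB, as a whole percentage.
implied_growth_pct = (this_year_pb - last_year_pb) * 100 // last_year_pb

# Size one year earlier, implied by "grew by 62% ... to 800K petabytes".
prior_year_pb = round(last_year_pb / 1.62)

print(f"last year:        {last_year_pb / PB_PER_ZB} ZB")
print(f"this year:        {this_year_pb / PB_PER_ZB} ZB")
print(f"implied growth:   {implied_growth_pct}%")
print(f"year before last: about {prior_year_pb:,} PB")
print(f"new info in 2012: {new_info_2012_pb / PB_PER_ZB} ZB")
```

Under these assumptions the projected move from 800K petabytes to 1.2 zettabytes is a 50% increase, somewhat below the 62% growth reported for the previous year.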
