Ten tools for ten big data areas 01 informatica

Published in: Internet

  1. Informatica Overview. Ten Tools for Ten Big Data Areas Series, 01: Big Data Integration. www.sparkera.ca
  2. Ten Tools for Ten Big Data Areas – Overview
     Diagram of 10 tools for 10 areas (tool logos not recoverable). Area labels and taglines on the slide:
     • Programming
     • Search and Index
     • First ETL fully on YARN
     • Data storing platform
     • Data computing platform
     • SQL & Metadata
     • Visualize with just a few clicks
     • Powerful as Java, simple as Python
     • Real-time streaming made easier
     • Your own Google
     • Lightning-fast cluster computing
     • Real-time distributed data store
     • High-throughput distributed messaging
     © Sparkera. Confidential. All Rights Reserved
  3. Agenda
     1. About data integration
     2. About the Informatica company and its approach
     3. Informatica architecture, client, server components, developer tool overview
     4. Informatica why and why not
     5. Informatica job trend
  4. A Little About DI – Data Integration
     • DI involves combining data residing in different sources and providing users with a unified view of that data.
     • The DI process is also called Enterprise Information Integration (EII).
     • DI usually means ETL: data extract, transform, and load.
     • 80% of the effort in enterprise data projects is spent on DI work.
     • Data cleansing, audit, and master data management are usually considered together with DI.
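
The extract, transform, load flow described on the data integration slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two sources (`CRM_CSV`, `BILLING_ROWS`), the `customer_view` target table, and all function names are hypothetical and do not come from the slides or from any Informatica product.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM application.
CRM_CSV = """customer_id,name,email
1,Alice Wong,ALICE@EXAMPLE.COM
2,Bob Singh,bob@example.com
"""

# Hypothetical source 2: rows from a billing database.
BILLING_ROWS = [
    {"cust": 1, "balance": "120.50"},
    {"cust": 2, "balance": "0"},
]


def extract():
    """Extract: read raw records from both sources."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, BILLING_ROWS


def transform(crm, billing):
    """Transform: cleanse values and join the two sources on customer id."""
    balances = {int(row["cust"]): float(row["balance"]) for row in billing}
    unified = []
    for row in crm:
        cid = int(row["customer_id"])
        unified.append(
            (cid, row["name"].strip(), row["email"].lower(), balances.get(cid, 0.0))
        )
    return unified


def load(rows, conn):
    """Load: write the unified records into a single target table."""
    conn.execute(
        "CREATE TABLE customer_view "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO customer_view VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory SQLite target."""
    conn = sqlite3.connect(":memory:")
    crm, billing = extract()
    load(transform(crm, billing), conn)
    return conn.execute(
        "SELECT id, name, email, balance FROM customer_view ORDER BY id"
    ).fetchall()
```

Calling `run_etl()` returns one row per customer with fields drawn from both sources, which is the "unified view" the slide refers to.
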
  5. About the Informatica Company
     • Founded in 1993
     • 2014 revenue: US$1.05 billion
     • Average growth rate: 17% per year
     • Employees: 5,500+
     • Customers: 5,000
     • Key customers include up to 70% of the global top 500 companies
     • Partners: 500+
     • Covers various businesses, industries, and government organizations, including telecommunications, health care, financial, and insurance services
     • A company dedicated to data integration and management
     • Taken private in a buyout in August 2015
  6. The Traditional Approach
     • Data sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured
     • 87% of enterprises use hand-coding for data integration
     • 75% of enterprises reported increased maintenance costs
     • Integration projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
  7. The Informatica Approach
     • Data sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured
     • Integration projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
  8. Informatica Latest Products (v9.6)
     • Data Integration
         • PowerCenter
         • PowerExchange
     • Master Data Management
     • Cloud Integration
     • Big Data
         • BDE (Informatica Developer)
         • Big data parser
  9. Informatica PowerCenter Overview
     • An ETL (extract, transform, and load) tool
     • Its main advantages over other ETL tools are its robustness, cross-OS support, and high performance.
     • It can read from a variety of sources and write to as many targets, transforming data in between.
     • The architecture uses SOA concepts for better extensibility and high availability.
     • Single sign-on access, built-in version control, GUI development, and built-in scheduling and monitoring
  10. Informatica PowerCenter Architecture (diagram)
  11. Informatica PowerCenter Client Components
      • Repository Manager: metadata management
      • Designer: tool to build mappings for ETL logic
      • Workflow Manager: tool to build and run sessions and workflows
      • Workflow Monitor: tool to monitor running jobs
      • Administration Console (browser-based): administration
  12. Repository Manager: navigate through multiple folders and repositories, export and import objects, and manage users and folders.
  13. Designer: create and debug mappings and mapplets, including sources, targets, and transformations, for the core ETL logic.
  14. Workflow Manager: create, schedule, and run sessions, workflows, and worklets that wrap mappings.
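
The relationship between the objects named on the Designer and Workflow Manager slides (a workflow groups sessions, and each session wraps a mapping) can be pictured with a toy model. This is a rough sketch only: the classes below are hypothetical Python stand-ins written for illustration, not Informatica's API or file format, and worklets (reusable groups of tasks) are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Mapping:
    """ETL logic: an ordered chain of transformations applied to each row."""
    name: str
    transformations: List[Callable[[Row], Row]] = field(default_factory=list)

    def apply(self, row: Row) -> Row:
        for step in self.transformations:
            row = step(row)
        return row


@dataclass
class Session:
    """A runnable task that wraps one mapping and the rows it should read."""
    name: str
    mapping: Mapping
    source_rows: List[Row]

    def run(self) -> List[Row]:
        # Copy each row so the source data is left untouched.
        return [self.mapping.apply(dict(row)) for row in self.source_rows]


@dataclass
class Workflow:
    """An ordered group of sessions that is run as one unit."""
    name: str
    sessions: List[Session] = field(default_factory=list)

    def run(self) -> Dict[str, List[Row]]:
        # Run sessions in order and collect each session's output by name.
        return {session.name: session.run() for session in self.sessions}
```

A usage example with made-up names: `Workflow("wf_demo", [Session("s_upper_name", Mapping("m_upper_name", [lambda r: {**r, "name": r["name"].upper()}]), [{"name": "alice"}])]).run()` runs the single session and returns its transformed rows keyed by session name.
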
  15. Workflow Monitor: monitor run statistics and control the execution of workflows.
  16. Administration Console: monitor and manage the various Informatica services, licenses, etc.
  17. Informatica PowerCenter Server Components
      • Repository service: manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.
      • Integration service: runs sessions and workflows.
      • Web services hub: receives requests from web service clients and exposes PowerCenter workflows as services.
      • Informatica service: overall service management and coordination.
  18. Informatica Big Data Edition Overview: extract, load, and transform with the big data ecosystem.
  19. Informatica BDE Component – Developer
      BDE is an all-in-one tool and can push job execution fully onto Hadoop.
      Developer components
      • Mapping: tool to build mappings for ETL logic
      • Mapplet: reusable mapping
      • Workflow: tool to build workflows
      • Application: tool to deploy mappings and workflows
      Others
      • Monitoring Console (browser-based): job monitoring
      • Administration Console (browser-based): administration
  20. Why Informatica Products
      • Proven technology leadership
      • A track record of continuous innovation
      • The most neutral trusted partner, with a very narrow focus
      • Long history of customer success
      • More than 5,000 industry leaders rely on Informatica
      • Major banks and telecom, insurance, energy, health, and research companies in Toronto use Informatica
      • Easy to use and popular
      • Can push jobs fully to Hadoop
      • Connectors for many kinds of sources
      • Performance and reliability
  21. Downsides – When It May Not Fit
      • High price: 150K+ to start
      • Faces competition from ELT, which leverages the database for transformation. Informatica needs investment in an ETL server, and its pushdown-to-database optimization has limitations.
      • Scheduling, monitoring, and version control functions are limited
      • BDE is relatively new, although the concept is great
      • Alternatives: MS SSIS, Talend Studio, Pentaho Data Integration
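
The ETL versus ELT contrast raised on this slide can be made concrete with a small comparison. In the sketch below, SQLite stands in for the target database, and the table names (`staging_orders`, `sales_by_country`) and sample rows are assumptions made up for illustration. The ETL-style function pulls rows out and aggregates them in the tool's own process (the role an ETL server plays), while the ELT-style function pushes the same transformation down to the database as SQL.

```python
import sqlite3

# Hypothetical raw data: (order id, country code, amount).
RAW_ORDERS = [(1, "ca", 100.0), (2, "us", 250.0), (3, "ca", 50.0)]


def setup():
    """Create an in-memory database with raw rows already staged."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_orders (id INTEGER, country TEXT, amount REAL)")
    conn.execute("CREATE TABLE sales_by_country (country TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", RAW_ORDERS)
    return conn


def etl_style(conn):
    """ETL: read rows out, aggregate outside the database, write results back."""
    totals = {}
    query = "SELECT id, country, amount FROM staging_orders"
    for _order_id, country, amount in conn.execute(query):
        key = country.upper()
        totals[key] = totals.get(key, 0.0) + amount
    conn.executemany("INSERT INTO sales_by_country VALUES (?, ?)", sorted(totals.items()))


def elt_style(conn):
    """ELT: the data is already loaded, so the database does the transformation."""
    conn.execute(
        "INSERT INTO sales_by_country "
        "SELECT UPPER(country), SUM(amount) FROM staging_orders GROUP BY UPPER(country)"
    )


def result(conn):
    """Return the aggregated target table in a stable order."""
    return conn.execute(
        "SELECT country, total FROM sales_by_country ORDER BY country"
    ).fetchall()
```

Both paths produce the same target table. The difference is where the work runs: the ETL path moves every row through a separate process, while the ELT path keeps the data in the database.
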
  22. Informatica Job Trends
      • Level: Junior Level (20%), Middle Level (40%), Expert Level (40%)
      • Position: ETL developer, Informatica dev., DW developer, Sr. ETL developer, Data Specialist, ETL specialist, ETL designer, ETL Admin, Big data ETL dev., BDE developer, Informatica architect, Informatica consultant
      • Tool usage percentage: PowerCenter 80%, Informatica Developer 10%, Other 10%
  23. www.sparkera.ca. Big data is not only about data, but also about understanding the data and how people actively use it to improve their lives.
