
Oil and gas big data edition



David Ramirez presented at Houston Hadoop Meetup in August 2015

Published in: Technology


  1. Big Data and The Informatica Platform • 9/8/2015 • David Ramirez, Senior Solution Architect, Oil and Gas Accounts
  2. About Informatica • Founded: 1993 • Nasdaq: INFA • 2014 Revenue: $1.2b • Partners: 450+ • Major SI, ISV, OEM and On-Demand Leaders • Customers: 5,000+ • > 70% of the Global 500 • Customers in 82 Countries • Direct Presence in 26 Countries • #1 in Customer Loyalty Rankings (7 Years in a Row)
  3. Proven Technology Leadership • B2B Data Exchange: Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration. • Cloud Data Integration • Enterprise Data Integration • Complex Event Processing: Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint. • Ultra Messaging: In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market. • Data Quality • Master Data Management • Application ILM
  4. Intelligent Data Lake: Manage Supply & Demand of Data • Problem: • Analytics teams spend most of their time looking for and preparing data, not analyzing it • Impact: project delays, cost overruns, missed opportunities • Data Lake Solution: • A single place to manage the supply and demand of data • Converts raw big data into fit-for-purpose, trusted, and secure information
  5. 80% of the work in big data projects is data intelligence • “I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.” • “80% of the work in any data project is in cleaning the data” • “70% of my value is an ability to pull the data, 20% of my value is using data-science…” • Sources: (1) DJ Patil, Data Jujitsu; (2-3) Kandel, et al. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Visual Analytics Science and Technology (VAST), 2012
  6. Intelligent Data Lake Platform for Big Data Projects • First Pilot(s) • Data Warehouse Optimization • Data Discovery • Real-Time Operational Intelligence • Lower operational IT costs • Big Data Analytics • Operationalize Big Data Insights • Predictive Maintenance • Lower Total Cost of Care • Customer X/Up-Sell • Public Safety • Fraud Detection • Machine Device, Cloud • Documents and Emails • Relational, Mainframe • Social Media, Web Logs • Driven by IT • Driven by Business • Lower Infrastructure Cost • Added Business Value • What’s Hadoop? • Intelligent Data Lake
  7. Informatica knows the Data Lifecycle Related Challenges (Source: Gartner) • Informatica Platform: Data Ingestion, Refinement, Mastery/Delivery, Data Security, Data Retirement • Data Quality • Exception Management • Any Platform, Application • Structured, Unstructured • Any latency • Master Data Management • Data Integration Hub • Data Archive • Records Retention/Discovery • Data Masking
  8. Informatica Platform Overview • Relational DB • .pdf, email • Dev, Test, Prod, Archive • 1. Profile 2. Define Targets 3. Analyze 4. Build Rules 5. Monitor • Data Quality, Security, ETL, MDM • Materials, Wellhead, Customer • Databases • Unstructured Data • Big Data • Cloud • Visualizations
  9. The Informatica DI Platform: Comprehensive, Unified, Open and Economical platform • Application • Database • Partner Data (SWIFT, NACHA, HIPAA, …) • Cloud Computing • Unstructured • Data Warehouse • Data Migration • Test Data Management & Archiving • Master Data Management • Data Synchronization • B2B Data Exchange • Data Consolidation
  10. Data Sources • Applications • Data Warehouse • MDM / PIM • Data Ingestion • Visualization • Data Governance • Data Security • Archiving • Replication • Data Streaming • Change Data Capture • Batch Load • Data Virtualization • Event-Based Processing • Data Integration Hub • Data Integration & Data Quality • Agile Analytics • Advanced Analytics • Machine Learning • Virtual Data Machine • Data Management • Data Delivery • Machine Device, Cloud • Documents and Emails • Relational, Mainframe • Social Media, Web Logs • Mobile Apps • Visualization & Analytics • Real-Time Alerts • Batch Load • Pub / Sub • Data Service • Integrate & Prepare • Loose Coupling & Abstraction
  11. Section 1: Development Agility
  12. Jumpstart/Accelerate Projects • Logical Data Objects: PRODUCT … CUSTOMER ORDER • Data Sources • 1. Instant Business-IT Collaboration with Analyst Tool • 2. Profile to Discover Data Patterns and Issues • 3. Prototype and Validate Results • 4. Fine-tune and Deploy Desired Solution in Days • Roles: Business, IT • Common Repository • Entire Life Cycle Supported by PowerCenter Standard Edition 9.
  13. Section 2: Enterprise Scalability
  14. Scale-up As Your Needs Grow • High Availability • Pushdown Optimization • Enterprise Grid • Concurrent Users • Partitioned Data • Role: IT • Included in PowerCenter Advanced Edition 9.6
  15. Manage Metadata for Better Data Insights • Data Lineage • Consolidated Metadata Catalog • Federated Business Glossary • Mainframe • Flat Files • Database • Data Modeling • BI Tools • ERP • Metadata Repository • Custom Metadata • Reports • 3rd party BI • Metadata Bookmarks
  16. Common Biz Language Via Business Glossary • Provide a common vocabulary of business terms • Easily search for glossary assets with workflow • Manage relationships with other assets • Manage business policies governing the assets • Role: Analyst
  17. Section 3: Operational Confidence
  18. Improve Operational Confidence With Automated Testing and Monitoring • End-to-End Agility • Requirements Gathering • Prototype & Validate • Deploy • Satisfied • Business-IT Collaboration • Develop • Self Service • Monitor • Test • Roles: Business, IT
  19. Automate Data Validation Testing • Data Validation Testing Capability • Enterprise Data • PowerCenter: Execute Tests • DVO Repository & Warehouse • Reports • Database Views: V_Summary, V_Tests, V_Results (each shown with the fields Id: name, name: string, Price: integer, Date in: date, Date out: date, Salary: float) • DVO Clients: Define Tests • Write Results • Data Accessed: • Relational databases • Flat files • Mainframe data • DW Appliances • Cloud-based data
  20. Proactively Monitor with PowerCenter 9.6 • Configure / Build Rules • Get PowerCenter Statistics • Get Operating System, Database Statistics • Monitor PowerCenter Operations • Send Alerts to Stakeholders • Automated Monitoring and Detection (Source Feeds, Rules/Templates, Watchlists, Alerts) • PowerCenter WS Hub • PowerCenter Repository • Environment Information • Roles: Analyst, IT, IT Operations
  21. Informatica Execution on Hadoop Architecture (Informatica on Hadoop) • 1. Entire Informatica mapping translated to optimal open source project • 2. Currently, MapReduce submitted to Hadoop cluster • 3. Advanced mapping transformations executed on Hadoop through User Defined Functions using Vibe • Diagram labels: MapReduce, UDF, Flink
  22. INFA’s Unified Platform = Strong Time-to-Value • “Informatica and Microsoft are so much more consistent than their competitors [because] the platforms provided by these companies support transferable skills across projects more flexibly than do their rivals.”
  23. TCO – Informatica vs. Hand Coding • Average Costs (3-year TCO) per project per end point • Informatica: $8,500 • Hand Coding: $11,500
  24. Informatica is Far More Productive than Hand Coding • Average Time to Develop by Project Type (Weeks) • Series: Hand coding, Informatica • Project types: Master Data Management, Data Warehousing, Data Migration, Application Integration • Chart values as extracted: 2.4, 1, 2.4, 0.7, 5.3, 1.2, 2.7, 0.8 • Depending on the project, hand coding can take more than 4 weeks longer to develop! • Source: “Comparative Costs and Uses for Data Integration Platforms”, Bloor Research, March 2014
  25. Big Data – Data Profiling on Hadoop • Demo – Data Profiling on Hadoop
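
The demo itself is not captured in this transcript. As a rough illustration of what column profiling involves (the step slide 12 calls "Profile to Discover Data Patterns and Issues"), here is a minimal single-process Python sketch. It is not Informatica's implementation and does not run on Hadoop; the sample well records, the field names, and the `value_pattern` and `profile_column` helpers are all hypothetical, invented for this example.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_column(rows, column):
    """Summarize one column: row count, nulls, distinct values, shapes."""
    values = [row.get(column) for row in rows]
    # Treat missing keys, None, and empty strings as nulls.
    present = [v for v in values if v not in (None, "")]
    patterns = Counter(value_pattern(str(v)) for v in present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": dict(patterns),
    }


# Hypothetical sample records, invented for illustration only.
SAMPLE_ROWS = [
    {"well_id": "TX-1001", "spud_date": "2014-03-02"},
    {"well_id": "TX-1002", "spud_date": "03/05/2014"},
    {"well_id": "tx1003", "spud_date": ""},
    {"well_id": "TX-1001", "spud_date": "2014-03-02"},
]

if __name__ == "__main__":
    for name in ("well_id", "spud_date"):
        print(name, profile_column(SAMPLE_ROWS, name))
```

In this sketch, a column whose values fall into more than one shape (two date formats, or identifiers with and without a hyphen) is a candidate data quality issue, and a distinct count lower than the row count on an identifier column hints at duplicates.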