
8.17.11 big data and hadoop with informatica slideshare



This presentation provides a briefing on Big Data and Hadoop and on how Informatica's Big Data Integration helps empower the data-centric enterprise.

Published in: Technology, Business


  1. Big Data and Hadoop with Informatica. August 2011. Julianna DeLua, Enterprise Solution Evangelist
  2. The Information Economy: Lack of Trustworthy Data Impedes Key Business Imperatives
     • Drivers: Globalization; Operational Efficiency; Consolidation; Growth; Governance
     • Business imperatives: Improve Decisions; Modernize Business; Improve Efficiency & Reduce Costs; Mergers, Acquisitions & Divestitures; Acquire & Retain Customers; Outsource Non-core Functions; Governance, Risk, Compliance; Increase Partner Network Efficiency; Increase Business Agility
     • Data: Cloud Computing; Application; Database; Unstructured; Partner Data (SWIFT, NACHA, HIPAA, …)
     • Lack of relevant, trustworthy and timely data
  3. Empowering the Data-Centric Enterprise
     • Business Imperatives: Improve Decisions; Modernize Business; Improve Efficiency & Reduce Costs; Mergers, Acquisitions & Divestitures; Acquire & Retain Customers; Outsource Non-core Functions; Governance, Risk, Compliance; Increase Partner Network Efficiency; Increase Business Agility
     • IT Initiatives: Business & Operational Intelligence; Legacy Retirement; Application ILM; Application Consolidation; Customer, Supplier, Product Hubs; BPO, SaaS; Risk Mitigation & Regulatory Reporting; B2B Integration; Zero Latency Operations
     • IT Projects: Data Warehouse; Data Migration; Database Archiving; Master Data Management; Data Synchronization; B2B Data Exchange; Data Consolidation; Complex Event Processing; Ultra Messaging
  4. Informatica for Big Data Integration
     • Saved millions annually by improving trucking operations and empowering the business with Hadoop-based free-form questions using sensor, mobile and geospatial data
     • United operations across 200 brands in 100+ countries by migrating business data from five systems to one
     • Delivered 5x faster, direct access to customer, risk and claims data in a variety of sources (DW, 16 legacy, 30,000 data marts, 10M claims via data feeds) at 1/3 of the cost
     • Turned human review into automated alerts in seconds for maritime security, through geospatial and video tracking
     • Delivered cloud access to 177+ million businesses worldwide and 53 million contacts; the D&B 360 app updates with LinkedIn and Twitter
     • Increased monthly slot revenues by 4% while expanding target customer segments from 40 to 160 across 500 sources in real time with social and machine data
     • 25% savings in data center footprint ($1M+)
     • Reduced latency by 83 percent to 340 microseconds, enabling a 580 percent increase in throughput; over 1B transactions per day and growing
     • Reduced time to market by 90% by on-boarding new data sources faster and enabling a wide variety of data formats
     • Rationalized the application portfolio and saved $1 million with a 6-month payback
     • Reduced age of data by 87% for service monitoring and pattern identification of large-scale data
     • Solution areas: Big Data Warehousing & Operational BI; Big Data Services; Big Data Archiving; Social/Big Data Synchronization; Big Data Consolidation; Complex Event Processing; Ultra Messaging; Real-Time Customer View; Big Data Collection & Aggregation
     • Business Imperatives: Deliver Analytical Insight; Improve Business Processes; Improve Efficiency & Reduce Costs; Mergers, Acquisitions & Divestitures; Acquire & Retain Customers; Outsource Non-core Functions; Governance, Risk, Compliance; Increase Partner Network Efficiency; Increase Business Agility
  5. Business Value through Trustworthy, Actionable, Authoritative Information Assets
     • Environments: Cloud Computing; Enterprise; Partner Trading Network (B2B)
     • Layers: Information Infrastructure; Data Infrastructure
     • Components: ILM; Enterprise Data Integration; B2B Data Exchange (EDI, NACHA, HIPAA); Ultra Messaging; Cloud Data Integration; Complex Event Processing; Master Data Management; Data Quality
     • Capabilities: Trust; Profile; Act; Sense; Govern; Model
  6. Big Data
  7. Nexus Of Secular Technology Megatrends Reinventing The Computer Industry
     • WHERE: On-Premise (past) to Cloud (future)
     • WHAT: Transactions (past) to Interactions (future)
     • HOW: Desktops (past) to Mobile (future)
  8. Defining Big Data
     • Definition: Big data is the confluence of three trends: Big Transaction Data, Big Interaction Data and Big Data Processing
     • Big Transaction Data: Online Transaction Processing (OLTP); Online Analytical Processing (OLAP) & DW Appliances
     • Big Interaction Data: Social Media Data; Other Interaction Data (Scientific, Machine/Device)
     • Big Data Processing
     • Big Data Integration
  9. Big Transaction Data: Maximize availability and performance of big transaction data
     • All data, including OLTP, OLAP and DW appliances
     • Reliable, complete information
     • No data discarded
     • Greater confidence
     • Continuous innovation
     • Uncover new areas for growth & efficiency
     • Better Actions & Operations
     • Diagram labels: Database; Warehouse; Appliances; Universal Access
     • Near-Universal Connectivity to Big Transaction Data
  10. Big Interaction Data: Achieve a complete view with social and interaction data
     • What influence does she have with her family and friends?
     • How connected is she?
     • What will she do with this merchandise? Any additional services?
     • Turn insights on relationships, influences and behaviors into opportunities
     • Sources: Databases; Call Detail Records, Image Files, RFIDs; External Data Providers; Applications
     • Informatica MDM: Customer, Product, …
     • Connectivity to Big Interaction Data, including social data
  11. Big Data Processing: Unleash the Power of Hadoop
     • Sources: Weblogs, Mobile Data, Sensor Data; Enterprise Applications; Semi-structured; Unstructured; Cloud Applications, Social Data; Databases, Data Warehouses; Smart Devices
     • Hadoop Cluster
     • Uses: Sentiment Analysis; Fraud Detection; Predictive Analytics; Portfolio & Risk Analysis
     • Steps: Parse & Prepare Data; Load Data; Read & Deliver Data; Transform & Analyze Data; Monitor & Manage; Orchestrate Workflows
  12. Value of Big Data Integration: Unleash the full business potential of Big Data to empower the data-centric enterprise
  13. Hadoop
  14. Big Data Processing: What does Hadoop do?
     • Complex data analytics
     • Store large amounts of data
     • Scaling through distributed processing
     • Cost advantage
     • Power of Open Source community
  15. Hadoop Related Use-Cases (Meta Use Case, Use Case, Description)
     • High Volume Analytics
        • Customer Churn Analysis: Predictive analytics of weblogs to understand user behavior.
        • Risk Analysis: Massive modeling and data generation to understand what-if scenarios and total assets needed to cover various positions.
        • ETL on Hadoop: Data processing and transformation prior to loading into data warehouses for analytics.
        • Defect Tracking and Device Monitoring: Device log file analysis to find root causes of issues or patterns of defects.
     • Sentiment Analysis
        • Sentiment Analysis: Social media data mining combined with transaction data to understand customer sentiments.
        • Marketing Campaign and Ad Analysis: Mining of clickstream and log data to understand campaign and offer effectiveness.
     • Interaction Analysis
        • Fraud Analysis: Clickstream, log mining and web scraping to understand fraudulent behaviors.
     • Data storage
        • Data staging and archive: Archive data for temporary or permanent storage.
  16. Big Data Processing: Unleash the Power of Hadoop
     • Sources: Weblogs, Mobile Data, Sensor Data; Enterprise Applications; Semi-structured; Unstructured; Cloud Applications, Social Data; Databases, Data Warehouses; Smart Devices
     • Hadoop Cluster
     • Informatica products: PowerExchange for Hadoop; B2B Data Transformation for Hadoop
     • Uses: Sentiment Analysis; Fraud Detection; Predictive Analytics; Portfolio & Risk Analysis
     • Steps: Parse & Prepare Data; Load Data; Read & Deliver Data; Transform & Analyze Data; Monitor & Manage; Orchestrate Workflows
  17. Tackling Diversity of Big Data
     • Diagram labels: Svc Repository; social, device/sensor, scientific; PIG, EDW, MDM
     • Visual parsing environment
     • Predefined translations
     • The broadest coverage for Big Data: Flat Files & Documents; Interaction data; Industry Standards; XML; Delimited; Positional; Name = Value
     • Productivity
     • Any DI/BI architecture

     Deployment:
     • 1. Developer uses Studio to develop a transformation.
     • 2. Developer deploys the transformation to a local service repository (directory). All files needed for the transformation are moved.
     • 3. To deploy to the server, this service folder is moved to the server via FTP, copy, script, etc. NOTE: If the server file system is mountable from the developer machine directly, then step 2 would deploy directly to the server.
     • 4. The DT engine can immediately use this service to process data.

     Embedding:
     • The DT engine is fully embeddable and can be invoked using any of the supported APIs: Java, C++, C, .NET, web services.
     • For simple integration, a command line interface is available to invoke services.
     • Internal custom applications can embed transformation services using the various APIs.
     • PowerCenter leverages DT via the Unstructured Data Transformation (UDT). This is a GUI transformation widget in PowerCenter which wraps around the DT API and engine.
     • DT can also be embedded in other middleware technologies. For some (WBIMB, WebMethods, BizTalk), Informatica provides similar GUI widgets (agents) for the respective design environments. For others, the API layer can be used directly.

     Invocation: DT can be invoked in two general ways:
     • Filenames can be passed to it, and DT will directly open the file(s) for processing. On the output side, DT can also directly write to the filesystem.
     • The calling application can buffer the data and send buffers to DT for processing. On the output side, DT can also write back to memory buffers which are returned to the calling application.
     • Though not shown below, the engine fully supports multiple input and output files or buffers as needed by the transformation.

     Engine invocation is a shared library. The DT engine runs fully within the process of the calling application. It is not an external engine. This removes any overhead from passing data between processes, across the network, etc. The engine is also dynamically invoked and does not need to be 'started up' or maintained externally. The DT engine is also thread-safe and re-entrant. This allows the calling application to invoke DT in multiple threads to increase throughput. A good example is DT's support of PowerCenter partitioning to scale up processing. As shown below, the actual transformation logic is completely independent of any calling application. This means you can develop a transformation once and leverage it in multiple environments simultaneously, resulting in reduced development and maintenance times and lower impact of change.
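The two invocation styles in the slide 17 notes (file mode and buffer mode) and the thread-safe calling pattern can be sketched in Python. This is only an illustration of the calling pattern, not the DT API: `transform_buffer`, `transform_file` and `transform_many` are hypothetical names, and the "transformation" is a toy parser that turns `Name = Value` lines into comma-delimited rows.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def transform_buffer(data: bytes) -> bytes:
    """Hypothetical stand-in for a deployed transformation service.

    Parses 'Name = Value' lines and emits comma-delimited 'name,value' rows.
    It keeps no shared state, so it is safe to call from several threads.
    """
    rows = []
    for line in data.decode("utf-8").splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("=", 1)
        rows.append(f"{name.strip()},{value.strip()}")
    return "\n".join(rows).encode("utf-8")


def transform_file(src: Path, dst: Path) -> None:
    """File mode: the transformation reads the input file and writes the output file."""
    dst.write_bytes(transform_buffer(src.read_bytes()))


def transform_many(buffers: list[bytes], workers: int = 4) -> list[bytes]:
    """Buffer mode: the caller owns the data and fans it out across threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform_buffer, buffers))  # map preserves input order


if __name__ == "__main__":
    print(transform_many([b"id = 7\nregion = EMEA", b"id = 8\nregion = APAC"]))
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "in.txt"), Path(tmp, "out.csv")
        src.write_text("id = 9\nregion = LATAM", encoding="utf-8")
        transform_file(src, dst)
        print(dst.read_text(encoding="utf-8"))
```

Because the stand-in function is stateless, the same logic serves both modes, which mirrors the point in the notes that the transformation is independent of the calling application.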
  18. Device generated data: Telco example
     The Challenge:
     • Support multiple standards, versions, and manufacturer-specific extensions of call detail record and XML topology data
     • Securely and reliably transfer large volumes of data from staging area to the enterprise
     • Manage and monitor data aggregation / collection process to enable analytics using Hadoop
     • Diagram labels: Binary ASN.1/XML topology; MFT; Firewall; DT; Data Exchange; HDFS; MapReduce
     • Analytics: Call Detail Record (CDR) analytics; Node topology analytics
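As a rough illustration of the CDR analytics step on slide 18, the sketch below simulates the map, shuffle and reduce phases in plain Python. It assumes the call detail records have already been parsed into delimited text lines of the form `caller,callee,duration_seconds`; that layout and all function names are hypothetical, and a real job would run on a Hadoop cluster rather than in a single process.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_cdr(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (caller, duration_seconds) for each well-formed record."""
    fields = line.strip().split(",")
    if len(fields) != 3:
        return  # drop malformed records
    caller, _callee, duration = fields
    try:
        yield caller, int(duration)
    except ValueError:
        return  # drop records with a non-numeric duration


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_cdr(caller: str, durations: list[int]) -> tuple[str, int, int]:
    """Reduce phase: call count and total talk time per caller."""
    return caller, len(durations), sum(durations)


def run_job(lines: Iterable[str]) -> list[tuple[str, int, int]]:
    """Run the three phases in order and return results sorted by caller."""
    pairs = (pair for line in lines for pair in map_cdr(line))
    return sorted(reduce_cdr(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["A,B,60", "A,C,30", "B,A,10", "not a record"]
    for caller, calls, seconds in run_job(sample):
        print(caller, calls, seconds)
```

Keeping the map and reduce functions free of shared state is what lets a framework run them in parallel across many nodes.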
  19. External data: Channel data analytics over time example
     • The Challenge: Analytics over time of a very large amount of data via multiple dimensions: POS, Customer, Product, Feedback, etc.
     • Diagram labels: Channel/Customer Data; DT; HDFS; MapReduce
     • Future predictive analytics of channel information
  20. High-Level Technical Directions
     • Easily integrate the diversity of Big Data and make sense of it all: Universal data access; Data parsing and exchange
     • Govern and audit Big Data: Metadata management and auditability; Data quality and data governance
     • Arm the business with the right data through high-performance data processing and provisioning: Processing in Hadoop; High throughput data provisioning
  21. Key Takeaways
     • Informatica for Big Data uniquely empowers the data-centric enterprise
     • Big data integration turns Big Data challenges into big opportunities
     • Informatica continues its pioneering efforts in pushing the frontiers of data integration
