Integrate Big Data into Your Organization with Informatica and Perficient
 

  • Cost savings and control of a growing data environment; data management cost optimization; business-specific big data analytics; big data integration to support analytics and new data products and services.
  • Challenges and problems customers are facing with Big Data: growing data volumes and expensive data warehouse upgrades; variety of data and onboarding new types of data; lack of big data skills; building the business case for a big data project; not knowing where to begin; regulatory compliance and security (e.g., data privacy, data sharing).
    Speaker notes: There are several challenges related to big data analytics. As data volumes continue to grow, how can you continue to meet your SLAs for existing projects while controlling costs? It's estimated that big transaction data alone is growing at 50-60% per year. Application databases are growing to the point where not only are hardware and software costs rising, but application performance is adversely affected. Data warehouses are also growing too fast, using up the capacity of current infrastructure investments, and with big interaction data exploding, who can afford to store all of this information in their enterprise data warehouse? In fact, one financial institution estimated that it costs $180K to manage 1 TB of data in its data warehouse over a three-year period. As more and more users demand information, organizations also experience a proliferation of data marts that further increases hardware and database costs; a large healthcare insurance provider had over 30,000 data marts and spreadmarts across the company. With data volumes growing exponentially, it is becoming difficult to process all the data required for the data warehouse during the nightly batch windows. If you continue to just throw expensive hardware and database licenses at the big data problem, your costs will spiral out of control. More and more organizations would like to leverage the massive amounts of interaction data, such as social media and machine device data, to attract and retain customers, improve business operations, and strengthen their competitive advantage. But because so much of this data is multi-structured and generated at a rate akin to drinking from a fire hose, they find that accessing, storing, and processing interaction data can be extremely difficult. Another challenge with big data is that, because there is so much new data being generated and stored, it is difficult for organizations to find, understand, and trust the data.
  • You don't want expensive data scientists (a $300K FTE) doing this work. At JPMC, hand coding took 3 weeks while Informatica took 3 days. In a recent InformationWeek article, "Meet The Elusive Data Scientist," Catalin Ciobanu, a physicist who spent ten years at Fermi National Accelerator Laboratory (Fermilab) and is now senior manager of BI at Carlson Wagonlit Travel, said, "70% of my value is an ability to pull the data, 20% of my value is using data-science methods and asking the right questions, and 10% of my value is knowing the tools." DJ Patil, Data Scientist in Residence at Greylock Partners (formerly Chief Data Scientist at LinkedIn), states in his book "Data Jujitsu" that "80% of the work in any data project is in cleaning the data." In a recent study that surveyed 35 data scientists across 25 companies (Kandel et al., "Enterprise Data Analysis and Visualization: An Interview Study," IEEE Visual Analytics Science and Technology (VAST), 2012), data scientists expressed their frustration in preparing data for analysis: "I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis. Most of the time I'm lucky if I get to do any 'analysis' at all," and another data scientist tells us that "most of the time once you transform the data … the insights can be scarily obvious." 44% of big data projects are cancelled, versus 25% for IT projects in general, and many more fail to achieve project objectives, according to the Infochimps/SSWUG Enterprise Big Data Survey 2012 and the Dynamic Markets Enterprise IT Survey 2008 (http://www.slideshare.net/infochimps/top-strategies-for-successful-big-data-projects). Why do projects fail? Business reasons: inaccurate scope (not enough time, deadlines busted); non-cooperation between departments; lack of the right talent and expertise. Technical reasons: technical or roll-out roadblocks, such as gathering data from different sources; finding and understanding tools, platforms, and technologies.
  • Use the two-slide version of this. Lower costs: lower HW/SW costs; optimized end-to-end performance; rich pre-built connectors and a library of transforms for ETL, data quality, parsing, and profiling. Increased productivity: up to 5x productivity gains with a no-code visual development environment; no need for Hadoop expertise for data integration. Proven path to innovation: 5,000+ customers, 500+ partners, 100,000+ trained Informatica developers; enterprise scalability, security, and support.
  • The Vibe VDM works by receiving a set of instructions that describe the data source(s) from which it will extract data; the rules and flow by which that data will be transformed, analyzed, masked, archived, matched, or cleansed; and, ultimately, where that data will be loaded when processing is finished. Vibe consists of a number of fundamental components (see Figure 2):
    Transformation Library: a collection of useful, prebuilt transformations that the engine calls to combine, transform, cleanse, match, and mask data. For those familiar with PowerCenter or Informatica Data Quality, this library is represented by the icons that the developer can drag and drop onto the canvas to perform actions on data.
    Optimizer: compiles data processing logic into an internal representation to ensure effective resource usage and efficient run time based on data characteristics and execution environment configurations.
    Executor: a run-time execution engine that orchestrates the data logic using the appropriate transformations. The engine reads/writes data from an adapter or directly streams the data from an application.
    Connectors: Informatica's connectivity extensions provide data access from various data sources. This is what allows Informatica Platform users to connect to almost any data source or application for use by a variety of data movement technologies and modes, including batch, request/response, and publish/subscribe.
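To make the division of labor among these components concrete, here is a minimal, purely conceptual Python sketch of a toy pipeline split into a transformation library, an optimizer, an executor, and connectors. All names and structures are illustrative assumptions; this is not the Vibe engine or any Informatica API.

```python
# Purely conceptual sketch of a "virtual data machine" split into a
# transformation library, an optimizer, an executor, and connectors.
# All names and structures are illustrative assumptions, not the Vibe
# engine or any Informatica API.

from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Transformation library: prebuilt, reusable record-level transforms.
TRANSFORMS: Dict[str, Callable[[Record], Record]] = {
    "trim_name":     lambda r: {**r, "name": r.get("name", "").strip()},
    "upper_country": lambda r: {**r, "country": r.get("country", "").upper()},
}

# Connectors: read from and write to a data source (here, in-memory lists).
def read_connector(rows: List[Record]) -> Iterable[Record]:
    yield from rows

def write_connector(rows: Iterable[Record], sink: List[Record]) -> None:
    sink.extend(rows)

# Optimizer: compiles a logical flow (transform names) into an ordered plan.
def optimize(flow: List[str]) -> List[Callable[[Record], Record]]:
    return [TRANSFORMS[name] for name in flow]  # trivially preserves declared order

# Executor: runs the compiled plan against whatever the connectors provide,
# so the same logical flow could in principle target a laptop, a grid, or Hadoop.
def execute(plan, source: Iterable[Record], sink: List[Record]) -> None:
    out = []
    for record in source:
        for step in plan:
            record = step(record)
        out.append(record)
    write_connector(out, sink)

if __name__ == "__main__":
    rows = [{"name": "  Ada ", "country": "uk"}]
    target: List[Record] = []
    execute(optimize(["trim_name", "upper_country"]), read_connector(rows), target)
    print(target)  # [{'name': 'Ada', 'country': 'UK'}]
```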
  • The Vibe virtual data machine, although critical, is not sufficient by itself to solve the wide spectrum of data integration challenges. Vibe lets you master complexity and change, and it makes all data accessible. But in many places where data lives, especially some of the emerging data sources, the data is unfiltered. Unstandardized. Uncleansed. Unrelated. Some of it is even unnecessary. It takes a considerable amount of work and expertise to understand how to transform raw data into information that provides insight and value. So in addition to the enabling capabilities that Vibe delivers, you also need to layer on the data services and information solutions from a fully integrated information platform that ensures that data is:
    Complete: insight comes from a complete picture, not from fragments. You have to integrate the data fragments so you are looking at a whole (a whole person, a whole account, a whole product, a whole business process, a whole organization, a whole nation) rather than pieces or parts.
    Timely: different consumers and different use cases require data at different times and frequencies. You want one platform that accelerates the delivery of data when, where, and how it is needed, whether via messaging, bulk delivery, or a virtual view.
    Trusted: if data is incomplete, inaccurate, or unrelated, it is not of much use. You need data quality services that let you diagnose problems and then cleanse the data in a sustainable, efficient way.
    Authoritative: you also need master data management services to master the data and relationships that constitute the "whole" for your key business entities, even as the data fragments feeding into the "whole" continually change.
    Actionable: ultimately, data needs to serve a user, whether human or machine. The platform needs to help the user understand when to pay attention to an event, investigate an issue, or act.
    Secure: with the exponential rise in combinations of people accessing data across different systems, the potential for a security breach also rises exponentially. You must be able to secure data consistently and universally, no matter where it resides or how it is used.
    But it is not sufficient for an information platform to merely have a long checklist of information services. Only an information platform powered by a VDM provides the interoperability required to easily combine services on the fly to meet your specific business requirements. Only an information platform powered by a VDM can provide the right tools and capabilities for the simplest entry-level uses to the most complex cross-enterprise initiatives, allowing you to share work across that entire span without recoding. And only an information platform powered by a VDM has the flexibility to be deployed stand-alone in the data center, as a cloud service, or embedded into applications, middleware infrastructure, and devices.
  • Informatica announced the launch of the PowerCenter Big Data Edition at Hadoop World, with general availability in December. The PowerCenter Big Data Edition provides a proven path to innovation that lowers data management costs, with benefits that include: bringing innovative products and services to market faster and improving business operations; reducing big data management costs while handling growing data volumes and complexity; realizing performance and cost benefits by expanding adoption of Hadoop across projects; and minimizing risk by investing in proven data integration software that hides the complexity of emerging technologies. PowerCenter Big Data Edition key features include:
    Universal Data Access: your IT team can access all types of big transaction data, including RDBMS, OLTP, OLAP, ERP, CRM, mainframe, cloud, and others. You can also access all types of big interaction data, including social media data, log files, machine sensor data, web sites, blogs, documents, emails, and other unstructured or multi-structured data.
    High-Speed Data Ingestion and Extraction: access, load, replicate, transform, and extract big data between source and target systems or directly into Hadoop or your data warehouse. High-performance connectivity through native APIs to source and target systems, with parallel processing, ensures high-speed data ingestion and extraction.
    No-Code Productivity: removes hand coding within Hadoop through the visual Informatica development environment. Develop and scale data flows with no specialized hand coding in order to maximize reuse; users can build once and deploy anywhere.
    Unlimited Scalability: your IT organization can process all types of data at any scale, from terabytes to petabytes, with no specialized coding on distributed computing platforms such as Hadoop.
    Optimized Performance for Lowest Cost: based on data volumes, data type, latency requirements, and available hardware, PowerCenter Big Data Edition deploys big data processing on the highest-performance and most cost-effective data processing platforms. You get the most out of your current investments and capacity whether you deploy data processing on SMP machines, traditional grid clusters, distributed computing platforms like Hadoop, or data warehouse appliances.
    ETL on Hadoop: an extensive library of prebuilt transformation capabilities on Hadoop, including data type conversions and string manipulations, high-performance cache-enabled lookups, joiners, sorters, routers, aggregations, and many more. Your IT team can rapidly develop data flows on Hadoop using a codeless graphical development environment that increases productivity and promotes reuse.
    Profiling on Hadoop: data on Hadoop can be profiled through the Informatica developer tool and a browser-based analyst tool. This makes it easy for developers, analysts, and data scientists to understand the data, identify data quality issues earlier, collaborate on data flow specifications, and validate mapping transformation and rules logic.
    Design Once and Deploy Anywhere: ETL developers can focus on the data and transformation logic without having to worry where the ETL process is deployed, on Hadoop or traditional data processing platforms. Developers can design once, without any specialized knowledge of Hadoop concepts and languages, and easily deploy data flows on Hadoop or traditional systems.
    Complex Data Parsing on Hadoop: easily access and parse complex, multi-structured, unstructured, and industry-standards data such as web logs, JSON, XML, and machine device data. Prebuilt parsers for market data and industry standards like FIX, SWIFT, ACORD, HL7, HIPAA, and EDI are also available and licensed separately.
    Entity Extraction and Data Classification on Hadoop: using a list of keywords or phrases, entities related to your customers and products can be easily extracted and classified from unstructured data such as emails, social media data, and documents. You can enrich master data with insights into customer behavior or product information such as competitive pricing.
    Mixed Workflows: easily coordinate, schedule, monitor, and manage all interrelated processes and workflows across your traditional and Hadoop environments to simplify operations and meet your SLAs. You can also drill down into individual Hadoop jobs.
    High Availability: 24x7 high availability with seamless failover, flexible recovery, and connection resilience. When it comes time to develop new products and services using big data insights, you can rest assured that they will scale and be available 24x7 for mission-critical operations.
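As a rough illustration of the keyword-driven entity extraction and classification idea described above, the following Python sketch groups keyword hits from unstructured text by category. The keyword lists, categories, and sample emails are illustrative assumptions, not the product's actual parser or API.

```python
# Minimal sketch of keyword-based entity extraction / classification over
# unstructured text, in the spirit of the capability described above.
# Keyword lists, categories, and input records are illustrative assumptions.

import re
from collections import defaultdict

KEYWORDS = {
    "product":    {"router", "gateway", "firewall"},
    "competitor": {"acme", "globex"},
    "sentiment":  {"love", "hate", "broken", "great"},
}

def extract_entities(text: str) -> dict:
    """Return the keywords found in `text`, grouped by category."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    found = defaultdict(list)
    for category, words in KEYWORDS.items():
        hits = sorted(tokens & words)
        if hits:
            found[category] = hits
    return dict(found)

if __name__ == "__main__":
    emails = [
        "I love the new gateway, much better than Acme's firewall.",
        "Support ticket: the router is broken again.",
    ]
    for body in emails:
        print(extract_entities(body))
    # {'product': ['firewall', 'gateway'], 'competitor': ['acme'], 'sentiment': ['love']}
    # {'product': ['router'], 'sentiment': ['broken']}
```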
  • ETL, parsing, data quality, profiling, NLP. For Talend and Pentaho you need to code, and now MapReduce. PowerCenter Big Data Edition reduces big data costs: your IT team can manage twice the data volume with your existing analytics environment, and you can offload data from your warehouse and source systems and offload processing to low-cost commodity hardware. High-speed data ingestion and extraction: load, process, and extract big data across heterogeneous environments to optimize the end-to-end flow of data between Hadoop and traditional data management infrastructure. Near-universal data access and comprehensive ETL on Hadoop: reliably access a variety of types and sources of data using a rich library of pre-built ETL transforms for both transaction and interaction data that run on Hadoop or traditional grid infrastructure.
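As a rough illustration of what column-level profiling produces, the following Python sketch computes a null rate, a distinct count, and sample values per column. The input records and metrics are illustrative assumptions; product profiling goes well beyond this.

```python
# Minimal sketch of column-level data profiling: null rate, distinct count,
# and a few sample values per column. Input data is an illustrative assumption.

from collections import defaultdict

def profile(rows):
    stats = defaultdict(lambda: {"nulls": 0, "values": set()})
    for row in rows:
        for col, val in row.items():
            if val in (None, ""):
                stats[col]["nulls"] += 1
            else:
                stats[col]["values"].add(val)
    total = len(rows)
    return {
        col: {
            "null_rate": s["nulls"] / total if total else 0.0,
            "distinct": len(s["values"]),
            "samples": sorted(s["values"])[:3],
        }
        for col, s in stats.items()
    }

if __name__ == "__main__":
    rows = [
        {"customer_id": "C1", "country": "US", "email": "a@example.com"},
        {"customer_id": "C2", "country": "",   "email": None},
        {"customer_id": "C3", "country": "US", "email": "c@example.com"},
    ]
    for col, s in profile(rows).items():
        print(col, s)
```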
  • By moving away from hand coding to proven data integration productivity tools, you triple your productivity; you no longer need an army of developers. This edition provides unified administration for all data integration projects. You can build it once and deploy it anywhere, which keeps costs down by optimizing data processing utilization across both existing data platforms and emerging technologies like Hadoop. No-code development environment: removes hand coding within Hadoop through a visual development environment; develop and scale data flows with no specialized hand coding in order to maximize reuse. Virtual data machine: build transformation logic once, and deploy at any scale on Hadoop or traditional ETL grid infrastructure. At a recent TDWI Big Data Summit last summer, eHarmony presented their Informatica Hadoop implementation. When someone in the audience asked, "How many new resources did you need to hire to implement this on Hadoop?", the Director of IT at eHarmony said, "None."
  • Informatica PowerCenter Big Data Edition is the safe on-ramp to big data that works with both emerging technologies and traditional data management infrastructure. With this edition, your IT organization can rapidly create innovative products and services by integrating and analyzing new types and sources of data. It provides a proven path of innovation while lowering big data management costs and minimizing risk. With big data you don't always know what you are looking for. Instead of being given requirements from the business for a report, you are tasked with a business goal such as increasing customer acquisition and retention or improving fraud detection. With this goal in mind, and a wealth of big transaction data, big interaction data, and big data processing technologies, how can you achieve the goal cost-effectively? Consider an online retailer with several big data projects at various stages of implementation to increase customer acquisition and retention, increase profitability, and improve fraud detection. Since we don't necessarily have a well-defined set of requirements, we need to create a sandbox environment where data science teams can play and experiment with big data. A team of data scientists, analysts, developers, architects, and LOB stakeholders collaborates within the sandbox to discover insights that will achieve the goals of each project. This requires us to access and ingest, in this case, customer transaction information from the ERP and CRM systems, web logs from the online store, social data from Twitter, Facebook, and blogs, and geo-location data from mobile devices. The data science team goes through an iterative process of accessing, preparing, profiling, and analyzing data sets to discover patterns and insights that could help achieve the business goals of the project. However, what many people fail to acknowledge is that 70-80% of this work is accessing and preparing the data sets for analysis. This includes parsing, transforming, and integrating a variety of disparate data sets coming from different platforms, in different formats, and at different latencies. DJ Patil, Data Scientist in Residence at Greylock (a VC firm), stated in his book: "Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn't something that gets in the way of solving the problem: it is the problem." The data science team may discover a few insights, which they need to test, validate, and measure for business impact. They might apply techniques such as A/B testing to determine which algorithms and data flows produce the best results for a stated goal, like increasing customer share of wallet with next-best-offer recommendations, increasing profitability through pricing optimization, or identifying trends and reducing false positives in fraud detection. Once organizations overcome the hurdles of accessing, preparing, and integrating data sets to discover these insights, they then face the challenge of operationalizing the insights.
    This is where organizations seem to struggle in turning insights into real business value. To turn insights into business value, the insight needs to be delivered reliably to the point of use, whether that is a report, an enterprise or web app, or part of an automated workflow. For example: the fraud department needs to be notified in real time if fraud is suspected or if there is a spike in a particular region showing an upward trend of fraud; customers shopping on an e-commerce website need to see next best offers in real time as they click through the website; the customer service rep needs to know immediately whether a customer is likely to churn when that customer calls or files an online complaint; pricing optimization needs to be delivered directly to the sales rep via a CRM mobile app based on customer location, purchase history, demographics, and so on. Too many organizations end up rebuilding and hand coding the data flows created during design and analysis when it comes time to deploy to production. Informatica is a metadata-driven development environment that provides near-universal data access and hundreds of pre-built parsers, transformations, and data quality rules, and the data flows created during design and analysis can be easily extended and deployed for production use. Another benefit is that data sets and data flows can easily be shared across projects, which helps an organization be agile and rapidly innovate with big data. The data sets used in design and analysis may not have been optimized for a production data flow; PowerCenter Big Data Edition provides highly scalable, high-throughput data integration to handle all volumes of data processing with high performance. PowerCenter Big Data Edition separates design from deployment and, as we have seen, enables organizations to have a consistent and repeatable process, reuse data flows to maximize productivity, scale performance using high-performance distributed computing platforms, ensure 24x7 operations with high availability, remain flexible as data volumes continue to grow and new data sources are added, deliver and analyze data at various latencies, and stay easier to support and maintain than hand coding, all while controlling costs and cost-effectively optimizing processing.
  • Subset production data and mask sensitive data in non-production systems
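As a hedged illustration of the masking idea in the note above, the sketch below deterministically tokenizes sensitive fields so non-production copies stay consistent across tables without exposing real values. Field names, the salt handling, and the policy are assumptions, not the product's masking engine.

```python
# Minimal sketch of deterministic data masking for non-production copies:
# hash identifiers consistently and leave non-sensitive fields untouched.
# Field names, salt handling, and policy are illustrative assumptions.

import hashlib

SALT = "rotate-me-outside-source-control"  # assumption: kept out of code in practice

def mask_value(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible token."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
    return "MASK_" + digest[:12]

def mask_record(record: dict, sensitive_fields: set) -> dict:
    return {
        field: (mask_value(val) if field in sensitive_fields and val else val)
        for field, val in record.items()
    }

if __name__ == "__main__":
    prod_row = {"customer_id": "C42", "email": "ada@example.com", "balance": "1200.50"}
    print(mask_record(prod_row, {"email"}))
    # email becomes a stable 'MASK_' + 12-hex-character token; other fields pass through
```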
  • But once you discover these treasures of data, can you trust their origins and the transformations that have been applied? On this island we have found a treasure of valuable data combined with the hazardous waste of bad data. How can we extract the gold and get rid of the waste? Do you know where the treasure of data came from? Is it authentic? In order to trust the data, you need to know where it came from and what was done to it. It is also unfortunate that data management teams end up recreating, over and over, data sets that have already been normalized, cleansed, and curated, using up a lot of storage and resources. I'd like to recommend that you commit to data governance to improve your business processes, decisions, and interactions. We talk about managing data as an asset, but what does that really mean? You need a process to effectively govern your data so you can deliver trusted and reliable data. First you need to decide the cost of bad data; for example, the cost of having bad customer addresses or duplicate parts could be millions. The process starts iteratively with the discovery and definition of data, so that you know what you have in terms of data definitions, domains, relationships, business rules, and so on. For example, one company had people showing up to meetings with different numbers related to claims payments. The problem wasn't in the data; the problem was that the people were from three different departments with three different definitions of the data. Therefore the business and IT require a process and tools to efficiently collaborate and automate steps that continuously improve the quality of data over time. Managing data governance effectively and supporting continuous improvement requires KPI dashboards, proactive monitoring, and a clear ROI.
  • Enrich master data with customer behavior insights, relationships, and key influencers so you can proactively engage with customers and increase upsell/cross-sell opportunities
  • Identify unused and unnecessary data to drive data retention policies. Assess data usage and performance metrics to focus optimization. Archived 13 TB of data in the first 2 months and continue to retire data monthly. Phase 2: offload data and processing to Hadoop.
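As a rough sketch of the "identify unused data" step above, the following Python fragment flags tables whose last access falls outside a retention window and totals the archivable volume. The table names, metadata source, and 180-day cutoff are illustrative assumptions.

```python
# Minimal sketch of identifying dormant data to drive retention/archiving:
# flag tables whose last access is older than a cutoff. Table names, the
# metadata source, and the 180-day policy are illustrative assumptions.

from datetime import date, timedelta

RETENTION_CUTOFF = date.today() - timedelta(days=180)

# In practice this would come from database usage/audit metadata.
table_stats = [
    {"table": "orders_2009", "last_accessed": date(2012, 1, 3),  "size_gb": 420},
    {"table": "orders_2013", "last_accessed": date.today(),      "size_gb": 380},
    {"table": "clickstream", "last_accessed": date(2011, 6, 30), "size_gb": 2100},
]

dormant = [t for t in table_stats if t["last_accessed"] < RETENTION_CUTOFF]
archivable_gb = sum(t["size_gb"] for t in dormant)

for t in dormant:
    print(f"archive candidate: {t['table']} ({t['size_gb']} GB)")
print(f"total archivable: {archivable_gb} GB")
```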
  • Why Informatica: ease of use for developers and administrators; easy to scale performance at a comparatively lower cost using PowerCenter grid with commodity hardware; could standardize on one data integration platform for all data movement use cases and requirements; a comprehensive data integration platform for big data processing, batch and real-time data movement, metadata management, profiling, test data management, and protecting sensitive data.
    The challenge: the company is growing fast, with data volumes and processing loads increasing, ever-increasing demand for data-driven decision making, analytics, and reporting, and an inability to scale legacy systems due to cost as well as time factors (not being plug and play). Needed to standardize on a single platform/vendor to meet the various data-related needs: ETL, metadata, masking, subsetting, real time.
    The result: ability to scale easily by adding incremental nodes in comparatively shorter time periods; reduction in hardware cost due to a commodity hardware stack.
    Phase 1: PowerCenter grid and HA implementation. Several site-facing OLTP Oracle databases, several Oracle data marts, and a petabyte-scale Teradata EDW. Transactional data, behavioral data, web logs. Process a few terabytes of incremental data every day through the PowerCenter grid. Implemented a single-domain, dual data center PowerCenter grid (primary vs. DR); currently active/passive, it will eventually become active/active and expand with further node additions. Commodity Linux machines with 64 GB of memory and a shared NFS file system mounted across all nodes within a data center. Multiple integration services assigned to the grid, with the repository database running on a dedicated DB. Grid requirements: a highly available data movement/data integration environment; the ability to scale horizontally without extensive re-architecting of application design; the ability to automatically load balance; the ability to recover automatically in case of system errors.
    Phase 2: grow the PowerCenter grid to increase processing capacity to meet growing data volumes and reduced processing times.
    Current benefits: ability to scale easily by adding incremental nodes in comparatively shorter time periods; reduction in hardware cost due to a commodity hardware stack.
    Future benefits: expect to reduce the time to perform impact/lineage analysis when the metadata solution is implemented; expect to reuse profiling information when the profiling solution is implemented; expect to perform more comprehensive testing much faster when masking/subsetting is implemented; expect to reduce batch loads from 30 minutes to a few seconds for fraud detection when Ultra Messaging is implemented; participate in the PowerCenter on Hadoop beta testing program.
    Today, Perl scripts are used to process web logs and move the results into Teradata. The team is currently looking at utilizing Hadoop for various text and log data mining and analysis capabilities, such as risk monitoring, behavior tracking, and various marketing-related activities. We believe that using Hadoop for low-cost big data analysis and processing, alongside the capabilities of the Informatica grid to deliver mission-critical data to our data marts, would be complementary, while allowing us to maintain metadata and other operational capabilities within a single integrated platform.
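The note above mentions using Perl scripts today to process web logs before loading results into Teradata. As a hedged illustration of that kind of step (the combined log format and field names are assumptions), a parser might look like this Python sketch.

```python
# Minimal sketch of parsing web server access logs into structured records,
# the kind of step the note above describes doing with Perl scripts today.
# The combined-log format and field names are illustrative assumptions.

import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line: str):
    m = LOG_PATTERN.match(line)
    if not m:
        return None  # in practice, route unparseable lines to a reject file
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

if __name__ == "__main__":
    sample = '203.0.113.7 - - [10/Jun/2013:13:55:36 -0700] "GET /cart HTTP/1.1" 200 2326'
    print(parse_line(sample))
```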
  • Continuously collect all data from all cars. All cars by the end of the year will transmit data to a central Teradata data warehouse. Real-time data integration using PowerCenter, CDC, and CEP.
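As a conceptual illustration of the event-processing side of this (not Informatica CDC or CEP specifically), the sketch below raises an alert when a vehicle reports several fault events inside a sliding time window. Event fields, the threshold, and the window size are assumptions.

```python
# Minimal sketch of the event-processing idea behind a CDC/CEP pipeline:
# watch a stream of change events and raise an alert when a rule fires
# within a sliding time window. Event fields and the rule are assumptions.

from collections import deque

WINDOW_SECONDS = 60
FAULT_THRESHOLD = 3

def detect_alerts(events):
    """Yield an alert when a vehicle reports >= FAULT_THRESHOLD faults in a window."""
    recent = {}  # vehicle_id -> deque of fault timestamps
    for ev in events:  # events assumed ordered by timestamp
        if ev["type"] != "fault":
            continue
        window = recent.setdefault(ev["vehicle_id"], deque())
        window.append(ev["ts"])
        while window and ev["ts"] - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= FAULT_THRESHOLD:
            yield {"vehicle_id": ev["vehicle_id"], "faults_in_window": len(window), "ts": ev["ts"]}

if __name__ == "__main__":
    stream = [
        {"vehicle_id": "V1", "type": "fault", "ts": 0},
        {"vehicle_id": "V1", "type": "fault", "ts": 20},
        {"vehicle_id": "V1", "type": "fault", "ts": 45},  # third fault within 60 s triggers an alert
    ]
    for alert in detect_alerts(stream):
        print(alert)
```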
  • Do we want these after the final slide (currently Connect with Perficient)?

Integrate Big Data into Your Organization with Informatica and Perficient: Presentation Transcript

  • Integrate Big Data into Your Organization with Lower Total Costs
  • About Perficient: Perficient is a leading information technology consulting firm serving clients throughout North America. We help clients implement business-driven technology solutions that integrate business processes, improve worker productivity, increase customer loyalty, and create a more agile enterprise to better respond to new business opportunities.
  • Perficient Profile
    • Founded in 1997
    • Public, NASDAQ: PRFT
    • 2012 revenue of $327 million
    • Major market locations throughout North America: Atlanta, Austin, Boston, Charlotte, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Fairfax, Houston, Indianapolis, Minneapolis, New Orleans, New York, Northern California, Philadelphia, Southern California, St. Louis, Toronto, and Washington, D.C.
    • Global delivery centers in China, Europe, and India
    • ~2,000 colleagues
    • Dedicated solution practices
    • ~85% repeat business rate
    • Alliance partnerships with major technology vendors
    • Multiple vendor/industry technology and growth awards
  • Our Solutions Expertise
    Business Solutions: Business Intelligence; Business Process Management; Customer Experience and CRM; Enterprise Performance Management; Enterprise Resource Planning; Experience Design (XD); Management Consulting.
    Technology Solutions: Business Integration/SOA; Cloud Services; Commerce; Content Management; Custom Application Development; Education; Information Management; Mobile Platforms; Platform Integration; Portal & Social.
  • Speakers
    Randall Gayle
    • Data Management Director for Perficient
    • 30+ years of data management experience
    • Helps companies develop solutions around master data management, data quality, data governance, and data integration
    • Provides data management expertise to industries including oil and gas, financial services, banking, healthcare, government, retail, and manufacturing
    John Haddad
    • Senior Director of Big Data Product Marketing for Informatica
    • 25+ years of experience developing and marketing enterprise applications
    • Advises organizations on Big Data best practices from a management and technology perspective
  • Interesting Facts about Big Data
    1. It took from the dawn of civilization to the year 2003 for the world to generate 1.8 zettabytes (10 to the 12th gigabytes) of data. In 2011 it took two days on average to generate the same amount of data.
    2. If you stacked a pile of CD-ROMs on top of one another until you'd reached the current global storage capacity for digital information (about 295 exabytes), it would stretch 80,000 km beyond the moon.
    3. Every hour, enough information is consumed by internet traffic to fill 7 million DVDs. Side by side, they'd scale Mount Everest 95 times.
    4. 247 billion e-mail messages are sent each day; up to 80% of them are spam.
    5. 48 hours of video are uploaded to YouTube every minute, resulting in 8 years' worth of digital content each day.
    6. The world's data doubles every two years.
    7. There are nearly as many bits of information in the digital universe as there are stars in our actual universe.
    8. There are 30 billion pieces of content shared on Facebook every day and 750 million photos uploaded every two days.
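As a rough sanity check on fact #2, a few lines of arithmetic reproduce the order of magnitude; the per-disc capacity and thickness used here are assumed values, so treat the result as an estimate only.

```python
# Back-of-the-envelope check of fact #2 above; 700 MB capacity and 1.2 mm
# thickness per CD-ROM are assumptions, so this is an order-of-magnitude estimate.

global_storage_bytes = 295e18   # ~295 exabytes
cd_capacity_bytes = 700e6       # ~700 MB per CD-ROM
cd_thickness_m = 1.2e-3         # ~1.2 mm per disc
moon_distance_km = 384_400

discs = global_storage_bytes / cd_capacity_bytes
stack_km = discs * cd_thickness_m / 1000

print(f"{discs:.2e} discs, stack ~{stack_km:,.0f} km")
print(f"~{stack_km - moon_distance_km:,.0f} km beyond the moon")
# roughly 4e11 discs and a stack on the order of 500,000 km, i.e. a
# hundred thousand or so km past the moon, broadly in line with the slide's claim
```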
  • Agenda
    • Innovation vs. Cost
    • PowerCenter Big Data Edition
    • What else does Informatica offer for Big Data?
    • What are customers doing with Informatica and Big Data?
    • Next Steps
    • Q&A
  • How do you balance innovation and cost? Business (CEO and VP/Director of Sales & Marketing, Customer Service, Product Development) is focused on innovation; IT (CIO and VP/Director of Information Management, BI/Data Warehousing, Enterprise Architecture) is focused on cost.
  • Business is connecting innovation to Big Data across Financial Services, Retail & Telco, Media & Entertainment, Public Sector, Manufacturing, and Healthcare & Pharma. Example use cases: risk and portfolio analysis and investment recommendations; fraud detection; proactive customer engagement and location-based services; customer cross-/up-sell; online and in-game behavior; connected vehicle and predictive maintenance; health insurance exchanges, public safety, and tax optimization; predicting patient outcomes, total cost of care, and drug discovery.
  • IT is struggling with the cost of Big Data: growing data volume is quickly consuming capacity; new types of data need to be onboarded, stored, and processed; and big data skills are scarce and expensive.
  • Big data projects combine big data analysis with big data integration and quality; 80% of the work in big data projects is data integration and data quality.
  • PowerCenter Big Data Edition
  • Time spent on data preparation (parse, profile, cleanse, transform, match) versus time available for data analysis: without PowerCenter Big Data Edition and with PowerCenter Big Data Edition.
  • Informatica + Hadoop: PowerCenter developers are now Hadoop developers. Sources: transactions, OLTP, OLAP; social media, web logs; machine device, scientific; documents and emails. Targets: analytics and operational dashboards, mobile apps, real-time alerts. Processing steps: archive, profile, parse, ETL, cleanse, match.
  • The Vibe Virtual Data Machine: Transformation Library (defines logic), Optimizer (deploys most efficiently based on data, logic, and execution environment), Executor (run-time physical execution), and Connectors (connectivity to data sources).
  • Vibe Virtual Data Machine: Map Once. Deploy Anywhere. Information solutions and data services built on the VDM: Information Exchange, Master Data Management, 3rd Party Solutions, Data Integration, Data Quality, Information Lifecycle, Data Integration Hub, infrastructure services, and role-based tools. Deploy anywhere: cloud, embedded DQ in apps, data virtualization, server, desktop, Hadoop.
  • PowerCenter Big Data Edition: the safe on-ramp to big data.
    Big transaction data: online transaction processing (OLTP) such as Oracle, DB2, Ingres, Informix, Sybase, SQL Server; online analytical processing (OLAP) and DW appliances such as Teradata, Redbrick, Essbase, Sybase IQ, Netezza, Exadata, HANA, Greenplum, DATAllegro, Aster Data, Vertica, ParAccel; cloud sources such as Salesforce.com, Concur, Google App Engine, Amazon.
    Big interaction data: social media and web data such as Facebook, Twitter, LinkedIn, YouTube, web applications, blogs, discussion forums, communities, partner portals; other interaction data such as clickstream, image/text, scientific, genomic/pharma, medical, medical devices, sensors/meters, RFID tags, CDR/mobile.
    Big data processing capabilities: universal data access, high-speed data ingestion and extraction, ETL on Hadoop, profiling on Hadoop, complex data parsing on Hadoop, entity extraction and data classification on Hadoop, no-code productivity, business-IT collaboration, unified administration, and the Vibe virtual data machine.
  • PowerCenter Big Data Edition: Lower Costs. Optimize processing on low-cost hardware (traditional grid) and increase productivity up to 5x. Sources: transactions, OLTP, OLAP; social media, web logs; machine device, scientific; documents and emails. Targets: EDW and data marts.
  • PowerCenter Big Data Edition: Minimize Risk. Deploy on-premise or in the cloud (traditional grid); quickly staff projects with trained experts; Map Once. Deploy Anywhere™.
  • PowerCenter Big Data Edition: Innovate Faster. Onboard and analyze any type of data (transactions, OLTP, OLAP; social media, web logs; machine device, scientific; documents and emails) to gain big data insights; discover insights faster through rapid development and collaboration; operationalize big data insights (analytics and operational dashboards, mobile apps, real-time alerts) to generate new revenue streams.
  • Poll Question #1: What are your plans for Hadoop? (select one)
    • Currently using Hadoop
    • Plan to implement Hadoop in 3-6 months
    • Plan to implement Hadoop in 6-12 months
    • No plans for Hadoop
  • What Else Does Informatica Offer for Big Data?
  • Lower Data Management Costs: identify dormant data and archive inactive data to low-cost storage. (Diagram: as the enterprise data warehouse grows over time, an increasing share of the data is inactive; archiving moves that inactive data out of the warehouse to low-cost storage.)
  • Minimize Risk: avoid copies of data and augment the data warehouse using data virtualization, with role-based, fine-grained secure access and dynamic data masking. (Diagram: a data virtualization layer sits between the proliferating data marts, EDW, ODS, and MDM and the BI reports/dashboards.)
  • Minimize Risk: mask sensitive data in non-production systems (development, test, and training copies of production ERP, CRM, EDW, and custom systems that feed BI reports/dashboards).
  • Innovate Faster With Big Data: apply a data governance cycle of Discover, Define, Apply, and Measure and Monitor.
    Discover: data discovery, data profiling, data inventories, process inventories, CRUD analysis, capabilities assessment.
    Define: business glossary creation, data classifications, data relationships, reference data, business rules, data governance policies, other dependent policies.
    Apply: automated rules, manual rules, end-to-end workflows, business/IT collaboration.
    Measure and Monitor: proactive monitoring, operational dashboards, reactive operational DQ audits, dashboard monitoring/audits, data lineage analysis, program performance, business value/ROI.
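To make the "automated rules" and "measure and monitor" steps concrete, here is a hedged Python sketch that applies a couple of simple data quality rules and reports a pass rate per rule. The rules, field names, and sample records are illustrative assumptions, not a depiction of any specific governance tool.

```python
# Minimal sketch of automated data quality rules with a simple pass-rate
# metric, in the spirit of the Apply / Measure and Monitor steps above.
# The rules, fields, and sample records are illustrative assumptions.

import re

RULES = {
    "email_format":    lambda r: bool(re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", r.get("email", ""))),
    "country_present": lambda r: bool(r.get("country")),
}

def measure(rows):
    """Return the pass rate per rule, so it can feed a governance dashboard."""
    totals = {name: 0 for name in RULES}
    for row in rows:
        for name, rule in RULES.items():
            totals[name] += int(rule(row))
    return {name: passed / len(rows) for name, passed in totals.items()}

if __name__ == "__main__":
    customers = [
        {"email": "ada@example.com", "country": "UK"},
        {"email": "not-an-email",    "country": ""},
    ]
    print(measure(customers))  # {'email_format': 0.5, 'country_present': 0.5}
```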
  • Innovate Faster With Big Data: enrich master data to proactively engage customers and improve products and services.
  • Innovate Faster With Big Data: analyze data in real time using event-based processing and proactive monitoring. (Diagram: business rules applied to social data, geo-location data, and transaction data drive customer alerts and merchant offers.)
  • Poll Question #2: What other data management technologies are you considering within the next 12 months? (check all that apply)
    • Data archiving
    • Data masking
    • Data virtualization
    • Data quality
    • Data discovery
    • MDM
    • Real-time event-based processing
  • What Are Customers Doing with Informatica and Big Data?
  • Large Government Agency: a flexible architecture to support rapid changes. The challenge: data volumes growing 3-5 times over the next 2-3 years. The solution and result: manage the data integration and load of 10+ billion records from multiple disparate data sources, with a flexible data integration architecture that supports changing business requirements in a heterogeneous data management environment. (Diagram: mainframe, RDBMS, and unstructured data flow through data virtualization and a traditional grid into data warehouses and the EDW that feed business reports.)
  • Large Global Financial Institution: lower costs of big data projects. The challenge: a data warehouse exploding with over 200 TB of data, and user activity generating up to 5 million queries a day, impacting query performance. The result: saved $20M plus $2-3M ongoing through archiving and optimization; reduced the project timeline from 6 months to 2 weeks; improved performance by 25%; return on investment in less than 6 months. (Diagram: ERP, CRM, custom, and interaction data feed the EDW and business reports, with inactive data moved to an archive.)
  • Large Global Financial Institution: lower costs and minimize risk. The challenge: increasing demand for faster data-driven decision making and analytics as data volumes and processing loads rapidly increase. The result: cost-effectively scale performance; lower hardware costs; increased agility by standardizing on one data integration platform; leverage new data sources for faster innovation. (Diagram: RDBMS and web log sources are processed on a traditional grid, in near real time, into the data warehouse and data marts.)
  • Large Global Automotive Manufacturer: create innovative products and services with a connected vehicle program. The challenge: collect data in real time from all cars by the end of the year for the "Connected Car" program. The result: helps enable the goals of the connected vehicle program, including embedding mobile technologies to enhance the customer experience, predictive maintenance and improved fuel efficiency, and on-call roadside assistance and automatic service scheduling. (Diagram: vehicle data flows through complex event processing into the EDW and business reports.)
  • Next Steps
  • What should you be doing?
    • Tomorrow
      – Identify a business goal where data can have a significant impact
      – Identify the skills you need to build a big data analytics team
    • 3 months
      – Identify and prioritize the data you need to achieve your business goals
      – Put a business plan and reference architecture together to optimize your enterprise information management infrastructure
      – Execute a quick-win big data project with measurable ROI
    • 1 year
      – Extend data governance to include more data and more types of data that impact the business
      – Consider a shared-services model to promote best practices and further lower infrastructure and labor costs
  • Questions?