Database and Business Intelligence – Karen Hsu, Informatica
  • Key Points: Data is fragmented and everywhere. Explosion of data sources both inside and outside the firewall. Need to seamlessly connect and move trusted data around the extended enterprise. Change is continuous. Example text: Today’s information economy is enabled through data. Data is the lifeblood of any corporation, the currency of the modern economy, and over the last 10 years we have seen an explosion of data across the enterprise. Within the traditional enterprise (inside a corporation’s firewall) there are large quantities of data held within operational databases. However, alongside those are enormous quantities of valuable data held within unstructured data sources such as spreadsheets, PDFs, and word processing documents. This traditional environment is changing continually as vendor consolidation occurs alongside mergers & acquisitions and business modernization. Alongside this change we are also seeing an explosion taking place outside the firewall as corporations turn to outsourcing and software-as-a-service to augment traditional computing delivery. In addition, they are increasingly interacting with partners through supply chains or partner networks to facilitate business differentiation. Financial institutions need to adhere to SEPA standards, or healthcare to NACHA. This expanded workplace and global immediacy of information is placing huge pressure on IT. Within the traditional enterprise, corporations need to integrate data from disparate systems and ensure quality. They also need to integrate outsourced solutions with their mainstream computing solutions in order to facilitate seamless quote-to-cash processes. At the same time they need to be able to move data seamlessly between partner networks to optimise supply chain processes. Finally, every piece of data must be as accurate as possible across the entire enterprise.
Transition: So how can corporations make sense of all of this and ensure alignment of IT in delivering the trusted information required to enable corporations to compete globally?
  • Key Points: We provide a commercially available platform for data integration. It is a platform that empowers your business to deliver on its business imperatives. Example Text: Informatica provides the software and services to manage the processes involved in all of the data integration projects that we have discussed. Our platform comprises a number of tightly integrated solutions called PowerCenter, PowerExchange, Data Explorer, Data Quality and B2B Exchange. Through our data integration platform, organizations can close the data value gap by reducing costs and expediting the time to address data issues of any complexity and scale. Through our Velocity methodology you have a rapid and repeatable process for analysing, designing and implementing data integration projects. Transition: I’d like to spend a few moments now looking at some customers who have accomplished key business imperatives with Informatica.
  • Typical: 100s of formats, legacy and industry standards, e.g. SWIFT MT and MX, ISO 20022, EDIFACT, etc., implemented multiple times. Multiple channel types and format combinations; often format and channel type are inextricably linked, and unused combinations don’t work. Lack of standard implementations of formats and channels: multiple implementations of SFTP usage, SWIFT MT, etc. Some channels have limited volume capability, having been implemented before volume growth. Streamlined: a single implementation of each format. Easy to link one format to another via a common central format. Formats and channels are separated, so any format can be received/sent over any channel. Fewer transformations are required to link internal systems using legacy formats. Standard usage for formats, e.g. SWIFT FIN 50F vs. 50L (Instructing Party/Ordering Customer) have different usages in different systems. As a result of streamlining: costs reduced. Less complexity and consequent maintenance; less infrastructure, machines, etc.; less operational overhead, with a single platform to run; less work developing single implementations of formats and channels. Reduced complexity reduces time to market, since complexity slows development and testing. Improves STP: less scope for unforeseen combinations in data flows (format and channel combinations) causing failures. Single data processing hub: passing all data through one hub in consistent formats allows corporate dashboard-style analysis and fact-based business decisions.
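The hub-and-spoke idea above (link every format through one common central format) can be sketched in a few lines. This is an illustrative sketch, not Informatica's implementation: each format needs only one parser into the canonical form and one serializer out of it, so N formats require N translators rather than N×(N−1) point-to-point mappings. The format names and field layouts here are assumptions for the example.

```python
# Parsers: external format -> canonical dict (the common central format).
PARSERS = {
    "csv_order": lambda raw: dict(zip(["order_id", "amount"], raw.split(","))),
    "pipe_order": lambda raw: dict(zip(["order_id", "amount"], raw.split("|"))),
}

# Serializers: canonical dict -> external format.
SERIALIZERS = {
    "csv_order": lambda rec: f"{rec['order_id']},{rec['amount']}",
    "pipe_order": lambda rec: f"{rec['order_id']}|{rec['amount']}",
}

def translate(raw, src_fmt, dst_fmt):
    """Route any source format to any target format via the canonical form."""
    canonical = PARSERS[src_fmt](raw)
    return SERIALIZERS[dst_fmt](canonical)

print(translate("A100,250.00", "csv_order", "pipe_order"))  # A100|250.00
```

Adding a new format means writing one parser/serializer pair; every existing format can then immediately exchange data with it, which is the cost and time-to-market argument made above.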
  • How is this different: xBRL to database…; now that the data is in the database, there are many different tools available. Proprietary formats to SWIFT, SWIFT MT to MX, documents (Excel, PDF).
  • Receive “paper” (fax, email, PDF, Excel, Word); process orders manually.
  • What is Data Exchange, and how does it fit? DX sits between the internal systems and the external partners. When we look at the components of DX at a very high level, we see PowerCenter at its core. How does data typically get delivered from external partners? As files. Since files are involved, what do we need? DT to handle the translation and validation. Why not just sell PowerCenter with Data Transformation? Because these exchanges are operational and require tools for management and monitoring – both by operators and by the business. Finally, since DX is built on the Platform with PowerCenter, customers can take advantage of all other Informatica Platform components – including PowerExchange, IDQ, IIR, MM, Data Masking, etc. The same technology used for internal system integration is used for multi-enterprise integration.
  • Typically a data analyst / Subject Matter Expert profiles the data. This involves accessing the data and running a series of prebuilt rules using the metadata and data. The user then “surfs” the results and tags the data where anomalies exist. A repository / database of tags is then available for the next stage of the project, which may be (i) a cleansing or (ii) a data movement project. The next stage of the project is managed by a developer. The developer uses the anomalies to ensure the cleansing or transformation rules take into account all the issues identified in the data.
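The profile-and-tag step described above can be sketched minimally: run prebuilt checks over each column and record the anomalies in a tag store that the developer queries later. The rule names, columns, and sample rows here are illustrative assumptions, not Data Explorer's actual rule syntax.

```python
import re

# Sample records with a deliberately missing name and nonstandard phones.
rows = [
    {"name": "Acme Corp", "phone": "555-0101"},
    {"name": "", "phone": "(555) 0102 x9"},
    {"name": "Jane Doe", "phone": "n/a"},
]

# "Prebuilt" rules: each returns True when the value is anomalous.
RULES = {
    "missing_value": lambda v: v.strip() == "",
    "nonstandard_phone": lambda v: not re.fullmatch(r"\d{3}-\d{4}", v),
}

def profile(rows, column, rule_names):
    """Return (row_index, column, rule) tags where a rule flags the value."""
    tags = []
    for i, row in enumerate(rows):
        for rule in rule_names:
            if RULES[rule](row[column]):
                tags.append((i, column, rule))
    return tags

# The tag store handed to the developer for the cleansing stage.
tag_store = profile(rows, "name", ["missing_value"]) + \
            profile(rows, "phone", ["nonstandard_phone"])
print(tag_store)
```

The developer then writes cleansing or transformation rules that cover every tagged anomaly, which is the hand-off this note describes.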
  • In order to implement a data quality process, the first step is to define what you mean by data quality. A potential data quality dimension framework includes a range of parameters which can be used to identify and categorise data quality issues. So if someone says “our data is not good”, we can now investigate further and describe the levels of data quality with tangible numbers, e.g. data quality equals 80% could be an aggregate of completeness of key attributes = 70%, lack of conformity = 75%, duplicates of 20%, etc. Typically customers take this framework and edit it to suit their own company based on maturity and priorities at any point in time. So the framework needs to be customized per organization and agreed on by both the business and IT. Question: Which of these data quality dimensions describe the problems you have? Transition: Let’s take an example:
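Rolling the per-dimension numbers into one headline figure, as in the 80% example above, is a simple weighted average. The weights in this sketch are an assumption: as the note says, each organization tunes the framework to its own maturity and priorities. Note that a 20% duplicate rate enters as a score of 80 (100 minus the defect rate) so all dimensions point the same way.

```python
def overall_quality(dimension_scores, weights):
    """Weighted average of per-dimension scores (all on a 0-100 scale)."""
    total_weight = sum(weights.values())
    return sum(dimension_scores[d] * w for d, w in weights.items()) / total_weight

# Illustrative numbers from the note: 20% duplicates -> score of 80.
scores = {"completeness": 70, "conformity": 75, "duplicates": 80}
# Hypothetical weighting: this organization prioritizes completeness.
weights = {"completeness": 2, "conformity": 1, "duplicates": 1}

print(overall_quality(scores, weights))  # 73.75
```

Agreeing on the weights is exactly the business-and-IT customization step the note calls for.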
  • Here is an example which applies data quality dimensions to some finance data. 1. Completeness – empty or default values in fields. 2. Conformity – this is all to do with the format of raw data within a field: content-related issues such as an incorrect format in a field, e.g. a name prefix in the customer name field, or noise around telephone numbers. 3. Consistency – look down a column and everything looks okay; look across two columns and there’s a problem, e.g. a person coded as a company, a company name coded as a person, or a last invoice date 2 years ago while the customer status is “live”. 4. Duplicate records – a unique ID is associated with each record, but when you look more closely at the raw contents of the other fields, there is a high probability of duplicates. 5. Integrity – associated with the integrity of relationships, i.e. using fuzzy matching to identify records which should be linked to each other. 6. Accuracy – comparing data with a reference source, e.g. using a lookup table for an exact match.
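Two of the dimensions above lend themselves to a short sketch: consistency (a cross-column rule, e.g. a company-looking name coded as a person) and duplicates (the same customer hiding behind distinct IDs). The field names and sample records are assumptions for illustration.

```python
records = [
    {"id": 1, "name": "ACME LTD", "type": "person", "status": "live"},
    {"id": 2, "name": "Acme Ltd.", "type": "company", "status": "live"},
]

def consistency_issues(rec):
    """Cross-column check: a company-style name coded as a person conflicts."""
    issues = []
    company_tokens = ("LTD", "INC", "CORP")
    if rec["type"] == "person" and any(t in rec["name"].upper() for t in company_tokens):
        issues.append("company name coded as person")
    return issues

def likely_duplicates(records):
    """Naive duplicate check: strip punctuation and case before comparing names."""
    seen, dupes = {}, []
    for rec in records:
        key = "".join(ch for ch in rec["name"].upper() if ch.isalnum())
        if key in seen:
            dupes.append((seen[key], rec["id"]))
        else:
            seen[key] = rec["id"]
    return dupes

print(consistency_issues(records[0]))  # ['company name coded as person']
print(likely_duplicates(records))      # [(1, 2)]
```

Both records carry unique IDs, yet normalizing the raw name contents exposes the probable duplicate, which is exactly the point made in dimension 4 above.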
  • Using data quality dimensions, the Subject Matter Expert then configures DQ rules that generate data quality dimension flags. These are used to create initial scorecards to help build the business case, and also as the basis for web-based reporting, which is configured and deployed by a developer.
  • Once the DQ dimension framework has been applied to the data, the priority fields will be identified and the priority data quality rules configured to “fix” or “remedy” the issues. The SME knows the rules and the developer must configure them. Key points: - IDQ supports all types of master data i.e. customer/supplier/product/asset/finance – i.e. data quality rules can be applied to any attribute in the cleansing, parsing, standardization and matching processes.
  • Here is the high-level process flow. Step 1: Source data is fed into IDQ. Step 2: IDQ applies data quality rules; data is split into exception records and passed records. Step 3: Passed records go straight to the target location. Step 4: Exception records go to a staging area accessible by the DQA. This is where users manage bad records or consolidate suspect duplicates. Once users are satisfied with the cleansed and consolidated records, the records can be pushed out to the target.
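The four-step flow above can be sketched as a simple router: apply the rules, send passing records to the target, and send failures (with the rules they failed) to a staging area for manual review. The rule and record shapes here are illustrative assumptions, not IDQ's rule syntax.

```python
def apply_rules(record, rules):
    """Return the names of the rules this record fails."""
    return [name for name, rule in rules.items() if not rule(record)]

def route(records, rules):
    """Split records into target (passed) and staging (exceptions)."""
    target, staging = [], []
    for rec in records:
        failures = apply_rules(rec, rules)
        if failures:
            # Exception record: held for the DQA user with its failure reasons.
            staging.append({"record": rec, "failed": failures})
        else:
            target.append(rec)
    return target, staging

rules = {"has_email": lambda r: "@" in r.get("email", "")}
target, staging = route([{"email": "a@b.com"}, {"email": "missing"}], rules)
print(len(target), len(staging))  # 1 1
```

After the manual review step, the corrected staging records would be re-run through the same rules and pushed to the target, closing the loop described in Step 4.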
  • Finally, data quality metrics are measured and tracked over time via browser-based reporting and trending. Reports can be configured per user requirements, alerts can be automated, and drilldown can be enabled for viewing the low-quality records.
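The tracking-and-alerting idea above reduces to a tiny sketch: store a score per reporting run and raise an alert when the latest value crosses a threshold. The threshold and dates are assumptions for illustration; a real deployment would use the reporting tool's own alerting.

```python
history = []  # (period, score) pairs accumulated over time for trending

def record_score(period, score, threshold=75):
    """Append a score to the trend history; return True if an alert fires."""
    history.append((period, score))
    return score < threshold

print(record_score("2024-01", 82))  # False: above threshold, no alert
print(record_score("2024-02", 71))  # True: below threshold, alert
```

Plotting `history` over periods gives the trend line, and the boolean return is where an automated alert (email, dashboard flag) would hook in.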

Database and Business Intelligence – Karen Hsu, Informatica Presentation Transcript

  • 1. xBRL US Pacific Rim Workshop Database and Business Intelligence Workshop Karen Hsu Director Product Marketing, Informatica
  • 2. Extend Data Transformation and Exchange The Informatica Economy Enabled Through Data On-demand Integration Enterprise Data Integration B2B Data Integration Data Quality Across the Whole Enterprise Traditional Enterprise On-demand Enterprise (SaaS) Partner Trading Enterprise (B2B) SWIFT SEPA xBRL
  • 3. The Information Platform To Enable Business Imperatives On-demand Integration Enterprise Data Integration B2B Data Integration Data Quality Across the Whole Enterprise Informatica On Demand Improve Decisions & Regulatory Compliance Modernize Business & Reduce IT Costs Facilitate Mergers & Acquisitions Improve Customer Service & Operational Efficiency Outsource Non-core Functions Increase Partner network Efficiency Access Discover Cleanse Integrate Deliver Develop + Manage Audit + Monitor + Report Data Quality Data Explorer PowerCenter – Metadata Manager PowerExchange B2B Exchange B2B Exchange
  • 4. Vision: Enterprise Wide Transformation
    • 100s of formats
    • Multiple interaction channels
    • 1000s of combinations of formats, protocols, rules between channels
    • Lack of data and process standardization
    • Unable to support growth in volume
    Typical Infrastructure Streamlined Processing Reduce processing costs and costs of integrating data across product lines to target new or existing relationships Payroll PO Order Tracking REPORTING EDIFACT CUSTOMER SWIFT PARTNER TRADE PO
  • 5. Accelerate Adoption of xBRL
    • Reduce risk
      • Leverage existing expertise and infrastructure for lower TCO
      • Pre-built transformations to industry standards
      • Best-in-class transformation specification environment
    • Accelerate xBRL usage from within to outside
      • Transform any file to and from xBRL
        • Documents (Excel, PDF, Word)
        • Proprietary, COBOL, PL/1
        • Complex positional and delimited files
        • Industry standard interoperability (standard to standard, version to version, e.g. SWIFT MT, MX, BAI)
      • Ensure quality of data
      • Manage and monitor data coming from sources and going to destinations
  • 6. Before: Shenzhen Stock Exchange Issuers Stock Exchange Custodian Brokers Fax SFTP SMTP MT 564 Fax Enterprise actions DB SMTP 20+ IT persons full time Business user time lost in error analysis Flat file Flat file Flat file PDF PDF Flat file PDF
  • 7. Shenzhen Stock Exchange with Informatica Issuers Stock Exchange Custodian Brokers Fax SFTP SMTP MT 564 Web Portal SMTP B2B Data Transformation Data Quality Enterprise actions DB B2B Data Exchange Sources Data Feeds management Monitoring Flat file Flat file PDF Flat file PDF
  • 8. Automatic extraction of Annual Report and Semi-Annual Report PDFs to XML canonical format
  • 9.
    • Best-of-breed transformation for unstructured (PDF) and financial reporting formats
    • Leverage existing resources familiar with PowerCenter for end to end data integration
    Accelerate time-to-market of credit ratings KEY BUSINESS IMPERATIVE B2B ADVANTAGE RESULTS/BENEFITS
    • Enable customers to respond quickly to market changes
    • Replace manual process of copying and pasting from PDF files to analytics application
    • Replace hand coded excel extraction VB macros
    THE CHALLENGE
    • Need to analyze raw data coming from 10,000 different formats every month
    • 40-80 people required per month to extract the data from these formats, delaying the publishing of securities ratings
    • Eliminated need for 40-80 people every month to extract data with up front investment of 60 days
    • Ability to support any new format or industry standard quickly
    Large Credit Rating Agency
  • 10. B2B Data Exchange Internal Systems External sources Data Warehouse Email B2B Data Exchange PowerCenter Monitoring Partner Management Data Transformation Adapters Quality Identity Managed File Transfer SWIFT
  • 11. Business and IT Collaboration Step 1: Profile the Data with Informatica Data Explorer
    • Catalog of Issues
      • Completeness
      • Conformity
      • Consistency
      • Accuracy
      • Duplicates
      • Dependencies
    • Cleansing specifications
    • Transformation specifications
    Profiles and tags anomalies with Informatica Data Explorer (IDE) Reviews anomalies with Informatica Data Explorer (IDE) SME Developer
  • 12. Data Quality Dimensions Framework Completeness What data is missing or unusable? Conformity What data is stored in a non-standard format? Consistency What data values give conflicting information? Accuracy What data is incorrect or out of date? Duplicates What data records or attributes are repeated? Integrity What data is missing or not referenced?
  • 13. Finance Data - Example COMPLETENESS CONFORMITY CONSISTENCY DUPLICATION INTEGRITY ACCURACY
  • 14. Business and IT Collaboration Step 2: Continued. Establish Metrics and Define Targets Create scorecards (i) for the initial business case and (ii) as the basis for web-based monitoring Configures and deploys web-based data quality reports for monitoring on an ongoing basis – using Informatica Reporting or a 3rd-party reporting tool SME Developer
  • 15. Business and IT Collaboration Step 3: Design and Implement Data Quality Rules
    • Standardize values
    • Correct missing or inaccurate data
    • Enrich the data
    • Identify and consolidate duplicate records
    Assists with design of Data Quality rules in Informatica Data Quality (IDQ) Creates Data Quality rules with Informatica Data Quality (IDQ) SME Developer
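One way to sketch the "identify and consolidate duplicate records" rule above is fuzzy name matching, here using the standard library's difflib. This is an illustrative stand-in: IDQ's actual matching engine is far more sophisticated, and the 0.85 threshold is an assumption.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two strings, case-insensitive, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_pairs(names, threshold=0.85):
    """Return pairs of names similar enough to be suspected duplicates."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

print(match_pairs(["Jon Smith", "John Smith", "Mary Jones"]))
```

Suspect pairs found this way would feed the browser-based consolidation step on the next slide, where a user confirms or rejects each match.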
  • 16. Sources Data Quality Checks Exception Management Records that passed DQ rules Target High Quality Data Low Quality Data Exception Management process Step 4: Manage exceptions and consolidate Cleansing and Matching Rules Exceptions Data Quality Assistant Browser based exception review and manual consolidation process
  • 17. Business and IT Collaboration Step 5: Monitor Data Quality Versus Targets
    • Informatica’s scorecarding and reporting also support monitoring of quality over time
    • Companies can utilize Informatica’s toolset to perform the monitoring or utilize 3rd party reporting tool
    Monitors the quality of data to determine if objectives are being met. Uses results to support root cause analysis Review results to determine if rule enhancement is necessary as part of the automated data quality process SME Developer
  • 18. Accelerate Adoption of xBRL
    • Reduce risk
      • Leverage existing expertise and infrastructure for lower TCO
      • Pre-built transformations to industry standards
      • Best-in-class transformation specification environment
    • Accelerate xBRL usage from within to outside
      • Transform any file to and from xBRL
        • Documents (Excel, PDF, Word)
        • Proprietary, COBOL, PL/1
        • Complex positional and delimited files
        • Industry standard interoperability (standard to standard, version to version, e.g. SWIFT MT, MX, BAI)
      • Ensure quality of data
      • Manage and monitor data coming from sources and going to destinations
  • 19. Thank you