
Rapid data integration and curation



Organizations must onboard new data sources more frequently and more quickly. In this presentation, you will learn about practices that rapidly deliver business value, shrinking time-to-value from months to days.

Business decisions are becoming increasingly dependent on analyzing an ever-greater volume of data coming from a growing number of sources. Mobile technology is providing immediate access to data whenever and wherever it is needed. Users, customers, and business partners are waiting for answers, and the organization must reduce the time required to collect, understand, and analyze the data needed to provide those answers. Modern enterprises need to increase the agility, flexibility, and speed with which they can analyze a growing volume, variety, and velocity of data.

This presentation discusses a method for rapid data integration and curation:

- Techniques for light data integration of new data with existing data assets
- Framework for data quality management
- Refining data integration through evolutionary modeling
- Managing curation processes
- Validating business value

Timely delivery of new data assets allows users to begin asking questions earlier and getting answers more quickly, allowing the organization to uncover the new insights that drive lasting business benefits.

Published in: Technology, Education


  1. Rapid Data Integration and Curation: Delivering Business Value in the First 24 Hours. Speaker: Thomas Kelly, Practice Director, Semantic Technology Center of Excellence, Enterprise Information Management, Cognizant Technology Solutions, Inc. | ©2013, Cognizant
  3. We are at an Inflection Point at which Value is Created or Destroyed (Source: The Motley Fool)
  4. Delivering Information Faster Produces Direct, Measurable Business Value. What difference does one day make? A blockbuster drug generates $3M+ in revenue per day, and a one-day delay in completing clinical trials can generate up to $500K in additional costs. In banking, a moderate-sized brokerage firm can generate up to $1M in financial services revenue per day.
  5. Barriers to Rapid Data Integration: rework is expensive, so you must "get it right" from the start; the new data must fit with the existing data to avoid data silos; reconciling differences (data formats, coding, identifiers, etc.); managing data quality (accuracy, precision, context); knowledge acquisition takes time, and new insights come from experimentation; overcoming process inertia.
  6. Evolutionary Method to Data Integration and Curation. Responsive data approach: as new information flows into the enterprise, people and processes are dynamic in nature; the questions arising during this phase are "what to do" and "how to make the best sense of the new data source." Rapid integration tools aid in quickly prototyping and building solutions of value. Rapid integration and curation method: the data is profiled and explored for value and quality issues, and a rapid pruning exercise is undertaken by prototyping and integrating with in-house data to evaluate whether the data is fit for purpose; this informs an effective approach for the later phases. The information management approach matures over time. Managed: as we progress, issues with the new data are identified and managed; the main focus is on establishing data quality and adhering to enterprise standards and frameworks while building optimal integration approaches; the integration process is evolutionary as further discoveries refine the design. Evolutionary: progressive build based on the new data; building awareness of the new platform and fine-tuning the capabilities around the data source are the primary activities. Proactive: data management evolves to a more refined state; a feedback loop is built to enable proactive decisions around data organization and access; data integration is efficient and stable, with verifiable compliance and security, integrated with the enterprise information management framework. Predictable: the services built around the new data sources are now managed; the focus is on evolution of business processes, based on managed models. Timeline: Tactical (first 1-5 days), Progressive (first 1-3 months), Managed (after 3 months).
  7. Leverage Insights and Expertise, Rapidly and Sustainably. Reuse Expertise: identify and leverage existing, relevant data assets and expertise. Analyze: ingest new data sources (light integration and curation). Realize Benefits: monitor and measure use and benefits achieved; identify the next set of priorities. Extend: create and extend data relationships, leveraging insights from previous study cycles. Govern: elevate proven data, relationships, and expertise to organization-wide definition. Refine: capture insights from new data analysis cycles, refining relationships to support new analytics.
  8. Can You Help Me With Some Data?
  9. Rapid Data Integration and Curation Method: (1) Define Preliminary Objectives; (2) Profile the New Data; (3) Generate Initial Ontology for the New Data; (4) Generate Initial Ontology for the Existing Data (if necessary); (5) Integrate Entities over Common URIs; (6) Create URI Links; (7) Add Initial Data Quality Filters; (8) Analyze Data and Generate Feedback.
  10. Step 1: Define Preliminary Objectives. 1. Discuss functional and timing objectives, and priorities. 2. Clarify immediate, short-term, and long-term business value (SMART*): (a) cost reduction/avoidance; (b) meet a critical customer need. 3. Is this the right solution? 4. Set expectations: (a) an evolutionary process; (b) initial results quickly; (c) frequent, active participation; (d) feedback is critical to making refinements. 5. Brainstorm deliverables that produce business benefits; define a few sample queries. 6. Ask for commitment to benefits realization. 7. Start the clock! (*SMART: Specific, Measurable, Attainable, Realistic, and Traceable)
  11. Step 2: Profile the New Data. Light profiling, focusing on understanding the key data elements needed to meet the first deliverable; identify initial data-filtering candidates; capture insights about key data relationships.
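The light profiling this step calls for can be sketched in a few lines of stdlib Python. The CSV input layout and the two metrics chosen (per-column fill rate and distinct-value count) are illustrative assumptions, not part of the method itself:

```python
import csv
from collections import Counter

def light_profile(path, max_rows=10000):
    """Light profile of a delimited file: per-column fill rate and
    distinct-value counts, enough to spot key attributes and initial
    filtering candidates without a full profiling pass."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        counts = Counter()   # non-empty values per column
        distinct = {}        # distinct values per column
        rows = 0
        for row in reader:
            rows += 1
            for col, val in row.items():
                if val not in (None, ""):
                    counts[col] += 1
                    distinct.setdefault(col, set()).add(val)
            if rows >= max_rows:   # "light": sample, don't scan everything
                break
    if rows == 0:
        return {}
    return {
        col: {
            "fill_rate": counts[col] / rows,
            "distinct": len(distinct.get(col, ())),
        }
        for col in reader.fieldnames
    }
```

A column with a low fill rate or an implausible distinct count is an immediate candidate for the data quality filters added in step 7.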
  12. Step 3: Generate Initial Ontology for the New Data. Reverse-engineer an ontology from the new data; load the new data into the RDF store (or create a link to the data); create business-relevant synonyms for high-importance attributes. Refinements will be made in future iterations.
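Reverse-engineering a first-cut ontology can be as simple as emitting one class per table and one datatype property per column. A minimal sketch, in which the `nd:` namespace and the camel-case naming rule are assumptions chosen for illustration:

```python
def initial_ontology(entity, columns, base="http://example.org/newdata#"):
    """Sketch: derive a minimal Turtle ontology (one owl:Class, one
    owl:DatatypeProperty per source column) from a profiled table.
    Refinements come in later iterations, as the method prescribes."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix nd: <{base}> .",
        "",
        f"nd:{entity} a owl:Class .",
    ]
    for col in columns:
        prop = col[0].lower() + col[1:]   # CustomerId -> customerId
        lines += [
            f"nd:{prop} a owl:DatatypeProperty ;",
            f"    rdfs:domain nd:{entity} ;",
            f'    rdfs:label "{col}" .',   # label doubles as a synonym hook
        ]
    return "\n".join(lines)
```

The `rdfs:label` triples give a natural place to hang the business-relevant synonyms the slide mentions.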
  13. Step 4: Generate Initial Ontology for the Existing Data (if necessary). Map selected entities and critical attributes for the existing data source(s) to a source-specific ontology; add a reference to the source-specific ontology to the new-data ontology. Refinements will be made in future iterations. The new-data ontology manages integration with the existing data until it is sufficiently mature to be promoted into an enterprise ontology.
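The "reference to the source-specific ontology" is naturally expressed as an `owl:imports` statement on the new-data ontology. A sketch, with illustrative URIs:

```python
def reference_existing(new_onto_uri, existing_onto_uri):
    """Sketch: make the new-data ontology import the source-specific
    ontology for the existing data, so the new-data ontology manages
    the integration until it is mature enough to be promoted."""
    return (
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
        f"<{new_onto_uri}> a owl:Ontology ;\n"
        f"    owl:imports <{existing_onto_uri}> .\n"
    )
```

Promoting to an enterprise ontology later means moving the shared terms up and reversing the direction of this import, without touching the instance data.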
  14. Step 5: Integrate Entities over Common URIs. The same entity carries different URIs, separately maintained, in each source; focus on key entities; equivalence functions logically integrate the federated data. This reduces query complexity and can improve query performance.
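The effect of an equivalence function can be shown with triples modeled as plain tuples: every non-canonical URI is rewritten to its canonical counterpart, so queries see one entity instead of per-source duplicates. The URIs and the one-step (non-transitive) mapping are illustrative assumptions:

```python
def integrate(triples, equivalences):
    """Sketch: logically integrate federated triples by rewriting URIs
    to the canonical URI chosen by an equivalence function.

    triples:      set of (subject, predicate, object) tuples
    equivalences: iterable of (alias_uri, canonical_uri) pairs;
                  one-step mapping only, no transitive closure
    """
    canon = dict(equivalences)

    def resolve(uri):
        return canon.get(uri, uri)

    return {(resolve(s), p, resolve(o)) for s, p, o in triples}
```

After integration, a query for one customer URI returns attributes that originated in both sources, which is where the reduction in query complexity comes from.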
  15. Step 6: Create URI Links. The data has common values that can be used in join operations (e.g., cust:ZipCode JOIN geo:ZipCode) but doesn't have links; adding a cust:ZipCodeURI attribute links Customer directly to Geography. Links reduce query complexity and can improve query performance. Focus on key queries and identify complex or time-sensitive joins; add a linking URI attribute to the dependent entity; amend selected queries to leverage the new link.
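Materializing the link is a one-pass rewrite: for each triple carrying the join value, mint a companion triple whose object is the URI of the target entity. The geography URI scheme and property names here are assumptions for illustration:

```python
GEO_NS = "http://example.org/geo/zip/"   # assumed geography URI scheme

def add_zip_links(triples, zip_prop="cust:ZipCode",
                  link_prop="cust:ZipCodeURI"):
    """Sketch: materialize a direct URI link from each customer to the
    geography entity sharing its zip code, turning a value-based JOIN
    into a simple link traversal at query time."""
    links = {
        (s, link_prop, GEO_NS + o)
        for s, p, o in triples
        if p == zip_prop
    }
    return triples | links   # keep the original value triples too
```

The original `cust:ZipCode` values are retained, so existing queries keep working while amended ones traverse the new link.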
  16. Step 7: Add Initial Data Quality Filters and Transformations. In a traditional data warehouse, data quality happens in the ETL layer between the data sources and the warehouse, and again in the warehouse itself. Here, data quality management is just-in-time, everywhere that it is needed: data filtering and transformation rules are encoded in the ontology; the focus is on critical data quality rules; rule updates are automatically in effect, without reloading all of the data.
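The just-in-time idea can be sketched by applying rules at read time instead of load time. In the approach above the rules would live in the ontology; the Python dict and the zip-code rule below stand in for that purely for illustration:

```python
import re

# Illustrative rule set keyed by property; in the slide's approach these
# rules are encoded in the ontology, so updating a rule takes effect
# immediately without reloading the data.
QUALITY_RULES = {
    "cust:ZipCode": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}

def jit_filter(triples, rules=QUALITY_RULES):
    """Sketch of just-in-time data quality: drop triples whose values
    fail the rule registered for their property, at query/read time
    rather than during a load-time ETL pass."""
    return {
        (s, p, o) for s, p, o in triples
        if p not in rules or rules[p](o)
    }
```

Because filtering happens on read, tightening or relaxing a rule changes every subsequent query result with no reload step.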
  17. Step 8: Analyze Data and Generate Feedback. Demonstrate visualization using the sample queries; walk through the available data sets and data organization; experiment with data access and new visualizations; provide next-steps recommendations to refine the data integration and curation.
  18. Architectural Foundation for Rapid Data Integration and Curation: SPARQL-based visualization, relational-to-RDF mapping, data profiling, an ontology editor, automated ontology generation, data import, and the RDF store.
  19. Capabilities That We Have Introduced: rapid response to new data onboarding needs; a process for evolutionary data integration and curation; a flexible design that is responsive to business changes; a foundation for refinement and expansion of ontology models, from fit-for-purpose to department, to business unit, to enterprise.
  20. Questions?
  21. Thank you!
  22. Speaker: Thomas (Tom) Kelly, Practice Director, Enterprise Information Management, Cognizant. Thomas Kelly is a Director in Cognizant's Enterprise Information Management (EIM) Practice and heads its Semantic Technology Center of Excellence, a technology specialty of Cognizant Business Consulting (CBC). He has 20-plus years of technology consulting experience in leading data warehousing, business intelligence, and big data projects, focused primarily on the life sciences and healthcare industries. Tom can be reached at