
The Emerging Data Lake IT Strategy



Meaning making: separating signal from noise. How do we transform the customer's next input into an action that creates a positive customer experience? We make the data more intelligent, so that it can guide our actions. The Data Lake builds on Big Data strengths by automating many of the manual development tasks, providing several self-service features to end users, and adding an intelligent management layer to organize it all. This results in a lower cost to create solutions, "smart" analytics, and faster time to business value.



  1. © 2014 The Emerging Data Lake IT Strategy: An Evolving Approach for Dealing with Big Data & Changing Environments. Speakers: Thomas Kelly, Practice Director, Cognizant Technology Solutions; Sean Martin, Founder and CTO, Cambridge Semantics
  2. We’re living in an amazing world of information sharing, connecting with family, neighbors, vendors, and customers all over the world
  3. Telling the world about what we like and don’t like: #HIMYMfinale, @MLB, “… is now following Cognizant Technology Solutions and Cambridge Semantics”
  4. What we’re doing and how we’re succeeding
  5. We’re deciding what advertising we want to see … and what we don’t (Unsubscribe). Influencing how business and customers engage
  6. Many businesses have emerged that embrace this model of customer engagement, and we’ve said goodbye to businesses that didn’t
      • 10 million stays in 2013, without owning a hotel
      • Grew to nearly $75B in annual retail revenue in 2013, without opening a storefront
      • Shares over 40 million photos each day
  7. Retail: Engaging in a more personalized shopping experience, retailers are building a stronger relationship with each customer
  8. Customer Service: Delivering a positive and successful experience for each customer
  9. Life Sciences and Healthcare: Combining health, genetic, clinical, and public sciences data to bring effective therapies to patients sooner
  10. Financial Services: Delivering innovative products and services, based on a 360° view of the customer, across all business lines, engaging all available data assets, internal and external
  11. The Challenges That We're Addressing
      Onboarding and Integrating Data is Slow and Expensive
      • Transforming data from a growing variety of technologies
      • Custom coded ETL
      • Existing ETL processes are not reusable
      • Optimization for analytics is time-consuming and costly
      • Often wait until there is a defined need for a set of data, delaying benefits realization while waiting to onboard the data
      Data Provenance is Often Poorly Recorded
      • Data meaning is “lost in translation”
      • Data transformations tracked in spreadsheets
      • Post-onboarding, maintenance and analysis cost for onboarded data is high
      • Recreating data lineage is manual, time-consuming, and error-prone
  12. The Challenges That We're Addressing
      Target Data is Difficult to Consume
      • Optimization favors known analytics, but is not well suited to new requirements
      • A one-size-fits-all canonical view is used rather than fit-for-purpose views
      • Or, lacks a conceptual model to easily consume the target data
      • Difficult to identify what data is available, how to get access, and how to integrate the data to answer a question
      Industrializing the Big Data Environment is Difficult to Manage
      • Proliferation of data silos leads to inconsistency/syncing issues
      • Conflicting objectives of opening access to data assets while managing security and privacy requirements
      • Velocity of business change rapidly invalidates data organization and analytics optimizations
      • Managing the integration/interaction with the multiple data management technologies that make up the Big Data environment
  13. The Data Lake is made up of four key components: Data Ingestion, Data Lake Management, Data Management, Query Management
      Delivering
      • Low Cost, High Performance Storage
      • Flexible, Easy-to-Use Data Organization
      • Performance-Optimized Analytics
      • Automation of most manual Development and Query Activities
      • Self-Service End-User Features
      • Intelligent Processing
  14. Data Ingestion
      • Data Sources: Linked Data, Internet of Things (IoT), Desktop and Mobile, Operational Systems, Social Media and Cloud
      • Data Ingestion: On-Demand Query, Streaming, Semantic Tagging, Scheduled Batch Load, Model-Driven, Self-Service
      • Other diagram panels: Data Lake Management, Data Management, Query Management
  15. Data Management
      • Data Management: Provenance, Data Movement, Semantic Graph, Columnar, In Memory, NoSQL, Map Reduce, HDFS Storage (Structured and Unstructured Data)
      • Other diagram panels: Data Sources, Data Ingestion, Data Lake Management, Query Management
  16. Data Lake Management
      • Data Mappings: Source-to-Target, Transformations
      • Data Assets Catalog: Internal and External Data Assets; Defined Data Orgs (ontologies, taxonomies, thesauri)
      • Access Management: Authorization and Access Rules; Rule-based Security; Group, Role, and User Level Authorization; Auditable Access
      • Workflow: Processes, Schedules, Provenance Capture
      • Models, Business-Focused: Business Unit Data Organization and Terms; Optimized to Assist Analytics
      • Models, Data Governance: Focus on Shared Data; Standard Models; Controlled Vocabulary; Common Definitions; Standards-based Data Views (FIBO, CDISC/RDF)
      • Monitoring: Monitor and Manage Data Lake Operations
      • Other diagram panels: Data Sources, Data Ingestion, Data Management, Query Management
  17. Query Management
      • Query Data, Metadata, and Provenance
      • Capture and Share Analytics Expertise
      • Semantic Search
      • Analytics Directed to the Best Query Engine
      • Data Discovery
      • Other diagram panels: Data Sources, Data Ingestion, Data Lake Management, Data Management
  18. Semantic Technology Delivers “Smart” Data
      • Integrates a network of internal and external data assets, insulating end users from the details of the underlying technologies
      • Captures expertise (logic, inferencing) and integrates it with the data, delivering “smart” data to non-expert users
      • Manages a comprehensive inventory of the data assets
      • Secures access to the right data assets by the right users
  19. Key W3C Standards in Semantic Technology
      • Resource Description Framework (RDF): Framework for storing and integrating data and data definitions in the form of subject-predicate-object expressions, or “triples”. Relationships are organized in a logical graph model. Benefit: reduced development time and cost; faster time to business value.
      • Web Ontology Language (OWL): An ontology is a comprehensive model of data definitions and relationships that is human- and machine-readable. Ontologies are inheritable and extensible. Benefit: improved application quality, flexible iterative/investigative approach, easily adapts to business change.
      • SPARQL Query Language: SQL-like query language for semantic data that can leverage the ontological relationships and constructs to execute smarter queries. Access multiple internal and external databases simultaneously in a single query. Benefit: access and integrate data across business silos.
      • Inference: Reasoning over data through business rules. Expertise is captured and embedded in the ontology model, accessible through user queries. This is the “smart” in Smart Data. Benefit: easier end user access to expertise; intelligent systems capabilities.
      • Linked Data: Connects data contained in different databases, allowing queries to find, share, and combine data so insights can be identified across the Web. Benefit: connect disparate databases to navigate and integrate data regardless of location or technology platform.
      • RDB to RDF Mapping Language (R2RML): Preserving current investments in relational technology, R2RML maps relational data to an ontology. SPARQL can query RDF and relational databases simultaneously. Benefit: low cost of entry to use Semantic Technology to deliver high-value solutions.
  20. The Common Model is the “Data Glue”
      Source Systems: Lead (SFA system), Quote (Quote system), Order (OMS system), Contract (CMS system), all connected through the Common Model (“Data Glue”)
      • Different business entities in physical systems actually share many of the same concepts, meanings, and relationships
      • Semantic data science exposes common business concepts and connects them with their physical expression in production systems
      • Data is “glued” together by its business meaning, rather than physical structures dictated by the underlying technologies
      The conceptual model can be directly used by both business and IT users to operationalize data services, understand the data landscape, track data lineage, and conduct downstream analytics.
  21. Semantic Models Relate Data by Business Meaning
      Diagram concepts: Customer, Life Events, Life Style, Preferences, Interests, Music, Purchasing, Personal Network, Entertainment, Profession
  22. Implications to the Existing IT Architecture and Practices
      • User Tools to Discover and Optimize Data Relationships
      • Structured and Unstructured Data, Voice, and Video
      • Data Analysis Automation
      • Extends Existing Investments in IT Architecture
      • Manages Secure Access
      • Builds Out Enterprise Data Models, with Integration Hub Capabilities
      • Self-Service Data Feeds and Analytics
      • Infrastructure Capacity Elasticity
      • Reduction of Data Mart Silos
      • Easier Access to External Data
  23. Data Lake Approach to Meeting Business Needs
      Business need: Onboard New Data
      • Traditional technologies and practices: Comprehensive analysis creates rigid structure that is difficult to change, or minimal definition of data organization requires detailed understanding of data contents
      • Data Lake technologies and practices: Flexible data model can be revised or extended without redesign of the database. Agile, evolutionary refinement of the data organization, leveraging new insights as users work with the data
      Business need: Connect External Data
      • Traditional technologies and practices: External data is collected and loaded into the analytics repository. Data is streamed, or is refreshed on a scheduled frequency.
      • Data Lake technologies and practices: External data can be sourced from databases, spreadsheets, Web pages, news feeds, and more; data is queried through common methods, without regard to location, with real-time values delivered at query time.
      Business need: Integrate Data between Business Units or Business Partners
      • Traditional technologies and practices: Governance activities establish common vocabulary and data definitions. Shared data is copied to an integrated database. Organization-specific definitions may require duplicating certain data in marts
      • Data Lake technologies and practices: And, systems of record publish existing data specifications or ontology model; each organization defines data in a manner that is best suited for its business. Federation and virtualization features provide choices in which data to copy and which data to retain in the system(s) of record. All models can be supported through a single copy of the data, maintained in the data lake or system of record.
      Business need: Capture and Embed Expertise
      • Traditional technologies and practices: Expertise often captured in the reporting and analytics; change management challenge when updates required.
      • Data Lake technologies and practices: Expertise captured in the data definitions; single, shared definition minimizes change management efforts
  24. Lessons learned from early adopters
      • Prioritize: Prioritize data onboarding by the data’s ability to contribute to customer engagement
      • Onboard: Onboard data assets as they become available
      • Connect: Connect to available internal and external data assets
      • Load: Load the data unfiltered/untransformed
      • Organize: Use models to provide organization to the data
      • Customize: Create models that are tailored to the needs of the business groups
      • Search: Make it easy to find data
      • Secure: Manage security and privacy, but make it easy to authorize access to data that users need
  25. Addressing Challenges
      • Privacy vs Personal Value
      • Granularity of customer understanding
      • Delivering strategic objectives when projects tend to have a technical focus
      • Opening access to data
      • Need for executive sponsorship
      • Access to external data
      • Establishing firewalls
      • Persistent, pervasive data quality issues
  26. Clues to better customer engagement will be found in the ever-growing volume of data that we’re creating
  27. A Data Lake Strategy helps you to create a personalized, engaging experience with each customer
      • Visibility
      • Self-Service
      • Smart
      • Provenance
      • Open, yet Secure
      • Internet Scale
      • Agile
      • Adaptable
      • Universal Data Access
  28. Questions?
  29. Thank you!
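
Slide 19 describes RDF data as subject-predicate-object "triples" organized in a graph, queried with SPARQL. The following is a minimal sketch of that idea in plain Python, not real RDF or SPARQL: the identifiers (`customer:42`, `ex:placedOrder`, and so on) and the helpers `match` and `contracts_for` are hypothetical names invented for illustration, and a production system would use a triple store and a SPARQL engine instead.

```python
from typing import Iterator, List, Optional, Tuple

# A triple is (subject, predicate, object), as on slide 19.
Triple = Tuple[str, str, str]

# Hypothetical sample data; none of these identifiers come from the slides.
TRIPLES: List[Triple] = [
    ("customer:42", "rdf:type", "ex:Customer"),
    ("customer:42", "ex:interestedIn", "ex:Music"),
    ("customer:42", "ex:placedOrder", "order:7"),
    ("order:7", "rdf:type", "ex:Order"),
    ("order:7", "ex:fulfilledBy", "contract:3"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for subj, pred, obj in triples:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


def contracts_for(triples: List[Triple], customer: str) -> List[str]:
    """Follow customer -> order -> contract, a two-hop walk over the graph."""
    found = []
    for _, _, order in match(triples, s=customer, p="ex:placedOrder"):
        for _, _, contract in match(triples, s=order, p="ex:fulfilledBy"):
            found.append(contract)
    return found


if __name__ == "__main__":
    print(contracts_for(TRIPLES, "customer:42"))
```

Because every fact has the same three-part shape, adding a new kind of relationship means appending triples rather than altering a schema, which is the flexibility slide 23 attributes to the data lake model.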
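
Slide 20 describes a common model that "glues" Lead, Quote, Order, and Contract records together by business meaning rather than by physical structure, and slide 16 lists source-to-target data mappings. One way this could look in miniature is sketched below. Every field name (`lead_company`, `acct_nm`, and so on), the `MAPPINGS` table, and the `to_common` helper are assumptions made up for this example; the slides do not specify any schema.

```python
from typing import Any, Dict

# Hypothetical source-to-target mappings: each source system's field names
# are mapped onto shared business concepts in the common model.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "SFA": {"lead_company": "customer_name", "lead_value": "amount"},
    "Quote": {"acct_nm": "customer_name", "quoted_total": "amount"},
    "OMS": {"cust": "customer_name", "order_total": "amount"},
    "CMS": {"party": "customer_name", "contract_value": "amount"},
}


def to_common(system: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one source record into common-model terms.

    Fields without a mapping are left out of the common view, and the
    source system is recorded as a minimal form of provenance.
    """
    mapping = MAPPINGS[system]
    common = {mapping[field]: value for field, value in record.items() if field in mapping}
    common["source_system"] = system
    return common


if __name__ == "__main__":
    lead = to_common("SFA", {"lead_company": "Acme", "lead_value": 900})
    order = to_common("OMS", {"cust": "Acme", "order_total": 1200, "warehouse": "W1"})
    # Both records can now be compared on the shared concept "customer_name".
    print(lead["customer_name"] == order["customer_name"])
```

Keeping the mappings as data rather than as custom code is one plausible reading of the slides' point about reusable, model-driven onboarding: supporting a new source system means adding an entry to the table, not writing a new ETL routine.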