© 2016 Autodesk | Enterprise Information Services
Designing an Agile Fast Data Architecture for Big Data Ecosystem
using Logical Data Warehouse and Data Virtualization
Kurt Jackson
Autodesk Enterprise Information Services
Introduction
Some Definitions
 Agile
 “The division of tasks into short
phases of work and frequent
reassessment and adaptation of
plans.”
 Data Architecture
 “The models, policies, rules or
standards that govern which data is
collected, and how it is stored,
arranged, integrated.”
 Logical Data Warehouse
 “A logical abstraction layer which sits
on top of a variety of enterprise data
sources. The logical layer provides
durable data views without needing to
move or transform data from the
sources.”
 Data Virtualization
 “Data management that allows an
application to retrieve and
manipulate data without knowing
specific details about the data, such as
how it is formatted or where it is
physically located.”
Agile Data Architecture Lifecycle
[Diagram labels: Agile; Data Architecture; Logical Data Warehouse; Data Virtualization]
Business Problem
Autodesk’s Business Challenge
[Diagram labels: Multi-year Transition; Perpetual; Subscription and Rental]
Most of us are in the same boat
The Autodesk Agile Data Architecture
Philosophy
 Access and refine data
near the source
 Published logical data
interfaces
 Truly agile data
environment
Autodesk Data Architecture
Why Build the Logical Data Warehouse
Data virtualization can be used throughout your data pipeline!
Big Data Ecosystem
One More Definition
 Data Governance
 “The management of the
availability, usability, integrity,
and security of
the data employed in an
enterprise.”
Logical Data Warehouses are an essential part of your Data
Governance Strategy for your Big Data Ecosystem
 Availability
 Channeling end user access
through a single governance
point simplifies administration
 Usability
 The LDW provides a single
repository for schema
definitions
 Simplifies end-user access for
visualization and interpretation
 Integrity
 Only published views in the LDW
are publicly available
 Coupled with ownership,
guarantees the quality of the
data set
 Security
 The LDW can provide a single
point for authentication,
authorization and audit trail for
end user access
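The governance points above can be sketched as a single gateway that checks every request against the published views and records an audit entry. This is a minimal illustration, not the deck's implementation: the `GovernedGateway` class, the role names, and the view names are all hypothetical, and authentication (for example against LDAP) is assumed to have happened already.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class GovernedGateway:
    """Hypothetical single access point: only published views are reachable."""

    # Maps each published view name to the roles allowed to read it.
    published: Dict[str, Set[str]]
    # Every decision is appended here as (user, view, outcome).
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def authorize(self, user: str, roles: Set[str], view: str) -> bool:
        """Allow access only to a published view the user holds a role for."""
        allowed = view in self.published and bool(self.published[view] & roles)
        self.audit_log.append((user, view, "ALLOW" if allowed else "DENY"))
        return allowed


# Usage with made-up users, roles, and views.
gateway = GovernedGateway(published={"pub_sales_revenue": {"analyst"}})
ok = gateway.authorize("alice", {"analyst"}, "pub_sales_revenue")
denied_role = gateway.authorize("bob", {"intern"}, "pub_sales_revenue")
denied_unpublished = gateway.authorize("alice", {"analyst"}, "dv_sales_staging")
```

Because every request passes through one place, the audit trail and the access rules live together rather than being spread across each data source.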
The Logical Data Warehouse implements the philosophy
 Access and refine data near the source
 No painful ETL pipelines for data
derivation
 Leverage power of Spark for fast access
 Published logical data interfaces
 Single access point for all external data
sets
 Enterprise-class governance across the
big data ecosystem
 Truly agile data environment
 Facilitates rapid change/evolution in your
big data ecosystem
 Rip and replace becomes almost
transparent – replace the system that
delivers those views and you’re done
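The "rip and replace" point can be illustrated with a small sketch: consumers ask for a published view by name, so the system behind that name can be swapped without touching them. The `LogicalCatalog` class and both provider functions are hypothetical stand-ins, not part of any product named in this deck.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
ViewProvider = Callable[[], List[Row]]


class LogicalCatalog:
    """Hypothetical registry mapping published view names to providers."""

    def __init__(self) -> None:
        self._views: Dict[str, ViewProvider] = {}

    def publish(self, name: str, provider: ViewProvider) -> None:
        # Publishing under an existing name replaces the backing system.
        self._views[name] = provider

    def query(self, name: str) -> List[Row]:
        return self._views[name]()


def legacy_warehouse_customers() -> List[Row]:
    # Stand-in for a query against a legacy data warehouse.
    return [{"customer": "acme"}, {"customer": "globex"}]


def spark_fast_access_customers() -> List[Row]:
    # Stand-in for the same data served from a newer fast-access layer.
    return [{"customer": "acme"}, {"customer": "globex"}]


catalog = LogicalCatalog()
catalog.publish("pub_customers", legacy_warehouse_customers)
before = catalog.query("pub_customers")

# Replace the system that delivers the view; the consumer call is unchanged.
catalog.publish("pub_customers", spark_fast_access_customers)
after = catalog.query("pub_customers")
```

The consumer only ever sees `pub_customers`, so the migration is a change to the catalog rather than to every client.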
Building the Agile Data Architecture at Autodesk
Implementation Approach
 Identify enterprise data sources
 Harder than you think
 All-new custom streaming, highly available
ingestion mechanism
 Self-service or nearly so
 Kafka/Flume
 Leverage best-of-breed for individual
components
 Spark for ETL and fast access
 HCatalog/Oozie for metadata and job
orchestration
 Denodo for LDW
 Leverage highly-redundant cloud storage for
the data lake
 S3
 Develop canonical representations for your
data sets
 Freakin’ hard!
 Virtualize Spark fast access, data
warehouses and marts with a
next-generation Logical DW
 New implementations leverage the LDW
 Legacy migrates opportunistically to Spark
fast access
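As a rough illustration of what a canonical representation involves, the sketch below maps records from two differently shaped sources onto one shared type. The source systems ("CRM", "billing"), their field names, and the `CanonicalCustomer` type are all invented for this example; real canonical models need far more negotiation than this suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical shape shared by all downstream views."""

    customer_id: str
    name: str
    country: str


def from_crm(rec: dict) -> CanonicalCustomer:
    # Assumed CRM field names: CustID, FullName, Country.
    return CanonicalCustomer(
        customer_id=str(rec["CustID"]),
        name=rec["FullName"].strip(),
        country=rec["Country"].upper(),
    )


def from_billing(rec: dict) -> CanonicalCustomer:
    # Assumed billing field names: account_no, acct_name, ctry_code.
    return CanonicalCustomer(
        customer_id=str(rec["account_no"]),
        name=rec["acct_name"].strip(),
        country=rec["ctry_code"].upper(),
    )


crm_customer = from_crm({"CustID": 42, "FullName": " Acme Corp ", "Country": "us"})
billing_customer = from_billing(
    {"account_no": "42", "acct_name": "Acme Corp", "ctry_code": "US"}
)
```

Once both sources resolve to the same canonical record, views built on top of them can be compared and joined without per-source special cases.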
Architecting the Data Virtualization Layer
[Diagram labels: Data Consumers; Corporate LDAP; Data Virt Instance 1 … Data Virt Instance n; Logging Infrastructure; CI/CD; Source Repository; Legacy Data Sources; flows labeled Data, Code, Audit]
Build an Information Architecture
 Base views to abstract data sources
 Layered derived views to reflect successively refined
derivations
 Create the notion of publication for curated, externally
visible views
 Expose services on top of views to make views more
accessible
 Separate namespaces (schemas) by project or
subject area
 Build the notion of commonality for views shared
across schemas
 Naming conventions for all objects
 Data portal for one-stop shopping for data consumers
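The layering above can be sketched with plain SQL views. The example uses Python's built-in `sqlite3` only so that it is runnable; it is not the virtualization platform itself. The table, the views, and the `bv_` / `dv_` / `pub_` prefixes are hypothetical naming conventions chosen for illustration.

```python
import sqlite3
from typing import List


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory database with base, derived, and published views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for a physical data source (hypothetical).
        CREATE TABLE src_orders (id INTEGER, cust TEXT, amt_cents INTEGER, status TEXT);
        INSERT INTO src_orders VALUES
            (1, 'acme', 1200, 'OK'),
            (2, 'acme', 800, 'VOID'),
            (3, 'globex', 500, 'OK');

        -- Base view: abstracts the source and normalizes column names.
        CREATE VIEW bv_sales_orders AS
            SELECT id AS order_id, cust AS customer, amt_cents, status
            FROM src_orders;

        -- Derived view: a refinement built only on the base view.
        CREATE VIEW dv_sales_valid_orders AS
            SELECT order_id, customer, amt_cents / 100.0 AS amount
            FROM bv_sales_orders
            WHERE status = 'OK';

        -- Published view: the curated, externally visible interface.
        CREATE VIEW pub_sales_revenue_by_customer AS
            SELECT customer, SUM(amount) AS revenue
            FROM dv_sales_valid_orders
            GROUP BY customer;
        """
    )
    return conn


def published_views(conn: sqlite3.Connection) -> List[str]:
    """List views that follow the assumed 'pub_' naming convention."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
    return sorted(name for (name,) in rows if name.startswith("pub_"))


conn = build_catalog()
revenue = conn.execute(
    "SELECT customer, revenue FROM pub_sales_revenue_by_customer ORDER BY customer"
).fetchall()
```

A naming convention like this is one simple way to let a data portal list only the published layer while the base and derived views stay internal.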
Building an LDW makes your Big
Data Ecosystem Enterprise-Ready
Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk
reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2016 Autodesk | Enterprise Information Services. All rights reserved
