© 2015 Autodesk
Building a Self-Service
Big Data Pipeline
Charlie Crocker
Business Analytics Program Lead
Hadoop Summit, San Jose – June 2015
Multi-core & GPU
Cloud
Distributed Computing
Reality Capture
Model Sophistication
Variations Data
Compute
BIG DATA PIPELINE DETAILS
CONSISTENT TRUSTED ACCESSIBLE
INSTRUMENT COLLECT CONSUME PROCESS ORGANIZE
Production Big Data Pipeline Stats
• Core Services
• 360 Products/Services
• Desktop Products
• Operations Data
• 2.1 billion transactions/day
• 350 source types
• 750-800 GB indexed daily
• 165(+) active Users
• 800 Terabytes total
• 90 GB/day
• 350 S3 Aggregations
• 128 Tableau Desktop
• 57 Tableau Server
• 25 Datameer Users
• 10 Qlikview Dashboards
• 150 QV Users
• >80 GBQ Tables
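To put the daily figures above on a per-second scale, here is a small arithmetic sketch. The daily numbers come from the slide. The averaging, the 30-day window, and the reading that the 750-800 GB/day stream is what sits in the one-month indexed tier are assumptions made only for illustration.

```python
# Figures from the stats slide; everything derived from them is our own arithmetic.
TRANSACTIONS_PER_DAY = 2_100_000_000
INDEXED_GB_PER_DAY = (750, 800)  # low and high end of the quoted range
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(per_day: int) -> float:
    """Average rate over a full day; real traffic will peak above this."""
    return per_day / SECONDS_PER_DAY


def indexed_tb(days: int) -> tuple:
    """Low/high indexed volume in TB kept for `days` days (1 TB = 1000 GB).

    Assumes a constant daily volume, which the slides do not state.
    """
    low, high = INDEXED_GB_PER_DAY
    return (low * days / 1000, high * days / 1000)


if __name__ == "__main__":
    rate = average_per_second(TRANSACTIONS_PER_DAY)
    print(f"{rate:,.0f} transactions/s on average")
    print("Indexed data over an assumed 30-day month (TB):", indexed_tb(30))
```

On these assumptions the pipeline averages roughly 24 thousand transactions per second and holds on the order of 22 to 24 TB of indexed data at a time.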
[Diagram: pipeline tiers, sources, and retention]
Data Gathering
• All analytics & debug data; raw service data
• Retention: 1 week, raw data
Monitoring & Discovery
• Realtime; products, platform & infrastructure
• Retention: 1 month, indexed data
Analytics & Reports
• Batch oriented; business, product & customer behavior
• Retention: 1 year (or more), aggregated & summarized data
Product/Business Analysis
• Interactive & focused; curated data
• Retention: any amount needed
Sources shown: Web Services, Infrastructure, Hardware, Platforms, Network, Security, Business & Transactional, ODS, SAP, Subscription, Product Group Data, Other...
Components shown: Metrics, GA, Data Cube, Kafka, Unified Customer Profile, QlikView, ADSK Dashboard
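The retention windows in the diagram (1 week of raw data, 1 month of indexed data, 1 year or more of aggregated data) can be read as a simple lookup from data age to the tiers that still hold it. The sketch below assumes 7, 30, and 365 days for those windows and uses hypothetical tier names; the deck gives only the coarse durations.

```python
from datetime import date

# Retention windows read off the diagram; the exact day counts are assumptions.
RETENTION_DAYS = {
    "raw": 7,           # "1 week, raw data"
    "indexed": 30,      # "1 month, indexed data"
    "aggregated": 365,  # "1 year (or more), aggregated & summarized data"
}


def tiers_holding(event_day: date, today: date) -> list:
    """Return the tiers that would still hold data recorded on `event_day`."""
    age = (today - event_day).days
    return [tier for tier, days in RETENTION_DAYS.items() if 0 <= age <= days]


if __name__ == "__main__":
    today = date(2015, 6, 10)
    for day in (date(2015, 6, 8), date(2015, 5, 20), date(2014, 12, 1)):
        print(day, "->", tiers_holding(day, today))
```

The practical consequence is that questions about last week can be answered from raw events, while anything older than a month has to be answered from aggregates that were defined ahead of time.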
Example: Specific Service Calls
Over 60 million/day
Example: Desktop Analytics Managed Source:
Trusted
Consistent
Accessible
3.1M Users/Wk
Production Big Data Pipeline
Teams Engage → Forward to Kafka → Apply Log Schema → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore
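As an illustration of the "Apply Log Schema" step that sits between Kafka and Hadoop in the flow above, the following sketch validates JSON log lines against a schema before passing them on. The deck does not show the schema or the code, so the field names, the `LOG_SCHEMA` table, and the `send` callable (standing in for whatever writes to Hadoop) are all hypothetical.

```python
import json

# Hypothetical schema: field name -> required Python type.
LOG_SCHEMA = {"timestamp": str, "source": str, "event": str, "payload": dict}


def apply_schema(raw_line: str) -> dict:
    """Parse one JSON log line and keep only well-typed schema fields.

    Raises ValueError for malformed JSON, missing fields, or wrong types.
    """
    record = json.loads(raw_line)  # json.JSONDecodeError is a ValueError
    if not isinstance(record, dict):
        raise ValueError("log line is not a JSON object")
    missing = [name for name in LOG_SCHEMA if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    wrong = [n for n, t in LOG_SCHEMA.items() if not isinstance(record[n], t)]
    if wrong:
        raise ValueError(f"wrong types for: {wrong}")
    return {name: record[name] for name in LOG_SCHEMA}  # drop unknown fields


def forward(lines, send):
    """Send every valid record via `send`; return (accepted, rejected) counts."""
    accepted = rejected = 0
    for line in lines:
        try:
            send(apply_schema(line))
            accepted += 1
        except ValueError:
            rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    sample = [
        '{"timestamp": "2015-06-09T00:00:00Z", "source": "desktop", '
        '"event": "open", "payload": {}, "extra": 1}',
        "not json",
        '{"source": "desktop"}',
    ]
    out = []
    print(forward(sample, out.append), out)
```

Counting rejects instead of raising keeps one bad producer from stalling the whole stream, which matters when hundreds of source types share the pipeline.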
Production Big Data Pipeline
Teams Engage → Forward to Kafka → Apply Log Schema → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore
SLOW DOWN (flagged at four points along the flow)
Production Big Data Pipeline
Teams Engage → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore
SLOW DOWN (flagged at two points along the flow)
Forward to Kafka → Apply Log Schema
Onboard faster: Transition to Services
Production Big Data Pipeline
Teams Engage → Forward to Kafka → Apply Log Schema → Forward to Hadoop → Define Cubes → Deploy Cubes → Publish Data & Explore
Deliver value faster: Streamlined Access
Onboard faster: Transition to Services
TRANSITION TO SERVICES
Tools
• Fragmented Architecture
• Manual ingestion (Kafka)
• Dashboard POCs
Production Scaling
Services
• Architecture Alignment
• Managed Ingestion (CSE)
• ADSK Dashboard Framework
Build Services
• highly available
• secure
• massively scalable
• insanely high volume
• cloud ops infrastructure
Make Services Ridiculously Easy
• easy to consume SDKs
• simple data contracts
• self-service onboarding
• fault-tolerant SDKs
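One way to read "fault-tolerant SDKs" is a client that buffers events locally and retries delivery with backoff, so a brief ingestion outage never surfaces in the calling product. The sketch below is an assumption about what such an SDK might do, not Autodesk's implementation; `BufferedClient`, its parameters, and the `send` callable are hypothetical, and `send` is assumed to raise `ConnectionError` on transient failures.

```python
import time
from collections import deque


class BufferedClient:
    """Hypothetical analytics client: buffer locally, retry with backoff."""

    def __init__(self, send, max_buffer=1000, max_retries=3,
                 base_delay=0.1, sleep=time.sleep):
        self._send = send
        # A bounded deque drops the oldest event first when the buffer is full.
        self._buffer = deque(maxlen=max_buffer)
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep

    @property
    def pending(self) -> int:
        return len(self._buffer)

    def track(self, event: dict) -> None:
        """Never raises into the host application; just queues the event."""
        self._buffer.append(event)

    def flush(self) -> int:
        """Try to deliver queued events in order; return how many were sent."""
        delivered = 0
        while self._buffer:
            if not self._try_send(self._buffer[0]):
                break  # keep the remaining events for the next flush
            self._buffer.popleft()
            delivered += 1
        return delivered

    def _try_send(self, event: dict) -> bool:
        for attempt in range(self._max_retries):
            try:
                self._send(event)
                return True
            except ConnectionError:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                self._sleep(self._base_delay * 2 ** attempt)
        return False
```

A product would call `track` from its hot path and `flush` from a background timer, so the cost of a slow or unavailable service is paid off the user's critical path.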
Analytics as a Service
• Fast Access Layer
• Client SDKs
• Data Portal
• API Access
• Cross Service Eventing
• Metadata Management
• Analytics Tools
• Scoring Pipeline
• Dashboard Framework
• Other Services
• Scalable Compute
• Workflow Management
• Ingestion
• Injection
Platform Services Detail
Desktop
(Windows, Mac, Linux)
Mobile
(iOS, Android,
Windows)
Web
(Chrome, Explorer,
Safari, etc.)
Client MPA
Service
Cloud Services
Explore/Publish
Datameer
API Access
Data Virtualization (EDW)
Denodo
Batch Processing
(Hive Cluster)
Fast Access
Google BigQuery, Redshift, Spark, QVD
Reporting
Tableau, Qlikview,
Dashboards
Core Services Traditional Data Warehouses
Back Office
(SAP, Siebel, etc.)
Enterprise Data Lake: Storage (S3)
Query Processing
(Hive Cluster)
CSE (Ingestion) Injector
Govern Enterprise Data Lake: Metadata
STREAMLINED DATA ACCESS
Analytics Consumers
Non-Technical Users
1000s
10s
Business Analyst
Data Analyst
Data Scientists
Analytics Ops
• Excel-like
• Easy to access
• Medium to small data set
• Easy to display
• Easy to aggregate
• Handle large data
• Data visualization
• Integration with other tools
• Connection with other data sources
• Handle unstructured data
• Combine data from multiple sources
Self-Service Explore, Aggregation and Publish
Non-technical users need to quickly explore,
create, and publish aggregations from the data lake
and visualize the results in their tool of choice.
One Source, Multiple Access Points
• Daily push to
  • S3 buckets and REST API
  • Google BigQuery or Redshift
• Access
  • Tableau Server (GBQ)
  • QlikView (REST, QVDs)
  • ADSK Dashboards (S3)
  • Datameer (S3)
  • Hive (EMR and S3)
• Data Products
  • Early Warning System
  • Syndicated Video Wall
  • Executive Daily Reports
  • Personalized Product Experiences
From this
To this
Datameer: Big Data Analytics for Hadoop
Wizard-led Data Integration
No ETL
70+ Connectors + plug-in API
Smart Sampling
Point-and-click Analytics
Spreadsheet UI
270+ pre-built functions
Visual Data Profiling
Drag-and-Drop Visualization
30+ Visualization Widgets
HTML5 support
View on any device
Datameer: Create Standard Aggregations
• Parse JSON from S3
• Join to account data
• Process using EMR compute
• Output directly to S3
• Output directly to Tableau Server
A couple of hours instead of 5 weeks waiting for an engineering sprint
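The shape of a standard aggregation like the one described here (parse JSON events, join to account data, write out a summary) can be sketched in a few lines. Datameer does this through its spreadsheet UI on EMR; the code below is only an in-memory analogy, and the field names (`account_id`, `event`, `segment`) and sample records are invented for illustration.

```python
import json
from collections import Counter


def aggregate_events(json_lines, accounts):
    """Count events per (account segment, event name).

    `json_lines` is an iterable of JSON strings, one event per line;
    `accounts` maps account_id -> account record. Events whose account
    is unknown are skipped, which makes this an inner join.
    """
    counts = Counter()
    for line in json_lines:
        event = json.loads(line)
        account = accounts.get(event["account_id"])
        if account is None:
            continue
        counts[(account["segment"], event["event"])] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample data standing in for S3 objects and account tables.
    events = [
        '{"account_id": "a1", "event": "open"}',
        '{"account_id": "a1", "event": "save"}',
        '{"account_id": "a2", "event": "open"}',
        '{"account_id": "zz", "event": "open"}',
    ]
    accounts = {"a1": {"segment": "enterprise"}, "a2": {"segment": "education"}}
    for key, n in sorted(aggregate_events(events, accounts).items()):
        print(key, n)
```

The resulting table is small enough to hand to a reporting tool directly, which is the point of publishing aggregates rather than raw events.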
One
Catalog
We’re Hiring!
Data Geeks (Scientists?)
Data Analysts
Data Engineers
Charlie.Crocker@autodesk.com
Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to
their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or
graphical errors that may appear in this document.
© 2015 Autodesk, Inc. All rights reserved.

More Related Content

What's hot

[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
DataScienceConferenc1
 
Choosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectChoosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your Project
Ontotext
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
James Serra
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Dr. Arif Wider
 
Power BI Made Simple
Power BI Made SimplePower BI Made Simple
Power BI Made Simple
James Serra
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
DATAVERSITY
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
Tyler Wishnoff
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
Wasm1953
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
Iván Fernández Perea
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
Debanjan Mahata
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
Mark Kromer
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Cathrine Wilhelmsen
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
MapR Technologies
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
Mark Kromer
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Denodo
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
LibbySchulze
 

What's hot (20)

[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Choosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your ProjectChoosing the Right Graph Database to Succeed in Your Project
Choosing the Right Graph Database to Succeed in Your Project
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Power BI Made Simple
Power BI Made SimplePower BI Made Simple
Power BI Made Simple
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 

Viewers also liked


Building a Self-Service Big Data Pipeline

  • 1. © 2015 Autodesk. Building a Self-Service Big Data Pipeline. Charlie Crocker, Business Analytics Program Lead. Hadoop Summit, San Jose – June 2015
  • 3. Multi-core & GPU; Cloud; Distributed Computing; Reality Capture; Model Sophistication; Variations Data; Compute
  • 5. BIG DATA PIPELINE DETAILS
  • 6. CONSISTENT, TRUSTED, ACCESSIBLE. INSTRUMENT, COLLECT, CONSUME, PROCESS, ORGANIZE
  • 7. Production Big Data Pipeline Stats: Core Services; 360 Products/Services; Desktop Products; Operations Data; 2.1 billion transactions/day; 350 source types; 750-800 GB indexed daily; 165(+) active users; 800 terabytes total; 90 GB/day; 350 S3 Aggregations; 128 Tableau Desktop; 57 Tableau Server; 25 Datameer users; 10 QlikView dashboards; 150 QV users; >80 GBQ tables. Pipeline diagram: Data Gathering (all analytics & debug data, raw service data; 1 week raw data); Monitoring & Discovery (realtime; products, platform & infrastructure; 1 month indexed data); Analytics & Reports (batch oriented; business, product & customer behavior; 1 year (or more) aggregated & summarized data); Product/Business Analysis (interactive & focused; any amount needed). Sources: Web Services, Infrastructure, Hardware, Platforms, Network, Security, Business & Transactional, ODS, SAP, Subscription, Product Group Data, Other... Additional labels: Metrics, GA, Data Cube, Kafka, Unified Customer Profile, QlikView, Curated Data, ADSK Dashboard.
  • 8. © 2015 Autodesk Example: Specific Service Calls Over 60 million/day
  • 10. © 2015 Autodesk Example: Desktop Analytics Managed Source: Trusted Consistent Accessible 3.1M Users/Wk
  • 11. © 2015 Autodesk Production Big Data Pipeline: Teams Engage • Forward to Kafka • Apply Log Schema • Forward to Hadoop • Define Cubes • Deploy Cubes • Publish Data & Explore [Pipeline diagram repeated from slide 7]
  • 12. © 2015 Autodesk Production Big Data Pipeline: Teams Engage • Forward to Kafka • Apply Log Schema • Forward to Hadoop • Define Cubes • Deploy Cubes • Publish Data & Explore • SLOW DOWN (shown four times) [Pipeline diagram repeated from slide 7]
  • 13. © 2015 Autodesk Production Big Data Pipeline: Teams Engage • Forward to Hadoop • Define Cubes • Deploy Cubes • Publish Data & Explore • SLOW DOWN (shown twice) • Forward to Kafka • Apply Log Schema • Onboard faster: Transition to Services [Pipeline diagram repeated from slide 7]
  • 14. © 2015 Autodesk Production Big Data Pipeline: Teams Engage • Forward to Kafka • Apply Log Schema • Forward to Hadoop • Define Cubes • Deploy Cubes • Publish Data & Explore • Deliver value faster: Streamlined Access • Onboard faster: Transition to Services [Pipeline diagram repeated from slide 7]
  • 16. © 2015 Autodesk Tools: Fragmented Architecture • Manual ingestion (Kafka) • Dashboard POCs. Production Scaling. Services: Architecture Alignment • Managed Ingestion (CSE) • ADSK Dashboard Framework
  • 17. © 2015 Autodesk Build Services: highly available • secure • massively scalable • insanely high volume • cloud ops infrastructure
  • 18. © 2015 Autodesk Make Services Ridiculously Easy: easy to consume SDKs • simple data contracts • self service onboarding • fault tolerant SDKs
  • 19. © 2015 Autodesk Fast Access Layer Client SDKs Data Portal Analytics as a Service API Access Cross Service Eventing Metadata Management Analytics Tools Scoring Pipeline Dashboard Framework Other Services + Scalable Compute Workflow Management Ingestion Injection
  • 20. © 2015 Autodesk Platform Services Detail Desktop (Windows, Mac, Linux) Mobile (iOS, Android, Windows) Web (Chrome, Explorer, Safari, etc.) Client MPA Service Cloud Services Explore/Publish Datameer API Access Data Virtualization (EDW) Denodo Batch Processing (Hive Cluster) Fast Access Google BigQuery, Redshift, Spark, QVD Reporting Tableau, Qlikview, Dashboards Core Services Traditional Data Warehouses Back Office (SAP, Siebel, etc.) Enterprise Data Lake: Storage (S3) Query Processing (Hive Cluster) CSE (Ingestion) Injector Govern Enterprise Data Lake: Metadata
  • 22. © 2015 Autodesk Analytics Consumers Non-Technical Users 1000s 10s Business Analyst Data Analyst Data Scientists Analytics Ops • Excel like • Easy to access • Medium to small data set • Easy to display • Easy to aggregate • Handle large data • Data visualization • Integration with other tools • Connection with other data source • Handle unstructured data • Combine data from multiple sources
  • 23. © 2015 Autodesk Self-Service Explore, Aggregation and Publish: Non-technical users need to quickly explore, create, and publish aggregations from the data lake and visualize the results in their tool of choice. [Pipeline diagram repeated from slide 7]
  • 24. © 2015 Autodesk One Source, Multiple Access Points. Daily push to: S3 buckets and REST API • Google BigQuery or Redshift. Access: Tableau Server (GBQ) • Qlikview (REST, QVDs) • ADSK Dashboards (S3) • Datameer (S3) • Hive (EMR and S3). Data Products: Early Warning System • Syndicated Video Wall • Executive Daily Reports • Personalized Product Experiences [Pipeline diagram excerpt repeated from slide 7: Analytics & Reports and Product/Business Analysis tiers]
  • 27. © 2015 Autodesk Datameer: Big Data Analytics for Hadoop Wizard-led Data Integration No ETL 70+ Connectors + plug-in API Smart Sampling Point-and-click Analytics Spreadsheet UI 270+ pre-built functions Visual Data Profiling Drag-and-Drop Visualization 30+ Visualization Widgets HTML5 support View on any device
  • 28. © 2015 Autodesk Datameer: Create Standard Aggregations • Parse JSON from S3 • Join to account data • Process using EMR compute • Output directly to S3 • Output directly to Tableau Server. Couple hours instead of 5 weeks waiting for engineering sprint
  • 31. © 2015 Autodesk CONSISTENT TRUSTED ACCESSIBLE INSTRUMENT COLLECT CONSUME PROCESS ORGANIZE
  • 32. © 2015 Autodesk We’re Hiring! Data Geeks (Scientists?) Data Analysts Data Engineers Charlie.Crocker@autodesk.com
  • 33. Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. © 2015 Autodesk, Inc. All rights reserved.
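
The aggregation steps listed on slide 28 (parse JSON, join to account data, write out an aggregate) can be sketched in plain Python. This is an illustration only, not Datameer's implementation or Autodesk's code: the field names (`account_id`, `product`, `region`), the sample records, and the function name `aggregate_events` are all hypothetical, and in-memory lists stand in for S3 objects and EMR compute.

```python
import json
from collections import Counter

def aggregate_events(json_lines, accounts):
    """Count events per (region, product) after joining on account_id.

    json_lines: iterable of JSON strings, one event per line (hypothetical schema).
    accounts:   dict mapping account_id -> account record with a "region" field.
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)  # step 1: parse JSON
        account = accounts.get(event.get("account_id"))  # step 2: join to account data
        if account is None:
            continue  # inner join: drop events with no matching account
        counts[(account["region"], event["product"])] += 1  # step 3: aggregate
    return dict(counts)

# Made-up sample data standing in for raw event files and an account table.
raw_events = [
    '{"account_id": "a1", "product": "viewer"}',
    '{"account_id": "a2", "product": "viewer"}',
    '{"account_id": "a1", "product": "editor"}',
    '{"account_id": "zz", "product": "viewer"}',
]
accounts = {"a1": {"region": "EMEA"}, "a2": {"region": "AMER"}}

result = aggregate_events(raw_events, accounts)
print(result)
```

The event for the unknown account `zz` is dropped by the join, so only three of the four sample events are counted. A real pipeline would also need a policy for malformed JSON and for events missing the expected fields.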

Editor's Notes

  1. Ingest: 2.1 billion events/day; 200(+) data sources flowing into production Splunk; 600-650 GB indexed daily. Process: 800 terabytes of active data in the Enterprise Data Lake (EDL); 90 GB/day entering the EDL; 300 S3 aggregations updating daily, replicated in 50 GBQ tables. Consume: >100 Tableau Desktop and 50 Tableau Server users; feeding 8 Qlikview dashboards with 150 active QV users; feeding the Early Warning System.
  2. Global reach. Fast ramp-up. Managed data source: what does it mean to be managed? Owner, pipeline, metadata, stays current.
  3. Interactive & Focused
  4. Interactive & Focused
  5. API access, workflow management (Oozie), support for data streaming and machine learning.
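
To put the ingest figures from note 1 in perspective, the short calculation below converts them into an average event rate and an average indexed size per event. It is a back-of-the-envelope sketch under two assumptions that the notes do not state: that "GB" means decimal gigabytes, and that the 600-650 GB indexed daily corresponds to the same 2.1 billion events.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes

events_per_day = 2_100_000_000             # "2.1 billion events/day" (note 1)
indexed_gb_low, indexed_gb_high = 600, 650  # "600-650 GB indexed daily" (note 1)

# Average rate, ignoring daily peaks and troughs.
events_per_second = events_per_day / SECONDS_PER_DAY

# Average indexed bytes per event at each end of the quoted range.
bytes_per_event_low = indexed_gb_low * BYTES_PER_GB / events_per_day
bytes_per_event_high = indexed_gb_high * BYTES_PER_GB / events_per_day

print(f"average rate: {events_per_second:,.0f} events/second")
print(f"average indexed size: {bytes_per_event_low:.0f} to {bytes_per_event_high:.0f} bytes/event")
```

Under those assumptions the average works out to roughly 24,000 events per second at about 300 bytes per event. Peak load would be higher than this daily average.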