Presented by Derek Meng
Data Integration
On the Alibaba Cloud
Big Data Platform
From OSS, RDS to
MaxCompute
Overview
01 Data Integration: General Process of Data Integration (Slide No. 3-9)
02 MaxCompute: MaxCompute Basics (Slide No. 10-20)
03 DataWorks: DataWorks Basics (Slide No. 21-22)
04 Demo: Getting Started with Alibaba Cloud (Slide No. 23-24)
2 /25
01
General Process of Data Integration
3 /25
Data Source and Type
Data Source and Type Introduction
1
2
General Process of Data Integration
4 /25
Data Integration
Data Integration
Data Acquisition Data Transformation Data Governance
5 /25
Unstructured Data
TXT
Picture
Video
Audio
….
Semi-Structured
Log
XML
JSON
….
Structured Data
Oracle
MySQL
SQL Server
PostgreSQL
…
Test Data Set
6 /25
Alibaba Cloud Big Data Architecture
General Process of Data Integration
1
2
General Process of Data Integration
7 /25
01
Offline
Streaming
Real-Time
Streaming Process
Schedule / Maintain
02
03 Get Insight
Decision Support
Data Warehouse
8 /25
Data Source
Acquisition
• Database
• Local File
• OSS
Data Scrubbing
• SQL
• Custom Code
Data EDA
• Statistics
• Modeling
Data Storage
• Database
Report BI
Agent
• Console App
• Servers
• Sensors
Transfer and Buffer
• Streaming Transfer Tools
Streaming Process
• Streaming Process Tools
Data Storage
• Database
01
02
Unified Data Storage
• Database
Ad-Hoc
• Ad-Hoc Query
General Data Processing Workflow
Offline Data Process
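The offline path above (acquisition, scrubbing, EDA, storage) can be sketched end to end with the Python standard library. This is a toy illustration rather than Alibaba Cloud code: the CSV sample, the `orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the data store.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw input standing in for a local file or an OSS object.
RAW_CSV = """order_id,amount
1,10.5
2,not_a_number
3,4.5
"""


def acquire(text):
    """Acquisition: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrubbing: keep only rows whose fields parse as numbers."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), float(row["amount"])))
        except ValueError:
            continue  # drop malformed records
    return clean


def explore(rows):
    """EDA: compute simple summary statistics over the cleaned rows."""
    amounts = [amount for _, amount in rows]
    return {"count": len(amounts), "mean": statistics.mean(amounts)}


def store(rows):
    """Storage: load the cleaned rows into a database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn


clean_rows = scrub(acquire(RAW_CSV))
summary = explore(clean_rows)
conn = store(clean_rows)
stored_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Each stage is a separate function so that any one of them (for example, the storage target) could be swapped without touching the others, which mirrors the staged layout of the slide.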
9 /25
RDS Database
OSS Data Store
Server Load Balancer
ECS Cluster
Table Store
Auto Scaling
MaxCompute
RDBMS
MySQL, SQL Server, Oracle, DB2, …
Hadoop Data
Hive, HBase
Other Data Source
Text files, web logs, video/audio
Data Source
MaxCompute Basics
02
10 /25
MaxCompute Basics
1 Basic Concepts of MaxCompute
2 MaxCompute Architecture
3 MaxCompute Data Channel and SQL
11 /25
12 /25
• A project is the most basic unit for resource isolation
• Multiple projects can share the resources of the same cluster
• A project is similar to a database in Oracle
• Tables, users and jobs are all subordinate to a project
• After authorization, projects can access each other's data
Basic Concepts
PROJECT 2 PROJECT 4
PROJECT 3
PROJECT 1
Table
User
Security
Policy
Job
Resource
13 /25
• Most data processed by MaxCompute is stored in structured two-dimensional tables
• Tables are subordinate to the project
• Tables can be partitioned
• Data types in a table include Bigint, Boolean, Double, Date/Time, String, and Decimal
• Data is managed by the Pangu storage system. The automatic multi-replica storage policy improves data availability and shields users from underlying hardware faults
• Column-store structure, compressed storage
• Built-in data lifecycle management policy
• Storage quota-based multi-tenant management mechanism
Storage
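To make the partitioning and lifecycle bullets concrete, here is a minimal Python sketch that assembles a MaxCompute-style `CREATE TABLE` statement with `PARTITIONED BY` and `LIFECYCLE` clauses. The helper name, column names, and the 30-day lifecycle are hypothetical; the code only builds a string, so check the result against the current MaxCompute SQL reference before running it.

```python
def create_table_ddl(table, columns, partitions, lifecycle_days):
    """Build a CREATE TABLE statement for a partitioned table with a lifecycle.

    columns and partitions are lists of (name, type) pairs.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"LIFECYCLE {lifecycle_days};"
    )


# Hypothetical columns using types listed on the slide.
ddl = create_table_ddl(
    "test_table",
    [("user_id", "BIGINT"), ("is_active", "BOOLEAN"), ("amount", "DOUBLE")],
    [("p1", "STRING"), ("p2", "STRING")],
    30,
)
print(ddl)
```

The partition columns `p1` and `p2` are chosen to match the partition names used in the Tunnel command examples later in the deck.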
MaxCompute Basics
1 Basic Concepts of MaxCompute
2 MaxCompute Architecture
3 MaxCompute Data Channel and SQL
14 /25
MaxCompute Basics
15 /25
[Architecture diagram: SQL, MapReduce, Graph, and Machine Learning run on the MaxCompute Engine, which sits on the Apsara Distributed System across Cluster 1, Cluster 2, and Cluster 3 (each labeled 10000)]
MaxCompute Basics
1 Basic Concepts of MaxCompute
2 MaxCompute Architecture
3 MaxCompute Data Channel and SQL
16 /25
17 /25
Tunnel
• The channel for data to go in and out of MaxCompute
• High-concurrency upload/download
• Horizontal expansion of service capabilities
• 1 PB of throughput supported in a single day
• Batch and Real-time modes
• The real-time mode supports pub/sub models
• ODPS Tunnel-based tools include TT, CDP, Flume, and Fluentd
18 /25
• Reads and writes to tables are supported, but views are not supported
• Writes to tables adopt the Append mode
• Concurrency is supported to improve overall throughput
• Frequent commits should be avoided
• The target partition for data uploads must exist
• Real-time upload mode
Tunnel
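The constraints above (append-only writes, existing target partitions, infrequent commits) can be illustrated with a toy in-memory writer. This is not the Tunnel SDK: the `BatchingWriter` class, its methods, and the batch size are hypothetical and only model the behavior described on the slide.

```python
class BatchingWriter:
    """Toy append-only writer that commits records in batches."""

    def __init__(self, existing_partitions, batch_size):
        self.existing_partitions = set(existing_partitions)
        self.batch_size = batch_size
        self.buffer = []
        self.committed = []  # append-only: committed rows are never rewritten
        self.commit_count = 0

    def write(self, partition, record):
        # The target partition must exist before data is uploaded.
        if partition not in self.existing_partitions:
            raise ValueError(f"partition {partition!r} does not exist")
        self.buffer.append((partition, record))
        # Commit only when a full batch has accumulated, not per record.
        if len(self.buffer) >= self.batch_size:
            self.commit()

    def commit(self):
        if not self.buffer:
            return
        self.committed.extend(self.buffer)  # append mode
        self.buffer.clear()
        self.commit_count += 1


writer = BatchingWriter(existing_partitions={"p1=b1,p2=b2"}, batch_size=1000)
for i in range(2500):
    writer.write("p1=b1,p2=b2", {"line": i})
writer.commit()  # flush the remainder
```

Buffering 2,500 records with a batch size of 1,000 results in three commits instead of 2,500, which is the point of avoiding frequent commits.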
19 /25
Data Upload/Download in Tunnel
• odps@ > tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
• odps@ > tunnel download test_project.test_table/p1="b1",p2="b2" log.txt;
• The tunnel command is a Tunnel SDK-based command-line tool for uploading local text files to ODPS or downloading table data to a local file
• The target table partitions must already exist
• DataX, CDP, and TT provide higher-level tools built on Tunnel that support data exchange between ODPS and relational databases
• Log data can be imported using the Flume and Fluentd tools
• Users with special scenarios can develop custom tools based on Tunnel
Tunnel Command
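When the upload and download commands are generated by a script, the partition spec is the part that is easiest to get wrong. The following Python sketch only formats command strings in the shape shown on this slide; the helper names are hypothetical, and nothing here invokes the real client.

```python
def partition_spec(partitions):
    """Format partition key/value pairs as key="value",... in the given order."""
    return ",".join(f'{key}="{value}"' for key, value in partitions.items())


def tunnel_upload(local_file, project, table, partitions):
    """Build a tunnel upload command: local file first, then the target table."""
    return f"tunnel upload {local_file} {project}.{table}/{partition_spec(partitions)};"


def tunnel_download(local_file, project, table, partitions):
    """Build a tunnel download command: source table first, then the local file."""
    return f"tunnel download {project}.{table}/{partition_spec(partitions)} {local_file};"


parts = {"p1": "b1", "p2": "b2"}
upload_cmd = tunnel_upload("log.txt", "test_project", "test_table", parts)
download_cmd = tunnel_download("log.txt", "test_project", "test_table", parts)
print(upload_cmd)
print(download_cmd)
```

Note that the argument order differs between the two commands: upload takes the local file before the table, and download takes the table before the local file.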
20 /25
SQL
• Suited to processing large amounts of data (terabytes to petabytes)
• High latency: each SQL statement takes from dozens of seconds to several hours to run
• The syntax is similar to HQL in Hive, with some extensions to standard SQL
• There is no transaction, and no primary key.
• UPDATE and DELETE commands are not supported.
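Because UPDATE and DELETE are not available, one commonly used workaround is to rewrite a whole partition with `INSERT OVERWRITE`, changing the affected rows in the SELECT. This workaround is an assumption and is not stated on the slide. The Python sketch below only builds the statement text; the helper name, table, and columns are hypothetical, and it handles a single partition column for simplicity.

```python
def overwrite_update_sql(table, part_col, part_value, columns, target,
                         new_value_expr, condition):
    """Build an INSERT OVERWRITE that rewrites one partition.

    Rows matching `condition` get `new_value_expr` in the `target` column;
    all other rows and columns are copied unchanged. `columns` lists the
    non-partition columns in table order.
    """
    select_list = ", ".join(
        f"CASE WHEN {condition} THEN {new_value_expr} ELSE {col} END AS {col}"
        if col == target else col
        for col in columns
    )
    return (
        f"INSERT OVERWRITE TABLE {table} PARTITION ({part_col}='{part_value}') "
        f"SELECT {select_list} FROM {table} WHERE {part_col}='{part_value}';"
    )


sql = overwrite_update_sql(
    "test_table", "p1", "b1",
    ["user_id", "status"], "status", "'closed'", "user_id = 42",
)
print(sql)
```

Rewriting a full partition is expensive compared with a row-level update, which is consistent with the high-latency, batch-oriented profile described above.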
DataWorks Basics
03
21 /25
22 /25
DataWorks
04
Getting Started with Alibaba Cloud
23 /25
DEMO
24 /25
Q&A
Big Data Quickstart Series 3: Perform Data Integration

In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...CzechDreamin
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 

Big Data Quickstart Series 3: Perform Data Integration

  • 1. Data Integration on the Alibaba Cloud Big Data Platform: From OSS, RDS to MaxCompute. Presented by Derek Meng.
  • 2. Overview
      • 01 General Process of Data Integration (DATA INTEGRATION, Slide No. 3-9)
      • 02 MaxCompute Basics (MAXCOMPUTE, Slide No. 10-20)
      • 03 DataWorks Basics (DATAWORKS, Slide No. 21-22)
      • 04 Getting Started with Alibaba Cloud (DEMO, Slide No. 23-24)
  • 3. 01 General Process of Data Integration
  • 4. Data Source and Type Data Source and Type Introduction 1 2 General Process of Data Integration
  • 5. Data Integration: Data Acquisition, Data Transformation, Data Governance
      • Unstructured data: TXT, picture, video, audio, …
      • Semi-structured data: log, XML, JSON, …
      • Structured data: Oracle, MySQL, SQL Server, PostgreSQL, …
  • 7. Alibaba Cloud Big Data Architecture General Process of Data Integration 1 2 General Process Data Integration
  • 8. General Data Processing Workflow
      • Diagram labels: Offline; Streaming; Real-Time; Streaming Process; Schedule / Maintain; Get Insight; Decision Support; Data Warehouse
      • Data Source Acquisition: Database, Local File, OSS
      • Data Scrubbing: SQL, Custom Code
      • Data EDA: Statistics, Modeling
      • Data Storage: Database
      • Report BI
      • Agent: Console App, Servers, Sensors
      • Transfer and Buffer: Streaming Transfer Tools
      • Streaming Process: Streaming Process Tools
      • Data Storage: Database
      • Unified Data Storage: Database
      • Ad-Hoc: Ad-Hoc Query
  • 9. Offline Data Process
      • Alibaba Cloud services shown: RDS Database, OSS Data Store, Server Load Balancer, ECS Cluster, Table Store, Auto Scaling, MaxCompute
      • Data sources: RDBMS (MySQL, SQL Server, Oracle, DB2, …); Hadoop data (Hive, HBase); other data sources (text files, web logs, video / audio)
  • 11. MaxCompute Basics: 1. Basic Concepts of MaxCompute; 2. MaxCompute Architecture; 3. MaxCompute Data Channel and SQL
  • 12. Basic Concepts
      • A project is the most basic unit of resource isolation.
      • Multiple projects can share the resources of the same cluster.
      • A project is similar to a database in Oracle.
      • Tables, users, and jobs all belong to a project.
      • Once authorized, projects can access one another's data.
      • Diagram labels: Project 1, Project 2, Project 3, Project 4; Table, User, Security Policy, Job, Resource
  • 13. Storage
      • Most data processed by MaxCompute is stored in structured two-dimensional tables.
      • Tables belong to a project.
      • Tables can be partitioned.
      • Column data types include BIGINT, BOOLEAN, DOUBLE, DATETIME, STRING, and DECIMAL.
      • Data is managed by the Pangu storage system. Its automatic multi-replica storage policy improves data availability and masks underlying hardware faults.
      • Column-store structure with compressed storage.
      • Built-in data lifecycle management policy.
      • Multi-tenant management based on storage quotas.
  • 14. MaxCompute Basics: 1. Basic Concepts of MaxCompute; 2. MaxCompute Architecture; 3. MaxCompute Data Channel and SQL
  • 15. MaxCompute Basics (architecture diagram). Labels: SQL, MapReduce, Graph, Machine Learning; MaxCompute Engine; Apsara Distributed System; Cluster 1, Cluster 2, Cluster 3 (each labeled 10000)
  • 16. MaxCompute Basics: 1. Basic Concepts of MaxCompute; 2. MaxCompute Architecture; 3. MaxCompute Data Channel and SQL
  • 17. Tunnel
      • The channel through which data enters and leaves MaxCompute.
      • High-concurrency upload and download.
      • Service capacity scales horizontally.
      • Supports 1 PB of throughput in a single day.
      • Batch and real-time modes.
      • The real-time mode supports publish/subscribe models.
      • ODPS Tunnel-based tools include TT, CDP, Flume, and Fluentd.
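To put the daily throughput figure in perspective, the short calculation below converts it into a sustained average rate. It is a back-of-the-envelope sketch that assumes "1P" on the slide means one decimal petabyte (10^15 bytes) and that traffic is spread evenly over 24 hours; neither assumption comes from the slides.

```python
# Assumption: "1P" is read as 1 decimal petabyte, expressed in bytes.
PETABYTE = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate needed to move one petabyte in one day.
bytes_per_second = PETABYTE / SECONDS_PER_DAY
gigabytes_per_second = bytes_per_second / 10**9

print(f"{gigabytes_per_second:.2f} GB/s sustained")  # prints "11.57 GB/s sustained"
```

Real traffic is rarely uniform, so peak rates would be higher than this average.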
  • 18. Tunnel
      • Tables can be read and written; views are not supported.
      • Writes to tables use append mode.
      • Concurrent sessions are supported to improve overall throughput.
      • Avoid frequent commits.
      • The target partition for an upload must already exist.
      • Real-time upload mode.
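The constraints on this slide (append-only writes, an existing target partition, infrequent commits) can be illustrated with a toy in-memory model. This is a sketch in plain Python, not the Tunnel SDK: `ToyPartitionedTable`, `upload_in_batches`, and the `batch_size` parameter are hypothetical names introduced only for illustration.

```python
class ToyPartitionedTable:
    """In-memory stand-in for a partitioned table; not the Tunnel SDK."""

    def __init__(self):
        self.partitions = {}   # partition name -> list of committed records
        self.commit_count = 0  # how many commits have been made so far

    def add_partition(self, name):
        self.partitions.setdefault(name, [])

    def commit(self, name, records):
        # Uploads require the target partition to exist beforehand.
        if name not in self.partitions:
            raise KeyError(f"partition {name!r} does not exist")
        # Append mode: committed data is added, never overwritten.
        self.partitions[name].extend(records)
        self.commit_count += 1


def upload_in_batches(table, name, records, batch_size):
    """Buffer records and commit once per full batch to keep commits infrequent."""
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) == batch_size:
            table.commit(name, buffer)
            buffer = []
    if buffer:  # flush the final partial batch
        table.commit(name, buffer)


table = ToyPartitionedTable()
table.add_partition("p1=b1")
upload_in_batches(table, "p1=b1", range(10), batch_size=4)
upload_in_batches(table, "p1=b1", range(10, 12), batch_size=4)
print(table.partitions["p1=b1"], table.commit_count)
```

The second upload adds to the records written by the first rather than replacing them, and twelve records reach the table in four commits instead of twelve.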
  • 19. Tunnel Command: Data Upload/Download in Tunnel
      • odps@ > tunnel upload log.txt test_project.test_table/p1="b1",p2="b2";
      • odps@ > tunnel download test_project.test_table/p1="b1",p2="b2" log.txt;
      • The tunnel command is a command-line tool built on the Tunnel SDK. It uploads local text files to ODPS and downloads table data to a local file.
      • The target table partitions must be created beforehand.
      • DataX, CDP, and TT provide higher-level tools built on Tunnel that support data exchange between ODPS and relational databases.
      • Log data can be imported with Flume and Fluentd.
      • Users with special requirements can develop custom tools on top of Tunnel.
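The two commands on this slide share one shape: a `project.table/partition-spec` target plus a local file, with the argument order swapped between upload and download. The sketch below assembles those command strings from their parts using plain ASCII double quotes. `partition_spec` and `tunnel_command` are hypothetical helpers written for this example, not part of any ODPS client, and the code only builds text; it does not run the commands.

```python
def partition_spec(partitions):
    """Render {'p1': 'b1', 'p2': 'b2'} as p1="b1",p2="b2" (insertion order kept)."""
    return ",".join(f'{key}="{value}"' for key, value in partitions.items())


def tunnel_command(action, project, table, partitions, local_file):
    """Build a `tunnel upload` or `tunnel download` command line as a string."""
    target = f"{project}.{table}/{partition_spec(partitions)}"
    if action == "upload":
        # Upload: local file first, then the destination table partition.
        return f"tunnel upload {local_file} {target};"
    if action == "download":
        # Download: source table partition first, then the local file.
        return f"tunnel download {target} {local_file};"
    raise ValueError(f"unknown action: {action!r}")


spec = {"p1": "b1", "p2": "b2"}
upload_cmd = tunnel_command("upload", "test_project", "test_table", spec, "log.txt")
download_cmd = tunnel_command("download", "test_project", "test_table", spec, "log.txt")
print(upload_cmd)
print(download_cmd)
```

The printed lines match the two commands shown on the slide, minus the `odps@ >` prompt.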
  • 20. SQL
      • Suited to processing large volumes of data (terabytes to petabytes).
      • High latency: a single SQL statement takes from tens of seconds to several hours to run.
      • The syntax is similar to Hive's HQL, with some extensions to standard SQL.
      • There are no transactions and no primary keys.
      • UPDATE and DELETE commands are not supported.
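Without UPDATE and DELETE, changing or removing rows generally means recomputing the full contents of a table or partition and writing them back in one pass. The sketch below models that idea in plain Python rather than MaxCompute SQL; `overwrite_with_changes` and the row fields are hypothetical. In SQL, this pattern is usually written as an `INSERT OVERWRITE ... SELECT` statement, which is an assumption beyond what the slides show.

```python
def overwrite_with_changes(rows, should_delete, update):
    """Return the full replacement row set: drop some rows, transform the rest."""
    return [update(row) for row in rows if not should_delete(row)]


partition_rows = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
    {"id": 3, "status": "spam"},
]

rewritten = overwrite_with_changes(
    partition_rows,
    # Emulates DELETE: rows matching the predicate are left out of the rewrite.
    should_delete=lambda row: row["status"] == "spam",
    # Emulates UPDATE: matching rows are replaced by modified copies.
    update=lambda row: {**row, "status": "done"} if row["id"] == 1 else row,
)
print(rewritten)
```

The original rows are left untouched; the caller would then replace the stored partition with `rewritten` in a single write.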
  • 23. 04 Getting Started with Alibaba Cloud
  • 25. Q&A

Editor's Notes

  1. (1) Cooperation with partners in other BUs. NOTE: There must be open and feasible cooperation modes. (2) Overlap with other products. A: Elements that are under planning and overlap with existing products. B: Elements allowing differentiated cooperation. Emphasize the two existing differentiated elements of the other party, and then complete the whole development. Illustrate the above information in two PPT slides.
  2. Q&A