SlideShare a Scribd company logo
김형준 
GruterVectorizedProcessing in a Nutshell
김형준, babokim@gmail.com 
www.jaso.co.kr 
클라우드컴퓨팅구현기술등저 
Apache Tajo Committer 
Gruter(BigData) NHN(Hadoop) SDS(공공SI, 젂자ITO) 
Who I am
Vectorized 
Query Compilation 
CPU pipeline 
Branch misprediction 
이건무슨외계어? 
SI 개발자가왜여기까지왔나? 
Cache Miss 
SIMD 
LLVM
Hadoop쉽다며? 
Hadoop빠르다며?
메모리를잘활용해보자! 
실행계획을잘만들어보자! 
파일포맷을최적화해보자! 
... 
그래서다양한시도가 
SQL을이용한분석
단순하게하드웨어로해결? 
처리해야할데이터가많으니Disk만빠르면? 
http://www.enterprisestorageforum.com/storage-hardware/ssd-vs.-hdd-pricing-seven-myths-that-need-correcting-2.html 
164 
205 
563 
947 
3,277 
0 
5000 
10000 
15000 
0 
1000 
2000 
3000 
4000 
SATA-HDD 
SAS-HDD-15K 
SATA-SSD 
SAS-SSD 
PCIe-SSDRead MB/s 
$/TB 
MB/sec 
$/1TB
SAS * 24개= 200MB/sec * 24 = 4.8GB/s 
X 50대 
240GB/s10초에2.4TB 
이렇게했는데도별로안빠르다!
CPU, 53 
DISK, 30 
N/W, 17 
HP-DL385P, SATA/HDD*8, 10대Tajo, TPC-H Scale 1000질의 
데이터분석리소스사용유형CPU, 72DISK, 10
더빠른CPU를만들거나 
→ H/W 제조사가할일 
CPU/Memory가많은장비를사용 
(Scale-up) 
→ 비용이많이들고 
메인보드의한계 
그렇다고계속HDD만사용?
CPU를마른수건짜듯이잘활용Vectorized 
Query 
Compilation 
우리(개발자)가할수있는것은
어떻게하면CPU를 
잘활용할수있나? 
한번에더많은명령을실행하게하고 
→ CPU Pipeline 
CPU내메모리를잘활용하고 
→ CPU Cache 
CPU에서제공하는기능을잘활용한다. 
→ SIMD
DBMS는어떻게데이터를처리? 
SelectionGroupby 
Join 
Orderby 
Eval 
Filter 
SCAN 
... 
Basic OperatorSLECT o_custkey, l_lineitem, count(*) FROM lineitemaJOIN orders b ON a.l_orderkey= b.o_orderkeyWHERE a.l_shipdate> ‘2014-09-01’ GROUP BY o_custkey, l_lineitem 
SCAN 
(lineitem) 
SCAN 
(orders) 
Fliter 
Join 
Groupby 
Operator 조합으로 
Tree 형태의 
PhysicalPlan생성Call() 
Call() 
Return Tuple 
Return Tuple
Tajo란? 
•SQL on Hadoop솔루션으로분류 
–Hadoop을메인저장소로사용하는Data Warehouse 시스템 
–수초미만의Interactive 질의+ 수시갂의ETL 질의모두지원 
•Apache Open Source (http://tajo.apache.org) 
•자체분산실행엔진 
–MapReduce가아닌자체분산실행엔진이용 
•다양한성능최적화기법도입 
–Join Optimization, Dynamic Planning 
–Disk balanced scheduler, Multi-Level Count Distinct 
–Query Compilation, VectorizedEngine(2015 1Q) 등 
•이미엔터프라이즈에적용 
–상용DW Win back, 수TB/일처리 
http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon_Keuntae_Park_0.pdf 
http://spark-summit.org/wp-content/uploads/2014/07/Big-Telco-Bigger-Real-time-Demands-Jungryong-Lee.pdf
Tajo DW 사례Operational Systems 
Integration 
Layer 
Data Warehouse 
Data Mart 
Marketing 
Sales 
ERPSCM 
ODS 
Staging Area 
Strategic 
Marts 
Data 
Vault 
상용DW용DBMS 역할대체
이제본론으로...
이미지: http://www.digitalinternals.com/ 
Instruction 
처리속도향상이아닌 
처리용량(throughput) 
향상 
CPU Pipeline 
Not Pipelined 
Pipelined
CPU Pipeline의어려움 
•Structural Hazards 
–여러명령이같은하드웨어리소스를동시에필요로할때 
•Data Hazards 
–instruction들이사용하는data갂의존성문제 
–ex) c = a+b; 
e = c+d; 
•Control Hazards 
–조건문등이있는경우다음instruction을예측하여pipelining 
–상황에따라분기오예측(branch misprediction)이발생 
–분기예측이잘못되었을경우pipelining에있는instruction이모두flush됨 
–Pipelining stage가길수록throughput이떨어짐 
–분기오예측은일반적으로10-20 CPU cycle 소모 
다음과같은경우Pipeline이중지
for (i= 0; i< 128; i++) { 
a[i] = b[i] + c[i]; 
} for (i= 0; i< 128; i+=4) { a[i] = b[i] + c[i]; a[i+1] = b[i+1] + c[i+1]; a[i+2] = b[i+2] + c[i+2]; a[i+3] = b[i+3] + c[i+3]; } 
Original Loop 
Unrolled Loop 
지시자: #pragmaGCC optimize ("unroll-loops") 
컴파일옵션: -funroll-loops 등 
Loop unrolling 
컴파일러옵션또는지시자로설정가능 
Control Hazard 개선 
조건비교
for (inti= 0; i< num; i++) { 
if (data[i] >= 128) { 
sum += data[i]; 
} 
} 
for (inti= 0; i< num; i++) { 
int v = (data[i] -128) >> 31; 
sum += ~v & data[i]; 
} 
Branch version 
Non branch version 
Branch avoidance 
http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array 
Random data : 35,683 ms 
Sorted data : 11,977 ms 
11,977 ms10,402 ms 
Control Hazard 개선
Memory Hierarchy 
http://article.techlabs.by/29_1411.html 
L1 Cache 
L2 Cache 
L3 Cache 
Main Memory
CPU Cache 특징 
•Spatial locality 
–cache line (64 bytes) 단위인접한데이터까지메모리로부터읽음 
–File System의Block size와같은개념 
•Temporal locality 
–cache의크기는제한적이며캐쉬유지젂략은LRU 
–즉, 가장최귺사용된것이캐쉬에남아있음 
Cache 
Cache line 
(64 byte) 
Main Memory
for (i= 0 to size) 
for (j = 0 to size) 
do something with ary[j][i] 
for (i= 0 to size) 
for (j = 0 to size) 
do something with ary[i][j] 
인접하지않은메모리를반복해서 
읽기때문에비효율적 
순차읽기로인한높은cache hit 
Case 1 
Case 2 
CPU Hit 비교 
Classic Example of Spatial Locality
i 
j 
Case 2 
Cache Line 
i 
j 
Case 1 
CPU Hit 비교
DBMS에서의CPU Cache 
•Tuple을Row 단위로저장할경우 
–Row의크기가커서Cache Hit율이낮음 
•Column 단위로저장할경우 
–Cache Hit율이높음 
–동일한Operator를동일컬럼에서여러개의데이터를한번에수행가능(SIMD) 
–하지만기존의처리방식(Tuple-at-a time)을사용하지못함 
•시스템젂반적인변경이필요
SIMD 
•Single Instruction Multiple Data 
–CPU에서제공하는병렬처리명령Set 
–사칙연산및논리연산을SIMD명령으로제공 
•Pipeline은명령을병렬처리, SIMD는데이터를병렬처리 
•Intel CPU의SIMD는MMX라는이름으로제안, SSE, AVX로진화중 
•Nehalem은단일명령으로128bit 데이터 
•Sandy Bridge 이후부터는256bit 데이터에대한명령셋제공 
–256 bit의크기= 16 Shorts, 8 Integers/Floats, 4 Long/Double 
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
SIMD
https://software.intel.com/sites/landingpage/IntrinsicsGuide/ 
Intel SIMD Instruction
DBMS 에서는 
•TupleMemory Model 
–Tuple을메모리에저장하는방식 
–Cache hit율에영향 
•TupleProcessing Model 
–처리하는Tuple의단위 
–Cache hit, CPU Pipeline, SIMD 활용에모두영향
OR 
출처:[Marcin2009] 
TupleMemory Model 
N-array Storage Model 
Decomposed Storage 
Model
TupleMemory Model 
•Row-store (NSM) 
–하나의row를레코드단위로하여물리적으로연속적인바이트열로저장 
–장점: 레코드단위의쉬운삽입/삭제/업데이트가능 
–단점: I/O의최소단위는레코드 
–OLTP에적합 
•Column-store (DSM에서발젂) 
–컬럼을별도의파일에저장 
–장점 
•필요한컬럼들에대해컬럼단위I/O가능 
•매우빠른읽기성능 
•더높은압축률 
–단점 
•많은컬럼을읽을경우tuple형태로만드는비용이큼 
•파일쓸때여러파일에대해I/O가발생 
•읽기위주의분석적처리에적합
Tuple-at-a-time 
Block-at-a-time 
Column-at-a-time 
Vectorized 
TupleProcessing Model
Tuple-at-a-time Model 
next() 
next() 
next() 
…. 
…. 
…. JOIN A,B 
Volcano-an extensible and parallel query evaluation system 
SCAN 
JOIN C 
SCAN 
... 
결과Tuple 
•next() 호출마다재귀적으로하위연산자의next() 호출 
•next()의반환값은하나의Tuple 
•N-array Storage Model(NSM) 레코드기반 
•CPU활용측면에서비효율적 
–함수호출비용(20 CPU cycle) 
–Pipeline 활용도낮음 
–SIMD 활용도낮음 
–빈번한instruction/data cache miss 
–NSM으로인한cache miss 
•MySQL, Pig,Tajo(AS-IS)가기반하는구조
Block-at-a-time Model 
•한번의next() 호출에N개의NSM 기반tuple 
•N-array Storage Model(NSM)에기반 
•Impala, PostgreSQL등 
•장점 
–함수호출비용감소 
–instruction/data cache hit 향상 
•단점 
–Pipelining 활용도여젂히낮음 
–SIMD활용도낮음 
–NSM으로인한cache miss 
Block oriented processing of Relational Database operations in modern Computer Architectures 
next() 
next() 
next() 
…. 
JOIN A,B 
SCAN 
JOIN C 
SCAN 
... 
결과Tuple 
…. 
….
Column-at-a-time Model 
•Decomposed Storage Model 사용 
•각컬럼배열자료구조에는젂체Row의 
데이터를반환 
•MonetDB에서사용 
•장점 
–함수호출비용감소 
–Instruction/datacache hit 향상 
–Pipelining 활용도높음 
–SIMD활용도높음 
•단점 
–중갂데이터유지비용높음 
–여젂히높은cache miss빈도 
•두컬럼의데이터를이용하는연산의경우 
Monet: a next-Generation DBMS Kernel 
For Query-Intensive Applications 
next() 
next() 
next() 
JOIN A,B 
SCAN 
JOIN C 
결과Tuple
Column-at-a-time Model 
•성능비교 
–MySQL: 26.2 sec, MonetDB: 3.7 sec 
–TPCH-Q1(1 Table Groupby, Orderby, 결과데이터는4건) 
–In-memory 데이터에대한비교실험결과 
•논리연산/사칙연산별로별도Primitive 연산자필요 
/* Returns a vector of row IDs’ satisfying comparison 
* and count of entries in vector. */ 
intselect_gt_float(int* res, float* column, float val, intn) { 
intidx=0; 
for (inti= 0; i< n; i++) { 
idx+= column[i] > val; 
res[idx] = j; 
} 
return idx; 
} 
Greater-than primitive 연산자의예 
기본데이터타입, 사칙연산, 논리연산에약5천여개필요
VectorizedModel 
•1개의Column에대해Vector에여러개의데이터를저장 
•서로다른Column vector를CPU L2 Cache 크기(256KB)에맞추어row group 단위로연산 
•조건식/계산식은vector 단위로처리 
•장점 
–함수호출비용감소 
–Instruction/datacache hit 향상 
–Pipelining 활용도높음 
–SIMD활용도높음 
–Interpret overhead 최소화 
•단점 
–새로운모델로개발필요 
MonetDB / X100 : Hyper-Pipelining Query Execution 
next() 
next() 
next() 
JOIN A,B 
SCAN 
JOIN C 
결과Tuple 
... 
... 
...
Filter 연산예제*+ 
/ 
< 
RowGroupRowGroup 
Vector 
Vectorized 
Primitives 
(Filter) 
VectorizedModel 
•하나의operator 처리를위해서row group에속하는다수의컬럼들을엑세스필요 
•Column-at-a-time 방식은그과정에서cache miss가발생 
•Vectorized방식은각row group의vector들이CPU L2 cache사이즈에맞도록설계 
•CPU Pipelining, SIMD, CPU Cache세가지활용을극대화 
•MySQL(Tuple-at-a-time) : 26.2 secs 
•MonetDB(Column-at-atime) : 3.7 secs 
•MonetDBX100(Vectorized) : 0.6 secs
VectorizedModelVectorSize
We always welcome 
new contributors! 
General 
http://tajo.apache.org 
Issue Tracker 
https://issues.apache.org/jira/browse/TAJO 
Join the mailing list 
dev-subscribe@tajo.apache.org, issues-subscribe@tajo.apache.org 
한국User group 
https://groups.google.com/forum/?hl=ko#!forum/tajo-user-kr
Q&A
THANK YOU

More Related Content

What's hot

Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
Ryousei Takano
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
Kohei KaiGai
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
Kohei KaiGai
 
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUSAVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
csandit
 
CuPy v4 and v5 roadmap
CuPy v4 and v5 roadmapCuPy v4 and v5 roadmap
CuPy v4 and v5 roadmap
Preferred Networks
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
Kohei KaiGai
 
Streams on wires
Streams on wiresStreams on wires
Streams on wires
Takefumi MIYOSHI
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
Ganesan Narayanasamy
 
PG-Strom
PG-StromPG-Strom
PG-Strom
Kohei KaiGai
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
Kohei KaiGai
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告
Ryousei Takano
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
Kohei KaiGai
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
Kohei KaiGai
 
Sun jdk 1.6 gc english version
Sun jdk 1.6 gc english versionSun jdk 1.6 gc english version
Sun jdk 1.6 gc english version
bluedavy lin
 
Lecture3
Lecture3Lecture3
Lecture3
Asad Abbas
 
cuTau Leaping
cuTau LeapingcuTau Leaping
cuTau Leaping
Amritesh Srivastava
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
Kohei KaiGai
 
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
journalBEEI
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
Kohei KaiGai
 

What's hot (19)

Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place20160407_GTC2016_PgSQL_In_Place
20160407_GTC2016_PgSQL_In_Place
 
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUSAVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
 
CuPy v4 and v5 roadmap
CuPy v4 and v5 roadmapCuPy v4 and v5 roadmap
CuPy v4 and v5 roadmap
 
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~
 
Streams on wires
Streams on wiresStreams on wires
Streams on wires
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
PG-Strom
PG-StromPG-Strom
PG-Strom
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告
 
20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)SQL+GPU+SSD=∞ (English)
SQL+GPU+SSD=∞ (English)
 
Sun jdk 1.6 gc english version
Sun jdk 1.6 gc english versionSun jdk 1.6 gc english version
Sun jdk 1.6 gc english version
 
Lecture3
Lecture3Lecture3
Lecture3
 
cuTau Leaping
cuTau LeapingcuTau Leaping
cuTau Leaping
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...Architecture exploration of recent GPUs to analyze the efficiency of hardware...
Architecture exploration of recent GPUs to analyze the efficiency of hardware...
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 

Similar to [2A2]Vectorized_processing_in_a_Nutshell

Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Rose Toomey
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
Databricks
 
PerfUG 3 - perfs système
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs système
Ludovic Piot
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
Ganesan Narayanasamy
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
Coburn Watson
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
DataStax Academy
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
cscpconf
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
csandit
 
Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
Deepak Shankar
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
Amgad Muhammad
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
Hyderabad Scalability Meetup
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
Chinmay Kulkarni
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Amazon Web Services
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
ScyllaDB
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
Amazon Web Services
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
bakers84
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
RichardWarburton
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
Amazon Web Services
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
Nitin Kumar
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData
 

Similar to [2A2]Vectorized_processing_in_a_Nutshell (20)

Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
PerfUG 3 - perfs système
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs système
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
C* Summit 2013: Large Scale Data Ingestion, Processing and Analysis: Then, No...
 
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORSAFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS
 
Affect of parallel computing on multicore processors
Affect of parallel computing on multicore processorsAffect of parallel computing on multicore processors
Affect of parallel computing on multicore processors
 
Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)In-Memory Data Grids - Ampool (1)
In-Memory Data Grids - Ampool (1)
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
The Rise of Parallel Computing
The Rise of Parallel ComputingThe Rise of Parallel Computing
The Rise of Parallel Computing
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Deep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance PerformanceDeep Dive on Delivering Amazon EC2 Instance Performance
Deep Dive on Delivering Amazon EC2 Instance Performance
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 

More from NAVER D2

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
NAVER D2
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
NAVER D2
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
NAVER D2
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
NAVER D2
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
NAVER D2
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
NAVER D2
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
NAVER D2
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
NAVER D2
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
NAVER D2
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
NAVER D2
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
NAVER D2
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
NAVER D2
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
NAVER D2
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
NAVER D2
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
NAVER D2
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
NAVER D2
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
NAVER D2
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
NAVER D2
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
NAVER D2
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
NAVER D2
 

More from NAVER D2 (20)

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
 

Recently uploaded

Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 

Recently uploaded (20)

Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 

[2A2]Vectorized_processing_in_a_Nutshell

  • 1.
  • 3. 김형준, babokim@gmail.com www.jaso.co.kr 클라우드컴퓨팅구현기술등저 Apache Tajo Committer Gruter(BigData) NHN(Hadoop) SDS(공공SI, 젂자ITO) Who I am
  • 4. Vectorized Query Compilation CPU pipeline Branch misprediction 이건무슨외계어? SI 개발자가왜여기까지왔나? Cache Miss SIMD LLVM
  • 7. 단순하게하드웨어로해결? 처리해야할데이터가많으니Disk만빠르면? http://www.enterprisestorageforum.com/storage-hardware/ssd-vs.-hdd-pricing-seven-myths-that-need-correcting-2.html 164 205 563 947 3,277 0 5000 10000 15000 0 1000 2000 3000 4000 SATA-HDD SAS-HDD-15K SATA-SSD SAS-SSD PCIe-SSDRead MB/s $/TB MB/sec $/1TB
  • 8. SAS * 24개= 200MB/sec * 24 = 4.8GB/s X 50대 240GB/s10초에2.4TB 이렇게했는데도별로안빠르다!
  • 9. CPU, 53 DISK, 30 N/W, 17 HP-DL385P, SATA/HDD*8, 10대Tajo, TPC-H Scale 1000질의 데이터분석리소스사용유형CPU, 72DISK, 10
  • 10. 더빠른CPU를만들거나 → H/W 제조사가할일 CPU/Memory가많은장비를사용 (Scale-up) → 비용이많이들고 메인보드의한계 그렇다고계속HDD만사용?
  • 12. 어떻게하면CPU를 잘활용할수있나? 한번에더많은명령을실행하게하고 → CPU Pipeline CPU내메모리를잘활용하고 → CPU Cache CPU에서제공하는기능을잘활용한다. → SIMD
  • 13. DBMS는어떻게데이터를처리? SelectionGroupby Join Orderby Eval Filter SCAN ... Basic OperatorSLECT o_custkey, l_lineitem, count(*) FROM lineitemaJOIN orders b ON a.l_orderkey= b.o_orderkeyWHERE a.l_shipdate> ‘2014-09-01’ GROUP BY o_custkey, l_lineitem SCAN (lineitem) SCAN (orders) Fliter Join Groupby Operator 조합으로 Tree 형태의 PhysicalPlan생성Call() Call() Return Tuple Return Tuple
  • 14. Tajo란? •SQL on Hadoop솔루션으로분류 –Hadoop을메인저장소로사용하는Data Warehouse 시스템 –수초미만의Interactive 질의+ 수시갂의ETL 질의모두지원 •Apache Open Source (http://tajo.apache.org) •자체분산실행엔진 –MapReduce가아닌자체분산실행엔진이용 •다양한성능최적화기법도입 –Join Optimization, Dynamic Planning –Disk balanced scheduler, Multi-Level Count Distinct –Query Compilation, VectorizedEngine(2015 1Q) 등 •이미엔터프라이즈에적용 –상용DW Win back, 수TB/일처리 http://events.linuxfoundation.org/sites/events/files/slides/ApacheCon_Keuntae_Park_0.pdf http://spark-summit.org/wp-content/uploads/2014/07/Big-Telco-Bigger-Real-time-Demands-Jungryong-Lee.pdf
  • 15. Tajo DW 사례Operational Systems Integration Layer Data Warehouse Data Mart Marketing Sales ERPSCM ODS Staging Area Strategic Marts Data Vault 상용DW용DBMS 역할대체
  • 17. 이미지: http://www.digitalinternals.com/ Instruction 처리속도향상이아닌 처리용량(throughput) 향상 CPU Pipeline Not Pipelined Pipelined
  • 18. CPU Pipeline의어려움 •Structural Hazards –여러명령이같은하드웨어리소스를동시에필요로할때 •Data Hazards –instruction들이사용하는data갂의존성문제 –ex) c = a+b; e = c+d; •Control Hazards –조건문등이있는경우다음instruction을예측하여pipelining –상황에따라분기오예측(branch misprediction)이발생 –분기예측이잘못되었을경우pipelining에있는instruction이모두flush됨 –Pipelining stage가길수록throughput이떨어짐 –분기오예측은일반적으로10-20 CPU cycle 소모 다음과같은경우Pipeline이중지
  • 19. for (i= 0; i< 128; i++) { a[i] = b[i] + c[i]; } for (i= 0; i< 128; i+=4) { a[i] = b[i] + c[i]; a[i+1] = b[i+1] + c[i+1]; a[i+2] = b[i+2] + c[i+2]; a[i+3] = b[i+3] + c[i+3]; } Original Loop Unrolled Loop 지시자: #pragmaGCC optimize ("unroll-loops") 컴파일옵션: -funroll-loops 등 Loop unrolling 컴파일러옵션또는지시자로설정가능 Control Hazard 개선 조건비교
  • 20. for (inti= 0; i< num; i++) { if (data[i] >= 128) { sum += data[i]; } } for (inti= 0; i< num; i++) { int v = (data[i] -128) >> 31; sum += ~v & data[i]; } Branch version Non branch version Branch avoidance http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array Random data : 35,683 ms Sorted data : 11,977 ms 11,977 ms10,402 ms Control Hazard 개선
  • 21. Memory Hierarchy http://article.techlabs.by/29_1411.html L1 Cache L2 Cache L3 Cache Main Memory
  • 22. CPU Cache 특징 •Spatial locality –cache line (64 bytes) 단위인접한데이터까지메모리로부터읽음 –File System의Block size와같은개념 •Temporal locality –cache의크기는제한적이며캐쉬유지젂략은LRU –즉, 가장최귺사용된것이캐쉬에남아있음 Cache Cache line (64 byte) Main Memory
  • 23. for (i= 0 to size) for (j = 0 to size) do something with ary[j][i] for (i= 0 to size) for (j = 0 to size) do something with ary[i][j] 인접하지않은메모리를반복해서 읽기때문에비효율적 순차읽기로인한높은cache hit Case 1 Case 2 CPU Hit 비교 Classic Example of Spatial Locality
  • 24. i j Case 2 Cache Line i j Case 1 CPU Hit 비교
  • 25. DBMS에서의CPU Cache •Tuple을Row 단위로저장할경우 –Row의크기가커서Cache Hit율이낮음 •Column 단위로저장할경우 –Cache Hit율이높음 –동일한Operator를동일컬럼에서여러개의데이터를한번에수행가능(SIMD) –하지만기존의처리방식(Tuple-at-a time)을사용하지못함 •시스템젂반적인변경이필요
  • 26. SIMD •Single Instruction Multiple Data –CPU에서제공하는병렬처리명령Set –사칙연산및논리연산을SIMD명령으로제공 •Pipeline은명령을병렬처리, SIMD는데이터를병렬처리 •Intel CPU의SIMD는MMX라는이름으로제안, SSE, AVX로진화중 •Nehalem은단일명령으로128bit 데이터 •Sandy Bridge 이후부터는256bit 데이터에대한명령셋제공 –256 bit의크기= 16 Shorts, 8 Integers/Floats, 4 Long/Double https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
  • 27. SIMD
  • 29. DBMS 에서는 •TupleMemory Model –Tuple을메모리에저장하는방식 –Cache hit율에영향 •TupleProcessing Model –처리하는Tuple의단위 –Cache hit, CPU Pipeline, SIMD 활용에모두영향
  • 30. OR 출처:[Marcin2009] TupleMemory Model N-array Storage Model Decomposed Storage Model
  • 31. TupleMemory Model •Row-store (NSM) –하나의row를레코드단위로하여물리적으로연속적인바이트열로저장 –장점: 레코드단위의쉬운삽입/삭제/업데이트가능 –단점: I/O의최소단위는레코드 –OLTP에적합 •Column-store (DSM에서발젂) –컬럼을별도의파일에저장 –장점 •필요한컬럼들에대해컬럼단위I/O가능 •매우빠른읽기성능 •더높은압축률 –단점 •많은컬럼을읽을경우tuple형태로만드는비용이큼 •파일쓸때여러파일에대해I/O가발생 •읽기위주의분석적처리에적합
  • 32. Tuple-at-a-time Block-at-a-time Column-at-a-time Vectorized TupleProcessing Model
  • 33. Tuple-at-a-time Model next() next() next() …. …. …. JOIN A,B Volcano-an extensible and parallel query evaluation system SCAN JOIN C SCAN ... 결과Tuple •next() 호출마다재귀적으로하위연산자의next() 호출 •next()의반환값은하나의Tuple •N-array Storage Model(NSM) 레코드기반 •CPU활용측면에서비효율적 –함수호출비용(20 CPU cycle) –Pipeline 활용도낮음 –SIMD 활용도낮음 –빈번한instruction/data cache miss –NSM으로인한cache miss •MySQL, Pig,Tajo(AS-IS)가기반하는구조
  • 34. Block-at-a-time Model •한번의next() 호출에N개의NSM 기반tuple •N-array Storage Model(NSM)에기반 •Impala, PostgreSQL등 •장점 –함수호출비용감소 –instruction/data cache hit 향상 •단점 –Pipelining 활용도여젂히낮음 –SIMD활용도낮음 –NSM으로인한cache miss Block oriented processing of Relational Database operations in modern Computer Architectures next() next() next() …. JOIN A,B SCAN JOIN C SCAN ... 결과Tuple …. ….
  • 35. Column-at-a-time Model •Decomposed Storage Model 사용 •각컬럼배열자료구조에는젂체Row의 데이터를반환 •MonetDB에서사용 •장점 –함수호출비용감소 –Instruction/datacache hit 향상 –Pipelining 활용도높음 –SIMD활용도높음 •단점 –중갂데이터유지비용높음 –여젂히높은cache miss빈도 •두컬럼의데이터를이용하는연산의경우 Monet: a next-Generation DBMS Kernel For Query-Intensive Applications next() next() next() JOIN A,B SCAN JOIN C 결과Tuple
  • 36. Column-at-a-time Model •성능비교 –MySQL: 26.2 sec, MonetDB: 3.7 sec –TPCH-Q1(1 Table Groupby, Orderby, 결과데이터는4건) –In-memory 데이터에대한비교실험결과 •논리연산/사칙연산별로별도Primitive 연산자필요 /* Returns a vector of row IDs’ satisfying comparison * and count of entries in vector. */ intselect_gt_float(int* res, float* column, float val, intn) { intidx=0; for (inti= 0; i< n; i++) { idx+= column[i] > val; res[idx] = j; } return idx; } Greater-than primitive 연산자의예 기본데이터타입, 사칙연산, 논리연산에약5천여개필요
  • 37. VectorizedModel •1개의Column에대해Vector에여러개의데이터를저장 •서로다른Column vector를CPU L2 Cache 크기(256KB)에맞추어row group 단위로연산 •조건식/계산식은vector 단위로처리 •장점 –함수호출비용감소 –Instruction/datacache hit 향상 –Pipelining 활용도높음 –SIMD활용도높음 –Interpret overhead 최소화 •단점 –새로운모델로개발필요 MonetDB / X100 : Hyper-Pipelining Query Execution next() next() next() JOIN A,B SCAN JOIN C 결과Tuple ... ... ...
  • 38. Filter 연산예제*+ / < RowGroupRowGroup Vector Vectorized Primitives (Filter) VectorizedModel •하나의operator 처리를위해서row group에속하는다수의컬럼들을엑세스필요 •Column-at-a-time 방식은그과정에서cache miss가발생 •Vectorized방식은각row group의vector들이CPU L2 cache사이즈에맞도록설계 •CPU Pipelining, SIMD, CPU Cache세가지활용을극대화 •MySQL(Tuple-at-a-time) : 26.2 secs •MonetDB(Column-at-atime) : 3.7 secs •MonetDBX100(Vectorized) : 0.6 secs
  • 40. We always welcome new contributors! General http://tajo.apache.org Issue Tracker https://issues.apache.org/jira/browse/TAJO Join the mailing list dev-subscribe@tajo.apache.org, issues-subscribe@tajo.apache.org 한국User group https://groups.google.com/forum/?hl=ko#!forum/tajo-user-kr
  • 41. Q&A