Netflix’s architecture involves thousands of microservices built to serve unique business needs. As this architecture grew, it became clear that data storage and query needs vary by area; no single silver bullet fits the data needs of every microservice. The Cloud Database Engineering (CDE) team offers polyglot persistence, which promises an ideal match between each problem space and its persistence solution. In this meetup you will get a deep dive into the self-service platform, our solution for repairing Cassandra data reliably across datacenters, Memcached on flash and cross-region replication, and graph database evolution at Netflix.
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that is a lot like SQL. With PGQL, it's easy to start writing graph analysis queries against the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
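For a flavor of the language, here is a minimal PGQL query sketch (the graph, labels, and property names are illustrative, not from the session):

SELECT m.name
FROM MATCH (p:Person) -[:knows]-> (m:Person)
WHERE p.name = 'Alice'
ORDER BY m.name

SELECT, WHERE, and ORDER BY behave as in SQL; the MATCH clause contributes the graph pattern.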
Drivetribe is the world’s digital hub for motoring, as envisioned by Jeremy Clarkson, Richard Hammond, and James May. The Drivetribe platform was designed from the ground up with high scalability in mind. Built on top of the Event Sourcing/CQRS pattern, the platform uses Apache Kafka as its source of truth and Apache Flink as its processing backbone. This talk introduces the architecture and elaborates on how common problems in social media, such as counting big numbers and dealing with outliers, can be solved with a healthy mix of Flink and functional programming.
Hadoop & Spark Performance Tuning Using Dr. Elephant (Akshay Rai)
Dr. Elephant is a tool for the users of Hadoop to help them understand, analyze and tune their Hadoop/Spark applications easily, thus improving their productivity and the cluster’s efficiency. It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.
Degrading Performance? You Might Be Suffering From the Small Files Syndrome (Databricks)
Whether your data pipelines handle real-time event-driven streams, near-real-time streams, or batch processing jobs, working with massive amounts of data made up of small files, specifically Parquet, will degrade your system's performance.
A small file is one that is significantly smaller than the storage block size. Yes, even object stores such as Amazon S3 and Azure Blob Storage have a minimum block size. Files significantly smaller than it waste space on disk, since the storage layer is optimized for fast reads and writes at that block size.
To understand why this happens, you first need to understand how cloud storage works with the Apache Spark engine. In this session, you will learn about Parquet, the storage API calls, how they work together, why small files are a problem, and how you can leverage Delta Lake for a simpler, cleaner solution.
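As a rough illustration of the compaction idea discussed here (paths and partition counts are made up; Delta Lake automates the same idea with its built-in compaction support), many small Parquet files can be rewritten into fewer large ones with plain PySpark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-small-files").getOrCreate()

# Hypothetical directory containing thousands of tiny Parquet files.
df = spark.read.parquet("s3://my-bucket/events/")

# Rewrite into fewer, larger files; choose the partition count so each
# output file lands near the storage block size (e.g. around 128 MB).
df.repartition(16).write.mode("overwrite").parquet("s3://my-bucket/events-compacted/")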
HBaseCon2017: Community-Driven Graphs with JanusGraph (HBaseCon)
Graphs are well-suited for many use cases to express and process complex relationships among entities in enterprise and social contexts. Fueled by the growing interest in graphs, various graph databases and processing systems dot the graph landscape. JanusGraph is a community-driven project that continues the legacy of Titan, a pioneer of open source graph databases. JanusGraph is a scalable graph database optimized for large-scale transactional and analytical graph processing. In this session, we will introduce JanusGraph, which features full integration with the Apache TinkerPop graph stack. We will discuss JanusGraph's optimized storage model, which relies on HBase for fast graph traversal and processing.
by Jason Plurad and Jing Chen He of IBM
With hundreds of new and sometimes disparate tools, it’s hard to keep pace. Amazon Web Services provides a broad and fully integrated portfolio of cloud computing services to help you build, secure, and deploy your big data applications.
Attend this webinar to get an overview of the different big data options available in the AWS Cloud – including popular big data frameworks such as Hadoop, Spark, NoSQL databases, and more. Learn about ideal use cases, cases to avoid, performance, interfaces, and more. Finally, learn how you can build valuable applications with a real-life example.
Evening Out the Uneven: Dealing with Skew in Flink (Flink Forward)
Flink Forward San Francisco 2022.
When running Flink jobs, skew is a common problem that results in wasted resources and limited scalability. Over the past years, we have helped our customers and users solve various skew-related issues in their Flink jobs and clusters. In this talk, we will present the different types of skew that users often run into (data skew, key skew, event-time skew, state skew, and scheduling skew) and discuss solutions for each of them. We hope this will serve as a guideline to help you reduce skew in your Flink environment.
by Jun Qin & Karl Friedrich
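A common fix for the data and key skew covered here is key salting with two-stage aggregation. The following framework-agnostic Python sketch shows the idea (in Flink this would be two keyBy/aggregate stages; the bucket count and keys are illustrative):

import random
from collections import defaultdict

SALT_BUCKETS = 8  # illustrative fan-out for hot keys

def salted_key(key):
    # Stage 1 key: spread a hot key across several buckets/subtasks.
    return (key, random.randrange(SALT_BUCKETS))

# Stage 1: partial counts per (key, salt); each bucket can run in parallel.
partial = defaultdict(int)
for event_key in ["hot", "hot", "hot", "cold", "hot"]:
    partial[salted_key(event_key)] += 1

# Stage 2: merge the partials per original key; at most SALT_BUCKETS
# records per key remain instead of the full hot-key stream.
totals = defaultdict(int)
for (key, _salt), count in partial.items():
    totals[key] += count

print(dict(totals))  # {'hot': 4, 'cold': 1}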
How Adobe Uses Structured Streaming at Scale (Databricks)
Adobe’s Unified Profile System is the heart of its Experience Platform. It ingests terabytes of data a day and is petabytes large. Along with this massive growth we have faced multiple challenges in our Apache Spark deployment, which is used from ingestion to processing. We want to share some of our learnings and hard-earned lessons from reaching this scale, specifically with Structured Streaming.
Know thy Lag
When consuming from a Kafka topic that sees sporadic loads, it's very important to monitor consumer lag. It also makes you respect what a beast backpressure is.
Reading Data In
Fan-out pattern using minPartitions to use Kafka efficiently
Overload protection using maxOffsetsPerTrigger (see the sketch after this list)
More Apache Spark settings used to optimize throughput
Micro-batching best practices
map() + foreach() vs. mapPartitions() + foreachPartition()
Spark speculation and its effects
Calculating Streaming Statistics
Windowing
Importance of the State Store
RocksDB FTW
Broadcast joins
Custom Aggregators
Off-heap counters using Redis
Pipelining
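As a hedged sketch of how the two Kafka source knobs above fit together in Spark Structured Streaming (broker, topic, and numbers are illustrative, not Adobe's values):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # illustrative
    .option("subscribe", "profile-events")             # illustrative topic
    # Fan-out: request more Spark input partitions than Kafka partitions.
    .option("minPartitions", "64")
    # Overload protection: cap records per micro-batch to smooth load spikes.
    .option("maxOffsetsPerTrigger", "500000")
    .load())

query = (stream.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/kafka-ingest")  # illustrative
    .start())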
Have you ever wondered about the relative differences between two of the more popular open source, in-memory data stores and caches? In this session, we will describe those differences and, more importantly, provide live demonstrations of the key capabilities that could have a major impact on your Java application architecture.
At the StampedeCon 2015 Big Data Conference: YARN enables Hadoop to move beyond pure batch processing. With that, multiple workloads and tenants must now be able to share a single infrastructure for data processing. Features of the Capacity Scheduler enable fair resource sharing among multiple tenants, with elastic queues to maximize utilization. This talk will focus on the Capacity Scheduler features that enable multi-tenancy and on how resource sharing can be rebalanced using features like preemption.
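For a flavor of how such queues are declared, here is a minimal capacity-scheduler.xml sketch (queue names and percentages are made up):

<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,adhoc</value>
  </property>
  <property>
    <!-- Guaranteed share for each tenant queue. -->
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
    <value>30</value>
  </property>
  <property>
    <!-- Elasticity: adhoc may grow into idle capacity up to this cap. -->
    <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
    <value>60</value>
  </property>
</configuration>

Preemption itself is enabled separately (yarn.resourcemanager.scheduler.monitor.enable in yarn-site.xml), which lets the scheduler reclaim containers from queues that borrowed beyond their guarantee.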
Communication between microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, and service outages. We will walk through stability patterns like timeouts, circuit breakers, and bulkheads, and discuss how they improve the stability of microservices.
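As a taste of one of these patterns, here is a minimal circuit-breaker sketch in Python (illustrative, not from the talk; the thresholds are arbitrary):

import time

class CircuitBreaker:
    # Open the circuit after max_failures consecutive failures; after
    # reset_timeout seconds, allow one trial call (half-open state).
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: permit one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

Wrapping a remote call in breaker.call(...) makes the caller fail fast while the downstream service is unhealthy, instead of queueing up slow requests.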
Bravo Six, Going Realtime: Transitioning the Activision Data Pipeline to Streaming (Yaroslav Tkachenko)
The Activision Data team has been running a data pipeline for a variety of Activision games for many years. Historically, we used a mix of micro-batch microservices coupled with classic Big Data tools like Hadoop and Hive for ETL. As a result, it could take 4-6 hours for data to become available to end customers.
In the last few years, the adoption of data in the organization has skyrocketed. We needed to de-legacy our data pipeline and provide near-real-time access to data in order to improve reporting, gather insights faster, and power web and mobile applications. I want to tell a story about heavily leveraging Kafka Streams and Kafka Connect to reduce end-to-end latency to minutes, while at the same time making the pipeline easier and cheaper to run. We were able to successfully validate the new pipeline by launching two massive games just four weeks apart.
Python Data Science Level 2: Data Visualization, Hands-On Data Analysis, and an Introduction to Machine Learning (2020, Tae Young Lee)
- KOSPI LG Uplus stock analysis, Korean real-estate analysis, Gangnam apartment sales analysis, VISA report analysis, word clouds, and more
- Real-world data analysis material not covered in any domestic book
- Material that has (out of laziness) only been taught to a few financial, large-enterprise, and public-sector clients
2. Contents
1. Overview
2. Abstract Syntax
1. AST definition
2. AST generation
3. Semantic Analysis
1. Resolving variable references
2. Resolving type references
3. Checking type definitions
4. Checking expression validity
5. Type checking
4. Appendix: Type
3. Compiler
• The process of translating a programming language into a target language
• Pipeline: Source Code -> [Lexer] -> Token Stream -> [Parser] -> Parse Tree -> [Semantic Analyzer] -> Abstract Syntax Tree -> [Intermediate Code Generator] -> Intermediate Representation (IR) -> [Code Optimizer] -> Optimized IR -> [Code Generator] -> Target Code
4. Goals
• Learn about abstract syntax and how an abstract syntax tree (AST) is generated
• Learn about semantic analysis:
• Resolving variable references
• Resolving type references
• Checking type definitions
• Checking expression validity
• Type checking
5. Contents (section-divider slide; repeats the table of contents from slide 2)
6. Parse Tree
Source code: - 4 + 6 * 8
Token stream (from the LEXER): MINUS INT(4) PLUS INT(6) TIMES INT(8)
Parse tree (from the PARSER), written inline:
exp( exp( MINUS INT(4) ) PLUS exp( exp( INT(6) ) TIMES exp( INT(8) ) ) )
7. Parse Tree
• The result of parsing the source code according to the parsing rules
• Contains only syntax rules; carries no semantic rules
• Convenient for analyzing the source text, but inconvenient for generating target code
exp( exp( MINUS INT(4) ) PLUS exp( exp( INT(6) ) TIMES exp( INT(8) ) ) )
• To generate code for an exp node, the other surrounding nodes must be examined as well
8. Abstract Syntax Tree
• A collection of semantic rules
• Each semantic rule maps directly to target-code generation
• Convenient for the semantic analysis required to generate target code
Parse tree: exp( exp( MINUS INT(4) ) PLUS exp( exp( INT(6) ) TIMES exp( INT(8) ) ) )
AST: AddExp( MinusExp( INT(4) ), MultExp( INT(6), INT(8) ) )
9. Parse Tree -> AST
Parsing Rule
prog -> ( stmt )
stmt -> stmt ; stmt
stmt -> ID = expr
expr -> expr + expr
expr -> ID
expr -> NUM
Parsing Target
( a = 4 ; b = a + 5 )
Parse tree, written inline:
prog( "(" stmt( stmt( ID("a") "=" expr( NUM(4) ) ) ";" stmt( ID("b") "=" expr( expr( ID("a") ) "+" expr( NUM(5) ) ) ) ) ")" )
10. Parse Tree -> AST
Parsing Rule
prog -> ( stmt )
stmt -> stmt ; stmt
stmt -> ID = expr
expr -> expr + expr
expr -> ID
expr -> NUM
Semantic Rule
Program = {CompoundStmt}
CompoundStmt = {AssignStmt list}
AssignStmt = {DataId, Expr}
Expr -> {DataId | Int | AddExpr}
AddExpr = {Expr, Expr}
11. Parse Tree -> AST
Parsing target: ( a = 4 ; b = a + 5 )
Parse tree: prog( "(" stmt( stmt( ID("a") "=" expr( NUM(4) ) ) ";" stmt( ID("b") "=" expr( expr( ID("a") ) "+" expr( NUM(5) ) ) ) ) ")" )
AST: Program( CompoundStmt( AssignStmt( DataId("a"), Int(4) ), AssignStmt( DataId("b"), AddExpr( DataId("a"), Int(5) ) ) ) )
• Converting the parse tree into this AST requires semantic analysis
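The semantic rules on slide 10 translate naturally into AST node classes. A Python sketch (class and field names follow the slides; the encoding is otherwise my own), building the AST above:

from dataclasses import dataclass
from typing import List, Union

@dataclass
class DataId:
    name: str

@dataclass
class Int:
    value: int

@dataclass
class AddExpr:
    left: "Expr"
    right: "Expr"

Expr = Union[DataId, Int, AddExpr]

@dataclass
class AssignStmt:
    target: DataId
    expr: Expr

@dataclass
class CompoundStmt:
    stmts: List[AssignStmt]

@dataclass
class Program:
    body: CompoundStmt

# The AST for "( a = 4 ; b = a + 5 )":
ast = Program(CompoundStmt([
    AssignStmt(DataId("a"), Int(4)),
    AssignStmt(DataId("b"), AddExpr(DataId("a"), Int(5))),
]))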
12. Contents (section-divider slide; repeats slide 2)
14. AST Generation
• Generated by the parser during parsing
• The parser performs additional semantic actions while parsing to build the AST, as in this yacc-style grammar:
%union {int num; string id;}
%token PLUS MINUS TIMES UMINUS
%token <num> INT
%token <id> ID
%type <num> exp
%start exp
%left PLUS MINUS
%left TIMES
%left UMINUS
%%
exp : INT {$$ = new Int($1);}
| exp PLUS exp {$$ = new Plus($1, $3);}
| exp MINUS exp {$$ = new Minus($1, $3);}
| exp TIMES exp {$$ = new MultiplyExp($1, $3);}
| MINUS exp %prec UMINUS {$$ = new MinusExp($2);}
;
15. AST Generation
• Alternatively, the AST is built by walking the parse tree; depending on the pattern used, the walk is written in Visitor style or Listener style
• Visitor style: each node's visit function itself calls the visit functions for its child nodes
• Listener style: a walker traverses the parse tree and calls an enter and an exit function at every node it visits
Visitor style:
visit(ProgCtx ctx) {
return new Program(visit(ctx.stmt()));
}
visit(StmtCtx ctx) {
…
}
Listener style:
enter(ProgCtx ctx) {
currentProgram = new Program();
}
exit(ProgCtx ctx) {
…
}
18. Visitor vs. Listener
• Visitor pattern
• A parent node can access its child nodes' return values.
• Only the nodes that are needed have to be visited.
• Listener pattern
• A global context is kept throughout the walk, which makes it easy to combine information from multiple nodes.
• Visiting every node is guaranteed.
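A compact Python sketch of the contrast (toy node classes, not the lecture's grammar):

class Num:
    def __init__(self, v): self.v = v

class Add:
    def __init__(self, l, r): self.l, self.r = l, r

# Visitor style: each case recurses into its children itself and can
# combine the children's return values (or skip subtrees entirely).
def visit(node):
    if isinstance(node, Num):
        return node.v
    if isinstance(node, Add):
        return visit(node.l) + visit(node.r)

# Listener style: a generic walker guarantees enter/exit at every node;
# the listener accumulates results in shared context instead of returning.
def walk(node, listener):
    listener.enter(node)
    for child in vars(node).values():
        if isinstance(child, (Num, Add)):
            walk(child, listener)
    listener.exit(node)

class SumListener:
    def __init__(self): self.total = 0
    def enter(self, node):
        if isinstance(node, Num):
            self.total += node.v
    def exit(self, node): pass

tree = Add(Num(4), Num(6))
print(visit(tree))  # 10
listener = SumListener(); walk(tree, listener); print(listener.total)  # 10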
19. Contents (section-divider slide; repeats slide 2)
20. Semantic Analysis
• Analyzes the meaning needed to generate code from the abstract syntax
• The scope of semantic analysis differs per language
• For C-family languages:
• Resolving variable references
• Resolving type references
• Checking type definitions
• Checking expression validity
• Type checking
• …
• The result of semantic analysis is the AST annotated with this extra information
21. A Linguistic Analogy
• Semantic analysis is similar to interpreting the meaning of sentences in natural language
• Joon is hungry. He wants to eat something.
• Tagging the words (proper noun) (be-verb) (adjective) (pronoun) (verb) (to-infinitive) (noun) corresponds to lexing
• Splitting into Sentence 1: (proper noun) (be-verb) (adjective) and Sentence 2: (pronoun) (verb) (to-infinitive) (noun) corresponds to parsing
• The pronoun 'He' refers to Joon: resolving variable references
• Joon and 'He' have the type 'male': resolving type references
• In sentence 2, 'wants' takes a to-infinitive as its object: checking expression validity
• Checking whether the referent of 'He' has the type 'female': type checking
22. Semantic Analysis Overview
Parsing target: ( a = 4 ; b = a + 5 )
AST: Program( CompoundStmt( AssignStmt( DataId("a"), Int(4) ), AssignStmt( DataId("b"), AddExpr( DataId("a"), Int(5) ) ) ) )
AST' after semantic analysis, with annotations:
Program( CompoundStmt(
AssignStmt (type OK)( DataId("a"){id: a, type: int}, Int(4){type: int} ),
AssignStmt (type OK)( DataId("b"){id: b, type: int}, AddExpr{type: int}( DataId("a"){id: a, type: int}, Int(5){type: int} ) ) ) )
23. Contents (section-divider slide; repeats slide 2)
24. Resolving Variable References
• Variables
• Each use of a variable must be connected to its definition
• The connection (binding) is made by variable name
• Example (the use of x binds to its definition):
int x = 3;
int y = x + 1;
• The resolution rules depend on the variable's lifetime and scope
• Example (i is scoped to the loop, so the final printf cannot resolve it):
int sum = 0;
for (int i = 0; i < 10; ++i) {
sum += i;
}
printf("%d\n", i);
25. Symbol Table
• The collection of names (symbols) defined in the source code
• Names are stored per scope, and the scopes form a tree
• Can be implemented with linked lists, hash tables, etc.
int global_a, global_b;
int main () {
int main_a;
if (main_a > 10) {
int if_a;
} else {
int else_a;
}
}
int func() {
int global_a;
if (global_a > 10) {
int if_a;
}
}
Scope tree for the code above:
Global Scope {global_a, global_b}
  main Scope {main_a}
    if1 Scope {if_a}
    else1 Scope {else_a}
  func Scope {global_a}
    if2 Scope {if_a}
26. Symbol Table Implementation
• Example: a hash-table implementation
• Hash function: maps an arbitrary key to a bounded range of values
• e.g. hash(x) = x mod 10
void main() {
int a = 1;
int b = 2;
float c = 3.4;
int d = a + b;
for (int i = 1; i < a; ++i) {
float a = 1.0;
c += a;
}
}
main scope buckets [0 .. n-1]: a -> int, b -> int, c -> float, d -> int
for scope buckets [0 .. n-1]: a -> float, i -> int
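A minimal Python sketch of the scope-chained hash table from slides 25-26 (a dict plays the per-scope hash table; the names mirror the slide's example):

class Scope:
    # One node of the scope tree; lookups walk toward the global scope.
    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}  # name -> type

    def define(self, name, typ):
        self.symbols[name] = typ

    def resolve(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError("undeclared identifier: " + name)

main_scope = Scope()
for name, typ in [("a", "int"), ("b", "int"), ("c", "float"), ("d", "int")]:
    main_scope.define(name, typ)

for_scope = Scope(parent=main_scope)
for_scope.define("i", "int")
for_scope.define("a", "float")  # shadows the outer int a

print(for_scope.resolve("a"))  # float: the inner definition wins
print(for_scope.resolve("c"))  # float: found by walking up to main scope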
27. Scope in OOP
• C++ scopes
• Local scope: declared in a function or block, visible only within that block
• Class scope: visible to class members
• Public: visible within the same namespace scope
• Private: visible only to class members
• Namespace scope: visible within the namespace block
• using: makes names from another namespace visible
• File scope: visible within the file
• #include: makes names from another file visible
• Global scope: visible to the whole program
• Java access modifiers
• public, protected, default, private
28. Contents (section-divider slide; repeats slide 2)
29. Type
• Data types
• Classify data by how its value is to be semantically interpreted
• e.g. the byte 0x65 read as a char is 'e'; read as an int it is 0x65 (101)
• What types consist of
• Basic types
• Integer
• Floating point / fixed point
• String
• Boolean
• Composite types
• Pointer
• Structure, class
• Array, list, map, …
• Polymorphism, generics, …
• Function
30. Type
• Type system
• A collection of rules defining the types of data and expressions
• Examples:
• (int) + (int) -> (int), (float) + (float) -> (float)
• int main(int argc, char** argv) : (int, char**) -> (int)
• An expression whose types do not fit the type rules is an error
• What needs to be done:
• Resolving type references
• Checking type definitions
• Checking expression validity
• Type checking
31. Resolving Type References
• Resolving type definitions
• Defining a function, struct, class, or typedef introduces a new composite type, which is added to the type rules
• Examples:
• int main(int argc, char** argv);
• struct 2dPoint { int x; int y; };
• typedef list<2dPoint> PointList;
• Resolving references to types
• At each variable declaration, resolve the type named in the variable's definition
• e.g. int a, float b, 2dPoint X;
33. Checking Expression Validity
• Catches expressions that satisfy the parsing rules but cannot yield a value:
• Assignment to a non-assignable expression, e.g. 1 = 2+2
• Calling a non-function value as a function, e.g. "string"("%d\n", i)
• Array subscripting with an unsuitable operand, e.g. 1[0]
• Member access on an unsuitable operand, e.g. 1.memb
• Indirect (pointer) member access on an unsuitable operand, e.g. 1->memb
• Dereferencing a non-pointer value, e.g. *1
• Taking the address of a non-lvalue expression, e.g. &1
34. Lvalue and Rvalue
• Expressions are classified as lvalues or rvalues
• Lvalue: a named entity; every variable, including const variables
• Rvalue: a temporary value that does not persist beyond the expression that uses it
https://msdn.microsoft.com/ko-kr/library/f90831hc.aspx
35. Contents (section-divider slide; repeats slide 2)
36. Type Checking
• Verify that expressions and statements have the correct types according to the type rules
• e.g. part of the C type rules:
http://www.cs.arizona.edu/~debray/Teaching/CSc453/DOCS/cminusminusspec.html
• Type-checking methodologies
• Static type checking
• Type inference
37. Static Type Checking
• Using the Visitor pattern, fix the types of the lowest nodes first, then determine the types of the enclosing expressions (bottom-up)
Parsing target: ( int a = 4 ; int b = a + 5 )
Annotated AST: Program( CompoundStmt( AssignStmt( DataId("a"){id: a, type: int}, Int(4){type: int} ), AssignStmt( DataId("b"){id: b, type: int}, AddExpr( DataId("a"){id: a, type: int}, Int(5){type: int} ) ) ) )
• The two operands of AddExpr must have the same type, and AddExpr takes that operand type as its own
• The first operand has int type (fixed during type resolution)
• The second operand has int type (fixed during type resolution)
• Therefore AddExpr passes the type check and has int type
• The two operands of AssignStmt must have the same type
• The first operand has int type (fixed during type resolution)
• The second operand's type is not yet determined (the type check of AddExpr is invoked along the way)
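A bottom-up checker in the spirit of this slide, sketched in Python (the node encoding and error messages are my own; declarations are assumed to have already populated the type environment):

def check(node, env):
    # Returns the node's type, computing the children's types first.
    kind = node[0]
    if kind == "int":                      # ("int", value)
        return "int"
    if kind == "id":                       # ("id", name)
        return env[node[1]]
    if kind == "add":                      # ("add", lhs, rhs)
        lt, rt = check(node[1], env), check(node[2], env)
        if lt != rt:
            raise TypeError("operand types differ: %s vs %s" % (lt, rt))
        return lt                          # AddExpr takes its operands' type
    if kind == "assign":                   # ("assign", name, expr)
        lt, rt = env[node[1]], check(node[2], env)
        if lt != rt:
            raise TypeError("cannot assign %s to %s" % (rt, lt))
        return lt

# ( int a = 4 ; int b = a + 5 ), with the declarations already in env:
env = {"a": "int", "b": "int"}
check(("assign", "a", ("int", 4)), env)                        # int
check(("assign", "b", ("add", ("id", "a"), ("int", 5))), env)  # int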
38. Implicit Type Conversion
• When types mismatch during type checking but the value can be converted to the wider type, an implicit conversion is inserted
http://www.tutorialspoint.com/cprogramming/c_type_casting.htm
Parsing target: ( int a = 4 ; float b = a + 5 )
Annotated AST: Program( CompoundStmt( AssignStmt( DataId("a"){id: a, type: int}, Int(4){type: int} ), AssignStmt( DataId("b"){id: b, type: float}, CastExpr{type: float}( AddExpr{type: int}( DataId("a"){id: a, type: int}, Int(5){type: int} ) ) ) ) )
• The int-typed AddExpr is wrapped in a CastExpr of type float so that the assignment to b type-checks
39. Type Inference
• Infer what type an expression has (top-down)
• If the "conclusion" holds, its "premises" must hold as well.
• Example: inferring the types in 1+a
• Conclusion: 1+a has int type
• Premise 1: 1 has int type
• Premise 2: a has int type
• Useful for languages with few type annotations (JavaScript, Lisp, etc.)
40. Type Inference Notation
• Judgment: ⊢ (Expr) : (Type), e.g. ⊢ e1 : int
• A proposition lists its premises above its conclusion: (premise 1) (premise 2) … (premise n) over (conclusion)
• Numeric (expr -> NUM): ⊢ NUM : int (an axiom, no premises)
• Add (expr -> expr + expr): from ⊢ e1 : int and ⊢ e2 : int, conclude ⊢ e1 + e2 : int
• Assign (expr -> id = expr): from ⊢ id : int and ⊢ e : int, conclude ⊢ id = e : int -> int
• Example, 1 + 2: from ⊢ 1 : int and ⊢ 2 : int, conclude ⊢ 1 + 2 : int
41. Type Inference Example
• What about variables? A type environment Γ is needed: if x is a variable, ⊢ x : ? cannot be concluded on its own, but Γ[x: T] ⊢ x : T
function test() {
var a = 1;
var b = a + 2;
}
• From ⊢ 1 : int, conclude ⊢ a = 1 : int -> int, which extends the environment so that [a: int] ⊢ a : int
• From [a: int] ⊢ a : int and ⊢ 2 : int, conclude [a: int] ⊢ a + 2 : int, hence [a: int] ⊢ b = a + 2 : int -> int and [a: int, b: int] ⊢ b : int
• The environment changes the conclusion: under [a: float] we would get [a: float] ⊢ a : float and ⊢ a = 1 : float -> float, even though ⊢ 1 : int
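Slide 41's environment rule, sketched in a few lines of Python (the expression encoding is my own; Γ is a plain dict):

def infer(expr, gamma):
    # Infers expr's type under the environment gamma (the slide's Γ).
    if isinstance(expr, int):
        return "int"                     # axiom: |- NUM : int
    if isinstance(expr, str):            # a variable name
        return gamma[expr]               # rule: Γ[x: T] |- x : T
    op, lhs, rhs = expr                  # ("+", e1, e2) or ("=", name, e)
    if op == "+":
        t1, t2 = infer(lhs, gamma), infer(rhs, gamma)
        assert t1 == t2 == "int"         # premises of the Add rule
        return "int"
    if op == "=":
        gamma[lhs] = infer(rhs, gamma)   # a binding extends the environment
        return gamma[lhs]

# var a = 1; var b = a + 2;
gamma = {}
infer(("=", "a", 1), gamma)
infer(("=", "b", ("+", "a", 2)), gamma)
print(gamma)  # {'a': 'int', 'b': 'int'}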
43. Summary
• The abstract syntax tree is generated from the parse tree
• The abstract syntax tree is a collection of semantic rules
• Useful for semantic analysis
• Trees are walked with the Visitor pattern or the Listener pattern
• Semantic analysis
• Resolving variable references
• Type references
• Resolving type references and definitions
• Checking type definitions
• Checking expression validity
• Lvalue, Rvalue
• Type checking
• Static type checking
• Type inference
44. Contents (section-divider slide; repeats slide 2)
45. Type Checking
• Classifying type checking by perspective:
• Static vs. dynamic type checking
• Strong vs. weak type systems
• Type safety and memory safety
46. Static/Dynamic Type Checking
• Static type checking
• Type checking performed at compile time
• Dynamic type checking and runtime type information
• Type checking performed at run time
• Extra information available through Runtime Type Information (RTTI)
• Dynamic dispatch, late binding
• Downcasting
• Reflection
• Duck-typing support
• Combined static and dynamic type checking
• Static type checking plus support for dynamic checks
• e.g. C++'s dynamic_cast, Java's instanceof
47. Duck Typing
• At run time, check whether the called operation is actually available for the value's type: if it is, execute it; if not, raise a runtime error
• Supports polymorphism without inheritance
function calculate(a, b, c) => return (a + b)*c
example1 = calculate(1, 2, 3)
example2 = calculate([1], [2, 3], 2)
example3 = calculate('apples ', 'and oranges, ', 3)
print to_string example1   // 9
print to_string example2   // [1, 2, 3, 1, 2, 3]
print to_string example3   // apples and oranges, apples and oranges, apples and oranges,
48. Strong/Weak Type Systems
• Strong typing
• Types are declared and used explicitly
• Type conversions (casts) must be explicit
• Java, C
• Weak typing
• Type conversions happen implicitly
• No type declarations (only an internal type system operates)
• Internally reinterprets types as needed (type punning)
• JavaScript, Python
49. Type Safety, Memory Safety
• Type safety
• Based on the type system's typing rules and conversion rules, the program is guaranteed completeness and soundness
• A type-safe language can prove its type safety through type theory and type inference
• For instance,
• C permits forced conversions such as unsigned int to pointer, so its type system does not guarantee program safety.
• Java is designed to enforce type safety, e.g. by distinguishing primitive types from reference types:
• ArrayList<int>: not allowed
• ArrayList<Integer>: allowed
• Memory safety
• Guarantees that memory references are safe with respect to types
• C's pointer and array references do not guarantee memory safety