AWS CLOUD 2018 - Amazon Neptune, a New Graph Database Service (김상필, Solutions Architect)

Amazon Neptune
A New Graph Database Service
김상필 / AWS Solutions Architect

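Agenda
• Building applications with highly connected data
• Graph types and how to query them
• Social friend recommendation example using a property graph and Apache TinkerPop
• Amazon Neptune, the fully managed graph database, in detail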

Building Applications with Highly Connected Data

Highly Connected Data
Retail fraud detection / Restaurant recommendations / Social networks

Use Cases for Highly Connected Data
• Social networking
• Life sciences
• Network and IT operations
• Fraud detection
• Recommendations
• Knowledge graphs

Relationship-Based Recommendation Systems

Knowledge Graph Applications
• Which artists have works in The Louvre?
• Who painted the Mona Lisa?
• Which museums should Alice visit while she is in Paris?

Navigating the Web of Global Tax Policies
“Our customers are increasingly required to navigate a complex web of global tax policies and regulations. We need an approach to model the sophisticated corporate structures of our largest clients and deliver an end-to-end tax solution. We use a microservices architecture approach for our platforms and are beginning to leverage Amazon Neptune as a graph-based system to quickly create links within the data.”
said Tim Vanderham, chief technology officer, Thomson Reuters Tax & Accounting

Challenges Building Apps with Highly Connected Data
Relational databases struggle to process highly connected data
• Graph queries are unnatural to express
• Graph processing is inefficient
• A fixed schema does not adapt to changing data

A Different Approach to Highly Connected Data
• A structure suited to a business process
• A structure suited to understanding relationships

Graph Databases
Optimized for processing and storing highly connected data

Leading Graph Models and Frameworks
PROPERTY GRAPH
• Open source Apache TinkerPop
• Gremlin Traversal Language
RESOURCE DESCRIPTION FRAMEWORK (RDF)
• W3C standard
• SPARQL Query Language

Challenges with Existing Graph Databases
• Difficult to keep highly available
• Difficult to scale
• Limited support for open standards
• High cost

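AMAZON NEPTUNE: Fully Managed Graph Database (NEW!)
• High performance: query billions of relationships with millisecond latency
• High availability: six replicas across three Availability Zones, with backup and restore
• Powerful queries written easily with Gremlin and SPARQL
• Open graph: Apache TinkerPop and W3C RDF graph models
• Fully managed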

AMAZON NEPTUNE Architecture
[Architecture diagram: bulk load from Amazon S3; database management]

Graph Types and How to Query Them

Property Graph
A property graph is a set of vertices and edges, each with its own properties (i.e., key/value pairs).
• A vertex represents an entity/domain.
• An edge represents a directional relationship between vertices.
• Each edge has a label that denotes the type of relationship.
• Each vertex and edge has a unique identifier.
• Vertices and edges can have properties.
• Properties express non-relational information about the vertices and edges.
[Figure: User (name: Bill) -FRIEND (since 11/29/16)-> User (name: Sarah)]
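The property graph model described above can be sketched as plain data structures. This is only an illustrative in-memory model in Python, not how Neptune or TinkerPop store data; the class names, the `since` property key, and the `out_neighbors` helper are assumptions made for this example, while the ids 1, 2, and 21 follow the Gremlin example in this deck.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int                                  # unique identifier
    label: str                               # entity type, e.g. "User"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int                                  # unique identifier
    label: str                               # relationship type, e.g. "FRIEND"
    out_v: Vertex                            # vertex the edge starts from
    in_v: Vertex                             # vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


def out_neighbors(vertex: Vertex, edges: List[Edge], label: str) -> List[Vertex]:
    """Follow outgoing edges with the given label (hypothetical helper)."""
    return [e.in_v for e in edges if e.out_v is vertex and e.label == label]


bill = Vertex(1, "User", {"name": "Bill"})
sarah = Vertex(2, "User", {"name": "Sarah"})
# "since" is an assumed property key for the date shown in the figure.
friend = Edge(21, "FRIEND", bill, sarah, {"since": "11/29/16"})

names = [v.properties["name"] for v in out_neighbors(bill, [friend], "FRIEND")]
print(names)
```

Because the edge is directional, following FRIEND from Bill reaches Sarah, while following it from Sarah reaches nobody.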

Property Graph and APACHE TINKERPOP
• Apache TinkerPop: an open source graph computing framework for property graphs
• Gremlin: the graph traversal language used to analyze the graph
Amazon Neptune is fully compatible with TinkerPop Gremlin 3.3.0 (the latest version, released August 2017) and provides an optimized query execution engine for the Gremlin query language.

Creating a TINKERPOP Graph
Gremlin (Apache TinkerPop 3.3)
// Connect to Neptune and receive a remote graph, g.
// id and label are the TinkerPop tokens T.id and T.label.
user1 = g.addVertex(id, 1, label, "User", "name", "Bill");
user2 = g.addVertex(id, 2, label, "User", "name", "Sarah");
...
// Add a FRIEND edge from Bill to Sarah with edge id 21.
user1.addEdge("FRIEND", user2, id, 21);
[Figure: User (name: Bill) -FRIEND-> User (name: Sarah)]

RDF Graph
• RDF graphs are described as a collection of triples: subject, predicate, and object.
• Internationalized Resource Identifiers (IRIs) uniquely identify subjects.
• The object can be an IRI or a literal.
• A literal in RDF is like a property; RDF supports the XML data types.
• When the object is an IRI, it forms an "edge" in the graph.
Object as a literal (subject, predicate, object):
<http://www.socialnetwork.com/person#1>
    rdf:type contacts:User ;
    contacts:name "Bill" .
[Figure: the IRI <http://www.socialnetwork.com/person#1> drawn as a User vertex with name: Bill]
Object as an IRI (subject, predicate, object):
<http://www.socialnetwork.com/person#1>
    contacts:friend
    <http://www.socialnetwork.com/person#2> .
[Figure: #1 -FRIEND-> #2]
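As a rough illustration of the triple model, the same three statements can be held as (subject, predicate, object) tuples in Python and split by whether the object is a literal or an IRI. The `IRI`, `Literal`, and `Triple` types and the `CONTACTS` namespace IRI are assumptions for this sketch; the slides only show the `contacts:` prefix, not what it expands to.

```python
from typing import List, NamedTuple, Union


class IRI(str):
    """An Internationalized Resource Identifier (illustrative wrapper type)."""


class Literal(NamedTuple):
    value: str


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


PERSON = "http://www.socialnetwork.com/person#"
CONTACTS = "http://www.socialnetwork.com/contacts#"  # hypothetical namespace
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

triples: List[Triple] = [
    Triple(IRI(PERSON + "1"), RDF_TYPE, IRI(CONTACTS + "User")),
    Triple(IRI(PERSON + "1"), IRI(CONTACTS + "name"), Literal("Bill")),
    Triple(IRI(PERSON + "1"), IRI(CONTACTS + "friend"), IRI(PERSON + "2")),
]

# An IRI object links two resources (an "edge"); a literal object acts like a property.
edges = [t for t in triples if isinstance(t.obj, IRI)]
properties = [t for t in triples if isinstance(t.obj, Literal)]

print(len(edges), "edge-like triples,", len(properties), "property-like triple")
```

Under the rule on the slide, the `rdf:type` statement also counts as edge-like, because its object is an IRI.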

Graph vs. Relational Database Modeling
* Source : http://www.playnexacro.com/index.html#show:article
[Figure: relational model (left) compared with graph model (right)]
Graph model:
Customers (CompanyName: Acme, …) -PURCHASED-> Order (OrderDate: 8/1/2017, …)
Order -HAS_DETAILS-> Order Details (UnitPrice: $179.99, …)
Order Details -HAS_PRODUCT-> Product (ProductName: "Echo", …)
Supplier (CompanyName: "Amazon", …) -SUPPLIES-> Product

Relational Database SQL Query
Find the names of the companies that purchased 'Echo'
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o              /* Join the orders to the customer */
  ON (c.CustomerID = o.CustomerID)
JOIN order_details AS od      /* Join the order details to the order */
  ON (o.OrderID = od.OrderID)
JOIN products AS p            /* Join the products to the order details */
  ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Echo'; /* Find the product named 'Echo' */
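To see the three joins at work, the query can be run against a small in-memory SQLite database from Python. The table layouts are reduced to the columns the query touches, and the sample rows (including the "Globex" customer and the "Kindle" product) are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers     (CustomerID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE orders        (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
CREATE TABLE products      (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE order_details (OrderID INTEGER, ProductID INTEGER);

-- Hypothetical sample data: Acme bought 'Echo' twice, Globex bought 'Kindle'.
INSERT INTO customers     VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders        VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO products      VALUES (100, 'Echo'), (101, 'Kindle');
INSERT INTO order_details VALUES (10, 100), (11, 100), (12, 101);
""")

rows = conn.execute("""
SELECT DISTINCT c.CompanyName
FROM customers AS c
JOIN orders AS o         ON c.CustomerID = o.CustomerID
JOIN order_details AS od ON o.OrderID = od.OrderID
JOIN products AS p       ON od.ProductID = p.ProductID
WHERE p.ProductName = 'Echo'
""").fetchall()

companies = [row[0] for row in rows]
print(companies)
conn.close()
```

Each hop in the relationship chain costs one join, which is the pattern the graph queries on the following slides express as edge traversals instead.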

SPARQL DECLARATIVE Graph Query
* Source : http://www.playnexacro.com/index.html#show:article
PREFIX sales_db: <http://sales.widget.com/>
SELECT DISTINCT ?comp_name WHERE {
  ?customer sales_db:HAS_ORDER ?order ;          # customer graph pattern
            sales_db:CompanyName ?comp_name .
  ?order sales_db:HAS_DETAILS ?order_d .         # orders graph pattern
  ?order_d sales_db:HAS_PRODUCT ?product .       # order details graph pattern
  ?product sales_db:ProductName "Echo" .         # products graph pattern
}
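"Declarative" here means the query only lists triple patterns with variables, and the engine finds every binding that satisfies all of them. A toy matcher in Python shows the idea; it is a naive nested-loop sketch, not how a SPARQL engine such as Neptune's evaluates queries, and the short identifiers (`c1`, `o1`, ...) and sample triples are assumptions.

```python
from typing import Dict, Iterator, List, Optional, Tuple

TriplePattern = Tuple[str, str, str]

# Hypothetical data: Acme ordered an Echo, Globex ordered a Kindle.
triples: List[TriplePattern] = [
    ("c1", "CompanyName", "Acme"),   ("c1", "HAS_ORDER", "o1"),
    ("o1", "HAS_DETAILS", "d1"),     ("d1", "HAS_PRODUCT", "p1"),
    ("p1", "ProductName", "Echo"),
    ("c2", "CompanyName", "Globex"), ("c2", "HAS_ORDER", "o2"),
    ("o2", "HAS_DETAILS", "d2"),     ("d2", "HAS_PRODUCT", "p2"),
    ("p2", "ProductName", "Kindle"),
]

# Terms starting with "?" are variables, as in SPARQL.
patterns: List[TriplePattern] = [
    ("?customer", "HAS_ORDER", "?order"),
    ("?customer", "CompanyName", "?comp_name"),
    ("?order", "HAS_DETAILS", "?order_d"),
    ("?order_d", "HAS_PRODUCT", "?product"),
    ("?product", "ProductName", "Echo"),
]


def match(patterns: List[TriplePattern],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, candidate)


comp_names = sorted({b["?comp_name"] for b in match(patterns)})
print(comp_names)
```

The order of the patterns does not change the result, only how much work the naive matcher does.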

GREMLIN IMPERATIVE Graph Traversal
/* All products named 'Echo' */
g.V().hasLabel('Product').has('ProductName','Echo')
  .in('HAS_PRODUCT')               /* Traverse to order details */
  .in('HAS_DETAILS')               /* Traverse to order */
  .in('HAS_ORDER')                 /* Traverse to customer */
  .values('CompanyName').dedup()   /* Unique company names */
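The imperative style spells out each hop. A minimal Python sketch over an in-memory edge list mirrors the traversal step by step; the `in_step` helper, the vertex keys, and the sample data are assumptions for illustration and are unrelated to any TinkerPop client API.

```python
from typing import Dict, List, Tuple

# (source vertex, edge label, target vertex); hypothetical sample data.
edges: List[Tuple[str, str, str]] = [
    ("acme", "HAS_ORDER", "order1"),   ("order1", "HAS_DETAILS", "detail1"),
    ("detail1", "HAS_PRODUCT", "echo"),
    ("acme", "HAS_ORDER", "order3"),   ("order3", "HAS_DETAILS", "detail3"),
    ("detail3", "HAS_PRODUCT", "echo"),
    ("globex", "HAS_ORDER", "order2"), ("order2", "HAS_DETAILS", "detail2"),
    ("detail2", "HAS_PRODUCT", "kindle"),
]

props: Dict[str, Dict[str, str]] = {
    "acme":   {"label": "Customer", "CompanyName": "Acme"},
    "globex": {"label": "Customer", "CompanyName": "Globex"},
    "echo":   {"label": "Product", "ProductName": "Echo"},
    "kindle": {"label": "Product", "ProductName": "Kindle"},
}


def in_step(vertices: List[str], label: str) -> List[str]:
    """Like Gremlin's in(label): follow incoming edges back to their sources."""
    return [src for v in vertices
            for (src, lbl, dst) in edges if dst == v and lbl == label]


# Start at all products named 'Echo', then walk back one edge type at a time.
products = [v for v, p in props.items()
            if p.get("label") == "Product" and p.get("ProductName") == "Echo"]
details = in_step(products, "HAS_PRODUCT")    # order details
orders = in_step(details, "HAS_DETAILS")      # orders
customers = in_step(orders, "HAS_ORDER")      # customers (may repeat)

# dedup(): keep the first occurrence of each company name.
company_names = list(dict.fromkeys(props[c]["CompanyName"] for c in customers))
print(company_names)
```

Acme is reached twice, once per order, which is why the traversal ends with a dedup step.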

Social Network Friend Recommendation Example Using a Property Graph and Apache TinkerPop

TRIADIC CLOSURE – CLOSING TRIANGLES
[Figure: Terry -FRIEND- Bill -FRIEND- Sarah; a third FRIEND edge between Terry and Sarah closes the triangle]

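Recommending New Connections
[Figure: Terry]

Current Friendships
[Figure: Terry -FRIEND- Bill]

Meaning and Motivation
[Figure: Terry, Bill, and Sarah connected by two FRIEND edges]

Recommendation
[Figure: Terry, Bill, and Sarah connected by two FRIEND edges]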

Recommending New Connections
g = graph.traversal()
g.V().has('name','Terry').as('user').
  both('FRIEND').aggregate('friends').
  both('FRIEND').
  where(neq('user')).where(without('friends')).
  groupCount().by('name').
  order(local).by(values, decr)

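Find TERRY
g.V().has('name','Terry').as('user').

Find TERRY's friends
both('FRIEND').aggregate('friends').

Then find the friends of those friends
both('FRIEND').
[Figure: user -FRIEND- friend -FRIEND- fof]

... who are neither TERRY nor TERRY's friends
where(neq('user')).where(without('friends')).
[Figure: user -FRIEND- friend -FRIEND- fof, with the excluded vertices marked X]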
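The same friend-of-friend logic can be sketched in plain Python over an adjacency map, which may help when reading the traversal steps. This illustrates the algorithm only, not a Neptune client; the `recommend` function and the extra people (Dana, Omar) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

# Undirected FRIEND relationships; Dana and Omar are made-up sample data.
friendships = [
    ("Terry", "Bill"), ("Bill", "Sarah"),
    ("Terry", "Dana"), ("Dana", "Sarah"), ("Dana", "Omar"),
]

friends_of: DefaultDict[str, Set[str]] = defaultdict(set)
for a, b in friendships:
    friends_of[a].add(b)   # both('FRIEND') ignores edge direction,
    friends_of[b].add(a)   # so store each friendship both ways


def recommend(user: str) -> List[Tuple[str, int]]:
    """Rank friends-of-friends by how many mutual friends they share with user."""
    friends = friends_of[user]                       # the user's current friends
    counts: Counter = Counter()
    for friend in friends:
        for fof in friends_of[friend]:               # friends of each friend
            if fof != user and fof not in friends:   # drop the user and existing friends
                counts[fof] += 1                     # groupCount().by('name')
    return counts.most_common()                      # highest count first


recommendations = recommend("Terry")
print(recommendations)
```

Sarah ranks first because she closes two triangles with Terry (through Bill and through Dana), while Omar closes only one.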

Amazon Neptune in Detail: The Fully Managed Graph Database

Fully Managed Service: Benefits
• Easy to configure from the console
• Multi-AZ high availability
• Up to 15 read replicas
• Encryption at rest
• Encryption in transit (TLS)
• Backup and restore, with point-in-time recovery

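AMAZON NEPTUNE: VPC Deployment
• Secure deployment inside a VPC
• Higher availability by deploying into different subnets across multiple Availability Zones
• The cluster volume always spans three Availability Zones, giving highly durable storage
• VPC configuration details: see the Amazon Neptune Documentation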

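Cloud-Native Storage Engine Overview
• Data is replicated six times across three Availability Zones
• Continuous backup to Amazon S3 (designed for 11 9s of durability)
• Nodes and disks are continuously monitored for repair
• 10 GB segments are the unit of repair and hotspot rebalancing
• A quorum system for reads and writes keeps latency low
• Quorum membership changes do not affect writes
• The storage volume grows automatically up to 64 TB
[Figure: Amazon Neptune on top of six storage nodes spread across AZ 1, AZ 2, and AZ 3, with storage monitoring and Amazon S3]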

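AMAZON NEPTUNE High Availability and Reliability
Possible failures
• Segment failure (disk)
• Node failure (machine)
• AZ failure (network and data center)
Optimizations
• 4 out of 6 write quorum
• 3 out of 6 read quorum
• Peer-to-peer replication for repair
[Figure: two diagrams of Amazon Neptune with caching, over storage in AZ 1, AZ 2, and AZ 3]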
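The quorum sizes can be checked with a little arithmetic. The sketch below assumes the six copies are spread evenly, two per Availability Zone, which the slides imply but do not state outright; the function names are illustrative.

```python
COPIES = 6          # six copies of the data (from the slides)
AZ_COUNT = 3        # three Availability Zones (from the slides)
WRITE_QUORUM = 4    # 4 out of 6
READ_QUORUM = 3     # 3 out of 6
COPIES_PER_AZ = COPIES // AZ_COUNT   # assumption: even spread, 2 per AZ


def can_write(failed_copies: int) -> bool:
    """Writes succeed while at least WRITE_QUORUM copies remain."""
    return COPIES - failed_copies >= WRITE_QUORUM


def can_read(failed_copies: int) -> bool:
    """Reads succeed while at least READ_QUORUM copies remain."""
    return COPIES - failed_copies >= READ_QUORUM


# Any read quorum overlaps any write quorum, so a read sees the latest write.
reads_see_latest_write = READ_QUORUM + WRITE_QUORUM > COPIES
# Any two write quorums overlap, so two conflicting writes cannot both succeed.
writes_cannot_conflict = 2 * WRITE_QUORUM > COPIES

scenarios = {
    "one node lost": 1,
    "one AZ lost": COPIES_PER_AZ,
    "one AZ plus one node lost": COPIES_PER_AZ + 1,
}
for name, failed in scenarios.items():
    print(f"{name}: write={can_write(failed)}, read={can_read(failed)}")
```

Under these assumptions, losing a whole AZ still leaves four copies, enough for both reads and writes; losing an AZ plus one more node leaves three, so reads continue while writes wait for repair.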

AMAZON NEPTUNE Read Replicas
Availability
• Failed database nodes are automatically detected and recovered
• Failed database processes are automatically detected and restarted
• A read replica is automatically promoted to primary when needed (failover)
• The failover order can be customized
Performance
• Applications can spread read traffic across the read replicas
• Read load is balanced across the read replicas
[Figure: a primary (master) node and read replicas across AZ 1, AZ 2, and AZ 3, with cluster and instance monitoring]
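One simple way an application might spread reads is to send writes to the primary and rotate reads across the replicas. The sketch below is a hypothetical client-side router; the class, its method, and the endpoint host names are all made up for illustration and are not part of any Neptune SDK.

```python
from itertools import cycle
from typing import List


class ClusterRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # With no replicas, reads fall back to the primary.
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, is_write: bool) -> str:
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)   # round-robin over the read replicas


# Placeholder host names, not real endpoints.
router = ClusterRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)

read_targets = [router.endpoint_for(is_write=False) for _ in range(4)]
write_target = router.endpoint_for(is_write=True)
print(read_targets, write_target)
```

Round-robin is only one possible policy; a real application would also need to refresh this list after a failover promotes a replica.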

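AMAZON NEPTUNE Fast Failover (typically <30 seconds)
Running a replica-aware application
[Timeline: database failure, failure detection, DNS propagation, recovery, application running; durations shown: 15-20 sec and 3-10 sec]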

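AMAZON NEPTUNE Continuous Backup
• Periodic snapshots of each segment are taken in parallel, and logs are streamed to Amazon S3
• Backup runs continuously, with no impact on performance or availability
• On restore, the appropriate segment snapshots and log streams are retrieved to the storage nodes
• Log streams are applied in parallel and asynchronously
[Figure: timeline of segment snapshots and log records for Segment 1, Segment 2, and Segment 3, up to the recovery point]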
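Restoring from a snapshot plus a log stream can be sketched for a single segment: pick the latest snapshot at or before the recovery point, then replay the log records that fall between the two. The key/value state, the sequence numbers, and the `restore` function are assumptions for illustration; the real storage format is not described in the slides.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# (log sequence number, segment state at that point); hypothetical data.
snapshots: List[Tuple[int, Dict[str, int]]] = [
    (0, {}),
    (100, {"a": 1, "b": 2}),
]

# (log sequence number, key, new value), in sequence order.
log: List[Tuple[int, str, int]] = [
    (50, "a", 1), (90, "b", 2), (120, "a", 5), (150, "c", 7), (180, "b", 9),
]


def restore(recovery_point: int) -> Dict[str, int]:
    """Rebuild one segment's state as of recovery_point."""
    # 1. Latest snapshot taken at or before the recovery point.
    snapshot_lsns = [lsn for lsn, _ in snapshots]
    index = bisect_right(snapshot_lsns, recovery_point) - 1
    snapshot_lsn, snapshot_state = snapshots[index]

    # 2. Replay log records after the snapshot, up to the recovery point.
    state = dict(snapshot_state)
    for lsn, key, value in log:
        if snapshot_lsn < lsn <= recovery_point:
            state[key] = value
    return state


print(restore(160))
print(restore(95))
```

Because each segment has its own snapshots and log records, this replay can run for every segment independently, which is what allows the parallel restore described above.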

AMAZON NEPTUNE Online Point-in-Time Restore
Online point-in-time restore rewinds the database to a specific point in time without restoring from a backup.
• Rewind the database quickly
• Rewind several times to reach the desired database state
[Figure: timeline t0 to t4; after "Rewind to t1" and "Rewind to t3", the rewound intervals become invisible]

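Checkpoint
• Amazon Neptune is a fully managed graph database
• Supports the Apache TinkerPop and W3C RDF graph models
• Supports the Gremlin and SPARQL query languages
• Easy to configure from the console
• Multi-AZ high availability
• Up to 15 read replicas
• Encryption at rest and in transit (TLS)
• Backup and restore, with point-in-time recovery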

After This Session…
1. AWS re:Invent 2017: NEW LAUNCH! Amazon Neptune Overview and Customer Use Cases (DAT319) - https://youtu.be/9pmQXua9LWA
2. AWS re:Invent 2017: NEW LAUNCH! Deep dive on Amazon Neptune (DAT318) - https://youtu.be/6o1Ezf6NZ_E
3. Sign up for the Amazon Neptune preview - https://pages.awscloud.com/NeptunePreview.html

Thank you for joining us.
