Eun-Jae Kim
Mobile Media Processor Lab., Sejong University
ejkim@rayman.sejong.ac.kr
http://rayman.sejong.ac.kr
July 27, 2017 1
▪ Abstract
◦ Improves MCTS by bucketing chance node events and using high-level policy networks.
  ▪ Reduces search complexity (in-tree phase)
  ▪ Improves move selection (rollout phase)
◦ Applied to Hearthstone, the two techniques perform better than the existing approach.
▪ Introduction
◦ The idea comes from how people actually play Hearthstone: they do not consider clearly suboptimal plays and choose only among a few good candidate plays.
◦ Performance is improved with an MCTS variant that uses chance move bucketing, pre-sampling, and high-level rollout policies.
▪ Chance Event Bucketing and Pre-Sampling
◦ Similar chance events are grouped into buckets.
◦ Pre-sampling is performed within each bucket.
  ▪ This reduces the number of chance events that must be searched.
- Square: decision node
- Circle: chance node
- The number of buckets (M) and the number of nodes per bucket (N) depend on the search space and the bucket abstraction.
  Ex) If the state is simple, M is small. If the nodes within a bucket differ greatly, N increases.
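The grouping and pre-sampling steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `bucket_and_presample`, the `key` callback, the card list, and the "kind" labels are all assumptions made for the example. Here M is the number of distinct keys and N is `n_per_bucket`.

```python
import random
from collections import defaultdict


def bucket_and_presample(outcomes, key, n_per_bucket, rng):
    """Group chance outcomes into buckets, then keep at most n_per_bucket samples per bucket."""
    buckets = defaultdict(list)
    for outcome in outcomes:
        buckets[key(outcome)].append(outcome)
    # Pre-sampling: the search only expands the sampled outcomes, not every member.
    return {
        label: rng.sample(members, min(n_per_bucket, len(members)))
        for label, members in buckets.items()
    }


# Hypothetical chance event: drawing one of six cards, each tagged with a made-up kind.
draws = [("A", "minion"), ("B", "minion"), ("C", "minion"),
         ("D", "spell"), ("E", "spell"), ("F", "weapon")]
sampled = bucket_and_presample(draws, key=lambda card: card[1],
                               n_per_bucket=2, rng=random.Random(0))
branches = sum(len(members) for members in sampled.values())
```

With these made-up inputs the chance node keeps 5 branches instead of 6; the saving grows as buckets get larger relative to N.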
▪ Learning High-Level Rollout Policies
◦ Ex) Choosing which card to play is a high-level action; choosing a target for the chosen card is a low-level action.
◦ When choosing a high-level action, the strongest action is selected.
  ▪ (Ex: the one with the largest total attack)
◦ Once a high-level action is chosen, the low-level actions that follow from it are searched.
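The two-level choice described above might look like the sketch below. It assumes a high-level action is a list of cards to play and uses total attack as the strength measure, following the slide's example; the card dictionaries, names, and target lists are hypothetical.

```python
from itertools import product


def strongest_play(candidate_plays):
    """High-level choice: the play whose cards have the largest total attack."""
    return max(candidate_plays, key=lambda play: sum(card["attack"] for card in play))


def low_level_actions(play):
    """Low-level choices: every combination of one target per card in the chosen play."""
    return list(product(*(card["targets"] for card in play)))


# Hypothetical cards; names, attack values and targets are made up.
wolf = {"name": "wolf", "attack": 3, "targets": ["face"]}
bolt = {"name": "bolt", "attack": 2, "targets": ["face", "minion_1"]}
ogre = {"name": "ogre", "attack": 4, "targets": ["face"]}

play = strongest_play([[wolf], [ogre], [wolf, bolt]])  # total attack: 3, 4, 5
options = low_level_actions(play)
```

Fixing the high-level action first means only the targets of the chosen cards have to be enumerated, rather than every card and target pair.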
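▪ Determinized MCTS
◦ Worlds are sampled in advance based on the information currently known.
◦ If the same action is shared across several worlds, the probability of selecting that action rises.
◦ In the end, the most frequently visited move is returned.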
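The final selection step can be sketched as summing visit counts over the sampled worlds. This is an illustration under assumptions: `most_visited_action` is a hypothetical name, the world sampling itself is not shown, and the per-world search is replaced by hard-coded visit counts.

```python
from collections import Counter


def most_visited_action(worlds, search):
    """Run one search per sampled world and return the action with the most total visits."""
    total_visits = Counter()
    for world in worlds:
        total_visits.update(search(world))  # search(world) -> {action: visit count}
    return total_visits.most_common(1)[0][0]


# Stand-in for a real tree search: visit counts are hard-coded per hypothetical world.
fake_results = {
    "world_1": {"attack": 60, "heal": 40},
    "world_2": {"attack": 55, "play_card": 45},
    "world_3": {"heal": 70, "attack": 30},
}
best = most_visited_action(fake_results, fake_results.get)
```

In this toy data "attack" wins because it collects visits in every world, even though "heal" has the single highest count in one of them.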
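▪ Search Time Budget
◦ In a sequence of several actions, the last action ends up with a low visit count.
◦ T * 𝛽 is allocated instead of T. (T is the search time; 𝛽 is…?)
◦ If the visit count is below 𝜓, the remaining (1 - 𝛽) * T is allocated and a new search is started.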
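The budget rule can be written down as a small function. The slide leaves 𝛽 undefined, so this sketch assumes it is a fraction between 0 and 1 that reserves part of T for a possible second search; the function and parameter names are hypothetical.

```python
def split_time_budget(total_time, beta, last_action_visits, psi):
    """Return (first search time, extra search time) for one move decision."""
    first = total_time * beta
    # Only spend the reserved remainder when the last action was visited too rarely.
    extra = (1 - beta) * total_time if last_action_visits < psi else 0.0
    return first, extra


# Assumed values: 8 seconds in total, beta = 0.75, visit threshold psi = 50.
under_visited = split_time_budget(8.0, 0.75, last_action_visits=10, psi=50)
well_visited = split_time_budget(8.0, 0.75, last_action_visits=80, psi=50)
```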
▪ Empirical Chance Event Bucketing
◦ Cards with similar mana costs usually have similar strength, so buckets are split by mana cost.
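A mana-cost bucket key could be as simple as the sketch below. The bucket boundaries (0-2, 3-5, 6 and up) and the deck contents are assumptions chosen for illustration; the slide does not state which cost ranges were used.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical boundaries: costs 0-2 -> bucket 0, 3-5 -> bucket 1, 6 and up -> bucket 2.
BOUNDARIES = (3, 6)


def mana_bucket(cost):
    """Index of the cost range that contains this mana cost."""
    return bisect_right(BOUNDARIES, cost)


def bucket_by_mana(cards):
    """Group (name, mana cost) pairs by their cost range."""
    buckets = defaultdict(list)
    for name, cost in cards:
        buckets[mana_bucket(cost)].append(name)
    return dict(buckets)


deck = [("card_a", 1), ("card_b", 2), ("card_c", 3), ("card_d", 5), ("card_e", 7)]
groups = bucket_by_mana(deck)
```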
▪ Experiments
◦ Silverfish with only DUCT applied was played against the original Silverfish and came out ahead in most cases.
◦ Silverfish with DUCT + CNB (chance node bucketing) beat the original Silverfish with an even more dominant win rate.
▪ Learning High-Level Rollout Policies
◦ Hearthstone card-play decisions in the MCTS rollout phase are learned with a neural network.
▪ Card-Play Policy Networks
◦ The network maps a game state n to a card probability vector.
◦ The goal is to train the network on the card sets played by strong players and then apply it as the high-level rollout policy.
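The output side of such a mapping can be illustrated with a softmax over per-card scores. This is a sketch under assumptions: the slides do not say how the probability vector is produced, so the softmax, the function name, and the scores are all illustrative stand-ins for the trained network's last layer.

```python
import math


def card_probabilities(scores):
    """Softmax: turn one raw score per card into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three cards in some game state n.
probs = card_probabilities([2.0, 0.5, -1.0])
```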
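▪ State Features
◦ Global feature
  ▪ Hero HP, mana available next turn, whether my minions' total damage exceeds the enemy minions' total health, etc.
  ▪ Represented as a 1D vector
◦ Hand feature
  ▪ The cards each player holds in hand and the number of cards.
  ▪ Represented as a 2D vector
◦ Board feature
  ▪ The index of the card that summons each minion and that card's health.
  ▪ Represented as a 3D vector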
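One way to picture the 2D and 3D shapes is with nested lists, as in the sketch below. The axis layout (player, card index, health), the vocabulary size `NUM_CARDS`, and the health cap `MAX_HEALTH` are assumptions for illustration only; the slides give the dimensionality but not the exact encoding.

```python
NUM_CARDS = 4   # hypothetical card vocabulary size
MAX_HEALTH = 3  # hypothetical cap on tracked health values (1..MAX_HEALTH)


def encode_hand(hands):
    """hands: one list of card indices per player -> counts indexed [player][card]."""
    feat = [[0] * NUM_CARDS for _ in hands]
    for player, cards in enumerate(hands):
        for card in cards:
            feat[player][card] += 1
    return feat


def encode_board(boards):
    """boards: one list of (card index, health) per player -> counts indexed [player][card][health - 1]."""
    feat = [[[0] * MAX_HEALTH for _ in range(NUM_CARDS)] for _ in boards]
    for player, minions in enumerate(boards):
        for card, health in minions:
            feat[player][card][min(health, MAX_HEALTH) - 1] += 1
    return feat


hand_feat = encode_hand([[0, 0, 2], [3]])       # 2 players x NUM_CARDS
board_feat = encode_board([[(1, 2)], [(3, 5)]])  # 2 players x NUM_CARDS x MAX_HEALTH
```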
▪ Training Data
◦ Three decks are used (Mech Mage, Handlock, Face Hunter).
◦ 27,000 open-hand games, using 10,000 rollouts per move.
◦ 4 million samples in total.
▪ Network Architecture and Training
◦ 1. CNN + Merge
◦ 2. DNN + Merge
▪ Experiment Setup
◦ GPU: NVIDIA GeForce GTX 860M
  RAM: 8 GB
  Libraries and frameworks: CUDA 7.5 and cuDNN 4, Keras 1.1.2 with Theano 0.8.2, Pythonnet
▪ High-Level Move Prediction Accuracy
◦ CNN + Merge, DNN + Merge, Silverfish, and Greedy are compared by measuring how accurately each predicts the next action.
◦ Silverfish can search up to depth 3.
◦ Greedy uses the heuristic function shown on the slide.
▪ Playing Games
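▪ Incorporating Card-Play Network into DUCT
◦ The rollout function is replaced as shown on the right of the slide.
  ▪ 5 to 10 times slower than a standard rollout, but 10 times faster than the CNN approach.
◦ First, 500 games were played against the DUCT-only model.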
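Since the replacement rollout function is only shown as a figure, the sketch below illustrates one plausible shape for a single rollout step: draw a card from the policy's probability vector, then pick a target uniformly at random. Sampling (rather than taking the most probable card), the function name, and the toy inputs are assumptions, not the paper's procedure.

```python
import random


def rollout_step(card_probs, targets_for, rng):
    """Pick a card according to the policy's probabilities, then a random legal target."""
    cards = list(card_probs)
    weights = [card_probs[card] for card in cards]
    card = rng.choices(cards, weights=weights, k=1)[0]  # high-level action
    targets = targets_for(card)
    target = rng.choice(targets) if targets else None   # low-level action
    return card, target


# Hypothetical policy output and target lists for one state.
policy_output = {"fireball": 0.7, "frostbolt": 0.3}
legal_targets = {"fireball": ["enemy_hero", "enemy_minion"], "frostbolt": ["enemy_hero"]}
card, target = rollout_step(policy_output, legal_targets.get, random.Random(1))
```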
▪ Conclusions
▪ Future work
◦ The approach depends on perfect-information states.
  ▪ Recurrent networks will be tried.
◦ Because of implementation issues, the plan is to switch from Silverfish to Metastone.
Thank you for listening!
http://rayman.sejong.ac.kr

[Paper Review] Improving Hearthstone AI by Learning High-Level Rollout Policies and Bucketing Chance Node Events


Editor's Notes

  1. Thank you for listening to my presentation