Multithread programming 20151206_Jintaek Seo (서진택)

Multithreaded Programming:
01. Threading Basics
October 8, 2014
Jintaek Seo (서진택), jintaeks@gmail.com

Contents
• Statements and expressions
• rvalue references
• Move semantics
• Threads

Statements and Expressions

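• A part of a statement that has a value is called an expression.
• If an entire statement has a value, that statement is an expression (an expression statement).
• An expression always appears inside some statement; followed by a semicolon, it forms an expression statement.
– i = i + 4;
– In the statement above, the expressions are:
– i
– 4
– i+4
– i=i+4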

In the source below, two of the statements are expressions.
#include <cstdio>
#include <iostream>

int GetSum( int iLeft_, int iRight_ )
{
    return iLeft_ + iRight_;
}//GetSum()
void Test( const int& iData_ )
{
    std::cout << __FUNCTION__ << " const reference" << std::endl;
}//Test()
int main()
{
    int i = 3;
    int j = 0;
    j = GetSum( 4, 5 ); // This statement has the value 9, so it is an expression.
    if( j == 9 )
    {
        printf( "%i\r\n", j ); // printf() returns 3 (it writes "9\r\n"), so this statement is an expression.
    }//if
    Test( i );
}

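lvalue/rvalue of an expression
int main()
{
    int i = 3;
    int j = 0;
    i = i + 4; //(1)
    Test( i );
}
Assume the address of i is [5000]. In a 32-bit environment, j would then typically be at [4996].
Notice that for the same variable i, when i is used on the left of the = sign it is interpreted as [5000], and when it is used on the right it is interpreted as the value stored at [5000], namely 3.
In other words, the value that the variable i evaluates to in an expression depends on the context in which it is used.
An expression has a value: the value it must have when it appears on the left of = is called its lvalue, and the value it must have when it appears on the right is called its rvalue.
Keep in mind: lvalue/rvalue do not apply to statements or variables. lvalue/rvalue are values that an expression has.
In statement (1), the expression i on the left of = has an lvalue, and that lvalue is [5000]. The expression i+4 on the right has an rvalue, and that rvalue is 7.
In the statement i=i, the expression i on the left has an lvalue whose value is [5000]; the i on the right has an rvalue whose value is 3.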
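To make the lvalue/rvalue distinction above concrete, here is a minimal C++11 sketch that lets the compiler itself classify the expressions in i = i + 4;. It relies on the rule that decltype((expr)) yields T& for an lvalue expression and plain T for a prvalue; the function name add_four_in_place is a hypothetical illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: checks the value category of each expression in
// "i = i + 4;". decltype((expr)) yields T& for an lvalue and plain T for a prvalue.
inline int add_four_in_place()
{
    int i = 3;
    static_assert(std::is_lvalue_reference<decltype((i))>::value,
                  "i names a memory location, so it is an lvalue");
    static_assert(!std::is_reference<decltype((i + 4))>::value,
                  "i + 4 has no name, so it is a prvalue (an rvalue)");
    static_assert(std::is_lvalue_reference<decltype((i = i + 4))>::value,
                  "built-in assignment yields its left operand as an lvalue");
    // decltype does not evaluate its operand, so i is still 3 here.
    i = i + 4;  // lvalue i on the left, rvalue i + 4 (value 7) on the right
    return i;
}
```

add_four_in_place() returns 7; if any of the three classifications were wrong, the code would not compile at all.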

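Passing an rvalue as a function argument
#include <cstdio>

void Test2( int i, int j )
{
    printf( "%p, %p\r\n", (void*)&i, (void*)&j );
}//Test2()
int main()
{
    int i = 3;
    int j = 0;
    Test2( 5, i );
}//main()
When an expression's rvalue is passed to a function that takes the parameter by value, the rvalue property is not preserved inside the called function.
Although main() passes the rvalue 5 when it calls Test2(), inside Test2() that value can be referred to by the name i (and its address &i can be taken).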

lvalue reference
void Test2( int i, int& j )
{
}//Test2()
int main()
{
    int i = 3;
    int j = 0;
    Test2( i, i );
}//main()
If Test2() takes its second parameter by (lvalue) reference, then in main()'s call Test2( i, i ), the same expression i passes the rvalue 3 as the first argument and the lvalue [5000] as the second.
This is because Test2( …, int& j ) requires an lvalue reference for its second parameter.
Calling Test2( i, 5 ); causes a compile-time error, because the expression 5 has no lvalue.
Expressions without a name, such as 5+3 or i+5, have only an rvalue.
Test2( i, 5 ); // error

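rvalues are stored somewhere, too
void Test2( int i, const int& j )
{
}//Test2()
int main()
{
    int i = 3;
    int j = 0;
    Test2( i, 5 );
}//main()
However, if the function takes a const lvalue reference, an expression that has only an rvalue can be passed as the argument.
This works because the code the compiler generates stores the value of the argument expression in a temporary memory location that the user cannot name, and passes that location to the function.
From main(), the programmer has no ordinary way to find out that address ([4992] in this example).
Inside Test2(), j == 5 and &j yields the address of that hidden temporary, but *j is an error, because j is a reference, not a pointer.
If the compiler is already passing the address of an rvalue, couldn't the programmer deliberately ask for the rvalue's address to be passed along? That is what an rvalue reference is.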

rvalue reference, std::move

rvalue reference
• A function parameter can be declared to take an rvalue reference.
• Put && between the type name and the parameter name.
– int&& iData_
• Then, when the argument expression is an rvalue, the compiler generates code that passes the address of that rvalue.

rvalue reference
#include <iostream>

void Test( int& iData_ )
{
    std::cout << __FUNCTION__ << " reference" << std::endl;
}//Test()
void Test( const int& iData_ )
{
    std::cout << __FUNCTION__ << " const reference" << std::endl;
}//Test()
void Test( int&& iData_ )
{
    std::cout << __FUNCTION__ << " rvalue reference" << std::endl;
}//Test()
int main()
{
    int i = 3;
    int j = 0;
    Test( 5 );
    Test( j == 5 );
    Test( j );
    /** output:
    Test rvalue reference
    Test rvalue reference
    Test reference
    */
}

Passing an rvalue parameter on as an rvalue
#include <iostream>

void Hello( int& iData_ )
{
    std::cout << "Hello reference" << std::endl;
}//Hello()
void Hello( int&& iData_ )
{
    std::cout << "Hello rvalue reference" << std::endl;
}//Hello()
void Test( int&& iData_ )
{
    Hello( iData_ ); // calls Hello( int& )
}//Test()
Test() receives the rvalue reference iData_, but inside Test(), iData_ can now be referred to by name (it is a named variable whose address is &iData_), so the expression iData_ is no longer an rvalue.
Therefore the call Hello( iData_ ); inside Test() calls Hello( int& iData_ ), the overload that takes an lvalue reference.

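#include <iostream>
#include <type_traits>

void Hello( int& iData_ )
{
    std::cout << "Hello reference" << std::endl;
}//Hello()
void Hello( int&& iData_ )
{
    std::cout << "Hello rvalue reference" << std::endl;
}//Hello()
template<typename T>
typename std::remove_reference<T>::type&& MyMove( T&& t_ )
{
    // typename is required because the type depends on T
    return static_cast<typename std::remove_reference<T>::type&&>(t_);
}//MyMove()
void Test( int&& iData_ )
{
    Hello( MyMove( iData_ ) ); // calls Hello( int&& )
}//Test()
To pass an rvalue reference that a function received on to another function as an rvalue reference, you make the compiler cast it explicitly.
MyMove() casts its argument to an rvalue reference and returns it.
Therefore, of the two overloaded Hello() functions, Hello( int&& ) can be called.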

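std::move
#include <iostream>
#include <utility>

void Hello( int& iData_ )
{
    std::cout << "Hello reference" << std::endl;
}//Hello()
void Hello( int&& iData_ )
{
    std::cout << "Hello rvalue reference" << std::endl;
}//Hello()
void Test( int&& iData_ )
{
    Hello( std::move( iData_ ) ); // calls Hello( int&& )
}//Test()
std::move() is the standard-library implementation of this cast: it preserves the rvalue-ness of an rvalue reference parameter.
Remember this:
to pass an rvalue reference parameter on to another function as an rvalue reference, you must use std::move().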

Move Semantics

Move Semantics
• A class that needs a deep copy must provide a copy constructor and a copy assignment operator.
• When a container stores such objects as nodes, inserting a node invokes the copy constructor and the copy assignment operator on temporary objects.
• A temporary object has no name, so it is an rvalue.
• If the class's constructor is made to behave specially for such rvalues (for example, by taking over the temporary's resources instead of copying them), copying temporaries becomes cheaper.
• This is called move semantics.
• Move semantics is implemented with a move constructor and a move assignment operator.

class MemoryBlock
class MemoryBlock
{
public:
    // Simple constructor that initializes the resource.
    explicit MemoryBlock(size_t length);
    // Destructor.
    ~MemoryBlock();
    // Copy constructor.
    MemoryBlock(const MemoryBlock& other);
    // Copy assignment operator.
    MemoryBlock& operator=(const MemoryBlock& other);
    // Retrieves the length of the data resource.
    size_t Length() const;
private:
    size_t _length; // The length of the resource.
    int* _data; // The resource.
};

MemoryBlock: constructor and destructor
// Simple constructor that initializes the resource.
explicit MemoryBlock(size_t length)
    : _length(length)
    , _data(new int[length])
{
    std::cout << "In MemoryBlock(size_t). length = "
              << _length << "." << std::endl;
}

// Destructor.
~MemoryBlock()
{
    std::cout << "In ~MemoryBlock(). length = "
              << _length << ".";

    if (_data != NULL)
    {
        std::cout << " Deleting resource.";
        // Delete the resource.
        delete[] _data;
    }

    std::cout << std::endl;
}

explicit is needed so that the constructor is not called implicitly when an integer is assigned to a MemoryBlock (for example, MemoryBlock b = 25; does not compile).

MemoryBlock: copy constructor and copy assignment operator
// Copy constructor.
MemoryBlock(const MemoryBlock& other)
    : _length(other._length)
    , _data(new int[other._length])
{
    std::cout << "In MemoryBlock(const MemoryBlock&). length = "
              << other._length << ". Copying resource." << std::endl;
    std::copy(other._data, other._data + _length, _data);
}
// Copy assignment operator.
MemoryBlock& operator=(const MemoryBlock& other)
{
    std::cout << "In operator=(const MemoryBlock&). length = "
              << other._length << ". Copying resource." << std::endl;
    if (this != &other)
    {
        // Free the existing resource.
        delete[] _data;
        _length = other._length;
        _data = new int[_length];
        std::copy(other._data, other._data + _length, _data);
    }
    return *this;
}

MemoryBlock: move constructor
// Move constructor.
MemoryBlock(MemoryBlock&& other) noexcept
    : _length(0)
    , _data(NULL)
{
    std::cout << "In MemoryBlock(MemoryBlock&&). length = "
              << other._length << ". Moving resource." << std::endl;
    //*this = std::move( other );
    // Copy the data pointer and its length from the
    // source object.
    _data = other._data;
    _length = other._length;
    // Release the data pointer from the source object so that
    // the destructor does not free the memory multiple times.
    other._data = NULL;
    other._length = 0;
}

MemoryBlock: move assignment operator
// Move assignment operator.
MemoryBlock& operator=(MemoryBlock&& other) noexcept
{
    std::cout << "In operator=(MemoryBlock&&). length = "
              << other._length << "." << std::endl;
    if (this != &other)
    {
        // Free the existing resource.
        if( _data != nullptr )
            delete[] _data;
        // Copy the data pointer and its length from the
        // source object.
        _data = other._data;
        _length = other._length;
        // Release the data pointer from the source object so that
        // the destructor does not free the memory multiple times.
        other._data = NULL;
        other._length = 0;
    }
    return *this;
}

MemoryBlock: move constructor (implemented with the move assignment operator)
// Move constructor.
MemoryBlock(MemoryBlock&& other) noexcept
    : _length(0)
    , _data(NULL)
{
    std::cout << "In MemoryBlock(MemoryBlock&&). length = "
              << other._length << ". Moving resource." << std::endl;
    *this = std::move( other );
}

MemoryBlock: main()
#include <vector>

int main()
{
    // Create a vector object and add a few elements to it.
    std::vector<MemoryBlock> v;
    v.push_back(MemoryBlock(25));
    v.push_back(MemoryBlock(75));
    // Insert a new element into the second position of the vector.
    v.insert(v.begin() + 1, MemoryBlock(50));
}

Output in VS2010 or later (excerpt):
In MemoryBlock(size_t). length = 25.
In MemoryBlock(MemoryBlock&&). length = 25. Moving resource.
In ~MemoryBlock(). length = 0.
In operator=(MemoryBlock&&). length = 75.
In operator=(MemoryBlock&&). length = 50.
In ~MemoryBlock(). length = 25. Deleting resource.

Output before VS2010 (excerpt):
In MemoryBlock(const MemoryBlock&). length = 25. Copying resource.
In operator=(const MemoryBlock&). length = 75. Copying resource.
In operator=(const MemoryBlock&). length = 50. Copying resource.
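One caveat when reproducing these outputs on newer compilers: as far as I know, std::vector moves elements during reallocation only when the move constructor is noexcept (it uses std::move_if_noexcept) and otherwise falls back to copying. The sketch below uses a hypothetical Tracked type that simply counts copies and moves to show the effect; the names Tracked and copies_after_push_back are illustrative assumptions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type that counts how often it is copied and moved.
struct Tracked
{
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    // noexcept lets std::vector move (instead of copy) elements when it reallocates.
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Pushes n temporaries and returns how many copies happened along the way.
inline int copies_after_push_back(int n)
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    std::vector<Tracked> v;
    for (int k = 0; k < n; ++k)
        v.push_back(Tracked());  // a temporary is an rvalue -> Tracked(Tracked&&)
    return Tracked::copies;
}
```

With noexcept in place, copies_after_push_back(100) returns 0; removing noexcept typically makes the count non-zero, because every reallocation then copies the existing elements.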

Threads

Threads
• A thread is a unit of execution flow within a process.
• Most online games running on Windows execute a single thread that starts from WinMain().
• Other threads can be created and run as needed.
• A process that runs two or more threads is a multithreaded program.
• Critical section
• Mutex
• Semaphore
• TLS (Thread Local Storage)

Thread implementations
• Win32 implementation: CreateThread()
• Microsoft C runtime implementation: _beginthreadex()
• Standard library implementation: std::thread
• Boost implementation: boost::thread

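Critical Section
• Two threads can execute the same routine.
• If that routine contains a block of code that must not be executed by more than one thread at the same time, that block is called a critical section.
• An object that controls entry into and exit from a critical section is called a mutex.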

Mutex
• Short for mutual exclusion.
• Usually a synchronization object provided by the operating system.
• Provides lock() and unlock(); the code block between lock() and unlock() is the critical section.
• The thread that called lock() must be the one that calls unlock().

Mutex
• Win32 implementation: CRITICAL_SECTION
– EnterCriticalSection()
– LeaveCriticalSection()
– InitializeCriticalSection()
– DeleteCriticalSection()
• Standard implementation: std::mutex
– std::lock_guard is the RAII helper for std::mutex.
• Boost implementation: boost::mutex
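A minimal sketch of the critical-section idea with std::mutex and its RAII helper std::lock_guard: several threads increment one shared counter, and the lock_guard scope is the critical section. The function name count_with_mutex and the thread/iteration counts are illustrative assumptions.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: increments a shared counter from several threads.
inline long count_with_mutex(int num_threads, int increments_per_thread)
{
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
    {
        workers.emplace_back([&] {
            for (int k = 0; k < increments_per_thread; ++k)
            {
                std::lock_guard<std::mutex> lock(m);  // lock() in the constructor
                ++counter;                            // critical section
            }                                         // unlock() in the destructor
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Without the lock_guard, the ++counter read-modify-write sequences of different threads can interleave and lose updates; with it, count_with_mutex(4, 10000) always returns 40000.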

Semaphore
• Sometimes thread B has to wait until thread A finishes its work.
• An object that provides this kind of synchronization between threads is called a semaphore.
• It usually provides a Signal() and a Wait() interface.

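Semaphore implementations
• In Win32, this kind of signaling is usually done with an Event object (Win32 also provides a true counting semaphore through CreateSemaphore()).
– CreateEvent()
– CloseHandle()
– SetEvent()
– ResetEvent()
– WaitForSingleObject()
• In the standard library, std::condition_variable (used together with a std::mutex) provides this wait/notify signaling; C++20 adds std::counting_semaphore.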
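Here is a minimal sketch of the Signal()/Wait() interface described above, built on std::mutex and std::condition_variable. The Semaphore class and wait_for_worker() are hypothetical names for illustration, not a standard API; C++20 code could use std::counting_semaphore instead.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical counting semaphore with Signal()/Wait().
class Semaphore
{
public:
    explicit Semaphore(int count = 0) : count_(count) {}

    void Signal()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });  // the predicate handles spurious wakeups
        --count_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Thread B (the calling thread) waits until thread A has finished its work.
inline int wait_for_worker()
{
    Semaphore done;
    int result = 0;
    std::thread a([&] {
        result = 42;    // A's work
        done.Signal();  // tell B that A is finished
    });
    done.Wait();        // B blocks here until A signals
    int seen = result;  // the mutex inside Semaphore orders A's write before this read
    a.join();
    return seen;
}
```

The count is only read and changed while holding the mutex, so a Signal() that happens before Wait() is not lost: Wait() simply finds count_ > 0 and returns immediately.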

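• Deadlock
– A situation in which threads wait for each other forever, so none of them can enter the critical section.
– Unlike a crash, the program does not terminate.
– But it cannot make any progress.
• Race condition
– A situation in which threads compete to enter a critical section (or to access shared data), so that the outcome depends on their timing.
• Starvation
– A situation in which, under such contention, one particular thread never gets to enter the critical section.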

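Thread models
• Each thread performs its own dedicated job.
– Most game engines use this model.
– Threads communicate through message queues.
• An arbitrary number n of threads perform different tasks.
– This is called task parallelism.
– This model is scalable.
– That is, when the number of cores increases, you simply increase the number of threads.
• An arbitrary number n of threads perform the same task on different regions of data.
– This is called data parallelism.
– This model is also scalable.
• Recent threading libraries … Data Parallelism and …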
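As an illustration of the data-parallel model, the sketch below has every thread run the same code (std::accumulate) on a different slice of a vector and then combines the partial sums. The function name parallel_sum and the chunking scheme are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical data-parallel sum: same work, different data region per thread.
inline long long parallel_sum(const std::vector<int>& data, unsigned num_threads)
{
    if (num_threads == 0) num_threads = 1;
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<long long> partial(num_threads, 0);  // one slot per thread: nothing is shared
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
    {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Scaling to more cores only means passing a larger num_threads (for example std::thread::hardware_concurrency()); the per-thread code does not change.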

TLS, Thread Local Storage
• A variable can be declared so that a function appears to access one shared variable while each thread actually accesses its own private copy.
• It is declared like a global variable, but every thread has a distinct instance.
• Such a variable is called TLS.
• Win32 implementation: the TlsAlloc() family of functions.
• Microsoft implementation: __declspec( thread ).
• Boost implementation: boost::thread_specific_ptr<>.
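The standard C++11 form of TLS is the thread_local keyword. In this minimal sketch, count_calls() looks as if it touches one global, but each thread increments its own copy of t_calls; the names t_calls, count_calls and run_two_threads are hypothetical.

```cpp
#include <cassert>
#include <thread>

// Each thread gets its own instance of this variable.
thread_local int t_calls = 0;

inline int count_calls(int n)
{
    for (int k = 0; k < n; ++k) ++t_calls;  // touches only the calling thread's copy
    return t_calls;
}

// Runs count_calls() on two new threads and reports what each one saw.
inline void run_two_threads(int& first, int& second)
{
    std::thread a([&first] { first = count_calls(3); });
    std::thread b([&second] { second = count_calls(5); });
    a.join();
    b.join();
}
```

After run_two_threads(x, y), x is 3 and y is 5, and the calling thread's own t_calls is still 0.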

References
• Threading
– http://en.wikibooks.org/wiki/C%2B%2B_Programming/Threading
• Thread Support Library
– http://en.cppreference.com/w/cpp/thread
• Difference between a mutex and a semaphore
– http://stackoverflow.com/questions/62814/difference-between-binary-semaphore-and-mutex
• std::condition_variable
– http://en.cppreference.com/w/cpp/thread/condition_variable

02. Thread Library
October 20, 2014
jintaeks@gmail.com

Contents
• std::thread
• std::mutex
• std::unique_lock
• TLS
• atomic operations
• memory barriers

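#include <thread>
#include <iostream>
void my_thread_func()
{
    std::cout<<"hello"<<std::endl;
}
int main()
{
    std::thread t(my_thread_func);
    t.join();
}
A std::thread starts running its callback as soon as the object is constructed; there is no separate start() call.
join() waits for thread t to finish. If a std::thread is still joinable when it is destroyed, std::terminate() is called, so call join() (or detach()) before t goes out of scope.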
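Because a joinable std::thread terminates the program when destroyed, a common pattern is a small RAII guard that joins in its destructor. ThreadGuard and run_guarded are hypothetical names for this sketch; C++20's std::jthread follows the same idea.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical RAII helper: joins the owned thread when the guard is destroyed.
class ThreadGuard
{
public:
    explicit ThreadGuard(std::thread t) : t_(std::move(t)) {}
    ~ThreadGuard()
    {
        if (t_.joinable()) t_.join();
    }
    ThreadGuard(const ThreadGuard&) = delete;
    ThreadGuard& operator=(const ThreadGuard&) = delete;

private:
    std::thread t_;
};

inline int run_guarded()
{
    int value = 0;
    {
        ThreadGuard guard{std::thread([&value] { value = 7; })};
    }   // the guard's destructor joins here, so the write to value is complete
    return value;
}
```

run_guarded() returns 7 because the join in ~ThreadGuard() happens before value is read, even if code inside the block had thrown an exception.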

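#include <thread>
#include <iostream>
class bar {
public:
    void foo() {
        std::cout << "hello from member function" << std::endl;
    }
};
int main()
{
    bar b;
    std::thread t(&bar::foo, &b);
    t.join();
}
A member function of an object can be passed as the thread callback, together with a pointer to the object it should be called on.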

thread: using a function object
#include <thread>
#include <iostream>
class SayHello
{
public:
    void operator()() const
    {
        std::cout<<"hello"<<std::endl;
    }
};
int main()
{
    // Braces avoid the "most vexing parse": std::thread t(SayHello()); would declare a function named t.
    std::thread t{SayHello()};
    t.join();
}

thread: creating a function object with std::bind
#include <thread>
#include <iostream>
#include <string>
#include <functional>
void greeting(std::string const& message)
{
    std::cout<<message<<std::endl;
}
int main()
{
    std::thread t(std::bind(greeting,"hi!"));
    t.join();
}
std::bind creates a function object that calls greeting("hi!"), and that object is passed to the thread.

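thread: passing arguments to the thread function
#include <thread>
#include <iostream>
void write_sum(int x,int y)
{
    std::cout<<x<<" + "<<y<<" = "<<(x+y)<<std::endl;
}
int main()
{
    std::thread t(write_sum,123,456);
    t.join();
}
The std::thread constructor is designed to take a variable number of arguments. It treats the first parameter as the thread function and the remaining values as that function's arguments; the arguments are copied into storage owned by the new thread.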

#include <thread>
#include <iostream>
#include <string>
class SayHello
{
public:
    void greeting(std::string const& message) const
    {
        std::cout<<message<<std::endl;
    }
};
int main()
{
    SayHello x;
    std::thread t(&SayHello::greeting,&x,"goodbye");
    t.join();
}

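thread: using shared_ptr
#include <thread>
#include <iostream>
#include <memory>
#include <string>
// SayHello is the class from the previous example.
int main()
{
    std::shared_ptr<SayHello> p(new SayHello);
    std::thread t(&SayHello::greeting,p,"goodbye");
    t.join();
}
You can also pass a smart pointer. The thread keeps its own copy of p, so the SayHello object stays alive at least until the thread function has finished.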

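thread: passing a reference
#include <thread>
#include <iostream>
#include <functional> // for std::ref
class PrintThis
{
public:
    void operator()() const
    {
        std::cout<<"this="<<this<<std::endl;
    }
};
int main()
{
    PrintThis x;
    x();
    std::thread t(std::ref(x));
    t.join();
    std::thread t2(x);
    t2.join();
}
Output (addresses vary between runs):
this=0x7fffb08bf7ef
this=0x7fffb08bf7ef
this=0x42674098
std::ref(x) makes the thread call the original object x, so the first two lines print the same address. Passing x directly copies it into the thread, so the third line prints the address of that copy.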

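variadic template
#include <iostream>
// Hypothetical per-argument work, defined here so the example compiles.
template<typename T>
void process( const T& value )
{
    std::cout << value << std::endl;
}
void func() {} // termination version
template<typename Arg1, typename... Args>
void func(const Arg1& arg1, const Args&... args)
{
    process( arg1 );
    func(args...); // note: arg1 does not appear here!
}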

variadic template: specialization
template<typename T>
class Template{
public:
    void SampleFunction(T param){
    }
};
template<>
class Template<int>{
public:
    void SampleFunction(int param){
    }
};
template<typename... Arguments>
class VariadicTemplate{
public:
    void SampleFunction(Arguments... params){
    }
};
template<>
class VariadicTemplate<double, int, long>{
public:
    void SampleFunction(double param1, int param2
                        , long param3){
    }
};

thread: constructors
thread();                                           (1) (since C++11)
thread( thread&& other );                           (2) (since C++11)
template< class Function, class... Args >
explicit thread( Function&& f, Args&&... args );    (3) (since C++11)
thread( const thread& ) = delete;                   (4) (since C++11)
Constructs a new thread object.
The standard declares the constructor as a variadic template, but the actual VS2012 implementation emulates it with variadic macros, similar to BOOST_PP.

thread: mutex
#include <mutex>
#include <string>
std::mutex m;
std::string s;
void append_with_lock_guard(std::string const& extra)
{
    std::lock_guard<std::mutex> lk(m);
    s+=extra;
}
void append_with_manual_lock(std::string const& extra)
{
    m.lock();
    try
    {
        s+=extra;
        m.unlock();
    }
    catch(...)
    {
        m.unlock();
        throw;
    }
}
std::lock_guard locks in RAII style, so the mutex is released even when an exception is thrown; append_with_manual_lock() needs try/catch to get the same guarantee.

std::unique_lock
#include <iostream>
#include <mutex>
#include <thread>
std::mutex mtx; // mutex for critical section
void print_block (int n, char c) {
    // critical section (exclusive access to std::cout signaled by lifetime of lck):
    std::unique_lock<std::mutex> lck (mtx);
    for (int i=0; i<n; ++i) { std::cout << c; }
    std::cout << '\n';
}
int main ()
{
    std::thread th1 (print_block,50,'*');
    std::thread th2 (print_block,50,'$');
    th1.join();
    th2.join();
    return 0;
}
It can be used for the same purpose as lock_guard.
Output (the two lines may appear in either order):
**************************************************
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
std::unique_lock<std::mutex> acquire_lock()
{
    static std::mutex m;
    return std::unique_lock<std::mutex>(m);
}
//std::mutex mtx; // mutex for critical section
void print_block (int n, char c) {
    // critical section (exclusive access to std::cout signaled by lifetime of lck):
    //std::lock_guard<std::mutex> lck (mtx);
    std::unique_lock<std::mutex> lck = acquire_lock();
    for (int i=0; i<n; ++i)
    {
        std::cout << c;
        std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );
    }
    std::cout << '\n';
}
This works correctly with move semantics: the lock is moved out of acquire_lock(). If acquire_lock() used lock_guard instead of unique_lock, compilation would fail, because lock_guard can be neither copied nor moved (in C++17 and later, guaranteed copy elision lets a function return such a temporary, so that variant compiles there).

Preventing rvalue object creation
#include <string>
class KPreventRValueObject
{
private:
    KPreventRValueObject( KPreventRValueObject&& rvalueref_ ); // private move constructor
public:
    KPreventRValueObject(){}
private:
    std::string m_strData;
};//class KPreventRValueObject
KPreventRValueObject TestRValueObject()
{
    return KPreventRValueObject(); // compile-time error before C++17 (the move constructor is private); C++17 guaranteed copy elision accepts it
}//TestRValueObject()

Implementing move semantics
#include <mutex>
#include <utility>
class data_to_protect
{
public:
    void some_operation(){}
    void other_operation(){}
};//class data_to_protect
class data_handle
{
private:
    data_to_protect* ptr;
    std::unique_lock<std::mutex> lk;
    friend data_handle lock_data();
    data_handle(data_to_protect* ptr_, std::unique_lock<std::mutex>&& lk_)
        : ptr(ptr_)
        , lk( std::move( lk_ ) ) {}
public:
    data_handle(data_handle&& other)
        : ptr(nullptr)
    {
        *this = std::move( other );
    }
    data_handle& operator=(data_handle&& other)
    {
        if( &other != this )
        {
            ptr = other.ptr;
            lk = std::move(other.lk);
            other.ptr = nullptr;
        }//if
        return *this;
    }
    void do_op()
    {
        ptr->some_operation();
    }
    void do_other_op()
    {
        ptr->other_operation();
    }
};//class data_handle
data_handle lock_data()
{
    static std::mutex m;
    static data_to_protect the_data;
    std::unique_lock<std::mutex> lk(m);
    return data_handle(&the_data, std::move(lk) );
}//lock_data()
int main()
{
    data_handle dh = lock_data(); // lock acquired
    dh.do_op(); // lock still held
    dh.do_other_op(); // lock still held
    data_handle dh2 = std::move(dh); // transfer lock to other handle
    dh2.do_op(); // lock still held
    return 0;
}//main()

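std::unique_lock: unlocking temporarily
#include <mutex>
#include <string>
#include <vector>
std::mutex m;
std::vector<std::string> strings_to_process;
// load_strings() is assumed to be defined elsewhere; it does slow work that does not need the lock.
std::vector<std::string> load_strings();
void update_strings()
{
    std::unique_lock<std::mutex> lk(m);
    if(strings_to_process.empty())
    {
        lk.unlock();
        std::vector<std::string> local_strings=load_strings();
        lk.lock();
        strings_to_process.insert(strings_to_process.end(),
                                  local_strings.begin(),local_strings.end());
    }
}
unique_lock follows the RAII pattern but can still be unlocked and re-locked whenever you want.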

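Example of a deadlock
#include <mutex>
class account
{
    std::mutex m;
    currency_value balance; // currency_value: a numeric type assumed to be defined elsewhere
public:
    friend void transfer(account& from,account& to,
                         currency_value amount)
    {
        std::lock_guard<std::mutex> lock_from(from.m);
        std::lock_guard<std::mutex> lock_to(to.m);
        from.balance -= amount;
        to.balance += amount;
    }
};
If two threads call transfer( A, B, … ) and transfer( B, A, … ) at the same time, a deadlock can occur: each thread locks its from.m first and then waits forever for the mutex that the other thread already holds.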

#include <mutex>
struct Box {
    explicit Box(int num) : num_things{num} {}
    int num_things;
    std::mutex m;
};
void transfer(Box &from, Box &to, int num)
{
    // don't actually take the locks yet
    std::unique_lock<std::mutex> lock1(from.m, std::defer_lock);
    std::unique_lock<std::mutex> lock2(to.m, std::defer_lock);
    // lock both unique_locks without deadlock
    std::lock(lock1, lock2);
    from.num_things -= num;
    to.num_things += num;
    // 'from.m' and 'to.m' mutexes unlocked in 'unique_lock' dtors
}
By using unique_lock's ability to control when the lock is taken (std::defer_lock) together with std::lock(), you can write code that prevents deadlock: std::lock() acquires both mutexes without deadlock, whatever order they are passed in.
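Assuming a C++17 compiler, std::scoped_lock packages the same deadlock-avoidance algorithm as std::lock() into a single RAII object. The sketch below redefines Box so that it is self-contained and runs two threads that transfer in opposite directions, the pattern that deadlocks with two nested lock_guards; transfer17 and opposite_transfers are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Box {
    explicit Box(int num) : num_things{num} {}
    int num_things;
    std::mutex m;
};

// C++17: locks both mutexes without deadlock and unlocks them in the destructor.
inline void transfer17(Box& from, Box& to, int num)
{
    std::scoped_lock lock(from.m, to.m);
    from.num_things -= num;
    to.num_things += num;
}

// Two threads move items in opposite directions at the same time.
inline int opposite_transfers(int rounds)
{
    Box a(1000), b(1000);
    std::thread t1([&] { for (int k = 0; k < rounds; ++k) transfer17(a, b, 1); });
    std::thread t2([&] { for (int k = 0; k < rounds; ++k) transfer17(b, a, 1); });
    t1.join();
    t2.join();
    return a.num_things + b.num_things;  // the total is conserved
}
```

opposite_transfers(10000) finishes and returns 2000; with two nested lock_guards taken in argument order, the same loop could hang.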

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
std::mutex mtx;
std::condition_variable cv;
bool ready = false;
void print_id (int id) {
    std::unique_lock<std::mutex> lck(mtx);
    while (!ready) cv.wait(lck); // the loop guards against spurious wakeups
    // ...
    std::cout << "thread " << id << '\n';
}
void go() {
    std::unique_lock<std::mutex> lck(mtx); // modify ready while holding the mutex
    ready = true;
    cv.notify_all();
}
int main ()
{
    std::thread threads[10];
    // spawn 10 threads:
    for (int i=0; i<10; ++i)
        threads[i] = std::thread(print_id,i);
    std::cout << "10 threads ready to race...\n";
    go(); // go!
    for (auto& th : threads) th.join();
    return 0;
}

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
std::mutex mtx;
std::condition_variable cv;
bool ready = false;
void print_id (int id) {
    std::unique_lock<std::mutex> lck(mtx);
    //while (!ready) cv.wait(lck);
    cv.wait( lck, []{ return ready;} ); // equivalent to the loop above
    // ...
    std::cout << "thread " << id << '\n';
}
void go() {
    std::unique_lock<std::mutex> lck(mtx); // modify ready while holding the mutex
    ready = true;
    cv.notify_all();
}
int main ()
{
    std::thread threads[10];
    // spawn 10 threads:
    for (int i=0; i<10; ++i)
        threads[i] = std::thread(print_id,i);
    std::cout << "10 threads ready to race...\n";
    go(); // go!
    for (auto& th : threads) th.join();
    return 0;
}

condition_variable: example
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
#include <conio.h> // _getch() (Windows-specific)
#include <tchar.h> // _tmain (Windows-specific)
std::mutex g_mutex;
std::condition_variable g_conditioVariable;
bool g_bReady = false;                  // always read and written while holding g_mutex
std::atomic<bool> g_bIsRunThread(true); // read by consume() without the mutex
void consume (int n)
{
    int iCounter = 0;
    while( g_bIsRunThread == true )
    {
        {
            std::unique_lock<std::mutex> ulock( g_mutex );
            g_conditioVariable.wait( ulock, []{ return g_bReady;} );
        }//block
        std::cout << iCounter << std::endl;
        std::this_thread::sleep_for( std::chrono::milliseconds(500) );
        iCounter += 1;
    }//while
}//consume()
int _tmain()
{
    int ich = 0;
    std::thread consumerThread(consume,10);
    std::cout << "Thread started" << std::endl;
    while( g_bIsRunThread == true )
    {
        ich = _getch();
        if( ich == 'p' ) // pause
        {
            std::lock_guard<std::mutex> ulock( g_mutex );
            g_bReady = false;
        }
        else if( ich == 'c' ) // continue
        {
            std::lock_guard<std::mutex> ulock( g_mutex );
            g_bReady = true;
            g_conditioVariable.notify_one();
        }
        else if( ich == 'e' ) // exit
        {
            g_bIsRunThread = false;
            std::unique_lock<std::mutex> ulock( g_mutex );
            g_bReady = true;
            g_conditioVariable.notify_one();
        }//if.. else if..
    }//while
    consumerThread.join();
    return 0;
}//main()

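• TLS lets a single code routine access different global memory depending on which thread calls it.
• Compared with splitting global variables per thread yourself and having each thread access its own copy, the differences are:
– If you manage per-thread global variables yourself, thread-dependent code has to be added to the thread routine.
– With TLS, the same code is used no matter which thread runs it.
• Boost provides ''boost::thread_specific_ptr<>'', which hides each platform's TLS implementation.
– For memory that each thread needs, declare the variable with type boost::thread_specific_ptr<>.
– In the code below, the problem was that FindNearestSplinePoint(), which had been used only from a single thread, started to be used from multiple threads, and the reads/writes of a static variable it declared for internal use caused a crash.
– So the static variable used by this function had to be made thread safe.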

/// Modified to use TLS (Thread Local Storage) so that each thread uses its own local memory.
/// Functions in the KpSplineUtil namespace must be thread safe.
/// Since this is not a native type, ''boost::thread_specific_ptr'' is used.
/// - jintaeks on 2013-03-20, 13:35
static boost::thread_specific_ptr<KpIntervalHeap<KInfo>> s_kIntervalHeap;
bool KpSplineUtil::FindNearestSplinePoint(
    IN OUT KpSplinePosition& kInOutArgument_
    , const KpSpline& kInSpline_
    , const KpVector3& vInPoint_
    , KpReal rInError_ )
{
    unsigned uNumSegs = kInSpline_.GetNumSegments();
    ASSERT( uNumSegs > 0 );
    if( uNumSegs == 0 )
    {
        kInOutArgument_.Invalidate();
        return false;
    }//if

    KpIntervalInfo<KInfo> kInfo;
    KpVector3 vMin, vMax;
    /// The first time this is called on a given thread, this value is NULL.
    /// At that point, allocate the memory that the thread will use.
    /// The memory must be deleted when the program exits.
    /// - jintaeks on 2013-03-20, 13:37
    if( s_kIntervalHeap.get() == NULL )
    {
        s_kIntervalHeap.reset( new KpIntervalHeap<KInfo>() );
    }//if
    s_kIntervalHeap->MakeEmpty();
    if( kInOutArgument_.IsValid() )
    {
        ASSERT( kInOutArgument_.m_iIndex < ( int ) uNumSegs );

• For native types, declare the variable with __declspec( thread ), a Microsoft extension keyword supported by Visual Studio.
• In the code, every thread appears to access the same variable, but each thread actually accesses a different memory location.

/// Modified to use TLS (Thread Local Storage) so that each thread uses its own local memory.
/// Functions in the KpSplineUtil namespace must be thread safe.
/// For native types, it is enough to specify Visual Studio's extension __declspec( thread ).
/// - jintaeks on 2013-03-20, 13:35
__declspec( thread ) static KpReal s_rBITolerance2;
__declspec( thread ) static KpReal s_rBIWidth;
__declspec( thread ) static KpReal s_rBIHeight;
__declspec( thread ) static KpReal s_rBIntersectionT;
__declspec( thread ) static KpReal s_rBMinDist2;
static bool _FindBezierRectangleIntersection(
    const KpBezierControl& kInBezier_
    , KpReal rInTBegin_
    , KpReal rInTEnd_ )
{
    {
        KpVector3 vMin, vMax;
        _CalcBezierControlAABB( vMin, vMax, kInBezier_ );
        if( vMin.X() > s_rBIWidth || vMax.X() < KpReal( 0.0 )

References
• thread tutorial
– http://www.justsoftwaresolutions.co.uk/threading/multithreading-in-c++0x-part-1-starting-threads.html
• std::unique_lock
– http://www.cplusplus.com/reference/mutex/unique_lock/?kw=unique_lock
– http://en.cppreference.com/w/cpp/thread/unique_lock
– http://stackoverflow.com/questions/13099660/c11-why-does-stdcondition-variable-use-stdunique-lock
– http://www.cplusplus.com/reference/condition_variable/condition_variable/?kw=condition_variable

03. Thread Library 2
November 3, 2014
jintaeks@gmail.com

Contents
• Session 1
– Statements and expressions
– rvalue reference
– Move semantics
– Threads
• Session 2
– std::thread
– std::mutex
– std::unique_lock
– std::condition_variable
– TLS
• Session 3
– atomic
– lock-free
– memory ordering
– std::atomic<>

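Atomic
• An operation is atomic if no other thread can observe it half-finished.
• Unsynchronized concurrent access to the same memory, where at least one access is a write, is called a data race; one possible result is a torn read or torn write, in which an intermediate value is observed.
• Operations become non-atomic because they are split into several CPU instructions (and in some cases a single CPU instruction is itself not atomic).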

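Atomic
• On most processors, aligned reads and writes of simple types (no wider than the machine word) are atomic at the hardware level; the C++ language itself guarantees this only for std::atomic types.
• Win32's _InterlockedIncrement() and C++11's std::atomic<int>::fetch_add() are examples of atomic RMW (read-modify-write) operations.
• std::atomic<> does not guarantee lock-free operation.
• Check with std::atomic<>::is_lock_free().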
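A short sketch of the two std::atomic<> points above: fetch_add() is one indivisible read-modify-write that returns the previous value, and is_lock_free() reports at run time whether the implementation uses lock-free instructions or an internal lock. The function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// fetch_add is an atomic RMW: it returns the value held before the addition.
inline int atomic_rmw_demo()
{
    std::atomic<int> a(10);
    int before = a.fetch_add(5);  // reads 10 and writes 15 as a single atomic step
    assert(before == 10);
    return a.load();
}

// std::atomic<T> may fall back to an internal lock; is_lock_free() tells you.
inline bool int_atomic_is_lock_free()
{
    std::atomic<int> a(0);
    return a.is_lock_free();
}
```

atomic_rmw_demo() returns 15. For std::atomic<int>, is_lock_free() is true on mainstream x86 and ARM targets, but portable code should check rather than assume.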

• The most common RMW operation is Compare-And-Swap (CAS).
• _InterlockedCompareExchange() is the Win32 intrinsic function that implements CAS.
– "Intrinsic function" means a function provided by the compiler itself rather than by a library.
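The portable C++11 form of CAS is std::atomic<T>::compare_exchange_weak/strong. This sketch uses the usual CAS retry loop to keep the largest value seen by several threads; update_max and parallel_max are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// compare_exchange_weak(expected, desired): if the atomic still equals expected,
// store desired and return true; otherwise copy the current value into expected
// and return false, so the loop retries with fresh data.
inline void update_max(std::atomic<int>& target, int value)
{
    int current = target.load();
    while (current < value && !target.compare_exchange_weak(current, value))
    {
        // current now holds the latest value; the loop re-checks current < value
    }
}

inline int parallel_max(int num_threads)
{
    std::atomic<int> best(0);
    std::vector<std::thread> workers;
    for (int t = 1; t <= num_threads; ++t)
        workers.emplace_back([&best, t] {
            for (int k = 0; k < 1000; ++k) update_max(best, t * 1000 + k);
        });
    for (auto& w : workers) w.join();
    return best.load();
}
```

parallel_max(4) returns 4999 no matter how the threads interleave, because a store only succeeds if nobody changed the value since it was read.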

Lock-free Programming
• Commonly described as programming without mutexes.
• More precisely, code is lock-free if no thread can block the other threads indefinitely, whether or not a mutex is used.

Example: push for a lock-free queue
void LockFreeQueue::push(Node* newHead)
{
    for (;;)
    {
        // Copy a shared variable (m_Head) to a local.
        Node* oldHead = m_Head;
        // Do some speculative work, not yet visible to other threads.
        newHead->next = oldHead;
        // Next, attempt to publish our changes to the shared variable.
        // If the shared variable hasn't changed, the CAS succeeds and we return.
        // Otherwise, repeat.
        // (For pointer values, the Win32 intrinsic is _InterlockedCompareExchangePointer.)
        if (_InterlockedCompareExchange(&m_Head, newHead, oldHead) == oldHead)
            return;
    }
}
• Between copying m_Head into oldHead and executing _InterlockedCompareExchange(), another thread may change the value from A to B and then back to A; the CAS then succeeds even though the shared data changed in between.
• Code that uses CAS must be written carefully so that this ABA problem cannot cause harm.
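The same push can be written portably with std::atomic<Node*>::compare_exchange_weak. This is a minimal sketch with hypothetical Node/LockFreeStack types (the structure is really a LIFO stack). Push alone is not harmed by ABA, since a head that went A to B to A is still a valid next pointer; the problem usually bites in pop, when nodes are freed and reused.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node
{
    int value;
    Node* next;
};

class LockFreeStack
{
public:
    void push(Node* newHead)
    {
        Node* oldHead = m_Head.load(std::memory_order_relaxed);
        do
        {
            newHead->next = oldHead;  // speculative work, not yet visible to others
        } while (!m_Head.compare_exchange_weak(oldHead, newHead,
                                               std::memory_order_release,
                                               std::memory_order_relaxed));
        // on failure, oldHead was reloaded with the current head, so the loop retries
    }

    // Not thread safe: only for inspecting the result after all pushers have joined.
    int size_unsafe() const
    {
        int n = 0;
        for (Node* p = m_Head.load(); p != nullptr; p = p->next) ++n;
        return n;
    }

private:
    std::atomic<Node*> m_Head{nullptr};
};

inline int push_from_threads(int num_threads, int per_thread)
{
    std::vector<Node> nodes(num_threads * per_thread);
    LockFreeStack stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
        workers.emplace_back([&, t] {
            for (int k = 0; k < per_thread; ++k)
                stack.push(&nodes[t * per_thread + k]);
        });
    for (auto& w : workers) w.join();
    return stack.size_unsafe();
}
```

push_from_threads(4, 1000) returns 4000: every push eventually succeeds and none is lost, without any mutex.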

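Memory Ordering
• Even if each operation is atomic, a multithreaded program running on multiple cores is not guaranteed to observe those operations in the order in which they appear in the source.
• This is because the order of operations can be changed by the compiler and, at run time, by the CPU.
• A processor can reorder memory operations for a variety of reasons.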

volatile bool Ready = false;
int Value = 0;
// Thread A
while(!Ready) {}
printf("%d", Value);
// Thread B
Value = 1;
Ready = true;
• The expected output is 1.
• But what if Ready = true; becomes visible before Value = 1;?

• std::atomic<> has six memory ordering options.
• However, each of them expresses one of three memory ordering models:
– sequentially consistent ordering (memory_order_seq_cst)
– acquire-release ordering (memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_acq_rel)
– relaxed ordering (memory_order_relaxed)

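Memory Barrier
• A set of instructions that forces pending memory operations to complete (become visible) in a defined order relative to the operations around them.
• There are three kinds: acquire, release, and fence.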

Acquire semantics
operation 1
operation 2
<-operation 3-Acquire-> 3 is visible before 4-5
operation 4
operation 5
• When an atomic operation is used to gain access to data (for example, to take a lock), other processors must see that operation (the lock) before they see any of the operations that follow it.
• This is called acquire semantics, because the operation acquires the right to access the data.

Release semantics
operation 1
operation 2
<-operation 3-Release-> 1-2 are visible before 3
operation 4
operation 5
• When an atomic operation releases recently modified values (for example, releases a lock), the new values must be visible to other processors before the release itself.
• This is called release semantics.

Fence semantics
operation 1
operation 2
<-operation 3-Fence-> 1-2 are visible before 3, 3 is visible before 4-5
operation 4
operation 5
• A fence is also called a full memory barrier.

class SpinLock
{
    volatile tInt LockSem;
public:
    FORCEINLINE SpinLock()
        : LockSem(0)
    {}
    FORCEINLINE void Lock()
    {
        while(1)
        {
            // Atomically swap the lock variable with 1 if it's currently equal to 0
            if(!InterlockedCompareExchange(&LockSem, 1, 0))
            {
                // We successfully acquired the lock
                ImportBarrier(); // engine-specific helper, presumably an acquire barrier
                return;
            }
        }
    }
    FORCEINLINE void Unlock()
    {
        ExportBarrier(); // engine-specific helper, presumably a release barrier
        LockSem = 0;
    }
};

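• In Microsoft Visual C++ (with /volatile:ms, the default when targeting x86 and x64), writing to a volatile variable has release semantics.
• Under the same setting, reading from a volatile variable has acquire semantics. Standard C++ volatile gives no such guarantee, so portable code should use std::atomic<> instead.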
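A portable counterpart of the SpinLock above, sketched with std::atomic_flag: test_and_set with memory_order_acquire plays the role of InterlockedCompareExchange plus ImportBarrier(), and clear with memory_order_release the role of ExportBarrier() plus LockSem = 0. SpinLockStd and count_with_spinlock are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLockStd
{
public:
    void Lock()
    {
        // test_and_set returns the previous value: true means someone else holds the lock.
        while (m_flag.test_and_set(std::memory_order_acquire))
        {
            // busy-wait until the holder calls Unlock()
        }
    }
    void Unlock()
    {
        m_flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

inline long count_with_spinlock(int num_threads, int increments)
{
    SpinLockStd lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
        workers.emplace_back([&] {
            for (int k = 0; k < increments; ++k)
            {
                lock.Lock();
                ++counter;  // protected by the spin lock
                lock.Unlock();
            }
        });
    for (auto& w : workers) w.join();
    return counter;
}
```

count_with_spinlock(4, 10000) returns 40000. The acquire/release pair is what makes the plain ++counter inside the lock safe; a spin lock is only a good choice when the critical section is very short.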

Example: the problem in practice
HANDLE beginSema1;
HANDLE beginSema2;
HANDLE endSema;
int X, Y;
int r1, r2;
DWORD WINAPI thread1Func(LPVOID param)
{
    MersenneTwister random(1); // random-number helper class defined elsewhere in the sample
    for (;;)
    {
        WaitForSingleObject(beginSema1, INFINITE); // Wait for signal
        while (random.integer() % 8 != 0) {} // Random delay
        // ----- THE TRANSACTION! -----
        X = 1;
#if USE_CPU_FENCE
        MemoryBarrier(); // Prevent CPU reordering
#else
        _ReadWriteBarrier(); // Prevent compiler reordering only
#endif
        r1 = Y;
        ReleaseSemaphore(endSema, 1, NULL); // Notify transaction complete
    }
    return 0; // Never returns
}

DWORD WINAPI thread2Func(LPVOID param)
{
    MersenneTwister random(2);
    for (;;)
    {
        WaitForSingleObject(beginSema2, INFINITE); // Wait for signal
        while (random.integer() % 8 != 0) {} // Random delay
        // ----- THE TRANSACTION! -----
        Y = 1;
#if USE_CPU_FENCE
        MemoryBarrier(); // Prevent CPU reordering
#else
        _ReadWriteBarrier(); // Prevent compiler reordering only
#endif
        r2 = X;
        ReleaseSemaphore(endSema, 1, NULL); // Notify transaction complete
    }
    return 0; // Never returns
}

#if USE_SINGLE_HW_THREAD
    // Force thread affinities to the same cpu core.
    SetThreadAffinityMask(thread1, 1);
    SetThreadAffinityMask(thread2, 1);
#endif
    // Repeat the experiment ad infinitum
    int detected = 0;
    for (int iterations = 1; ; iterations++)
    {
        // Reset X and Y
        X = 0;
        Y = 0;
        // Signal both threads
        ReleaseSemaphore(beginSema1, 1, NULL);
        ReleaseSemaphore(beginSema2, 1, NULL);
        // Wait for both threads
        WaitForSingleObject(endSema, INFINITE);
        WaitForSingleObject(endSema, INFINITE);
        // Check if there was a simultaneous reorder
        if (r1 == 0 && r2 == 0)
        {
            detected++;
            printf("%d reorders detected after %d iterations\n", detected, iterations);
        }
    }

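How do we fix it?
#include <atomic>
std::atomic<int> X(0), Y(0);
int r1, r2;
void thread1()
{
    X.store(1);
    r1 = Y.load();
}
void thread2()
{
    Y.store(1);
    r2 = X.load();
}
• std::atomic<> is the C++11 standard library facility that provides atomic operations together with memory barriers.
• By default, .store() and .load() use std::memory_order_seq_cst, so the compiler emits whatever fences the target CPU needs (on x86, a full fence for the store); the outcome r1 == 0 && r2 == 0 then becomes impossible.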
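A runnable, portable version of the experiment under the fix above: two threads per iteration perform the store-then-load transaction on std::atomic<int> with the default seq_cst ordering, and the function counts how often r1 == 0 && r2 == 0 occurred. count_reorders is a hypothetical name, and because thread start-up dominates each iteration this is a demonstration of the guarantee rather than a sensitive reorder detector.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns how many iterations ended with r1 == 0 && r2 == 0 (forbidden under seq_cst).
inline int count_reorders(int iterations)
{
    int detected = 0;
    for (int it = 0; it < iterations; ++it)
    {
        std::atomic<int> X(0), Y(0);
        int r1 = -1, r2 = -1;
        std::thread t1([&] { X.store(1); r1 = Y.load(); });
        std::thread t2([&] { Y.store(1); r2 = X.load(); });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++detected;
    }
    return detected;
}
```

With seq_cst, count_reorders() always returns 0. Note that acquire/release ordering would not be enough here: it does not forbid a store followed by a load of a different variable from being reordered, which is exactly this pattern.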

Relaxed ordering
std::atomic<int> x;
std::atomic<int> y;
// Thread 1:
r1 = y.load(std::memory_order_relaxed); // A
x.store(r1, std::memory_order_relaxed); // B
// Thread 2:
r2 = x.load(std::memory_order_relaxed); // C
y.store(42, std::memory_order_relaxed); // D
• With x and y initially zero, this code is allowed to produce r1 == r2 == 42 because, although A is sequenced-before B and C is sequenced-before D, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x.

Example: counter
#include <vector>
#include <iostream>
#include <thread>
#include <atomic>
std::atomic<int> cnt = {0};
void f()
{
    for (int n = 0; n < 1000; ++n) {
        // Only the atomicity of the increment matters here, so relaxed ordering is enough.
        cnt.fetch_add(1, std::memory_order_relaxed);
    }
}
int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f);
    }
    for (auto& t : v) {
        t.join();
    }
    std::cout << "Final counter value is " << cnt << '\n'; // prints: Final counter value is 10000
}

Release-Acquire ordering
 If an atomic store in thread A is
tagged std::memory_order_release and an atomic
load in thread B from the same variable is
tagged std::memory_order_acquire, all memory writes
(non-atomic and relaxed atomic) that happened-
before the atomic store from the point of view of
thread A, become visible side-effects in thread B, that
is, once the atomic load is completed, thread B is
guaranteed to see everything thread A wrote to
memory.
 In other words, the instructions that follow B's (waiting) atomic load can see every value that A wrote before its atomic store.

#include <atomic>
#include <cassert>
#include <string>
#include <thread>
std::atomic<std::string*> ptr;
int data;
void producer()
{
std::string* p = new std::string("Hello");
data = 42;
ptr.store(p, std::memory_order_release);
}
void consumer()
{
std::string* p2;
while (!(p2 = ptr.load(std::memory_order_acquire)))
;
assert(*p2 == "Hello"); // never fires
assert(data == 42); // never fires
delete p2; // the consumer now owns the string
}
int main()
{
std::thread t1(producer);
std::thread t2(consumer);
t1.join(); t2.join();
}
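This is a small repeatable variant of the producer/consumer example, assuming C++11 or later. The hypothetical function release_acquire_always_visible() publishes a plain int through a std::atomic<bool> flag and reports whether the consumer ever failed to see the payload. It uses a flag instead of a pointer so nothing has to be freed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Repeats the release/acquire handoff `trials` times and returns true if the
// consumer saw data == 42 every time it observed the flag.
bool release_acquire_always_visible(int trials) {
    bool ok = true;
    for (int i = 0; i < trials; ++i) {
        int data = 0;                    // ordinary, non-atomic payload
        std::atomic<bool> ready(false);  // publication flag
        bool seen = false;
        std::thread producer([&] {
            data = 42;                                     // plain write...
            ready.store(true, std::memory_order_release);  // ...published by the release store
        });
        std::thread consumer([&] {
            while (!ready.load(std::memory_order_acquire))  // acquire pairs with the release
                std::this_thread::yield();
            seen = (data == 42);  // the producer's write is guaranteed to be visible
        });
        producer.join();
        consumer.join();
        ok = ok && seen;
    }
    return ok;
}
```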

Sequentially-consistent ordering
 Atomic operations
tagged std::memory_order_seq_cst not only order
memory the same way as release/acquire ordering
(everything that happened-before a store in one
thread becomes a visible side effect in the
thread that did a load), but also establish a
single total modification order of all atomic operations
that are so tagged.
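 The result is the same as if all the tagged operations were executed one after another on a single core.
– It may generate a fence (full memory barrier); on x86, compilers typically emit one for seq_cst stores.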

#include <thread>
#include <atomic>
#include <cassert>
std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};
void write_x()
{
x.store(true, std::memory_order_seq_cst);
}
void write_y()
{
y.store(true, std::memory_order_seq_cst);
}

void read_x_then_y()
{
while (!x.load(std::memory_order_seq_cst))
;
if (y.load(std::memory_order_seq_cst)) {
++z;
}
}
void read_y_then_x()
{
while (!y.load(std::memory_order_seq_cst))
;
if (x.load(std::memory_order_seq_cst)) {
++z;
}
}
int main()
{
std::thread a(write_x);
std::thread b(write_y);
std::thread c(read_x_then_y);
std::thread d(read_y_then_x);
a.join(); b.join(); c.join(); d.join();
assert(z.load() != 0); // never fires
}
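Here is the same four-thread program wrapped so it can be repeated, assuming C++11 or later. run_seq_cst_once() and z_never_zero() are illustrative names. Every operation uses the default memory_order_seq_cst, so threads c and d cannot disagree about whether x or y was stored first. As a result, at least one of them increments z.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the four-thread seq_cst example; returns the final value of z.
int run_seq_cst_once() {
    std::atomic<bool> x(false), y(false);
    std::atomic<int> z(0);
    std::thread a([&] { x.store(true); });  // default order is memory_order_seq_cst
    std::thread b([&] { y.store(true); });
    std::thread c([&] {
        while (!x.load()) std::this_thread::yield();  // wait until x is set
        if (y.load()) ++z;
    });
    std::thread d([&] {
        while (!y.load()) std::this_thread::yield();  // wait until y is set
        if (x.load()) ++z;
    });
    a.join(); b.join(); c.join(); d.join();
    return z.load();
}

// Repeats the experiment and returns false if z was ever 0 (it never should be).
bool z_never_zero(int trials) {
    for (int i = 0; i < trials; ++i) {
        if (run_seq_cst_once() == 0) return false;
    }
    return true;
}
```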

References
 http://preshing.com/20130618/atomic-vs-non-atomic-operations/
– blog series
 http://en.cppreference.com/w/cpp/atomic/memory_order
 http://www.developerfusion.com/article/138018/memory-ordering-for-atomic-operations-in-c0x/

04. Parallel Pattern Library
November 9, 2014
jintaeks@gmail.com

Contents
 Session 3
– atomic
– lock-free
– memory ordering
– std::atomic<>
 Session 4: PPL, Parallel Pattern Library
– Task Parallelism (Concurrency Runtime)
– Parallel Algorithms
– Parallel Containers and Objects
– Cancellation in the PPL
– Debugging a Parallel Program
 Session 5: C++ AMP
 Session 6: How each team uses threads
– Threads currently used by each team
– What could be improved by applying PPL

Concurrency Runtime
 Maintains its own thread pool.
 The Concurrency Runtime balances the load across its threads with a work-stealing algorithm.
 The Concurrency Runtime provides cooperative blocking primitives for synchronizing access to resources.
 It consists of the following components:
– Parallel Pattern Library
– Asynchronous Agents Library
– Task Scheduler
– Resource Manager

PPL, Parallel Pattern Library
 PPL provides general-purpose parallel containers and algorithms.
 PPL provides data parallelism through its parallel algorithms.
 PPL provides task parallelism through tasks.

Asynchronous Agents Library
 Provides an actor-based programming model.
 Provides message-passing interfaces.
 See the link below:
– http://msdn.microsoft.com/en-us/library/dd492627.aspx
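Message passing can be illustrated without the Agents Library. Below is a minimal sketch that assumes only standard C++11. The hypothetical message_buffer class is an unbounded queue: send() never blocks, and receive() waits until a message arrives. It only hints at what the library's message blocks (for example concurrency::unbounded_buffer) provide and is not their API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical unbounded message buffer shared by a sender and a receiver.
template <typename T>
class message_buffer {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();  // wake one waiting receiver
    }
    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // block until a message exists
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};
```

The threads share no other state, so all coordination goes through send() and receive().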

Task Scheduler
 The Task Scheduler schedules and coordinates tasks at run time.
 It uses a work-stealing algorithm to make the fullest use of the processing resources.
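To make work stealing concrete, here is a deliberately simplified sketch in standard C++11. It is not how the Concurrency Runtime implements it. Every worker owns a mutex-protected deque, takes work from the back of its own deque, and steals from the front of another worker's deque when its own is empty. run_work_stealing() and WorkerQueue are hypothetical names. Real schedulers use lock-free deques and let tasks spawn new tasks.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// One mutex-protected deque of "tasks" (plain ints here) per worker.
struct WorkerQueue {
    std::mutex m;
    std::deque<int> tasks;
};

// All work starts on worker 0; idle workers steal it. Processing a task means
// adding its value to a shared total, which is returned at the end.
long long run_work_stealing(int workers, const std::vector<int>& initial_for_worker0,
                            std::atomic<int>& steals) {
    std::vector<WorkerQueue> queues(workers);
    for (int v : initial_for_worker0) queues[0].tasks.push_back(v);
    std::atomic<long long> total(0);

    auto try_pop = [&](int self, int& out) {
        {   // own deque: take from the back (LIFO)
            std::lock_guard<std::mutex> lk(queues[self].m);
            if (!queues[self].tasks.empty()) {
                out = queues[self].tasks.back();
                queues[self].tasks.pop_back();
                return true;
            }
        }
        for (int k = 1; k < workers; ++k) {  // other deques: steal from the front (FIFO)
            int victim = (self + k) % workers;
            std::lock_guard<std::mutex> lk(queues[victim].m);
            if (!queues[victim].tasks.empty()) {
                out = queues[victim].tasks.front();
                queues[victim].tasks.pop_front();
                ++steals;
                return true;
            }
        }
        return false;
    };

    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            int task;
            // Tasks never create new tasks here, so once every deque is empty we are done.
            while (try_pop(w, task)) total += task;
        });
    }
    for (auto& t : threads) t.join();
    return total.load();
}
```

Stealing from the opposite end of the victim's deque keeps the owner and the thief from contending for the same items most of the time.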

Resource Manager
 Manages computing resources, that is, processors and memory.
 Allocates resources where they can be used most effectively.
 Works with the Task Scheduler and provides an abstraction layer over the resources.

Lambda
In mathematical logic and computer science, lambda is used to
introduce anonymous functions expressed with the concepts of lambda
calculus.

Lambda: syntax
a. lambda-introducer (capture clause)
b. lambda declarator (parameter list)
c. mutable (mutable specification)
d. exception-specification (exception specification)
e. trailing-return-type (return type)
f. compound-statement (lambda body)
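Here is a single lambda that uses every part of the syntax above. It assumes C++14 for the init-capture and the deduced return type of make_scaler(); both make_scaler() and the calls counter are illustrative.

```cpp
#include <cassert>

// Returns a lambda that uses syntax parts a-f.
auto make_scaler(int factor) {
    return [factor, calls = 0]  // a. capture clause: factor by value, plus a counter
           (int x)              // b. parameter list
           mutable              // c. lets the body modify its by-value copies (calls)
           noexcept             // d. exception specification
           -> int               // e. trailing return type
    {                           // f. compound statement (the body)
        ++calls;
        return x * factor + calls;
    };
}
```

Each call increments the closure's own copy of calls. For example, make_scaler(10) returns 11 on its first call with 1 and 12 on its second.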

Lambda: example
#include <functional>
#include <iostream>
int main()
{
using namespace std;
// Assign the lambda expression that adds two numbers to an auto variable.
auto f1 = [](int x, int y) { return x + y; };
cout << f1(2, 3) << endl;
// Assign the same lambda expression to a function object.
function<int(int, int)> f2 = [](int x, int y) { return x + y; };
cout << f2(3, 4) << endl;
}

Task Parallelism in PPL
Parallel Pattern Library

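PPL example: computing Fibonacci numbers
#include <windows.h> // GetTickCount()
#include <ppl.h> // concurrency::parallel_for_each
#include <concurrent_vector.h> // concurrency::concurrent_vector
#include <algorithm>
#include <array>
#include <iostream>
#include <tuple>
#include <vector>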
// Calls the provided work function and returns the number of milliseconds
// that it takes to call that function.
template <class Function>
__int64 time_call( Function&& f )
{
__int64 begin = GetTickCount();
f();
return GetTickCount() - begin;
}
// Computes the nth Fibonacci number.
int fibonacci( int n )
{
if( n < 2 )
return n;
return fibonacci( n - 1 ) + fibonacci( n - 2 );
}

Serial processing
__int64 elapsed;
// An array of Fibonacci numbers to compute.
std::array<int, 4> a = { 24, 26, 41, 42 };
// The results of the serial computation.
std::vector<std::tuple<int, int>> results1;
// Use the for_each algorithm to compute the results serially.
elapsed = time_call( [&]
{
std::for_each (std::begin(a), std::end(a), [&]( int n ) {
results1.push_back( std::make_tuple( n, fibonacci( n ) ) );
});
});
std::wcout << L"serial time: " << elapsed << L" ms" << std::endl;

Parallel processing
// The results of the parallel computation.
concurrency::concurrent_vector<std::tuple<int, int>> results2;
// Use the parallel_for_each algorithm to perform the same task.
elapsed = time_call( [&]
{
concurrency::parallel_for_each( std::begin(a), std::end(a), [&]( int n ) {
results2.push_back( std::make_tuple( n, fibonacci( n ) ) );
});
// Because parallel_for_each acts concurrently, the results do not
// have a pre-determined order. Sort the concurrent_vector object
// so that the results match the serial version.
std::sort( std::begin( results2 ), std::end( results2 ) );
});
std::wcout << L"parallel time: " << elapsed << L" ms" << std::endl << std::endl;
/** Output
serial time: 9250 ms
parallel time: 5726 ms
fib(24): 46368
fib(26): 121393
fib(41): 165580141
fib(42): 267914296
*/
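PPL ships only with Visual C++. For a portable comparison, the following sketch (assuming C++11 or later) runs the same per-element computation with one std::async task per input. Each result comes back through its own future in input order, so no concurrent container is needed before the final sort. parallel_fibs() is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <tuple>
#include <vector>

// Computes the nth Fibonacci number (deliberately slow, as in the PPL sample).
int fibonacci(int n) { return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2); }

// Launches one asynchronous task per input and gathers (n, fib(n)) pairs.
std::vector<std::tuple<int, int>> parallel_fibs(const std::vector<int>& inputs) {
    std::vector<std::future<std::tuple<int, int>>> futures;
    for (int n : inputs) {
        futures.push_back(std::async(std::launch::async, [n] {
            return std::make_tuple(n, fibonacci(n));
        }));
    }
    std::vector<std::tuple<int, int>> results;
    for (auto& f : futures) results.push_back(f.get());  // get() waits for each task
    std::sort(results.begin(), results.end());           // same ordering step as the PPL sample
    return results;
}
```

One thread per input is fine for four inputs. For large inputs, a scheduler such as PPL's that reuses a thread pool is the better choice.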

concurrency::task<>
#include <ppltasks.h>
#include <iostream>
//using namespace concurrency;
//using namespace std;
int wmain()
{
// Create a task.
concurrency::task<int> t( []()
{
return 42;
});
// In this example, you don't necessarily need to call wait() because
// the call to get() also waits for the result.
t.wait();
// Print the result.
std::wcout << t.get() << std::endl;
}
/* Output:
42
*/
Define a task with concurrency::task<>. The template argument of task is the task's return type.
wait() blocks until the task has finished.
get() returns the task's result, first waiting for the task to finish if necessary.
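In standard C++, the closest counterparts of task<int>, wait() and get() are std::async and std::future. Below is a minimal sketch, assuming C++11 or later; the function names are illustrative. A std::future can be read only once, whereas a std::shared_future allows repeated get() calls.

```cpp
#include <cassert>
#include <future>

// Standard C++ counterpart of the task<int> example.
int run_future_example() {
    std::future<int> f = std::async(std::launch::async, [] { return 42; });
    f.wait();             // optional: blocks until the task finishes
    int value = f.get();  // also waits; after this call f.valid() is false
    return value;
}

// A shared_future may be read any number of times.
bool shared_future_can_be_read_twice() {
    std::shared_future<int> sf = std::async(std::launch::async, [] { return 7; }).share();
    return sf.get() == 7 && sf.get() == 7;
}
```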

concurrency::create_task(), task::then()
concurrency::task<std::wstring> write_to_string()
{
// Create a shared pointer to a string that is assigned to and read by multiple tasks.
// By using a shared pointer, the string outlives the tasks, which can run in the
// background after this function exits.
auto s = std::make_shared<std::wstring>(L"Value 1");
return concurrency::create_task([s]
{
// Print the current value.
std::wcout << L"Current value: " << *s << std::endl;
// Assign to a new value.
*s = L"Value 2";
}).then([s]
{
// Print the current value.
std::wcout << L"Current value: " << *s << std::endl;
// Assign to a new value and return the string.
*s = L"Value 3";
return *s;
});
}
Use create_task() to create a task.
Use task<>::then() to chain a continuation task.

Lambdas must be thread-safe.
// lambda-task-lifetime.cpp
// compile with: /EHsc
#include <ppltasks.h>
#include <iostream>
#include <string>
…
int wmain()
{
// Create a chain of tasks that work with a string.
auto t = write_to_string();
// Wait for the tasks to finish and print the result.
std::wcout << L"Final value: " << t.get() << std::endl;
}
/* Output:
Current value: Value 1
Current value: Value 2
Final value: Value 3
*/
The work a task does must be thread-safe. A task's lambda must therefore capture its variables in a way that keeps that work thread-safe.
In this example, the string must remain valid after write_to_string() returns, so it is managed by a std::shared_ptr<>.
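The same lifetime rule applies outside PPL. This sketch assumes C++14 for the init-capture and rebuilds write_to_string() on std::async as a hypothetical write_to_string_sketch(). The first step writes "Value 2", and the second step waits for it and then writes "Value 3". Both lambdas capture the shared_ptr by value, so the string lives as long as the last task that uses it.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Two chained steps that keep working on a string after this function returns.
std::future<std::string> write_to_string_sketch() {
    // Capturing the shared_ptr by value gives each task shared ownership.
    // Capturing a local std::string by reference instead would leave the
    // tasks holding a dangling reference once this function returns.
    auto s = std::make_shared<std::string>("Value 1");
    std::future<void> first = std::async(std::launch::async, [s] { *s = "Value 2"; });
    return std::async(std::launch::async, [s, first = std::move(first)]() {
        first.wait();  // run only after the first step has finished
        *s = "Value 3";
        return *s;
    });
}
```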

concurrency::task<std::array<std::array<int, 10>, 10>> create_identity_matrix([]
{
std::array<std::array<int, 10>, 10> matrix;
int row = 0;
std::for_each( std::begin(matrix), std::end(matrix), [&row](std::array<int, 10>& matrixRow)
{
std::fill( std::begin(matrixRow), std::end(matrixRow), 0);
matrixRow[row] = 1;
row++;
});
return matrix;
});
auto print_matrix = create_identity_matrix.then([](std::array<std::array<int, 10>, 10> matrix)
{
std::for_each( std::begin(matrix), std::end(matrix), [](std::array<int, 10>& matrixRow)
{
std::wstring comma;
std::for_each( std::begin(matrixRow), std::end(matrixRow), [&comma](int n)
{
std::wcout << comma << n;
comma = L", ";
});
std::wcout << std::endl;
});
});
then()

int wmain()
{
…
print_matrix.wait();
}
/* Output:
1, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 1, 0, 0, 0, 0, 0, 0, 0, 0
0, 0, 1, 0, 0, 0, 0, 0, 0, 0
0, 0, 0, 1, 0, 0, 0, 0, 0, 0
0, 0, 0, 0, 1, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 1, 0, 0, 0, 0
0, 0, 0, 0, 0, 0, 1, 0, 0, 0
0, 0, 0, 0, 0, 0, 0, 1, 0, 0
0, 0, 0, 0, 0, 0, 0, 0, 1, 0
0, 0, 0, 0, 0, 0, 0, 0, 0, 1
*/
auto create_identity_matrix = concurrency::create_task([]
{
std::array<std::array<int, 10>, 10> matrix;
int row = 0;
std::for_each( std::begin(matrix), std::end(matrix), [&row](std::array<int, 10>& matrixRow)
{
std::fill( std::begin(matrixRow), std::end(matrixRow), 0);
matrixRow[row] = 1;
row++;
});
return matrix;
});
Rather than spelling out the concurrency::task type explicitly, declare the variable with auto and create the task with create_task().

task continuation
int wmain()
{
auto t = concurrency::create_task([]() -> int
{
return 0;
});
// Create a lambda that increments its input value.
auto increment = [](int n) { return n + 1; };
// Run a chain of continuations and print the result.
int result = t.then(increment).then(increment).then(increment).get();
std::wcout << result << std::endl;
}
/* Output:
3
*/
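Standard C++ has no then() on std::future. The hypothetical helper then_sketch() below, assuming C++14, imitates continuation chaining by starting a new std::async task that waits for the previous result. Each link occupies a thread while it waits, whereas a real continuation is scheduled only once its antecedent completes.

```cpp
#include <cassert>
#include <future>
#include <utility>

// Hypothetical helper (not a standard API): runs f on the result of fut in a new task.
template <typename T, typename F>
auto then_sketch(std::future<T> fut, F f) -> std::future<decltype(f(std::declval<T>()))> {
    return std::async(std::launch::async,
                      [fut = std::move(fut), f = std::move(f)]() mutable { return f(fut.get()); });
}

// Mirrors t.then(increment).then(increment).then(increment).get().
int run_increment_chain() {
    auto t = std::async(std::launch::async, [] { return 0; });
    auto increment = [](int n) { return n + 1; };
    return then_sketch(then_sketch(then_sketch(std::move(t), increment), increment), increment).get();
}
```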

task in task
int wmain()
{
auto t = concurrency::create_task([]()
{
std::wcout << L"Task A" << std::endl;
// Create an inner task that runs before any continuation
// of the outer task.
return concurrency::create_task([]()
{
std::wcout << L"Task B" << std::endl;
});
});
// Run and wait for a continuation of the outer task.
t.then([]()
{
std::wcout << L"Task C" << std::endl;
}).wait();
}
/* Output:
Task A
Task B
Task C
*/
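std::async does not unwrap nested tasks automatically. If the outer task returns a future, the caller receives a future of a future and must wait on the inner one explicitly before running the continuation. Below is a minimal sketch, assuming C++11 or later. run_nested_tasks() is a hypothetical name, and the log vector stands in for the console output.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Portable analogue of "task in task" with manual unwrapping.
std::vector<std::string> run_nested_tasks() {
    std::vector<std::string> log;
    std::mutex m;
    auto record = [&](const char* msg) {  // thread-safe append to the log
        std::lock_guard<std::mutex> lock(m);
        log.push_back(msg);
    };
    std::future<std::future<void>> outer = std::async(std::launch::async, [&] {
        record("Task A");
        return std::async(std::launch::async, [&] { record("Task B"); });
    });
    std::future<void> inner = outer.get();  // the outer task's result is itself a future
    inner.get();                            // wait for the inner task (manual unwrapping)
    record("Task C");                       // the "continuation"
    return log;
}
```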

when_all()
int wmain()
{
// Start multiple tasks.
std::array<concurrency::task<void>, 3> tasks =
{
concurrency::create_task([] { std::wcout << L"Hello from taskA." << std::endl; }),
concurrency::create_task([] { std::wcout << L"Hello from taskB." << std::endl; }),
concurrency::create_task([] { std::wcout << L"Hello from taskC." << std::endl; })
};
auto joinTask = concurrency::when_all( std::begin(tasks), std::end(tasks) );
// Print a message from the joining thread.
std::wcout << L"Hello from the joining thread." << std::endl;
// Wait for the tasks to finish.
joinTask.wait();
}
/* Sample output:
Hello from the joining thread.
Hello from taskA.
Hello from taskC.
Hello from taskB.
*/
For task<T> inputs with a non-void T, when_all returns task<std::vector<T>>; for task<void> inputs, as here, it returns task<void>.

when_all(): getting the return values
int wmain()
{
std::array<concurrency::task<int>, 3> tasks =
{
concurrency::create_task([]() -> int { return 42; }),
concurrency::create_task([]() -> int { return 88; }),
concurrency::create_task([]() -> int { return 99; })
};
auto joinTask = concurrency::when_all( std::begin(tasks), std::end(tasks) ).then([]( std::vector<int> results )
{
std::wcout << L"The sum is "
<< std::accumulate( std::begin(results), std::end(results), 0 )
<< L'.' << std::endl;
});
// Print a message from the joining thread.
std::wcout << L"Hello from the joining thread." << std::endl;
// Wait for the tasks to finish.
joinTask.wait();
}
/* Output:
Hello from the joining thread.
The sum is 229.
*/
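Here is a blocking stand-in for when_all over std::future, assuming C++11 or later. The hypothetical when_all_sketch() collects the results in input order. Unlike PPL's when_all, it does not return a task; it waits on the calling thread. With the inputs 42, 88 and 99 the sum is 229, as in the output above.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Waits for every future and returns the results in input order.
template <typename T>
std::vector<T> when_all_sketch(std::vector<std::future<T>>& futures) {
    std::vector<T> results;
    results.reserve(futures.size());
    for (auto& f : futures) results.push_back(f.get());  // blocks until each one is ready
    return results;
}

int sum_of_three_tasks() {
    std::vector<std::future<int>> tasks;
    tasks.push_back(std::async(std::launch::async, [] { return 42; }));
    tasks.push_back(std::async(std::launch::async, [] { return 88; }));
    tasks.push_back(std::async(std::launch::async, [] { return 99; }));
    std::vector<int> results = when_all_sketch(tasks);
    return std::accumulate(results.begin(), results.end(), 0);
}
```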

when_any()
int wmain()
{
std::array<concurrency::task<int>, 3> tasks = {
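concurrency::create_task([]() -> int { return 88; }),
concurrency::create_task([]() -> int { return 42; }),
concurrency::create_task([]() -> int { return 99; })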
};
// Select the first to finish.
concurrency::when_any( std::begin(tasks), std::end(tasks)).then([]( std::pair<int, size_t> result)
{
std::wcout << "First task to finish returns "
<< result.first
<< L" and has index "
<< result.second
<< L'.' << std::endl;
}).wait();
}
/* Sample output:
First task to finish returns 42 and has index 1.
*/
when_any() returns a task whose result is a std::pair<return value, index> for the first task to complete.
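Here is a rough stand-in for when_any, assuming C++11 or later. The hypothetical when_any_sketch() polls each std::future with a zero timeout and returns the value and index of the first one that is ready. Busy polling is a simplification. The sketch also assumes no future is deferred, because a deferred future never reports ready.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Returns {value, index} of the first future found ready.
template <typename T>
std::pair<T, std::size_t> when_any_sketch(std::vector<std::future<T>>& futures) {
    for (;;) {
        for (std::size_t i = 0; i < futures.size(); ++i) {
            if (futures[i].valid() &&
                futures[i].wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
                return {futures[i].get(), i};
            }
        }
        std::this_thread::yield();  // give the producing threads a chance to run
    }
}
```

In the check below, only the promise at index 1 is fulfilled, so the result is deterministic.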

task group
 Manages a collection of tasks.
– A task group pushes its tasks onto a work-stealing queue.
– Each task in the group is accessed through a concurrency::task_handle.
 structured task group (concurrency::structured_task_group)
– Operations on the group must happen on the same thread.
• The only exceptions are cancel() and is_canceling().
– Do not add tasks after calling wait().
– It does no synchronization between threads, so it has less overhead than task_group.
 unstructured task group (concurrency::task_group)
 concurrency::parallel_invoke() uses structured_task_group.
 Task groups support cancellation.

structured_task_group
int wmain()
{
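// Use the make_task function to define several tasks.
auto task1 = concurrency::make_task([] { /*TODO: Define the task body.*/ });
auto task2 = concurrency::make_task([] { /*TODO: Define the task body.*/ });
auto task3 = concurrency::make_task([] { /*TODO: Define the task body.*/ });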
// Create a structured task group and run the tasks concurrently.
concurrency::structured_task_group tasks;
tasks.run( task1 );
tasks.run( task2 );
tasks.run_and_wait( task3 );
}
make_task only defines a task; it does not run it.
run_and_wait() on a structured task group runs the given task on the calling thread and then waits for all tasks in the group to finish.
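make_task() separates defining work from running it. Standard C++ has a similar split in std::packaged_task, as this minimal sketch shows (C++11 or later; the function name is illustrative). The wrapped work runs only when the packaged_task object is invoked.

```cpp
#include <cassert>
#include <future>

// Shows that creating a packaged_task does not run it; invoking it does.
bool packaged_task_runs_only_when_invoked() {
    bool ran = false;
    std::packaged_task<void()> task([&ran] { ran = true; });
    std::future<void> done = task.get_future();
    bool ran_before = ran;  // still false: defining the task does not run it
    task();                 // run it (here, on the calling thread)
    done.get();             // completes immediately because the task already ran
    return !ran_before && ran;
}
```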

parallel_invoke
template <typename T>
T twice( const T& t )
{
return t + t;
}
int wmain()
{
// Define several values.
int n = 54;
double d = 5.6;
std::wstring s = L"Hello";
// Call the twice function on each value concurrently.
// parallel_invoke uses structured_task_group internally. jintaeks on 20141107
concurrency::parallel_invoke(
[&n] { n = twice(n); },
[&d] { d = twice(d); },
[&s] { s = twice(s); }
);
// Print the values to the console.
std::wcout << n << L' ' << d << L' ' << s << std::endl;
}
/** Output
108 11.2 HelloHello
*/
parallel_invoke() uses structured_task_group internally.
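Here is a portable sketch of the same fork-join shape, assuming C++11 or later. The hypothetical parallel_invoke_sketch() launches each callable (each must return void) as its own std::async task and returns only after all of them finish. Unlike concurrency::parallel_invoke, it does not use a task group or a shared scheduler; it simply creates one task per callable.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Runs every callable concurrently and waits for all of them.
template <typename... Fs>
void parallel_invoke_sketch(Fs&&... fs) {
    std::vector<std::future<void>> futures;
    // Pack-expansion trick that works without C++17 fold expressions.
    int expand[] = {0, (futures.push_back(std::async(std::launch::async, std::forward<Fs>(fs))), 0)...};
    (void)expand;
    for (auto& f : futures) f.get();  // wait for all; rethrows an exception stored by a task
}

template <typename T>
T twice(const T& t) { return t + t; }
```

Each lambda in the check writes a different variable, so the tasks need no further synchronization.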

unstructured task_group
int wmain()
{
// A task_group object that can be used from multiple threads.
concurrency::task_group tasks;
// Concurrently add several tasks to the task_group object.
concurrency::parallel_invoke(
[&] {
// Add a few tasks to the task_group object.
tasks.run([] { print_message(L"Hello"); });
tasks.run([] { print_message(42); });
},
[&] {
// Add one additional task to the task_group object.
tasks.run([] { print_message(3.14); });
}
);
// Wait for all tasks to finish.
tasks.wait();
}
/** Output:
Message from task: Hello
Message from task: 3.14
Message from task: 42
*/
An unstructured task group (task_group) can be accessed from different threads, so several threads can add and manage tasks in the same group.
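Here is a much simplified stand-in for an unstructured task group, assuming C++11 or later. The hypothetical simple_task_group guards its list of futures with a mutex so that run() may be called from several threads. wait() waits only for the tasks added before it was called, so this sketch assumes no run() call races with wait().

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Minimal task group whose run() is safe to call from multiple threads.
class simple_task_group {
public:
    void run(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(m_);
        futures_.push_back(std::async(std::launch::async, std::move(work)));
    }
    // Waits for every task added so far.
    void wait() {
        std::vector<std::future<void>> pending;
        {
            std::lock_guard<std::mutex> lock(m_);
            pending.swap(futures_);  // take ownership without holding the lock while waiting
        }
        for (auto& f : pending) f.get();
    }
private:
    std::mutex m_;
    std::vector<std::future<void>> futures_;
};
```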

Cancellation
concurrency::cancellation_token_source cts;
auto token = cts.get_token();
std::wcout << L"Creating task..." << std::endl;
// Create a task that performs work until it is canceled.
auto t = concurrency::create_task( []
{
bool moreToDo = true;
while( moreToDo )
{
// Check for cancellation.
if( concurrency::is_task_cancellation_requested() )
{
// TODO: Perform any necessary cleanup here...
// Cancel the current task.
concurrency::cancel_current_task();
}
else
{
// Perform work.
moreToDo = do_work();
}
}
}, token );

// Wait for one second and then cancel the task.
concurrency::wait( 1000 );
std::wcout << L"Canceling task..." << std::endl;
cts.cancel();
// Wait for the task to cancel.
std::wcout << L"Waiting for task to complete..." << std::endl;
t.wait();
std::wcout << L"Done." << std::endl;
/* Sample output:
Creating task...
Performing work...
Performing work...
Performing work...
Performing work...
Canceling task...
Waiting for task to complete...
Done.
*/
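Cancellation in PPL is cooperative: the task itself checks for a request and stops. Below is the same pattern in standard C++, as a minimal sketch assuming C++11 or later. A std::atomic<bool> plays the role of the cancellation token, and the loop body stands in for do_work(). run_until_canceled() is a hypothetical name. Unlike cancel_current_task(), the task here simply returns, so it ends in the normal completed state.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

// Starts a task that works until cancellation is requested, then cancels it.
int run_until_canceled() {
    std::atomic<bool> cancel_requested(false);
    std::atomic<int> iterations(0);
    auto task = std::async(std::launch::async, [&] {
        while (!cancel_requested.load()) {  // check for cancellation
            ++iterations;                   // stands in for do_work()
            std::this_thread::yield();
        }
        // Any cleanup would go here before returning.
    });
    while (iterations.load() < 3) std::this_thread::yield();  // let some work happen
    cancel_requested.store(true);  // like cts.cancel()
    task.wait();                   // like t.wait()
    return iterations.load();
}
```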

Cancellation callback
concurrency::cancellation_token_source cts;
auto token = cts.get_token();
// An event that is set in the cancellation callback.
concurrency::event e;
concurrency::cancellation_token_registration cookie;
cookie = token.register_callback( [&e, token, &cookie]()
{
std::wcout << L"In cancellation callback..." << std::endl;
e.set();
// Although not required, demonstrate how to unregister
// the callback.
token.deregister_callback(cookie);
} );

std::wcout << L"Creating task..." << std::endl;
// Create a task that waits to be canceled.
auto t = concurrency::create_task([&e]
{
e.wait();
}, token );
// Cancel the task.
std::wcout << L"Canceling task..." << std::endl;
cts.cancel();
// Wait for the task to cancel.
t.wait();
std::wcout << L"Done." << std::endl;
/* Sample output:
Creating task...
Canceling task...
In cancellation callback...
Done.
*/

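Task tree
// Create a task group that serves as the root of the tree.
concurrency::structured_task_group tg1;
// Create a task that contains a nested task group.
auto t1 = concurrency::make_task([&] {
std::wcout << L"t1 task" << std::endl;
concurrency::structured_task_group tg2;
// Create a child task.
auto t4 = concurrency::make_task([&] {
// TODO: Perform work here.
});
// Create a child task.
auto t5 = concurrency::make_task([&] {
// TODO: Perform work here.
});
// Run the child tasks and wait for them to finish.
tg2.run(t4);
tg2.run(t5);
tg2.wait();
});
// Create a child task.
auto t2 = concurrency::make_task([&] {
// TODO: Perform work here.
});
// Create a child task.
auto t3 = concurrency::make_task([&] {
// TODO: Perform work here.
});
// Run the child tasks and wait for them to finish.
tg1.run(t1);
tg1.run(t2);
tg1.run(t3);
tg1.wait();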

Bitonic Merge Sort
 http://msdn.microsoft.com/en-us/library/vstudio/dd728066(v=vs.110).aspx

Currently applied to Kog's game: FrameMove
The OnFrameMove() of an Elsword NPC is about 2,700 lines long -_-;


Visual Studio 2012 Parallel Debugger

Parallel Task Window
 A breakpoint in t1 of tg1 has been hit.

Parallel Task Window: task_group
 Two of tg1's three tasks are in the “Active” state.

Parallel Callstack Window
 The two “Active” tasks are running on two threads.

Parallel Callstack Window: Task
 You can inspect the call stacks task by task.

Parallel Watch
 Shows the value of the same variable in each thread.

References
us/library/dd492418.aspx
 http://www.danielmoth.com/Blog/Parallel-Tasks-New-Visual-Studio-2010-Debugger-Window.aspx
 http://channel9.msdn.com/Events/Windows-Camp/Developing-Windows-8-Metro-style-apps-in-Cpp/Async-made-simple-with-Cpp-PPL
 http://en.wikipedia.org/wiki/Bitonic_sorter
us/library/dd554943.aspx
