
Experiments with C++11

I experimented with several features of C++11: lambdas, range-based for, std::thread, rvalue references, etc. The context is a simple geometry processing/graphics example (about 10 lines of code) in which I improve a mesh smoothing function. The old version took over 14 seconds; after C++11-ification it takes 2.5 seconds.


1. C++11: a subset of the features explored

2. What is happening?
• We want
  • the performance of carefully optimized code
  • the convenience of a high-level language
  • to use all our cores
3. Example: Laplacian smoothing. Each vertex moves to the center of its neighbors.
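Written as formulas (our notation), one smoothing step for an interior vertex $v$ with position $p_v$ and neighbors $N(v)$ reads as follows, where $w$ is the `weight` parameter of the `laplacian_smooth` functions:

$$
L(v) = \frac{1}{|N(v)|}\sum_{u \in N(v)} p_u \;-\; p_v,
\qquad
p_v \leftarrow p_v + w\,L(v).
$$

With $w = 1$ the vertex lands exactly on the centroid of its neighbors; smaller weights move it part of the way.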
4. Before [figure: the mesh before smoothing]

5. After 1k iterations [figure: the mesh after 1,000 smoothing iterations]
6. Original: 14.6 s

```cpp
void laplacian_smooth0(Manifold& m, float weight, int max_iter)
{
    for(int iter = 0; iter < max_iter; ++iter) {
        // Pass 1: compute the Laplacian of every interior vertex.
        VertexAttributeVector<Vec3d> L_attr(m.no_vertices());
        for(VertexIDIterator v = m.vertices_begin(); v != m.vertices_end(); ++v)
            if(!boundary(m, *v))
                L_attr[*v] = laplacian(m, *v);
        // Pass 2: move each interior vertex along its Laplacian.
        for(VertexIDIterator v = m.vertices_begin(); v != m.vertices_end(); ++v)
            if(!boundary(m, *v))
                m.pos(*v) += weight*L_attr[*v];
    }
}
```

It is so C++98.
7. Range for: 14.2 s

```cpp
void laplacian_smooth1(Manifold& m, float weight, int max_iter)
{
    for(int iter = 0; iter < max_iter; ++iter) {
        VertexAttributeVector<Vec3d> L_attr(m.no_vertices());
        for(VertexID v : m.vertices())
            if(!boundary(m, v))
                L_attr[v] = laplacian(m, v);
        for(VertexID v : m.vertices()) {
            if(!boundary(m, v))
                m.pos(v) += weight*L_attr[v];
        }
    }
}
```

Much better to read. Not only is the for loop clear, we also did away with the `*` dereferences. `vertices()` returns a class which just contains `begin()` and `end()` functions.
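A minimal sketch of what such a class can look like, assuming the vertex IDs live in a `std::vector<int>`. GEL's actual types differ; `IteratorRange` and `ToyMesh` are hypothetical names. Range-for needs nothing beyond `begin()` and `end()`:

```cpp
#include <utility>
#include <vector>

// A lightweight view that just holds two iterators. Range-for needs nothing
// more than begin() and end().
template <typename It>
class IteratorRange {
    It first, last;
public:
    IteratorRange(It f, It l) : first(f), last(l) {}
    It begin() const { return first; }
    It end() const { return last; }
};

// Hypothetical mesh that keeps its vertex IDs in a std::vector<int>.
class ToyMesh {
    std::vector<int> vertex_ids;
public:
    explicit ToyMesh(std::vector<int> ids) : vertex_ids(std::move(ids)) {}
    // Returns a view over the IDs; no copy of the vector is made.
    IteratorRange<std::vector<int>::const_iterator> vertices() const {
        return IteratorRange<std::vector<int>::const_iterator>(vertex_ids.begin(),
                                                               vertex_ids.end());
    }
};
```

Usage: `for (int v : mesh.vertices()) { ... }` visits the stored IDs in order.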
8. Optimized: 12.4 s

```cpp
void laplacian_smooth2(Manifold& m, float weight, int max_iter)
{
    auto new_pos = m.positions_attribute_vector();
    for(int iter = 0; iter < max_iter; ++iter) {
        // Write the new positions into a separate buffer ...
        for(auto v : m.vertices())
            if(!boundary(m, v))
                new_pos[v] = weight*laplacian(m, v) + m.pos(v);
        // ... then copy the whole buffer back in one assignment.
        m.positions_attribute_vector() = new_pos;
    }
}
```

And we only need one loop. We can copy the vertex positions back in one block.
9. move: 12.6 s

```cpp
void laplacian_smooth3(Manifold& m, float weight, int max_iter)
{
    for(int iter = 0; iter < max_iter; ++iter) {
        auto new_pos = m.positions_attribute_vector();   // full copy every iteration
        for(auto v : m.vertices())
            if(!boundary(m, v))
                new_pos[v] = weight*laplacian(m, v) + m.pos(v);
        m.positions_attribute_vector() = move(new_pos);  // moved back, not copied
    }
}
```

Actually, we should move, but ... oh ... now I copy somewhere else: `new_pos` is copied from the current positions at the start of every iteration.
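Why moving is cheap: move-constructing a `std::vector` (and, with the default allocator, move-assigning one) hands over the heap buffer instead of copying the elements. A small standalone check with `std::vector<double>`; we assume, but have not verified, that GEL's `VertexAttributeVector` is backed by a similar container:

```cpp
#include <utility>
#include <vector>

// Move construction hands the heap buffer over: no elements are copied,
// and the source is left empty.
bool move_reuses_buffer() {
    std::vector<double> a(1000, 1.0);
    const double* before = a.data();
    std::vector<double> b(std::move(a));
    return b.data() == before && a.empty();
}

// Copy construction allocates a new buffer and copies all 1000 elements.
bool copy_allocates_new_buffer() {
    std::vector<double> a(1000, 1.0);
    std::vector<double> b(a);
    return b.data() != a.data() && b == a;
}
```

In `laplacian_smooth3` the `move` at the end of the loop is the cheap case; the `auto new_pos = ...` at the top of the loop is the expensive one.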
10. swap: 12.1 s

```cpp
void laplacian_smooth4(Manifold& m, float weight, int max_iter)
{
    auto new_pos = m.positions_attribute_vector();
    for(int iter = 0; iter < max_iter; ++iter) {
        // Read from the current positions, write to new_pos ...
        for(auto v : m.vertices())
            if(!boundary(m, v))
                new_pos[v] = weight*laplacian(m, v) + m.pos(v);
        // ... then exchange the two buffers.
        swap(m.positions_attribute_vector(), new_pos);
    }
}
```

Now we only have two buffers for vertex positions, and we always read from one and write to the other. Then swap! I think this version is the sweet spot for single-threaded code.
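The same read-from-one, write-to-the-other pattern in self-contained form. As a stand-in for a mesh we smooth scalar values on a closed ring, where vertex `i` has neighbors `i-1` and `i+1`; `smooth_ring` is a hypothetical name. `std::swap` on two vectors exchanges their buffers in constant time:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Laplacian smoothing of values on a closed ring (vertex i has neighbors
// i-1 and i+1, modulo n), using two buffers that are swapped each iteration.
std::vector<double> smooth_ring(std::vector<double> pos, double weight, int max_iter) {
    const std::size_t n = pos.size();
    std::vector<double> new_pos(n);
    for (int iter = 0; iter < max_iter; ++iter) {
        for (std::size_t i = 0; i < n; ++i) {
            double avg = 0.5 * (pos[(i + n - 1) % n] + pos[(i + 1) % n]);
            new_pos[i] = pos[i] + weight * (avg - pos[i]);  // p += w * L(p)
        }
        std::swap(pos, new_pos);  // exchanges buffers, copies no elements
    }
    return pos;
}
```

For example, `smooth_ring({0, 4, 0, 4}, 0.5, 1)` gives `{2, 2, 2, 2}`: each value moves halfway to the average of its two neighbors.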
11. Lambda variation

```cpp
void laplacian_smooth4_5(Manifold& m, float weight, int max_iter)
{
    auto new_pos = m.positions_attribute_vector();
    for(int iter = 0; iter < max_iter; ++iter) {
        for_each_vertex(m, [&](VertexID v) {
            // for_each_vertex visits all vertices, so keep the boundary test
            // from the previous versions.
            if(!boundary(m, v))
                new_pos[v] = weight*laplacian(m, v) + m.pos(v);
        });
        swap(m.positions_attribute_vector(), new_pos);
    }
}
```

Not much clearer. Should be about the same performance...
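A sketch of how a helper in the spirit of `for_each_vertex` can accept the lambda. `for_each_id` is a hypothetical stand-in that visits the IDs `0` to `n-1` rather than mesh vertices. Taking the callable as a template parameter, rather than as `std::function`, lets the compiler see the lambda's type and typically inline the call, which is one reason to expect about the same performance as the plain loop:

```cpp
#include <cstddef>

// Calls f(id) for every id in [0, n). F is deduced as the lambda's own type,
// so there is no type erasure and the call can usually be inlined.
template <typename F>
void for_each_id(std::size_t n, F&& f) {
    for (std::size_t id = 0; id < n; ++id)
        f(id);
}
```

A lambda capturing by reference (`[&]`) can then write into buffers declared outside the loop, exactly as `new_pos` is written above.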
12. Threads done wrong: ∞

```cpp
void laplacian_smooth5(Manifold& m, float weight, int max_iter)
{
    for(int iter = 0; iter < max_iter; ++iter) {
        auto new_pos = m.positions_attribute_vector();
        vector<thread> t_vec;
        // One thread per interior vertex per iteration: creating a thread
        // costs far more than the work each thread does.
        for(auto v : m.vertices())
            if(!boundary(m, v))
                t_vec.push_back(thread([&](VertexID vid) {
                    if(!boundary(m, vid))
                        new_pos[vid] = weight*laplacian(m, vid) + m.pos(vid);
                }, v));
        for(size_t i = 0; i < t_vec.size(); ++i)
            t_vec[i].join();
        m.positions_attribute_vector() = move(new_pos);
    }
}
```

For a brief moment I must have thought I was coding for a GPU. The first time I timed it, I got a 666 times longer runtime.
13. Batched threads: 2.5 s (CORES = 8)

```cpp
// Smooth one batch of vertices; run by one worker thread.
inline void laplacian_smooth_vertex(Manifold& m, vector<VertexID>& vids,
                                    VertexAttributeVector<Vec3d>& new_pos, float weight)
{
    for(auto v : vids)
        new_pos[v] = m.pos(v) + weight*laplacian(m, v);
}

void laplacian_smooth6(Manifold& m, float weight, int max_iter)
{
    // Split the interior vertices into CORES batches, one per thread.
    vector<vector<VertexID>> vertex_ids(CORES);
    auto batch_size = max<size_t>(1, m.no_vertices()/CORES);  // avoid division by zero on tiny meshes
    int cnt = 0;
    for_each_vertex(m, [&](VertexID v) {
        if(!boundary(m, v))
            vertex_ids[(cnt++/batch_size)%CORES].push_back(v);
    });
    vector<thread> t_vec(CORES);
    VertexAttributeVector<Vec3d> new_pos = m.positions_attribute_vector();
    for(int iter = 0; iter < max_iter; ++iter) {
        for(int thread_no = 0; thread_no < CORES; ++thread_no)
            t_vec[thread_no] = thread(laplacian_smooth_vertex, ref(m),
                                      ref(vertex_ids[thread_no]), ref(new_pos), weight);
        for(int thread_no = 0; thread_no < CORES; ++thread_no)
            t_vec[thread_no].join();
        swap(m.positions_attribute_vector(), new_pos);
    }
}
```

Almost five times the performance of the single-threaded swap version (12.1 s) with four physical cores. Hyperthreading works!!
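The batching idea, sketched without GEL on the same kind of hypothetical ring stand-in. `make_batches` splits the indices into nearly equal contiguous batches (the index formula never divides by zero), and `smooth_step_parallel` runs one `std::thread` per batch. Both names are ours. Threads only read `pos` and each writes its own entries of `new_pos`, so no locking is needed. The batch count could come from `std::thread::hardware_concurrency()`, which may return 0 when the value is unknown:

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Batches = std::vector<std::vector<std::size_t>>;

// Split the indices [0, n) into no_batches contiguous batches of nearly equal size.
Batches make_batches(std::size_t n, std::size_t no_batches) {
    if (no_batches == 0) no_batches = 1;           // always at least one batch
    Batches batches(no_batches);
    for (std::size_t i = 0; i < n; ++i)
        batches[i * no_batches / n].push_back(i);  // index is in [0, no_batches)
    return batches;
}

// One Laplacian smoothing step on a ring, one std::thread per batch.
// Precondition: new_pos.size() == pos.size().
void smooth_step_parallel(const std::vector<double>& pos, std::vector<double>& new_pos,
                          double weight, const Batches& batches) {
    const std::size_t n = pos.size();
    auto work = [&](const std::vector<std::size_t>& ids) {
        for (std::size_t i : ids) {
            double avg = 0.5 * (pos[(i + n - 1) % n] + pos[(i + 1) % n]);
            new_pos[i] = pos[i] + weight * (avg - pos[i]);
        }
    };
    std::vector<std::thread> threads;
    for (const auto& b : batches)
        threads.emplace_back(work, std::cref(b));  // pass the batch by reference
    for (auto& t : threads)
        t.join();                                  // wait for all batches
}
```

As in `laplacian_smooth6`, the threads are spawned once per iteration and the two position buffers are swapped after the joins.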
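14. Statistics (times in seconds)

| Version | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Median |
|---|---|---|---|---|---|---|
| Baseline | 14.6 | 14.6 | 14.5 | 14.6 | 14.6 | 14.6 |
| Range for | 14.4 | 14.2 | 14.2 | 14.2 | 14.2 | 14.2 |
| Copy back | 12.4 | 12.4 | 12.4 | 12.4 | 12.4 | 12.4 |
| Move back | 12.5 | 12.5 | 12.9 | 12.9 | 12.6 | 12.6 |
| Swap | 12.1 | 12.1 | 12.1 | 12.1 | 12.2 | 12.1 |
| 2 threads | 6.8 | 6.7 | 6.8 | 6.7 | 6.7 | 6.7 |
| 4 threads | 4.1 | 4.1 | 4.1 | 4.1 | 4.1 | 4.1 |
| 8 threads | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 |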
15. Now make it generic!
16. Generic helpers

```cpp
typedef vector<vector<VertexID>> VertexIDBatches;

// #1 Produces a vector of vectors of vertex IDs
VertexIDBatches batch_vertices(Manifold& m)
{
    VertexIDBatches vertex_ids(CORES);
    auto batch_size = max<size_t>(1, m.no_vertices()/CORES);  // avoid division by zero on tiny meshes
    int cnt = 0;
    for_each_vertex(m, [&](VertexID v) {
        if(!boundary(m, v))
            vertex_ids[(cnt++/batch_size)%CORES].push_back(v);
    });
    return vertex_ids;
}

// #2 Actually spawns off worker threads
template<typename T>
void for_each_vertex_parallel(int no_threads, const VertexIDBatches& batches, T& f)
{
    vector<thread> t_vec(no_threads);
    for(auto t : range(0, no_threads))
        t_vec[t] = thread(f, ref(batches[t]));
    for(auto t : range(0, no_threads))
        t_vec[t].join();
}
```
17. 2.4 s: slightly faster, much simpler

```cpp
void laplacian_smooth7(Manifold& m, float weight, int max_iter)
{
    auto vertex_ids = batch_vertices(m);
    auto new_pos = m.positions_attribute_vector();
    // Work for one thread: smooth one batch of vertices.
    auto f = [&](const vector<VertexID>& vids) {
        for(VertexID v : vids)
            new_pos[v] = m.pos(v) + weight*laplacian(m, v);
    };
    for(auto _ : range(0, max_iter)) {
        for_each_vertex_parallel(CORES, vertex_ids, f);
        swap(m.positions_attribute_vector(), new_pos);
    }
}
```

Note that I threw in a range class to get rid of old-school for loops.
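The slides do not show the `range` class itself. A minimal sketch of one way to write it for `int` bounds (our own version, not necessarily the one in GEL); when the end is not greater than the start, the range is empty:

```cpp
// Hypothetical integer range: range(a, b) yields a, a+1, ..., b-1.
class range {
public:
    class iterator {
        int i;
    public:
        explicit iterator(int i_) : i(i_) {}
        int operator*() const { return i; }
        iterator& operator++() { ++i; return *this; }
        bool operator!=(const iterator& other) const { return i != other.i; }
    };
    // Clamp last to first so that range(5, 2) is empty instead of endless.
    range(int first, int last) : first_(first), last_(last < first ? first : last) {}
    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_); }
private:
    int first_, last_;
};
```

With it, `for(auto t : range(0, no_threads))` visits `0, 1, ..., no_threads-1`.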
18. std::async versus explicit threads

```cpp
// With std::async:
template<typename T>
void for_each_vertex_parallel(int no_threads, const VertexIDBatches& batches, T& f)
{
    vector<future<void>> f_vec(no_threads);
    for(auto t : range(0, no_threads))
        f_vec[t] = async(launch::async, f, ref(batches[t]));
}   // destroying f_vec waits for every task

// With explicit threads (as before):
template<typename T>
void for_each_vertex_parallel(int no_threads, const VertexIDBatches& batches, T& f)
{
    vector<thread> t_vec(no_threads);
    for(auto t : range(0, no_threads))
        t_vec[t] = thread(f, ref(batches[t]));
    for(auto t : range(0, no_threads))
        t_vec[t].join();
}
```

The async code is simpler, and the destructor joins! What happens if we ignore the future?! But the async code takes 50% more time than the old code where I join threads explicitly. Not sure why?!
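On the question of ignoring the future: if the future returned by `std::async` is discarded, its destructor runs at the end of the statement and blocks until the task has finished, so the batches would run one after another. A sketch that keeps the futures and waits on them explicitly; `for_each_batch_async` is a hypothetical name, and `f` must return `void`:

```cpp
#include <functional>
#include <future>
#include <vector>

// Run f once per batch with std::async(std::launch::async, ...), then wait.
// Storing every future first lets all tasks run concurrently before we block.
template <typename Batch, typename F>
void for_each_batch_async(const std::vector<Batch>& batches, F& f) {
    std::vector<std::future<void>> futures;
    futures.reserve(batches.size());
    for (const auto& b : batches)
        futures.push_back(std::async(std::launch::async, f, std::cref(b)));
    for (auto& fut : futures)
        fut.get();  // blocks until the task is done; rethrows its exception, if any
}
```

Calling `get()` also surfaces exceptions thrown inside a task, which the implicit wait in the destructor does not.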
19. More C++11 examples
20. Polymorphism with std::function

```cpp
#include <functional>
#include <iostream>
using namespace std;

class MyClass {
    int c;
public:
    MyClass(int _c): c(_c) {}
    function<int(int)> fun;
    // Bind c as the first argument of f (bind1st is deprecated since C++11).
    void set_fun(function<int(int,int)> f) { fun = bind1st(f, c); }
};

int fun1(int c, int x) { return c*x; }
int fun2(int c, int x) { return x/c; }

int main(int argc, const char * argv[]) {
    MyClass m1{1}, m2{2};
    m1.set_fun(fun1);
    m2.set_fun(fun2);
    cout << m1.fun(42) << " " << m2.fun(42) << endl;  // prints "42 21"
}
```

Maybe more exotic than actually useful, but instructive: polymorphism can be achieved very differently from virtual functions.
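Since `bind1st` was removed in C++17, here is a sketch of the same class with the binding done by a lambda. `Scaled` is a hypothetical name; C++11 lambdas cannot init-capture, so `c` is copied to a local first. `std::bind(f, c, std::placeholders::_1)` would also work:

```cpp
#include <functional>

// Same idea as MyClass, but the first argument is bound with a lambda.
class Scaled {
    int c;
public:
    explicit Scaled(int c_) : c(c_) {}
    std::function<int(int)> fun;
    void set_fun(std::function<int(int, int)> f) {
        int bound = c;  // copy c so the lambda does not depend on *this
        fun = [f, bound](int x) { return f(bound, x); };
    }
};
```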
21. Kinder, gentler member init

```cpp
class VisObj {
    std::string file;
    GLGraphics::GLViewController view_ctrl;
    bool create_display_list;
    HMesh::Manifold mani;
    HMesh::Manifold old_mani;

    Harmonics* harmonics;
    GLGraphics::ManifoldRenderer* renderer;
    CGLA::Vec3d bsphere_center;
    float bsphere_radius;
public:
    VisObj(): file(""), view_ctrl(WINX, WINY, CGLA::Vec3f(0), 1.0),
              create_display_list(true), harmonics(0) {}
    // ... and so on
};
```

We never really liked these long initialization lists and always wondered why we could not just initialize members where we declare them.
22. Kinder, gentler member init

```cpp
class VisObj {
    std::string file = "";
    GLGraphics::GLViewController view_ctrl =
        GLGraphics::GLViewController(WINX, WINY, CGLA::Vec3f(0), 1.0);
    bool create_display_list = true;
    HMesh::Manifold mani;
    HMesh::Manifold old_mani;

    Harmonics* harmonics = nullptr;
    GLGraphics::ManifoldRenderer* renderer = nullptr;
    CGLA::Vec3d bsphere_center;
    float bsphere_radius;
public:
    VisObj() {}
    // and so on
};
```

Now we can! What is up with nullptr?!
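A small sketch of both points. Default member initializers need no constructor list, and `nullptr` has its own type, `std::nullptr_t`, which converts to any pointer type but not to `int`, so overload resolution cannot mistake it for the integer `0`. `Widget` and `which` are hypothetical names:

```cpp
#include <string>

// Default member initializers (C++11): the values are given at the point of
// declaration, so the implicit default constructor needs no initializer list.
struct Widget {
    std::string file = "";
    bool create_display_list = true;
    float radius = 1.0f;
    int* data = nullptr;
};

// nullptr vs 0: the literal 0 is an int and picks the int overload, while
// nullptr (type std::nullptr_t) only converts to pointer types.
inline const char* which(int) { return "int"; }
inline const char* which(char*) { return "pointer"; }
```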
23. ArithVec changes

```cpp
template <class T, class V, unsigned int N>
class ArithVec
{
protected:
    /// The actual contents of the vector.
    std::array<T,N> data;
    // ...
};
```

Look, I did away with C-style arrays.
24. ArithVec constructor, before and after

```cpp
// Before:
ArithVec::ArithVec(T _a, T _b, T _c, T _d)
{
    assert(N==4);
    data[0] = _a;
    data[1] = _b;
    data[2] = _c;
    data[3] = _d;
}

// After:
ArithVec::ArithVec(T _a, T _b, T _c, T _d): data({_a,_b,_c,_d})
{
    assert(N==4);
}
```

Look! An initializer list ... hmm, MSVC does not like it.
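One alternative that may be more portable, although we have not checked it against the MSVC version on the slide: brace-initialize the `std::array` member directly. `std::array` is an aggregate wrapping a built-in array, so the fully braced form uses two pairs of braces. `Vec4` is a hypothetical, trimmed-down class:

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-vector: the std::array member is brace-initialized directly in
// the constructor's member initializer list (outer braces: std::array, inner
// braces: its built-in array).
template <class T>
class Vec4 {
    std::array<T, 4> data;
public:
    Vec4(T a, T b, T c, T d) : data{{a, b, c, d}} {}
    T operator[](std::size_t i) const { return data[i]; }
};
```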
25. operator*=, before and after (std::for_each)

```cpp
// Before (std::bind2nd is deprecated since C++11):
/// Assignment multiplication with scalar.
const V& ArithVec::operator *=(T k)
{
    std::transform(data, &data[N], data, std::bind2nd(std::multiplies<T>(), k));
    return static_cast<const V&>(*this);
}

// After:
/// Assignment multiplication with scalar.
const V& ArithVec::operator *=(T k)
{
    std::for_each(begin(), end(), [k](T& x){ x *= k; });
    return static_cast<const V&>(*this);
}
```

Note: begin() and end() make the code nicer than before.
26. operator*=, before and after (range for)

```cpp
// Before:
/// Assignment multiplication with scalar.
const V& ArithVec::operator *=(T k)
{
    std::transform(data, &data[N], data, std::bind2nd(std::multiplies<T>(), k));
    return static_cast<const V&>(*this);
}

// After:
/// Assignment multiplication with scalar.
const V& ArithVec::operator *=(T k)
{
    for(auto& x : data) { x *= k; }
    return static_cast<const V&>(*this);
}
```

Morten: this is actually simpler!
27. operator==

```cpp
// New:
bool ArithVec::operator==(const V& v) const
{
    return std::equal(begin(), end(), v.begin());
}

// Old:
bool ArithVec::operator==(const V& v) const
{
    return std::inner_product(data, &data[N], v.get(), true,
                              std::logical_and<bool>(), std::equal_to<T>());
}
```

Just using the obvious. This was possible before C++11, too.
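A compact, compilable sketch that puts these pieces together: `std::array` storage, `begin()`/`end()`, `*=` with range-for and `==` with `std::equal`, in the same CRTP shape as `ArithVec` (`V` is the derived type, so operators can return it without virtual functions). `ArithVecSketch` and `Vec3Sketch` are hypothetical names, not GEL's classes:

```cpp
#include <algorithm>
#include <array>

template <class T, class V, unsigned int N>
class ArithVecSketch {
protected:
    std::array<T, N> data{};  // value-initialized: all zeros
public:
    typename std::array<T, N>::iterator begin() { return data.begin(); }
    typename std::array<T, N>::iterator end() { return data.end(); }
    typename std::array<T, N>::const_iterator begin() const { return data.begin(); }
    typename std::array<T, N>::const_iterator end() const { return data.end(); }

    /// Assignment multiplication with scalar, written with range-for.
    const V& operator*=(T k) {
        for (auto& x : data) x *= k;
        return static_cast<const V&>(*this);
    }
    /// Element-wise equality via std::equal over begin()/end().
    bool operator==(const V& v) const {
        return std::equal(begin(), end(), v.begin());
    }
};

// A concrete 3-vector deriving from the sketch (CRTP).
class Vec3Sketch : public ArithVecSketch<double, Vec3Sketch, 3> {
public:
    Vec3Sketch(double a, double b, double c) { data = {{a, b, c}}; }
};
```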
28. Circulate with functors

```cpp
// Calls f on the walker at each step of a full counterclockwise circulation
// around v; returns the number of steps taken.
inline int circulate_vertex_ccw(const Manifold& m, VertexID v,
                                std::function<void(Walker&)> f)
{
    Walker w = m.walker(v);
    for(; !w.full_circle(); w = w.circulate_vertex_ccw())
        f(w);
    return w.no_steps();
}

// Overload for functors that only need the neighboring vertex.
inline int circulate_vertex_ccw(const Manifold& m, VertexID v,
                                std::function<void(VertexID)> f)
{
    return circulate_vertex_ccw(m, v, [&](Walker& w){ f(w.vertex()); });
}
```

Five slides that show what we can do by having circulator functions accepting functors.
29. valency

```cpp
// Before:
int valency(const Manifold& m, VertexID v)
{
    // perform full circulation to get valency
    Walker vj = m.walker(v);
    while(!vj.full_circle())
        vj = vj.circulate_vertex_cw();
    return vj.no_steps();
}

// With a functor: circulate with a do-nothing lambda and keep the step count.
int valency(const Manifold& m, VertexID v)
{
    return circulate_vertex_ccw(m, v, [](Walker){});
}
```
30. connected

```cpp
// Before:
bool connected(const Manifold& m, VertexID v0, VertexID v1)
{
    for(Walker vj = m.walker(v0); !vj.full_circle(); vj = vj.circulate_vertex_cw()) {
        if(vj.vertex() == v1)
            return true;
    }
    return false;
}

// With a functor: the lambda captures the result by reference.
bool connected(const Manifold& m, VertexID v0, VertexID v1)
{
    bool c = false;
    circulate_vertex_ccw(m, v0, [&](VertexID v){ c |= (v == v1); });
    return c;
}
```
31. laplacian

```cpp
// With a functor: sum the neighbor positions while counting the steps.
inline Vec3d laplacian(const Manifold& m, VertexID v)
{
    Vec3d p(0);
    int n = circulate_vertex_ccw(m, v, [&](VertexID vn){ p += m.pos(vn); });
    return p / n - m.pos(v);
}

// Before:
Vec3d laplacian(const Manifold& m, VertexID v)
{
    Vec3d avg_pos(0);
    int n = 0;
    for(Walker w = m.walker(v); !w.full_circle(); w = w.circulate_vertex_cw()) {
        avg_pos += m.pos(w.vertex());
        ++n;
    }
    return avg_pos / n - m.pos(v);
}
```
32. no_edges

```cpp
// With a functor:
int no_edges(const Manifold& m, FaceID f)
{
    return circulate_face_ccw(m, f, [](Walker w){});
}

// Before:
int no_edges(const Manifold& m, FaceID f)
{
    // perform full circulation to count the edges of the face
    Walker w = m.walker(f);
    for(; !w.full_circle(); w = w.circulate_face_cw());
    return w.no_steps();
}
```
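The same functor-based pattern without GEL: a hypothetical adjacency-list stand-in for the mesh (`Adjacency`, `circulate_vertex`), with `valency` and `connected` written the way the slides write them. GEL's Walker-based circulation is not reproduced here:

```cpp
#include <functional>
#include <vector>

// For each vertex, its neighbors in circulation order.
using Adjacency = std::vector<std::vector<int>>;

// Calls f on every neighbor of v and returns the number of steps taken,
// mirroring the shape of circulate_vertex_ccw.
inline int circulate_vertex(const Adjacency& adj, int v, std::function<void(int)> f) {
    int steps = 0;
    for (int n : adj[v]) {
        f(n);
        ++steps;
    }
    return steps;
}

// Valency: circulate with a do-nothing functor and keep the step count.
inline int valency(const Adjacency& adj, int v) {
    return circulate_vertex(adj, v, [](int) {});
}

// Connectivity: the lambda captures the result by reference.
inline bool connected(const Adjacency& adj, int v0, int v1) {
    bool c = false;
    circulate_vertex(adj, v0, [&](int v) { c |= (v == v1); });
    return c;
}
```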
33. Conclusions
• Multicore is very important, and the C++11 thread library makes concurrency easy. We will rely on the compiler for SIMD optimization!
• Range for is great. It makes code far clearer, and we get rid of iterators in many cases.
• Move semantics and RVO make clear code faster.
• Lambda functions improve locality ... awesome with the STL algorithms and std::function.
• auto helps us avoid obfuscation with ugly type names.
• Uniform initialization and initializer lists also make code concise.
34. Discussion
• A C++11 developer version of GEL has branched off: should we go for built-in parallelism?
• Hmm, just so you know, there is much more in the C++11 standard. This is just the part I understand so far...
• Herb Sutter: "We broke all the books!"
• Yet the learning curve is less daunting than when we first had to do templates.
