
Published on November 1, 2014, in: Mobile


- 1. REAL-TIME FACE TRACKING NOV 2014
- 2. LOOKSERY: VIDEO SELFIES + FACE FILTERS + INTEGRATED CHAT
- 3. REAL-TIME FACE TRACKING DEMO
- 4. TRACKING ALGORITHM
  - The algorithm is based on the Active Appearance Model.
  - Its complexity is independent of image size.
  - The balance between tracking quality and tracking speed is controlled by only two constants.
  - The algorithm is iterative and solves a least-squares problem at each iteration.
  - It averages 5 iterations per frame (maximum 10, minimum 1).
  - To run at 30 FPS, you have to perform about 150 iterations per second.
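The arithmetic on the slide (30 FPS times 5 iterations on average gives about 150 iterations per second) also implies a time budget per iteration. The helper below is a minimal sketch of that calculation; the function name `iterationBudgetMs` is hypothetical and not part of the original code.

```cpp
#include <cassert>

// Hypothetical helper (not from the slides): time available for one solver
// iteration, in milliseconds, given a target frame rate and the number of
// iterations performed per frame.
double iterationBudgetMs(double targetFps, double iterationsPerFrame) {
    assert(targetFps > 0.0 && iterationsPerFrame > 0.0);
    const double iterationsPerSecond = targetFps * iterationsPerFrame;  // 30 * 5 = 150
    return 1000.0 / iterationsPerSecond;
}
```

With the slide's numbers, `iterationBudgetMs(30, 5)` is about 6.7 ms for an average frame, and `iterationBudgetMs(30, 10)` is about 3.3 ms for a frame that needs the maximum of 10 iterations. This ignores any per-frame work outside the solver.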
- 5. Optimisation flow
  - n/a: Asymptotic optimisation of the algorithm
  - 3 FPS: First implementation
  - 8 FPS: Memory preallocation
  - 10 FPS: Algorithm parameter optimisation
  - 13 FPS: Matrix storage optimisation and removal of OOP code
  - 18 FPS: Rewriting bottleneck code in assembler
  - 24 FPS: Asymptotic optimisation of matrix multiplication
  - 27 FPS: Replacing float operations with int operations
  - 30 FPS: Multithreading
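The slides do not show what the memory preallocation step looked like. As a rough illustration of the general idea only, the sketch below sizes all working buffers once and reuses them on every iteration, so the per-frame path performs no allocations. `TrackerWorkspace` and `runIteration` are hypothetical names, and the arithmetic inside `runIteration` is a placeholder, not the tracking algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workspace: every buffer is allocated once, up front.
struct TrackerWorkspace {
    std::vector<float> residual;
    std::vector<float> update;

    explicit TrackerWorkspace(std::size_t paramCount)
        : residual(paramCount), update(paramCount) {}
};

// Placeholder for one solver iteration: it only writes into the existing
// buffers and never resizes them, so no heap allocation happens here.
void runIteration(TrackerWorkspace& ws, float value) {
    assert(ws.residual.size() == ws.update.size());
    for (std::size_t i = 0; i < ws.residual.size(); ++i) {
        ws.residual[i] = value;
        ws.update[i] = -0.5f * value;
    }
}
```

A caller would construct one `TrackerWorkspace` when tracking starts and pass the same instance to every iteration of every frame.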
- 6. From float to int
  - The matrix G is defined as:
      G[i][j] = (X[i][j] - Y[i][j]) / d[j];
  - We had to build the so-called pseudo-inverse, that is (G^T * G)^-1 * G^T, so we have to perform many multiplication operations.
  - Multiplication of two ints is much faster than multiplication of two floats.
  - Let's create an int matrix V:
      V[i][j] = X[i][j] - Y[i][j];
  - And a float matrix D:
      D[i][j] = (i == j ? 1.0f / d[i] : 0); // diagonal matrix
  - Then G = V * D. From linear algebra: G^T * G = (V * D)^T * (V * D) = D * (V^T * V) * D, so the large product V^T * V needs only int multiplications.
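The factorisation G = V * D can be sketched as follows: the inner loop multiplies only integers, and the diagonal scaling is applied once per output element. This is an illustration under assumptions, not the original implementation: the name `gramFromIntMatrix`, the nested `std::vector` storage, and the `long long` accumulator (chosen here to avoid overflow) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IntMatrix = std::vector<std::vector<int>>;
using FloatMatrix = std::vector<std::vector<float>>;

// Computes G^T * G for G[i][j] = V[i][j] / d[j] as D * (V^T * V) * D,
// where D is the diagonal matrix with entries 1 / d[j].
FloatMatrix gramFromIntMatrix(const IntMatrix& V, const std::vector<float>& d) {
    assert(!V.empty() && V[0].size() == d.size());
    const std::size_t rows = V.size();
    const std::size_t cols = d.size();
    FloatMatrix GTG(cols, std::vector<float>(cols, 0.0f));
    for (std::size_t a = 0; a < cols; ++a) {
        for (std::size_t b = a; b < cols; ++b) {
            long long acc = 0;  // integer-only inner loop: entry (a, b) of V^T * V
            for (std::size_t r = 0; r < rows; ++r)
                acc += static_cast<long long>(V[r][a]) * V[r][b];
            // Diagonal scaling: one float operation per output element.
            const float scaled = static_cast<float>(acc) / (d[a] * d[b]);
            GTG[a][b] = GTG[b][a] = scaled;  // the result is symmetric
        }
    }
    return GTG;
}
```

For example, with V = {{1, 2}, {3, 4}} and d = {2, 4}, V^T * V is {{10, 14}, {14, 20}} and the scaled result is {{2.5, 1.75}, {1.75, 1.25}}.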
- 7. Demo benchmarks (code and measured time)
  - 0.00 sec:
      const int ITERATIONS = 2000000000;
      long long sum = 0;
      for (int i = 0; i < ITERATIONS; i++)
          sum += i * (long long)i;
      cout << sum << endl;
  - 2.10 sec:
      const int ITERATIONS = 2000000000;
      long long sum = 0;
      for (int i = 0; i < ITERATIONS; i++)
          sum += i * (long long)i / 3;
      cout << sum << endl;
  - 4.29 sec:
      const int ITERATIONS = 2000000000;
      float sum = 0;
      for (int i = 0; i < ITERATIONS; i++)
          sum += i * (float)i / 3;
      cout << sum << endl;
- 8. Matrix multiplication optimisations
  - 1) Don't create a matrix whose size is a power of two. The cache uses a simple hash function to select the cache line in which the memory will be cached. This hash is just some of the low bits (e.g. 16) of the memory address. When the matrix size is a power of two, every row has the same lowest bits, so the cache holds only one row instead of nearly the whole matrix.
  - 2) Change the order of matrix multiplication. Multiplying two matrices of sizes n x m and m x s takes n * m * s operations. To multiply A(n x m) * B(m x s) * C(s x k), there are two ways that give the same result:
    - (A * B) * C with n*m*s + n*s*k operations, or
    - A * (B * C) with m*s*k + n*m*k operations.
  - In general n*m*s + n*s*k != m*s*k + n*m*k, so choose the smaller one.
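Both tips can be expressed as small helpers. The sketch below evaluates the two operation counts from the slide and picks the cheaper order, and pads a power-of-two row length. All names are hypothetical, and the padding of 16 elements is an assumed value for illustration, not a figure from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ChainCost {
    std::uint64_t leftFirst;   // (A * B) * C: n*m*s + n*s*k
    std::uint64_t rightFirst;  // A * (B * C): m*s*k + n*m*k
};

// Operation counts for A(n x m) * B(m x s) * C(s x k), as given on the slide.
ChainCost chainCost(std::uint64_t n, std::uint64_t m, std::uint64_t s, std::uint64_t k) {
    return ChainCost{n * m * s + n * s * k, m * s * k + n * m * k};
}

// True when (A * B) * C is at least as cheap as A * (B * C).
bool multiplyLeftFirst(std::uint64_t n, std::uint64_t m, std::uint64_t s, std::uint64_t k) {
    const ChainCost c = chainCost(n, m, s, k);
    return c.leftFirst <= c.rightFirst;
}

// Assumed padding rule: if the row length is a power of two, add a few unused
// elements so that consecutive rows no longer share the same low address bits.
std::size_t paddedRowLength(std::size_t cols, std::size_t pad = 16) {
    assert(pad > 0);
    const bool powerOfTwo = cols != 0 && (cols & (cols - 1)) == 0;
    return powerOfTwo ? cols + pad : cols;
}
```

For n = 100, m = 1000, s = 10, k = 1, the left-first order costs 1,001,000 operations and the right-first order costs 110,000, so A * (B * C) is the better choice.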
- 9. Hello assembler
      int *row = GT[i];
      for (int j = i, pos = (int)(i * GT.columnCount()); j < GT.rowCount(); j++) {
          int curr = 0;
          for (int k = 0; k < GT.columnCount(); k++, pos++)
              curr += row[k] * GT.val[pos];
          GTG[i][j] = GTG[j][i] = curr;
      }
  - It looks optimised enough. Is there anything we can improve? Well, let's have a look at the ASM code:
      0x149ac2: ldr.w lr, [r5, r9, lsl #2]
      0x149ac6: add.w r9, r9, #0x1
      0x149aca: cmp r9, r2
      0x149acc: ldr r8, [r12], #4
      0x149ad0: mla r11, lr, r8, r11
      0x149ad4: blo 0x149ac2 ; at AppearanceTracker.cpp:555
  - No SIMD instructions there :(
- 10. Let's add some SIMD
      int *row = GT[i];
      int *rowInit = row;
      int *rowPos = GT.val + i * GT.columnCount();
      int *rowEnd = row + processedCnt;
      for (int j = i; j < GT.rowCount(); j++) {
          row = rowInit;
          int accum[8] = {0};
          __asm__ volatile (
              "vld1.32 {d8-d11}, [%[accum]]\n\t"   // load the 8 accumulators into q4, q5
              "L_mulStart%=:\n\t"
              "vld1.32 {d0-d3}, [%[row]]!\n\t"     // load 8 ints from row, advance pointer
              "vld1.32 {d4-d7}, [%[val]]!\n\t"     // load 8 ints from GT.val, advance pointer
              "vmla.i32 q4, q2, q0\n\t"            // multiply-accumulate lanes 0-3
              "vmla.i32 q5, q3, q1\n\t"            // multiply-accumulate lanes 4-7
              "cmp %[row], %[rowEnd]\n\t"
              "blo L_mulStart%=\n\t"
              "vst1.32 {d8-d11}, [%[accum]]\n\t"   // store the accumulators back
              : [row] "+r" (row), [val] "+r" (rowPos)
              : [rowEnd] "r" (rowEnd), [accum] "r" (accum)
              // clobber list added: the block overwrites q0-q5, the flags, and accum
              : "q0", "q1", "q2", "q3", "q4", "q5", "cc", "memory"
          );
          // gather the 8 values from accum
          // process the remainder (mod 8)
      }
  - The scalar version from the previous slide, for comparison:
      int *row = GT[i];
      for (int j = i, pos = (int)(i * GT.columnCount()); j < GT.rowCount(); j++) {
          int curr = 0;
          for (int k = 0; k < GT.columnCount(); k++, pos++)
              curr += row[k] * GT.val[pos];
          GTG[i][j] = GTG[j][i] = curr;
      }
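The slide leaves the last two steps as comments: summing the 8 accumulator lanes and handling the elements left over when the row length is not a multiple of 8. The portable sketch below shows the same structure in plain C++, with no NEON instructions, so it runs anywhere. The function name `dotProduct8Lanes` is hypothetical, and defining `processedCnt` as the largest multiple of 8 not exceeding the length is an assumption, since the slide does not show how it is computed.

```cpp
#include <cassert>
#include <cstddef>

// Dot product of two int arrays using 8 independent accumulators, mirroring
// the two 4-lane registers (q4, q5) used in the inline assembly.
int dotProduct8Lanes(const int* a, const int* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    int accum[8] = {0};
    const std::size_t processedCnt = n - n % 8;  // assumed: largest multiple of 8 <= n

    // Main loop: 8 multiply-accumulates per step, one per lane.
    for (std::size_t k = 0; k < processedCnt; k += 8)
        for (std::size_t lane = 0; lane < 8; ++lane)
            accum[lane] += a[k + lane] * b[k + lane];

    // Gather the 8 values from accum.
    int result = 0;
    for (std::size_t lane = 0; lane < 8; ++lane)
        result += accum[lane];

    // Process the remainder (n mod 8 elements).
    for (std::size_t k = processedCnt; k < n; ++k)
        result += a[k] * b[k];
    return result;
}
```

Unlike the assembly loop, which always executes its body at least once, this version also handles rows shorter than 8 elements, because the main loop is skipped when `processedCnt` is 0.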
- 11. Practical difference? Let's profile it. Before vs. after: approx. 2-2.5 times faster.
- 12. Some issues with hardware
  - Task: crop a square from a CMSampleBuffer (which contains a CVImageBufferRef) and write it using AVAssetWriterInputPixelBufferAdaptor.
  - BAD: Create the CMSampleBuffer by just moving the base address (from the input buffer address to the target image address) and setting a new height. O(1) operation.
  - GOOD: Create the CMSampleBuffer by creating a new CVPixelBufferRef from CVTextureCache and copying the image. O(Height*Width) operation.
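The copying approach that the slide marks as GOOD can be illustrated without any CoreVideo calls. The sketch below is a plain C++ stand-in that copies a square region row by row out of a strided image buffer; it is not the CVPixelBufferRef code from the talk. The function name `cropSquareCopy` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a side x side square whose top-left pixel is (x0, y0) out of a source
// image. srcStride is the number of bytes per source row. The caller must make
// sure the square lies inside the source image.
std::vector<std::uint8_t> cropSquareCopy(const std::uint8_t* src,
                                         std::size_t srcStride,
                                         std::size_t bytesPerPixel,
                                         std::size_t x0, std::size_t y0,
                                         std::size_t side) {
    assert(src != nullptr && bytesPerPixel > 0);
    const std::size_t rowBytes = side * bytesPerPixel;
    std::vector<std::uint8_t> dst(side * rowBytes);
    for (std::size_t row = 0; row < side; ++row) {
        const std::uint8_t* from = src + (y0 + row) * srcStride + x0 * bytesPerPixel;
        std::memcpy(dst.data() + row * rowBytes, from, rowBytes);  // one row per copy
    }
    return dst;
}
```

The cost is proportional to the number of copied pixels, matching the O(Height*Width) figure on the slide, and the result is a tightly packed buffer that no longer depends on the source buffer's layout.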
- 13. iOS 8 strikes back
  - iPhone 5S, iOS 7.1: 30 FPS
  - iPhone 5S, iOS 8.0: 15 FPS O_o
  - Possible reasons:
    - 1) Memory corruption in the C++ core code
    - 2) iOS 8 QoS: wrong queue priority, QOS_CLASS_BACKGROUND instead of QOS_CLASS_USER_INITIATED
    - 3) Blinking of this guy
- 14. CONTACT INFORMATION
  - FEDOR POLYAKOV. Mobile: +38 097 59 0000 9. E-Mail: fedor@looksery.com
  - YURII MONASTYRSHYN. Mobile: +38 067 482 60 97. E-Mail: yurii@looksery.com
  - VICTOR SHABUROV, FOUNDER. Mobile: +1 650 575 9359. Fax: +1 866 626 9582. E-Mail: victor@looksery.com
  - WEB: looksery.com, facebook.com/looksery, twitter.com/looksery
