Yuan Yipeng (GPU Architect, SoC System, Linux Device Driver, Testing
Engineer)
(086)15026966201
larkforsure@gmail.com
Profile:
Currently focused on everything around the GPU.
Key Skills:
Architecture building and early-stage study of the Streaming
Multiprocessor (SM) plus Texture unit, the core of the GPU: focus on
cycle-accurate performance monitoring. Also has knowledge of GPGPU, graphics,
flow automation, and RTL verification.
Multi-Core System Algorithm Implementation and Optimization: From paper
reading to algorithm simulation, selection, implementation, and optimization.
Familiar with both heterogeneous and homogeneous hardware systems.
Embedded System Design and Implementation: Linux kernel and device-driver
programming skills for advanced applications. Familiar with kernel debugging,
profiling, Makefiles, etc.
Automation Testing: Testing framework and interface design and implementation.
Ensuring the framework remains extensible. Testcase review, checking,
integration, and debugging; agreement compliance and DTS tracking. Exploring
model-based testing. Involved in nearly every part of this process.
Digital Logic Design: Familiar with Verilog and the principles of
synthesizable, verifiable RTL coding style. Understanding of timing, low power,
multiple clock domains, etc.
EXPERIENCE/ACCOMPLISHMENTS:
Next generation GPU SM/Texture unit performance modeling and
verification (NVIDIA 2015, 2016)
C++ based performance simulator for early architecture study and pre-tapeout
stage verification;
Project owner of the fast-modeling perf. simulator for CUDA on new architecture;
GPGPU and 3D Graphics instruction level chip pipeline investigation and
optimization;
Integrated the CUDA system-level perf. simulator into the SM-level simulator,
since both are equally important for CUDA program perf. study;
Perf. tuning for GPU architecture in development, and exploration for
next-generation GPU architecture;
Trace perf. and function verification for RTL;
SASS assembly code analysis and debugging;
SASS thread-level (warp) flow control unit (CBU, CRS) simulation,
optimization, and corner-case pipe-cleaning;
CUDA trace capture and analysis;
Texture function implementation;
Delivery of traces with perf. data on clients' request;
Highlight:
Found, triaged, and fixed one long-standing bug in the SM/TPC performance
simulator, which sped up the department's most essential trace delivery flow
from 2-7 days to less than 1 hour per delivery;
"Very Good" performance evaluation for the first year;
Automation Testing Framework Building and Directing (Huawei, Opto-
electronics)
Five months after he joined, the Huawei Opto-electronics department had moved
from nearly all-manual testing to having one third of the workload shifted to
automation-based testing. Test chains previously ignored because of time
restrictions are now mostly covered. Found two long-standing bugs that had
existed for many years, and countless other bugs.
He designed and implemented the framework and interface entirely on his own,
with usability, extensibility, and stability as their defining qualities. He
directed testcase implementation for the whole department and was responsible
for reviewing, checking, integrating, and debugging all submitted testcases,
as well as for bug localization and follow-up DTS tracking.
He had no prior Ruby or testing experience.
Vector Media Accelerator 2 Development (IBM 2008):
Responsible for implementing the software specification of the H.264 SOC in
the early stage; FPGA prototype verification, Linux device driver coding,
software-hardware cooperative architecture design and implementation, and SOC
hardware profiling in the later stage.
Cooperating with other team members, he implemented both the software
version and the Linux device-driver version of the H.264 Main Profile features
(including Loopfilter, B-frame, CABAC, VLC). Performance improved
substantially as a result of his optimization of the original H.264 algorithm,
which was based mainly on his good understanding of the standard.
Vector Media Accelerator Component Development (IBM 2006):
Coding for the Vector Processor element, which required VLIW, SIMD,
and pipeline programming skills.
He was responsible for optimizing the H.264 software for the Vector
Processor element integrated in that component. The work followed a typical
DSP-based system development process, involving VLIW, SIMD, and pipeline
software programming.
Rochester Medical Imaging Project (Segmentation) on CELL BE (IBM 2007)
Collaborating with colleagues worldwide to develop a prototype on the Cell
platform to demonstrate the capability of Cell in medical image processing. He
wrote fully optimized image processing code on the SPE side, as well as
communication and synchronization code on both the PPE and SPE sides.
German Medical Imaging Projection B&P (Reconstruction) (IBM 2007)
Supporting the German sales team in the B&P process. Investigating several fast
reconstruction algorithms and comparing them on speed, complexity,
reconstruction quality, bandwidth occupation, etc., then choosing the one most
suitable for CELL BE.
CELL FOAK project for STTRI (Shanghai Telecommunication Technology
Research Institute) (IBM 2007)
Developing a prototype media server based on IBM Cell to show the high
performance of CELL on multimedia. Investigating the architecture of a
next-generation high-performance media server. His part of the job included
implementing media libraries for several standards on a multi-core processor,
namely CELL; designing a framework around the media libraries that is
decoupled and reusable, so that it can easily be moved to support other
telecom operations; designing a scalable interface between the caller
(application server) and the media libraries to accommodate future practical
application scenarios (dynamically 0~8000 simultaneous call routes); and
designing the interface between the media framework and users through the RTP
protocol.
EDUCATION:
M.S. in Space Physics, Wuhan University, China (2006)
B.S. in Electronics Engineering, Wuhan University, China (2004)