You may ask, “Why are these design decisions different from a CPU’s?” In fact, the GPU’s goals differ significantly from the CPU’s:
  The GPU evolved to solve problems with highly parallel workloads.
  The CPU evolved to be good at any problem, whether it is parallel or not.
For example, the trip out to memory is long and painful. The question for the chip architect is how to deal with that latency.
One way is to avoid it: the CPU’s computational logic sits in a sea of cache. The idea is that few memory transactions are unique, so the data is usually found quickly after a short trip to the cache.
Another way is amortization: GPUs forgo large caches in favor of parallelization. Here, the idea is that most memory transactions are unique and can be processed efficiently in parallel. The cost of a trip to memory is amortized across many independent threads, which results in high throughput.
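The two strategies in these notes can be compared with a back-of-the-envelope model. The sketch below is illustrative only: the function names and every cycle count are assumptions chosen for the example, not measurements of any real CPU or GPU.

```python
def average_access_cycles(hit_rate, cache_cycles, memory_cycles):
    """Avoidance: expected cost of one access when most requests hit the cache."""
    return hit_rate * cache_cycles + (1.0 - hit_rate) * memory_cycles


def serial_cycles(n_threads, memory_cycles, compute_cycles):
    """No overlap: each thread waits for memory, then computes, one after another."""
    return n_threads * (memory_cycles + compute_cycles)


def overlapped_cycles(n_threads, memory_cycles, compute_cycles):
    """Amortization (idealized): all requests are in flight at once, so the
    memory latency is paid once and only the compute phases are serialized."""
    return memory_cycles + n_threads * compute_cycles


if __name__ == "__main__":
    # Assumed numbers: 95% hit rate, 4-cycle cache, 400-cycle memory,
    # 32 threads that each compute for 10 cycles after their load.
    print(average_access_cycles(0.95, 4, 400))
    print(serial_cycles(32, 400, 10))      # 13120
    print(overlapped_cycles(32, 400, 10))  # 720
```

With these assumed numbers, 32 overlapped threads cost 720 cycles in total, or 22.5 cycles per thread, instead of 410 cycles each when run one after another. The model ignores bandwidth limits and scheduling overhead, so it shows only the direction of the effect.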
In fact, manycore NVIDIA GPUs make parallel processing a commodity technology.
  GPUs are mass-market commodity products sold at tremendous economies of scale. We sell around 1 million GPUs per week! That’s about 100 per minute.
  GPUs are massively parallel devices. Our high-end GPUs have 3072 heavily multithreaded thread processors and 7.08 billion transistors.
Upshot: Massively parallel computing has become a commodity technology!
Cognitive robotics tools and technology
Cognitive Robotics Technology and Tools
Martin Peniak
Overview
  iCub – iCognitive universal body
  YARP – Yet Another Robot Platform
  GPU – Graphics Processing Unit
  Aquila – Acquisition of language and actions
iCub humanoid robot
  Its dimensions are similar to those of a 3.5-year-old child
  53 degrees of freedom
  Originated in the European Framework 6 project RobotCub (www.robotcub.org)
  There are now 20 iCubs in different labs in Europe and 1 in the US
  Design work continues, with v2.0 to come out
  Outcomes of various ongoing projects are distributed via an open-source software repository and via hardware upgrades
  A free iCub simulator is available
iCub humanoid robot
Dexterous hands for object manipulation
iCub humanoid robot
Simulator
  Open-source
  Developed as part of a joint effort with the European project iTalk
  Widely adopted within the cognitive robotics community
V. Tikhanoff, P. Fitzpatrick, F. Nori, L. Natale, G. Metta, and A. Cangelosi, “The iCub humanoid robot simulator,” in International Conference on Intelligent Robots and Systems (IROS), Nice, France, 2008
YARP
Yet Another Robot Platform
  Supports building a robot control system as a collection of programs communicating via TCP, UDP, multicast, local, or MPI transports
  Can be broken down into:
    libYARP_OS - interfaces with the operating system(s) to support easy streaming of data across many threads and many machines
    libYARP_sig - performs common signal processing tasks (visual, auditory) in an open manner that is easily interfaced with other commonly used libraries, for example OpenCV
    libYARP_dev - interfaces with common devices used in robotics: framegrabbers, digital cameras, motor control boards, etc.
YARP
Yet Another Robot Platform
[Diagram: the /icub and /icubSim ports, six YARP modules, and the YARP server]
YARP
Terminal commands
  yarp
  yarp help
  yarp check
  yarp clean
  yarp cmake
  yarp conf
  yarp detect
  yarp disconnect
  yarp exists
  yarp forward
  yarp name
  yarp name check
  yarp name list
  yarp name unregister
  yarp namespace
  yarp ping
  yarp read
  yarp regression
  yarp resource
  yarp rpc
  yarp rpcserver
  yarp run
  yarp server
  yarp terminate
  yarp topic
  yarp version
  yarp wait
  yarp where
  yarp write
YARP and iCub simulator
Controlling motors
  Terminal 1: yarpserver (starts the YARP server)
  Terminal 2: iCub_SIM (starts the iCub simulator)
  Terminal 3: yarp rpc /icubSim/left_arm/rpc:i
  Terminal 3: set pos 0 -90
  Terminal 3: set vel 0 50
  Terminal 3: set pos 0 90
RPC ports for the body parts:
  yarp rpc /icubSim/left_leg/rpc:i (6 joints)
  yarp rpc /icubSim/right_leg/rpc:i (6 joints)
  yarp rpc /icubSim/torso/rpc:i (3 joints)
  yarp rpc /icubSim/left_arm/rpc:i (the arm includes the hand, for a total of 16 controlled degrees of freedom)
  yarp rpc /icubSim/right_arm/rpc:i (structure is identical to the left arm)
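The port names and "set pos" / "set vel" commands on this slide follow a regular pattern, which the following minimal sketch makes explicit. It only builds the text you would type into a "yarp rpc" session and does not talk to YARP itself. The helper names (rpc_port, set_command) are hypothetical, and zero-based joint numbering is an assumption based on the slide's use of joint 0.

```python
# Controlled joints per body part, as listed on the slide
# (the right arm is stated to be identical to the left arm).
PART_JOINTS = {
    "left_leg": 6,
    "right_leg": 6,
    "torso": 3,
    "left_arm": 16,
    "right_arm": 16,
}


def rpc_port(part, robot="icubSim"):
    """Return the RPC port name for a body part, e.g. /icubSim/left_arm/rpc:i."""
    if part not in PART_JOINTS:
        raise ValueError(f"unknown body part: {part}")
    return f"/{robot}/{part}/rpc:i"


def set_command(kind, part, joint, value):
    """Build a 'set pos' or 'set vel' line after checking the joint index."""
    if kind not in ("pos", "vel"):
        raise ValueError("kind must be 'pos' or 'vel'")
    if part not in PART_JOINTS:
        raise ValueError(f"unknown body part: {part}")
    if not 0 <= joint < PART_JOINTS[part]:  # assumes zero-based joint indices
        raise ValueError(f"{part} has no joint {joint}")
    return f"set {kind} {joint} {value}"


if __name__ == "__main__":
    print("yarp rpc " + rpc_port("left_arm"))
    print(set_command("vel", "left_arm", 0, 50))
    print(set_command("pos", "left_arm", 0, 90))
```

Passing robot="icub" to rpc_port gives the corresponding port name for the real robot instead of the simulator.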
YARP and iCub simulator
Displaying camera outputs and controlling joints
  Terminal 1: yarpserver
  Terminal 2: iCub_SIM
  Terminal 3: yarpview /left
  Terminal 3: yarpview /right
  Terminal 3: yarp connect /icubSim/cam/left /left
  Terminal 3: yarp connect /icubSim/cam/right /right
Move the iCub’s head and watch the camera view change:
  Terminal 3: yarp rpc /icubSim/head/rpc:i
  Terminal 3: set pos 0 -30 (head will move down)
  Terminal 3: set pos 0 30 (head will move up)
An easier way is to use the existing graphical user interface:
  Terminal 3: robotMotorGui
To display camera outputs from the real iCub, replace the /icubSim prefix with /icub
Visual, auditory, and tactile perception, performed alongside elaborate motor control in real time, requires a lot of computation
YARP can run across any number of machines with different operating systems
CPU vs. GPU
Different goals produce different designs
  GPU assumes the workload is highly parallel
  CPU must be good at everything, parallel or not
CPU: minimize latency experienced by 1 thread
  big on-chip caches
  sophisticated control logic
GPU: maximize throughput of all threads
  number of threads in flight is limited by resources => lots of resources (registers, bandwidth, etc.)
  multithreading can hide latency => skip the big caches
  share control logic across many threads
GPU Evolution
  High throughput computation
    GeForce GTX 690: 2 x 2811 GFLOP/s
  High bandwidth memory
    GeForce GTX 690: 2 x 192 GB/s
  High availability to all
    200+ million CUDA-capable GPUs in the world
[Timeline figure, axis 1995 to 2012, transistor counts (xtors):]
  RIVA 128: 3M xtors
  GeForce 256: 23M xtors
  GeForce 3: 60M xtors
  GeForce FX: 125M xtors
  GeForce 8800: 681M xtors
  “Fermi”: 3B xtors
  “Kepler”: 7B xtors
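The two GeForce GTX 690 figures on this slide can be combined into a rough compute-to-bandwidth ratio. The sketch below uses only the per-GPU peak numbers from the slide. The 4-byte single-precision float is an assumption, and the function name is hypothetical.

```python
GFLOPS_PER_GPU = 2811.0  # peak GFLOP/s per GPU, from the slide
GBYTES_PER_S_PER_GPU = 192.0  # peak memory bandwidth in GB/s per GPU, from the slide
BYTES_PER_FLOAT = 4  # assumption: single-precision values


def flops_per_byte(gflops, gbytes_per_s):
    """Peak floating-point operations available per byte moved from memory."""
    return gflops / gbytes_per_s


if __name__ == "__main__":
    ratio = flops_per_byte(GFLOPS_PER_GPU, GBYTES_PER_S_PER_GPU)
    print(ratio)                    # 14.640625
    print(ratio * BYTES_PER_FLOAT)  # 58.5625
```

Read loosely, a computation needs on the order of 14 to 15 floating-point operations per byte loaded (about 59 per single-precision value) before peak arithmetic rate, rather than memory bandwidth, becomes the limit. These are peak figures, so real workloads will behave differently.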
Programming GPUs with CUDA
History
  Nvidia creates CUDA to facilitate the development of parallel programs on GPUs (2007)
  The CUDA language is ANSI C extended with very few keywords for labeling data-parallel functions (kernels) and their associated data
  Nvidia technology benefits from massive economies of scale in the gaming market, so CUDA-enabled cards are very inexpensive for the performance they provide
Inspiration
Simplification of commonly used features on the iCub and the simulator
Inspiration
Development of bio-inspired models and tools
Inspiration
Scalability, modularity, and platform independence
[Diagram: /aquila/yarprun/0, /aquila/yarprun/1, /aquila/yarprun/2, GPU, CPU]
Inspiration
Overcoming computational constraints by using GPUs
  Motion compliance: < 1 ms
  Vision (30 fps): < 33 ms
  Vision (60 fps): < 16 ms
We typically take 33 ms as the cut-off time. One complete cycle of everything critical MUST be completed in that time.
Of course, some processes are not critical, and their information can be used as and when it becomes available, subject to various constraints.
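The vision deadlines on this slide follow from the frame period, 1000 ms divided by the frame rate, and the 33 ms cut-off can be checked against a list of critical stages. The sketch below is a minimal illustration: the function names are hypothetical, and the stage names and timings are made-up numbers, not measurements from the iCub or Aquila.

```python
def frame_period_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def fits_budget(stage_times_ms, budget_ms=33.0):
    """True if one complete cycle of all critical stages fits in the budget."""
    return sum(stage_times_ms.values()) <= budget_ms


if __name__ == "__main__":
    # Hypothetical per-cycle timings for the critical stages (milliseconds).
    critical = {"vision": 12.0, "audio": 6.0, "touch": 2.0, "motor_control": 1.0}

    print(round(frame_period_ms(30), 2))          # 33.33
    print(round(frame_period_ms(60), 2))          # 16.67
    print(fits_budget(critical))                  # True: 21 ms <= 33 ms
    print(fits_budget(critical, budget_ms=16.0))  # False: 21 ms > 16 ms
```

Non-critical processes are deliberately left out of the sum, since the slide notes that their results can be used whenever they become available.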