icps-long.ppt
 

  • The images from the left and right cameras are transferred to the PC over Sony's iLink interface, which is based on the IEEE 1394 serial bus standard (also known as FireWire), at a data rate of up to 400 Mbps. A PCI card that hosts FireWire input ports exposes this stream as a WDM (Windows Driver Model) source. We then attach the capture filters provided by DirectShow to get hold of the video stream from the cameras. Once we have the video stream, the SampleGrabber is attached to capture video samples from it. A null renderer is used to terminate the stream. If required, a renderer filter can be used instead to display the video on the primary output device.
  • Both the client-side and the server-side software are written in Visual C++ using MFC (Microsoft Foundation Classes). Both the client and the server must first be set up, and each side follows different steps during the startup phase.
  • As the figure shows, the server-side setup is unchanged, except that two buffers are now allocated on the server for each stereo frame. Every time a picture is received, the callback function of the respective camera is invoked. Inside the callback, the code reads a variable shared among the threads that indicates which buffer was written in the previous successful callback of the same camera.
  • Two video cameras generate stereo pictures, which are sent to the client by the vision server. The user may issue commands to the DecisionServer, which in turn uses the PUMA and Force Sensor components to carry out these commands. The stereo video data and the distributed component calls share the same LAN but use different ports for data transfer. The client side uses the GUI as well as the master arm to issue commands to the slave arm on the remote side. The vision client receives the synchronized stereo data from the LAN through Windows sockets and provides a stereo display of the remote scene to the viewer with the help of eye-shuttering glasses.
  • 2) This will enable the operator to see the location of the gripper one step ahead of time.
  • The projection matrix M can be calculated by finding the projections of four non-coplanar points in pixel coordinates. The inverse of A must exist if the four reference points are non-coplanar. For the stereo projections, two matrices are required, one for the left and one for the right projection. The user can either choose the default locations of the fiducial points or enter their new 3D locations if they have changed since the last setup.

icps-long.ppt Presentation Transcript

  • A DISTRIBUTED FRAMEWORK FOR RELAYING STEREO VISION FOR TELEROBOTICS
    M. Al-Mouhamed, O. Toker, A. Iqbal, and M. Nazeeruddin
  • Contents
    • Introduction
    • Background
    • Status Of The Problem
    • Literature Review
    • Thesis Objectives
    • Video Client-Server Framework
    • Distributed Telerobotic Framework
    • Augmented Reality
    • Conclusions
    • Thesis Contributions
    • Future Research Directions
  • Introduction
    • Telerobotics allows humans to extend their manipulative skills, and their eye-hand motion coordination, over a distance.
    • Telerobotic applications:
      • Scaled-down: nano-scale, micro-surgery, clean-room
      • Hazardous: nuclear decommissioning & inspection, fire fighting, disposal of dangerous objects, minefield clearance, operation in harsh environments (unmanned, underwater, ice, desert, space)
      • Safety: rescue
      • Security: surveillance, reconnaissance
      • Unmanned: oil platform inspection, repair
      • Teaching, training, and entertainment
  • Introduction … (cont.)
    • Minefield clearance, unmanned underwater inspection, and search & rescue.
    • Applications where humans adversely affect the environment, such as medical applications and clean-room operations.
    • Applications where it is impossible for humans to be present, such as deep space and nanorobotics.
  • Introduction … (cont.)
    • Extending eye-hand motion coordination using telerobotics
    • In natural eye-hand motion coordination, the operator sees his hand and reacts accordingly.
    • In telerobotics:
      • The operator holds a master arm to dictate his hand motion.
      • The motion is transmitted to a remote slave arm and reproduced (replica).
      • The operator wears a head-mounted display (HMD) to see in 3D the effects of his motion on the remote tool.
      • The operator sees neither his hand nor the master arm (because of the HMD); his hand is logically mapped to the remote tool.
      • The operator logically acts on the remote tool seen through the HMD.
    • Stereo vision provides 3D views of the slave scene and a metric to calculate the 3D positions and orientations of objects.
  • Background … (cont.)
    • A two-way logical communication link transfers commands from the client to the server through a computer network and conveys different kinds of feedback (e.g., video, force) back to the client site.
    • [Figure: a human operator (master arm, HMD, glove, force feedback) linked through a network (LAN) to the remote site, with sensors (video, sound, force feedback) and actuators (hands, arms, feet).]
  • Background … (cont.)
    • A telepresence system is one that displays high-quality information from the remote world, visual or otherwise, in such a natural way that the operator feels physically present at the remote site.
    • Virtual Reality (VR) is the interactive simulation of a real or imagined environment that can be experienced visually or otherwise in the three dimensions of width, height, and depth.
  • Video Client-Server Framework
    • Providing stereo video on the client side imposes severe bandwidth requirements, because a real-time stream of video data must be transferred in a telerobotic environment.
    • Capturing and relaying video data over a LAN requires advanced technologies such as DirectX and Windows Sockets.
    • Commercially available software such as Microsoft NetMeeting is optimized for low-bandwidth networks like the Internet, so its display resolution is too poor for stereo vision in a telerobotic setup.
  • Video Client-Server Framework
    • Development of a highly optimized client-server framework for grabbing and relaying a stereo video stream
    • Server tasks:
      • Capture (grab) stereo images from two cameras.
      • Establish a reliable client-server connection.
      • Upon request from the client, send the stereo frame, comprising two pictures, to the client through Windows sockets.
  • Video Client-Server Framework
    • Client tasks:
      • Detect the server and establish the connection with it.
      • Set up a highly optimized, fast graphic display system to show the pictures received from the server.
      • Display the pictures received from the server and continue in a loop, each time requesting a new stereo frame from the server.
      • Allow the viewer to adjust the alignment of the pictures on the HMD to compensate for the misalignment and non-linearity of the cameras at the server.
  • Video Client-Server Framework
    • The proposed client-server framework is based on Microsoft Visual C# and Microsoft DirectX.
    • Microsoft DirectX provides COM-based interfaces for various graphics-related functions. DirectShow is one of these services; it provides efficient interfaces for the capture and playback of video data.
  • Video Client-Server Framework
    • Data is sent and received over the network using Windows sockets. The stereo video setup uses synchronous Windows sockets as the interface between the vision server and the client.
    • Two different schemes were implemented to transfer the video data. They differ in the use of multiple threads on the server side and in some optimization steps that reduce the network traffic needed to transfer the video data.
    • A general overview of the image grabbing and display system is given before the detailed description of these schemes.
  • Video Client-Server Framework
    • We use a DirectShow component named SampleGrabber to capture the video frames arriving in a stream from a stereo camera setup.
    • [Figure: block diagram of the scheme used on the server side to grab stereo frames.]
  • Video Client-Server Framework
    • To show the pictures received from the server, we use GDI (Graphics Device Interface).
    • [Figure: block diagram of the client-side scheme to display the video.]
  • Video Client-Server Framework (Single Buffer, Serialized Transfer)
  • Video Client-Server Framework – Double Buffer, De-Serialized Transfer
    • In this scheme, we try to optimize the transfer of video data over the LAN by using thread manipulation on the server.
    • The capture thread and the sending thread are overlapped by using double buffers on the server side.
    • This ensures that the thread responsible for sending the video data over the LAN does not wait after receiving a picture request from the client.
  • Video Client-Server Framework – Double Buffer, De-Serialized Transfer
  • Video Client-Server Framework – Double Buffer, De-Serialized Transfer
    • This approach enables us to send a higher number of stereo frames over the same LAN and hardware.
    • The only overhead is the allocation of an extra buffer in the server DRAM, which is not a real problem on available systems with large memory.
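The double-buffer idea can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the authors' Visual C++ server: the class name `DoubleBuffer`, its methods, and the use of a lock around the slot index are all hypothetical. The capture side always writes into the slot that was not written last, so the sender can take the most recent complete frame without waiting for a copy to finish.

```python
import threading

class DoubleBuffer:
    """Two frame slots: capture fills one while the sender reads the other.

    Hypothetical sketch; frames are treated as immutable Python objects,
    so a reader always sees a complete frame.
    """

    def __init__(self):
        self._slots = [None, None]
        self._latest = -1               # index of the last fully written slot
        self._lock = threading.Lock()   # guards only the index, not the copy

    def write(self, frame):
        # Called from the capture callback: pick the slot NOT written last.
        with self._lock:
            target = 0 if self._latest < 0 else 1 - self._latest
        self._slots[target] = frame     # the slow copy happens outside the lock
        with self._lock:
            self._latest = target       # publish the new frame

    def read_latest(self):
        # Called from the sending thread when the client requests a picture.
        with self._lock:
            idx = self._latest
        return None if idx < 0 else self._slots[idx]


buf = DoubleBuffer()

def capture(n):
    for i in range(n):
        buf.write(("left-%d" % i, "right-%d" % i))

t = threading.Thread(target=capture, args=(100,))
t.start()
t.join()
last = buf.read_latest()
print(last)
```

The design choice worth noting is that only the index is protected by the lock, so the sender is never blocked for the duration of a frame copy.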
  • Video Client-Server Framework – 3D Visualization
    • Once we have stereo images of the remote scene, different methods can be used to produce 3D effects on the client side.
    • Similarly, different hardware devices, such as eye-shuttering glasses or an HMD (Head Mounted Display), are used to show the images to the user.
    • We have used the following two methods for stereo image production on the client side:
      • Sync-Doubling
      • Page Flipping
  • Video Client-Server Framework – Sync-Doubling
    • The left-eye and right-eye images are arranged one above the other on the computer screen.
    • A sync-doubler sits between the display output of the PC and the monitor and inserts an additional frame v-sync between the left and right frames (i.e. the top and bottom frames).
    • This makes the left-eye and right-eye images appear in an interlaced pattern on screen.
    • Using the frame v-sync as the shutter alternating sync lets us deliver the left and right frames synchronously to the respective eyes, thus creating a three-dimensional image.
  • Video Client-Server Framework – Sync-Doubling
  • Video Client-Server Framework – Page Flipping
    • Page flipping means alternately showing the left-eye and right-eye images on the screen.
    • Combining 3D shuttering glasses with this type of 3D presentation requires using the frame v-sync as the shutter alternating sync to create a 3D image.
    • An HMD can also be used, with two different images sent to its two LCD screens. The user sees a different image with each eye and thus perceives the depth of the scene. DirectX can be used to flip both images simultaneously.
  • Video Client-Server Framework – Performance Evaluation
    • Different experiments were conducted to test the visual quality of the client-server setup and to measure the time delays and other properties of the video data.
    • The specifications of the stereo frame are as follows:
      • Height of each picture = 288 pixels
      • Width of each picture = 360 pixels
      • Size = 304 KB (311040 bytes) per picture = 608 KB (622080 bytes) per stereo frame
    • Each stereo frame is about 0.6 MB, i.e. about 5 Mbit, so every frame per second consumes about 5 Mbps of LAN bandwidth. A 100 Mbps LAN can therefore carry at most about 20 stereo frames per second.
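The bandwidth estimate above can be reproduced with a short calculation. The sketch below assumes 3 bytes per pixel (our inference: 360 × 288 × 3 equals the stated 311040 bytes) and treats 100 Mbps as the usable capacity, ignoring protocol overhead.

```python
WIDTH, HEIGHT = 360, 288    # pixels per picture, from the slide
BYTES_PER_PIXEL = 3         # assumption: consistent with 311040 bytes per picture
LAN_MBPS = 100              # nominal LAN capacity in Mbit/s

picture_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
stereo_bytes = 2 * picture_bytes                 # left + right picture
stereo_mbit = stereo_bytes * 8 / 1_000_000       # megabits per stereo frame
max_fps = LAN_MBPS / stereo_mbit                 # upper bound, no overhead

print(picture_bytes, stereo_bytes)
print(round(stereo_mbit, 3), round(max_fps, 2))
```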
  • Video Client-Server Framework – Performance Evaluation
    • Copying from SampleGrabber to DRAM
    • Case 1: Copy times on server – Single Force Thread
      • 300 stereo frames
      • Mean value = 24.025 ms
      • 95% CI between 23.29 ms and 24.75 ms
  • Video Client-Server Framework – Performance Evaluation
    • Copying from SampleGrabber to DRAM
    • Case 2: Copy times on server – Two Threads
      • 300 stereo frames
      • Mean value = 60.48 ms
      • 95% CI between 8 ms and 150 ms
  • Video Client-Server Framework – Performance Evaluation
    • Copying from SampleGrabber to DRAM
    • Case 3: Copy times on server with Force transfer over LAN
      • 300 stereo frames
      • Mean value = 33.46 ms
      • An additional 9.43 ms (relative to Case 1) for adding the network transport thread
  • Video Client-Server Framework – Performance Evaluation
    • Transferring over the LAN
    • Case 1: Single Buffer, Serialized Transfer
      • 300 stereo frames
      • Mean value = 86.1 ms
      • 11.61 stereo frames/second
  • Video Client-Server Framework – Performance Evaluation
    • Transferring over the LAN
    • Case 2: Double Buffer, De-Serialized Transfer
      • 60,000 stereo frames
      • Mean value = 58.94 ms
      • 17 stereo frames/second
      • 90% CI between 56.0 ms and 64.8 ms
  • Video Client-Server Framework – Results Summary
      Scheme                         Cameras to Server DRAM (ms)   Server to Client (ms)   Frames Per Second
      Single Buffer, Serialized      24.025                        86.1                    11.61
      Double Buffer, De-serialized   24.025                        58.94                   17
    • Housheng et al. [2001] reported a transfer rate of 9-12 fps for a single compressed image of 200×150 pixels over a LAN, whereas our scheme transfers 17-18 uncompressed stereo frames per second, each picture being 360×288 pixels.
    • The network bandwidth is nearly saturated at 18 fps.
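The frame rates in the summary follow directly from the mean server-to-client times, and the saturation remark can be checked against the frame size. The helper names below are ours, and the calculation ignores protocol overhead.

```python
STEREO_FRAME_BYTES = 622080          # per stereo frame, from the slides

def frames_per_second(mean_ms):
    """Frame rate implied by a mean per-frame transfer time in milliseconds."""
    return 1000.0 / mean_ms

def lan_load_mbps(fps):
    """Raw payload bandwidth (Mbit/s) needed to carry `fps` stereo frames."""
    return fps * STEREO_FRAME_BYTES * 8 / 1_000_000

single = frames_per_second(86.1)     # single buffer, serialized
double = frames_per_second(58.94)    # double buffer, de-serialized
load_18 = lan_load_mbps(18)          # payload at 18 stereo frames per second

print(round(single, 2), round(double, 2), round(load_18, 2))
```

At 18 stereo frames per second the payload alone is close to 90 Mbit/s, which is consistent with the statement that a 100 Mbps LAN is nearly saturated.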
  • A Multi-threaded Distributed Telerobotic Framework
    • Distributed application programming is one of several schemes for establishing a reliable connection between the master and slave arms.
    • Different items are realized as software components, which then communicate with each other using the distributed-components paradigm.
    • Object-oriented approach
    • Software reusability
    • Easy extensibility
    • One-time debugging
    • Multi-user environment
    • Data encapsulation
  • A Multi-threaded Distributed Telerobotic Framework
    • With distributed programming, network protocol issues can be avoided: the distributed framework itself takes care of all network resources and of binary data transfer over the network.
    • DCOM (Distributed Component Object Model) based components have previously been used in telerobotics by Yeuk et al.
    • .NET components are more advanced than COM-based components and offer complete support for the .NET framework, including the .NET Remoting and SOAP technologies.
    • Several components are developed on the server side as well as the client side; they are explained briefly.
  • A Multi-threaded Distributed Telerobotic Framework – MasterArm Component
    • Local force feedback uses a second-order model for minimizing the force applied by the operator.
    • To estimate the force, the component keeps a record of all the force data read over a certain number of samples (the history), along with the system time.
    • It then evaluates the velocity and acceleration of the master arm at each sampling instant and stores them in a circular buffer.
    • This information is used to calculate a force proportional to what the operator is applying, which is then fed back to the master arm.
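The slide does not give the equations of the second-order model, so the following is only a rough sketch under our own assumptions: samples are positions of the master arm, velocity and acceleration come from backward finite differences, and the model has a mass-damper-spring form with hypothetical coefficients `mass`, `damping` and `stiffness`. The class name `SecondOrderEstimator` is likewise invented for illustration.

```python
from collections import deque

class SecondOrderEstimator:
    """Finite-difference velocity/acceleration kept in a circular buffer."""

    def __init__(self, mass, damping, stiffness, history=8):
        self.m, self.b, self.k = mass, damping, stiffness   # hypothetical gains
        self.samples = deque(maxlen=history)   # circular buffer of (t, x, v, a)

    def update(self, t, x):
        """Record a sample taken at time t and return the model force."""
        v = a = 0.0
        if self.samples:
            t0, x0, v0, _ = self.samples[-1]
            dt = t - t0
            if dt > 0:
                v = (x - x0) / dt      # backward difference of position
                a = (v - v0) / dt      # backward difference of velocity
        self.samples.append((t, x, v, a))
        return self.m * a + self.b * v + self.k * x


est = SecondOrderEstimator(mass=2.0, damping=0.5, stiffness=0.0)
forces = [est.update(t, x) for t, x in [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]]
print(forces)
```

A `deque` with `maxlen` discards the oldest sample automatically, which is the circular-buffer behaviour described on the slide.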
  • A Multi-threaded Distributed Telerobotic System – Server
  • A Multi-threaded Distributed Telerobotic System – Client
  • Client GUI
  • A Multi-threaded Distributed Telerobotic System – Performance Evaluation
    • Force and video streams:
      • 3000 force packets
      • Mean inter-arrival time = 1.08 ms
      • 90% CI between 0.5 and 13 ms
    • During the transfer of video data:
      • 3710 force packets
      • Mean inter-arrival time = 3.9 ms
      • 90% CI between 0.5 and 3.9 ms
    • An addition of 0.4 ms.
    • Worst-case inter-arrival time = 789.74 ms.
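Statistics of this kind can be computed from packet arrival timestamps. The slides do not say how the intervals were obtained; the sketch below assumes a simple nearest-rank percentile range over the observed gaps, and the function name and sample timestamps are ours.

```python
import math

def inter_arrival_stats(timestamps_ms, lower_pct=5, upper_pct=95):
    """Mean, percentile range and worst case of gaps between arrivals.

    Assumes timestamps are sorted and contain at least two entries.
    """
    gaps = sorted(b - a for a, b in zip(timestamps_ms, timestamps_ms[1:]))
    n = len(gaps)

    def nearest_rank(pct):
        # Nearest-rank percentile: smallest gap covering pct% of the sample.
        return gaps[max(0, math.ceil(pct * n / 100) - 1)]

    return {
        "mean": sum(gaps) / n,
        "low": nearest_rank(lower_pct),
        "high": nearest_rank(upper_pct),
        "worst": gaps[-1],
    }


stats = inter_arrival_stats([0.0, 1.0, 2.0, 4.0, 5.0])
print(stats)
```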
  • A Multi-threaded Distributed Telerobotic System – Performance Evaluation
  • A Multi-threaded Distributed Telerobotic System – Performance Evaluation
    • [Figure: a magnified plot of inter-arrival times in the presence of force, video and command streams.]
  • A Multi-threaded Distributed Telerobotic System – A Comparison
    • Teresa [1999] developed a Java and VRML based telerobotic system and reported an image acquisition time of 1 s for a single frame of 16-bit depth. Our DirectShow-based system achieves a stereo image acquisition time of 24 ms in a telerobotic system.
    • The client-server framework implemented by Al-Harthy [2001] takes around 50 ms to transfer a command signal (48 bytes) from the client to the robot. In our case a similar packet (48 bytes) takes 0.7 to 1.1 ms, owing to the efficient use of raw network resources by .NET Remoting.
  • Augmented Reality
    • The basic idea of an AR (augmented reality) system is to mix real and virtual information in order to provide an augmented view of the remote scene that offers more information than a simple video could.
    • AR can be used as an effective way to overcome the effects of time delays in a telerobotic environment.
    • The information added locally must fit seamlessly into the remote real data so as not to confuse the teleoperator.
  • Augmented Reality – Work Strategy
    • Introduce non-existent objects so that they appear to be part of the video scene.
    • Show a small red ball in the most recent stereo video frame at the position of the gripper, calculated locally from the command data of the master arm.
    • Overlaying requires a one-to-one mapping between the remote and virtual world coordinate spaces using a camera model.
    • We use the weak-perspective camera model.
  • Augmented Reality – Camera Identification
    • Using a camera model requires the identification of its projection matrix.
    • For a stereo projection, two projection matrices are needed, one for the left image and one for the right.
    • A 3D frame of reference serves as an affine basis for all other points in the scene.
    • This affine relationship between the frame of reference and other points remains invariant in the projected points.
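The identification step can be illustrated with a general affine camera, of which the weak-perspective model is a special case. In the sketch below, which is our own reconstruction and not the authors' IdentifyCamera code, the projection matrix M is 2×4 and maps a homogeneous point (X, Y, Z, 1) to pixel coordinates (u, v). Stacking the four reference points as rows of a 4×4 matrix A gives A·Mᵀ = U, where U holds the four measured pixel positions; A is invertible exactly when the points are non-coplanar. All function names and the sample numbers are hypothetical.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference points are coplanar (A is singular)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def identify_camera(points_3d, pixels):
    """Return the 2x4 affine projection matrix from four correspondences."""
    A = [[x, y, z, 1.0] for x, y, z in points_3d]
    Mt = solve(A, pixels)                                    # 4x2, M transposed
    return [[Mt[k][r] for k in range(4)] for r in range(2)]  # 2x4

def project(M, point):
    """Project a 3D point to pixel coordinates with matrix M."""
    h = (point[0], point[1], point[2], 1.0)
    return tuple(sum(M[r][k] * h[k] for k in range(4)) for r in range(2))


# Hypothetical reference frame: origin plus the three unit axes.
ref_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
ref_pixels = [(180, 144), (280, 144), (180, 44), (160, 124)]
M_left = identify_camera(ref_points, ref_pixels)
ball = project(M_left, (1, 1, 1))
print(M_left, ball)
```

For a stereo display the same procedure would be run twice, once with the pixel positions seen by the left camera and once with those seen by the right camera.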
  • Augmented Reality – Camera Identification
    • The IdentifyCamera component is designed to help identify both cameras at system initialization as well as whenever required.
    • [Figure label: Reference Frame]
  • Augmented Reality – Surfaces, HAL, Page Flipping
    • Microsoft DirectX is a set of highly optimized application programming interfaces (APIs) for developing high-performance 2D and 3D graphics (or multimedia) applications.
    • A DirectX surface can be thought of as a piece of paper that you can draw on. It provides access to pixel data.
    • The HAL (Hardware Abstraction Layer) provides a common set of graphics functions on all hardware devices.
    • The primary surface is the current video buffer. We write the data of our next frame to an off-screen secondary surface. In one instruction, the graphics device swaps the addresses of the two surfaces, making the off-screen surface the output surface. This is page flipping.
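The essence of page flipping is that the two surfaces trade roles without any pixel data being copied. The toy model below uses plain byte arrays in place of DirectX surfaces; the class `FlipChain` and its methods are hypothetical and do not correspond to any DirectX API.

```python
class FlipChain:
    """Toy model of a primary surface and one off-screen back surface."""

    def __init__(self, size):
        self.primary = bytearray(size)   # surface currently being displayed
        self.back = bytearray(size)      # off-screen surface being drawn

    def draw(self, pixels):
        # Render the next frame into the off-screen surface only.
        self.back[:] = pixels

    def flip(self):
        # Swap the references: no pixel data is copied.
        self.primary, self.back = self.back, self.primary


chain = FlipChain(4)
chain.draw(b"\x01\x02\x03\x04")
shown_before = bytes(chain.primary)   # still the old (blank) frame
chain.flip()
shown_after = bytes(chain.primary)    # the newly drawn frame
print(shown_before, shown_after)
```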
  • Augmented Reality – Component Framework
    • On the server side, no new component is added for the AR application. However, the server side requires setting up the cameras, placing and removing the reference frame, etc.
    • The client side has the following components:
      • StereoSocketClient component
      • IdentifyCamera component
      • RobotModel component
      • DXInterface component
  • Augmented Reality – StereoSocketClient Component
    • A multi-threaded component initialized by the client AR application to:
      • provide a socket interface to the vision server on the remote side that does not block the application, by connecting and receiving data through a dedicated thread;
      • extract single as well as stereo images from the binary video data stream sent by the vision server;
      • synchronize the left and right images while providing stereo frames.
    • It raises an event when a new stereo frame is received from the server.
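Extracting frames from a byte stream comes down to reading exactly the right number of bytes, since a single receive call may return only part of a picture. The sketch below assumes, as our own simplification, that the server sends fixed-size raw pictures with the left picture first and no header; the function names are hypothetical. A local socket pair stands in for the vision server.

```python
import socket

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("stream closed in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_stereo_frame(sock, picture_bytes):
    """Assumed wire format: left picture followed by right picture."""
    left = recv_exact(sock, picture_bytes)
    right = recv_exact(sock, picture_bytes)
    return left, right


# Demo with tiny 6-byte "pictures" over a connected local socket pair.
server, client = socket.socketpair()
server.sendall(b"L" * 6 + b"R" * 6)
left, right = recv_stereo_frame(client, 6)
server.close()
client.close()
print(left, right)
```

In a multi-threaded client, a loop around `recv_stereo_frame` would run in its own thread and notify the application each time a complete pair has arrived.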
  • Augmented Reality – StereoSocketClient Component
  • Augmented Reality – RobotModel Component
    • Acts as a passive proxy of the PUMA robot on the client side.
    • Provides updated gripper and joint positions in Cartesian space through the PUMA direct and inverse geometric models, G(·) and G⁻¹(M, X) respectively.
    • IDecisionServer cannot be used because it is an active proxy of PUMA, which does not allow the positions of the robot joints to be manipulated independently of PUMA.
  • Augmented Reality – DXInterface Component
    • Central component of the AR framework.
    • Runs the AR and visualization logic in separate threads.
    • Handles several tasks, such as:
      • Synchronization of real and virtual data
      • Projection on the video surface
      • Augmentation of real video
      • Page flipping for HMD stereo visualization
  • Augmented Reality – DXInterface Component
  • Augmented Reality – Complete System
  • Augmented Reality – Augmenting Video
    • [Figure: two augmented video frames, each labeled "Augmented Ball".]
  • Conclusions
    • Real-time control of telerobots in the presence of time delays and data loss is a dynamic research area.
    • Efficient teleoperation requires force and visual feedback, which, over a LAN, can only be attained by multi-streaming the real-time data.
    • This work uses .NET-based distributed components to develop a reliable telerobotic scheme that offers multi-streaming of real-time data through extremely fast network connections in a multithreaded environment.
  • Contributions
    • A highly optimized stereo video client-server framework is designed and developed using the Visual C++ and Visual C#.NET programming languages.
    • With this framework we achieve a video transfer rate of 18 stereo frames per second over the KFUPM LAN.
    • Different output techniques for stereo video, such as eye-shuttering glasses and HMD page flipping, are implemented and their performance evaluated.
  • Contributions
    • A component-based, multi-threaded distributed framework for telerobotics is designed, implemented, and evaluated to study the effects of multi-threading on real-time telerobotics.
    • This scheme has significantly reduced the network delays in a given telerobotic scenario while providing a very reliable connection between the client and server sides.
  • Contributions
    • Different geometric working frames are provided for the operator to enhance his maneuverability in the remote environment.
    • Force feedback is deployed on the client side as a means to enhance the telepresence of the operator tele-manipulating the slave arm.
    • Computer vision techniques are explored to create AR (augmented reality) on the client side by merging virtual data with the real video stream from the remote side.
    • The use of AR has helped to decrease network delays by reducing the need for fresh video data.
  • Future Research Directions
    • Implementing hierarchical supervisory control in the developed telerobotic framework. This will allow repeatability of simple tasks using impedance control.
    • Incorporating complex geometrical shapes in the real video in order to provide even richer information to the client side.
    • Studying the effects of hyper-threading on a multi-threaded telerobotic framework.
    • Comparing the projection accuracies of different camera models while augmenting the real data.
    • Analysis and design of a 6 d.o.f. (3 d.o.f. force feedback) master arm being developed at KFUPM in the COE department.