691A Computer Vision Class Project Report

       Topic: 3D Virtual Classroom




                Shichao Ou
TABLE OF CONTENTS

1. Introduction
2. 3D virtual classroom construction using a 3D graphics engine for games
3. Implementations
4. Auto indexing of raw video material
5. Future work
6. Conclusion
References
1. Introduction

For this project, several topics were investigated with respect to the implementation of a 3D virtual classroom: (1) the possibility of using a 3D graphics engine for games to create a virtual world, (2) automated indexing of raw video material, and (3) the extraction of the instructor from the original video stream. By trying out different techniques, we hoped to discover the advantages and shortcomings of each, and thereby lay a path toward the ultimate goal of creating a virtual classroom: an immersive distance-learning experience anywhere and everywhere.

2. 3D virtual classroom construction using a 3D graphics engine for games

2.1. The need for creating a 3D virtual classroom

There has been a debate as to whether creating a 3D virtual classroom is practical, as opposed to simply playing back a 2D video stream. Here is a comparison of the advantages and disadvantages of the two:

• 2D video stream playback

Advantages:
- Seeing the real instructor (his movement, his interaction with the students).
- Easy implementation. With currently available video playback technology, it is very easy to develop the software to do so.

Disadvantages:
- Difficult to read the slides, due to the low quality of the video stream. The current workaround is to display the slides in HTML format in a separate window. However, this can be distracting for the virtual classroom student, who needs to switch back and forth between the video window and the slides window.
- High bandwidth requirement. Video streaming over the internet is a highly bandwidth-intensive task. With the vast majority of internet users today still on modems, this has proven to be a genuine obstacle to the promotion of e-distance-learning.
- Limited viewing angle; not immersive.

• 3D virtual classroom

Advantages:
- An immersive environment that gives the student a sense of being there.
- Allows high-resolution slides to be rendered within the virtual environment.
- An integrated environment with sound, animation, and high-resolution slides in the right place (on the projection screen).
- Allows the environment to be interactive.
- A potential solution to the bandwidth problem. In a pre-constructed 3D virtual world, only a small amount of data needs to be transmitted to render each frame: the text of the slides, and the action and 3D coordinates of the instructor (a rough sketch of such an update packet follows this list).

Disadvantages:
- Not the real world. The virtual avatar for the instructor may be so crude that it does not resemble the real instructor at all. It is possible with current technology to construct a 3D avatar that looks very true-to-life; however, it is quite expensive, and thus may not be practical to do for every instructor.
- A number of issues still need to be solved in order to bring a virtual environment to life. First, the instructor's 3D avatar must replicate the real instructor's movement in class (one may argue that this is part of what distinguishes a good lecturer from a common lecturer), which requires tracking that movement from video. Secondly, the instructor's 3D position must be triangulated with just two un-calibrated video cameras, so that the virtual counterpart can be rendered in the virtual space accordingly. Thirdly, there is the rendering of high-resolution text into a 3D virtual environment, one of the things this study tries to solve; this will be discussed later in further detail.
- Requires high-end client machines. 3D rendering is a very CPU-intensive task that currently only high-end machines are capable of handling. However, I do not see this as an obstacle: with the speed of computers doubling every few months and a 3D graphics card almost a standard part of a new computer, it should not take long before the majority of e-learning users can utilize this technology.
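To make the bandwidth argument above concrete, here is a rough, hypothetical sketch of a per-frame update packet for a pre-constructed world; the field names and sizes are illustrative assumptions, not taken from the report.

// Hypothetical per-update packet for a pre-constructed virtual world.
// All field names and sizes are illustrative assumptions.
#include <cstdint>

struct ClassroomUpdate {
    float    instructorPos[3];   // 3D coordinates of the instructor avatar
    uint16_t gestureId;          // which canned posture animation to play
    uint16_t slideIndex;         // which preloaded slide to show on screen
};
// ~16 bytes per update, versus kilobytes for each compressed video frame.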
2.2. The rendering engine

Certainly, one of the first questions in creating a virtual world is: how do you render it? Creating a custom 3D rendering engine is certainly desirable, since you can control every aspect of the rendering and optimize it in the way that best suits the task. However, it is also a very arduous job, takes a lot of time, and is not the main focus of this research. A decision was made to try out a 3D game engine instead, for the following reasons:

• High quality and yet very efficient. Because of the interactive nature of computer games, rendering quality and frame rate are very important in a 3D game engine. Since the gaming industry has grown into a multi-million dollar industry, a great deal of money and effort has been invested in building 3D game engines that are optimized to do these tasks well. They take full advantage of current 3D graphics card technology, producing effects that put other technologies such as VRML and Java3D to shame.
• Supports a variety of 3D modeling software, allowing us to create high-quality characters or objects (e.g. desks, TV sets) to make the virtual world more believable.
• Contains tools to build virtual architecture (e.g. buildings, rooms) easily.

Specifically, we picked the Genesis3D engine for our experiments because, aside from possessing all the above-mentioned advantages, it is open-source and yet well-developed and documented, and, most importantly, has a large user community. Furthermore, Genesis3D has a web version called WildTangent3D (the programming interface is slightly different). If needed, we can easily port the code to WildTangent3D for web access.
3. Implementations

Fig.1. System Overview Diagram (the Genesis3D engine renders the virtual world and virtual characters built with 3D modeling tools; DirectShow streams video into engine texture maps; DirectSound handles streaming audio)

To begin with, a room was constructed based on the Video Instruction Program (VIP) classroom in the UMass Computer Science Building, using a world-building tool called GEdit that comes with the Genesis3D development package. Then, a basic walk-through application was written using Genesis3D's programming interface. This allows us to visualize the effect of the 3D virtual classroom, see its limitations, and find out what needs to be improved.

Fig.2. A virtual classroom

3.1. Playing the audio

Now that we have a classroom, we need to be able to hear the instructor's voice. Genesis3D supports audio playback; however, it supports only the wav format, which is uncompressed audio. If we saved an entire lecture in wav format, even a CD would not be enough to store the file. The audio track was therefore extracted from the video using a program called "Total Recorder" and compressed into mp3. Since the engine does not support mp3 playback, I had to write my own playback function, and decided to use DirectShow to accomplish this task. DirectShow is designed to simplify the creation of multimedia applications by making the complexities of data transport, hardware differences, compression and decompression, and synchronization transparent to the programmer. It supports all kinds of video and audio formats, such as MPEG, AVI, mp3, DV, and ASF.
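As a concrete illustration of this approach, here is a minimal DirectShow playback sketch using the standard filter-graph calls. The file name is a placeholder, and blocking until completion is for illustration only; the actual project would drive playback from the engine's main loop. Error handling is omitted for brevity.

// Minimal DirectShow mp3 playback sketch (Windows/COM build environment).
#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

void PlayLectureAudio(const wchar_t *path)   // e.g. L"lecture01.mp3" (placeholder)
{
    CoInitialize(NULL);

    IGraphBuilder *pGraph   = NULL;
    IMediaControl *pControl = NULL;
    IMediaEvent   *pEvent   = NULL;

    // Create the filter graph manager.
    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                     IID_IGraphBuilder, (void **)&pGraph);

    // RenderFile builds the whole decoding chain (mp3 parser, decoder,
    // DirectSound renderer) automatically; this is exactly the format
    // transparency that makes DirectShow attractive here.
    pGraph->RenderFile(path, NULL);

    pGraph->QueryInterface(IID_IMediaControl, (void **)&pControl);
    pGraph->QueryInterface(IID_IMediaEvent, (void **)&pEvent);

    pControl->Run();                                // start playback

    long evCode;
    pEvent->WaitForCompletion(INFINITE, &evCode);   // block until done

    pControl->Release();
    pEvent->Release();
    pGraph->Release();
    CoUninitialize();
}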
Fig.3. DirectShow Structure Diagram

3.2. Displaying Text

One of the key issues with watching a video of a lecture is the low video resolution: it is very difficult to read the slides. Therefore, we need to make sure we can display the slides in our virtual environment at a relatively high text resolution. However, the engine has no support for 3D text rendering (only 2D text can be displayed), so we need to come up with ways to overcome this problem. The possible solutions are: (1) display slides as bitmap texture maps; (2) render text as true 3D objects; (3) render each letter as a 3D object with a transparent font texture map.

3.2.1. Displaying slides as bitmap texture maps

This is the easiest way to display the slides while maintaining their original format, since we can simply take snapshots of the original slides and preload them all into the program as texture maps. However, getting these slide maps into the right place (in our case, onto the virtual projection screen) is still a problem: our virtual classroom is pre-designed, so the texture map for the projection screen is already assigned. In order to allow dynamic texture mapping, we assign a globally unique texture to the projection screen at design time. Then, at run time, we can use a Genesis3D function to search for this particular texture and replace it with the appropriate slide texture.
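A sketch of that design-time/run-time texture swap might look like the following. The report only states that such a lookup function exists in Genesis3D; FindSlideScreenBitmap and CopyIntoBitmap below are placeholder names, not the actual Genesis3D API, and the texture name is illustrative.

// Hypothetical sketch of the dynamic slide-texture swap (Section 3.2.1).
// FindSlideScreenBitmap and CopyIntoBitmap are placeholders standing in for
// the unnamed Genesis3D texture-lookup and bitmap-copy calls.
geBitmap *FindSlideScreenBitmap(geWorld *world, const char *name);   // placeholder
void      CopyIntoBitmap(geBitmap *src, geBitmap *dst);              // placeholder

void ShowSlide(geWorld *world, geBitmap **slideBitmaps, int slideIndex)
{
    // "SLIDE_SCREEN" stands for the globally unique texture name assigned
    // to the projection-screen polygon at design time in GEdit.
    geBitmap *screenTex = FindSlideScreenBitmap(world, "SLIDE_SCREEN");
    if (screenTex != NULL)
        // Replace it with the preloaded snapshot of the requested slide.
        CopyIntoBitmap(slideBitmaps[slideIndex], screenTex);
}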
Fig.4. Displaying slides with texture mapping

3.2.2. Overcoming the text clarity problem

After testing the first prototype, the text resolution was considered too low. This is due to the fact that the engine only supports bitmaps up to 256x256 pixels as textures; if a slide has more than 5 lines of text, the text becomes quite blurred on a 256x256 bitmap. It is possible to increase the texture size limit by modifying the engine, but this is too time-consuming, so a decision was made to try an easier method: increasing the size of the projection screen and scaling up the room likewise, so that the projection screen is now made up of four 256x256 texture maps instead of one. Here is the result:

Fig.5. Difference between a 256x256 texture (a) and a 512x512 texture (b)
Fig.6. High resolution slide text using a 512x512 texture map

As we can see, although the resolution is improved, it still cannot match the quality of 2D text in the original PowerPoint or HTML program. Another problem is that the bitmaps take up quite a bit of disk space, and as a result are not suitable for web distribution.

3.2.3. Rendering text as 3D objects

In an attempt to further improve text resolution, the alternative of rendering the text as true 3D objects was also tested. In this case, a 3D model was built for each letter of the alphabet. After parsing the HTML to retrieve the text, each letter of the text was rendered with the corresponding 3D model and positioned on top of the projection screen. The letter models were made very thin so that they would appear as text projected onto the screen. This guarantees text resolution, since the 3D model for each letter has high resolution. At the same time, it solves the size problem: instead of storing a bitmap image for each slide, we only need to store the parsed text, the relative font size, etc. However, it led to another problem: slow frame rate. Each letter is now a true 3D object, and rendering non-blocky 3D objects is very computationally intensive, causing the frame rate to drop noticeably even on the development machine, an Athlon XP 1800+ equipped with a GeForce 2 graphics card.
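As an illustration of the layout step, here is a fixed-pitch sketch. LoadLetterModel and PlaceModel are placeholders, since the report does not name the engine calls used, and the constant advance per letter is an assumption; real slides would need per-glyph metrics.

// Illustrative sketch of positioning per-letter 3D models over the
// projection screen (Section 3.2.3). Placeholder declarations:
struct LetterModel;
LetterModel *LoadLetterModel(char c);                         // placeholder
void PlaceModel(LetterModel *m, float x, float y, float z);   // placeholder

void LayoutLine(const char *text, float originX, float originY,
                float screenZ, float advance)
{
    float x = originX;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p != ' ') {
            // One thin, high-resolution model per letter, pinned just in
            // front of the projection screen so it reads as projected text.
            PlaceModel(LoadLetterModel(*p), x, originY, screenZ);
        }
        x += advance;   // fixed-pitch advance; an assumption for brevity
    }
}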
Fig.7. High resolution text rendered with true 3D objects, compared with the normal texture mapping in the background

3.2.4. Other techniques

Here are some potential techniques and alternatives that were considered for testing but, due to time limitations, were not carried out.

• Rendering each letter as a 3D object with a transparent font texture map. This could potentially solve the frame rate problem while maintaining high-resolution text. The main reason the 3D font models render slowly is that most lowercase letters are very curvy, which gives the models a high polygon count and puts more burden on the rendering engine. We may resolve this by simply using a block object for each letter and painting a transparent font texture map on top of it, making the object appear as a flat letter in 3D space. Since the actual 3D object is just a simple square block, the frame rate should improve.

• Creating text texture maps on-the-fly. If the 3D object rendering does not work out, we can still return to the texture mapping method described earlier; however, the bitmap size problem remains. Recently, a Genesis3D function was discovered that is capable of rendering 2D text onto texture maps. With it, we can create texture maps on-the-fly, solving the bitmap size problem:

void DrawTextToBitmapMF(int X, int Y, char *Text, int Color, geBitmap *tgtBitmap);
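A sketch of how this call could produce slide textures on the fly is given below. Only the DrawTextToBitmapMF signature above comes from the report; the margin, line spacing, and color value are illustrative assumptions.

// On-the-fly slide texture using the Genesis3D call quoted above.
// Layout constants and the color encoding are assumptions.
void RenderSlideToTexture(char **lines, int lineCount, geBitmap *screenTex)
{
    const int marginX    = 10;   // assumed left margin in texture pixels
    const int lineHeight = 20;   // assumed vertical spacing per text line
    for (int i = 0; i < lineCount; ++i)
        // Draw each parsed slide line straight into the projection-screen
        // texture, avoiding both stored bitmaps and true-3D letter models.
        DrawTextToBitmapMF(marginX, marginX + i * lineHeight, lines[i],
                           255 /* assumed color value */, screenTex);
}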
3.3. The instructor

Having the instructor in the virtual space helps to create a more believable study environment.

3.3.1. Creating a virtual character

This is done by using 3D modeling software such as 3DMax to build a human model that resembles the real instructor. The engine supports a variety of 3D modeling formats, so we can easily load these characters into our virtual classroom, move the character around, and play different posture animations at different times to create a believable character. The difficult part, however, is the modeling and animation itself: it is a very time-consuming task, so it is not likely that we can build a 3D model for each instructor. Another hard problem is tracking the instructor's body gestures so that we can replicate them through the virtual actor; facial expression is even harder to track given such low-quality, un-calibrated video cameras. Yet both body gestures and facial expressions are sometimes crucial visual cues that help us better understand what the instructor is trying to convey. As a result, this approach was considered inappropriate given the current state of technology. Due to these yet-to-be-solved vision problems, other techniques for putting the instructor into the virtual space were tested.

3.3.2. Playing the instructor video inside the virtual classroom

This is an alternative way to put a "virtualized instructor" in the virtual space without building a 3D model of the instructor. A more ideal way would be to extract the instructor from the original video frames and place him within the virtual environment; however, for simplicity, our first attempt was to play the video on a virtual TV screen inside the virtual classroom. Thus, it is possible to see the actual motion and expression of the instructor while still maintaining the virtual environment setting. This is still better than the 2D version, because now everything happens within a seamless 3D environment. It is similar to the scenario in which an instructor who cannot be present for a VIP class pre-tapes the lecture and lets the students watch the recording during the actual lecture period. In the real world, the students cannot interact with the instructor because he is not present, whereas in our case we can still program interaction into the scene.

On the technical side, just like rendering text, playing streaming video inside the 3D environment is not supported by the game engine. However, it was later discovered that we could use DirectShow's multimedia support to implement this function ourselves. More specifically, just as we used the DirectShow interface to play mp3 audio in the "playing audio" section, we can use DirectShow to decode the video format. As the DirectShow architecture diagram shows, we can intercept the raw decoded image frame at the "DirectDraw" level, just before it would be sent to the video graphics card for 2D full-screen rendering, and redirect the stream to our 3D graphics engine, displaying each frame as a texture map on the TV-screen polygon. This creates the effect that the video is being played on the virtual TV screen.
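The report intercepts decoded frames "at the DirectDraw level"; one standard DirectShow route to decoded frames is the SampleGrabber filter, so the sketch below uses it as a stand-in rather than as the report's exact mechanism. Wiring the grabber into the source, decoder, and renderer chain is elided, as is the engine-side texture upload.

// Per-frame access to decoded video via DirectShow's SampleGrabber.
#include <dshow.h>
#include <qedit.h>   // ISampleGrabber (DirectX 8/9-era header)

void ConfigureGrabber(ISampleGrabber *pGrabber)
{
    AM_MEDIA_TYPE mt;
    ZeroMemory(&mt, sizeof(mt));
    mt.majortype = MEDIATYPE_Video;
    mt.subtype   = MEDIASUBTYPE_RGB24;  // uncompressed frames, texture-ready
    pGrabber->SetMediaType(&mt);        // force RGB24 at the grabber's input
    pGrabber->SetBufferSamples(TRUE);   // keep a copy of the latest frame
}

// Called once per rendered 3D frame: copies the most recent decoded video
// frame into 'pixels', which the engine then uploads to the TV-screen
// polygon's texture map (upload not shown).
long GrabLatestFrame(ISampleGrabber *pGrabber, long bufSize, long *pixels)
{
    long size = bufSize;
    if (SUCCEEDED(pGrabber->GetCurrentBuffer(&size, pixels)))
        return size;    // bytes copied: width * height * 3 for RGB24
    return 0;
}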
Fig.8. Playing the instructor video within the virtual environment

3.3.3. Extracting the instructor from the video

As mentioned above, simply playing the video is not "real" enough; the proper way is to extract the instructor from the video and place him/her into the virtual world. In principle, this can be achieved by taking a "background" shot of the place where the instructor will be standing, without the instructor in it, and then performing background subtraction on every frame that contains the instructor, thereby automatically "cutting" the instructor out of every frame. Due to time constraints on the project, this was not actually tested. However, from talking to people who have attempted it, it should be recognized that several environmental issues make this problem not so simple. For example, due to the varying lighting conditions in the VIP classroom, the "background" image changes from frame to frame, so it is not practical to use just one "background" image. Also, the shadows cast on the instructor make a clean extraction difficult: sometimes the instructor is extracted along with a large shadow, which may block some of the text displayed behind him when we reinsert the extracted instructor into a scene that contains only the slides. (A minimal sketch of the basic subtraction step is given at the end of this section.)

A point that should be stressed here is that although this method is easier to accomplish than building an actual 3D representation of the instructor, and in some sense more realistic, it has the shortcomings of low resolution and a considerable bandwidth requirement if we want to stream it online. By contrast, once a 3D representation of the instructor is built, all that needs to be transmitted over the internet is the position and gesture ID, greatly reducing the size of the data.
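For reference, the basic subtraction step described above reduces to a per-pixel comparison. Here is a minimal grayscale sketch assuming a single fixed background shot and a hand-picked threshold; these are exactly the assumptions that the varying lighting and shadows defeat.

// Minimal background-subtraction sketch (grayscale, single static
// background, fixed threshold): the naive version discussed above.
#include <cstdint>
#include <cstdlib>

void ExtractForegroundMask(const uint8_t *background, const uint8_t *frame,
                           uint8_t *mask, int width, int height, int threshold)
{
    for (int i = 0; i < width * height; ++i)
        // Pixels differing enough from the empty-room shot become candidate
        // instructor pixels (255); everything else is background (0).
        mask[i] = (std::abs(frame[i] - background[i]) > threshold) ? 255 : 0;
}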
Fig.9. Extracting the instructor from the original video frame (hand extraction)
4. Auto indexing of raw video material

4.1. Motivation

One of the most convenient features of current online video lectures is the indexing feature. Indexing divides the video of a lecture into sections. Similar to the table of contents of a book, the index lets the user find out beforehand what main topics a particular video covers, and allows users to skip quickly to the section that is most important or interesting to them. Currently, this indexing is done manually: a person is designated to sit through the entire video and note down the exact frame at which the instructor changes topic and moves on to a new slide. Another person then uses this information to divide the original video into sections corresponding to the topics covered in the lecture.

4.2. Automating with vision techniques

We believe this process can be automated using vision techniques, and our knowledge about VIP lectures can be exploited here. Unlike news or sports programs on TV, where scenes change rapidly and events occur randomly, VIP videos mostly switch between frames showing the slides, with the instructor's voice in the background (the instructor may occasionally walk in front of the screen, pointing things out), and frames focusing solely on the instructor. In almost all cases, a topic change occurs whenever a slide change occurs. We can exploit this by monitoring each frame that contains the slides, watching for a slide change, and logging the frame number. To achieve this, a number of vision problems need to be solved: (1) skipping the instructor frames; (2) monitoring slide changes; (3) discarding the interference when the instructor walks in front of the slides.

In order to apply vision algorithms to every frame, we need to be able to access the pixels within each frame. Again, the DirectShow libraries achieve this: as in the video playback case, we can intercept the uncompressed raw image data before it is sent to the DirectDraw level for rendering, and access each pixel from there.

The most crucial challenge is how to monitor a slide change. Let us first assume that we are processing only the frames with slides displayed in them. We can easily use vision techniques to compute specific features of each frame, e.g. lines, texture, or corners; the simplest feature is the entire image itself. We simply compute a correlation for each frame against the current slide template and, whenever the correlation value crosses a certain threshold, mark that frame as a key frame.

Now, certainly, when the instructor walks in front of the slides he will cause a dramatic change in the correlation value, and we need to exclude those instances. The simplest solution is as follows: whenever the slide changes, one thing that is sure to change is the title, and in most cases, even when the instructor walks in front of the slide, he/she is not tall enough to cover the top of the projection screen (where the title is). Therefore, if we compute the correlation only over the top of the screen, not only can we exclude the influence of the instructor, but the calculation also becomes more efficient.

With two of the three problems behind us, let us look at the first problem: skipping the frames with close-up shots of the instructor. It is believed that the same technique described for the last two problems can also solve this one.
This is because close-up shots of the instructor should be sufficiently different from shots of the slides (and in particular their titles). Therefore, using correlation, there should be a threshold that tells us whether a frame is a close-up shot of the instructor or not.
To prove this theory, the following experiment was set up. From the video file, a series of frames was picked: one key template frame that we wish to identify as the current slide, a few other frames showing the same slide at different times, a few frames from the next few slides, and a few frames with close-up shots of the instructor. Our goal is to find thresholds that allow us to safely distinguish "current slide" frames from "other slide" frames and "instructor close-up" frames.

Table 1. Examples of frames chosen for the experiment (images omitted): the template frame, a typical "current slide" frame, an example "other slide" frame, and a typical "close-up" frame.

Normalized cross correlation (Grupen, 2002) is used. It is defined as follows:

R(x,y) = \left[ \sum_{i=-\alpha}^{\alpha} \sum_{j=-\beta}^{\beta} \big( f(x+i,\,y+j) - \hat{f} \big) \big( t(i+\alpha,\,j+\beta) - \hat{t} \big) \right] / (VW)    (Eq. 1)

where -1 \le R(x,y) \le +1 is the normalized correlation of the (2α+1) × (2β+1) template to the image at image location (x, y). This correlation depends on (constant) properties of the template:
\hat{t} = \left[ \sum_{i=-\alpha}^{\alpha} \sum_{j=-\beta}^{\beta} t(i+\alpha,\,j+\beta) \right] / (MN)

where M = 2α+1 and N = 2β+1 represent the dimensions of the template, and

W = \left[ \sum_{i=-\alpha}^{\alpha} \sum_{j=-\beta}^{\beta} \big( t(i+\alpha,\,j+\beta) - \hat{t} \big)^2 \right]^{1/2}

The correlation metric also depends on the properties of the image in the region about location (x, y):

\hat{f} = \left[ \sum_{i=-\alpha}^{\alpha} \sum_{j=-\beta}^{\beta} f(x+i,\,y+j) \right] / (MN)

V = \left[ \sum_{i=-\alpha}^{\alpha} \sum_{j=-\beta}^{\beta} \big( f(x+i,\,y+j) - \hat{f} \big)^2 \right]^{1/2}

4.3. Results and discussion

Here are the correlation results:

Groups                     Normalized cross correlation values
"Current Slide" frames     110   111   110   110
"Other Slide" frames        99    98    48    58
"Close-up" frames           12    16    15    13

Table 2. Test results. (Note: the correlation value of the template frame with itself is 112.)

From the results, we can see a definite pattern in these frames: "current slide" frames have stable correlation values around 110; "close-up" frames fluctuate somewhere below 20; and "other slide" frames fluctuate more widely, between 48 and 98. We therefore set the thresholds as follows (based on the normalized cross correlation value R):

R < 30: "instructor close-up" frame
30 < R < 108: "other slide" frame → slide change, mark key frame
R > 108: "current slide" frame

The technique above is simple and may not work well under all conditions; at worst, however, it yields a coarse index of the video frames. There are many more sophisticated vision techniques we could add to improve accuracy, e.g. feature (texture, line, corner) detection and matching, or statistical dynamic background construction. Also, if we can accurately discover which region the instructor occupies (by using facial recognition techniques), we can simply exclude that region when computing the correlation of the "slide" frames; the bigger the remaining area, the more accurate our prediction would be. Time-wise, correlation is very expensive; other possible methods are frame differencing or histogram differencing.
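For completeness, here is a direct implementation of Eq. 1 for grayscale, row-major images; the caller must keep the template window inside the image. Note that this returns R in [-1, +1], whereas the report's table values and thresholds (30, 108) are on its own scaled version of the measure.

// Normalized cross correlation of a (2*alpha+1) x (2*beta+1) template
// against the image region centered at (x, y), per Eq. 1.
#include <cmath>
#include <cstdint>

double NormalizedCrossCorrelation(const uint8_t *img, int imgW,
                                  const uint8_t *tpl, int alpha, int beta,
                                  int x, int y)
{
    const int M = 2 * alpha + 1, N = 2 * beta + 1;   // template dimensions

    // Means of the template (t-hat) and of the image region (f-hat).
    double tMean = 0.0, fMean = 0.0;
    for (int i = -alpha; i <= alpha; ++i)
        for (int j = -beta; j <= beta; ++j) {
            tMean += tpl[(j + beta) * M + (i + alpha)];
            fMean += img[(y + j) * imgW + (x + i)];
        }
    tMean /= (double)(M * N);
    fMean /= (double)(M * N);

    // Cross term plus the two normalizers V and W from Eq. 1.
    double cross = 0.0, V = 0.0, W = 0.0;
    for (int i = -alpha; i <= alpha; ++i)
        for (int j = -beta; j <= beta; ++j) {
            double ft = img[(y + j) * imgW + (x + i)] - fMean;
            double tt = tpl[(j + beta) * M + (i + alpha)] - tMean;
            cross += ft * tt;
            V += ft * ft;
            W += tt * tt;
        }
    // Caveat: a perfectly uniform region or template makes V or W zero.
    return cross / (std::sqrt(V) * std::sqrt(W));
}

Applied only to the top strip of each frame (the title region), this one function gives both the instructor-exclusion and the efficiency benefits described in Section 4.2.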
5. Future work

5.1. Making the slides hot-linkable

This is another critical issue we need to solve. In PowerPoint or HTML format, the slides are hot-linkable, meaning users can click keywords that lead them either to another page or to a website with relevant information. In a 3D environment, however, the only way to make something "clickable" is ray-collision detection. It works like this: when you click on something on screen, the application casts an invisible ray from the position of the mouse pointer into the virtual world; this ray intersects some polygon in the world, and the function returns a pointer to the polygon that was selected. Therefore, if we make each letter a true 3D object, it is easy to find out which text has been "clicked". But if we make the entire slide a texture map, it takes some effort to make the text "clickable": we must find the exact x, y coordinates of the area that needs to be "clickable" by parsing the HTML or PowerPoint file, and then place a transparent polygon over that area of the projection screen. (A hypothetical picking sketch is given at the end of this section.)

5.2. Adding more interactive elements

The most important advantage of a 3D virtual classroom over traditional e-learning software is the interactivity of a 3D environment. We live in a 3D world, so interacting with one is second nature to us. In a virtual space, we are freed from the constraints of the 2D world and can use our instincts to interact with the objects in the virtual world, which makes the software very easy to use. We can have the user ask questions of the virtual instructor. We can have virtual simulations running in the classroom; for example, imagine learning about a networking protocol from the viewpoint of a message being passed along the network. This would make the class a much more effective and enjoyable experience.

5.3. Solving other vision problems

As mentioned before, in order to make a virtual environment more believable, a number of vision problems still need to be solved, e.g. tracking body gestures and facial expressions.
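As an illustration of the ray-collision picking described in Section 5.1, here is a hypothetical sketch. ScreenPointToRay and RayCollide are placeholders for the engine's un-project and ray-collision calls, which the report describes but does not name.

// Hypothetical picking sketch for hot-linkable slides (Section 5.1).
struct Vec3 { float x, y, z; };
struct Poly;                                                      // engine polygon
void  ScreenPointToRay(int mx, int my, Vec3 *origin, Vec3 *dir);  // placeholder
Poly *RayCollide(const Vec3 &origin, const Vec3 &dir);            // placeholder

Poly *PickSlideLink(int mouseX, int mouseY)
{
    Vec3 origin, dir;
    // Un-project the mouse position into a world-space ray from the camera.
    ScreenPointToRay(mouseX, mouseY, &origin, &dir);
    // A hit on one of the transparent "hot-link" polygons placed over the
    // projection screen identifies the clicked keyword.
    return RayCollide(origin, dir);
}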
6. Conclusion

In this study, we have looked into both graphics and vision problems with respect to building a fully automated interactive learning system: building a 3D virtual environment, rendering issues, how to display slides, putting the instructor into the virtual environment, auto-indexing of the video file, and so on. Several techniques for rendering the slides on screen were tested, yet none worked as perfectly as we had hoped; either clarity or frame rate suffered. Possible techniques to resolve these issues were discussed, but due to time constraints they have not yet been implemented. Ideally, we would like the operating system to handle this, so that we could simply intercept whatever is being rendered on the 2D screen and render it into the 3D environment. However, it is suspected that this would cause a slow frame rate: desktop resolutions today are around 1024x768, yet no real-time desktop rendering engine can handle texture maps of that size. Even if these problems are solved, there remain other vision problems involving tracking in 3D space that are not easy to solve and require a lot of computational power.

These results have led us to wonder: is all this necessary? We believe the argument at the beginning regarding the advantages of an immersive 3D environment still holds. However, given that current desktops and operating systems are optimized for 2D tasks, we conclude that more effort should be put into making these 2D applications better. Having talked to several students in the VIP courses, the following were found to be essential: (1) being able to search for and find the desired part quickly; (2) clear slides; (3) interaction between the instructor and student; (4) seeing the instructor only when he is writing on the whiteboard or performing an experiment; at other times, audio is sufficient if slides are available. We have yet to create a perfect application that does all these things. There are also many issues involving the production of such programs, e.g. how to produce them more efficiently (take the auto-indexing problem, for example). For these practical reasons, the later part of the study focused on the auto-indexing issue and presented some possible techniques to achieve this goal.

References

Grupen, R.: "A Computational Approach to Sensorimotor Systems", text for CS603 Robotics, Department of Computer Science, University of Massachusetts, Amherst, 2002, pp. 120-121.

Zhu, Z.: "3D Virtual Classroom – the Next Generation e-Learning System", http://vis-www.cs.umass.edu/~zhu/VirtualClassroom.html.
