This document provides background information for a master's thesis project on 3D head movement visualization. It discusses a sensor-based diagnostic system developed at the University of Duisburg-Essen that captures neck and head movement data. The thesis aims to develop a 3D visualization application to intuitively display this movement data in virtual 3D space, as an improvement over existing 2D data plots. The background section reviews relevant 3D techniques including OpenGL, Direct3D, VRML, X3D, H-Anim, and Java 3D, as well as mathematical concepts like Euler rotations that will be used to model and control the 3D head movement. X3D and H-Anim are selected for implementing the 3D visualization.
The document describes an image processing methodology to detect the nematode C. elegans in microscope images. It aims to automate the identification of individual worms, which is currently done manually but is too labor-intensive. The methodology segments worms from the background, detects endpoints, generates shape descriptors, and performs profile-driven shape fitting to identify worms. It was implemented as a plug-in for the open-source image analysis software Endrov and aims to improve upon previous automated methods by achieving a higher matching accuracy.
This document summarizes a master's thesis project that aimed to implement an object tracking system in Matlab using a single webcam. The system uses both fast and advanced algorithms to achieve better accuracy and speed than either approach alone. It tracks a person's hand placed in front of the webcam mounted on a computer screen. While not real-time, it serves as an initial step towards a real-time capable system. The thesis discusses background on object tracking approaches, related work, the specific problem and hardware, methods used including adaptive filtering, motion detection and pattern recognition, implementation details, results of simulations and tracking tests, and ideas for future work.
This document presents a master's thesis that designed coded excitation and filter techniques to improve 3D ultrasound computer tomography (USCT) imaging for early breast cancer detection. The thesis aimed to suppress side lobes and increase separability of reflections in USCT data by developing customized mismatch filters. Signal and image evaluations showed the best designed filter and coded excitation combination improved image contrast by 143% compared to the standard USCT approach, under the same system constraints.
This document describes a neural network based content-based image retrieval system developed as a final year project. The system uses Haar wavelet transform and RGB and RgYb colour channels as features to train neural networks on the COREL 1k dataset. The trained neural networks are used to retrieve similar images from the dataset and external images based on calculating distance between feature vectors. Experiments were conducted to evaluate the performance of the proposed system and neural network architecture. The system provides a graphical user interface for users to submit queries and obtain returned images.
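The retrieval step described above — ranking dataset images by the distance between feature vectors — can be sketched as follows. This is an illustrative sketch only, not the project's actual implementation; the image names and the three-dimensional "features" are hypothetical stand-ins for the wavelet/colour descriptors the system would produce.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_features, database, k=3):
    # Rank database images by distance to the query's feature vector
    # and return the k closest (image id, distance) pairs.
    ranked = sorted(
        ((img_id, euclidean(query_features, feats))
         for img_id, feats in database.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]

# Toy descriptors standing in for real wavelet/colour features
db = {
    "beach": [0.9, 0.1, 0.2],
    "forest": [0.1, 0.8, 0.3],
    "city": [0.4, 0.4, 0.9],
}
print(retrieve([0.85, 0.15, 0.25], db, k=2))
```

In the real system the neural network produces the feature vectors; only the final nearest-neighbour ranking is shown here.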
Depth sensor independent body part localization in depth images using a multi-camera setup, by Rasmus Johansson
This thesis explores using multiple depth cameras to improve body part localization of occluded joints. A random forest algorithm is trained on low-resolution depth images to classify pixels and estimate joint positions. Testing shows this multi-camera approach provides more stable estimations than a single camera, though limitations of low-resolution training data impact generalization to high-resolution Kinect images. Overall, the results are satisfactory on low-resolution data, but difficulties arise when applying the model to real Kinect footage.
Im-ception - An exploration into facial PAD through the use of fine tuning deep convolutional neural networks, by Cooper Wakefield
This document is a thesis submitted by Cooper Wakefield to the University of Queensland for the degree of Bachelor of Engineering. The thesis proposes developing a presentation attack detection (PAD) system through fine tuning a deep convolutional neural network. It aims to leverage pre-trained networks and fine tune the upper layers to differentiate between real and fake facial images with a high degree of accuracy. The thesis outlines the problem of presentation attacks on facial recognition systems, reviews prior approaches to PAD, and describes the proposed solution of using transfer learning on a CNN to classify images as real or fake.
The document presents a complete Android-based framework for automatically identifying a user's transportation mode using GPS trajectories and accelerometer measurements from a smartphone. The framework includes an architecture, design, implementation, user interface, and algorithms for transportation mode identification. It applies segmentation, simplification, and machine learning classification techniques to collected GPS and accelerometer data to identify modes like walking, running, and in-vehicle transportation. The system was evaluated on real and simulated data, achieving an overall accuracy of around 85% for identifying transportation modes, outperforming the Google Activity Recognition API.
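The classification step above — mapping windows of accelerometer data to modes like walking, running, and in-vehicle — can be illustrated with a deliberately simplified sketch. The framework itself uses trained machine learning classifiers; the fixed thresholds and window values below are hypothetical, chosen only to show the feature-then-classify shape of the pipeline.

```python
import statistics

def accel_features(window):
    # window: list of (x, y, z) accelerometer samples.
    # Use the magnitude signal so the feature is orientation-independent.
    mags = [(x * x + y * y + z * z) ** 0.5 for x, y, z in window]
    return statistics.mean(mags), statistics.stdev(mags)

def classify(window, still_thresh=0.5, walk_thresh=3.0):
    # Hypothetical thresholds on the spread of the magnitude signal:
    # low variation -> in-vehicle, moderate -> walking, high -> running.
    _, spread = accel_features(window)
    if spread < still_thresh:
        return "in-vehicle"
    if spread < walk_thresh:
        return "walking"
    return "running"
```

A smooth window hovering around 1 g would classify as "in-vehicle", while a window with large magnitude swings would classify as "running".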
This document describes a dissertation that aims to improve 3D stereo reconstruction of human faces by combining it with a generic morphable face model. The dissertation first discusses background topics like facial landmark annotation, 3D morphable face models, texture representation, stereo reconstruction and face model deformation. It then describes the proposed scheme which involves steps like landmark annotation, pose estimation, shape fitting, texture extraction, stereo reconstruction from image pairs and deformation of the face model. The results show that fusing the stereo reconstruction with a single image reconstruction using a morphable model leads to a more accurate 3D face model compared to using either method alone. Finally, the deformed face model is visualized on a smartphone using a cardboard viewer.
An investigation into the building blocks of neural networks and modern machine learning. It traces the evolution from the most basic neural networks to modern concepts, particularly methodologies that allow better training of these networks to produce more accurate real-life models.
This thesis proposes and evaluates a compressive sensing (CS)-based indoor positioning and tracking system using received signal strength (RSS) from wireless local area network access points. The system is designed and implemented on mobile devices with limited resources.
In the offline phase, RSS fingerprints are collected and clustered using affinity propagation. In the online phase, coarse localization is done by matching RSS measurements to precomputed clusters, and fine localization refines the position using CS recovery on the sparse location signal.
An indoor tracking system is also presented, which integrates the CS-based positioning with a Kalman filter for sequential location estimates. Experimental results on two testbeds show the system achieves better accuracy than other fingerprinting methods and is suitable for implementation on resource-constrained mobile devices.
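The coarse-localization step in the online phase can be sketched as a nearest-cluster match. This is an illustrative toy, not the thesis's implementation: the cluster names, access-point count, and RSS values are hypothetical, and the fine CS-recovery step that would follow inside the matched cluster is omitted.

```python
import math

# Toy offline fingerprints: cluster heads (exemplars, as produced by
# affinity propagation) over RSS readings from three access points, in dBm.
heads = {
    "lobby": [-40.0, -70.0, -80.0],
    "office": [-75.0, -45.0, -60.0],
}

def nearest_cluster(rss, cluster_heads):
    # Coarse localization: match the online RSS measurement to the
    # closest precomputed cluster head.
    best, best_d = None, math.inf
    for name, head in cluster_heads.items():
        d = math.dist(rss, head)
        if d < best_d:
            best, best_d = name, d
    return best

print(nearest_cluster([-42.0, -68.0, -78.0], heads))  # prints "lobby"
```

Restricting the subsequent sparse recovery to the matched cluster is what keeps the online phase cheap enough for resource-limited mobile devices.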
Trade-off between recognition and reconstruction: Application of Robotics Vision, by stainvai
Autonomous and efficient action of robots requires a robust robot vision system that can cope with variable light and view conditions. These include partial occlusion, blur, and above all a large difference in object scale due to variable distance to the objects. This change in scale leads to reduced resolution for objects seen from a distance. One of the most important tasks for the robot's visual system is object recognition, a task also affected by orientation and background changes. These real-world conditions require the development of specific object recognition methods.

This work is devoted to robotic object recognition. We develop recognition methods based on training that incorporates prior knowledge about the problem. The prior knowledge is incorporated via learning constraints during training (parameter estimation). A significant part of the work is devoted to the study of reconstruction constraints. In general, there is a trade-off between the prior-knowledge constraints and the constraints emerging from the classification or regression task at hand. To avoid the additional estimation of the optimal trade-off between these two constraints, we treat this trade-off as a hyperparameter (under a Bayesian framework) and integrate over a certain (discrete) distribution. We also study various constraints resulting from information-theoretic considerations.

Experimental results on two face data sets are presented. Significant improvement in face recognition is achieved for various image degradations, such as several forms of image blur, partial occlusion, and noise. Additional improvement in recognition performance is achieved when preprocessing the degraded images via state-of-the-art image restoration techniques.
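The idea of integrating over a discrete distribution of trade-off values, rather than estimating one optimal trade-off, can be shown on a deliberately tiny example. This is a one-dimensional regularized least-squares toy standing in for the thesis's constrained training; the lambda grid and uniform weighting are illustrative assumptions, not the authors' actual choices.

```python
def fit_weight(xs, ys, lam):
    # One-dimensional regularized least squares: minimizing
    # sum((y - w*x)^2) + lam * w^2 has the closed form
    # w = sum(x*y) / (sum(x*x) + lam), where lam weighs the
    # prior-knowledge constraint against the regression task.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def predict_marginal(xs, ys, x_new, lams=(0.1, 1.0, 10.0)):
    # Instead of picking one optimal trade-off, average the predictions
    # over a discrete (here uniform) distribution of trade-off values,
    # mimicking integration over the hyperparameter.
    return sum(fit_weight(xs, ys, lam) * x_new for lam in lams) / len(lams)
```

With lam = 0 the fit reduces to ordinary least squares; the marginal prediction is pulled toward zero by the larger lambda values, which is the price paid for not committing to one trade-off.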
This document is the master's thesis of Miquel Perelló Nieto submitted to Aalto University. The thesis examines merging chrominance and luminance in early, medium, and late fusion using Convolutional Neural Networks (CNNs) for image classification. The thesis demonstrates that fusing luminance and chrominance channels can improve CNNs' ability to learn visual features and outperforms models that do not fuse the channels. The thesis contains background chapters on image classification, neuroscience, artificial neural networks, CNNs, and the history of connectionism. It then describes the author's experiments comparing CNN architectures that fuse luminance and chrominance channels at different stages to a basic CNN model.
The document presents a thesis that developed and evaluated a localization component for mobile service applications. The thesis implemented a platform called YourWay! that collects contextual data from distributed sources and delivers instant location information to mobile users. Empirical evaluation of YourWay! assessed user experience in indoor and outdoor environments. Results showed user experience was more reliable within community WiFi infrastructure, especially indoors, depending on access point coverage, density, and structure.
The document describes the development of an augmented reality system for the Oculus Rift using a stereoscopic rendering engine. Two fish-eye cameras were installed on the Oculus Rift to capture the real world. The cameras were calibrated using an omnidirectional camera model to account for lens distortion. A graphics engine was developed to render 3D holograms and merge them with the camera feed. Stereoscopic rendering and head tracking were implemented to provide an immersive augmented reality experience within the Oculus Rift headset.
This dissertation examines methods for measuring the spatial arrangement of neurons and glial cells in the mammalian cortex. The document begins with an introduction discussing the importance of studying brain cell arrangement and the need for quantitative tools. It then provides a literature review on brain anatomy, spatial arrangement mechanisms, and existing measurement theories. The experimental method section describes a three-part process: 1) digitizing tissue samples at high resolution, 2) developing algorithms to recognize cells in the digitized images, and 3) analyzing the data using metrics like cell counts, density maps, and cross-correlations. Results are presented on tissue samples from the macaque monkey and rat brain, focusing on specific cortical areas. Future studies are proposed to integrate and further analyze the data.
This thesis examines using machine learning methods to detect malfunctions in Road Weather Information System (RWIS) sensors. The author builds statistical models using weather data from RWIS and other sensors to predict temperature, precipitation, and visibility values. Significant deviations between predicted and actual sensor values would indicate malfunctions. Classification, regression, and Hidden Markov Models are applied. Experiments show Least Median Square and M5P regression accurately predict temperature and visibility. Decision trees and Bayesian networks perform well for precipitation. Hidden Markov Models also accurately predict temperature classes.
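The core detection idea — flagging a sensor when its actual reading deviates significantly from the model's prediction — can be sketched in a few lines. The thesis builds its predictions with regression, decision trees, Bayesian networks, and Hidden Markov Models; the fixed tolerance and the toy temperature values below are hypothetical, used only to show the residual-based flagging step.

```python
def flag_malfunctions(predicted, actual, tolerance=2.0):
    # A sensor whose reading deviates from the model prediction by more
    # than the tolerance (hypothetical, in the sensor's own units) is
    # flagged as potentially malfunctioning.
    return [i for i, (p, a) in enumerate(zip(predicted, actual))
            if abs(a - p) > tolerance]

# Predicted road-surface temperatures vs. actual RWIS readings (toy values)
predicted = [-1.2, 0.4, 2.1, 3.0]
actual = [-1.0, 0.3, 8.5, 2.8]
print(flag_malfunctions(predicted, actual))  # prints [2]
```

In practice the tolerance would be calibrated from the residual distribution of a healthy sensor rather than fixed by hand.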
Machine learning solutions for transportation networks, by butest
This dissertation proposes machine learning solutions for problems in transportation networks. It contains four main contributions:
1. A probabilistic graphical model called a Gaussian Tree Model that describes multivariate traffic patterns using fewer parameters than standard models. This allows learning from less data.
2. A dynamic probabilistic model of traffic flow inspired by macroscopic flow models. It handles uncertainty and incorporates observations using a particle filter for prediction.
3. Two new optimization algorithms for vehicle routing that use the traffic flow model for routing in volatile environments.
4. A method for detecting traffic accidents using supervised learning that outperforms manual methods. It addresses data biases using dynamic Bayesian networks to improve performance with little labeled data.
This document provides an abstract and table of contents for a book on graph representation learning. The abstract indicates that the book will provide an overview and synthesis of graph representation learning techniques, including deep graph embeddings, graph neural networks, and deep generative models for graphs. The table of contents outlines the book's three parts on node embeddings, graph neural networks, and generative graph models, with chapters covering topics such as random walk embeddings, graph neural network models, graph convolution networks, and variational autoencoders for graphs.
Nonlinear image processing using artificial neural networks, by Hưng Đặng
The document discusses the use of artificial neural networks (ANNs) for nonlinear image processing tasks. It first provides background on image processing problems, ANNs, and why ANNs may be suitable for nonlinear image processing. It then reviews literature on applying ANNs to image processing. The rest of the document focuses on using supervised ANNs for classification/feature extraction tasks like object recognition, and regression ANNs for image restoration/filtering tasks. It aims to determine when ANNs can effectively solve problems and how prior knowledge can improve ANN design/interpretability.
This thesis examines methods for reducing the memory and complexity requirements of deep learning models to enable processing and learning on chip. It reviews techniques for compressing model size and operations count, such as pruning connections, quantization, and lightweight architectures. It also introduces a new shift attention layer method for replacing convolutions with multiplications. The thesis also studies incremental learning approaches that can continuously update models as new data becomes available. Hardware implementations of these compressed models and learning methods are explored to enable deep learning inference and training directly on embedded systems.
This document is a master's thesis submitted by Sascha Nawrot to Berlin University of Applied Sciences in partial fulfillment of the requirements for a Master of Science degree in Applied Computer Science. The thesis introduces novel, lightweight open source annotation tools for whole slide images that enable deep learning experts and pathology experts to cooperate in creating training samples by annotating regions of interest in whole slide images, regardless of platform or format, in a fast and easy manner. The tools consist of a conversion service to convert whole slide images to an open format, an annotation service for annotating regions of interest, and a tessellation service to extract the annotated regions from the images.
Big Data and the Web: Algorithms for Data Intensive Scalable Computing, by Gabriela Agustini
This document is the dissertation of Gianmarco De Francisci Morales submitted for the PhD program in Computer Science and Engineering at IMT Institute for Advanced Studies in Lucca, Italy. The dissertation addresses challenges in managing and analyzing large datasets, or "big data", and presents algorithms for tasks like document filtering, graph computation and real-time news recommendation. It was approved by the program coordinator and supervisor, and reviewed by two external reviewers. The dissertation contains six chapters, including introductions to big data and related work, and presents three contributed algorithms for document filtering, graph computation and news recommendation that scale to large datasets through parallel and distributed techniques.
iGUARD: An Intelligent Way To Secure - Report, by Nandu B Rajan
This document presents a project report for an intelligent door lock system called iGuard. It was submitted by Nandu B Rajan in partial fulfillment of the requirements for a Bachelor of Technology degree in computer science and engineering. The report includes sections on requirements analysis, system design, implementation, testing, and conclusions. It aims to develop a door lock system that provides strengthened security functions such as sending images of unauthorized access attempts to users and alerting users if the lock is physically damaged.
The document summarizes the March 2015 newsletter from Women Who Write. It discusses the cancellation of their previous meeting due to bad weather, congratulates a member on being named poet laureate of Kentucky, and provides information on upcoming writing events, contests, workshops and grants. Members are encouraged to share their work at the next meeting on April 2nd.
This document provides 5 tips for more effective social media use: 1) Choose social media platforms carefully based on goals and audience. 2) Make a specific, measurable, achievable, realistic, and time-bound social media plan. 3) Tailor social media posts to different platforms by posting at optimal times and using a scheduling tool. 4) Use visual content like photos that are properly sized for each platform. 5) Learn from others by reading widely and finding mentors about effective social media practices.
This document describes a dissertation that aims to improve 3D stereo reconstruction of human faces by combining it with a generic morphable face model. The dissertation first discusses background topics like facial landmark annotation, 3D morphable face models, texture representation, stereo reconstruction and face model deformation. It then describes the proposed scheme which involves steps like landmark annotation, pose estimation, shape fitting, texture extraction, stereo reconstruction from image pairs and deformation of the face model. The results show that fusing the stereo reconstruction with a single image reconstruction using a morphable model leads to a more accurate 3D face model compared to using either method alone. Finally, the deformed face model is visualized on a smartphone using a cardboard viewer.
An investigation into the building blocks for Neural Networks and modern day machine learning. This investigation touches on the evolution of the most basic of neural networks to more modern day concepts, particularly in methodologies that allow better training of these networks to produce more accurate real-life models.
This thesis proposes and evaluates a compressive sensing (CS)-based indoor positioning and tracking system using received signal strength (RSS) from wireless local area network access points. The system is designed and implemented on mobile devices with limited resources.
In the offline phase, RSS fingerprints are collected and clustered using affinity propagation. In the online phase, coarse localization is done by matching RSS measurements to precomputed clusters, and fine localization refines the position using CS recovery on the sparse location signal.
An indoor tracking system is also presented, which integrates the CS-based positioning with a Kalman filter for sequential location estimates. Experimental results on two testbeds show the system achieves better accuracy than other fingerprinting methods, suitable for implementation
Trade-off between recognition an reconstruction: Application of Robotics Visi...stainvai
Autonomous and ecient action of robots requires a robust robot vision system that can
cope with variable light and view conditions. These include partial occlusion, blur, and
mainly a large scale dierence of object size due to variable distance to the objects. This
change in scale leads to reduced resolution for objects seen from a distance. One of the
most important tasks for the robot's visual system is object recognition. This task is also
aected by orientation and background changes. These real-world conditions require a
development of specic object recognition methods.
This work is devoted to robotic object recognition. We develop recognition methods
based on training that includes incorporation of prior knowledge about the problem.
The prior knowledge is incorporated via learning constraints during training (parameter
estimation). A signicant part of the work is devoted to the study of reconstruction
constraints. In general, there is a tradeo between the prior-knowledge constraints and
the constraints emerging from the classication or regression task at hand. In order to
avoid the additional estimation of the optimal tradeo between these two constraints, we
consider this tradeo as a hyper parameter (under Bayesian framework) and integrate
over a certain (discrete) distribution. We also study various constraints resulting from
information theory considerations.
Experimental results on two face data-sets are presented. Signicant improvement in
face recognition is achieved for various image degradations such as, various forms of image
blur, partial occlusion, and noise. Additional improvement in recognition performance is
achieved when preprocessing the degraded images via state of the art image restoration
techniques.
This document is the master's thesis of Miquel Perelló Nieto submitted to Aalto University. The thesis examines merging chrominance and luminance in early, medium, and late fusion using Convolutional Neural Networks (CNNs) for image classification. The thesis demonstrates that fusing luminance and chrominance channels can improve CNNs' ability to learn visual features and outperforms models that do not fuse the channels. The thesis contains background chapters on image classification, neuroscience, artificial neural networks, CNNs, and the history of connectionism. It then describes the author's experiments comparing CNN architectures that fuse luminance and chrominance channels at different stages to a basic CNN model.
The document presents a thesis that developed and evaluated a localization component for mobile service applications. The thesis implemented a platform called YourWay! that collects contextual data from distributed sources and facilitates instant location information to mobile users. Empirical evaluation of YourWay! assessed user experience in indoor and outdoor environments. Results showed user experience was more reliable within community WiFi infrastructure, especially indoors, depending on access point coverage, density, and structure.
The document describes the development of an augmented reality system for the Oculus Rift using a stereoscopic rendering engine. Two fish-eye cameras were installed on the Oculus Rift to capture the real world. The cameras were calibrated using an omnidirectional camera model to account for lens distortion. A graphics engine was developed to render 3D holograms and merge them with the camera feed. Stereoscopic rendering and head tracking were implemented to provide an immersive augmented reality experience within the Oculus Rift headset.
This dissertation examines methods for measuring the spatial arrangement of neurons and glial cells in the mammalian cortex. The document begins with an introduction discussing the importance of studying brain cell arrangement and the need for quantitative tools. It then provides a literature review on brain anatomy, spatial arrangement mechanisms, and existing measurement theories. The experimental method section describes a three-part process: 1) digitizing tissue samples at high resolution, 2) developing algorithms to recognize cells in the digitized images, and 3) analyzing the data using metrics like cell counts, density maps, and cross-correlations. Results are presented on tissue samples from the macaque monkey and rat brain, focusing on specific cortical areas. Future studies are proposed to integrate the data, analyze
This thesis examines using machine learning methods to detect malfunctions in Road Weather Information System (RWIS) sensors. The author builds statistical models using weather data from RWIS and other sensors to predict temperature, precipitation, and visibility values. Significant deviations between predicted and actual sensor values would indicate malfunctions. Classification, regression, and Hidden Markov Models are applied. Experiments show Least Median Square and M5P regression accurately predict temperature and visibility. Decision trees and Bayesian networks perform well for precipitation. Hidden Markov Models also accurately predict temperature classes.
Machine learning solutions for transportation networksbutest
This dissertation proposes machine learning solutions for problems in transportation networks. It contains four main contributions:
1. A probabilistic graphical model called a Gaussian Tree Model that describes multivariate traffic patterns using fewer parameters than standard models. This allows learning from less data.
2. A dynamic probabilistic model of traffic flow inspired by macroscopic flow models. It handles uncertainty and incorporates observations using a particle filter for prediction.
3. Two new optimization algorithms for vehicle routing that use the traffic flow model for routing in volatile environments.
4. A method for detecting traffic accidents using supervised learning that outperforms manual methods. It addresses data biases using dynamic Bayesian networks to improve performance with little labeled data.
This document provides an abstract and table of contents for a book on graph representation learning. The abstract indicates that the book will provide an overview and synthesis of graph representation learning techniques, including deep graph embeddings, graph neural networks, and deep generative models for graphs. The table of contents outlines the book's three parts on node embeddings, graph neural networks, and generative graph models, with chapters covering topics such as random walk embeddings, graph neural network models, graph convolution networks, and variational autoencoders for graphs.
Nonlinear image processing using artificial neuralHưng Đặng
The document discusses the use of artificial neural networks (ANNs) for nonlinear image processing tasks. It first provides background on image processing problems, ANNs, and why ANNs may be suitable for nonlinear image processing. It then reviews literature on applying ANNs to image processing. The rest of the document focuses on using supervised ANNs for classification/feature extraction tasks like object recognition, and regression ANNs for image restoration/filtering tasks. It aims to determine when ANNs can effectively solve problems and how prior knowledge can improve ANN design/interpretability.
This thesis examines methods for reducing the memory and complexity requirements of deep learning models to enable processing and learning on chip. It reviews techniques for compressing model size and operations count, such as pruning connections, quantization, and lightweight architectures. It also introduces a new shift attention layer method for replacing convolutions with multiplications. The thesis also studies incremental learning approaches that can continuously update models as new data becomes available. Hardware implementations of these compressed models and learning methods are explored to enable deep learning inference and training directly on embedded systems.
This document is a master's thesis submitted by Sascha Nawrot to Berlin University of Applied Sciences in partial fulfillment of the requirements for a Master of Science degree in Applied Computer Science. The thesis introduces novel, lightweight open source annotation tools for whole slide images that enable deep learning experts and pathology experts to cooperate in creating training samples by annotating regions of interest in whole slide images, regardless of platform or format, in a fast and easy manner. The tools consist of a conversion service to convert whole slide images to an open format, an annotation service for annotating regions of interest, and a tessellation service to extract the annotated regions from the images.
Big Data and the Web: Algorithms for Data Intensive Scalable Computing - Gabriela Agustini
This document is the dissertation of Gianmarco De Francisci Morales submitted for the PhD program in Computer Science and Engineering at IMT Institute for Advanced Studies in Lucca, Italy. The dissertation addresses challenges in managing and analyzing large datasets, or "big data", and presents algorithms for tasks like document filtering, graph computation and real-time news recommendation. It was approved by the program coordinator and supervisor, and reviewed by two external reviewers. The dissertation contains six chapters, including introductions to big data and related work, and presents three contributed algorithms for document filtering, graph computation and news recommendation that scale to large datasets through parallel and distributed techniques.
iGUARD: An Intelligent Way To Secure - Report - Nandu B Rajan
This document presents a project report for an intelligent door lock system called iGuard. It was submitted by Nandu B Rajan in partial fulfillment of the requirements for a Bachelor of Technology degree in computer science and engineering. The report includes sections on requirements analysis, system design, implementation, testing, and conclusions. It aims to develop a door lock system that provides strengthened security functions such as sending images of unauthorized access attempts to users and alerting users if the lock is physically damaged.
The document summarizes the March 2015 newsletter from Women Who Write. It discusses the cancellation of their previous meeting due to bad weather, congratulates a member on being named poet laureate of Kentucky, and provides information on upcoming writing events, contests, workshops and grants. Members are encouraged to share their work at the next meeting on April 2nd.
This document provides 5 tips for more effective social media use: 1) Choose social media platforms carefully based on goals and audience. 2) Make a specific, measurable, achievable, realistic, and time-bound social media plan. 3) Tailor social media posts to different platforms by posting at optimal times and using a scheduling tool. 4) Use visual content like photos that are properly sized for each platform. 5) Learn from others by reading widely and finding mentors about effective social media practices.
This document discusses how news coverage has changed with increased media conglomeration and the rise of online readership. It notes that a small number of large corporate owners now dominate news and media outlets, influencing the types of stories and viewpoints that are reported. The document analyzes differences in coverage between two news sources and finds that larger outlets tend to provide more opinion-based coverage than factual reporting. It also notes that readers increasingly prefer getting their news online and want information quickly, leading outlets to prioritize brief, superficial stories over in-depth reporting. The goal is to educate readers on how conglomeration affects the diversity and objectivity of news coverage.
This document summarizes some of the charitable efforts and community investments of a bank in Kentucky in 2006. It discusses donations to education, health and human services, civic causes, and the United Way. Specific initiatives highlighted include supporting after-school meals for children, financial education programs, affordable housing projects, and contributions to arts and cultural organizations. The overall message is that the bank believes investing in communities through charitable activities helps create vibrant neighborhoods and a better place to live and work.
3rd CDA Lecture - Dr Adamo - May 7, 2015 - Oquendo Center - OKGO
This document provides information on a 3rd cervical disc arthroplasty practical course, including objectives, schedule, speakers, and sponsors. The morning session will cover the history and description of instrumentation for cervical disc arthroplasty, recent clinical data, and a demonstration video. The afternoon session involves practice on canine cadavers. The course is approved for 5 hours of continuing education credit upon completion of a post-test and evaluation.
The document describes a cruise experience exploring remote areas of Southeast Alaska aboard the small cruise ship The Snow Goose. Each day involves exploring areas like beaches, forests and coves by kayak, skiff or on foot with a naturalist guide. Activities include watching for wildlife like birds and marine mammals, learning about the natural environment, and relaxing at a new anchorage each evening before dining and presentations. The Snow Goose aims to offer a more personal, immersive experience than larger cruise ships through small group size and opportunities for in-depth exploration of coastal areas.
Global warming is caused by greenhouse gases like carbon dioxide and methane trapping heat in the atmosphere. This is called the greenhouse effect. While the greenhouse effect makes Earth habitable, the additional heat trapped by human emissions is causing problems. Temperatures are rising faster than ever before in human history due to increased CO2 from the burning of fossil fuels. This is harming ecosystems, raising sea levels, and jeopardizing human health and safety. Efforts to reduce emissions through alternatives to fossil fuels and carbon sequestration are needed to slow global warming.
d.compress: Improving Adherence in Birth Control - onkursen
The document discusses improving adherence to birth control through reducing stress and increasing education. It describes prototypes created by Naomi Cornman and Onkur Sen of d.compress to provide facts about contraceptives and reminders via text and alarms. A trial with 20 participants showed adherence and stress were reduced for those receiving the facts, though engagement was not significantly affected. Next steps include further testing additional features and delivery platforms.
The document describes a trip from Dubai to Italy and Spain. It includes visits to Rome, where the author saw the Colosseum, Milan, where they saw the San Siro Stadium, and a trip to Como city. The author then traveled to Barcelona in Spain, where they saw Camp Nou stadium and celebrated New Year's. The trip ended on January 1, 2012 in Barcelona.
This document is National City's annual volunteer report. It summarizes the volunteer efforts of the company's 35,000 employees. In 2004, employees contributed over 271,000 hours of volunteer work, equivalent to 34,000 work days. The report highlights several employee-led volunteer initiatives focused on issues like breast cancer, poverty, education, and community revitalization. It also profiles several employees who excel at volunteer work, like Lisa Reichert who fundraises for cancer research and Kathi Moore who helps families purchase homes through Habitat for Humanity.
This document discusses a thesis on using augmented reality for full body immersion. It describes using a motion capture suit and video see-through technologies with the Oculus Rift head mounted display. The thesis aims to achieve realistic interactions between the user and virtual characters through body movements and gestures controlled in the augmented reality scene. It plans to evaluate the system using surveys after subjects interact with a virtual agent scenario while wearing the motion capture suit and Oculus Rift.
This document provides an overview and outline of a thesis on single person pose recognition and tracking using a single camera. The thesis aims to improve the performance of an interactive spatial game controlled by human poses. Key areas discussed include background subtraction using mixtures of Gaussians, particle filtering for torso tracking, and classifiers for pose recognition. The experimental setup involves video recordings of people in different conditions for testing and training classifiers. The thesis contributes improvements to hand detection and adds a classifier to detect non-poses for better game control.
Final Year Project - Gesture Based Interaction and Image Processing - Sabnam Pandey, MBA
This document summarizes a student's final year project report on developing a gesture recognition system for browsing pictures. The student aims to implement algorithms for skin and contour detection of a user's hand in real-time images from a webcam. The report includes chapters on literature review of gesture recognition and image processing techniques, methodology using the waterfall model, requirements analysis and design diagrams, implementation details using OpenCV, and testing and evaluation of the project objectives and aims.
This thesis presents a framework for integrated uncertainty modeling and visualization. It aims to address four major barriers: (1) users must anticipate their uncertainty needs before building models, (2) uncertainty parameters are treated the same as variables, (3) uncertainty propagation must be manually managed, and (4) visualization techniques are largely incompatible with different uncertainty types. The framework encapsulates uncertainty into atomic variables, automates uncertainty propagation, and abstracts visual mappings from the underlying uncertainty type. It extends the traditional spreadsheet to intrinsically support uncertainty modeling and visualization. Case studies demonstrate the framework for business planning, financial decision support, and process specifications.
Design and implementation of a Virtual Reality application for Computational ... - Lorenzo D'Eri
This document is a thesis discussing the design and implementation of a virtual reality application for visualizing computational fluid dynamics (CFD) data. It begins with an introduction and background sections covering the state of the art in VR applications for scientific data visualization and the relevant technologies used, including the HTC Vive, ParaView, Unity, and the ParaUnity plugin.
The thesis then describes the development of two key software artifacts: a VR application built in Unity to visualize and interact with CFD data, and an improved version of the existing ParaUnity plugin to export CFD datasets from ParaView to Unity. The final system allows users to export CFD simulation results from ParaView and load them into the Unity VR environment for interactive
This document is a master's thesis submitted by R.Q. Vlasveld to Utrecht University in partial fulfillment of the requirements for a Master of Science degree. The thesis explores using one-class support vector machines (SVMs) for temporal segmentation of human activity time series data recorded by inertial sensors in smartphones. The author first reviews related work in temporal segmentation and change detection methods. An algorithm is then presented that uses an incremental SVDD model to detect changes between activities in a continuous data stream. The algorithm is tested on both artificial and real-world human activity data sets recorded by the author. Quantitative and qualitative results demonstrate the method can find changes between activities in an unknown environment.
ML guided User Assistance for 3D CAD Surface Modeling: From Image to Customized 3D Mouse Model
MSc Advanced Product Design Engineering & Manufacturing
By
GEORGIOS KONSTANTINOS KOURTIS
Abstract
The design of 3D CAD surfaces, notably in mouse design, often necessitates a specialized
understanding and expertise. This thesis presents an innovative approach that harnesses machine learning
(ML) to facilitate 3D CAD surface modeling. The primary objective is to develop a demonstration
platform that uses ML to process user input, identify the most similar pre-existing design from a database,
and guide the user in modifying the chosen design to meet their specific requirements. The demonstration
platform will offer step-by-step guidance, assisting users in adapting the suggested mouse surface design
to match their design preferences. This ML-guided approach aims to inspire users to explore more
inventive designs while saving both time and costs by streamlining the design process. The pivotal
project objectives encompass the development of a machine learning model capable of interpreting user
input and identifying the closest match from an existing database of designs, the construction of an
interactive demo that integrates with 3D CAD software, and the preparation of a comprehensive report
documenting all stages of the project. The implementation of the proposed demo will yield a more
efficient and streamlined surface modeling experience for users. The machine learning model, trained on
a robust dataset of user inputs and mouse designs, will facilitate the identification and modification of an
existing design, effectively assisting users in achieving their design goals. In summary, this thesis seeks
to synergize ML and CAD surface modeling, offering enhanced assistance to users. The anticipated
outcome includes a demo and machine learning model that are poised to significantly advance the process
of 3D CAD surface design, particularly for mouse design, optimizing creativity, efficiency, and user
satisfaction.
This document summarizes a dissertation titled "Augmented Reality for Space Applications". The dissertation proposes introducing in-field-of-view head mounted display systems in spacesuits to give astronauts the ability to access digital information and operate robots during extravehicular activities. The proposed system would be capable of feeding task-specific information on request and recognizing objects in the real world to overlay augmented reality information for error checking and status purposes. This would increase situational awareness and task accuracy while reducing human error risk. The dissertation focuses on preliminary design and testing of an experimental head mounted display and its integration and testing in a spacesuit analogue.
The document discusses content-based image retrieval (CBIR). It notes the increasing amounts of digital images being produced and stored without metadata. CBIR aims to analyze image content to discover semantic knowledge and improve image retrieval when no metadata is available. Recent deep learning methods have greatly outperformed traditional CBIR techniques. The document provides an overview of CBIR components, traditional and deep learning-based feature extraction methods, and evaluation of CBIR systems.
Geometric Processing of Data in Neural Networks - Lorenzo Cassani
Feed-forward neural networks can be considered as geometric transformations that act on input data points. It is known that, during training, those transformations generally bring points belonging to the same class closer together and drive points belonging to different classes farther apart. The purpose of this work is to carry out a numerical analysis of how this description varies during training. The training task consisted of a binary classification (e.g. even digits vs. odd digits) of the elements of MNIST and other similarly structured datasets, in order to have a clear view of the link between structure in the data and noteworthy behaviours in the evolution of the inner geometries of neural networks. Particular attention has been reserved for the data points which neural networks struggle most to classify correctly, and for their connection to the networks' generalization capability.
Development of 3D convolutional neural network to recognize human activities ... - journalBEEI
This document describes the development of a 3D convolutional neural network (CNN) model to recognize human activities using moderate computation capabilities. The model is trained on the KTH dataset, which contains activities like walking, running, jogging, handwaving, handclapping, and boxing. The proposed model uses 3D CNN layers and max pooling layers to extract both spatial and temporal features from video frames. Testing achieved an accuracy of 93.33% for activity recognition. The number of model parameters and operations are also calculated to show the model can perform human activity recognition with reasonable computational requirements suitable for devices with moderate capabilities.
Emotions prediction for augmented EEG signals using VAE and Convolutional Neu... - BouzidiAmir
Abstract
Our end-of-study project aims to develop an intelligent emotion prediction system based on EEG signals augmented with a VAE and classified with convolutional neural networks combined with LSTM. The aim of the project is to improve medical diagnosis, in particular by facilitating the detection of emotions so that patients, especially those suffering from trauma and anxiety, can receive quick treatment. This work helps both the medical staff and the patient. In a first step, we generated new data via cVAEs to provide a sufficient amount of data. Second, we used CNNs combined with the LSTM technique to predict emotions. Finally, we created an ergonomic, easy-to-access interface for therapists to use in medical diagnosis.
This document summarizes a thesis on detecting coughs using mobile phones. It explores using a phone's built-in accelerometer to detect coughing as an alternative to audio-based detection. The author conducted an experiment collecting accelerometer data from coughing and non-coughing scenarios. Initial results using 7 features and 10-fold cross-validation achieved over 90% accuracy for cough detection. The thesis aims to determine the viability of accelerometer-based cough detection and achieve accuracy comparable to audio-based methods.
This document provides a project report on developing a bike sharing Android application. It includes an introduction describing the motivation for the project, a literature survey reviewing papers on related topics like bike and public transport integration, a software requirements specification outlining the requirements, a system design section with diagrams, and plans for system implementation and testing. The report was submitted by students to fulfill the requirements for a degree in computer engineering.
This document describes a project that aims to estimate full-body demographics from images using computer vision and machine learning techniques. The project proposes a novel method to automatically annotate images with categorical labels for a wide range of body features, like height, leg length, and shoulder width. The method explores using common computer vision algorithms to extract features from images and video frames and compare them to a database of subjects with labeled body features. The document outlines the requirements, approaches considered, design and implementation of the project, and evaluates the results in estimating demographics and identifying individuals.
This document provides an overview of image and audio steganography. It discusses the basics of steganography including its definition as concealed writing and a brief history of its use from ancient times to modern digital applications. The document focuses on different steganography techniques for images and audio, including least significant bit (LSB) encoding and decoding processes. It also includes system design diagrams and an implementation section describing a steganography program.
This document is a master's thesis that investigates how AI planning techniques can be used for modeling services in the context of the Internet of Things (IoT). It begins with background on AI planning and defines the IoT. It then explores using rule engines, domain-specific planning, and domain-independent planning to solve representative IoT use cases of increasing complexity. It evaluates the performance of a state-of-the-art planner on a challenging waste collection problem and suggests techniques for improving scalability. The thesis concludes by summarizing achievements and outlining directions for future work.
Master Thesis
3D Head Movement Based on Real-Time Acquired and
Recorded Data
written by
B.Sc. Huang Hongfu
with
Prof. Dr. rer. nat. Anton Grabmaier
Department
Electronic Components and Circuits
of
University Duisburg-Essen
Duisburg, January 2008
Abstract
Since William Fetter first proposed the term computer graphics [1] in the 1960s, computer graphics technologies (especially 3D graphics technologies) have been widely used in scientific computing, medical research and clinical diagnosis. In the department of Electronic Components and Circuits of Duisburg-Essen University, a sensor-based diagnostic system has been developed with three sensors to capture neck and head movement of the human body. This master thesis provides a 3D visualization application for that sensor-based diagnostic system, which helps doctors to perceive the head and neck movement of a patient in an intuitive manner. Compared with 2D visualization, 3D visualization offers a more intuitive way for people to perceive the real world. Even though many scientists and engineers are working in this field to make authoring and programming easier, building such an application is still somewhat complex. This thesis proposes a method to build a 3D visualization application in an easy way. By using the X3D (Extensible 3D) and H-Anim (Humanoid Animation) specifications, the approach taken in this thesis is to separate the modeling process from the programming control. In the modeling stage, a segmentation tool is developed to build an H-Anim model from existing 3D meshes, which can be produced by tools such as Blender (an open source 3D animation application from http://www.blender.org) and 3ds Max (a commercial 3D graphics application developed by Autodesk Inc.). In the programming stage, the SAI (Scene Access Interface) programmer interface is adopted to control the H-Anim model. This thesis also develops a flexible application architecture so that the existing 2D visualization and the new 3D visualization can work together in a compatible manner.
Key Words: 3D Visualization, X3D, VRML, Java3D, Xj3D, H-Anim, SAI, Head Movement, Human Model, Humanoid Animation.
Acknowledgments
First of all I would like to thank my family. When I felt lonely, calling and talking with you cured my homesickness. Without your great support and understanding, I could not imagine finishing my study in Germany so quickly. Special thanks go to my grandmother! In my life, your help has been so selfless and so important that it cannot be replaced by anyone else's. Wherever I am, I will always miss you.
Then I would like to thank Dr. Viga for co-supervising my master thesis. Discussions with you were so helpful. When I faced a difficulty, you always pointed me in the right direction so that I could find a solution easily. Without your help, the struggle would have been much worse. Thank you for supporting my future career plans. Thanks to Professor Anton Grabmaier for being my supervisor.
Thanks to Dr. Stefan Freinatis. I will always remember my presentation experience in your lecture; I enjoyed the chance to improve my presentation skills. Thanks to Mr. Philipp Limbourg for your kind help in a seminar course. With your help, I learned how to prepare a presentation, how to search for technical papers and how to write a technical paper. I will benefit from these skills in my future career. Thanks to Dr. Cui Ai, who shared knowledge of the medical domain with me. As a computer engineer, I liked discussing medical topics with you, and you taught me much that was new. Thank you very much! Thanks to Mr. Michael Müller, who helped me to distinguish computer vision from 3D synthesis at the beginning of my master thesis. Thanks are also given to my friends Li Jing and Xu Linli, who helped me to proofread the first draft of this thesis.
I would like to express my appreciation to Professor Hunger and the ISE department of Duisburg-Essen University. It has really been a nice experience for me to study at Duisburg-Essen University, and I will forever remember all the stories I experienced in Duisburg.
Finally, I would like to express my appreciation to the Web3D Consortium. Thanks to all the people who have done such a good job in this field and provided the examples from which I learned so much.
Chapter 1
Introduction
Sensor technology provides an interface to the real world, helping people to receive signals from macroscopic entities, like outer space and stars, or from microscopic particles, like cells and molecules. With the recent development of health-care IT, sensor technologies are widely used in diagnostic systems, which help doctors to make diagnostic decisions more easily and precisely. At the same time, computer 3D visualization technology is no longer limited to the computer game industry, but has also extended into medical data visualization. For example, with the help of 3D technology, the flow of blood and the beating of the heart can be rendered in an intuitive way on a computer screen.
In the department of Electronic Components and Circuits of Duisburg-Essen University, a sensor-based diagnostic system has been invented with three-dimensional movement sensors to capture neck and head movement of the human body. The acquired data is wirelessly transmitted to a PC-based graphic application, where it can be monitored as time vs. value graphs. This system can be used to detect neck diseases such as neurological movement disorders; with its help, early detection becomes possible and easy.
Going beyond the existing 2D visualization, this master work aims to develop a 3D virtual image representation software application that visualizes the head movement based on real-time measurement data from the sensor system or on formerly recorded data from a record file. To achieve this aim, the following issues will be addressed in this master thesis:
1. Development of a static X3D model for Head Movement Control based on humanoid geometry data and movement data.
2. Development of a Movement Control Model based on actual mathematical movement models.
3. Development of a Java Application Framework for Head Movement Control as a graphical user interface for the doctor.
4. Development of tools for model construction to improve the lifelike appearance of the humanoid.
With a 3D visualization application, the head movement can be interpreted in an intuitive way, so the doctor is able to get a better understanding of the moving aspects of a neck disease, e.g. cervical dystonia.
The remainder of this thesis is organized as follows. The second chapter introduces the background of this master thesis, comprising an application background and a technical background. The application background describes the sensor-based diagnostic system and its 2D visualization. The technical background describes several 3D technique options which can be used to implement a 3D visualization. The third chapter covers a method of modeling and authoring a human body, using two examples to explain how to build a realistic 3D human model. 3D movement control based on Euler's rotation theorem is discussed in chapter 4. Chapter 5 covers the implementation of the 3D visualization in the Java programming language, including a description of the system architecture, its main components and an explanation of the Java source code of this master work. Future work is discussed in the last chapter, and the references are listed at the end of this thesis.
Chapter 2
Background
2.1 Application Background
Nowadays many people, especially the elderly, suffer from neck movement diseases. One of these is cervical dystonia, which is characterized by abnormal movements or postures of the neck and head. Most current treatment methods focus only on its symptoms rather than on its underlying causes. To improve this situation, a sensor-based diagnosis system (see [8]) has been introduced by researchers from Duisburg-Essen University. In this system, a helmet [Figure 2.1] with three sensors is used to measure the movement of the neck and head.
Figure 2.1: Diagnostic System for Neurological Movement Disorders (see [8])
Three movement parameters (α, β, γ) are captured by sensors in this helmet system. The
movement model is shown as follows:
Figure 2.2: Movement Parameters (see [8])
From the above figure, α represents the nodding movement, which is around the X axis;
β represents the shaking movement, which is around the Y axis; and γ represents the
rotation movement, which is around the Z axis.
The measurement data can be displayed on a 2D user interface [Figure 2.3], which can assist doctors in the diagnosis of head movement diseases.
Figure 2.3: 2D GUI of the Sensor-based Diagnostic System (see [8])
In figure 2.3, a three-dimensional (3D) movement is displayed as three separate two-dimensional (2D) time-based plots. A more intuitive impression of the movement could be given by a realistic 3D representation of a virtual human model in a computer-based 3D space. Hence a 3D visualization based on sensor data will be developed in this thesis. Compared with the former display of three separate 2D plots, a 3D visualization allows doctors to observe the head movement of a patient from many different 3D viewpoints and perspectives. This 3D visualization will not replace the existing 2D visualization; instead, it aims to work together with it to provide additional information about a patient's neck disease.
2.2 3D Technical Background
In this section, several 3D techniques which can be used to build a 3D application will be introduced. These techniques range from low-level programmer interfaces, such as OpenGL (see [9]), to high-level modeling languages like VRML (see [2]) and X3D (see [5]). In addition, Euler's rotation theorem will be covered in this section. During the discussion, some popular 3D tools like Blender (see [24]) will be mentioned. At the end of this section, a brief comparison will explain why X3D and H-Anim (see [6]) were chosen to implement the 3D visualization application.
2.2.1 Low-Level 3D Techniques
First, two low-level application programming interfaces (APIs), OpenGL (see [9]) and Direct3D (see
[12]), will be discussed in this section. Even though these APIs will not be used directly
in the 3D visualization application implemented in this master thesis, they still deserve a
brief introduction here, since they are the two most important programming interfaces in
3D visualization. Many high-level programming interfaces and authoring tools, such as
Java 3D (see [32]), VRML and X3D, are based on these two APIs and are designed to
run on OpenGL or Direct3D.
2.2.1.1 OpenGL
From the viewpoint of an application programmer, OpenGL (Open Graphics Library) is
a software interface to graphics hardware (see [9]). It provides a platform-independent
and language-independent interface, so that a programmer can produce high-quality
graphical images, specifically color images of three-dimensional objects. From
the viewpoint of an implementor of OpenGL, OpenGL is a set of commands that control
the operation of graphics hardware such as a GPU (Graphics Processing Unit) or CPU (Central
Processing Unit). The following example [Figure 2.4] shows a simple piece of code in which a
programmer uses OpenGL to draw a green square in the XY plane:
Figure 2.4: Piece of Code Using OpenGL for drawing (see [10])
Readers need not understand the programming language used in this example; it simply
illustrates that, by using OpenGL commands, a programmer can construct a 2D or 3D
object and display it on a computer screen. Since every complex 2D or 3D object can be
composed of small, simple primitives such as lines, triangles or quadrilaterals, a programmer
can use OpenGL to construct very complex 3D objects, e.g. a human body. For rendering 2D
or 3D objects, OpenGL provides a graphics pipeline [see Figure 2.5], known as the OpenGL
state machine, to handle OpenGL commands.
Figure 2.5: Simplified version of the Graphics Pipeline Process (see [10])
Figure 2.5 shows a working pipeline of OpenGL. Commands can be processed immediately
or buffered in a display list. The first stage of the pipeline is evaluation, which provides an
efficient means for approximating curve and surface geometry by evaluating polynomial
functions of input values. The next stage operates on geometric primitives described by
vertices: points, line segments, and polygons. In this stage vertices are transformed and
lit, and primitives are clipped to a viewing volume in preparation for the next stage,
rasterization. The rasterizer produces a series of framebuffer addresses and values using
a two-dimensional description of a point, line segment, or polygon. Each fragment so
produced is fed to the next stage that performs operations on individual fragments before
they finally alter the framebuffer. It is also possible to bypass the vertex processing portion
of the pipeline through pixel operations. Further details about OpenGL can be found in
the OpenGL specification (see [9]).
2.2.1.2 Direct3D
Direct3D is another low-level 3D API. It is a proprietary API designed by Microsoft
Corporation for hardware 3D acceleration on the Windows platform. It is limited to
Microsoft's various Windows operating systems and is also the basis for the graphics API of
the Xbox and Xbox 360 console systems. Since Direct3D is closely bound to and
optimized for Windows systems, it achieves excellent rendering performance.
A comparison of OpenGL and Direct3D can be found in [11]. A programming example is
given in Figure 2.6 to show how to use Direct3D to draw a 3D object. The rendering
pipeline of Direct3D is shown in Figure 2.7.
Figure 2.6: A Piece of Code Using Direct3D (see [13])
The above example draws a triangle on the screen by using Direct3D API. Similarly,
Direct3D also uses a pipeline technique for rendering. The mechanism of its pipeline
rendering system is described in Figure 2.7.
Figure 2.7: Pipeline Stages (Direct3D Version 10) of Microsoft Direct3D (see [14])
Figure 2.7 describes the working pipeline of Direct3D. The entry point of the pipeline is the
Input Assembler, which supplies data to the pipeline. The next stage is the Vertex Shader, which
performs single-vertex operations such as transformation, skinning or lighting. After the
Vertex Shader comes the Geometry Shader stage, in which Direct3D primitives
(triangles, lines or vertices) are processed. The Stream Output stage comes after
the Geometry Shader; it stores the results of the previous stage in memory, which can be
used to recirculate data back into the pipeline. After the Stream Output,
the output of the Geometry Shader is rasterized into pixels, and anything not visible is
clipped; this stage is called the Rasterizer. After rasterization comes the Pixel
Shader, which performs per-pixel operations such as coloring. The last stage is the Output Merger.
This stage merges various types of output data (pixel shader values, depth/stencil) to build
the final result. The pipeline is configurable, which makes it very flexible and adaptable.
For more technical details, refer to [12].
2.2.1.3 Java Binding for the OpenGL API
In its early history, the Java platform was criticized both for poor performance and
for poor GUI support. Since the platform is open source, many efforts have gone into
improving it, and most of these criticisms have now disappeared. "Java Binding for the
OpenGL API" is one of these efforts. Since the OpenGL API is a procedural API while
most currently used programming languages are object-oriented, a wrapper
library that allows OpenGL to be used in an object-oriented manner is desirable. This is why
the specification "Java Binding for the OpenGL API" was defined. With this binding
specification, programmers can easily use OpenGL in a Java application. The Java OpenGL
project (see [15]) provides an open-source implementation of this specification.
2.2.2 High Level: Tools, Standards and Specifications
In the previous section, two important low-level programming interfaces were introduced.
Generally, low-level APIs demand much programming and implementation effort. Compared
with these low-level APIs, people often prefer high-level tools,
specifications and standards, which focus mostly on 3D modeling (authoring) and
rendering. Four high-level 3D techniques are illustrated in this section.
In the early history of the IT industry, specifications and standards were not very
important; every company had its own implementation of a similar product. Adobe Flash
and 3DS Max are two widely used examples of such products. Compared with open
systems (see [16]), these products are proprietary platforms, which have their own
rendering pipelines, their own binary representations and authoring interfaces. Adobe Flash
also has its own scripting language, which can be used for animation and interaction. But
interoperability and portability are very poor in these kinds of products. Sometimes
people want to build a model on one system and then render it on another system;
without an open standard, crossing platforms is really difficult. Recently this has changed,
and most such products try to be open. For example, 3DS Max supports importing and
exporting the X3D and VRML formats. This is because people need a
more open platform for 3D modeling and rendering. In the following sections, some open
standards, which aim to specify an open 3D modeling and rendering platform, will be
introduced.
2.2.2.1 VRML Overview
VRML (Virtual Reality Modeling Language, originally known as the Virtual Reality
Markup Language) is a standard for representing 3-dimensional (3D) interactive vector
graphics, designed in the World Wide Web context. The first version of VRML was
released in November 1994, and VRML 2.0 was released in August 1996. The current and
functionally complete version is VRML97 (ISO/IEC 14772-1:1997). VRML97 has
two parts: Part 1 (ISO/IEC 14772-1) defines the base functionality and text encoding of
VRML; Part 2 (ISO/IEC 14772-2) defines the base functionality and all bindings of the
VRML External Authoring Interface (EAI). Even though VRML has now been superseded
by X3D (ISO/IEC 19775-1), it is still worth a brief introduction in this section, since it
is a well-known ISO standard in virtual reality and it is also the basis of X3D, which will
be introduced in the next section. Here is a simple example of VRML:
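A minimal sketch of a VRML97 scene of this kind, here containing a blue box and a red sphere (the node values are illustrative, not the thesis's original listing):

```
#VRML V2.0 utf8
# Illustrative sketch: a blue box and a red sphere side by side
Transform {
  translation -2 0 0
  children Shape {
    appearance Appearance { material Material { diffuseColor 0 0 1 } }
    geometry Box { size 1 1 1 }
  }
}
Transform {
  translation 2 0 0
  children Shape {
    appearance Appearance { material Material { diffuseColor 1 0 0 } }
    geometry Sphere { radius 1 }
  }
}
```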
The above source code is stored in a file with "wrl" as its suffix, e.g. XXX.wrl. This
example defines a virtual scene that contains a blue cube and a red sphere. The virtual
scene looks as follows:
Figure 2.9: A VRML Example in a 3D Web Browser (see [3])
Since VRML is no longer maintained by the Web3D Consortium, the details of the above
source code will not be explained; the example is given only to convey what VRML looks
like. More details on VRML can be found in the specifications ISO/IEC
14772-1:1997 and ISO/IEC 14772-2:2004. Readers can also refer to "The VRML 2.0
Sourcebook" [19] for further information.
2.2.2.2 X3D Overview
Extensible 3D (X3D) is a software standard for defining interactive web-based and broadcast-
based 3D content integrated with multimedia. It provides an XML-encoded scene graph
(encoded using the Extensible Markup Language [4]) and a language-neutral Scene
Authoring Interface (SAI) that enable programmers to incorporate interactive 3D into Web
services architectures and distributed environments. The whole set of X3D standards is
listed in the following table:
ISO Name | Common Name
ISO/IEC 19775:2004 | X3D Abstract
ISO/IEC 19775-1:2004/FDAM Am1:2006 | X3D Amendment 1: Additional functionality
ISO/IEC 19775:2004/FDAM Am1:2006 | X3D Architecture and base components with Amendment 1
ISO/IEC FCD 19775-1r1:200x | X3D Architecture and base components Revision 1
ISO/IEC 19776:2005 | X3D encodings: XML and Classic VRML
ISO/IEC FDAM 19776-1:2005/Am1 | X3D encodings: XML encoding: Amendment 1
ISO/IEC FDAM 19776-2:2005/Am1 | X3D encodings: Classic VRML encoding: Amendment 1
ISO/IEC FDIS 19776-3 | X3D encodings: Binary encoding
ISO/IEC 19777-1:2005 | X3D language bindings: ECMAScript
ISO/IEC 19777-2:2005 | X3D language bindings: Java
ISO/IEC 19774 | Humanoid Animation
ISO/IEC 14772:1997 | Virtual Reality Modeling Language (VRML97)
ISO/IEC 14772-1:1997/Amd. 1:2002 | VRML97 Amendment 1
Table 2.1: Summary of X3D specifications
This section will not cover all of these standards, but focuses on ISO/IEC
19775:2004, which is composed of two parts: Part 1 defines the abstract functional
specification of the X3D framework, together with the definitions of the standardized
components and profiles; Part 2 defines the Scene Access Interface (SAI), which can be
used to interact with X3D worlds both from within the worlds and from external programs.
• X3D Architecture and Basic Concepts
First, there is the X3D architecture [Figure 2.10], which is defined in ISO/IEC 19775:2004.
In this architecture, the kernel component is an X3D browser, which accepts X3D/VRML
files and streams as input. The X3D browser then constructs a scene graph structure,
which can be a tree, or a forest if there is more than one root node. The
X3D browser provides two kinds of APIs (application programming interfaces), SAI and
EAI, which can be used to control the scene graph in a dynamic and interactive way.
(EAI is defined in VRML and will not be introduced in this section; readers can
refer to [2] for further information.)
Figure 2.10: X3D Architecture (see [5])
One important concept in the X3D specification is the X3D browser. In the X3D
specification, the interpretation, execution, and presentation of X3D files occur using a
mechanism known as a browser (for a complete list of available X3D browsers, see [22]),
which displays the shapes and sounds in the scene graph. Another important concept in the
X3D architecture is the scene graph. A scene graph is a basic unit of the X3D run-time
environment; this structure contains all the objects in the system and their relationships.
Relationships are contained along several axes of the scene graph. The transformation
hierarchy describes the spatial relationship of rendering objects. The event graph
(sometimes called the behavior graph) describes the connections between fields and the
flow of events through the system; an event graph is constructed in memory from the
events passing through the system. Figure 2.10 illustrates how X3D is used in a Web
environment. X3D can also be used in a non-Web application environment; for example,
it can be used in a Java application, as will be explained in the next section, "Scene
Access Interface". Before going into the next section, it is necessary to understand what
an X3D file is. An X3D file specifies a virtual world, or part of one, in XML syntax,
classic VRML syntax, or as a binary stream.
Figure 2.11: X3D examples
In Figure 2.11, the example on the left is in the XML encoding and the example on the
right is in the classic VRML encoding. Both describe the same virtual world. In this
virtual world, there is a Group node, which includes a Transform node. The Transform
node contains a Shape whose geometry is a Sphere. The radius of the Sphere node is
2.3 meters (in ISO/IEC 19775:2004, the unit of length is the meter), and the color of the
Shape node is red.
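As an illustration, an XML-encoded X3D file for the world described in Figure 2.11 might look roughly like this (a sketch; the header attributes are abbreviated):

```xml
<X3D profile='Immersive' version='3.0'>
  <Scene>
    <Group>
      <Transform>
        <Shape>
          <Appearance>
            <Material diffuseColor='1 0 0'/>
          </Appearance>
          <Sphere radius='2.3'/>
        </Shape>
      </Transform>
    </Group>
  </Scene>
</X3D>
```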
There are many authoring tools that can be used to create an X3D file. The "X3D
Editor" is one of the widely used authoring tools, which looks as follows:
with that script. Script node code may generate events which are propagated back to the
scene graph by the run-time environment. External access is supported through integration
between the X3D run-time system and a variety of programming language run-time
libraries. An SAI architecture is shown in the following figure:
Figure 2.13: SAI Architecture
In Figure 2.13, external access is used in a Java application (if a scene contains a Script
node, it has an internal access scheme at the same time). This Java application
connects with two X3D browsers using two sessions. Within a session, the Java application
uses SAI external access to manipulate a scene. As shown in the figure, an application
is an external process that is not implicitly part of the X3D browser. A session defines
the life of a single connection between an application and an X3D browser. Please note
that the "Routes" in this figure are not nodes; the ROUTE statement is a construct for
establishing event paths between specified fields of nodes. Based on the architecture in
Figure 2.13, a suggestion on how to implement an application is given in the next figure,
assuming that only one browser and one scene graph are used in the application.
Figure 2.14: SAI Programming
The first step for external access from a Java application is to create an X3D browser.
The created browser is then used to load (or "create") a scene. Once a scene is loaded,
the application can manipulate it by adding, modifying or deleting nodes and fields;
it can also add or delete routes.
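These steps can be outlined in pseudocode modeled on the SAI Java language binding (ISO/IEC 19777-2). The class and method names below follow the common SAI pattern, but the exact signatures depend on the browser implementation used, and the file name "model.x3d" and node name "NeckJoint" are hypothetical; treat this as a sketch rather than working code:

```
// 1. Create an X3D browser component and obtain its browser object
component <- BrowserFactory.createX3DComponent(parameters)
browser   <- component.getBrowser()

// 2. Load ("create") a scene and hand it to the browser
scene <- browser.createX3DFromURL(["model.x3d"])
browser.replaceWorld(scene)

// 3. Manipulate the scene: look up a node and update one of its fields
neckJoint     <- scene.getNamedNode("NeckJoint")
rotationField <- neckJoint.getField("rotation")
rotationField.setValue(axisX, axisY, axisZ, angle)

// 4. Routes can likewise be added or removed through the scene object
```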
2.2.2.3 H-Anim Overview
The full name of H-Anim is "Information technology - Computer graphics and image
processing - Humanoid animation (H-Anim)". It is an international standard for humanoid
modeling and animation. The standard aims to create libraries of interchangeable
humanoids, as well as authoring tools that make it easy to create new humanoids and
animate them in various ways. Its first version, H-Anim 1.0, a subset of VRML 2.0,
was published in 1998. It was upgraded to H-Anim 1.1 on August 3, 1999, to be
compatible with VRML97. After VRML was superseded by the X3D standard,
H-Anim was upgraded to ISO/IEC FCD 19774:200x. In this thesis, only the concepts
from ISO/IEC FCD 19774:2005 will be discussed. For the features of the newest version of
ISO/IEC FCD 19774:200x, see http://www.iso.org [18].
In the H-Anim specification, an H-Anim figure is structured from the following objects:
Humanoid, Joint, Segment, Site and Displacer. The Humanoid object is the root of an
H-Anim figure; it is a container and provides the attachment framework for all other
parts of the humanoid. A Joint object is attached to the Humanoid object or to other
Joint objects using a transformation that specifies the current state of articulation, along
with the geometry associated with the attached body part. A Segment object specifies the
attributes of the physical links between the joints of the humanoid figure. A Site object
specifies locations with which known semantics can be associated. A Displacer object
specifies information about the range of movement allowed for the object in which it is embedded.
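To make these object roles concrete, here is a heavily abbreviated sketch of an H-Anim skeleton in X3D XML encoding. The joint and segment names follow the H-Anim naming conventions, but the structure and values are illustrative, not taken from the thesis's model:

```xml
<HAnimHumanoid name='patient'>
  <HAnimJoint name='humanoid_root' containerField='skeleton'>
    <HAnimJoint name='skullbase' center='0 1.6 0'>
      <HAnimSegment name='skull'/>
    </HAnimJoint>
  </HAnimJoint>
</HAnimHumanoid>
```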
Besides the definitions of these objects, the H-Anim specification also specifies the skeletal
hierarchy and body geometry of an H-Anim figure. Four levels of skeletal hierarchy are
recommended in the H-Anim specification. At each level, an H-Anim figure has a
different skeletal hierarchy, containing different numbers of joints, segments and sites.
The geometry of the body of an H-Anim humanoid figure can be described in two ways:
skeletal and skinned. The skeletal method specifies the geometry within the scene graph
of the skeletal hierarchy, which is defined in the skeleton field of the Humanoid object.
The geometry defined within the Segment objects of this hierarchy describes the body as
separate geometric pieces. This method, while computationally efficient, can cause
certain visual anomalies (such as seams or creases) that detract from the appearance of the
humanoid figure. Therefore, in this thesis the second way, the skinned method, is used to
specify the surface of the human body. In the skinned method, the human body is specified
as a continuous piece of geometry within the skin field of the Humanoid object, which
makes it easy to create a seamless H-Anim figure. For how to build
a complex and seamless 3D human model, refer to the section "Complex Human Body
Model" [Section 3.2] in Chapter 3.
2.2.2.4 Java 3D
Java 3D is another high-level specification, which is widely used to construct 3D
applications. It runs on top of either OpenGL or Direct3D, which is why it is classified
as a high-level specification. Compared with VRML and X3D, Java 3D is aimed mostly at
programmers. Even though Java 3D allows users to load a scene from a VRML or X3D file,
most of the time a scene is constructed by programming rather than loaded from a VRML
or X3D file. This is also why Java 3D can be used to implement an X3D browser. In Java 3D, a
scene is likewise constructed as a scene graph, a representation of the objects
that have to be shown. The scene graph is structured as a tree containing the elements
necessary to display the objects.
Figure 2.15: Java 3D Scene Graph Model (see [32])
The above figure only explains the concept of a scene graph in Java 3D. A complete
Java 3D scene graph should have at least two BranchGroup nodes: one for the content
branch and one for the view branch.
Figure 2.16: A Complete Java 3D Scene Graph (see [32])
In a Java 3D application, the 3D browser functionality must be provided by the application
itself: programmers must consider how to construct and control a scene as well as how to
render it. In an X3D-SAI application, by contrast, the programmer is mostly concerned with
how to control the scene, because modeling can be done with separate X3D authoring
tools. From this point of view, Java 3D can also be classified as a middle-level
specification, especially when it is used to construct an X3D browser.
2.2.3 Rotation Mathematics
In general, every displacement of a rigid object can be separated into a translation and a
rotation. Since the movement of the head and neck of a human body consists mostly
of rotations, only rotation mathematics will be considered in this section. By Euler's
rotation theorem, once a fixed point is defined, every rotation of a rigid object can be
uniquely described by a minimum of three parameters. This master thesis assumes that
there is a fixed point in the human neck and that the movement of the neck and head is a
3D rotation about this fixed point (this point is also called a joint; in reality, since a human
body is not a rigid object, a movement of the neck and head is not based
on only one point). In the rest of this section, some mathematical representations of a
rotation will be outlined, which will later be used to construct the 3D application.
• Direction Cosine Matrix (DCM)
In this representation, a new coordinate system (the rotated coordinate system)
is specified in terms of the original coordinate system (the non-rotated coordinate
system). Suppose there is an original coordinate system, described by an origin O
and three unit vectors u, v, w, and a rotated coordinate system defined by the same O
and three unit vectors û, v̂, ŵ; note that O is a fixed point.
The vectors u, v, w represent the axes X, Y and Z of the original coordinate
system; the vectors û, v̂, ŵ represent the axes X', Y' and Z' of the rotated
coordinate system. Given a point P(x,y,z) in the original coordinate system, after the
rotation the new point P'(x',y',z') in the original coordinate system can be calculated by
the following equations.
A = ⎡ ûx  v̂x  ŵx ⎤
    ⎢ ûy  v̂y  ŵy ⎥
    ⎣ ûz  v̂z  ŵz ⎦        (2.1)
And

P′ = A ∗ P        (2.2)
Each element of the matrix A is the cosine of the angle between a rotated unit basis vector
and one of the original axes; A is therefore often called the Direction Cosine Matrix
(DCM). The most powerful capability of this representation is that a series of rotations
can be composed by matrix multiplication, as follows:
P′ = Aₙ ∗ Aₙ₋₁ ∗ … ∗ A₂ ∗ A₁ ∗ P        (2.3)
In Chapter 4, the equation 2.3 will be used to compose a complex rotation movement.
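As an illustration of equations 2.2 and 2.3, the following minimal Java sketch (not part of the thesis software) composes two DCMs by matrix multiplication and applies the result to a point:

```java
// Minimal DCM sketch: matrices are plain 3x3 double arrays.
public class DcmDemo {

    // Multiply two 3x3 matrices: C = A * B.
    public static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Apply a DCM to a point: P' = A * P (equation 2.2).
    public static double[] apply(double[][] a, double[] p) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++)
            for (int k = 0; k < 3; k++)
                r[i] += a[i][k] * p[k];
        return r;
    }

    // DCM for a rotation by angle (radians) around the Z axis.
    public static double[][] rotZ(double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[][] {{c, -s, 0}, {s, c, 0}, {0, 0, 1}};
    }

    public static void main(String[] args) {
        // Two successive 45-degree rotations around Z equal one 90-degree rotation,
        // which takes (1,0,0) to (0,1,0).
        double[][] a = multiply(rotZ(Math.PI / 4), rotZ(Math.PI / 4));
        double[] p = apply(a, new double[] {1, 0, 0});
        System.out.printf("P' = (%.3f, %.3f, %.3f)%n", p[0], p[1], p[2]);
    }
}
```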
• Euler axis and angle (rotation vector)
Euler's rotation theorem says that any rotation can be expressed as a single rotation
around a fixed axis. The axis is a unit vector, and the rotation angle is a scalar
whose sign indicates the rotation direction. The relation between the axis and the
rotation angle is defined by the so-called right-hand rule [Figure 2.17]: the
fingers of the right hand curl in the direction of the rotation, and the thumb
points in the direction of the axis.
Figure 2.17: Euler Axis and Angle
This representation is widely used in robot control and 3D computer visualization.
In the X3D specification, a rotation in a virtual world is expressed by an Euler axis and an
angle. Since directly combining two successive rotations represented by Euler axes
and angles is not straightforward, the two rotations are usually described in
DCM representation and composed using equation 2.3. The composite result,
which is in DCM representation, is then converted back to the Euler axis and angle representation.
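A minimal Java sketch (not from the thesis implementation) of this conversion from a DCM back to the Euler axis/angle form used by X3D rotations; the degenerate cases (angle near 0 or π) are ignored here for brevity:

```java
// Convert a 3x3 rotation matrix (DCM) to Euler axis/angle form.
public class AxisAngleDemo {

    // Returns {axisX, axisY, axisZ, angle} for a rotation matrix r.
    public static double[] toAxisAngle(double[][] r) {
        double trace = r[0][0] + r[1][1] + r[2][2];
        double angle = Math.acos((trace - 1.0) / 2.0);
        double s = 2.0 * Math.sin(angle);      // undefined for angle 0 or pi
        return new double[] {
            (r[2][1] - r[1][2]) / s,
            (r[0][2] - r[2][0]) / s,
            (r[1][0] - r[0][1]) / s,
            angle
        };
    }

    public static void main(String[] args) {
        double c = Math.cos(Math.PI / 2), si = Math.sin(Math.PI / 2);
        double[][] rotZ90 = {{c, -si, 0}, {si, c, 0}, {0, 0, 1}};
        double[] aa = toAxisAngle(rotZ90);
        // Expect axis (0, 0, 1) and angle pi/2, i.e. the X3D rotation "0 0 1 1.5708".
        System.out.printf("axis=(%.1f, %.1f, %.1f), angle=%.4f%n",
                aa[0], aa[1], aa[2], aa[3]);
    }
}
```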
• Euler angles
According to Euler's rotation theorem, any rotation can be described by three angles,
called Euler angles [27]. A complex rotation can be split into three
simple constituent rotations; the complete rotation matrix is then the product
of three simple rotation matrices. Usually, the simplest rotation, which leads to the
simplest rotation matrix, is a rotation around one of the axes X, Y or Z of the
reference coordinate system. Unfortunately, the definition of Euler angles does not define
a unique sequence for composing the three simple rotations into one complex
rotation. This is the so-called "conventions problem" [27] in the literature. Historically,
many different conventions have been used, depending on the axes around which the rotations
are carried out and on the order of composition (since rotations are not commutative). In
this thesis, only the so-called "X-Y-Z" convention (sometimes called the 1-2-3 convention)
is discussed, since the movement model of the sensor-based diagnosis system is based on
three independent parameters (see Figure 2.2). In this convention, α describes a rotation
around the X axis, β a rotation around the Z axis, and γ a rotation around the Y axis.
Three DCM matrices can be derived from the Euler angles as follows:
AX = ⎡ 1   0       0      ⎤
     ⎢ 0   cos α  −sin α ⎥
     ⎣ 0   sin α   cos α ⎦        (2.4)

AY = ⎡  cos γ   0   sin γ ⎤
     ⎢    0     1     0   ⎥
     ⎣ −sin γ   0   cos γ ⎦        (2.5)

AZ = ⎡ cos β  −sin β   0 ⎤
     ⎢ sin β   cos β   0 ⎥
     ⎣   0       0     1 ⎦        (2.6)

A = AX ∗ AY ∗ AZ        (2.7)
A DCM representation can also easily be transformed into the Euler axis and angle
representation. In this thesis, the three movement parameters captured by the sensors
of the sensor-based diagnosis system (see Figure 2.2) are treated as three Euler angles.
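The construction of equations 2.4 to 2.7 can be sketched in Java as follows (an illustrative sketch, not the thesis implementation); note the convention used here, where β rotates around Z and γ around Y:

```java
// Build the three elementary DCMs from the Euler angles and compose them.
public class EulerDemo {

    static double[][] mul(double[][] a, double[][] b) {
        double[][] c = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] ax(double alpha) {          // equation 2.4
        double c = Math.cos(alpha), s = Math.sin(alpha);
        return new double[][] {{1, 0, 0}, {0, c, -s}, {0, s, c}};
    }

    static double[][] ay(double gamma) {          // equation 2.5
        double c = Math.cos(gamma), s = Math.sin(gamma);
        return new double[][] {{c, 0, s}, {0, 1, 0}, {-s, 0, c}};
    }

    static double[][] az(double beta) {           // equation 2.6
        double c = Math.cos(beta), s = Math.sin(beta);
        return new double[][] {{c, -s, 0}, {s, c, 0}, {0, 0, 1}};
    }

    // Composite DCM, A = AX * AY * AZ (equation 2.7).
    public static double[][] compose(double alpha, double beta, double gamma) {
        return mul(mul(ax(alpha), ay(gamma)), az(beta));
    }
}
```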
2.2.4 Transformation in X3D specification
A transformation in the X3D specification is described as follows: given a 3D point P, P can
be transformed into a new point P' in its parent's coordinate system by a series of
intermediate transformations. The following equation shows the calculation of a transformation
in an X3D scene:
P′ = T ∗ C ∗ R ∗ SR ∗ S ∗ (−SR) ∗ (−C) ∗ P        (2.8)
In the above matrix notation, C specifies a translation offset from the origin of the local
coordinate system (0,0,0); SR specifies a rotation of the coordinate system before the scale
is applied (to allow scales in arbitrary orientations); T specifies a translation of the
coordinate system; R specifies a rotation of the coordinate system; and S specifies a
non-uniform scale of the coordinate system. In this thesis, only the rotation transformation
will be discussed, since head and neck movements are composed mostly of rotations. A
rotation can be calculated as follows:
P′ = C ∗ R ∗ (−C) ∗ P        (2.9)
A neck and head movement can thus be simplified as a complex rotation about the center
of a joint in the neck of the human body. As mentioned before, the center of this joint
differs from person to person; a doctor who uses this 3D visualization application should
define the center of the selected neck joint according to the physique of the specific
patient. The rotation parameters themselves are captured by the sensors of the
sensor-based diagnosis system.
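Equation 2.9 amounts to translating the point into the joint's local frame, rotating, and translating back. A minimal Java sketch (the joint center and point values below are made up for demonstration):

```java
// Rotation about a joint center: P' = C * R * (-C) * P (equation 2.9).
public class JointRotationDemo {

    public static double[] rotateAboutCenter(double[] p, double[] center, double[][] r) {
        double[] q = new double[3];
        for (int i = 0; i < 3; i++) q[i] = p[i] - center[i]; // (-C) * P
        double[] out = new double[3];
        for (int i = 0; i < 3; i++)
            for (int k = 0; k < 3; k++)
                out[i] += r[i][k] * q[k];                    // R * ...
        for (int i = 0; i < 3; i++) out[i] += center[i];     // C * ...
        return out;
    }

    public static void main(String[] args) {
        // Nod (rotation around X) by 90 degrees about a neck joint at (0, 1.5, 0),
        // applied to a point 0.2 m above the joint.
        double[][] rotX90 = {{1, 0, 0}, {0, 0, -1}, {0, 1, 0}};
        double[] head = {0.0, 1.7, 0.0};
        double[] moved = rotateAboutCenter(head, new double[] {0, 1.5, 0}, rotX90);
        System.out.printf("(%.2f, %.2f, %.2f)%n", moved[0], moved[1], moved[2]);
    }
}
```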
2.3 3D Techniques in This Thesis
In this chapter, several technical options that can be used to implement a 3D visualization
have been presented. The low-level interfaces, like OpenGL (with Java OpenGL) and Direct3D,
are very powerful for controlling low-level elements like pixels and image frames, and many
complex transformations can be applied to 3D objects with them. But programming directly
against a low-level interface is too laborious; it becomes a nightmare for programmers
when a 3D scene contains many complex 3D objects. This is the reason why OpenGL and
Direct3D are not used directly in this master thesis (they are, however, used indirectly:
an X3D browser, for example, is implemented on top of OpenGL or Direct3D). Java 3D
seems to be a good option, but it does not separate authoring, controlling and rendering
in a clear way; sometimes these tasks mix together. Moreover, its specification does not
distinguish humanoid objects from other general 3D objects, whereas visualizing movements
of the head and neck of a 3D human is the main task of this master thesis. To fulfill this
task, the selected specification must provide an easy way to manipulate H-Anim objects in
an X3D scene. H-Anim together with the SAI of the X3D specification is therefore the best
option for this master thesis.
Chapter 3
Human Body Model
This thesis separates human body modeling and programming control into two development
stages. In the modeling stage, the question of how to create a realistic 3D human
model compliant with the X3D and H-Anim specifications is answered. A
simple H-Anim model is used to illustrate the creation of a skeletal hierarchy of the 3D
human body, within which there is a neck joint that will be controlled through the SAI
interface in the programming stage. Based on the simple H-Anim model, a complex
H-Anim model with a realistic skin surface is then developed and used in the 3D
visualization application.
3.1 Simple BoxMan Model
Since this thesis aims to build an application that visualizes sensor data, not a pure 3D
modeling or rendering system, a simple 3D human model is used as a starting point.
"Simple" means the model should be easy to obtain and easy to control from a Java
program. Suppose there is an H-Anim model with a joint called "Neck Joint"; the neck
and head of the model are connected to this joint, so when the joint moves, the neck
and head move accordingly.
Since VRML, X3D and H-Anim are open standards, many open-source VRML and X3D
examples are published on the Internet. An example called the "BoxMan" model can be
found in the Web3D forum (see [23]). This simple model uses several boxes to construct
an H-Anim figure. Although it does not look like a real person, it fits the rule of "easy
to obtain and easy to control" very well. What needs to be done with this example is
adding a neck joint, removing the unnecessary time-based animation control, and
segmenting the box-shaped skin so that the skin of the head and neck can be attached to
the newly added neck joint. Because the "BoxMan" model is very simple, all these
modifications can easily be done by hand, rather than with sophisticated authoring tools.
After modification, a simple static "BoxMan" is obtained:
Figure 3.1: Simple Box Man Model
Admittedly, a simple model will not produce a smooth movement, especially when the
movement of muscles and skin is considered. But once a simple model has been built
successfully, and if the architecture of the 3D application is flexible enough, it can easily
be extended to support a more complex H-Anim human model. This topic will be discussed
in the next section. With this simple model, it is quite easy to implement a Java program to
visualize a head movement. A simple demo of the 3D visualization system looks like this:
Figure 3.2: Simple Demo of 3D Visualization
With this simple demo, the relationship between the sensor data and the 3D human model
can be evaluated quickly. The demo also provides a prototype for developing a more
realistic 3D visualization system. The next step is to build a complex model.
3.2 Complex Human Body Model
A human body consists of a number of segments (such as the forearms, hands and feet) that
are connected to each other by joints (such as an elbow, a wrist or an ankle). In the previous
section, a simple H-Anim model with a box-shaped skin was built; this section aims to
create an H-Anim human body with a realistic skin. The science of human anatomy
describes the skeletal structure of a human body in great detail, but soft tissue,
such as muscle and skin, is still very difficult to describe with a simple
mathematical model. This difficulty makes it very hard to visualize highly realistic
movement of a human body with software technologies. In X3D, building an H-Anim model
using skinned body geometry, which is composed of many tiny polygons (mostly triangles
or convex planar quadrilaterals), improves the degree of realism. A continuous 3D mesh
is widely used to construct a human body, and many 3D modeling tools, such as Blender
(see [24]), can be used to create such a mesh. There are also many free 3D meshes of the
human body on the Internet (see [25]). The following figure shows a 3D mesh obtained
from the open-source "MakeHuman" project in the Blender forum [24].
Figure 3.3: A 3D Mesh Constructed by Blender
In the above example, the 3D mesh can be exported into a VRML file or an X3D file, so
that the 3D points and polygons in the mesh can be obtained; these will then be used
to replace the box-shaped skin in the simple H-Anim model. Another example, from the
"3D Meshes Research Database" [25], is shown in the next figure.
Figure 3.4: A 3D Mesh from 3D Meshes Research Database
In this example, the points and polygons are stored in a VRML file, so it is very easy to
access them. Such 3D meshes may be too large or too small, or they may be in a wrong
position (for example, with the back toward the viewer, or upside down). A comparison
of a realistic skin and a box-shaped skin is shown in the following figure.
Figure 3.5: From A Simple Model To A Complex Model
An additional processing step on these points and polygons is to store them in a matrix
and use a mathematical tool (e.g. Matlab [26]) to enlarge or reduce their scale. Sometimes
a translation or rotation is needed as well.
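The rescaling described above amounts to a uniform scale plus a translation applied to every vertex. A minimal Java sketch of this step (class, method and parameter names are illustrative, not taken from the thesis code):

```java
// Scales every vertex of a mesh by a uniform factor and then translates it.
// points is an N x 3 array of vertex coordinates; the mesh is edited in place.
public final class MeshTransform {
    public static void scaleAndTranslate(double[][] points,
                                         double scale,
                                         double tx, double ty, double tz) {
        for (double[] p : points) {
            p[0] = p[0] * scale + tx;
            p[1] = p[1] * scale + ty;
            p[2] = p[2] * scale + tz;
        }
    }
}
```

A rotation can be applied in the same loop by multiplying each vertex with a 3x3 rotation matrix before translating.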
To attach a 3D mesh to a skeletal structure, a continuous 3D mesh needs to be divided
into several segments so that they can be attached to different joints. This process is
called segmentation in this thesis, and it is performed by a segmentation tool developed
as part of this work.
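The thesis text does not spell out the segmentation criterion used by the tool. One simple strategy, shown here only as an illustrative sketch, is to assign each vertex to the joint whose center is closest:

```java
// Illustrative only: partitions mesh vertices among joints by nearest joint
// center. The actual segmentation tool of the thesis may use other criteria.
public final class MeshSegmenter {
    /** Returns, for each vertex, the index of the closest joint center. */
    public static int[] assignToJoints(double[][] vertices, double[][] jointCenters) {
        int[] owner = new int[vertices.length];
        for (int v = 0; v < vertices.length; v++) {
            double best = Double.MAX_VALUE;
            for (int j = 0; j < jointCenters.length; j++) {
                double dx = vertices[v][0] - jointCenters[j][0];
                double dy = vertices[v][1] - jointCenters[j][1];
                double dz = vertices[v][2] - jointCenters[j][2];
                double d = dx * dx + dy * dy + dz * dz; // squared distance suffices
                if (d < best) { best = d; owner[v] = j; }
            }
        }
        return owner;
    }
}
```

The resulting per-joint vertex groups can then be written out as separate skin segments.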
Figure 3.6: Segmentation for Complex Model
The next step is removing the box-shaped skin of the "Box Man" model and adjusting its
skeletal structure to fit the shape of the realistic 3D mesh. The following figure shows a
skeletal structure, to which a realistic skin will be attached later.
Figure 3.7: Skeletal Hierarchy
The last step is replacing the skin of the previous simple "Box Man" model with this
more realistic 3D mesh. Different segments are attached to different joints, and a new
realistic H-Anim model is complete.
Figure 3.8: A Complex Model for Woman
The above steps show the creation process for a whole-body model. A half-body model
(for example, one with only the head, the neck and part of the arms) can be built using
the same process.
Chapter 4
3D Movement Control
3D modeling was discussed in the previous chapter. This chapter discusses 3D movement
control based on the rotation mathematics introduced in Section 2.2.3. The contents
include the input parameters of 3D movement control, the coordinate system used for
movement control, and several control models.
4.1 Movement Parameters and Coordinate System
In the sensor-based diagnosis system, three movement parameters (see [8]) are measured:
α, β and γ. The angle α represents nodding of the head, β represents shaking of the head
and γ represents rotation of the head. These three parameters are used to control the
neck and head movement of the 3D H-Anim model. The specification ISO/IEC 19775
uses a Cartesian, right-handed, three-dimensional coordinate system. By default, the
viewer is on the Z axis looking down the -Z axis toward the origin, with +X to the right
and +Y straight up. The relations between the parameters and the coordinate system
are shown in the following figure.
Figure 4.1: Movement Parameters and Coordinate System
The left side of this figure shows the rotations in the sensor-based diagnostic system, and
the right side shows the rotations in an X3D virtual world. In the sensor-based diagnostic
system, the Y axis faces the viewer, while in an X3D virtual world the Z axis faces the
viewer. This means that in an X3D virtual world, nodding is around the X axis, shaking
is around the Z axis (not Y as in the helmet system) and rotation is around the Y axis
(not Z as in the helmet system). Since a head and neck movement is a rotation, the
rotation center needs to be defined. However, different people have different anatomical
structures, and it is quite difficult to find such a rotation center in general. This thesis
assumes that a doctor can define the rotation center of the head and neck based on the
real physical structure of a patient.
4.2 Two Rotation Models
In this part, two classes of movement models are introduced. The first class is the
composite movement model, in which the real 3D movement is displayed in a 3D browser.
The second class contains three separative movement models, in which only a specific
dimension of the 3D movement (e.g. only the rotation around the X axis) is displayed in
a 3D browser.
4.2.1 Composite Movement Model
The real 3D movement is displayed in a 3D browser if a user chooses this movement
model. In this model, the design intent is to transform the representation of a rotation
from Euler angles into Euler's axis-angle form. From the sensor-based diagnostic system
[Figure 2.1], the following Euler angles are obtained:
(α, β, γ)
From these three Euler angles, a rotation axis e and a rotation angle θ are calculated.
The axis e is a unit vector (ex, ey, ez), so the axis-angle representation is the vector:
(ex, ey, ez, θ)
The calculation steps are as follows:
• Step 1: DCM around the X axis from α

A_X = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \quad (4.1)

• Step 2: DCM around the Y axis from γ

A_Y = \begin{pmatrix} \cos\gamma & 0 & \sin\gamma \\ 0 & 1 & 0 \\ -\sin\gamma & 0 & \cos\gamma \end{pmatrix} \quad (4.2)

• Step 3: DCM around the Z axis from β

A_Z = \begin{pmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad (4.3)
• Step 4: Combining the DCMs

A = A_X \, A_Y \, A_Z \quad (4.4)

In this step, the "X-Y-Z" convention (see [27]) is used to combine the rotations into a single DCM.

• Step 5: Converting the DCM into Euler's axis and angle

\theta = \arccos\left(\frac{A_{11} + A_{22} + A_{33} - 1}{2}\right) \quad (4.5)

e_x = \frac{A_{32} - A_{23}}{2\sin\theta} \quad (4.6)

e_y = \frac{A_{13} - A_{31}}{2\sin\theta} \quad (4.7)

e_z = \frac{A_{21} - A_{12}}{2\sin\theta} \quad (4.8)
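The five steps above can be collected into one short Java method. The sketch below follows the conventions of equations (4.1)-(4.8); class and method names are illustrative, not taken from the thesis code:

```java
// Converts the Euler angles (alpha around X, gamma around Y, beta around Z,
// all in radians) into Euler's axis-angle form {ex, ey, ez, theta}.
public final class EulerToAxisAngle {
    public static double[] convert(double alpha, double beta, double gamma) {
        double[][] ax = {{1, 0, 0},
                         {0, Math.cos(alpha), -Math.sin(alpha)},
                         {0, Math.sin(alpha),  Math.cos(alpha)}};  // eq. (4.1)
        double[][] ay = {{ Math.cos(gamma), 0, Math.sin(gamma)},
                         {0, 1, 0},
                         {-Math.sin(gamma), 0, Math.cos(gamma)}};  // eq. (4.2)
        double[][] az = {{Math.cos(beta), -Math.sin(beta), 0},
                         {Math.sin(beta),  Math.cos(beta), 0},
                         {0, 0, 1}};                               // eq. (4.3)
        double[][] a = mul(mul(ax, ay), az);                       // eq. (4.4), X-Y-Z
        double theta = Math.acos((a[0][0] + a[1][1] + a[2][2] - 1) / 2); // eq. (4.5)
        double s = 2 * Math.sin(theta);
        if (Math.abs(s) < 1e-12) return new double[] {1, 0, 0, 0}; // no rotation: axis arbitrary
        return new double[] {(a[2][1] - a[1][2]) / s,              // eq. (4.6)
                             (a[0][2] - a[2][0]) / s,              // eq. (4.7)
                             (a[1][0] - a[0][1]) / s,              // eq. (4.8)
                             theta};
    }
    private static double[][] mul(double[][] p, double[][] q) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i][j] += p[i][k] * q[k][j];
        return r;
    }
}
```

The returned four-element vector matches the (ex, ey, ez, θ) form that X3D's rotation fields expect; the zero-angle guard avoids a division by zero when the head is in its rest position.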
4.2.2 Separative Movement Models
There are three separative models: rotation around the X axis (nodding of the head),
rotation around the Y axis (rotating of the head) and rotation around the Z axis (shaking
of the head).

• Euler's axis and angle around the X axis

e = (1, 0, 0) \quad (4.9)

\theta = \arccos\left(\frac{A_{X,11} + A_{X,22} + A_{X,33} - 1}{2}\right) \quad (4.10)

In equation (4.10), A_X comes from equation (4.1).

• Euler's axis and angle around the Y axis

e = (0, 1, 0) \quad (4.11)

\theta = \arccos\left(\frac{A_{Y,11} + A_{Y,22} + A_{Y,33} - 1}{2}\right) \quad (4.12)

In equation (4.12), A_Y comes from equation (4.2).

• Euler's axis and angle around the Z axis

e = (0, 0, 1) \quad (4.13)

\theta = \arccos\left(\frac{A_{Z,11} + A_{Z,22} + A_{Z,33} - 1}{2}\right) \quad (4.14)

In equation (4.14), A_Z comes from equation (4.3).
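As a quick sanity check on these formulas, substituting the trace of A_X from equation (4.1) into equation (4.10) recovers the nod angle directly:

```latex
\theta = \arccos\!\left(\frac{(1 + 2\cos\alpha) - 1}{2}\right)
       = \arccos(\cos\alpha) = |\alpha|
```

Since arccos only returns values in [0, π], a negative nod angle yields its magnitude here; the sign would have to be carried by flipping the fixed axis e.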
Chapter 5
Implementation
In this chapter, a 3D visualization system will be implemented with Java programming
language. In the first section, the system architecture is described, a main use case will be
outlined in a ”Sequence Diagram” [28]. Then the functions and main algorithm of each
component will be introduced one by one. At the end of this chapter, the source code
structure is explained and what programming libraries used are illustrated.
5.1 System Architecture
As mentioned in the first chapter of this thesis, developing a flexible architecture is one
of the aims of this master thesis. To introduce a flexible software architecture, the concept
of "component-based software engineering" is used in this section. The functions of
modeling, movement control, simulation and GUI are separated into different components.
The relations between the components are shown in the following figure:
Figure 5.1: A Framework Architecture
This system can work in an on-line mode or in an off-line mode. "On-line" means that
the visualization system is connected with the sensor-based diagnostic system (which
includes a helmet and a 2D visualization application running on a PC) by a TCP/IP
link. The connection is as follows:
Figure 5.2: On-Line Connection
In the on-line working mode, the connection between "Host 1" and "Host 2" is a TCP/IP
link, so "Host 2" can be the same PC as "Host 1", or it can be a remote PC in another
office, another city or even another country. With this loose coupling, it is easy to extend
the system into a remote diagnosis system, which is also a current technical trend in the
domain of health-care IT.
The 3D visualization system can also work in an off-line mode, in which the application
reads sensor data from a local disk, or from a remote disk via the FTP protocol (see [30])
or other protocols such as NFS (Network File System, see [31]).
5.1.1 Sequence Diagram
Before going into the implementation details of each component, a sequence diagram is
used to briefly describe the dynamic behavior of the system.
Figure 5.3: Sequence Diagram
• Step 1: A user loads an H-Anim model from an X3D file.
• Step 2: A movement model is set. There are four movement models at present.
• Step 3: If the system is working in on-line mode, the sensor simulator is disabled.
• Step 4: If the system is working in on-line mode, the real sensor-based diagnostic
system sends head movement data to the Movement Control component, which then
controls the head movement of the H-Anim model.
• Step 5: If the system is working in off-line mode, the user must load movement data
from a data file.
• Step 6: Once the data has been loaded, the user can start the simulation process,
in which the simulator sends movement data to the Movement Control component,
which in turn controls the head movement of the H-Anim model.
5.1.2 Components
In this system, there are four components: the GUI component, the Simulator component,
the Movement Control component and the HAnim Model component. These components
are discussed in the following sections.
5.1.2.1 GUI Component
The main function of the GUI component is I/O (input and output) handling. There
are two frames in this component: a main frame and a subframe.
Figure 5.4: GUI Appearance
In the main frame, there are two functional areas: the Control Panel and the 3D Browser.
The Control Panel consists of four sub-panels: Human Model, Movement Model, Sensor
Data and Play Menu. The Human Model panel contains all the joints of the H-Anim
model, organized in a tree structure. The Movement Model panel contains the Euler
movement models described in Chapter 4. In the Sensor Data panel, the user can select
the on-line working mode, in which the visualization application gets movement data
directly from the helmet system, or the off-line working mode, in which the movement data
must be loaded from a data file. The Play Menu only works in off-line mode; in this
menu, the user can play or replay the head movement according to the sensor data from
a data file. The subframe shows the information of a specific joint selected from the
human model (H-Anim) tree in the main frame. In the subframe, the user can modify
the parameters of a joint, such as its center, translation, scale and rotation.
5.1.2.2 Simulator Component
When the system is working in off-line mode, the Simulator emulates the real sensor-based
diagnostic system by sending movement data from a data file to the Movement Control
component.
Figure 5.5: Flow Chart of Simulator
The time-out information is derived from the timestamps of the original data records in
a data file. Assume that the last movement data record is:
(α, β, γ, timestamp1)
and the current movement data record is:
(α, β, γ, timestamp2)
Once the current movement data has been read from the data file, the simulator sleeps
for ∆t before it sends the current movement data to the Movement Control component:
∆t = timestamp2 − timestamp1 (5.1)
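Equation (5.1) leads to a small replay loop. In the sketch below, the record layout and all names are assumptions for illustration: each record holds α, β, γ and a timestamp in milliseconds.

```java
import java.util.function.Consumer;

// Illustrative off-line replay: sleeps delta-t between consecutive records,
// then forwards each record to the Movement Control component.
public final class SensorSimulator {
    public static void replay(double[][] records, Consumer<double[]> movementControl)
            throws InterruptedException {
        long previous = -1;
        for (double[] rec : records) {
            long timestamp = (long) rec[3];
            if (previous >= 0) {
                Thread.sleep(timestamp - previous); // delta-t of equation (5.1)
            }
            movementControl.accept(rec);            // forward (alpha, beta, gamma)
            previous = timestamp;
        }
    }
}
```

Sleeping between records reproduces the original recording rhythm, so the off-line playback moves at the same speed as the live measurement did.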
5.1.2.3 HAnimModel Component
The HAnim Model component is responsible for creating and manipulating the 3D human
model.
Figure 5.6: Flow Chart of HAnimModel
This component loads an X3D file, in which an H-Anim model is defined. After loading
the file, the component constructs a tree structure containing all the joints of the H-Anim
model. This tree structure is displayed in the GUI component, where a user can modify
the parameters of a selected joint. The component also provides an SAI interface so that the
Movement Control component can control the movement of neck and head of the loaded
H-Anim model.
5.1.2.4 Movement Control Component
The function of this component is movement control.
Figure 5.7: Flow Chart of Movement Control
It creates a TCP/IP server socket and listens on this port. Once movement data arrives,
it uses the selected movement model to calculate the Euler axis-angle values, which are
then used to drive the movement of the head and neck.
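The listening loop can be sketched in plain Java. The socket handling and the line-based "α β γ" record format below are assumptions for illustration; the real wire format of the diagnostic system may differ:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

// Illustrative sketch: accepts one sender and forwards each "alpha beta gamma"
// line, parsed as doubles, to the selected movement model via onData.
public final class MovementListener {
    public static void listen(ServerSocket server, Consumer<double[]> onData)
            throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] t = line.trim().split("\\s+");
                onData.accept(new double[] {
                        Double.parseDouble(t[0]),
                        Double.parseDouble(t[1]),
                        Double.parseDouble(t[2]) });
            }
        }
    }
}
```

Passing the `ServerSocket` in from the caller keeps the sketch testable and leaves port selection to the application configuration.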
5.2 Java Source Code Introduction
The Java code of this application is based on the Java 3D (see [32]) and Xj3D (see [33])
projects. The Java 3D API enables the creation of three-dimensional graphics applications
and Internet-based 3D applets. It provides high-level constructs for creating and
manipulating 3D geometry and for building the structures used in rendering that geometry. The
Java 3D API 1.5.1 is used in this project. Xj3D is a project of the Web3D Consortium
that focuses on creating a toolkit for VRML97 and X3D content, written completely in
Java. The current stable release of Xj3D is 1.0. Since version 1.0 of Xj3D does not
completely implement the H-Anim specification, the developer version VERSION-2-0-M10
(see [34]) was compiled from the source code of the Xj3D project and used in this 3D
visualization application.
5.2.1 Source Code Structure
In this project, Ant 1.7.0 (see [35]) is used as the build tool, and the compiler is JDK
1.6.0 01. Together with the libraries from the Java 3D and Xj3D projects (including
required libraries such as Java OpenGL), the project structure is as follows:
Figure 5.8: Source Code Structure
Under the root directory of this project, there is a file named "build.xml", which is Ant's
buildfile. This file tells the build tool Ant how to compile and run the whole project. The
sub-directory "bin" contains native libraries, such as the Java OpenGL native libraries.
The sub-directory "Data" contains off-line sensor data. Java support libraries are placed
in the "jars" sub-directory, and the H-Anim models developed in this project are stored
in the "Model" sub-directory. The Java source code is stored in the "src" sub-directory,
which contains the visualization application in its sub-directory "Hanim" and the
segmentation tool in its sub-directory "tools". The files "AppMain.java" and "ToolsMain"
are the respective entry classes of the visualization application and the segmentation tool.
5.3 Testing and Evaluation
The 3D visualization developed in this thesis is tested and evaluated in this section.
Before testing, Java Runtime Environment version 6 (see [36]) or a higher version must be
installed on the computer on which the 3D visualization will run. A run-time version of
the 3D application should be installed on a computer with a Microsoft Windows operating
system. The following figure shows an example in which the 3D visualization is installed
in the directory D:\Hanim Runtime; in fact, it can be installed in any directory.
Figure 5.9: 3D Visualization Application Run-time Version
As in the source code structure, the directory "bin" contains the OpenGL native
libraries, and the directory "jars" contains the Java libraries from the Java 3D and
Xj3D projects. The directory "Data" contains movement data captured by the sensor-based
diagnostic system; these data are used for off-line visualization. The directory "Model"
contains the H-Anim 3D models, one of which should be loaded when running the
application. The file "HeadMovement.jar" is the Java executable program implemented in
this master thesis. Running the program is quite simple: just double-click the script file
"3DHeadMovementApp.bat", and a GUI application starts as follows:
Figure 5.11: Loading a H-Anim Model
Once a model is loaded, a 3D human model will be displayed in the X3D Browser:
Figure 5.12: Loaded H-Anim Model
After loading a model, the joint parameters can be adjusted as shown in Figure 5.4. The
user can also change the viewpoint settings or the movement control models:
For on-line testing, the on-line working mode must be set as follows:
Figure 5.14: On-Line Setting
The sensor-based diagnostic system must be modified so that it can send sensor data
via a TCP/IP link. Running the sensor-based diagnostic system together with the 3D
visualization looks as follows:
Figure 5.15: 2D and 3D GUI
For off-line testing, do not set the on-line working mode; instead, load sensor data from
a data file as follows:
After loading a data file, click the run button to start the visualization:
Figure 5.17: Off-Line Running
As shown in the above figure, the head and neck of the loaded model move according to
the loaded data.
Chapter 6
Conclusion and Outlook
In summary, this thesis implements a 3D visualization application for a sensor-based
diagnosis system. With the help of this 3D visualization, doctors can perceive head and
neck movement in an intuitive manner.
This thesis evaluates several 3D techniques, ranging from low-level APIs, e.g. OpenGL
and Direct3D, to high-level standards, e.g. X3D and H-Anim. This evaluation explains
why the high-level standards X3D and H-Anim were adopted as the solution for the 3D
visualization problem. Besides adopting open high-level standards, this thesis also
proposes a method to build a realistic human model, composed of two stages. In the first
stage, the existing simple model BoxMan is modified to build a skeletal hierarchy of a
human body, which contains a neck joint that can be controlled programmatically. In the
second stage, a small Java tool is developed to segment a continuous 3D mesh into several
separate parts so that each part can be attached to a different joint. In this stage, a
realistic skin surface is built and attached to the skeletal hierarchy built in the first stage,
so that the human model looks more like a real person. To derive a 3D movement from
the three sensor-measured values, Euler's rotation theorem is used. This thesis separates
human modeling from movement control, so that the human model can be changed from
a man to a woman, or from a whole-body model to a half-body model, without affecting
movement control. The thesis also suggests a loosely coupled system architecture, in
which the existing 2D visualization can work together with the 3D visualization. This
flexible architecture allows the system to work on-line with real-time sensor data or
off-line with recorded data.
In future work, texture mapping on the face of the model could be introduced to achieve
a more realistic facial appearance. Another improvement would be adding more joints
and attaching muscles in the neck of the human model, so that the movement becomes
smoother. The 3D application developed in this thesis can also be extended in many ways
for other uses. For example, it can easily be extended to support movement of joints
other than the neck. It is also possible to support the visualization of neural signals or
blood flow in the human body.
Bibliography
[1] Wayne Carlson: A Critical History of Computer Graphics and Animation, in section 2.
URL:http://design.osu.edu/carlson/history/lesson2.html.
[2] Web3D Consortium: VRML97 and Related Specifications.
URL: http://www.web3d.org/x3d/specifications/vrml/.
[3] Web3D Consortium: Examples of VRML97. URL: http://www.web3d.org/x3d/specifications/vrml/ISO-
IEC-14772-VRML97/part1/examples.html.
[4] World Wide Web Consortium: Extensible Markup Language. URL:http://www.w3.org/XML/.
[5] Web3D Consortium: X3D and Related Specifications. URL:http://www.web3d.org/x3d/specifications/.
[6] Web3D Consortium: Humanoid animation (H-Anim).
URL: http://www.web3d.org/x3d/specifications/ISO-IEC-19774-HumanoidAnimation/.
[7] Stefan Wesarg: Medical Imaging, Fraunhofer IGD, Darmstadt, Dept. Cognitive Computing and
Medical Imaging. URL: http://www.igd.fraunhofer.de/igd-a7/lectures/medicalimaging 06.pdf.
[8] Prof. Dr. Klaus Solbach, Dr. Reinhard Viga: Diagnosesystem für neurologische Bewegungsstörungen.
URL: http://www.forum-forschung.de/2004/artikel/18.html.
[9] OpenGL Web Site: OpenGL and OpenGL Utility Specifications.
URL: http://www.opengl.org/documentation/specs/.
[10] Wikipedia: OpenGL. URL:http://en.wikipedia.org/wiki/OpenGL.
[11] Wikipedia: Comparison of OpenGL and Direct3D.
URL: http://en.wikipedia.org/wiki/Comparison of Direct3D and OpenGL.
[12] Microsoft Corporation: Get started with DirectX 10.
URL: http://www.gamesforwindows.com/en-US/AboutGFW/Pages/DirectX10.aspx.
[13] Wikipedia: Direct3D. URL:http://en.wikipedia.org/wiki/Direct3d.
[14] Microsoft Corporation: Pipeline Stages (Direct3D 10).
URL: http://msdn2.microsoft.com/en-us/library/bb205123%28VS.85%29.aspx.
[15] Java-Net Community: JOGL API Project. URL: https://jogl.dev.java.net/.
[16] Wikipedia: Open system in computing science.
URL: http://en.wikipedia.org/wiki/Open system %28computing%29.
[17] Wikipedia: Virtual Reality. URL: http://en.wikipedia.org/wiki/Virtual reality.
[18] ISO: Website of the International Organization for Standardization. URL: http://www.iso.org.
[19] Andrea L. Ames, David R. Nadeau and John L. Moreland: The VRML 2.0 Sourcebook. URL:
http://www.wiley.com/legacy/compbooks/vrml2sbk/cover/cover.htm.
[20] Mike Bailey and Don Brutzman: The X3D-Edit 3.2 Authoring Tool.
URL: https://savage.nps.edu/X3D-Edit/.
[21] Seamless3d Home Page. URL:http://www.seamless3d.com/#3d-modelling-software.
[22] Web3D Consortium: X3D Viewers, Browsers & Plug-ins.
URL: http://www.web3d.org/tools/viewers and browsers/.
[23] Web3D Consortium: A Seamless VRML Human, demonstrating the H-Anim 2001 Specification.
URL: http://www.web3d.org/x3d/content/examples/Basic/HumanoidAnimation/BoxMan.x3dv.
[24] Blender Projects: the open source project MakeHuman.
URL: http://projects.blender.org/projects/makeh/.
[25] Gamma project: 3D Meshes Research Database. URL: http://www-c.inria.fr/gamma/gamma.php.
[26] The MathWorks, Inc: MatLab - The Language of Technical Computing
URL:http://www.mathworks.com/products/matlab/.
[27] Weisstein, Eric W: Euler Angles. From MathWorld–A Wolfram Web Resource.
URL: http://mathworld.wolfram.com/EulerAngles.html.
[28] Scott W. Ambler: Introduction to UML 2 Sequence Diagrams.
URL: http://www.agilemodeling.com/artifacts/sequenceDiagram.htm.
[29] Internet Engineering Task Force: RFC 1180 A TCP/IP Tutorial (January 1991).
URL: http://tools.ietf.org/html/rfc1180.
[30] J. Postel, J. Reynolds: RFC 959 File Transfer Protocol (FTP).
URL: http://tools.ietf.org/html/rfc959.
[31] Internet Engineering Task Force: Network File System (NFS) version 4 Protocol.
URL: http://tools.ietf.org/html/rfc3530.
[32] Sun Microsystems: The Java 3D API. URL: http://java.sun.com/products/java-media/3D/.
[33] Web3D Consortium: Xj3D Project Home Page. URL: http://www.xj3d.org/.
[34] Web3D Consortium: Development Snapshots of Xj3D project.
URL: http://www.xj3d.org/snapshots.html.
[35] Apache Software Foundation: Apache Ant Project. URL: http://ant.apache.org/index.html.
[36] Sun Microsystems: Java(TM) Platform, Standard Edition Runtime Environment Version 6
URL:http://java.sun.com/javase/6/webnotes/install/jre/README.
List of Figures
2.1 Diagnostic System for Neurological Movement Disorders (see [8]) . . . . . . 4
2.2 Movement Parameters (see [8]) . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 2D GUI of the Sensor-based Diagnostic System (see [8]) . . . . . . . . . . . 6
2.4 Piece of Code Using OpenGL for drawing (see [10]) . . . . . . . . . . . . . . 8
2.5 Simplified version of the Graphics Pipeline Process (see [10]) . . . . . . . . 9
2.6 A Piece of Code Using Direct3D (see [13]) . . . . . . . . . . . . . . . . . . 10
2.7 Pipeline Stages (Direct3D Version 10) of Microsoft Direct3D (see [14]) . . . 11
2.8 A VRML Source Code (see [3]) . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.9 A VRML Example in a 3D Web Browser (see [3]) . . . . . . . . . . . . . . 15
2.10 X3D Architecture (see [5]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.11 X3D examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.12 X3D Edit for X3D authoring (see [20]) . . . . . . . . . . . . . . . . . . . . . 19
2.13 SAI Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.14 SAI Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.15 Java 3D Scene Graph Model (see [32]) . . . . . . . . . . . . . . . . . . . . . 23
2.16 A Complete Java 3D Scene Graph (see [32]) . . . . . . . . . . . . . . . . . 24
2.17 Euler Axis and Angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1 Simple Box Man Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Simple Demo of 3D Visualization . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 A 3D Mesh Constructed by Blender . . . . . . . . . . . . . . . . . . . . . . 32
3.4 A 3D Mesh from 3D Meshes Research Database . . . . . . . . . . . . . . . 33
3.5 From A Simple Model To A Complex Model . . . . . . . . . . . . . . . . . 34
3.6 Segmentation for Complex Model . . . . . . . . . . . . . . . . . . . . . . . 35
3.7 Skeletal Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.8 A Complex Model for Woman . . . . . . . . . . . . . . . . . . . . . . . . . 37