We present Lift, a visible light-enabled finger tracking and object localization technique that allows users to perform freestyle multi-touch gestures on any object's surface in an everyday environment. By projecting encoded visible patterns onto an object's surface (e.g. paper, display, or table) and localizing the user's fingers with light sensors, Lift offers users a richer interactive space than the device's existing interfaces. Additionally, everyday objects can be augmented to accept multi-touch gesture input by attaching sensor units to their surfaces. We also present two applications as a proof of concept. Finally, results from our experiments indicate that Lift can localize ten fingers simultaneously with an average accuracy of 1.7 millimeters and an average refresh rate of 84 Hz, with 31 milliseconds of delay over WiFi and 23 milliseconds over serial communication, making gesture recognition on non-instrumented objects possible.
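As a rough illustration of how a light sensor can recover its position from projected patterns of this kind, the sketch below is our own (not Lift's implementation): it decodes a column index from one thresholded sensor reading per projected Gray-code frame. The threshold and frame count are assumptions chosen for the example.

```python
# Minimal sketch (not the authors' code): recovering a sensor's column index
# from a sequence of projected Gray-code frames.
# `samples` holds one light-sensor reading per frame; `threshold` separates
# "pattern bright" from "pattern dark".

def gray_to_binary(bits):
    """Convert a Gray-code bit list (MSB first) to an integer."""
    value = 0
    prev = 0
    for b in bits:
        prev ^= b          # Gray -> binary: XOR with the previous binary bit
        value = (value << 1) | prev
    return value

def decode_column(samples, threshold):
    """Threshold one reading per Gray-code frame into bits, then decode."""
    bits = [1 if s > threshold else 0 for s in samples]
    return gray_to_binary(bits)

# Ten frames encode columns 0..1023; these (made-up) readings decode to 300.
print(decode_column([0.1, 0.9, 0.9, 0.1, 0.9, 0.9, 0.9, 0.1, 0.9, 0.1], 0.5))
```

Projecting a second, row-encoding pattern set in the same way would yield the sensor's full (column, row) coordinate.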
The document describes a method for automatic projector calibration using embedded light sensors. The approach embeds light sensors into the target projection surface to detect gray code patterns projected by the projector. By analyzing the detected patterns with the sensors, the system can automatically calibrate the projector without any external cameras. The calibration process is fast, robust, accurate, and scalable to different projection surfaces and setups.
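The summary above does not spell out the fitting step, but once each embedded sensor has decoded the projector pixel illuminating it, one standard way to turn those sensor-to-projector correspondences into a calibration is to fit a planar homography with the direct linear transform. The sketch below is a hedged illustration with made-up point values, not the document's method.

```python
# Hedged sketch: fit a homography from known sensor positions on the surface
# to the projector pixels they decoded from the Gray-code patterns (DLT).
import numpy as np

def fit_homography(surface_pts, projector_pts):
    """surface_pts, projector_pts: (N, 2) arrays of corresponding points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(surface_pts, projector_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]          # normalize so H[2, 2] == 1

# Four sensors at known surface positions and their decoded projector pixels
# (illustrative values only).
surface = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
pixels = np.array([[102, 98], [805, 110], [790, 600], [95, 588]], dtype=float)
print(fit_homography(surface, pixels))
```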
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED... (ijcsit)
3D reconstruction is a technique used in computer vision that has a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, performing a 3D reconstruction required specialized hardware; such systems were often very expensive and were only available for industrial or research purposes. With the rise in availability of high-quality, low-cost 3D sensors, it is now possible to design inexpensive, complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the goal also included making the 3D scanning process fully automated by building and integrating a turntable alongside the software, so the user can perform a full 3D scan with a press of a few buttons in our dedicated graphical user interface. Three main steps take the data from acquired point clouds to the finished reconstructed 3D model. First, the system acquires point cloud data of a person or object using an inexpensive camera sensor. Second, it aligns and converts the acquired point cloud data into a watertight mesh of good quality. Third, it exports the reconstructed model to a 3D printer to obtain a proper 3D print of the model.
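For the turntable step described above, one simple way to merge scans (an assumption for illustration, not taken from the paper) is to rotate each captured cloud back by the known turntable angle before concatenating the clouds:

```python
# Illustrative sketch: undo the turntable rotation for each scan, then stack.
import numpy as np

def align_turntable_scans(scans, angles_deg):
    """scans: list of (N, 3) point arrays; angles_deg: turntable angle per scan."""
    merged = []
    for pts, angle in zip(scans, angles_deg):
        t = np.radians(angle)
        # Rotation about the vertical (z) axis that undoes the turntable motion.
        rz = np.array([[np.cos(-t), -np.sin(-t), 0],
                       [np.sin(-t),  np.cos(-t), 0],
                       [0,           0,          1]])
        merged.append(pts @ rz.T)
    return np.vstack(merged)
```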
This document describes a laser distance measurement system using a webcam. It consists of a laser transmitter and a webcam receiver. The laser beam is reflected off an object and the resulting dot is captured by the webcam; software calculates the distance by triangulation from the pixel location of the laser dot in the image. The system is calibrated with test measurements to determine the relationship between the pixel location of the laser dot and the actual distance, achieving an accuracy of about ±3 cm and staying within a few percent of error out to over 2 meters. Potential improvements discussed are using a laser line instead of a dot for more data points and a green laser for better visibility.
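A hedged sketch of the usual webcam/laser triangulation geometry follows; the calibration constants here are invented for illustration and are not from the document. With the laser mounted a known distance from the camera axis, the dot's pixel offset from the image center maps to an angle, and distance follows from D = h / tan(theta).

```python
import math

H_OFFSET_M = 0.06            # laser-to-camera-axis separation (assumed)
RAD_PER_PIXEL = 0.0011       # from calibration against known distances (assumed)
RAD_OFFSET = 0.02            # alignment correction (assumed)

def distance_from_pixel(pixel_offset_from_center):
    theta = pixel_offset_from_center * RAD_PER_PIXEL + RAD_OFFSET
    return H_OFFSET_M / math.tan(theta)

# A dot 40 px from the image center maps to a distance in meters.
print(round(distance_from_pixel(40), 3))
```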
A STUDY OF VARIATION OF NORMAL OF POLY-GONS CREATED BY POINT CLOUD DATA FOR A... (Tomohiro Fukuda)
This slide is presented in CAADRIA2011 (The 16th International Conference on Computer Aided Architectural Design Research in Asia).
Abstract: Acquiring current 3D spatial data of cities, buildings, and rooms rapidly and in detail has become indispensable. When the point cloud data of an object or space scanned by a 3D laser scanner is converted into polygons, the result is an accumulation of small polygons. When the object or space is a closed flat plane, it is necessary to merge the small polygons into one polygon to reduce the volume of data. In that case, the normal vectors of the small polygons should theoretically all have the same angle; in practice, however, these angles are not the same. The purpose of this study is therefore to clarify, from actual data, the variation of the angle within a group of small polygons that should become one polygon. The experiments show that the small polygons created from point cloud data scanned with the 3D laser scanner are not merged into one polygon, even when the group of small polygons lies in a single closed flat plane. When the standard deviation of the extracted number of polygons is assumed to be less than 100, the variation of the angle of the normal vectors is roughly 7 degrees.
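The quantity the study examines can be illustrated with a short calculation: the angle of each facet normal from the group's mean normal, and the spread of those angles across the facets. The sketch below uses synthetic near-coplanar facets and is purely illustrative.

```python
import numpy as np

def normal_angle_spread(normals):
    """normals: (N, 3) array of facet normals from a triangulated plane."""
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    mean_n = normals.mean(axis=0)
    mean_n /= np.linalg.norm(mean_n)
    angles = np.degrees(np.arccos(np.clip(normals @ mean_n, -1.0, 1.0)))
    return angles.mean(), angles.std()

# Near-coplanar facets with a little synthetic scanner noise.
rng = np.random.default_rng(0)
noisy = np.column_stack([rng.normal(0, 0.02, 200),
                         rng.normal(0, 0.02, 200),
                         np.ones(200)])
print(normal_angle_spread(noisy))
```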
Digital 3D imaging can be accelerated using advances in VLSI technology. High-resolution 3D images can be captured using laser-based vision systems, which produce 3D information insensitive to background illumination and surface texture. Complete images of featureless surfaces invisible to the human eye can be generated. Sensors for 3D digitization include position sensitive detectors and laser sensors. Continuous response position sensitive detectors provide precise centroid measurement while discrete response detectors are slower but more accurate. An integrated sensor architecture is proposed using a combination of these sensors to simultaneously measure color and 3D.
Enhanced Computer Vision with Microsoft Kinect Sensor: A Review (Abu Saleh Musa)
This document discusses enhanced computer vision capabilities using the Microsoft Kinect sensor. It covers preprocessing techniques, object tracking and recognition, human activity analysis, hand gesture analysis, indoor 3D mapping, and issues and future outlook. The document reviews Kinect's hardware, software tools, and performance, and techniques like depth data filtering, object detection and tracking, pose estimation, and sparse/dense point matching. It aims to provide an overview of research on using Kinect for applications in computer vision.
Sensors on 3D digitization seminar report (Vishnu Prasad)
The document discusses sensors for 3D digitization. It describes two main strategies for 3D vision - passive vision which analyzes ambient light, and active vision which structures light using techniques like laser range cameras. It then discusses an auto-synchronized scanner that can provide registered 3D surface maps and color data by scanning a laser spot across a scene and detecting the reflected light with a linear sensor, producing registered images with spatial and color information.
An Approach for Object and Scene Detection for Blind Peoples Using Vocal Vision (IJERA Editor)
This system helps blind people navigate without the help of a third person, so a blind person can carry out tasks independently. It is implemented on an Android device, on which object detection and scene detection are performed; after detection, text-to-speech conversion lets the user receive a spoken message through headphones connected to the device. The project helps blind people understand images, which are converted to sound with the help of a webcam. Images are captured in front of the blind person and processed by algorithms that enhance the image data. The hardware component has its own database, and the processed image is compared against it; the result of processing and comparison is converted into speech signals, and the headphones guide the blind person.
A review on automatic wavelet based nonlinear image enhancement for aerial ... (IAEME Publication)
This document summarizes an article from the International Journal of Electronics and Communication Engineering & Technology about improving aerial imagery through automatic wavelet-based nonlinear image enhancement. It discusses how aerial images often have low clarity due to atmospheric effects and the limited dynamic range of cameras. The proposed method uses wavelet-based dynamic range compression to enhance aerial images while preserving local contrast and tonal rendition. It applies techniques like nonlinear processing and selective enhancement based on the human visual system, and uses Gabor filters for high-pass filtering to generate an enhanced image. The results of applying this algorithm to various aerial images show strong robustness and improved image quality.
Availability of Mobile Augmented Reality System for Urban Landscape Simulation (Tomohiro Fukuda)
This slide is presented in CDVE2012 (The 9th International Conference on Cooperative Design, Visualization, and Engineering).
Abstract. This research presents the availability of a landscape simulation method for a mobile AR (Augmented Reality), comparing it with photo montage and VR (Virtual Reality) which are the main existing methods. After a pilot experiment with 28 subjects in Kobe city, a questionnaire about three landscape simulation methods was implemented. In the results of the questionnaire, the mobile AR method was well evaluated for reproducibility of a landscape, operability, and cost. An evaluation rated as better than equivalent was obtained in comparison with the existing methods. The suitability of mobile augmented reality for landscape simulation was found to be high.
1) The document presents a real-time static hand gesture recognition system for the Devanagari number system using two feature extraction techniques: Discrete Cosine Transform (DCT) and Edge Oriented Histogram (EOH).
2) The system captures an image using a webcam, performs pre-processing, extracts the region of interest, then extracts features using DCT or EOH before matching against a training database to recognize the gesture.
3) An experiment tested 20 images and found that DCT achieved higher recognition accuracy, correctly recognizing 18 gestures compared to 15 for EOH.
1. The document proposes a new method for shadow detection and removal in satellite images using image segmentation, shadow feature extraction, and inner-outer outline profile line (IOOPL) matching.
2. Key steps include segmenting the image, detecting suspected shadows, eliminating false shadows through analysis of color and spatial properties, extracting shadow boundaries, and obtaining homogeneous sections through IOOPL similarity matching to determine radiation parameters for shadow removal.
3. Experimental results showed the method could successfully detect shadows and remove them to improve image quality for applications like object classification and change detection.
With the rapid development of the smartphone industry, various positioning-enabled sensors such as GPS receivers, accelerometers, gyroscopes, digital compasses, cameras, WiFi and Bluetooth have been built into smartphones for communication, entertainment and location-based services. Smartphone users can get their locations fixed using the GPS receiver.
This document summarizes a research paper that proposes a method for lossless visible watermarking using spread spectrum techniques. The method embeds various types of visible watermarks (opaque, monochrome, translucent) into images using reversible one-to-one compound mappings of pixel values. This allows the original image to be recovered losslessly from the watermarked image. The watermark is spread across the image spectrum, providing security against attacks by making any changes imperceptible or requiring changes across the entire spectrum. Algorithms for the watermark embedding and removal processes are described.
Sift based arabic sign language recognition aecia 2014 – november 17-19, addis ... (Tarek Gaber)
SIFT-based Arabic Sign Language Recognition (ArSL) System
By
Alaa Tharwat1,3
And
Tarek Gaber2,3
1Faculty of Eng. Suez Canal University, Ismailia, Egypt
2Faculty of Computers & Informatics , Suez Canal University, Ismailia, Egypt
3Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net
Suez Canal University
Scientific Research Group in Egypt
Introduction: Why ArSL
Introduction: Aim of the work
What is ArSL?
Translating ArSL to spoken language, i.e. translate hand gestures to Arabic characters
Sign Language hand formations:
Hand shape
Hand location
Hand movement
Hand orientation
Introduction: Types of ArSL
Proposed Method: General Framework
Training phase
Collecting all training images (i.e. gestures of Arabic Sign Language).
Extracting the features using SIFT
Representing each image by one feature vector.
Applying dimensionality reduction (e.g., LDA) to reduce the number of features in the vector (a minimal sketch of this training pipeline is given after this list).
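A hedged sketch of the training pipeline is shown below; representing each image by the mean of its SIFT descriptors is our assumption, since the slides only state that each image becomes one feature vector.

```python
# Illustrative pipeline (not the authors' code): SIFT descriptors per image,
# one fixed-length vector per image (here the mean descriptor), LDA for
# dimensionality reduction, then a nearest-neighbour classifier.
import cv2
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

sift = cv2.SIFT_create()

def image_to_vector(gray_image):
    """Return one 128-D vector per image by averaging its SIFT descriptors."""
    _, descriptors = sift.detectAndCompute(gray_image, None)
    if descriptors is None:                 # no keypoints found
        return np.zeros(128, dtype=np.float32)
    return descriptors.mean(axis=0)

def train(train_images, train_labels):
    X = np.array([image_to_vector(img) for img in train_images])
    lda = LinearDiscriminantAnalysis().fit(X, train_labels)
    knn = KNeighborsClassifier(n_neighbors=1).fit(lda.transform(X), train_labels)
    return lda, knn

def predict(lda, knn, image):
    return knn.predict(lda.transform(image_to_vector(image).reshape(1, -1)))[0]
```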
Proposed Method: Feature Extraction
Proposed Method: Classification Techniques
We have used the following classifiers to assess their performance with our approach:
SVM is one of the classifiers that deals well with high-dimensional datasets and gives very good results.
K-NN: unknown patterns are distinguished based on their similarity to known samples.
Nearest Neighbor: its idea is extremely simple, as it does not require learning.
Experimental Results: Dataset
We have used 210 gray-level images of size 200x200.
These images represent 30 Arabic characters (7 images for each character).
The images were collected under different levels of illumination, rotation, quality, and partial occlusion.
Experimental Scenarios
To select the most suitable parameters.
To understand the effect of changing the number of training images.
To prove that our proposed method is robust against rotation
To prove that our proposed method is robust against occlusion.
Experimental Results
Conclusions
Our proposed approach for ArSL recognition:
Achieves excellent accuracy in identifying ArSL from 2D images.
Is robust to images rotated at different angles and to images occluded horizontally or vertically.
Compares favorably with many previous ArSL approaches.
Performance of this approach is measured by
Using captured images with Matlab implementation
Comparison with related work
Future Work
Improving the results in the case of image occlusion.
Increasing the size of the dataset to check scalability.
Identifying characters from video frames and then implementing a real-time ArSL system.
Thanks
Immersive 3D visualization of remote sensing data (sipij)
Immersive 3D Visualization is a Java application that allows users to view remote sensing data like aerial images in 3D. It uses Java 3D Technology and OpenGL to render 2D images into 3D by combining the image data with digital elevation models (DEM) that provide height information. This allows any region of interest to be viewed in 3D rather than just 2D images. The application can display 3D models of terrain, buildings, and vegetation to create an immersive 3D visualization of remote areas. It provides tools for simulation and fly-through of 3D environments built from remote sensing data.
Complex Weld Seam Detection Using Computer Vision Linked In (glenn_silvers)
This document discusses a project to use computer vision and a Microsoft Kinect sensor to enable real-time gesture control of a welding robot. The project aims to detect and track a user's hand gestures to control robot movement, and to define the weld seam region of interest to allow for seam detection. The plan involves accessing Kinect data, detecting and tracking the hand in 3D space, recognizing gestures for robot movement commands, extracting color values from the hand for skin detection, and using the hand position to define the seam region of interest. The work so far has successfully defined the hand and fingers, tracked hand motion, and extracted the seam region. Further work is needed to finalize the gesture commands and integrate control of the robot.
The document describes TagSense, a new approach to automatic image tagging using smartphones. TagSense leverages multiple sensors in smartphones like accelerometers, compasses, microphones and GPS to generate tags for images including who, what activity, and location. It works better than existing techniques like facial recognition since it does not rely on facial features. The system architecture has phones share a password to coordinate sensor data collection when a photo is taken to generate tags without human input. Preliminary tests show TagSense has higher recall than other auto-tagging systems but lower precision.
Kinect is a motion sensing input device for Xbox 360 that uses cameras and microphones to recognize gestures, body movements, and voice commands. It allows users to control and interact with their console without using a game controller through gestures, spoken commands, and body movements. The Kinect SDK provides both managed and unmanaged APIs for developers to access Kinect's RGB camera, depth sensor, skeletal tracking capabilities and audio processing for their applications. Microsoft Research played a key role in developing the underlying computer vision and machine learning techniques that enabled Kinect's capabilities.
The document describes a project to develop a tabletop touchscreen interface using a Kinect depth sensor. The researchers designed and built a table setup with a projected screen and mounted Kinect. They tested the Kinect's ability to track finger touches and gestures through programs that allowed for coloring, puzzles, zooming and image swiping. The Kinect was able to accurately detect and follow finger motion. This demonstrated the viability of using a depth sensor for a multi-user touch interface and suggested advantages over other touchscreen technologies.
Real-Time Map Building using Ultrasound Scanning (IRJET Journal)
This document describes a device that uses an ultrasonic sensor to build maps of environments in real-time. The device scans the sensor across a 180 degree range to collect distance data points and generates a 2D map visualization. It is intended to enable robots to autonomously map and navigate environments. The device was able to generate basic maps of test environments but had some inaccuracies, depicting flat surfaces as arcs rather than straight lines due to the wide beam of the ultrasonic sensor. Improving the sensor resolution could help address these inaccuracies. The real-time mapping ability, low cost, and small size make the device suitable for mobile robot applications.
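The mapping step described above reduces to a small amount of geometry: each sweep step yields a servo angle and an echo time, and the distance follows from half the out-and-back travel of sound. The sketch below is illustrative; the sweep resolution and echo times are made up.

```python
import math

SPEED_OF_SOUND = 343.0        # m/s at room temperature

def scan_point(angle_deg, echo_time_s):
    """Convert one (angle, echo time) reading into a 2D map point."""
    distance = SPEED_OF_SOUND * echo_time_s / 2.0     # out-and-back travel
    theta = math.radians(angle_deg)
    return (distance * math.cos(theta), distance * math.sin(theta))

# A 180-degree sweep in 10-degree steps with hypothetical echo times.
readings = [(a, 0.004 + 0.0001 * a) for a in range(0, 181, 10)]
points = [scan_point(a, t) for a, t in readings]
print(points[:3])
```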
Computer vision based human computer interaction using color detection techni... (Chetan Dhule)
This document describes a method for controlling a computer using hand and finger gestures detected through a webcam, without the need for specialized hardware or gesture recognition training. The method tracks color markers attached to fingers to detect finger motion in real-time and uses the motion to control the mouse pointer position and clicks. An application was created with a graphical user interface that allows setting the marker color and controls the mouse based on finger movements detected by calculating pixel value changes of the colored markers in video frames. The method provides a low-cost way to interact with a computer using natural hand gestures without lag compared to existing gesture recognition methods.
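A hedged sketch of the color-marker tracking idea follows; the marker's HSV range and the frame-to-screen mapping are assumptions rather than values from the paper.

```python
# Illustrative sketch: threshold the marker color, take the largest blob's
# centroid, and map it from camera coordinates to screen coordinates.
import cv2
import numpy as np

LOWER_HSV = np.array([40, 80, 80])      # assumed green-marker range
UPPER_HSV = np.array([80, 255, 255])

def marker_centroid(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

def to_screen(point, frame_size=(640, 480), screen_size=(1920, 1080)):
    x, y = point
    return x * screen_size[0] // frame_size[0], y * screen_size[1] // frame_size[1]
```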
Integration of a Structure from Motion into Virtual and Augmented Reality for... (Tomohiro Fukuda)
Proceedings (Full paper reviewed)
Tomohiro Fukuda, Hideki Nada, Haruo Adachi, Shunta Shimizu, Chikako Takei, Yusuke Sato, Nobuyoshi Yabuki, and Ali Motamedi: 2017, Integration of a Structure from Motion into Virtual and Augmented Reality for Architectural and Urban Simulation: Demonstrated in Real Architectural and Urban Projects, Future Trajectories of Computation in Design: 17th International Conference CAAD Futures 2017, p.596, 2017.7
Book (Book contribution)
Tomohiro Fukuda, Hideki Nada, Haruo Adachi, Shunta Shimizu, Chikako Takei, Yusuke Sato, Nobuyoshi Yabuki, and Ali Motamedi: 2017, Integration of a Structure from Motion into Virtual and Augmented Reality for Architectural and Urban Simulation: Demonstrated in Real Architectural and Urban Projects, Computer-Aided Architectural Design - Future Trajectories, pp.60-77, Springer (Communications in Computer and Information Science 724), ISSN 1865-0929, ISBN 978-981-10-5196-8, 2017.7
Computational visual simulations are extremely useful and powerful tools for decision-making. The use of virtual and augmented reality (VR/AR) has become a common phenomenon due to real-time and interactive visual simulation tools in architectural and urban design studies and presentations. In this study, a demonstration is performed to integrate structure from motion (SfM) into VR and AR. A 3D modeling method is explored by SfM under real-time rendering as a solution for the modeling cost in large-scale VR. The study examines the application of camera parameters of SfM to realize an appropriate registration and tracking accuracy in marker-less AR to visualize full-scale design projects on a planned construction site. The proposed approach is applied to plural real architectural and urban design projects, and results indicate the feasibility and effectiveness of the proposed approach.
This document summarizes a research project that developed an interactive display system using image feedback localization techniques. The system uses an infrared LED or laser pointer as an input device on projected surfaces to replace a mouse or keyboard. A projector is used for display, and a camera captures images to localize the infrared pointer and track user movement. Several applications are demonstrated, including using the pointer on projected surfaces like tables or car windows to interact with documents, games, and more. The system aims to provide an intuitive, low-cost interactive display environment.
DESIGN AND DEVELOPMENT OF WRIST-TILT BASED PC CURSOR CONTROL USING ACCELEROMETER (IJCSEA Journal)
Human-computer interfacing apparatus is a key part of modern electronics. Motion recognition can be introduced into present-day computers, for example to play games. In this work, a simple inertial navigation sensor such as an accelerometer is used to obtain the dynamic or static acceleration profile of a movement in order to move the mouse cursor or even rotate a 3D object. This paper presents a human-computer interface system that acts as an enhanced version of one of the most common interfacing devices, the computer mouse. It offers an alternative way to interact with a computer for those who do not want to use, or are unable to use, a conventional HCI (human-computer interface); this is achieved with an accelerometer mounted on the wrist or elsewhere on the body. The accelerometer detects position changes in the x and y directions caused by movement of the device mounted on the wrist, referenced to the acceleration of gravity (1 g = 9.8 m/s2). The accelerometer is connected to a PIC16F877A for analog-to-digital conversion; the PIC16F877A is connected to an LCD that displays the (x, y) coordinates of the accelerometer's movement, and to a ZigBee transmitter that sends a wireless signal to a ZigBee receiver. The ZigBee receiver passes the received signal to a PC through MAX232 serial communication. On the PC, an application controls the cursor in response to the accelerometer movement.
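On the PC side, a tilt-to-cursor mapping of roughly the following form could drive the cursor; the gain and deadband values below are assumptions, not the paper's parameters.

```python
# Illustrative sketch: estimate tilt from the static gravity components of
# the accelerometer reading and scale it into cursor deltas.
import math

GAIN = 8.0          # pixels per degree of tilt (assumed)
DEADBAND_DEG = 2.0  # ignore small tilts so the cursor can rest (assumed)

def tilt_to_cursor_delta(ax, ay, az):
    """ax, ay, az: accelerometer readings in g while roughly static."""
    roll = math.degrees(math.atan2(ay, az))                       # left/right tilt
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))     # forward/back tilt
    dx = GAIN * roll if abs(roll) > DEADBAND_DEG else 0.0
    dy = GAIN * pitch if abs(pitch) > DEADBAND_DEG else 0.0
    return dx, dy

print(tilt_to_cursor_delta(0.05, 0.20, 0.97))
```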
Development of Sign Signal Translation System Based on Altera's FPGA DE2 Board (Waqas Tariq)
The main aim of this paper is to build a system capable of detecting and recognizing hand gestures in an image captured using a camera. The system is built on Altera's FPGA DE2 board, which contains a Nios II soft-core processor. Image processing techniques and a simple but effective algorithm are implemented to achieve this purpose: the image processing techniques smooth the image to ease the subsequent steps of translating the hand sign signal, the algorithm translates numerical hand sign signals, and the results are displayed on the seven-segment display. Altera's Quartus II, SOPC Builder and Nios II EDS software are used to construct the system. With SOPC Builder, the related components on the DE2 board can be interconnected easily and in an orderly way, compared to the traditional method, which requires lengthy source code and is time-consuming. Quartus II is used to compile and download the design to the DE2 board. Then, under Nios II EDS, the C programming language is used to code the hand sign translation algorithm. Being able to recognize hand sign signals from images can help humans control a robot and other applications that require only a simple set of instructions, provided a CMOS sensor is included in the system.
Real Time Head & Hand Tracking Using 2.5D Data (Harin Veera)
This document provides an overview of installing Linux on the Mini 6410 single board computer. It discusses the necessary components, which include the bootloader, kernel, and root file system. The Supervivi bootloader and Linux kernel are used. The root file system is the Qtopia root file system specific to the Mini 6410. These components are loaded onto the NOR flash of the Mini 6410 board to enable it to run Linux. Additionally, application programs can be loaded onto the NAND flash. The DNW tool is used to load the components via USB. Once installed, the Mini 6410 can run applications based on the program loaded in NAND flash. The document also provides details about the hardware specifications of the Mini 6410.
The document discusses various advanced interaction techniques including touch, gestures, eye tracking, head and pose tracking, speech, and brain-body interfaces. It provides information on multitouch interaction technologies like capacitive touch screens, resistive touch screens, surface acoustic wave touch screens, infrared touch screens, and optical touch screens. It also discusses gesture recognition, eye gaze systems, and speech recognition technologies. The document outlines the components, working principles, applications and future scope of these human-computer interaction techniques.
This document summarizes an interactive touch board that uses an infrared camera and infrared stylus. It can turn any projected display into an interactive surface. The system uses a low-cost infrared camera to detect the position of an infrared light from the stylus tip. An image processing algorithm analyzes the camera image to determine the stylus coordinates and move the mouse cursor accordingly. The algorithm was implemented using NI LabVIEW. Experimental results found average accuracy of 98.9% and latency of 0.28 seconds at a resolution of 800x600 pixels. This low-cost design could enable interactive whiteboard applications in education.
Project presentation on mouse simulation using finger tip detection (Sumit Varshney)
This project presentation describes a virtual mouse interface using finger tip detection. A group of 3 students will design a vision-based mouse that detects hand gestures to control cursor movement and clicks instead of using a physical mouse. The system will use a webcam to capture finger tip motion and apply image processing algorithms like segmentation, denoising, and convex hull analysis to identify gestures and control mouse functions accordingly. The goal is to allow gesture-based computer interaction for applications like presentations to reduce workspace needs.
VIDEO BASED SIGN LANGUAGE RECOGNITION USING CNN-LSTM (IRJET Journal)
This document presents a proposed method for video-based sign language recognition using convolutional neural networks (CNN) and long short-term memory (LSTM). The method uses CNN to extract spatial features from video frames of sign language and LSTM to analyze the temporal characteristics of the frames to recognize the sign. Color segmentation is used to isolate the hands from video frames by detecting colored gloves worn by the signer. CNN is trained on spatial features from frames to classify signs, and LSTM is used to analyze the sequential features from CNN to recognize signs in full videos. The proposed method achieved 94% accuracy on sign recognition in testing.
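A hedged sketch of a CNN+LSTM video classifier of the kind described is given below; the layer sizes, frame count, and input resolution are illustrative and not taken from the paper.

```python
# Illustrative model: a small CNN applied to every frame via TimeDistributed
# to extract spatial features, an LSTM over the frame sequence for temporal
# features, and a softmax over sign classes.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS, NUM_CLASSES = 30, 64, 64, 3, 10

frame_cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(HEIGHT, WIDTH, CHANNELS)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])

model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```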
Hand LightWeightNet: an optimized hand pose estimation for interactive mobile... (IJECEIAES)
In this paper, a hand pose estimation method is introduced that combines MobileNetV3 and CrossInfoNet into a single pipeline. The proposed approach is tailored for mobile phone processors through optimizations, modifications, and enhancements made to both architectures, resulting in a lightweight solution. MobileNetV3 provides the bottleneck for feature extraction and refinements while CrossInfoNet benefits the proposed system through a multitask information sharing mechanism. In the feature extraction stage, we utilized an inverted residual block that achieves a balance between accuracy and efficiency in limited parameters. Additionally, in the feature refinement stage, we incorporated a new best-performing activation function called “activate or not” ACON, which demonstrated stability and superior performance in learning linearly and non-linearly gates of the whole activation area of the network by setting hyperparameters to switch between active and inactive states. As a result, our network operated with 65% reduced parameters, but improved speed by 39% which is suitable for running in a mobile device processor. During experiment, we conducted test evaluation on three hand pose datasets to assess the generalization capacity of our system. On all the tested datasets, the proposed approach demonstrates consistently higher performance while using significantly fewer parameters than existing methods. This indicates that the proposed system has the potential to enable new hand pose estimation applications such as virtual reality, augmented reality and sign language recognition on mobile devices.
A Gesture Based Digital Art with Colour Coherence Vector Algorithm (IRJET Journal)
This document describes a gesture-based digital art system using a color coherence vector algorithm. The system uses gesture recognition to track colored markers on a user's fingers to allow drawing in the air. A Raspberry Pi processes images from a webcam to recognize gestures and display corresponding drawings on an LCD screen in real-time. The system is intended to provide a convenient way to create digital art wirelessly using natural hand gestures.
This document discusses multi-touch technology. It provides an overview of the history of multi-touch from 1982 onwards. It describes different types of multi-touch technologies including capacitive, resistive, optical and infrared. It outlines common multi-touch gestures and applications. It then focuses on frustrated total internal reflection (FTIR) as a new approach, describing how it works and its advantages over other methods like high resolution, low cost and scalability. It concludes that touch screens are the interface for the 21st century and FTIR provides a simple method for multi-touch displays.
MOUSE SIMULATION USING NON MAXIMUM SUPPRESSIONIRJET Journal
This document describes a system to simulate mouse functions using hand gestures detected by a webcam. The system uses Single Shot Multi-box Detection (SSD) and Non-Maximum Suppression (NMS) algorithms to accurately detect hand positions from images. SSD uses anchor boxes to detect objects within a divided image grid, while NMS eliminates overlapping bounding boxes to identify distinct objects. The system maps detected hand landmarks to mouse cursor positions and can perform functions like dragging and zooming. It aims to enable human-computer interaction without additional hardware by interpreting gestures captured from a webcam.
Sign Language Recognition using Machine LearningIRJET Journal
This document describes a study on sign language recognition using machine learning. The researchers developed a convolutional neural network model to detect hand movements and classify them as letters of the alphabet from sign language. They used a dataset of images of American Sign Language letters and trained their CNN model on this data. Their model was able to accurately recognize the letters in real-time using input from a webcam. The document also discusses using background subtraction and other techniques to improve the model's performance at sign language recognition.
Design of Image Projection Using Combined Approach for TrackingIJMER
Over the years, the techniques and methods used to interact with computers have evolved significantly. From the primitive use of punch cards to the latest touch screen panels, we can see the vast improvement in interaction with the system. There are many new projection and interaction technologies that can reshape our perception and interaction methodologies, and projection technology is very useful for creating various geometric displays. In earlier generations, projector technology was used for projecting images and videos on a single screen, using a large and bulky setup. To overcome these limitations we are designing "Wireless Image Projection Tracking", a system that uses IR (infrared) technology to track the body in the IR range and uses its movements for image orientation and manipulations like zoom, tilt/rotate, and scale. We present a method of mapping IR light source position and orientation to an image. Using this system we can also track single and multiple IR light source positions, and it can be used effectively to view the image projection in 3D. Extensions of this technology can further be useful for future tracking capabilities to implement the touch screen feature for commercial applications.
Media Control Using Hand Gesture MomentsIRJET Journal
This document discusses a system for controlling media players using hand gestures. The system uses a webcam to capture images of hand gestures. It then uses neural networks trained on large gesture datasets to recognize the gestures. The recognized gestures can control functions of a media player like increasing/decreasing volume, playing, pausing, rewinding and forwarding. The system achieves recognition rates of 90-95% for different gestures. It provides a more natural user interface than keyboards and mice by allowing control through hand movements.
13 9246 it implementation of cloud connected (edit ari)IAESIJEECS
This document describes a proposed system for a smart cloud connected robot that can sense environmental data like temperature and motion using sensors. The robot is controlled by a microcontroller and communicates sensor readings to a private cloud server setup using OpenStack. The cloud server stores the collected sensor data, which can be viewed through a mobile app. This allows environmental changes in a location to be monitored. The robot has been prototyped to demonstrate picking and placing objects and sending sensor readings to the cloud to be analyzed and stored.
Controlling Computer using Hand GesturesIRJET Journal
This document describes a research project on controlling a computer using hand gestures. The researchers created a real-time gesture recognition system using convolutional neural networks (CNNs). They developed a dataset of 3000 training images of 10 different hand gestures for tasks like opening apps. A CNN model was trained to detect hands in images and recognize gestures. The model achieved 80.4% validation accuracy and was able to successfully perform operations like opening WhatsApp, PowerPoint and other apps based on detected gestures in real-time. The system provides a cost-effective and contactless way of interacting with computers using hand gestures only.
Hand gesture recognition has become a popular topic of deep learning and provides many application fields for bridging the human-computer barrier and has a positive impact on our daily life. The primary idea of our project is a static gesture acquisition from depth camera and to process the input images to train the deep convolutional neural network pre-trained on ImageNet dataset. The proposed system consists of gesture capture device (Intel® RealSense™ depth camera D435), pre-processing and image segmentation algorithms, feature extraction algorithm and object classification. For preprocessing and image segmentation algorithms computer vision methods from the OpenCV and Intel RealSense libraries are used. The subsystem for features extracting and gestures classification is based on the modified VGG16 by using the TensorFlow & Keras deep learning framework. Performance of the static gestures recognition system is evaluated using machine learning metrics. Experimental results show that the proposed model, trained on a database of 2000 images, provides high recognition accuracy both at the training and testing stages.
Interactive full body motion capture using infrared sensor networkijcga
Traditional motion capture (mocap) has been well-studied in visual science for the last decades. However, the field is mostly about capturing precise animation to be used in specific applications after intensive post-processing, such as studying biomechanics or rigging models in movies. These data sets are normally captured in complex laboratory environments with sophisticated equipment, thus making motion capture a field that is mostly exclusive to professional animators. In addition, obtrusive sensors must be attached to actors and calibrated within the capturing system, resulting in limited and unnatural motion. In recent years the rise of computer vision and interactive entertainment opened the gate for a different type of motion capture which focuses on producing optical markerless or mechanical sensorless motion capture. Furthermore, a wide array of low-cost devices have been released that are easy to use for less mission-critical applications. This paper describes a new technique of using multiple infrared devices to process data from multiple infrared sensors to enhance the flexibility and accuracy of markerless mocap using commodity devices such as Kinect. The method involves analyzing each individual sensor's data, then decomposing and rebuilding them into a uniform skeleton across all sensors. We then assign criteria to define the confidence level of the captured signal from each sensor. Each sensor operates in its own process and communicates through MPI. Our method emphasizes the need for minimum calculation overhead for better real-time performance while being able to maintain good scalability.
Lift: Using Projected Coded Light for Finger Tracking and Device Augmentation
Shang Ma¹,², Qiong Liu², Chelhwon Kim², Phillip Sheu¹
¹EECS, UC Irvine, Irvine, California, {shangm, psheu}@uci.edu
²FX Palo Alto Laboratory, Palo Alto, California, {liu, kim}@fxpal.com
Abstract—We present Lift, a visible light-enabled finger
tracking and object localization technique that allows users to
perform freestyle multi-touch gestures on any object’s surface in
an everyday environment. By projecting encoded visible patterns
onto an object’s surface (e.g. paper, display, or table), and
localizing the user’s fingers with light sensors, Lift offers users a
richer interactive space than the device’s existing interfaces.
Additionally, everyday objects can be augmented by attaching
sensor units onto their surface to accept multi-touch gesture
input. We also present two applications as proof of concept.
Finally, results from our experiments indicate that Lift can
localize ten fingers simultaneously with an average accuracy of
1.7 millimeter and an average refresh rate of 84 Hz with 31
milliseconds delay on WiFi and 23 milliseconds delay on serial
communication, making gesture recognition on non-
instrumented objects possible.
Keywords—Coded light; projector-based tracking; finger
tracking; device augmentation
I. INTRODUCTION
Multi-touch gesture-based interaction on an instrumented
surface is well suited for a variety of applications because of its
simplicity and intuitiveness. Most of the laptops currently on
the market have inbuilt touchpads providing a user-friendly
gesture interface. For desktop computers, products like
Logitech K400 [1] and T650 [2] are also available to allow
multi-touch interaction besides the traditional mouse and
keyboard interface. These examples and the popularity of the
touchscreen-enabled smartphone demonstrate the usability and
acceptability of such an interface among users. In this work,
we present Lift, a visible light-based finger tracking technique
that employs an off-the-shelf projector and light sensors to
enable direct multi-touch gesture control on any device’s
surface. Fig. 1 presents a simple scenario where a user directly
performs freestyle drawing on an uninstrumented table using
ten fingers.
Lift is implemented by embedding location information
into visible patterns and projecting these patterns onto a target
surface. Light sensors are attached on both a device’s surface
and the user’s fingers for localization and gesture tracking. In
doing so, Lift not only removes the need for a dedicated
touchscreen on the device or an external touchpad for gesture
interaction, but it also enables users to have direct control on
the device’s surface, whether it is originally instrumented or
not. More importantly, Lift features the richness of interaction
space and simplicity in physical size, hardware complexity,
and computation load, meaning Lift is compact and easy-to-use
while allowing users to interact with any device. This is made
possible by using a projection-based interaction method and
encoding position information into each pixel of the projection
area. The perspective of the projection-based encoded position
is desirable for sensing, positioning, and processing in a variety
of scenarios where computation resources are limited. Because
all pixels in the projection area have been encoded with
location information and the only step needed to restore the
corresponding position is to decode a sequence of sensor
readings, Lift can locate and track light sensors without the
need of heavy computation and therefore enable a fast response
in finger tracking and gesture operation in many applications.
This advantage sets it apart from all existing approaches.
II. RELATED WORK
A. Finger Tracking
Finger tracking has been explored by a number of previous
research projects. In particular, Kramer's glove [3] was one of
the first to track a user’s finger postures using a strain gauge
attached to a glove. However, the position of the strain gauge
on the glove may shift when the hand changes its pose, which
could reduce tracking accuracy.
Visual tracking approaches, in which the camera is either
the only or the main sensor, are generally considered to be a
dominating technique covering a wide field of applications [4,
5, 6, 7]. The main disadvantages of this technique include the
significant power consumption and being subject to various
factors, such as camera resolution, lighting condition, and the
reflective properties of target surfaces. All of these add a
substantial amount of infrastructure, cost, and complexity to achieve the desired effect and can be challenging to implement in practice.
Fig. 1. (a) 1: A DLP projector behind the ceiling; 2: An office table as the interaction space. (b) The user performed freestyle drawing on a non-instrumented table, and the traces of all fingers were recorded.
Sensor-based approaches can also provide a greater range
of data and create good gesture recognition systems. For
example, infrared proximity sensor [8], magnetic sensing [9],
electric field [10], acoustics [11], ultrasound imaging [12],
wrist-worn camera + laser projector [13], ultra-wideband signal
[14], muscle and tendon sensing [15], and data glove [16] have
all been explored in previous studies. However, some of these
systems operate in a relatively short range, and others only
support a small set of hand gestures. In contrast, Lift supports
fine-grained continuous tracking in a much larger physical
space.
B. Encoded Projection
Encoded projection is used in a variety of applications,
including projection calibration [17], tracking movable
surfaces/cars [18, 19, 20], GUI adaptation and device control
[21], and ambient intelligence [22]. These works employ
visible pattern projection through a Digital Light Processing
(DLP) projector to transmit location information and restore
original position data by sensing light intensity via light
sensors. This mechanism leverages one important property of a
DLP projector: the fast-flipping tiny micro mirrors inside can
be used to encode and modulate the projected light.
However, none of these technologies is fast enough to track
multiple fast-moving objects. Lee et al. [18] demonstrated that
an update rate of 11.8 Hz was achieved when 10 patterns were
used to resolve a 32×32 unit grid. Summet et al. [20] projected
7.5 packets per second in their system, which is “just enough to
track slow hand motions, approximately 12.8 cm/sec when 1.5
m from the projector”. Instead, Lift has achieved a tracking
speed of 84 Hz for fingers moving at a speed of 46.1 cm/sec.
Ma et al. [19] used an Android phone to decode received light
signals increasing the refresh rate to 80 Hz. However, in their
system, only one light sensor can be decoded at a time. In
contrast, our implementation has achieved an average update
rate of 84 Hz while tracking ten light sensors simultaneously.
This is a significant improvement over previous works. [23]
and [24] did achieve a higher system update rate, but they
required a specially made beamer for one bit of spatial
resolution. Lift, however, only needs a single off-the-shelf
projector for all 1,039,680 (1140×912) pixels.
III. PROTOTYPE IMPLEMENTATION
To assess the feasibility of our tracking approach, and to
experiment with different interaction applications, we
constructed a prototype platform (Fig. 1 & 2). Our setup
consists of four components: (1) an off-the-shelf DLP projector
from Texas Instruments [25] that can project gray-coded binary
patterns onto the target surface with a projection area of 1380
mm × 860 mm at a distance of 2 m, (2) tiny sensors which will
be attached to a user’s ten fingers and the surface of a target
device, (3) two Arduino MKR1000 boards [26] with batteries,
one for each hand, and (4) two custom printed circuit boards
for basic signal conditioning.
Gray-coded patterns are used to encode every pixel in the
projection area and to provide the unique coordinates so that
when a light sensor detects the presence or absence of the
projected light, a bit sequence representing the coordinates of
this specific pixel can be received and decoded, retrieving the x
and y pixel coordinates. Because projecting images into the
environment is the inbuilt function of our projector, Lift does
not require any augmentation on the projector itself.
Additionally, communication in Lift currently takes place in a
unidirectional fashion (from the projector to light sensors), and
position decoding is performed on the Arduino board.
Therefore, Lift does not need central infrastructure for heavy
computation. This simplicity allows us to maintain a
minimalist system design, while still being able to provide a
rich interaction space on different object surfaces. Finally, the
two Arduino boards used in the current design have inbuilt
WiFi communication modules and therefore make Lift suitable
for mobile applications.
A. Projector & Encoded Patterns
Our current projector has a native resolution of 1140×912
pixels and a projection frequency of 4 kHz. Given that gray codes have an O(log₂ n) relationship between the number of required patterns and the total number of pixels, n, inside the projection area, 21 (⌈log₂ 912⌉ + ⌈log₂ 1140⌉ = 10 + 11) gray-coded images
are needed to uniquely locate each pixel of our interaction
space. Fig. 3 demonstrates an example in which 3 horizontal
(left three) and 3 vertical patterns (right three) are used to
resolve an 8×8 unit grid.
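To make the pattern generation concrete, the following C++ sketch (our illustration under the assumptions above, not the authors' code; all names are ours) computes the binary-reflected gray code for each column and extracts the bit plane that one projected column pattern would display; the row patterns are produced the same way along the other axis.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: build the k-th gray-coded column pattern for a projector that is
// `width` pixels wide. Each column x gets the k-th most significant bit of
// gray(x); projecting all ceil(log2(width)) column patterns plus the
// analogous row patterns gives every pixel a unique bit sequence.
static std::vector<uint8_t> grayColumnPattern(int width, int k, int numBits) {
    std::vector<uint8_t> row(width);
    for (int x = 0; x < width; ++x) {
        uint32_t gray = static_cast<uint32_t>(x) ^ (static_cast<uint32_t>(x) >> 1);  // binary-reflected gray code
        row[x] = static_cast<uint8_t>((gray >> (numBits - 1 - k)) & 1u);             // 1 = white, 0 = black
    }
    return row;
}

int main() {
    const int width = 1140;  // horizontal projector resolution from the paper
    const int numBits = static_cast<int>(std::ceil(std::log2(width)));  // 11 column patterns
    for (int k = 0; k < numBits; ++k) {
        std::vector<uint8_t> pattern = grayColumnPattern(width, k, numBits);
        (void)pattern;  // a real system would replicate this row into a full projected frame
    }
    return 0;
}
```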
B. Encoding & Decoding Scheme
In previous work [17, 18, 20], the absolute light intensity
value of light sensors is used to determine the bit value for each
projection frame. This design is subject to ambient lighting, the
variance of the light sensors, and the analog-to-digital
converter inside the microcontroller on which the decoding
software is executed. Instead, we use Manchester coding [27]
to transmit the data patterns. That is, each of our 21 gray-coded
images is projected onto the target device followed by its
reverse pattern. This removes the dependency on the DC
voltage of the received light intensity, which can be influenced
severely by lighting conditions and light sensor reception
efficiency. However, this doubles the necessary number of
patterns for location discovery from 21 to 42. In addition, we
add 5 framing bits at the beginning of each packet, allowing
sensor units to be synchronized with the data source, further
increasing the necessary patterns to 47.
Fig. 2. (a) The user wears ten sensors on his fingers. (b) Electronics components: light sensors on rings, Arduino and signal conditioning boards, and batteries. (c) Light sensor units on a target device's surface.
Fig. 3. An example of gray code images for an 8×8 grid.
Unsurprisingly, this encoding scheme also helps with the flickering issue of visible
light communication, which is usually due to long runs of
black or white and may cause serious detrimental physiological
changes in humans if the fluctuation in the brightness of the
light is perceivable to human eyes [28]. In Lift, each black or
white bit is followed by its reverse bit, and the number of white bits in a packet is only one more than the number of black bits.
This design frees the system from fluctuation so that users
always see the projection area as a static image without
flickering (Fig. 1a).
In Lift, two Arduino MKR1000 boards are used to decode
the received light signals. More specifically, five digital I/O
pins on each board are configured as input to evaluate the
voltage level using a threshold approach. The firmware running
on the microcontroller keeps collecting the light intensity every
250 microseconds and restoring the original position data for
all fingers simultaneously.
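As a rough illustration of this decoding step, the sketch below is ours, not the firmware that runs on the MKR1000. It assumes each light sensor yields a pair of intensity samples per encoded bit, one for the pattern and one for its inverse; the bit is decided by comparing the pair, which is what removes the dependence on the ambient DC level, and the recovered gray-code word is then converted back to a binary pixel coordinate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: each encoded bit arrives as a pattern frame followed by its
// inverted frame, so the bit can be decided by comparing the two intensity
// samples of the pair instead of thresholding against a fixed DC level.
std::vector<uint8_t> decodeManchesterPairs(const std::vector<uint16_t>& samples) {
    std::vector<uint8_t> bits;
    for (std::size_t i = 0; i + 1 < samples.size(); i += 2) {
        // samples[i]   : intensity while the gray-code pattern is shown
        // samples[i+1] : intensity while its inverse is shown
        bits.push_back(samples[i] > samples[i + 1] ? 1 : 0);
    }
    return bits;
}

// Convert a recovered binary-reflected gray-code word back to a plain binary
// pixel coordinate (x or y), the last step before applying the homography.
uint32_t grayToBinary(uint32_t gray) {
    for (uint32_t shift = 1; shift < 32; shift <<= 1) {
        gray ^= gray >> shift;
    }
    return gray;
}
```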
C. Projector-Light Sensor Homography
Lift is designed to provide fine-grained 2D physical
coordinates of a user’s fingers on an uninstrumented surface.
To obtain physical coordinates of a user’s fingers, the position,
orientation, and optical parameters of the projector relative to
the interaction surface should be known. Consider that there
exists a point (x’, y’) on the DMD (Digital Micromirror
Device) mirror array inside the projector, and it is projected to
a point on a projection screen, for instance a flat table, with
physical coordinates (x, y) (in mm). (Here we assume that the
origin on the table is already known.) The relationship between
(x’, y’) and (x, y) is determined by a planar projective
transformation whose parameters further depend on the
position and orientation of the projector relative to the table,
and the optical parameters of the lens inside the projector.
In Lift, (x’, y’) coordinates can be observed and decoded by
a light sensor while (x, y) can be measured relative to a user-
defined origin. Therefore, the perspective transformation
between the projector plane and the table can be calculated
with the following steps: (1) marking a certain number of
points in the projection area, (2) measuring the distances
between these points and the origin of the physical plane,
which, in our case, is the center of an office table shown in Fig.
4, and (3) collecting the pixel coordinates of these points using
a light sensor. The inbuilt cv::findHomography function in
OpenCV can be used to find the transformation matrix, and this
result could be applied to future sensor readings to obtain the
Euclidean coordinates of a projected point.
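A minimal OpenCV sketch of this calibration step is shown below. The point values are hypothetical and only four correspondences are used for brevity (the study described later uses 64); cv::findHomography is the call named above, while applying the resulting matrix with cv::perspectiveTransform is our choice for illustration.

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

int main() {
    // Decoded projector-plane coordinates (x', y') observed by a light sensor
    // at four marked points (values are hypothetical, for illustration only).
    std::vector<cv::Point2f> projectorPts = {
        {100.f, 120.f}, {1000.f, 118.f}, {1010.f, 800.f}, {95.f, 790.f}};
    // The same four points measured in millimeters relative to the table origin.
    std::vector<cv::Point2f> tablePts = {
        {-600.f, -400.f}, {600.f, -400.f}, {600.f, 400.f}, {-600.f, 400.f}};

    // Planar projective transformation from the projector plane to the table.
    cv::Mat H = cv::findHomography(projectorPts, tablePts);

    // Apply the matrix to a later sensor reading to get table coordinates (mm).
    std::vector<cv::Point2f> sensorReading = {{512.f, 430.f}};
    std::vector<cv::Point2f> tableCoord;
    cv::perspectiveTransform(sensorReading, tableCoord, H);
    return 0;
}
```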
IV. EVALUATION & RESULTS
A projector-based finger tracking system offers the
capability to directly compute the location, moving speed and
orientation of ten fingers in a large interaction space. These
quantities can be used in a variety of applications, such as
gaming control, augmented interaction space, and more. Here,
we evaluate several parameters in the system, including: (1)
tracking accuracy, (2) tracking resolution, (3) communication
delay, (4) system refresh rate, and (5) system robustness under
different lighting conditions.
A. Tracking Accuracy
1) Study Design for Discrete Points
To evaluate the position estimation accuracy of our
proposed system for different points, a test arena was
developed as shown in Fig. 4a. In this experiment, the projector
was installed behind the ceiling, and a flat office table was used
as an interaction space. The distance between the projector and
the table is 2 meters, and the projection area on the table is
1380 mm × 860 mm (54.3 inch × 33.85 inch). A 48×36 inch
cutting mat with 0.5 inch grid was used to provide high-
accuracy ground truth data. Since the cutting mat does not
cover the whole projection area, we performed our experiment
using points inside the cutting mat area.
Fig. 4 shows that a total of 64¹ points (black dots at the center of red circles) were marked for data collection, and they were
selected to be uniformly distributed in the projection area. A
sensor unit was placed at these points to collect pixel
coordinates. Their physical distances with regard to the center
were also measured and the homography was calculated.
We also collected the pixel locations of another 56 points,
which were different from the 64 points from the first step, and
applied the homography to them. The physical coordinates of
these 56 points relative to the center were measured as ground
truth. We compared the difference of these two quantities for
all 56 points.
2) Results
Fig. 5 shows the results of tracking errors at different points
in the projection space. The difference is defined as the
Euclidean distance between the computed coordinates from
Lift and the ground truth. An average error of 1.707 mm for the
entire projection space was reported, and the standard deviation
was 0.918 mm. This demonstrates that Lift achieves its goal of
millimeter-level finger tracking in practice.
¹ Lift achieves an average accuracy of 1.761 mm if 16 points are used. About ten seconds is needed for one point during this calibration process.
Fig. 4. (a) A test arena for calculating the homography. (b) Tracking the user's gestures using both an Android tablet and Lift.
Fig. 5. Tracking error of all 56 discrete points.
3) Study Design for Continuous Gestures
We also evaluated the tracking accuracy for continuous
gestures, as shown in Fig. 4b. We divided the projection area
into 4 × 4 grids, where each one has the size of 267 × 178 mm.
At the center of each grid, we placed an Android tablet
(Google Nexus 9) and asked participants to draw freely using
one of their fingers on the tablet. The screen size of the tablet is
about 180 × 134 mm and the resolution is 2048 × 1536. A
program was developed to obtain the pixel coordinates when
participants made a touch gesture on the screen and to draw the
traces of these touch points as well. Since this location data
represents pixel coordinates on the screen, we converted this
quantity into their corresponding physical locations (in mm) by
scaling the pixel value with the number of pixels per millimeter
and also offset the result with the distance between the origin
on the table and the origin on the tablet, which is the top left
corner in our design. The resulting locations were used as the
ground truth data for this experiment. Simultaneously, the
physical location of finger movements collected by Lift was
also calculated based on sensor readings and the homography.
We also noticed that there is a certain offset between the
position of sensor units on fingers and the fingertips for touch
interaction on the tablet. Therefore, we offset all the data points
from Lift with the distance between the first point of touch
interaction and the first point of data collection from Lift. Here,
we assume that this offset is consistent during the entire motion
of a gesture once the gesture starts and we used the physical
coordinates of these two points to calculate their distance.
Six participants were invited for this experiment. Three of these participants are female, and the other three are male; none of them were provided with any monetary benefits. Each participant was asked to draw 3 traces in a grid, meaning 192 trials for each participant and 1152 trials in total.
4) Results
To compute the tracking error for continuous gestures, we
calculated the average least perpendicular distance of each
point collected by touch events on the tablet to the
corresponding trajectory formed by the outputs from Lift, and
we performed this measurement for each grid by averaging
across trials. Fig. 6 shows the average tracking error for each
grid. The average continuous tracking error for the whole
interaction area is about 1.631 mm, the standard deviation is
0.35 mm, and the maximum deviation is 1.032 mm.
B. Tracking Resolution
From the same dataset used in the previous section, we also
calculated the resolution, PPI (pixels per inch), for each grid of
the projection area. As shown in Fig. 4a, 64 points divide the
cutting mat into 49 grids and the size of a single grid is 6 inch
by 4 inch. We first applied the homography to these 64 points
to get their physical coordinates, and then calculated the
resolution of each grid using (1):

PPI = √(w² + h²) / d                                  (1)

where w is the width resolution and h is the height resolution of the grid in pixels, and d is the diagonal size of the grid in inches. The result is shown in
Fig. 7. Additionally, the average PPI for the whole projection
space is about 25.22.
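The short sketch below is our illustration of equation (1); the example pixel counts are hypothetical and chosen only to land near the reported average.

```cpp
#include <cmath>

// Illustration of equation (1): for a grid cell spanning wPixels × hPixels
// projector pixels and measuring wInches × hInches on the table, the density
// is the diagonal pixel count divided by the diagonal size in inches.
double gridPPI(double wPixels, double hPixels, double wInches, double hInches) {
    double diagPixels = std::sqrt(wPixels * wPixels + hPixels * hPixels);
    double diagInches = std::sqrt(wInches * wInches + hInches * hInches);
    return diagPixels / diagInches;
}
// Example (hypothetical pixel counts): gridPPI(150, 104, 6, 4) is roughly 25.3,
// close to the reported average of 25.22.
```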
C. Latency
The total system performance is also affected by latency at
multiple levels. First, the optical packet itself is 11.75
milliseconds long. Second, the microcontrollers add a certain
delay while decoding all the positions and packing data for
transmission. Finally, to make Lift portable and compatible
with existing mobile/IoT devices, we chose WiFi for
communication between Lift and target devices and Open
Sound Control (OSC) [29] is used as the data transmission
protocol because of its simplicity and robustness. This WiFi
connection is the communication bottleneck of our system; the
WiFi bandwidth is shared between two Arduino boards in Lift
with other devices in the testing environment, including
desktops, laptops, mobile phones, and a variety of other
wireless devices. Therefore, we designed the following
experiment to measure the latency of the proposed system.
We used a simple setup to measure the delay between the
time when Lift receives a “start” command for data collection
and the time when a laptop (2.2 GHz CPU, 8 GB memory)
receives the decoded position through the WiFi connection and
applies the homography on the sensor readings to get the final
position. Fig. 8a demonstrates our setup for this experiment.
A program was developed to run on a laptop recording a
timestamp of a keypress event from the keyboard. Whenever
the user pressed the “space” key, the corresponding timestamp
was recorded by the program, which also notified Lift to start
the position detection process through serial communication
running at 115200 baud. In turn, Lift will send position data of
all ten fingers to the same laptop through WiFi connection
once it finishes data processing. The time interval between this
program detecting the keyboard event and it computing final
finger positions was considered as the system latency.
Fig. 7. The projection area was divided into 6 inch × 4 inch grids, 49
grids in total. The resolution (pixel per inch) was calculated for each grid.
Fig. 8. Experiment setup to evaluate (a) system latency, (b, c) system refresh rate, and (d) system reliability under different lighting conditions.
Fig. 6. Average tracking error of continuous gestures for the 16 grids.
The same six participants were invited to collect 1200
groups of time difference, and an average latency of 31
milliseconds was reported while using WiFi connection. It is
worth mentioning that when serial communication (115200
baud) was used to send data from Lift to the host PC, the delay
decreased to 23 milliseconds.
D. Refresh Rate
1) Study Design
To ensure smooth finger tracking, the maximum speed that
fingers can move in the space also plays an important role in a
finger tracking system. The same six participants were invited
for the following evaluation to find out the number of positions
that Lift can decode at different movement speeds.
The setup of this evaluation is shown in Fig. 8b & 8c, and
the procedure is demonstrated as follows. All participants were
asked to wear 10 sensor units on their fingers and touch two
touchpads (29 mm by 16 mm) in a fixed order (left one first,
then right one) at different speed modes: normal, medium, and
as fast as possible. The time when these two touch sensors are
activated is used as timestamps indicating the beginning and
ending of a single test. Based on the time difference and the
distance between these two pads, the movement speed of the user's
finger can be measured. To make our experiment as accurate as
possible, the participants were asked to move their fingers in a
straight line. They were given a demonstration of the system
first and were allowed to practice with it. Once they were
familiar with the procedure, we started the experiment.
In our experiment, the distance between these two pads was
chosen to be 20 cm, which is long enough so that participants
have to move their fingers for a measurable period of time, but
not too long that they have to move or stretch their body to
access the second touchpad. The time interval between two
touch events was detected by an Arduino microcontroller and
sent to a laptop through serial communication (115200 baud)
for data logging. A program running on the same laptop was
used to count the number of position packages Lift collects
through WiFi during this time interval. Each speed case was
repeated 10 times. Participants can use any finger they wish for
the touch action, but they have to use the same finger during a
single test. 30 tests were performed for each user and 180 tests
in total for all six participants. Participants can take a rest and
change to another finger between the tests.
2) Results
The average moving time between these two touchpads is 955.504, 547.692, and 435.105 milliseconds for the three speed modes respectively, as shown in Fig. 9. Fig. 9 also shows the average speeds, v, calculated by (2):

v = d / t                                              (2)

where d is the distance between the two touchpads (200 mm) and t is the moving time; the resulting speeds are 211, 368, and 461 mm/second. The average number
of position packages received at these three different speeds is
84.365, 84.424, and 84.087, and the standard deviation is
0.179, 0.325, and 0.170, respectively. Each package contains
the position data of all ten fingers. This agrees with the
theoretical maximum update rate of Lift system, which is 85.1
Hz and decided by the total number of patterns our projector
can send every second and the length of one position packet.
This quantity, r, can be calculated simply by (3):

r = f / N = 4000 / 47 ≈ 85.1 Hz                        (3)

where f is the number of patterns the projector can project per second and N = 47 is the number of patterns in one position packet.
The results demonstrate that Lift can maintain high refresh rate
even when users move their fingers at a high speed.
E. Lighting Conditions
1) Study Design
We also tested the robustness of our current
implementation under different lighting conditions by putting a
light sensor at a known place and logging its decoded
coordinates under different indoor ambient light levels (Fig. 8d). The
light intensity was measured by a light meter placed next to the
sensor unit. The main metric we used here to quantify the
system reliability is the percentage of correct readings.
2) Result
Table I below shows the percentage of correct readings at
different lighting conditions. The first column is the light
intensity of the environment under different conditions. Since
the current implementation of Lift is based on a visible light
projector, the projector used in the system will increase the
light intensity of the projection area by a certain amount. Here,
the second column in the table shows the light intensity of the
projection space after Lift is activated. The results indicate that
our tracking technique can work reliably when ambient light is
in the range of 0 Lux to 345 Lux. However, if the environment
is brighter than this threshold, Lift fails to work. We will
explore how to solve this problem in section VI.
TABLE I. LOCALIZATION ACCURACY UNDER DIFFERENT LIGHTING CONDITIONS

Ambient Light Intensity (Lux)    Ambient Light + Lift Intensity (Lux)    Accuracy Percentage (%)
21                               150                                     100
232                              384                                     100
312                              444                                     100
336                              464                                     98.7
345                              472                                     94.4
349                              482                                     22
355                              487                                     0
400                              538                                     0
483                              614                                     0
Fig. 9. Refresh rate of Lift at different movement speeds.
V. EXAMPLE APPLICATIONS
As part of our investigations, we also implemented demo
applications on two interaction surfaces with different form
factors and input styles. For the tablet-size application, we
developed a simple photo browsing album on a non-
touchscreen laptop (Fig. 10). Users can navigate through a
collection of pictures on an uninstrumented display directly
using multi-touch gestures. We also implemented a ten-finger
freestyle drawing application (Fig. 1), with which a user can
draw any freeform traces on a common office table.
A. Gesture Tracking on An Uninstrumented Display
For the album application, a projector was used to project
the aforementioned patterns onto a table with the non-
touchscreen laptop inside the projection area. Four light
sensors were fixed at the corners of the display of the laptop,
one in each corner, indicating the boundary of the target device
(Fig. 2c). A user was then asked to wear two sensor units, one
on the index finger and the other on the middle finger (Fig. 10).
As a proof of concept, we designed four simple multi-touch
gestures: pan, zoom in/out, and rotation. To pan an image that
is larger than the display, the user moves two fingers on the
screen at the same time to view different areas of the image
(Fig. 10a). As the fingers move, their position data are saved and decoded, and the distance the fingers travel is used to drag the image around. The rotation gesture is similar to that on Mac or iPhone
products, where users can move two fingers around each other
to rotate an image (Fig. 10b). To zoom in/out of images, users
need to put two fingers on the screen and pinch them to zoom
in or out (Fig. 10c). Other gestures, such as swipe and scroll,
can be implemented based on these sensor data as well.
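The gesture parameters themselves reduce to simple geometry on the two tracked finger positions. The sketch below is our illustration (the structure and names are ours, not from the paper) of how the album application could derive pan, zoom, and rotation deltas between two consecutive position updates.

```cpp
#include <cmath>

struct Point { float x, y; };
struct GestureDelta { float panX, panY, scale, angle; };

// Derive pan, zoom, and rotation parameters from two tracked finger
// positions (a, b) observed in two consecutive Lift updates (t0 and t1).
GestureDelta interpret(Point a0, Point b0, Point a1, Point b1) {
    GestureDelta g;
    // Pan: movement of the midpoint between the two fingers.
    g.panX = (a1.x + b1.x) / 2.f - (a0.x + b0.x) / 2.f;
    g.panY = (a1.y + b1.y) / 2.f - (a0.y + b0.y) / 2.f;
    // Zoom: ratio of the current finger spread to the previous spread.
    float d0 = std::hypot(b0.x - a0.x, b0.y - a0.y);
    float d1 = std::hypot(b1.x - a1.x, b1.y - a1.y);
    g.scale = (d0 > 0.f) ? d1 / d0 : 1.f;
    // Rotation: change in the angle of the vector between the two fingers.
    g.angle = std::atan2(b1.y - a1.y, b1.x - a1.x) -
              std::atan2(b0.y - a0.y, b0.x - a0.x);
    return g;
}
```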
B. Freestyle Drawing on An Uninstrumented Tabletop
Another application we have implemented is a freestyle
drawing tool. Thanks to our projection-based tracking mechanism, Lift provides a very large interaction area: 1380 mm
× 860 mm at a projection distance of 2 m (Fig. 1a). However,
this can be further increased by adding a wide angle lens in
front of the projector to allow a much larger interaction space,
such as a wall. We will leave this for future explorations.
In this experiment, ten light sensors were attached to a
user’s ten fingers. The user can freely move his fingers in the
interaction space at a relatively high speed. The output signals
of all these sensors were then collected and decoded on the two
Arduino boards, one for each hand, and sent to a laptop
computer for visualization purposes. A GUI application was
used to display the traces of the user’s ten fingers, which are
color coded as well (Fig. 1b).
VI. LIMITATIONS & FUTURE DIRECTIONS
The experimental results suggest that projection-based
location discovery is a promising approach for fine-grained
finger tracking in different interaction scenarios. However, our
study and exemplary applications brought to light several
limitations and challenges, which include the following.
A. Interference from Ambient Light
Our current implementation projects binary patterns
through visible light. To make position decoding possible, the
light sensor we use, TSL13T, is more responsive to visible
light, around 750 nm. However, this also brings up the problem
of interference from ambient light. In section IV, we have
tested the proposed system in different lighting conditions and
found that Lift fails to work with strong ambient light. This is,
however, not a fundamental limitation of our approach and can
be addressed by modulating the visible light with high
frequency carriers, which would allow us to track fingers under
various challenging conditions.
B. Interacting with Projected Content
The projection-based finger tracking mechanism only
leverages the pattern projection mode of the projector. But the
projector itself also supports video projection as a normal video
projector. We envision a scenario where users can directly and
intuitively manipulate projected multimedia contents using
Lift. This application combines video projection and pattern
projection modes, providing finger tracking, content
manipulation, and visual feedback all at the same time. We will
leave the development of such an application for future work.
C. 2D versus 3D
Currently, Lift only supports 2D gesture interaction,
meaning users need to place their fingers on top of the
calibrated surface and keep all the light sensors facing upwards
before performing any gestures. We are interested in exploring
3D finger tracking using projected patterns. In principle, 3D
depth information can be available if multiple projectors are
used. We are also curious about how different types of inertial
sensors can be incorporated into the system to automatically
detect the starting and ending of gestures.
VII. CONCLUSION
We introduce Lift, a coded visible light-based object
localization and finger tracking technique in an everyday
environment. Use of projected coded light enables
computation-efficient finger tracking with an extended
interaction space on any device’s surface. We evaluated our
design through a series of experiments, validating that Lift can
track ten fingers simultaneously with high accuracy (1.7 mm),
high refresh rate (84 Hz), and low latency (31 ms for WiFi, and
23 ms for serial) under different ambient light conditions. We
also developed two applications where Lift makes use of the
projection area on uninstrumented surfaces to track fast
moving fingers and perform gesture recognition to control the
devices. With this work, we successfully demonstrate that Lift
holds significant promise as a direction for future research.
Fig. 10. With Lift, a user can (a) pan an image, (b) rotate an image, (c)
zoom in/out an image on an uninstrumented display.
REFERENCES
[1] Wireless Touch Keyboard K400. http://www.logitech.com/en-us/product/wireless-touch-keyboard-k400r
[2] Wireless Rechargeable Touchpad T650. http://support.logitech.com/en_us/product/touchpad-t650
[3] J. P. Kramer, P. Lindener, and W. R. George, "Communication system for deaf, deaf-blind, or non-vocal individuals using instrumented glove," U.S. Patent 5,047,952. Filed October 14, 1988, issued September 10, 1991.
[4] W. Hürst and C. Van Wezel, "Gesture-based interaction via finger tracking for mobile augmented reality," Multimedia Tools and Applications, Vol. 62, No. 1, pp. 233-258, 2013.
[5] H. Koike, Y. Sato, and Y. Kobayashi, "Integrating paper and digital information on EnhancedDesk: a method for realtime finger tracking on an augmented desk system," ACM Trans. Computer-Human Interaction, Vol. 8, No. 4, pp. 307-322, 2001.
[6] S. Henderson and S. Feiner, "Opportunistic tangible user interfaces for augmented reality," IEEE Trans. Visualization and Computer Graphics, Vol. 16, No. 1, pp. 4-16, 2010.
[7] C. Harrison, H. Benko, and A. D. Wilson, "OmniTouch: wearable multitouch interaction everywhere," 24th Annual ACM Symp. User Interface Software and Technology, pp. 441-450, 2011.
[8] W. Kienzle and K. Hinckley, "LightRing: always-available 2D input on any surface," 27th Annual ACM Symp. User Interface Software and Technology, pp. 157-160, 2014.
[9] K. Y. Chen, S. Patel, and S. Keller, "Finexus: Tracking Precise Motions of Multiple Fingertips Using Magnetic Sensing," CHI Conf. Human Factors in Computing Systems, pp. 1504-1514, 2016.
[10] M. Le Goc, S. Taylor, S. Izadi, and C. Keskin, "A low-cost transparent electric field sensor for 3D interaction on mobile devices," CHI Conf. Human Factors in Computing Systems, pp. 3167-3170, 2014.
[11] A. Mujibiya, X. Cao, D. S. Tan, D. Morris, S. N. Patel, and J. Rekimoto, "The sound of touch: on-body touch and gesture sensing based on transdermal ultrasound propagation," ACM Conf. Interactive Tabletops and Surfaces, pp. 189-198, 2013.
[12] M. Sagardia, K. Hertkorn, D. S. González, and C. Castellini, "Ultrapiano: A novel human-machine interface applied to virtual reality," IEEE Conf. Robotics and Automation, p. 2089, 2014.
[13] D. Kim, O. Hilliges, S. Izadi, A. Butler, J. Chen, I. Oikonomidis, and P. Olivier, "Digits: freehand 3D interactions anywhere using a wrist-worn gloveless sensor," 25th Annual ACM Symp. User Interface Software and Technology, pp. 167-176, 2012.
[14] S. Wang, J. Song, J. Lien, I. Poupyrev, and O. Hilliges, "Interacting with Soli: Exploring fine-grained dynamic gesture recognition in the radio-frequency spectrum," 29th Annual ACM Symp. User Interface Software and Technology, pp. 851-860, 2016.
[15] S. Scott, D. S. Tan, D. Morris, R. Balakrishnan, J. Turner, and J. A. Landay, "Enabling always-available input with muscle-computer interfaces," 22nd Annual ACM Symp. User Interface Software and Technology, pp. 167-176, 2009.
[16] J. H. Kim, N. D. Thang, and T. S. Kim, "3-D hand motion tracking and gesture recognition using a data glove," IEEE Symp. Industrial Electronics, pp. 1013-1018, 2009.
[17] J. C. Lee, P. H. Dietz, D. Maynes-Aminzade, R. Raskar, and S. E. Hudson, "Automatic projector calibration with embedded light sensors," 17th Annual ACM Symp. User Interface Software and Technology, pp. 123-126, 2004.
[18] J. C. Lee, S. E. Hudson, J. W. Summet, and P. H. Dietz, "Moveable interactive projected displays using projector based tracking," 18th Annual ACM Symp. User Interface Software and Technology, pp. 63-72, 2005.
[19] S. Ma, Q. Liu, and P. Sheu, "On hearing your position through light for mobile robot indoor navigation," IEEE Conf. Multimedia and Expo Workshops, 2016.
[20] J. Summet and R. Sukthankar, "Tracking locations of moving hand-held displays using projected light," Conf. Pervasive Computing, pp. 37-46, 2005.
[21] D. Schmidt, D. Molyneaux, and X. Cao, "PICOntrol: using a handheld projector for direct control of physical devices through visible light," 24th Annual ACM Symp. User Interface Software and Technology, pp. 379-388, 2012.
[22] R. Raskar, P. Beardsley, J. Van Baar, Y. Wang, P. Dietz, J. Lee, D. Leigh, and T. Willwacher, "RFIG lamps: interacting with a self-describing world via photosensing wireless tags and projectors," ACM Trans. Graphics, Vol. 23, No. 3, pp. 406-415, 2004.
[23] R. Raskar, H. Nii, B. Dedecker, Y. Hashimoto, J. Summet, D. Moore, Y. Zhao, J. Westhues, P. Dietz, J. Barnwell, and S. Nayar, "Prakash: lighting aware motion capture using photosensing markers and multiplexed illuminators," ACM Trans. Graphics, Vol. 26, No. 3, p. 36, 2007.
[24] J. Kim, G. Han, I. J. Kim, H. Kim, and S. C. Ahn, "Long-Range Hand Gesture Interaction Based on Spatio-temporal Encoding," 5th Conf. Distributed, Ambient and Pervasive Interactions, pp. 22-31, 2013.
[25] TI DLP Products. http://www.ti.com/lsds/ti/dlp/advanced-light-control/products.page
[26] Arduino MKR1000. https://www.arduino.cc/en/Main/ArduinoMKR1000
[27] Manchester code. https://en.wikipedia.org/wiki/Manchester_code
[28] S. M. Berman, D. S. Greehouse, I. L. Bailey, R. D. Clear, and T. W. Raasch, "Human electroretinogram responses to video displays, fluorescent lighting, and other high frequency sources," Optometry & Vision Science, Vol. 68, No. 8, pp. 645-662, 1991.
[29] M. Wright and A. Freed, "Open Sound Control: A new protocol for communicating with sound synthesizers," Conf. Computer Music, Vol. 2013, No. 8, p. 10, 1997.