Paper from the CHI 2008 Proceedings: Improved Video Navigation and Capture.
It presents a method for browsing videos by directly dragging their content, in contrast with the traditional seek bar, which is tied to time rather than to the visual content itself.
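The core of direct-manipulation browsing is mapping a drag position back to a frame along a tracked object trajectory. A minimal pure-Python sketch of that lookup, assuming a precomputed trajectory with one (x, y) point per frame (the function name and data are illustrative, not from the paper):

```python
def frame_for_drag(trajectory, drag_xy):
    """Direct-manipulation seek: return the index of the frame whose
    trajectory point lies closest to the user's drag position.
    A minimal sketch; real systems constrain the drag to the motion curve."""
    dx, dy = drag_xy
    best, best_d2 = 0, float("inf")
    for frame, (x, y) in enumerate(trajectory):
        d2 = (x - dx) ** 2 + (y - dy) ** 2
        if d2 < best_d2:
            best, best_d2 = frame, d2
    return best

# A ball moving left to right across 5 frames:
path = [(0, 50), (25, 52), (50, 55), (75, 52), (100, 50)]
print(frame_for_drag(path, (60, 50)))  # cursor lands nearest frame 2
```

Dragging the object forward along its path therefore scrubs the video forward, without the user ever touching a timeline.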
This document discusses a technique for detecting copy-paste or cut-paste digital image forgeries. The technique uses wavelet decomposition to split the input image into sub-bands: low-low (LL), low-high (LH), high-low (HL), and high-high (HH). Edge detection is then performed to extract edges. The cut-paste or copy-paste region is identified by examining edge pixels in the wavelet domain, since such a region is typically rectangular or square. Parameters such as entropy, power energy, and standard deviation are compared between the input image and suspected forged images to detect forgeries. The technique was tested on images with copy-pasted areas and accurately identified the forged regions.
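The single-level decomposition step can be sketched in pure Python with the simple averaging/differencing form of the Haar wavelet (the function name, band naming convention, and toy image below are assumptions for illustration, not the paper's implementation):

```python
def haar2d(img):
    """One level of a 2D Haar wavelet transform on a 2D list of pixels.
    Returns the LL (average) band and three detail bands, each half the
    size of the input. A pasted region's straight borders show up as
    runs of large coefficients in the detail bands."""
    h, w = len(img), len(img[0])
    half_h, half_w = h // 2, w // 2
    LL = [[0.0] * half_w for _ in range(half_h)]
    LH = [[0.0] * half_w for _ in range(half_h)]
    HL = [[0.0] * half_w for _ in range(half_h)]
    HH = [[0.0] * half_w for _ in range(half_h)]
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            a, b = img[i][j],     img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 4  # low-low: local average
            LH[i // 2][j // 2] = (a - b + c - d) / 4  # horizontal differences
            HL[i // 2][j // 2] = (a + b - c - d) / 4  # vertical differences
            HH[i // 2][j // 2] = (a - b - c + d) / 4  # diagonal differences
    return LL, LH, HL, HH

# A flat dark image with a bright "pasted" strip in the lower middle:
img = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 90, 90, 10],
       [10, 90, 90, 10]]
LL, LH, HL, HH = haar2d(img)
print(LH)  # the strip's vertical borders appear in the detail band
```

Here `LH` comes out as `[[0.0, 0.0], [-40.0, 40.0]]`: the two nonzero coefficients mark the left and right edges of the pasted strip, which is exactly the edge evidence the technique inspects in the wavelet domain.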
Generating 3D Model in Virtual Reality and Analyzing its Performance (ijcsit)
This paper presents a virtual environment built from a real model. It covers the analyses behind creating and visualizing the virtual environment in Quest3D, presents real-time performance analyses of the system, describes the advantages and disadvantages of interaction in the virtual environment, and critically analyzes rendering speed and quality on different machines.
This document provides an overview of a computer vision course, including administrative details, topics, and expectations. The instructor is Guodong Guo from UW-Madison. Key topics covered include computer vision fundamentals and applications, publications in top journals and conferences, and course requirements such as homework, exams, and a final project. Meeting times are on Mondays from 5-7:30pm and the instructor's office hours are Tuesdays and Thursdays from 1-2pm.
This document summarizes a survey paper on collaborative work in augmented reality. The paper reviews 65 papers on AR and CSCW systems published between 2008 and 2019. It introduces fundamental concepts of AR and CSCW and provides a taxonomy of AR-CSCW systems based on time, space, roles, and technology used. The survey analyzes examples of both asynchronous and synchronous collaboration across spatial and temporal dimensions. It also discusses design considerations and remaining research challenges in collaborative AR systems.
Unsupervised object-level video summarization with online motion auto-encoder (NEERAJ BAGHEL)
Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing volume of video produced every day. The authors investigate a pioneering research direction: unsupervised object-level video summarization. It is distinguished from existing pipelines in two aspects:
Extracting key motions of participating objects
Learning to summarize in an unsupervised and online manner.
TOP 5 Most View Article From Academia in 2019 (sipij)
Signal & Image Processing : An International Journal (SIPIJ)
ISSN : 0976 - 710X (Online) ; 2229 - 3922 (print)
http://www.airccse.org/journal/sipij/index.html
Recent articles published in Signal & Image Processing: An International Journal (sipij)
Signal & Image Processing: An International Journal is an Open Access peer-reviewed journal intended for researchers from academia and industry who are active in the multidisciplinary field of signal and image processing. The scope of the journal covers all theoretical and practical aspects of digital signal processing and image processing, from basic research to the development of applications.
October 202: Top Read Articles in Signal & Image Processing (sipij)
Signal & Image Processing: An International Journal is an Open Access peer-reviewed journal intended for researchers from academia and industry who are active in the multidisciplinary field of signal and image processing. The scope of the journal covers all theoretical and practical aspects of digital signal processing and image processing, from basic research to the development of applications.
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper62.pdf
YouTube: https://youtu.be/gV-rvV3iFDA
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri and Julien Morlier : Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This work presents a method for classifying table tennis strokes using spatio-temporal convolutional neural networks. The fine-grained classification is performed on trimmed video segments recorded at 120 fps, with different players performing in natural conditions. From those segments the frames are extracted, their optical flow is computed, and the pose of the player is estimated. From the optical flow amplitude, a region of interest is inferred. A three-stream spatio-temporal convolutional neural network using a combination of these modalities and 3D attention mechanisms is presented to perform the classification.
Presented by: Pierre-Etienne Martin
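The region-of-interest step in the abstract above can be sketched in pure Python: threshold the optical-flow amplitude map and take the bounding box of the cells that exceed it (the function name, threshold, and toy grid are illustrative assumptions, not the authors' code):

```python
def roi_from_flow(amplitude, thresh):
    """Infer a region of interest as the bounding box of grid cells whose
    optical-flow amplitude exceeds thresh: strong motion marks where the
    player and ball are. Returns (top, left, bottom, right) or None."""
    if not amplitude:
        return None
    rows = [i for i, row in enumerate(amplitude) if any(v > thresh for v in row)]
    cols = [j for j in range(len(amplitude[0]))
            if any(row[j] > thresh for row in amplitude)]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 4x4 flow-amplitude grid with motion concentrated in the centre:
flow = [[0, 0, 0, 0],
        [0, 5, 7, 0],
        [0, 6, 0, 0],
        [0, 0, 0, 0]]
print(roi_from_flow(flow, 1))  # bounding box of the moving region
```

In the real pipeline the same idea runs on a dense per-pixel flow field, and the resulting box crops the frames fed to the three streams.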
Top Cited Articles in Computer Graphics and Animation (ijcga)
The document summarizes research on using animation and blended learning to teach children. It describes an experimental study conducted in a primary school in Dhaka, Bangladesh. Students were taught the solar system using three different methods: traditional teaching, visual learning materials only, and a blended approach combining visual materials and teacher instruction. Questionnaires assessed student understanding after each method. The results showed the blended approach greatly improved student ability to acquire knowledge and skills compared to the other methods. The research concludes interactive blended learning may be an effective teaching method for school children.
Sparse representation in image and video copy detection (Huan-Cheng Hsu)
This document discusses using sparse representation for image and video copy detection. It begins by introducing the problem of identifying duplicated images and videos online given various manipulations. It then reviews existing techniques using watermarking or feature extraction and compares them to sparse representation, which describes images based on natural sparsity. The document outlines applying sparse representation to detect copies, presents experimental settings testing various distortions on image datasets, and compares performance to other methods. It concludes sparse representation enables efficient yet accurate copy detection and discusses future work applying it to large-scale image retrieval.
This document describes a Pong-inspired interactive art installation called Pong Sketch Two Project. Users can interact with a digitally projected bouncing ball using their full body motions, which are tracked via video camera. Their silhouettes and the ball are projected on a screen. Users can pass the ball back and forth or trap it in different ways. The installation aims to create an engaging interactive experience through whole-body gestures without restrictions of wires or hardware.
This is a guest lecture given by Mark Billinghurst at the University of Sydney on March 27th 2024. It discusses some future research directions for Augmented Reality.
Internet data volume nearly doubles every year, and multimedia communication demands low storage space and fast transmission, so the large volume of video data has driven the need for video compression. The aim of this paper is to achieve temporal compression for three-dimensional (3D) videos using motion estimation-compensation and wavelets. Instead of performing a two-dimensional (2D) motion search, as is common in conventional video codecs, a 3D motion search is proposed that better exploits the temporal correlations of 3D content. This leads to more accurate motion prediction and a smaller residual. A discrete wavelet transform (DWT) compression scheme is added for a better compression ratio; the DWT's high energy-compaction property has greatly benefited the field of compression. The quality parameters peak signal-to-noise ratio (PSNR) and mean squared error (MSE) are calculated, and the simulation results show that the proposed work improves PSNR over existing work.
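The two quality metrics named above have standard definitions: MSE is the average squared pixel difference, and PSNR relates it to the peak pixel value on a log scale. A small self-contained sketch (the sample pixel rows are made up for illustration):

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for `peak`-valued pixels;
    higher means the decoded signal is closer to the original."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10 * math.log10(peak ** 2 / e)

original = [52, 55, 61, 59, 79, 61, 76, 61]
decoded  = [50, 55, 60, 59, 80, 61, 75, 62]
print(round(mse(original, decoded), 3))   # 1.0
print(round(psnr(original, decoded), 2))  # 48.13
```

A smaller residual after the proposed 3D motion search means smaller differences between original and decoded frames, hence lower MSE and higher PSNR.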
This document describes an experimental evaluation of different user interfaces for visual indoor navigation. The study compared augmented reality (AR) to virtual reality (VR) and found that VR was faster and seemed more accurate to users. It also tested a feature indicator and found it increased the number of identifiable features in images. Finally, it evaluated object highlighting and found a soft border version was less distracting than a framed version. The novel user interfaces improved localization accuracy and were more effective and popular than traditional AR interfaces.
Development Prototype Design of Virtual Assembly Application Based on Leap Motion (IJAEMSJORNAL)
Innovation in design engineering practice is very important to manufacturing in an increasingly competitive global market. Prototyping and evaluation are inseparable from the design process for any product, but physical prototypes are expensive and time-consuming to build, so Virtual Reality (VR) technology is needed to let industry make decisions quickly and precisely. VR couples a human with a computer environment through sight, touch, and hearing, so that the user feels immersed in the virtual world; with hand movements, the user can interact with what is displayed on the screen, or with a virtual environment overlaid on the real world. VR is especially needed for simulations that require extensive interaction, such as prototype assembly, better known as Virtual Assembly. The Virtual Assembly concept was developed as the ability to assemble a representation of a physical model, i.e. the 3D models in CAD software, by simulating the natural movement of the human hand. The Leap Motion (accuracy of 0.01 mm) was used to replace Microsoft's Kinect (accuracy of 1.5 cm) and a flex-sensor Motion Glove (accuracy of 1°) used in several previous studies. The Leap Motion controller captures every hand movement, which is then processed and integrated with the 3D models in the CAD software. In the virtual assembly simulation, hand gestures detected by the Leap Motion let assembly parts be translated, rotated, and zoomed, and assembly constraints be added. It can also perform mouse functions (left-click, middle-click, right-click, and moving the cursor) within the virtual assembly simulation in the CAD software.
HUMAN IDENTIFIER WITH MANNERISM USING DEEP LEARNING (IRJET Journal)
The document discusses human posture and mannerism identification using deep learning. It describes how a deep learning model can be trained on media data from humans to recognize certain personalities based on their postures. This could then be used for applications like home security systems to prevent unwanted entrance. The document reviews several related works that use techniques like pose estimation, skeleton modeling, and deep neural networks to identify body joints and estimate full-body poses from images and video. Accurately estimating poses of multiple people remains a challenge, but recent methods using techniques like part affinity fields have achieved real-time performance with good accuracy.
IRJET- Application of MCNN in Object Detection (IRJET Journal)
This document discusses using a multi-column convolutional neural network (MCNN) for object detection in videos. The MCNN approach is compared to other methods like CNN and HOG-BOW-Gray pooling and is shown to achieve over 95% accuracy for pedestrian detection. The document outlines extracting frames from videos, dividing images into regions, classifying regions using CNNs, and combining results to detect objects. The MCNN approach is concluded to be useful for applications like medical imaging due to its high detection accuracy.
Automatic 3D view Generation from a Single 2D Image for both Indoor and Outdoor Scenes (ijcsa)
This document discusses algorithms for automatically generating 3D video views from a single 2D image of both indoor and outdoor scenes. For indoor scenes, it segments the floor to determine the termination point for video generation. For outdoor scenes, it detects the vanishing point, which is used to calculate the distance to the termination point. The algorithms crop the input image to generate frames as it navigates up to the termination point, creating the effect of a 3D video view from a single 2D image with no need for human intervention. Experimental results on over 250 images demonstrated the effectiveness of the proposed methods.
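The cropping loop described above can be sketched as a sequence of crop rectangles that shrink toward the termination point, producing a fly-in effect from one still image. Everything here (function name, linear scale schedule, `final_scale`) is an assumption for illustration, not the paper's algorithm:

```python
def zoom_crops(width, height, target, n_frames, final_scale=0.4):
    """Generate n_frames crop rectangles (left, top, right, bottom) that
    shrink from the full image toward `target` (e.g. the vanishing point
    or floor termination point), staying inside the image bounds."""
    tx, ty = target
    crops = []
    for k in range(n_frames):
        s = 1 - (1 - final_scale) * k / (n_frames - 1)  # scale 1.0 -> final_scale
        w, h = width * s, height * s
        # slide the crop centre from the image centre toward the target
        cx = width / 2 + (tx - width / 2) * (1 - s)
        cy = height / 2 + (ty - height / 2) * (1 - s)
        left = min(max(cx - w / 2, 0), width - w)
        top = min(max(cy - h / 2, 0), height - h)
        crops.append((round(left), round(top), round(left + w), round(top + h)))
    return crops

for rect in zoom_crops(640, 480, target=(400, 200), n_frames=4):
    print(rect)  # first rect is the full frame; later rects close in on the target
```

Resampling each crop back to the output resolution then yields the successive video frames of the simulated camera move.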
1) The document discusses using data in deep learning models, including understanding the limitations of data and how it is acquired.
2) It describes techniques for image matching using multi-view geometry, including finding corresponding points across images and triangulating them to determine camera pose.
3) Recent works aim to improve localization of objects in images using multiple instance learning approaches that can learn without full supervision or through more stable optimization methods like linearizing sampling operations.
Final lecture from the COMP 4010 course on Virtual and Augmented Reality. This lecture was about Research Directions in Augmented Reality. Taught by Mark Billinghurst on November 1st 2016 at the University of South Australia
The document presents a project report on machine learning. It discusses several projects completed including implementing neural networks to compute averages, extracting histogram of joints features, and developing a gesture recognition system using Hidden Markov Models. The gesture recognition system uses a Kinect sensor to capture skeleton data, extracts features, builds a codebook using clustering, trains HMM models for each gesture, and achieves over 85% accuracy on a dataset of 15 gestures. Future work to improve the system is also outlined.
This document provides an activity and research report for Marco Cagnazzo from September 2013. It summarizes his teaching activities from 2004-present, which include courses on information theory, multimedia signal processing, compression techniques, and digital video/multimedia at various universities. It also lists his PhD student supervision and involvement in research projects related to video coding optimization, adaptive image compression, and robust video streaming. His main research themes are described as motion representation, 3D video coding, and distributed video coding. Bibliometric data on his publications and other scholarly activities are also presented.
Cartoonization of images using machine Learning (IRJET Journal)
The document presents a method for cartoonization of images using machine learning. It discusses converting real-world photos into cartoon images using a GAN-based approach. The key steps include:
1. Importing required modules like OpenCV, NumPy for image processing and GAN modeling.
2. Pre-processing input images by converting them to grayscale, smoothing, and edge detection.
3. Training a GAN using cartoon and photo images to generate new cartoon images.
4. For video cartoonization, frames are extracted from videos using OpenCV, individually cartoonized using the GAN, and reconstructed into a cartoon video.
The proposed system is able to convert images and videos to cartoon style in real time using deep learning.
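The pre-processing in step 2 above (grayscale conversion, smoothing, crude edge detection) can be sketched in pure Python on small pixel grids; the function names, the 3x3 box kernel, and the neighbour-difference edge test are simplifying assumptions standing in for the OpenCV calls a real pipeline would use:

```python
def grayscale(rgb):
    """Standard luma conversion for one (r, g, b) pixel."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def box_blur(img):
    """3x3 box smoothing over a 2D list of intensities (edges clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = sum(vals) / 9
    return out

def edges(img, thresh):
    """Mark a pixel as an edge when its right or lower neighbour differs
    by more than thresh (a crude gradient test)."""
    h, w = len(img), len(img[0])
    return [[1 if (j + 1 < w and abs(img[i][j + 1] - img[i][j]) > thresh) or
                  (i + 1 < h and abs(img[i + 1][j] - img[i][j]) > thresh)
             else 0 for j in range(w)] for i in range(h)]

# A horizontal light/dark boundary is picked up one row above the jump:
gray = [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [255, 255, 255, 255],
        [255, 255, 255, 255]]
print(edges(gray, 100))
```

The resulting edge map is what gives cartoonized frames their bold outlines before the GAN restyles the flat colour regions.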
1. The document discusses analyzing videos with convolutional neural networks (CNNs). It covers techniques like video recognition using CNNs to classify video content at the clip level.
2. DeepVideo and C3D are discussed as approaches for video recognition using CNNs. DeepVideo employs 2D CNNs on multiple video frames while C3D uses 3D CNNs to learn spatiotemporal features directly from video data.
3. Optical flow estimation techniques like DeepFlow are also covered, which uses a deep matching approach to compute dense correspondences between video frames for large displacements.
This document is a resume for Amit Sethi summarizing his professional experience and qualifications. It outlines his objective to obtain a research and development position in industry. It then details his education at the University of Illinois at Urbana-Champaign where he is pursuing a PhD in Electrical and Computer Engineering, as well as a previous degree from Indian Institute of Technology. His experience includes research in machine learning, computer vision, and video processing. He has several publications and awards and is skilled in programming languages such as C++.
This document discusses lean software development principles. It introduces agile software development processes and the agile manifesto. Lean software development is then discussed, which comes from the Toyota Production System and uses a set of principles and tools to achieve quality, speed and customer alignment. The 7 principles of lean thinking are outlined: 1) eliminate waste, 2) amplify learning, 3) decide as late as possible, 4) deliver as fast as possible, 5) empower the team, 6) build integrity in, and 7) see the whole. Each principle is then explained in more detail with examples related to software development.
The document discusses the future of the IT market in Thailand, outlining how technology has evolved from mainframe computers and personal computers to the modern era of mobile devices and internet access. It notes several app and mobile technology trends like location-based services, augmented reality, social networking, and cloud computing. Data shows the rapid growth of Thailand's mobile apps market from 2009 to 2011. The conclusion suggests that in the future, more technologies will evolve from mobile devices, including the "Internet of Things".
This document is a resume for Amit Sethi summarizing his professional experience and qualifications. It outlines his objective to obtain a research and development position in industry. It then details his education at the University of Illinois at Urbana-Champaign where he is pursuing a PhD in Electrical and Computer Engineering, as well as a previous degree from Indian Institute of Technology. His experience includes research in machine learning, computer vision, and video processing. He has several publications and awards and is skilled in programming languages such as C++.
Similar to Video Browsing By Direct Manipulation - Draft 1 (20)
This document discusses lean software development principles. It introduces agile software development processes and the agile manifesto. Lean software development is then discussed, which comes from the Toyota Production System and uses a set of principles and tools to achieve quality, speed and customer alignment. The 7 principles of lean thinking are outlined: 1) eliminate waste, 2) amplify learning, 3) decide as late as possible, 4) deliver as fast as possible, 5) empower the team, 6) build integrity in, and 7) see the whole. Each principle is then explained in more detail with examples related to software development.
The document discusses the future of the IT market in Thailand, outlining how technology has evolved from mainframe computers and personal computers to the modern era of mobile devices and internet access. It notes several app and mobile technology trends like location-based services, augmented reality, social networking, and cloud computing. Data shows the rapid growth of Thailand's mobile apps market from 2009 to 2011. The conclusion suggests that in the future, more technologies will evolve from mobile devices, including the "Internet of Things".
This document presents an introduction to ubiquitous computing. It discusses how ubiquitous computing aims to make many computers available throughout the physical environment, yet make them effectively invisible to the user. It outlines the three waves of computing as mainframes, personal computers, and ubiquitous computing. It also covers key elements of ubiquitous computing including ubiquitous networking, sensing, access, and middleware. Issues with privacy, reliability, and social impact are discussed.
This document discusses using microformats to create visualizations from information on the web. It begins with an overview of the history of the internet and issues around information overflow. It then proposes using microformats as a solution, as they allow information to be both human and machine readable. Finally, it discusses several visualization techniques that could be used with microformatted data, such as timelines, graphs, charts, and maps.
Extreme Programming (XP) is an agile software development methodology that focuses on rapid delivery of working software, customer satisfaction, simplicity, communication, and feedback. Key practices of XP include having working software delivered frequently in small releases, writing automated tests before code, pairing programmers, continuous refactoring, and integrating code daily. The goal of XP is to improve productivity and quality through practices like test-driven development, simple design, pair programming, and frequent feedback from customers.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
1. Video Browsing by Direct
Manipulation
Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowicz,
Derek Nowrouzezahrai, Ravin Balakrishnan,
Karan Singh
User Interface Design 646
Presented by Vashira Ravipanich
5171439021
2. Introduction
• All video players use a
“seeker bar” to control
user interaction
• What if you could directly
drag content in the movie?
3. Introduction
• This paper presents a method for browsing
videos by “directly dragging” their content
• Automatically extracting motion data
• Relative Flow Dragging
4.
5. Why Direct Manipulation?
• Input ~ Output
• Time vs. Space
• Both are complementary, NOT rivals
Input (e.g., a finger movement) maps
directly to output (e.g., cursor movement)
Time = the seeker bar; Space = direct
manipulation
6. How does it work?
• Video = a sequence of pictures
(frames)
• Extract object movement, called
“trajectory extraction”
• Construct a “hint path”
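The hint path described above can be thought of as per-frame object positions parameterized by cumulative arc length. A minimal NumPy sketch of one way to represent such a path (the function name and representation are illustrative, not taken from the paper):

```python
import numpy as np

def build_hint_path(centroids):
    """Turn per-frame object centroids into a hint path.

    centroids: list of (x, y) positions, one per frame.
    Returns an (N, 3) array of (x, y, s), where s is the cumulative
    arc length along the trajectory, so each frame index maps to a
    point on the path.
    """
    centroids = np.asarray(centroids, dtype=float)
    steps = np.linalg.norm(np.diff(centroids, axis=0), axis=1)  # per-frame displacement
    s = np.concatenate([[0.0], np.cumsum(steps)])               # cumulative arc length
    return np.column_stack([centroids, s])

# Toy trajectory: an object moving right 3 px per frame.
path = build_hint_path([(0, 0), (3, 0), (6, 0), (9, 0)])
# path[:, 2] is [0, 3, 6, 9]: arc length grows with time here.
```

Dragging along such a path then amounts to converting a cursor position into an arc-length value, and the arc-length value back into a frame number.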
7. Relative Flow Dragging
• Directness => the user’s input language
matches the generated output
• Matching gestures with motion
2D = dragging a map
3D = scaling or rotating an object
8. Types of Dragging
• Curvilinear Dragging
• Flow Dragging
• Relative Dragging
12. Trajectory Extraction
• Computer vision approaches:
• Object Tracking
- tracks an object across the video sequence
- used in motion capture, surveillance
• Optical Flow
- whole picture; calculates per-pixel motion
- used in video compression
• Optical flow is better suited for a general
video player
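To illustrate the optical-flow idea (estimating motion across the whole picture rather than tracking one object), here is a toy block-matching sketch in NumPy. A real player would use an established estimator such as Lucas-Kanade or Farneback; everything named below is an illustrative stand-in:

```python
import numpy as np

def block_matching_flow(f0, f1, block=4, search=2):
    """Estimate a coarse motion field between two grayscale frames.

    For each `block` x `block` patch of f0, search a +/- `search` pixel
    neighbourhood in f1 for the best-matching patch (sum of absolute
    differences) and record the displacement (dx, dy). This is a toy
    stand-in for real optical-flow estimators.
    """
    h, w = f0.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            patch = f0[y:y + block, x:x + block]
            best, best_dv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate window falls outside the frame
                    cand = f1[yy:yy + block, xx:xx + block]
                    cost = np.abs(patch.astype(int) - cand.astype(int)).sum()
                    if best is None or cost < best:
                        best, best_dv = cost, (dx, dy)
            flow[by, bx] = best_dv
    return flow

# A bright square shifted 2 px right between frames; block matching
# recovers (dx, dy) = (2, 0) for the blocks that contain it.
f0 = np.zeros((8, 8), dtype=np.uint8)
f0[2:6, 0:4] = 255
f1 = np.roll(f0, 2, axis=1)
flow = block_matching_flow(f0, f1, block=4, search=2)
```

Because it covers the whole frame rather than one tracked object, this kind of dense motion field is what lets any pixel the user grabs be dragged along its local motion.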
14. Proposed Solutions
• 3D Distance Method
• (x, y, z), where z is the arc-length
distance from the curve origin
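The 3D distance idea on this slide can be sketched directly: lift the cursor to (cx, cy, current arc length) and pick the nearest path sample in that 3D space, so that among spatially close samples the one nearest the current playback position wins. A minimal Python sketch (the weight `w` and all names here are illustrative assumptions, not values from the paper):

```python
import math

def nearest_on_path(cursor, samples, current_s, w=1.0):
    """Map a drag point onto the hint path using a 3D distance.

    Each path sample is (x, y, s), with s the arc length from the
    curve origin. The cursor is lifted to (cx, cy, current_s), which
    disambiguates back-and-forth or self-intersecting paths: of two
    spatially coincident samples, the one whose arc length is closer
    to the current playback position is chosen. The weight w on the
    arc-length axis is a knob of this sketch, not from the paper.
    """
    cx, cy = cursor
    best_i, best_d = 0, float("inf")
    for i, (x, y, s) in enumerate(samples):
        d = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (w * (s - current_s)) ** 2)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

# Back-and-forth path: the object passes x = 5 once going right (s = 5)
# and once coming back (s = 15).
samples = [(float(x), 0.0, float(x)) for x in range(11)]           # rightward, s = 0..10
samples += [(float(10 - k), 0.0, 10.0 + k) for k in range(1, 11)]  # leftward, s = 11..20
i_early = nearest_on_path((5.0, 0.0), samples, current_s=3.0)   # lands on the first pass
i_late = nearest_on_path((5.0, 0.0), samples, current_s=18.0)   # lands on the return pass
```

This is exactly the situation the Limitations slide raises: without the arc-length axis, both passes through x = 5 would be equally good matches.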
15. Limitations
• Videos with back-and-forth movement, e.g.,
a couple dancing the tango
• Difficult to visualize the path clearly
16. Evaluation
• User Study
• 6 males, 10 females
• 18 - 44 years old
• Tested with 2 videos, each with a given
objective
• Offered both the seeker bar and relative
flow dragging
• Which one were users most comfortable with?
20. Previous work on Video Browsing
• Non-Linear Video Browsing
- Segments of differing importance
- Estimating motion activity
• Visual Summaries
- Generate mosaic from key frames
• Content-Based Video Retrieval
21. Conclusion & Future Work
• New way of browsing videos using direct
manipulation
• Appealing for touch-input handhelds, e.g.,
the iPhone and Pocket PC
• Interactive Learning Environments.