Paper: http://ceur-ws.org/Vol-2882/paper50.pdf
Hai Nguyen-Truong, San Cao, N. A. Khoa Nguyen, Bang-Dang Pham, Hieu Dao, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table Tennis Strokes Classification Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Sports Video Classification Task in the Multimedia Evaluation (MediaEval) 2020 Challenge focuses on classifying different types of table tennis strokes in video segments. For this task, we, the HCMUS team, perform multiple experiments combining models such as SlowFast, Optical Flow, DensePose, R(2+1)D, and Channel-Separated Convolutional Networks to classify 21 types of table tennis strokes from video segments. In total, we submit eight runs corresponding to five different models, each with its own set of hyper-parameters. In addition, we apply several pre-processing techniques to the dataset so that our models learn and classify more accurately. According to the evaluation results, one of our methods outperforms those of the other teams. In particular, our best run achieves 31.35% global accuracy, and all of our methods show promising results in terms of local and global accuracy for action recognition tasks.
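The late-fusion step behind such an ensemble can be sketched as follows; the function name and array shapes here are illustrative, not the team's actual pipeline. Each model emits per-clip class probabilities, the probabilities are averaged, and the top-scoring stroke class is taken.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Late fusion: average per-model class probabilities and take argmax.

    prob_list: list of (num_clips, num_classes) arrays, one per model.
    Returns the predicted class index for each clip.
    """
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)
    return avg.argmax(axis=1)

# Two toy "models" scoring 2 clips over 3 stroke classes.
m1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
m2 = np.array([[0.4, 0.4, 0.2], [0.1, 0.2, 0.7]])
preds = ensemble_predict([m1, m2])
```

Averaging probabilities (rather than hard votes) lets a confident model pull the decision even when the other models are undecided, which is one reason this simple fusion often helps.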
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN for MediaEval 2020 (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper62.pdf
YouTube: https://youtu.be/gV-rvV3iFDA
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri and Julien Morlier : Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal CNN for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This work presents a method for classifying table tennis strokes using spatio-temporal convolutional neural networks. The fine-grained classification is performed on trimmed video segments recorded at 120 fps with different players performing in natural conditions. From those segments, the frames are extracted, their optical flow is computed and the pose of the player is estimated. From the optical flow amplitude, a region of interest is inferred. A three stream spatio-temporal convolutional neural network using a combination of those modalities and 3D attention mechanisms is presented to perform the classification.
Presented by: Pierre-Etienne Martin
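One step the abstract describes, inferring a region of interest from the optical-flow amplitude, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the flow field has already been computed by some estimator, and simply boxes the high-motion pixels.

```python
import numpy as np

def roi_from_flow(flow, frac=0.5):
    """Infer a region of interest from optical-flow amplitude.

    flow: (H, W, 2) array of per-pixel displacements.
    Returns (top, bottom, left, right) of the box enclosing pixels whose
    flow magnitude exceeds `frac` of the maximum magnitude.
    """
    mag = np.hypot(flow[..., 0], flow[..., 1])
    mask = mag > frac * mag.max()
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return rows[0], rows[-1], cols[0], cols[-1]

# Toy flow field: motion concentrated in a 2x2 patch.
flow = np.zeros((6, 6, 2))
flow[2:4, 3:5] = [3.0, 4.0]          # magnitude 5 inside the patch
box = roi_from_flow(flow)
```

Cropping to such a box keeps the player (the moving region) and discards static background before the clip is fed to the network.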
HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ for Polyps Segmentation (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper47.pdf
YouTube: https://youtu.be/vMsM4zg2-JY
Tien-Phat Nguyen, Tan-Cong Nguyen, Gia-Han Diep, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at Medico Automatic Polyp Segmentation Task 2020: PraNet and ResUnet++ for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico task at MediaEval 2020 explores the challenge of building accurate, high-performance algorithms to detect all types of polyps in endoscopic images. We propose different approaches leveraging the advantages of either the ResUnet++ or the PraNet model to efficiently segment polyps in colonoscopy images, with modifications to the network structure, parameters, and training strategies to tackle various observed characteristics of the given dataset. Our methods outperform the other teams' in both accuracy and efficiency: we rank second on task 1 (Jaccard index of 0.777, with the best Precision and Accuracy scores) and first on task 2 (67.52 FPS with a Jaccard index of 0.658).
Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Intestinal Tract (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper31.pdf
Syed Muhammad Faraz Ali, Muhammad Taha Khan, Syed Unaiz Haider, Talha Ahmed, Zeshan Khan and Muhammad Atif Tahir : Depth-wise Separable Atrous Convolution for Polyps Segmentation in Gastro-Intestinal Tract. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Identification of polyps in endoscopic images is critical for the diagnosis of colon cancer, and finding the exact shape and size of polyps requires segmenting those images. This research explores the advantage of using depth-wise separable convolution in the atrous convolutions of the ResUNet++ architecture. Deep atrous spatial pyramid pooling was also implemented on top of ResUNet++. The results show that the architecture with separable convolutions has a smaller size and fewer GFLOPs without significantly degrading performance.
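A minimal NumPy sketch of the depthwise separable atrous convolution the paper builds on; layout and naming are illustrative, not the paper's code. The dilated depthwise step filters each channel separately, and a 1x1 pointwise step then mixes channels, which is where the parameter and GFLOP savings come from (C·k·k + Cout·C weights instead of Cout·C·k·k).

```python
import numpy as np

def depthwise_separable_atrous(x, dw, pw, rate=2):
    """Depthwise separable atrous convolution (no padding, stride 1).

    x:  (C, H, W) input feature map
    dw: (C, k, k) one spatial filter per input channel (depthwise step)
    pw: (Cout, C) 1x1 pointwise filters mixing channels
    rate: dilation rate of the depthwise (atrous) step
    """
    C, H, W = x.shape
    k = dw.shape[1]
    span = (k - 1) * rate              # receptive-field span of the dilated kernel
    Ho, Wo = H - span, W - span
    mid = np.zeros((C, Ho, Wo))
    for c in range(C):                 # depthwise: each channel filtered separately
        for i in range(Ho):
            for j in range(Wo):
                patch = x[c, i:i + span + 1:rate, j:j + span + 1:rate]
                mid[c, i, j] = np.sum(patch * dw[c])
    # pointwise 1x1 convolution mixes channels
    return np.tensordot(pw, mid, axes=([1], [0]))

x = np.ones((2, 7, 7))
dw = np.ones((2, 3, 3))
pw = np.ones((1, 2))
y = depthwise_separable_atrous(x, dw, pw, rate=2)
```

The dilation enlarges the receptive field (a 3x3 kernel at rate 2 covers a 5x5 area) without adding weights, which is why atrous convolutions are popular for segmentation.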
HCMUS at MediaEval 2020: Image-Text Fusion for Automatic News-Images Re-Matching (multimediaeval)
Paper: http://ceur-ws.org/Vol-2882/paper73.pdf
YouTube: https://youtu.be/TadJ6y7xZeA
Thuc Nguyen-Quang, Tuan-Duy Nguyen, Thang-Long Nguyen-Ho, Anh-Kiet Duong, Xuan-Nhat Hoang, Vinh-Thuyen Nguyen-Truong, Hai-Dang Nguyen and Minh-Triet Tran : HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Matching text and images based on their semantics plays an important role in cross-media retrieval. However, the text and images in articles have a complex connection. In the context of the MediaEval 2020 Challenge, we propose three multi-modal methods that map the text and images of news articles into a shared space in order to perform efficient cross-retrieval. Our methods show systematic improvement and validate our hypotheses, with the best-performing method reaching a recall@100 score of 0.2064.
Presented by: Thuc Nguyen-Quang
Most existing high-performance co-segmentation algorithms are complex due to the way a set of images is co-labelled, as well as the common need to fine-tune a few parameters for effective co-segmentation. In this paper, instead of following the conventional way of co-labelling multiple images, we propose to first exploit inter-image information through co-saliency, and then perform single-image segmentation on each individual image. To make the system robust and to avoid heavy dependence on a single saliency extraction method, we apply multiple existing saliency extraction methods to each image to obtain diverse saliency maps. Our major contribution lies in the proposed method that fuses the obtained diverse saliency maps by exploiting the inter-image information, which we call saliency co-fusion. Experiments on five benchmark datasets with eight saliency extraction methods show that our saliency co-fusion based approach achieves competitive performance compared with state-of-the-art methods, even without parameter fine-tuning.
An Enhanced Model for Inpainting on Digital Images Using Dynamic Masking (Md. Shohel Rana)
Given an image with significant portions missing or damaged, the goal is to dynamically detect the damaged regions to be inpainted and to reconstitute them with data consistent with the rest of the image. The proposed method restores damaged areas of the image with reduced processing time and without blurring the output.
SEGMENTATION OF MAGNETIC RESONANCE BRAIN TUMOR USING INTEGRATED FUZZY K-MEANS... (ijcsit)
Segmentation is a process of partitioning an image into several objects. It plays a vital role in many fields such as satellite imaging, remote sensing, object identification, face tracking and, most importantly, the medical field. In radiology, magnetic resonance imaging (MRI) is used to investigate the processes and functions of the human body. In hospitals, this technique has been widely used for medical diagnosis, disease staging and follow-up without exposure to ionizing radiation. In this paper, we propose a novel MR brain image segmentation method for detecting a tumor and finding the tumor area, with improved performance over conventional segmentation techniques such as fuzzy c-means (FCM) and K-means, and even over manual segmentation, in terms of precision, time and accuracy. Simulation results show that the proposed scheme performs better than the existing segmentation methods.
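For reference, the K-means baseline mentioned above is easy to state on raw intensities; this toy sketch (not the paper's method) clusters pixel values into background and candidate-tumor groups.

```python
import numpy as np

def kmeans_1d(pixels, k=2, iters=20):
    """Plain k-means on pixel intensities, the kind of baseline the paper
    compares against. Returns cluster centers (ordered) and a label per pixel."""
    centers = np.linspace(pixels.min(), pixels.max(), k)
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # assign each pixel to the nearest center, then recompute centers
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean()
    return centers, labels

# Toy "image": dark background plus a bright tumor-like region.
img = np.array([10, 12, 11, 200, 205, 198], dtype=float)
centers, labels = kmeans_1d(img, k=2)
```

Fuzzy c-means differs in that each pixel gets a soft membership in every cluster rather than one hard label, which is what the integrated approach above builds on.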
Image Contrast Enhancement for Brightness Preservation Based on Dynamic Stret... (CSCJournals)
Histogram equalization is an efficient process often employed in consumer electronic systems for image contrast enhancement. In addition to an increase in contrast, it is also required to preserve the mean brightness of an image in order to convey the true scene information to the viewer. A conventional approach is to separate the image into sub-images and then process independently by histogram equalization towards a modified profile. However, due to the variations in image contents, the histogram separation threshold greatly influences the level of shift in mean brightness with respect to the uniform histogram in the equalization process. Therefore, the choice of a proper threshold, to separate the input image into sub-images, is very critical in order to preserve the mean brightness of the output image. In this research work, a dynamic range stretching approach is adopted to reduce the shift in output image mean brightness. Moreover, the computationally efficient golden section search algorithm is applied to obtain a proper separation into sub-images to preserve the mean brightness. Experiments were carried out on a large number of color images of natural scenes. Results, as compared to current available approaches, showed that the proposed method performed satisfactorily in terms of mean brightness preservation and enhancement in image contrast.
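The golden-section search the paper applies can be sketched generically; the cost function below is a made-up stand-in for the brightness-shift criterion, which the paper evaluates per candidate separation threshold.

```python
import math

def golden_section_min(f, a, b, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [a, b].
    The paper uses this kind of search to pick the histogram separation
    threshold; here f is any 1-D cost (e.g. |output mean - input mean|)."""
    inv_phi = (math.sqrt(5) - 1) / 2          # 1/phi, about 0.618
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):                       # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                                 # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: minimize a brightness-shift proxy (t - 0.37)^2 over thresholds in [0, 1].
t_star = golden_section_min(lambda t: (t - 0.37) ** 2, 0.0, 1.0)
```

Each iteration shrinks the bracket by the same golden ratio and reuses one interior point, so the search needs only one new cost evaluation per step.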
Most existing high-performance co-segmentation algorithms are usually complicated due to the way of co-labelling a set of images and the requirement to handle quite a few parameters for effective co-segmentation. In this paper, instead of relying on the complex process of co-labelling multiple images, we perform segmentation on individual images but based on a combined saliency map that is obtained by fusing single-image saliency maps of a group of similar images. Particularly, a new multiple image based saliency map extraction, namely geometric mean saliency (GMS) method, is proposed to obtain the global saliency maps. In GMS, we transmit the saliency information among the images using the warping technique. Experiments show that our method is able to outperform state-of-the-art methods on three benchmark co-segmentation datasets.
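Assuming the per-image saliency maps have already been warped into a common frame, the geometric-mean fusion at the core of the GMS idea reduces to a few lines; this is a sketch of the fusion step only, not the full pipeline.

```python
import numpy as np

def geometric_mean_saliency(maps, eps=1e-8):
    """Fuse saliency maps (already warped into a common frame) by their
    per-pixel geometric mean.

    maps: (N, H, W) stack of saliency maps with values in [0, 1].
    """
    maps = np.clip(np.asarray(maps, dtype=float), eps, 1.0)
    return np.exp(np.mean(np.log(maps), axis=0))

m1 = np.array([[0.9, 0.1], [0.4, 0.0]])
m2 = np.array([[0.4, 0.9], [0.4, 0.0]])
fused = geometric_mean_saliency([m1, m2])
```

The geometric mean strongly suppresses any pixel that even one map scores near zero, so only regions that all images agree are salient survive, a sensible consensus property for co-segmentation.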
OBTAINING SUPER-RESOLUTION IMAGES BY COMBINING LOW-RESOLUTION IMAGES WITH HIG... (ijcsit)
In this paper, we propose a new algorithm to estimate a super-resolution image from a given low-resolution image by adding high-frequency information extracted from natural high-resolution images in the training dataset. The selection of the high-frequency information from the training dataset is accomplished in two steps: a nearest-neighbor search algorithm, which can be implemented on the GPU, selects the closest images from the training dataset, and a sparse-representation algorithm estimates the weight parameters used to combine the high-frequency information of the selected images. This simple but very powerful super-resolution algorithm produces state-of-the-art results. Qualitatively and quantitatively, we demonstrate that the proposed algorithm outperforms existing state-of-the-art super-resolution algorithms.
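The first of the two steps, the nearest-neighbor search over the training dataset, can be sketched as below with images or patches flattened to feature vectors; names and shapes are illustrative, not the paper's code.

```python
import numpy as np

def nearest_patches(query, train, k=2):
    """Return indices of the k training vectors closest to the query
    (squared Euclidean distance). A GPU version would evaluate the same
    distances for all training vectors in parallel."""
    d = np.sum((train - query[None, :]) ** 2, axis=1)
    return np.argsort(d)[:k]

# Four toy training vectors; find the two nearest to the query.
train = np.array([[0., 0.], [1., 1.], [5., 5.], [1., 0.]])
idx = nearest_patches(np.array([0.9, 0.9]), train, k=2)
```

The second step would then solve a sparse-coding problem over only these k neighbors, which keeps the weight estimation small and fast.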
Support Vector Machine–Based Prediction System for a Football Match Result (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Grychtol B. et al.: 3D EIT image reconstruction with GREIT. (Hauke Sann)
Swisstom Scientific Library; 16th International Conference on Biomedical Applications of Electrical Impedance Tomography, Neuchâtel Switzerland, June 2-5, 2015
Adaptive threshold for moving objects detection using Gaussian mixture model (TELKOMNIKA JOURNAL)
Moving object detection is an important task in video surveillance systems. Automatically defining the threshold that separates a moving object from the background within a video is challenging. This study proposes the Gaussian mixture model (GMM) as a thresholding strategy for moving object detection. The performance of the proposed method is compared to the Otsu algorithm and the gray threshold as baseline methods, using mean square error (MSE) and peak signal-to-noise ratio (PSNR), on a human video dataset. The average MSE of GMM is 257.18, versus 595.36 for Otsu and 645.39 for the gray threshold, so GMM yields the lowest MSE. The average PSNR of GMM is 24.71, versus 20.66 for Otsu and 19.35 for the gray threshold, so GMM yields the highest PSNR. The proposed method therefore outperforms the baseline methods in terms of detection error.
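The MSE and PSNR scores used in that comparison are standard and can be computed as follows (a generic sketch, with an assumed peak value of 255 for 8-bit masks).

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of the same shape."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the detected mask is
    closer to the reference, matching how the study ranks the methods."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

# Reference foreground mask vs. a detection that misses one pixel.
ref = np.array([[0, 255], [255, 0]], dtype=np.uint8)
det = np.array([[0, 255], [0, 0]], dtype=np.uint8)
```

Because PSNR is a log transform of MSE against a fixed peak, the two metrics always rank methods consistently: lower MSE implies higher PSNR.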
Development of algorithm for identification of malignant growth in cancer usin... (IJECEIAES)
The precise identification and characterization of small pulmonary nodules in low-dose CT is a necessary requirement for effective lung cancer screening, so an automated tool is needed to detect pulmonary nodules in low-dose CT at an early stage. Various algorithms have been proposed in the past, but prediction accuracy remains a challenge. In this work, an artificial neural network based methodology is proposed to find irregular growth of lung tissues, with a high probability of detection as the goal for an accurate automated tool. The best feature sets are derived from the Haralick gray-level co-occurrence matrix and used as the dimension-reduction step before feeding the neural network. A binary classifier neural network is proposed to identify the normal images among all the images. The potential of the proposed network is quantitatively evaluated using a confusion matrix and reported in terms of accuracy.
This lecture provides an introduction to recurrent neural networks, which include a layer whose hidden state depends on its own value at the previous time-step.
These slides were used in the Master in Computer Vision Barcelona 2019/2020, in the Module 6 dedicated to Video Analysis.
http://pagines.uab.cat/mcv/
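The defining recurrence of such a layer, h_t = tanh(Wxh·x_t + Whh·h_{t-1} + b), fits in a few lines of NumPy; the weights below are toy values, not trained parameters.

```python
import numpy as np

def rnn_forward(xs, Wxh, Whh, bh):
    """Minimal recurrent layer: the hidden state at step t depends on the
    input x_t and on its own value at step t-1."""
    h = np.zeros(Whh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # recurrence over time
        states.append(h)
    return np.stack(states)

# Toy sequence of three 2-d inputs into a 2-unit recurrent layer.
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
Wxh = np.eye(2) * 0.5
Whh = np.eye(2) * 0.5
bh = np.zeros(2)
H = rnn_forward(xs, Wxh, Whh, bh)
```

Because h feeds back into itself, information from early inputs persists (attenuated) in later hidden states, which is the property the lecture's video-analysis applications exploit.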
AI optimizing HPC simulations (presentation from 6th EULAG Workshop) (byteLAKE)
See our presentation from the 6th International EULAG Users Workshop. We talked about taking HPC into "Industry 4.0" by implementing smart techniques to optimize codes in terms of performance and energy consumption. The talk explains how machine learning can dynamically optimize HPC simulations and presents byteLAKE's software auto-tuning solution.
Find out more about byteLAKE at: www.byteLAKE.com
Prof. Alba presented parallel biological sequence alignment with the Smith-Waterman algorithm, along with CUDAlign, our fine-grained multi-GPU strategy. This work is part of a research project at the University of Brasília.
Chap 8. Optimization for training deep models (Young-Geun Choi)
Internal lab seminar material: a summary of and excerpts from Chapter 8 of Goodfellow et al. (2016), Deep Learning, MIT Press. It introduces the methods commonly used to optimize the objective function when training deep neural network models.
Restricting the Flow: Information Bottlenecks for Attribution (taeseon ryu)
Our 101st video: a review of the paper "Restricting the Flow: Information Bottlenecks for Attribution", presented by Junho Kim of the Fundamentals team.
This paper is about explainable AI (XAI); we hope it is helpful to everyone interested in the topic. The method uses an attribution map to directly trace the gradients of the network that influenced the output, producing a visual explanation. Junho Kim walks through a detailed review from the ground up.
As always, thank you for your interest and support!
Scaling Deep Learning Algorithms on Extreme Scale Architectures (inside-BigData.com)
In this video from the MVAPICH User Group, Abhinav Vishnu from PNNL presents: Scaling Deep Learning Algorithms on Extreme Scale Architectures.
"Deep Learning (DL) is ubiquitous. Yet leveraging distributed memory systems for DL algorithms is incredibly hard. In this talk, we will present approaches to bridge this critical gap. We will start by scaling DL algorithms on large scale systems such as leadership class facilities (LCFs). Specifically, we will: 1) present our TensorFlow and Keras runtime extensions which require negligible changes in user-code for scaling DL implementations, 2) present communication-reducing/avoiding techniques for scaling DL implementations, 3) present approaches on fault tolerant DL implementations, and 4) present research on semi-automatic pruning of DNN topologies. Our results will include validation on several US supercomputer sites such as Berkeley's NERSC, Oak Ridge Leadership Class Facility, and PNNL Institutional Computing. We will provide pointers and discussion on the general availability of our research under the umbrella of Machine Learning Toolkit on Extreme Scale (MaTEx) available at http://github.com/matex-org/matex."
Watch the video: https://wp.me/p3RLHQ-hnZ
This is the presentation for the paper "Fractional Step Discriminant Pruning: A Filter Pruning Framework for Deep Convolutional Neural Networks", delivered by N. Gkalelis and V. Mezaris at the 7th IEEE Int. Workshop on Mobile Multimedia Computing (MMC2020) that was held as part of the IEEE Int. Conf. on Multimedia and Expo (ICME), in July 2020.
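The paper's fractional-step discriminant criterion is more involved, but the keep/drop mechanics of filter pruning can be illustrated with the common L1-magnitude ranking; this sketch is a generic stand-in, not the paper's method.

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Rank conv filters by L1 norm and keep the strongest ones. The paper's
    discriminant criterion would replace this ranking score, but the
    keep/drop mechanics are the same.

    weights: (num_filters, C, k, k) conv-layer weight tensor.
    Returns the indices of kept filters (sorted) and the pruned tensor.
    """
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return keep, weights[keep]

w = np.zeros((4, 1, 3, 3))
w[0] += 1.0   # strong filter
w[2] += 0.5   # medium filter
keep, pruned = prune_filters(w, keep_ratio=0.5)
```

After pruning, the next layer's input channels must be sliced with the same `keep` indices so the shapes stay consistent through the network.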
The asynchronous parallel algorithms are developed to solve massive optimization problems in a distributed data system, which can be run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully implemented to solve a range of difficult problems in practice. However, the existing theories are mostly based on fairly restrictive assumptions on the delays, and cannot explain the convergence and speedup properties of such algorithms. In this talk we will give an overview on distributed optimization, and discuss some new theoretical results on the convergence of asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data will be used to demonstrate the practical implication of these theoretical results.
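The effect of gradient staleness is easy to simulate: the toy loop below applies, at each step, a gradient computed several iterations earlier, mimicking a slow worker, and still converges on a simple quadratic when the step size is small. This illustrates the phenomenon only; it is not the algorithm analyzed in the talk.

```python
import numpy as np

def delayed_sgd(grad, x0, lr=0.1, delay=3, steps=200):
    """Caricature of asynchronous-parallel SGD: each update applies a
    gradient that was computed `delay` iterations ago (a stale worker)."""
    xs = [np.array(x0, dtype=float)]
    for t in range(steps):
        stale = xs[max(0, t - delay)]     # parameters the gradient was read from
        xs.append(xs[-1] - lr * grad(stale))
    return xs[-1]

# Minimize f(x) = ||x||^2 / 2, whose gradient is simply x.
x_final = delayed_sgd(lambda x: x, x0=[4.0, -2.0], lr=0.1, delay=3, steps=200)
```

With a larger step size or a longer delay the same recursion can oscillate or diverge, which is exactly the step-size/delay trade-off the convergence theory has to quantify.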
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-chiu
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Matthew Chiu, Founder of Almond AI, presents the "Designing CNN Algorithms for Real-time Applications" tutorial at the May 2017 Embedded Vision Summit.
The real-time performance of CNN-based applications can be improved several-fold by making smart decisions at each step of the design process – from the selection of the machine learning framework and libraries used to the design of the neural network algorithm to the implementation of the algorithm on the target platform. This talk delves into how to evaluate the runtime performance of a CNN from a software architecture standpoint. It then explains in detail how to build a neural network from the ground up based on the requirements of the target hardware platform.
Chiu shares his ideas on how to improve performance without sacrificing accuracy, by applying recent research on training very deep networks. He also shows examples of how network optimization can be achieved at the algorithm design level by making a more efficient use of weights before the model is compressed via more traditional methods for deployment in a real-time application.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
The ability to automatically mine and extract useful information from large datasets has been a common concern for organizations over the last few decades. Data on the internet is growing rapidly, and with it the capacity to collect and store very large datasets. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets, and the development of accurate and fast data classification algorithms for very large-scale datasets is still a challenge. In this paper, various algorithms and techniques, in particular an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving minimum sum-of-squares clustering problems in very large datasets. This research also develops an accurate and real-time L2-DC algorithm based on the incremental approach to solve the minimum sum-of-squares clustering problem.
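The minimum sum-of-squares objective, and one greedy incremental step of the kind incremental approaches build on, can be sketched as follows (a simplified 1-D illustration under our own assumptions, not the paper's L2-DC algorithm):

```python
def sse(points, centers):
    """Minimum sum-of-squares clustering objective: each point is charged
    its squared distance to the nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def incremental_center(points, centers, candidates):
    """Greedy incremental step: pick the candidate that, added as a new
    center, reduces the clustering objective the most."""
    return min(candidates, key=lambda c: sse(points, centers + [c]))

# Two well-separated 1-D clusters (toy data)
pts = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]
c1 = [sum(pts) / len(pts)]                 # start with the global mean
c2 = incremental_center(pts, c1, pts)      # add one center incrementally
```

Each added center can only decrease the objective, which is why incremental schemes give a useful sequence of solutions for k = 1, 2, 3, ... on large datasets.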
A quantum-inspired optimization heuristic for the multiple sequence alignment...Konstantinos Giannakis
Slides from the presentation of "A quantum-inspired optimization heuristic for the multiple sequence alignment problem in bio-computing" in IISA 2019 conference.
Fuzzy Entropy Based Optimal Thresholding Technique for Image Enhancement ijsc
Soft computing is likely to play a progressively important role in many applications, including image enhancement. The paradigm for soft computing is the human mind, and the soft computing critique has been particularly strong with fuzzy logic. Fuzzy logic represents facts as rules for the management of uncertainty. In this paper, the multi-dimensional optimization problem is addressed by discussing optimal thresholding using fuzzy entropy for image enhancement. The technique is compared with bi-level and multi-level thresholding, and optimal threshold values are obtained for different levels of speckle-noisy and low-contrast images. The fuzzy entropy method produced better results than the bi-level and multi-level thresholding techniques.
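To illustrate entropy-based threshold selection, here is a sketch of the classic Kapur (crisp-entropy) variant in plain Python; the paper's fuzzy-entropy formulation replaces crisp histogram probabilities with fuzzy memberships, which this simplified stand-in does not reproduce:

```python
import math
from collections import Counter

def kapur_threshold(pixels, levels=256):
    """Pick the threshold maximizing the sum of Shannon entropies of the
    background (< t) and foreground (>= t) gray-level distributions."""
    hist = Counter(pixels)
    p = [hist.get(g, 0) / len(pixels) for g in range(levels)]
    best_t, best_h = 0, -math.inf
    for t in range(1, levels):
        w0 = sum(p[:t])            # background mass
        w1 = 1.0 - w0              # foreground mass
        if w0 <= 0 or w1 <= 0:
            continue
        h0 = -sum(q / w0 * math.log(q / w0) for q in p[:t] if q > 0)
        h1 = -sum(q / w1 * math.log(q / w1) for q in p[t:] if q > 0)
        if h0 + h1 > best_h:
            best_t, best_h = t, h0 + h1
    return best_t

# Bimodal toy "image": dark background around 40-45, bright object around 200-205
img = [40] * 50 + [45] * 30 + [200] * 40 + [205] * 20
t = kapur_threshold(img)
```

On this bimodal toy histogram the selected threshold falls in the gap between the two modes, separating background from object.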
Objective Evaluation of a Deep Neural Network Approach for Single-Channel Spe...csandit
Single-channel speech intelligibility enhancement is much more difficult than multi-channel intelligibility enhancement. It has recently been reported that machine-learning training-based single-channel speech intelligibility enhancement algorithms perform better than traditional algorithms. In this paper, the performance of a recently proposed deep neural network method using a multiresolution cochleagram feature set for single-channel speech intelligibility enhancement is evaluated. Various conditions, such as different speakers for training and testing as well as different noise conditions, are tested. Simulations and objective test results show that the method performs better than another recently proposed deep neural network setup for the same task, and leads to more robust convergence compared to a recently proposed Gaussian mixture model approach.
Sports Video Classification: Classification of Strokes in Table Tennis for Me...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper2.pdf
YouTube: https://youtu.be/-bRL868b8ys
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Péteri, Laurent Mascarilla, Jordan Calandre and Julien Morlier : Sports Video Classification: Classification of Strokes in Table Tennis for MediaEval 2020. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Fine-grained action classification has raised new challenges compared to classical action classification problems. Sport video analysis is a very popular research topic, due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to the analysis of athletes' performances. Running since 2019 as part of MediaEval, we offer a task which consists in classifying table tennis strokes from videos recorded in natural conditions at the University of Bordeaux. The aim is to build tools for teachers, coaches and players to analyse table tennis games. Such tools could lead to automatic profiling of players and adaptation of their training to improve their sport skills more efficiently.
Presented by: Pierre-Etienne Martin
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper61.pdf
YouTube: https://youtu.be/brmI4g3jLS4
Ricardo Kleinlein, Cristina Luna-Jiménez, Fernando Fernández-Martínez and Zoraida Callejas : Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention and LSTM Models. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper reports on the GTH-UPM team experience in the Predicting Media Memorability task at MediaEval 2020. Teams were requested to predict memorability scores at both short-term and long-term, understanding such score as a measure of whether a video was perdurable in a viewer's memory or not. Our proposed system relies on a late fusion of the scores predicted by three sequential models, each trained over a different modality: video captions, aural embeddings and visual optical flow-based vectors. Whereas single-modality models show a low or zero Spearman correlation coefficient value, their combination considerably boosts performance over development data up to 0.2 in the short-term memorability prediction subtask and 0.19 in the long-term subtask. However, performance over test data drops to 0.016 and -0.041, respectively.
Presented by: Ricardo Kleinlein
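The late-fusion idea described above, averaging per-modality score lists and evaluating with Spearman correlation, can be sketched in plain Python (the weights and data here are illustrative, not the GTH-UPM system's):

```python
def ranks(xs):
    """Rank values (1-based), averaging ranks over tie groups."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average rank for the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

def late_fusion(score_lists, weights=None):
    """Weighted average of per-model memorability scores per video."""
    weights = weights or [1 / len(score_lists)] * len(score_lists)
    return [sum(w * s[i] for w, s in zip(weights, score_lists))
            for i in range(len(score_lists[0]))]
```

With three modality-specific score lists (captions, audio, optical flow), `late_fusion` produces the combined prediction whose Spearman correlation with the ground-truth scores is the task metric.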
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Taskmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper52.pdf
Janadhip Jacutprakart, Rukiye Savran Kiziltepe, John Q. Gan, Giorgos Papanastasiou and Alba G. Seco de Herrera : Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present the methods and main results of the Essex NLIP team's participation in the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that can predict short-term and long-term memorability scores on the real-world video samples provided. Our approach focuses on colour-based visual features as well as on the video annotation metadata; in addition, hyper-parameter tuning was explored. Despite the simplicity of the methodology, our approach achieves competitive results. We investigated the use of different visual features and assessed memorability prediction performance with various regression models, choosing Random Forest regression as our final model to predict the memorability of videos.
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper6.pdf
YouTube: https://youtu.be/ySGGu_4vaxs
Alba García Seco De Herrera, Rukiye Savran Kiziltepe, Jon Chamberlain, Mihai Gabriel Constantin, Claire-Hélène Demarty, Faiyaz Doctor, Bogdan Ionescu and Alan F. Smeaton : Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a Video Memorable? Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes the MediaEval 2020 Predicting Media Memorability task. After first being proposed at MediaEval 2018, the Predicting Media Memorability task is in its 3rd edition this year, as the prediction of short-term and long-term video memorability (VM) remains a challenging task. In 2020, the format remained the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video to Text dataset, containing more action-rich video content as compared with the 2019 task. This paper describes the main aspects of the task, including its main characteristics, the collection, the ground truth dataset, the evaluation metrics and the requirements for run submission.
Presented by: Rukiye Savran Kiziltepe
Fooling an Automatic Image Quality Estimatormultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper45.pdf
Benoit Bonnet, Teddy Furon and Patrick Bas : Fooling an Automatic Image Quality Estimator. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper we present our work on the 2020 MediaEval task "Pixel Privacy: Quality Camouflage for Social Images". Blind Image Quality Assessment (BIQA) is a classifier that returns a quality score for any given image. Our task is to modify an image to decrease its BIQA score while maintaining a good perceived quality. Since BIQA is a deep neural network, we took an adversarial attack approach to the problem.
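The gradient-based attack idea can be sketched on a toy differentiable "quality" function; the real task attacks a BIQA network, which this stand-in does not model, and the step/clip scheme below is only an FGSM-style illustration:

```python
def quality(x):
    """Toy differentiable 'quality' score that rewards smooth signals
    (a stand-in for a BIQA network, which we do not reproduce here)."""
    return 100.0 - sum((a - b) ** 2 for a, b in zip(x[1:], x[:-1]))

def grad_quality(x):
    """Analytic gradient of the toy score with respect to each pixel."""
    g = [0.0] * len(x)
    for i in range(1, len(x)):
        d = x[i] - x[i - 1]
        g[i] += -2.0 * d          # d/dx_i of -(x_i - x_{i-1})^2
        g[i - 1] += 2.0 * d
    return g

def attack(x, eps=0.1, steps=10):
    """Iterative FGSM-style attack: step against the score's gradient sign,
    clipping the total perturbation to an L-infinity ball of radius eps."""
    step = eps / steps
    adv = list(x)
    for _ in range(steps):
        g = grad_quality(adv)
        adv = [min(xi + eps, max(xi - eps, ai - step * ((gi > 0) - (gi < 0))))
               for xi, ai, gi in zip(x, adv, g)]
    return adv

img = [0.50, 0.52, 0.48, 0.50, 0.51, 0.49]   # toy "image row" in [0, 1]
adv = attack(img)
```

The perturbation stays small (bounded by eps per pixel) while the score drops, which is the trade-off between attack strength and perceived quality the paper discusses.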
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper16.pdf
YouTube: https://youtu.be/ix_b9K7j72w
Zhengyu Zhao : Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable Color Filter. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the submission of our RU-DS team to the Pixel Privacy Task 2020. We propose to fool the blind image quality assessment model by transforming images based on optimizing a human-understandable color filter. In contrast to the common work that relies on small, $L_p$-bounded additive pixel perturbations, our approach yields large yet smooth perturbations. Experimental results demonstrate that in the specific context of this task, our approach is able to achieve strong adversarial effects, but has to sacrifice the image appeal.
Presented by: Zhengyu Zhao
Pixel Privacy: Quality Camouflage for Social Imagesmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper77.pdf
YouTube: https://youtu.be/8Rr4KknGSac
Zhuoran Liu, Zhengyu Zhao, Martha Larson and Laurent Amsaleg : Pixel Privacy: Quality Camouflage for Social Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
High-quality social images shared online can be misappropriated for unauthorized goals, where the quality filtering step is commonly carried out by automatic Blind Image Quality Assessment (BIQA) algorithms. Pixel Privacy benchmarks privacy-protective approaches that protect privacy-sensitive images against unethical computer vision algorithms. In the 2020 task, participants are encouraged to develop camouflage methods that can effectively decrease the BIQA quality score of high-quality images while maintaining image appeal. The camouflage needs to be either imperceptible to the human eye or act as a visible enhancement.
Presented by: Zhuoran Liu
Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attentio...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper72.pdf
Sabarinathan D and Suganya Ramamoorthy : Efficient Supervision Net: Polyp Segmentation using EfficientNet and Attention Unit. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Colorectal cancer is the third most common cancer worldwide, and identifying it in its early stages remains a challenging problem; the risk of colorectal cancer can be reduced by early diagnosis of polyps during a colonoscopy. Motivated by this, the main objective of this paper is to develop a multi-supervision net algorithm for segmenting polyps on a comprehensive dataset. The disease and its symptoms vary widely, requiring a continuous update of knowledge from doctors and medical analysts; the diseases fall into different categories, and a small variation in symptoms may lead to a higher rate of risk. We use the Medico polyp challenge dataset, which consists of 1000 segmented polyp images from the gastrointestinal tract. We use EfficientNet-B4 as the pre-trained architecture in the multi-supervision net, and the model is trained with multiple output layers. We present quantitative results on the colorectal dataset to evaluate the performance and achieve good results on all performance metrics. The experimental results show that the proposed model is robust and segments polyps with a good level of accuracy in terms of metrics such as Dice coefficient, recall, precision and F2.
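The reported metrics can all be computed from pixel-level overlap counts between the predicted and ground-truth masks; a small sketch for flat binary masks (our own helper, not the paper's code):

```python
def seg_metrics(pred, gt, beta=2.0):
    """Pixel-level overlap metrics for binary masks (flat 0/1 lists):
    Dice, Jaccard (IoU), precision, recall, and F-beta (F2 favours recall)."""
    tp = sum(p and g for p, g in zip(pred, gt))          # true positives
    fp = sum(p and not g for p, g in zip(pred, gt))      # false positives
    fn = sum(g and not p for p, g in zip(pred, gt))      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    b2 = beta ** 2
    fbeta = ((1 + b2) * precision * recall / (b2 * precision + recall)
             if precision + recall else 0.0)
    return {"dice": dice, "jaccard": jaccard, "precision": precision,
            "recall": recall, "f2": fbeta}

metrics = seg_metrics(pred=[1, 1, 0, 0], gt=[1, 0, 1, 0])
```

Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), which is why papers often report either one; F2 weights recall higher than precision, reflecting that missing a polyp is costlier than a false alarm.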
Deep Conditional Adversarial learning for polyp Segmentationmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper22.pdf
Debapriya Banik and Debotosh Bhattacharjee : Deep Conditional Adversarial learning for polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This approach addresses the Medico automatic polyp segmentation challenge, which is part of MediaEval 2020. We propose a deep conditional adversarial learning based network for the automatic polyp segmentation task. The network comprises two interdependent models, a generator and a discriminator. The generator is an FCN employed to predict the polyp mask, while the discriminator enforces the segmentation to be as similar as possible to the real segmented mask (ground truth). Our proposed model achieved competitive results on the test dataset provided by the challenge organizers.
A Temporal-Spatial Attention Model for Medical Image Detectionmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper21.pdf
Hwang Maxwell, Wu Cai, Hwang Kao-Shing, Xu Yong Si and Wu Chien-Hsing : A Temporal-Spatial Attention Model for Medical Image Detection. Proc. of MediaEval 2020, 14-15 December 2020, Online.
A local region model with attentive temporal-spatial pathways is proposed for automatically learning various target structures. The attentive spatial pathway highlights the salient region to generate bounding boxes and ignores irrelevant regions in an input image. The proposed attention mechanism allows efficient object localization, and the overall predictive performance is increased because there are fewer false positives in the object detection task for medical images with manual annotations. The experimental results show that the proposed models consistently increase the base architectures' predictive performance for different datasets and training sizes without undue computational cost.
HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Netw...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper20.pdf
YouTube: https://youtu.be/CVelQl5Luf0
Quoc-Huy Trinh, Minh-Van Nguyen, Thiet-Gia Huynh and Minh-Triet Tran : HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Network and UNet for Polyps Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The Medico Multimedia Task focuses on developing an efficient and accurate framework for computer-aided diagnosis systems that automatically segment all types of polyps in endoscopic images of the gastrointestinal (GI) tract. We, the HCMUS team, approach the task with a solution that combines Residual modules, Inception modules, and an adaptive convolutional neural network with the UNet model and PraNet to semantically segment all types of polyps in endoscopic images. We submit multiple runs with different architectures and parameters in our model. Our methods show promising accuracy and efficiency across multiple experiments.
Fine-tuning for Polyp Segmentation with Attentionmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper15.pdf
Rabindra Khadka : Transfer of Knowledge: Fine-tuning for Polyp Segmentation with Attention. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper describes how the transfer of prior knowledge can effectively take on segmentation tasks with the help of attention mechanisms. A UNet model pretrained on a brain MRI dataset was fine-tuned on the polyp dataset. An attention mechanism was integrated to focus on relevant regions in the input images. The implemented architecture is evaluated on 200 validation images using the intersection over union and Dice score between ground truth and predicted regions. The model demonstrates promising results with computational efficiency.
Bigger Networks are not Always Better: Deep Convolutional Neural Networks for...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper12.pdf
Adrian Krenzer and Frank Puppe : Bigger Networks are not Always Better: Deep Convolutional Neural Networks for Automated Polyp Segmentation. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents our team's (AI-JMU) approach to the Medico automated polyp segmentation challenge. We consider deep convolutional neural networks to be well suited for this task. To determine the best architecture, we test and compare state-of-the-art backbones and two different heads. Finally, we achieve a Jaccard index of 73.74\% on the challenge test set. We further demonstrate that bigger networks do not always perform better; however, growing network size always increases computational complexity.
Insights for wellbeing: Predicting Personal Air Quality Index using Regressio...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper51.pdf
Amel Ksibi, Amina Salhi, Ala Alluhaidan and Sahar A. El-Rahman : Insights for wellbeing: Predicting Personal Air Quality Index using Regression Approach. Proc. of MediaEval 2020, 14-15 December 2020, Online.
Providing air pollution information to individuals enables them to understand the air quality of their living environments; the association between people's wellbeing and the properties of the surrounding environment is thus an essential area of investigation. This paper proposes predicting air quality by harvesting public/open data and leveraging them to obtain a Personal Air Quality Index. Such data are usually incomplete, so to cope with missing data we applied the KNN imputation method. To predict the Personal Air Quality Index, we apply a voting regression approach based on three base regressors: a Gradient Boosting regressor, a Random Forest regressor, and a linear regressor. Evaluating the experimental results using the RMSE metric, we obtained an average score of 35.39 for Walker and 51.16 for Car.
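A minimal sketch of KNN imputation over pollutant rows (hypothetical readings; the paper's exact distance measure and choice of k are not specified here):

```python
import math

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows,
    where distance uses only the columns both rows have observed."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is None:
                donors = sorted(
                    (dist(row, other), other[j])
                    for other in rows
                    if other is not row and other[j] is not None)
                vals = [x for _, x in donors[:k]]     # k nearest donor values
                if vals:
                    filled[i][j] = sum(vals) / len(vals)
    return filled

# PM2.5, NO2, O3 readings with one gap (hypothetical values)
data = [[12.0, 30.0, 40.0], [13.0, 31.0, None],
        [40.0, 80.0, 90.0], [12.5, 29.0, 41.0]]
out = knn_impute(data)
```

The missing O3 value is filled from the two rows with the most similar PM2.5/NO2 readings, not from the distant high-pollution row, which is the point of using neighbours rather than a global column mean.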
Use Visual Features From Surrounding Scenes to Improve Personal Air Quality ...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper40.pdf
YouTube: https://youtu.be/SL5Hvu1mARY
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Use Visual Features From Surrounding Scenes to Improve Personal Air Quality Data Prediction Performance. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index in an area by combining the levels of the following pollutants: PM2.5, NO2, and O3, measured at the weather stations near that area, with photos of surrounding scenes taken in that area. Our approach uses the Inverse Distance Weighted (IDW) technique to estimate the missing air pollutant levels, and then uses regression to integrate visual features from the photos to refine the predicted values. From those values we can calculate the Air Quality Index (AQI). The results show that the proposed method may not improve prediction performance in some cases.
Personal Air Quality Index Prediction Using Inverse Distance Weighting Methodmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper39.pdf
YouTube: https://youtu.be/3r_oSguFPVM
Trung-Quan Nguyen, Dang-Hieu Nguyen and Loc Tai Tan Nguyen : Personal Air Quality Index Prediction Using Inverse Distance Weighting Method. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we propose a method to predict the personal air quality index in an area by only using the levels of the following pollutants: PM2.5, NO2, O3. All of them are measured from the nearby weather stations of that area. Our approach uses one of the most well-known interpolation methods in spatial analysis, the Inverse Distance Weighted (IDW) technique, to estimate the missing air pollutant levels. After that, we can use those levels to calculate the Air Quality Index (AQI). The results show that the proposed method is suitable for the prediction of those air pollutant levels.
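The IDW estimate used in both of these runs can be sketched in a few lines (hypothetical station coordinates and pollutant values; the power parameter is the conventional default of 2):

```python
def idw(stations, target, power=2):
    """Inverse Distance Weighting: estimate a value at target (x, y)
    from (x, y, value) station triples. Nearer stations get larger
    weights 1/d^power; exactly at a station, its value is returned."""
    num = den = 0.0
    for x, y, v in stations:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return v                       # standing on a station
        w = 1.0 / d2 ** (power / 2)
        num += w * v
        den += w
    return num / den

# PM2.5 from three hypothetical stations around a walker at (0, 0)
pm25 = idw([(1, 0, 10.0), (0, 2, 20.0), (-3, 0, 30.0)], (0, 0))
```

The interpolated value always lies within the range of the station values and is pulled toward the nearest station, which makes IDW a reasonable gap-filler for missing pollutant levels before the AQI is computed.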
Overview of MediaEval 2020 Insights for Wellbeing: Multimodal Personal Health...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper11.pdf
YouTube: https://youtu.be/fBPuacAZkxs
Minh-Son Dao, Peijiang Zhao, Thanh Nguyen, Thanh Binh Nguyen, Duc Tien Dang Nguyen and Cathal Gurrin : Overview of MediaEval 2020 Insights for Wellbeing: Multimodal Personal Health Lifelog Data Analysis. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper provides a description of the MediaEval 2020 “Multimodal Personal Health Lifelog Data Analysis" task. The purpose of this task is to develop approaches that process environmental data to obtain insights about personal wellbeing. Establishing the association between people's wellbeing and properties of the surrounding environment is vital for numerous research areas. Our task focuses on the internal associations of heterogeneous data. Participants create systems that derive insights important for health and wellbeing from multimodal lifelog data to tackle two challenging subtasks. The first subtask is to investigate whether public/open data can be used to predict personal air pollution data. The second is to develop approaches to predict a personal air quality index (AQI) using images captured by people (plus GAQD). This task targets (but is not limited to) researchers in the areas of multimedia information retrieval, machine learning, AI, data science, event-based processing and analysis, multimodal multimedia content analysis, lifelog data analysis, urban computing, environmental science, and atmospheric science.
Presented by: Peijiang Zhao
Ensemble based method for the classification of flooding event using social m...multimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper37.pdf
YouTube: https://youtu.be/4ROoOzdQzEI
Muhammad Hanif, Huzaifa Joozer, Muhammad Atif Tahir and Muhammad Rafi : Ensemble based method for the classification of flooding event using social media data. Proc. of MediaEval 2020, 14-15 December 2020, Online.
This paper presents the method proposed and implemented by team FAST-NU-DS in "The Flood-related Multimedia Task at MediaEval 2020". The task provides tweets in Italian, posted during floods between 2017 and 2019. The proposed method uses the text of each tweet and its associated image for binary classification: deciding whether or not a particular tweet is about a flood incident. We designed an ensemble-based method that classifies tweets on the basis of textual data, visual data, and their combination. For visual data, the method uses data augmentation to oversample the minority class and stratified random sampling to select the input, employing the Visual Geometry Group (VGG16) convolutional neural network pretrained on ImageNet and Places365. For textual data, Term Frequency-Inverse Document Frequency (TF-IDF) is used for feature representation and a Multinomial Naive Bayes classifier predicts the class. The image and text predictions are combined for the final prediction of each instance. Evaluation of the method yielded F1-scores of 36.31%, 20.76% and 27.86% for text, image, and the combination of both, respectively.
Presented by: Muhammad Hanif
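The text branch of such a pipeline (term-frequency features with a Multinomial Naive Bayes classifier) can be sketched in plain Python; this toy version uses raw token counts rather than TF-IDF weights, and the example tweets are invented, not taken from the task data:

```python
import math
from collections import Counter

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc.lower().split())
        self.vocab = set(w for c in self.classes for w in self.counts[c])
        return self

    def predict(self, doc):
        def loglik(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / total)
                for w in doc.lower().split() if w in self.vocab)
        return max(self.classes, key=loglik)

# Tiny toy corpus of hypothetical Italian tweets
train = ["alluvione in citta strade allagate",
         "fiume esondato case evacuate",
         "bella giornata di sole",
         "partita di calcio stasera"]
y = ["flood", "flood", "other", "other"]
clf = MultinomialNB().fit(train, y)
```

Replacing the raw counts with TF-IDF weights, as the paper does, only changes the feature values fed to the classifier; the smoothing and log-likelihood scoring stay the same.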
Flood Detection via Twitter Streams using Textual and Visual Featuresmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper35.pdf
Firoj Alam, Zohaib Hassan, Kashif Ahmad, Asma Gul, Michael Reiglar, Nicola Conci and Ala Al-Fuqaha : Flood Detection via Twitter Streams using Textual and Visual Features. Proc. of MediaEval 2020, 14-15 December 2020, Online.
The paper presents our proposed solutions for the MediaEval 2020 Flood-Related Multimedia Task, which aims to analyze and detect flooding events in multimedia content shared over Twitter. In total, we proposed four different solutions: a multi-modal solution combining textual and visual information for the mandatory run, and three single-modal image- and text-based solutions as optional runs. In the multi-modal method, we rely on a supervised multimodal bitransformer model that combines textual and visual features in an early fusion, achieving a micro F1-score of .859 on the development data set. For text-based flood event detection, we use a transformer network (i.e., a pretrained Italian BERT model), achieving an F1-score of .853. For the image-based solutions, we employed multiple deep models, pre-trained on both the ImageNet and Places data sets, individually and combined in an early fusion, achieving F1-scores of .816 and .805 on the development set, respectively.
Floods Detection in Twitter Text and Imagesmultimediaeval
Paper: http://ceur-ws.org/Vol-2882/paper34.pdf
YouTube: https://youtu.be/3f_Q1WeulbI
Naina Said, Kashif Ahmad, Asma Gul, Nasir Ahmad and Ala Al-Fuqaha : Floods Detection in Twitter Text and Images. Proc. of MediaEval 2020, 14-15 December 2020, Online.
In this paper, we present our methods for the MediaEval 2020 Flood Related Multimedia task, which aims to analyze and combine textual and visual content from social media for the detection of real-world flooding events. The task mainly focuses on identifying flood-related tweets relevant to a specific area. We propose several schemes to address the challenge. For text-based flood event detection, we use three different methods, relying on Bag of Words (BoW) and an Italian version of BERT, individually and in combination, achieving F1-scores of 0.77, 0.68, and 0.70 on the development set, respectively. For the visual analysis, we rely on features extracted via multiple state-of-the-art deep models pre-trained on ImageNet. The extracted features are then used to train multiple individual classifiers whose scores are combined in a late fusion manner, achieving an F1-score of 0.75. For our mandatory multi-modal run, we combine the classification scores obtained with the best textual and visual schemes in a late fusion manner. Overall, better results are obtained with the multimodal scheme, achieving an F1-score of 0.80 on the development set.
Presented by: Naina Said
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been gained using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space as they occur in their natural environment. Real-time visualization of biological processes in the context of an intact organism maintains physiological relevance and provides insights into the progression of disease, response to treatments, and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology provides all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system's unique features and user-friendly software enable researchers to probe fast, dynamic biological processes, such as immune cell tracking, cell-cell interaction, vascularization and tumor metastasis, in exceptional detail. This webinar also gives an overview of IVM in drug development, offering a view into the intricate interactions between drugs/nanoparticles and tissues in vivo and allowing the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Richard's adventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNAs)
miRNA
Length: 23-25 nt
Trans-acting
Binds the target mRNA with mismatches
Causes translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds the target mRNA with a perfectly complementary sequence
piRNA
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
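The perfect-complementarity requirement that distinguishes siRNA targeting (versus the mismatch-tolerant binding of miRNAs) can be sketched as a simple sequence check. This is an illustrative sketch, not from the slides; the guide and mRNA sequences below are hypothetical.

```python
# Illustrative sketch: siRNA-directed cleavage requires perfect
# complementarity between the guide strand and the target mRNA.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_cleavage_site(guide: str, mrna: str) -> int:
    """Index of a perfectly complementary target site, or -1 if absent."""
    return mrna.find(reverse_complement(guide))

# Hypothetical 21-nt guide strand and an mRNA containing its target site.
guide = "UGAGGUAGUAGGUUGUAUAGU"
mrna = "AAGC" + reverse_complement(guide) + "GGUA"
print(find_cleavage_site(guide, mrna))  # → 4
```

A single mismatch in the target site would make `find_cleavage_site` return -1, which is the siRNA/miRNA distinction in miniature.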
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers mRNA degradation in response to siRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1. PAZ (PIWI/Argonaute/Zwille) - recognizes the target mRNA.
2. PIWI (P-element induced wimpy testis) - breaks the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ∼ 50-200 pc, stellar masses of M⋆ ∼ 10^7-10^8 M⊙, and star-formation rates of SFR ∼ 0.1-1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
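As an illustration only, the fold/superfamily/family hierarchy described above can be pictured as a nested mapping. The entries below are simplified placeholders, not real SCOP identifiers.

```python
# Minimal picture of SCOP's hierarchy (fold > superfamily > family);
# names below are placeholders, not real SCOP entries.
scop = {
    "Globin-like fold": {
        "Globin-like superfamily": {
            "Globins": ["myoglobin", "hemoglobin alpha"],
        },
    },
}

def family_of(protein, db):
    """Walk the hierarchy and return the family containing `protein`."""
    for fold in db.values():
        for superfamily in fold.values():
            for family, members in superfamily.items():
                if protein in members:
                    return family
    return None

print(family_of("myoglobin", scop))  # → Globins
```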
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table Tennis Strokes Classification Task
1. Hai Nguyen-Truong, San Cao, Khoa N. A. Nguyen, Bang-Dang
Pham, Hieu Dao, Minh-Quan Le, Hoang-Phuc Nguyen-Dinh,
Hai-Dang Nguyen, and Minh-Triet Tran
Ensembles of Temporal Deep Neural Networks
for Table Tennis Strokes Classification Task
MediaEval 2020 Classification of Strokes in Table Tennis
{nthai18, ctsan18, nnakhoa18, pbdang18, dhieu}@apcs.vn
{lmquan, ndhphuc, nhdang}@selab.hcmus.edu.vn, tmtriet@fit.hcmus.edu.vn
Dec.15,2020
MediaEval 2020
11th Anniversary Workshop
14-15 December 2020
Sophia Antipolis, France
2. Run 04 - Late Temporal Modeling in 3D CNN [4]
1. Approach
● Late temporal modeling in 3D CNN Architectures with BERT
● Replace the conventional Temporal Global Average Pooling (TGAP) layer with
Bidirectional Encoder Representations from Transformers (BERT)
2. Training Config
● ResNeXt101[7] with 64 frames
3. Result
● 87.9% on the validation set
● 25.42% on the final test set
[4] M. Esat Kalfaoglu et al. “Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition”
[7] Saining Xie et al. “Aggregated Residual Transformations for Deep Neural Networks”
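The idea behind this run, replacing Temporal Global Average Pooling (TGAP) with a learned temporal model, can be contrasted in a toy NumPy sketch. This is not the authors' code: the features are random placeholders, and a single attention-style pooling stands in for the BERT layer used in the run.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 64, 512                        # frames per clip, feature dimension

# Per-frame features as produced by a 3D-CNN backbone (random stand-ins).
frame_features = rng.normal(size=(T, D))

# Temporal Global Average Pooling (TGAP): every frame weighted equally.
tgap = frame_features.mean(axis=0)

# Attention-style temporal pooling, a toy stand-in for the BERT layer:
# a query vector scores each frame, and softmax weights replace the
# uniform mean, so informative frames can dominate the clip feature.
query = rng.normal(size=D)
scores = frame_features @ query / np.sqrt(D)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
pooled = weights @ frame_features     # same shape as the TGAP output

print(tgap.shape, pooled.shape)       # → (512,) (512,)
```

Both poolings return one clip-level vector; the difference is only whether the frame weights are fixed (uniform) or learned.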
3. Run 08 - TSTCNN [6] with Multi-labeling and Ensembling
We reimplemented TSTCNN based on: https://github.com/P-eMartin/crisp
[6] Pierre-Etienne Martin et al. “Fine grained sport action recognition with Twin spatio-temporal convolutional neural networks: Application to table tennis”
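The ensembling mentioned in this run can be illustrated with a minimal score-averaging sketch, assuming each model outputs per-class probabilities for a segment. The numbers below are made-up toy values, not results from the paper.

```python
import numpy as np

def ensemble_predict(prob_stack):
    """Average per-class probabilities across models (axis 0)
    and return the index of the winning stroke class."""
    return int(np.argmax(np.asarray(prob_stack).mean(axis=0)))

# Toy example with 3 models and 4 classes (the task has 21 stroke classes).
probs = np.array([
    [0.6, 0.2, 0.1, 0.1],   # model A votes class 0
    [0.1, 0.5, 0.2, 0.2],   # model B votes class 1
    [0.5, 0.1, 0.2, 0.2],   # model C votes class 0
])
print(ensemble_predict(probs))  # → 0
```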
4. Run 06 - SlowFast Networks [5]
Background Removal Process using DensePose [1]
[1]: Rıza Alp Güler et al. “DensePose: Dense Human Pose Estimation In The Wild”
[5] Christoph Feichtenhofer et al. “SlowFast Networks for Video Recognition”
Sample Results
6. Run 03 - Baseline using the CSN [2] method
1. Approach
● Channel-Separated Convolutional Networks (CSN) [2]
● Pointwise 1×1×1 or depthwise 3×3×3 convolutions
reduce computational cost and increase accuracy.
2. Training Config
● ResNet3D [3] with Batch Normalization frozen.
3. Result
● 86.67% on validation dataset.
● 28.81% on the final result.
[2]: Du Tran et al. “Video Classification with Channel-Separated Convolutional Networks”
[3]: https://github.com/kenshohara/3D-ResNets-PyTorch
Our code was based on: https://github.com/open-mmlab/mmaction2
7. Run 07 - Improved CSN-based method and Ensembling
1. Training config
● Switch to Multi-label Classification.
● BCEWithLogitsLoss
2. Result
● 0.97 mAP (≈0.9 top-1) on the validation set.
● Stable learning on the first two label groups (Serve/Offensive/Defensive and Forehand/Backhand).
● 25.98% on the final result.
● 31.35% after the post-processing phase.
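The multi-label setup of this run can be sketched minimally, assuming one sigmoid output per label: a hand-rolled equivalent of PyTorch's BCEWithLogitsLoss and a per-group argmax decode back to stroke attributes. The label layout and the decoding rule are illustrative assumptions; the slides do not spell out the exact post-processing.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits,
    equivalent in spirit to PyTorch's BCEWithLogitsLoss."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

# Hypothetical label layout: 3 stroke-category bits + 2 side bits.
GROUPS = {"category": ["Serve", "Offensive", "Defensive"],
          "side": ["Forehand", "Backhand"]}

def decode(logits):
    """Argmax inside each label group (one possible reading of the
    post-processing step; the exact scheme is an assumption here)."""
    cat = GROUPS["category"][int(np.argmax(logits[:3]))]
    side = GROUPS["side"][int(np.argmax(logits[3:]))]
    return cat, side

logits = [2.1, -1.0, -3.0, -0.5, 1.5]
print(decode(logits))  # → ('Serve', 'Backhand')
```

Training each label group independently, then recombining the group-wise winners, is one way such a multi-label head can be mapped back onto the 21 single-label stroke classes.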