Does deep learning solve all machine learning problems? Where does domain knowledge fit in? While it is common in medical data analytics to incorporate domain knowledge, we focus on one emerging area at the intersection of computer vision and language processing, video+language, to answer these questions.
Video has become ubiquitous on the Internet, TV, as well as personal devices. Recognition of video content has been a fundamental challenge in computer vision for decades, where previous research predominantly focused on recognizing videos using a predefined yet limited vocabulary. Thanks to the recent development of deep learning and knowledge graph techniques, researchers in multiple communities are now striving to bridge videos with natural language in order to move beyond classification to interpretation, which should be regarded as the ultimate goal of video understanding. We will present recent advances in exploring the synergy of video understanding and language processing techniques, including video entity linking, video-language alignment, and video captioning, and discuss how domain knowledge can fit in to improve the performance.
Talk given at PYCON Stockholm 2015
Intro to Deep Learning + taking a pretrained ImageNet network, extracting features, and training an RBM on top = 97% accuracy after 1 hour (!) of training (in the top 10% of the Kaggle cat vs. dog competition)
Zero shot learning through cross-modal transfer - Roelof Pieters
A review of the paper "Zero-Shot Learning Through Cross-Modal Transfer" by Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, and Andrew Y. Ng, presented at KTH's Deep Learning reading group:
www.csc.kth.se/cvap/cvg/rg/
Deep Learning for Computer Vision: A comparison between Convolutional Neural... - Vincenzo Lomonaco
In recent years, Deep Learning techniques have been shown to perform well on a large variety of problems in both Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition, pushing forward the concepts of automatic feature extraction and unsupervised learning in general.
However, despite its strong success in both science and business, deep learning has its own limitations. It is often questioned whether such techniques are merely brute-force statistical approaches that can only work in the context of High Performance Computing with tons of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and whether they can scale well in terms of "intelligence".
The dissertation is focused on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two very different deep learning techniques on the aforementioned task: Convolutional Neural Networks (CNN) and Hierarchical Temporal Memory (HTM). They stand for two different approaches and points of view under the broad umbrella of deep learning, and are good choices for understanding and pointing out the strengths and weaknesses of each.
CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed at large corporations like Google and Facebook for solving face recognition and image auto-tagging problems.
HTM, on the other hand, is an emerging, mainly unsupervised paradigm that is more biologically inspired. It tries to gain insights from the computational neuroscience community in order to incorporate concepts like time, context and attention during the learning process, which are typical of the human brain.
In the end, the thesis aims to show that in certain cases, with less data, HTM can outperform CNN.
AI&BigData Lab. Artem Chernodub, "Image Recognition Using Lazy Deep Learning..." - GeeksLab Odessa
23.05.15, Odessa. Impact Hub Odessa. AI&BigData Lab conference.
Artem Chernodub (Computer Vision Team, ZZ Wolf)
"Image Recognition Using the Lazy Deep Learning Method in the ZZ Photo Organizer"
The talk addresses the problem of image recognition using computer vision methods. It gives a brief overview of the subtasks in this area (object detection, scene classification, associative search in image databases, face recognition, etc.) and of modern methods for solving them, with an emphasis on Deep Learning.
More details:
http://geekslab.co/
https://www.facebook.com/GeeksLab.co
https://www.youtube.com/user/GeeksLabVideo
Slides from Portland Machine Learning meetup, April 13th.
Abstract: You've heard all the cool tech companies are using them, but what are Convolutional Neural Networks (CNNs) good for and what is convolution anyway? For that matter, what is a Neural Network? This talk will include a look at some applications of CNNs, an explanation of how CNNs work, and what the different layers in a CNN do. There's no explicit background required so if you have no idea what a neural network is that's ok.
Zaikun Xu from the Università della Svizzera Italiana presented this deck at the 2016 Switzerland HPC Conference.
"In the past decade, deep learning, as a life-changing technology, has achieved huge success on various tasks, including image recognition, speech recognition, machine translation, etc. Pioneered by several research groups led by Geoffrey Hinton (U Toronto), Yoshua Bengio (U Montreal), Yann LeCun (NYU), and Juergen Schmidhuber (IDSIA, Switzerland), deep learning is a renaissance of neural networks in the Big Data era.
A neural network is a learning algorithm that consists of an input layer, hidden layers and an output layer, where each circle represents a neuron and each arrow connection is associated with a weight. A neural network learns from the difference between the output layer's output and the ground truth, calculating the gradients of this discrepancy w.r.t. the weights and adjusting the weights accordingly. Ideally, it finds weights that map input X to target y with as low an error as possible." (A minimal sketch of this update loop follows the links below.)
Watch the video presentation: http://insidehpc.com/2016/03/deep-learning/
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
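To make the quoted description concrete, here is a minimal NumPy sketch (not code from the talk) of the weight-update loop it describes: compute the network output, measure the discrepancy against the ground truth, take gradients of the error with respect to the weights, and adjust the weights. All data and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: map 2-d inputs to 1-d targets.
X = rng.normal(size=(100, 2))
y = (X[:, :1] - 2 * X[:, 1:]) + 0.1 * rng.normal(size=(100, 1))

# One hidden layer with tanh activation.
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
lr = 0.05

for step in range(500):
    h = np.tanh(X @ W1)          # hidden activations
    out = h @ W2                 # output of the output layer
    err = out - y                # discrepancy vs. the ground truth
    # Gradients of the mean squared error w.r.t. the weights.
    dW2 = h.T @ err / len(X)
    dh = err @ W2.T * (1 - h ** 2)
    dW1 = X.T @ dh / len(X)
    # Adjust the weights in the direction that lowers the error.
    W1 -= lr * dW1
    W2 -= lr * dW2

print("final MSE:", float(np.mean(err ** 2)))
```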
Deep Learning Models for Question Answering - Sujit Pal
Talk about a hobby project to apply Deep Learning models to predict answers to 8th grade science multiple choice questions for the Allen AI challenge on Kaggle.
“Automatically learning multiple levels of representations of the underlying distribution of the data to be modelled”
Deep learning algorithms have shown superior learning and classification performance in areas such as transfer learning, speech and handwritten character recognition, and face recognition, among others.
(I have referred to many articles and experimental results provided by Stanford University.)
Details of Lazy Deep Learning for Image Recognition in the ZZ Photo app - PAY2 YOU
The talk presents deep learning for image recognition. It covers practical aspects of training deep convolutional networks on GPUs, discusses hands-on experience porting trained neural networks into an application based on the OpenCV library, and compares the resulting Lazy Deep Learning pet detector with the Viola-Jones detector.
Speakers: Artem Chernodub is an expert in artificial neural networks and artificial intelligence systems. He graduated from the Moscow Institute of Physics and Technology in 2007. He leads the Computer Vision team at ZZ Wolf and also works as a researcher at the Institute of Mathematical Machines and Systems Problems of the NASU.
Yuriy Pashchenko is a specialist in computer vision and machine learning systems, with a master's degree from the Faculty of Applied Mathematics, NTUU "Kyiv Polytechnic Institute" (2014). He works at ZZ Wolf as an R&D Engineer.
This is a slide deck from a presentation that my colleague Shirin Glander (https://www.slideshare.net/ShirinGlander/) and I did together. As we created our respective parts of the presentation on our own, it is quite easy to figure out who did which part, as the two slide decks look quite different ... :)
For the sake of simplicity and completeness, I just copied the two slide decks together. As I did the "surrounding" part, I added Shirin's part at the place where she took over and then added my concluding slides at the end. Well, I'm sure you will figure it out easily ... ;)
The presentation was intended to be an introduction to deep learning (DL) for people who are new to the topic. It starts with some DL success stories as motivation. Then a quick classification and a bit of history follows before the "how" part starts.
The first part of the "how" is some theory of DL, to demystify the topic and explain and connect some of the most important terms on the one hand, but also to give an idea of the broadness of the topic on the other hand.
After that, the second part dives deeper into the question of how to actually implement DL networks. This part starts with coding it all on your own and then moves step by step toward less coding, depending on where you want to start.
The presentation ends with some pitfalls and challenges that you should have in mind if you want to dive deeper into DL - plus the invitation to become part of it.
As always the voice track of the presentation is missing. I hope that the slides are of some use for you, though.
Lecture by Xavier Giro-i-Nieto (UPC) at the Master in Computer Vision Barcelona (March 30, 2016).
http://pagines.uab.cat/mcv/
This lecture provides an overview of computer vision analysis of images at a global scale using deep learning techniques. The session is structured in two blocks: a first one addressing end-to-end learning, and a second one focusing on applications that use off-the-shelf features.
Please submit your feedback as comments on the GDrive source slides:
https://docs.google.com/presentation/d/1ms9Fczkep__9pMCjxtVr41OINMklcHWc74kwANj7KKI/edit?usp=sharing
Sogang University Machine Learning and Data Mining lab seminar, Neural Networks for newbies and Convolutional Neural Networks. This is prerequisite material to understand deep convolutional architecture.
The Frontier of Deep Learning in 2020 and Beyond - NUS-ISS
This talk will be a summary of the recent advances in deep learning research, current trends in the industry, and the opportunities that lie ahead.
We will discuss topics in research such as:
Transformers, GPT-3, BERT
Neural Architecture Search, Evolutionary Search
Distillation, self-learning
NeRF
Self-Attention
Also shifting industry trends such as:
The move to free data
Rising importance of 3D vision
Using synthetic data (Sim2Real)
Mobile vision & Federated Learning
Crafting Recommenders: the Shallow and the Deep of it! - Sudeep Das, Ph.D.
I present a brief review and an outlook on the rapid changes happening in the field of recommendation engine research on the heels of the deep learning revolution!
How to use transfer learning to bootstrap image classification and question a... - Wee Hyong Tok
#theaiconf SFO 2018
Session by Danielle Dean, WeeHyong Tok
Transfer learning enables you to use pretrained deep neural networks trained on various large datasets (ImageNet, CIFAR, WikiQA, SQUAD, and more) and adapt them for various deep learning tasks (e.g., image classification, question answering, and more).
Wee Hyong Tok and Danielle Dean share the basics of transfer learning and demonstrate how to use the technique to bootstrap the building of custom image classifiers and custom question-answering (QA) models. You'll learn how to use the pretrained CNNs available in various model libraries to custom-build a convolutional neural network for your use case. In addition, you'll discover how to use transfer learning for question-answering tasks, with models trained on large QA datasets (WikiQA, SQUAD, and more), and adapt them for new question-answering tasks. (A short sketch of this recipe follows the session link below.)
https://conferences.oreilly.com/artificial-intelligence/ai-ca/public/schedule/detail/68527
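As a hedged illustration of the transfer-learning recipe described above (not code from the session), the following sketch uses PyTorch/torchvision (assuming torchvision >= 0.13 for the weights API) to load an ImageNet-pretrained CNN, freeze its feature extractor, and train only a new classification head. The class count and batch are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pretrained on ImageNet and freeze its feature extractor.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False

# Replace the classification head for a new task (e.g., 5 custom classes).
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head is trained.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on a batch of images and labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```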
Long-term Face Tracking in the Wild using Deep Learning - Elaheh Rashedi
This paper investigates long-term face tracking of a specific person given his/her face image in a single frame as a query in a video stream. Through taking advantage of pre-trained deep learning models on big data, a novel system is developed for accurate video face tracking in the unconstrained environments depicting various people and objects moving in and out of the frame. In the proposed system, we present a detection-verification-tracking method (dubbed as 'DVT') which accomplishes the long-term face tracking task through the collaboration of face detection, face verification, and (short-term) face tracking. An offline trained detector based on cascaded convolutional neural networks localizes all faces appeared in the frames, and an offline trained face verifier based on deep convolutional neural networks and similarity metric learning decides if any face or which face corresponds to the queried person. An online trained tracker follows the face from frame to frame. When validated on a sitcom episode and a TV show, the DVT method outperforms tracking-learning-detection (TLD) and face-TLD in terms of recall and precision. The proposed system is also tested on many other types of videos and shows very promising results.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-warden
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Pete Warden, Google research engineer and the tech lead of the TensorFlow Mobile and Embedded team, presents the "Solving Vision Tasks Using Deep Learning: An Introduction" tutorial at the May 2018 Embedded Vision Summit.
This talk introduces deep learning for vision tasks. It provides an overview of deep learning, explores its weaknesses and strengths, and highlights best approaches to applying deep learning to solving vision problems. The audience will learn to think about vision problems from a different perspective, understand what questions to ask, and discover where to find the answers to these questions. The talk will conclude with insights on the challenges of deploying deep learning solutions on mobile devices.
In this talk we cover
1. Why NLP and DL
2. Practical Challenges
3. Some Popular Deep Learning models for NLP
Today you can take any webpage in any language and translate it automatically into a language you know! You can also cut and paste an article or other document into NLP systems and immediately get a list of the companies and people it talks about, the topics that are relevant, and the sentiment of the document. When you talk to the Google or Amazon assistant, you are using NLP systems. NLP is not perfect, but given the advances of the last two years and counting, it is a growing field. Let's see how it actually works, specifically using deep learning.
About Shishir
Shishir is a Senior Data Scientist at Thomson Reuters working on Deep Learning and NLP to solve real customer pain, even ones they have become used to.
This video is from a college-level course taught at Florida Polytechnic University, located in Lakeland, Florida. The purpose of this course is to introduce freshman students to what design patterns are and how to use them.
In this class session, Dr. Anderson introduces object-oriented programming and covers abstraction, encapsulation, polymorphism, and inheritance. He then wraps things up with a discussion of the differences between composition and inheritance.
The Instructor is Dr. Jim Anderson.
Searching Images: Recent research at Southampton - Jonathon Hare
Intelligence, Agents, Multimedia Seminar series. University of Southampton. 7th March 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around the University of Glasgow's Terrier text retrieval software. The talk will also cover some of our recent work on image classification and image search result diversification.
Scheduled for Monday, January 3, 2022.
Lecture no. 143 of the #Tawasul_Tatweer ("Connect and Develop") initiative.
Eng. Mohamed El-Rafei Terbay, head of the Programmers Syndicate in Dakahlia.
Titled "IT INDUSTRY": How To Get Into IT With Zero Experience.
Monday, January 3, 2022, at 7 pm Cairo time (8 pm Mecca time).
Attendance via Zoom:
https://us02web.zoom.us/meeting/register/tZUpf-GsrD4jH9N9AxO39J013c1D4bqJNTcu
Note that the lecture will also be streamed live on the Egyptian Engineers Association channels. We hope to offer something of benefit to engineers and the engineering profession in the Arab world, God willing.
To reach the initiative's organizers via the Telegram channel:
https://t.me/EEAKSA
Follow the initiative and the live stream on our various channels.
LinkedIn and e-library:
https://www.linkedin.com/company/eeaksa-egyptian-engineers-association/
Twitter:
https://twitter.com/eeaksa
Facebook:
https://www.facebook.com/EEAKSA
YouTube:
https://www.youtube.com/user/EEAchannal
General registration for the lectures:
https://forms.gle/vVmw7L187tiATRPw9
Note: free attendance certificates are available for those who fill in the evaluation form at the end of the lecture.
Searching Images: Recent research at Southampton - Jonathon Hare
Knowledge Media Institute seminar series. The Open University. 23rd March 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around the University of Glasgow's Terrier text retrieval software. The talk will also cover some of our recent work on image classification and image search result diversification.
Similar to Video + Language: Where Does Domain Knowledge Fit in?
Many learning tasks can be summarized as learning a mapping from a structured input to a structured output, such as machine translation, image captioning, image style transfer, and image dehazing. Such mappings are usually learned on paired training data, where an input sample and its corresponding output are both provided. Collecting paired training data often involves expensive human annotation, and the scale of paired training data is therefore often limited. As a result, the generalization ability of models trained on paired data is also limited. One way to mitigate this issue is learning with unpaired data, which is far less expensive to collect. Taking machine translation as an example, the unpaired training data can be collected separately from newspapers in the source language and target language without any annotation. The challenge of unpaired learning turns into how to align the unpaired data. With carefully designed objectives, unpaired learning has achieved remarkable progress on several tasks. This talk will cover the data collection and training methods of several unpaired learning tasks to illustrate the power of learning with unpaired data.
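The abstract above does not name a specific objective; one widely used way to "align the unpaired data" is a CycleGAN-style cycle-consistency loss. A minimal sketch, with placeholder linear networks standing in for the real mapping models (in a full system this term is added to adversarial losses):

```python
import torch
import torch.nn as nn

# Two mapping networks: G: X -> Y and F: Y -> X (architectures are placeholders).
G = nn.Linear(16, 16)
F = nn.Linear(16, 16)
l1 = nn.L1Loss()

def cycle_loss(x_batch, y_batch, lam=10.0):
    """Cycle-consistency objective over unpaired batches from the two domains."""
    x_cycled = F(G(x_batch))   # X -> Y -> X should recover the input
    y_cycled = G(F(y_batch))   # Y -> X -> Y should recover the input
    return lam * (l1(x_cycled, x_batch) + l1(y_cycled, y_batch))

x = torch.randn(4, 16)  # unpaired samples from domain X
y = torch.randn(4, 16)  # unpaired samples from domain Y
loss = cycle_loss(x, y)
```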
With the explosive growth in AI-related fields, top conferences and journals are struggling to keep up with the tremendous number of paper submissions, and more and more new or inexperienced reviewers are rising to the occasion. How can one become a good reviewer and contribute to the health and growth of the field we are all invested in? We will share our perspectives and suggestions.
Adjusting primitives for graphs: SHORT REPORT / NOTES - Subhajit Sahu
Notes on graph algorithms such as PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with the NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group ("MCG") expects demand to keep growing and supply to evolve, aided by institutional investment rotating out of offices and into work from home ("WFH"), while the need for data storage keeps expanding with global internet usage, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Video + Language: Where Does Domain Knowledge Fit in?
1. Video + Language: Where Does Domain Knowledge Fit in?
Jiebo Luo
Department of Computer Science
July 10, 2016
Keynote@2016 IJCAI Workshop on Semantic Machine Learning
2. Domain Knowledge in Machine Learning
• Domain knowledge is used frequently in ML applications
(sometimes without knowing that you are actually doing it)
– A good example is feature extraction. What features to use?
– Other uses include objective function, parameter selection
– Even in deep learning (architecture, learning rate, etc.)
– Certainly probabilistic graphical models (including priors)
– Data cleaning (yes!)
• Context models encode domain knowledge
– Spatial context (e.g., in computer vision)
– Temporal context (e.g., in sequence analysis)
– Social context (e.g., in social media data mining)
• We will focus on some less obvious, more sophisticated forms
of domain knowledge, especially in the area of “vision and
language”, an emerging fertile ground in machine learning
4. Introduction
• Video has become ubiquitous on the Internet, TV, as well as
personal devices.
• Recognition of video content has been a fundamental challenge
in computer vision for decades, where previous research
predominantly focused on understanding videos using a
predefined yet limited vocabulary.
• Thanks to the recent development of deep learning techniques,
researchers in both computer vision and multimedia
communities are now striving to bridge video with
natural language, which can be regarded as the ultimate goal of
video understanding.
• We present recent advances in exploring the synergy of video
understanding and language processing, including video-
language alignment, video captioning, and video emotion
analysis.
10. Semantic Video Entity Linking
• Motivations to use visual content
1. Video entity linking is very challenging with only
title & descriptions, especially for UGC
2. Video entity linking must be of high quality
3. The visual content of a video truly represents the
user intent for video watching and sharing
18. Semantic Video Entity Linking
• Experiments
– Dataset: 1920 videos
– Labels: Amazon Mechanical Turk
19. Semantic Video Entity Linking
We propose two constraints to guide the video entity linking process:
1. Temporal Smoothness: the entity occurrence (matches with any of the representative images) should be smooth over time (see the sketch below)
2. Representativeness Smoothness: used to suppress significant irrelevant information
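The slides do not give the exact formulation of the temporal smoothness constraint; a total-variation style penalty on per-frame match scores is one natural reading. A minimal sketch with made-up scores:

```python
import numpy as np

def temporal_smoothness_penalty(scores, weight=1.0):
    """Penalize abrupt frame-to-frame changes in per-frame entity match scores.

    scores: array of shape (num_frames,) holding the entity's match score
    (e.g., similarity to any representative image) at each frame.
    """
    return weight * np.sum(np.abs(np.diff(scores)))

# A smooth occurrence pattern is penalized less than a noisy one.
smooth = np.array([0.1, 0.2, 0.8, 0.9, 0.9, 0.8])
noisy = np.array([0.1, 0.9, 0.1, 0.9, 0.1, 0.9])
print(temporal_smoothness_penalty(smooth) < temporal_smoothness_penalty(noisy))  # True
```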
20. Conclusions
• Metric Learning helps by adapting to different
topics and domains
• Structured constraints are important for
suppressing noise
• Future work includes integrating the video
metadata information and building entity
integrated applications, e.g., video spam
detection
21. Semantic Video Entity Linking.
An Application: Video Spam Removal
• “guardians of the galaxy full movie”
– Let’s watch the movie
22. Unsupervised Alignment of Actions in Video with Text Descriptions
Y. Song, I. Naim, A. Mamun, K. Kulkarni, P. Singla
J. Luo, D. Gildea, H. Kautz
23. Overview
• Unsupervised alignment of video with text
• Motivations
– Generate labels from data (reduce burden of manual labeling)
– Learn new actions from only parallel video+text
– Extend noun/object matching to verbs and actions
[Figure: overview of the text and video alignment framework; matching nouns to objects and verbs to actions (e.g., "The person takes out a knife and cutting board"); Naim et al., 2015.]
24. Hyperfeatures for Actions
• High-level features are required for alignment with text
→ motion features are generally low-level
• Hyperfeatures, originally used for image recognition, are extended for use with motion features
→ use the temporal domain instead of the spatial domain for vector quantization (clustering)
Originally described for images in "Hyperfeatures: Multilevel Local Coding for Visual Recognition," Agarwal, A. (ECCV 06). [Figure: hyperfeatures for actions.]
25. Hyperfeatures for Actions
• From low-level motion features, create high-level
representations that can easily align with verbs in text
[Figure: computing action hyperfeatures; each STIP point is vector-quantized (color-coded), giving a quantized-STIP histogram per frame; cluster occurrences are accumulated over the window (t-w/2, t+w/2] and the windowed histogram is vector-quantized again to produce first-level hyperfeatures (e.g., clusters 3, 5, ..., 5, 20 map to hyperfeature 6), which are then aligned with verbs from the text using an LCRF.]
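A minimal sketch of the two-level vector quantization described above, using scikit-learn's KMeans; the descriptor dimensions, cluster counts, and random data are stand-ins for real STIP features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Level 0: low-level motion descriptors (e.g., STIP) per frame; random stand-ins here.
T, n_desc, dim = 300, 20, 32
descriptors = rng.normal(size=(T, n_desc, dim))

# Vector-quantize all descriptors against a level-0 codebook.
k0 = 50
vq0 = KMeans(n_clusters=k0, n_init=10, random_state=0).fit(descriptors.reshape(-1, dim))
codes = vq0.labels_.reshape(T, n_desc)

# Per-frame code histograms, accumulated over a temporal window of width w.
w = 150
histograms = []
for t in range(T):
    lo, hi = max(0, t - w // 2), min(T, t + w // 2)
    hist = np.bincount(codes[lo:hi].ravel(), minlength=k0).astype(float)
    histograms.append(hist / hist.sum())
histograms = np.array(histograms)

# Level 1: cluster the windowed histograms to get one hyperfeature label per
# time step, which can then be aligned with verbs in the text.
k1 = 64
vq1 = KMeans(n_clusters=k1, n_init=10, random_state=0).fit(histograms)
hyperfeatures = vq1.labels_
```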
26. Latent-variable CRF Alignment
• CRF where the latent variable is the alignment
– $N$ pairs of video/text observations $\{(x_i, y_i)\}_{i=1}^{N}$ (indexed by $i$)
– $x_{i,m}$ represents the nouns and verbs extracted from the $m$-th sentence
– $y_{i,n}$ represents the blobs and actions in interval $n$ of the video
• Conditional likelihood: the conditional probability of the observations, marginalized over latent alignments $h$:
$p(y_i \mid x_i; w) \propto \sum_h \exp\big(w^\top \Phi(x_i, y_i, h)\big)$, where $\Phi$ is the feature function
• Learning weights $w$ with stochastic gradient descent
More details in the NAACL 2015 paper by Naim et al., "Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments"
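For intuition, a toy sketch of the latent-alignment machinery: with a hypothetical feature function phi, it enumerates all alignments to compute the posterior over the latent variable and the expected feature vector that appears in the gradient. The paper's actual inference is more efficient; this brute-force version is for illustration only:

```python
import itertools
import numpy as np

def log_softmax(v):
    v = v - v.max()
    return v - np.log(np.exp(v).sum())

def alignment_posterior(w, phi, M, N):
    """Posterior over latent alignments h, where h[m] is the video interval
    aligned with sentence item m: p(h | x, y; w) ~ exp(sum_m w . phi(m, h[m])).

    Enumeration is exponential (N**M alignments), so toy sizes only.
    """
    alignments = list(itertools.product(range(N), repeat=M))
    scores = np.array([sum(w @ phi(m, n) for m, n in enumerate(h))
                       for h in alignments])
    return alignments, np.exp(log_softmax(scores))

# Hypothetical feature function: a 2-d feature per (sentence item, interval) pair.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 2))
phi = lambda m, n: feats[m, n]
w = np.array([1.0, -0.5])

alignments, probs = alignment_posterior(w, phi, M=3, N=4)
# Expected features under the posterior, the key quantity in the SGD updates.
expected_feats = sum(p * sum(phi(m, n) for m, n in enumerate(h))
                     for h, p in zip(alignments, probs))
```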
27. Experiments: Wetlab Dataset
• RGB-Depth video with lab protocols in text
– Compare addition of hyperfeatures generated from motion features to
previous results (Naim et al. 2015)
• Small improvement over previous results
– Activities already highly correlated with object-use
[Figure: detection of objects in 3D space using color and point-cloud data. Table: previous results using object/noun alignment only vs. the addition of different types of motion features (2DTraj = dense trajectories), using hyperfeature window size w = 150.]
28. Experiments: TACoS Dataset
• RGB video with crowd-sourced text descriptions
– Activities such as “making a salad,” “baking a cake”
– No object recognition, alignment using actions only
– Uniform: Assume each sentence takes the same amount of time over the entire sequence
– Segmented LCRF: Assume the segmentation of actions is known, infer only the action labels
– Unsupervised LCRF: Both segmentation and alignment are unknown
• Effect of window size and number of clusters
– Consistent with the average action length of 150 frames
[Plot: accuracy vs. hyperfeature window size (w = 150) and number of clusters (d(2) = 64).]
29. Experiments: TACoS Dataset
• Segmentation from a sequence in the dataset
[Figure: example of text and video alignment (segmentation) generated by the system on the TACoS corpus for sequence s13-d28, with crowd-sourced descriptions.]
30. Image Captioning with Semantic Attention (CVPR 2016)
Quanzeng You, Jiebo Luo
Hailin Jin, Zhaowen Wang and Chen Fang
31. Image Captioning
• Motivations
– Real-world Usability
• Help visually impaired people, learning-impaired
– Improving Image Understanding
• Classification, object detection
– Image Retrieval
[Example captions: for a candle image, (1) a young girl inhales with the intent of blowing out a candle; (2) girl blowing out the candle on an ice cream. For a baseball image, (1) a shot from behind home plate of children playing baseball; (2) a group of children playing baseball in the rain; (3) group of baseball players playing on a wet field.]
32. Introduction of Image Captioning
• Machine learning as an approach to solve the problem
Model sentences for example images:
Candle image: 1. A young girl inhales with the intent of blowing out a candle. 2. A young girl is preparing to blow out her candle. 3. A kid is to blow out the single candle in a bowl of birthday goodness. 4. Girl blowing out the candle on an ice-cream. 5. A little girl is getting ready to blow out a candle on a small dessert.
Baseball image: 1. A shot from behind home plate of children playing baseball. 2. A group of children playing baseball in the rain. 3. Group of baseball players playing on a wet field. 4. A batter leaning back so they don't get hit by a ball. 5. A group of young boys playing baseball in the rain.
Kite image: 1. A girl in a park area flies a multi-colored kite. 2. A girl flying a kite in the sky. 3. A young woman flying a rainbow colored kite. 4. A person in a large field flying a kite in the sky. 5. A woman looks up at her colorful sailing kite.
33. Overview
• Brief overview of current approaches
• Our main motivation
• The proposed semantic attention model
• Evaluation results
34. Brief Introduction of Recurrent Neural Network
• Different from a CNN:
$h_t = f(x_t, h_{t-1}) = A x_t + B h_{t-1}$
$y_t = C h_t$
• Unfolding over time turns the RNN into a feedforward network, trained with Backpropagation Through Time
[Figure: a CNN maps inputs through hidden units to outputs in a single pass; the unfolded RNN applies the same weights $A$ (input), $B$ (recurrence), and $C$ (output) at every step $t$, taking $x_t$ and $h_{t-1}$ to $h_t$ and $y_t$.]
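A minimal sketch of the recurrence on the slide (kept linear as written; in practice $f$ usually includes a nonlinearity such as tanh), with random weights standing in for learned ones:

```python
import numpy as np

def rnn_forward(xs, A, B, C, h0):
    """Minimal linear RNN matching the slide: h_t = A x_t + B h_{t-1}, y_t = C h_t."""
    h, ys = h0, []
    for x in xs:              # unfold over time
        h = A @ x + B @ h     # new hidden state from input and previous state
        ys.append(C @ h)      # output at this step
    return np.stack(ys), h

rng = np.random.default_rng(0)
d_in, d_h, d_out, T = 4, 8, 3, 5
A = rng.normal(scale=0.3, size=(d_h, d_in))
B = rng.normal(scale=0.3, size=(d_h, d_h))
C = rng.normal(scale=0.3, size=(d_out, d_h))
xs = rng.normal(size=(T, d_in))

ys, h_T = rnn_forward(xs, A, B, C, np.zeros(d_h))
print(ys.shape)  # (5, 3): one output per time step
```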
35. Applications of Recurrent Neural Networks
• Machine Translation
• Reads input sentence “ABC” and produces
“WXYZ”
[Figure: an encoder RNN reads the input sentence and a decoder RNN produces the output sentence.]
36. Encoder-Decoder Framework for Captioning
• Inspired by neural network based machine translation
• Loss function:
$L = -\log p(w \mid I) = -\sum_{t=1}^{N} \log p(w_t \mid I, w_0, \ldots, w_{t-1})$
[Figure: a Convolutional Neural Network encodes the image $I$; a Recurrent Neural Network decodes the caption word by word, from $w_{\mathrm{start}}$ ("#Start#") to $w_{\mathrm{end}}$, e.g., "Some elephants roaming around on a river bank."]
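A small sketch of this negative log-likelihood, assuming the decoder has already produced a per-step distribution over a toy vocabulary:

```python
import numpy as np

def caption_nll(word_probs, caption_ids):
    """L = -log p(w|I) = -sum_t log p(w_t | I, w_0..w_{t-1}).

    word_probs: (N, V) array; row t is the decoder's distribution over the
    vocabulary at step t (already conditioned on the image and previous words).
    caption_ids: length-N sequence of ground-truth word indices w_1..w_N.
    """
    return -sum(np.log(word_probs[t, w]) for t, w in enumerate(caption_ids))

# Tiny example: a 3-step caption over a 5-word vocabulary.
probs = np.array([[0.7, 0.1, 0.1, 0.05, 0.05],
                  [0.1, 0.6, 0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.5, 0.05, 0.05]])
print(caption_nll(probs, [0, 1, 2]))  # -log(0.7) - log(0.6) - log(0.5)
```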
37. Our Motivation
• Additional textual information
– Web images' own noisy titles, tags or captions
– Visually similar nearest neighbor images
– Success of low-level tasks
• Visual attributes detection
38. Image Captioning with Semantic Attention
• Main idea
[Figure: main idea; a CNN provides visual features and an attribute detector provides words (wave, riding, man, surfboard, ocean, water, surfer, surfing, person, board); an attention module feeds the RNN, with attention weights (roughly 0 to 0.3) concentrating on "surfboard", "wave", and "surfing".]
39. First Idea
• Provide additional knowledge at each input node
• Concatenate the input word and the extra attributes K
• Each image has a fixed keyword list
$h_t = f(x_t, h_{t-1}) = f([w_t;\, W_k K + b_k], h_{t-1})$
Visual features: 1024-d GoogLeNet; LSTM hidden states: 512
Training details:
1. 256 image/sentence pairs
2. RMSProp
[Figure: the CNN-RNN captioning pipeline, with keywords/key-phrases $K$ retrieved from the tags, titles, and descriptions of weakly annotated images and fed to the recurrent network at each step.]
40. Using Attributes along with Visual Features
• Provide additional knowledge at each input node
• Concatenate the visual embedding and keywords for h0
$h_0 = f(v, h_{-1}) = [W_v v;\, W_k K + b]$
[Figure: the same captioning pipeline, with the visual embedding $W_v v$ and the keyword embedding $W_k K + b$ concatenated to form the initial hidden state $h_0$.]
41. Attention Model on Attributes
• Instead of using the same set of attributes at every step
• At each step, select the attributes
$\mathrm{att}(w_t, K) = \sum_m \alpha_{t,m} k_m$
$\alpha_t = \mathrm{softmax}(w_t^\top V K)$
$h_t = f(x_t, h_{t-1}) = f([x_t;\, \mathrm{att}(w_t, K)], h_{t-1})$
[Figure: the CNN-RNN captioning pipeline with keywords/key-phrases $K$ retrieved from weakly annotated images; the attention weights select a different subset of the attributes at each step.]
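A minimal sketch of the attention equations above, with random embeddings standing in for learned word and attribute vectors:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def semantic_attention(w_t, K, V):
    """att(w_t, K) = sum_m alpha_{t,m} k_m with alpha_t = softmax(w_t^T V K).

    w_t: (d,) current word embedding; K: (M, d) attribute/keyword embeddings;
    V: (d, d) learned bilinear weight matrix.
    """
    alpha = softmax(K @ (V @ w_t))   # one score per attribute, normalized
    return alpha @ K, alpha          # attended attribute vector and weights

rng = np.random.default_rng(0)
d, M = 16, 10
w_t = rng.normal(size=d)
K = rng.normal(size=(M, d))
V = rng.normal(size=(d, d))
att, alpha = semantic_attention(w_t, K, V)
# Per the slide, the RNN input at each step is the concatenation [x_t; att].
```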
42. Overall Framework
• Training with a bilinear/bilateral attention model
[Figure: overall framework; a CNN and N attribute detectors (AttrDet 1..N) produce the visual feature v and attribute set {A_i}; an input attention model $\varphi$ and an output attention model $\phi$ connect them to the RNN (input $x_t$, state $h_t$, word distribution $p_t$, sampled word $Y_t$). Example attention weights (0 to 0.3) concentrate on "surfboard", "wave", and "surfing" among attributes such as wave, riding, man, surfboard, ocean, water, surfer, surfing, person, board.]
43. Visual Attributes
• A secondary contribution
• We try different approaches
k-NN: vase, flowers, bathroom, table, glass, sink, blue, small, white, clear
Multi-label ranking: sitting, table, small, many, little, glass, different, flowers, vase, shown
FCN: vase, flowers, table, glass, sitting, kitchen, water, room, white, filled
44. Performance
• Examples showing the impact of visual attributes on captions
[Table: captions from Google NIC vs. ATT-FCN, with the top-5 visual attributes detected for each image (the flattened layout does not preserve which caption in each pair came from which system):
– "a white plate topped with a variety of food." / "a plate with a sandwich and french fries."; attributes: plate, broccoli, fries, food, french
– "a baby with a toothbrush in its mouth." / "a baby is eating a piece of paper."; attributes: teeth, brushing, toothbrush, holding, baby
– "a traffic light is on a city street." / "a street with cars and a clock tower."; attributes: street, sign, cars, clock, traffic
– "a yellow and black train on a track." / "a train traveling down tracks next to a building."; attributes: train, tracks, clock, tower, down
– "a close up of a plate of food on a table." / "a table topped with a cake with candles on it."; attributes: cake, table, plate, sitting, birthday
– "a teddy bear sitting on top of a chair." / "a white teddy bear sitting next to a stuffed animal."; attributes: teddy, cat, bear, stuffed, white
– "a person is holding colorful umbrella." / "a black umbrella sitting on top of a sandy beach."; attributes: umbrella, beach, water, sitting, boat
– "a woman is holding a cell phone in her hand." / "a woman holding a pair of scissors in her hands."; attributes: woman, bathroom, her, scissors, man]
50. Examples
– a skate boarder is doing trick on his skate board.
– a gloved hand opens to reveal a golden ring.
– a sport car is swinging on the race playground
– the vehicle is moving fast into the tunnel
51. Contributions
• A large scale animated GIF description dataset for
promoting image sequence modeling and
research
• Performing automatic validation to collect natural
language descriptions from crowd workers
• Establishing baseline image captioning methods
for future benchmarking
• Comparison with existing datasets, highlighting
the benefits with animated GIFs
52. In Comparison with Existing Datasets
• The language in our dataset is closer to common
language
• Our dataset has an emphasis on the verbs
• Animated GIFs are more coherent and self-contained
• Our dataset can be used to solve more difficult
movie description problem
56. Comparing Professionals and Crowd-workers
Crowd worker: two people are kissing on a boat.
Professional: someone glances at a kissing couple then
steps to a railing overlooking the ocean an older man
and woman stand beside him.
Crowd worker: two men got into their car and
not able to go anywhere because the wheels
were locked.
Professional: someone slides over the
camaros hood then gets in with his partner he
starts the engine the revving vintage car starts
to backup then lurches to a halt.
Crowd worker: a man in a shirt and tie sits beside a person who is covered in a
sheet.
Professional: he makes eye contact with the woman for only a second.
More: http://beta-web2.cloudapp.net/lsmdc_sentence_comparison.html
57. Movie Descriptions versus TGIF
• Crowd workers are encouraged to describe the
major visual content directly, and not to use
overly descriptive language
• Because our animated GIFs are presented to
crowd workers without any context, the
sentences in our dataset are more self-contained
• Animated GIFs are perfectly segmented since
they are carefully curated by online users to
create a coherent visual story
58. Where is CV (AI) in 2016?
[Figure: example images from Winter 2002, Fall 2003, and Summer 2008.]
Image/Video captioning: 看图识字 (learning words from pictures), 看图说话 (telling a story from pictures)
59. Thanks
Q & A
[Slide lists: Google, ***, Baidu, Sogou, Bing, XiaoIce]
Are you smarter than a 5th grader? What does it take to go from a 5-year-old to a 5th grader?
1. Learning from "small data"
2. Unsupervised learning
3. Transfer learning
4. Integration of domain knowledge or experience
60. Visual Intelligence & Social Multimedia Analytics
www.cs.rochester.edu/u/jluo
Questions?