This talk will present some recent advances in video understanding at Google. It will cover the technology behind progress in applications such as large-scale video annotation for YouTube, video summarization and Motion Stills, as well as our research in weakly-supervised learning, domain adaptation from YouTube to Google Photos and action recognition. I will also give my perspective on promising directions for future research in video.
Divya Jain at AI Frontiers: Video Summarization
As video content becomes mainstream, video summarization is becoming a hot research topic in academia and industry. Video thumbnail generation and summarization have been worked on for years, but deep learning and reinforcement learning are changing the landscape and emerging as the winners for optimal frame selection. Recent advances in GANs are improving the quality, aesthetics and relevance of the frames chosen to represent the original videos. Join this session to get an understanding of the various challenges and emerging solutions around video summarization.
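One way to picture the frame-selection problem the abstract mentions is as picking a small, diverse subset of frames. The sketch below is a toy illustration only (not the deep-learning or RL approach from the talk): frames are represented by hypothetical feature vectors, and a greedy max-min rule picks the frame farthest from everything already selected.

```python
# Toy sketch of key-frame selection for summarization: greedily pick
# the frame whose nearest already-selected frame is farthest away.
# Feature vectors here are made up for illustration.

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_keyframes(features, k):
    """Greedy max-min selection of k representative frame indices."""
    selected = [0]                      # seed with the first frame
    while len(selected) < k:
        best = max(
            (i for i in range(len(features)) if i not in selected),
            key=lambda i: min(l2(features[i], features[j]) for j in selected),
        )
        selected.append(best)
    return sorted(selected)

# Four "frames": two near (0,0), one at (10,0), one at (0,10)
frames = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(select_keyframes(frames, 3))  # → [0, 2, 3]
```

The near-duplicate frame (index 1) is skipped, which is the intuition behind diversity-based thumbnail selection; learned approaches replace both the features and the selection rule.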
Always be open to learning new things - that's something I keep in mind as an entrepreneur. New ideas come from many places! Perhaps this presentation, "The Entrepreneur Mindset" (by Ty Rhame), will spark a few ideas for you.
How should startups embrace the trend of IoT and Big Data? (Ruvento Ventures)
This presentation, prepared by Ruvento Ventures, gives comprehensive coverage of the state of the IoT, Big Data and AI industries. It covers the latest trends and the most successful investments in consumer hardware. We also offer advice to startups working at the intersection of IoT, Big Data and AI.
Dekang Lin at AI Frontiers: Adding Conversation to GUIs
Most AI assistants on mobile phones use a conversational user interface (CUI) that mimics a chat app and translates user requests into API calls to backend services. I will present the Conversational GUI (CGUI), which provides a thin layer of conversational interaction on top of the existing GUIs of mobile apps by translating user requests into sequences of GUI actions, such as clicks and swipes, that users would otherwise have to perform themselves. CGUI avoids rebuilding existing user experiences in a chat window. More importantly, it makes it possible for end users, instead of software engineers, to create new skills by providing pairs of natural language expressions and demonstrations of the corresponding GUI actions.
Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The W...
The market is already demonstrating strong value in the home for voice-activated AI, but the work environment has yet to catch up. Omar will explain why voice-activated AI is the most important development to come to the workplace. He will draw on his experience creating Eva, the first enterprise voice assistant focused on making meetings more actionable, and dive specifically into the challenges of ASR (Automatic Speech Recognition), NLP and neural networks in creating these kinds of voice-activated assistants. He will share how his team has overcome these challenges.
Dilek Hakkani-Tur at AI Frontiers: Conversational machines: Deep Learning for...
In this talk, I will present recent developments in Google Research for end-to-end goal-oriented dialogue systems, with components for language understanding, dialogue state tracking, policy, and language generation. The talk will summarize novel aspects of each component and highlight novel approaches where dialogue is viewed as a collaborative game between a user and an agent: the user has a goal in mind, and the agent has access to the data that the user is interested in and can perform actions in order to realize the user's goal. The two engage in a conversation so that the agent can help the user complete the task.
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Recently, substantial progress in AI has been made in applications that require advanced pattern recognition, including computer vision, speech recognition and natural language processing. However, it remains an open problem whether AI will make the same level of progress on tasks that require sophisticated reasoning, planning and decision making in complicated game environments similar to the real world. In this talk, I will present state-of-the-art approaches to building such an AI, our recent contributions in designing more effective algorithms and building extensive, fast, general environments and platforms, as well as open issues and challenges.
Xiaofeng Ren at AI Frontiers: The Quest for Video Understanding
In this talk I will briefly discuss the ubiquitous need for video and video understanding across Alibaba, and the challenges being addressed and solved at iDST, Alibaba's AI R&D division. Examples include mobile shopping on Taobao, video search and recommendation on Youku and Tudou, and real-time systems for Cainiao Logistics and City Brain.
James Manyika at AI Frontiers: Sizing up the promise of AI
This presentation will draw on new findings from the McKinsey Global Institute's ongoing research on the economic and business impact of AI. It will explore four key questions for AI today: who is investing and where, who is adopting AI and how, where can AI improve corporate performance, and what do business leaders need to know tomorrow morning.
Magnus Nordin at AI Frontiers: Deep Learning for Game Development
The number of applications of deep neural networks has multiplied in the last couple of years. Neural nets have enabled significant breakthroughs in everything from computer vision, voice generation and voice recognition to translation and self-driving cars. Neural nets will also be a powerful enabler for future game development. This presentation will give an overview of the potential of neural nets in game development, as well as an in-depth look at how we can combine neural nets with reinforcement learning for new types of game AI.
Roland Memisevic at AI Frontiers: Common sense video understanding at TwentyBN
Deep learning has evolved not linearly but through a series of step functions: sudden, unexpected outbreaks of capability that fundamentally changed the envelope of what computers are able to do. At TwentyBN, we have created spatio-temporal video models and data infrastructure that allowed us to grow a collection of approximately one million labeled videos showing everyday common-sense scenes and situations - many of them extremely subtle. This allowed us to successfully train neural networks end-to-end on a wide range of action understanding tasks that neither hand-engineering nor neural networks appeared anywhere near solving just a few months ago. I will show how these recognition tasks now drive commercial value at TwentyBN, and how they drive our long-term AI agenda of learning common-sense world knowledge through video.
Frank Chen at AI Frontiers: Startups and AI
Isn't AI going to be dominated by the big companies like Google and Amazon and Microsoft and Baidu? What can startups do to thrive in this ecosystem? What are investors looking for when they meet AI-powered startups? Should startups with AI inside think about their go-to-market process any differently from other startups? Frank Chen from Andreessen Horowitz will tackle these and other AI startup questions in this session.
Video has become ubiquitous on the Internet, TV, as well as personal devices. Recognition of video content has been a fundamental challenge in computer vision for decades, where previous research predominantly focused on understanding videos using a predefined yet limited vocabulary. Thanks to the recent development of deep learning techniques, researchers in both computer vision and multimedia communities are now striving to bridge videos with natural language, which can be regarded as the ultimate goal of video understanding. We will present recent advances in exploring the synergy of video understanding and language processing techniques, including video-language alignment and video captioning.
Face recognition for augmented reality and media management (Viewdle, 2011, Alexa Dovgopolaya)
This talk covers the applications and building blocks of the face recognition technology developed by Viewdle. It presents concepts and products that use face recognition for augmented reality on cell phones and for photo and video content management and sharing. An overview of the technology's building blocks, targeting different hardware and software environments, is given, covering the operation of face detection, feature detection, face tracking and face recognition across different environments and applications. Prototypes of the products are presented.
Video has become ubiquitous on the Internet, TV, as well as personal devices. Recognition of video content has been a fundamental challenge in computer vision for decades, where previous research predominantly focused on recognizing videos using a predefined yet limited vocabulary. Thanks to the recent development of deep learning and knowledge graph techniques, researchers in multiple communities are now striving to bridge videos with natural language in order to move beyond classification to interpretation, which should be regarded as the ultimate goal of video understanding. We will present recent advances in exploring the synergy of video understanding and language processing techniques, including video entity linking, video-language alignment, and video captioning, and discuss how domain knowledge can fit in to improve the performance.
Does deep learning solve all machine learning problems? Where would domain knowledge fit in? While it is common in medical data analytics to incorporate domain knowledge, we focus on one emerging area at the intersection of computer vision and language processing, video+language, to answer these questions.
Digital transformation with AI and process automation.
Prior consulting use cases in the domains of talent acquisition, e-commerce, e-publishing and HR analytics.
Content Based Video Retrieval Using Integrated Feature Extraction and Persona... (IJERD Editor)
Traditional video retrieval methods fail to meet the technical challenges posed by the large and rapid growth of multimedia data, which demands effective retrieval systems. In the last decade, Content Based Video Retrieval (CBVR) has become more and more popular. The amount of lecture video data on the World Wide Web (WWW) is growing rapidly, so a more efficient method for video retrieval on the WWW or within large lecture video archives is urgently needed. This paper presents an implementation of automated video indexing and video search in a large video database. First, we apply automatic video segmentation and key-frame detection to extract frames from the video. Next, we extract textual keywords by applying Optical Character Recognition (OCR) to the key-frames and Automatic Speech Recognition (ASR) to the audio tracks of the video. We also extract colour, texture and edge-detector features using different methods. Finally, we integrate all the keywords and features extracted by the above techniques for search purposes: a similarity measure is applied, and the best-matching videos are presented as output from the database. Additionally, we provide re-ranking of the results according to the user's interest in the original result.
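The similarity-measure step of the CBVR pipeline above can be sketched very simply: represent each key-frame by a colour/intensity histogram and rank stored frames by similarity to the query. The frame data and bin count below are synthetic, for illustration only.

```python
# Sketch of CBVR feature matching: intensity histograms as the frame
# feature, cosine similarity as the ranking measure. Frames here are
# just flat lists of pixel intensities (0-255), made up for the demo.

import math

def histogram(pixels, bins=4, max_val=256):
    h = [0] * bins
    for p in pixels:
        h[p * bins // max_val] += 1
    return h

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

database = {
    "dark_clip":   [10, 20, 30, 40, 50],
    "bright_clip": [200, 210, 220, 230, 240],
}
query = [15, 25, 35, 45, 55]            # a dark query frame

qh = histogram(query)
ranked = sorted(database,
                key=lambda k: cosine(histogram(database[k]), qh),
                reverse=True)
print(ranked[0])  # → dark_clip
```

A real system would concatenate OCR/ASR keyword scores with several such visual features before ranking, as the abstract describes.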
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speeds Up AI
Topic: How to use big data to enhance AI
Outline:
1. Spark ETL
Spark SQL
Spark Streaming
2. Spark ML
Spark ML pipeline
Distributed model tuning
Spark ML model and data lineage management
3. Spark XGboost
XGboost introduction
XGboost with Spark
XGboost with GPU
4. Spark Deep Learning pipeline
Transfer learning
Build Spark ML pipeline with TensorFlow
Model selection on distributed TF model
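The "Spark ML pipeline" item in the outline above refers to Spark's pattern of chaining fit/transform stages. The toy sketch below is plain Python, not pyspark; it only illustrates the stage-chaining abstraction (the class names are invented for the demo).

```python
# Toy illustration of the pipeline abstraction behind Spark ML (not
# actual pyspark): each stage exposes fit/transform, and a Pipeline
# fits the stages in order, feeding each the previous stage's output.

class Scaler:
    def fit(self, data):
        self.max = max(data) or 1
        return self
    def transform(self, data):
        return [x / self.max for x in data]

class AddBias:
    def fit(self, data):
        return self
    def transform(self, data):
        return [x + 1.0 for x in data]

class Pipeline:
    def __init__(self, stages):
        self.stages = stages
    def fit_transform(self, data):
        for stage in self.stages:
            data = stage.fit(data).transform(data)
        return data

pipe = Pipeline([Scaler(), AddBias()])
print(pipe.fit_transform([2.0, 4.0]))  # → [1.5, 2.0]
```

In real Spark ML the stages are Estimators/Transformers operating on distributed DataFrames, which is what makes the same pattern scale.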
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...
In this tutorial I will introduce recent work applying weak supervision and reinforcement learning to Question Answering (QA) systems. Specifically, we discuss the semantic parsing task, in which natural language queries are converted into computation steps over knowledge graphs or data tables that produce the expected answers. State-of-the-art results can be achieved with a novel memory structure for sequence models and improvements in reinforcement learning algorithms. Related code and experiment setup can be found at https://github.com/crazydonkey200/neural-symbolic-machines. Related paper: https://openreview.net/pdf?id=SyK00v5xx.
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Instructor: Mat Leonard
Outline
1. Text Processing
Using Python + NLTK
Cleaning
Normalization
Tokenization
Part-of-speech Tagging
Stemming and Lemmatization
2. Feature Extraction
Bag of Words
TF-IDF
Word Embeddings
Word2Vec
GloVe
3. Topic Modeling
Latent Variables
Beta and Dirichlet Distributions
Latent Dirichlet Allocation
4. NLP with Deep Learning
Neural Networks
Recurrent Neural Networks (RNNs)
Word Embeddings
Sentiment Analysis with RNNs
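The TF-IDF step in the feature-extraction section above can be sketched in a few lines of plain Python. Conventions vary between libraries (smoothing, normalization); this minimal version uses raw term frequency and idf = log(N / df).

```python
# Minimal TF-IDF sketch: weight each term by how often it appears in
# a document, discounted by how many documents contain it at all.

import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

N = len(docs)
# document frequency: in how many docs each term appears
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(doc):
    tf = Counter(doc)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

weights = tf_idf(docs[0])
# "cat" appears in only one doc, "the" in both, so "cat" outweighs "the"
print(weights["cat"] > weights["the"])  # → True
```

With this convention a term present in every document gets weight zero, which is exactly why TF-IDF suppresses stop words like "the".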
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Sequence to sequence learning is a powerful way to train deep networks for machine translation and various NLP tasks, but also for image generation and, recently, video and music generation. We will give a hands-on tutorial showing how to use the open-source Tensor2Tensor library to train state-of-the-art models for translation, image generation, and a task of your choice!
Percy Liang at AI Frontiers: Pushing the Limits of Machine Learning
In recent years, machine learning has undoubtedly been hugely successful in driving progress in AI applications. However, as we will explore in this talk, even state-of-the-art systems have "blind spots" which make them generalize poorly out of domain and render them vulnerable to adversarial examples. We then suggest that more unsupervised learning settings can encourage the development of more robust systems. We show positive results on two tasks: (i) text style and attribute transfer, the task of converting a sentence with one attribute (e.g., sentiment) to one with another; and (ii) solving SAT instances (classical problems requiring logical reasoning) using end-to-end neural networks.
Ilya Sutskever at AI Frontiers: Progress towards the OpenAI mission
I will present several advances in deep learning from OpenAI. First, I will present OpenAI Five, a neural network that learned to play on par with some of the strongest professional Dota 2 teams in the world in an 18-hero version of the game. Next, I will present Dactyl, a human-like robot hand trained entirely in simulation with reinforcement learning that has achieved unprecedented dexterity on a physical robot. I will also present our results on unsupervised learning in language, which show that pre-training and finetuning can achieve a significant improvement over the state of the art. Finally, I will present an overview of the historical progress in the field.
Mario Munich at AI Frontiers: Consumer robotics: embedding affordable AI in ...
The availability of affordable electronics components, powerful embedded microprocessors, and ubiquitous internet access and WiFi in the household has enabled a new generation of connected consumer robots. In 2015, iRobot launched the Roomba 980, introducing intelligent visual navigation to its successful line of vacuum cleaning robots. In 2018, iRobot launched the Roomba i7, equipped with the latest mapping and navigation technology that provides spatial information to the broader ecosystem of connected devices in the home. In this talk, I will describe the challenges and the potential of introducing consumer robots capable of developing spatial context by exploring the physical space of the home, and I will elaborate on the impact of AI in the future of robotics applications. Moreover, I will describe our vision of the Smart Home, an AI-powered home that maintains itself and magically just does the right thing in anticipation of occupant needs. This home will be built on an ecosystem of connected and coordinated robots, sensors, and devices that provides the occupants with a high quality of life by seamlessly responding to the needs of daily living – from comfort to convenience to security to efficiency.
Anima Anandkumar at AI Frontiers: Modern ML: Deep, distributed, Multi-dimen...
As data and models scale, it becomes necessary to use multiple processing units for both training and inference. SignSGD is a gradient compression algorithm that transmits only the sign of the stochastic gradients during distributed training. This algorithm uses 32 times less communication per iteration than distributed SGD. We show that signSGD gets a free lunch both in theory and practice: no loss in accuracy while yielding speedups. Pushing the current boundaries of deep learning also requires using multiple dimensions and modalities. These can be encoded into tensors, which are natural extensions of matrices. These functionalities are available in the Tensorly package, with multiple backend interfaces for large-scale deep learning.
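The sign-compression idea behind signSGD can be sketched on a toy problem: update each parameter using only the sign of its gradient coordinate, so a distributed worker would transmit 1 bit per coordinate instead of a 32-bit float. This is a single-machine sketch of the update rule only, not the distributed majority-vote scheme, and the test function is made up for illustration.

```python
# Sketch of the signSGD update: theta <- theta - lr * sign(grad).
# Only the sign of each gradient coordinate is used, which is what
# makes the communicated gradient compressible to 1 bit/coordinate.

def sign(x):
    return (x > 0) - (x < 0)

def signsgd(grad_fn, theta, lr=0.1, steps=100):
    for _ in range(steps):
        g = grad_fn(theta)
        theta = [t - lr * sign(gi) for t, gi in zip(theta, g)]
    return theta

# Minimize f(x, y) = (x - 3)^2 + (y + 2)^2; gradient is (2(x-3), 2(y+2))
grad = lambda th: [2 * (th[0] - 3), 2 * (th[1] + 2)]
theta = signsgd(grad, [0.0, 0.0])
print([round(t, 1) for t in theta])  # settles near [3.0, -2.0]
```

Because the step size is fixed, the iterate oscillates within one learning-rate step of the minimum; in practice the learning rate is decayed.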
Sumit Gupta at AI Frontiers: AI for Enterprise
The use of AI for voice search and image recognition is talked about often. Enterprises, however, have different challenges and requirements. In this talk, we will focus on use cases in the enterprise and the challenges of building out AI solutions. We will also talk about how PowerAI Vision, auto-machine-learning software for videos and images, enables quick AI model training and deployment for various enterprise use cases.
Yuandong Tian at AI Frontiers: Planning in Reinforcement Learning
Deep Reinforcement Learning (DRL) has made strong progress on many tasks, such as board games, robotics, navigation, neural architecture search, etc. I will present our recently open-sourced DRL frameworks that facilitate game research and development. Our framework is scalable, so we can reproduce AlphaGoZero and AlphaZero using 2000 GPUs, achieving a superhuman Go AI that beats four top-30 professional players. We also show the usability of our platform by training agents in real-time strategy games, and show interesting behaviors achieved with a small amount of resources.
Alex Ermolaev at AI Frontiers: Major Applications of AI in Healthcare
The latest AI advances have the potential to massively improve our health and well-being. However, most of the work is yet to be done. In this talk, we will explore the most important opportunities for AI in healthcare. For example, we will explore how AI can diagnose major life-threatening conditions even before those conditions emerge. We will talk about AI's ability to recommend dramatically more effective and less harmful treatment plans based on its understanding of a patient's medical history and current condition. Finally, we will talk about AI's role in making our healthcare system effective and affordable for everyone.
Long Lin at AI Frontiers: AI in Gaming
Games have been leveraging AI since the 1950s, when people built rules-based AI engines that played tic-tac-toe. With technological advances over the years, AI has become increasingly popular and widely used in the gaming industry. The typical characteristics of games and game development make them an ideal playground for practicing and implementing AI techniques, especially deep learning and reinforcement learning: most games are well scoped; it is relatively easy to generate and use the data; and states, actions and rewards are relatively clear. In this talk, I will show a couple of use cases where ML/AI helps game development and enhances the player experience. Examples include AI agents that play games and services that provide a personalized experience to players.
Melissa Goldman at AI Frontiers: AI & Finance
AI in finance is having wide-ranging impact and is helping solve some of the most critical societal problems. The talk gives an overview of the opportunities for applying AI in finance, with specific examples, and highlights some of the unique challenges financial services firms face in deploying AI at scale.
Li Deng at AI Frontiers: From Modeling Speech/Language to Modeling Financial...
I will first survey how deep learning has disrupted speech and language processing industries since 2009. Then I will draw connections between the techniques for modeling speech and language and those for financial markets. Finally, I will address three unique technical challenges to financial investment.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Compressed Sparse Row (CSR) is an adjacency-list based graph representation used by graph algorithms like PageRank.
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
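The float vs bfloat16 storage comparison above hinges on precision: bfloat16 keeps float32's 8 exponent bits but only 8 mantissa bits, so a naive running sum stalls once the accumulator dwarfs each addend. A minimal Python sketch of this effect (simulating bfloat16 by truncating a float32 to its top 16 bits; the helper names are illustrative, not from the report):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate bfloat16 by keeping only the top 16 bits of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def sum_bf16(values):
    """Naive running sum where the accumulator is truncated to bfloat16."""
    acc = 0.0
    for v in values:
        acc = to_bfloat16(acc + to_bfloat16(v))
    return acc

values = [0.01] * 10000        # exact sum is 100.0
print(sum(values))             # float64 sum: essentially exact
print(sum_bf16(values))        # bfloat16 sum: far below 100, accumulator stalls
```

This is why a real implementation would keep a wider accumulator (or use pairwise/tree reduction) even when the storage type is bfloat16.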
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
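The decomposition the abstract describes can be sketched in plain Python: find the strongly connected components, and since Kosaraju's algorithm yields them in topological order of the condensation, iterate PageRank over one component at a time while upstream ranks stay fixed. A minimal sketch on a toy graph (illustrative only, not the report's implementation; it assumes the stated precondition of no dead ends):

```python
from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's algorithm: SCCs in topological order of the condensation."""
    visited, order = set(), []
    for start in graph:                       # first pass: DFS finish order
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            child = next((v for v in it if v not in visited), None)
            if child is None:
                order.append(node)
                stack.pop()
            else:
                visited.add(child)
                stack.append((child, iter(graph[child])))
    transpose = defaultdict(list)             # second pass: on the transpose
    for u in graph:
        for v in graph[u]:
            transpose[v].append(u)
    comps, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        comp, work = [], [u]
        assigned.add(u)
        while work:
            x = work.pop()
            comp.append(x)
            for y in transpose[x]:
                if y not in assigned:
                    assigned.add(y)
                    work.append(y)
        comps.append(comp)
    return comps

def levelwise_pagerank(graph, d=0.85, tol=1e-12):
    """PageRank computed one SCC at a time, in topological order.
    Precondition: no dead ends (every vertex has an out-edge)."""
    n = len(graph)
    outdeg = {u: len(graph[u]) for u in graph}
    incoming = defaultdict(list)
    for u in graph:
        for v in graph[u]:
            incoming[v].append(u)
    rank = {u: 1.0 / n for u in graph}
    for comp in strongly_connected_components(graph):
        while True:  # iterate only this component; upstream ranks are final
            delta = 0.0
            for v in comp:
                r = (1 - d) / n + d * sum(rank[u] / outdeg[u] for u in incoming[v])
                delta = max(delta, abs(r - rank[v]))
                rank[v] = r
            if delta < tol:
                break
    return rank

# Toy graph: SCC {3, 4} feeds into the cycle {0, 1, 2}; no dead ends.
graph = {0: [1], 1: [2], 2: [0], 3: [4, 0], 4: [3]}
ranks = levelwise_pagerank(graph)
print(ranks)
```

A distributed version would assign components at the same level to different workers; the point is that no cross-component communication is needed within a level's iterations.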
Rahul Sukthankar at AI Frontiers: Large-Scale Video Understanding: YouTube and Beyond
1. Large-Scale Video Understanding:
YouTube and Beyond
Rahul Sukthankar
Machine Perception, Google Research
https://research.google.com/teams/perception/
AI Frontiers Conference - Nov. 3, 2017
5. Sample of Perception tech in products
HDR+ in Android Camera (Seth LaForge, Nexus 5X)
Mobile Vision API
6. Sample of Perception tech in products
Organizing Photos: making image & video collections searchable by content
Microvideo tech in Photos & Motion Stills
De-reflection & tracking in Photo Scanner
7. Sample of Perception tech in products
Personalized sticker packs in Allo
On-device handwriting input & recognition
OCR for lots of languages
8. Sample of Perception tech in products
Visual & auditory annotation & signals on YouTube
Thumbnail/preview selection & optimization for YouTube
Non-speech sound captions on YouTube
9. Sample of Perception tech in products
Region tracking for custom blurring tool on YouTube
Mobile creative effects on YouTube
10. Useful Applications for Video Technology
capture a moment; watch, listen, understand; improve & manipulate
Help users create, enhance, organize, and discover videos.
14. Large-Scale Video Annotation for YouTube
Video understanding pipeline as of ~5 years ago: pixels & sound samples → extract features (hand-designed descriptors) → frame features → quantize & aggregate (codebook, histogram) → video features → train model (e.g., AdaBoost) on training data → label (e.g., "Roller-blading")
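The "quantize & aggregate" step in the classical pipeline is a bag-of-visual-words encoding: each frame descriptor is assigned to its nearest codebook entry, and the video is summarized as a histogram of codeword counts. A minimal sketch (the toy 2-d descriptors and hand-picked codebook are illustrative; a real codebook would come from clustering, e.g. k-means):

```python
def quantize(descriptor, codebook):
    """Index of the nearest codeword by squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(descriptor, codebook[i])))

def bag_of_words(descriptors, codebook):
    """Aggregate per-frame descriptors into a normalized codeword histogram."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # toy "visual words"
frames = [(0.1, 0.1), (0.9, 1.1), (0.2, 0.0), (0.1, 0.9)]
print(bag_of_words(frames, codebook))             # [0.5, 0.25, 0.25]
```

The resulting fixed-length histogram is what the classifier (e.g., AdaBoost) consumed, regardless of video length.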
15. Large-Scale Video Annotation for YouTube
Modern video understanding pipeline: pixels & sound samples → extract features → magic box containing many convolutional, deep, end-to-end buzzwords :-) → label (e.g., "Roller-blading"), trained end-to-end on training data
16. Deep-learned visual features
Inception model trained on noisy data (images), with a bottleneck embedding layer (1000-d); applied to videos with noisy labels. Frame-level features are aggregated to video-level features via max pooling, avg pooling, or VLAD pooling.
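The frame-level to video-level step reduces a variable number of frame embeddings to a single fixed-size vector; max and average pooling are the simplest of the options listed. A minimal sketch (toy 3-d frame embeddings, names illustrative):

```python
def max_pool(frames):
    """Element-wise max over frame embeddings -> one video embedding."""
    return [max(col) for col in zip(*frames)]

def avg_pool(frames):
    """Element-wise mean over frame embeddings -> one video embedding."""
    return [sum(col) / len(frames) for col in zip(*frames)]

frames = [[0.2, 0.9, 0.1],
          [0.8, 0.1, 0.3],
          [0.5, 0.4, 0.2]]
print(max_pool(frames))   # [0.8, 0.9, 0.3]
print(avg_pool(frames))
```

VLAD pooling is richer: it accumulates residuals of each frame embedding against cluster centers, trading compactness for more descriptive power.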
17. Deep-learned vs. handcrafted features
Deep-learned visual features with VLAD coding (1024-d) reach 0.272 mean average precision, vs. handcrafted audio-visual features (~40K-d) at 0.153 MAP: +80% mean avg. precision with 40x more compact features. [Chart: mean average precision vs. dimensionality]
21. Domain adaptation: Finding home videos on YouTube
Distinguishing cues: by capture device, by video frame rate, by video orientation
22. The technology behind personal video search
(1) An image/photo annotation model, trained on web images, is applied to the video.
23. The technology behind personal video search (cont.)
Adds (2): a YouTube frame annotation model, trained on video thumbnails, giving a domain-adapted frame-level vision model.
24. The technology behind personal video search (cont.)
Adds (3): a YouTube video annotation model, trained on YouTube videos, giving a domain-adapted video-level vision model.
25. The technology behind personal video search (cont.)
Adds (4): a YouTube audio annotation model, trained on YouTube videos, giving a domain-adapted audio model that takes the audio track as input.
26. The technology behind personal video search (cont.)
Adds (5): fusion & calibration, trained on home videos, producing a domain-adapted personal video model. Example output labels: "toddler", "dancing".
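Step 5's fusion & calibration can be sketched as late fusion: each model emits a raw per-label score, scores are calibrated to comparable probabilities, then combined with weights learned on the target domain (home videos). Everything below — the logistic calibration parameters, the weights, and the example scores — is illustrative, not Google's actual method:

```python
import math

def calibrate(score, a=1.0, b=0.0):
    """Platt-style logistic calibration: map a raw score to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

def fuse(model_scores, weights):
    """Weighted average of calibrated per-model scores for one label."""
    assert len(model_scores) == len(weights)
    total = sum(weights)
    return sum(w * calibrate(s)
               for s, w in zip(model_scores, weights)) / total

# Raw scores for one label from four domain-adapted models (illustrative):
scores = [1.2, 0.4, 2.0, -0.3]    # frame, video, audio, photo models
weights = [0.3, 0.4, 0.2, 0.1]    # learned on home videos
print(fuse(scores, weights))
```

Calibration matters because models trained on different domains and data scales produce scores on incomparable scales; fusing uncalibrated scores would let one model dominate arbitrarily.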
28. Evolution of personal video annotation models
(1) Photo annotation model applied on video frames.
29. Evolution of personal video annotation models (cont.)
Adds (2): domain adaptation + fusion across frames.
30. Evolution of personal video annotation models (cont.)
Adds (3): fusion across multiple vision models.
31. Evolution of personal video annotation models (cont.)
Adds (4): fusion across multiple audio-visual models.
45. Self-Supervised Imitation (Sermanet, Google Brain)
Pierre Sermanet*, Corey Lynch*, Yevgen Chebotar*, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
Google Brain + University of Southern California (* equal contribution)
57. Open Images v3 - detailed spatial annotations in images
Example validation images
58. Open Images v3 - detailed spatial annotations in images
Example validation images
59. Conclusion
● Significant progress in large-scale video annotation for YouTube
● Video understanding has many applications beyond YouTube
● We encourage others to work on video through public datasets
● Many exciting research problems ahead, particularly in learning from video
(I think there’s a lot more progress to be made in video understanding)