1) The document presents a new model, Interpretable Reinforcement Learning Counter (IRLC), for the counting subtask of visual question answering: answering "how many" questions about natural images. 2) IRLC uses an object detector to ground the question in a set of candidate image regions, then uses an LSTM to build the count by selecting detected objects one at a time, yielding an interpretable, step-by-step counting process. 3) Experimental results on the HowMany-QA benchmark (counting questions drawn from VQA 2.0 and Visual Genome) show that IRLC outperforms prior counting approaches such as SoftCount and UpDown in both accuracy and root mean squared error.
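The incremental selection process described in sentence 2 can be sketched in plain NumPy. This is an illustrative greedy decode only, not the authors' implementation: the function names, the additive `interaction` matrix (which stands in for IRLC's learned pairwise score updates that, e.g., suppress overlapping duplicate detections), and the zero terminal threshold are all assumptions for the example.

```python
import numpy as np

def sequential_count(scores, interaction):
    """Greedy sketch of IRLC-style counting (illustrative assumption,
    not the paper's exact model).

    scores      : (n,) initial question-conditioned score per detected object
    interaction : (n, n) additive update applied to all scores when an
                  object is selected (e.g. penalizing overlapping boxes)

    Repeatedly selects the highest-scoring remaining object; stops when
    the best remaining score drops to or below 0 (a stand-in for the
    terminal action). Returns the count and the selected indices, which
    make the counting process inspectable.
    """
    scores = np.asarray(scores, dtype=float).copy()
    n = len(scores)
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(n):
        idx = int(np.argmax(np.where(available, scores, -np.inf)))
        if scores[idx] <= 0.0:  # terminal action wins: stop counting
            break
        selected.append(idx)
        available[idx] = False
        # Selecting object idx shifts every remaining object's score.
        scores = scores + interaction[idx]
    return len(selected), selected

# Toy example: objects 0 and 1 are confident matches; each selection
# strongly suppresses the chosen object and mildly suppresses the rest.
scores = np.array([2.0, 1.5, -1.0])
interaction = np.array([[-5.0, -0.2, -0.2],
                        [-0.2, -5.0, -0.2],
                        [-0.2, -0.2, -5.0]])
count, picks = sequential_count(scores, interaction)
print(count, picks)  # → 2 [0, 1]
```

The per-step selections (`picks`) are what make the count interpretable: each increment is tied to a specific grounded object, unlike a regression model that emits a single scalar.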