Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering
Patent Number
US 9965705
Status
Active
Filing Date
June 16, 2016
Grant Date
May 8, 2018
Expiration
~June 2036 (estimated)
Claims
23
Assignee
Baidu USA LLC
Inventors
Kan Chen, Jiang Wang, Wei Xu
Citations
28 forward · 2 backward
What it covers
Described herein are systems and methods for generating and using attention-based deep learning architectures for the visual question answering (VQA) task, which automatically generate answers to questions about images (still or video images). To generate correct answers, it is important for a model's attention to focus on the image regions relevant to the question, because different questions may ask about the attributes of different image regions. In embodiments, this question-guided attention is learned with an attention-based configurable convolutional neural network (ABC-CNN). Embodiments of the ABC-CNN model determine the attention maps by convolving an image feature map with configurable convolutional kernels determined by the semantics of the question. In embodiments, the question-guided attention maps focus on the question-related regions and filter out noise in the unrelated regions.
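The core mechanism above, a convolution kernel configured by the question and applied to an image feature map to produce an attention map, can be illustrated with a minimal pure-Python sketch. This is not Baidu's implementation; it assumes details the summary does not specify: the question is already encoded as a vector, a hypothetical learned matrix `proj` maps that vector linearly to a single k x k x D kernel, the convolution uses stride 1 with zero padding so the attention map keeps the feature map's height and width, and the raw scores are normalized with a softmax over all spatial locations. All function names are hypothetical.

```python
import math
import random


def make_kernel(question_vec, proj, k, d):
    """Map a question embedding to a k x k x d kernel (hypothetical linear projection).

    proj has k*k*d rows, each the same length as question_vec.
    """
    flat = [sum(w * q for w, q in zip(row, question_vec)) for row in proj]
    # Reshape the flat vector into kernel[i][j] = d-dimensional weight vector.
    return [[flat[(i * k + j) * d:(i * k + j + 1) * d] for j in range(k)] for i in range(k)]


def convolve_same(feature_map, kernel):
    """Single-output-channel 2-D cross-correlation, stride 1, zero padding ("same" size)."""
    h, w, k = len(feature_map), len(feature_map[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:  # out-of-bounds positions contribute 0
                        total += sum(a * b for a, b in zip(kernel[i][j], feature_map[yy][xx]))
            out[y][x] = total
    return out


def spatial_softmax(scores):
    """Normalize scores over all spatial locations so the attention map sums to 1."""
    m = max(s for row in scores for s in row)  # subtract max for numerical stability
    exps = [[math.exp(s - m) for s in row] for row in scores]
    z = sum(sum(row) for row in exps)
    return [[e / z for e in row] for row in exps]


def question_guided_attention(feature_map, question_vec, proj, k):
    """Return (attention map, attention-weighted feature map) for an H x W x D feature map."""
    d = len(feature_map[0][0])
    kernel = make_kernel(question_vec, proj, k, d)
    attention = spatial_softmax(convolve_same(feature_map, kernel))
    # Scale every channel at each location by that location's attention weight.
    attended = [[[attention[y][x] * v for v in feature_map[y][x]]
                 for x in range(len(feature_map[0]))]
                for y in range(len(feature_map))]
    return attention, attended


if __name__ == "__main__":
    random.seed(0)
    H, W, D, Q, K = 4, 4, 3, 5, 3
    fm = [[[random.uniform(-1, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
    q = [random.uniform(-1, 1) for _ in range(Q)]
    proj = [[random.uniform(-1, 1) for _ in range(Q)] for _ in range(K * K * D)]
    att, _ = question_guided_attention(fm, q, proj, K)
    for row in att:
        print(" ".join(f"{a:.3f}" for a in row))
```

Because the kernel is computed from the question, the same image feature map yields a different attention map for each question. That is the "configurable" aspect the summary describes. A real system would learn `proj` (and the question encoder) end to end and would likely use a tensor library rather than nested lists.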
Generated by PatentBrief · Not legal advice · patentbrief.org
US 9965705 · 2026