Invited Speakers
- Alan Yuille, Johns Hopkins University
- Tao Mei, JD Research
- Qin Jin, Renmin University of China
- Christoph Feichtenhofer, Facebook AI Research (FAIR)
Schedule [Live Session Landing Page]
Time | Session | Speaker(s)
9:00-9:10 | Opening Remarks [Live] | Workshop Organizers
9:10-9:50 | Invited Talk 1 + Live Q&A: Compositional Models for Open-Set Activity Classification of Videos | Alan Yuille
9:50-10:30 | Invited Talk 2 + Live Q&A: Efficient Video Recognition | Christoph Feichtenhofer
10:30-10:40 | Coffee Break |
10:40-11:20 | Invited Short Orals (workshop paper session) |
11:20-14:00 | Lunch |
14:00-14:40 | Invited Talk 3 + Live Q&A: Image and Video Captioning | Qin Jin
14:40-15:10 | Invited Talk 4: Vision and Language: From Perception to Creation | Tao Mei
15:10-15:30 | Coffee Break |
15:30-15:40 | YouMakeup Challenge | Shizhe Chen [Slides]
15:40-16:00 | Challenge Talk + Live Q&A: Baseline Introduction and Analysis | Ludan Ruan, Linli Yao, Weiying Wang
16:00-16:20 | Challenge Talk (winner): Multi-modal Feature Fusion for YouMakeup Video Question Answering | Yajie Zhang, Li Su [report]
16:00-16:20 | Challenge Talk (winner): BUPTMM Submission for YouMakeUp VQA Challenge in Step Ordering Task | Kun Liu, Huadong Ma [report]
16:20-16:30 | VATEX Captioning Challenge [Live] | Xin Wang
16:30-16:50 | Challenge Talk (winner) + Live Q&A: Multi-View Features and Hybrid Reward Strategies for Video Captioning | Xinxin Zhu, Longteng Guo, Peng Yao, Shichen Lu, Wei Liu, Jing Liu
16:50-17:10 | Challenge Talk (runner-up) + Live Q&A: Multi-modal Feature Fusion with Feature Attention for Video Captioning | Ke Lin, Zhuoxin Gan, Liwei Wang
17:10-17:15 | Closing Remarks [Live] | Workshop Organizers
Overview and Call For Papers
Vision and language is an emerging research area that has received considerable attention. Initial research and applications in this area were mainly image-focused, such as Image Captioning, Visual Question Answering, and Referring Expression comprehension. However, moving beyond static images is essential for vision-and-language understanding, as videos contain much richer information, including spatio-temporal dynamics and audio signals. Most recently, researchers in both the computer vision and natural language processing communities have been striving to bridge videos and natural language; popular topics such as video captioning, video question answering, and text-guided video generation fall into this area.
We are organizing the first Language & Vision with applications to Video Understanding workshop at CVPR, with a joint VATEX Video Captioning Challenge and a YouMakeup Video Question Answering Challenge. This workshop aims to gather researchers from multiple domains to form a new video-language community and to attract more people to this topic. At the workshop, we will invite several top-tier researchers from this area to present their most recent work, covering different video-language topics such as video captioning and video question answering. The invited speakers will present key architectural building blocks and novel algorithms used to solve these tasks.
This workshop covers (but is not limited to) the following topics:
- Video captioning, dialogue, and question-answering;
- Sequence learning towards bridging video and language;
- Novel tasks which combine language and video;
- Understanding the relationship between language and video in humans;
- Video synthesis from language;
- Stories as means of abstraction;
- Transfer learning across language and video;
- Joint video and language alignment and parsing;
- Cross-modal learning beyond image understanding, such as video and audio;
- Multidisciplinary study that may involve linguistics, cognitive science, robotics, etc.
In addition, we will call for 10-15 high-quality 4-page extended abstracts to be showcased at a poster session along with short spotlight talks. Abstracts are not archival and will not be included in the Proceedings of CVPR 2020. In the interest of fostering a freer exchange of ideas, we welcome both novel and previously published work.
Submission details
This track follows the CVPR paper format. Submissions may consist of up to 4 pages of content (excluding references) in CVPR format, plus unlimited references. We also accept full-length submissions; these will not be included in the Proceedings of CVPR 2020, but, at the authors' option, we will provide a link to the relevant arXiv submission. Submissions should be emailed as a single PDF to languageandvision@gmail.com.
The format of submitted papers must follow the CVPR Author Guidelines. Style sheets (LaTeX, Word) are available here.
Important Dates
- Submission Deadline: May 6, 2020 (11:59pm Anywhere on Earth time, UTC-12)
- Notification: May 12, 2020
- Workshop Day: June 19, 2020
Challenges
VATEX Captioning Challenge 2020
The VATEX Captioning Challenge 2020 aims to benchmark progress toward models that can describe videos in multiple languages, such as English and Chinese. This year, in addition to the original 34,991 videos, we release a private test set of 6,278 new videos for evaluation.
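For concreteness, below is a minimal Python sketch of loading VATEX-style caption annotations and grouping the reference captions by video ID. The field names (videoID, enCap, chCap) follow the public VATEX JSON release, and the file name is a placeholder; please treat both as assumptions and consult the challenge website for the authoritative data and submission formats.

```python
import json

# Minimal sketch: read a VATEX-style annotation file (a JSON list of
# per-video entries) and map each video ID to its reference captions.
# Field names follow the public VATEX release; the path is a placeholder.

def load_annotations(path):
    """Read an annotation file laid out as a JSON list of per-video entries."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def build_references(entries, lang="enCap"):
    """Map each video ID to its reference captions in the chosen language."""
    return {entry["videoID"]: entry[lang] for entry in entries}

if __name__ == "__main__":
    entries = load_annotations("vatex_validation.json")  # placeholder path
    refs = build_references(entries, lang="enCap")
    video_id, captions = next(iter(refs.items()))
    print(f"{video_id}: {len(captions)} English references")
```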
Please visit VATEX Captioning Challenge 2020 website for more details!
YouMakeup VQA Challenge
The YouMakeup VQA Challenge aims to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos. Makeup instructional videos are naturally more fine-grained than open-domain videos: different action steps contain subtle but critical differences in actions, tools, and the facial areas to which they are applied.
We propose two question-answering tasks to evaluate models' fine-grained action understanding. The first task is Facial Image Ordering, which aims to understand the visual effects on the face of different actions expressed in natural language. The second task is Step Ordering, which measures cross-modal semantic alignment between untrimmed long videos and multi-sentence texts.
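As a rough illustration of how such ordering tasks can be scored, here is a minimal Python sketch of multiple-choice accuracy: the model assigns a score to each candidate ordering, and the highest-scoring candidate is taken as its answer. The data layout (questions with candidate orderings and a ground-truth answer index) and the toy scoring function are assumptions for illustration, not the official evaluation protocol.

```python
# Minimal sketch of multiple-choice accuracy for an ordering-style QA task.
# The question layout (candidates plus a ground-truth index) is an assumption;
# see the challenge website for the real data format and metric.

from typing import Callable, List, Sequence

def choose_answer(score_fn: Callable[[dict, Sequence[int]], float],
                  question: dict) -> int:
    """Pick the index of the candidate ordering the model scores highest."""
    scores = [score_fn(question, cand) for cand in question["candidates"]]
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(score_fn: Callable[[dict, Sequence[int]], float],
             questions: List[dict]) -> float:
    """Fraction of questions where the top-scored candidate is correct."""
    correct = sum(choose_answer(score_fn, q) == q["answer"] for q in questions)
    return correct / len(questions)

if __name__ == "__main__":
    # Toy example: a "model" that prefers orderings close to the identity.
    toy = [{"candidates": [[0, 1, 2], [2, 1, 0]], "answer": 0}]
    identity_bias = lambda q, cand: -sum(abs(i - s) for i, s in enumerate(cand))
    print(accuracy(identity_bias, toy))  # 1.0
```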
Please visit YouMakeup VQA Challenge website for more details!
Organizers and PC
Organizers
Qi Wu | University of Adelaide | qi.wu01@adelaide.edu.au
Xin Wang | UC Santa Cruz | xwang366@ucsc.edu
Chenxi Liu | Johns Hopkins University | cxliu@jhu.edu
Licheng Yu | Facebook AI | lichengyu@fb.com
Lu Jiang | Google AI | lujiang@google.com
Yan Huang | UCAS, China | yhuang@nlpr.ia.ac.cn
Ting Yao | JD AI Research | tingyao.ustc@gmail.com
Qin Jin | Renmin University of China | qjin@ruc.edu.cn
William Yang Wang | UC Santa Barbara | william@cs.ucsb.edu
Anton van den Hengel | University of Adelaide | anton.vandenhengel@adelaide.edu.au
Contact the Organizing Committee: languageandvision@gmail.com