LVVU 2020

Invited Speakers

Alan Yuille, Johns Hopkins University
Tao Mei, JD Research
Qin Jin, Renmin University of China
Christoph Feichtenhofer, Facebook AI Research (FAIR)

Schedule [Live Session Landing Page]

9:00-9:10	Opening Remarks [Live]	Workshop Organizers
9:10-9:50	Invited Talk 1 + Live Q&A: Compositional Models for Open-Set Activity Classification of Videos	Alan Yuille
9:50-10:30	Invited Talk 2 + Live Q&A: Efficient Video Recognition	Christoph Feichtenhofer
10:30-10:40	Coffee Break
10:40-11:20	Invited Short Orals (workshop paper session)
11:20-14:00	Lunch
14:00-14:40	Invited Talk 3 + Live Q&A: Image and Video Captioning	Qin Jin
14:40-15:10	Invited Talk 4: Vision and Language: From Perception to Creation	Tao Mei
15:10-15:30	Coffee Break
15:30-15:40	YouMakeup Challenge	Shizhe Chen [Slides]
15:40-16:00	Challenge Talk + Live Q&A: Baseline Introduction and Analysis	Ludan Ruan, Linli Yao, Weiying Wang
16:00-16:20	Challenge Talks (winners): Multi-modal Feature Fusion for YouMakeup Video Question Answering; BUPTMM Submission for YouMakeUp VQA Challenge in Step Ordering Task	Yajie Zhang, Li Su [report]; Kun Liu, Huadong Ma [report]
16:20-16:30	VATEX Captioning Challenge [Live]	Xin Wang
16:30-16:50	Challenge Talk (winner) + Live Q&A: Multi-View Features and Hybrid Reward Strategies for Video Captioning	Xinxin Zhu, Longteng Guo, Peng Yao, Shichen Lu, Wei Liu, Jing Liu
16:50-17:10	Challenge Talk (runner-up) + Live Q&A: Multi-modal Feature Fusion with Feature Attention for Video Captioning	Ke Lin, Zhuoxin Gan, Liwei Wang
17:10-17:15	Closing Remarks [Live]	Workshop Organizers

Overview and Call For Papers

Vision and language is an emerging research area that has received considerable attention. Initial research and applications in this area were mainly image-focused, such as Image Captioning, Visual Question Answering, and Referring Expressions. However, moving beyond static images is essential for vision and language understanding, as videos contain much richer information, such as spatio-temporal dynamics and audio signals. Consequently, researchers in both the computer vision and natural language processing communities have recently been striving to bridge videos and natural language. Popular topics in this area include video captioning, video question answering, and text-guided video generation. We are proposing the first workshop on Language & Vision with applications to Video Understanding at CVPR, held jointly with the VATEX Video Captioning Challenge and the YouMakeup Video Question Answering Challenge. This workshop aims to gather researchers from multiple domains, form a new video-language community, and attract more people to this topic. We will invite several top-tier researchers in this area to present their most recent work, covering video-language topics such as video captioning and video question answering. The invited speakers will present key architectural building blocks and novel algorithms used to solve these tasks.

This workshop covers (but is not limited to) the following topics:

Video captioning, dialogue, and question-answering;
Sequence learning towards bridging video and language;
Novel tasks which combine language and video;
Understanding the relationship between language and video in humans;
Video synthesis from language;
Stories as means of abstraction;
Transfer learning across language and video;
Joint video and language alignment and parsing;
Cross-modal learning beyond image understanding, such as video and audio;
Multidisciplinary study that may involve linguistics, cognitive science, robotics, etc.

In addition, we will call for 10-15 high-quality, 4-page extended abstracts to be showcased at a poster session along with short spotlight talks. Abstracts are not archival and will not be included in the Proceedings of CVPR 2020. In the interest of fostering a freer exchange of ideas, we welcome both novel and previously published work.

Submission details

This track follows the CVPR paper format. Submissions may consist of up to 4 pages of content (excluding references) in CVPR format, plus unlimited references. We also accept full-paper submissions; these will not be included in the Proceedings of CVPR 2020, but at the authors' option we will provide a link to the relevant arXiv submission. Submissions should be emailed as a single PDF to languageandvision@gmail.com.

Papers submitted to the archival track must follow the CVPR Author Guidelines. Style sheets (LaTeX, Word) are available here.

Important Dates

Submission Deadline: May 6, 2020 (11:59 pm Anywhere on Earth, UTC-12)
Notification: May 12, 2020
Workshop Day: June 19, 2020

Challenges

VATEX Captioning Challenge 2020

The VATEX Captioning Challenge 2020 aims to benchmark progress toward models that can describe videos in multiple languages, such as English and Chinese. This year, in addition to the original 34,991 videos, we release a private test set with 6,278 new videos for evaluation.

Please visit the VATEX Captioning Challenge 2020 website for more details!

YouMakeup VQA Challenge

The YouMakeup VQA Challenge aims to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos. Makeup instructional videos are naturally more fine-grained than open-domain videos: different action steps contain subtle but critical differences in actions, tools, and applied facial areas.

We propose two question-answering tasks to evaluate models' fine-grained action understanding abilities. The first task is Facial Image Ordering, which aims to understand the visual effects on the face of different actions expressed in natural language. The second task is Step Ordering, which aims to measure cross-modal semantic alignment between untrimmed long videos and multi-sentence texts.

Please visit the YouMakeup VQA Challenge website for more details!

Organizers and PC

Organizers

Qi Wu	University of Adelaide	qi.wu01@adelaide.edu.au
Xin Wang	UC Santa Cruz	xwang366@ucsc.edu
Chenxi Liu	Johns Hopkins University	cxliu@jhu.edu
Licheng Yu	Facebook AI	lichengyu@fb.com
Lu Jiang	Google AI	lujiang@google.com
Yan Huang	UCAS, China	yhuang@nlpr.ia.ac.cn
Ting Yao	JD AI Research	tingyao.ustc@gmail.com
Qin Jin	Renmin University of China	qjin@ruc.edu.cn
William Wang	UC Santa Barbara	william@cs.ucsb.edu
Anton van den Hengel	University of Adelaide	anton.vandenhengel@adelaide.edu.au

Contact the Organizing Committee: languageandvision@gmail.com

LVVU 2020

Workshop on Language & Vision with applications to Video Understanding
Date: June 19, 2020 (PDT)

In conjunction with CVPR 2020
Room: Online