Paper list
- Deep learning extensions
- "Generative Adversarial Networks" Goodfellow et al. link.
- "Auto-encoding variational bayes" Kingma and Welling. link.
- "Understanding LSTM Networks" Olah. link.
- "On Multiplicative Integration with Recurrent Neural Networks" Wu et al. link.
- Low-level visual features
- "Large Displacement Optical Flow: Descriptor Matching in Variational Flow Estimation" Brox and Malik. PAMI 2010. link.
- "Fully-Trainable Deep Matching" Thewlis et al. link.
- "Convolutional Two-Stream Network Fusion for Video Action Recognition" Feichtenhofer et al. link.
- "Handcrafted local features are convolutional neural networks" Lan et al. link.
- "Dynamic image networks for action recognition" Bilen et al. link.
- "Multi-region two-stream R-CNN for action detection" Peng and Schmid. link.
- "Actions~Transformations" Wang et al. link.
- "Long-term Temporal Convolutions for Action Recognition" Varol et al. link.
- "Beyond Short Snippets: Deep Networks for Video Classification" Ng et al. link.
- Mid-level tracking / pose / detection
- "Learning complexity-aware cascades for deep pedestrian detection" Cai et al. ICCV 15.
- "Multi-Source Multi-Scale Counting in Extremely Dense Crowd Images" Idrees et al. CVPR 13.
- "Convolutional Pose Machines" Wei et al. CVPR 16. link.
- "STCT: Sequentially training convolutional networks for visual tracking" Wang et al. CVPR 16.
- "Visual tracking with fully convolutional networks" Wang et al. ICCV 15.
- "Multi-task Recurrent Neural Network for Immediacy Prediction" Chu et al. ICCV 15.
- "Social Role Discovery in Human Events" Ramanathan et al. CVPR 13. link.
- "Recurrent Network Models for Human Dynamics" Fragkiadaki et al. link.
- "Learning Predictive Visual Models of Physics for Playing Billiards" Fragkiadaki et al. link.
- "Structural-RNN: Deep Learning on Spatio-Temporal Graphs" Jain et al. link.
- High-level actions / intentions
- "Learning a driving simulator" Santana and Hotz. link.
- "A database for fine grained activity detection of cooking activities" Rohrbach et al. CVPR 12. link.
- "Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding" Sigurdsson et al. link.
- "Delving into egocentric actions" Li et al. CVPR 15.
- "MSR-VTT: A Large Video Description Dataset for Bridging Video and Language" Xu et al. CVPR 16.
- "MovieQA: Understanding Stories in Movies through Question-Answering" Tapaswi et al. CVPR 16.
- "Experience replay for real-time reinforcement learning control." Adam et al. link.
- "Human-level control through deep reinforcement learning" Mnih et al. link.
- "Deep recurrent Q-learning for partially observable MDPs" Hausknecht and Stone. link.
- "Predicting Motivations of Actions by Leveraging Text" Vondrick et al. CVPR 16. link.
- "Inferring the Why in Images" Pirsiavash et al. link.
- "Assessing the Quality of Actions" Pirsiavash et al. ECCV 14. link.
- "Animate vision" Ballard. link.
- "Action understanding as inverse planning" Baker et al. link.
- "MazeBase: A Sandbox for Learning from Games" Sukhbaatar et al. link.
- "Unsupervised Semantic Action Discovery from Video Collections" Sener et al. link.
- "Watch-n-Patch: Unsupervised Learning of Actions and Relations" Wu et al. link.
- "Situation Recognition: Visual Semantic Role Labeling for Image Understanding" Yatskar et al. link.
- "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations" Krishna et al. link.
- "SPICE: Semantic Propositional Image Caption Evaluation" Anderson et al. link.
Last modified: Mon Aug 29 22:07:16 EDT 2016