Deep Learning Working Group R-CNN

Deep Learning Working Group: R-CNN
Includes slides from Josef Sivic, Andrew Zisserman, and many others
Nicolas Gonthier, February 1, 2018

Recognition tasks
- Image classification: does the image contain an aeroplane? (last lecture)
- Object class detection/localization: where are the aeroplanes (if any)?
- Object class segmentation: which pixels are part of an aeroplane (if any)?

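Classification vs. Detection
[Figure: classification labels the image as "Dog"; detection outputs a separate "Dog" box for each dog in the image.]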

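Problem formulation
Input: an image and a set of object classes { airplane, bird, motorbike, person, sofa }.
Desired output: a bounding box with a class label for every instance of those classes in the image (here: person, motorbike).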

Region proposals: Selective Search
Starting from an initial over-segmentation of the image, repeat:
1. Merge the two most similar neighboring regions according to the similarity measure S.
2. Update the similarities between the new region and its neighbors.
3. Go back to step 1 until the whole image is a single region.
[K. van de Sande, J. Uijlings, T. Gevers, and A. Smeulders, ICCV 2011]

Region proposals: Selective Search
Take the bounding boxes of all regions generated during the merging, at every level of the hierarchy, and treat them as possible object locations. [K. van de Sande, J. Uijlings, T. Gevers, and A. Smeulders, ICCV 2011]
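The merging loop and the "keep every generated box" step can be sketched as follows. This is a toy sketch, not the authors' implementation. It assumes the initial regions, their pixel counts, and their adjacency are already given. It also uses only two of the similarity terms from the Selective Search papers (size and fill) and omits the colour and texture terms. The names `hierarchical_grouping` and `similarity` are hypothetical.

```python
def bbox_union(a, b):
    """Smallest (x0, y0, x1, y1) box enclosing boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def similarity(ri, rj, image_size):
    """Toy S = size similarity + fill similarity (colour/texture terms omitted)."""
    s_size = 1.0 - (ri["size"] + rj["size"]) / image_size
    x0, y0, x1, y1 = bbox_union(ri["box"], rj["box"])
    gap = (x1 - x0) * (y1 - y0) - ri["size"] - rj["size"]
    s_fill = 1.0 - gap / image_size
    return s_size + s_fill


def hierarchical_grouping(regions, neighbours, image_size):
    """regions: {id: {"box": (x0, y0, x1, y1), "size": pixel_count}}.
    neighbours: set of frozenset({i, j}) for adjacent region ids.
    Returns the bounding box of every region created, initial ones included."""
    regions = dict(regions)
    S = {}
    for pair in neighbours:
        i, j = tuple(pair)
        S[pair] = similarity(regions[i], regions[j], image_size)
    proposals = [r["box"] for r in regions.values()]
    next_id = max(regions) + 1
    while S:
        # 1. merge the two most similar neighbouring regions
        i, j = tuple(max(S, key=S.get))
        merged = {"box": bbox_union(regions[i]["box"], regions[j]["box"]),
                  "size": regions[i]["size"] + regions[j]["size"]}
        # 2. drop similarities involving i or j, then add ones for the new region
        stale = [p for p in S if i in p or j in p]
        new_neighbours = {k for p in stale for k in p} - {i, j}
        for p in stale:
            del S[p]
        del regions[i], regions[j]
        regions[next_id] = merged
        for k in new_neighbours:
            S[frozenset((next_id, k))] = similarity(merged, regions[k], image_size)
        proposals.append(merged["box"])
        next_id += 1
    # 3. stops when no neighbouring pairs remain (one region for a connected image)
    return proposals


# Usage: a 4 x 4 image split into four 2 x 2 regions
regions = {0: {"box": (0, 0, 2, 2), "size": 4}, 1: {"box": (2, 0, 4, 2), "size": 4},
           2: {"box": (0, 2, 2, 4), "size": 4}, 3: {"box": (2, 2, 4, 4), "size": 4}}
neighbours = {frozenset(p) for p in [(0, 1), (0, 2), (1, 3), (2, 3)]}
boxes = hierarchical_grouping(regions, neighbours, image_size=16)
print(len(boxes), boxes[-1])  # 7 (0, 0, 4, 4)
```

The four initial regions plus three merges give seven candidate boxes. The last one covers the whole image, matching the stopping condition in step 3.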


Test time: Non-maximum suppression (NMS)
Scanning-window detectors typically produce multiple responses for the same object. To remove the duplicates, a simple greedy procedure called non-maximum suppression is applied:
1. Sort all detections by detector confidence.
2. Choose the most confident remaining detection d_i; remove every d_j such that overlap(d_i, d_j) > T.
3. Repeat step 2 until every detection has been either kept or removed.
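The three NMS steps translate almost line by line into NumPy. This minimal sketch makes two assumptions: boxes are stored as (x0, y0, x1, y1), and "overlap" means intersection-over-union (IoU). The default threshold T = 0.5 is only an illustrative choice.

```python
import numpy as np


def iou(box, boxes):
    """Intersection-over-union of one (x0, y0, x1, y1) box with each row of boxes."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, T=0.5):
    """Greedy non-maximum suppression; returns indices of kept detections."""
    order = np.argsort(scores)[::-1]           # 1. sort by detector confidence
    keep = []
    while order.size > 0:
        i = order[0]                           # 2. most confident remaining d_i
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= T]  # remove d_j with overlap > T
    return keep                                # 3. loop until no detections are left


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 81/119 ≈ 0.68, so it is suppressed at T = 0.5. The third box does not overlap either of them, so it survives.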

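Putting it together: R-CNN
Pipeline: extract region proposals from the input image (e.g. with Selective Search), warp each proposal to a fixed-size input, compute CNN features for every warped region, classify each region with per-class linear SVMs, and refine its box with a bounding-box regressor.
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 8, 1 Feb 2016. Slide credit: Ross Girshick [Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014]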
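The per-region structure of R-CNN, and why it is slow at test time, shows up clearly in a small sketch. This is not the authors' code. The CNN is replaced by a hypothetical stand-in (`toy_features`, the mean colour of the crop), and warping is approximated with nearest-neighbour resizing. The 227 x 227 size is assumed from the AlexNet-style input used in the R-CNN paper. The bounding-box regressor is omitted.

```python
import numpy as np


def warp(image, box, out_size=227):
    """Crop (x0, y0, x1, y1) from an H x W x 3 image and resize the crop to
    out_size x out_size by nearest-neighbour sampling, ignoring aspect ratio."""
    x0, y0, x1, y1 = box
    ys = np.linspace(y0, y1 - 1, out_size).round().astype(int)
    xs = np.linspace(x0, x1 - 1, out_size).round().astype(int)
    return image[np.ix_(ys, xs)]


def rcnn_scores(image, proposals, cnn_features, svm_W, svm_b):
    """Score every proposal with per-class linear SVMs.
    cnn_features: callable standing in for the CNN (crop -> 1-D feature vector).
    svm_W: (num_classes, D) weights; svm_b: (num_classes,) biases."""
    rows = []
    for box in proposals:                     # one full CNN forward pass per proposal
        feat = cnn_features(warp(image, box))
        rows.append(svm_W @ feat + svm_b)
    return np.stack(rows)                     # (num_proposals, num_classes)


def toy_features(crop):
    """Hypothetical stand-in for a CNN: mean colour of the warped crop."""
    return crop.mean(axis=(0, 1))


image = np.zeros((100, 100, 3))
image[10:50, 10:50] = 1.0
proposals = [(10, 10, 50, 50), (60, 60, 90, 90)]
svm_W = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
svm_b = np.zeros(2)
scores = rcnn_scores(image, proposals, toy_features, svm_W, svm_b)
print(scores)
```

The loop in `rcnn_scores` runs the feature extractor once per proposal. With about two thousand proposals per image, that loop is the test-time cost the later slides set out to remove.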

Region-based Convolutional Networks (R-CNNs)
[Figure: mean Average Precision (mAP) by year, 2006 to 2015. DPM 17%; DPM, HOG+BOW 23%; DPM, MKL 28%; DPM++ 37%; DPM++, MKL, Selective Search 41%; Selective Search, DPM++, MKL 41%; R-CNN v1 53%; R-CNN v2 62%.] [R-CNN. Girshick et al. CVPR 2014]

[Figure (continued): the same mAP-by-year chart extended with ResNet at 76%; annotations mark spans of "~5 years" and "~1 year".]
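The metric on these charts, mean Average Precision, is the mean over classes of the area under each class's precision-recall curve. The sketch below uses the all-point interpolated AP, one common variant; the original VOC 2007 protocol used an 11-point interpolation instead. It assumes each detection has already been marked as a true or false positive against the ground truth, for example by an IoU ≥ 0.5 matching rule. Function names are illustrative.

```python
import numpy as np


def average_precision(scores, is_tp, num_gt):
    """AP for one class.
    scores: confidence of each detection; is_tp: True if the detection matched
    a still-unmatched ground-truth box (matching rule assumed, e.g. IoU >= 0.5);
    num_gt: number of ground-truth boxes of this class."""
    order = np.argsort(scores)[::-1]
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # pad, then make precision non-increasing from right to left (the envelope)
    r = np.concatenate([[0.0], recall, [1.0]])
    p = np.concatenate([[0.0], precision, [0.0]])
    for k in range(len(p) - 2, -1, -1):
        p[k] = max(p[k], p[k + 1])
    # area under the envelope: sum over the points where recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def mean_average_precision(per_class_ap):
    """mAP: the unweighted mean of the per-class APs."""
    return float(np.mean(per_class_ap))


ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
print(round(ap, 4))  # 0.8333
```

In the example, recall reaches 0.5 at precision 1 and then 1.0 at precision 2/3. That gives AP = 0.5 * 1 + 0.5 * 2/3 = 5/6.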

R-CNN problems
1. Slow at test time: a full forward pass of the CNN is needed for each region proposal.
2. The SVMs and regressors are post-hoc: the CNN features are not updated in response to the SVMs and regressors.
3. Complex multistage training pipeline.

R-CNN Problem #1: slow at test time due to independent forward passes of the CNN.
Solution: share the computation of the convolutional layers between all proposals of an image. [Girshick, Fast R-CNN, ICCV 2015]

R-CNN Problem #2: post-hoc training; the CNN is not updated in response to the final classifiers and regressors.
R-CNN Problem #3: complex training pipeline.
Solution: just train the whole system end-to-end, all at once!
Slide credit: Ross Girshick

Fast R-CNN: Region of Interest (RoI) Pooling
- A hi-res input image (3 x 800 x 600) with a region proposal goes through convolution and pooling, giving hi-res conv features (C x H x W) onto which the region proposal is projected.
- The fully-connected layers expect low-res conv features of fixed size C x h x w.
- RoI pooling divides the projected proposal into an h x w grid and max-pools within each grid cell, giving RoI conv features of size C x h x w for every proposal.

Fast R-CNN: Region of Interest Pooling (continued)
RoI pooling can be backpropagated through similarly to max pooling: each output cell passes its gradient back to the feature-map location that was the maximum within its grid cell.
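RoI max pooling can be sketched in NumPy as below. The feature map is computed once per image and each proposal is projected onto it, so the convolutions are shared across proposals. This is an illustrative sketch rather than the Fast R-CNN code. The stride of 16 and the 7 x 7 output are assumptions that match a VGG-16-like backbone. The projection rounding and the cell boundaries are one reasonable choice among several.

```python
import numpy as np


def roi_max_pool(features, roi, out_h=7, out_w=7, stride=16):
    """features: C x H x W conv map shared by all proposals of the image.
    roi: (x0, y0, x1, y1) in input-image pixels. Returns C x out_h x out_w."""
    C, H, W = features.shape
    # project the proposal onto the feature map and clip it to the map
    x0, y0, x1, y1 = (int(round(v / stride)) for v in roi)
    x0, y0 = min(max(x0, 0), W - 1), min(max(y0, 0), H - 1)
    x1, y1 = min(max(x1, x0 + 1), W), min(max(y1, y0 + 1), H)
    # split the projected region into an out_h x out_w grid of (possibly uneven) cells
    ys = np.linspace(y0, y1, out_h + 1)
    xs = np.linspace(x0, x1, out_w + 1)
    out = np.empty((C, out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
        for j in range(out_w):
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            cell = features[:, ya:max(yb, ya + 1), xa:max(xb, xa + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))  # max-pool within the cell
    return out


feats = np.arange(32 * 32, dtype=float).reshape(1, 32, 32)
pooled = roi_max_pool(feats, (0, 0, 512, 512), out_h=2, out_w=2)
print(pooled[0])
```

Only the argmax position of each cell contributes to the output. In the backward pass, each output gradient is therefore routed to that single position, exactly as in ordinary max pooling.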

Fast R-CNN results (VGG-16 CNN on the Pascal VOC 2007 dataset)

                          R-CNN        Fast R-CNN
Training time             84 hours     9.5 hours      Faster!
  (Speedup)               1x           8.8x
Test time per image       47 seconds   0.32 seconds   FASTER!
  (Speedup)               1x           146x
mAP (VOC 2007)            66.0         66.9           Better!

Fast R-CNN problem: its test-time speeds don't include region proposals.

                                            R-CNN        Fast R-CNN
Test time per image                         47 seconds   0.32 seconds
  (Speedup)                                 1x           146x
Test time per image with Selective Search   50 seconds   2 seconds
  (Speedup)                                 1x           25x

Solution: just make the CNN do region proposals too!

Faster R-CNN: insert a Region Proposal Network (RPN) after the last convolutional layer.
- The RPN is trained to produce region proposals directly; no external region proposals are needed.
- After the RPN, use RoI pooling followed by a classifier and a bounding-box regressor, just like Fast R-CNN.
[Ren et al., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015]
Slide credit: Ross Girshick

Faster R-CNN: Region Proposal Network
Slide a small window over the feature map and build a small network on top of it for:
- classifying object vs. not-object (1 x 1 conv), and
- regressing bounding-box locations (1 x 1 conv).
The position of the sliding window provides localization information with reference to the image; box regression provides finer localization information with reference to the sliding window.
Slide credit: Kaiming He

Faster R-CNN: Region Proposal Network
- Use N anchor boxes at each location.
- Anchors are translation invariant: the same ones are used at every location.
- Regression gives offsets from the anchor boxes.
- Classification gives the probability that each (regressed) anchor shows an object.
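Translation-invariant anchors and "offsets from anchor boxes" can be made concrete with two small functions. The defaults follow the Faster R-CNN paper's setup as I understand it: 3 scales x 3 aspect ratios = 9 anchors per location, on a stride-16 feature map. Treat them as assumptions. The offset parameterization (dx, dy scaled by anchor size; dw, dh in log space) is the one introduced with R-CNN's box regression. Function names are hypothetical.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * N, 4) anchors as (x0, y0, x1, y1), N = len(scales) * len(ratios).
    The same N base shapes are reused at every location (translation invariance)."""
    base = []
    for s in scales:
        for r in ratios:                     # r = height / width; area stays s * s
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                                     # (N, 4), centred at 0
    cx = (np.arange(feat_w) + 0.5) * stride                   # window centres in pixels
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                              # each (feat_h, feat_w)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (shifts + base[None]).reshape(-1, 4)


def decode(anchors, deltas):
    """Apply regressed offsets (dx, dy, dw, dh), one row per anchor."""
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha
    dx, dy, dw, dh = deltas.T
    x, y = xa + dx * wa, ya + dy * ha        # shift centre relative to anchor size
    w, h = wa * np.exp(dw), ha * np.exp(dh)  # scale width/height in log space
    return np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=1)


anchors = make_anchors(38, 50)   # e.g. a 600 x 800 image at stride 16
print(anchors.shape)             # (17100, 4)
```

With zero offsets, `decode` returns the anchors unchanged. The RPN's regression branch only has to predict small corrections to a fixed set of shapes.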

Faster R-CNN: Training
In the paper:
- Use alternating optimization to train the RPN, then Fast R-CNN with the RPN proposals, etc.
- More complex than it has to be.
Since publication: joint training! One network, four losses:
- RPN classification (anchor good / bad)
- RPN regression (anchor -> proposal)
- Fast R-CNN classification (over classes)
- Fast R-CNN regression (proposal -> box)
Slide credit: Ross Girshick
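A joint-training objective with four losses could look like the NumPy sketch below. The Fast R-CNN paper pairs softmax cross-entropy for classification with smooth L1 for box regression. Beyond that, the details here are assumptions: equal weights for the four terms, regression penalised only on positive samples, one box per RoI rather than one per class, and ignored anchors already filtered out.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy. logits: (N, K); labels: (N,) ints in [0, K)."""
    z = logits - logits.max(axis=1, keepdims=True)      # for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def smooth_l1(pred, target):
    """Smooth L1: 0.5 d^2 if |d| < 1 else |d| - 0.5, summed per box, averaged over boxes."""
    d = np.abs(pred - target)
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=1).mean())


def joint_loss(rpn_logits, rpn_labels, rpn_deltas, rpn_targets,
               det_logits, det_labels, det_deltas, det_targets):
    """Sum of the four losses, equally weighted (an assumption).
    rpn_labels: 1 = object anchor, 0 = background (ignored anchors already dropped).
    det_labels: class index per proposal, 0 = background.
    Box regression is only penalised on positive samples."""
    rpn_pos, det_pos = rpn_labels > 0, det_labels > 0
    losses = {
        "rpn_cls": cross_entropy(rpn_logits, rpn_labels),            # anchor good / bad
        "rpn_reg": smooth_l1(rpn_deltas[rpn_pos], rpn_targets[rpn_pos])
                   if rpn_pos.any() else 0.0,                        # anchor -> proposal
        "det_cls": cross_entropy(det_logits, det_labels),            # over classes
        "det_reg": smooth_l1(det_deltas[det_pos], det_targets[det_pos])
                   if det_pos.any() else 0.0,                        # proposal -> box
    }
    return sum(losses.values()), losses


total, parts = joint_loss(
    np.zeros((2, 2)), np.array([1, 0]), np.zeros((2, 4)), np.zeros((2, 4)),
    np.zeros((2, 3)), np.array([2, 0]), np.zeros((2, 4)),
    np.array([[2.0, 0, 0, 0], [5.0, 5, 5, 5]]))
print(parts)
```

In the example, the background proposal's large regression error is ignored; only the positive proposal contributes 1.5 to the Fast R-CNN regression term. Because all four terms come from one network, a single backward pass updates the shared convolutional layers in response to every loss.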

Faster R-CNN: Results

                                       R-CNN        Fast R-CNN   Faster R-CNN
Test time per image (with proposals)   50 seconds   2 seconds    0.2 seconds
  (Speedup)                            1x           25x          250x
mAP (VOC 2007)                         66.0         66.9         66.9

Detection without proposals: YOLO / SSD
Input image: 3 x H x W
- Divide the image into a 7 x 7 grid.
- Imagine a set of base boxes centered at each grid cell (here B = 3).
[Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016] [Liu et al., SSD: Single-Shot MultiBox Detector, ECCV 2016]
Slide credit: L. Fei-Fei, J. Johnson, S. Yeung, http://cs231n.stanford.edu/

Detection without proposals: YOLO / SSD
Within each grid cell:
- Regress from each of the B base boxes to a final box with 5 numbers: (dx, dy, dh, dw, confidence).
- Predict scores for each of C classes (including background as a class).
Output: 7 x 7 x (5 * B + C)
A single network goes from the input image to the scores. This is faster than the R-CNN family, but not as accurate. See also: Lin et al., Focal Loss for Dense Object Detection, ICCV 2017.
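The output shape 7 x 7 x (5 * B + C) can be unpacked into per-cell boxes and class scores. The channel layout below is an assumption made for illustration: B groups of (dx, dy, dh, dw, confidence) followed by C class scores. Real YOLO and SSD implementations order their outputs differently. C = 21 (20 classes plus background) is also only an example value.

```python
import numpy as np


def split_grid_output(out, B, C):
    """out: (S, S, 5*B + C) -> boxes (S, S, B, 5), class_scores (S, S, C).
    Assumed channel layout: B groups of (dx, dy, dh, dw, confidence), then C scores."""
    S1, S2, D = out.shape
    if D != 5 * B + C:
        raise ValueError(f"expected {5 * B + C} channels, got {D}")
    boxes = out[..., :5 * B].reshape(S1, S2, B, 5)
    class_scores = out[..., 5 * B:]
    return boxes, class_scores


def best_per_cell(out, B, C):
    """Index of the most confident base box and the top-scoring class in each cell."""
    boxes, class_scores = split_grid_output(out, B, C)
    return boxes[..., 4].argmax(axis=-1), class_scores.argmax(axis=-1)


S, B, C = 7, 3, 21            # 7 x 7 grid, 3 base boxes, 20 classes + background (assumed)
out = np.zeros((S, S, 5 * B + C))
print(out.shape, out.size)    # (7, 7, 36) 1764
```

With B = 3 and C = 21, each cell carries 5 * 3 + 21 = 36 numbers, so the whole prediction is 7 * 7 * 36 = 1764 numbers from one forward pass.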

Mask R-CNN: object detection and segmentation
Mask R-CNN = Faster R-CNN with an FCN on RoIs:
1. an object detector using Faster R-CNN, plus
2. a fully convolutional network (FCN) on each region of interest (RoI) that predicts a segmentation mask.
Slide credit: K. He, instancetutorial.github.io
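The mask branch's training signal can be sketched for a single RoI. This follows the loss as I understand it from the Mask R-CNN paper: one binary mask per class, a per-pixel sigmoid, and average binary cross-entropy taken only on the mask of the RoI's ground-truth class. Treat those details, and the hypothetical function name `mask_loss`, as assumptions rather than a reference implementation.

```python
import numpy as np


def mask_loss(mask_logits, gt_class, gt_mask):
    """mask_logits: (K, m, m), one m x m mask per class for a single RoI.
    gt_class: ground-truth class k of the RoI; gt_mask: (m, m) array of 0/1.
    Per-pixel sigmoid + average binary cross-entropy on the k-th mask only,
    so masks of other classes do not compete with it."""
    x = mask_logits[gt_class]
    # numerically stable BCE with logits: max(x, 0) - x*y + log(1 + exp(-|x|))
    bce = np.maximum(x, 0) - x * gt_mask + np.log1p(np.exp(-np.abs(x)))
    return float(bce.mean())


K, m = 3, 4
gt_mask = np.zeros((m, m))
gt_mask[1:3, 1:3] = 1
logits = np.zeros((K, m, m))
print(round(mask_loss(logits, 1, gt_mask), 4))  # 0.6931 = log 2
```

Changing the logits of any class other than `gt_class` leaves the loss unchanged. This is the sense in which the per-class masks are decoupled from the classification branch.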

References for object detection

R-CNN
- B. Alexe, T. Deselaers, and V. Ferrari. Measuring the objectness of image windows. TPAMI, 2012.
- I. Endres and D. Hoiem. Category independent object proposals. In ECCV, 2010.
- J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 2013.
- A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- X. Wang, M. Yang, S. Zhu, and Y. Lin. Regionlets for generic object detection. In ICCV, 2013.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.

Fast R-CNN
- K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
- R. Girshick. Fast R-CNN. In ICCV, 2015.

Faster R-CNN
- D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In CVPR, 2014.
- P. O. Pinheiro, R. Collobert, and P. Dollár. Learning to segment object candidates. In NIPS, 2015.
- S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.

Segmentation
- J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
- K. He et al. Mask R-CNN. In ICCV, 2017.

Summary of object detection / segmentation
- Basic idea: train a sliding-window classifier from training data.
- Pre-CNN: histogram of oriented gradients (HOG) + linear SVM; jittering and hard negative mining to improve accuracy; region proposals.
- R-CNN: combine region proposals and CNN features.
- Fast(er) R-CNN: end-to-end training; region proposals and object classification can be trained jointly.
- Deeper networks (ResNet-101) improve accuracy.
- Mask R-CNN: object detection + segmentation, using fully convolutional networks (FCN) for segmentation; loss: segmentation, classification, and bounding-box prediction.