CV Chapter 6 Categorization 2

Questions about the lecture 'Computer Vision' of the RWTH Aachen Chapter 6 Categorization 2

Set of flashcards Details

Flashcards 50
Language English
Category Computer Science
Level University
Created / Updated 04.02.2017 / 21.02.2017
Weblink
https://card2brain.ch/box/20170204_cv_chapter_6_categorization_2

What is the difference between categorization and local feature matching?

Recognizable objects no longer have exact correspondences to a model, only local ones

Name models for object categorization 2? [3]

1. Part-based models

2. Implicit shape models (ISM)

3. Deformable part-based model

What is the idea of part-based models for classification 2? [2]

1. Parts are 2D image fragments

2. Structure is configuration of parts

Name connectivity structures for part-based models for categorization 2? [7]

1. Bag of visual words with O(N)

2. Constellation with O(N^k)

3. Star shape with O(N²)

4. Tree with O(N²)

5. k-fan with O(N³)

6. Hierarchy

7. Sparse flexible model

What is the idea of implicit shape models (ISM) for categorization 2? [4]

1. Learn appearance codebook and star topology structural model

2. Features are considered independent given object center

 

3. Use visual vocabulary with displacement vectors to index votes

4. Robust to clutter, occlusion, noise and low contrast

What changed for the probabilistic generalized Hough transform? [5]

1. Exact correspondence → probabilistic match

2. NN matching → soft matching

3. Feature location on object → part location distribution

4. Uniform votes → probabilistic vote weighting

5. Quantized Hough array → continuous Hough space

How does recognition work for implicit shape models (ISM) for categorization 2? [3]

1. Extract interest points and local image features f

2. Match them against codebook entries and cast votes with probabilistic weights

3. Locate the object position as a maximum in the voting space and back-project the hypothesis (sketched below)
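
A minimal Python/NumPy sketch of the voting step, assuming the feature-to-codebook matching has already happened; the vote values, image size and discrete grid below are purely illustrative (a card below notes that the probabilistic formulation actually uses a continuous Hough space, the grid just keeps the sketch short):

```python
import numpy as np

# Hypothetical weighted votes (x, y, weight) for the object center,
# one per matched codebook occurrence.
votes = [
    (120.0, 80.0, 0.6),
    (122.0, 81.0, 0.5),
    (40.0, 200.0, 0.1),
]

H, W = 240, 320                  # assumed image size
accumulator = np.zeros((H, W))   # discrete voting space for the sketch

for x, y, w in votes:
    accumulator[int(round(y)), int(round(x))] += w

# Object hypothesis = maximum in the voting space; the contributing votes
# can then be back-projected from this location.
cy, cx = np.unravel_index(np.argmax(accumulator), accumulator.shape)
print("object center hypothesis:", (cx, cy))
```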

How does segmentation work for implicit shape models (ISM) for categorization 2? [2]

1. Determine the contribution of each pixel using the meta-information attached to the hypothesis

2. Perform segmentation

What is the definition of the scale-invariant votes? [3]

1. x_vote = x_img – x_occ*(s_img/s_occ)

2. y_vote = y_img – y_occ*(s_img/s_occ)

3. s_vote = (s_img/s_occ)
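
A small Python helper that evaluates exactly these three formulas; the example numbers are made up for illustration:

```python
def scale_invariant_vote(x_img, y_img, s_img, x_occ, y_occ, s_occ):
    """Vote for the object center cast by one feature occurrence (formulas above)."""
    scale = s_img / s_occ
    x_vote = x_img - x_occ * scale
    y_vote = y_img - y_occ * scale
    s_vote = scale
    return x_vote, y_vote, s_vote

# Example: feature found at (150, 90) with scale 2.0; stored occurrence offset (30, 10) at scale 1.0
print(scale_invariant_vote(150, 90, 2.0, 30, 10, 1.0))   # -> (90.0, 70.0, 2.0)
```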

What is the idea of the deformable part-based model for categorization 2?

Each component has global template plus deformable parts // Bike

What is the definition of the deformable part-based model for categorization 2? [3]

1. Use a HOG sliding-window detector

2. Part score is the dot product of the part filter and the HOG feature vectors in the window specified by p_i

3. Score of an object hypothesis is the sum of filter scores minus deformation costs: s(p_0,…,p_n) = Sum_i F_i·phi(H,p_i) − Sum_i d_i·(dx_i², dy_i²)
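
A minimal NumPy sketch of the hypothesis score s(p_0,…,p_n) above; the toy filter and feature vectors are random placeholders, and only the quadratic deformation terms from the card are used (the published DPM also includes linear dx, dy terms, omitted here to match the card):

```python
import numpy as np

def dpm_score(filters, features, deform_weights, displacements):
    """Score of one object hypothesis: sum of filter responses minus quadratic deformation costs.
    filters[i], features[i]: flattened HOG filter / window features for part i (part 0 = root).
    deform_weights[i]: (d_x, d_y) deformation weights, displacements[i]: (dx, dy) of part i."""
    appearance = sum(np.dot(F, phi) for F, phi in zip(filters, features))
    deformation = sum(d[0] * dx**2 + d[1] * dy**2
                      for d, (dx, dy) in zip(deform_weights, displacements))
    return appearance - deformation

# Toy example with one root and one part ("HOG" vectors are random, just to show the computation)
rng = np.random.default_rng(0)
filters  = [rng.normal(size=36), rng.normal(size=36)]
features = [rng.normal(size=36), rng.normal(size=36)]
print(dpm_score(filters, features,
                deform_weights=[(0.0, 0.0), (0.1, 0.1)],   # root has no deformation cost
                displacements=[(0, 0), (2, -1)]))
```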

What is used for image classification for categorization 2?

Bag-of-words model

How do traditional recognition approaches differ from deep learning? [4]

1. Traditional recognition approaches use hand-designed feature extraction

2. Open question there: build better classifiers or better features?

3. Deep learning instead learns features starting from the pixel layer and forwards them to a simple classifier

4. Inspired by biological neurons

What are the characteristics of perceptrons for deep learning? [3]

1. Multiple inputs x_1,… ,x_d

2. Multiple weights w_1,… ,w_d

3. Single output sigma(w*x+b) with sigma(t)=1/(1+exp(-t))
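
A direct Python translation of this perceptron; the weights, bias and input below are arbitrary example values:

```python
import math

def perceptron(x, w, b):
    """Single perceptron output sigma(w·x + b) with the logistic sigmoid from the card."""
    t = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-t))

print(perceptron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1))   # sigma(0.1) ~ 0.525
```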

What does the layer structure of multi-layer neural networks look like for deep learning?

Input, hidden and output layer

What is the definition of multi-layer neural networks for deep learning? [3]

1. Find weights minimizing the error between true labels t_n and estimated labels f(x_n; W), with E(W) = Sum_n L(t_n, f(x_n;W))

2. Minimization with gradient descent if f is differentiable

3. Training with error back-propagation
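
A compact NumPy sketch of this recipe: a one-hidden-layer network, squared loss, and plain gradient descent with error back-propagation (layer sizes, toy data and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                   # toy inputs x_n
t = (X[:, :1] + X[:, 1:] > 0).astype(float)   # toy targets t_n

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

lr = 0.5
for step in range(200):
    # forward pass
    h = sigma(X @ W1 + b1)
    y = sigma(h @ W2 + b2)
    # backward pass: error back-propagation for the squared loss
    dy = (y - t) * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    # gradient descent update of all weights
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(0)

print("final squared error:", float(((y - t) ** 2).sum()))
```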

What is the definition of the Hubel/Wiesel architecture (Nobel prize 1981) in deep learning?

Visual cortex consists of simple, complex and hyper-complex cells

What are the working steps of a convolutional neural network (CNN)? [6]

1. Input image

2. Convolution

3. Non-linearity

4. Spatial pooling

5. Normalization

6. Feature maps

Trained by back-propagating the classification error
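
A minimal NumPy sketch of the convolution and non-linearity steps (the "convolution" is implemented as cross-correlation, as is usual in CNNs, and the filter is just an example); pooling is sketched a few cards further down:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2D convolution (valid mode, no kernel flip), i.e. the CNN convolution step."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])        # example (Sobel-like) filter
feature_map = np.maximum(0, conv2d_valid(image, kernel))   # convolution + ReLU non-linearity
print(feature_map.shape)                                   # (6, 6)
```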

What are the three network connectivity options for a 1k² image with 1M hidden units for a convolutional neural network (CNN)? [3]

1. Fully connected network

2. Locally connected net

3. Convolutional net

What is the characteristic for the fully connected network for 1k² image with 1M hidden units for convolutional neural network (CNN)?

Requires 1T parameters

What is the characteristic for the locally connected network for 1k² image with 1M hidden units for convolutional neural network (CNN)?

With 10×10 receptive fields it requires 100M parameters

What are the characteristics for the convolutional network for 1k² image with 1M hidden units for convolutional neural network (CNN)? [3]

1. Shares parameters across different locations

2. With 100 filters of size 10×10 it requires 10k parameters

3. Result is a (memory) response map of size 1000x1000x100
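
The parameter counts in the three cards above follow from simple arithmetic (bias terms ignored):

```python
# Rough parameter counts for a 1000x1000 image with 10^6 hidden units
pixels       = 1000 * 1000
hidden_units = 10**6

fully_connected   = pixels * hidden_units   # every unit sees every pixel  -> 10^12 (~1T)
locally_connected = hidden_units * 10**2    # each unit sees a 10x10 field -> 10^8  (100M)
convolutional     = 100 * 10**2             # 100 shared 10x10 filters     -> 10^4  (10k)

print(fully_connected, locally_connected, convolutional)
```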

What does the example of an eye detector illustrate for the convolutional network for 1k² image with 1M hidden units for convolutional neural network (CNN)? [2]

1. How can detection be made robust to the exact location?

2. Use pooling (e.g. max or average) over the filter responses
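
A small NumPy sketch of (non-overlapping) max pooling over a response map; the pooling size and input are arbitrary examples:

```python
import numpy as np

def max_pool(response_map, size=2):
    """Keep the strongest filter response per cell, making detection robust to exact location."""
    H, W = response_map.shape
    H, W = H - H % size, W - W % size                        # crop to a multiple of the pool size
    blocks = response_map[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

responses = np.random.rand(6, 6)
print(max_pool(responses, size=2).shape)                     # (3, 3)
```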

What are the characteristics of layers for convolutional neural network (CNN)? [2]

1. A hidden neuron connects to a local spatial region but covers the full input depth

2. Multiple neurons looking at the same input region are stacked in depth

What are the characteristics of filters for convolutional neural network (CNN)? [2]

1. The output of one filter is a so-called depth slice or activation map

2. Stacked filter layers yield low-, mid- and high-level features before the classifier

Name three non-linearities g(a) for convolutional neural network (CNN)? [3]

1. Sigmoid with g(a) = sigma(a) = 1/(1+exp(-a))

2. Hyperbolic tangent with g(a) = tanh(a) = 2*sigma(2a) – 1

3. Rectified linear unit (ReLU) with g(a) = max{0,a}
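
The three non-linearities as plain NumPy functions, including a check of the tanh identity from the card:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tanh_via_sigmoid(a):
    # identity from the card: tanh(a) = 2*sigma(2a) - 1
    return 2.0 * sigmoid(2.0 * a) - 1.0

def relu(a):
    return np.maximum(0.0, a)

a = np.array([-2.0, 0.0, 2.0])
print(sigmoid(a), tanh_via_sigmoid(a), relu(a))
print(np.allclose(tanh_via_sigmoid(a), np.tanh(a)))   # True: the identity holds
```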

List possible CNN architectures? [5]

1. LeNet (1998)

2. AlexNet (2012)

3. VGGNet (2014/15)

4. GoogLeNet (2014)

5. Residual networks ResNet (2015)

What are the characteristics for LeNet a possible CNN architecture? [4]

1. Early convolution architecture

2. 2 convolutional and pooling layers

3. Fully connected NN layers for classification

4. Successfully used for handwritten digit recognition (MNIST)

What are the characteristics for AlexNet a possible CNN architecture? [8]

1. Similar to LeNet

2. 7 hidden layers, 650k units and 60M parameters

3. 11×11 filters with stride 4

4. More data: ~10⁶ training images instead of ~10³

5. GPU implementation

6. Better regularization and up-to-date training tricks such as dropout

7. Halved error rate at ILSVRC (16.4% vs 26.2%) // Revolution

8. Acquired by Google and deployed to Google+ in 2013

What are the characteristics for VGGNet a possible CNN architecture? [3]

1. Deeper network with stacked convolutional layers // 19 layers

2. 3×3 filters with stride 1, which require fewer parameters

3. Improved ILSVRC top-5 error to 6.7%

What are the characteristics for GoogLeNet a possible CNN architecture? [2]

1. Uses inception module // 22 layers

2. At ILSVRC similar to VGGNet

What are the characteristics for ResNet a possible CNN architecture? [3]

1. Uses skip connections

2. Better propagation to deeper layers // 152 layers

3. Improved ILSVRC top-5 error to 3.57%
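
A toy NumPy sketch of a skip connection, y = x + F(x); real residual blocks use convolutional layers (plus normalization), so the dense layers and sizes here are only illustrative:

```python
import numpy as np

def residual_block(x, W1, W2):
    """Minimal residual block: output = x + F(x). The skip connection lets the signal
    (and the gradient) propagate directly past the block."""
    h = np.maximum(0, x @ W1)      # inner layer + ReLU
    return x + h @ W2              # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
W1, W2 = rng.normal(size=(8, 8)) * 0.1, rng.normal(size=(8, 8)) * 0.1
print(residual_block(x, W1, W2).shape)   # (1, 8)
```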

What are possible applications for convolutional neural network (CNN)? [4]

1. Transfer of generically learned features (e.g. to Caltech-256)

2. Detection

3. Semantic segmentation

4. Face verification

How does detection with convolutional neural network (CNN) work? [4]

1. Extract region proposals

2. Compute CNN features

3. Classify regions

4. Improved accuracy from ~35% to ~50%

How did the accuracy of object detection increase with convolutional neural networks (CNN)?

From ~40% before to ~75% with deep CNNs // Before: sliding-window approaches

Name three object detectors based on convolutional neural network (CNN)? [3]

1. R-CNN, 2. Fast R-CNN and 3. Faster R-CNN // R stands for regions

What are the steps of R-CNN? [3]

1. Extract ~2k region proposals from input image // Selective search

2. Compute CNN features out of warped region with pre-trained/fine-tuned network // AlexNet, VGGNet

3. Classify regions

How are regions classified for R-CNN? [2]

1. Linear SVMs and 2. bounding box regressors (Bbox reg)

How does the linear SVM in R-CNN work? [2]

1. f_c(x_fc7) = w_c^T · x_fc7

2. With x_fc7 the features from fully-connected layer 7 and c the object class
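
A toy NumPy version of this scoring step; the feature dimension and number of classes below are made up (in R-CNN, x_fc7 is 4096-dimensional):

```python
import numpy as np

rng = np.random.default_rng(0)
x_fc7 = rng.normal(size=16)          # fc7 features of one warped region (toy size)
W = rng.normal(size=(3, 16))         # one linear-SVM weight vector w_c per object class c

scores = W @ x_fc7                   # f_c(x_fc7) = w_c^T * x_fc7 for every class c
print("best class:", int(np.argmax(scores)))
```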

How does the bounding box regression (Bbox reg) of R-CNN work? [2]

1. Predict a corrected 2D box because the proposal region may be inaccurate

2. Learned weights regress the new x*, y*, w* and h*
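
A small Python sketch of applying regressed offsets to a proposal box; the card does not spell out the parameterization, so the center/log-scale form of the R-CNN paper is assumed here, and the numbers are examples:

```python
import math

def apply_bbox_regression(px, py, pw, ph, dx, dy, dw, dh):
    """Correct a proposal box (center px,py, size pw,ph) with regressed offsets dx,dy,dw,dh.
    Parameterization assumed from the R-CNN paper; the flashcard leaves it unspecified."""
    x_star = pw * dx + px
    y_star = ph * dy + py
    w_star = pw * math.exp(dw)
    h_star = ph * math.exp(dh)
    return x_star, y_star, w_star, h_star

# Example: shift a 100x80 proposal slightly to the right and widen it by ~10%
print(apply_bbox_regression(50, 60, 100, 80, dx=0.05, dy=0.0, dw=0.1, dh=0.0))
```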