Paper Overview

CVPR'24

Abstract

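This paper addresses open-world semantic segmentation.
The authors propose an approach that performs accurate closed-world semantic segmentation while also identifying new categories.
The approach additionally provides a similarity measure between each class newly discovered in an image and the known categories.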

Keywords

Open-World Semantic Segmentation

Introduction

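Open-world refers to real-world conditions, in which categories outside the training data may appear.
Closed-world, by contrast, assumes that only the categories present in the training data exist in the real world, as in today's standard supervised setting.
As a result, a model trained under the closed-world setting makes overconfident predictions for known classes and assigns every input to one of the known classes, whatever the input is.
This is one of several reasons why vision systems receive less attention than NLP systems.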

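To address this problem, the paper tackles open-world semantic segmentation.
Given an image at test time, the model aims to detect the pixels that belong to unseen categories and also to separate these unknown categories from one another.
The former is usually called anomaly segmentation and the latter novel class discovery.
The authors aim to solve both with a single network.
Note that in novel class discovery the training data includes unlabeled data from unknown classes.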

Our Approach

The first decoder handles semantic segmentation and operates in feature space, so that features of the same class become similar to one another.

The second decoder performs binary anomaly segmentation.

The two results are merged to perform open-world semantic segmentation.

2. Approach for Open-World Segmentation

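The first decoder is for semantic segmentation and manipulates the feature space to obtain a unique descriptor for each known class.
(The descriptor is also called a prototype; think of it as a representative value of the class.)
The authors' goal is not only to produce correct semantic segmentation for known classes but also to make each pre-softmax feature of a class resemble that class's descriptor.
Here, a pre-softmax feature is the feature at the very end of the network, i.e. the output before it is passed to the softmax activation.
The authors then detect unknown classes from the difference between these descriptors and the features.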

The second decoder is called the contrastive decoder and uses a contrastive loss together with an objectosphere loss.
This places the features of known classes on the surface of a hypersphere and pushes the features of unknown classes toward its center.
The second decoder thus provides anomaly segmentation.

Finally, the two results are merged by an automated post-processing operation.

$\boldsymbol{\Omega} = \left\{ (1,1), \dots, (H, W)\right\}$ denotes the pixels of an image, and $Y \in \left\{1, \dots, K\right\}^{H \times W}$ denotes the label map.

$\hat{Y} \in \left\{1, \dots, K\right\}^{H \times W}$ denotes the prediction.

$\boldsymbol{\Omega}_{k} = \left\{p \in \boldsymbol{\Omega} \mid Y_{p} = k \right\}$ is the set of pixels of class $k$, and $\hat{\boldsymbol{\Omega}}_{k} = \left\{p \in \boldsymbol{\Omega}_{k} \mid \hat{Y}_{p} = k \right\}$ is the set of true-positive pixels of class $k$.
The authors also introduce a notation for the element-wise square of a vector.

Semantic Decoder

Semantic segmentation is trained with a weighted cross-entropy loss.

$w$ is the class weight, $\textbf{t}$ the one-hot encoded label vector, $\sigma$ the softmax function, and $\textbf{f}$ the pre-softmax feature.

The weight is the inverse of the class frequency in the dataset.
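The weighted cross-entropy described above can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function names, the 0-based class ids (the text uses $1, \dots, K$), and the choice to give absent classes a weight of 0 are all assumptions made for the example.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Class weight = 1 / relative frequency of that class in `labels`."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    # Assumption: classes that never occur get weight 0 to avoid dividing by zero.
    return np.divide(1.0, freq, out=np.zeros_like(freq), where=freq > 0)

def weighted_cross_entropy(features, labels, weights):
    """features: (N, K) pre-softmax features, labels: (N,) class ids in [0, K)."""
    shifted = features - features.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Picking the log-probability of the true class equals t^T log(softmax(f)).
    per_pixel = -weights[labels] * log_softmax[np.arange(len(labels)), labels]
    return float(per_pixel.mean())

labels = np.array([0, 0, 0, 1])
weights = inverse_frequency_weights(labels, num_classes=2)
loss = weighted_cross_entropy(np.zeros((4, 2)), labels, weights)
```

With inverse-frequency weights, the single pixel of the rare class contributes as much to the loss as the three pixels of the frequent class together.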

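Besides this standard semantic segmentation training, the authors also obtain class descriptors.

To this end, they compute the mean of the pre-softmax features over $\hat{\boldsymbol{\Omega}}_{k}$, the true-positive pixels of class $k$.

After the mean, the variance is computed as well.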
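A minimal sketch of the descriptor computation, assuming features have been flattened to shape `(N, D)` and that the variance is taken per dimension with an element-wise square. The function name and the convention of leaving zeros for classes without true positives are hypothetical choices for this example.

```python
import numpy as np

def class_descriptors(features, labels, preds, num_classes):
    """features: (N, D); labels, preds: (N,). Returns means and variances, both (K, D)."""
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    variances = np.zeros((num_classes, dim))
    for k in range(num_classes):
        true_pos = (labels == k) & (preds == k)  # pixels of class k predicted as k
        if not true_pos.any():
            continue  # assumption: keep zeros when a class has no true positives
        f_k = features[true_pos]
        means[k] = f_k.mean(axis=0)
        variances[k] = ((f_k - means[k]) ** 2).mean(axis=0)  # element-wise square
    return means, variances

features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]])
labels = np.array([0, 0, 1, 1])
preds = np.array([0, 0, 1, 0])  # the last pixel is misclassified and is ignored
means, variances = class_descriptors(features, labels, preds, num_classes=2)
```

Restricting the statistics to true positives keeps misclassified pixels, such as the last one here, from pulling a descriptor away from its class.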

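At the start of epoch $e$, the mean and variance computed in the previous epoch are available.

During epoch $e$, the semantic decoder is steered so that its features match the values from the previous epoch.

This term is called the feature loss.

It is not applied in the first epoch, since no statistics from a previous epoch exist yet.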
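The exact formula of the feature loss is not given in the text here, so the sketch below only illustrates one plausible form: the mean squared distance between each feature and the previous-epoch mean of its ground-truth class. Both that form and the name `feature_loss` are assumptions.

```python
import numpy as np

def feature_loss(features, labels, prev_means):
    """Mean squared distance of each feature to last epoch's mean of its class."""
    if prev_means is None:  # first epoch: no descriptors yet, so the term is skipped
        return 0.0
    diff = features - prev_means[labels]  # (N, D) offsets from the class mean
    return float((diff ** 2).sum(axis=1).mean())

features = np.array([[1.0, 0.0], [0.0, 0.0]])
labels = np.array([0, 1])
prev_means = np.array([[0.0, 0.0], [0.0, 2.0]])
first_epoch = feature_loss(features, labels, None)
later_epoch = feature_loss(features, labels, prev_means)
```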

The cross-entropy loss and the feature loss are combined into the semantic decoder loss.

(With the feature loss, the training of semantic segmentation itself may progress more slowly, but the pre-softmax features can be expected to gather around the class descriptors.)

Contrastive Decoder

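The contrastive decoder provides the basis for a binary prediction: 0 for known, 1 for unknown.

As mentioned above, it is trained with a contrastive loss and an objectosphere loss.

First, the mean feature over $\boldsymbol{\Omega}_{k}$ is obtained.

Then the contrastive loss is computed.

At the same time, the objectosphere loss is computed.

$\mathcal{D}_{k}$ is the set of pixels that belong to known classes.

Here $\xi$ is the radius of the sphere.

Unlabeled data is therefore driven to have no radius, that is, its features are pushed to the zero vector.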
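A rough sketch of an objectosphere-style loss, following the form in which that loss is commonly stated: features of known pixels are penalized when their norm falls below $\xi$, and features of unknown pixels are penalized for any non-zero norm. The formulation used in the paper may differ in detail; the function name and the boolean mask `is_known` are assumptions for this example.

```python
import numpy as np

def objectosphere_loss(features, is_known, xi):
    """features: (N, D); is_known: (N,) bool; xi: target radius for known pixels."""
    norms = np.linalg.norm(features, axis=1)
    known_term = np.maximum(xi - norms, 0.0) ** 2  # known: penalize norms below xi
    unknown_term = norms ** 2                      # unknown: penalize any norm
    return float(np.where(is_known, known_term, unknown_term).mean())

features = np.array([[3.0, 4.0], [0.0, 1.0], [0.0, 2.0]])
is_known = np.array([True, True, False])
loss = objectosphere_loss(features, is_known, xi=2.0)
```

In this variant a known feature with norm above $\xi$ incurs no penalty, so this term alone only keeps known features away from the origin.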

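The final contrastive decoder loss combines the contrastive loss and the objectosphere loss.

Training with this loss produces the result shown in the accompanying figure.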

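Post-Processing for Anomaly Segmentation

With the semantic segmentation decoder, the authors can compute a mean and a variance for each class.

From these, a multivariate normal distribution is easily obtained.

After building a Gaussian model for each class over the dataset, a fitting score can be computed for each feature $\textbf{f}_{p}$ with an exponential kernel.

Put simply, it can be thought of as the exponential of the negated Mahalanobis distance, so the score shrinks as a feature moves away from the class mean.

(How far is the feature from the class descriptor?)

This score is computed for every class, and the maximum is taken for each pixel.

A small value means the feature lies in the tail of every class Gaussian, so the pixel is regarded as belonging to a novel class.

From this the authors obtain a per-pixel unknown score $s_{unk, p}^{sem}$.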
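The scoring step can be sketched as follows, assuming a diagonal covariance built from the per-class variances and a Gaussian-style kernel $\exp(-\tfrac{1}{2} d^2)$ on the squared Mahalanobis distance $d^2$. Turning the maximum score into an unknown score as `1 - max` is an assumption of this sketch, as are the function names and the `eps` stabilizer.

```python
import numpy as np

def fitting_scores(features, means, variances, eps=1e-6):
    """Return (N, K) scores exp(-0.5 * squared Mahalanobis distance)."""
    diff = features[:, None, :] - means[None, :, :]  # (N, K, D)
    sq_maha = (diff ** 2 / (variances[None, :, :] + eps)).sum(axis=2)
    return np.exp(-0.5 * sq_maha)

def semantic_unknown_score(features, means, variances):
    """Per-pixel unknown score and the index of the best-fitting known class."""
    scores = fitting_scores(features, means, variances)
    best = scores.max(axis=1)  # fit to the closest class Gaussian
    return 1.0 - best, scores.argmax(axis=1)

means = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
features = np.array([[0.0, 0.0], [10.0, 10.0]])
s_unk, nearest = semantic_unknown_score(features, means, variances)
```

The first feature sits exactly on a class mean and gets an unknown score of 0, while the second is far from both Gaussians and gets a score close to 1. The `argmax` also tells which known class fits best.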

The contrastive decoder yields such an unknown score as well.

Finally, the two scores are averaged.

If this averaged value is larger than $\delta$, the pixel is assigned to the unknown class.
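The fusion of the two decoders' scores is simple enough to write in a few lines. The sketch assumes both unknown scores are already on a comparable scale such as $[0, 1]$; the function name is hypothetical.

```python
import numpy as np

def anomaly_mask(s_unk_sem, s_unk_con, delta):
    """Average the two per-pixel unknown scores and threshold them at delta."""
    fused = 0.5 * (np.asarray(s_unk_sem) + np.asarray(s_unk_con))
    return fused > delta  # True marks a pixel as unknown

mask = anomaly_mask([0.9, 0.2, 0.6], [0.7, 0.1, 0.2], delta=0.5)
```

Only the first pixel, whose averaged score is 0.8, ends up above the threshold.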

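Post-Processing for Open-World Semantic Segmentation

When a pixel is considered unknown, its feature vector is stored, and one must decide whether it belongs to an already discovered class or to an entirely new one.

The mean feature of each discovered class is stored, and a feature whose distance to these means is larger than $\eta$ is considered a new class.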
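One way to picture this bookkeeping is a small bank of running means. The sketch below assumes Euclidean distance, compares against the closest stored mean, and updates that mean incrementally when a feature is assigned to an existing class. None of these details are confirmed by the text, and `NovelClassBank` is a hypothetical name.

```python
import numpy as np

class NovelClassBank:
    """Stores one mean feature per discovered novel class."""

    def __init__(self, eta):
        self.eta = eta
        self.means = []   # mean feature of each discovered class
        self.counts = []  # number of features averaged into each mean

    def assign(self, feature):
        """Return the novel-class id for `feature`, creating a class if needed."""
        feature = np.asarray(feature, dtype=float)
        if self.means:
            dists = [np.linalg.norm(feature - m) for m in self.means]
            j = int(np.argmin(dists))
            if dists[j] <= self.eta:  # close enough: already discovered class
                self.counts[j] += 1
                self.means[j] += (feature - self.means[j]) / self.counts[j]
                return j
        # Farther than eta from every stored mean (or bank empty): new class.
        self.means.append(feature.copy())
        self.counts.append(1)
        return len(self.means) - 1

bank = NovelClassBank(eta=1.0)
ids = [bank.assign(f) for f in ([0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.2, 5.0])]
```

The first two features fall within $\eta$ of each other and share one class, while the last two open and then join a second class.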

3. Class Similarity

Even without making an open-world prediction, the most similar known class can be obtained directly as the class $k$ in Eq. (12).

Experiments

The metrics are the area under the precision-recall curve (AUPR), the false positive rate at a true positive rate of 95% (FPR95), and the mean intersection-over-union (mIoU).
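Of these metrics, FPR95 is the least self-explanatory, so here is a small sketch of how it can be computed from per-pixel anomaly scores. It assumes higher scores mean "more anomalous" and picks the highest threshold that still catches at least 95% of the anomalous pixels. The function name is hypothetical.

```python
import numpy as np

def fpr_at_tpr(scores, is_anomaly, tpr_target=0.95):
    """False positive rate at the threshold where TPR first reaches tpr_target."""
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    pos = np.sort(scores[is_anomaly])[::-1]       # anomaly scores, descending
    needed = int(np.ceil(tpr_target * len(pos)))  # anomalies that must be caught
    threshold = pos[needed - 1]                   # highest threshold reaching the target
    return float((scores[~is_anomaly] >= threshold).mean())

scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.1, 0.2, 0.3]
is_anomaly = [True, True, True, True, False, False, False, False]
fpr95 = fpr_at_tpr(scores, is_anomaly)
```

With four anomalous pixels, all of them must be caught to reach 95%, so the threshold drops to 0.6 and one of the four normal pixels (score 0.65) becomes a false positive.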

Anomaly Segmentation

The authors report a table showing that their open-world approach does not substantially affect closed-world performance.

Open-World Semantic Segmentation

Experiments on Class Similarity

Ablation Studies

Anomaly Segmentation

Class Similarity

Conclusions

In this paper, we presented a novel approach for open-world semantic segmentation on RGB images based on a double decoder architecture. Our method manipulates the feature space of the semantic segmentation for identifying novel classes and additionally indicates the known categories that are most similar to the newly discovered ones. We implemented and evaluated our approach on different datasets and provided comparisons with other existing models and supported all claims made in this paper. The experiments suggest that our double-decoder strategy achieves compelling open-world segmentation results. In fact, with our approach, we are able to detect all anomalous regions in an image and distinguish between different novel classes.
