
[Paper Review] Decoupled Knowledge Distillation

成學 2024. 11. 25. 08:49

This is a review of

"Decoupled Knowledge Distillation"
presented at CVPR 2022.

Introduction

Types of Knowledge Distillation

Logits-based method

  • $(+)$ Computational and storage cost ↓
  • $(-)$ Unsatisfactory performance

Feature-based method

  • $(+)$ Superior performance
  • $(-)$ Extra computational cost and storage usage

∴ The potential of logit distillation appears limited, which motivates revisiting how the vanilla KD loss actually works.

 

Decoupled Knowledge Distillation

Target class knowledge distillation $($TCKD$)$

  • Binary logit distillation between the target class and all non-target classes

Non-target class knowledge distillation $($NCKD$)$

  • Distillation of the knowledge among the non-target logits


Method

Reformulation

For a training sample from the $t$-th class, with logits $z_i$ over $C$ classes, the classification probabilities are

$$
p_i=\frac{\exp \left(z_i\right)}{\sum_{j=1}^C \exp \left(z_j\right)},\boldsymbol{p}=\left[p_1, p_2, \ldots, p_t, \ldots, p_C\right] \in \mathbb{R}^{1 \times C}
$$

Binary probabilities

$$
p_t=\frac{\exp \left(z_t\right)}{\sum_{j=1}^C \exp \left(z_j\right)},p_{\backslash t}=\frac{\sum_{k=1, k \neq t}^C \exp \left(z_k\right)}{\sum_{j=1}^C \exp \left(z_j\right)},\\ \quad \boldsymbol{b}=\left[p_t, p_{\backslash t}\right] \in \mathbb{R}^{1 \times 2}
$$

Probabilities among non-target classes

$$
\hat{p}_i=\frac{\exp \left(z_i\right)}{\sum_{j=1, j \neq t}^C \exp \left(z_j\right)},\quad \hat{\boldsymbol{p}}=\left[\hat{p}_1, \ldots, \hat{p}_{t-1}, \hat{p}_{t+1}, \ldots, \hat{p}_C\right] \in \mathbb{R}^{1 \times(C-1)}
$$
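
To make the notation concrete, the following small PyTorch sketch (my own, not from the paper's code) computes $\boldsymbol{p}$, $\boldsymbol{b}$, and $\hat{\boldsymbol{p}}$ for a single sample; the function name and the example logits are hypothetical.

```python
import torch
import torch.nn.functional as F

def split_probabilities(logits: torch.Tensor, target: int):
    """Compute p (softmax over all classes), b = [p_t, 1 - p_t],
    and p_hat (softmax restricted to the non-target classes)."""
    p = F.softmax(logits, dim=-1)                    # p_i = exp(z_i) / sum_j exp(z_j)
    b = torch.stack([p[target], 1.0 - p[target]])    # binary probabilities
    mask = torch.arange(logits.numel()) != target
    p_hat = F.softmax(logits[mask], dim=-1)          # re-normalized over the C - 1 non-target classes
    return p, b, p_hat

# Example with C = 5 classes and target class t = 2 (hypothetical values).
logits = torch.tensor([1.0, 0.5, 3.0, -1.0, 0.2])
p, b, p_hat = split_probabilities(logits, target=2)
print(b.sum().item(), p_hat.sum().item())            # both sum to 1 (up to float precision)
```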

Vanilla KD

$$
\begin{aligned}
\mathrm{KD} & =\mathrm{KL}\left(\mathbf{p}^{\mathcal{T}} \| \mathbf{p}^{\mathcal{S}}\right) \\
& =p_t^{\mathcal{T}} \log \left(\frac{p_t^{\mathcal{T}}}{p_t^{\mathcal{S}}}\right)+\sum_{i=1, i \neq t}^C p_i^{\mathcal{T}} \log \left(\frac{p_i^{\mathcal{T}}}{p_i^{\mathcal{S}}}\right)
\end{aligned}
$$

Substituting $p_i=\hat{p}_i \, p_{\backslash t}$ (for $i \neq t$) and using $\sum_{i=1, i \neq t}^C \hat{p}_i^{\mathcal{T}}=1$, the KD loss becomes

$$
\begin{aligned}
\mathrm{KD} &= p_t^{\mathcal{T}} \log \left( \frac{p_t^{\mathcal{T}}}{p_t^{\mathcal{S}}} \right) + p_ {\backslash t} ^{\mathcal{T}} \sum_{i=1, i \neq t}^{C} \hat{p}_i^{\mathcal{T}} \left( \log \left( \frac{\hat{p}_i^{\mathcal{T}}}{\hat{p}_i^{\mathcal{S}}} \right) + \log \left( \frac{p_ {\backslash t} ^{\mathcal{T}}}{p_ {\backslash t} ^{\mathcal{S}}} \right) \right) \\ &= p_t^{\mathcal{T}} \log \left( \frac{p_t^{\mathcal{T}}}{p_t^{\mathcal{S}}} \right) + p_ {\backslash t} ^{\mathcal{T}} \log \left( \frac{p_ {\backslash t} ^{\mathcal{T}}}{p_ {\backslash t} ^{\mathcal{S}}} \right) + p_ {\backslash t} ^{\mathcal{T}} \sum_{i=1, i \neq t}^{C} \hat{p}_i^{\mathcal{T}} \log \left( \frac{\hat{p}_i^{\mathcal{T}}}{\hat{p}_i^{\mathcal{S}}} \right).
\end{aligned}
$$

The first two terms form $\mathrm{KL}\left(\mathbf{b}^{\mathcal{T}} \| \mathbf{b}^{\mathcal{S}}\right)$, the last term is a weighted $\mathrm{KL}\left(\hat{\mathbf{p}}^{\mathcal{T}} \| \hat{\mathbf{p}}^{\mathcal{S}}\right)$, and $p_{\backslash t}^{\mathcal{T}}=1-p_t^{\mathcal{T}}$, so

$$\mathrm{KD}=\mathrm{KL}\left(\mathbf{b}^\mathcal{T} \| \mathbf{b}^\mathcal{S}\right) + \left(1-p_t^\mathcal{T}\right) \mathrm{KL}\left(\hat{\mathbf{p}}^\mathcal{T} \| \hat{\mathbf{p}} ^\mathcal{S}\right) $$

$$ \therefore \mathrm{KD}=\mathrm{TCKD}+\left(1-p_t^\mathcal{T}\right) \mathrm{NCKD}$$
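
This identity is easy to verify numerically. The sketch below (again my own, with hypothetical single-sample teacher and student logits) computes vanilla KD, TCKD, and NCKD separately and checks that they satisfy the relation above.

```python
import torch
import torch.nn.functional as F

def kl(p, q):
    """KL(p || q) for two discrete distributions given as 1-D tensors."""
    return (p * (p / q).log()).sum()

def decompose_kd(z_teacher, z_student, target):
    """Return (vanilla KD, TCKD, NCKD) for a single sample's logits."""
    pT, pS = F.softmax(z_teacher, dim=-1), F.softmax(z_student, dim=-1)
    kd = kl(pT, pS)                                      # KL(p^T || p^S)

    bT = torch.stack([pT[target], 1 - pT[target]])       # teacher binary probabilities
    bS = torch.stack([pS[target], 1 - pS[target]])       # student binary probabilities
    tckd = kl(bT, bS)

    mask = torch.arange(z_teacher.numel()) != target
    nckd = kl(F.softmax(z_teacher[mask], dim=-1),        # non-target distributions
              F.softmax(z_student[mask], dim=-1))
    return kd, tckd, nckd

# Hypothetical logits for C = 4 classes, target class t = 0.
z_teacher = torch.tensor([2.0, 0.1, -0.5, 0.3])
z_student = torch.tensor([1.2, 0.4, 0.0, -0.2])
kd, tckd, nckd = decompose_kd(z_teacher, z_student, target=0)
p_t_teacher = F.softmax(z_teacher, dim=-1)[0]
print(torch.isclose(kd, tckd + (1 - p_t_teacher) * nckd))   # expected: tensor(True)
```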

  • While NCKD focuses on the knowledge among non-target classes, TCKD focuses on the knowledge related to the target class.

 

Effects of TCKD and NCKD

  • Solely applying TCKD is unhelpful or even harmful.
  • The performance of NCKD alone is comparable to, or even better than, that of vanilla KD.

∴ Target-class-related knowledge may not be as important as the knowledge among non-target classes.

∴ TCKD transfers knowledge about the difficulty of training samples: the more difficult the training data is, the more benefit TCKD provides.

 

Decoupled Knowledge Distillation

$$ \mathrm{KD}=\mathrm{TCKD}+\left(1-p_t^\mathcal{T}\right) \mathrm{NCKD}$$

  • The NCKD loss is weighted by $\left(1-p_t^\mathcal{T}\right)$: the more confident the teacher is on a sample, the smaller the NCKD weight, so the knowledge from well-predicted samples is highly suppressed.
  • The weights of TCKD and NCKD are coupled, so their contributions cannot be balanced separately.

$$ \therefore \mathrm{DKD}=\alpha \mathrm{TCKD}+\beta \mathrm{NCKD}$$
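
Written as code, DKD is just the two KL terms above with independent weights. The following is a minimal batched PyTorch sketch based on the formula (not taken from the authors' released implementation); the function name, argument names, and the default `alpha`, `beta`, and temperature `T` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dkd_loss(logits_student, logits_teacher, target, alpha=1.0, beta=8.0, T=4.0):
    """Sketch of DKD = alpha * TCKD + beta * NCKD for a batch.

    logits_student, logits_teacher: (N, C) logits; target: (N,) class indices.
    """
    N, C = logits_student.shape
    gt = F.one_hot(target, C)                            # (N, C), 1 at the target class

    # TCKD: KL divergence between the binary [p_t, 1 - p_t] distributions, softened by T.
    prob_s = F.softmax(logits_student / T, dim=1)
    prob_t = F.softmax(logits_teacher / T, dim=1)
    pt_s = (prob_s * gt).sum(dim=1)                      # student target-class probability
    pt_t = (prob_t * gt).sum(dim=1)                      # teacher target-class probability
    b_s = torch.stack([pt_s, 1.0 - pt_s], dim=1)         # (N, 2)
    b_t = torch.stack([pt_t, 1.0 - pt_t], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean") * T ** 2

    # NCKD: KL divergence between distributions re-normalized over non-target classes.
    # Pushing the target logit far down removes it from the softmax.
    log_phat_s = F.log_softmax(logits_student / T - 1000.0 * gt, dim=1)
    phat_t = F.softmax(logits_teacher / T - 1000.0 * gt, dim=1)
    nckd = F.kl_div(log_phat_s, phat_t, reduction="batchmean") * T ** 2

    return alpha * tckd + beta * nckd
```

With $\alpha$ and $\beta$ exposed as independent hyperparameters, NCKD is no longer suppressed by $\left(1-p_t^\mathcal{T}\right)$ on samples the teacher predicts confidently, which is exactly the decoupling the paper argues for.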


Experiments

Ablation: $\alpha$ and $\beta$

 

CIFAR-100

 

ImageNet

 

COCO


Conclusions

  • Reformulation of the vanilla KD loss into two parts: TCKD and NCKD
  • Decoupled Knowledge Distillation overcomes the limitation of the coupled formulation, enabling more effective knowledge transfer.
  • Significant improvements on various datasets