[7주차] Troubleshooting

Fullstack Deeplearning 강의를 듣고 정리한 내용입니다.

https://fullstackdeeplearning.com/spring2021/lecture-7/

딥러닝 모델 문제 해결이 어려운 이유

버그가 존재하는지 알아차리기 어렵다.
Data, Model, Infrastructure 등 모델 성능을 떨어뜨리는 원인이 다양함
hyperparameters and dataset makeup에 성능이 민감함

모델 성능이 낮은 이유

Implementation bugs : Most DL bugs are invisible
Models are sensitive to hyperparameters (learning rate varies a lot by hyperparameters)
Data-Model fit
Data construction issues : Constructing good datasets is hard
-Not enough data
-Class imbalances
-Noisy labels
-Train/Test from different distributions

Neural Network troubleshooting 방법

다음의 과정을 고려한다.

1. 단순하게 시작하기

심플한 모델 구조로 시작한다.
Image의 경우, Start with LeNet-like architecture & Consider using ResNet later
Sequence 데이터의 경우, Transformer model or WaveNet-like model
멀티모달 데이터의 경우, Map each into a lower dimension space - Concatenate - Pass through FC layers
시작할 때 default 값들
Adam optimizer with learning rate 3e-4
Activations : Relu for FC and conv models, tanh for LSTMs
Regularization & Data normalization Start with none (버그의 가장 흔한 원인 중 하나라고 한다!)
Normalize scale of input data
Images Scale values to [0,1] or [-0.5, 0.5] (e.g. By dividing by 255)
작은 스케일로 시작한다.
Start with small training set : Create a simpler synthetic training set

2. Implement & Debug

가장 자주 발생하는 5가지 버그

모델 돌려보기

debugger 한줄씩 살펴보기 & shape 에러, casting 에러, OOM 에러

Start with lightweight implementation
Use off-the-shelf components (e.g. keras tf.layers.dense instead of tf.nn.relu(tf.matmul(W,x)))

Shape mismatch & Casting issue (Step through model creation & inference in a debugger)
Tools
Pytorch ipdb
tensorflow trickier tfdb
Out Of Memory (Scale back one-by-one)

Overfit a single batch

Look for corrupted data, over-regularization, errors

Error goes up
Error explodes
Error oscillates
Error plateaus

알려진 결과와 비교하기

Iterate until model performs up to expectations

3. 모델 평가

distribution shift

Use 2 validation sets : one sampled from training distribution & one from test distribution
Train-val error & Test-val error Proxy for how much distribution shift is hurting the model performance

Test error 는 irreducible error + bias + variance + distribution shift + validation overfitting로 볼 수 있다

4. 모델, 데이터 개선

Underfitting 해결하기

Make the model bigger (add layers / more units)
Reduce regularization
Choose different architecture (closer to SOTA model)
Tune learning rate & hyperparameters

Overfitting 해결하기

Address distribution shift

Analyze test-val set errors & collect or synthesize more training data to compensate
Apply domain adaptation techniques to train & test distributions

e.g.

Domain adaptation

Techniques to train on source distribution and generalize to another target using only unlabeled data ot limited labeled data.

Correlation Alignment
Domain confusion
cycleGAN

Re-balance datasets

If val looks significantly better than test, model overfits to the val set.
Due to small val sets / Lots of hyperparameter tuning
Recollect validation data or resample

저작자표시 비영리 (새창열림)

'3-2기 스터디 > MLOps' 카테고리의 다른 글

[8주차] Data Management (0)	2022.06.29
[5주차] ML Projects (0)	2022.05.31
[4주차] Transformers (0)	2022.05.17
[3주차] RNNs (0)	2022.05.10
[2주차] CNNs (0)	2022.05.03

[7주차] Troubleshooting

딥러닝 모델 문제 해결이 어려운 이유

Neural Network troubleshooting 방법

1. 단순하게 시작하기

2. Implement & Debug

가장 자주 발생하는 5가지 버그

모델 돌려보기

Overfit a single batch

알려진 결과와 비교하기

3. 모델 평가

distribution shift

4. 모델, 데이터 개선

Underfitting 해결하기

Overfitting 해결하기

Address distribution shift

Domain adaptation

Re-balance datasets

'3-2기 스터디 > MLOps' 카테고리의 다른 글

댓글

티스토리툴바

[7주차] Troubleshooting

딥러닝 모델 문제 해결이 어려운 이유

Neural Network troubleshooting 방법

1. 단순하게 시작하기

2. Implement & Debug

가장 자주 발생하는 5가지 버그

모델 돌려보기

Overfit a single batch

알려진 결과와 비교하기

3. 모델 평가

distribution shift

4. 모델, 데이터 개선

Underfitting 해결하기

Overfitting 해결하기

Address distribution shift

Domain adaptation

Re-balance datasets

'3-2기 스터디 > MLOps' 카테고리의 다른 글

관련글

댓글

티스토리툴바