1. Transfer Learning in Computer Vision
- Training a new classifier on only 10,000 images → overfitting can occur
- → fine-tuning
- → ResNet-50 should give good performance
- What is transfer learning?
- A large model trained on a big dataset (= pretrained model)
- Take the pretrained model, then add or replace layers and train on the new task
- → can train faster and more accurately with less data (see the fine-tuning sketch below)
- Model zoo
- pretrained models
- available for both TensorFlow and PyTorch
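A minimal fine-tuning sketch in PyTorch, assuming torchvision's model zoo: the ImageNet-pretrained ResNet-50 backbone is frozen and only a newly added classification head is trained. `NUM_CLASSES` is a hypothetical value standing in for the new dataset's label count.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical number of classes in the new dataset

# Pretrained model from the torchvision model zoo
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all pretrained backbone weights
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully-connected layer; the new layer's weights are
# freshly initialized and therefore trainable
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the new head's parameters are given to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```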
2. Embeddings and Language Models
- In NLP the raw input is words, but deep learning operates on vectors
- How do we turn words into vectors?
- One-hot encoding
- Problem) it works, but it does not scale with vocabulary size → violates what we know about word similarity (see the sketch below)
- → neural networks do not work well on very high-dimensional sparse vectors
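A tiny illustration of both problems, using a toy vocabulary I made up for this example: the vector dimension grows with the vocabulary, and every pair of distinct words is orthogonal, so one-hot vectors carry no notion of word similarity.

```python
import numpy as np

vocab = ["cat", "dog", "car"]  # toy vocabulary for illustration

def one_hot(word):
    v = np.zeros(len(vocab))   # dimension == vocabulary size
    v[vocab.index(word)] = 1.0
    return v

# "cat" vs "dog" and "cat" vs "car" look equally (dis)similar:
# every dot product between distinct words is 0
print(one_hot("cat") @ one_hot("dog"))  # 0.0
print(one_hot("cat") @ one_hot("car"))  # 0.0
```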
- Dense vectors
- Problem) how do we find the values of the embedding matrix?
- Learn as part of the task
- Learn a language model → Skip-grams (look on both sides of the target word)
- → N-grams
- How do we speed up training? → Word2Vec
- → binary classification instead of multi-class
- → embedding matrix (see the sketch below)
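A rough Word2Vec-style sketch in PyTorch: skip-gram with negative sampling. Instead of a multi-class softmax over the whole vocabulary, each (center, context) pair gets a binary label: 1 for an observed pair, 0 for a randomly sampled "negative" word. All sizes and the toy batch below are placeholders, not real data.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM = 1000, 64  # placeholder sizes

class SkipGramNS(nn.Module):
    def __init__(self):
        super().__init__()
        self.center = nn.Embedding(VOCAB_SIZE, EMBED_DIM)   # the embedding matrix
        self.context = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

    def forward(self, center_ids, context_ids):
        # Dot product scores one (center, context) pair per row
        c = self.center(center_ids)
        o = self.context(context_ids)
        return (c * o).sum(dim=1)

model = SkipGramNS()
loss_fn = nn.BCEWithLogitsLoss()  # binary, not multi-class

# Toy batch: first pair is a real co-occurrence, second is a negative sample
center = torch.tensor([3, 3])
context = torch.tensor([17, 542])
labels = torch.tensor([1.0, 0.0])

loss = loss_fn(model(center, context), labels)
loss.backward()
# After training, model.center.weight is the learned embedding matrix.
```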
3. "NLP's ImageNet Moment": ELMO/ULMFit
- around 2017
- ELMo
- SQuAD
- SNLI
- GLUE
- ULMFiT
- similar to ELMo
4. Transformers
- Paper
- Encoder-decoder with only attention and fully-connected layers
- The actual mechanism
- focus just on the encoder
- → Attention Is All You Need (2017), to read at the next paper study session
- (Masked) Self-attention
- Positional encoding
- Layer normalization
4.1 Attention in detail
Basic self-attention
- No learned weights
- Order of the sequence does not affect the result of the computation (see the sketch below)
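A sketch of basic self-attention with no learned weights: each output y_i is a weighted sum of all inputs, where the weights are the row-wise softmax of the raw dot products x_i · x_j. Permuting the rows of X just permutes the rows of Y, which is why sequence order does not affect the computation.

```python
import numpy as np

def basic_self_attention(X):
    # X: (seq_len, dim) -- one input vector per position
    scores = X @ X.T                               # raw dot products x_i . x_j
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ X                             # y_i = sum_j w_ij * x_j

X = np.random.randn(5, 8)
Y = basic_self_attention(X)  # same shape as X
```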
Let's learn some weights!
- Think about how each x_i is used (three roles)
- Query
- → Compared to every other vector to compute attention weights for its own output y_i
- Key
- → Compared to every other vector to compute attention weight w_ij for output y_j
- Value
- → Summed with other vectors to form the result of the attention-weighted sum (see the sketch below)
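A single-head sketch that adds the learned weights, following the query/key/value roles above. The scaling by sqrt(d_k) is the standard trick from "Attention Is All You Need"; the sizes and random projection matrices here are illustrative only.

```python
import numpy as np

def attention(X, W_q, W_k, W_v):
    # X: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_k) learned projections
    Q = X @ W_q   # each x_i acting as a query
    K = X @ W_k   # each x_j acting as a key
    V = X @ W_v   # each x_j acting as a value
    scores = Q @ K.T / np.sqrt(K.shape[1])         # w_ij before softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over j
    return weights @ V                             # y_i = sum_j w_ij * v_j

d_model, d_k = 8, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((5, d_model))
Y = attention(X, *(rng.standard_normal((d_model, d_k)) for _ in range(3)))
```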
- Transformer
- Learned query, key, value weights
- Multiple heads
- Order of the sequence does not affect the result of the computation
- → encode each vector with its position (see the positional-encoding sketch below)
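Since attention itself is order-invariant, position information is injected by adding a positional encoding to each input vector. A sketch of the sinusoidal version from the original paper, assuming an even d_model:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]        # positions 0 .. seq_len-1
    i = np.arange(d_model // 2)[None, :]     # index over dimension pairs
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions: cosine
    return pe

# Added to the token embeddings before the first attention layer:
# X = token_embeddings + positional_encoding(seq_len, d_model)
```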
4.2 BERT, GPT-2, DistilBERT, T5
- GPT / GPT-2
- → Generative Pre-trained Transformer
- BERT
- → Bidirectional Encoder Representations from Transformers
- Transformer
- T5: Text-to-Text Transfer Transformer
- GPT-3
- DistilBERT
- → a smaller model is trained to reproduce the output of a larger model (see the distillation-loss sketch below)
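A minimal sketch of the distillation idea behind DistilBERT: the small "student" is trained to match the softened output distribution of the large "teacher" via KL divergence. The temperature T and the logits here are placeholders; a real setup also mixes in the ordinary supervised loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both distributions with temperature T, then measure how far
    # the student's distribution is from the teacher's
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * T * T
```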