  • GDG on campus Ewha Tech Blog
Cohort 3-2 Study / MLOps

[4주차] Transformers

by 서 진 2022. 5. 17.

Full Stack Deep Learning

1. Transfer Learning in Computer Vision

  • Classifying birds with only 10,000 images → risk of overfitting
  • → fine-tuning
  • → a fine-tuned ResNet-50 should perform well
    • a large model trained on a big dataset (= pretrained model)
    • take the already-trained model, add or replace layers, and train on the new task
    • enables fast, accurate training with less data (this is what transfer learning means)

  • Model zoo
    • collections of pretrained models
    • available for both TensorFlow and PyTorch
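
To make the fine-tuning idea concrete, here is a minimal numpy sketch (not real ResNet-50 code): a frozen random projection stands in for the pretrained backbone, and only a new classification head is trained on the small dataset. All sizes and names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained backbone (e.g. ResNet-50 minus its final
# layer): a fixed projection whose weights are NOT updated during training.
W_backbone = rng.normal(size=(64, 32))

def features(x):
    return np.maximum(x @ W_backbone, 0.0)  # frozen feature extractor (ReLU)

# Toy labeled data for the "new" task (label depends on the first input dim).
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# The new head is the only part we train: logistic regression on frozen features.
w_head = np.zeros(32)
for _ in range(300):
    f = features(X)
    p = 1.0 / (1.0 + np.exp(-(f @ w_head)))
    w_head -= 0.1 * f.T @ (p - y) / len(y)

p = 1.0 / (1.0 + np.exp(-(features(X) @ w_head)))
acc = ((p > 0.5) == y).mean()
```

Because the backbone is frozen, only 32 head parameters are learned, which is why far less data suffices than training the whole network from scratch.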

2. Embeddings and Language Models

  • In NLP the raw input is words, but deep learning models operate on vectors
  • How do we turn a word into a vector?
    • One-hot encoding
    • Problem: it works, but it does not scale with vocabulary size → violates what we know about word similarity
    • → neural networks do not work well on very high-dimensional sparse vectors
    • Dense vectors. Problem: how do we find the values of the embedding matrix?
      • Learn as part of the task
      • Learn a language model → skip-grams (look on both sides of the target word)
      • → N-grams
      • How do we speed up training? → Word2Vec
      • Binary classification instead of multi-class
    • → embedding matrix
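
The ideas above (learn a language model with skip-grams, and make it binary instead of multi-class via negative sampling) can be sketched as a tiny Word2Vec-style trainer in numpy. This is an illustrative toy with a miniature corpus and one negative sample per positive pair, not a faithful Word2Vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny corpus; in practice this would be a large text corpus.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension

# Two matrices to learn: target ("input") and context ("output") embeddings.
E_in = rng.normal(scale=0.1, size=(V, D))
E_out = rng.normal(scale=0.1, size=(V, D))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for _ in range(200):
    for i, w in enumerate(corpus):
        # Skip-gram: look on both sides of the target word.
        for j in (i - 1, i + 1):
            if 0 <= j < len(corpus):
                t, c = idx[w], idx[corpus[j]]
                # Positive pair: push its score toward 1 (binary, not multi-class).
                g = sigmoid(E_in[t] @ E_out[c]) - 1.0
                E_in[t] -= lr * g * E_out[c]
                E_out[c] -= lr * g * E_in[t]
                # One negative sample: push a random word's score toward 0
                # (collisions with the true context are ignored for simplicity).
                n = rng.integers(V)
                g = sigmoid(E_in[t] @ E_out[n])
                E_in[t] -= lr * g * E_out[n]
                E_out[n] -= lr * g * E_in[t]

# E_in is the learned embedding matrix: one dense vector per word.
```

The binary objective (is this word/context pair real or sampled?) avoids the softmax over the whole vocabulary, which is what makes training fast.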

3. "NLP's ImageNet Moment": ELMo/ULMFiT

  • around 2017

  • ELMo
    • SQuAD
    • SNLI
    • GLUE
  • ULMFiT
    • similar to ELMo

4. Transformers

  • Paper
    • encoder-decoder with only attention and fully-connected layers
    • the actual mechanism
    • focus just on the encoder
  • → "Attention Is All You Need" (2017): to read before the next paper-study session
  • (Masked) Self-attention
  • Positional encoding
  • Layer normalization

4.1 Attention in detail

Basic self-attention

  • No learned weights
  • Order of the sequence does not affect result of computations
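A minimal numpy sketch of basic self-attention with no learned weights; the final check illustrates the second bullet (reordering the inputs just reorders the outputs, so sequence order does not affect the computation):

```python
import numpy as np

def basic_self_attention(X):
    """Self-attention with no learned weights: each output y_i is a
    softmax(x_i . x_j)-weighted sum of all input vectors x_j."""
    scores = X @ X.T                                          # raw dot products
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)             # row-wise softmax
    return weights @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Y = basic_self_attention(X)

# Permutation equivariance: shuffling the inputs shuffles the outputs the
# same way; each y_i itself is unchanged.
perm = [2, 0, 4, 1, 3]
assert np.allclose(basic_self_attention(X[perm]), Y[perm])
```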

Let's learn some weights!

  • Think about how each x_i will be used (three ways)
    • Query
    • → Compared to every other vector to compute attention weights for its own output y_i
    • Key
    • → Compared to every other vector to compute attention weight w_ij for output y_j
    • Value
    • → Summed with other vectors to form the result of the attention weighted sum
  • Transformer
    • Learned query, key, value weights
    • Multiple heads
    • Order of the sequence does not affect result of computations
    • → encode each vector with position
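
Putting the bullets above together, here is a numpy sketch of multi-head self-attention with learned query/key/value projections. Random, untrained matrices stand in for learned weights, and the sizes are arbitrary; this is an illustration, not a production layer:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads = 16, 4
d_head = d_model // n_heads

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Per-head query/key/value projections plus an output projection
# (random here; learned by gradient descent in a real Transformer).
W_q = rng.normal(scale=0.1, size=(n_heads, d_model, d_head))
W_k = rng.normal(scale=0.1, size=(n_heads, d_model, d_head))
W_v = rng.normal(scale=0.1, size=(n_heads, d_model, d_head))
W_o = rng.normal(scale=0.1, size=(d_model, d_model))

def multi_head_self_attention(X):
    heads = []
    for h in range(n_heads):
        Q, K, V = X @ W_q[h], X @ W_k[h], X @ W_v[h]
        A = softmax(Q @ K.T / np.sqrt(d_head))  # scaled dot-product attention
        heads.append(A @ V)
    # Concatenate the heads and project back to d_model.
    return np.concatenate(heads, axis=-1) @ W_o

X = rng.normal(size=(6, d_model))
Y = multi_head_self_attention(X)
# Still order-independent, which is why positional encodings are added to X.
```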

4.2 BERT, GPT-2, DistilBERT, T5

  • GPT / GPT-2
  • → Generative Pre-trained Transformer
  • BERT
  • → Bidirectional Encoder Representations from Transformers
  • Transformer

  • T5: Text-to-Text Transfer Transformer
  • GPT-3
  • DistilBERT
  • → a smaller model is trained to reproduce the output of a larger model (knowledge distillation)
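
The distillation idea (a student trained to reproduce a teacher's outputs) is commonly implemented as a cross-entropy against temperature-softened teacher probabilities. A minimal numpy sketch of that loss, under the usual soft-target formulation:

```python
import numpy as np

def softmax_T(logits, T):
    """Softmax with temperature T; T > 1 softens the distribution so the
    teacher's relative preferences among wrong classes are visible."""
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher's soft targets:
    minimized when the student reproduces the teacher's distribution."""
    p_teacher = softmax_T(teacher_logits, T)
    log_p_student = np.log(softmax_T(student_logits, T))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[0.1, 0.1, 0.1]])
loss = distillation_loss(student, teacher)
```

In practice this term is combined with the ordinary hard-label loss; a matching student gets a strictly lower distillation loss than a mismatched one.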

 

Lab

Reading
