19. Negative Sampling

코딩소비 2023. 9. 27. 15:16

target을 맞추는 학습을 하기 위해서는 연관성이 없는(target=0)인 데이터셋도 있어야한다

- 이게 negative sampling.

10,000(vocabulary size)에 대해 softmax 보다, 10,000에 대해 sigmoid(binary classification) 계산비용이 훨씬 싸다. -> 왜인진 모르겠음

- 그리고 k개의 negative sampling을 통해 10,000개에 대해 sigmoid하는 것이 아닌 k+1(k negative, 1 positive)개에 대해 sigmoid를 수행하면 된다.

- 이때 매 학습 단계에서 k개의 neagtive sample은 vocabulary에서 무작위로 추출한다.