16. Bias and Variance with Mismatched Data Distributions

2023. 9. 15. 00:07 | Google ML Bootcamp / 3. Structuring Machine Learning Projects

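First, look at the distributions of the Train and Dev sets. Are they similar?

If the Train and Dev sets come from similar distributions, a large gap between Train error and Dev error indicates a variance problem.

If the Train and Dev sets do not come from similar distributions (that is, Dev comes from a different dataset), the gap alone cannot be attributed to variance, because part or all of it may be caused by the difference in distribution.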

 

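To separate the two causes, split the data into Train, Train-dev (a portion of the Train data with the same distribution as Train, but not used for training), and Dev / Test (which come from a different distribution than Train).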

In the left case, the Train-dev error is 9%.

- A variance problem exists, because the Train-dev set has the same distribution as the Train set, so the jump in error cannot be explained by a distribution difference.

 

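In the right case, the Train-dev error is 1.5% and the Dev error is 10%.

- This is a data mismatch problem, not a variance problem: the error jumps only when moving to data from a different distribution.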

 

 

In the left case, there is an avoidable bias problem, but no variance or data mismatch problem.

 

In the right case, there are both avoidable bias and data mismatch problems, but no variance problem.

 

If the Dev error and the Test error differ by a large margin, the model has overfit to the Dev set.

- The fix is to enlarge the Dev set, which means collecting more data.
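The gap-by-gap reasoning above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `error_gaps` and `diagnose`, the 2-point threshold, and the human-level, Train, and Test error values are assumptions made for the example. Only the Train-dev (1.5%) and Dev (10%) figures come from the notes.

```python
def error_gaps(human, train, train_dev, dev, test):
    """Return the gap between each pair of consecutive error rates (in % points)."""
    return {
        "avoidable bias": train - human,        # human-level -> Train
        "variance": train_dev - train,          # Train -> Train-dev (same distribution)
        "data mismatch": dev - train_dev,       # Train-dev -> Dev (different distribution)
        "overfitting to dev": test - dev,       # Dev -> Test
    }


def diagnose(gaps, threshold=2.0):
    """List the gaps that reach the threshold (a hypothetical cutoff, not a rule)."""
    return [name for name, gap in gaps.items() if gap >= threshold]


# Train-dev 1.5% and Dev 10% are from the notes; the other values are assumed.
gaps = error_gaps(human=0.0, train=1.0, train_dev=1.5, dev=10.0, test=10.0)
print(diagnose(gaps))  # ['data mismatch']
```

Each gap is read independently, so more than one problem can be flagged at once, as in the case with both avoidable bias and data mismatch.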

 

 

**Summary: address avoidable bias with a bigger model, longer training, etc.; address variance with more data, regularization, a different NN architecture, etc.**

- How should the data mismatch problem be addressed? The next video covers it.