๊ด€๋ฆฌ ๋ฉ”๋‰ด

๋ชฉ๋ก๋‹จ์–ด์ž„๋ฒ ๋”ฉ (2)

DATA101

[NLP] Word2Vec: (2) CBOW ๊ฐœ๋… ๋ฐ ์›๋ฆฌ

๐Ÿ“š๋ชฉ์ฐจ1. ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹ ์ƒ์„ฑ 2. ์ธ๊ณต์‹ ๊ฒฝ๋ง ๋ชจํ˜• 3. ํ•™์Šต ์ ˆ์ฐจ4. CBOW vs Skip-gram5. ํ•œ๊ณ„์ ๋“ค์–ด๊ฐ€๋ฉฐWord2Vec๋Š” ํ•™์Šต๋ฐฉ์‹์— ๋”ฐ๋ผ ํฌ๊ฒŒ \(2\)๊ฐ€์ง€๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: Continuous Bag of Words(CBOW)์™€ Skip-gram. CBOW๋Š” ์ฃผ๋ณ€ ๋‹จ์–ด(Context Word)๋กœ ์ค‘๊ฐ„์— ์žˆ๋Š” ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์ค‘๊ฐ„์— ์žˆ๋Š” ๋‹จ์–ด๋ฅผ ์ค‘์‹ฌ ๋‹จ์–ด(Center Word) ๋˜๋Š” ํƒ€๊ฒŸ ๋‹จ์–ด(Target Word)๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ, Skip-gram์€ ์ค‘์‹ฌ ๋‹จ์–ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ฃผ๋ณ€ ๋‹จ์–ด๋“ค์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” CBOW์— ๋Œ€ํ•ด ๋‹ค๋ฃจ๊ณ , ๋‹ค์Œ ํฌ์ŠคํŒ…์—์„œ Skip-gram์— ๋Œ€ํ•ด ์ž์„ธํžˆ ๋‹ค๋ฃน๋‹ˆ๋‹ค.1. ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹ ์ƒ์„ฑCBOW์—์„œ ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹์„ ..

[NLP] Word Embedding์˜ ์ดํ•ด: ํฌ์†Œํ‘œํ˜„๊ณผ ๋ฐ€์ง‘ํ‘œํ˜„

๐Ÿ“š ๋ชฉ์ฐจ1. ํฌ์†Œํ‘œํ˜„(Sparse Representation) 2. ๋ฐ€์ง‘ํ‘œํ˜„(Dense Representation) 3. ์›Œ๋“œ์ž„๋ฒ ๋”ฉ(Word Embedding)๋“ค์–ด๊ฐ€๋ฉฐ์›Œ๋“œ ์ž„๋ฒ ๋”ฉ(Word Embedding)์€ ๋‹จ์–ด(Word)๋ฅผ ์ปดํ“จํ„ฐ๊ฐ€ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•˜๋Š” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ธ๋ฐ, ํŠนํžˆ ๋ฐ€์ง‘ํ‘œํ˜„(Dense Representation) ๋ฐฉ์‹์„ ํ†ตํ•ด ํ‘œํ˜„ํ•˜๋Š” ๊ธฐ๋ฒ•์„ ๋งํ•ฉ๋‹ˆ๋‹ค. ๋ฐ€์ง‘ํ‘œํ˜„๊ณผ ๋ฐ˜๋Œ€๋˜๋Š” ๊ฐœ๋…์ด ํฌ์†Œํ‘œํ˜„(Sparse Representation)์ž…๋‹ˆ๋‹ค. ์›Œ๋“œ ์ž„๋ฒ ๋”ฉ์„ ์ดํ•ดํ•˜๊ธฐ์— ์•ž์„œ ํฌ์†Œํ‘œํ˜„๊ณผ ๋ฐ€์ง‘ํ‘œํ˜„์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค.1. ํฌ์†Œํ‘œํ˜„(Sparse Representation)ํฌ์†Œํ‘œํ˜„์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฒกํ„ฐ ๋˜๋Š” ํ–‰๋ ฌ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ˆ˜์น˜ํ™”ํ•˜์—ฌ ํ‘œํ˜„ํ•  ๋•Œ ๊ทนํžˆ ์ผ๋ถ€์˜ ์ธ๋ฑ์Šค๋งŒ ํŠน์ • ๊ฐ’์œผ๋กœ ํ‘œํ˜„ํ•˜๊ณ , ๋Œ€๋ถ€๋ถ„์˜ ..