๊ด€๋ฆฌ ๋ฉ”๋‰ด

๋ชฉ๋กword embedding (2)

DATA101

[NLP] Word2Vec: (2) CBOW Concept and Principles

๐Ÿ“š๋ชฉ์ฐจ1. ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹ ์ƒ์„ฑ 2. ์ธ๊ณต์‹ ๊ฒฝ๋ง ๋ชจํ˜• 3. ํ•™์Šต ์ ˆ์ฐจ4. CBOW vs Skip-gram5. ํ•œ๊ณ„์ ๋“ค์–ด๊ฐ€๋ฉฐWord2Vec๋Š” ํ•™์Šต๋ฐฉ์‹์— ๋”ฐ๋ผ ํฌ๊ฒŒ \(2\)๊ฐ€์ง€๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: Continuous Bag of Words(CBOW)์™€ Skip-gram. CBOW๋Š” ์ฃผ๋ณ€ ๋‹จ์–ด(Context Word)๋กœ ์ค‘๊ฐ„์— ์žˆ๋Š” ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์ค‘๊ฐ„์— ์žˆ๋Š” ๋‹จ์–ด๋ฅผ ์ค‘์‹ฌ ๋‹จ์–ด(Center Word) ๋˜๋Š” ํƒ€๊ฒŸ ๋‹จ์–ด(Target Word)๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ, Skip-gram์€ ์ค‘์‹ฌ ๋‹จ์–ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ฃผ๋ณ€ ๋‹จ์–ด๋“ค์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” CBOW์— ๋Œ€ํ•ด ๋‹ค๋ฃจ๊ณ , ๋‹ค์Œ ํฌ์ŠคํŒ…์—์„œ Skip-gram์— ๋Œ€ํ•ด ์ž์„ธํžˆ ๋‹ค๋ฃน๋‹ˆ๋‹ค.1. ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹ ์ƒ์„ฑCBOW์—์„œ ํ•™์Šต ๋ฐ์ดํ„ฐ์…‹์„ ..

[NLP] Word2Vec: (1) ๊ฐœ๋…

๐Ÿ“š ๋ชฉ์ฐจ1. Word2Vec ๊ฐœ๋…2. ํฌ์†Œํ‘œํ˜„๊ณผ์˜ ์ฐจ์ด์  3. ์–ธ์–ด๋ชจ๋ธ๊ณผ์˜ ์ฐจ์ด์ 1. Word2Vec ๊ฐœ๋…Word2Vec๋Š” Word to Vector๋ผ๋Š” ์ด๋ฆ„์—์„œ ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด ๋‹จ์–ด(Word)๋ฅผ ์ปดํ“จํ„ฐ๊ฐ€ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋„๋ก ์ˆ˜์น˜ํ™”๋œ ๋ฒกํ„ฐ(Vector)๋กœ ํ‘œํ˜„ํ•˜๋Š” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ๋Š” ๋ถ„์‚ฐํ‘œํ˜„(Distributed Representation) ๊ธฐ๋ฐ˜์˜ ์›Œ๋“œ์ž„๋ฒ ๋”ฉ(Word Embedding) ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ๋ถ„์‚ฐํ‘œํ˜„์ด๋ž€ ๋ถ„ํฌ๊ฐ€์„ค(Distibutional Hypothesis) ๊ฐ€์ • ํ•˜์— ์ €์ฐจ์›์— ๋‹จ์–ด ์˜๋ฏธ๋ฅผ ๋ถ„์‚ฐํ•˜์—ฌ ํ‘œํ˜„ํ•˜๋Š” ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋ถ„ํฌ๊ฐ€์„ค์€ "์œ ์‚ฌํ•œ ๋ฌธ๋งฅ์— ๋“ฑ์žฅํ•œ ๋‹จ์–ด๋Š” ์œ ์‚ฌํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ–๋Š”๋‹ค"๋ผ๋Š” ๊ฐ€์ •์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ๋‹จ์–ด๋ฅผ ๋ฒกํ„ฐํ™”ํ•˜๋Š” ์ž‘์—…์„ ์›Œ๋“œ์ž„๋ฒ ๋”ฉ(Word Embedding)์ด๋ผ๊ณ ..