Posts in Text Analysis (10)

DATA101

[NLP] Understanding Word Embedding: Sparse and Dense Representations

📚 Contents: 1. Sparse Representation 2. Dense Representation 3. Word Embedding

Introduction
Word embedding is one of the techniques for representing words as vectors so that a computer can process them. More specifically, it refers to techniques that encode words using a dense representation. The opposite of a dense representation is a sparse representation. Before turning to word embedding itself, this post looks at sparse and dense representations.

1. Sparse Representation
When data is quantified as a vector or matrix, a sparse representation assigns specific values to only a very small number of indices, while most of the ..
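The contrast between the two representations can be sketched in a few lines of Python. This is a minimal illustration, not the post's own example: the vocabulary, the helper names `one_hot` and `random_dense_table`, and the 3-dimensional embedding size are all assumptions made here. One-hot encoding is used as a common instance of a sparse representation, and the dense vectors are filled with random numbers purely as a stand-in for values that a real model would learn.

```python
import random

# Hypothetical toy vocabulary (not taken from the post).
vocab = ["king", "queen", "apple", "banana"]


def one_hot(word, vocab):
    """Sparse representation: one slot per vocabulary word, a single 1, zeros elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def random_dense_table(vocab, dim, seed=0):
    """Dense representation: every word maps to a short vector of real numbers.

    Real embeddings are learned from data; random values are only a placeholder here.
    """
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


sparse_vec = one_hot("apple", vocab)            # length grows with the vocabulary
dense_table = random_dense_table(vocab, dim=3)  # length is fixed by the chosen dimension
dense_vec = dense_table["apple"]

print("sparse:", sparse_vec)
print("dense: ", dense_vec)
```

The sparse vector has as many entries as there are words in the vocabulary and almost all of them are zero, whereas the dense vector keeps a fixed, much smaller length regardless of vocabulary size.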

[NLP] Understanding the Document-Term Matrix (DTM)

This post covers the concept of the document-term matrix (DTM), one of the count-based methods for representing words.

📚 Contents: 1. DTM Concept 2. DTM Example 3. DTM Limitations

1. DTM Concept
A document-term matrix (Document-Term Matrix, DTM) represents, as a matrix, the frequency of every word that appears in a collection of documents (a corpus). In other words, a DTM is the Bag of Words (BoW) of multiple documents expressed as a matrix. A DTM is a kind of local representation, also called a discrete representation, and is a count-based method for representing words.

2. DTM Example
Let's look at a DTM example. Suppose there are four documents as below..
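The definition above (one row per document, one column per vocabulary term, each cell holding a count) can be sketched as follows. The documents from the post's example are cut off in this excerpt, so the three short documents below are hypothetical stand-ins, and `build_dtm` is an illustrative helper name rather than a library function. Tokenization is a plain whitespace split, which is a simplifying assumption.

```python
from collections import Counter

# Hypothetical toy corpus (NOT the documents from the original post).
docs = [
    "i like apples",
    "i like bananas",
    "bananas bananas are yellow",
]


def build_dtm(docs):
    """Return (vocab, matrix) where matrix[i][j] is the count of vocab[j] in docs[i]."""
    tokenized = [doc.split() for doc in docs]  # naive whitespace tokenization
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # Bag of Words for a single document
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


vocab, dtm = build_dtm(docs)

print(vocab)
for row in dtm:
    print(row)
```

Each row is the Bag of Words of one document laid out over the shared vocabulary, which is why most cells are zero once the vocabulary grows.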