
List: Word Representation Methods (2)

DATA101

[NLP] Understanding the Document-Term Matrix (DTM)

This post covers the concept of the document-term matrix (DTM), one of the count-based word representation methods.

📚 Contents
1. DTM concept
2. DTM example
3. DTM limitations

1. DTM concept
A document-term matrix (Document-Term Matrix, DTM) is a matrix that records how often every word occurs (its frequency) across a collection of documents (= corpus). In other words, a DTM is the Bag of Words (BoW) of multiple documents expressed as a matrix. A DTM is a kind of local representation, also called a discrete representation, and is a count-based word representation method.

2. DTM example
Let's look at a DTM example. Suppose there are four documents as follows..
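
As a rough illustration of the definition above (one row per document, one column per word, each cell holding a count), here is a minimal sketch in plain Python. The corpus is hypothetical and is not the one used in the post. Lowercasing and whitespace tokenization are assumptions made to keep the example short, and `build_dtm` is an illustrative helper name.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term vocab[j] in docs[i]."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    # Sorting the vocabulary gives a stable column order.
    vocab = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


# Hypothetical corpus for illustration only.
docs = [
    "I like apples",
    "I like bananas",
    "apples and bananas and apples",
]
vocab, dtm = build_dtm(docs)

print(vocab)
for row in dtm:
    print(row)
```

Each row is the BoW vector of one document over the shared vocabulary, which is why most cells are zero even in this tiny corpus.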

[NLP] Bag of Words (BoW): Concept and Hands-On Practice

This post covers the concept of Bag of Words (BoW), a count-based word representation method, and how to build one.

📚 Contents
1. BoW concept
2. BoW characteristics
3. BoW creation procedure
4. BoW creation practice

1. BoW concept
Bag of Words (BoW) is one way of turning words into numbers: it ignores the order and meaning of the words in a document and represents them by their occurrence frequency alone. BoW is a kind of local representation, also called a discrete representation, and is referred to as a count-based word representation (see Figure 1).

2. BoW characteristics
In a BoW, which words appear, and how many times..
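
To make the idea of counting words regardless of order concrete, here is a minimal sketch for a single document. It assumes a common two-step approach (assign each distinct word an index, then count occurrences at that index), which may differ from the procedure in the full post. The sentence, the whitespace tokenization, and the helper name `bag_of_words` are all illustrative.

```python
def bag_of_words(text):
    """Return (word_to_index, counts) for a single document."""
    word_to_index = {}
    counts = []
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    for token in text.lower().split():
        if token not in word_to_index:
            # First occurrence: assign the next free index and start the count.
            word_to_index[token] = len(word_to_index)
            counts.append(1)
        else:
            counts[word_to_index[token]] += 1
    return word_to_index, counts


# Hypothetical sentence for illustration only.
word_to_index, bow = bag_of_words("the cat sat on the mat")

print(word_to_index)
print(bow)
```

Shuffling the words of the sentence changes which index each word gets, but not how many times each word is counted; that is the sense in which BoW discards word order.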