DATA101

[NLP] Document Similarity Analysis: (1) Cosine Similarity

📚 Table of Contents
1. The concept of cosine similarity
2. Cosine similarity in practice

1. The concept of cosine similarity

Cosine similarity measures how similar two vectors are from the cosine of the angle between them. As long as text can be represented numerically, for example with a DTM, TF-IDF, or Word2Vec, cosine similarity can be used to compare documents with one another. The closer the cosine similarity is to \(1\), the more similar the two vectors are taken to be, and the measure has the advantage of comparing documents relatively fairly even when their lengths differ. As in Figure 1 below, the cosine similarity reaches its maximum value of 1 when the two vectors point in the same direction, that is, when the angle between them is \(0^\circ\). Given two vectors \(A\) and \(B\)..
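
To make the idea concrete, here is a minimal sketch in plain Python. It assumes the standard definition \(\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}\). The function name `cosine_similarity` and the vectors `doc1`, `doc2`, and `doc3` are hypothetical examples made up for illustration, not data from the original post.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical term-count vectors over a shared 3-word vocabulary.
doc1 = [1, 2, 0]
doc2 = [2, 4, 0]  # same direction as doc1, but twice as long
doc3 = [0, 0, 3]  # shares no terms with doc1

print(cosine_similarity(doc1, doc2))
print(cosine_similarity(doc1, doc3))
```

Here `doc2` is just `doc1` scaled by two, so the angle between them is \(0^\circ\) and the score comes out at (approximately) 1. This is why document length alone does not change the result. `doc1` and `doc3` share no terms, so their dot product and their similarity are both 0.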