๊ด€๋ฆฌ ๋ฉ”๋‰ด

๋ชฉ๋ก์ „์ฒด ๊ธ€ (350)

DATA101

[๋”ฅ๋Ÿฌ๋‹] Epoch, Iteration, Batch size ๊ฐœ๋…

๐Ÿ“š ๋ชฉ์ฐจ 1. Batch Size 2. Iteration 3. Epoch 1. Batch Size Batch ํฌ๊ธฐ๋Š” ๋ชจ๋ธ ํ•™์Šต ์ค‘ parameter๋ฅผ ์—…๋ฐ์ดํŠธํ•  ๋•Œ ์‚ฌ์šฉํ•  ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ๋žŒ์ด ๋ฌธ์ œ ํ’€์ด๋ฅผ ํ†ตํ•ด ํ•™์Šตํ•ด ๋‚˜๊ฐ€๋Š” ๊ณผ์ •์„ ์˜ˆ๋กœ ๋“ค์–ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Batch ํฌ๊ธฐ๋Š” ๋ช‡ ๊ฐœ์˜ ๋ฌธ์ œ๋ฅผ ํ•œ ๋ฒˆ์— ์ญ‰ ํ’€๊ณ  ์ฑ„์ ํ• ์ง€๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๊ฒƒ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์ด 100๊ฐœ์˜ ๋ฌธ์ œ๊ฐ€ ์žˆ์„ ๋•Œ, 20๊ฐœ์”ฉ ํ’€๊ณ  ์ฑ„์ ํ•œ๋‹ค๋ฉด Batch ํฌ๊ธฐ๋Š” 20์ž…๋‹ˆ๋‹ค. ์‚ฌ๋žŒ์€ ๋ฌธ์ œ๋ฅผ ํ’€๊ณ  ์ฑ„์ ์„ ํ•˜๋ฉด์„œ ๋ฌธ์ œ๋ฅผ ํ‹€๋ฆฐ ์ด์œ ๋‚˜ ๋งž์ถ˜ ์›๋ฆฌ๋ฅผ ํ•™์Šตํ•˜์ฃ . ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ ์—ญ์‹œ ๋งˆ์ฐฌ๊ฐ€์ง€์ž…๋‹ˆ๋‹ค. Batch ํฌ๊ธฐ๋งŒํผ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•ด ๋ชจ๋ธ์ด ์˜ˆ์ธกํ•œ ๊ฐ’๊ณผ ์‹ค์ œ ์ •๋‹ต ๊ฐ„์˜ ์˜ค์ฐจ(conf. ์†์‹คํ•จ์ˆ˜)๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ Optimizer๊ฐ€ parameter๋ฅผ..

[Deep Learning] ์ตœ์ ํ™”(Optimizer): (4) Adam

1. ๊ฐœ๋…Adaptive Moment Estimation(Adam)์€ ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ์จ Momentum๊ณผ RMSProp์˜ ์žฅ์ ์„ ๊ฒฐํ•ฉํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ์ฆ‰, ํ•™์Šต์˜ ๋ฐฉํ–ฅ๊ณผ ํฌ๊ธฐ(=Learning rate)๋ฅผ ๋ชจ๋‘ ๊ฐœ์„ ํ•œ ๊ธฐ๋ฒ•์œผ๋กœ ๋”ฅ๋Ÿฌ๋‹์—์„œ ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜์–ด "์˜ค๋˜" ์ตœ์ ํ™” ๊ธฐ๋ฒ•์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์ตœ๊ทผ์—๋Š” RAdam, AdamW๊ณผ ๊ฐ™์ด ๋”์šฑ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š” ์ตœ์ ํ™” ๊ธฐ๋ฒ•์ด ์ œ์•ˆ๋˜์—ˆ์ง€๋งŒ, ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ ์ „๋ฐ˜์„ ๊ณต๋ถ€ํ•˜๋Š” ๋งˆ์Œ๊ฐ€์ง์œผ๋กœ Adam์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค.2. ์ˆ˜์‹์ˆ˜์‹๊ณผ ํ•จ๊ป˜ Adam์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. $$ m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) \nabla f(x_{t-1}) $$$$ g_{t} = \beta_{..

[Deep Learning] ์ตœ์ ํ™”(Optimizer): (3) RMSProp

1. ๊ฐœ๋…RMSProp๋Š” ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ์จ Root Mean Sqaure Propagation์˜ ์•ฝ์ž๋กœ, ์•Œ์— ์—์Šคํ”„๋กญ(R.M.S.Prop)์ด๋ผ๊ณ  ์ฝ์Šต๋‹ˆ๋‹ค.โœ‹๋“ฑ์žฅ๋ฐฐ๊ฒฝ์ตœ์ ํ™” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ธ AdaGrad๋Š” ํ•™์Šต์ด ์ง„ํ–‰๋  ๋•Œ ํ•™์Šต๋ฅ (Learning rate)์ด ๊พธ์ค€ํžˆ ๊ฐ์†Œํ•˜๋‹ค ๋‚˜์ค‘์—๋Š” \(0\)์œผ๋กœ ์ˆ˜๋ ดํ•˜์—ฌ ํ•™์Šต์ด ๋” ์ด์ƒ ์ง„ํ–‰๋˜์ง€ ์•Š๋Š”๋‹ค๋Š” ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. RMSProp์€ ์ด๋Ÿฌํ•œ ํ•œ๊ณ„์ ์„ ๋ณด์™„ํ•œ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์œผ๋กœ์จ ์ œํ”„๋ฆฌ ํžŒํŠผ ๊ต์ˆ˜๊ฐ€ Coursea ๊ฐ•์˜ ์ค‘์— ๋ฐœํ‘œํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค.๐Ÿ›  ์›๋ฆฌRMSProp์€ AdaGrad์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋ณ€์ˆ˜(feature)๋ณ„๋กœ ํ•™์Šต๋ฅ ์„ ์กฐ์ ˆํ•˜๋˜ ๊ธฐ์šธ๊ธฐ ์—…๋ฐ์ดํŠธ ๋ฐฉ์‹์—์„œ ์ฐจ์ด๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ „ time step์—์„œ์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๋‹จ์ˆœํžˆ ๊ฐ™์€ ๋น„์œจ๋กœ ๋ˆ„์ ํ•˜์ง€ ์•Š๊ณ  ์ง€์ˆ˜์ด๋™..

[Deep Learning] ์ตœ์ ํ™”(Optimizer): (2) AdaGrad

๐Ÿ“š ๋ชฉ์ฐจ 1. ๊ฐœ๋… 2. ์žฅ์  3. ๋‹จ์  1. ๊ฐœ๋… AdaGrad๋Š” ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ์จ Adaptive Gradient์˜ ์•ฝ์ž์ด๊ณ , ์ ์‘์  ๊ธฐ์šธ๊ธฐ๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. Feature๋งˆ๋‹ค ์ค‘์š”๋„, ํฌ๊ธฐ ๋“ฑ์ด ์ œ๊ฐ๊ฐ์ด๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋“  Feature๋งˆ๋‹ค ๋™์ผํ•œ ํ•™์Šต๋ฅ ์„ ์ ์šฉํ•˜๋Š” ๊ฒƒ์€ ๋น„ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ด€์ ์—์„œ AdaGrad ๊ธฐ๋ฒ•์ด ์ œ์•ˆ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. AdaGrad๋Š” Feature๋ณ„๋กœ ํ•™์Šต๋ฅ (Learning rate)์„ Adaptiveํ•˜๊ฒŒ, ์ฆ‰ ๋‹ค๋ฅด๊ฒŒ ์กฐ์ ˆํ•˜๋Š” ๊ฒƒ์ด ํŠน์ง•์ž…๋‹ˆ๋‹ค. AdaGrad๋ฅผ ์ˆ˜์‹์œผ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. $$ g_{t} = g_{t-1} + (\nabla f(x_{t-1}))^{2} $$ $$ x_{t} = x_{t-1} - \frac{\eta}{\sqrt{g_{t} + \epsi..

[Deep Learning] ์ตœ์ ํ™”(Optimizer): (1) Momentum

๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” ๋”ฅ๋Ÿฌ๋‹ ์ตœ์ ํ™”(optimizer) ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜์ธ Momentum์˜ ๊ฐœ๋…์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค. ๋จผ์ €, Momentum ๊ธฐ๋ฒ•์ด ์ œ์•ˆ๋œ ๋ฐฐ๊ฒฝ์ธ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(Gradient Descent)์˜ ํ•œ๊ณ„์ ์— ๋Œ€ํ•ด ๋‹ค๋ฃจ๊ณ  ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.๐Ÿ“š ๋ชฉ์ฐจ1. ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ํ•œ๊ณ„ 1.1. Local Minimum ๋ฌธ์ œ 1.2. Saddle Point ๋ฌธ์ œ2. Momentum 2.1. ๊ฐœ๋… 2.2. ์ˆ˜์‹1. ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ํ•œ๊ณ„๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(Gradient Descent)์€ ํฌ๊ฒŒ 2๊ฐ€์ง€ ํ•œ๊ณ„์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ฒซ์งธ, Local Minimum์— ๋น ์ง€๊ธฐ ์‰ฝ๋‹ค๋Š” ์ . ๋‘˜์งธ, ์•ˆ์žฅ์ (Saddle point)๋ฅผ ๋ฒ—์–ด๋‚˜์ง€ ๋ชปํ•œ๋‹ค๋Š” ์ . ๊ฐ๊ฐ์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค.1.1. Local Minimum..

[Deep Learning] ์ตœ์ ํ™” ๊ฐœ๋…๊ณผ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(Gradient Descent)

๐Ÿ“š ๋ชฉ์ฐจ1. ์ตœ์ ํ™” ๊ฐœ๋… 2. ๊ธฐ์šธ๊ธฐ ๊ฐœ๋… 3. ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ• ๊ฐœ๋… 4. ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์˜ ํ•œ๊ณ„1. ์ตœ์ ํ™” ๊ฐœ๋…๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์—์„œ ์ตœ์ ํ™”(Optimization)๋ž€ ์†์‹ค ํ•จ์ˆ˜(Loss Function) ๊ฐ’์„ ์ตœ์†Œํ™”ํ•˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ตฌํ•˜๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค(์•„๋ž˜ ๊ทธ๋ฆผ 1 ์ฐธ๊ณ ). ๋”ฅ๋Ÿฌ๋‹์—์„œ๋Š” ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์ž…๋ ฅํ•˜์—ฌ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ๋ฅผ ๊ฑฐ์ณ ์˜ˆ์ธก๊ฐ’(\(\hat{y}\))์„ ์–ป์Šต๋‹ˆ๋‹ค. ์ด ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ ์ •๋‹ต(\(y\))๊ณผ์˜ ์ฐจ์ด๋ฅผ ๋น„๊ตํ•˜๋Š” ํ•จ์ˆ˜๊ฐ€ ์†์‹ค ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. ์ฆ‰, ๋ชจ๋ธ์ด ์˜ˆ์ธกํ•œ ๊ฐ’๊ณผ ์‹ค์ ฏ๊ฐ’์˜ ์ฐจ์ด๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ(a.k.a., Feature)๋ฅผ ์ฐพ๋Š” ๊ณผ์ •์ด ์ตœ์ ํ™”์ž…๋‹ˆ๋‹ค. ์ตœ์ ํ™” ๊ธฐ๋ฒ•์—๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€๊ฐ€ ์žˆ์œผ๋ฉฐ, ๋ณธ ํฌ์ŠคํŒ…์—์„œ๋Š” ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(Gradient Descent)์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค.2. ๊ธฐ์šธ๊ธฐ ๊ฐœ๋…..

[Deep Learning] ํ‰๊ท ์ ˆ๋Œ€์˜ค์ฐจ(MAE) ๊ฐœ๋… ๋ฐ ํŠน์ง•

๐Ÿ’ก ๋ชฉํ‘œ ํ‰๊ท ์ ˆ๋Œ€์˜ค์ฐจ(MAE)์˜ ๊ฐœ๋… ๋ฐ ํŠน์ง•์— ๋Œ€ํ•ด ์•Œ์•„๋ด…๋‹ˆ๋‹ค. 1. MAE ๊ฐœ๋… ํ‰๊ท ์ ˆ๋Œ€์˜ค์ฐจ(Mean Absolute Error, MAE)๋Š” ๋ชจ๋“  ์ ˆ๋Œ€ ์˜ค์ฐจ(Error)์˜ ํ‰๊ท ์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์˜ค์ฐจ๋ž€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์˜ˆ์ธกํ•œ ๊ฐ’๊ณผ ์‹ค์ œ ์ •๋‹ต๊ณผ์˜ ์ฐจ์ด๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ฆ‰, ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์ •๋‹ต์„ ์ž˜ ๋งžํž์ˆ˜๋ก MSE ๊ฐ’์€ ์ž‘์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, MAE๊ฐ€ ์ž‘์„์ˆ˜๋ก ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์„ฑ๋Šฅ์ด ์ข‹๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. MAE์˜ ์ˆ˜์‹์„ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. $$ E = \sum_{i}|y_{i} -\tilde{y_{i}}| $$ \(E\): ์†์‹ค ํ•จ์ˆ˜ \(y_i\): \(i\)๋ฒˆ์งธ ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ์ •๋‹ต \(\tilde{y_i}\): \(i\)๋ฒˆ์งธ ํ•™์Šต ๋ฐ์ดํ„ฐ๋กœ ์˜ˆ์ธกํ•œ ๊ฐ’ 2. MAE ํŠน์ง• 2.1. ์˜ค์ฐจ์™€ ๋น„๋ก€ํ•˜๋Š” ์†์‹ค ํ•จ์ˆ˜ MAE๋Š” ์†์‹ค ํ•จ์ˆ˜๊ฐ€ ..

[BFS] ๋ฐฑ์ค€#16234: ์ธ๊ตฌ ์ด๋™/Python

๐Ÿ“ ๋ฌธ์ œ https://www.acmicpc.net/problem/16234 16234๋ฒˆ: ์ธ๊ตฌ ์ด๋™ N×Nํฌ๊ธฐ์˜ ๋•…์ด ์žˆ๊ณ , ๋•…์€ 1×1๊ฐœ์˜ ์นธ์œผ๋กœ ๋‚˜๋ˆ„์–ด์ ธ ์žˆ๋‹ค. ๊ฐ๊ฐ์˜ ๋•…์—๋Š” ๋‚˜๋ผ๊ฐ€ ํ•˜๋‚˜์”ฉ ์กด์žฌํ•˜๋ฉฐ, rํ–‰ c์—ด์— ์žˆ๋Š” ๋‚˜๋ผ์—๋Š” A[r][c]๋ช…์ด ์‚ด๊ณ  ์žˆ๋‹ค. ์ธ์ ‘ํ•œ ๋‚˜๋ผ ์‚ฌ์ด์—๋Š” ๊ตญ๊ฒฝ์„ ์ด ์กด์žฌํ•œ๋‹ค. ๋ชจ www.acmicpc.net ๐Ÿ’ก ์ ‘๊ทผ๋ฒ• BFS ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ํ(Queue) ์ž๋ฃŒ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ฌธ์ œํ•ด๊ฒฐ ์ ˆ์ฐจ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๋ชจ๋“  ๊ตญ๊ฐ€๋ฅผ ๋Œ€์ƒ์œผ๋กœ ๊ฐ๊ฐ ์ค‘์‹ฌ๊ตญ์œผ๋กœ ์„ ์ •ํ•˜๊ณ , ์ƒํ•˜์ขŒ์šฐ ๋ฐฉ๋ฉด์— ์—ฐํ•ฉ์ด ๊ฐ€๋Šฅํ•œ ์ธ์ ‘๊ตญ์ด ์žˆ๋Š”์ง€ ํƒ์ƒ‰ํ•˜๋‹ˆ๋‹ค. ๋งŒ์ผ ์—ฐํ•ฉ๊ตญ์ด ์„ฑ๋ฆฝ๋œ๋‹ค๋ฉด, ํ•ด๋‹น ์ธ์ ‘๊ตญ์„ ์ค‘์‹ฌ์œผ๋กœ ๋‹ค์‹œ ์ƒํ•˜์ขŒ์šฐ ๋ฐฉ๋ฉด์˜ ์ธ์ ‘๊ตญ๊ณผ ์—ฐํ•ฉ๊ตญ์ด ์„ฑ๋ฆฝ๋˜๋Š”์ง€ ์—ฌ๋ถ€๋ฅผ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ณผ์ •์€ ์ƒํ•˜์ขŒ์šฐ ๋ฐฉ๋ฉด์œผ๋กœ..