
learning rate (2)

[Deep Learning] Correcting the Error: Gradient Descent and a Partial-Derivative Code Implementation

If the slope a is set too large or too small, the error grows, so there is a clear relationship between the error and the slope. The error is smallest when a sits at the point m, so we need a procedure that moves a toward m. Gradient descent uses the derivative to compare errors and move a in the direction that makes the error smaller. At the minimum m, the instantaneous slope is a line parallel to the x-axis, i.e., the slope is 0 - so we need to find the point where the derivative is 0 to make the error as small as possible. Compute the derivative at a₁. Move some distance in the direction opposite to that slope (the negative direction if the slope is +, the positive direction if it is -) to get a₂, and compute the derivative there. If that derivative is not 0, repeat the process. Gradient descent thus changes the slope a iteratively..
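A minimal sketch of the process the excerpt describes, assuming simple linear regression with a mean-squared-error loss; the sample data, learning rate, and epoch count below are placeholders for illustration, not values from the post:

```python
import numpy as np

# Hypothetical (x, y) data, used only to make the sketch runnable.
x = np.array([2, 4, 6, 8], dtype=float)
y = np.array([81, 93, 91, 97], dtype=float)

a, b = 0.0, 0.0   # slope and intercept, both starting at 0
lr = 0.03         # learning rate: how far to step along the negative gradient
epochs = 2001
n = len(x)

for i in range(epochs):
    y_pred = a * x + b                       # predictions with the current line
    error = y - y_pred                       # residuals
    a_diff = -(2 / n) * np.sum(x * error)    # partial derivative of MSE w.r.t. a
    b_diff = -(2 / n) * np.sum(error)        # partial derivative of MSE w.r.t. b
    a = a - lr * a_diff                      # step opposite to the gradient's sign
    b = b - lr * b_diff

print("slope a:", a, "intercept b:", b)
```

Each iteration steps a (and b) against the sign of the partial derivative, so the updates stop changing once the derivatives reach 0, the minimum-error point described above.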

Deep Learning 2022.11.22

[Deep Learning] learning rate decay, learning rate scheduler

Learning Rate Decay ๊ธฐ์กด์˜ learning rate๊ฐ€ ๋†’์€ ๊ฒฝ์šฐ loss ๊ฐ’์„ ๋น ๋ฅด๊ฒŒ ๋‚ด๋ฆด ์ˆ˜๋Š” ์žˆ์ง€๋งŒ, ์˜ค์ฐจ๊ฐ€ 0์ธ ์ง€์ ์„ ๋ฒ—์–ด๋‚  ์ˆ˜ ์žˆ๊ณ , ๋‚ฎ์€ ๊ฒฝ์šฐ๋Š” ์ตœ์ ์˜ ํ•™์Šต์ด ๊ฐ€๋Šฅํ•˜์ง€๋งŒ ๋„ˆ๋ฌด ์˜ค๋žœ ์‹œ๊ฐ„์ด ๊ฑธ๋ฆฌ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ •์ ์ธ learning rate๋กœ ํ•™์Šต์„ ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹Œ, epoch๋งˆ๋‹ค ๋™์ ์œผ๋กœ learning rate๋ฅผ ๋ณ€ํ™”์‹œ์ผœ ์ตœ์ ์˜ ํ•™์Šต์„ ํ•˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. tf.keras.optimizers.schedules.CosineDecay tf.keras.optimizers.schedules.CosineDecay( initial_learning_rate, decay_steps, alpha=0.0, name=None ) initial_learning_rate: ์ดˆ๊ธฐ lr decay_ste..
