Auto-regressive 1

Summary of Tacotron: Towards End-to-End Speech Synthesis

📜 Y. Wang et al., "Tacotron: Towards End-to-End Speech Synthesis," in Interspeech, 2017

Three-line summary of the paper
The paper replaces the complex structure of modern TTS models with an end-to-end structure.
Training on (text, audio) pairs makes it possible to learn from more data and a wider variety of features.
Generating speech in Mel-spectrogram frames rather than individual audio samples makes training and inference faster.

Abstract
A Text-to-Speech (TTS) system generally consists of a frontend for text analysis, an acoustic model, and an audio synthesis module. Building each component..
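The third summary point, frame-level rather than sample-level generation, can be made concrete by counting sequential generation steps. The sketch below is only an illustration: the function name `autoregressive_steps` is hypothetical, and the 24 kHz sample rate and 12.5 ms frame shift are assumed values chosen for the example, not figures taken from the summary above.

```python
import math


def autoregressive_steps(duration_s, sample_rate_hz, hop_ms=None, frames_per_step=1):
    """Count sequential generation steps for an utterance of the given length.

    hop_ms=None models a sample-level generator (one step per audio sample).
    Otherwise one spectrogram frame is produced every hop_ms milliseconds,
    and frames_per_step frames are emitted per decoder step.
    """
    if hop_ms is None:
        return math.ceil(duration_s * sample_rate_hz)
    n_frames = math.ceil(duration_s * 1000 / hop_ms)
    return math.ceil(n_frames / frames_per_step)


# Assumed settings for illustration: 1 second of audio at 24 kHz, 12.5 ms hop.
sample_steps = autoregressive_steps(1.0, 24_000)
frame_steps = autoregressive_steps(1.0, 24_000, hop_ms=12.5)

print(sample_steps, frame_steps, sample_steps // frame_steps)
# 24000 80 300
```

Under these assumptions a frame-level decoder takes about 300 times fewer sequential steps per second of audio. This counts steps only, not wall-clock time, since one frame-level step does more work than one sample-level step.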