EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS

Haoxun Li, Yu Liu, Yuqing Sun, Hanlei Shi, Leyuan Qu, Taihao Li*
Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, China
*Corresponding author

Abstract

Recent LLM-based TTS systems achieve strong quality and zero-shot ability, but lack fine-grained emotional control due to their reliance on discrete speech tokens. Existing approaches either limit emotions to categorical labels or cannot generalize to LLM-based architectures. We propose EMORL-TTS (Fine-grained Emotion-controllable TTS with Reinforcement Learning), a framework that unifies global intensity control in the VAD space with local emphasis regulation. Our method combines supervised fine-tuning with reinforcement learning guided by task-specific rewards for emotion category, intensity, and emphasis. We further investigate how emphasis placement modulates fine-grained emotion intensity. Experiments show that EMORL-TTS improves emotion accuracy, intensity differentiation, and emphasis clarity, while preserving synthesis quality comparable to strong LLM-based baselines.
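
As a rough illustration of how rewards for emotion category, intensity, and emphasis might drive the reinforcement learning stage, the sketch below combines three per-sample scores into one reward and then computes group-relative advantages in the style of GRPO (the method named in the "EMORL w/o GRPO" ablation on this page). The weights, the function names, and the weighted-sum form are assumptions made for illustration; this page does not specify how the rewards are actually combined.

```python
from statistics import mean, pstdev

# Hypothetical weights: not taken from the paper.
W_EMOTION, W_INTENSITY, W_EMPHASIS = 1.0, 0.5, 0.5


def combined_reward(emotion: float, intensity: float, emphasis: float) -> float:
    """Weighted sum of three task rewards, each assumed to lie in [0, 1]."""
    return W_EMOTION * emotion + W_INTENSITY * intensity + W_EMPHASIS * emphasis


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of samples drawn for the same prompt.

    Each sample is scored against its own group: subtract the group mean and
    divide by the group standard deviation (population form used here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four candidate utterances sampled for one prompt (scores are made up).
group = [
    combined_reward(1.0, 0.8, 0.9),
    combined_reward(1.0, 0.4, 0.2),
    combined_reward(0.0, 0.9, 0.9),
    combined_reward(0.0, 0.1, 0.3),
]
advantages = group_relative_advantages(group)
```

Samples that beat their group average receive a positive advantage and are reinforced; the rest are pushed down. Giving the emotion term the largest weight is one possible way to reflect a priority on emotion accuracy.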

1. Emotion Accuracy

We test whether the generated speech conveys the target emotion. EMORL-TTS achieves higher accuracy than all compared models.

Sentence 1: The project turned out extremely successful beyond our hopes.

Emotion | CosyVoice2 | EmoSpeech | EMORL w/o GRPO | EMORL
Neutral
Angry
Happy
Sad
Surprise

Sentence 2: That was a truly unforgettable experience for us all.

Emotion | CosyVoice2 | EmoSpeech | EMORL w/o GRPO | EMORL
Neutral
Angry
Happy
Sad
Surprise

2. Emotion Intensity

We evaluate whether weak, medium, and strong emotions can be clearly distinguished. EMORL-TTS provides more fine-grained and robust control than the other compared systems.

Angry: And what are doves. and what are doves.

Intensity | Relative Attribute | EmoSphere++ | EMORL
Weak
Medium
Strong

Happy: They'd never know I'd regular ran away.

Intensity | Relative Attribute | EmoSphere++ | EMORL
Weak
Medium
Strong

Sad: This used to be Jerry's occupation.

Intensity | Relative Attribute | EmoSphere++ | EMORL
Weak
Medium
Strong

Surprise: This used to be Jerry's occupation.

Intensity | Relative Attribute | EmoSphere++ | EMORL
Weak
Medium
Strong
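
One simple way to quantify whether weak, medium, and strong renditions are distinguishable is to check how often a scalar intensity score respects the intended ordering. The sketch below assumes such a score is available for each sample (for example from listener ratings or an intensity predictor). The function name and the pairwise criterion are illustrative and are not the evaluation protocol used for this page.

```python
from itertools import combinations

LEVELS = {"weak": 0, "medium": 1, "strong": 2}


def ordering_accuracy(samples: list[tuple[str, float]]) -> float:
    """Fraction of cross-level pairs whose scores follow weak < medium < strong.

    `samples` holds (intended_level, intensity_score) pairs. Pairs with the
    same intended level are skipped; ties in score count as errors.
    """
    correct = total = 0
    for (level_a, score_a), (level_b, score_b) in combinations(samples, 2):
        rank_a, rank_b = LEVELS[level_a], LEVELS[level_b]
        if rank_a == rank_b:
            continue
        total += 1
        # The sample with the higher intended level should score higher.
        if score_a != score_b and (rank_a < rank_b) == (score_a < score_b):
            correct += 1
    if total == 0:
        raise ValueError("need samples from at least two intensity levels")
    return correct / total
```

A value of 1.0 means every weak sample scored below every medium and strong sample, and every medium sample below every strong one.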

3. Emphasis Accuracy

We check whether emphasis is perceived at the intended words (emphasized words are marked with * on this demo page). EMORL-TTS delivers clearer and more reliable emphasis across emotions than all compared approaches. For the surprise emotion, however, emphasis control is relatively weaker: adding emphasis can cause the utterance to be misclassified as another emotion, and the RL stage prioritizes emotion accuracy as the core reward.

I *chose* the right way.

Emotion | CosyVoice2 | EME-TTS | EMORL
Neutral
Angry
Happy
Sad
Surprise

I chose the *right* way.

Emotion | CosyVoice2 | EME-TTS | EMORL
Neutral
Angry
Happy
Sad
Surprise

She is now *choosing* skirt to wear.

Emotion | CosyVoice2 | EME-TTS | EMORL
Neutral
Angry
Happy
Sad
Surprise

She is now choosing skirt to *wear*.

Emotion | CosyVoice2 | EME-TTS | EMORL
Neutral
Angry
Happy
Sad
Surprise
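
The * markers in the sentences above can be read mechanically. The sketch below splits a marked sentence into its plain text and the zero-based positions of the emphasized words. It assumes each marker pair wraps a single whitespace-delimited word, as in these examples; how EMORL-TTS itself encodes emphasis in its input is not described on this page.

```python
import re

# A token counts as emphasized when, ignoring trailing punctuation,
# it is wrapped in a pair of asterisks.
_MARKED = re.compile(r"^\*(?P<word>[^*\s]+)\*(?P<tail>[^\w*]*)$")


def parse_emphasis(marked: str) -> tuple[str, list[int]]:
    """Return (plain sentence, indices of emphasized words)."""
    words, emphasized = [], []
    for index, token in enumerate(marked.split()):
        match = _MARKED.match(token)
        if match:
            emphasized.append(index)
            token = match.group("word") + match.group("tail")
        words.append(token)
    return " ".join(words), emphasized


plain, positions = parse_emphasis("I chose the *right* way.")
# plain == "I chose the right way."; positions == [3]
```

Keeping the positions separate from the text makes it straightforward to compare the intended emphasis location with the word a listener or detector reports as emphasized.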

4. Quality and Naturalness

We assess MOS and NISQA scores. EMORL-TTS surpasses all baselines in controllability while preserving the high synthesis quality and naturalness characteristic of LLM-based TTS.

Sentences:
1. All smile were real and the happier the more sincere.
2. Monster made a deep bow.
3. They'd never know I'd regular ran away.

Sentence | CosyVoice2 | EmoSpeech | EmoSphere++ | Spark-TTS | EMORL w/o GRPO | EMORL
1
2
3
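
For reference, a mean opinion score (MOS) is the average of listener ratings, usually on a 1 to 5 scale, and is commonly reported with a 95% confidence interval. The sketch below uses a normal approximation for the interval. The rating scale, the approximation, and the function name are assumptions, since the listening-test setup is not detailed on this page.

```python
from math import sqrt
from statistics import mean, stdev


def mos_with_ci(ratings: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (MOS, half-width of the approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings are expected on a 1-5 scale")
    score = mean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return score, half_width
```

With few raters, a t-distribution critical value would be a more careful choice than the fixed `z = 1.96` used here.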

5. Effect of Part-of-Speech Emphasis on Emotion Intensity

As an extended study within EMORL-TTS, we investigate how emphasis placement affects perceived emotion intensity. In EMORL-generated speech, emphasizing adverbs yields the largest increase in emotion intensity.

Sentence: She became extremely emotional after hearing the good news.

Emotion | No Emphasis | *extremely* (Adverb) | *after* (Other) | *hearing* (Verb) | *good* (Adjective) | *news* (Noun)
Angry
Happy
Sad
Surprise
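
The comparison in this section amounts to measuring, for each emphasized part of speech, how much the perceived intensity changes relative to the rendition without emphasis. The sketch below ranks conditions by that gain. The score dictionary is a placeholder that a reader would fill with their own measurements; the key names and the function are hypothetical, and no values from the study are reproduced here.

```python
def rank_emphasis_gains(
    scores: dict[str, float], baseline: str = "none"
) -> list[tuple[str, float]]:
    """Sort emphasis conditions by intensity gain over the no-emphasis baseline.

    `scores` maps a condition label (e.g. "adverb", "verb") to a perceived
    intensity score; `baseline` names the entry recorded without emphasis.
    """
    if baseline not in scores:
        raise KeyError(f"missing baseline condition: {baseline!r}")
    reference = scores[baseline]
    gains = [
        (label, value - reference)
        for label, value in scores.items()
        if label != baseline
    ]
    # Largest gain first.
    return sorted(gains, key=lambda item: item[1], reverse=True)
```

Running this once per emotion gives one ranking per row of the table above, which can then be compared across emotions.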

We will open-source our code upon paper acceptance.
Hope you enjoy this research! ^_^