Publications
2023
- Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. In Proc. Interspeech 2023, 2023
- Integrating Lattice-Free MMI Into End-to-End Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023
- Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks. In The Eleventh International Conference on Learning Representations, 2023
- Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. arXiv preprint arXiv:2309.13876, 2023
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation. arXiv preprint arXiv:2310.00704, 2023
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data. arXiv preprint arXiv:2309.13905, 2023
- HiFi-Codec: Group-Residual Vector Quantization for High Fidelity Audio Codec. arXiv preprint arXiv:2305.02765, 2023
- Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. arXiv preprint arXiv:2309.15800, 2023
2022
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR. In Proc. Interspeech 2022, 2022
- Consistent Training and Decoding for End-to-End Speech Recognition Using Lattice-Free MMI. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
- Improving Mandarin End-to-End Speech Recognition With Word N-Gram Language Model. IEEE Signal Processing Letters, 2022
- Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction. In Proc. Interspeech 2022, 2022
2020
- A Random Gossip BMUF Process for Neural Language Modeling. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020