Yuxin Li, Cheng Ma, Tao Zhang
School of Economics and Management, Tsinghua University; Renmin University of China
We benchmark four Chinese-capable language models (ChatGLM-3, Qwen-2, BERT-wwm-Chinese, and FinBERT-CN) for sentence-level sentiment extraction on a hand-labelled corpus of 14,000 earnings-call transcripts from Shanghai- and Shenzhen-listed firms (2018-2024). Qwen-2 with chain-of-thought prompting achieves the highest F1 (0.847) on the five-class sentiment task. Downstream, sentiment-weighted portfolios constructed from Qwen-2-extracted signals earn a CAPM-adjusted alpha of 6.3% annualised over 2022-2023.
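A CAPM-adjusted alpha is conventionally the intercept of a time-series regression of portfolio excess returns on market excess returns. The sketch below illustrates that calculation under assumptions that are not taken from the paper. The helper name `capm_alpha` is hypothetical. The return frequency is assumed to be monthly, so `periods_per_year=12`. The alpha is annualised by simple multiplication. The sentiment-weighting step that produces the portfolio return series is not reproduced.

```python
from statistics import fmean


def capm_alpha(portfolio_returns, market_returns, risk_free, periods_per_year=12):
    """Hypothetical helper: OLS estimate of CAPM alpha and beta.

    Regresses portfolio excess returns on market excess returns and returns
    (annualised_alpha, beta). Assumes simple (arithmetic) annualisation,
    i.e. per-period alpha * periods_per_year; monthly data is assumed by default.
    """
    n = len(portfolio_returns)
    if not (n == len(market_returns) == len(risk_free)):
        raise ValueError("return series must have equal length")
    if n < 3:
        raise ValueError("need at least three observations")

    # Excess returns over the risk-free rate.
    y = [p - f for p, f in zip(portfolio_returns, risk_free)]
    x = [m - f for m, f in zip(market_returns, risk_free)]

    # Single-regressor OLS: beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x).
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("market excess returns have zero variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    beta = sxy / sxx
    alpha_per_period = y_bar - beta * x_bar
    return alpha_per_period * periods_per_year, beta
```

Published alphas are usually reported with autocorrelation- and heteroskedasticity-robust standard errors (for example, Newey-West). This sketch omits them and returns only the point estimates.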