Publications

See my Google Scholar for more.

* denotes equal contribution

# denotes corresponding author

preprint

Hai Hu*, Ziyin Zhang*, Weifang Huang, Jackie Yan-Ki Lai, Aini Li, Yina Patterson, Jiahui Huang, Peng Zhang, Chien-Jer Charles Lin, Rui Wang. (2023). Revisiting Acceptability Judgements: CoLAC - Corpus of Linguistic Acceptability in Chinese. arXiv, abs/2305.14091. *equal contributions. paper. data.
Liu, Y., Zhang, Z., Zhang, W., Yue, S., Zhao, X., Cheng, X., Zhang, Y., & Hu, H#. (2023). ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models. arXiv, abs/2304.07666. paper. data.

2026

Lin, C., Zhou, H., Hu, H. (2026). Thinking Differently in Chinese and English: The Role of Grammar in Translational Thinking. In: Moser, D. (eds) Thinking in Chinese and English. Springer, Singapore. paper
Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, Leshem Choshen. (2026). BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data. EACL. paper. data.

2025

Liu, Y., Shen, Y., Zhu, H., Xu, L., Qian, Z., Song, S., Zhang, K., Tang, J., Zhang, P., Yang, B., Wang, R., & Hu, H#. (accepted). ZhoBLiMP: a Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese. Transactions of ACL. paper. data.
de Paiva, V., Gao, Q., Hu, H., Kovalev, P., Liu, Y., Moss, L. S., & Qian, Z. (2025). Math Natural Language Inference: this should be easy! In: Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (StarSEM). paper
Liu, Y., Zhang, W., Wang, Y., Tang, J., Zhang, P., Yang, B., Huang, F., Wang, R.#, & Hu, H.# (2025). Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese. In: Proceedings of EMNLP. (Oral) paper
Xinmeng Hou, Lingyue Fu, Chenhao Meng, Kounianhua Du, Wuqi Wang, Hai Hu#. (2025). Train Once for All: A Transitional Approach for Efficient Aspect Sentiment Triplet Extraction. In: Proceedings of EMNLP (Findings) paper
Hu*, Hai, Aini Li*, Yina Patterson, Jiahui Huang, Chien-Jer Charles Lin. (accepted). Bilingual Influences and Sources of Variability in Acceptability Judgments: A Case Study of Chinese. Lingua. data. *equal contributions

2024

Shisen Yue, Siyuan Song, Xinyuan Cheng, Hai Hu#. (2024). Do Large Language Models Understand Conversational Implicature – A case study with a Chinese sitcom. Proceedings of CCL. paper. data. Highlight Paper Award
Jushi Kai, Tianhang Zhang, Hai Hu, Zhouhan Lin. (2024). SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully. In: Proceedings of EMNLP (Findings). paper.
Ziyin Zhang*, Yikang Liu*, Weifang Huang, Junyu Mao, Rui Wang#, Hai Hu#. (2024). MELA: Multilingual Evaluation of Linguistic Acceptability. In Proceedings of ACL. paper. data. *equal contributions.

2023

Li, A., Tamminga, M., & Hu, H. (2023). Intra- and interspeaker repetitiveness in Chengdu Mandarin locative variation. Language Variation and Change, 1-21. doi:10.1017/S095439452300008X paper.
Aikaterini-Lida Kalouli*, Hai Hu*, Alexander F. Webb, Lawrence S. Moss, Valeria de Paiva. (2023). Curing the SICK and other NLI maladies. Computational Linguistics. 49 (1): 199–243. doi: https://doi.org/10.1162/coli_a_00465 *equal contributions. paper. data.

2022

Hu, Hai, Patrícia Amaral and Sandra Kübler (2022). “Word Embeddings and Semantic Shifts in Historical Spanish: Methodological Considerations”. Digital Scholarship in the Humanities. Volume 37, Issue 2, Pages 441–461. https://doi.org/10.1093/llc/fqab050 paper. data and code
Amaral, Patrícia, Hai Hu and Sandra Kübler (2022). “Tracing semantic change with distributional methods: The contexts of algo”. Diachronica. https://doi.org/10.1075/dia.21012.ama

2021

Xu, Liang, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, Hu Yuan, Guoao Wei, Pan Xiang, Xin Tian, Hai Hu. (2021). FewCLUE: A Chinese few-shot learning evaluation benchmark. arXiv preprint arXiv:2107.07498. paper. code.
Hu, Hai, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson (2021). Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference. In: Findings of ACL. paper. data and code.
Hu, Hai and Sandra Kübler. (2021). Investigating Translated Chinese and Its Variants Using Machine Learning. In Natural Language Engineering. Volume 27, Issue 3, May 2021, pp. 339–372. https://doi.org/10.1017/S1351324920000182 paper. code.

2020

Hu, Hai, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Sandra Kübler, and Chien-Jer Charles Lin (2020). “Building a Literary Treebank for Translation Studies in Chinese”. In: Proceedings of 19th International Workshop on Treebanks and Linguistic Theories (TLT). pp. 18-31. paper.
Xu, Liang, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan (2020). CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics (COLING). pp. 4762–4772. paper. website. github page
Hu, Hai, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, and Larry Moss. (2020). OCNLI: Original Chinese Natural Language Inference. In: Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 3512–3526. paper. code and data. leaderboard.
Richardson, Kyle, Hai Hu, Larry Moss, and Ashish Sabharwal. (2020). Probing Natural Language Inference Models through Semantic Fragments. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence. pp. 8713-8721. paper. code and data.
Hu, Hai, Qi Chen, Kyle Richardson, Atreyee Mukherjee, Lawrence S Moss, and Sandra Kuebler. (2020). MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity. In: Proceedings of the Society for Computation in Linguistics 2020. pp. 319-329. paper. poster. code.
Li, Junyi, Hai Hu, Xuanwei Zhang, Minglei Li, Lu Li, and Liang Xu. (2020). Light Pre-Trained Chinese Language Model for NLP Tasks. In: CCF International Conference on Natural Language Processing and Chinese Computing. Springer. pp. 567-578. paper.

2019

Hu*, Hai, Wen Li*, He Zhou*, Zuoyu Tian, Yiwen Zhang and Liang Zou. (2019). Ensemble Methods to Distinguish Mainland and Taiwan Chinese. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects at NAACL 2019, pp. 165–171. Minneapolis, MN, USA. paper. * equal contributions
Hu, Hai, Qi Chen and Larry Moss. (2019). Natural Language Inference with Monotonicity. In Proceedings of the 13th International Conference on Computational Semantics (IWCS 2019), pp. 8–15. Gothenburg, Sweden. paper.

2018

Hu, Hai, Wen Li, and Sandra Kübler. (2018). Detecting Syntactic Features of Translated Chinese. In Proceedings of the 2nd Workshop on Stylistic Variation at NAACL 2018, pp. 20-28. New Orleans, Louisiana, USA. paper. slides. video presentation.
Hu, Hai, and Lawrence S. Moss. (2018). Polarity Computations in Flexible Categorial Grammar. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics: *SEM at NAACL 2018, pp. 124–129. New Orleans, Louisiana, USA. paper. poster. code.
Hu, Hai, Thomas Icard and Larry Moss. (2018). Automated Reasoning from Polarized Parse Trees. In Proceedings of the Fifth Workshop on Natural Language and Computer Science. Oxford, England. pp. 1-11. paper.
Lin, Chien-Jer Charles, & Hu, Hai. (2018). Linking comprehension and production: Frequency distribution of Chinese relative clauses in the Sinica Treebank. In Chu-Ren Huang, Shukai Hsieh, & Peng Jin (eds.) Text, Speech, and Language Technology Series. Springer. pp. 1-21.

2017

Hu, Hai and Yiwen Zhang. (2017). Path of Vowel Raising in Chengdu Dialect of Mandarin. In Proceedings of the 29th North American Conference on Chinese Linguistics. Rutgers, NJ. pp. 481-498. paper. abstract.
Hu, Hai, Daniel Dakota, and Sandra Kübler. (2017). Non-Deterministic Segmentation for Chinese Lattice Parsing. In Proceedings of Recent Advances in Natural Language Processing 2017, pp. 316–324. Varna, Bulgaria. paper. bib.

2016

Hu, Hai (2016). Is China entering WTO or shijie maoyi zuzhi–A Corpus-based Study of English Acronyms in Chinese Newspapers. In: Proceedings of 28th North American Conference on Chinese Linguistics. Provo, Utah. paper. abstract.
Cavar, Damir, Lwin Moe, Hai Hu, and Kenneth Steimel. (2016). Preliminary Results from the Free Linguistic Environment Project. In: Joint 2016 Conference on Head-driven Phrase Structure Grammar and Lexical Functional Grammar (HeadLex 2016), pp. 161–181. Warsaw, Poland. paper.

Hai Hu
