Grants & Datasets

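ArguGPT detector: (2023) A detector for ChatGPT-written English essays: predicts the probability that an English argumentative essay was generated by ChatGPT (Hugging Face link)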
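[New!] ZhoBLiMP: a Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese. Chinese minimal pairs covering 15 major linguistic phenomena and 118 fine-grained sub-phenomena, plus 20 Chinese language models pretrained from scratch (14M to 1.4B parameters)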
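MELA: (2024) Multilingual Evaluation of Linguistic Acceptability; a multilingual syntactic acceptability dataset covering 10 languages: English, Chinese, Russian, Italian, German, Spanish, Japanese, French, Arabic, and Icelandic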
CoLAC: (2023) Corpus of Linguistic Acceptability in Chinese; a Chinese syntactic acceptability dataset
SwordsmanImp: (2024) A benchmark for pragmatic understanding in Chinese based on the sitcom 《武林外传》 (My Own Swordsman); a dataset of implied meanings (conversational implicatures)
Cured SICK: (2023) Re-annotated SICK dataset
ChineseNLIProbing: (2021) Multiple probing datasets for Chinese NLI, including Chinese HANS, expanded diagnostics, etc.
OCNLI: (2020) Original Chinese Natural Language Inference; a natively constructed Chinese natural language inference dataset
CLUE: (2020) Chinese Language Understanding Evaluation (CLUE) benchmark
FewCLUE: (2021) Few-shot CLUE benchmark; a few-shot learning evaluation built on CLUE

Workshop at WESSLLI 2020-2023 (webpage)

Hai Hu