LLM-SR aims to generate a controllable and interpretable reasoning process by employing step-by-step inferences. In this task, we focus on a fine-grained analysis of the Chain-of-Thought (CoT) process, which enables a more detailed evaluation of LLMs and contributes to Process Reward Modeling, thereby enhancing the generation of more coherent and accurate reasoning processes.
To achieve this, the task requires generating "question_parsing" and "cot_parsing" results based on the content of "question" and "cot" (produced by Llama-3-8B-Instruct) for each given question. The question parsing process involves extracting all conditions necessary for solving the question. The CoT parsing process identifies all "statements" and their corresponding "evidence" within the context of the question conditions and the given CoT content. Subsequently, for each extracted statement-evidence pair, a conclusion is required to determine whether the evidence sufficiently supports the statement.
To focus on the LLM's capacity for fine-grained question analysis and deduction from given conditions, we provide only 24 training examples, which illustrate the data format and question types. Furthermore, participants may use only Llama-3-8B-Instruct as their backbone model.
We introduce a fine-grained Chain-of-Thought (CoT) analysis dataset derived from LogiQA, comprising 24 annotated examples that constitute the training set. Each example is supplemented with question parsing and CoT parsing annotations.
A data sample:
{
  "question": "There are 7 outstanding students G, H, L, M, U, W and Z in a school.During the summer vacation, the school will send them to the United Kingdom and the United States for inspection.The school has only 7 students participating in this activity, and each person happens to go to one of these two countries.Considering the specialty of each student, this activity must meet the following conditions? (1) If G goes to the UK, then H To the United States.(2) If L goes to the UK, both M and U go to the US.(3) The country W went to was different from the country Z went to.(4) The country where U goes is different from the country where G goes.(5) If Z goes to the UK, then H also goes to the UK.\nIf G goes to the United States, which of the following must be true?\nA.H go to the UK\nB.L go to America\nC.M go to the UK\nD.W go to America",
  "question_parsing": [
    "There are 7 outstanding students G, H, L, M, U, W and Z in a school.During the summer vacation, the school will send them to the United Kingdom and the United States for inspection.",
    "each person happens to go to one of these two countries",
    "If G goes to the UK, then H To the United States",
    "If L goes to the UK, both M and U go to the US",
    "The country W went to was different from the country Z went to",
    "The country where U goes is different from the country where G goes",
    "If Z goes to the UK, then H also goes to the UK",
    "G goes to the United States"
  ],
  "answer": "b",
  "id": 162,
  "cot": "Since G goes to the United States, we need to analyze the conditions that follow. Condition (1) is not applicable since G is going to the US. Condition (2) is also not applicable since L's destination is not specified. Condition (3) does not provide any information about H, M, U, or W. Condition (4) states that U's destination is different from G's, which is the US, so U must go to the UK. Condition (5) is not applicable since Z's destination is not specified.",
  "cot_parsing": [
    {
      "statement": "Condition (1) is not applicable",
      "evidence": "Condition (1): If G goes to the UK, then H To the United States. | G is going to the US",
      "Verification": "false"
    },
    {
      "statement": "Condition (2) is also not applicable",
      "evidence": "Condition (2): If L goes to the UK, both M and U go to the US. | L's destination is not specified",
      "Verification": "false"
    },
    {
      "statement": "Condition (3) does not provide any information about H, M, U, or W",
      "evidence": "Condition (3): The country W went to was different from the country Z went to.",
      "Verification": "false"
    },
    {
      "statement": "U must go to the UK",
      "evidence": "Condition (4): The country where U goes is different from the country where G goes. | Condition (4) states that U's destination is different from G's, which is the US",
      "Verification": "true"
    },
    {
      "statement": "Condition (5) is not applicable",
      "evidence": "Condition (5): If Z goes to the UK, then H also goes to the UK. | Z's destination is not specified",
      "Verification": "true"
    }
  ],
  "sel_idx": 92
}
If the "statement" can be logically deduced from the "evidence", the "Verification" field is "true"; otherwise it is "false".
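The record layout shown in the data sample can be checked mechanically before training or submission. The following is a minimal sketch, not an official validator: the helper names `check_sample` and `split_evidence` are hypothetical, the required keys are inferred from the single sample above, and treating `|` as the separator between evidence pieces is an assumption based on how the sample's "evidence" strings are written.

```python
# Keys observed in the data sample; "sel_idx" is left optional here because
# its meaning is not described in the task text (assumption).
REQUIRED_KEYS = {"question", "question_parsing", "answer", "id", "cot", "cot_parsing"}
STEP_KEYS = {"statement", "evidence", "Verification"}


def check_sample(sample: dict) -> list[str]:
    """Return a list of problems found in one annotated sample (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - sample.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not all(isinstance(c, str) for c in sample.get("question_parsing", [])):
        problems.append("question_parsing must be a list of strings")
    for i, step in enumerate(sample.get("cot_parsing", [])):
        if set(step) != STEP_KEYS:
            problems.append(f"cot_parsing[{i}] has keys {sorted(step)}")
        elif step["Verification"] not in ("true", "false"):
            # The sample stores the label as the strings "true" / "false".
            problems.append(f"cot_parsing[{i}] has an invalid Verification value")
    return problems


def split_evidence(evidence: str) -> list[str]:
    """Split an evidence string into its individual pieces (assumed '|' separator)."""
    return [part.strip() for part in evidence.split("|")]
```

Calling `check_sample` on each record of the training file and printing any non-empty result is one way to catch missing fields or a misspelled "Verification" key early.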
The datasets are available on Google Drive.
The challenge task definition and metrics will be released as soon as possible.
$$ Accuracy = \frac{TP+TN}{ TP + FN+ TN + FP} $$
$$ P = \frac{TP}{ TP + FP } $$
$$ R = \frac{TP}{TP + FN} $$
$$ F1 = \frac{2 \times P \times R }{ P + R } $$
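As a worked illustration of these metrics, the sketch below computes them from raw confusion counts. It is not the official scorer: the function names are hypothetical, accuracy uses the standard definition (correct decisions, TP + TN, over all cases), returning 0.0 when a denominator is zero is an assumed convention, and how predicted items are matched against gold annotations to obtain the counts is not specified here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute P, R and F1 from true-positive, false-positive and false-negative counts."""
    # Zero denominators fall back to 0.0 (an assumed convention, not from the task text).
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard accuracy: correct decisions (TP + TN) divided by all decisions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

For example, with 8 true positives, 2 false positives and 2 false negatives, precision and recall are both 8/10, so F1 is also 0.8.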
Baseline code and full data will be published as soon as possible.
Our challenge investigates the structural reasoning capabilities of large language models (LLMs), particularly in low-resource settings, using the 24-example training set derived from LogiQA.
Participants may use the provided training set to develop their own structural reasoning models. Additional training data is permitted. Participants must apply their trained models to the test set and submit their predictions in the specified format.
To be considered valid, a submission must include all of the following files:
Please submit the predicted results as a JSON file named "results.json". Each entry should follow the format of the data sample shown earlier.
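Writing the submission file can be sketched as follows. This is only an illustration under stated assumptions: the top-level structure of "results.json" is assumed to be a JSON list with one object per test question, and the helper names `write_results` and `load_results` are hypothetical, not part of any released baseline.

```python
import json


def write_results(predictions: list[dict], path: str = "results.json") -> None:
    """Serialize predicted records to a JSON file (assumed: one list of objects)."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII characters readable in the file.
        json.dump(predictions, f, ensure_ascii=False, indent=2)


def load_results(path: str = "results.json") -> list[dict]:
    """Read the file back, e.g. to confirm it parses before uploading."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Reading the file back with `load_results` before uploading is a cheap way to confirm that it parses as valid JSON.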
Please note: The submission deadline is at 11:59 p.m. (Anywhere on Earth) of the stated deadline date.
| Event | Date |
| --- | --- |
| Training data and participant instruction release for all shared tasks | February 10, 2025 |
| Evaluation deadline for all shared tasks | March 30, 2025 |
| Notification of all shared tasks | April 5, 2025 |
| Shared-task paper submission deadline | April 12, 2025 |
| Acceptance notification of shared-task papers | April 30, 2025 |
| Camera-ready paper deadline | May 16, 2025 |
Top-ranked participants in this competition will receive a certificate of achievement and will be recommended to write a technical paper for submission to the XLLM Workshop of ACL 2025.
Zixia Jia (Beijing Institute for General Artificial Intelligence, Beijing, China)
Zilong Zheng (Beijing Institute for General Artificial Intelligence, Beijing, China)
Shuyi Zhang (Beijing Institute for General Artificial Intelligence, Beijing, China)
Zhenbin Chen (Beijing Institute for General Artificial Intelligence, Beijing, China)