Overview: Document-level Information Extraction Task.
We introduce the Document-level Information Extraction (DocIE) challenge on this platform. The goal of DocIE is to identify entities, their corresponding mentions, and the relationships between entity pairs within long, unstructured documents. This challenge requires models to process an input document, which consists of a sequence of sentences, and produce output structures that include three key elements: 1) sets of mentions, where each set corresponds to a distinct entity; 2) entity types; and 3) relation triples, which describe the relationships between pairs of entities.
The distribution of domains and source datasets within DocIE, which spans 34 domains, is shown in the figure above. DocIE is a comprehensive dataset covering a wide range of domains and source datasets, making it a valuable resource for researchers and practitioners in document-level information extraction.
Figure: Distribution of different tasks, domains, and source datasets within DocIE.
An example data instance:
{ "domain": "Culture", "title": "The_Triple_Package", "doc": "The Triple Package: How Three Unlikely Traits Explain the Rise and Fall of Cultural Groups in America is a book published in 2014 by two professors at Yale Law School, Amy Chua and her husband, Jed Rubenfeld. Amy Chua is also the author of the 2011 international bestseller, Battle Hymn of the Tiger Mother.\nAccording to the preface, the authors find that \"certain groups do much better in America than others\u2014as measured by various socioeconomic indicators such as income, occupational status, job prestige, test scores, and so on\u2014 [which] is difficult to talk about. In large part this is because the topic feels racially charged.\" Nevertheless, the book attempts to debunk racial stereotypes by focusing on three \"cultural traits\" that attribute to success in the United States.\nFollowing Battle Hymn of the Tiger Mother in 2011, Chua wrote this book with her husband Jed Rubenfeld after observing a more prevalent trend of students from specific ethnic groups achieving better academic results than other ethnic groups. For example, ..........", "triplets": [ { "subject": "The Triple Package", "relation": "ReviewedBy", "object": "Colin Woodard" }, { "subject": "The Triple Package", "relation": "Creator", "object": "Jed Rubenfeld" }, ... ... ], "entities":[ { "id": 0, "mentions": [ "the United States", "America", "U.S.", "American", "the successful groups in the United States", "the rest in America", "the national average", "the American dream", "UK" ], "type": "GPE" }, { "id": 1, "mentions": [ "Yale Law School", "Yale" ], "type": "ORG" }, ... ... ], "label_set": ["ReviewedBy","NominatedFor","InfluencedBy","AwardReceived","HasWorksInTheCollection","Creator","PresentedIn","EthnicGroup","PublishedIn","Affiliation","OwnerOf","InvestigatedBy","CitesWork","HasPart","Acknowledged","DifferentFrom","Follows"], "entity_label_set":['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART'] }
The datasets are available on Google Drive.
Task 1: Named Entity Recognition.
Task-1: Named Entity Recognition (NER) involves identifying named entities within a given text and classifying them into appropriate categories. Participants are expected to develop models that accurately extract both the entities and their corresponding types. Unlike traditional sentence-level NER tasks, this task requires participants to identify all mentions of each entity within the entire document.
Task 2: Relation Extraction.
Task-2: Relation Extraction (RE) involves identifying the relations between entities within a given text. Participants are expected to develop models that accurately extract both entity pairs and their relation types. Unlike traditional sentence-level RE tasks, this task requires participants to identify all relations of each entity pair within the entire document.
$$ P = \frac{TP}{ TP + FP } $$
$$ R = \frac{TP}{TP + FN} $$
$$ F1 = \frac{2 \times P \times R }{ P + R } $$
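These formulas translate directly into code; a small illustrative helper (the zero-denominator guards are a common convention, not part of the formulas themselves):

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from TP, FP, and FN counts."""
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1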
We will use the F1 score as the main evaluation metric. However, the criteria for judging whether a sample is correctly predicted differ between the two settings (strict mode and general mode), which we introduce in detail below.
The named entity recognition task can be divided, at a finer granularity, into Entity Identification (EI) and Entity Type Classification (EC). To evaluate the capabilities of models more systematically, we define separate evaluation metrics for these two aspects:
Example of the named entity recognition task. When an entity mention set is judged to be correctly predicted, the value of \(TP_{\text{EI}}\) or \(TP_{\text{EC}}\) increases by 1. The final value of \(TP_{\text{EI}}\) or \(TP_{\text{EC}}\) therefore represents the number of correctly extracted entities.
Participants are required to correctly extract all entity mentions from the given text. An entity is considered correctly identified only if its predicted mention set matches the ground truth, and \(TP_{\text{EI}}\) is the number of correctly identified entities. \(F1_{\text{EI}}\), computed with the F1 formula above, serves as the evaluation metric for this subtask.
Participants are required to classify all predicted mentions into the correct entity types. An entity is considered correctly classified only if all of its mentions are correctly identified and assigned the correct type, and \(TP_{\text{EC}}\) is the number of correctly classified entities. \(F1_{\text{EC}}\), computed with the same formula, serves as the evaluation metric for this subtask.
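To make the two subtasks concrete, here is a minimal sketch of how \(TP_{\text{EI}}\) and \(TP_{\text{EC}}\) could be counted. The function name and matching details are assumptions, not the official scorer: it assumes order-insensitive exact matching of mention sets and one-to-one matching between predicted and gold entities, and the official scorer may differ in details such as string normalization.

def count_entity_tp(gold_entities, pred_entities):
    # Each entity is a dict with "mentions" (list of strings) and "type",
    # as in the dataset sample above.
    tp_ei, tp_ec = 0, 0
    unmatched = list(gold_entities)
    for pred in pred_entities:
        pred_mentions = frozenset(pred["mentions"])
        for gold in unmatched:
            if frozenset(gold["mentions"]) == pred_mentions:
                tp_ei += 1  # mention set identified correctly
                if gold["type"] == pred["type"]:
                    tp_ec += 1  # ... and classified with the correct type
                unmatched.remove(gold)  # each gold entity matches at most once
                break
    return tp_ei, tp_ec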
The evaluation of relation extraction is divided into two settings: general mode and strict mode.
Example of the relation extraction task in general mode and strict mode. When a triple is judged to be correctly predicted, the value of \(TP_{\text{REG}}\) increases by 1. The final value of \(TP_{\text{REG}}\) therefore represents the number of correctly predicted triplets.
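As an illustration, here is a sketch of counting \(TP_{\text{REG}}\) under the assumption that general mode scores exact (subject, relation, object) matches; the strict-mode criterion is defined by the official scorer and is not reproduced here.

def count_triple_tp_general(gold_triplets, pred_triplets):
    # Triplets are dicts with "subject", "relation", and "object" keys,
    # as in the dataset sample above. Using sets means duplicate
    # predictions are not double-counted.
    gold = {(t["subject"], t["relation"], t["object"]) for t in gold_triplets}
    pred = {(t["subject"], t["relation"], t["object"]) for t in pred_triplets}
    return len(gold & pred)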
Baseline code and the full dataset will be published as soon as possible.
Our challenge seeks to investigate the domain transfer capabilities of large language models (LLMs) in document-level information extraction (DocIE) tasks, particularly in low-resource settings. To this end, we present a dataset that encompasses document data from 34 distinct domains. The dataset is divided as follows: 5 domains are designated for training, 2 domains for validation, and the remaining 27 domains are allocated as test sets. Each domain dataset consists of 8 to 10 documents.
Participants may utilize the provided training set to develop their own information extraction models and make predictions on the test set. It is important to note that the use of additional data for training is permitted. Participants are required to apply their trained models to generate predictions on the test set and present the results in the specified format.
Participants need to successfully submit all of the following files for a submission to be considered valid:
Please submit the predicted results as a single JSON file named "results.json".
[ { "id": 0, "domain": "DOMAIN_NAME", # The domain of the text # Participants need to form the prediction results into the following structure # Task 1: Named Entity Recognition "entities_output": [ { "mentions":["MENTION_1", "MENTION_2", ...], # the mentions of the entity "type": "ENTITY_TYPE_1" # the type of the entity }, { "mentions":["MENTION_1", "MENTION_2", ...], # the mentions of the entity "type": "ENTITY_TYPE_2" # the type of the entity }, ... ], # Task 2: Relation Extraction "triples_output": [ { "head": "SUBJECT_1", # The subject of the relation triplet "relation": "RELATION_1", # The relation of the relation triplet "tail": "OBJECT_1" # The object of the relation triplet }, { "head": "SUBJECT_2", # The subject of the relation triplet "relation": "RELATION_2", # The relation of the relation triplet "tail": "OBJECT_2" # The object of the relation triplet } ], }, { "id": 1, ... }, ... ]
Please note: The submission deadline is at 11:59 p.m. (Anywhere on Earth) of the stated deadline date.
| Event | Date |
| --- | --- |
| Training data and participant instruction release for all shared tasks | February 10, 2025 |
| Evaluation deadline for all shared tasks | March 30, 2025 |
| Notification of all shared tasks | April 5, 2025 |
| Shared-task paper submission deadline | April 12, 2025 |
| Acceptance notification of shared-task papers | April 30, 2025 |
| Camera-ready paper deadline | May 16, 2025 |
Top-ranked participants in this competition will receive a certificate of achievement and will be recommended to write a technical paper for submission to the XLLM Workshop of ACL 2025.
Zixia Jia (Beijing Institute for General Artificial Intelligence, Beijing, China)
Zilong Zheng (Beijing Institute for General Artificial Intelligence, Beijing, China)
Shuyi Zhang (Beijing Institute for General Artificial Intelligence, Beijing, China)
Zhenbin Chen (Beijing Institute for General Artificial Intelligence, Beijing, China)