DEFT2011


Training corpus

Available from February 21st. Access to the corpora is protected by a login and password.

Task 1. Diachronic variation

This corpus consists of extracts from newspaper archives, each of 300 or 500 words (depending on the track), in a single XML file. The metadata indicate the year to be identified (see the presentation of the corpora).

Task 2. Abstract/article pairing

This corpus consists of 300 documents, split into 300 abstracts (sub-directory "res/*.res") and either 300 full articles (sub-directory "art/*.art") or 300 articles without their introduction and conclusion (sub-directory "txt/*.txt").
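A minimal Python sketch of loading one track of this corpus into memory, assuming the sub-directories above sit under a common root directory and that the files are plain text in UTF-8 (the encoding is an assumption; pass another one if the distributed files differ). The helpers `load_subdir` and `load_corpus` are hypothetical names, not part of the DEFT distribution. Files are keyed by file name rather than matched by number, since the abstract/article correspondence is given by a separate reference file.

```python
from pathlib import Path


def load_subdir(root, subdir, ext, encoding="utf-8"):
    """Return {file name: text} for every *.ext file in root/subdir.

    ASSUMPTION: files are plain text in the given encoding (UTF-8 by default).
    """
    folder = Path(root) / subdir
    return {
        path.name: path.read_text(encoding=encoding)
        for path in sorted(folder.glob(f"*.{ext}"))
    }


def load_corpus(root, use_full_articles=True, encoding="utf-8"):
    """Load the abstracts (res/) plus either full articles (art/) or
    articles without introduction and conclusion (txt/)."""
    abstracts = load_subdir(root, "res", "res", encoding)
    if use_full_articles:
        articles = load_subdir(root, "art", "art", encoding)
    else:
        articles = load_subdir(root, "txt", "txt", encoding)
    return abstracts, articles
```

Keeping the abstracts and the articles in two separate dictionaries leaves the pairing itself to the system under test, which is what the task asks for.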

The 300 documents are named with a numeric identifier between 001 and 300. The correspondence between each abstract file and its article file, or between each abstract file and its text file, is given in a separate file ("log_reference_appr.txt"); this correspondence serves as the reference for the abstract/article pairing.
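The reference file can be used to score a candidate pairing. The sketch below assumes, hypothetically, that each line of "log_reference_appr.txt" holds an abstract file name followed by the matching article or text file name, separated by whitespace; check the actual layout before relying on it. `read_reference` and `pairing_accuracy` are illustrative names, and the plain accuracy computed here is not necessarily the official DEFT 2011 evaluation measure.

```python
def read_reference(path, encoding="utf-8"):
    """Parse a pairing file into {abstract file: article or text file}.

    ASSUMPTION: one pair per line, abstract name first, fields separated
    by whitespace. The real file layout may differ.
    """
    pairs = {}
    with open(path, encoding=encoding) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                pairs[fields[0]] = fields[1]
    return pairs


def pairing_accuracy(reference, predicted):
    """Fraction of reference abstracts whose predicted partner is correct."""
    if not reference:
        return 0.0
    correct = sum(
        1 for abstract, article in reference.items()
        if predicted.get(abstract) == article
    )
    return correct / len(reference)
```

An abstract missing from `predicted` simply counts as wrong, so a system that leaves some abstracts unpaired is penalised rather than skipped.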


Test corpus

Available from April 4th.

Task 1. Diachronic variation

This corpus consists of extracts from newspaper archives, each of 300 or 500 words (depending on the track), in a single XML file.

Task 2. Abstract/article pairing

This corpus consists of 198 documents, split into 198 abstracts (sub-directory "res/*.res") and either 198 full articles (sub-directory "art/*.art") or 198 articles without their introduction and conclusion (sub-directory "txt/*.txt").

The 198 documents are named with a numeric identifier between 001 and 198. The correspondence between each abstract file and its article file, or between each abstract file and its text file, is given in a separate file ("log_reference_appr.txt"); this correspondence serves as the reference for the abstract/article pairing.