DEFT'09


Description of corpora

The corpora were compiled from two separate sources: debates in the European Parliament and a collection of articles from the newspapers Le Monde (France), The Financial Times (United Kingdom) and Il Sole 24 Ore (Italy).

"Parliament" Corpus

This corpus contains 32,289 interventions by representatives, delivered in the Parliament between 1999 and 2004. Only interventions by representatives belonging to one of the following five parties were extracted:

Examples

"Newspaper" Corpus

The "objective" and "subjective" values were assigned to articles in a different way for each newspaper.

Examples "Le Monde"

Examples "The Financial Times"

Examples "Il Sole 24 Ore"

Format of the corpora

The corpora are in XML format, and a DTD is available (last update: 01/12/08).

The lists of values for article properties (Task 1) and political parties (Task 3) are as follows:

Reference files are available for Tasks 1 and 3, so learning-based approaches can be developed. For Task 2, by contrast, there are no reference files.

Examples

Downloading the corpora

The corpora were converted to a uniform UTF-8 encoding. Any spelling or punctuation errors were left uncorrected.
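Since the files are stated to be UTF-8 throughout, a loader can decode them strictly and fail early on anything that is not valid UTF-8. The following is a minimal sketch; the function name `read_corpus_text` is hypothetical and not part of the DEFT'09 distribution.

```python
from pathlib import Path


def read_corpus_text(path):
    """Return the content of a corpus file decoded as UTF-8.

    Hypothetical helper: strict decoding raises UnicodeDecodeError
    if the file contains bytes that are not valid UTF-8.
    """
    raw = Path(path).read_bytes()
    return raw.decode("utf-8", errors="strict")
```

Strict decoding is chosen here so that a wrongly encoded file is reported rather than silently altered. No spelling or punctuation is normalised.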

The user ID and password required to access the corpora will be sent to those who have registered and have signed and returned the agreement.

Training Corpus

Update: 09/01/21.

Test Corpus

Update: 09/03/18.

Reference data

F-measure Perl scripts: Tasks 1 and 3

Update: 09/05/05.
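For readers who want to check their own output against the reference data before running the official scripts, the F-measure can be sketched as follows. This is an illustration in Python, not a port of the Perl scripts: the function name `macro_f_measure`, the dictionary input format, and the choice of averaging per-class F-measures are all assumptions, and the official scripts may average differently.

```python
from collections import Counter


def macro_f_measure(reference, predicted, beta=1.0):
    """Average per-class F-measure over the classes in the reference.

    reference, predicted: dicts mapping a document id to a label
    (assumed input format, not the official file format).
    A document missing from `predicted` counts as a miss for its class.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for doc_id, ref_label in reference.items():
        pred_label = predicted.get(doc_id)
        if pred_label == ref_label:
            tp[ref_label] += 1
        else:
            fn[ref_label] += 1
            if pred_label is not None:
                fp[pred_label] += 1

    scores = []
    for label in sorted(set(reference.values())):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = beta ** 2 * precision + recall
        # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
        f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        scores.append(f)
    return sum(scores) / len(scores) if scores else 0.0
```

With four documents, two per class, and one "objective" document predicted as "subjective", the per-class scores are 2/3 and 0.8, giving a macro-average of about 0.733.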