Frequently asked questions
- I have looked at the training data and expected to find some English texts as well, since the presentation page (http://deft2011.limsi.fr/index.php?id=4&lang=en) says: "Two languages are proposed: French, with articles from the humanities; English, with articles from health." However, the texts are only in French. Does this mean that no English articles will be included in the challenge?
→ Unfortunately, I did not have time to prepare the English corpora, so only French is available this year. We will include English articles for DEFT2012.
- Thank you for sending us the French corpus. We work extensively on mining research publications and are happy to work on your data set. However, all our modules were coded for English corpora, and we are now adapting the algorithms to process French. Could you provide a parallel English corpus for the French research publications corpus, so that we can verify our algorithms and tune them for French?
→ Unfortunately, due to lack of time, we do not have any English corpus available yet. The English corpora we were looking for are neither parallel nor comparable to the French one: the French corpora deal with the humanities, while the English corpora we began to gather deal with health. I do not see which parallel corpus I could find to help you.
- The file "132.res" I have is in Greek (Systran translates the first sentence as follows: "L'étude des travaux intelligibles pendant la réalisation du travail de traduction constitue défi important pour les experts à la didactique"). Nevertheless, the corresponding file is written in French. Do I have to remove it from the corpus, or did I make an error?
→ You did not make an error: this Greek file is indeed in the corpus (it is the only such file among all of them, and we did not detect it while building the corpus). I confirm that it is paired with the article 174.art and the text 209.txt, which are written in French. Note that there is no Greek file in the test corpus.
- I noticed that the name of the journal appears in the abstracts and articles. I am wondering about the status of this information. Can the process use it to divide the number of possible candidates by 5 (or by 6 for the test corpus)?
→ I confirm that you can use this information in your approach; it will be available in all files in the test corpus.
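As an illustration only, one possible way to use the journal (review) name is to restrict the candidate articles for each abstract to those from the same journal. The sketch below is not part of the challenge material: the function name `candidates_by_journal`, the two input dictionaries, and the file and journal names in the usage note are all hypothetical, and it assumes the journal name has already been extracted from each file by some other means.

```python
from collections import defaultdict


def candidates_by_journal(abstract_journal, article_journal):
    """Map each abstract id to the article ids that share its journal name.

    Both arguments are hypothetical dictionaries {file id: journal name},
    assumed to have been built beforehand from the corpus files.
    """
    # Group the article ids by journal name.
    articles_in = defaultdict(list)
    for article_id, journal in article_journal.items():
        articles_in[journal].append(article_id)

    # For each abstract, keep only the articles from the same journal.
    return {
        abstract_id: sorted(articles_in.get(journal, []))
        for abstract_id, journal in abstract_journal.items()
    }
```

For example, with made-up inputs `{"1.res": "Journal A", "2.res": "Journal B"}` for the abstracts and `{"10.art": "Journal A", "11.art": "Journal B", "12.art": "Journal A"}` for the articles, the abstract "1.res" keeps only "10.art" and "12.art" as candidates. An abstract whose journal name matches no article gets an empty candidate list, so a real system would probably need a fallback for names that are spelled differently across files.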