Alison V. Callahan; Michel Dumontier. |
POS tagging is the first step in many NLP workflows, yet the accuracy of tag assignment frequently goes unchecked. We hypothesize that changing the training corpora for a parser will affect its POS tagging of a target corpus. To test this, we train the Charniak-Lease parser on the WSJ corpus and on two biomedical corpora and evaluate its output against MedPost, a POS tagger with a reported 97% accuracy on biomedical text. Our findings indicate that using biomedical training corpora significantly improves performance, but that minor differences between the biomedical training corpora have a significant effect on the correctness of POS tagging. Specifically, the tagging of hyphenated words and verbs was affected. This work suggests that the choice of training... |
Type: Manuscript |
Keywords: Bioinformatics. |
Year: 2008 |
URL: http://precedings.nature.com/documents/2310/version/1 |
| |
|
|
Marc-Alexandre Nolin; Jacques Corbeil; Luc Lamontagne; Michel Dumontier. |
The Bio2RDF project uses open-source Semantic Web technologies to provide interlinked life science data in order to maximize productivity and facilitate biological knowledge discovery. Using both syntactic and semantic data integration techniques, Bio2RDF puts into practice a simple methodology to generate and seamlessly integrate machine-interpretable data that can be interrogated with SPARQL-based queries to answer sophisticated questions.

At its core, Bio2RDF converts database records into a set of statements, or so-called triples, that are captured together as a named graph annotated with provenance. The records and the entities they are about are provided with a Uniform Resource Identifier... |
Type: Presentation |
Keywords: Bioinformatics. |
Year: 2010 |
URL: http://precedings.nature.com/documents/5060/version/1 |
| |