Resumo: |
Understanding interactions between environmental chemicals and genes provides insight into the mechanisms of chemical action, disease susceptibility, therapeutic drug interactions, and toxicity. The Comparative Toxicogenomics Database (CTD; http://ctd.mdibl.org) is a web-based resource that integrates diverse information for the cross-species analysis of chemical, gene, and disease relationships. Much of the data in CTD is gathered manually: CTD biocurators use controlled vocabularies to curate chemical-gene and chemical/gene-disease interactions from the scientific literature, and the database integrates data curated from over 10,000 scientific documents. However, far more scientific documents are available than CTD staff can actually curate, so selecting the best documents for curation is very important.

To improve the efficacy of the CTD biocuration process, a computational text mining prototype was developed to score and rank PubMed abstracts by their desirability for curation. The prototype identifies:
•	chemical, gene, and disease actors, 
•	specific action terms used to define interaction activity, and 
•	other key factors that contribute to a document’s overall relevancy to CTD.

The prototype was then tested against data manually curated by CTD, which served as the control group, to determine its overall effectiveness; mean average precision was the evaluation metric. The workshop will address the following questions, among others: How was the prototype designed and architected? Which third-party tools were integrated into it? How was it tested? Were the tools able to identify the same actors as the curators? How were the documents scored and ranked, and how effective was the ranking process? What major problems were encountered? How will the prototype ultimately be integrated into the CTD biocuration process?
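
For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision is conventionally computed for ranked document lists. It is not the prototype's evaluation code. The abstract does not say how rankings were grouped for averaging, so the grouping used here (one ranked list per query, paired with the set of documents curators judged relevant) is an assumption, and the function names and document identifiers are hypothetical placeholders.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant documents."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at this rank, counted only where a relevant document appears.
            precision_sum += hits / rank
    # Relevant documents that never appear in the ranking contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    rankings: Iterable[Tuple[List[str], Set[str]]]
) -> float:
    """Mean of the per-list average precision values."""
    pairs = list(rankings)
    if not pairs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in pairs) / len(pairs)


# Hypothetical example: two ranked lists with made-up document identifiers.
example = [
    (["doc_a", "doc_b", "doc_c", "doc_d"], {"doc_a", "doc_c"}),
    (["doc_e", "doc_f"], {"doc_f"}),
]
map_score = mean_average_precision(example)
print(f"MAP = {map_score:.4f}")
```

In the first list the relevant documents sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; in the second the single relevant document sits at rank 2, giving 1/2. A ranking that places curated documents nearer the top therefore scores higher.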
|