Heather A. Piwowar; Wendy W. Chapman. |
*Background:* Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals that are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing.

*Methods:* We investigated these relationships with respect to gene expression microarray data in the... |
Type: Manuscript |
Keywords: Bioinformatics. |
Year: 2008 |
URL: http://precedings.nature.com/documents/1700/version/1 |
| |
|
|
Heather A. Piwowar; Wendy W. Chapman. |
Sharing research data is a cornerstone of science. Although many tools and policies exist to encourage data sharing, the prevalence with which datasets are shared is not well understood. We report our preliminary results on patterns of sharing microarray data in public databases.

The most comprehensive method for measuring occurrences of public data sharing is manual curation of research reports, since data sharing plans are usually communicated in free text within the body of an article. Our early findings from manual curation of 100 papers suggest that 30% of investigators publicly share their full microarray datasets. Of these, 70% of the datasets are deposited at NCBI's Gene Expression Omnibus (GEO) database,... |
Type: Poster |
Keywords: Bioinformatics. |
Year: 2008 |
URL: http://precedings.nature.com/documents/1701/version/1 |
| |
|
|
Heather Piwowar; Wendy W. Chapman. |
Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared in such a variety of mechanisms and locations. We propose a novel approach to finding shared datasets: using NLP techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system was able to identify 61% of articles with shared datasets with 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the... |
Type: Manuscript |
Keywords: Bioinformatics. |
Year: 2008 |
URL: http://precedings.nature.com/documents/1721/version/1 |
| |
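The regular-expression stage described in this abstract can be illustrated with a minimal sketch. The patterns below are hypothetical examples chosen for illustration, not the patterns used in the manuscript, and the function names `declares_sharing` and `precision_recall` are assumptions. The sketch flags text that appears to declare a dataset deposit and then scores predictions against hand-labeled answers using the same precision and recall measures the abstract reports.

```python
import re

# Hypothetical patterns for illustration only; the manuscript's actual
# patterns are not given in the abstract.
SHARING_PATTERNS = [
    # A deposit verb followed shortly by a repository name.
    re.compile(
        r"\b(deposited|submitted|available)\b.{0,80}"
        r"\b(GEO|Gene Expression Omnibus|ArrayExpress)\b",
        re.IGNORECASE,
    ),
    # An explicit mention of an accession number.
    re.compile(r"\baccession (number|no\.?|code)s?\b", re.IGNORECASE),
    # A GEO series accession such as GSE1234.
    re.compile(r"\bGSE\d+\b"),
]


def declares_sharing(text: str) -> bool:
    """Return True if any pattern suggests a dataset-sharing declaration."""
    return any(pattern.search(text) for pattern in SHARING_PATTERNS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) for two parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Loosening the patterns would be expected to raise recall at the cost of precision, which is the trade-off the abstract reports between its two classifier versions.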
|
|
Heather Piwowar; Wendy W. Chapman. |
Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared in such a variety of mechanisms and locations. We propose a novel approach to find shared datasets: using NLP techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system was able to identify 61% of articles with shared datasets with 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the... |
Type: Manuscript |
Keywords: Bioinformatics. |
Year: 2008 |
URL: http://precedings.nature.com/documents/1721/version/2 |
| |
|
|
Heather A. Piwowar; Wendy W. Chapman. |
*Background*
Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. The full text of scientific articles is increasingly queryable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries... |
Type: Manuscript |
Keywords: Bioinformatics. |
Year: 2010 |
URL: http://precedings.nature.com/documents/4267/version/1 |
| |
|
|
Heather A. Piwowar; Wendy W. Chapman. |
Repurposing research data holds many benefits for the advancement of biomedicine, yet is very difficult to measure and evaluate. We propose a data reuse registry to maintain links between primary research datasets and studies that reuse this data. Such a resource could help recognize investigators whose work is reused, illuminate aspects of reusability, and evaluate policies designed to encourage data sharing and reuse. |
Type: Poster |
Keywords: Bioinformatics. |
Year: 2008 |
URL: http://precedings.nature.com/documents/2152/version/1 |
| |
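One way to picture the proposed registry is as a mapping from each primary dataset to the studies that reuse it. The sketch below is an assumption about how such links might be stored, not the design from the poster; the class name `ReuseRegistry`, its methods, and the identifiers in the usage example are all hypothetical.

```python
from collections import defaultdict


class ReuseRegistry:
    """Hypothetical in-memory registry linking datasets to reusing studies."""

    def __init__(self):
        # dataset identifier -> set of identifiers of studies that reuse it
        self._reuses = defaultdict(set)

    def record_reuse(self, dataset_id: str, study_id: str) -> None:
        """Record that a study reused a dataset; duplicates are ignored."""
        self._reuses[dataset_id].add(study_id)

    def studies_reusing(self, dataset_id: str) -> list:
        """Return the sorted identifiers of studies that reused a dataset."""
        return sorted(self._reuses.get(dataset_id, set()))

    def reuse_counts(self) -> dict:
        """Return how many distinct studies reused each dataset."""
        return {dataset: len(studies) for dataset, studies in self._reuses.items()}


# Usage with made-up identifiers.
registry = ReuseRegistry()
registry.record_reuse("dataset-A", "study-2")
registry.record_reuse("dataset-A", "study-1")
registry.record_reuse("dataset-A", "study-1")  # repeated link, stored once
registry.record_reuse("dataset-B", "study-3")
```

Per-dataset counts of this kind are the sort of figure that could support recognizing investigators whose work is reused, as the abstract suggests.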
|
|
Heather A. Piwowar; Wendy W. Chapman. |
*Background* 
Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. The full text of scientific articles is increasingly queryable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries... |
Type: Manuscript |
Keywords: Bioinformatics. |
Year: 2010 |
URL: http://precedings.nature.com/documents/4267/version/2 |
| |
|
|
|