New sequencing technologies make it possible to obtain a genome sequence quickly and cheaply. Because the assembly of such next-generation reads is still not well standardized, it remains the most cumbersome part of sequencing projects.
We present the approaches taken toward a draft assembly of the cucumber (Cucumis sativus L. cv. Borszczagowski) genome, built from 8x unpaired and 4x paired (3 kbp) pyrosequenced 454 XLR Titanium reads together with BAC library end fragments (12.7x physical coverage). Two assembly approaches, Celera and Celera/Arachne, were finally used. The Celera assembly yielded 15,196 contigs totalling 197 Mbp with an N50 of 27,086 bp, and 4,157 scaffolds totalling 224 Mbp with an N50 of 2,324 kbp. In the Celera/Arachne method, contigs from a prior assembly of the pyrosequencing reads (in the form of ~800 nt reads) were used together with the STCs as input to the Arachne assembler. This yielded 15,764 contigs totalling 193 Mbp with an N50 of 23,280 bp, and 12,438 supercontigs covering 323 Mbp with an N50 of 323,092 bp. The correctness of the assemblies was confirmed by mapping 95.56% of 63,035 cucumber unigenes with an average identity of 97.81%. Additionally, 6 cucumber BAC/fosmid sequences (372,277 bp in total) showed 97.61% identity to the assembled genome. The average read coverage of the genome was 14.20x, and 98% of the assembled genome had read coverage higher than 3x.
Considering the coverage used in previously reported cucumber assemblies (32x by Miller J. et al., 2009 and 72x by Huang S. et al., 2009), the absence of differences in quality, and the best overall contig and scaffold length statistics, the Celera approach used in this project should be considered the optimal way to obtain an omics-ready draft sequence of a highly repetitive eukaryotic genome.
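
The contig and scaffold comparisons above rest on the N50 statistic and on the fraction of the genome covered above a given read depth. As an illustration only, the following minimal Python sketch shows how such values are commonly computed. The function names `n50` and `fraction_covered` and the toy input values are assumptions made for this example; they are not taken from the assembly pipeline or the data of this project.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Reached only when the input is empty.
    raise ValueError("n50() needs at least one sequence length")


def fraction_covered(depths, min_depth):
    """Return the fraction of positions with read depth strictly above min_depth."""
    if not depths:
        raise ValueError("fraction_covered() needs at least one position")
    above = sum(1 for depth in depths if depth > min_depth)
    return above / len(depths)


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) and per-base depths, for illustration only.
    toy_contigs = [80, 70, 50, 40, 30, 20]
    toy_depths = [0, 2, 4, 5, 10, 3]
    print("N50:", n50(toy_contigs))
    print("Fraction above 3x:", fraction_covered(toy_depths, 3))
```

With the toy contig lengths, the total is 290 bp, so the running sum first reaches half of it (145 bp) at the second-longest contig, and the N50 is 70 bp. Note that N50 describes contiguity only and says nothing about correctness, which is why the unigene and BAC/fosmid mapping results are reported alongside it.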