Lamar Michaelsen posted an update 1 week, 4 days ago
Although the RNA-seq data had been produced with a non-strand-specific protocol, the reading direction could be determined with the help of MaxEntScan (Yeo and Burge, 2004) for 98.8% of these reads based on the canonical splice site motifs. This resulted in 270,957 unique splice sites, of which 208,956 exactly matched the splice sites of the ENSEMBL 70.1 gene build for the LatCha1 assembly (Fig. 1, Top). About 43% of the ENSEMBL splice junctions were not visible in our transcriptome map because the corresponding genes were not expressed at sufficient levels to pass our filtering criteria in the three tissues considered here. Additionally, 1,793 sites matched splice junctions from the lincRNA set reported in the Supplementary Materials (Supplementary Data 1) of the coelacanth genome paper (Amemiya et al., 2013). Another 17,801 mapped to novel splice junctions within the boundaries of genes annotated in ENSEMBL 70.1 and in the correct reading direction. Since they did not exactly match the positions of annotated ENSEMBL 70.1 splice sites, they are not shown in Figure 1 (Top). This left 42,463 novel splice sites located outside annotated genes, corresponding to 22,424 distinct splice junctions that are located entirely outside of the annotation. Furthermore, we identified 3,360 distinct junctions with only one side outside the published annotation. A detailed comparison of observed splice junctions is compiled in Supplementary Tables S3 and S4; a graphical summary of the splice sites, accounting for the exact matches only, is given in Figure 1 (Top).
Assembled into transcripts with cufflinks and cuffmerge, the combined transcriptome data of L. chalumnae and L. menadoensis encompassed 126,235 distinct transcripts belonging to 109,761 genes. This amounts to an average of 2.54 exons. Of these, 86,203 (68.3%) transcripts were intronless.
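The idea of recovering the reading direction of a spliced read from splice site motifs can be illustrated with a minimal Python sketch. This is a deliberately simplified stand-in, not MaxEntScan: it only checks the canonical GT...AG intron boundaries on either strand, whereas MaxEntScan scores longer motifs with a maximum entropy model. The function names, the 0-based half-open coordinates, and the toy sequences are assumptions made for illustration.

```python
# Simplified illustration only: checks the canonical GT...AG intron
# boundaries instead of scoring motifs as MaxEntScan does.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def infer_strand(genome, intron_start, intron_end):
    """Guess the strand of an intron at genome[intron_start:intron_end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    Returns '+', '-', or None if no canonical GT...AG motif is found.
    """
    intron = genome[intron_start:intron_end].upper()
    if len(intron) < 4:
        return None
    # Forward strand: intron starts with GT (donor) and ends with AG (acceptor).
    if intron[:2] == "GT" and intron[-2:] == "AG":
        return "+"
    # Reverse strand: the same motif appears on the reverse complement,
    # i.e. the forward sequence starts with CT and ends with AC.
    rc = reverse_complement(intron)
    if rc[:2] == "GT" and rc[-2:] == "AG":
        return "-"
    return None


if __name__ == "__main__":
    print(infer_strand("AAAGTCCCCAGTTT", 3, 11))  # forward-strand intron
    print(infer_strand("AAACTGGGGACTTT", 3, 11))  # reverse-strand intron
```

Junctions for which neither orientation shows the canonical motif stay unassigned in this sketch, which mirrors the fact that the direction could be determined for most, but not all, reads.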
Of the transcripts, 61.9% (69,434) did not contain exons located within gene boundaries annotated by ENSEMBL. The majority of these, namely 58,058 transcripts, were intronless. About 87% (60,444) of these new transcripts can be considered lincRNAs since they have no overlaps with RNAcode hits or with blastx hits in the CCDS database with an E-value < 10^-10. About 18% of the rest, that is, 1,586 new transcripts, can be classified as potentially coding genes, since at least half of their exons overlap by at least 50% of their sequence with blastx alignments or with regions found by RNAcode. If strand information was available, the overlap had to be strand-specific. We found 22,424 novel splice junctions outside the published annotation, corresponding to 41,139 unique splice sites. Of these, 32,467 matched exactly with splice sites in the collated transcript models produced by cuffmerge. An additional 4,163 splice sites were located within these transcripts, apparently corresponding to local variations in the exact splicing position.
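The classification rule described above (no overlap with coding evidence suggests a lincRNA; at least half of the exons covered to at least 50% suggests a potentially coding gene) can be sketched as follows. This is a minimal illustration under stated assumptions: intervals are 0-based half-open `(start, end)` tuples on a single sequence, `hits` stands for the already filtered blastx and RNAcode regions, the strand-specificity requirement is left out, and the function names and the "ambiguous" label for the remaining transcripts are hypothetical.

```python
def overlap_fraction(exon, hits):
    """Fraction of an exon covered by the union of the hit intervals."""
    start, end = exon
    # Clip every overlapping hit to the exon and process them left to right.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in hits if s < end and e > start
    )
    covered = 0
    cursor = start  # right edge of the region already counted
    for s, e in clipped:
        s = max(s, cursor)  # avoid counting overlapping hits twice
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


def classify_transcript(exons, hits):
    """Label a transcript from the coverage of its exons by coding evidence."""
    fractions = [overlap_fraction(exon, hits) for exon in exons]
    if all(f == 0 for f in fractions):
        return "lincRNA candidate"
    # At least half of the exons must be covered by at least 50%.
    well_covered = sum(f >= 0.5 for f in fractions)
    if 2 * well_covered >= len(exons):
        return "potentially coding"
    return "ambiguous"  # hypothetical label for everything in between


if __name__ == "__main__":
    print(classify_transcript([(0, 100)], []))
    print(classify_transcript([(0, 100), (200, 300)], [(0, 60)]))
    print(classify_transcript([(0, 100), (200, 300), (400, 500)], [(0, 60)]))
```

A strand-aware variant would carry a strand field on both exons and hits and skip hits on the opposite strand whenever strand information is available.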