The figure gives an overview of the pipeline used to annotate sequences originating from the gene trap technique. The sequences are obtained from clones in which a successful insertion occurred.
We use BLAT to map the sequences onto the mouse genome (NCBI assembly 33). Sequences shorter than 18 bp, or with a very low quality score as calculated by Phred¹, are not processed.
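The filtering rule can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the 18 bp cutoff comes from the text, while the quality criterion (a mean Phred score with a threshold of 20) and the function name `passes_filter` are assumptions, since the text does not state how "very low quality" is defined.

```python
MIN_LENGTH_BP = 18       # length cutoff stated in the text
MIN_MEAN_PHRED = 20.0    # hypothetical threshold; the text gives no value


def passes_filter(sequence, phred_scores,
                  min_length=MIN_LENGTH_BP,
                  min_mean_quality=MIN_MEAN_PHRED):
    """Return True if a sequence tag should be passed on to the mapping step.

    sequence     -- the base calls as a string
    phred_scores -- one Phred quality value per base (list of numbers)
    """
    # Discard sequences shorter than the minimum length.
    if len(sequence) < min_length:
        return False
    # A sequence without quality values cannot be assessed; discard it.
    if not phred_scores:
        return False
    # Assumed quality criterion: mean Phred score over the whole read.
    mean_quality = sum(phred_scores) / len(phred_scores)
    return mean_quality >= min_mean_quality
```

For example, `passes_filter("ACGT" * 5, [30] * 20)` accepts a 20 bp read of good quality, whereas a 17 bp read is rejected regardless of its scores.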
After the BLAT² mapping, we query the Ensembl database³ for the relevant annotation in the mapped region. We first check whether a known exon was trapped. If so, we store the relevant information for the corresponding gene. If no known exon was trapped, we try to determine, in this order, whether the trapped region corresponds to a known gene (perhaps a novel exon), to a putative exon or gene (predicted by the Ensembl pipeline), or finally to ESTs that have been aligned to the genome but are not associated with an Ensembl prediction.
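The order of these checks can be sketched as a simple priority cascade. This is an illustrative sketch only: it assumes the features overlapping the mapped region have already been retrieved from Ensembl and grouped by type, and the category names, the `overlaps` dictionary and the function `classify_hit` are hypothetical, not part of the pipeline or of any Ensembl API.

```python
# Annotation categories in the order in which they are checked.
ANNOTATION_PRIORITY = [
    "known_exon",              # a known exon was trapped
    "known_gene",              # inside a known gene, perhaps a novel exon
    "predicted_exon_or_gene",  # putative exon or gene from the Ensembl pipeline
    "unassociated_est",        # aligned ESTs with no Ensembl prediction
]


def classify_hit(overlaps):
    """Return (category, features) for the first category with any overlap.

    overlaps -- dict mapping a category name to the list of features that
                overlap the mapped region (missing keys count as empty).
    Returns (None, []) when nothing overlaps the region.
    """
    for category in ANNOTATION_PRIORITY:
        features = overlaps.get(category, [])
        if features:
            return category, features
    return None, []
```

With this scheme, a region that overlaps both a known gene and an unassociated EST is reported as `known_gene`, because the earlier category takes precedence.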

