Figure 3 | Respiratory Research

Figure 3

From: Transposable elements and their potential role in complex lung disorder

General scheme of a pipeline for identifying repeated sequences. (White inset boxes: a few examples of the available computational tools.) Input query sequence data are pre-processed by screening for TEs, and cryptic structures (poly(A) tails, degenerate primers) are trimmed to avoid excessive mismatches. The data are then mapped against the reference genome and/or a repeat region library to form clusters; for each cluster, programs such as MAP and MAFFT construct multiple alignments, from which consensus sequences are derived. In a post-processing step, each consensus is realigned with the reference, using characteristic TE features as filter parameters, yielding concordant (YES) or discordant (NO) combinations. Concordant combinations are elements already present in the reference library, whereas discordant combinations are of particular interest because they represent putative novel elements.
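
The stages in the caption can be illustrated with a toy sketch. It is not the method of any tool named in the figure: the function names (`trim_poly_a`, `consensus`, `identity`, `classify`), the minimum poly(A) run of 6, and the 0.8 identity threshold are all assumptions chosen for illustration. Real pipelines use proper aligners (e.g. MAFFT) and TE-specific filters, whereas here the alignment is assumed to be given and the realignment is replaced by a simple ungapped position-by-position comparison.

```python
from collections import Counter


def trim_poly_a(seq: str, min_run: int = 6) -> str:
    """Strip a trailing run of A's if it is at least min_run long (assumed cutoff)."""
    stripped = seq.rstrip("A")
    if len(seq) - len(stripped) >= min_run:
        return stripped
    return seq


def consensus(aligned: list[str]) -> str:
    """Majority-vote consensus over equal-length, already aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]  # ignore gap characters
        if bases:
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)


def identity(a: str, b: str) -> float:
    """Fraction of matching positions, relative to the longer sequence (ungapped)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def classify(cons: str, library: dict[str, str], threshold: float = 0.8):
    """Label a consensus as concordant (matches a library entry) or discordant."""
    best_name, best_score = None, 0.0
    for name, ref in library.items():
        score = identity(cons, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return ("concordant", best_name)
    return ("discordant", None)


if __name__ == "__main__":
    # Hypothetical cluster of three pre-aligned reads and a one-entry library.
    cluster = ["ACGT-CGT", "ACGTACGT", "ACCTACGT"]
    cons = consensus(cluster)
    print(cons)
    print(classify(cons, {"TE_known": "ACGTACGA"}))
    print(classify(cons, {"TE_known": "TTTTTTTT"}))
```

In this sketch a discordant result only means that no library entry passed the assumed identity threshold; whether such a consensus is a genuine novel element would still need the feature-based filtering described in the caption.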
