Snaptron Data Directories
-------------------------

These directories contain the raw data and the indices used by Snaptron, including:
Tabix (.tbi), SQLite (.sqlite), and Lucene (directories lucene_*).

The compilation-specific data and indices are contained within the sub-directory named for the compilation:

srav1 - ~42M junctions from ~22K samples in the Sequence Read Archive (HG19)
srav2 - ~81M junctions from ~50K samples in the Sequence Read Archive (HG38)
gtex - ~30M junctions from ~10K samples from the Genotype-Tissue Expression consortium (HG38)
tcga - ~37M junctions from ~11K samples from The Cancer Genome Atlas (HG38)

Annotation-related files shared by multiple compilations and linked from their specific sub-directories are:

gene_annotation_hg19
gene_annotation_hg38

The HG19 version is used only by SRAv1. The HG38 version is used by SRAv2, GTEx, and TCGA.

The header for all junction files (tab-delimited) is:

junctions.header.tsv

The raw junctions data file (per-compilation) is:

junctions.bgz

It is in block-gzip format (.bgz), which can be read by gzip.

The sample metadata file (per-compilation) is:

samples.tsv
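
As a rough illustration of how the junction files described above fit together, the following Python sketch pairs the column names from junctions.header.tsv with rows streamed from a junctions.bgz file. It relies on block-gzip being readable by gzip, so only the standard library is needed. The function names (read_header, iter_junctions) and the example paths are hypothetical, and the sketch assumes the header file holds a single tab-delimited line of column names; check the actual files before relying on this.

```python
import gzip
from itertools import islice


def read_header(header_path):
    """Return the column names from the header file.

    Assumption: the file contains one tab-delimited line of column names.
    """
    with open(header_path, "rt") as fh:
        return fh.readline().rstrip("\n").split("\t")


def iter_junctions(junctions_path, header_path, limit=None):
    """Yield junction rows as dicts keyed by the header column names.

    Block-gzip is a series of concatenated gzip members, which the
    standard gzip module decompresses as one continuous stream.
    Pass limit=N to stop after the first N rows.
    """
    columns = read_header(header_path)
    with gzip.open(junctions_path, "rt") as fh:
        for line in islice(fh, limit):
            fields = line.rstrip("\n").split("\t")
            yield dict(zip(columns, fields))


# Hypothetical usage (paths are illustrative only):
# for row in iter_junctions("srav2/junctions.bgz",
#                           "junctions.header.tsv", limit=5):
#     print(row)
```

Streaming the file line by line keeps memory use flat, which matters given the tens of millions of junctions per compilation. This reads the raw data sequentially and does not use the Tabix index; region queries would need a Tabix-aware tool instead.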