Generated on 2017-11-14 at 16:47.

Snaptron Reference Tables

Table 1. Query Types

Query Type | Description | Multiplicity | Format | Example
Region | chromosome-based coordinate range (1-based); HUGO gene name | 1 | chr(1-22,X,Y,M):1-size of chromosome; gene_name | chr21:1-500; CD99
Filter | range over summary statistic column values | 1 or more | column_name(>:,<:,:)number (integer or float) | coverage_avg>:10
Sample Metadata | keyword and numeric range search over sample metadata | 1 or more | fieldname(>:,<:,:)keyword | description:cortex; SMRIN>:8
Sample IDs | limits results to only junctions found in the specified sample IDs | 1 | sids=\d+[,\d+]* | sids=2,40,50,100
Snaptron IDs | one or more snaptron_ids | 1 | ids=\d+[,\d+]* | ids=5,7,8
Sample IDs | one or more sample_ids | 1 | ids=\d+[,\d+]* | ids=20,40,100

The Region query type is required whenever the Filter, Sample Metadata, and/or Sample IDs types are used.

Table 2. List of Snaptron Parameters

Parameter | WSI Endpoints | Values | # Occurrences | Example | Description
regions | snaptron;genes | chr[1-22XYM]:\d+-\d+;HUGO gene | 1 but can take multiple arguments separated by a comma representing an OR | chr1:1-5000;DRD4 | coordinate intervals and/or HUGO gene names
sids | snaptron;genes | sids=\d+[,\d+]* | 1 | sids=30,100,150 | additional query filter to only include junctions from one or more samples in this list; uses the samples’ rail_ids
ids* | snaptron;genes;samples | ids=\d+[,\d+]* | 1 | ids=5,6,7 | ID filter for snaptron_id (endpoint=snaptron) and rail_id (endpoint=samples); this only returns the specific records with those IDs
rfilter | snaptron;genes | fieldname[><!:]value | 0 or more | rfilter=samples_count>:5&rfilter=coverage_sum:3 | point range filter (inclusion)
sfilter | snaptron;genes;samples | fieldname:value OR freetext | 0 or more | sfilter=description:Cortex&sfilter=library_strategy:RNA-Seq | sample metadata filter (inclusion)
contains | snaptron;genes | 0,1 | 0-1 occurrences | contains=1 | return only those junctions whose start and end coordinates are within the boundaries of the region (using either coordinates directly or a passed-in gene name)
exact | snaptron;genes | 0,1 | 0-1 occurrences | exact=1 | return only those junctions whose start and end coordinates match the boundaries of the region requested
either | snaptron;genes | 0,1,2 | 0-1 occurrences | either=2 | return only those junctions whose start (either=1) or end (either=2) coordinate matches or is within the boundaries of the region requested
header | snaptron;genes | 0,1 | 0-1 occurrences | header=0 | include the header as the first line (or not)
fields** | snaptron;genes | fields=fieldname[,fieldname]* | 0 or more unique fieldnames within one fields clause | fields=snaptron_id,samples_count | which fields to return

* The ids parameter cannot be used with other parameters.

** Can include non-return field options such as rc (result count).
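As an illustration of how the parameters in Table 2 combine, the following is a minimal Python sketch that assembles a query string. The function name build_query and its arguments are hypothetical and not part of Snaptron. The sketch does no percent-encoding and does not contact any server. It enforces the ids restriction from the footnote above and the Region requirement stated under Table 1; exempting the samples endpoint from the Region requirement is an assumption made here because Table 2 lists sfilter for that endpoint.

```python
def build_query(endpoint, regions=None, rfilters=(), sfilters=(),
                sids=(), ids=(), **flags):
    """Assemble a query string from Table 2 parameters (illustrative only)."""
    if ids:
        # Footnote to Table 2: ids cannot be combined with other parameters.
        if regions or rfilters or sfilters or sids or flags:
            raise ValueError("ids cannot be used with other parameters")
        return endpoint + "?ids=" + ",".join(str(i) for i in ids)

    # Rule under Table 1: filters require a Region. Applying it only to the
    # snaptron and genes endpoints is an assumption of this sketch.
    uses_filters = bool(rfilters or sfilters or sids)
    if endpoint in ("snaptron", "genes") and uses_filters and not regions:
        raise ValueError("a regions value is required when filters are used")

    parts = []
    if regions:
        parts.append("regions=" + regions)
    parts += ["rfilter=" + f for f in rfilters]   # 0 or more occurrences
    parts += ["sfilter=" + f for f in sfilters]   # 0 or more occurrences
    if sids:
        parts.append("sids=" + ",".join(str(s) for s in sids))
    # Remaining keyword arguments cover contains, exact, either, header, fields.
    parts += [f"{name}={value}" for name, value in flags.items()]
    return endpoint + "?" + "&".join(parts)


query = build_query("snaptron", regions="chr21:1-500",
                    rfilters=["samples_count>:5"], contains=1)
```

The resulting string would still need to be appended to the base URL of the compilation being queried, which this reference does not list.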

Tables 3 and 4 show the queryable fields for the region and range query types, respectively. Fields from Tables 3 and 4 can be mixed in the same query, though only one region predicate is allowed per query, as specified in Table 1 above.

Table 3. Region Query Fields (“regions” parameter)

Field | Range of Values | Example | Description
coordinate* | chr(1-22;X;Y;M):1-size of chromosome | chr1:4-100 | chromosome:start-end
gene symbol* | a-zA-Z0-9 | CD99 | HUGO (HGNC) gene symbols

*you can either pass a coordinate string or a gene symbol in the interval query segment, but not both
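The two forms in Table 3 can be told apart with a pair of regular expressions. The sketch below is a hypothetical client-side check, not Snaptron code: the name classify_region is made up for illustration, the gene pattern follows the a-zA-Z0-9 range given in the table, and the end coordinate is not checked against chromosome size because those sizes are not listed here.

```python
import re

# Coordinate form from Table 3: chr(1-22, X, Y, M):start-end
COORD_RE = re.compile(r"chr(?:[1-9]|1\d|2[0-2]|X|Y|M):(\d+)-(\d+)")
# Gene symbols, using the character range given in Table 3
GENE_RE = re.compile(r"[A-Za-z0-9]+")


def classify_region(value):
    """Return 'coordinate' or 'gene' for a regions value, else raise."""
    m = COORD_RE.fullmatch(value)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        # Coordinates are 1-based, and the interval must not be reversed.
        if start < 1 or end < start:
            raise ValueError(f"invalid interval: {value}")
        return "coordinate"
    if GENE_RE.fullmatch(value):
        return "gene"
    raise ValueError(f"not a coordinate string or gene symbol: {value}")
```

For example, classify_region("chr1:4-100") gives "coordinate" and classify_region("CD99") gives "gene", while "chr23:1-5" is rejected.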

The query filter columns (Table 4) can often be used to reduce the number of false-positive junctions, most easily with two columns: samples_count and coverage_sum. Some suggested values from our own research are presented in Table 5.

Table 4. Query Filter Fields (“rfilter” parameter)

Field | Range of Values | Example | Description
length | 1-500K | intron_length<:5000 | length of exon-exon junction (intron)
annotated* | 0 or 1 | annotated:1 | whether both left and right splice sites are in one or more annotations (default is both)
left_annotated* | 0 or 1 | left_annotated:1 | whether the left splice site is in one or more annotations
right_annotated* | 0 or 1 | right_annotated:1 | whether the right splice site is in one or more annotations
strand | + or - | strand:+ | which strand to require (default is both)
samples_count | 1-Inf | samples_count>:5 | number of samples in which this junction has one or more reads covering it
coverage_sum | 1-Inf | coverage_sum>:10 | aggregate count of reads covering the junction across all samples the junction appears in
coverage_avg | 1.0-Inf | coverage_avg>:5.0 | average of read coverage across all samples the junction appears in
coverage_median | 1.0-Inf | coverage_median>:6.0 | median of read coverage across all samples the junction appears in

* These fields are treated as booleans for the purpose of searching but are returned as strings: if a field is not 0, it holds a list of one or more annotation source abbreviations. Importantly, if each splice site of a junction (left/right) is annotated separately (not connected), annotated will be 0 but both left_annotated and right_annotated will be nonzero.
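To make the filter syntax in Table 4 concrete, here is a small Python sketch that parses an rfilter expression and applies it to an already-returned record on the client side. It is not the server's implementation. The operator meanings (">:" as greater-than-or-equal, "<:" as less-than-or-equal, ":" as equality) are an assumption, since this reference shows the operators without defining them. The record and its annotation abbreviation "srcA" are made up for illustration.

```python
import operator
import re

# Assumed operator semantics; the tables above do not spell them out.
OPS = {">:": operator.ge, "<:": operator.le, ":": operator.eq}
RFILTER_RE = re.compile(r"(\w+)(>:|<:|:)(.+)")
# Fields searched as 0/1 but returned as strings (footnote to Table 4).
BOOLEAN_FIELDS = {"annotated", "left_annotated", "right_annotated"}


def parse_rfilter(expr):
    """Split e.g. 'samples_count>:5' into (field, comparison, raw value)."""
    m = RFILTER_RE.fullmatch(expr)
    if not m:
        raise ValueError(f"unrecognized filter: {expr}")
    field, op, raw = m.groups()
    return field, OPS[op], raw


def matches(record, expr):
    """Check one returned record against one rfilter expression."""
    field, compare, raw = parse_rfilter(expr)
    value = record[field]
    if field in BOOLEAN_FIELDS:
        # Anything other than "0" is a list of annotation sources, i.e. true.
        return compare(int(str(value) != "0"), int(raw))
    if field == "strand":
        return compare(value, raw)
    return compare(float(value), float(raw))


# Made-up record: each splice site is annotated, but not as a connected pair.
record = {"samples_count": 12, "coverage_sum": 40, "strand": "+",
          "annotated": "0", "left_annotated": "srcA", "right_annotated": "srcA"}
```

With this record, matches(record, "samples_count>:5") and matches(record, "left_annotated:1") hold, while matches(record, "annotated:1") does not, which is the separately-annotated case described in the footnote.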

The return format is a TAB-delimited series of fields where each line represents a unique intron call. Table 5 displays the complete list of fields in the return format of the Snaptron web service. The chromosome, start, and end fields are a special case: the index is a combination of all three together.

Table 5. Complete list of Snaptron Fields In Return Format

Field Index | Indexed? | Field Name | Type | Description
1 | No | DataSource:Type | Abbrev:Single Character | Differentiates between a return line of type Intron (I), Sample (S), or Gene (G).
2 | Yes | snaptron_id | Integer | stable, unique ID for Snaptron junctions
3 | Yes | chromosome | String | Reference ID for genomics coordinates
4 | Yes | start | Integer | beginning (left) coordinate of intron
5 | Yes | end | Integer | last (right) coordinate of intron
6 | Yes | length | Integer | Length of intron coordinate span
7 | Yes | strand | Single Character | Orientation of intron (Watson or Crick)
8 | Yes | annotated | String | If both ends of the intron are annotated as a splice site in some annotation
9 | No | left_motif | String | Splice site sequence bases at the left end of the intron
10 | No | right_motif | String | Splice site sequence bases at the right end of the intron
11 | Yes | left_annotated | String | If the left end splice site is annotated or not and which annotations it appears in (maybe more than once)
12 | Yes | right_annotated | String | If the right end splice site is annotated or not, same as left_annotated
13 | No | samples* | Comma separated list of tuples: integer:integer | The list of samples which had one or more reads covering the intron and their coverages. IDs are from the IntropolisDB.
14 | Yes | samples_count | Integer | Total number of samples that have one or more reads covering this junction
15 | Yes | coverage_sum | Integer | Sum of all samples coverage for this junction
16 | Yes | coverage_avg | Float | Average coverage across all samples which had at least 1 read covering the intron in the first pass alignment
17 | Yes | coverage_median | Float | Median coverage across all samples which had at least 1 read covering the intron in the first pass alignment
18 | No | source_dataset_id | Integer | Snaptron ID for the compilation (GTEx=1, SRAv2=2, TCGA=4)

* This field always starts with a comma (,) because of how it is searched when samples are used to filter a junction query (R+M or R+F+M). The field is a comma-delimited list of samples and their raw read coverage in each sample, keyed by the sample's rail_id: ,rail_id1:coverage1,rail_id2:coverage2,.... This rail_id matches the first column in the relevant compilation’s samples.tsv file available from the links previously listed in the Raw Data and Indices section.
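The return format described in Table 5 can be read with a few lines of Python. The sketch below is one possible client-side parser, not part of Snaptron: parse_line and parse_samples are hypothetical names, and the example line is made up to match the field order in Table 5 rather than taken from a real query.

```python
# Field names in the order given by Table 5.
FIELDS = ["DataSource:Type", "snaptron_id", "chromosome", "start", "end",
          "length", "strand", "annotated", "left_motif", "right_motif",
          "left_annotated", "right_annotated", "samples", "samples_count",
          "coverage_sum", "coverage_avg", "coverage_median",
          "source_dataset_id"]
INT_FIELDS = ["snaptron_id", "start", "end", "length", "samples_count",
              "coverage_sum", "source_dataset_id"]
FLOAT_FIELDS = ["coverage_avg", "coverage_median"]


def parse_samples(field):
    """Turn ',rail_id1:cov1,rail_id2:cov2' into {rail_id: coverage}."""
    # The leading comma produces an empty first item, which is skipped.
    pairs = (item.split(":") for item in field.split(",") if item)
    return {int(rail_id): int(coverage) for rail_id, coverage in pairs}


def parse_line(line):
    """Parse one TAB-delimited junction line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    rec = dict(zip(FIELDS, values))
    rec["samples"] = parse_samples(rec["samples"])
    for name in INT_FIELDS:
        rec[name] = int(rec[name])
    for name in FLOAT_FIELDS:
        rec[name] = float(rec[name])
    return rec


# Made-up example line with two samples (rail_ids 7 and 19).
example = "\t".join(["SRAv2:I", "42", "chr21", "100", "500", "401", "+", "0",
                     "GT", "AG", "0", "0", ",7:3,19:5", "2", "8", "4.0",
                     "4.0", "2"])
rec = parse_line(example)
```

Going by the descriptions in Table 5, the parsed samples mapping should agree with the summary columns: its size with samples_count and the sum of its values with coverage_sum.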