Generated on 2023-02-11 at 16:44.

Snaptron Reference Tables¶

Table 1. Query Types¶

Query Type	Description	Multiplicity	Format	Example
Region	chromosome based coordinates range (1-based); HUGO gene name	1	chr(1-22,X,Y,M):1-size of chromosome; gene_name	chr21:1-500; CD99
Filter	range over summary statistic column values	1 or more	column_name(>:,<:,:)number (integer or float)	coverage_avg>:10
Sample Metadata	keyword and numeric range search over sample metadata	1 or more	fieldname(>:,<:,:)keyword	description:cortex; SMRIN>:8
Sample IDs	limits results to only junctions found in specified samples IDs	1	sids=\d+[,\d+]*	sids=2,40,50,100
Sample Group*	limits to only junctions found in specified sample group	1	sids=groupname	sids=Brain (gtex)
Snaptron IDs	one or more snaptron_ids	1	ids=\d+[,\d+]*	ids=5,7,8
Sample IDs	one or more sample_ids	1	ids=\d+[,\d+]*	ids=20,40,100

* The sids=groupname parameter is based on predefined groups of sample IDs. These group definitions are found in the data directory of the compilation being queried, typically in a file samples.groups.tsv, e.g. for GTEx: http://snaptron.cs.jhu.edu/data/gtex/samples.groups.tsv

The Region query type is required to be present if the Filter, Sample Metadata, Sample Group, and/or Sample IDs types are used.

Table 2. List of Snaptron Parameters¶

Parameter	WSI Endpoints	Values	# Occurrences	Example	Description
regions	snaptron;genes	chr[1-22XYM]:\d+-\d+;HUGO gene	1 but can take multiple arguments separated by a comma representing an OR	chr1:1-5000;DRD4	coordinate intervals and/or HUGO gene names
sids	snaptron;genes	sids=(\d+[,\d+]*)\|(groupname)	1	sids=30,100,150 ; sids=Brain	filter to only junctions from >=1 samples in this list; uses the samples’ rail_ids, can also take a predefined sample group name (e.g. GTEx tissue)
ids*	snaptron;genes;samples	ids=\d+[,\d+]*	1	ids=5,6,7	ID filter for snaptron_id (endpoint=snaptron) and rail_id (endpoint=samples); this only returns the specific records with those IDs
rfilter	snaptron;genes	fieldname[><!:]value	0 or more	rfilter=samples_count>:5&rfilter=coverage_sum:3	point range filter (inclusion)
sfilter	snaptron;genes;samples	fieldname:value OR freetext	0 or more	sfilter=description:Cortex&sfilter=library_strategy:RNA-Seq	sample metadata filter (inclusion)
contains	snaptron;genes	0,1	0-1 occurrences	contains=1	return only those junctions whose start and end coordinates are within the boundaries of the region (using either coordinates directly or passed in gene name)
exact	snaptron;genes	0,1	0-1 occurrences	exact=1	return only those junctions whose start and end coordinates are match the boundaries of the region requested
either	snaptron;genes	0,1,2	0-1 occurrences	either=2	return only those junctions whose start (either=1) or end (either=2) coordinate match or are within the boundaries of the region requested
header	snaptron;genes	0,1	0-1 occurrences	header=0	include the header as the first line (or not)
fields**	snaptron;genes	fields=fieldname[,fieldname]*	0 or more unique fieldnames within one fields clause	fields=snaptron_id,samples_count	which fields to return

* The ids parameter cannot be used with other parameters.

**can include non-return field options such as: rc (result count)

Tables 3 and 4 show the queryable fields for region and range query types respectively. Fields from tables 3 and 4 can be mixed together in the same query though only one region predicate is allowed per query as specified in Table 1 above.

Table 3. Region Query Fields (“regions” parameter)¶

Field	Range of Values	Example	Description
coordinate*	chr(1-22;X;Y;M):1-size of chromosome	chr1:4-100	chromosome:start-end
gene symbol*	a-zA-Z0-9	CD99	HUGO (HGNC) gene symbols

*you can either pass a coordinate string or a gene symbol in the interval query segment, but not both

Often the query filter columns (Table 4) can be used as a way to reduce the number of false positive junctions. This can be done easily with the two columns: samples_count and coverage_sum. Some suggested values from our own research are presented in Table 5.

Table 4. Query Filter Fields (“rfilter” parameter)¶

Field	Range of Values	Example	Description
length	1-500K	intron_length<:5000	length of exon-exon junction (intron)
annotated*	0 or 1	annotated:1	whether both left and right splice sites in one or more annotations (default is both)
left_annotated*	0 or 1	left_annotated:1	whether the left splice site is in one or more annotations
right_annotated*	0 or 1	right_annotated:1	whether the right splice site is in one or more annotations
strand	`+` or `-`	strand:+	which strand to require (default is both)
samples_count	1-Inf	samples_count>:5	number of samples in which this junction has one or more reads covering it
coverage_sum	1-Inf	coverage_sum>:10	aggregate count of reads covering the junction across all samples the junction appears in
coverage_avg	1.0-Inf	coverage_avg>:5.0	average of read coverage across all samples the junction appears in
coverage_median	1.0-Inf	coverage_median>:6.0	median of read coverage across all samples the junction appears in

* these fields are treated as booleans for the purpose of searching but as Strings when returned since if they are not 0, they will be a list of one or more annotation source abbreviations. Also, importantly, if each splice site of a junction (left/right) is annotated separately (not connected), annotated will be 0 but BOTH the left and right annotated fields will not be 0.

The return format is a TAB-delimited series of fields where each line represents a unique intron call. Table 5 displays the complete list of fields in the return format of the Snaptron web service. The chromosome, start, and, end fields are a special case where the index is a combination of all three of them together.

Table 5. Complete list of Snaptron Fields In Return Format¶

Field Index	Indexed?	Field Name	Type	Description
1	No	DataSource:Type	Abbrev:Single Character	Differentiates between a return line of type Intron (I), Sample (S), or Gene (G).
2	Yes	snaptron_id	Integer	stable, unique ID for Snaptron junctions
3	Yes	chromosome	String	Reference ID for genomics coordinates
4	Yes	start	Integer	beginning (left) coordinate of intron
5	Yes	end	Integer	last (right) coordinate of intron
6	Yes	length	Integer	Length of intron coordinate span
7	Yes	strand	Single Character	Orientation of intron (Watson or Crick)
8	Yes	annotated	String	If both ends of the intron are annotated as a splice site in some annotation
9	No	left_motif	String	Splice site sequence bases at the left end of the intron
10	No	right_motif	String	Splice site sequence bases at the right end of the intron
11	Yes	left_annotated	String	If the left end splice site is annotated or not and which annotations it appears in (maybe more than once)
12	Yes	right_annotated	String	If the right end splice site is in an annotated or not, same as left_annotated
13	No	samples*	Comma separated list of tuples: integer:integer	The list of samples which had one or more reads covering the intron and their coverages. IDs are from the IntropolisDB.
14	Yes	samples_count	Integer	Total number of samples that have one or more reads covering this junction
15	Yes	coverage_sum	Integer	Sum of all samples coverage for this junction
16	Yes	coverage_avg	Float	Average coverage across all samples which had at least 1 read covering the intron in the first pass alignment
17	Yes	coverage_median	Float	Median coverage across all samples which had at least 1 read covering the intron in the first pass alignment
18	No	source_dataset_id	Integer	Snaptron ID for the compilation. GTEx=1, SRAv2=2, TCGA=4)

* this field always starts with a ,; this is due to how it is searched when samples are used to filter a junction query (R+M or R+F+M). The format of this field is a comma-delimited list of samples and their raw read coverage in that sample. It uses the rail_id of the sample: ,rail_id1:coverage1,rail_id2:coverage2,.... This rail_id matches the first column in the relevant compilation’s samples.tsv file available from the links previously listed in the Raw Data and Indices section.

Snaptron Reference Tables¶

Table 1. Query Types¶

Table 2. List of Snaptron Parameters¶

Table 3. Region Query Fields (“regions” parameter)¶

Table 4. Query Filter Fields (“rfilter” parameter)¶

Table 5. Complete list of Snaptron Fields In Return Format¶

Snaptron

Navigation

Related Topics