Filtering variants

Filter area

Special filters

The first four filters consist of a check box and a labeled entry field. The check box controls whether the filter is turned on or off: If the box is unchecked, the content of the corresponding entry field is grayed out and ignored.

Restrict to genes
Either type gene names directly into the entry field (separated by commas if more than one) or specify a file containing gene names. The file should have a single column, no header, and one gene name per row. If this filter is switched on, variants are excluded if their gene does not give an exact match to one of the specified genes. If a variant is associated with more than one gene, it will pass the filter if at least one of the genes gives a match.

Exclude genes
This filter is used to avoid certain genes, for instance known sources of false positives. As in the "Restrict to genes" field, you can enter gene names directly, or indicate a file with gene names. If the filter is switched on, variants are excluded if their associated gene is on the list. If a variant has more than one associated gene, it is excluded if at least one of them is on the list.
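The matching rules of the two gene filters can be sketched as follows. This is only an illustration, not FILTUS code: the function names are hypothetical, and it assumes that the genes of a variant are available as a list of names.

```python
def parse_gene_entry(text):
    """Split a comma-separated entry such as 'TTN, MUC16' into a set of gene names."""
    return {name.strip() for name in text.split(",") if name.strip()}


def passes_restrict_to_genes(variant_genes, gene_set):
    """Keep the variant if at least one of its genes exactly matches a listed gene."""
    return any(gene in gene_set for gene in variant_genes)


def passes_exclude_genes(variant_genes, gene_set):
    """Keep the variant only if none of its genes is on the list."""
    return not any(gene in gene_set for gene in variant_genes)


genes = parse_gene_entry("TTN, MUC16")
variant_genes = ["TTN", "TTN-AS1"]  # hypothetical variant with two associated genes

kept_by_restrict = passes_restrict_to_genes(variant_genes, genes)
kept_by_exclude = passes_exclude_genes(variant_genes, genes)
```

With the same gene list, this variant passes "Restrict to genes" (one of its genes matches) but is removed by "Exclude genes" (one of its genes is on the list).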

Exclude variants
This field expects the file name of a variant database. Typically this file has been created beforehand by FILTUS using the database functionality, but this is not a requirement. Any variant format will work, as long as FILTUS manages to read it and correctly guesses which columns contain 1) chromosome number and 2) position. Other columns in the file are ignored. If this filter is switched on, variants are excluded if they give an exact match to both chromosome and position.
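The matching on chromosome and position can be pictured as a set lookup. The sketch below is an illustration only: the dictionary keys "chrom" and "pos" are hypothetical stand-ins for whatever columns are detected, and the function names are not part of FILTUS.

```python
def build_exclusion_set(database_rows):
    """Collect (chromosome, position) pairs; all other columns are ignored."""
    return {(str(row["chrom"]), int(row["pos"])) for row in database_rows}


def apply_exclude_variants(variants, exclusion_set):
    """Drop every variant whose chromosome AND position both match a database entry."""
    return [
        v for v in variants
        if (str(v["chrom"]), int(v["pos"])) not in exclusion_set
    ]


database = [{"chrom": "1", "pos": 12345, "note": "ignored column"}]
variants = [
    {"chrom": "1", "pos": 12345},  # same chromosome and position: excluded
    {"chrom": "2", "pos": 12345},  # same position, other chromosome: kept
    {"chrom": "1", "pos": 12346},  # same chromosome, other position: kept
]

remaining = apply_exclude_variants(variants, build_exclusion_set(database))
```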

Restrict to regions
Here you can specify a file containing genomic regions. The file should have no headers and exactly 3 columns (separated by tab or space). Each line defines a genomic region by stating chromosome number, start and stop positions. For example, the file could look like this:
2 1000 5000
X 0 2e8
Using this file would keep only variants on chromosome 2 between positions 1000 and 5000, and all variants on the X chromosome (since the final number is bigger than the physical length of X). Note that the chromosome numbers must be written as in the variant files: If the variant files use the notation 'chr1', 'chr2', ..., then so must the region file.
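As an illustration, the example file above could be read and applied like this. The function names are hypothetical, and the sketch assumes that start and stop positions are both included in the region, which is not stated explicitly here.

```python
def parse_regions(text):
    """Parse lines of 'chromosome start stop' separated by tabs or spaces."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected exactly 3 columns: %r" % line)
        chrom, start, stop = fields
        # int(float(...)) also accepts scientific notation such as '2e8'
        regions.append((chrom, int(float(start)), int(float(stop))))
    return regions


def in_any_region(chrom, pos, regions):
    """True if the position lies in at least one region (bounds assumed inclusive)."""
    return any(chrom == c and start <= pos <= stop for c, start, stop in regions)


regions = parse_regions("2 1000 5000\nX\t0\t2e8\n")
```

Because the chromosome is compared as plain text, a variant on 'chr2' does not match a region given as '2'.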

Column filters

These basic filters work directly on columns in each of the loaded samples. Each column filter is defined by 4 entities: Column name, operator, value and the "keep if missing" (KIM) check box.

Column name
Clicking on the first button lets you choose among all column names present in any of the loaded files. If you choose a column name which is missing in some samples, the result is highly dependent on the KIM-box (see below).

Operator
The choices here should be self-explanatory. Note: To switch off a particular column filter, simply clear either its operator or its column name button.

Value
This is the character string or number you want to compare with the entries of the specified column. For all operators except greater than and less than you can use simple combinations using the keywords AND or OR. For the operators contains and does not contain there is also support for regular expressions: The keyword REGEX indicates that whatever comes after it is a regular expression pattern.
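To give an idea of how such values could be interpreted for the contains operator, here is a rough sketch. The exact syntax rules are assumptions made for this illustration (keywords in upper case surrounded by spaces, REGEX at the start of the value, no mixing of AND and OR in one value), and the function is not FILTUS code.

```python
import re


def entry_contains(entry, value):
    """Evaluate a 'contains' filter value against a single column entry."""
    value = value.strip()
    if value.startswith("REGEX"):
        # Everything after the keyword is treated as a regular expression pattern.
        pattern = value[len("REGEX"):].strip()
        return re.search(pattern, entry) is not None
    if " AND " in value:
        return all(part.strip() in entry for part in value.split(" AND "))
    if " OR " in value:
        return any(part.strip() in entry for part in value.split(" OR "))
    return value in entry


# Hypothetical entries from an annotation column
either = entry_contains("missense_variant", "missense OR nonsense")
both = entry_contains("splice_region,intron", "splice AND intron")
by_pattern = entry_contains("stop_gained", "REGEX ^(stop|start)_")
```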

Keep if missing (KIM)
This check box controls what happens to variants having an empty entry in the specified column. If the KIM-box is checked, these variants are kept; otherwise they are excluded. Typical example: When filtering on a column with variant frequencies (e.g. 1000 Genomes Project), empty entries often mean a frequency of 0. However, a column filter like "1000G - less than - 0.01" returns FALSE for empty entries, so the default action is to remove such variants. To keep them, make sure the KIM-box is checked.
The KIM-box also determines the fate of samples not containing the specified column. If the KIM-box is checked, that particular column filter is simply ignored for such a sample. If it is unchecked, all variants in the sample are excluded (and a warning is displayed).
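The two roles of the KIM-box can be summarised in a small sketch of a "less than" column filter. The names are hypothetical and the code only illustrates the rules described above; the warning shown for a missing column is left out.

```python
def keep_variant(variant, column, threshold, keep_if_missing):
    """Decide one variant for the filter '<column> - less than - <threshold>'."""
    entry = variant.get(column, "")
    if entry == "":
        return keep_if_missing  # empty entry: the KIM-box decides
    return float(entry) < threshold


def filter_sample(variants, sample_columns, column, threshold, keep_if_missing):
    """Apply the column filter to all variants of one sample."""
    if column not in sample_columns:
        # Column absent from the whole sample: ignore the filter or drop everything.
        return list(variants) if keep_if_missing else []
    return [v for v in variants if keep_variant(v, column, threshold, keep_if_missing)]


sample = [{"1000G": "0.2"}, {"1000G": "0.001"}, {"1000G": ""}]

kim_checked = filter_sample(sample, ["1000G"], "1000G", 0.01, keep_if_missing=True)
kim_unchecked = filter_sample(sample, ["1000G"], "1000G", 0.01, keep_if_missing=False)
```

With the KIM-box checked, the rare variant and the variant with the empty entry are both kept. With the box unchecked, only the rare variant remains.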