Filtering variants
Special filters
The first four filters each consist of a check box and a labeled
entry field. The check box controls whether the filter is turned on or off:
if the box is unchecked, the content of the corresponding entry field
is grayed out and ignored.
- Restrict to genes
- Either type gene names directly into the entry field (separated by
commas if there is more than one) or specify a file containing gene
names. The file should have a single column, no header, and one gene
name per row. If this filter is switched on, a variant is excluded
unless its gene exactly matches one of the specified genes. If a
variant is associated with more than one gene, it passes the filter
if at least one of its genes matches.
- Exclude genes
- This filter is used to exclude certain genes, for instance known
sources of false positives. As with the "Restrict to genes" field, you
can enter gene names directly or specify a file with gene names. If the
filter is switched on, variants are excluded if their associated gene is
on the list. If a variant has more than one associated gene, it is
excluded if at least one of them is on the list.
- Exclude variants
- This field expects the file name of a variant database. Typically
this file has been created with the FILTUS database functionality, but
this is not a requirement. Any variant file format will work, as long
as FILTUS can read it and correctly guess which columns contain 1) the
chromosome number and 2) the position. Other columns in the file are
ignored. If this filter is switched on, a variant is excluded if both
its chromosome and its position exactly match an entry in the database.
- Restrict to regions
- Here you can specify a file containing genomic regions. The file
should have no header and exactly 3 columns (separated by tabs or
spaces). Each line defines a genomic region by stating the chromosome
number followed by the start and stop positions. For example, the file
could look like this:
2 1000 5000
X 0 2e8
Using this file would keep only variants on chromosome 2 between
positions 1000 and 5000, and all variants on the X chromosome (since
the stop position, 2e8 = 200,000,000, exceeds the physical length of X).
Note that the chromosome names must be written as in the variant files:
if the variant files use the notation 'chr1', 'chr2', ..., then so must
the region file.
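The matching rules of the four special filters can be summarized in a few lines of code. The following is a minimal sketch, not FILTUS's own implementation: it assumes each variant is represented by a chromosome string, an integer position and a list of associated gene names, and all function names are hypothetical. Region boundaries are treated as inclusive, which is an assumption.

```python
from typing import Iterable


def parse_gene_field(text: str) -> set[str]:
    """Split gene names typed into the entry field, separated by commas."""
    return {name.strip() for name in text.split(",") if name.strip()}


def passes_restrict_genes(variant_genes: Iterable[str], allowed: set[str]) -> bool:
    """Restrict to genes: keep the variant if at least one gene matches exactly."""
    return any(gene in allowed for gene in variant_genes)


def passes_exclude_genes(variant_genes: Iterable[str], excluded: set[str]) -> bool:
    """Exclude genes: drop the variant if at least one gene is on the list."""
    return not any(gene in excluded for gene in variant_genes)


def passes_exclude_variants(chrom: str, pos: int,
                            database: set[tuple[str, int]]) -> bool:
    """Exclude variants: drop the variant if chromosome AND position both match."""
    return (chrom, pos) not in database


def read_regions(lines: Iterable[str]) -> list[tuple[str, float, float]]:
    """Parse a headerless region file: chromosome, start, stop on each line."""
    regions = []
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue  # skip blank lines
        chrom, start, stop = fields  # raises ValueError unless exactly 3 columns
        # float() accepts scientific notation such as 2e8
        regions.append((chrom, float(start), float(stop)))
    return regions


def passes_regions(chrom: str, pos: int,
                   regions: list[tuple[str, float, float]]) -> bool:
    """Restrict to regions: keep the variant if it lies inside any region.

    Boundaries are inclusive here (an assumption). Chromosome names are
    compared as plain strings, so '2' and 'chr2' do not match.
    """
    return any(chrom == c and start <= pos <= stop for c, start, stop in regions)


if __name__ == "__main__":
    regions = read_regions(["2 1000 5000", "X 0 2e8"])
    print(passes_regions("2", 3000, regions))         # True
    print(passes_regions("2", 6000, regions))         # False
    print(passes_regions("X", 150_000_000, regions))  # True
    print(passes_regions("chr2", 3000, regions))      # False: notation differs
```

The last call illustrates why the region file must use the same chromosome notation as the variant files: the names are compared literally.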
Column filters
These basic filters work directly on the columns of each loaded
sample. Each column filter is defined by four components: column name,
operator, value and the "keep if missing" (KIM) check box.
- Column name
- Clicking the first button lets you choose among all column names
present in any of the loaded files. If you choose a column name that is
missing in some samples, the result depends strongly on the KIM box
(see below).
- Operator
- The choices here should be self-explanatory.
Note: to switch off a particular column filter, simply clear either its
operator or its column name button.
- Value
- This is the character string or number you want to compare with the
entries of the specified column. For all operators except "greater
than" and "less than", you can use simple combinations with the
keywords AND or OR. For the operators "contains" and "does not contain"
there is also support for regular expressions: the keyword REGEX
indicates that whatever follows it is a regular expression pattern.
- Keep if missing (KIM)
- This check box controls what happens to variants with an empty entry
in the specified column. If the KIM box is checked, these variants are
kept; otherwise they are excluded. Typical example: when filtering on a
column with variant frequencies (e.g. from the 1000 Genomes Project),
empty entries often mean a frequency of 0. However, since a column
filter like "1000G - less than - 0.01" returns FALSE for empty entries,
the default action is to remove such variants. To keep them, make sure
the KIM box is checked.
The KIM box also determines the fate of samples not containing the
specified column. If the KIM box is checked, that column filter is
simply ignored for such samples. If it is unchecked, all variants in
such a sample are excluded (and a warning is displayed).
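To make the interplay between operator, value and the KIM box concrete, here is a minimal sketch of how a single column filter could be evaluated on one sample. It is not FILTUS's actual code. The class and function names are hypothetical, rows are assumed to be dictionaries keyed by column name, and several details are assumptions: how AND/OR are combined (OR binds looser than AND, each term tested separately), that "does not contain" negates the whole combined expression, and that non-numeric entries fail "greater than"/"less than".

```python
import re
from dataclasses import dataclass


@dataclass
class ColumnFilter:
    column: str
    operator: str  # "contains", "does not contain", "greater than", "less than"
    value: str
    keep_if_missing: bool = False  # the KIM box


def _contains(entry: str, value: str) -> bool:
    """Evaluate 'contains' with REGEX, OR and AND keywords (assumed semantics)."""
    if value.startswith("REGEX"):
        # Everything after the keyword is treated as a regular expression.
        return re.search(value[len("REGEX"):].strip(), entry) is not None
    if " OR " in value:
        return any(_contains(entry, part.strip()) for part in value.split(" OR "))
    if " AND " in value:
        return all(_contains(entry, part.strip()) for part in value.split(" AND "))
    return value in entry


def passes(entry: str, flt: ColumnFilter) -> bool:
    """Decide whether a single entry passes the filter."""
    if entry.strip() == "":
        return flt.keep_if_missing  # empty entry: the KIM box decides
    if flt.operator == "contains":
        return _contains(entry, flt.value)
    if flt.operator == "does not contain":
        return not _contains(entry, flt.value)
    try:
        x, y = float(entry), float(flt.value)
    except ValueError:
        return False  # non-numeric entry fails the comparison (assumption)
    if flt.operator == "greater than":
        return x > y
    if flt.operator == "less than":
        return x < y
    raise ValueError(f"unknown operator: {flt.operator!r}")


def apply_to_sample(rows: list[dict[str, str]], columns: list[str],
                    flt: ColumnFilter) -> list[dict[str, str]]:
    """Apply one column filter to one sample."""
    if not flt.column or not flt.operator:
        return rows  # an empty column name or operator switches the filter off
    if flt.column not in columns:
        if flt.keep_if_missing:
            return rows  # column absent and KIM checked: filter ignored
        print(f"Warning: column {flt.column!r} not found; all variants excluded")
        return []
    return [row for row in rows if passes(row.get(flt.column, ""), flt)]


if __name__ == "__main__":
    rows = [{"1000G": "0.005"}, {"1000G": ""}, {"1000G": "0.2"}]
    strict = ColumnFilter("1000G", "less than", "0.01")
    lenient = ColumnFilter("1000G", "less than", "0.01", keep_if_missing=True)
    print(len(apply_to_sample(rows, ["1000G"], strict)))   # 1
    print(len(apply_to_sample(rows, ["1000G"], lenient)))  # 2
```

The usage lines mirror the 1000 Genomes example above: with the KIM box unchecked, the variant with an empty frequency is removed; with it checked, that variant is kept.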