VariantKey
5.4.1
Numerical Encoding for Human Genetic Variants
Functions to retrieve genome reference sequences from a binary FASTA file.
Macros | |
#define | ALLELE_MAXSIZE 256 |
Maximum allele length.
#define | NORM_WRONGPOS (-2) |
Normalization: Invalid position.
#define | NORM_INVALID (-1) |
Normalization: Invalid reference.
#define | NORM_OK (0) |
Normalization: The reference allele perfectly matches the genome reference.
#define | NORM_VALID (1) |
Normalization: The reference allele is inconsistent with the genome reference (i.e. when it contains nucleotide letters other than A, C, G and T).
#define | NORM_SWAP (1 << 1) |
Normalization: The alleles have been swapped.
#define | NORM_FLIP (1 << 2) |
Normalization: The allele nucleotides have been flipped (each nucleotide has been replaced with its complement).
#define | NORM_LEXT (1 << 3) |
Normalization: Alleles have been left extended.
#define | NORM_RTRIM (1 << 4) |
Normalization: Alleles have been right trimmed.
#define | NORM_LTRIM (1 << 5) |
Normalization: Alleles have been left trimmed.
Functions | |
static void | mmap_genoref_file (const char *file, mmfile_t *mf) |
static int | aztoupper (int c) |
static void | prepend_char (const uint8_t pre, char *string, size_t *size) |
static char | get_genoref_seq (mmfile_t mf, uint8_t chrom, uint32_t pos) |
static int | check_reference (mmfile_t mf, uint8_t chrom, uint32_t pos, const char *ref, size_t sizeref) |
static void | flip_allele (char *allele, size_t size) |
static void | swap_sizes (size_t *first, size_t *second) |
static void | swap_alleles (char *first, size_t *sizefirst, char *second, size_t *sizesecond) |
static int | normalize_variant (mmfile_t mf, uint8_t chrom, uint32_t *pos, char *ref, size_t *sizeref, char *alt, size_t *sizealt) |
static uint64_t | normalized_variantkey (mmfile_t mf, const char *chrom, size_t sizechrom, uint32_t *pos, uint8_t posindex, char *ref, size_t *sizeref, char *alt, size_t *sizealt, int *ret) |
Returns a normalized 64-bit variant key based on CHROM, POS, REF, ALT.
The functions provided here retrieve genome reference sequences from a binary version of a genome reference FASTA file.
The input reference binary files can be generated from a FASTA file using the resources/tools/fastabin.sh script.
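The normalization macros defined in this file can be read as two error codes (negative values) plus a set of bit flags. The sketch below shows one way a caller might inspect such a value. It assumes that non-negative results are a bitwise OR of the flags, which the shift-based definitions suggest but this page does not state outright. The helper names `norm_is_error` and `norm_has_flag` are hypothetical and not part of the library.

```c
/* Values copied from the macro list in this file. */
#define NORM_WRONGPOS (-2)
#define NORM_INVALID (-1)
#define NORM_OK (0)
#define NORM_VALID (1)
#define NORM_SWAP (1 << 1)
#define NORM_FLIP (1 << 2)
#define NORM_LEXT (1 << 3)
#define NORM_RTRIM (1 << 4)
#define NORM_LTRIM (1 << 5)

/* Hypothetical helper: true when the code is one of the negative error values. */
static int norm_is_error(int code)
{
    return code < 0;
}

/* Hypothetical helper: true when a non-error code carries the given flag.
 * The sign check matters because -1 has every bit set in two's complement,
 * so a plain mask test would report every flag for NORM_INVALID. */
static int norm_has_flag(int code, int flag)
{
    return (code >= 0) && ((code & flag) != 0);
}
```

Checking for an error first and only then testing individual flags keeps the two kinds of value from being confused.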
static int aztoupper (int c)
inline static
Returns the uppercase version of the input character. Note that this is safe to use only with a-z characters: every character above 'a' will be changed, whether or not it is a letter.
c | Character to uppercase. |
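The caveat about characters above 'a' is easier to see with a concrete sketch. The version below is a hypothetical re-implementation, assuming the conversion is done by toggling the ASCII case bit for any code at or above 'a'; the library's actual source may differ. The name `sketch_aztoupper` is invented for illustration.

```c
/* Hypothetical re-implementation, not the library source.
 * Assumes ASCII: 'a' - 'A' is 0x20, the case bit. */
static int sketch_aztoupper(int c)
{
    if (c >= 'a')
    {
        /* Toggle the case bit. Correct for a-z, but it also
         * alters punctuation such as '{' that sorts after 'z'. */
        return c ^ ('a' - 'A');
    }
    return c;
}
```

Under this assumption 'g' becomes 'G' and 'G' is left alone, while '{' (0x7B) is turned into '[' (0x5B), which is why the input should be restricted to letters.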
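static int check_reference (mmfile_t mf, uint8_t chrom, uint32_t pos, const char *ref, size_t sizeref)
inline static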
Check if the reference allele matches the reference genome data.
mf | Structure containing the memory mapped file. |
chrom | Encoded Chromosome number (see encode_chrom). |
pos | Position. The reference position, with the first base having position 0. |
ref | Reference allele. String containing a sequence of nucleotide letters. |
sizeref | Length of the ref string, excluding the terminating null byte. |
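The check can be illustrated without a memory-mapped file. The sketch below compares a reference allele against a plain in-memory sequence for a single chromosome and returns the status codes listed among the macros. It is a simplified illustration under stated assumptions: the genome is an uppercase string, and an IUPAC ambiguity code in the allele counts as consistent when it covers the genome base. The names `sketch_check_reference` and `iupac_bases` are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define NORM_WRONGPOS (-2)
#define NORM_INVALID (-1)
#define NORM_OK (0)
#define NORM_VALID (1)

/* Hypothetical helper: bases covered by an IUPAC ambiguity code. */
static const char *iupac_bases(char code)
{
    switch (code)
    {
    case 'R': return "AG";
    case 'Y': return "CT";
    case 'S': return "CG";
    case 'W': return "AT";
    case 'K': return "GT";
    case 'M': return "AC";
    case 'B': return "CGT";
    case 'D': return "AGT";
    case 'H': return "ACT";
    case 'V': return "ACG";
    case 'N': return "ACGT";
    default:  return "";
    }
}

/* Hypothetical sketch: `genome` stands in for one chromosome, 0-based. */
static int sketch_check_reference(const char *genome, size_t genomelen,
                                  uint32_t pos, const char *ref, size_t sizeref)
{
    int status = NORM_OK;
    for (size_t i = 0; i < sizeref; i++)
    {
        size_t gpos = (size_t)pos + i;
        if (gpos >= genomelen)
        {
            return NORM_WRONGPOS; /* allele runs past the sequence */
        }
        char g = genome[gpos];
        char r = (char)toupper((unsigned char)ref[i]);
        if (r == g)
        {
            continue; /* exact match at this position */
        }
        if (g != '\0' && strchr(iupac_bases(r), g) != NULL)
        {
            status = NORM_VALID; /* compatible, but not an exact match */
            continue;
        }
        return NORM_INVALID;
    }
    return status;
}
```

With the sequence "ACGTACGT", the allele "GT" at position 2 yields NORM_OK, "RT" yields NORM_VALID because R covers G, and "CC" yields NORM_INVALID.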
static void flip_allele (char *allele, size_t size)
inline static
Flip the allele nucleotides (replaces each letter with its complement). The resulting string is always uppercase. Supports extended nucleotide letters.
allele | Allele. String containing a sequence of nucleotide letters. |
size | Length of the allele string. |
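As an illustration of flipping, the sketch below complements each letter in place using the standard IUPAC pairs (A/T, C/G, R/Y, K/M, B/V, D/H, with S, W and N unchanged). It is a hypothetical version, not the library source; in particular, leaving unknown characters as their uppercase form is an assumption. Note that the order of the letters is preserved: this is a complement, not a reverse complement.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase complement of one nucleotide letter. */
static char sketch_complement(char c)
{
    switch (toupper((unsigned char)c))
    {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'U': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'R': return 'Y';
    case 'Y': return 'R';
    case 'K': return 'M';
    case 'M': return 'K';
    case 'B': return 'V';
    case 'V': return 'B';
    case 'D': return 'H';
    case 'H': return 'D';
    default:  return (char)toupper((unsigned char)c); /* S, W, N, others */
    }
}

/* Hypothetical sketch: complement `size` letters of `allele` in place. */
static void sketch_flip_allele(char *allele, size_t size)
{
    for (size_t i = 0; i < size; i++)
    {
        allele[i] = sketch_complement(allele[i]);
    }
    allele[size] = '\0'; /* assumes the buffer has room for the terminator */
}
```

For example, the buffer "acgt" becomes "TGCA" after the call.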
static char get_genoref_seq (mmfile_t mf, uint8_t chrom, uint32_t pos)
inline static
Returns the genome reference nucleotide at the specified chromosome and position.
mf | Structure containing the memory mapped file. |
chrom | Encoded Chromosome number (see encode_chrom). |
pos | Position. The reference position, with the first base having position 0. |
static void mmap_genoref_file (const char *file, mmfile_t *mf)
inline static
Memory map the genoref binary file.
file | Path to the file to map. |
mf | Structure containing the memory mapped file. |
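static int normalize_variant (mmfile_t mf, uint8_t chrom, uint32_t *pos, char *ref, size_t *sizeref, char *alt, size_t *sizealt)
inline static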
Normalize a variant. Flip alleles if required and apply the normalization algorithm described at: https://genome.sph.umich.edu/wiki/Variant_Normalization
mf | Structure containing the memory mapped file. |
chrom | Encoded chromosome number (see encode_chrom). |
pos | Position. The reference position, with the first base having position 0. |
ref | Reference allele. String containing a sequence of nucleotide letters. |
sizeref | Length of the ref string, excluding the terminating null byte. |
alt | Alternate non-reference allele string. |
sizealt | Length of the alt string, excluding the terminating null byte. |
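The trimming and left-alignment part of the referenced algorithm can be sketched on an in-memory sequence. The code below is a simplified illustration, not the library implementation: it omits the reference check, allele swap and flip steps, uses a plain string in place of the memory-mapped file, and assumes that ref and alt differ and that both buffers have spare capacity for left extension. The names `sketch_trim_and_align` and `push_front` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NORM_LEXT (1 << 3)
#define NORM_RTRIM (1 << 4)
#define NORM_LTRIM (1 << 5)

/* Hypothetical helper: insert `c` at the front of a NUL-terminated buffer. */
static void push_front(char c, char *s, size_t *size)
{
    memmove(s + 1, s, *size + 1); /* shift characters and terminator */
    s[0] = c;
    (*size)++;
}

/* Hypothetical sketch: `genome` is one chromosome, `pos` is 0-based. */
static int sketch_trim_and_align(const char *genome, uint32_t *pos,
                                 char *ref, size_t *sizeref,
                                 char *alt, size_t *sizealt)
{
    int status = 0;
    int changed = 1;
    while (changed)
    {
        changed = 0;
        /* Right trim: drop a shared trailing nucleotide. */
        if (*sizeref > 0 && *sizealt > 0 &&
            ref[*sizeref - 1] == alt[*sizealt - 1])
        {
            ref[--(*sizeref)] = '\0';
            alt[--(*sizealt)] = '\0';
            status |= NORM_RTRIM;
            changed = 1;
        }
        /* Left extend: an empty allele borrows the preceding genome base. */
        if ((*sizeref == 0 || *sizealt == 0) && *pos > 0)
        {
            (*pos)--;
            char left = genome[*pos];
            push_front(left, ref, sizeref);
            push_front(left, alt, sizealt);
            status |= NORM_LEXT;
            changed = 1;
        }
    }
    /* Left trim: drop a shared leading nucleotide while both keep one base. */
    while (*sizeref > 1 && *sizealt > 1 && ref[0] == alt[0])
    {
        memmove(ref, ref + 1, *sizeref); /* remaining chars plus terminator */
        memmove(alt, alt + 1, *sizealt);
        (*sizeref)--;
        (*sizealt)--;
        (*pos)++;
        status |= NORM_LTRIM;
    }
    return status;
}
```

With the sequence "GGGCACACACAGGG", a deletion written as ref "AC", alt "" at 0-based position 8 is shifted left through the CA repeat and ends as ref "GCA", alt "G" at position 2.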
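static uint64_t normalized_variantkey (mmfile_t mf, const char *chrom, size_t sizechrom, uint32_t *pos, uint8_t posindex, char *ref, size_t *sizeref, char *alt, size_t *sizealt, int *ret)
inline static
Returns a normalized 64-bit variant key based on CHROM, POS, REF, ALT.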
mf | Structure containing the memory mapped binary fasta file. |
chrom | Chromosome. An identifier from the reference genome, no white-space or leading zeros permitted. |
sizechrom | Length of the chrom string, excluding the terminating null byte. |
pos | Position. The reference position. |
posindex | Position index: 0 for 0-based, 1 for 1-based. |
ref | Reference allele. String containing a sequence of nucleotide letters. The value in the pos field refers to the position of the first nucleotide in the string. Characters must be A-Z, a-z or *. |
sizeref | Length of the ref string, excluding the terminating null byte. |
alt | Alternate non-reference allele string. Characters must be A-Z, a-z or *. |
sizealt | Length of the alt string, excluding the terminating null byte. |
ret | Normalization return value (see: normalize_variant). |
static void prepend_char (const uint8_t pre, char *string, size_t *size)
inline static
Prepend a character to a string.
pre | Character to prepend. |
string | String to modify. |
size | Input string length. |
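static void swap_alleles (char *first, size_t *sizefirst, char *second, size_t *sizesecond)
inline static
static void swap_sizes (size_t *first, size_t *second)
inline static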