VariantKey
5.4.1
Numerical Encoding for Human Genetic Variants
Functions to retrieve genome reference sequences from a binary FASTA file.
Macros | |
| #define | ALLELE_MAXSIZE 256 |
| Maximum allele length. | |
| #define | NORM_WRONGPOS (-2) |
| Normalization: Invalid position. | |
| #define | NORM_INVALID (-1) |
| Normalization: Invalid reference. | |
| #define | NORM_OK (0) |
| Normalization: The reference allele perfectly matches the genome reference. | |
| #define | NORM_VALID (1) |
| Normalization: The reference allele is inconsistent with the genome reference (i.e. it contains nucleotide letters other than A, C, G and T). | |
| #define | NORM_SWAP (1 << 1) |
| Normalization: The alleles have been swapped. | |
| #define | NORM_FLIP (1 << 2) |
| Normalization: The allele nucleotides have been flipped (each nucleotide has been replaced with its complement). | |
| #define | NORM_LEXT (1 << 3) |
| Normalization: Alleles have been left extended. | |
| #define | NORM_RTRIM (1 << 4) |
| Normalization: Alleles have been right trimmed. | |
| #define | NORM_LTRIM (1 << 5) |
| Normalization: Alleles have been left trimmed. | |
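The non-negative NORM_* values are distinct bits, which suggests they can be combined in a single return code. The following is a minimal sketch of how a caller might decode such a code, assuming the flags are OR-ed together and negative values are plain error codes. The helper names `append_label` and `describe_norm_code` are hypothetical and not part of this file.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from the macro list above. */
#define NORM_WRONGPOS (-2)
#define NORM_INVALID (-1)
#define NORM_OK (0)
#define NORM_VALID (1)
#define NORM_SWAP (1 << 1)
#define NORM_FLIP (1 << 2)
#define NORM_LEXT (1 << 3)
#define NORM_RTRIM (1 << 4)
#define NORM_LTRIM (1 << 5)

/* Hypothetical helper: append a comma-separated label without overflowing buf. */
static void append_label(char *buf, size_t cap, const char *label)
{
    size_t used = strlen(buf);
    if (used > 0 && used + 1 < cap) {
        buf[used++] = ',';
        buf[used] = '\0';
    }
    if (used + 1 < cap) {
        strncat(buf, label, cap - used - 1);
    }
}

/* Hypothetical helper: write a readable summary of a normalization code
 * into buf (capacity cap) and return buf.
 * Assumption: non-negative codes are OR-ed bit flags. */
static char *describe_norm_code(int code, char *buf, size_t cap)
{
    if (cap == 0) {
        return buf;
    }
    buf[0] = '\0';
    if (code < 0) {
        append_label(buf, cap, (code == NORM_WRONGPOS) ? "wrong-position" : "invalid-reference");
        return buf;
    }
    append_label(buf, cap, (code & NORM_VALID) ? "valid" : "ok");
    if (code & NORM_SWAP) append_label(buf, cap, "swap");
    if (code & NORM_FLIP) append_label(buf, cap, "flip");
    if (code & NORM_LEXT) append_label(buf, cap, "lext");
    if (code & NORM_RTRIM) append_label(buf, cap, "rtrim");
    if (code & NORM_LTRIM) append_label(buf, cap, "ltrim");
    return buf;
}
```

Testing individual bits with `&` rather than comparing with `==` keeps the check correct when several normalization steps were applied to the same variant.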
Functions | |
| static void | mmap_genoref_file (const char *file, mmfile_t *mf) |
| static int | aztoupper (int c) |
| static void | prepend_char (const uint8_t pre, char *string, size_t *size) |
| static char | get_genoref_seq (mmfile_t mf, uint8_t chrom, uint32_t pos) |
| static int | check_reference (mmfile_t mf, uint8_t chrom, uint32_t pos, const char *ref, size_t sizeref) |
| static void | flip_allele (char *allele, size_t size) |
| static void | swap_sizes (size_t *first, size_t *second) |
| static void | swap_alleles (char *first, size_t *sizefirst, char *second, size_t *sizesecond) |
| static int | normalize_variant (mmfile_t mf, uint8_t chrom, uint32_t *pos, char *ref, size_t *sizeref, char *alt, size_t *sizealt) |
| static uint64_t | normalized_variantkey (mmfile_t mf, const char *chrom, size_t sizechrom, uint32_t *pos, uint8_t posindex, char *ref, size_t *sizeref, char *alt, size_t *sizealt, int *ret) |
| Returns a normalized 64-bit variant key based on CHROM, POS, REF, ALT. | |
The functions provided here retrieve genome reference sequences from a binary version of a genome reference FASTA file.
The input reference binary files can be generated from a FASTA file using the resources/tools/fastabin.sh script.
| static int aztoupper (int c) | inline static |
Returns the uppercase version of the input character. This is safe only for a-z characters: every character above 'a' is changed, whether or not it is a letter.
| c | Character to uppercase. |
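The caveat above can be illustrated with a minimal sketch, assuming ASCII and assuming the conversion works by toggling the case bit for any value at or above 'a'. The name `sketch_aztoupper` is hypothetical and the body is not taken from this file.

```c
/* Hypothetical stand-in for aztoupper.
 * Assumption: ASCII encoding, where 'a' - 'A' == 0x20 is the case bit. */
static int sketch_aztoupper(int c)
{
    if (c >= 'a') {
        /* Toggle the case bit; no upper bound is checked. */
        return c ^ ('a' - 'A');
    }
    return c;
}
```

With this approach, letters a-z map to A-Z and uppercase input is returned unchanged, but a non-letter such as '{' is also modified, which is why the function should only be fed a-z or A-Z characters.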
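| static int check_reference (mmfile_t mf, uint8_t chrom, uint32_t pos, const char *ref, size_t sizeref) | inline static |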
Check if the reference allele matches the reference genome data.
| mf | Structure containing the memory mapped file. |
| chrom | Encoded Chromosome number (see encode_chrom). |
| pos | Position. The reference position, with the first base having position 0. |
| ref | Reference allele. String containing a sequence of nucleotide letters. |
| sizeref | Length of the ref string, excluding the terminating null byte. |
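A minimal sketch of this kind of check follows. It compares against a plain in-memory string instead of the memory mapped file, and it treats extended (IUPAC) letters as compatible when their base sets overlap. Both choices are assumptions made for illustration, and `iupac_mask` and `sketch_check_reference` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define NORM_WRONGPOS (-2)
#define NORM_INVALID (-1)
#define NORM_OK (0)
#define NORM_VALID (1)

/* Hypothetical helper: bit set of bases (A=1, C=2, G=4, T=8) for an IUPAC letter. */
static int iupac_mask(char c)
{
    switch (c) {
    case 'A': return 1;  case 'C': return 2;  case 'G': return 4;  case 'T': return 8;
    case 'M': return 3;  case 'R': return 5;  case 'S': return 6;  case 'V': return 7;
    case 'W': return 9;  case 'Y': return 10; case 'H': return 11; case 'K': return 12;
    case 'D': return 13; case 'B': return 14; case 'N': return 15;
    default:  return 0;
    }
}

/* Hypothetical stand-in for check_reference.
 * Assumption: genome is an uppercase sequence of genome_len bases, 0-based. */
static int sketch_check_reference(const char *genome, size_t genome_len,
                                  uint32_t pos, const char *ref, size_t sizeref)
{
    if ((size_t)pos + sizeref > genome_len) {
        return NORM_WRONGPOS; /* allele runs past the end of the sequence */
    }
    int ret = NORM_OK;
    for (size_t i = 0; i < sizeref; i++) {
        char r = (char)toupper((unsigned char)ref[i]);
        char g = genome[pos + i];
        if (r == g) {
            continue; /* exact match at this position */
        }
        if ((iupac_mask(r) & iupac_mask(g)) == 0) {
            return NORM_INVALID; /* no base in common */
        }
        ret = NORM_VALID; /* compatible only through an extended letter */
    }
    return ret;
}
```

For the sequence "ACGTACGT", the allele "GT" at position 2 yields NORM_OK, "NC" at position 0 yields NORM_VALID, and "AG" at position 0 yields NORM_INVALID.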
| static void flip_allele (char *allele, size_t size) | inline static |
Flip the allele nucleotides (replaces each letter with its complement). The resulting string is always in uppercase. Extended nucleotide letters are supported.
| allele | Allele. String containing a sequence of nucleotide letters. |
| size | Length of the allele string. |
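As an illustration of complementing with extended letters, here is a minimal sketch using the standard IUPAC complement pairs. The names `complement_base` and `sketch_flip_allele` are hypothetical, and the handling of unknown characters (uppercased and otherwise left as-is) is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase IUPAC complement of one nucleotide letter. */
static char complement_base(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'R': return 'Y'; /* A/G <-> C/T */
    case 'Y': return 'R';
    case 'K': return 'M'; /* G/T <-> A/C */
    case 'M': return 'K';
    case 'B': return 'V'; /* C/G/T <-> A/C/G */
    case 'V': return 'B';
    case 'D': return 'H'; /* A/G/T <-> A/C/T */
    case 'H': return 'D';
    default:
        /* S, W and N are their own complement; anything else is only uppercased. */
        return (char)toupper((unsigned char)c);
    }
}

/* Hypothetical stand-in for flip_allele: complements in place, without reversing. */
static void sketch_flip_allele(char *allele, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        allele[i] = complement_base(allele[i]);
    }
}
```

Note that the letters are complemented in place and their order is kept, so this is a complement and not a reverse complement.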
| static char get_genoref_seq (mmfile_t mf, uint8_t chrom, uint32_t pos) | inline static |
Returns the genome reference nucleotide at the specified chromosome and position.
| mf | Structure containing the memory mapped file. |
| chrom | Encoded Chromosome number (see encode_chrom). |
| pos | Position. The reference position, with the first base having position 0. |
| static void mmap_genoref_file (const char *file, mmfile_t *mf) | inline static |
Memory map the genoref binary file.
| file | Path to the file to map. |
| mf | Structure containing the memory mapped file. |
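| static int normalize_variant (mmfile_t mf, uint8_t chrom, uint32_t *pos, char *ref, size_t *sizeref, char *alt, size_t *sizealt) | inline static |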
Normalize a variant. Flip alleles if required and apply the normalization algorithm described at: https://genome.sph.umich.edu/wiki/Variant_Normalization
| mf | Structure containing the memory mapped file. |
| chrom | Chromosome encoded number. |
| pos | Position. The reference position, with the first base having position 0. |
| ref | Reference allele. String containing a sequence of nucleotide letters. |
| sizeref | Length of the ref string, excluding the terminating null byte. |
| alt | Alternate non-reference allele string. |
| sizealt | Length of the alt string, excluding the terminating null byte. |
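The trimming part of the normalization can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation in this file: it only right-trims and left-trims shared nucleotides while both alleles keep at least one letter, and it omits reference checking, swapping, flipping and left extension, all of which need the genome reference. The name `sketch_trim_alleles` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NORM_RTRIM (1 << 4)
#define NORM_LTRIM (1 << 5)

/* Hypothetical helper: trims nucleotides shared by both alleles, in place.
 * Returns an OR-ed combination of NORM_RTRIM and NORM_LTRIM (0 if unchanged). */
static int sketch_trim_alleles(uint32_t *pos, char *ref, size_t *sizeref,
                               char *alt, size_t *sizealt)
{
    int status = 0;

    /* Right trim: drop identical trailing letters, keeping at least one each. */
    while (*sizeref > 1 && *sizealt > 1 && ref[*sizeref - 1] == alt[*sizealt - 1]) {
        (*sizeref)--;
        (*sizealt)--;
        status |= NORM_RTRIM;
    }
    ref[*sizeref] = '\0';
    alt[*sizealt] = '\0';

    /* Left trim: count identical leading letters, keeping at least one each. */
    size_t offset = 0;
    while (offset + 1 < *sizeref && offset + 1 < *sizealt && ref[offset] == alt[offset]) {
        offset++;
    }
    if (offset > 0) {
        *pos += (uint32_t)offset; /* the variant now starts further right */
        *sizeref -= offset;
        *sizealt -= offset;
        memmove(ref, ref + offset, *sizeref + 1); /* +1 moves the terminator */
        memmove(alt, alt + offset, *sizealt + 1);
        status |= NORM_LTRIM;
    }
    return status;
}
```

For example, REF "ATG" and ALT "ATC" at position 10 become "G" and "C" at position 12 with NORM_LTRIM set, while REF "GCACA" and ALT "GCA" become "GCA" and "G" at the same position with NORM_RTRIM set.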
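| static uint64_t normalized_variantkey (mmfile_t mf, const char *chrom, size_t sizechrom, uint32_t *pos, uint8_t posindex, char *ref, size_t *sizeref, char *alt, size_t *sizealt, int *ret) | inline static |
Returns a normalized 64-bit variant key based on CHROM, POS, REF, ALT.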
| mf | Structure containing the memory mapped binary fasta file. |
| chrom | Chromosome. An identifier from the reference genome, no white-space or leading zeros permitted. |
| sizechrom | Length of the chrom string, excluding the terminating null byte. |
| pos | Position. The reference position. |
| posindex | Position index: 0 for 0-based, 1 for 1-based. |
| ref | Reference allele. String containing a sequence of nucleotide letters. The value in the pos field refers to the position of the first nucleotide in the string. Characters must be A-Z, a-z or *. |
| sizeref | Length of the ref string, excluding the terminating null byte. |
| alt | Alternate non-reference allele string. Characters must be A-Z, a-z or *. |
| sizealt | Length of the alt string, excluding the terminating null byte. |
| ret | Normalization return value (see: normalize_variant). |
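The other functions in this file take 0-based positions, so the posindex parameter presumably lets callers pass 1-based positions (as used in VCF files) and have them converted. A minimal sketch of such a conversion, assuming it is a plain subtraction; the helper name `to_zero_based` and its error handling are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert *pos to 0-based in place.
 * posindex is 0 for 0-based input and 1 for 1-based input.
 * Returns 0 on success, -1 if the arguments cannot be converted. */
static int to_zero_based(uint32_t *pos, uint8_t posindex)
{
    if (posindex > 1 || *pos < posindex) {
        return -1; /* unknown index base, or 1-based position 0 */
    }
    *pos -= posindex;
    return 0;
}
```

The guard avoids an unsigned wrap-around when a 1-based position of 0 is passed by mistake.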
| static void prepend_char (const uint8_t pre, char *string, size_t *size) | inline static |
Prepend a character to a string.
| pre | Character to prepend. |
| string | String to modify. |
| size | Input string length. |
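One common way to prepend a character in place is to shift the string right by one byte and write the new character at the front. The sketch below assumes the buffer has room for at least one more byte beyond the current string and its terminator (for example a buffer of ALLELE_MAXSIZE bytes), and that size is updated to the new length. The name `sketch_prepend_char` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for prepend_char.
 * Assumption: string has capacity for *size + 2 bytes. */
static void sketch_prepend_char(const uint8_t pre, char *string, size_t *size)
{
    /* Shift the existing characters and the terminator one byte to the right. */
    memmove(string + 1, string, *size + 1);
    string[0] = (char)pre;
    (*size)++;
}
```

`memmove` is used rather than `memcpy` because the source and destination ranges overlap.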
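| static void swap_alleles (char *first, size_t *sizefirst, char *second, size_t *sizesecond) | inline static |
| static void swap_sizes (size_t *first, size_t *second) | inline static |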