VariantKey
5.4.1
Numerical Encoding for Human Genetic Variants
Functions to read VariantKey-rsID binary files.
Data Structures | |
struct | rsidvar_cols_t |
Typedefs | |
typedef struct rsidvar_cols_t | rsidvar_cols_t |
Functions | |
static void | mmap_vkrs_file (const char *file, mmfile_t *mf, rsidvar_cols_t *cvr) |
static void | mmap_rsvk_file (const char *file, mmfile_t *mf, rsidvar_cols_t *crv) |
static uint64_t | find_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *first, uint64_t last, uint32_t rsid) |
static uint64_t | get_next_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *pos, uint64_t last, uint32_t rsid) |
static uint32_t | find_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *first, uint64_t last, uint64_t vk) |
static uint32_t | get_next_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *pos, uint64_t last, uint64_t vk) |
static uint32_t | find_vr_chrompos_range (rsidvar_cols_t cvr, uint64_t *first, uint64_t *last, uint8_t chrom, uint32_t pos_min, uint32_t pos_max) |
The functions provided here allow fast search for rsID and VariantKey values in binary files made of adjacent constant-length binary blocks sorted in ascending order.
rsvk.bin: Lookup table to retrieve VariantKey from rsID. This binary file can be generated from a TSV file by the 'resources/tools/rsvk.sh' script. The file can also be in Apache Arrow File format with a single RecordBatch, or in Feather format. The first column must contain the rsID sorted in ascending order.
vkrs.bin: Lookup table to retrieve rsID from VariantKey. This binary file can be generated from a TSV file by the 'resources/tools/vkrs.sh' script. The file can also be in Apache Arrow File format with a single RecordBatch, or in Feather format. The first column must contain the VariantKey sorted in ascending order.
typedef struct rsidvar_cols_t rsidvar_cols_t |
Struct containing the RSVK or VKRS memory mapped file column info.
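static uint64_t find_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *first, uint64_t last, uint32_t rsid)
inline, static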
Searches for the specified rsID and returns the first matching VariantKey in the RV file.
crv | Structure containing the pointers to the RSVK memory mapped file columns (rsvk.bin). |
first | Pointer to the first element of the range to search (min value = 0). This will hold the position of the first record found. |
last | Element (up to but not including) where to end the search (max value = nitems). |
rsid | rsID to search. |
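The search semantics described above (a half-open range [first, last) over a column sorted in ascending order, with `first` updated to the row that was found) can be sketched as a lower-bound binary search. This is only an illustration under stated assumptions, not the library's implementation: `demo_cols_t` and `demo_find_vk_by_rsid` are hypothetical names, and returning 0 on a miss is a convention chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rsidvar_cols_t: two parallel columns. */
typedef struct demo_cols_t
{
    const uint64_t *vk; /* VariantKey column */
    const uint32_t *rs; /* rsID column, sorted ascending in an RV table */
    uint64_t nrows;     /* number of rows */
} demo_cols_t;

/* Search rows [*first, last) for rsid.
 * On a hit, *first is the index of the first matching row and the
 * VariantKey of that row is returned.
 * On a miss, *first is the insertion point and 0 is returned
 * (assumption of this sketch). */
static uint64_t demo_find_vk_by_rsid(demo_cols_t c, uint64_t *first, uint64_t last, uint32_t rsid)
{
    assert(first != NULL);
    assert((*first <= last) && (last <= c.nrows));
    uint64_t lo = *first;
    uint64_t hi = last;
    while (lo < hi)
    {
        uint64_t mid = lo + ((hi - lo) / 2);
        if (c.rs[mid] < rsid)
        {
            lo = mid + 1; /* everything up to mid is too small */
        }
        else
        {
            hi = mid; /* mid may be the first match, keep it in range */
        }
    }
    *first = lo;
    if ((lo < last) && (c.rs[lo] == rsid))
    {
        return c.vk[lo];
    }
    return 0;
}
```

Because the loop keeps shrinking toward the left-most candidate, duplicate rsIDs resolve to the first row that carries them, which is what allows a follow-up "get next" call to walk the remaining matches.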
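static uint32_t find_vr_chrompos_range (rsidvar_cols_t cvr, uint64_t *first, uint64_t *last, uint8_t chrom, uint32_t pos_min, uint32_t pos_max)
inline, static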
Searches for the specified CHROM-POS range and returns the rsID of the first matching record in the VR file.
cvr | Structure containing the pointers to the VKRS memory mapped file columns (vkrs.bin). |
first | Pointer to the first element of the range to search (min value = 0). |
last | Pointer to the element (up to but not including) where to end the search (max value = nitems). |
chrom | Chromosome encoded number. |
pos_min | Start reference position, with the first base having position 0. |
pos_max | End reference position, with the first base having position 0. |
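A range query like this works on a table sorted by VariantKey because the key stores CHROM in its 5 most significant bits and POS in the following 28 bits, so shifting out the low 31 bits (REF+ALT) leaves a CHROM+POS prefix that sorts in the same order as the full key. The sketch below illustrates the idea with two lower-bound searches. All `demo_*` names are hypothetical, and both the 0 return for an empty range and the exclusive `*last` on output are choices made for the sketch, not documented behaviour of `find_vr_chrompos_range`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rsidvar_cols_t (VR table: vk sorted ascending). */
typedef struct demo_cols_t
{
    const uint64_t *vk; /* VariantKey column, sorted ascending */
    const uint32_t *rs; /* rsID column */
    uint64_t nrows;     /* number of rows */
} demo_cols_t;

/* Pack CHROM (5 bits), POS (28 bits) and REF+ALT (31 bits) into one key. */
static uint64_t demo_make_vk(uint8_t chrom, uint32_t pos, uint32_t refalt)
{
    return ((uint64_t)(chrom & 0x1F) << 59)
           | ((uint64_t)(pos & 0x0FFFFFFF) << 31)
           | (uint64_t)(refalt & 0x7FFFFFFF);
}

/* First index in [lo, hi) whose CHROM+POS prefix is >= ckey. */
static uint64_t demo_lower_bound(const uint64_t *vk, uint64_t lo, uint64_t hi, uint64_t ckey)
{
    while (lo < hi)
    {
        uint64_t mid = lo + ((hi - lo) / 2);
        if ((vk[mid] >> 31) < ckey)
        {
            lo = mid + 1;
        }
        else
        {
            hi = mid;
        }
    }
    return lo;
}

/* Narrow [*first, *last) to the rows with pos_min <= POS <= pos_max on chrom.
 * Returns the rsID of the first row in the narrowed range, or 0 if it is empty. */
static uint32_t demo_find_chrompos_range(demo_cols_t c, uint64_t *first, uint64_t *last,
                                         uint8_t chrom, uint32_t pos_min, uint32_t pos_max)
{
    assert((first != NULL) && (last != NULL));
    assert((*first <= *last) && (*last <= c.nrows));
    uint64_t ckey_min = demo_make_vk(chrom, pos_min, 0) >> 31;
    uint64_t ckey_max = demo_make_vk(chrom, pos_max, 0) >> 31;
    uint64_t lo = demo_lower_bound(c.vk, *first, *last, ckey_min);
    uint64_t hi = demo_lower_bound(c.vk, lo, *last, ckey_max + 1);
    *first = lo;
    *last = hi;
    return (lo < hi) ? c.rs[lo] : 0;
}
```

After the call, every row in [`*first`, `*last`) belongs to the requested region, so the caller can iterate over the range directly.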
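static uint32_t find_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *first, uint64_t last, uint64_t vk)
inline, static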
Searches for the specified VariantKey and returns the first matching rsID in the VR file.
cvr | Structure containing the pointers to the VKRS memory mapped file columns (vkrs.bin). |
first | Pointer to the first element of the range to search (min value = 0). This will hold the position of the first record found. |
last | Element (up to but not including) where to end the search (max value = nitems). |
vk | VariantKey. |
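static uint64_t get_next_rv_variantkey_by_rsid (rsidvar_cols_t crv, uint64_t *pos, uint64_t last, uint32_t rsid)
inline, static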
Get the next VariantKey for the specified rsID in the RV file. This function should be used after find_rv_variantkey_by_rsid. This function can be called in a loop to get all VariantKeys that are associated with the same rsID (if any).
crv | Structure containing the pointers to the RSVK memory mapped file columns (rsvk.bin). |
pos | Pointer to the current item. This will hold the position of the next record. |
last | Element (up to but not including) where to end the search (max value = nitems). |
rsid | rsID to search. |
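The find-then-iterate pattern described above can be sketched as follows: one call locates the first row for an rsID, and repeated "next" calls advance the position by one row until the rsID changes or the range ends. The names `demo_cols_t`, `demo_first_vk`, `demo_next_vk` and `demo_collect_vks` are hypothetical, and using 0 as the "no more records" value is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rsidvar_cols_t (RV table: rs sorted ascending). */
typedef struct demo_cols_t
{
    const uint64_t *vk; /* VariantKey column */
    const uint32_t *rs; /* rsID column, sorted ascending */
    uint64_t nrows;     /* number of rows */
} demo_cols_t;

/* Lower-bound search for the first row in [*pos, last) with the given rsID. */
static uint64_t demo_first_vk(demo_cols_t c, uint64_t *pos, uint64_t last, uint32_t rsid)
{
    uint64_t lo = *pos;
    uint64_t hi = last;
    while (lo < hi)
    {
        uint64_t mid = lo + ((hi - lo) / 2);
        if (c.rs[mid] < rsid)
        {
            lo = mid + 1;
        }
        else
        {
            hi = mid;
        }
    }
    *pos = lo;
    return ((lo < last) && (c.rs[lo] == rsid)) ? c.vk[lo] : 0;
}

/* Advance *pos by one row; return its VariantKey if the rsID still matches. */
static uint64_t demo_next_vk(demo_cols_t c, uint64_t *pos, uint64_t last, uint32_t rsid)
{
    assert(pos != NULL);
    if ((last == 0) || (*pos >= (last - 1)))
    {
        return 0; /* already at the final row of the range */
    }
    ++(*pos);
    return (c.rs[*pos] == rsid) ? c.vk[*pos] : 0;
}

/* Collect up to cap VariantKeys that share the same rsID; returns the count. */
static size_t demo_collect_vks(demo_cols_t c, uint32_t rsid, uint64_t *out, size_t cap)
{
    uint64_t pos = 0;
    size_t n = 0;
    uint64_t vk = demo_first_vk(c, &pos, c.nrows, rsid);
    while ((vk != 0) && (n < cap))
    {
        out[n++] = vk;
        vk = demo_next_vk(c, &pos, c.nrows, rsid);
    }
    return n;
}
```

The loop relies on the sort order: all rows with the same rsID are adjacent, so a single forward step is enough to decide whether another match exists.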
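static uint32_t get_next_vr_rsid_by_variantkey (rsidvar_cols_t cvr, uint64_t *pos, uint64_t last, uint64_t vk)
inline, static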
Get the next rsID for the specified VariantKey in the VR file. This function should be used after find_vr_rsid_by_variantkey. This function can be called in a loop to get all rsIDs that are associated with the same VariantKey (if any).
cvr | Structure containing the pointers to the VKRS memory mapped file columns (vkrs.bin). |
pos | Pointer to the current item. This will hold the position of the next record. |
last | Element (up to but not including) where to end the search (max value = nitems). |
vk | VariantKey. |
static void mmap_rsvk_file (const char *file, mmfile_t *mf, rsidvar_cols_t *crv)
inline, static
Memory map the RSVK binary file.
file | Path to the file to map. |
mf | Structure containing the memory mapped file. |
crv | Structure containing the pointers to the RSVK memory mapped file columns. |
static void mmap_vkrs_file (const char *file, mmfile_t *mf, rsidvar_cols_t *cvr)
inline, static
Memory map the VKRS binary file.
file | Path to the file to map. |
mf | Structure containing the memory mapped file. |
cvr | Structure containing the pointers to the VKRS memory mapped file columns. |
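For readers unfamiliar with memory mapping, the general technique behind the two functions above can be sketched with the POSIX `open`, `fstat` and `mmap` calls. This is a generic illustration, not the library's code: `demo_mmfile_t`, `demo_mmap_file` and `demo_munmap_file` are hypothetical names, the integer return codes are an assumption of the sketch, and the step that locates the rsID and VariantKey columns inside the mapping is left out because the file layout is not described in this section.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations under strict C modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical, simplified counterpart of a memory-mapped file handle. */
typedef struct demo_mmfile_t
{
    const uint8_t *src; /* start of the mapping, NULL when not mapped */
    int fd;             /* file descriptor, -1 when closed */
    uint64_t size;      /* size of the mapping in bytes */
} demo_mmfile_t;

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
static int demo_mmap_file(const char *path, demo_mmfile_t *mf)
{
    assert((path != NULL) && (mf != NULL));
    mf->src = NULL;
    mf->size = 0;
    mf->fd = open(path, O_RDONLY);
    if (mf->fd < 0)
    {
        return -1;
    }
    struct stat st;
    if ((fstat(mf->fd, &st) != 0) || (st.st_size <= 0))
    {
        close(mf->fd);
        mf->fd = -1;
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, mf->fd, 0);
    if (p == MAP_FAILED)
    {
        close(mf->fd);
        mf->fd = -1;
        return -1;
    }
    mf->src = (const uint8_t *)p;
    mf->size = (uint64_t)st.st_size;
    return 0;
}

/* Release the mapping and close the descriptor. Returns 0 on success. */
static int demo_munmap_file(demo_mmfile_t *mf)
{
    assert(mf != NULL);
    int rc = 0;
    if (mf->src != NULL)
    {
        rc = munmap((void *)mf->src, (size_t)mf->size);
        mf->src = NULL;
    }
    if (mf->fd >= 0)
    {
        if (close(mf->fd) != 0)
        {
            rc = -1;
        }
        mf->fd = -1;
    }
    return rc;
}
```

Mapping the file read-only lets the sorted columns be searched in place, with the operating system paging data in on demand instead of the program reading the whole table into memory first.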