divbrowse.lib.genotype_data
Module Contents
Classes
GenotypeData | Class for managing all genotype data related data structures and methods
Functions
calculate_mean | Calculate the mean for each variant of a variant matrix array holding the number of alternate alleles
impute_with_mean | Impute (replace) missing values in a variant matrix array with the mean for the respective variant
calc_pca_for_slice_of_variant_calls | Calculate a PCA for a variant matrix array
calc_umap_for_slice_of_variant_calls | Calculate UMAP for a variant matrix array
- divbrowse.lib.genotype_data.calculate_mean(slice_of_variant_calls: numpy.ndarray) → numpy.ndarray
Calculate the mean for each variant of a variant matrix array holding the number of alternate alleles
Note
Missing variant calls are excluded from the mean calculation
- Parameters
slice_of_variant_calls (numpy.ndarray) – Numpy array representing a variant matrix holding the number of alternate allele calls
- Returns
Numpy array holding the means per variant
- Return type
numpy.ndarray
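To illustrate what a per-variant mean that skips missing calls can look like, here is a minimal sketch. It is not DivBrowse's implementation: the function name `mean_per_variant`, the encoding of missing calls as `-1`, and the layout (rows are samples, columns are variants) are all assumptions made for this example.

```python
import numpy as np

MISSING = -1  # assumption: missing calls are encoded as -1


def mean_per_variant(calls: np.ndarray) -> np.ndarray:
    """Mean alternate-allele count per variant (column), ignoring missing calls."""
    valid = calls != MISSING
    counts = valid.sum(axis=0)                     # non-missing calls per variant
    sums = np.where(valid, calls, 0).sum(axis=0)   # missing calls contribute 0
    # Variants without any valid call get NaN instead of a division by zero.
    return np.divide(sums, counts,
                     out=np.full(counts.shape, np.nan),
                     where=counts > 0)


# Hypothetical data: 3 samples (rows) x 3 variants (columns)
calls = np.array([[0, 2, -1],
                  [1, 2, -1],
                  [-1, 0, -1]])
means = mean_per_variant(calls)
print(means)
```

The third variant has no valid call at all, so its mean is reported as NaN in this sketch.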
- divbrowse.lib.genotype_data.impute_with_mean(slice_of_variant_calls: numpy.ndarray) → numpy.ndarray
Impute (replace) missing values in a variant matrix array with the mean for the respective variant
- Parameters
slice_of_variant_calls (numpy.ndarray) – Numpy array representing a variant matrix holding the number of alternate allele calls
- Returns
Imputed version of the input variant matrix array
- Return type
numpy.ndarray
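Mean imputation can be sketched as follows. This is an illustration under assumptions, not the library's code: `impute_missing_with_mean` is a hypothetical name, missing calls are assumed to be encoded as `-1`, rows are assumed to be samples and columns variants, and variants without any valid call are filled with `0.0` as an arbitrary choice.

```python
import numpy as np

MISSING = -1  # assumption: missing calls are encoded as -1


def impute_missing_with_mean(calls: np.ndarray) -> np.ndarray:
    """Replace missing calls with the mean of the valid calls of the same variant."""
    valid = calls != MISSING
    counts = valid.sum(axis=0)
    sums = np.where(valid, calls, 0).sum(axis=0)
    # Assumption: variants with no valid call are imputed with 0.0.
    means = np.divide(sums, counts,
                      out=np.zeros(counts.shape),
                      where=counts > 0)
    # `means` has one entry per column and broadcasts across all rows.
    return np.where(valid, calls, means)


# Hypothetical data: 3 samples (rows) x 2 variants (columns)
calls = np.array([[0, 2],
                  [1, -1],
                  [-1, 2]])
imputed = impute_missing_with_mean(calls)
print(imputed)
```

The result is a float array, because an imputed mean is generally not an integer.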
- divbrowse.lib.genotype_data.calc_pca_for_slice_of_variant_calls(slice_of_variant_calls, samples_selected)
Calculate a PCA for a variant matrix array
- Parameters
slice_of_variant_calls (numpy.ndarray) – Numpy array representing a variant matrix holding the number of alternate allele calls
- Returns
PCA result aligned with the sample IDs in the first column
- Return type
numpy.ndarray
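The shape of the documented result (sample IDs in the first column, followed by the component scores) can be sketched with a plain NumPy SVD. This swaps in a generic PCA and may differ from what DivBrowse actually runs; the function name `pca_with_sample_ids`, the `n_components` parameter, and the assumption that the input is already imputed with rows as samples are all hypothetical.

```python
import numpy as np


def pca_with_sample_ids(calls: np.ndarray, sample_ids, n_components: int = 2) -> np.ndarray:
    """PCA scores via SVD, with the sample IDs prepended as the first column."""
    centered = calls - calls.mean(axis=0)           # center each variant (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # projections onto the components
    ids = np.asarray(sample_ids, dtype=object)
    return np.column_stack([ids, scores])            # object array: IDs + scores


# Hypothetical, already imputed data: 4 samples (rows) x 3 variants (columns)
calls = np.array([[0.0, 2.0, 1.0],
                  [1.0, 2.0, 0.0],
                  [2.0, 0.0, 1.0],
                  [0.0, 1.0, 2.0]])
sample_ids = ["S1", "S2", "S3", "S4"]
result = pca_with_sample_ids(calls, sample_ids)
print(result)
```

Keeping the IDs in the same array as the scores forces an object dtype; a caller that needs numeric scores would convert the remaining columns back to float.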
- divbrowse.lib.genotype_data.calc_umap_for_slice_of_variant_calls(slice_of_variant_calls, samples_selected, n_neighbors=15)
Calculate UMAP for a variant matrix array
- Parameters
slice_of_variant_calls (numpy.ndarray) – Numpy array representing a variant matrix holding the number of alternate allele calls
- Returns
UMAP result aligned with the sample IDs in the first column
- Return type
numpy.ndarray
- class divbrowse.lib.genotype_data.GenotypeData(config)
Class for managing all genotype data related data structures and methods
- _load_data()
- get_vcf_header()
- _setup_sample_id_mapping()
- _create_chrom_indices()
- _create_list_of_chromosomes()
- sample_ids_to_mask(sample_ids: list) → numpy.ndarray
Creates a boolean mask based on the input sample IDs that could be found in the samples array of the Zarr storage
- Parameters
sample_ids (list) – List with sample IDs
- Returns
Boolean mask, True for found sample IDs
- Return type
numpy.ndarray
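A boolean mask of this kind can be built with `numpy.isin`. The sketch below assumes the mask runs over the samples array of the storage (one entry per stored sample); `ids_to_mask` and the sample names are hypothetical and only serve the illustration.

```python
import numpy as np


def ids_to_mask(sample_ids: list, samples_in_storage: np.ndarray) -> np.ndarray:
    """True for every stored sample whose ID occurs in `sample_ids`."""
    return np.isin(samples_in_storage, sample_ids)


# Hypothetical samples array as it might be read from the Zarr storage
samples_in_storage = np.array(["S1", "S2", "S3", "S4"])
mask = ids_to_mask(["S2", "S4", "SX"], samples_in_storage)
print(mask)
```

The requested ID `SX` does not exist in the storage, so it simply has no effect on the mask.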
- map_input_sample_ids_to_vcf_sample_ids(sample_ids: list) → list
Map input sample IDs to VCF sample IDs according to the configured mapping table
- Parameters
sample_ids (list) – List with sample IDs
- Returns
List of mapped sample IDs
- Return type
list
- map_vcf_sample_ids_to_input_sample_ids(sample_ids: list) → list
Map VCF sample IDs to input sample IDs according to the configured mapping table
- Parameters
sample_ids (list) – List with sample IDs
- Returns
List of mapped sample IDs
- Return type
list
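The two mapping directions can be pictured as lookups in a table and its inverse. The sketch below is only an illustration: the mapping table, the helper `map_ids`, and the choice to pass unknown IDs through unchanged are assumptions, not documented behavior.

```python
# Hypothetical mapping table: input sample ID -> VCF sample ID
input_to_vcf = {"ACC-001": "sample_A", "ACC-002": "sample_B"}
# Inverse table for the opposite direction
vcf_to_input = {vcf_id: input_id for input_id, vcf_id in input_to_vcf.items()}


def map_ids(sample_ids: list, table: dict) -> list:
    """Map each ID through `table`; IDs without an entry are kept as-is (assumption)."""
    return [table.get(sample_id, sample_id) for sample_id in sample_ids]


mapped = map_ids(["ACC-001", "ACC-003"], input_to_vcf)
round_trip = map_ids(mapped, vcf_to_input)
print(mapped, round_trip)
```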
- get_samples_mask(sample_ids)
Returns a tuple consisting of a boolean mask for found sample IDs and a list of mapped sample IDs
- Parameters
sample_ids (list) – List with sample IDs
- Returns
Tuple of a boolean mask (numpy.ndarray, True for found sample IDs) and a list of mapped sample IDs
- Return type
Tuple[numpy.ndarray, list]
- get_posidx_by_genome_coordinate(chrom, pos) → Tuple[int, str]
Returns the array coordinate for a given physical position on a given chromosome
- Parameters
chrom (str) – ID of the chromosome
pos (int) – Physical position on the chromosome
- Returns
lookup (int): Array coordinate of the found physical position on the chromosome
lookup_type (str): Type of the lookup, either ‘direct_lookup’ or ‘nearest_lookup’
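The difference between a direct and a nearest lookup can be sketched with a binary search over sorted positions. This is an assumption-laden illustration, not the method's actual code: `lookup_position_index` is a hypothetical helper that works on the non-empty, sorted position array of a single chromosome and resolves ties towards the lower index.

```python
import numpy as np


def lookup_position_index(positions: np.ndarray, pos: int):
    """Return (index, lookup_type) for `pos` in a sorted, non-empty position array."""
    idx = int(np.searchsorted(positions, pos))
    if idx < len(positions) and positions[idx] == pos:
        return idx, "direct_lookup"          # exact hit
    # Otherwise pick the closer of the two neighbouring positions.
    candidates = [i for i in (idx - 1, idx) if 0 <= i < len(positions)]
    nearest = min(candidates, key=lambda i: abs(int(positions[i]) - pos))
    return nearest, "nearest_lookup"


# Hypothetical variant positions on one chromosome
positions = np.array([100, 250, 400, 900])
direct = lookup_position_index(positions, 250)
nearest = lookup_position_index(positions, 380)
before_first = lookup_position_index(positions, 50)
after_last = lookup_position_index(positions, 1000)
print(direct, nearest, before_first, after_last)
```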
- count_alternate_alleles(sliced_variant_calls)
Counts the number of alternate alleles for each variant call
- Parameters
sliced_variant_calls (numpy.ndarray) – variant matrix array holding the allele calls (0/0, 0/1, 1/1)
- Returns
variant matrix array holding the number of alternate allele calls
- Return type
numpy.ndarray
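Turning allele calls into alternate-allele counts can be sketched like this. The array layout (the last axis holds the alleles of one call), the encoding of missing alleles as negative values, and the name `count_alt_alleles` are assumptions for the example and not taken from the library.

```python
import numpy as np

MISSING = -1  # assumption: missing alleles are encoded as -1


def count_alt_alleles(genotypes: np.ndarray) -> np.ndarray:
    """Count alternate alleles per call; calls with a missing allele stay MISSING."""
    missing = (genotypes < 0).any(axis=-1)   # any allele of the call is missing
    counts = (genotypes > 0).sum(axis=-1)    # every non-reference allele counts as 1
    return np.where(missing, MISSING, counts)


# Hypothetical diploid calls, shape (2, 3, 2): 0/0 0/1 1/1 and ./. 1/0 0/0
genotypes = np.array([[[0, 0], [0, 1], [1, 1]],
                      [[-1, -1], [1, 0], [0, 0]]])
alt_counts = count_alt_alleles(genotypes)
print(alt_counts)
```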
- count_variants_in_window(chrom, startpos, endpos) → int
Counts the number of variants in a genomic region
- Parameters
chrom (str) – The chromosome of the genomic region.
startpos (int) – The first position of the genomic region.
endpos (int) – The last position of the genomic region.
- Returns
Number of variants in the genomic region
- Return type
int
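Counting the variants inside a window reduces to two binary searches when the positions are sorted. The sketch below assumes the sorted positions of a single chromosome and treats both `startpos` and `endpos` as inclusive; `count_in_window` is a hypothetical helper, not the method itself.

```python
import numpy as np


def count_in_window(positions: np.ndarray, startpos: int, endpos: int) -> int:
    """Number of positions p with startpos <= p <= endpos (positions must be sorted)."""
    left = np.searchsorted(positions, startpos, side="left")    # first index >= startpos
    right = np.searchsorted(positions, endpos, side="right")    # first index > endpos
    return int(right - left)


# Hypothetical variant positions on one chromosome
positions = np.array([100, 250, 400, 900])
n_inside = count_in_window(positions, 250, 900)
n_empty = count_in_window(positions, 101, 249)
print(n_inside, n_empty)
```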
- calculate_minor_allele_freq(numbers_of_alternate_alleles)
Calculates minor allele frequency
- Parameters
numbers_of_alternate_alleles (numpy.ndarray) – Numpy array representing a variant matrix holding the number of alternate allele calls
- Returns
Numpy array (1d) holding the calculated minor allele frequency for each variant
- Return type
numpy.ndarray
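A minor allele frequency per variant can be derived from alternate-allele counts as sketched below. The sketch assumes biallelic, diploid data, missing calls encoded as `-1`, and rows as samples; `minor_allele_freq` and its `ploidy` parameter are hypothetical and may not match how the method handles missing data.

```python
import numpy as np

MISSING = -1  # assumption: missing calls are encoded as -1


def minor_allele_freq(alt_counts: np.ndarray, ploidy: int = 2) -> np.ndarray:
    """Minor allele frequency per variant (column), ignoring missing calls."""
    valid = alt_counts != MISSING
    n_alleles = valid.sum(axis=0) * ploidy               # observed alleles per variant
    n_alt = np.where(valid, alt_counts, 0).sum(axis=0)   # observed alternate alleles
    alt_freq = np.divide(n_alt, n_alleles,
                         out=np.full(n_alleles.shape, np.nan),
                         where=n_alleles > 0)
    # The minor allele is whichever of reference/alternate is rarer.
    return np.minimum(alt_freq, 1 - alt_freq)


# Hypothetical counts: 4 samples (rows) x 3 variants (columns)
alt_counts = np.array([[0, 2, -1],
                       [1, 2, -1],
                       [-1, 0, -1],
                       [0, 2, -1]])
maf = minor_allele_freq(alt_counts)
print(maf)
```

In the second variant the alternate allele is the common one (frequency 0.75), so the minor allele frequency is that of the reference allele.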
- calc_variants_summary_stats(numbers_of_alternate_alleles)
- apply_variant_filter_settings(fs, numbers_of_alternate_alleles, _slice_variant_calls)
- get_slice_of_variant_calls(chrom, startpos=None, endpos=None, count=None, samples=None, variant_filter_settings=None)