divbrowse.lib.genotype_data

Module Contents

Classes

GenotypeData

Class for managing all data structures and methods related to genotype data

class divbrowse.lib.genotype_data.GenotypeData(config)

Class for managing all data structures and methods related to genotype data

_load_data()
get_vcf_header()
_setup_sample_id_mapping()
_create_chrom_indices()
_create_list_of_chromosomes()
_free_mem()
sample_ids_to_mask(sample_ids: list) → numpy.ndarray

Creates a boolean mask based on the input sample IDs that could be found in the samples array of the Zarr storage

Parameters

sample_ids (list) – List with sample IDs

Returns

Boolean mask, True for found sample IDs

Return type

numpy.ndarray
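
The masking step described above can be illustrated with a minimal sketch. It assumes the mask is aligned with the stored samples array (one entry per stored sample) and uses a plain NumPy array in place of the Zarr storage; the function name `sample_ids_to_mask_sketch` and the sample IDs are hypothetical and not part of divbrowse.

```python
import numpy as np


def sample_ids_to_mask_sketch(stored_samples: np.ndarray, sample_ids: list) -> np.ndarray:
    """Return a boolean mask over stored_samples, True where the ID was requested."""
    # np.isin tests each stored sample ID for membership in the requested IDs.
    return np.isin(stored_samples, sample_ids)


# Hypothetical stand-in for the samples array of the Zarr storage.
stored = np.array(["S1", "S2", "S3", "S4"])

# "S9" is requested but not stored, so it does not contribute to the mask.
mask = sample_ids_to_mask_sketch(stored, ["S2", "S4", "S9"])
print(mask)
```

Such a mask can then be used to select the matching columns of a genotype array via boolean indexing.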

map_input_sample_ids_to_vcf_sample_ids(sample_ids: list) → list

Map input sample IDs to VCF sample IDs according to the configured mapping table

Parameters

sample_ids (list) – List with sample IDs

Returns

List of mapped sample IDs

Return type

list

map_vcf_sample_ids_to_input_sample_ids(sample_ids: list) → list

Map VCF sample IDs to input sample IDs according to the configured mapping table

Parameters

sample_ids (list) – List with sample IDs

Returns

List of mapped sample IDs

Return type

list
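
The two mapping directions can be sketched with a pair of dictionaries. This is only an illustration under stated assumptions: the mapping table is modelled as a list of (input ID, VCF ID) pairs, and IDs without an entry are passed through unchanged. Neither detail is confirmed by the documentation above, and the names `build_mappings` and `map_ids` are hypothetical.

```python
def build_mappings(pairs):
    """Build lookup tables for both directions from (input_id, vcf_id) pairs."""
    input_to_vcf = dict(pairs)
    vcf_to_input = {vcf_id: input_id for input_id, vcf_id in pairs}
    return input_to_vcf, vcf_to_input


def map_ids(sample_ids, table):
    """Map each ID through the table; unmapped IDs are kept as-is (assumption)."""
    return [table.get(sample_id, sample_id) for sample_id in sample_ids]


# Hypothetical mapping table.
pairs = [("A", "vcf_A"), ("B", "vcf_B")]
input_to_vcf, vcf_to_input = build_mappings(pairs)

to_vcf = map_ids(["A", "B", "X"], input_to_vcf)
back = map_ids(to_vcf, vcf_to_input)
print(to_vcf)
print(back)
```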

get_samples_mask(sample_ids)

Returns a tuple consisting of a boolean mask for found sample IDs and a list of mapped sample IDs

Parameters

sample_ids (list) – List with sample IDs

Returns

A tuple of two items: a boolean mask (numpy.ndarray), True for found sample IDs, and a list of the mapped sample IDs

Return type

Tuple[numpy.ndarray, list]

get_posidx_by_genome_coordinate(chrom, pos, method='nearest') → Tuple[int, str]

Returns array coordinates for given physical position on a given chromosome

Parameters
  • chrom (str) – ID of the chromosome

  • pos (int) – Physical position on the chromosome

  • method (str) – Lookup method, 'nearest' by default

Returns

  • lookup (int) – Array coordinate of the found physical position on the chromosome

  • lookup_type (str) – Type of the lookup, either ‘direct_lookup’ or ‘nearest_lookup’
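
The difference between a direct and a nearest lookup can be shown with a short sketch. It assumes the variant positions of one chromosome are held in a sorted, non-empty NumPy array, and it uses a binary search via `numpy.searchsorted`. The function name `posidx_lookup_sketch` is hypothetical, and the actual implementation in divbrowse may work differently.

```python
import numpy as np


def posidx_lookup_sketch(positions: np.ndarray, pos: int):
    """Return (array index, lookup type) for pos in a sorted, non-empty positions array."""
    # Index of the first stored position that is >= pos.
    idx = int(np.searchsorted(positions, pos))
    if idx < len(positions) and positions[idx] == pos:
        return idx, "direct_lookup"
    # No exact hit: compare the neighbours on both sides and keep the closer one.
    candidates = [i for i in (idx - 1, idx) if 0 <= i < len(positions)]
    nearest = min(candidates, key=lambda i: abs(int(positions[i]) - pos))
    return nearest, "nearest_lookup"


# Hypothetical variant positions on one chromosome.
positions = np.array([100, 250, 400, 900])

print(posidx_lookup_sketch(positions, 250))  # exact hit
print(posidx_lookup_sketch(positions, 380))  # falls between 250 and 400
```

In this sketch a tie between two equally distant neighbours resolves to the lower index, which is a choice made for the example only.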

get_posidx_by_genome_coordinates(chrom, positions, method='nearest') → Tuple[int, str]
count_variants_in_window(chrom, startpos, endpos) → int

Counts number of variants in a genomic region

Parameters
  • chrom (str) – The chromosome of the genomic region.

  • startpos (int) – The first position of the genomic region.

  • endpos (int) – The last position of the genomic region.

Returns

Number of variants in the genomic region

Return type

int
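
Counting variants in a window can be sketched as two binary searches over sorted positions. The sketch assumes that both `startpos` and `endpos` are inclusive, as "first" and "last" position suggest, and that the positions of the chromosome are available as a sorted sequence. The function name `count_variants_in_window_sketch` is hypothetical and does not reflect the actual divbrowse code.

```python
import numpy as np


def count_variants_in_window_sketch(positions, startpos: int, endpos: int) -> int:
    """Count the positions p with startpos <= p <= endpos in a sorted sequence."""
    # First index whose position is >= startpos.
    lo = np.searchsorted(positions, startpos, side="left")
    # First index whose position is > endpos.
    hi = np.searchsorted(positions, endpos, side="right")
    return int(hi - lo)


# Hypothetical variant positions on one chromosome.
positions = np.array([100, 250, 400, 900])

n_variants = count_variants_in_window_sketch(positions, 200, 400)
print(n_variants)
```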

get_slice_of_variant_calls(chrom, startpos=None, endpos=None, positions=None, count=None, samples=None, variant_filter_settings=None, with_call_metadata=False, calc_summary_stats=False, flanking_region_include=False, flanking_region_length=1500, flanking_region_direction='both') → divbrowse.lib.variant_calls_slice.VariantCallsSlice