Run TLS-Tractor
tlstractor.Rd
Usage
tlstractor(
gds_path,
sumstats_path,
method,
cond_local,
pheno_path,
pheno_id_col,
pheno_col,
covar_path = NULL,
covar_id_col = NULL,
covar_cols = NULL,
output_prefix,
scratch_dir = NULL,
snp_start = 1L,
snp_count = NULL,
n_cores = 1L,
chunk_size = 1024L,
local_ancestry_mac_threshold = 20L,
use_fast_version = TRUE
)
Arguments
- gds_path
Character; path to the input GDS file.
- sumstats_path
Character; path to the munged summary statistics file.
- method
Character; one of "linear" or "logistic".
- cond_local
Logical; whether to condition on local-ancestry dosage terms.
- pheno_path
Character; path to the phenotype file.
- pheno_id_col
Character; sample-ID column name in the phenotype file.
- pheno_col
Character; phenotype column name in the phenotype file.
- covar_path
Character or NULL; path to an optional covariate file. Default is NULL.
- covar_id_col
Character or NULL; sample-ID column name in the covariate file (required when covar_path is provided). Default is NULL.
- covar_cols
Character vector or NULL; covariate column names in the covariate file (required when covar_path is provided). Default is NULL.
- output_prefix
Character; output path prefix.
- scratch_dir
Character or NULL; directory used to temporarily store per-task result files before they are merged. If NULL, a run-specific directory is created under dirname(output_prefix) with the name <basename(output_prefix)>_<pid>_<YYYYmmdd_HHMMSS>_tmp. Default is NULL.
- snp_start
Integer; 1-based starting SNP index in the GDS file. Default is 1L.
- snp_count
Integer or NULL; number of SNPs to process. If NULL, processing starts at snp_start and continues to the end of the GDS file. Default is NULL.
- n_cores
Integer; number of CPU cores to use. Default is 1L.
- chunk_size
Integer; number of variants processed per chunk within each CPU core. Default is 1024; larger values may improve speed but increase memory usage.
- local_ancestry_mac_threshold
Integer; ancestry-specific minor-allele count (MAC) threshold. SNPs with any ancestry-specific MAC below this threshold are skipped. Skipped SNPs are still returned in the output, with NA values in the inferential columns. Default is 20L.
- use_fast_version
Logical; whether to use the fast TLS-Tractor mode. The fast mode assumes that the non-genetic covariate effects estimated from the null model (phenotype ~ covariates) are close to those from the full standard GWAS model (phenotype ~ SNP dosage + covariates) for any single SNP, so the null-model covariate estimates can be reused across variants to reduce computation. The fast mode can provide a substantial speedup with minimal loss of accuracy. Default is TRUE (recommended for large datasets). Setting FALSE forces the full per-variant fitting path, which is generally more robust but substantially slower when covariates are included. If no covariates are provided, fast mode is automatically disabled.
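Because snp_start and snp_count select a contiguous block of SNPs, a large GDS file can be processed as several independent runs. The sketch below shows one possible way to derive (snp_start, snp_count) pairs in base R. The helper make_snp_batches() and the values of n_snps and batch_size are hypothetical and not part of the package; the total SNP count would have to be obtained from the GDS file separately.

```r
# Hypothetical helper (not part of the package): split SNP indices 1..n_snps
# into contiguous batches expressed as (snp_start, snp_count) pairs.
make_snp_batches <- function(n_snps, batch_size) {
  stopifnot(n_snps >= 1L, batch_size >= 1L)
  # 1-based start index of each batch
  starts <- seq.int(1L, n_snps, by = batch_size)
  # The last batch may be shorter than batch_size
  counts <- pmin(batch_size, n_snps - starts + 1L)
  data.frame(snp_start = starts, snp_count = counts)
}

# Illustrative values only
batches <- make_snp_batches(n_snps = 2500L, batch_size = 1000L)
print(batches)
```

Each row could then be passed as snp_start and snp_count to a separate tlstractor() call, presumably with a distinct output_prefix per batch so that the result files do not overwrite each other.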
Value
Invisibly returns NULL. Writes gzipped GWAS results to <output_prefix>.txt.gz. The output includes:
- Variant metadata: CHROM, POS, ID, REF, ALT, main_N (sample size in the main/internal study)
- Frequency and ancestry summaries: AF (overall allele frequency), AF_anc* (ancestry-specific allele frequency), LAprop_anc* (local-ancestry-specific haplotype proportion)
- Analysis metadata: has_sumstats (indicates whether external summary statistics were available), fallback_used (indicates whether the QR decomposition fallback was used; when TRUE, results may have reduced numerical stability and should be interpreted with caution)
- Association results: joint_pval (p-value for the joint test of all ancestry-specific SNP dosage effects), beta_anc*, se_anc*, pval_anc* (effect size, standard error, and p-value for each ancestry-specific SNP dosage effect)
- When cond_local = TRUE: LAeff_anc*, LAse_anc*, LApval_anc* (effect size, standard error, and p-value for each local ancestry term)
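As a rough illustration of how the result columns might be used downstream, the sketch below writes a small toy file and reads it back with base R. Everything in the toy table is made up for illustration: the values, the tab-delimited layout, and the anc0/anc1 suffixes are assumptions, not documented output. Only the column name patterns (ID, joint_pval, fallback_used, beta_anc*) come from the description above.

```r
# Toy stand-in for <output_prefix>.txt.gz. Values, tab delimiter, and the
# anc0/anc1 suffixes are assumptions made for this example only.
toy <- data.frame(
  ID            = c("rs1", "rs2", "rs3"),
  joint_pval    = c(1e-9, NA, 0.2),      # NA mimics a skipped SNP
  fallback_used = c(FALSE, FALSE, TRUE),
  beta_anc0     = c(0.12, NA, -0.03),
  beta_anc1     = c(0.10, NA, 0.01)
)
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Read the gzipped table back in
res <- read.delim(gzfile(path))

# SNPs with NA in the inferential columns (e.g. skipped by the MAC threshold)
skipped <- res[is.na(res$joint_pval), ]
tested  <- res[!is.na(res$joint_pval), ]

# Conventional genome-wide significance threshold, used here as an example
hits <- tested[tested$joint_pval < 5e-8, ]

# Rows where the QR fallback was used deserve a closer look
flagged <- tested[which(tested$fallback_used), ]

# Collect ancestry-specific effect columns by name pattern
beta_cols <- grep("^beta_anc", names(res), value = TRUE)
```

Selecting columns by pattern rather than by fixed names keeps the code independent of how many ancestries appear in the output.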
Performs local ancestry-aware GWAS by integrating individual-level data with
external GWAS summary statistics via transfer learning.
If any samples present in the GDS file are excluded from analysis after
sample intersection/filtering, their IDs are written to
<output_prefix>.excluded_samples.txt.

During execution, temporary per-task result files are written to
scratch_dir/<run_tag>_task_<id>.txt.gz, where run_tag is a run-specific
identifier with format tlstractor_<pid>_<YYYYmmdd_HHMMSS>. These temporary
files are removed on successful cleanup. The directory scratch_dir is
removed only if it was created by the function.

SNPs present in the GDS file but absent from the summary statistics file are
still analyzed using the Tractor model with individual-level data only.
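
A minimal sketch of a call, assuming the package that provides tlstractor() is already loaded. All file paths and column names are hypothetical placeholders. The call is guarded so that the snippet does nothing when the function or the input files are unavailable.

```r
# Hypothetical paths and column names, for illustration only
args <- list(
  gds_path      = "data/chr22.gds",
  sumstats_path = "data/external_sumstats_munged.txt.gz",
  method        = "linear",
  cond_local    = TRUE,
  pheno_path    = "data/pheno.txt",
  pheno_id_col  = "IID",
  pheno_col     = "ldl",
  covar_path    = "data/covar.txt",
  covar_id_col  = "IID",   # required because covar_path is given
  covar_cols    = c("age", "sex", "PC1", "PC2"),
  output_prefix = "results/chr22_ldl",
  n_cores       = 4L,
  chunk_size    = 1024L
)

# Run only if the function is available and all input files exist
inputs <- unlist(args[c("gds_path", "sumstats_path", "pheno_path", "covar_path")])
if (exists("tlstractor", mode = "function") && all(file.exists(inputs))) {
  do.call(tlstractor, args)
}

# File names documented above, derived from output_prefix
result_file   <- paste0(args$output_prefix, ".txt.gz")
excluded_file <- paste0(args$output_prefix, ".excluded_samples.txt")
```

Since the function returns NULL invisibly, results are picked up from result_file rather than from the return value.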