---
title: "gwas2crispr: From GWAS to CRISPR-ready Files"
pagetitle: "gwas2crispr: From GWAS to CRISPR-ready Files"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{gwas2crispr: From GWAS to CRISPR-ready Files}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE,
  message = FALSE,
  warning = FALSE
)
```

## Overview

`gwas2crispr` prepares genome-wide association study (GWAS) results for downstream clustered regularly interspaced short palindromic repeats (CRISPR) workflows. The package retrieves significant single-nucleotide polymorphisms (SNPs) for supported GWAS Catalog trait identifiers from the EMBL-EBI GWAS Catalog REST API v2 and returns CRISPR-ready outputs for the GRCh38/hg38 human genome build.

The main outputs are:

* comma-separated values (CSV) tables,
* Browser Extensible Data (BED) files,
* optional FASTA sequence files.

The public argument name `efo_id` is retained for backward compatibility. In gwas2crispr 0.1.5, selected EFO, MONDO, and NCIT identifiers are supported when available through the GWAS Catalog API. HP, Orphanet, and ORPHA identifiers are accepted for compatibility with selected records.

Example accepted formats include `EFO_0001663`, `EFO:0001663`, `MONDO_0007254`, `MONDO:0007254`, `NCIT_C4872`, and `NCIT:C4872`.

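
The accepted formats above differ only in the separator between the ontology prefix and the local identifier. If you collect identifiers from mixed sources, you may want to bring them to one form before passing them on. The sketch below does this in base R. `normalize_trait_id()` is a hypothetical helper written for this illustration; it is not part of gwas2crispr and does not reflect how the package parses identifiers internally.

```{r}
# Hypothetical helper for illustration only; not a gwas2crispr function.
# Converts "PREFIX:LOCAL" to "PREFIX_LOCAL" and leaves underscore forms unchanged.
normalize_trait_id <- function(id) {
  id <- trimws(id)
  # Replace only the first colon, which separates the prefix from the local ID.
  sub(":", "_", id, fixed = TRUE)
}

ids <- c("EFO:0001663", "MONDO_0007254", "NCIT:C4872")
normalize_trait_id(ids)
# returns c("EFO_0001663", "MONDO_0007254", "NCIT_C4872")
```

Because the package accepts both separators for the formats listed above, this step is optional. It mainly helps when identifiers are also used as file-name components or as join keys.
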
## Installation

Install from CRAN:

```{r}
install.packages("gwas2crispr")
```

Optional packages for FASTA output:

```{r}
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install(c(
  "Biostrings",
  "GenomeInfoDb",
  "BSgenome.Hsapiens.UCSC.hg38"
))
```

Development version:

```{r}
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("leopard0ly/gwas2crispr")
```

## Fetch GWAS associations

```{r}
library(gwas2crispr)

gwas_data <- fetch_gwas(
  efo_id = "EFO_0000707",
  p_cut = 1e-6,
  verbose = FALSE
)

names(gwas_data)
head(gwas_data$associations)
```

Selected non-EFO identifiers use the same argument name when supported by the GWAS Catalog API:

```{r, eval=FALSE}
fetch_gwas(efo_id = "MONDO_0007254", p_cut = 5e-8, verbose = FALSE)
fetch_gwas(efo_id = "NCIT_C4872", p_cut = 5e-8, verbose = FALSE)
```

## Run without writing files

By default, no files are written.

```{r}
res <- run_gwas2crispr(
  efo_id = "EFO_0000707",
  p_cut = 1e-6,
  flank_bp = 300,
  out_prefix = NULL,
  verbose = FALSE
)

res$summary
head(res$snps_full)
head(res$bed)
```

## Write files safely

To write output files, provide `out_prefix`. In examples, use `tempdir()`.

```{r}
out_prefix <- file.path(tempdir(), "lung")

res <- run_gwas2crispr(
  efo_id = "EFO_0000707",
  p_cut = 1e-6,
  flank_bp = 300,
  out_prefix = out_prefix,
  verbose = FALSE
)

res$written
```

Expected output paths:

```{r}
paste0(out_prefix, "_snps_full.csv")
paste0(out_prefix, "_snps_hg38.bed")
paste0(out_prefix, "_snps_flank300.fa")
```

The FASTA file is created only when the optional genome packages are available.

## Output structure

```{r}
names(res)
```

Common outputs:

```{r}
res$summary
res$snps_full
res$bed
res$fasta
res$written
```

## Session information

```{r, eval=TRUE}
sessionInfo()
```