Preprocess V(D)J Data for Pseudobulk Analysis
setupVdjPseudobulk.Rd
This function preprocesses single-cell V(D)J sequencing data for pseudobulk analysis. It filters data based on productivity and chain status, subsets data, extracts main V(D)J genes, and removes unmapped entries.
Usage
setupVdjPseudobulk(
  sce,
  mode_option = c("abT", "gdT", "B"),
  already.productive = TRUE,
  productive_cols = NULL,
  productive_vj = TRUE,
  productive_vdj = TRUE,
  allowed_chain_status = NULL,
  subsetby = NULL,
  groups = NULL,
  extract_cols = NULL,
  filter_unmapped = TRUE,
  check_vj_mapping = c(TRUE, TRUE),
  check_vdj_mapping = c(TRUE, FALSE, TRUE),
  check_extract_cols_mapping = NULL,
  remove_missing = TRUE
)
Arguments
- sce
A SingleCellExperiment object. V(D)J data should be contained in colData for filtering.
- mode_option
Optional character. Specifies the mode for extracting V(D)J genes. If NULL, extract_cols must be specified. Default is c("abT", "gdT", "B").
- already.productive
Logical. Whether the data has already been filtered for productivity. If TRUE, skips productivity filtering. Default is TRUE.
- productive_cols
Character vector. Names of colData columns used for productivity filtering. Default is NULL.
- productive_vj
Logical. If TRUE, retains cells where the main VJ chain is productive. Default is TRUE.
- productive_vdj
Logical. If TRUE, retains cells where the main VDJ chain is productive. Default is TRUE.
- allowed_chain_status
Character vector. Specifies chain statuses to retain. Valid options include c('Single pair', 'Extra pair', 'Extra pair-exception', 'Orphan VDJ', 'Orphan VDJ-exception'). Default is NULL.
- subsetby
Character. Name of a colData column for subsetting. Default is NULL.
- groups
Character vector. Specifies the subset condition for filtering. Default is NULL.
- extract_cols
Character vector. Names of colData columns where V(D)J information is stored, used instead of the standard columns. Default is NULL.
- filter_unmapped
Logical. Whether to filter unmapped data. Default is TRUE.
- check_vj_mapping
Logical vector of length 2. Whether to check for VJ mapping. Default is c(TRUE, TRUE).
If the first element is TRUE, the function filters unmapped data in the V gene of the VJ chain.
If the second element is TRUE, the function filters unmapped data in the J gene of the VJ chain.
- check_vdj_mapping
Logical vector of length 3. Whether to check for VDJ mapping. Default is c(TRUE, FALSE, TRUE).
If the first element is TRUE, the function filters unmapped data in the V gene of the VDJ chain.
If the second element is TRUE, the function filters unmapped data in the D gene of the VDJ chain.
If the third element is TRUE, the function filters unmapped data in the J gene of the VDJ chain.
- check_extract_cols_mapping
Character vector. Specifies columns related to extract_cols for mapping checks. Default is NULL.
- remove_missing
Logical. If TRUE, removes cells with contigs matching the filter. If FALSE, masks them with uniform values. Default is TRUE.
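The positional meaning of check_vj_mapping and check_vdj_mapping can be illustrated with a small base-R sketch. The helper columns_to_check below is hypothetical and not part of the package; it assumes the checked columns follow the {gene}_call_{mode_option}_{type}_main pattern seen in the log output of the Examples section.

```r
# Hypothetical helper (not exported by the package): lists the "_main"
# columns that the mapping checks would cover for a given mode.
columns_to_check <- function(mode_option = "abT",
                             check_vj_mapping = c(TRUE, TRUE),
                             check_vdj_mapping = c(TRUE, FALSE, TRUE)) {
  stopifnot(length(check_vj_mapping) == 2, length(check_vdj_mapping) == 3)
  # Position 1/2/3 of check_vdj_mapping corresponds to the V/D/J gene.
  vdj_cols <- paste0(c("v", "d", "j"), "_call_", mode_option, "_VDJ_main")
  # Position 1/2 of check_vj_mapping corresponds to the V/J gene.
  vj_cols <- paste0(c("v", "j"), "_call_", mode_option, "_VJ_main")
  c(vdj_cols[check_vdj_mapping], vj_cols[check_vj_mapping])
}

default_cols <- columns_to_check("abT")
print(default_cols)
```

With the defaults, the D gene of the VDJ chain is skipped, which is consistent with the four columns listed in the "Filtering cells from ..." message of the example run.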
Details
The function performs the following preprocessing steps:
Productivity Filtering:
- Skipped if already.productive = TRUE.
- Filters cells based on productivity using productive_cols or the standard colData columns named productive_{mode_option}_{type} (where type is 'VDJ' or 'VJ').
- mode_option: the function checks the colData column(s) named productive_{mode_option}_{type}, where type is 'VDJ', 'VJ', or both, depending on the values of productive_vj and productive_vdj. If set to NULL, extract_cols must be specified.
- productive_cols: must be specified when productivity filtering is to be performed and mode_option is NULL. It names the colData columns where VDJ/VJ information is stored, which are then used instead of the standard columns.
- productive_vj, productive_vdj: if TRUE, a cell is kept only if its main V(D)J chain is productive.
Chain Status Filtering:
- Retains cells with chain statuses specified by allowed_chain_status.
Subsetting:
- Conducted only if both subsetby and groups are provided.
- Retains cells matching the groups condition in the subsetby column.
Main V(D)J Extraction:
- Uses extract_cols to specify custom columns for extracting V(D)J information.
Unmapped Data Filtering:
- Decides whether to remove or mask cells based on filter_unmapped.
- Checks specific columns for unclear mappings using check_vj_mapping, check_vdj_mapping, or check_extract_cols_mapping.
- filter_unmapped: controls whether unmapped entries are filtered from the object. If set to NULL, the filtering process will not start.
- check_vj_mapping, check_vdj_mapping: only the colData columns specified by these arguments are checked for unclear mappings.
- check_extract_cols_mapping (related to extract_cols): only the colData columns specified by this argument are checked for unclear mappings; these columns must first be specified in extract_cols.
- remove_missing: if TRUE, removes cells with contigs matching the filter from the object. If FALSE, masks them with a uniform value dependent on the column name.
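The productivity, chain status, and unmapped-data steps above can be sketched in base R on a toy data.frame standing in for colData(sce). This is only an illustration under stated assumptions, not the package's implementation: the function filter_cells, the chain_status column name, the logical encoding of productivity, the "None" marker for unmapped genes, and the "_unmapped" mask value are all hypothetical choices made for this example.

```r
# Toy stand-in for colData(sce); every value here is invented.
cells <- data.frame(
  productive_abT_VDJ  = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  productive_abT_VJ   = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  chain_status        = c("Single pair", "Orphan VDJ", "Single pair",
                          "Extra pair", "Single pair"),
  v_call_abT_VDJ_main = c("TRBV7-2", "TRBV5-1", "TRBV9", "None", "TRBV20-1"),
  j_call_abT_VDJ_main = c("TRBJ2-1", "TRBJ1-1", "TRBJ2-7", "TRBJ2-3", "TRBJ1-2"),
  row.names = paste0("cell", 1:5),
  stringsAsFactors = FALSE
)

# Hypothetical sketch of the filtering order described in Details.
filter_cells <- function(cells, allowed_chain_status, check_cols,
                         remove_missing = TRUE, unmapped = "None") {
  # 1. Productivity: keep cells whose main VDJ and VJ chains are productive.
  keep <- cells$productive_abT_VDJ & cells$productive_abT_VJ
  cells <- cells[keep, , drop = FALSE]
  # 2. Chain status: keep only the allowed statuses.
  cells <- cells[cells$chain_status %in% allowed_chain_status, , drop = FALSE]
  # 3. Unmapped data: either drop the affected cells or mask the entries.
  bad <- rep(FALSE, nrow(cells))
  for (col in check_cols) {
    hit <- is.na(cells[[col]]) | cells[[col]] %in% unmapped
    if (!remove_missing) {
      # Assumed mask: one uniform value per column, derived from its name.
      cells[[col]][hit] <- paste0(col, "_unmapped")
    }
    bad <- bad | hit
  }
  if (remove_missing) cells <- cells[!bad, , drop = FALSE]
  cells
}

check_cols <- c("v_call_abT_VDJ_main", "j_call_abT_VDJ_main")
allowed <- c("Single pair", "Extra pair")
removed <- filter_cells(cells, allowed, check_cols, remove_missing = TRUE)
masked <- filter_cells(cells, allowed, check_cols, remove_missing = FALSE)
print(rownames(removed))
print(masked$v_call_abT_VDJ_main)
```

In this toy run, cell3 fails the productivity check and cell2 fails the chain status check. cell4 has an unmapped V gene, so it is dropped when remove_missing = TRUE and kept with a masked V call when remove_missing = FALSE.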
Examples
# load data
data(sce_vdj)
# check the dimension
dim(sce_vdj)
#> [1] 33538 10000
# filter the data
sce_vdj <- setupVdjPseudobulk(
  sce = sce_vdj,
  mode_option = "abT", # set the mode to αβTCR
  allowed_chain_status = c("Single pair", "Extra pair"),
  already.productive = FALSE # unproductive cells still need to be filtered out
)
#> Checking productivity from productive_abT_VDJ, productive_abT_VJ ...
#> 7279 of cells filtered
#> checking allowed chain status...
#> 12 of cells filtered
#> VDJ data extraction begin:
#> Parameter extract_cols do not provided, automatically geneterate colnames for extraction.
#> Detect whether colData v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ already exist...
#> Extract main TCR from v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ ...
#> Complete.
#> Filtering cells from v_call_abT_VDJ_main, j_call_abT_VDJ_main, v_call_abT_VJ_main, j_call_abT_VJ_main ...
#> 63 of cells filtered
#> 2646 of cells remain.
# check the remaining dim
dim(sce_vdj)
#> [1] 33538 2646