Generate Pseudobulk V(D)J Feature Space
vdjPseudobulk.Rd
This function creates a pseudobulk V(D)J feature space from single-cell data, aggregating V(D)J information into pseudobulk groups. It supports input as either a Milo object or a SingleCellExperiment object.
Arguments
- milo
A Milo or SingleCellExperiment object containing V(D)J data.
- pbs
Optional. A binary matrix with cells as rows and pseudobulk groups as columns.
If milo is a Milo object, this parameter is not required.
If milo is a SingleCellExperiment object, either pbs or col_to_bulk must be provided.
- col_to_bulk
Optional character or character vector. Specifies colData column(s) to generate pbs. If multiple columns are provided, they will be combined. Default is NULL.
If milo is a Milo object, this parameter is not required.
If milo is a SingleCellExperiment object, either pbs or col_to_bulk must be provided.
- extract_cols
Character vector. Specifies column names where V(D)J information is stored. Default is c('v_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main').
- mode_option
Character. Specifies the mode for extracting V(D)J genes. Must be one of c('B', 'abT', 'gdT'). Default is 'abT'.
Note: This parameter is considered only when extract_cols = NULL.
If mode_option is NULL, uses column names such as v_call_VDJ instead of v_call_abT_VDJ.
- col_to_take
Optional character or list of characters. Specifies colData column(s) for which the most common value is identified in each pseudobulk. Default is NULL.
- normalise
Logical. If TRUE, scales the counts of each V(D)J gene group to 1 for each pseudobulk. Default is TRUE.
- renormalise
Logical. If TRUE, rescales the counts of each V(D)J gene group to 1 for each pseudobulk after removing 'missing' calls. Useful when setupVdjPseudobulk() was run with remove_missing = FALSE. Default is FALSE.
- min_count
Integer. Sets pseudobulk counts in V(D)J gene groups with fewer than this many non-missing calls to 0. Relevant when normalise = TRUE. Default is 1.
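The "most common value" that col_to_take refers to can be pictured with a small base R sketch. This is an illustration only, not the package's code: the annotation vector, the membership matrix, and the names anno and most_common are made up, and the tie-breaking rule shown here is an assumption.

```r
# Toy data: 6 cells with a hypothetical annotation, plus a binary
# cell-by-pseudobulk membership matrix (cells as rows, groups as columns).
anno <- c("CD4", "CD4", "CD8", "CD8", "CD8", "CD4")
pbs <- matrix(
    c(1, 1, 1, 0, 0, 0,   # column pb1: cells 1-3
      0, 0, 1, 1, 1, 1),  # column pb2: cells 3-6
    nrow = 6,
    dimnames = list(paste0("cell", 1:6), c("pb1", "pb2"))
)

# For each pseudobulk, tabulate the annotation of its member cells and keep
# the most frequent label. On ties, which.max() picks the first label in
# table()'s sorted order (an assumption of this sketch).
most_common <- apply(pbs, 2, function(member) {
    counts <- table(anno[member == 1])
    names(counts)[which.max(counts)]
})
most_common
```

A cell may belong to more than one pseudobulk, as cell3 does here, which is why the vote is taken per column of the membership matrix.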
Details
This function aggregates V(D)J data into pseudobulk groups based on the following logic:
Input Requirements:
If milo is a Milo object, neither pbs nor col_to_bulk is required.
If milo is a SingleCellExperiment object, the user must provide either pbs or col_to_bulk.
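To make the shape of pbs concrete, here is a minimal base R sketch of how a binary cells-by-groups matrix could be derived from one or more metadata columns. It is not the package's implementation: cell_meta, group, and pbs_sketch are hypothetical names, and joining columns with "_" is an assumption about how multiple columns might be combined.

```r
# Hypothetical per-cell metadata standing in for colData columns.
cell_meta <- data.frame(
    sample = c("s1", "s1", "s2", "s2"),
    celltype = c("CD4", "CD8", "CD4", "CD4"),
    row.names = paste0("cell", 1:4)
)

# Combine the chosen columns into a single grouping label per cell.
group <- factor(paste(cell_meta$sample, cell_meta$celltype, sep = "_"))

# One-hot encode the labels: one row per cell, one column per group,
# with 1 marking membership and 0 otherwise.
pbs_sketch <- sapply(levels(group), function(g) as.integer(group == g))
rownames(pbs_sketch) <- rownames(cell_meta)
pbs_sketch
```

In this sketch every cell belongs to exactly one group, so each row sums to 1. A user-supplied pbs need not have that property.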
Normalization:
When normalise = TRUE, scales V(D)J counts to 1 for each pseudobulk group.
When renormalise = TRUE, rescales the counts after removing 'missing' calls.
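The two scaling steps can be illustrated on a toy count matrix for a single V(D)J gene group. The sketch below only shows the arithmetic as described above and is not taken from the package: the matrix, the gene names, and the column label "missing" are hypothetical.

```r
# Toy counts for one gene group: rows are pseudobulks, columns are calls.
# All names are hypothetical.
counts <- matrix(
    c(2, 1, 1,
      0, 3, 1),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("pb1", "pb2"), c("geneA", "geneB", "missing"))
)

# normalise = TRUE (sketch): divide each row by its total so the calls
# of this gene group sum to 1 within every pseudobulk.
normalised <- counts / rowSums(counts)

# renormalise = TRUE (sketch): drop the 'missing' column first, then
# rescale so the remaining calls sum to 1 for every pseudobulk.
observed <- counts[, colnames(counts) != "missing", drop = FALSE]
renormalised <- observed / rowSums(observed)

normalised
renormalised
```

The difference matters when many cells lack a call: in normalised, the 'missing' column absorbs part of each pseudobulk's total, while in renormalised the proportions describe observed calls only.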
Mode Selection:
If extract_cols = NULL, the function relies on mode_option to determine which V(D)J columns to extract.
Filtering:
Uses min_count as a threshold: for each pseudobulk, counts in a V(D)J gene group with fewer than min_count non-missing calls are set to 0.
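As a rough picture of the min_count threshold, the base R sketch below zeroes out a gene group's counts for any pseudobulk that has too few non-missing calls. It is an assumption-laden illustration rather than the package's code: the matrix, the "missing" label, and the choice to zero the whole row (including the missing column) are all hypothetical.

```r
# Toy counts for one gene group: rows are pseudobulks, columns are calls.
# All names are hypothetical.
group_counts <- matrix(
    c(1, 0, 2,
      3, 2, 0),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("pb1", "pb2"), c("geneA", "geneB", "missing"))
)
min_count <- 2

# Count the non-missing calls per pseudobulk for this gene group.
is_call <- colnames(group_counts) != "missing"
n_called <- rowSums(group_counts[, is_call, drop = FALSE])

# Set the counts to 0 for pseudobulks below the threshold.
filtered <- group_counts
filtered[n_called < min_count, ] <- 0
filtered
```

Here pb1 has a single non-missing call and is zeroed, while pb2 is left unchanged. With the default min_count = 1, only pseudobulks with no non-missing calls at all would be affected.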
Examples
data(sce_vdj)
sce_vdj <- setupVdjPseudobulk(sce_vdj,
    already.productive = FALSE,
    allowed_chain_status = c("Single pair", "Extra pair")
)
#> Checking productivity from productive_abT_VDJ, productive_abT_VJ ...
#> 7279 of cells filtered
#> checking allowed chain status...
#> 12 of cells filtered
#> VDJ data extraction begin:
#> Parameter extract_cols do not provided, automatically geneterate colnames for extraction.
#> Detect whether colData v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ already exist...
#> Extract main TCR from v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ ...
#> Complete.
#> Filtering cells from v_call_abT_VDJ_main, j_call_abT_VDJ_main, v_call_abT_VJ_main, j_call_abT_VJ_main ...
#> 63 of cells filtered
#> 2646 of cells remain.
# Build Milo Object
milo_object <- miloR::Milo(sce_vdj)
milo_object <- miloR::buildGraph(milo_object, k = 50, d = 20, reduced.dim = "X_scvi")
#> Constructing kNN graph with k:50
milo_object <- miloR::makeNhoods(milo_object, reduced_dims = "X_scvi", d = 20)
#> Checking valid object
#> Running refined sampling with reduced_dim
# Construct pseudobulked VDJ feature space
pb.milo <- vdjPseudobulk(milo_object, col_to_take = "anno_lvl_2_final_clean")