Generate Pseudobulk V(D)J Feature Space — vdjPseudobulk • dandelionR

This function creates a pseudobulk V(D)J feature space from single-cell data, aggregating V(D)J information into pseudobulk groups. It supports input as either a Milo object or a SingleCellExperiment object.

Usage

vdjPseudobulk(
  milo,
  pbs = NULL,
  col_to_bulk = NULL,
  extract_cols = c("v_call_abT_VDJ_main", "j_call_abT_VDJ_main", "v_call_abT_VJ_main",
    "j_call_abT_VJ_main"),
  mode_option = c("abT", "gdT", "B"),
  col_to_take = NULL,
  normalise = TRUE,
  renormalize = FALSE,
  min_count = 1L,
  verbose = TRUE
)

Arguments

milo

A Milo or SingleCellExperiment object containing V(D)J data.

pbs

Optional. A binary matrix with cells as rows and pseudobulk groups as columns.

If milo is a Milo object, this parameter is not required.
If milo is a SingleCellExperiment object, either pbs or col_to_bulk must be provided.

col_to_bulk

Optional character or character vector. Specifies colData column(s) to generate pbs. If multiple columns are provided, they will be combined. Default is NULL.

If milo is a Milo object, this parameter is not required.
If milo is a SingleCellExperiment object, either pbs or col_to_bulk must be provided.

extract_cols

Character vector. Specifies column names where V(D)J information is stored. Default is c('v_call_abT_VDJ_main', 'j_call_abT_VDJ_main', ' 'v_call_abT_VJ_main', 'j_call_abT_VJ_main').

mode_option

Character. Specifies the mode for extracting V(D)J genes. Must be one of c('B', 'abT', 'gdT'). Default is 'abT'.

Note: This parameter is considered only when extract_cols = NULL.
If NULL, uses column names such as v_call_VDJ instead of v_call_abT_VDJ.

col_to_take

Optional character or vector of characters. Specifies names of colData of milo that need to identify the most common value for each pseudobulk Default is NULL.

normalise

Logical. If TRUE, scales the counts of each V(D)J gene group to 1 for each pseudobulk. Default is TRUE.

renormalize

Logical. If TRUE, rescales the counts of each V(D)J gene group to 1 for each pseudobulk after removing 'missing' calls. Useful when setupVdjPseudobulk() was run with remove_missing = FALSE. Default is FALSE.

min_count

Integer. Sets pseudobulk counts in V(D)J gene groups with fewer than this many non-missing calls to 0. Default is 1.

verbose

Logical. If TRUE, prints messages and warnings. Default is TRUE.

Value

SingleCellExperiment object

Details

This function aggregates V(D)J data into pseudobulk groups based on the following logic:

Input Requirements:
If milo is a Milo object, neither pbs nor col_to_bulk is required.
If milo is a SingleCellExperiment object, the user must provide either pbs or col_to_bulk.
Normalization:
When normalise = TRUE, scales V(D)J counts to 1 for each pseudobulk group.
When renormalize = TRUE, rescales the counts after removing 'missing' calls.
Mode Selection:
If extract_cols = NULL, the function relies on mode_option to determine which V(D)J columns to extract.
Filtering:
Uses min_count to filter pseudobulks with insufficient counts for V(D)J groups.

Examples

data(sce_vdj)
sce_vdj <- setupVdjPseudobulk(sce_vdj,
    already.productive = FALSE,
    allowed_chain_status = c("Single pair", "Extra pair")
)
#> Checking productivity from productive_abT_VDJ, productive_abT_VJ ...
#> 7279 of cells filtered
#> checking allowed chains...
#> 12 of cells filtered
#> VDJ data extraction begin:
#> extract_cols not specified, automatically generate colnames for extraction.
#> Extract main TCR from v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ ...
#> Complete.
#> Filtering cells from v_call_abT_VDJ_main, j_call_abT_VDJ_main, v_call_abT_VJ_main, j_call_abT_VJ_main ...
#> 63 of cells filtered
#> 2646 of cells remain.
# Build Milo Object
milo_object <- miloR::Milo(sce_vdj)
milo_object <- miloR::buildGraph(milo_object,
    k = 50, d = 20,
    reduced.dim = "X_scvi"
)
#> Constructing kNN graph with k:50
milo_object <- miloR::makeNhoods(milo_object,
    reduced_dims = "X_scvi",
    d = 20
)
#> Checking valid object
#> Running refined sampling with reduced_dim

# Construct pseudobulked VDJ feature space
pb.milo <- vdjPseudobulk(milo_object, col_to_take = "anno_lvl_2_final_clean")