Skip to contents

This function creates a pseudobulk V(D)J feature space from single-cell data, aggregating V(D)J information into pseudobulk groups. It supports input as either a Milo object or a SingleCellExperiment object.

Usage

vdjPseudobulk(
  milo,
  pbs = NULL,
  col_to_bulk = NULL,
  extract_cols = c("v_call_abT_VDJ_main", "j_call_abT_VDJ_main", "v_call_abT_VJ_main",
    "j_call_abT_VJ_main"),
  mode_option = c("abT", "gdT", "B"),
  col_to_take = NULL,
  normalise = TRUE,
  renormalise = FALSE,
  min_count = 1L
)

Arguments

milo

A Milo or SingleCellExperiment object containing V(D)J data.

pbs

Optional. A binary matrix with cells as rows and pseudobulk groups as columns.

  • If milo is a Milo object, this parameter is not required.

  • If milo is a SingleCellExperiment object, either pbs or col_to_bulk must be provided.

col_to_bulk

Optional character or character vector. Specifies colData column(s) to generate pbs. If multiple columns are provided, they will be combined. Default is NULL.

  • If milo is a Milo object, this parameter is not required.

  • If milo is a SingleCellExperiment object, either pbs or col_to_bulk must be provided.

extract_cols

Character vector. Specifies column names where V(D)J information is stored. Default is c('v_call_abT_VDJ_main', 'j_call_abT_VDJ_main', 'v_call_abT_VJ_main', 'j_call_abT_VJ_main').

mode_option

Character. Specifies the mode for extracting V(D)J genes. Must be one of c('B', 'abT', 'gdT'). Default is 'abT'.

  • Note: This parameter is considered only when extract_cols = NULL.

  • If NULL, uses column names such as v_call_VDJ instead of v_call_abT_VDJ.

col_to_take

Optional character or list of characters. Specifies obs column(s) to identify the most common value for each pseudobulk. Default is NULL.

normalise

Logical. If TRUE, scales the counts of each V(D)J gene group to 1 for each pseudobulk. Default is TRUE.

renormalise

Logical. If TRUE, rescales the counts of each V(D)J gene group to 1 for each pseudobulk after removing 'missing' calls. Useful when setupVdjPseudobulk() was run with remove_missing = FALSE. Default is FALSE.

min_count

Integer. Sets pseudobulk counts in V(D)J gene groups with fewer than this many non-missing calls to 0. Relevant when normalise = TRUE. Default is 1.

Value

SingleCellExperiment object

Details

This function aggregates V(D)J data into pseudobulk groups based on the following logic:

  • Input Requirements:

    • If milo is a Milo object, neither pbs nor col_to_bulk is required.

    • If milo is a SingleCellExperiment object, the user must provide either pbs or col_to_bulk.

  • Normalization:

    • When normalise = TRUE, scales V(D)J counts to 1 for each pseudobulk group.

    • When renormalise = TRUE, rescales the counts after removing 'missing' calls.

  • Mode Selection:

    • If extract_cols = NULL, the function relies on mode_option to determine which V(D)J columns to extract.

  • Filtering:

    • Uses min_count to filter pseudobulks with insufficient counts for V(D)J groups.

Examples

data(sce_vdj)
sce_vdj <- setupVdjPseudobulk(sce_vdj,
    already.productive = FALSE,
    allowed_chain_status = c("Single pair", "Extra pair")
)
#> Checking productivity from productive_abT_VDJ, productive_abT_VJ ...
#> 7279 of cells filtered
#> checking allowed chain status...
#> 12 of cells filtered
#> VDJ data extraction begin:
#> Parameter extract_cols do not provided, automatically geneterate colnames for extraction.
#> Detect whether colData v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ already exist...
#> Extract main TCR from v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ ...
#> Complete.
#> Filtering cells from v_call_abT_VDJ_main, j_call_abT_VDJ_main, v_call_abT_VJ_main, j_call_abT_VJ_main ...
#> 63 of cells filtered
#> 2646 of cells remain.
# Build Milo Object
milo_object <- miloR::Milo(sce_vdj)
milo_object <- miloR::buildGraph(milo_object, k = 50, d = 20, reduced.dim = "X_scvi")
#> Constructing kNN graph with k:50
milo_object <- miloR::makeNhoods(milo_object, reduced_dims = "X_scvi", d = 20)
#> Checking valid object
#> Running refined sampling with reduced_dim

# Construct pseudobulked VDJ feature space
pb.milo <- vdjPseudobulk(milo_object, col_to_take = "anno_lvl_2_final_clean")