
This function preprocesses single-cell V(D)J sequencing data for pseudobulk analysis. It filters data based on productivity and chain status, subsets data, extracts main V(D)J genes, and removes unmapped entries.

Usage

setupVdjPseudobulk(
  sce,
  mode_option = c("abT", "gdT", "B"),
  already.productive = TRUE,
  productive_cols = NULL,
  productive_vj = TRUE,
  productive_vdj = TRUE,
  allowed_chain_status = NULL,
  subsetby = NULL,
  groups = NULL,
  extract_cols = NULL,
  filter_unmapped = TRUE,
  check_vj_mapping = c(TRUE, TRUE),
  check_vdj_mapping = c(TRUE, FALSE, TRUE),
  check_extract_cols_mapping = NULL,
  remove_missing = TRUE
)

Arguments

sce

A SingleCellExperiment object. V(D)J data should be contained in colData for filtering.

mode_option

Optional character. Specifies the mode for extracting V(D)J genes: one of "abT", "gdT", or "B". If NULL, extract_cols must be specified. The usage above lists c("abT", "gdT", "B") as the default.

already.productive

Logical. Whether the data has already been filtered for productivity. If TRUE, skips productivity filtering. Default is TRUE.

productive_cols

Character vector. Names of colData columns used for productivity filtering. Default is NULL.

productive_vj

Logical. If TRUE, retains cells where the main VJ chain is productive. Default is TRUE.

productive_vdj

Logical. If TRUE, retains cells where the main VDJ chain is productive. Default is TRUE.

allowed_chain_status

Character vector. Specifies chain statuses to retain. Valid options include c('Single pair', 'Extra pair', 'Extra pair-exception', 'Orphan VDJ', 'Orphan VDJ-exception'). Default is NULL.

subsetby

Character. Name of a colData column for subsetting. Default is NULL.

groups

Character vector. Specifies the subset condition for filtering. Default is NULL.

extract_cols

Character vector. Names of colData columns where V(D)J information is stored, used instead of the standard columns. Default is NULL.

filter_unmapped

Logical. Whether to filter unmapped data. Default is TRUE.

check_vj_mapping

Logical vector of length two. Specifies which genes of the VJ chain to check for mapping. Default is c(TRUE, TRUE).

  • If the first element is TRUE, the function filters cells whose VJ-chain V gene is unmapped.

  • If the second element is TRUE, the function filters cells whose VJ-chain J gene is unmapped.

check_vdj_mapping

Logical vector of length three. Specifies which genes of the VDJ chain to check for mapping. Default is c(TRUE, FALSE, TRUE).

  • If the first element is TRUE, the function filters cells whose VDJ-chain V gene is unmapped.

  • If the second element is TRUE, the function filters cells whose VDJ-chain D gene is unmapped.

  • If the third element is TRUE, the function filters cells whose VDJ-chain J gene is unmapped.
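As a rough illustration of how these flag vectors pick columns, the base R sketch below applies them by logical indexing. The `_main` column names are taken from the example log for mode_option = "abT"; the existence of a `d_call_abT_VDJ_main` column and the variable names `vj_cols`, `vdj_cols`, and `cols_to_check` are assumptions made for illustration, not the package's internals.

```r
# Candidate columns for mode_option = "abT" (names follow the example log;
# d_call_abT_VDJ_main is assumed to exist alongside the V and J columns).
vj_cols  <- c("v_call_abT_VJ_main", "j_call_abT_VJ_main")
vdj_cols <- c("v_call_abT_VDJ_main", "d_call_abT_VDJ_main", "j_call_abT_VDJ_main")

# Default flags from the usage section.
check_vj_mapping  <- c(TRUE, TRUE)         # V, J of the VJ chain
check_vdj_mapping <- c(TRUE, FALSE, TRUE)  # V, D, J of the VDJ chain

# Logical indexing keeps only the columns whose flag is TRUE,
# so the D gene column is skipped under the defaults.
cols_to_check <- c(vdj_cols[check_vdj_mapping], vj_cols[check_vj_mapping])
print(cols_to_check)
```

With the defaults, the selected columns are the four listed in the "Filtering cells from ..." message of the example output.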

check_extract_cols_mapping

Character vector. Specifies columns related to extract_cols for mapping checks. Default is NULL.

remove_missing

Logical. If TRUE, removes cells with contigs matching the filter. If FALSE, masks them with uniform values. Default is TRUE.

Value

A filtered SingleCellExperiment object.

Details

The function performs the following preprocessing steps:

  • Productivity Filtering:

    • Skipped if already.productive = TRUE.

    • Filters cells based on productivity using productive_cols or standard colData columns named productive_{mode_option}_{type} (where type is 'VDJ' or 'VJ').

    • mode_option

      • The function checks colData columns named productive_{mode_option}_{type}, where type is 'VDJ', 'VJ', or both, depending on the values of productive_vj and productive_vdj.

      • If set to NULL, extract_cols must be specified.

    • productive_cols

      • Must be specified when productivity filtering is to be run and mode_option is NULL.

      • Names the columns where the VDJ/VJ information is stored; these are used instead of the standard columns.

    • productive_vj, productive_vdj

      • If TRUE, a cell is kept only if its main V(D)J chain is productive.

  • Chain Status Filtering:

    • Retains cells with chain statuses specified by allowed_chain_status.

  • Subsetting:

    • Conducted only if both subsetby and groups are provided.

    • Retains cells matching the groups condition in the subsetby column.

  • Main V(D)J Extraction:

    • Uses extract_cols to specify custom columns for extracting V(D)J information.

  • Unmapped Data Filtering:

    • Removes or masks cells with unmapped entries, as controlled by filter_unmapped and remove_missing.

    • Checks specific columns for unclear mappings using check_vj_mapping, check_vdj_mapping, or check_extract_cols_mapping.

    • filter_unmapped

      • The pattern to be filtered from the object.

      • If set to NULL, the filtering step is skipped.

    • check_vj_mapping, check_vdj_mapping

      • Only the colData columns selected by check_vj_mapping and check_vdj_mapping are checked for unclear mappings.

    • check_extract_cols_mapping, related to extract_cols

      • Only the colData columns specified by this argument are checked for unclear mappings; these columns must first be specified in extract_cols.

    • remove_missing

      • If TRUE, removes cells with contigs matching the filter from the object.

      • If FALSE, masks them with a uniform value that depends on the column name.
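The steps above can be sketched on a plain data.frame standing in for colData(sce). This is a minimal illustration in base R, not the package's implementation: the toy values, the logical encoding of the productivity columns, the `cell_type` column, the pattern "None|No_contig", the mask value, and the helper `sketch_filter()` are all assumptions made for the example.

```r
# Toy per-cell metadata; every value here is invented for illustration.
cells <- data.frame(
  productive_abT_VDJ  = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  productive_abT_VJ   = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  chain_status        = c("Single pair", "Orphan VDJ", "Single pair",
                          "Extra pair", "Single pair"),
  cell_type           = c("CD4", "CD8", "CD4", "CD4", "CD8"),
  v_call_abT_VDJ_main = c("TRBV7-2", "TRBV5-1", "TRBV9", "None", "TRBV20-1"),
  j_call_abT_VDJ_main = c("TRBJ2-1", "TRBJ1-2", "TRBJ2-3", "TRBJ1-1", "TRBJ2-7"),
  stringsAsFactors = FALSE
)

unmapped_pattern <- "None|No_contig"  # hypothetical pattern for unmapped calls
check_cols <- c("v_call_abT_VDJ_main", "j_call_abT_VDJ_main")

# Hypothetical helper mirroring the order of the documented steps.
sketch_filter <- function(cells, allowed_chain_status,
                          subsetby = NULL, groups = NULL,
                          remove_missing = TRUE) {
  # 1. Productivity filtering: keep cells productive in both chains.
  cells <- cells[cells$productive_abT_VDJ & cells$productive_abT_VJ, , drop = FALSE]
  # 2. Chain status filtering.
  cells <- cells[cells$chain_status %in% allowed_chain_status, , drop = FALSE]
  # 3. Subsetting, only when both subsetby and groups are provided.
  if (!is.null(subsetby) && !is.null(groups)) {
    cells <- cells[cells[[subsetby]] %in% groups, , drop = FALSE]
  }
  # 4. Unmapped data: flag matches per column, then remove or mask.
  hits <- lapply(cells[check_cols], function(x) grepl(unmapped_pattern, x))
  if (remove_missing) {
    cells <- cells[!Reduce(`|`, hits), , drop = FALSE]
  } else {
    for (col in check_cols) {
      # Hypothetical mask value derived from the column name.
      cells[[col]][hits[[col]]] <- paste0(col, "_unmapped")
    }
  }
  cells
}

allowed <- c("Single pair", "Extra pair")
removed <- sketch_filter(cells, allowed, remove_missing = TRUE)
masked  <- sketch_filter(cells, allowed, remove_missing = FALSE)
cd4     <- sketch_filter(cells, allowed, subsetby = "cell_type", groups = "CD4")

print(nrow(removed))
print(masked$v_call_abT_VDJ_main)
print(nrow(cd4))
```

In this toy data, the fourth cell passes the productivity and chain-status filters but carries an unmapped V call, so it is dropped when remove_missing = TRUE and kept with a masked value when remove_missing = FALSE.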

Examples


# load data
data(sce_vdj)
# check the dimensions
dim(sce_vdj)
#> [1] 33538 10000
# filter the data
sce_vdj <- setupVdjPseudobulk(
    sce = sce_vdj,
    mode_option = "abT", # set the mode to αβTCR
    allowed_chain_status = c("Single pair", "Extra pair"),
    already.productive = FALSE # unproductive cells still need to be filtered
)
#> Checking productivity from productive_abT_VDJ, productive_abT_VJ ...
#> 7279 of cells filtered
#> checking allowed chain status...
#> 12 of cells filtered
#> VDJ data extraction begin:
#> Parameter extract_cols do not provided, automatically geneterate colnames for extraction.
#> Detect whether colData v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ already exist...
#> Extract main TCR from v_call_abT_VDJ, d_call_abT_VDJ, j_call_abT_VDJ, v_call_abT_VJ, j_call_abT_VJ ...
#> Complete.
#> Filtering cells from v_call_abT_VDJ_main, j_call_abT_VDJ_main, v_call_abT_VJ_main, j_call_abT_VJ_main ...
#> 63 of cells filtered
#> 2646 of cells remain.
# check the remaining dimensions
dim(sce_vdj)
#> [1] 33538  2646