
Generate approximate pseudo-bulk data by random projections while sharing rows (features) across multiple data sets, which are concatenated horizontally (column-wise).
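The base-R sketch below illustrates the idea only. It assumes that cells are grouped into pseudobulk samples by the sign pattern of their projections onto `num_factors` random Gaussian directions. The function name `random_pseudobulk_sketch` is hypothetical. It works on a dense in-memory matrix rather than bgzipped Matrix Market files, and it omits batch adjustment and down-sampling. It is not the package's implementation.

```r
# Hypothetical sketch (not asapR's code): random-projection pseudobulk.
# Assumption: cells sharing the same sign pattern across the random
# projections are pooled into one pseudobulk sample.
random_pseudobulk_sketch <- function(X, num_factors, rseed = 42L) {
  set.seed(rseed)
  # Random dictionary: num_factors x features, standard normal entries
  R <- matrix(rnorm(num_factors * nrow(X)), nrow = num_factors)
  # Project every cell (column of X) onto the random directions
  proj <- R %*% X                               # num_factors x cells
  # Binary code per cell: 1 where the projection is positive
  bits <- (proj > 0) * 1L
  # Encode each cell's bit pattern as a single number
  codes <- colSums(bits * 2^(seq_len(num_factors) - 1))
  # Map the observed codes to consecutive sample indices 1..S
  positions <- as.integer(factor(codes))
  # Sum the cells that share a sample index: feature x sample
  sum_mat <- t(rowsum(t(X), positions))
  # Number of cells pooled into each sample
  size <- tabulate(positions)
  list(sum = sum_mat, size = size, positions = positions)
}
```

With `k` factors there are at most `2^k` distinct sign patterns, so `num_factors` bounds the number of pseudobulk samples.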

Usage

asap_random_bulk_cbind_mtx(
  mtx_files,
  row_files,
  col_files,
  idx_files,
  num_factors,
  r_batch_names = NULL,
  rows_restrict = NULL,
  rename_columns = TRUE,
  take_union_rows = FALSE,
  rseed = 42L,
  verbose = TRUE,
  NUM_THREADS = 0L,
  CELL_NORM = 10000,
  BLOCK_SIZE = 1000L,
  do_batch_adj = TRUE,
  do_log1p = FALSE,
  do_down_sample = TRUE,
  save_aux_data = FALSE,
  KNN_CELL = 10L,
  CELL_PER_SAMPLE = 100L,
  BATCH_ADJ_ITER = 100L,
  a0 = 1,
  b0 = 1,
  MAX_ROW_WORD = 2L,
  ROW_WORD_SEP = "_",
  MAX_COL_WORD = 100L,
  COL_WORD_SEP = "@"
)

Arguments

mtx_files

Matrix Market-formatted data files (bgzip-compressed)

row_files

row names (gene/feature names)

col_files

column names (cell/column names)

idx_files

Matrix Market column index files

num_factors

desired number of random factors

take_union_rows

take union of rows (default: FALSE)

rseed

random seed

verbose

print progress messages (default: TRUE)

NUM_THREADS

number of threads used for data reading

CELL_NORM

normalization constant for each data point (column)

BLOCK_SIZE

disk I/O block size (number of columns)

do_batch_adj

perform batch adjustment (default: TRUE)

do_log1p

log(x + 1) transformation (default: FALSE)

do_down_sample

down-sampling (default: TRUE)

save_aux_data

save random projection (default: FALSE)

KNN_CELL

number of nearest-neighbour (k-NN) cells matched in each other batch (default: 10)

CELL_PER_SAMPLE

number of cells per sample used for down-sampling (default: 100)

BATCH_ADJ_ITER

number of batch-adjustment iterations (default: 100)

a0

parameter a0 of Gamma(a0, b0) (default: 1)

b0

parameter b0 of Gamma(a0, b0) (default: 1)

MAX_ROW_WORD

maximum number of words per line in row_files[i]

ROW_WORD_SEP

character used to join words in place of white space

MAX_COL_WORD

maximum number of words per line in col_files[i]

COL_WORD_SEP

character used to join words in place of white space
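The hypothetical helper below sketches one plausible reading of `take_union_rows`, `CELL_NORM` and `do_log1p` on small in-memory matrices. Rows are matched by name, using the intersection by default and the union otherwise. Rows that a data set lacks are assumed to be filled with zeros. The matrices are then concatenated column-wise, and each column is rescaled to sum to `CELL_NORM`. These semantics are assumptions and are not confirmed by this page.

```r
# Hypothetical sketch (not asapR's code). Assumptions: rows are matched by
# name, missing rows are zero-filled under the union, and every column is
# rescaled so that it sums to CELL_NORM.
align_and_cbind_sketch <- function(mats, take_union_rows = FALSE,
                                   CELL_NORM = 1e4, do_log1p = FALSE) {
  row_sets <- lapply(mats, rownames)
  # Shared feature set: intersection by default, union if requested
  shared <- if (take_union_rows) Reduce(union, row_sets) else Reduce(intersect, row_sets)
  aligned <- lapply(mats, function(m) {
    out <- matrix(0, nrow = length(shared), ncol = ncol(m),
                  dimnames = list(shared, colnames(m)))
    keep <- intersect(shared, rownames(m))
    out[keep, ] <- m[keep, , drop = FALSE]   # rows absent from m stay 0
    out
  })
  X <- do.call(cbind, aligned)               # horizontal concatenation
  col_tot <- colSums(X)
  col_tot[col_tot == 0] <- 1                 # leave empty columns at zero
  X <- sweep(X, 2, col_tot, "/") * CELL_NORM
  if (do_log1p) X <- log1p(X)                # optional log(x + 1)
  X
}
```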

Value

a list

  • PB: pseudobulk (average) data (feature x sample)

  • sum: pseudobulk (sum) data (feature x sample)

  • matched.sum: kNN-matched pseudobulk data (feature x sample)

  • sum_db: batch-specific sum (feature x batch)

  • size: size per sample (sample x 1)

  • prob_bs: batch-specific frequency (batch x sample)

  • size_bs: batch-specific size (batch x sample)

  • batch.effect: batch effect (feature x batch)

  • log.batch.effect: log batch effect (feature x batch)

  • batch.names: batch names (batch x 1)

  • positions: pseudobulk sample positions (cell x 1)

  • rand.dict: random dictionary (proj factor x feature)

  • rand.proj: random projection results (sample x proj factor)

  • colnames: column (cell) names

  • rownames: feature (gene) names
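PB is described as the average and sum as the sum over the same samples, so PB should presumably equal sum divided column-wise by size. The hypothetical helper `pb_average_sketch` below computes that ratio on a made-up toy stand-in for the returned list. The relationship itself is an assumption that this page implies but does not state.

```r
# Hypothetical helper: assumes PB == sum / size, column by column.
pb_average_sketch <- function(sum_mat, size) {
  # size is documented as (sample x 1); flatten it to a plain vector
  size <- as.numeric(size)
  stopifnot(length(size) == ncol(sum_mat))
  sweep(sum_mat, 2, size, "/")
}

# Made-up toy stand-in for two fields of the returned list
# (2 features x 3 samples)
toy <- list(
  sum  = matrix(c(4, 2, 9, 3, 10, 5), nrow = 2),
  size = matrix(c(2, 3, 5), ncol = 1)
)
pb <- pb_average_sketch(toy$sum, toy$size)
```

With a real result `out`, you can test this assumption by comparing `pb_average_sketch(out$sum, out$size)` against `out$PB`.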