Generate approximate pseudo-bulk data by random projections while linking features across multiple mtx files

Usage

asap_random_bulk_linking_mtx(
  mtx_files,
  row_files,
  col_files,
  idx_files,
  num_factors,
  rseed = 42L,
  verbose = TRUE,
  NUM_THREADS = 1L,
  CELL_NORM = 10000,
  BLOCK_SIZE = 1000L,
  do_log1p = FALSE,
  do_down_sample = FALSE,
  save_rand_proj = FALSE,
  weighted_rand_proj = FALSE,
  CELL_PER_SAMPLE = 100L,
  a0 = 1e-08,
  b0 = 1,
  MAX_ROW_WORD = 2L,
  ROW_WORD_SEP = "_",
  MAX_COL_WORD = 100L,
  COL_WORD_SEP = "@"
)
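
A minimal sketch of a call, assuming two data types that share the same cells and therefore one file set per type. The file names below are hypothetical placeholders, not files shipped with the package, and the package is assumed to be installed under the name `asapR`. The call is guarded so the snippet does nothing harmful when the package or the inputs are missing.

```r
# Hypothetical input paths (one entry per data type); replace with real files.
mtx_files <- c("rna.mtx.gz", "atac.mtx.gz")
row_files <- c("rna.rows.gz", "atac.rows.gz")
col_files <- c("rna.cols.gz", "atac.cols.gz")
idx_files <- c("rna.mtx.gz.index", "atac.mtx.gz.index")

# Only call the function when the package and every input file are present.
inputs_ready <- requireNamespace("asapR", quietly = TRUE) &&
  all(file.exists(c(mtx_files, row_files, col_files, idx_files)))

if (inputs_ready) {
  out <- asapR::asap_random_bulk_linking_mtx(
    mtx_files = mtx_files,
    row_files = row_files,
    col_files = col_files,
    idx_files = idx_files,
    num_factors = 10L,   # required: no default in the usage above
    rseed = 42L,
    verbose = TRUE
  )
  str(out, max.level = 1)  # top-level structure of the returned list
} else {
  message("asapR or the input files are not available; skipping the call.")
}
```

All other arguments keep the defaults listed in the usage block.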

Arguments

mtx_files: Matrix Market-formatted data files (bgzip-compressed)
row_files: row names (gene/feature names)
col_files: column names (cell/column names)
idx_files: matrix-market column index files
num_factors: a desired number of random factors per data set
rseed: random seed
verbose: verbosity
NUM_THREADS: number of threads in data reading
CELL_NORM: normalization constant for each data point (cell)
BLOCK_SIZE: disk I/O block size (number of columns)
do_log1p: log(x + 1) transformation (default: FALSE)
do_down_sample: down-sampling (default: FALSE)
save_rand_proj: save random projection (default: FALSE)
weighted_rand_proj: use a weighted random projection (default: FALSE)
CELL_PER_SAMPLE: number of cells kept per sample when down-sampling (default: 100)
a0: first hyperparameter of gamma(a0, b0) (default: 1e-8)
b0: second hyperparameter of gamma(a0, b0) (default: 1)
MAX_ROW_WORD: maximum words per line in row_files[i]
ROW_WORD_SEP: word separation character to replace white space
MAX_COL_WORD: maximum words per line in col_files[i]
COL_WORD_SEP: word separation character to replace white space
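
To make the roles of num_factors, rseed, and CELL_NORM more concrete, here is a base-R sketch of pseudo-bulk construction by random projection on a toy matrix. It is an illustration under stated assumptions, not the package's implementation: it assumes cells are grouped by the sign pattern of their random projections, and every variable name in it is hypothetical.

```r
set.seed(42L)                          # plays the role of rseed
n_feat <- 50L
n_cell <- 200L
K <- 3L                                # plays the role of num_factors

# Toy count matrix (feature x cell).
Y <- matrix(rpois(n_feat * n_cell, lambda = 2), n_feat, n_cell)

# Scale each cell so that its counts sum to CELL_NORM (assumed meaning).
CELL_NORM <- 1e4
Y_norm <- sweep(Y, 2, pmax(colSums(Y), 1), "/") * CELL_NORM

# Gaussian random projection of the log1p-transformed data (K x cell).
R <- matrix(rnorm(K * n_feat), K, n_feat)
Q <- R %*% log1p(Y_norm)
Q <- Q - rowMeans(Q)                   # center each factor across cells

# Sign pattern of each cell -> pseudo-bulk sample index in 1..2^K.
code <- (Q > 0) * 1L
positions <- as.integer(crossprod(2^(seq_len(K) - 1L), code)) + 1L

# Cell-to-sample indicator matrix, then per-sample sums and sizes.
n_sample <- 2^K
Z <- matrix(0, n_cell, n_sample)
Z[cbind(seq_len(n_cell), positions)] <- 1
sum_mat <- Y %*% Z                     # feature x sample
size <- colSums(Z)                     # number of cells per sample
```

With K binary codes there are at most 2^K pseudo-bulk samples, so a larger number of factors gives finer but smaller groups.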

Value

A list with the following elements:

PB.list: pseudobulk (average) data (feature x sample) for each type
sum.list: pseudobulk (sum) data (feature x sample) for each type
size.list: size per sample (sample x 1) for each type
rownames.list: feature (gene) names for each type
colnames: column (cell) names across data types
positions: pseudobulk sample positions (cell x 1)
rand.proj: random projection results (sample x proj factor)
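
The sketch below shows one way to work with a result of this shape. It uses a hand-built mock list with a single data type, so the values, the 1-based sample indices, and the object name `out` are all hypothetical. The smoothed average at the end is only one plausible reading of how a0 and b0 relate sums to averages (a gamma-smoothed ratio); the package may compute PB.list differently.

```r
# Mock result: 2 features, 3 pseudo-bulk samples, 3 cells (values made up).
out <- list(
  sum.list  = list(matrix(c(4, 0, 6, 2, 0, 0), nrow = 2)),  # feature x sample
  size.list = list(matrix(c(2, 1, 0), ncol = 1)),           # sample x 1
  colnames  = c("cell_a", "cell_b", "cell_c"),
  positions = c(1L, 1L, 2L)                                 # cell x 1
)

# Group cell names by the pseudo-bulk sample they were assigned to.
cells_by_sample <- split(out$colnames, out$positions)

# Assumed smoothing: (sum + a0) / (size + b0), which stays finite
# even for a sample that contains no cells.
a0 <- 1e-8
b0 <- 1
avg <- sweep(out$sum.list[[1]] + a0, 2, out$size.list[[1]][, 1] + b0, "/")
```

Here `cells_by_sample` has one entry per non-empty sample, and `avg` has the same feature x sample shape as the sum matrix.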