Simulate mosaic (multi-batch) single-cell MTX data for eQTL analysis.

Counts are generated as Y ~ Poisson(mu * rho), where mu = exp(log.mu / smudge) and rho ~ Gamma(a, b).
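
The sampling model above can be sketched in base R as follows. This is only an illustration, not the package's implementation: the constant log.mu, the number of cells, and the shape/rate reading of Gamma(a, b) are all assumptions made for the example.

```r
# Minimal sketch of the sampling model (illustrative only)
set.seed(13)
n.cells <- 1000   # hypothetical number of cells
smudge  <- 1
rho.a   <- 2
rho.b   <- 2

# Hypothetical log-mean for a single gene, constant across cells
log.mu <- rep(log(5), n.cells)

mu  <- exp(log.mu / smudge)                          # expected expression
rho <- rgamma(n.cells, shape = rho.a, rate = rho.b)  # assumed shape/rate form
y   <- rpois(n.cells, lambda = mu * rho)             # simulated counts

# Under shape = rate, E[rho] = 1, so mean(y) should be near exp(log(5)) = 5
mean(y)
```

Under these assumptions, a larger smudge shrinks log.mu toward zero and therefore flattens differences in mean expression.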

make.sc.eqtl.mosaic(
  file.header,
  X,
  h2,
  n.causal.snps = 1,
  n.causal.genes = 5,
  pve.y.by.u0 = 0.3,
  n.u0 = 3,
  pve.u1.by.x = 0.8,
  pve.y.by.u1 = 0.3,
  n.u1 = 3,
  pve.interaction = 0.5,
  n.interaction = 0,
  n.genes = 50,
  n.covar.genes = n.genes,
  num.mixtures = 1,
  num.mosaic = 1,
  smudge = 1,
  rho.a = 2,
  rho.b = 2,
  ncell.ind = 10,
  rseed = 13
)
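
A call might look like the sketch below. The toy genotype matrix, the sample sizes, and the chosen argument values are hypothetical, and treating file.header as a path prefix for the output files is an assumption, since that argument is not described here. The call is guarded so the snippet runs even when the function is not loaded.

```r
set.seed(13)
n.ind <- 40    # hypothetical number of individuals
n.snp <- 100   # hypothetical number of SNPs

# Toy genotype dosages (0, 1, or 2) with allele frequency 0.3
X <- matrix(rbinom(n.ind * n.snp, size = 2, prob = 0.3),
            nrow = n.ind, ncol = n.snp)

# Assumption: file.header is a path prefix for the output files
hdr <- file.path(tempdir(), "sim_mosaic")

# Only call the simulator if it is available in the current session
if (exists("make.sc.eqtl.mosaic")) {
  sim <- make.sc.eqtl.mosaic(file.header = hdr, X = X, h2 = 0.2,
                             n.genes = 50, num.mosaic = 2,
                             ncell.ind = 10, rseed = 13)
  str(sim, max.level = 2)   # inspect the top-level structure
}
```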

Arguments

X: genotype matrix (individual x SNPs)
h2: heritability (proportion of variance of Y explained by genetic X)
n.causal.snps: number of X variables (SNPs) directly affecting Y
n.causal.genes: number of Y variables (genes) directly regulated by X
pve.y.by.u0: proportion of variance of Y explained by U0
n.u0: number of U0 covariates affecting Y
pve.u1.by.x: proportion of variance of U1 explained by X
pve.y.by.u1: proportion of variance of Y explained by U1
n.u1: number of U1 covariates affecting Y
pve.interaction: proportion of variance of Y explained by interaction
n.interaction: number of genes interacting with the causal genes
n.genes: total number of genes (Y variables)
num.mixtures: number of cell mixtures
smudge: scaling factor that divides log.mu in the mean model (default: 1)
rho.a: parameter a in rho ~ Gamma(a, b)
rho.b: parameter b in rho ~ Gamma(a, b)
ncell.ind: number of cells per individual
rseed: random seed
num.batches: number of single-cell data batches
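
To illustrate what a "proportion of variance explained" argument such as h2 or pve.y.by.u0 means, the sketch below builds a response from standardized components weighted by the square roots of their target proportions. This is a generic construction under assumed independent components, not necessarily how the function combines its terms; g, u0, and e are hypothetical names.

```r
set.seed(13)
n <- 2000              # hypothetical sample size
h2 <- 0.2              # target variance explained by genetics
pve.y.by.u0 <- 0.3     # target variance explained by U0

# Hypothetical standardized components (mean 0, variance 1)
g  <- as.numeric(scale(rnorm(n)))   # genetic component
u0 <- as.numeric(scale(rnorm(n)))   # non-genetic covariate
e  <- as.numeric(scale(rnorm(n)))   # residual noise

# Weight each component by the square root of its variance share
y <- sqrt(h2) * g +
  sqrt(pve.y.by.u0) * u0 +
  sqrt(1 - h2 - pve.y.by.u0) * e

# R-squared from regressing y on each component approximates its share
r2.g  <- summary(lm(y ~ g))$r.squared
r2.u0 <- summary(lm(y ~ u0))$r.squared
c(genetic = r2.g, u0 = r2.u0)
```

The estimated shares match the targets only approximately, because the simulated components are not exactly uncorrelated in a finite sample.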

Value

A list of simulation results (see Details).

Details

The returned list contains two sub-lists:

data:

data$mtx: a matrix market data file
data$row: a file with row names
data$col: a file with column names
data$idx: an indexing file for the columns
data$indv: a mapping file between column and individual names

indv:

indv$y: observed (noisy) individual x gene matrix
indv$x: observed individual x variants genotype matrix
indv$causal.snps: causal variants (X variables)
indv$causal.genes: causal genes (Y variables)
indv$causal.label: true labels
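
Since data$mtx refers to a matrix market file, one way to load it is readMM() from the Matrix package. The sketch below writes a small stand-in file first so that it runs on its own; the toy matrix and the temporary file name are hypothetical, and the actual output file may differ (for example, it may be compressed).

```r
library(Matrix)

# Hypothetical stand-in for the simulated count file
toy <- Matrix(c(0, 3, 0, 1, 0, 2), nrow = 2, sparse = TRUE)
mtx.file <- tempfile(fileext = ".mtx")
writeMM(toy, mtx.file)

# Read the matrix market file back as a sparse matrix
counts <- readMM(mtx.file)
dim(counts)
```

With real output, the path stored in data$mtx would take the place of mtx.file, and the row and column name files could be read separately to label the matrix.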