
This dataset is described in figure 2 of the Harmony manuscript. We downloaded 3 cell line datasets from the 10X website. The first two (jurkat and 293t) come from pure cell lines, while the half dataset is a 50:50 mixture of Jurkat and HEK293T cells. We inferred cell type with the canonical marker XIST, since the two cell lines come from 1 male and 1 female donor.

We library normalized the cells, log transformed the counts, and scaled the genes. Then we performed PCA and kept the top 20 PCs. We begin the analysis in this notebook from here.

    V <- harmony::cell_lines$scaled_pcs         # PCA embedding matrix of cells
    meta_data <- harmony::cell_lines$meta_data  # dataframe with cell labels

To get a feel for the data, let’s visualize the cells in PCA space. The plots below show the cells’ PC1 and PC2 embeddings. We color the cells by dataset of origin (left) and cell type (right).

    do_scatter(V, meta_data, 'dataset', no_guides = TRUE, do_labels = TRUE) +
        labs(title = 'Colored by dataset', x = 'PC1', y = 'PC2') +
    do_scatter(V, meta_data, 'cell_type', no_guides = TRUE, do_labels = TRUE) +
        labs(title = 'Colored by cell type', x = 'PC1', y = 'PC2')

The first thing we do is initialize a Harmony object. We pass it two data structures:

V: the PCA embedding matrix of cells.
meta_data: a dataframe object containing the variables we’d like to Harmonize over.

The rest of the parameters are described below. A few notes:

nclust in the R code below corresponds to the parameter K in the manuscript.
We set the maximum number of Harmony iterations to 0 because in this tutorial, we don’t want to actually run Harmony just yet.
Setting return_object to TRUE means that harmonyObj below is not a corrected PCA embeddings matrix. Instead, it is the full Harmony model object. We’ll have a closer look into the different pieces of this object as we go!

    data_mat = V,           # PCA embedding matrix of cells
    meta_data = meta_data,  # dataframe with cell labels
    theta = 1,              # cluster diversity enforcement
    vars_use = 'dataset',   # variable to integrate out
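As an aside on where the embedding matrix V comes from: the preprocessing described above (library normalization, log transform, gene scaling, then PCA with the top 20 PCs kept) could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not the pipeline actually used for the cell line data. The count matrix is simulated, the scale factor of 10,000 and the pseudocount of 1 are assumed, and the names counts, lognorm, scaled and V_sim are hypothetical.

```r
set.seed(1)

# Simulated raw counts (genes x cells); a stand-in for the real 10X data.
n_genes <- 200
n_cells <- 100
counts <- matrix(
    rpois(n_genes * n_cells, lambda = 2),
    nrow = n_genes,
    dimnames = list(paste0('gene', seq_len(n_genes)),
                    paste0('cell', seq_len(n_cells)))
)

# Library normalization: divide each cell (column) by its total count,
# then multiply by an assumed scale factor of 1e4.
lib_size <- colSums(counts)
norm <- sweep(counts, 2, lib_size, '/') * 1e4

# Log transform with an assumed pseudocount of 1.
lognorm <- log1p(norm)

# Scale genes: center each gene (row) and set its variance to 1 across cells.
scaled <- t(scale(t(lognorm)))

# PCA with cells as observations; genes are already centered and scaled,
# so prcomp is told not to do it again. Keep the top 20 PCs.
pca <- prcomp(t(scaled), center = FALSE, scale. = FALSE)
V_sim <- pca$x[, 1:20]

dim(V_sim)  # cells x PCs
```

The result has one row per cell and one column per PC, which is the same orientation as the V matrix that the tutorial passes to Harmony as data_mat.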