PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data.

For normal, non-disease expression data, a Benjamini-Hochberg False Discovery Rate (FDR) correction for multiple tests is applied. In this study, we performed DE gene analysis for 24 developmental pathways. This database also provides DE gene lists for each developmental pathway, t-SNE maps, and GO and KEGG enrichment analyses based on these differential genes. In developmental biology, changes in gene expression during development are an important feature for addressing questions such as cell growth, cell differentiation, and cell fate decisions. We constructed an index for those gene expression data repositories, called All Of gene Expression (AOE), to integrate publicly available gene expression data. It helps researchers in developmental biology carry out gene expression studies in human single cells. These are the NCBI Gene Expression Omnibus (GEO) and the EBI ArrayExpress (AE) in a MIAME compliant manner. Each column in the file represents one gene that is expressed in the spinal cord or its related entities. Most of this fraction of data is in AOE level 2, but many entries can be found in this filter (AOE level 3). From the usage statistics (from July 2015 to Oct 2019), there were 95,334 visits, 393,174 page views, and 630,837 hits. After carefully reviewing the resultant papers and datasets, we obtained 10 datasets for human single-cell RNA-seq using normal cell types, tissues, or organs. Clicking on the stage name Left_Ventricle_E7w listed in the graph legend removes the boxplot data of this stage. Thus, we decided to include those missing entries in AOE. A search for the keyword hypoxia with 25 results in one page is represented as follows:
Each large scale data set is presented in an experiment card containing a list of its samples. A search for human data in RNA-seq (sequencing assay) with twenty-five results in one page is represented as follows: Unfortunately, AE discontinued GEO data import in 2017. For example, the user can select data by release date by dragging the histogram generated with the keyword search. That meant AE contained all gene expression data, including those deposited to GEO. Three levels of AOE data flow are required to make the AOE index.
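
The FDR correction for multiple tests mentioned above can be illustrated with a short sketch. This is a generic, minimal implementation of the standard Benjamini-Hochberg step-up adjustment, not the code used by any of the databases described here; the function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Generic textbook procedure; the name and interface are illustrative.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes (made up for illustration).
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
for p, q in zip(p_values, q_values):
    print(f"p={p:.2f}  adjusted={q:.4f}")
```

A gene would then be reported as differentially expressed when its adjusted value falls below the chosen FDR threshold.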

Normalized data extracted from microarray analyses and deposited in a public repository were used to provide high-resolution gene expression information within the context. These stem and progenitor cells can be cultured in vitro and induced to differentiate into specific cell types. Users can see overall statistics of stored data in AOE.

The SCDevDB website was built using the Django Python Web framework coupled with a MySQL database. We believe that this database will help researchers in developmental biology investigate gene expression changes along human developmental pathways at the single-cell level. The Supplementary Material for this article can be found online at: The more copies a specific transcript has in the RNA sample, the stronger the staining intensity will be.

The computing resource was partly provided by the supercomputer system at the National Institute of Genetics (NIG), Research Organization of Information and Systems (ROIS), Japan. A comparison card presents two sample groups that were compared and the resulting differential gene expression. Each row represents a development path, anatomical compartment, cell or large scale dataset sample. Based on this rule, we classified the 18,413 single cells into 176 cell groups (Supplemental Table S1). The mRNA is then reverse transcribed. Gene expression data are accrued by manually surveying the published literature and/or by high-throughput experiments. Therefore, single-cell RNA-seq is particularly apposite for developmental biology. While most stem cells become differentiated, mammals retain some stem/progenitor cells. We call this dataset, which includes data from AE only, AOE level 1. At the time of this publication, the database contains 10 datasets covering 18,413 single cells and 176 cell groups (see Methods). Gathering all three levels of data described above, AOE enables visualization and exploration of gene expression data. Data at this level contain BioProject IDs only.
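
The three AOE levels can be read as a simple rule over which identifiers an entry carries. The sketch below is one possible reading, under the assumption that level 1 entries come from AE, level 2 entries have a GEO ID but no AE ID, and level 3 entries have a BioProject ID only; the function and parameter names are hypothetical and do not come from the AOE scripts.

```python
from typing import Optional


def aoe_level(has_ae_id: bool, has_geo_id: bool, has_bioproject_id: bool) -> Optional[int]:
    """Assign an AOE level from the identifiers present on an entry.

    Assumed reading of the text (hypothetical helper):
      level 1: indexed from ArrayExpress (AE)
      level 2: GEO ID present, no AE ID
      level 3: BioProject ID only
    """
    if has_ae_id:
        return 1
    if has_geo_id:
        return 2
    if has_bioproject_id:
        return 3
    return None  # entry cannot be placed in any level


print(aoe_level(True, True, False))    # AE entry imported from GEO
print(aoe_level(False, True, True))    # GEO entry indexed directly
print(aoe_level(False, False, True))   # sequencing data outside GEO
```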

Currently, expression data from The existence of a great deal of data at AOE level 3 shows that not all sequencing gene expression data are stored in GEO. When AOE is used as a search interface for DDBJ GEA, it is expected that AOE will be continuously used at the DDBJ website. Here, we developed a database to investigate single-cell gene expression profiling during different developmental processes (SCDevDB). Differential gene expression from high throughput experiments is calculated using an algorithm developed Top 20 significantly enriched GO terms and KEGG terms were selected to show potential functions of DE genes. comparison is usually performed by calculating the normalized intensity fold change of one sample versus Users can retrieve data of interest graphically in the AOE web interface. Cell groups originating from the same cell lines, tissue regions, or organ regions but at different developmental time points were regarded as one developmental stage. The species in which the gene was found to be expressed, The location of the related protein in the cell. We used the AE type of data to construct an initial AOE index set (called AOE level 1). Once all the fragments are sequenced, the transcripts (or reads) are assembled into genes. *Correspondence: Shuai Cheng Li. Researchers who are interested in gene expression changes during a specific developmental process are not easily able to extract these dynamic features from these databases.
Specifically, if the searched gene (MYL2 gene as an example) is not expressed at one stage, the stage image will be disabled and cannot be clicked (the light-colored images in Figure 3A). Integration of these public gene expression databases is needed to increase the reusability of gene expression data. the following projects are included: Microarray, expressed sequence tag (EST) analysis and RNA sequencing data sets: These data sets include data from whole embryo, commercial synthetic RNA or mixed samples.

adult body, while the second is their self-renewing capacity, which makes them a potential cell reservoir The web interactive visualization graphs were developed using the Plotly JavaScript Open Source Graphing Library. These files are in a simple spreadsheet-based, MIAME-supportive format, called MicroArray Gene Expression Tabular (MAGE-TAB) files, which are Array Design Format (ADF), Investigation Description Format (IDF), and Sample and Data Relationship Format (SDRF) files [11]. Peripheral blood - healthy controls, GUDMAP, GenitoUrinary Development Molecular Anatomy Project, Gensat, Gene expression nervous system by LifeMap Sciences, allowing standardization of gene expression lists. At the anatomical compartment level, genes expressed in any of the cells that comprise In this database, we collected 10 human single-cell RNA-seq datasets, split these datasets into 176 developmental cell groups, and constructed 24 different developmental pathways. Previous studies using bulk-seq data have shown that MYL2 is highly expressed in muscle tissues, including skeletal, myocardial, and smooth muscle (Hsu et al., 2012; Lindholm et al., 2014; Renaudin et al., 2018). To give users easy access to SCDevDB, we designed an interface that allows basic operations such as searching, viewing, and downloading data. This database is freely available at These studies not only revealed many biological features, including developmental processes, signaling pathways, cell cycle, and transcription factor networks but also provided resources to investigate the gene expression patterns during different developmental processes. (C) Removing developmental stages that are not of interest by clicking the name of the stage listed in the figure legend. to be characteristic of the cell.
By subtracting the GEO data already existing in AE, new entries were included in AOE. Moreover, MYL2 is expressed at higher levels in cells of the ventricles than in cells of the atria (Figure 5). The top 30 species in AOE are listed and can be accessed in this way.
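
The subtraction step relies on the correspondence between GEO series IDs and AE accessions (GSE52334 in GEO corresponds to E-GEOD-52334 in AE). A minimal sketch of that matching is shown below; the function names are hypothetical, and every example accession other than GSE52334 is made up for illustration. The actual AOE pipeline uses shell and Perl5 scripts, so this Python version only illustrates the idea.

```python
def geo_to_ae_accession(gse_id: str) -> str:
    """Map a GEO series ID (GSE...) to the matching AE accession (E-GEOD-...)."""
    number = gse_id[3:]
    if not gse_id.startswith("GSE") or not number.isdigit():
        raise ValueError(f"not a GEO series ID: {gse_id!r}")
    return "E-GEOD-" + number


def new_geo_entries(geo_ids, ae_ids):
    """Return the GEO series that have no corresponding entry in AE, in input order."""
    ae_set = set(ae_ids)
    return [g for g in geo_ids if geo_to_ae_accession(g) not in ae_set]


# GSE52334 is the example from the text; GSE99999 and E-MTAB-1 are hypothetical.
geo_ids = ["GSE52334", "GSE99999"]
ae_ids = ["E-GEOD-52334", "E-MTAB-1"]
print(new_geo_entries(geo_ids, ae_ids))
```

Only the series missing from AE survive the filter, which mirrors how GEO-only entries would be added on top of the AE-derived index.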

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Functions of myosin light chain-2 (MYL2) in cardiac muscle and disease.

cells during adult life. of the expression or repression of specific sets of genes, as well as by regulation at the epigenetic is then allocated to appropriate entity cards. The list contains the following gene types: In addition, the organ/tissue card includes a detailed gene expression table summarizing genes that At that point, we investigated data-series entries in these two databases by matching GEO series IDs: IDs beginning with GSE in GEO and those beginning with E-GEOD in AE; for example, GSE52334 in GEO corresponds to E-GEOD-52334 in AE. cell markers. AOE was originally developed to provide a graphical web interface to search EBI AE, which is one of the public gene expression databases described above. All shell and Perl5 scripts for those are accessible from GitHub. the mRNA is isolated. Figure 5 Comparison of MYL2 expression distributions between atrium cell types and ventricle cell types. LifeMap Discovery, which functions as a gene expression database, aims to sketch a systematic map of and function as replacements for cells that are lost through normal physiological processes, injury We have maintained AOE for five years, and it has been useful for functional genomics research. (e.g., DNA microarray and RNA sequencing) are detailed in dedicated experiment cards. LifeMap Discovery. of the cell. Therefore, there is a strong need for a web resource that curates and provides single-cell gene expression profiles during different developmental processes.
The riboprobes hybridize to their complementary RNA transcripts in the sample, thereby enabling visualization of the RNA transcript location (D). Here, we report a detailed description and utility of AOE. While GEO data archived in AE are still available from AE, new data archived in GEO are no longer available from AE. SCDevDB is composed of two functional pages: the Gene Expression Search page and the Differential Gene List Collection page. Newly submitted data contain BioProject IDs, and this feature makes it possible to integrate multiple levels of indices and resolve complicated relationships among IDs, while old AE entries do not have BioProject IDs. RNA sequencing (RNA-Seq) allows for quantitative determination of RNA expression levels. Below is a partial sample of the spinal cord gene expression data (the complete summary includes 64 To perform RNA sequencing, RNA is extracted from cells or tissues and reverse transcribed to cDNA (A). to cDNA. Studies Network, Human Protein Atlas - RNA Sequencing Data, Cancer Data from 'The Gene Expression Barcode 3.0', Furthermore, experiment includes one or more comparisons, each with its dedicated card (termed gene expression comparison Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Japan. Furthermore, an interactive boxplot of gene expression levels along a selected developmental pathway is available by clicking the stage image (Figure 3B). The cDNA is then digested into fragments; the fragment length may vary depending on sequencing machine specifications (B).
It also provides lists of differentially expressed genes for each developmental pathway, T-distributed stochastic neighbor embedding maps showing the relationships between developmental stages based on these differentially expressed genes, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis results for these differentially expressed genes. Figure 3 Overview of the gene expression search page. We eventually obtained 176 files, corresponding to the 176 cell groups. Each sample card contains a list of up to 200 down- and 200 up-regulated genes in the diseased tissue in comparison to the control. entities and 2,901 genes 17 developmental path-specific and 5 developmental path-enriched). ADF files are located in subdirectories in, with the file extension .adf.txt. Single-cell RNA-seq studies profile thousands of cells in developmental processes. After the import of GEO data to AE was discontinued, AOE began importing GEO data by directly utilizing the DBCLS SRA application programming interface (API) [7]. data from the following projects are included: Normal expression large scale dataset example: Disease expression large scale dataset example: Developmental path-specific genes: Genes that only appear in a specific or few developmental paths. The SCDevDB search results are consistent with these studies, as shown in Figure 3A. This allows for characterization of cells by their gene expression patterns. Here, we used this gene as an example of interest to test the functions of SCDevDB. LifeMap Discovery provides a snapshot of gene expression patterns at cellular and anatomical resolutions. To illustrate the interactive function of this boxplot, we took the distribution of MYL2 expression during the left ventricle process as an example.
AOE currently reports 524 items with three histograms (by year, organism, and quantification method; Fig 3C). each characterized by unique gene expression profiles. Whole-mount embryos or tissue sections are placed and fixed on a standard microscope slide (A, B). Thus, we started indexing GEO data directly by making use of the API of DBCLS SRA (AOE level 2). In addition to data derived from the literature and from high throughput experiments, LifeMap Discovery ZW performed the data collection and analysis, XF developed the database, SL designed and supervised the study, and ZW, XF, and SL wrote the manuscript. High-resolution single-cell transcriptome analysis has been performed during many developmental processes, including preimplantation development from oocyte to morula (Xue et al., 2013; Yan et al., 2013), early forebrain and mid/hindbrain cell differentiation from human embryonic stem cells (hESCs) (Yao et al., 2017), and digestive tract development from human embryos between 6 and 25 weeks (Gao et al., 2018). cell markers. Diseases associated with MYL2 include familial hypertrophic cardiomyopathy and congenital fiber-type disproportion (Flavigny et al., 1998; Weterman et al., 2013).

In normal gene expression experiment cards, the samples are linked to organ/tissue/anatomical compartment/in vivo cell/cultured cell cards. In addition, users can download the differentially expressed (DE) genes for each developmental pathway, the T-distributed stochastic neighbor embedding (t-SNE) map constructed with these genes, and the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis results for these DE genes. In this study, we focused only on normal human developmental processes; thus, we excluded experiments using tumor samples and other samples treated with chemical reagents. Users can easily filter data by organism and quantification method of gene expression. MalaCards. Keywords: single cell, gene expression, development, database, cell type, differential expression, Citation: Wang Z, Feng X and Li SC (2019) SCDevDB: A Database for Insights Into Single-Cell Gene Expression Profiles During Human Developmental Processes. We selected the ontology parameter as BP, MF, and CC for GO analysis and pvalueCutoff parameter as 0.5 for both GO and KEGG analysis. expression list. Gene orthologs are referred to by the same gene symbol. So far, several web resources for human single-cell transcriptome data have been reported. Specifically, we first considered the developmental process from oocyte to hESC as the root process; then, the remaining 27 developmental stages were classified into 24 different developmental pathways by combining them with the root process (Supplemental Table S1). For each developmental pathway, we merged the expression data of all developmental stages in this pathway into one file. For the AE data type, several files are required to make an AOE index.
Citation: Bono H (2020) All of gene expression (AOE): An integrated index for public gene expression databases.

The histogram for ranking by quantification methods can be dynamically created by clicking on the technology name. The fluorescence intensities of the spots, Stem cells demonstrate two unique characteristics distinguishing them from other cells in the body. SCDevDB allows users to search the expression profiles of genes of interest across different developmental pathways. Anatomical compartments: Gene expression profiles in the anatomical compartment cards, which Genes that are either established cell markers, or that have been suggested For visualizing datasets, we implemented specially coded Python3 scripts, and we also used D3.js, a JavaScript library for manipulating documents based on data. The aim of AOE is to integrate gene expression data and make them all searchable together. Fig 3B shows the number of data in AOE only for sequencing assays (RNA-seq). which correspond to the level of gene expression in each population, are normalized and compared. The detailed cell number in each stage is shown in Figure 2, and the datasets summary is available at It is also designed to be a search interface for DDBJ GEA, and a link to AOE can be found at the official GEA website. Data at this level contain IDs for BioProject and GEO, but not for AE. AOE is accessible as a search tool from the GEA website and is freely available at A concatenated tab-delimited file of the constructed index is archived in the Life Science Database Archive at National Bioscience Database Center (NBDC), Japan Science and Technology Agency (JST) at DOI: 10.18908/lsdba.nbdc00467-000. Transcript expression is calculated based on the number of reads aligned to each gene (D).
ISH uses a labeled cDNA fragment (i.e., probe) Each cDNA fragment is sequenced in a high-throughput manner to obtain short sequences termed short sequenced reads (SSRs). However, users of these databases can only query gene expression in specific cell types or population heterogeneity processed by the authors. high throughput experiments Myosin light chain-2 (MYL2, also called MLC-2) is a protein that belongs to the EF-hand calcium binding protein superfamily and exists as three major isoforms encoded by three distinct genes in mammalian striated muscle (Sheikh et al., 2015). The selected data (58 records currently) can be retrieved by clicking on the Retrieve button (Fig 3F). In disease expression datasets, the samples are not linked to any card in LifeMap Discovery, but are linked to cards in