An important goal in sequence analysis is to understand how parts of DNA, RNA, or protein sequences interact with each other and to predict how these interactions result in given phenotypes. Mapping phenotypes onto the underlying sequence space at first- and higher-order levels, so that the impact of individual nucleotides or residues along a sequence can be quantified independently, is critical to understanding sequence–phenotype relationships. We developed a Python software tool, ortho_seqs, that quantifies higher-order sequence–phenotype interactions based on our previously published method of applying multivariate tensor-based orthogonal polynomials to biological sequences.
Technological advances in next-generation sequencing have allowed for broad experimental sampling of immune repertoires, providing insight into how our immune system responds to infection, vaccination, autoimmunity, and cancer. The scale of these “big data”, however, makes it difficult to bioinformatically extract the key sequence features that are shared across multiple repertoires. With AIRRscape, we enable large-scale immune repertoire visualization and analysis that requires no knowledge of the command line or advanced programming. By providing the community with an open-source, interactive, and user-friendly interface, we reduce the barriers to exploring immune repertoires at scale.
Molecular characterization of cell types using single-cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues, and tissue-specific variation in gene expression.
Here, we present OnClass, an algorithm and accompanying software for automatically classifying cells into cell types that are part of the controlled vocabulary that forms the Cell Ontology.
An important goal in molecular biology is to quantify both the patterns across a genomic sequence and the relationship between phenotype and underlying sequence. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given sequence and map corresponding phenotypes onto the sequence space.
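To make the first-order idea concrete, here is a minimal sketch under stated assumptions; it is not the ortho_seqs implementation. It assumes that each position is one-hot encoded over a fixed alphabet and that the first-order orthogonal polynomial at a position is that one-hot vector minus its population mean, so every component averages to zero across the sequences. Projecting the centered phenotype onto the span of these centered vectors at a single position reduces to "mean phenotype of sequences carrying symbol a, minus the overall mean". The function names `centered_one_hot` and `first_order_effects` are hypothetical, and the higher-order (tensor-product) terms of the method are omitted.

```python
from collections import defaultdict
from statistics import mean


def centered_one_hot(seqs, alphabet="ACGT"):
    """Hypothetical helper: first-order polynomials as per-position one-hot minus population mean.

    Returns (phi, freqs) where phi[s][i][k] is the centered indicator for
    sequence s, position i, symbol alphabet[k], and freqs[i][a] is the
    frequency of symbol a at position i.
    """
    n, length = len(seqs), len(seqs[0])
    # Frequency of each symbol at each position across the population.
    freqs = [{a: sum(s[i] == a for s in seqs) / n for a in alphabet} for i in range(length)]
    phi = [
        [[(1.0 if s[i] == a else 0.0) - freqs[i][a] for a in alphabet] for i in range(length)]
        for s in seqs
    ]
    return phi, freqs


def first_order_effects(seqs, phenotypes, alphabet="ACGT"):
    """Hypothetical helper: per-position projection of the centered phenotype.

    For one categorical site, projecting (y - mean(y)) onto the span of the
    centered indicators gives, for each observed symbol, the group mean of y
    minus the overall mean of y.
    """
    ybar = mean(phenotypes)
    effects = []
    for i in range(len(seqs[0])):
        groups = defaultdict(list)
        for s, y in zip(seqs, phenotypes):
            groups[s[i]].append(y)
        # Only symbols actually observed at this position get an effect.
        effects.append({a: mean(groups[a]) - ybar for a in alphabet if groups[a]})
    return effects


if __name__ == "__main__":
    seqs = ["AC", "AG", "TC", "TG"]
    y = [1.0, 2.0, 3.0, 4.0]
    print(first_order_effects(seqs, y))
```

With the toy data above, position 1 gives A at -1.0 and T at +1.0, and position 2 gives C at -0.5 and G at +0.5. Centering is what makes the first-order terms orthogonal to the constant term. The published method extends this by building higher-order terms from tensor products of these vectors and orthogonalizing them against lower orders.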
Although tremendous effort has been put into cell-type annotation, identification of previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating known as well as new cell types.
Despite rapid advances over recent years, many of the molecular and cellular processes that underlie the progressive loss of healthy physiology are poorly understood. To gain better insight into these processes, here we generate a single-cell transcriptomic atlas across the lifespan of Mus musculus that includes data from 23 tissues and organs.
Here we present a compendium of single-cell transcriptomic data from the model organism Mus musculus that comprises more than 100,000 cells from 20 organs and tissues.