Overview

The past decade has seen a vast increase in the amount of data available to biologists. This has been driven by the dramatic drop in cost, and concomitant rise in throughput, of various next-generation sequencing technologies. A project that would have been unimaginable 10 years ago has recently been proposed: the Earth BioGenome Project, which aims to sequence the genomes of all eukaryotic species on the planet within the next 10 years. So while data are no longer limiting, accessing and interpreting those data has become a bottleneck. One important aspect of interpreting data is data visualization. This course introduces theoretical topics in data visualization through mini-lectures, and applied aspects through hands-on labs. The labs use both web-based tools and R, so students at all levels of computer skill can benefit. The syllabus may be viewed at https://tinyurl.com/DataViz4GenomeBio.

Syllabus

Week 1

In this module we'll cover 3 straightforward approaches for generating simple plots. As we'll see in the lab, visualizing a dataset can often reveal its overall shape, which descriptive statistics like the mean and standard deviation may not capture. Plotting a dataset is also a useful way to identify outliers. In the mini-lectures we go over some common biological data visualization paradigms and, more generally, the common chart types. We also talk about the context and grammar of data visualization.
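To make the point about summary statistics concrete, here is a small sketch in Python (the course labs themselves use R and web-based tools, and the two datasets below are made up purely for illustration). Both datasets have a mean of 5 and a population standard deviation of 2, yet a simple text histogram shows that one has most of its values clustered near the mean with a tail to the right, while the other splits into two clumps with no values near the mean at all.

```python
from collections import Counter
from statistics import mean, pstdev

# Two small illustrative datasets (hypothetical, not course data).
spread_out = [2, 4, 4, 4, 5, 5, 7, 9]
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]


def text_histogram(values):
    """Return one line per distinct value, with a '#' for each occurrence."""
    counts = Counter(values)
    return [f"{v:>2} | {'#' * counts[v]}" for v in sorted(counts)]


for name, data in [("spread_out", spread_out), ("bimodal", bimodal)]:
    # Both datasets report the same mean and population standard deviation...
    print(f"{name}: mean={mean(data)}, sd={pstdev(data)}")
    # ...but their histograms reveal very different shapes.
    print("\n".join(text_histogram(data)))
```

Running it prints the same summary line for both datasets, followed by two clearly different histograms. A plotting library such as matplotlib, or the R plots used in the labs, would make the same point graphically.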

Week 2

In this week's module we explore ways of displaying biological variation, along with some background on track viewers. We also cover visual perception, Gestalt principles, and issues of colour perception, which are important for accessibility. In the lab we'll use an online app, PlotsOfDifferences, to generate some charts that display variation nicely, and we'll also use R to generate some box plots, histograms, and violin plots. Last but not least, we'll try adjusting some of the settings in JBrowse to help assess gene expression levels more intuitively. Thanks to Dr. Joachim Goedhart, University of Amsterdam, Netherlands, for permission to use PlotsOfDifferences in the lab.

Week 3

In this week's module we explore ways of visualizing gene expression data after briefly covering how we can measure gene expression levels with RNA-seq and identify significantly differentially expressed genes using statistical tests. We also cover design thinking. In the lab we'll use an online platform, Galaxy, to generate a volcano plot for visualizing significantly differentially expressed genes, and we'll also use R to generate some heatmaps of gene expression. Last but not least, we'll create our own "electronic fluorescent pictographs" for a gene expression data set.

Week 4

In this week's module we cover how the Gene Ontology can be used to make sense of often overwhelmingly long lists of genes from transcriptomic and other kinds of 'omics experiments, especially through Gene Ontology enrichment analyses. We'll also look at Agile Development and User Testing and how these can help improve data visualization tools. In the lab, we'll try our hand at 3 online Gene Ontology analysis apps, and create some nice overview charts for GO enrichment results in R. Thanks to Dr. Roy Navon, Technion University, Israel, for permission to use GOrilla in the lab. Thanks to Dr. Juri Reimand of the University of Toronto for permission to use g:Profiler. And thanks to Dr. Zhen Su of the China Agricultural University for permission to use AgriGO.

Week 5

In this week's module, we explore tools for displaying and analyzing networks, notably those built from protein-protein interaction (PPI) data, especially data generated in a high-throughput manner. These PPIs are deposited in online databases like BioGRID, and can be retrieved on the fly via web services for display in powerful network visualization apps like Cytoscape. We'll talk about other web services/APIs that are available for biology in one of the mini-lectures, and in the lab we'll use Cytoscape to explore interactors of BRCA2. We'll also use a plug-in called BiNGO to do Gene Ontology enrichment analyses of BRCA2's interactors, continuing the exploration of GO that we started last week. Last, we'll try using D3 to display an interaction network in a web page.

Week 6

In this module we cover methods for generating and making sense of ever bigger biological data sets. The growth in sequencing capacity has enabled projects that were unimaginable even a few years ago, such as the Earth BioGenome Project, which aims to sequence the genome of a representative of every eukaryotic species on the planet. To make sense of these large data sets, it is often useful to apply dimensionality reduction methods, like t-SNE, PCA, and UMAP, to help visualize how similar samples are to one another. Logic diagrams (Venn-Euler diagrams or UpSet plots) are also useful for displaying how sets of genes overlap. Thanks to Dr. Tim Hulsen (Philips Research, the Netherlands) for permission to use the DeepVenn app in the lab.