Last updated: 2023-01-04
circleRegions()
circleRegions() takes a file with the sizes of the chromosomes and draws them as a ggplot2-based circular plot. It then takes a list of regions in a BED-like format and draws them onto each chromosome. The inputs chromsizes_sets and regions_sets can each be supplied either as a list of character strings, each holding the path to the file where the information is stored, or as a list of data frames.
The chromsizes_sets input looks like this:
library(magrittr) # provides the %>% pipe used below
chromsizes_sets <- list("Regions1" = "../testdata/mm10.chrom.sizes",
                        "Regions2" = "../testdata/mm10.chrom.sizes",
                        "Regions3" = "../testdata/mm10.chrom.sizes")
chromsizes_sets[[1]] %>% read.delim(header = FALSE) %>% head()
## V1 V2
## 1 1 195471971
## 2 10 130694993
## 3 11 122082543
## 4 12 120129022
## 5 13 120421639
## 6 14 124902244
The regions_sets input is a list of file paths (character strings) or data frames. It can have as many elements as needed, and each element has the following BED-like structure:
regions_sets <- list("Regions1" = "../testdata/mm10.regions.tsv",
                     "Regions2" = "../testdata/mm10.regions2.tsv",
                     "Regions3" = "../testdata/mm10.regions3.tsv")
regions_sets[[1]] %>% read.delim(header = FALSE) %>% head()
## V1 V2 V3 V4 V5 V6
## 1 1 57348975 57377520 region2 28545 -
## 2 1 91403055 91406029 region4 2974 -
## 3 1 92992344 92997067 region6 4723 -
## 4 1 125174891 125177979 region10 3088 +
## 5 10 18796805 18831930 region18 35125 -
## 6 10 20310505 20312760 region19 2255 -
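Since both inputs also accept data frames, the same structure can be built in memory. The following is a minimal sketch that reuses a few rows from the outputs above; the object names (chromsizes_df, regions_df, chromsizes_in, regions_in) are illustrative, and it assumes the columns are read by position, as the header-less files suggest.

```r
# Chromosome sizes: chromosome name, then size in bp (rows taken from the output above).
chromsizes_df <- data.frame(V1 = c("1", "10"),
                            V2 = c(195471971, 130694993))

# Regions in BED-like format: chromosome, start, end, id, length, strand.
regions_df <- data.frame(
  V1 = c("1", "1", "10"),
  V2 = c(57348975, 91403055, 18796805),
  V3 = c(57377520, 91406029, 18831930),
  V4 = c("region2", "region4", "region18"),
  V5 = c(28545, 2974, 35125),
  V6 = c("-", "-", "-")
)

# Wrap each data frame in a named list, one element per region set.
chromsizes_in <- list("Regions1" = chromsizes_df)
regions_in    <- list("Regions1" = regions_df)

# Sanity check: every region must end within its chromosome.
sizes <- setNames(chromsizes_df$V2, chromsizes_df$V1)
all(regions_df$V3 <= sizes[regions_df$V1])
```

In these rows the fifth column equals the end minus the start, so it appears to hold the region length.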
The default run requires only the chromsizes_sets and regions_sets arguments, each given either as a list of file paths or as a list of data frames. Both arguments must have the same length and the same order, i.e. if the regions come from different species, the chromosome sizes must come from the corresponding species in the same order.
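The length and order requirement can be checked before plotting. Below is a minimal base R sketch; check_sets() is a hypothetical helper, not part of the package, and comparing list names as a proxy for order is an assumption that only applies when both lists are named.

```r
# Hypothetical helper (not part of the package): verify that both input lists
# have the same length and, when both are named, the same names in the same order.
check_sets <- function(chromsizes_sets, regions_sets) {
  if (length(chromsizes_sets) != length(regions_sets)) {
    stop("chromsizes_sets and regions_sets must have the same length")
  }
  nm_chr <- names(chromsizes_sets)
  nm_reg <- names(regions_sets)
  if (!is.null(nm_chr) && !is.null(nm_reg) && !identical(nm_chr, nm_reg)) {
    stop("chromsizes_sets and regions_sets must be in the same order")
  }
  invisible(TRUE)
}

# Illustrative inputs with matching length and names.
chrom_in   <- list("Regions1" = "../testdata/mm10.chrom.sizes",
                   "Regions2" = "../testdata/mm10.chrom.sizes")
regions_in <- list("Regions1" = "../testdata/mm10.regions.tsv",
                   "Regions2" = "../testdata/mm10.regions2.tsv")
check_sets(chrom_in, regions_in)
```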
If you provide a list with 1 element, only one circle will be drawn. If the list contains three elements, three circles will be drawn.
Here we have a list of 1 element as input.
circleRegions(chromsizes_sets = chromsizes_sets[1], regions_sets = regions_sets[1], color_by = "region")
And here, a list of three elements as input:
circleRegions(chromsizes_sets = chromsizes_sets, regions_sets = regions_sets)
Also from a list of data frames:
# Read the data
chromsizes <- chromsizes_sets %>% purrr::map(~read.delim(.x, header = FALSE))
regions <- regions_sets %>% purrr::map(~read.delim(.x, header = FALSE))
circleRegions(chromsizes_sets = chromsizes, regions_sets = regions)
Genome assemblies of many species often contain chromosomes/scaffolds with unusual names that do not plot well. These can be excluded using the chr_exclude argument, which takes a vector of regular expressions that match the chromosomes to exclude. By default, chr_exclude removes the most common unusual chromosome names, but you can change it if you want to remove more chromosomes or none at all.
An example that excludes all the chromosomes that contain a dot in the name:
circleRegions(chromsizes_sets = chromsizes_sets, regions_sets = regions_sets, color_by = "region", chr_exclude = "\\.")
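Before passing patterns to chr_exclude, it can help to preview which names they match. This sketch uses base R's grepl() on an illustrative vector of names; the patterns are hypothetical, and it is an assumption that the function matches names in a comparable way.

```r
# Chromosome/scaffold names as they might appear in an assembly (illustrative).
chrom_names <- c("1", "2", "X", "MT", "GL456210.1", "JH584299.1", "chrUn_random")

# Hypothetical exclusion patterns: names containing a dot, or the word "random".
patterns <- c("\\.", "random")

# Collapse the patterns into a single alternation and test every name.
excluded <- grepl(paste(patterns, collapse = "|"), chrom_names)

# Names that would be kept.
chrom_names[!excluded]
```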
## Paired regions
Imagine that the elements of regions_sets contain regions that are paired (i.e. they have the same id (4th column) in different elements) and we want to connect them. We can do this by setting the paired argument to TRUE, which causes the function to draw a line connecting the paired regions. The color of the line can be controlled with paired_color, which is "Blue" by default.
# Read the data
chromsizes <- chromsizes_sets %>% purrr::map(~read.delim(.x, header = FALSE))
regions <- regions_sets %>% purrr::map(~read.delim(.x, header = FALSE))
# Subset one of the regions and enter it as element of regions_sets
regions[[2]] <- regions[[1]][sample(1:100, 20, replace = FALSE), ]
regions[[3]] <- regions[[1]][sample(1:100, 20, replace = FALSE), ]
circleRegions(chromsizes_sets = chromsizes,
              regions_sets = regions,
              paired = TRUE, paired_color = "Darkblue")
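To see how many regions could be connected, the ids shared between sets can be counted beforehand. This is a minimal sketch with toy data; comparing consecutive sets is an assumption, since it is not stated here which circles are joined.

```r
# Toy region sets; only the id column (4th) matters for pairing.
set1 <- data.frame(V1 = "1",
                   V2 = 1:5 * 100,
                   V3 = 1:5 * 100 + 50,
                   V4 = paste0("region", 1:5))
set2 <- set1[c(1, 3, 5), ]
set3 <- set1[c(3, 4, 5), ]
regions_in <- list(set1, set2, set3)

# Extract the ids (4th column) of every set.
ids <- lapply(regions_in, function(df) df[[4]])

# Ids shared by each pair of consecutive sets (1 vs 2, 2 vs 3).
shared <- Map(intersect, ids[-length(ids)], ids[-1])
lengths(shared)
```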
By default, circleRegions() draws a line/rectangle and a point in the middle of each region. To avoid drawing the points, set the draw_points argument to FALSE.
circleRegions(chromsizes_sets = chromsizes_sets, regions_sets = regions_sets, draw_points = FALSE)
By default, if all the elements in chromsizes_sets are equal, circleRegions() only draws the chromosome labels on the outer side of the outermost circle. However, if they are not equal (i.e. different file names or different information in the data frames; only the first and second elements are compared), the labels are drawn in all the circles. The color of the labels can be controlled with chr_label, which is "Black" by default.
chromsizes_sets2 <- list("Regions1" = "../testdata/mm10.chrom.sizes",
"Regions2" = "../testdata/mm10.chrom.sizes2")
circleRegions(chromsizes_sets = chromsizes_sets2, regions_sets = regions_sets[1:2], chr_label = "Red")
By default, circleRegions() does not draw any lines to separate the chromosomes. This can be changed by setting chr_line to TRUE.
circleRegions(chromsizes_sets = chromsizes_sets, regions_sets = regions_sets, chr_line = TRUE)
By default, the regions are colored by region set (i.e. by element of regions_sets). The colors themselves can be controlled with the colors argument, which accepts a character vector of valid color names with the same length as regions_sets.
circleRegions(chromsizes_sets = chromsizes_sets, regions_sets = regions_sets, color_by = "region")
If you want to color by strand, just set color_by to "strand".
circleRegions(chromsizes_sets = chromsizes_sets, regions_sets = regions_sets, color_by = "strand")
Now imagine that we have regions without a defined strand (e.g. most ChIP-seq peaks). In this case, the color_by = "strand" setting is internally overridden and the regions are colored by region set (i.e. by element of regions_sets). Look at this example, which uses a single region set whose strand values are converted to ".".
# Read and format regions file to have strand as "."
regions_no_strand <- read.delim("../testdata/mm10.regions.tsv", header = FALSE) %>% dplyr::mutate(V6 = ".")
# Draw the plot
circleRegions(chromsizes_sets = chromsizes_sets,
              regions_sets = list(regions_no_strand, regions_no_strand, regions_no_strand),
              sets_names = c(paste("Regions", 1:3)), # set the names of the region sets because they are mandatory.
              color_by = "strand")
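Whether a region set carries usable strand information can be checked before choosing color_by. The sketch below uses toy data; has_strand() is a hypothetical helper, and treating only "+" and "-" as defined strands is an assumption about what counts as stranded, not the function's documented rule.

```r
# Toy BED-like regions without strand information (6th column is ".").
peaks <- data.frame(V1 = c("1", "1", "10"),
                    V2 = c(100, 500, 900),
                    V3 = c(200, 650, 990),
                    V4 = c("peak1", "peak2", "peak3"),
                    V5 = c(100, 150, 90),
                    V6 = ".")

# Hypothetical helper: TRUE if at least one region has a "+" or "-" strand.
has_strand <- function(df) any(df[[6]] %in% c("+", "-"))

# Unstranded set.
has_strand(peaks)

# The same regions with strands assigned.
stranded <- transform(peaks, V6 = c("+", "-", "+"))
has_strand(stranded)
```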
If extra_info is supplied, color_by can be set to "extra", which causes the regions to be coloured by the added information. extra_info must be a list of files or data frames containing the id of the regions in regions_sets (4th column) and an extra column with the (discrete) information you want to use to colour the regions.
This is what a file/data frame within extra_info should look like:
# Read and format regions files
extra <- purrr::map(regions_sets, read.delim, header = FALSE) %>%
purrr::set_names(nm = c("extra1", "2extra", "3")) %>% # this is to set the names of the extra info, which will be used as "extra" column
purrr::map(~dplyr::select(.x, "id" = V4)) %>%
purrr::imap(~dplyr::mutate(.x, extra = .y))
head(extra[[1]])
## id extra
## 1 region2 extra1
## 2 region4 extra1
## 3 region6 extra1
## 4 region10 extra1
## 5 region18 extra1
## 6 region19 extra1
And this is what the plot looks like.
# Draw the plot
circleRegions(chromsizes_sets = chromsizes_sets[1:3],
regions_sets = regions_sets,
sets_names = c(paste("Regions", 1:3)), # set the names of the region sets because they are mandatory.
extra_info = extra,
color_by = "extra")
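The extra column can hold any discrete annotation, not only the set name used above. As a sketch under stated assumptions (toy regions, arbitrary length breaks, and illustrative object names), a size class can be derived from the region length with base R's cut():

```r
# Toy regions (BED-like): chromosome, start, end, id, length, strand.
regions_df <- data.frame(V1 = c("1", "1", "10", "10"),
                         V2 = c(100, 5000, 200, 90000),
                         V3 = c(400, 9000, 700, 130000),
                         V4 = paste0("region", 1:4),
                         V5 = c(300, 4000, 500, 40000),
                         V6 = c("+", "-", "-", "+"))

# Discrete label derived from the region length (the breaks are arbitrary).
size_class <- cut(regions_df$V5,
                  breaks = c(0, 1000, 10000, Inf),
                  labels = c("short", "medium", "long"))

# One row per region: its id plus the extra (discrete) column.
extra_df <- data.frame(id = regions_df$V4,
                       extra = as.character(size_class))

# Every id in the extra table should exist in the regions table.
all(extra_df$id %in% regions_df$V4)
extra_df
```

A list with one such data frame per region set would then have the same shape as the extra object shown above.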
A title and a subtitle can be supplied through the title and subtitle arguments, respectively. By default they are set to NULL, but each accepts a character vector of length 1.
circleRegions(chromsizes_sets = chromsizes_sets,
regions_sets = regions_sets,
title = "This is a title",
subtitle = "This is a subtitle")
Finally, a caption can be included in the bottom-right corner by setting the caption argument. By default, caption is set to NULL; it can be set to TRUE or to any character string. If caption is a character string, that text is placed in the bottom-right corner. If it is set to TRUE, the caption shows the number of regions in the input region sets.
Here is an example with a character string:
circleRegions(chromsizes_sets = chromsizes_sets,
regions_sets = regions_sets,
caption = "This is a caption")
On the other hand, if caption is set to TRUE, the caption shows the number of regions in the input region sets (regions_sets).
circleRegions(chromsizes_sets = chromsizes_sets,
regions_sets = regions_sets,
caption = TRUE)
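The counts reported with caption = TRUE can be reproduced by hand, for instance to build a custom caption string. This is a minimal sketch with toy data; the caption wording is hypothetical, and the text generated by the function may differ.

```r
# Toy region sets as data frames (only the number of rows matters here).
regions_in <- list("Regions1" = data.frame(V4 = paste0("region", 1:5)),
                   "Regions2" = data.frame(V4 = paste0("region", 1:3)))

# Number of regions per set, keeping the set names.
n_regions <- vapply(regions_in, nrow, integer(1))

# Hypothetical caption text built from the counts.
caption_text <- paste0(names(n_regions), ": ", n_regions, " regions",
                       collapse = "; ")
caption_text
```

The resulting string could then be passed through the caption argument as any other character value.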
By default, the legend is placed at the bottom of the plot. This can be changed by setting the legend argument to one of "bottom", "right", "top", "left" or "none" (no legend). The legend argument is passed to ggpubr::theme_pubr().
circleRegions(chromsizes_sets = chromsizes_sets,
regions_sets = regions_sets,
legend = "top")
circleRegions(chromsizes_sets = chromsizes_sets,
regions_sets = regions_sets,
legend = "none")
Since circleRegions() outputs a ggplot2-based plot, it can be further customized like any other ggplot2-based plot.