Plotting Histograms with Density Plots in R (ggplot2)

Last week I needed to plot the distributions of the mean proportions of correct answers from an experiment. Since we all hate bar charts, we should favour plots that show the variability in the data. I decided to make a histogram with a density plot and the mean overlaid.

First I simulated a dataset in long format (each row is an observation), which is the format I regularly use, and then I made the plot.

Update: I added a new version with the points jittered over a boxplot. It is suitable when you have few participants (<30).

View it in GitHub as Gist



# Cleaning the session
rm(list=ls());gc()
# Useful function to load and install all the needed packages;
# we are reading it from my gist on GitHub.
source("https://raw.githubusercontent.com/guidocor/R_utils/master/install.R")
install_and_detach(c("dplyr", "ggplot2", "doBy"), load = TRUE, clean = TRUE)

theme_set(theme_bw()) # My favourite ggplot2 theme : )  
parts <- 30  # participants per condition
trials <- 30 # trials per participant
conds <- 3   # number of conditions
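# Optional sketch (not in the original script): fix the random seed so the
# simulated data below are the same on every run. The value 123 is an
# arbitrary assumption; any integer works.
set.seed(123)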
# Simulate some data. There are better ways, but this one is easy:
# 30 participants in each of three conditions, with binary responses
response <- c( rbinom(parts*trials, 1, .3) , 
               rbinom(parts*trials, 1, .65), 
               rbinom(parts*trials, 1, .59)  )
condition <- c(rep(1, parts*trials), rep(2, parts*trials),rep(3, parts*trials))
participant <- sort(rep(1:(conds*parts), trials) )
# our data frame, ready
df <- data.frame(participant, condition, response)

# This is a trick to convert several columns to factor (or numeric, etc.) at once
df <- df[,c("participant", "condition", "response")]
to_factor <- c("participant", "condition")
df[,to_factor] <- lapply(df[,to_factor], factor)

# The condition levels are coded as 1, 2 and 3. We want them to be meaningful,
# so we relabel them as follows:

levels(df$condition) <- c("Condition 1", "Condition 2", "Condition 3")

# Two ways of doing the same thing: an easier one with summaryBy and another with dplyr.
# Both are useful and appropriate. In this particular
# case I think the doBy function summaryBy is better.
means.v <- summaryBy(response ~ participant + condition, data = df)
# With dplyr, first we have to group the data and then make a summary.
# (Don't be afraid of using the pipe (%>%); if you are in RStudio, press Ctrl+Shift+M.)
means <- df %>% group_by(condition, participant) %>% summarise(m = mean(response))

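# We need to store the mean of each condition for the plot.
# Grouping by condition explicitly avoids relying on the grouping left over by summarise().
m.data <- means %>% group_by(condition) %>% summarise(global.mean = mean(m))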
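
# A dependency-free cross-check, as a sketch: base R's aggregate() can compute
# the same kind of per-participant and per-condition means as summaryBy() and
# dplyr above. The toy_* names and the tiny data set are hypothetical, made up
# for illustration only; they are not part of the original script.
toy_df <- data.frame(
  participant = factor(rep(1:4, each = 5)),           # 4 participants, 5 trials each
  condition   = factor(rep(c("A", "B"), each = 10)),  # 2 participants per condition
  response    = c(1, 0, 0, 1, 1,  0, 0, 0, 1, 0,  1, 1, 1, 1, 0,  1, 0, 1, 1, 1)
)
# one row per participant: proportion of correct answers
toy_means <- aggregate(response ~ participant + condition, data = toy_df, FUN = mean)
# one row per condition: mean of the participant means
toy_cond <- aggregate(response ~ condition, data = toy_means, FUN = mean)
toy_means
toy_cond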

# And finally the plot! 
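means.plot <- ggplot(means, aes(m)) + # set up the ggplot object
  geom_density(alpha=.5, fill="#FF6666" ) + # you can play with the alpha and the fill 
  # add a histogram, adapt the binwidth to your data!
  # (note: the histogram is drawn in counts and the density curve in density units,
  # so the two layers are not on the same y scale)
  geom_histogram(colour="black", fill="white", alpha = 0.4, binwidth = 0.05)  + 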
  # we want separate graphs for each condition, you can 
  # play with the number of columns with ncol!
  facet_wrap(~condition, ncol = 1) + 
  ggtitle("Mean by each group") + # title 
  labs(x = "Mean of each participant",
       y = "") + # axis labels (y is left blank)
  # Remember that we made a data frame with the mean of each condition?
  # We are using it to plot the mean in each density plot
  geom_vline(data=m.data, aes(xintercept=global.mean),
             linetype="dashed", size=1) 

means.plot 
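
# The binwidth of 0.05 above was picked by hand. As a hedged starting point, one
# common heuristic is the Freedman-Diaconis rule: 2 * IQR(x) / n^(1/3).
# fd_binwidth() and toy_m are hypothetical names, not part of the original
# script; toy_m stands in for the per-participant means of one condition.
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]              # ignore missing values
  2 * IQR(x) / length(x)^(1/3)   # Freedman-Diaconis bin width
}
set.seed(1)                          # arbitrary seed, only to make the example repeatable
toy_m <- rbinom(30, 30, 0.65) / 30   # 30 simulated participant means, 30 trials each
fd_binwidth(toy_m)                   # a candidate value for geom_histogram(binwidth = ...)
# With 30 trials the means come in steps of 1/30, so a bin width below that
# leaves empty bins between the possible values.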

# finally we can save the plot 
ggsave(filename = "./distributions.png", plot = means.plot, height = 8, width = 4, dpi = 300) 
# adjust the width, the height and the resolution (dots per inch) 


# A version of the same data can be displayed as points on a boxplot. 
# Suitable when you have few participants. 
points <- ggplot(means, aes(m)) + 
        geom_point(data = means, aes(y = m, x = condition),
                      size = 3, alpha = 0.5, colour="#FF6666",
                      position = position_jitter(width = 0.6, height = 0.1)) + 
        geom_boxplot(data = means, aes(y=m, x=condition), alpha= 0.2 , fill = "#545454") # colour and alphas 
   
# You can flip the axes to get a more comfortable display of the results
points <- points + ggtitle("Mean by each group") +
  labs(x = "", y = "Mean of each participant") + coord_flip()
points
# finally we can save the plot 
ggsave(filename = "./points.png", plot = points, height = 4, width = 5, dpi = 300) 
# adjust the width, the height and the resolution (dots per inch) 

And the result!

[Figure: distributions]
[Figure: points]