Last week I needed to plot the distributions of per-participant mean proportions of correct answers from an experiment. Since we all hate bar charts, we should favour plots that show the variability in the data. I decided to make a histogram with an overlaid density plot and a line marking the mean.

First I simulated a dataset in long format (each row is an observation), which is the kind I regularly use, and then I made the plot.
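In case the long format is new to you, here is a minimal sketch of what it looks like, using only base R. The names `toy` and `toy_means` and the tiny 2 x 3 design are made up for illustration; they are not part of the analysis further down.

```r
# Hypothetical toy data: 2 participants x 3 trials, one row per observation
set.seed(1)
toy <- expand.grid(trial = 1:3, participant = 1:2)
toy$response <- rbinom(nrow(toy), 1, 0.5)  # simulated binary answers
toy

# Proportion of correct answers per participant (base R, no packages needed)
toy_means <- aggregate(response ~ participant, data = toy, FUN = mean)
toy_means
```

Each row of `toy` is a single trial, and the summary collapses it to one row per participant, which is the same idea the full script applies to the simulated experiment.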

Update: I added a second version with jittered points over a boxplot. It is suitable when you have few participants (<30).

# Cleaning the session
rm(list = ls()); gc()

# Useful function to load and install all needed packages;
# we are reading it from my R_utils repository on GitHub.
source("https://raw.githubusercontent.com/guidocor/R_utils/master/install.R")
install_and_detach(c("dplyr", "ggplot2", "doBy"), load = TRUE, clean = TRUE)
theme_set(theme_bw()) # My favourite ggplot2 theme : )

parts  = 30 # Our participants (per condition)
trials = 30 # Our trials
conds  = 3  # Our conditions

# Simulate some data. There are better ways, but this one is easy:
# 30 participants in each of three conditions, with binary responses
response <- c(rbinom(parts*trials, 1, .3),
              rbinom(parts*trials, 1, .65),
              rbinom(parts*trials, 1, .59))
condition <- c(rep(1, parts*trials), rep(2, parts*trials), rep(3, parts*trials))
participant <- sort(rep(1:(conds*parts), trials))

# Our data frame, ready
df <- data.frame(participant, condition, response)

# This is a trick to convert several columns to factor (or numeric, etc.) at once
df <- df[, c("participant", "condition", "response")]
to_factor <- c("participant", "condition")
df[, to_factor] <- lapply(df[, to_factor], factor)

# Factors are coded as 1, 2 and 3. We want them to be meaningful, so we
# rename the levels as follows:
levels(df$condition) <- c("Condition 1", "Condition 2", "Condition 3")

# Two ways of doing the same thing: an easier one with summaryBy and another with dplyr.
# Both are useful and appropriate. In this particular
# case I think the doBy function summaryBy is better.
means.v <- summaryBy(response ~ participant + condition, data = df)

# First we have to group the data and then make a summary
# (Don't be afraid of using the pipe (%>%); if you are in RStudio press Ctrl+Shift+M)
means <- df %>%
  group_by(., condition, participant) %>%
  summarise(., m = mean(response))

# We need to store the mean of each condition for the plot
# (means is still grouped by condition, so this gives one row per condition)
m.data <- means %>% summarise(., global.mean = mean(m))

# And finally the plot!
means.plot <- ggplot(means, aes(m)) + # set up the ggplot object
  geom_density(alpha = .5, fill = "#FF6666") + # you can play with the alpha and the fill
  # add a histogram, adapt the binwidth to your data!
  geom_histogram(colour = "black", fill = "white", alpha = 0.4, binwidth = 0.05) +
  # we want separate graphs for each condition; you can
  # play with the number of columns with ncol!
  facet_wrap(~condition, ncol = 1) +
  ggtitle("Mean by each group") + # title
  labs(x = "Mean of each participant", y = "") + # axis labels
  # Remember that we made a data frame with the mean of each condition?
  # We are using it to draw the mean on each density plot
  geom_vline(data = m.data, aes(xintercept = global.mean), linetype = "dashed", size = 1)
means.plot

# Finally we can save the plot;
# adjust the width, the height and the resolution (dots per inch)
ggsave(filename = "./distributions.png", means.plot, height = 8, width = 4, dpi = 300)

# A version of the same data can be displayed as points on a boxplot.
# Suitable when you have few participants.
points <- ggplot(means, aes(m)) +
  geom_point(data = means, aes(y = m, x = condition), size = 3, alpha = 0.5,
             colour = "#FF6666",
             position = position_jitter(width = 0.6, height = 0.1)) +
  geom_boxplot(data = means, aes(y = m, x = condition),
               alpha = 0.2, fill = "#545454") # colour and alphas

# You can flip the axes to get a more comfortable display of the results
points <- points +
  ggtitle("Mean by each group") +
  labs(x = "", y = "Mean of each participant") +
  coord_flip()
points

# Finally we can save the plot;
# adjust the width, the height and the resolution (dots per inch)
ggsave(filename = "./points.png", points, height = 4, width = 5, dpi = 300)
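A side note on "adapt the binwidth to your data": with 30 trials, each participant mean can only be a multiple of 1/30, so a bin width of 0.05 does not line up with the attainable values. The sketch below counts how many attainable values fall in each bin. It assumes the bins are centred on multiples of the bin width (which I believe is the ggplot2 default, but treat that as an assumption); `k` and `bin_index` are names made up for this illustration.

```r
trials <- 30

# Work in integer units of 1/120 so there are no floating-point ties:
# attainable means are k/30 = 4k/120, and the bin width 0.05 is 6/120.
k <- 0:trials

# Assumption: bin edges sit at 0.025, 0.075, ... (that is 3/120, 9/120, ...),
# so a value of 4k/120 falls in bin number (4k + 3) %/% 6.
bin_index <- (4 * k + 3) %/% 6

# How many attainable values land in each bin
values_per_bin <- table(bin_index)
values_per_bin
```

Under that assumption some bins can catch two attainable values and others only one, which can make the histogram look more jagged than the data really are. A bin width of 1/30 (or a multiple of it) should avoid this.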

And the result!