--- title: "Unravelling the gender productivity gap in science" subtitle: "Question 1b - Gender productivity gap (group-based)" date: March 21, 2019 author: "Camila de Toledo Castanho" output: html_document: toc: true toc_float: collapsed: false --- # Installing required packages ```{r , echo TRUE, eval = FALSE} install.packages (c("metafor", "rmarkdown", "knitr") ``` # Loading required packages ```{r } library(knitr) library(rmarkdown) library (metafor) ``` # Loading data ```{r eval = TRUE, echo = TRUE} Q1b<-read.table("Q1b.txt", header=TRUE, sep="\t") str(Q1b) ``` # Calculating effect sizes (yi) and variances (vi) ```{r eval = TRUE, echo = TRUE} Q1b<-escalc(measure="PR", xi=N_men, ni=N_total, data=Q1b) head (Q1b) ``` # Hierarchical mixed effect meta-analysis ```{r eval = TRUE, echo = TRUE} m.Q1b<-rma.mv(yi, vi, random=~1|ID_Article/ID_observation,data=Q1b) m.Q1b forest (m.Q1b, slab=Q1b$Reference, ylim=c(-1, 148), xlab="Raw proportion", mlab="Overall effect (144)") abline(v=0.5) text(-1.4,150, "Author(s) and Year", pos=4, font=2, cex=0.8) text(2.7,150, "Raw proportion [95% CI]", pos=2, font=2, cex=0.8) ``` # Heterogeneity Heterogeneity I^2 for hierarchical models is not provided by metafor. We calculate total heterogeneity using the formulas provided by Nakagawa & Santos 2012. 1) Calculate sampling variance of the dataset (we use precision of effect size); 2) Use the variance components of the model associated with random factors (those summarized in the sigma2 structure components). ## Sampling variance of the dataset ```{r eval = TRUE, echo = FALSE} Q1b$wi <- 1/Q1b$vi sv.mQ1b <- sum(Q1b$wi*(length(Q1b$wi)-1))/(sum(Q1b$wi)^2-sum(Q1b$wi^2)) sv.mQ1b ``` ## Total heterogenity ```{r eval = TRUE, echo = TRUE} I2.total = (m.Q1b$sigma2[1]+m.Q1b$sigma2[2])/(m.Q1b$sigma2[1]+m.Q1b$sigma2[2] + sv.mQ1b) * 100 I2.total ``` # Moderators ## Research field ### Significance of the moderator This parameterization of the model is used to test the significance of the moderator. ```{r eval = TRUE, echo = TRUE} tapply(Q1b$ID_observation, Q1b$Research_field, length) m.Q1b_area <- rma.mv(yi, vi, mods= ~ Research_field, random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_area ``` ### Estimation and significance of each level This parameterization of the model is used to estimate the mean effect size of each level of the moderator and test which of them are different from 0.5. ```{r eval = TRUE, echo = TRUE} m.Q1b_area <- rma.mv(yi, vi, mods= ~ Research_field-1, random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_area ``` ### Change in the reference level This parameterization is used to test if social science (as reference level) is significantly different from the other levels of the moderator. ```{r eval = TRUE, echo = TRUE} m.Q1b_area <- rma.mv(yi, vi, mods= ~ relevel (factor(Research_field), ref="Social science"), random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_area ``` ## Time ### Significance of the moderator This parameterization of the model is used to test the significance of the moderator. ```{r eval = TRUE, echo = TRUE} tapply(Q1b$Time, Q1b$Time, length) m.Q1b_ano<-rma.mv(yi, vi, mods= ~ Time, random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_ano ``` ### Estimation and significance of each level This parameterization of the model is used to estimate the mean effect size of each level of the moderator and test which of them are different from 0.5. ```{r eval = TRUE, echo = TRUE} m.Q1b_ano<-rma.mv(yi, vi, mods= ~ Time-1, random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_ano ``` ## Productivity proxy ### Significance of the moderator This parameterization of the model is used to test the significance of the moderator. ```{r eval = TRUE, echo = TRUE} tapply(Q1b$ID_observation, Q1b$Productivity_proxy, length) m.Q1b_prod<-rma.mv(yi, vi, mods= ~ Productivity_proxy,random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_prod ``` ### Estimation and significance of each level This parameterization of the model is used to estimate the mean effect size of each level of the moderator and test which of them are different from 0.5. ```{r eval = TRUE, echo = TRUE} m.Q1b_prod<-rma.mv(yi, vi, mods= ~ Productivity_proxy-1,random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_prod ``` ### Change in the reference level This parameterization is used to test if published articles (as reference level) is significantly different from the other levels of the moderator. ```{r eval = TRUE, echo = TRUE} m.Q1b_prod<-rma.mv(yi, vi, mods= ~ relevel (factor(Productivity_proxy), ref="publications"),random=~1|ID_Article/ID_observation, data=Q1b) m.Q1b_prod ``` # Publication bias ## Egger's regression Egger's regression using the meta-analytic residuals as the response variable and the precision as the moderator, as proposed by Nakagawa & Santos 2012 for hierarchical models. If the intercept of Egger's regression is significantly different from zero, there is evidence of publication bias. ```{r eval = TRUE, echo = TRUE} egger.Q1b<-lm(residuals.rma(m.Q1b)~Q1b$vi) summary(egger.Q1b) ``` ## Sensitivity analysis If residual standard >3 AND hatvalue >2 times the average of hatvalues, run analysis with those cases deleted to test for sensitivity (from Habeck & Schultz 2015). ```{r eval = TRUE, echo = TRUE} rs.Q1b.me<-rstandard (m.Q1b) hat.Q1b.me<-hatvalues(m.Q1b)/mean(hatvalues(m.Q1b)) plot(hat.Q1b.me, rs.Q1b.me$resid, xlab="hat / average hat value", ylab= "standard residuals",xlim=c(0,2.5), ylim=c(-3,3), cex.lab=1.2) abline (h=-3) abline (h=3) abline (v=(2)) ```