Title: | Tools for Analyzing Mixed Effect Regression Models |
---|---|
Description: | Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models. |
Authors: | Jared E. Knowles [aut, cre], Carl Frederick [aut], Alex Whitworth [ctb] |
Maintainer: | Jared E. Knowles <[email protected]> |
License: | GPL (>=2) |
Version: | 0.6.2 |
Built: | 2024-11-03 04:37:08 UTC |
Source: | https://github.com/jknowles/mertools |
Extract a data frame of a single row that represents the average observation in a merMod object. This function also allows the user to pass a series of conditioning argument to calculate the average observation conditional on other characteristics.
averageObs(merMod, varList = NULL, origData = NULL, ...)
averageObs(merMod, varList = NULL, origData = NULL, ...)
merMod |
a merMod object |
varList |
optional, a named list of conditions to subset the data on |
origData |
(default=NULL) a data frame containing the original, untransformed data used to call the model. This MUST be specified if the original variables used in formula function calls are NOT present as 'main effects'. |
... |
not used currently |
Each character and factor variable in the data.frame is assigned to the modal category and each numeric variable is collapsed to the mean. Currently if mode is a tie, returns a "." Uses the collapseFrame function.
a data frame with a single row for the average observation, but with full factor levels. See details for more.
Take an entire dataframe and summarize it in one row by using the mean and mode.
collapseFrame(data)
collapseFrame(data)
data |
a data.frame |
Each character and factor variable in the data.frame is assigned to the modal category and each numeric variable is collapsed to the mean. Currently if mode is a tie, returns a "."
a data frame with a single row
Draw is used to select a single observation out of an R object. Additional parameters allow the user to control how that observation is chosen in order to manipulate that observation later. This is a generic function with methods for a number of objects.
draw(object, type = c("random", "average"), varList = NULL, seed = NULL, ...) ## S3 method for class 'merMod' draw(object, type = c("random", "average"), varList = NULL, seed = NULL, ...)
draw(object, type = c("random", "average"), varList = NULL, seed = NULL, ...) ## S3 method for class 'merMod' draw(object, type = c("random", "average"), varList = NULL, seed = NULL, ...)
object |
the object to draw from |
type |
what kind of draw to make. Options include random or average |
varList |
a list specifying filters to subset the data by when making the draw |
seed |
numeric, optional argument to set seed for simulations, ignored if type="average" |
... |
additional arguments required by certain methods |
In cases of tie, ".", may be substituted for factors.
a data.frame with a single row representing the desired observation
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) # Random case draw(fm1, type = "random") # Average draw(fm1, type = "average") # Subset draw(fm1, type = "average", varList = list("Subject" = "308"))
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) # Random case draw(fm1, type = "random") # Average draw(fm1, type = "average") # Subset draw(fm1, type = "average", varList = list("Subject" = "308"))
expectedRank
calculates the expected rank and the percentile expected
rank of any random term in a merMod object. A simple ranking of the estimated
random effects (as produced by ranef
) is not satisfactory
because it ignores any amount of uncertainty.
expectedRank(merMod, groupFctr = NULL, term = NULL)
expectedRank(merMod, groupFctr = NULL, term = NULL)
merMod |
An object of class merMod |
groupFctr |
An optional character vector specifying the name(s) the grouping factor(s)
over which the random coefficient of interest varies. This is the
variable to the right of the pipe, |
term |
An optional character vector specifying the name(s) of the random coefficient of interest. This is the
variable to the left of the pipe, |
Inspired by Lingsma et al. (2010, see also Laird and Louis 1989), expectedRank sums the probability that each level of the grouping factor is greater than every other level of the grouping factor, similar to a two-sample t-test.
The formula for the expected rank is:
where is the standard normal distribution function,
is the estimated random effect and
is the posterior
variance of the estimated random effect. We add one to the sum so that the
minimum rank is one instead of zero so that in the case where there is no
overlap between the variances of the random effects (or if the variances are
zero), the expected rank equals the actual rank. The ranks are ordered such
that the winners have ranks that are greater than the losers.
The formula for the percentile expected rank is:
where is the number of grouping factor levels. The percentile
expected rank can be interpreted as the fraction of levels that score at or
below the given level.
NOTE: expectedRank
will only work under conditions that lme4::ranef
will work. One current example of when this is not the case is for
models when there are multiple terms specified per factor (e.g. uncorrelated random
coefficients for the same term, e.g.
lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), data = sleepstudy)
)
A data.frame with the following five columns:
a character representing name of the grouping factor
a character representing the level of the grouping factor
a character representing the formula term for the group
effect estimate from lme4::ranef(, condVar=TRUE)
).
the posterior variance of the estimate random effect
(from lme4::ranef(, condVar=TRUE)
); named "term
"_var.
The expected rank.
The percentile expected rank.
Laird NM and Louis TA. Empirical Bayes Ranking Methods. Journal of Education Statistics. 1989;14(1)29-46. Available at http://www.jstor.org/stable/1164724.
Lingsma HF, Steyerberg EW, Eijkemans MJC, et al. Comparing and ranking hospitals based on outcome: results from The Netherlands Stroke Survey. QJM: An International Journal of Medicine. 2010;103(2):99-108. doi:10.1093/qjmed/hcp169
#For a one-level random intercept model m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy) (m1.er <- expectedRank(m1)) #For a one-level random intercept model with multiple random terms m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) #ranked by the random slope on Days (m2.er1 <- expectedRank(m2, term="Days")) #ranked by the random intercept (m2.er2 <- expectedRank(m2, term="int")) #For a two-level model with random intercepts m3 <- lmer(y ~ service * dept + (1|s) + (1|d), InstEval) #Ranked by the random intercept on 's' (m3.er1 <- expectedRank(m3, groupFctr="s", term="Intercept"))
#For a one-level random intercept model m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy) (m1.er <- expectedRank(m1)) #For a one-level random intercept model with multiple random terms m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) #ranked by the random slope on Days (m2.er1 <- expectedRank(m2, term="Days")) #ranked by the random intercept (m2.er2 <- expectedRank(m2, term="int")) #For a two-level model with random intercepts m3 <- lmer(y ~ service * dept + (1|s) + (1|d), InstEval) #Ranked by the random intercept on 's' (m3.er1 <- expectedRank(m3, groupFctr="s", term="Intercept"))
Find link function family
famlink(object, resp = object@resp)
famlink(object, resp = object@resp)
object |
a merMod object |
resp |
the response vector |
the link function and family
Display model fit summary of x or x like objects, fast
fastdisp(x, ...) ## S3 method for class 'merMod' fastdisp(x, ...) ## S3 method for class 'merModList' fastdisp(x, ...)
fastdisp(x, ...) ## S3 method for class 'merMod' fastdisp(x, ...) ## S3 method for class 'merModList' fastdisp(x, ...)
x |
a model object |
... |
additional arguments to pass to |
Faster than the implementation in the arm package because it avoids refitting
The time saving is only noticeable for large, time-consuming (g)lmer fits.
A printed summary of a x object
#Compare the time for displaying this modest model require(arm) m1 <- lmer(y ~ lectage + studage + (1|d) + (1|s), data=InstEval) system.time(display(m1)) system.time(fastdisp(m1))
#Compare the time for displaying this modest model require(arm) m1 <- lmer(y ~ lectage + studage + (1|d) + (1|s), data=InstEval) system.time(display(m1)) system.time(fastdisp(m1))
FEsim
simulates fixed effects from merMod object posterior distributionsSimulate fixed effects from merMod
FEsim
simulates fixed effects from merMod object posterior distributions
FEsim(merMod, n.sims = 200, oddsRatio = FALSE, seed = NULL)
FEsim(merMod, n.sims = 200, oddsRatio = FALSE, seed = NULL)
merMod |
a merMod object from the lme4 package |
n.sims |
number of simulations to use |
oddsRatio |
logical, should parameters be converted to odds ratios? |
seed |
numeric, optional argument to set seed for simulations |
Use the Gelman sim technique to build fixed effect estimates and confidence intervals. Uses the sim function in the arm package
a data frame with the following columns
term
Name of fixed term (intercept/coefficient)
mean
Mean of the simulations
median
Median of the simulations
sd
Standard deviation of the simulations, NA
if oddsRatio=TRUE
require(lme4) m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) fe2 <- FEsim(m2, 25) head(fe2)
require(lme4) m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) fe2 <- FEsim(m2, 25) head(fe2)
Extract all warning msgs from a merMod object
fetch.merMod.msgs(x)
fetch.merMod.msgs(x)
x |
a merMod object |
findFormFuns
used by averageObs to calculate proper
averagesThe purpose is to properly derive data for the average observation in the
data by being 'aware' of formulas that contain interactions and/or function
calls. For example, in the old behavior, if the formula contained a square
term specified as I(x^2)
, we were returning the mean of x(^2)
not the
square of mean(x).
findFormFuns(merMod, origData = NULL)
findFormFuns(merMod, origData = NULL)
merMod |
the merMod object from which to draw the average observation |
origData |
(default=NULL) a data frame containing the original, untransformed data used to call the model. This MUST be specified if the original variables used in formula function calls are NOT present as 'main effects'. |
a data frame with a single row for the average observation, but with full factor levels. See details for more.
Extract fixed-effects estimates for a merModList
## S3 method for class 'merModList' fixef(object, add.dropped = FALSE, ...)
## S3 method for class 'merModList' fixef(object, add.dropped = FALSE, ...)
object |
any fitted model object from which fixed effects estimates can be extracted. |
add.dropped |
for models with rank-deficient design
matrix, reconstitute the full-length parameter vector by
adding |
... |
optional additional arguments. Currently none are used in any methods. |
Extract the estimates of the fixed-effects parameters from a list of
fitted merMod
models. Takes the mean of the individual fixef
objects for each of the component models in the merModList
.
a named, numeric vector of fixed-effects estimates.
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) fixef(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) fixef(mod)
Identify if a merMod has weights
hasWeights(merMod)
hasWeights(merMod)
merMod |
the merMod object to test for weights |
TRUE or FALSE for whether the model has weights
A key example dataset used for examples in the HLM software manual. Included here for use in replicating HLM analyses in R.
hsb
hsb
A data frame with 7,185 observations on the following 8 variables.
schid
a numeric vector, 160 unique values
mathach
a numeric vector for the performance on a standardized math assessment
female
a numeric vector coded 0 for male and 1 for female
ses
a numeric measure of student socio-economic status
minority
a numeric vector coded 0 for white and 1 for non-white students
schtype
a numeric vector coded 0 for public and 1 for private schools
meanses
a numeric, the average SES for each school in the data set
size
a numeric for the number of students in the school
The data file used for this presentation is a subsample from the 1982 High School and Beyond Survey and is used extensively in Hierarchical Linear Models by Raudenbush and Bryk. It consists of 7,185 students nested in 160 schools.
Data made available by UCLA Institute for Digital Research and Education (IDRE) online: https://stats.oarc.ucla.edu/other/hlm/hlm-mlm/introduction-to-multilevel-modeling-using-hlm/
Stephen W. Raudenbush and Anthony S. Bryk (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). SAGE.
data(hsb) head(hsb)
data(hsb) head(hsb)
Calculate the intraclass correlation using mixed effect models
ICC(outcome, group, data, subset = NULL)
ICC(outcome, group, data, subset = NULL)
outcome |
a character representing the variable of the outcome |
group |
a character representing the name of the grouping term |
data |
a data.frame |
subset |
an optional subset |
a numeric for the intraclass correlation
data(sleepstudy) ICC(outcome = "Reaction", group = "Subject", data = sleepstudy)
data(sleepstudy) ICC(outcome = "Reaction", group = "Subject", data = sleepstudy)
Apply a multilevel model to a list of data frames
Apply a Bayesian multilevel model to a list of data frames
Apply a generalized linear multilevel model to a list of data frames
Apply a Bayesian generalized linear multilevel model to a list of data frames
lmerModList(formula, data, parallel = FALSE, ...) blmerModList(formula, data, parallel = FALSE, ...) glmerModList(formula, data, parallel = FALSE, ...) bglmerModList(formula, data, parallel = FALSE, ...)
lmerModList(formula, data, parallel = FALSE, ...) blmerModList(formula, data, parallel = FALSE, ...) glmerModList(formula, data, parallel = FALSE, ...) bglmerModList(formula, data, parallel = FALSE, ...)
formula |
a formula to pass through compatible with merMod |
data |
a list object with each element being a data.frame |
parallel |
logical, should the models be run in parallel? Default FALSE. If so, the 'future_lapply' function from the 'future.apply' package is used. See details. |
... |
additional arguments to pass to the estimating function |
Parallel computing is provided by the 'futures' package, and its extension the 'future.apply' package to provide the 'future_lapply' function for easy parallel computations on lists. To use this package, simply register a parallel backend using the 'plan()' function from 'futures' - an example is to use 'plan(multisession)'
a list of fitted merMod objects of class merModList
a merModList
a merModList
a merModList
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) summary(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) summary(mod)
Extract averaged fixed effect parameters across a list of merMod objects
modelFixedEff(modList, ...)
modelFixedEff(modList, ...)
modList |
an object of class merModList |
... |
additional arguments to pass to |
The Rubin correction for combining estimates and standard errors from Rubin (1987) is applied to adjust for the within and between imputation variances.
a data.frame of the averaged fixed effect parameters
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) modelFixedEff(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) modelFixedEff(mod)
Extract model information from a merMod
modelInfo(object)
modelInfo(object)
object |
a merMod object |
Simple summary information about the object, number of observations, number of grouping terms, AIC, and residual standard deviation
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) modelInfo(mod[[1]]) lapply(mod, modelInfo)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) modelInfo(mod[[1]]) lapply(mod, modelInfo)
Extract data.frame of random effect statistics from merMod List
modelRandEffStats(modList)
modelRandEffStats(modList)
modList |
a list of multilevel models |
a data.frame
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) modelRandEffStats(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) modelRandEffStats(mod)
Extract all warning msgs from a merMod object
plot_sim_error_chks( type = c("FE", "RE"), level = 0.95, stat = c("mean", "median"), sd = TRUE, sigmaScale = NULL, oddsRatio = FALSE, labs = FALSE, facet = TRUE )
plot_sim_error_chks( type = c("FE", "RE"), level = 0.95, stat = c("mean", "median"), sd = TRUE, sigmaScale = NULL, oddsRatio = FALSE, labs = FALSE, facet = TRUE )
type |
check a fixed or random effect |
level |
the width of the confidence interval |
stat |
a character value indicating the variable name in data of the midpoint of the estimated interval, e.g. "mean" or "median" |
sd |
a logical indicating whether or not to plot error bars around
the estimates (default is TRUE). Calculates the width of the error bars
based on |
sigmaScale |
a numeric value to divide the estimate and the standard deviation by in the case of doing an effect size calculation |
oddsRatio |
logical, should the parameters be converted to odds ratios before plotting |
labs |
logical, include the labels of the groups on the x-axis |
facet |
Accepts either logical ( |
Plot the simulated fixed effects on a ggplot2 chart
plotFEsim( data, level = 0.95, stat = "median", sd = TRUE, intercept = FALSE, sigmaScale = NULL, oddsRatio = FALSE )
plotFEsim( data, level = 0.95, stat = "median", sd = TRUE, intercept = FALSE, sigmaScale = NULL, oddsRatio = FALSE )
data |
a data.frame generated by |
level |
the width of the confidence interval |
stat |
a character value indicating the variable name in data of the midpoint of the estimated interval, e.g. "mean" or "median" |
sd |
logical, indicating whether or not to plot error bars around
the estimates (default is TRUE). Calculates the width of the error bars
based on |
intercept |
logical, should the intercept be included, default is FALSE |
sigmaScale |
a numeric value to divide the estimate and the standard deviation by in the case of doing an effect size calculation |
oddsRatio |
logical, should the parameters be converted to odds ratios before plotting |
a ggplot2 plot of the coefficient effects
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) (p1 <- plotFEsim(FEsim(fm1)))
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) (p1 <- plotFEsim(FEsim(fm1)))
Plot the simulated random effects on a ggplot2 chart. Points that
are distinguishable from zero (i.e. the confidence band based on level
does not cross the red line) are highlighted. Currently, the plots are ordered
according to the grouping factor.
plotREsim( data, level = 0.95, stat = "median", sd = TRUE, sigmaScale = NULL, oddsRatio = FALSE, labs = FALSE, facet = TRUE )
plotREsim( data, level = 0.95, stat = "median", sd = TRUE, sigmaScale = NULL, oddsRatio = FALSE, labs = FALSE, facet = TRUE )
data |
a data.frame generated by |
level |
the width of the confidence interval |
stat |
a character value indicating the variable name in data of the midpoint of the estimated interval, e.g. "mean" or "median" |
sd |
a logical indicating whether or not to plot error bars around
the estimates (default is TRUE). Calculates the width of the error bars
based on |
sigmaScale |
a numeric value to divide the estimate and the standard deviation by in the case of doing an effect size calculation |
oddsRatio |
logical, should the parameters be converted to odds ratios before plotting |
labs |
logical, include the labels of the groups on the x-axis |
facet |
Accepts either logical ( |
a ggplot2 plot of the coefficient effects
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) (p1 <- plotREsim(REsim(fm1))) #Plot just the random effects for the Days slope (p2 <- plotREsim(REsim(fm1), facet= list(groupFctr= "Subject", term= "Days")))
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) (p1 <- plotREsim(REsim(fm1))) #Plot just the random effects for the Days slope (p2 <- plotREsim(REsim(fm1), facet= list(groupFctr= "Subject", term= "Days")))
This function provides a way to capture model uncertainty in
predictions from multi-level models fit with lme4
. By drawing a sampling
distribution for the random and the fixed effects and then estimating the fitted
value across that distribution, it is possible to generate a prediction interval
for fitted values that includes all variation in the model except for variation
in the covariance parameters, theta. This is a much faster alternative than
bootstrapping for models fit to medium to large datasets.
predictInterval( merMod, newdata, which = c("full", "fixed", "random", "all"), level = 0.8, n.sims = 1000, stat = c("median", "mean"), type = c("linear.prediction", "probability"), include.resid.var = TRUE, returnSims = FALSE, seed = NULL, .parallel = FALSE, .paropts = NULL, fix.intercept.variance = FALSE, ignore.fixed.terms = NULL )
predictInterval( merMod, newdata, which = c("full", "fixed", "random", "all"), level = 0.8, n.sims = 1000, stat = c("median", "mean"), type = c("linear.prediction", "probability"), include.resid.var = TRUE, returnSims = FALSE, seed = NULL, .parallel = FALSE, .paropts = NULL, fix.intercept.variance = FALSE, ignore.fixed.terms = NULL )
merMod |
a merMod object from lme4 |
newdata |
a data.frame of new data to predict |
which |
a character specifying what to return, by default it returns the
full interval, but you can also select to return only the fixed variation or
the random component variation. If full is selected the resulting data.frame
will be |
level |
the width of the prediction interval |
n.sims |
number of simulation samples to construct |
stat |
take the median or mean of simulated intervals |
type |
type of prediction to develop |
include.resid.var |
logical, include or exclude the residual variance for linear models |
returnSims |
logical, should all n.sims simulations be returned? |
seed |
numeric, optional argument to set seed for simulations |
.parallel |
logical should parallel computation be used, default is FALSE |
.paropts |
-NOT USED: Caused issue #54- a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |
fix.intercept.variance |
logical; should the variance of the intercept term be adjusted downwards to roughly correct for its covariance with the random effects, as if all the random effects are intercept effects? |
ignore.fixed.terms |
a numeric or string vector of indexes or names of fixed effects which should be considered as fully known (zero variance). This can result in under-conservative intervals, but for models with random effects nested inside fixed effects, holding the fixed effects constant intervals may give intervals with closer to nominal coverage than the over-conservative intervals without this option, which ignore negative correlation between the outer (fixed) and inner (random) coefficients. |
To generate a prediction interval, the function first computes a simulated
distribution of all of the parameters in the model. For the random, or grouping,
effects, this is done by sampling from a multivariate normal distribution which
is defined by the BLUP estimate provided by ranef
and the associated
variance-covariance matrix for each observed level of each grouping terms. For
each grouping term, an array is build that has as many rows as there are levels
of the grouping factor, as many columns as there are predictors at that level
(e.g. an intercept and slope), and is stacked as high as there are number of
simulations. These arrays are then multiplied by the new data provided to the
function to produce a matrix of yhat values. The result is a matrix of the simulated
values of the linear predictor for each observation for each simulation. Each
grouping term has such a matrix for each observation. These values can be added
to get the estimate of the fitted value for the random effect terms, and this
can then be added to a matrix of simulated values for the fixed effect level to
come up with n.sims
number of possible yhat values for each observation.
The distribution of simulated values is cut according to the interval requested
by the function. The median or mean value as well as the upper and lower bounds
are then returned. These can be presented either on the linear predictor scale
or on the response scale using the link function in the merMod
.
a data.frame with three columns:
fit
The center of the distribution of predicted values as defined by
the stat
parameter.
lwr
The lower prediction interval bound corresponding to the quantile cut
defined in level
.
upr
The upper prediction interval bound corresponding to the quantile cut
defined in level
.
If returnSims = TRUE, then the individual simulations are attached to this
data.frame in the attribute sim.results
and are stored as a matrix.
merTools
includes the functions subBoot
and thetaExtract
to allow the user to estimate the variability in theta
from a larger
model by bootstrapping the model fit on a subset, to allow faster estimation.
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy) regFit <- predict(m1, newdata = sleepstudy[11, ]) # a single value is returned intFit <- predictInterval(m1, newdata = sleepstudy[11, ]) # bounded values # Can do glmer d1 <- cbpp d1$y <- d1$incidence / d1$size gm2 <- glmer(y ~ period + (1 | herd), family = binomial, data = d1, nAGQ = 9, weights = d1$size) regFit <- predict(gm2, newdata = d1[1:10, ]) # get probabilities regFit <- predict(gm2, newdata = d1[1:10, ], type = "response") intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "probability") intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "linear.prediction")
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy) regFit <- predict(m1, newdata = sleepstudy[11, ]) # a single value is returned intFit <- predictInterval(m1, newdata = sleepstudy[11, ]) # bounded values # Can do glmer d1 <- cbpp d1$y <- d1$incidence / d1$size gm2 <- glmer(y ~ period + (1 | herd), family = binomial, data = d1, nAGQ = 9, weights = d1$size) regFit <- predict(gm2, newdata = d1[1:10, ]) # get probabilities regFit <- predict(gm2, newdata = d1[1:10, ], type = "response") intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "probability") intFit <- predictInterval(gm2, newdata = d1[1:10, ], type = "linear.prediction")
Summarize a merMod list
## S3 method for class 'merModList' print(x, ...)
## S3 method for class 'merModList' print(x, ...)
x |
a modList of class merModList |
... |
additional arguments |
a summary object of model information
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) summary(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) summary(mod)
Print the summary of a merMod list
## S3 method for class 'summary.merModList' print(x, ...)
## S3 method for class 'summary.merModList' print(x, ...)
x |
a summary of amerModList object |
... |
additional arguments |
summary content printed to console
Select a random observation from the model frame of a merMod
randomObs(merMod, varList, seed = NULL)
randomObs(merMod, varList, seed = NULL)
merMod |
an object of class merMod |
varList |
optional, a named list of conditions to subset the data on |
seed |
numeric, optional argument to set seed for simulations |
Each factor variable in the data frame has all factor levels from the full model.frame stored so that the new data is compatible with predict.merMod
a data frame with a single row for a random observation, but with full factor levels. See details for more.
Extract random-effects estimates for a merModList
## S3 method for class 'merModList' ranef(object, ...)
## S3 method for class 'merModList' ranef(object, ...)
object |
an object of a class of fitted models with
random effects, typically a
|
... |
some methods for these generic functions require additional arguments. |
Extract the estimates of the random-effects parameters from a list of
fitted merMod
models. Takes the mean of the individual ranef
objects for each of the component models in the merModList
.
a named, numeric vector of random-effects estimates.
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) ranef(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) ranef(mod)
Extract the correlations between the slopes and the intercepts from a model
REcorrExtract(model)
REcorrExtract(model)
model |
an object that inherits from class merMod |
a numeric vector of the correlations among the effects
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) REcorrExtract(fm1)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) REcorrExtract(fm1)
Extracts random effect terms from an lme4 model
REextract(merMod)
REextract(merMod)
merMod |
a merMod object from the lme4 package |
a data frame with the following columns
The name of the grouping factor associated with the random effects
The level of the grouping factor associated with the random effects
One column per random effect, the name is derived from the merMod
One column per random effect, the name is derived from the merMod
m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) rfx <- REextract(m2) #Note the column names head(rfx)
m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) rfx <- REextract(m2) #Note the column names head(rfx)
REimpact
calculates the average predicted value for each row of a
new data frame across the distribution of expectedRank
for a
merMod object. This allows the user to make meaningful comparisons about the
influence of random effect terms on the scale of the response variable,
for user-defined inputs, and accounting for the variability in grouping terms.
REimpact(merMod, newdata, groupFctr = NULL, term = NULL, breaks = 3, ...)
REimpact(merMod, newdata, groupFctr = NULL, term = NULL, breaks = 3, ...)
merMod |
An object of class merMod |
newdata |
a data frame of observations to calculate group-level differences for |
groupFctr |
The name of the grouping factor over which the random
coefficient of interest varies. This is the variable to the right of the
pipe, |
term |
The name of the random coefficient of interest. This is the
variable to the left of the pipe, |
breaks |
an integer representing the number of bins to divide the group effects into, the default is 3; alternatively it can specify breaks from 0-100 for how to cut the expected rank distribution |
... |
additional arguments to pass to |
The function predicts the response at every level in the random effect term specified by the user. Then, the expected rank of each group level is binned to the number of bins specified by the user. Finally, a weighted mean of the fitted value for all observations in each bin of the expected ranks is calculated using the inverse of the variance as the weight – so that less precise estimates are downweighted in the calculation of the mean for the bin. Finally, a standard error for the bin mean is calculated.
This function uses the formula for variance of a weighted mean recommended by Cochran (1977).
A data.frame with all unique combinations of the number of cases, rows in the newdata element, and number of bins:
The row number of the observation from newdata.
The ranking bin for the expected rank, the higher the bin number, the greater the expected rank of the groups in that bin.
The weighted mean of the fitted values for case i in bin k
The standard deviation of the mean of the fitted values for case i in bin k.
The number of group effects contained in that bin.
Gatz, DF and Smith, L. The Standard Error of a Weighted Mean Concentration. I. Bootstrapping vs other methods. Atmospheric Environment. 1995;11(2)1185-1193. Available at https://www.sciencedirect.com/science/article/pii/135223109400210C
Cochran, WG. 1977. Sampling Techniques (3rd Edition). Wiley, New York.
#For a one-level random intercept model m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy) m1.er <- REimpact(m1, newdata = sleepstudy[1, ], breaks = 2) #For a one-level random intercept model with multiple random terms m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) #ranked by the random slope on Days m2.er1 <- REimpact(m2, newdata = sleepstudy[1, ], groupFctr = "Subject", term="Days") #ranked by the random intercept m2.er2 <- REimpact(m2, newdata = sleepstudy[1, ], groupFctr = "Subject", term="int") # You can also pass additional arguments to predictInterval through REimpact g1 <- lmer(y ~ lectage + studage + (1|d) + (1|s), data=InstEval) zed <- REimpact(g1, newdata = InstEval[9:12, ], groupFctr = "d", n.sims = 50, include.resid.var = TRUE) zed2 <- REimpact(g1, newdata = InstEval[9:12, ], groupFctr = "s", n.sims = 50, include.resid.var = TRUE) zed3 <- REimpact(g1, newdata = InstEval[9:12, ], groupFctr = "d", breaks = 5, n.sims = 50, include.resid.var = TRUE)
#For a one-level random intercept model m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy) m1.er <- REimpact(m1, newdata = sleepstudy[1, ], breaks = 2) #For a one-level random intercept model with multiple random terms m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) #ranked by the random slope on Days m2.er1 <- REimpact(m2, newdata = sleepstudy[1, ], groupFctr = "Subject", term="Days") #ranked by the random intercept m2.er2 <- REimpact(m2, newdata = sleepstudy[1, ], groupFctr = "Subject", term="int") # You can also pass additional arguments to predictInterval through REimpact g1 <- lmer(y ~ lectage + studage + (1|d) + (1|s), data=InstEval) zed <- REimpact(g1, newdata = InstEval[9:12, ], groupFctr = "d", n.sims = 50, include.resid.var = TRUE) zed2 <- REimpact(g1, newdata = InstEval[9:12, ], groupFctr = "s", n.sims = 50, include.resid.var = TRUE) zed3 <- REimpact(g1, newdata = InstEval[9:12, ], groupFctr = "d", breaks = 5, n.sims = 50, include.resid.var = TRUE)
REmargins
calculates the average predicted value for each row of a
new data frame across the distribution of expectedRank
for a
merMod object. This allows the user to make meaningful comparisons about the
influence of random effect terms on the scale of the response variable,
for user-defined inputs, and accounting for the variability in grouping terms.
REmargins( merMod, newdata = NULL, groupFctr = NULL, term = NULL, breaks = 4, .parallel = FALSE, ... )
REmargins( merMod, newdata = NULL, groupFctr = NULL, term = NULL, breaks = 4, .parallel = FALSE, ... )
merMod |
An object of class merMod |
newdata |
a data frame of observations to calculate group-level differences for |
groupFctr |
The name of the grouping factor over which the random
coefficient of interest varies. This is the variable to the right of the
pipe, |
term |
The name of the random coefficient of interest. This is the
variable to the left of the pipe, |
breaks |
an integer representing the number of bins to divide the group effects into, the default is 3. |
.parallel |
logical should parallel computation be used, default is TRUE |
... |
additional arguments to pass to |
The function simulates the
The function predicts the response at every level in the random effect term specified by the user. Then, the expected rank of each group level is binned to the number of bins specified by the user. Finally, a weighted mean of the fitted value for all observations in each bin of the expected ranks is calculated using the inverse of the variance as the weight – so that less precise estimates are downweighted in the calculation of the mean for the bin. Finally, a standard error for the bin mean is calculated.
A data.frame with all unique combinations of the number of cases, rows in the newdata element:
The columns of the original data taken from newdata
The row number of the observation from newdata. Each row in newdata will be repeated for all unique levels of the grouping_var, term, and breaks.
The grouping variable the random effect is being marginalized over.
The term for the grouping variable the random effect is being marginalized over.
The ntile of the effect size for grouping_var
and term
The original grouping value for this case
The predicted value from predictInterval
for this case simulated
at the Nth ntile of the expected rank distribution of grouping_var
and term
The upper bound of the predicted value.
The lower bound of the predicted value.
For each grouping term in newdata the predicted value is decomposed into its fit components via predictInterval and these are all returned here
The upper bound for the effect of each grouping term
The lower bound for the effect of each grouping term
The predicted fit with all the grouping terms set to 0 (average)
The upper bound fit with all the grouping terms set to 0 (average)
The lower bound fit with all the grouping terms set to 0 (average)
Gatz, DF and Smith, L. The Standard Error of a Weighted Mean Concentration. I. Bootstrapping vs other methods. Atmospheric Environment. 1995;11(2)1185-1193. Available at https://www.sciencedirect.com/science/article/pii/135223109400210C
Cochran, WG. 1977. Sampling Techniques (3rd Edition). Wiley, New York.
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) mfx <- REmargins(merMod = fm1, newdata = sleepstudy[1:10,]) # You can also pass additional arguments to predictInterval through REimpact g1 <- lmer(y ~ lectage + studage + (1|d) + (1|s), data=InstEval) margin_df <- REmargins(g1, newdata = InstEval[20:25, ], groupFctr = c("s"), breaks = 4) margin_df <- REmargins(g1, newdata = InstEval[20:25, ], groupFctr = c("d"), breaks = 3)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) mfx <- REmargins(merMod = fm1, newdata = sleepstudy[1:10,]) # You can also pass additional arguments to predictInterval through REimpact g1 <- lmer(y ~ lectage + studage + (1|d) + (1|s), data=InstEval) margin_df <- REmargins(g1, newdata = InstEval[20:25, ], groupFctr = c("s"), breaks = 4) margin_df <- REmargins(g1, newdata = InstEval[20:25, ], groupFctr = c("d"), breaks = 3)
For a user specified quantile (or quantiles) of the random effect terms in a merMod object. This allows the user to easily identify the observation associated with the nth percentile effect.
REquantile(merMod, quantile, groupFctr, term = "(Intercept)")
REquantile(merMod, quantile, groupFctr, term = "(Intercept)")
merMod |
a merMod object with one or more random effect levels |
quantile |
a numeric vector with values between 0 and 100 for quantiles |
groupFctr |
a character of the name of the random effect grouping factor to extract quantiles from |
term |
a character of the random effect to extract for the grouping factor specified. Default is the intercept. |
a vector of the level of the random effect grouping term that corresponds to each quantile
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) REquantile(fm1, quantile = 0.25, groupFctr = "Subject") REquantile(fm1, quantile = 0.25, groupFctr = "Subject", term = "Days")
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) REquantile(fm1, quantile = 0.25, groupFctr = "Subject") REquantile(fm1, quantile = 0.25, groupFctr = "Subject", term = "Days")
Extract the standard deviation of the random effects from a merMod object
REsdExtract(model)
REsdExtract(model)
model |
an object that inherits from class merMod |
a numeric vector for standard deviations of the random effects
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) REsdExtract(fm1)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) REsdExtract(fm1)
REsim
simulates random effects from merMod object posterior distributionsSimulate random effects from merMod
REsim
simulates random effects from merMod object posterior distributions
REsim(merMod, n.sims = 200, oddsRatio = FALSE, seed = NULL)
REsim(merMod, n.sims = 200, oddsRatio = FALSE, seed = NULL)
merMod |
a merMod object from the lme4 package |
n.sims |
number of simulations to use |
oddsRatio |
logical, should parameters be converted to odds ratios? |
seed |
numeric, optional argument to set seed for simulations |
Use the Gelman sim technique to build empirical Bayes estimates. Uses the sim function in the arm package
a data frame with the following columns
groupFctr
Name of the grouping factor
groupID
Level of the grouping factor
term
Name of random term (intercept/coefficient)
mean
Mean of the simulations
median
Median of the simulations
sd
Standard deviation of the simulations, NA
if oddsRatio=TRUE
require(lme4) m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) re2 <- REsim(m2, 25) head(re2)
require(lme4) m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) re2 <- REsim(m2, 25) head(re2)
Extract the Root Mean Squared Error for a lmerMod object
RMSE.merMod(merMod, scale = FALSE)
RMSE.merMod(merMod, scale = FALSE)
merMod |
a lmerMod object from the lme4 package |
scale |
logical, should the result be returned on the scale of response variable standard deviations? |
a numeric which represents the RMSE
require(lme4) m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) RMSE.merMod(m2)
require(lme4) m2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) RMSE.merMod(m2)
Strips out transformations from variable names in data frames
sanitizeNames(data)
sanitizeNames(data)
data |
a data.frame |
a data frame with variable names cleaned to remove factor() construction
Set up parallel environment
setup_parallel()
setup_parallel()
Nothing
shinyMer
launches a shiny app that allows you to interactively
explore an estimated merMod using functions from merTools
.
shinyMer(merMod, simData = NULL, pos = 1)
shinyMer(merMod, simData = NULL, pos = 1)
merMod |
An object of class "merMod". |
simData |
A data.frame to make predictions from (optional). If NULL, then the user can only make predictions using the data in the frame slot of the merMod object. |
pos |
The position of the environment to export function arguments to. Defaults to 1, the global environment, to allow shiny to run. |
A shiny app
Randomly reorder a dataframe by row
shuffle(data)
shuffle(data)
data |
a data frame |
a data frame of the same dimensions with the rows reordered randomly
Strips attributes off of a data frame that come with a merMod model.frame
stripAttributes(data)
stripAttributes(data)
data |
a data.frame |
a data frame with variable names cleaned to remove all attributes except for names, row.names, and class
Bootstrap a subset of an lme4 model
subBoot(merMod, n = NULL, FUN, R = 100, seed = NULL, warn = FALSE)
subBoot(merMod, n = NULL, FUN, R = 100, seed = NULL, warn = FALSE)
merMod |
a valid merMod object |
n |
the number of rows to sample from the original data in the merMod object, by default will resample the entire model frame |
FUN |
the function to apply to each bootstrapped model |
R |
the number of bootstrap replicates, default is 100 |
seed |
numeric, optional argument to set seed for simulations |
warn |
logical, if TRUE, warnings from lmer will be issued, otherwise they will be suppressed default is FALSE |
This function allows users to estimate parameters of a large merMod object using bootstraps on a subset of the data.
a data.frame of parameters extracted from each of the R replications. The original values are appended to the top of the matrix.
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)) resultMatrix <- subBoot(fm1, n = 160, FUN = thetaExtract, R = 20)
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)) resultMatrix <- subBoot(fm1, n = 160, FUN = thetaExtract, R = 20)
Split a data.frame by elements in a list
subsetList(data, list)
subsetList(data, list)
data |
a data.frame |
list |
a named list of splitting conditions |
a data frame with values that match the conditions in the list
Title
## S3 method for class 'mm' sum( object, correlation = (p <= getOption("lme4.summary.cor.max")), use.hessian = NULL, ... )
## S3 method for class 'mm' sum( object, correlation = (p <= getOption("lme4.summary.cor.max")), use.hessian = NULL, ... )
object |
a merMod object |
correlation |
optional p value |
use.hessian |
logical |
... |
additional arguments to pass through |
a summary of the object
Print the results of a merMod list
## S3 method for class 'merModList' summary(object, ...)
## S3 method for class 'merModList' summary(object, ...)
object |
a modList of class merModList |
... |
additional arguments |
summary content printed to console
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) print(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) print(mod)
Create a factor variable and include unobserved levels for compatibility with model prediction functions
superFactor(x, fullLev)
superFactor(x, fullLev)
x |
a vector to be converted to a factor |
fullLev |
a vector of factor levels to be assigned to x |
a factor variable with all observed levels of x and all levels of x in fullLev
regularFactor <- c("A", "B", "C") regularFactor <- factor(regularFactor) levels(regularFactor) # Now make it super newLevs <- c("D", "E", "F") regularFactor <- superFactor(regularFactor, fullLev = newLevs) levels(regularFactor) # now super
regularFactor <- c("A", "B", "C") regularFactor <- factor(regularFactor) levels(regularFactor) # Now make it super newLevs <- c("D", "E", "F") regularFactor <- superFactor(regularFactor, fullLev = newLevs) levels(regularFactor) # now super
A convenience function that returns the theta parameters for a
merMod
object.
thetaExtract(merMod)
thetaExtract(merMod)
merMod |
a valid merMod object |
a vector of the covariance, theta, parameters from a merMod
merMod
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)) thetaExtract(fm1) #(a numeric vector of the covariance parameters)
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)) thetaExtract(fm1) #(a numeric vector of the covariance parameters)
Extract the variances and correlations for random effects from a merMod list
## S3 method for class 'merModList' VarCorr(x, sigma = 1, rdig = 3L, ...)
## S3 method for class 'merModList' VarCorr(x, sigma = 1, rdig = 3L, ...)
x |
for |
sigma |
an optional numeric value used as a multiplier for the standard deviations. |
rdig |
the number of digits to round to, integer |
... |
Ignored for the |
a list with two elements "stddev" and "correlation" for the standard deviations and correlations averaged across models in the list
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) VarCorr(mod)
sim_list <- replicate(n = 10, expr = sleepstudy[sample(row.names(sleepstudy), 180),], simplify=FALSE) fml <- "Reaction ~ Days + (Days | Subject)" mod <- lmerModList(fml, data = sim_list) VarCorr(mod)
Creates a new data.frame with copies of the original observation, each assigned to a different user-specified value of a variable. Allows the user to look at the effect on predicted values of changing either a single variable or multiple variables.
wiggle(data, varlist, valueslist)
wiggle(data, varlist, valueslist)
data |
a data frame with one or more observations to be reassigned |
varlist |
a character vector specifying the name(s) of the variable to adjust |
valueslist |
a list of vectors with the values to assign to var |
If the variable specified is a factor, then wiggle will return it as a character.
a data.frame
with each row assigned to the one of the new variable combinations.
All variable combinations are returned, eg wiggling two variables with 3 and 4 variables
respectively will return a new dataset with 3 * 4 = 12
observations.
data(iris) wiggle(iris[3,], varlist = "Sepal.Width", valueslist = list(c(1, 2, 3, 5))) wiggle(iris[3:5,], "Sepal.Width", valueslist = list(c(1, 2, 3, 5))) wiggle(iris[3,], c("Sepal.Width", "Petal.Length"), list(c(1,2,3,5), c(3,5,6))) wiggle(iris[3:5,], c("Sepal.Width", "Petal.Length"), list(c(1,2,3,5), c(3,5,6)))
data(iris) wiggle(iris[3,], varlist = "Sepal.Width", valueslist = list(c(1, 2, 3, 5))) wiggle(iris[3:5,], "Sepal.Width", valueslist = list(c(1, 2, 3, 5))) wiggle(iris[3,], c("Sepal.Width", "Petal.Length"), list(c(1,2,3,5), c(3,5,6))) wiggle(iris[3:5,], c("Sepal.Width", "Petal.Length"), list(c(1,2,3,5), c(3,5,6)))