GAMsetup {mgcv}    R Documentation
Sets up the design matrix X, the penalty matrices S_i and the linear equality constraint matrix C for a GAM defined in terms of
penalized regression splines. Various other information characterising the bases used is also returned.
The output is such that the model can be fitted and
smoothing parameters estimated by the method of Wood (2000), as implemented in routine mgcv().
This function is usually called by gam.
GAMsetup(G)
G: the single argument to this function, a list
containing at least the elements listed below:
A list H, containing the elements of G (the input list) plus the
following:
X: the full design matrix.
S: if fit.method is "magic", this is a one-dimensional
array containing the non-zero elements of the penalty matrices. Let
start[1] <- 0 and start[k+1] <- start[k] + H$df[k]^2 (so that
start[k] is sum(H$df[1:(k-1)]^2) for k > 1). Then penalty matrix k has
H$S[start[k]+i+H$df[k]*(j-1)] in its ith row and jth column.
To get the kth full penalty matrix, the matrix so obtained is
inserted into a full matrix of zeroes with its (1,1) element at H$off[k],H$off[k].
If fit.method is "mgcv", this is a list of penalty
matrices, again stored as the smallest matrices that include all the non-zero
elements of the penalty matrix concerned.
off: an array of offsets, used to facilitate efficient storage of the penalty
matrices and to indicate where in the overall parameter vector the parameters of the ith
spline reside (e.g. the first parameter of the ith spline is at p[off[i]+1]).
C: a matrix defining the linear equality constraints on the parameters used to define the model (i.e. C in Cp=0).
UZ: an array containing the matrices that transform from a t.p.r.s. basis to the
equivalent t.p.s. basis (for t.p.r.s. terms only). The packing method
is as follows: set start[1] <- 0 and
start[k+1] <- start[k]+(M[k]+n)*tp.bs[k], where n is the number
of data, M[k] is the penalty null space dimension and
tp.bs[k] is zero for a cubic regression spline and the basis
dimension for a t.p.r.s. Then element i,j of the UZ matrix for
model term k is UZ[start[k]+i+(j-1)*(M[k]+n)].
Xu: the set of unique covariate combinations for each term. The packing method
is as follows: set start[1] <- 0 and
start[k+1] <- start[k]+xu.length[k]*tp.dim[k], where xu.length[k] is the number
of unique covariate combinations and tp.dim[k] is zero for a
cubic regression spline
and the dimension of the smooth (i.e. the number of covariates it is a
function of) for a t.p.r.s. Then element i,j of the Xu matrix for
model term k is Xu[start[k]+i+(j-1)*xu.length[k]].
xu.length: the number of unique covariate combinations for each t.p.r.s. term.
covariate.shift: all covariates are centred around zero before the bases are constructed; this is an array of the applied shifts.
xp: a matrix whose rows contain the covariate values corresponding to the parameters of each cubic regression spline. The cubic regression splines are parameterized by their y-values at a series of x-values (the knots), and each row of xp holds those x-values. Note that these values are covariate-shifted.
rank: an array giving the ranks of the penalty matrices.
m.free: only for use with "magic"; the number
of smoothing parameters that must be estimated.
m.off: again only for "magic"; the offsets of the penalty
matrices for those penalties whose smoothing parameters must be
estimated.
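The packed "magic" form of S can be read back in base R as sketched below, assuming the layout given under S: block k is H$df[k] by H$df[k], stored column by column, with start[1] = 0 and start[k+1] = start[k] + H$df[k]^2. The helper names unpack_penalties() and embed_penalty() are hypothetical (they are not part of mgcv), and the toy penalties are made up. embed_penalty() takes the row/column of the (1,1) element as an argument (first), so it can be used with whichever offset convention applies to H$off.

```r
# Hypothetical helper: unpack a "magic"-style packed penalty array into a list
# of square matrices.
# S:  numeric vector holding the column-major penalty blocks back to back
# df: basis dimension of each term (block k is df[k] x df[k])
unpack_penalties <- function(S, df) {
  start <- c(0, cumsum(df^2))            # start[k] is the offset of block k
  lapply(seq_along(df), function(k) {
    idx <- start[k] + seq_len(df[k]^2)   # covers start[k] + i + df[k]*(j-1)
    matrix(S[idx], nrow = df[k], ncol = df[k])  # refill column by column
  })
}

# Hypothetical helper: place a square block into a p x p zero matrix with its
# (1,1) element at row/column `first`.
embed_penalty <- function(block, p, first) {
  full <- matrix(0, p, p)
  rows <- first + seq_len(nrow(block)) - 1
  full[rows, rows] <- block
  full
}

# Toy example with two made-up penalties of sizes 2 and 3
S1 <- matrix(c(1, -1, -1, 1), 2, 2)
S2 <- diag(3)
S  <- c(S1, S2)                          # packed column-major, block after block
blocks <- unpack_penalties(S, df = c(2, 3))
full2  <- embed_penalty(blocks[[2]], p = 5, first = 3)
```

With real output this would be called as unpack_penalties(H$S, H$df).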
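Under the convention stated for off (the first parameter of the ith spline is p[off[i]+1]), and assuming the parameters of term k occupy H$df[k] consecutive positions, the coefficients of a single term can be pulled out of a parameter vector as follows. term_coef() and the numbers are illustrative only.

```r
# Hypothetical helper: coefficients of smooth term k, assuming they occupy
# df[k] consecutive positions starting at p[off[k] + 1].
term_coef <- function(p, off, df, k) {
  p[off[k] + seq_len(df[k])]
}

p   <- c(10, 11, 20, 21, 22)   # made-up parameters: term 1, then term 2
off <- c(0, 2)                 # term 2 starts at p[3]
df  <- c(2, 3)
term_coef(p, off, df, 2)       # returns c(20, 21, 22)
```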
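One standard way to honour constraints of the form Cp=0, as defined under C, is to reparameterize: find a matrix Z whose columns span the null space of C and write p = Zb with b unconstrained. The sketch below does this with a QR decomposition of t(C) in base R. It illustrates the general technique only and is not a claim about how mgcv() handles C internally; constraint_null_space() is a hypothetical name, and C is assumed to have full row rank.

```r
# Hypothetical helper: orthonormal basis Z for the null space of C, so that
# any p = Z %*% b satisfies C %*% p = 0 (C assumed to have full row rank).
constraint_null_space <- function(C) {
  r   <- nrow(C)                        # number of constraints
  qrc <- qr(t(C))
  Q   <- qr.Q(qrc, complete = TRUE)     # ncol(C) x ncol(C) orthogonal matrix
  Q[, -seq_len(r), drop = FALSE]        # last ncol(C) - r columns span null(C)
}

# Toy example: a single sum-to-zero constraint on 4 parameters
C <- matrix(1, nrow = 1, ncol = 4)
Z <- constraint_null_space(C)
b <- c(0.5, -1, 2)                      # arbitrary unconstrained coefficients
p <- Z %*% b
C %*% p                                 # zero up to rounding error
```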
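The UZ packing rule can be checked with a small base R sketch: block k has M[k]+n rows and tp.bs[k] columns and is stored column by column, so element i,j of term k sits at UZ[start[k]+i+(j-1)*(M[k]+n)]. unpack_uz() is a hypothetical helper and the packed values are made up.

```r
# Hypothetical helper: extract the UZ matrix of term k from the packed array.
# Block k is (M[k] + n) x tp.bs[k], stored column by column.
unpack_uz <- function(UZ, M, n, tp.bs, k) {
  if (tp.bs[k] == 0) return(NULL)              # cubic regression spline: no block
  nr    <- M + n                               # rows of each block
  start <- c(0, cumsum(nr * tp.bs))            # start[k]: offset of block k
  idx   <- start[k] + seq_len(nr[k] * tp.bs[k])
  matrix(UZ[idx], nrow = nr[k], ncol = tp.bs[k])
}

# Toy layout: term 1 is a cubic regression spline; term 2 is a t.p.r.s. with
# null space dimension 2 and basis dimension 3, fitted to n = 3 data.
UZ <- as.numeric(1:15)                          # made-up packed values
B  <- unpack_uz(UZ, M = c(2, 2), n = 3, tp.bs = c(0, 3), k = 2)
B[3, 2] == UZ[0 + 3 + (2 - 1) * (2 + 3)]        # TRUE: matches the packing rule
```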
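The Xu entry can be illustrated by forming the unique covariate combinations of a toy two-covariate term with unique() and packing them column by column, then reading them back with the rule start[k+1] = start[k] + xu.length[k]*tp.dim[k]. This is a base R sketch: unpack_xu() is hypothetical, xu.length and tp.dim are assumed to have one entry per term, and the ordering of unique rows that mgcv itself uses may differ.

```r
# Hypothetical helper: recover the unique-covariate matrix of term k from the
# packed Xu array (block k is xu.length[k] x tp.dim[k], column-major).
unpack_xu <- function(Xu, xu.length, tp.dim, k) {
  if (tp.dim[k] == 0) return(NULL)             # cubic regression spline term
  start <- c(0, cumsum(xu.length * tp.dim))    # start[k]: offset of block k
  idx   <- start[k] + seq_len(xu.length[k] * tp.dim[k])
  matrix(Xu[idx], nrow = xu.length[k], ncol = tp.dim[k])
}

# Toy data: a cubic regression spline term (k = 1) followed by a 2-D t.p.r.s.
# term (k = 2) observed at 5 points, two of which repeat.
xt <- cbind(c(0.1, 0.5, 0.1, 0.9, 0.5),
            c(1.0, 2.0, 1.0, 3.0, 2.0))
xu <- unique(xt)                    # unique rows: 3 of them here
Xu <- as.vector(xu)                 # term 1 contributes nothing to Xu
unpack_xu(Xu, xu.length = c(0, nrow(xu)), tp.dim = c(0, 2), k = 2)
```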
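To illustrate what it means for a cubic regression spline to be parameterized by its values at the x-values held in a row of xp, the sketch below builds the interpolating cubic spline through made-up values at made-up knots with base R's splinefun(). It assumes natural end conditions (zero second derivative at the end knots); cr_eval() and cr_basis() are hypothetical helpers, and the basis mgcv actually computes may differ in detail.

```r
# Hypothetical helper: evaluate the cubic spline that takes values `beta`
# at knots `xk`, assuming natural end conditions.
cr_eval <- function(x, xk, beta) {
  f <- stats::splinefun(xk, beta, method = "natural")  # interpolating spline
  f(x)
}

# Hypothetical helper: design-matrix columns for the term. Column j is the
# spline equal to 1 at knot j and 0 at every other knot.
cr_basis <- function(x, xk) {
  sapply(seq_along(xk),
         function(j) cr_eval(x, xk, as.numeric(seq_along(xk) == j)))
}

xk   <- seq(-0.5, 0.5, length.out = 5)   # made-up, covariate-shifted knots
beta <- c(0, 1, 0.5, -1, 0)              # made-up function values at the knots
x    <- c(-0.4, 0, 0.25)
X    <- cr_basis(x, xk)                  # 3 x 5 matrix
# The spline is linear in beta, so X %*% beta reproduces cr_eval().
all.equal(drop(X %*% beta), cr_eval(x, xk, beta))
```

Because the parameters are function values at the knots, each coefficient has a direct interpretation as the height of the smooth at the corresponding x-value.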
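For a penalty matrix with df columns, df minus its rank is the dimension of its null space, i.e. the part of the basis that the penalty leaves unpenalized. The sketch below computes a numerical rank with base R's qr(); the second-difference penalty is a made-up stand-in, not one of the penalties mgcv constructs, penalty_rank() is a hypothetical name, and the tolerance shown is qr()'s default.

```r
# Hypothetical helper: numerical rank of a penalty matrix via QR.
penalty_rank <- function(S, tol = 1e-7) qr(S, tol = tol)$rank

# Stand-in penalty: second differences on 6 coefficients. Its null space
# (constant and linear sequences) has dimension 2.
D <- diff(diag(6), differences = 2)   # 4 x 6 difference matrix
S <- crossprod(D)                     # t(D) %*% D, a 6 x 6 penalty
penalty_rank(S)                       # 4
ncol(S) - penalty_rank(S)             # null space dimension: 2
```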
Simon N. Wood simon@stats.gla.ac.uk
Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428
Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114
http://www.stats.gla.ac.uk/~simon/
set.seed(0)
n <- 100                                # number of observations to simulate
x <- runif(5 * n, 0, 1)                 # simulate covariates
x <- array(x, dim = c(5, n))            # put into array for passing to GAMsetup
pi <- asin(1) * 2                       # begin simulating some data
y <- 2 * sin(pi * x[2, ])
y <- y + exp(2 * x[3, ]) - 3.75887
y <- y + 0.2 * x[4, ]^11 * (10 * (1 - x[4, ]))^6 +
  10 * (10 * x[4, ])^3 * (1 - x[4, ])^10 - 1.396
sig2 <- -1                              # noise variance is abs(sig2); negative sign signals GCV later
e <- rnorm(n, 0, sqrt(abs(sig2)))
y <- y + e                              # simulated data
w <- matrix(1, n, 1)                    # weight matrix
par(mfrow = c(2, 2))                    # scatter plots of simulated data
plot(x[2, ], y); plot(x[3, ], y); plot(x[4, ], y); plot(x[5, ], y)
x[1, ] <- 1
# create list for passing to GAMsetup....
G <- list(m = 4, n = n, nsdf = 0, df = c(15, 15, 15, 15), dim = c(1, 1, 1, 1),
          s.type = c(0, 0, 0, 0), by = 0, by.exists = c(FALSE, FALSE, FALSE, FALSE),
          p.order = c(0, 0, 0, 0), x = x, n.knots = rep(0, 4), fit.method = "mgcv")
H <- GAMsetup(G)
H$y <- y                                # add data to H
H$sig2 <- sig2                          # add variance (signalling GCV use in this case) to H
H$w <- w                                # add weights to H
H$sp <- array(-1, H$m)
H$fix <- array(FALSE, H$m)
H$conv.tol <- 1e-6; H$max.half <- 15
H$min.edf <- 5; H$fixed.sp <- 0
H <- mgcv(H)                            # select smoothing parameters and fit model