Sampling variation refers to the uncertainty created by the sampling process. If your data is just a sample, another data set will produce a somewhat different result. The question we seek to answer is, “How much is *somewhat*?”

The method used here is called **bootstrapping**. Bootstrapping relies on a simulation of the sampling process called **resampling**: sampling from the sample.

To illustrate, consider estimating the power-law exponent on the relationship between species number and land area.

```
Species = fetchData("http://www.math.smith.edu/~bbaumer/mth247/SpeciesArea.csv")
plotPoints(Species ~ Area, data=Species)
```

The model we seek to fit is \[ \mathrm{Species} = a \mathrm{Area}^b \;. \] Judging from the shape of the graph, Species goes roughly as the square-root of area, so use \( b=0.5 \) as the initial guess for the nonlinear parameter. Finding the parameters is a straightforward application of `fitModel`

:

```
SpeciesModel = fitModel( Species ~ a*Area^b,
data=Species,
start=list(b=.5) )
SpeciesModel
```

```
## function (Area, b = 0.256503447940427, a = 4.10981633631802)
## a * Area^b
## <environment: 0x103b576e0>
## attr(,"coefficients")
## b a
## 0.2565 4.1098
```

```
coef(SpeciesModel) # if we just want the coeffiecients
```

```
## b a
## 0.2565 4.1098
```

It looks like a fourth-root model is appropriate. But how precise is the estimate of the coefficient \( b \)?

We begin by repeating what is above, but wrapping it in curly braces. This tells R to treat the entire code block as a single expression. We'll make use of this in the next step.

```
{ SpeciesModel = fitModel( Species ~ a*Area^b,
data=Species,
start=list(b=.5) )
coef(SpeciesModel)
}
```

```
## b a
## 0.2565 4.1098
```

Creating a resampling distribution is straightforward once we have gotten this far. Edit the code above as follows:

- Wrap the data frame with
`resample`

. - Put
`do(500)*`

in front of the expression, and - store the results so that you can access them later:

```
ResamplingDist = do(500)*
{ SpeciesModel = fitModel( Species ~ a*Area^b,
data=resample(Species),
start=list(b=.5) )
coef(SpeciesModel)
}
```

Now we have a set of 500 resampling realizations. We begin by looking at the variability in our estimated power graphically.

```
densityplot( ~ b, data=ResamplingDist ) # we could also have done a histogram
```

```
plotPoints( b ~ a, data=ResamplingDist)
```

Notice that the coefficient estimates are not independent of each other.

One common way to quantify sampling variability is with the **standard error**. The
standard error is simply the standard deviation of the sampling distribution, which we can estimate using our resampling distribution.

```
sd( ~ b, data=ResamplingDist )
```

```
## [1] 0.02109
```

Another standard way to present the precision is with a confidence interval.

For instance, to find the 95% confidence interval, use `confint`

.

```
confint(ResamplingDist)
```

```
## name lower upper
## 1 b 0.2114 0.2943
## 2 a 2.4933 6.0975
```

A 95% confidence (when based on normal distribution theory) stretches roughly 2 standard errors in each direction from the estimate.

```
se <- sd( ~b, data=ResamplingDist )
coef(SpeciesModel)['b'] + c(-1,1) * 2 * se
```

```
## [1] 0.2327 0.3171
```

Alternatively, we could take the middle 95% of the samping distribution as our interval.

```
qdata(c(0.025,0.975), b, data=ResamplingDist)
```

```
## 2.5% 97.5%
## 0.1886 0.2838
```

Suppose that we were interested in predicting the number of species in a land with
area \( 2 \times 10^6 \) km^{2.} This would be an extrapolation, and extrapolations are generally unreliable, but here goes …

The procedure is essentially the same as before, but now we compute an estimated response from our model rather that estimated coefficients.

First we do this with our original data to get a **point estimate**, a single number that
is our best guess but does not reveal anything about its precision.

```
{ SpeciesModel = fitModel( Species ~ a*Area^b,
data=Species,
start=list(b=.5) )
SpeciesModel( Area=2e6 )
}
```

```
## [1] 169.8
```

Now we would like to get an interval estimate via resampling.

```
ResamplingDist2 = do(500)*
{ SpeciesModel = fitModel( Species ~ a*Area^b,
data=resample(Species),
start=list(b=.5) )
SpeciesModel(2e6)
}
```

Now we have a set of 500 resampling realizations.

```
densityplot( ~ result, data=ResamplingDist2)
```

To find the 95% confidence interval, use `confint`

.

```
confint(ResamplingDist2)
```

```
## name lower upper
## 1 result 135.9 198.3
```

Fit a model to data of your own choosing. Give point and interval estimates for the parameters of your model and for an estimated response produced by your model. You might like to return to Eyeballing Parameters for some exmple data sets and models.