Re-sampling Techniques in Count Data Regression Models

Abstract
Modeling count variables is a common task in many application areas such as economics, social sciences, and medicine. The classical Poisson regression model is often used for count data, but it is of limited use in these disciplines because count data sets typically exhibit overdispersion, in which case negative binomial regression can be used instead. We use a jackknife-after-bootstrap procedure to assess the error in the bootstrap estimated parameters. The method is illustrated through two real examples. The results suggest that the jackknife-after-bootstrap method provides a reliable alternative to traditional methods, particularly in small to moderate samples.

When the response variable (y) is a count variable and we fit the linear regression model by ordinary least squares (OLS), several problems may arise. First, the usual assumption that the errors are normally distributed fails, since (y) is typically non-normal. Second, OLS also assumes a homoscedastic error structure, which is problematic if (y) is a count variable. Third, if the errors are in fact heteroscedastic, the standard error estimates produced by OLS are biased (Demairs, 2004). Jackknife and bootstrap re-sampling techniques are designed to estimate standard errors, bias, confidence intervals, and prediction error. The bootstrap is a re-sampling method that draws a large collection of samples from the original data by selecting observations randomly with replacement, whereas the jackknife is generated by sequentially deleting a single datum from the original sample (Efron and Tibshirani, 1993). We use a jackknife-after-bootstrap procedure to assess the error in the bootstrap estimated parameters. The method is illustrated through two real examples.

2-Poisson Regression Model
The Poisson regression model (PRM) is a technique for modeling a response variable that describes count data. It is often applied to study the occurrence of a small number of counts as a function of a set of explanatory variables (Cameron & Trivedi, 1998). The PRM relates the probability function of a response variable (y) to a vector of explanatory variables (x) (Winkelmann, 2008). More formally, the PRM assumes that the response variable (y) is drawn from a Poisson distribution with parameter $\mu$. The p.d.f. of (y) is:

$$f(y; \mu) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \dots$$

where $\mu$ is both its mean and its variance, that is, $E(y) = \mathrm{Var}(y) = \mu$ (known as equidispersion) (Agersti, 2006).

With the PRM, the mean $\mu$ is explained in terms of the explanatory variables (x) via an appropriate link function. The popular choice for the link function is the log link, that is:

$$\log(\mu) = x'\beta$$

where $\beta$ is a $(k \times 1)$ vector of parameters and $x$ is a $(k \times 1)$ vector of explanatory variables. Taking the exponential of $x'\beta$ forces $\mu$ to be positive, which is necessary since counts are non-negative (De Jong & Heller, 2008). So, the multiple PRM can be written as (… Freese, 2001):

$$\mu_i = \exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$

or equivalently:

$$\log(\mu_i) = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

The parameters ($\beta$) can be estimated by the maximum likelihood (m.l.) method. The standard error of the estimated parameters is obtained from the inverse of the information matrix:

$$\mathrm{s.e.}(\hat{\beta}_j) = \sqrt{\left[(X'\hat{W}X)^{-1}\right]_{jj}}, \qquad \hat{W} = \mathrm{diag}(\hat{\mu}_1, \dots, \hat{\mu}_n)$$
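The maximum likelihood fit of the log-link PRM can be sketched in a few lines. The following is a minimal illustration, not the S-plus routine used for the results in this paper: it applies Newton-Raphson to the Poisson log-likelihood and takes the standard errors from the inverse information matrix $X'WX$ with $W = \mathrm{diag}(\mu_i)$. The function names (`invert`, `fitted_means`, `information`, `fit_poisson`) and the toy data are hypothetical, and the first column of `X` is assumed to be the intercept.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fitted_means(X, beta):
    """mu_i = exp(x_i' beta), the inverse of the log link."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]

def information(X, mu):
    """Information matrix X' W X with W = diag(mu)."""
    k = len(X[0])
    return [[sum(row[a] * row[b] * m for row, m in zip(X, mu)) for b in range(k)]
            for a in range(k)]

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for the Poisson log-likelihood; returns (beta, s.e.)."""
    k = len(X[0])
    # Assumption: column 0 of X is the intercept, started at log(mean(y)).
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        mu = fitted_means(X, beta)
        score = [sum(row[j] * (yi - m) for row, yi, m in zip(X, y, mu))
                 for j in range(k)]
        cov = invert(information(X, mu))
        step = [sum(cov[a][b] * score[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    cov = invert(information(X, fitted_means(X, beta)))
    se = [math.sqrt(cov[j][j]) for j in range(k)]
    return beta, se

# Made-up illustrative data: one explanatory variable plus an intercept.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 2, 4, 5, 9, 12, 20]
X = [[1.0, xi] for xi in x]

beta_hat, se_hat = fit_poisson(X, y)
mu_hat = fitted_means(X, beta_hat)
print("estimates:", beta_hat)
print("standard errors:", se_hat)
```

With an intercept in the model, the likelihood equations force the fitted means to sum to the observed total, which gives a quick sanity check on convergence.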

3-Overdispersion and Underdispersion
The key assumption of the PRM is that the conditional mean equals the conditional variance, i.e. $E(y_i \mid x_i) = \mathrm{Var}(y_i \mid x_i)$. In many applications this assumption is not met. If $\mathrm{Var}(y_i \mid x_i) > E(y_i \mid x_i)$ we speak of overdispersion, and if $\mathrm{Var}(y_i \mid x_i) < E(y_i \mid x_i)$ of underdispersion. The PRM does not allow for overdispersion (Cameron & Trivedi, 1998).
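One common informal check of the equidispersion assumption is the Pearson dispersion statistic, the Pearson chi-square divided by its residual degrees of freedom; values well above 1 point to overdispersion and values well below 1 to underdispersion. The sketch below is only an illustration of that check: the function name `pearson_dispersion` is hypothetical, the counts are made up, and an intercept-only model is assumed so that every fitted mean equals the sample mean.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Made-up counts with a long right tail (not data from this paper).
counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 27]

# Assumption: intercept-only Poisson model, so each fitted mean is the sample mean.
sample_mean = sum(counts) / len(counts)
fitted = [sample_mean] * len(counts)

ratio = pearson_dispersion(counts, fitted, n_params=1)
print("dispersion ratio:", ratio)
print("overdispersed" if ratio > 1 else "not overdispersed")
```

For a fitted regression model, the same function applies with `mu` replaced by the fitted means and `n_params` by the number of estimated coefficients.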

4-Negative Binomial Regression Model
The negative binomial regression model (NBRM) is the most commonly used alternative to the PRM when the data exhibit overdispersion (Winkelmann, 2008).
Under the Poisson distribution, the mean $\mu_i$ is assumed to be constant, or homogeneous, within the class. Heterogeneity can be introduced by assuming a specific distribution for $\mu_i$, namely a gamma distribution; the resulting count distribution has mean $E(y_i) = \theta$ and a variance that exceeds the mean. This is called the negative binomial I. In regression analysis of count data, the most common implementation of the negative binomial is the negative binomial II model (NB2), which has mean $E(y_i) = \theta$ and variance $\mathrm{Var}(y_i) = \theta(1 + a\theta)$. If $a$ equals zero, the mean and variance are equal and the distribution reduces to the Poisson. If $a > 0$, the variance exceeds the mean and the distribution allows for overdispersion.
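The role of the dispersion parameter can be checked numerically. The sketch below assumes the standard NB2 parameterization, with mean $\theta$, dispersion $a$, and variance $\theta + a\theta^2$; this parameterization, the function name `nb2_pmf`, and the chosen values of $\theta$ and $a$ are assumptions for illustration and are not taken from the paper's tables. It evaluates the probability function on a long truncated support and recovers the mean and variance by direct summation.

```python
import math

def nb2_pmf(y, theta, a):
    """NB2 probability of count y with mean theta and dispersion a > 0."""
    r = 1.0 / a  # shape of the gamma mixing distribution
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + theta))
             + y * math.log(theta / (r + theta)))
    return math.exp(log_p)

theta, a = 3.0, 0.5          # illustrative values only
support = range(0, 400)      # truncation point chosen so the tail is negligible
probs = [nb2_pmf(y, theta, a) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print("total probability:", total)
print("mean:", mean, "variance:", variance)
print("theta + a * theta^2:", theta + a * theta ** 2)
```

As $a$ shrinks toward zero the computed variance approaches the mean, which is the Poisson case described above.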

5-Jackknife after Bootstrap Procedure
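The use of the bootstrap and jackknife re-sampling methods has increased steadily with growing computer power. The basic idea of bootstrapping is to generate a large number of samples by randomly drawing observations with replacement from the original data set and to recalculate a statistic for each bootstrap sample, whereas the jackknife is generated by sequentially deleting a single datum from the original sample (Efron & Tibshirani, 1993). The jackknife-after-bootstrap (JAB) method was proposed by Efron (1992) to investigate the effect of a single observation on the bootstrap results. Efron pointed out that bootstrap estimates have two distinct sources of variability: the sampling variability of the original data and the resampling variability that comes from using a finite number of bootstrap samples. If we consider only the bootstrap samples that do not contain data point $X_i$, and there are $B_i$ such samples, then:

$$\mathrm{s.e.}_{B(i)} = \left[\frac{1}{B_i}\sum_{b \in C_i}\left(\hat{\beta}^{*b} - \bar{\beta}_{(i)}\right)^2\right]^{1/2}, \qquad \bar{\beta}_{(i)} = \frac{1}{B_i}\sum_{b \in C_i}\hat{\beta}^{*b}$$

where $C_i$ indexes the bootstrap samples that do not contain $X_i$ and $\hat{\beta}^{*b}$ is the estimate from bootstrap sample $b$. The jackknife estimate of the standard error of the bootstrap standard error is:

$$\mathrm{s.e.}_{jack}(\mathrm{s.e.}_B) = \left[\frac{n-1}{n}\sum_{i=1}^{n}\left(\mathrm{s.e.}_{B(i)} - \mathrm{s.e.}_{B(\cdot)}\right)^2\right]^{1/2}, \qquad \mathrm{s.e.}_{B(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{s.e.}_{B(i)}$$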
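The JAB idea can be sketched without any additional resampling: the bootstrap replicates are computed once, and for each observation the bootstrap standard error is recomputed from only those replicates whose samples happen to omit that observation. The code below is a minimal illustration under stated assumptions: the statistic is the sample mean rather than a regression coefficient, the data are made up, the function name `jackknife_after_bootstrap` is hypothetical, and standard deviations use the divisor $B$ (or $B_i$) rather than $B - 1$.

```python
import math
import random
import statistics

def jackknife_after_bootstrap(data, stat, B=2000, seed=0):
    """Return (bootstrap s.e., leave-one-out bootstrap s.e. list, jackknife s.e. of the s.e.)."""
    rng = random.Random(seed)
    n = len(data)
    # Draw B bootstrap samples as lists of indices, sampling with replacement.
    samples = [[rng.randrange(n) for _ in range(n)] for _ in range(B)]
    reps = [stat([data[j] for j in s]) for s in samples]
    se_boot = statistics.pstdev(reps)

    members = [set(s) for s in samples]
    se_minus_i = []
    for i in range(n):
        # Keep only replicates from bootstrap samples that do not contain point i.
        kept = [r for r, m in zip(reps, members) if i not in m]
        if len(kept) < 2:
            raise ValueError("too few bootstrap samples omit observation %d" % i)
        se_minus_i.append(statistics.pstdev(kept))

    se_dot = sum(se_minus_i) / n
    se_jack = math.sqrt((n - 1) / n * sum((s - se_dot) ** 2 for s in se_minus_i))
    return se_boot, se_minus_i, se_jack

def mean(values):
    return sum(values) / len(values)

# Made-up data with one extreme value in the last position (index 9).
data = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40]
se_boot, se_minus_i, se_jack = jackknife_after_bootstrap(data, mean)

print("bootstrap s.e.:", se_boot)
print("jackknife s.e. of the bootstrap s.e.:", se_jack)
for i, s in enumerate(se_minus_i):
    print("without observation", i, "bootstrap s.e. =", s)
```

An observation whose removal changes the bootstrap standard error sharply, here the extreme value, is the kind of case a JAB influence plot is meant to flag.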

6-Analytical Examples
In order to apply the PRM and NBRM, we deal with two real data sets. All results were obtained using the S-plus 6.1 program.

6-1-Example 1
In this example we fit the PRM. The response variable represents the number of dead cocks. Three explanatory variables are considered: the age of the cocks in days, the quantity of feed in kilograms (kg), and the temperature (AL-Suliaman, 1995). The sample size was 62; the results are shown in table (1).

6-2-Example 2
In this example we use the gala data from the faraway package (Faraway, 2006). The data describe the relationship between the number of plant species and several geographic variables, where n is 30. The variables are Species: the number of plant species found on the island; Endemics: the number of endemic species; Area: the area of the island (km$^2$); Elevation: the highest elevation of the island (m); Nearest: the distance from the nearest island (km); Scruz: the distance from Santa Cruz island (km); and Adjacent: the area of the adjacent island (km$^2$). The bootstrap (B) and the jackknife-after-bootstrap (JAB) results are shown in table (4).

7-Conclusion
As a result, we conclude that the

Researchers spend much of their time counting things: numbers of symptoms, placements, and so on. Count variables indicate the number of times a particular event occurs to each case, such as the number of hospital visits per year or the number of divorces per city (Orme & Combs-Orme, 2009). A count variable is an integer and can range from 0 to $+\infty$. Two distributions are commonly used to model count variables: the Poisson and the negative binomial.
In sections 2, 3, and 4 we describe the Poisson regression model, overdispersion, and the negative binomial regression model, respectively. In section 5, the use of the jackknife-after-bootstrap is discussed. The analytical examples are given in section 6, where two real data sets are used. Finally, section 7 presents the conclusions.
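The Poisson distribution is unimodal and skewed to the right over the possible values 0, 1, 2, …. It has a single parameter $\mu > 0$.

It can be shown that the marginal distribution of $y_i$ follows a negative binomial distribution with p.d.f.:

$$P(y_i) = \frac{\Gamma(y_i + a^{-1})}{\Gamma(y_i + 1)\,\Gamma(a^{-1})}\left(\frac{a^{-1}}{a^{-1} + \theta}\right)^{a^{-1}}\left(\frac{\theta}{a^{-1} + \theta}\right)^{y_i}, \qquad y_i = 0, 1, 2, \dots$$

Figure (1) shows the observations with large influence on $\mathrm{s.e.}(\hat{\beta}(B))$.

Figure (1): JAB influence of the number of dead cocks parameters.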
and NBRM parameters. The results suggest that bootstrap re-sampling provides a reliable alternative to traditional methods and that the JAB procedure provides a good diagnostic measure for the bootstrap. From figure (1), we see that cases 17 and 25 have large influence on both the intercept and temp., whereas case 47 has large influence on age and feed. Cases 35 and 62 have large influence on age and feed, respectively. From figure (2), no case has large influence on Area or Adjacent. Cases 27, 19, 15, and 30 have influence on Endemics, Elevation, Nearest, and Scruz, respectively. Finally, cases 27 and 18 have influence on the intercept.

Administration and Economics, University of Mosul, unpublished.

Table (1): The results of PRM of the number of dead cocks.

[23] Iraqi Journal of Statistical Sciences (22) 2012
In this example we cannot fit the PRM because of the overdispersion problem, so the best alternative model is the NBRM; table (3) shows the results of the NBRM.