STAT 516 Lec 12 (accessible)

Logistic regression

Author

Karl Gregory

Published

April 20, 2026

Programming task data from Kutner et al. (2005)

Twenty-five people succeeded or failed at a programming task.

Months of programming experience was recorded for each person.

experience <- c(14,29,6,25,18,4,18,12,22,6,30,11,30,5,20,13,9,32,24,13,19,4,28,22,8)
success <- c(0,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1,1)

Can we predict probability of success based on experience?


plot(success ~ experience)

A scatterplot

Logistic regression model

Assume $Y_i \sim \text{Bernoulli}(\pi_i)$ with $\log\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i$, for $i = 1,\dots,n$, where

  • $Y_i$ is the response for observation $i$.
  • $x_i$ is the value of a predictor/covariate/explanatory variable for obs $i$.
  • $\pi_i$ is the probability of “success” for observation $i$.
  • $\beta_0$ and $\beta_1$ are the intercept and slope parameters.
  • $\pi_i/(1-\pi_i)$ is the odds of “success” for obs $i$.
  • $\log(\pi_i/(1-\pi_i))$ is the log-odds for obs $i$.

Logistic regression assumes the log-odds are linear in the predictor.

Odds

Han Solo saying, 'Never tell me the odds'

  • Let $\pi$ be the probability of success.

  • Then $\pi/(1-\pi)$ is called the odds in favor of success. (A small numerical check follows this list.)

    1. If $\pi = 1/2$ then $\pi/(1-\pi) = 1$. “One-to-one” odds of success.
    2. If $\pi = 2/3$ then $\pi/(1-\pi) = 2$. Success twice as likely as failure.
    3. If $\pi = 1/4$ then $\pi/(1-\pi) = 1/3$. Failure three times as likely as success.
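
A small sketch checking the three cases above; the odds() helper is ours for illustration, not from the lecture:

odds <- function(p) p / (1 - p)  # convert a probability to odds
odds(c(1/2, 2/3, 1/4))           # 1, 2, and 1/3, as claimed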

The logit and logistic transformations

  • The transformation $y = \dfrac{e^x}{1+e^x}$ is called the logistic transformation.
  • Its inverse $x = \log\left(\dfrac{y}{1-y}\right)$ is called the logit transformation.
  • We have $\log\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_i \iff \pi_i = \dfrac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}}$.

x <- seq(-10,10,length=200)
y <- exp(x) / (1 + exp(x))
par(mfrow= c(1,2))
plot(y~x,type = "l")
plot(x~y,type = "l")

Two panels, each showing a curve

Goals in logistic regression

  1. Estimate $\beta_0$ and $\beta_1$.
  2. Obtain fitted probabilities $\hat{\pi}_1,\dots,\hat{\pi}_n$.
  3. Build a CI for $\beta_1$ and test $H_0\colon \beta_1 = 0$.
  4. Give interpretations of the estimated regression coefficients.
  5. Check the goodness of fit of the logistic regression model.
  6. Add additional covariates.

Maximum likelihood estimation in logistic regression

  • We do not use least squares to estimate $\beta_0$ and $\beta_1$.
  • Instead we use maximum likelihood estimators (MLEs).
  • The MLEs are the parameter values giving the observed data the highest possible probability.
  • An intercept $b_0$ and slope $b_1$ give to the observed data the probability $L_n(b_0,b_1) = \prod_{i=1}^n [\pi_i(b_0,b_1)]^{Y_i}[1-\pi_i(b_0,b_1)]^{1-Y_i}$, with $\pi_i(b_0,b_1) = \dfrac{e^{b_0+b_1x_i}}{1+e^{b_0+b_1x_i}}$ for $i=1,\dots,n$. (A small evaluation sketch follows this list.)
  • The MLEs $\hat{\beta}_0$, $\hat{\beta}_1$ are the values of $b_0$, $b_1$ that maximize $L_n(b_0,b_1)$.
  • $L_n(b_0,b_1)$ is called the likelihood function.
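
A minimal sketch evaluating $L_n$ on the programming task data; the function name Ln and the candidate values are ours, not from the lecture:

Ln <- function(b0, b1){  # likelihood of the data at candidate values (b0, b1)
  p <- exp(b0 + b1*experience) / (1 + exp(b0 + b1*experience))
  prod(p^success * (1 - p)^(1 - success))
}
Ln(0, 0)          # every pi_i = 1/2, so this equals (1/2)^25
Ln(-3.06, 0.16)   # near the MLEs (see the glm() output below); a larger value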

Computing the MLEs in logistic regression

  • There is no “closed-form” expression for $\hat{\beta}_0$ and $\hat{\beta}_1$.
  • One must find their values numerically, that is, with an algorithm.
  • It is more convenient to work with $\log L_n(b_0,b_1)$, which is given by $\ell_n(b_0,b_1) = \sum_{i=1}^n \left[ Y_i(b_0+b_1x_i) - \log(1+e^{b_0+b_1x_i}) \right]$.
  • Newton’s method is one way to find the maximizers of $\ell_n(b_0,b_1)$.

Programming task data (cont)

Newton-Raphson algorithm for computing $\hat{\beta}_0$ and $\hat{\beta}_1$:
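
The slide’s own implementation is not reproduced here; below is a minimal Newton-Raphson sketch for this model, assuming the experience and success vectors defined earlier:

X <- cbind(1, experience)                      # design matrix with intercept column
b <- c(0, 0)                                   # starting values (b0, b1)
for(iter in 1:50){
  pi_b  <- as.numeric(1/(1 + exp(-X %*% b)))   # current success probabilities
  score <- t(X) %*% (success - pi_b)           # gradient of the log-likelihood
  W     <- diag(pi_b*(1 - pi_b))               # weight matrix
  hess  <- -t(X) %*% W %*% X                   # Hessian of the log-likelihood
  b_new <- as.numeric(b - solve(hess, score))  # Newton-Raphson update
  if(max(abs(b_new - b)) < 1e-10) break
  b <- b_new
}
b  # should match coef(glm(success ~ experience, family = "binomial"))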

Generalized linear models

  • The logistic regression model is in a class of models called GLMs.
  • GLM stands for generalized linear model.
  • Poisson regression and binomial response regression, among others, are also GLMs.
  • Use the glm() function in R to obtain $\hat{\beta}_0$ and $\hat{\beta}_1$.

Use the glm() function with the option family = "binomial".

glm_out <- glm(success ~ experience, family = "binomial")
summary(glm_out)

Call:
glm(formula = success ~ experience, family = "binomial")

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -3.05970    1.25935  -2.430   0.0151 *
experience   0.16149    0.06498   2.485   0.0129 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 34.296  on 24  degrees of freedom
Residual deviance: 25.425  on 23  degrees of freedom
AIC: 29.425

Number of Fisher Scoring iterations: 4

x <- seq(min(experience),max(experience),length = 200)
pihat_x <- 1/(1 + exp( -(coef(glm_out)[1] + coef(glm_out)[2]*x)))
plot(success ~ experience); lines(pihat_x~x)

Scatterplot with curve added

Fitted probabilities

  • Define the fitted probabilities as $\hat{\pi}_i = \dfrac{e^{\hat{\beta}_0+\hat{\beta}_1x_i}}{1+e^{\hat{\beta}_0+\hat{\beta}_1x_i}}$ for $i=1,\dots,n$.
  • For any value $x_{\text{new}}$, we estimate the probability of “success” as $\hat{\pi}_{\text{new}} = \dfrac{e^{\hat{\beta}_0+\hat{\beta}_1x_{\text{new}}}}{1+e^{\hat{\beta}_0+\hat{\beta}_1x_{\text{new}}}}$. (See the predict() call below.)
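
The same estimate comes from predict(); the value of 16 months is a hypothetical example, not from the lecture:

# estimated success probability at a new predictor value, x_new = 16 months
predict(glm_out, newdata = data.frame(experience = 16), type = "response")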

Programming task data (cont)

plot(success ~ experience); lines(pihat_x~x)
points(glm_out$fitted.values~experience,pch = 19)

Scatterplot with curve added and points plotted on the curve

Asymptotic distribution of slope estimator and CI

  • For large enough $n$, $\hat{\beta}_1$ is approximately Normal, such that $\dfrac{\hat{\beta}_1 - \beta_1}{\widehat{\text{se}}(\hat{\beta}_1)} \overset{\text{approx}}{\sim} \text{Normal}(0,1)$, where, setting $\hat{w}_i = \hat{\pi}_i(1-\hat{\pi}_i)$ for $i=1,\dots,n$, we may write $\widehat{\text{se}}(\hat{\beta}_1) = \left[\sum_{i=1}^n \hat{w}_ix_i^2 - \left(\sum_{i=1}^n \hat{w}_i\right)^{-1}\left(\sum_{i=1}^n \hat{w}_ix_i\right)^2\right]^{-1/2}$.
  • We can make an approximate $(1-\alpha)\times 100\%$ CI for $\beta_1$ as $\hat{\beta}_1 \pm z_{\alpha/2}\,\widehat{\text{se}}(\hat{\beta}_1)$.

Programming task data (cont)

# Confidence interval for beta1
pihat <- glm_out$fitted.values
b1hat <- coef(glm_out)[2]
w <- pihat*(1-pihat)
se <- sqrt(1/(sum(w*experience^2) - sum(w*experience)^2/sum(w)))
lo <- b1hat - 1.96 * se
up <- b1hat + 1.96 * se
c(lo,up)
experience experience 
0.03412491 0.28884692 
# CIs for both beta0 and beta1 automatically from glm_out
confint.default(glm_out)
                  2.5 %     97.5 %
(Intercept) -5.52797622 -0.5914155
experience   0.03412744  0.2888444

Testing whether the slope coefficient is zero

To test $H_0\colon \beta_1 = 0$ versus $H_1\colon \beta_1 \neq 0$, do:

  1. Compute $Z_{\text{test}} = \hat{\beta}_1 / \widehat{\text{se}}(\hat{\beta}_1)$.

  2. Reject $H_0$ at $\alpha$ if $|Z_{\text{test}}| > z_{\alpha/2}$.

  3. Obtain the p value as $2(1 - P(Z \leq |Z_{\text{test}}|))$, where $Z \sim \text{Normal}(0,1)$.

The summary() function on the glm() output prints this p value.
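
As a sanity check, the z statistic and p value from summary() can be reproduced by hand:

est <- coef(summary(glm_out))["experience", "Estimate"]
se  <- coef(summary(glm_out))["experience", "Std. Error"]
z   <- est/se                  # Wald z statistic
c(z, 2*(1 - pnorm(abs(z))))    # compare with the summary() output above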

Odds ratios

  • Let $\pi_0$ and $\pi_1$ be the success probabilities under an initial and an altered condition, respectively.

  • Then we call the ratio $\dfrac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)}$ the odds ratio associated with the change from the initial to the altered condition.

  • The odds ratio is the factor by which the odds are multiplied when the initial condition is changed to the altered condition.

Interpreting the logistic regression parameters

  • Let $\pi_0$ and $\pi_1$ be the “success” probabilities at $x_0$ and $x_0+1$.

  • Then we have the two equations

    1. $\log\left(\dfrac{\pi_0}{1-\pi_0}\right) = \beta_0 + \beta_1 x_0$
    2. $\log\left(\dfrac{\pi_1}{1-\pi_1}\right) = \beta_0 + \beta_1 (x_0+1)$
  • Subtracting the first equation from the second gives $\beta_1 = \log\left(\dfrac{\pi_1}{1-\pi_1}\right) - \log\left(\dfrac{\pi_0}{1-\pi_0}\right) = \log\left(\dfrac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)}\right)$.

  • The quantity $\dfrac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)}$ is called an odds ratio.

  • So $\beta_1$ is the log of the odds ratio associated with a unit increase in $x$.

Odds ratio from a unit increase in x

  • From the previous slide, we have $e^{\beta_1} = \dfrac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)}$.

  • We can build a CI for $e^{\beta_1}$ by exponentiating the CI for $\beta_1$.

  • This gives the CI $\left[e^{\hat{\beta}_1 - z_{\alpha/2}\widehat{\text{se}}(\hat{\beta}_1)},\; e^{\hat{\beta}_1 + z_{\alpha/2}\widehat{\text{se}}(\hat{\beta}_1)}\right]$ for $e^{\beta_1}$.

  • A unit increase in $x$ multiplies the odds of success by the factor $e^{\beta_1}$.

  • What if the CI for $e^{\beta_1}$ contains 1?

Programming task data (cont)

exp(confint.default(glm_out,parm = "experience"))
              2.5 %   97.5 %
experience 1.034716 1.334884

Each additional month of experience increases the odds of completing the programming task by a factor of 1.035 to 1.335, with 95% confidence.

Residuals for logistic regression

  • Ordinary residuals $Y_i - \hat{\pi}_i$ cannot be Normally distributed.
  • In GLMs, one looks at special residuals called deviance residuals.
  • In logistic regression, the deviance residuals are defined as $\hat{d}_i = \text{sign}(Y_i - \hat{\pi}_i)\sqrt{-2\left[Y_i\log\hat{\pi}_i + (1-Y_i)\log(1-\hat{\pi}_i)\right]}$ for $i=1,\dots,n$. (A hand computation follows this list.)
  • These are not Normal either, but they are useful for assessing model fit.
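
To make the definition concrete, here is a sketch computing the deviance residuals by hand and checking them against R’s built-in version:

pihat <- glm_out$fitted.values
dhat  <- sign(success - pihat) *
  sqrt(-2*(success*log(pihat) + (1 - success)*log(1 - pihat)))
max(abs(dhat - residuals(glm_out, type = "deviance")))  # should be near zero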

Programming task data (cont)

plot(glm_out,which = 1)

Residuals versus fitted values plot

Checking model fit with a simulated envelope

The simulated envelope method is described in Kutner et al. (2005):

  • Fit the logistic regression model and obtain $\hat{\pi}_1,\dots,\hat{\pi}_n$.
  • Obtain the deviance residuals; sort them as $\hat{d}_{(1)} < \hat{d}_{(2)} < \dots < \hat{d}_{(n)}$.
  • Generate many new data sets $Y_i^* \sim \text{Bernoulli}(\hat{\pi}_i)$, $i=1,\dots,n$.
  • For each new data set, obtain sorted deviance residuals $\hat{d}^*_{(1)} < \hat{d}^*_{(2)} < \dots < \hat{d}^*_{(n)}$.
  • Plot the $\hat{d}_{(i)}$ as well as the 0.025 and 0.975 quantiles and the mean of the $\hat{d}^*_{(i)}$ for each $i$ (it doesn’t matter what is chosen as the x-axis).
  • The quantiles of the $\hat{d}^*_{(i)}$ make a band. If the model fits, then the $\hat{d}_{(i)}$ should lie within the band and close to the mean.

Asks: If the model is correct, how would the deviance residuals behave?
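
Before using the glmtoolbox implementation below, it may help to see a hand-rolled sketch of the recipe above; the number of simulated data sets B = 500 is our choice:

set.seed(1)
B <- 500
pihat <- glm_out$fitted.values
n <- length(pihat)
d_sim <- matrix(0, B, n)
for(b in 1:B){
  ystar <- rbinom(n, 1, pihat)  # new data set generated under the fitted model
  fit_b <- glm(ystar ~ experience, family = "binomial")
  d_sim[b,] <- sort(residuals(fit_b, type = "deviance"))
}
band_lo <- apply(d_sim, 2, quantile, 0.025)
band_hi <- apply(d_sim, 2, quantile, 0.975)
band_mn <- colMeans(d_sim)
d_obs <- sort(residuals(glm_out, type = "deviance"))
plot(1:n, d_obs, pch = 19, xlab = "index", ylab = "sorted deviance residuals")
lines(1:n, band_lo, lty = 2); lines(1:n, band_hi, lty = 2); lines(1:n, band_mn)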

Programming task data (cont)

library(glmtoolbox) # first time run install.packages("glmtoolbox")
envelope(glm_out,type = "deviance")


experience2 <- (experience - mean(experience))^2
envelope(glm(success ~ experience2, family = "binomial"),type = "deviance")

German credit score data from Hofmann (1994)

Response is credit rating (good/bad), various predictors.

library(foreign) # credit-g dataset from https://www.openml.org/
Warning: package 'foreign' was built under R version 4.4.1
link <- url("https://people.stat.sc.edu/gregorkb/data/dataset_31_credit-g.arff")
credg <- read.arff(link)
colnames(credg)
 [1] "checking_status"        "duration"               "credit_history"        
 [4] "purpose"                "credit_amount"          "savings_status"        
 [7] "employment"             "installment_commitment" "personal_status"       
[10] "other_parties"          "residence_since"        "property_magnitude"    
[13] "age"                    "other_payment_plans"    "housing"               
[16] "existing_credits"       "job"                    "num_dependents"        
[19] "own_telephone"          "foreign_worker"         "class"                 

summary(credg[,1:3])
    checking_status    duration                           credit_history
 <0         :274    Min.   : 4.0   all paid                      : 49   
 >=200      : 63    1st Qu.:12.0   critical/other existing credit:293   
 0<=X<200   :269    Median :18.0   delayed previously            : 88   
 no checking:394    Mean   :20.9   existing paid                 :530   
                    3rd Qu.:24.0   no credits/all paid           : 40   
                    Max.   :72.0                                        

Logistic multiple regression model

Observe $(x_1,Y_1),\dots,(x_n,Y_n)$, where $x_i = (x_{i1},\dots,x_{ip})^T$, and assume $Y_i \sim \text{Bernoulli}(\pi_i)$ with $\log\left(\dfrac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$, for $i=1,\dots,n$, where

  • $Y_i$ is the response for observation $i$.
  • $x_{i1},\dots,x_{ip}$ are the values of the predictors for obs $i$.
  • $\pi_i$ is the probability of “success” for observation $i$.
  • $\beta_0$ is an intercept and $\beta_1,\dots,\beta_p$ are slope parameters.
  • $\pi_i/(1-\pi_i)$ is the odds of “success” for obs $i$.
  • $\log(\pi_i/(1-\pi_i))$ is the log-odds for obs $i$.

So we assume the log-odds are a linear function of the predictors.

Interpreting multiple logistic regression parameters

  • Let $\pi_{0j}$ and $\pi_{1j}$ be the “success” probabilities at $x_{0j}$ and $x_{0j}+1$, with $x_{0k}$ fixed for all $k \neq j$.
  • Then we have the two equations
    1. $\log\left(\dfrac{\pi_{0j}}{1-\pi_{0j}}\right) = \beta_0 + \sum_{k\neq j}\beta_k x_{0k} + \beta_j x_{0j}$
    2. $\log\left(\dfrac{\pi_{1j}}{1-\pi_{1j}}\right) = \beta_0 + \sum_{k\neq j}\beta_k x_{0k} + \beta_j (x_{0j}+1)$
  • Subtracting the first equation from the second gives $\beta_j = \log\left(\dfrac{\pi_{1j}/(1-\pi_{1j})}{\pi_{0j}/(1-\pi_{0j})}\right)$ and $e^{\beta_j} = \dfrac{\pi_{1j}/(1-\pi_{1j})}{\pi_{0j}/(1-\pi_{0j})}$.
  • So $\beta_j$ is the log of the odds ratio associated with a unit increase in $x_j$ with all other predictors held fixed.

German credit score data (cont)

glm_out <- glm(class ~ ., family = "binomial", data = credg)
summary(glm_out)

Call:
glm(formula = class ~ ., family = "binomial", data = credg)

Coefficients:
                                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                                   1.505e+00  1.248e+00   1.206 0.227801    
checking_status>=200                          9.657e-01  3.692e-01   2.616 0.008905 ** 
checking_status0<=X<200                       3.749e-01  2.179e-01   1.720 0.085400 .  
checking_statusno checking                    1.712e+00  2.322e-01   7.373 1.66e-13 ***
duration                                     -2.786e-02  9.296e-03  -2.997 0.002724 ** 
credit_historycritical/other existing credit  1.579e+00  4.381e-01   3.605 0.000312 ***
credit_historydelayed previously              9.965e-01  4.703e-01   2.119 0.034105 *  
credit_historyexisting paid                   7.295e-01  3.852e-01   1.894 0.058238 .  
credit_historyno credits/all paid             1.434e-01  5.489e-01   0.261 0.793921    
purposedomestic appliance                    -2.173e-01  8.041e-01  -0.270 0.786976    
purposeeducation                             -7.764e-01  4.660e-01  -1.666 0.095718 .  
purposefurniture/equipment                    5.152e-02  3.543e-01   0.145 0.884391    
purposenew car                               -7.401e-01  3.339e-01  -2.216 0.026668 *  
purposeother                                  7.487e-01  7.998e-01   0.936 0.349202    
purposeradio/tv                               1.515e-01  3.370e-01   0.450 0.653002    
purposerepairs                               -5.237e-01  5.933e-01  -0.883 0.377428    
purposeretraining                             1.319e+00  1.233e+00   1.070 0.284625    
purposeused car                               9.264e-01  4.409e-01   2.101 0.035645 *  
credit_amount                                -1.283e-04  4.444e-05  -2.887 0.003894 ** 
savings_status>=1000                          1.339e+00  5.249e-01   2.551 0.010729 *  
savings_status100<=X<500                      3.577e-01  2.861e-01   1.250 0.211130    
savings_status500<=X<1000                     3.761e-01  4.011e-01   0.938 0.348476    
savings_statusno known savings                9.467e-01  2.625e-01   3.607 0.000310 ***
employment>=7                                 2.097e-01  2.947e-01   0.712 0.476718    
employment1<=X<4                              1.159e-01  2.423e-01   0.478 0.632415    
employment4<=X<7                              7.641e-01  3.051e-01   2.504 0.012271 *  
employmentunemployed                         -6.691e-02  4.270e-01  -0.157 0.875475    
installment_commitment                       -3.301e-01  8.828e-02  -3.739 0.000185 ***
personal_statusmale div/sep                  -2.755e-01  3.865e-01  -0.713 0.476040    
personal_statusmale mar/wid                   9.162e-02  3.118e-01   0.294 0.768908    
personal_statusmale single                    5.406e-01  2.102e-01   2.572 0.010113 *  
other_partiesguarantor                        1.415e+00  5.685e-01   2.488 0.012834 *  
other_partiesnone                             4.360e-01  4.101e-01   1.063 0.287700    
residence_since                              -4.776e-03  8.641e-02  -0.055 0.955920    
property_magnitudelife insurance             -8.690e-02  2.313e-01  -0.376 0.707115    
property_magnitudeno known property          -5.359e-01  4.017e-01  -1.334 0.182211    
property_magnitudereal estate                 1.945e-01  2.360e-01   0.824 0.409743    
age                                           1.454e-02  9.222e-03   1.576 0.114982    
other_payment_plansnone                       6.463e-01  2.391e-01   2.703 0.006871 ** 
other_payment_plansstores                     1.232e-01  4.119e-01   0.299 0.764878    
housingown                                   -2.402e-01  4.503e-01  -0.534 0.593687    
housingrent                                  -6.839e-01  4.770e-01  -1.434 0.151657    
existing_credits                             -2.721e-01  1.895e-01  -1.436 0.151109    
jobskilled                                   -7.524e-02  2.845e-01  -0.264 0.791419    
jobunemp/unskilled non res                    4.795e-01  6.623e-01   0.724 0.469086    
jobunskilled resident                        -5.666e-02  3.501e-01  -0.162 0.871450    
num_dependents                               -2.647e-01  2.492e-01  -1.062 0.288249    
own_telephoneyes                              3.000e-01  2.013e-01   1.491 0.136060    
foreign_workeryes                            -1.392e+00  6.258e-01  -2.225 0.026095 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1221.73  on 999  degrees of freedom
Residual deviance:  895.82  on 951  degrees of freedom
AIC: 993.82

Number of Fisher Scoring iterations: 5

Note that glm() estimates three coefficients for checking_status.

summary(credg$checking_status)
         <0       >=200    0<=X<200 no checking 
        274          63         269         394 

glm() encodes the levels of the categorical predictor with numeric indicator variables, taking the first level (<0) as the baseline: $x_{i1} = \begin{cases}1 & 200 \leq \text{checking}\\ 0 & \text{otherwise}\end{cases}$  $x_{i2} = \begin{cases}1 & 0 \leq \text{checking} < 200\\ 0 & \text{otherwise}\end{cases}$  $x_{i3} = \begin{cases}1 & \text{no checking}\\ 0 & \text{otherwise}\end{cases}$

Likewise for other categorical predictors.
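
One can inspect this dummy coding directly; model.matrix() shows the indicator columns R builds:

head(model.matrix(~ checking_status, data = credg))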

Deviances replace error sums of squares in GLMs

  • The deviance is the sum of squared deviance residuals, $\sum_{i=1}^n \hat{d}_i^2$.
  • In logistic regression the deviance can be computed as $\text{Dev} = -2\sum_{i=1}^n\left[Y_i\log\hat{\pi}_i + (1-Y_i)\log(1-\hat{\pi}_i)\right]$.
  • Full-reduced model test: reject $H_0\colon \beta_j = 0$ for all $j \in D$ if $\text{Dev(Reduced)} - \text{Dev(Full)} > \chi^2_{s,\alpha}$, where $s$ is the number of predictors in $D$ (requires large $n$).

German credit score data (cont)

Test whether any level of checking status is important to the credit score.

credg_red <- credg[,-1] # remove checking_status column

glm_full <- glm(class ~ ., family = "binomial", data = credg)
glm_red <-  glm(class ~ ., family = "binomial", data = credg_red)

p <- length(coef(glm_full)) - 1
s <- nlevels(credg$checking_status) - 1

1 - pchisq(glm_red$deviance - glm_full$deviance,s)
[1] 2.731149e-14
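
The same comparison is available through the built-in anova() method for glm fits:

anova(glm_red, glm_full, test = "Chisq")  # full-vs-reduced deviance test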

Variable selection in logistic regression

  • We may want to discard some of our predictors.
  • One way is to add/remove variables stepwise according to AIC.
  • Can do this just as we did in multiple linear regression.
  • Be cautious about making inferences after selecting a model.

German credit score data (cont)

glm_all <- glm(class ~ ., family = "binomial", data = credg)
step_back <- step(glm_all,
                  direction = "backward",
                  scope = formula(glm_all),
                  trace = 0) # suppress printed output; step() uses AIC by default
summary(step_back)

Call:
glm(formula = class ~ checking_status + duration + credit_history + 
    purpose + credit_amount + savings_status + installment_commitment + 
    personal_status + other_parties + age + other_payment_plans + 
    housing + own_telephone + foreign_worker, family = "binomial", 
    data = credg)

Coefficients:
                                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                                   4.838e-01  1.017e+00   0.476 0.634362    
checking_status>=200                          1.024e+00  3.626e-01   2.824 0.004739 ** 
checking_status0<=X<200                       3.900e-01  2.121e-01   1.839 0.065928 .  
checking_statusno checking                    1.718e+00  2.281e-01   7.531 5.05e-14 ***
duration                                     -2.568e-02  8.940e-03  -2.872 0.004074 ** 
credit_historycritical/other existing credit  1.373e+00  4.041e-01   3.397 0.000680 ***
credit_historydelayed previously              7.910e-01  4.488e-01   1.762 0.077985 .  
credit_historyexisting paid                   7.115e-01  3.788e-01   1.879 0.060305 .  
credit_historyno credits/all paid            -1.188e-01  5.268e-01  -0.225 0.821612    
purposedomestic appliance                    -2.576e-01  7.763e-01  -0.332 0.740041    
purposeeducation                             -9.262e-01  4.569e-01  -2.027 0.042628 *  
purposefurniture/equipment                   -4.216e-02  3.415e-01  -0.123 0.901748    
purposenew car                               -7.827e-01  3.272e-01  -2.392 0.016752 *  
purposeother                                  6.523e-01  7.832e-01   0.833 0.404946    
purposeradio/tv                               1.368e-01  3.288e-01   0.416 0.677335    
purposerepairs                               -6.402e-01  5.808e-01  -1.102 0.270365    
purposeretraining                             1.382e+00  1.240e+00   1.114 0.265228    
purposeused car                               8.246e-01  4.288e-01   1.923 0.054495 .  
credit_amount                                -1.294e-04  4.221e-05  -3.066 0.002169 ** 
savings_status>=1000                          1.289e+00  5.072e-01   2.542 0.011008 *  
savings_status100<=X<500                      3.282e-01  2.767e-01   1.186 0.235477    
savings_status500<=X<1000                     4.304e-01  3.933e-01   1.094 0.273900    
savings_statusno known savings                9.628e-01  2.570e-01   3.746 0.000179 ***
installment_commitment                       -3.299e-01  8.554e-02  -3.857 0.000115 ***
personal_statusmale div/sep                  -2.872e-01  3.763e-01  -0.763 0.445327    
personal_statusmale mar/wid                   1.297e-01  3.061e-01   0.424 0.671803    
personal_statusmale single                    5.356e-01  1.975e-01   2.712 0.006686 ** 
other_partiesguarantor                        1.528e+00  5.581e-01   2.737 0.006193 ** 
other_partiesnone                             4.874e-01  3.997e-01   1.220 0.222648    
age                                           1.309e-02  8.398e-03   1.559 0.118967    
other_payment_plansnone                       6.995e-01  2.350e-01   2.976 0.002916 ** 
other_payment_plansstores                     7.864e-02  4.033e-01   0.195 0.845394    
housingown                                    2.918e-01  2.887e-01   1.011 0.312127    
housingrent                                  -1.497e-01  3.411e-01  -0.439 0.660793    
own_telephoneyes                              2.794e-01  1.842e-01   1.516 0.129394    
foreign_workeryes                            -1.382e+00  6.207e-01  -2.227 0.025925 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1221.7  on 999  degrees of freedom
Residual deviance:  910.5  on 964  degrees of freedom
AIC: 982.5

Number of Fisher Scoring iterations: 5

envelope(step_back,type="deviance")

Classification with the logistic regression model

Consider classifying the observations as 1 or 0 according to π^i:

  • Choose a threshold $c \in [0,1]$ and make the classification $\hat{Y}_i = \begin{cases}1, & \hat{\pi}_i \geq c\\ 0, & \hat{\pi}_i < c.\end{cases}$

  • We can compute the misclassification rate $\#\{\hat{Y}_i \neq Y_i\}/n$.

  • We can compute the observed true positive and false positive rates $TP = \dfrac{\#\{\hat{Y}_i = 1,\, Y_i = 1\}}{\#\{Y_i = 1\}}$ and $FP = \dfrac{\#\{\hat{Y}_i = 1,\, Y_i = 0\}}{\#\{Y_i = 0\}}$.

  • Plotting $TP$ against $FP$ over all $c \in [0,1]$ traces out the receiver operating characteristic (ROC) curve.

German credit score data (cont)

Compute TP and FP over a range of thresholds c. Plot ROC curve.

Y <- ifelse(credg$class == "good",1,0)
pi_hat <- step_back$fitted.values

n1 <- sum(Y == 1)
n0 <- sum(Y == 0)

cc <- sort(c(0,pi_hat,1))
TP <- FP <- numeric(length(cc))
for(j in 1:length(cc)){
  
  Yhat <- pi_hat >= cc[j]
  TP[j] <- sum(Yhat == 1 & Y == 1) / n1
  FP[j] <- sum(Yhat == 1 & Y == 0) / n0
  
}

plot(TP ~ FP, type = "l", main = "ROC curve")
abline(0,1,lty = 3)

An ROC curve


Can use ROC curves to compare models.

Two ROC curves

Training and testing data sets

  • Overfitting is fitting a model that performs well on the data used to fit it but poorly on new data.

  • To avoid this, split the data into training and testing data sets $\{(x_i^{\text{train}}, Y_i^{\text{train}})\}_{i \in I_{\text{train}}}$ and $\{(x_i^{\text{test}}, Y_i^{\text{test}})\}_{i \in I_{\text{test}}}$, where $I_{\text{train}} \cap I_{\text{test}} = \emptyset$ and $I_{\text{train}} \cup I_{\text{test}} = \{1,\dots,n\}$.

  • Fit models on the training data. Compare their performance on the testing data.

  • For example, compute the misclassification rate or draw the ROC curve for the testing data.

German credit score data (cont)

set.seed(1)
n <- nrow(credg)
n_train <- ceiling(.5*n)
n_test <- n - n_train
ind_train <- sample(n,n_train) # randomly select a training set

credg_train <- credg[ind_train,]
credg_test <- credg[-ind_train,]

glm_train <- glm(class ~ ., family = "binomial", data = credg_train)

ypred_train <- glm_train$fitted.values >= 0.5
y_train <- as.numeric(credg_train$class == "good")
misclass_train <- mean(ypred_train != y_train)
misclass_train
[1] 0.204
pihat_test <- predict(glm_train,newdata = credg_test,type="response")

ypred_test <- pihat_test >= 0.5
y_test <- as.numeric(credg_test$class == "good")

misclass_test <- mean(ypred_test != y_test)
misclass_test
[1] 0.258
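
The slides compute only the test misclassification rate; a sketch of the testing-data ROC curve, reusing the earlier TP/FP loop, might look like this:

cc <- sort(c(0, pihat_test, 1))
TP <- FP <- numeric(length(cc))
for(j in 1:length(cc)){
  yhat <- pihat_test >= cc[j]                     # classify at threshold cc[j]
  TP[j] <- sum(yhat == 1 & y_test == 1) / sum(y_test == 1)
  FP[j] <- sum(yhat == 1 & y_test == 0) / sum(y_test == 0)
}
plot(TP ~ FP, type = "l", main = "ROC curve (testing data)")
abline(0, 1, lty = 3)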

References

Hofmann, Hans. 1994. “Statlog (German Credit Data).” UCI Machine Learning Repository.
Kutner, Michael H., Christopher J. Nachtsheim, John Neter, and William Li. 2005. Applied Linear Statistical Models. McGraw-Hill.