Difference between revisions of "Logistic regression"
Jump to navigation
Jump to search
m (→GNU/Octave example: tweak) |
|||
| Line 1: | Line 1: | ||
'''Logistic regression''' is a type of curve fitting. It used for discrete outcome variables, e.g. | '''Logistic regression''' is a type of curve fitting. It used for discrete outcome variables, e.g. pass or fail. | ||
==GNU/Octave example== | ==GNU/Octave example== | ||
Revision as of 03:03, 18 March 2017
Logistic regression is a type of curve fitting. It used for discrete outcome variables, e.g. pass or fail.
GNU/Octave example
% -------------------------------------------------------------------------------------------------
% TO RUN THIS FROM WITHIN OCTAVE
% run logistic_regression_example.m
%
% TO RUN FROM THE COMMAND LINE
% octave logistic_regression_example.m > tmp.txt
% -------------------------------------------------------------------------------------------------
clear all;
close all;
% Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y = [ 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 ];
x = [ 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 ];
y=transpose(y);
x=transpose(x);
[theta, beta, dev, dl, d2l, gamma] = logistic_regression(y,x,1)
% calculate standard error
se = sqrt (diag (inv (-d2l)))
% create array
% [[ alpha se_alpha zvalue_alpha zvalue_alpha^2 P_Wald_alpha ]
% [ beta_1 se_beta_1 zvalue_beta_1 zvalue_beta_1^2 P_Wald_beta_1 ]
% [ beta_2 se_beta_2 zvalue_beta_2 zvalue_beta_1^2 P_Wald_beta_1 ]
% [ ... ... ... ... ... ]
% [ beta_n se_beta_n zvalue_beta_n zvalue_beta_1^2 P_Wald_beta_n ]]
logit_arr=zeros(size(beta,1)+1,5);
logit_arr(1,1)=theta;
logit_arr(2:end,1)=beta;
logit_arr(1:end,2)=se;
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2); % zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3); % zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1) % Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/
% calculate the probability for a range of values
studytime_arr= [ 1, 2, 3, 4, 5]
len_studytime_arr=size(studytime_arr,2);
% As per the manual ( https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html ) GNU/Octave function fits:
% logit (gamma_i (x)) = theta_i - beta' * x, i = 1 ... k-1
for ctr=1:len_studytime_arr
P_logit(ctr)=1/(1+exp(theta-studytime_arr(ctr)*beta));
end
% print to screen output
P_logit
logit_arr
% create plot
len_x = size(x,1)
for ctr=1:len_x
P_fitted(ctr)=1/(1+exp(theta-x(ctr)*beta));
end
plot(x,y,'o')
hold on;
grid on;
plot(x,P_fitted,'-')
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');