Handling Blank Responses, Coding, Categorization and Data Entry
These activities ensure the accuracy of the data and its conversion from raw form to reduced data.
Exploring, Displaying and Examining Data
Breaking down, inspecting and rearranging data to start the search for meaningful descriptions, patterns and relationships.
Coding Rules
Categories should be:
Appropriate to the research problem
Exhaustive
Mutually exclusive
Derived from one classification principle
Appropriateness
Let's say your population is students at institutions of higher learning.
What is your age group?
15 – 25 years
26 – 35 years
36 – 45 years
Above 45 years
Exhaustiveness
What is your race?
Malay
Chinese
Indians
Others
Mutual Exclusivity
What is your occupation type?
Professional
Managerial
Sales
Clerical
Others
Crafts
Operatives
Unemployed
Housewife
Single Dimension
What is your occupation type?
Professional
Managerial
Sales
Clerical
Others
Crafts
Operatives
Unemployed
Housewife
Coding Open-ended Responses
Coding Open Ended Questions
Handling Blank Responses
How do we take care of missing responses?
If > 25% missing, throw out the questionnaire
Other ways of handling:
• Use the midpoint of the scale
• Ignore (system missing)
• Mean of those responding
• Mean of the respondent
• Random number
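The rules above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure: the function names, the `None` marker for a blank and the sample data are all assumptions, and the scale is taken to run from 1 to 7.

```python
from statistics import mean

MISSING = None  # assumed marker for a blank response

def too_many_blanks(row, limit=0.25):
    """True if more than `limit` of the items in a questionnaire are blank."""
    return sum(v is MISSING for v in row) / len(row) > limit

def fill_midpoint(row, low=1, high=7):
    """Replace blanks with the midpoint of the response scale."""
    mid = (low + high) / 2
    return [mid if v is MISSING else v for v in row]

def fill_respondent_mean(row):
    """Replace blanks with the mean of this respondent's answered items."""
    m = mean(v for v in row if v is not MISSING)
    return [m if v is MISSING else v for v in row]

def fill_item_means(rows):
    """Replace blanks with the mean of those who answered the same item."""
    col_means = [mean(v for v in col if v is not MISSING) for col in zip(*rows)]
    return [[col_means[j] if v is MISSING else v for j, v in enumerate(row)]
            for row in rows]

# Hypothetical questionnaires with four items each
data = [
    [5, 6, None, 4],
    [3, None, None, 2],  # 50% blank, so this questionnaire is thrown out
    [7, 6, 5, None],
]
kept = [row for row in data if not too_many_blanks(row)]
```

Each fill function returns a new list, so the raw responses stay available for comparison.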
How to Select a Test
Measurement Scale: Nominal
  One-Sample Case: Binomial; χ² one-sample test
  Two-Sample Tests, Related Samples: McNemar
  Two-Sample Tests, Independent Samples: Fisher exact test; χ² two-samples test
  k-Sample Tests, Related Samples: Cochran Q
  k-Sample Tests, Independent Samples: χ² for k samples
Measurement Scale: Ordinal
  One-Sample Case: Kolmogorov-Smirnov one-sample test; Runs test
  Two-Sample Tests, Related Samples: Sign test; Wilcoxon matched-pairs test
  Two-Sample Tests, Independent Samples: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
  k-Sample Tests, Related Samples: Friedman two-way ANOVA
  k-Sample Tests, Independent Samples: Median extension; Kruskal-Wallis one-way ANOVA
Measurement Scale: Interval and Ratio
  One-Sample Case: t-test; Z test
  Two-Sample Tests, Related Samples: t-test for paired samples
  Two-Sample Tests, Independent Samples: t-test; Z test
  k-Sample Tests, Related Samples: Repeated-measures ANOVA
  k-Sample Tests, Independent Samples: One-way ANOVA; n-way ANOVA
Data Transformation
Weights
Assigning numbers to responses on a pre-determined rule
Respecification of the Variable
Transforming existing data to form new variables or items
Recode
Compute
Scale Transformation
Reasons for Transformation
to improve interpretation and compatibility with other data sets
to enhance symmetry and stabilize spread
to improve the linear relationship between the variables (standardized score)
$z = \frac{X_i - \bar{X}}{s}$
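As a minimal sketch of the standardized score, the snippet below subtracts the mean and divides by the sample standard deviation. The function name and the age values are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: z_i = (x_i - mean) / s, with s the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

ages = [19, 25, 33, 41, 53]  # hypothetical raw values
z = z_scores(ages)
```

After the transformation the scores have mean 0 and standard deviation 1, which is what makes variables measured on different scales comparable.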
Data Transformation
Section 1 - Computer Anxiety
Computers make me feel uncomfortable                               1 2 3 4 5 6 7
I get a sinking feeling when I think of trying to use a computer   1 2 3 4 5 6 7
Computers scare me                                                 1 2 3 4 5 6 7
I feel comfortable using a computer                                1 2 3 4 5 6 7
Working with a computer makes me nervous                           1 2 3 4 5 6 7
Sample SPSS Codebook
Research Model
Attitude (5 items)
Subjective Norm (4 items)
Perceived Behavioral Control (4 items)
Intention to Share Information (5 items)
Actual Sharing of Information (3 items)
Factor Analysis - Command
Assumptions in FA
Question: How valid is our instrument?
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .882
Bartlett's Test of Sphericity    Approx. Chi-Square      2878.230
                                 df                      78
                                 Sig.                    .000
KMO should be > 0.5
Bartlett's Test should be significant, i.e., p < 0.05
Measure of Sampling Adequacy
MSA               Comment
0.80 and above    Meritorious
0.70 – 0.80       Middling
0.60 – 0.70       Mediocre
0.50 – 0.60       Miserable
Below 0.50        Unacceptable
Assumptions in FA
Anti-image Matrices
Anti-image Covariance
Cronbach's Alpha if Item Deleted: .951, .961, .959, .962, .958
Table in Report
Variable     N of Items   Item Deleted   Alpha
Attitude     5            -              0.977
SN           4            -              0.912
Pbcontrol    4            -              0.919
Intention    5            -              0.966
Actual       3            -              0.771
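The alpha values in this table come from SPSS. As a rough sketch of what is being computed, Cronbach's alpha can be written as k/(k - 1) times one minus the ratio of summed item variances to the variance of the total score. The function name and the item scores below are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, all from the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

# Hypothetical answers of five respondents to three items
item1 = [4, 5, 3, 4, 2]
item2 = [4, 4, 3, 5, 2]
item3 = [5, 5, 2, 4, 1]
alpha = cronbach_alpha([item1, item2, item3])
```

Dropping one item from the list and calling the function again gives the "Alpha if Item Deleted" figure for that item.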
Example - Recoding
Perceived Enjoyment
PE1   The actual process of using Instant Messenger is pleasant      1 2 3 4 5 6 7
PE2   I have fun using Instant Messenger                             1 2 3 4 5 6 7
PE3   Using Instant Messenger bores me                               1 2 3 4 5 6 7
PE4   Using Instant Messenger provides me with a lot of enjoyment    1 2 3 4 5 6 7
PE5   I enjoy using Instant Messenger                                1 2 3 4 5 6 7
Recoding - Command
Data before Transformation
Computing New Variable - Command
Data after Transformation
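The recode and compute steps can be sketched outside SPSS as follows. The sketch assumes that PE3 ("bores me") is the negatively worded item being reverse-coded and that the new variable is the mean of the five items; the function names and the respondent's answers are hypothetical.

```python
def reverse_code(score, low=1, high=7):
    """Flip a score on a low..high scale, so 1 becomes 7, 2 becomes 6, and so on."""
    return low + high - score

def perceived_enjoyment(responses):
    """Mean of PE1..PE5 after reverse-coding the negatively worded PE3."""
    recoded = dict(responses)
    recoded["PE3"] = reverse_code(responses["PE3"])
    return sum(recoded.values()) / len(recoded)

# Hypothetical respondent
respondent = {"PE1": 6, "PE2": 7, "PE3": 2, "PE4": 6, "PE5": 7}
score = perceived_enjoyment(respondent)
```

Without the recode, a respondent who enjoys Instant Messenger would have the PE3 answer pull the scale score down instead of up.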
Frequencies - Command
Frequencies
Question:
1. Is our sample representative?
2. Are there data entry errors?
Gender
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Male     144         75.0      75.0            75.0
        Female   48          25.0      25.0            100.0
        Total    192         100.0     100.0
Current Position
                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Technician      34          17.7      17.7            17.7
        Engineer        66          34.4      34.4            52.1
        Sr Engineer     54          28.1      28.1            80.2
        Manager         32          16.7      16.7            96.9
        Above manager   6           3.1       3.1             100.0
        Total           192         100.0     100.0
Table in Report
                           Frequency   Percentage
Gender     Male            144         75.0
           Female          48          25.0
Position   Technician      34          17.7
           Engineer        66          34.4
           Sr Engineer     54          28.1
           Manager         32          16.7
           Above manager   6           3.1
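A frequency table of this kind is easy to reproduce as a cross-check. The sketch below rebuilds the Gender column from the reported counts; the function name is an assumption, and rows are ordered by count rather than by value code as SPSS does.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percent, cumulative percent) rows."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for category, count in counts.most_common():
        percent = 100 * count / n
        cumulative += percent
        rows.append((category, count, round(percent, 1), round(cumulative, 1)))
    return rows

# Gender column rebuilt from the counts reported above
gender = ["Male"] * 144 + ["Female"] * 48
table = frequency_table(gender)
```

A category that should not exist, such as a mistyped code, shows up as an extra row, which is how the frequencies output helps spot data entry errors.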
Descriptives - Command
Descriptives
Descriptive Statistics
                                     N     Minimum   Maximum   Mean     Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Age                                  192   19        53        33.39    8.823            .667 (.175)             -.557 (.349)
Years working in the organization    192   1         18        5.36     4.435            1.448 (.175)            1.333 (.349)
Total years of working experience    192   1         28        9.04     7.276            1.051 (.175)            -.025 (.349)
Attitude                             192   2.00      5.00      3.8104   .64548           -.480 (.175)            .242 (.349)
subjective                           192   2.00      5.00      3.7031   .67034           -.101 (.175)            .755 (.349)
Pbcontrol                            192   2.00      5.00      3.4792   .73672           .015 (.175)             -.028 (.349)
Intention                            192   2.00      5.00      3.8188   .63877           -.528 (.175)            .687 (.349)
Actual                               192   2.33      5.00      4.0625   .58349           -.361 (.175)            -.328 (.349)
Valid N (listwise)                   192
Question:
1. Is there variation in our data?
2. What is the level of the phenomenon we are measuring?
Table in Report
                      Mean   Std. Deviation
Attitude              3.81   0.65
Subjective Norm       3.70   0.67
Behavioral Control    3.48   0.74
Intention             3.82   0.64
Actual                4.06   0.58
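The standard errors of skewness (.175) and kurtosis (.349) in the Descriptives output are the same for every variable because they depend only on N. A short sketch using the usual large-sample formulas reproduces them for N = 192; the function names are assumptions.

```python
from math import sqrt

def se_skewness(n):
    """Standard error of the sample skewness for n cases."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of the sample kurtosis for n cases."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 192
# Ratio of the Age skewness statistic to its standard error
skew_ratio_age = 0.667 / se_skewness(n)
```

Dividing a skewness or kurtosis statistic by its standard error gives a rough z-value for judging departure from normality.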
Chi Square Test - Command
Crosstabulation
Question: Is level of sharing dependent on gender?
Gender * Intention Level Crosstabulation
Rows: Gender (Male, Female, Total). Each row reports Count, % within Gender, % within Intention Level and % of Total.
                               Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             8.934(b)   1    .003
Continuity Correction(a)       7.704      1    .006
Likelihood Ratio               11.274     1    .001
Fisher's Exact Test                                                    .002                   .001
Linear-by-Linear Association   8.888      1    .003
N of Valid Cases               192
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.
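For a 2x2 table the Pearson chi-square has a simple closed form, and with 1 degree of freedom its p-value can be obtained from the complementary error function. The sketch below uses hypothetical cell counts, since the crosstab counts are not reproduced here, and then checks the reported p-value for the statistic 8.934.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

chi2_example = chi_square_2x2(10, 20, 30, 40)  # hypothetical counts
p_reported = p_value_df1(8.934)                # statistic taken from the output above
```

The second call returns roughly .003, in line with the Asymp. Sig. shown for the Pearson chi-square.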
T-test - Command
Question: t-test (2 Independent)
Does intention to share vary by gender?
Group Statistics
            Gender   N     Mean     Std. Deviation   Std. Error Mean
Intention   Male     144   3.9000   .60302           .05025
            Female   48    3.5750   .68619           .09904
Independent Samples Test
Levene's Test for Equality of Variances: F = 3.591, Sig. = .060
t-test for Equality of Means
Intention                     t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.122   190      .002              .32500            .10410                  .11965         .53035
Equal variances not assumed   2.926   72.729   .005              .32500            .11106                  .10364         .54636
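Both rows of the Independent Samples Test can be recomputed from the Group Statistics alone. The sketch below does this for the pooled-variance and the unequal-variance (Welch) versions; the function names are assumptions and the inputs are the means, standard deviations and group sizes reported above.

```python
from math import sqrt

def pooled_t(m1, s1, n1, m2, s2, n2):
    """t and df assuming equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

def welch_t(m1, s1, n1, m2, s2, n2):
    """t and approximate df without assuming equal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

male = (3.9000, 0.60302, 144)    # mean, std. deviation, N
female = (3.5750, 0.68619, 48)
t_equal, df_equal = pooled_t(*male, *female)
t_welch, df_welch = welch_t(*male, *female)
```

Because Levene's test is not significant (Sig. = .060), the "equal variances assumed" row is the one normally read.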
Paired t-test - Command
Question: t-test (2 Dependent)
Are there differences between intention to share and actual sharing behavior?
Paired Samples Statistics
                     Mean     N     Std. Deviation   Std. Error Mean
Pair 1   Intention   3.8188   192   .63877           .04610
         Actual      4.0625   192   .58349           .04211
Paired Samples Correlations
                              N     Correlation   Sig.
Pair 1   Intention & Actual   192   .817          .000
Paired Samples Test
Paired Differences
                              Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
Pair 1   Intention - Actual   -.24375   .37326           .02694            -.29688        -.19062        -9.049   191   .000
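The paired t statistic is the mean difference divided by its standard error. A minimal sketch, using the paired differences reported above and an assumed function name:

```python
from math import sqrt

def paired_t(mean_diff, sd_diff, n):
    """Return (standard error, t, df) for a paired-samples t-test."""
    se = sd_diff / sqrt(n)
    return se, mean_diff / se, n - 1

se, t, df = paired_t(-0.24375, 0.37326, 192)
```

The negative sign only reflects the order of subtraction (Intention minus Actual): actual sharing is rated higher than intention.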
One Way ANOVA - Command
Question: One way ANOVA (k independent)
Does intention vary by position?
ANOVA
Intention
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   7.864            4     1.966         5.247   .001
Within Groups    70.068           187   .375
Total            77.933           191
Intention
Duncan(a,b)
                           Subset for alpha = .05
Current Position    N      1        2
Engineer            66     3.6424
Manager             32     3.6625
Technician          34     3.8941
Sr Engineer         54     4.0000
Above manager       6               4.5333
Sig.                       .101     1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 19.157.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
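Two figures in this output follow directly from others: F is the ratio of the two mean squares, and the harmonic mean sample size in footnote a comes from the five group sizes. A short sketch with an assumed function name:

```python
from statistics import harmonic_mean

def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

ms_b, ms_w, f = f_ratio(7.864, 4, 70.068, 187)
n_harmonic = harmonic_mean([66, 32, 34, 54, 6])  # group sizes from the Duncan table
```

The harmonic mean (about 19.2) is pulled towards the smallest group of 6, which is why the post hoc test warns that Type I error levels are not guaranteed.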
Mann-Whitney - Command
Question: Mann-Whitney (2 independent)
Do the variables vary by gender?
Ranks
            Gender   N     Mean Rank   Sum of Ranks
Intention   Male     144   103.64      14924.00
            Female   48    75.08       3604.00
            Total    192
Test Statistics(a)
                         Intention
Mann-Whitney U           2428.000
Wilcoxon W               3604.000
Z                        -3.266
Asymp. Sig. (2-tailed)   .001
a. Grouping Variable: Gender
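The U statistic can be recovered from the Ranks table: subtract the smallest possible rank sum, n(n + 1)/2, from a group's sum of ranks. A minimal sketch with an assumed function name:

```python
def u_from_rank_sum(rank_sum, n):
    """Mann-Whitney U for a group with the given sum of ranks and size n."""
    return rank_sum - n * (n + 1) / 2

n_male, n_female = 144, 48
u_male = u_from_rank_sum(14924.0, n_male)
u_female = u_from_rank_sum(3604.0, n_female)
u = min(u_male, u_female)  # the smaller value, matching the reported U
```

The two U values always add up to the product of the group sizes, which is a quick check on the rank sums.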
Kruskal-Wallis - Command
Question: Kruskal-Wallis (k independent)
Do the variables vary by position?
Ranks
            Position        N     Mean Rank
Intention   Technician      34    101.32
            Engineer        66    79.68
            Sr Engineer     54    114.54
            Manager         32    81.63
            Above manager   6     171.17
            Total           192
Test Statistics(a,b)
              Intention
Chi-Square    28.179
df            4
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: Position
Correlation - Command
Correlation (Interval/ratio)
Question: Are the variables related?
Correlations
Variables: Attitude, subjective, Pbcontrol, Intention, Actual
The first table reports Pearson Correlation, Sig. (2-tailed) and N for each variable.
The second table reports Correlation Coefficient, Sig. (1-tailed) and N for each variable.
**. Correlation is significant at the 0.01 level (1-tailed).
*. Correlation is significant at the 0.05 level (1-tailed).
Assumptions – Advanced Diagnostics
(Hair et al., 2006)
Residuals Statistics(a)
Predicted Value
Std. Predicted Value
Standard Error of Predicted Value
Adjusted Predicted Value
Residual
Std. Residual
Stud. Residual
Deleted Residual
Stud. Deleted Residual
Mahal. Distance
Cook's Distance
Centered Leverage Value
Assumptions (Normality)
[Figure: Histogram of the regression standardized residual. Dependent Variable: Intention. Vertical axis: Frequency. N = 192.]
Assumptions (Normality of the Error term)
[Figure: Normal P-P Plot of Regression Standardized Residual. Dependent Variable: Intention. Axes: Observed Cum Prob and Expected Cum Prob.]
Assumptions (Constant Variance)
[Figure: Scatterplot. Dependent Variable: Intention. Axes: Regression Studentized Residual and Intention.]
Assumptions (Linearity)
[Figure: Partial Regression Plot. Dependent Variable: Intention. Axes: Attitude and Intention.]
Assumptions (Linearity)
[Figure: Partial Regression Plot. Dependent Variable: Intention. Axes: subjective and Intention.]
Assumptions (Linearity)
[Figure: Partial Regression Plot. Dependent Variable: Intention. Axes: Pbcontrol and Intention.]
Table Presentation
Dependent = Intention
Variable             Standardized Beta
Attitude             0.607**
Subjective Norm      0.238**
Perceived Control    0.105**
R²                   0.693
Adjusted R²          0.688
F Value              141.13
D-W                  1.501
*p < 0.05, **p < 0.01
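The adjusted R² and the F value in this table are tied to R², the sample size (N = 192) and the number of predictors (3). The sketch below shows the relationships with assumed function names. Starting from the rounded R² of 0.693 gives an F of about 141.5 rather than exactly 141.13, a gap that is presumably due to rounding.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_from_r2(r2, n, k):
    """Overall F statistic implied by R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

adj = adjusted_r2(0.693, n=192, k=3)
f_value = f_from_r2(0.693, n=192, k=3)
```

With only three predictors and 192 cases the adjustment is small, which is why R² and adjusted R² are almost identical here.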
Discriminant - Command
Question: Discriminant Analysis
Which variables can discriminate high and low intention to share?
Analysis Case Processing Summary
Unweighted Cases
Valid
Excluded: Missing or out-of-range group codes
Excluded: At least one missing discriminating variable
Excluded: Both missing or out-of-range group codes and at least one missing discriminating variable
Excluded: Unselected
Excluded: Total
Total
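Dividing the Sample into Estimation and Split/Holdout Sample: Random Selection
Command:
TRANSFORM > RANDOM NUMBER SEED
TRANSFORM > COMPUTE
Randz = UNIFORM(1) > 0.65
UNIFORM(1) draws a random number between 0 and 1, so Randz is 0 for about 65% of respondents and 1 for the rest. The Randz = 0 cases give 65% of respondents for estimation, and the remainder form the holdout sample.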
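The same idea, a fixed seed followed by a uniform random draw per case, can be sketched in Python. The function name, the seed value and the case numbering are assumptions; the SPSS random number generator will not produce the same split.

```python
import random

def split_sample(case_ids, estimation_share=0.65, seed=12345):
    """Randomly assign each case to the estimation or the holdout sample."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    estimation, holdout = [], []
    for case in case_ids:
        # a uniform draw at or below the share sends the case to estimation
        target = estimation if rng.random() <= estimation_share else holdout
        target.append(case)
    return estimation, holdout

estimation, holdout = split_sample(range(1, 193))  # 192 respondents
```

Because each case is assigned independently, the estimation sample is only approximately 65% of the respondents, as it is with the SPSS command.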
Test for Model
Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .796            28.214       3    .000
Test Results
Box's M        5.942
F   Approx.    .939
    df1        6
    df2        9055.846
    Sig.       .465
Tests null hypothesis of equal population covariance matrices.
Goodness of Model
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .257(a)      100.0           100.0          .452
a. First 1 canonical discriminant functions were used in the analysis.
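With a single discriminant function, Wilks' Lambda and the canonical correlation are both simple functions of the eigenvalue. The sketch below recovers the reported .796 and .452 from the eigenvalue .257; the function names are assumptions.

```python
from math import sqrt

def wilks_lambda(eigenvalue):
    """Wilks' Lambda for a single discriminant function."""
    return 1 / (1 + eigenvalue)

def canonical_correlation(eigenvalue):
    """Canonical correlation for a single discriminant function."""
    return sqrt(eigenvalue / (1 + eigenvalue))

lam = wilks_lambda(0.257)
r_canonical = canonical_correlation(0.257)
```

The squared canonical correlation (about .20) is the share of variance in the discriminant scores explained by group membership, and it equals one minus Wilks' Lambda.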
Tests of Equality of Group Means
           Wilks' Lambda   F        df1   df2   Sig.
Attitude   .850            22.007   1     125   .000
Norm       .833            24.998   1     125   .000
pbc        .985            1.949    1     125   .165
Coefficients
Standardized Canonical Discriminant Function Coefficients
           Function 1
Attitude   .322
Norm       .741
pbc        .321
Canonical Discriminant Function Coefficients
             Function 1
Attitude     .524
Norm         1.185
pbc          .415
(Constant)   -7.759
Unstandardized coefficients
Structure Matrix
           Function 1
Norm       .883
Attitude   .828
pbc        .246
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
Classification
Functions at Group Centroids
Level   Function 1
Low     -.236
High    1.069
Unstandardized canonical discriminant functions evaluated at group means
Classification Function Coefficients
             Level
             Low       High
Attitude     2.848     3.532
Norm         8.746     10.293
pbc          6.553     7.095
(Constant)   -32.031   -44.209
Fisher's linear discriminant functions
$Z_{CU} = \frac{N_A Z_B + N_B Z_A}{N_A + N_B}$
$Z_A$ = centroid of Group A
$Z_B$ = centroid of Group B
$N_A$ and $N_B$ = number in each group
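A sketch of how the cutting score is used: compute each case's discriminant score from the unstandardized coefficients, then compare it with the weighted cutting score. The group sizes and the respondent's scores below are hypothetical, since they are not given here, and the function names are assumptions.

```python
def discriminant_score(attitude, norm, pbc):
    """Discriminant Z score from the unstandardized coefficients reported above."""
    return -7.759 + 0.524 * attitude + 1.185 * norm + 0.415 * pbc

def cutting_score(n_a, z_a, n_b, z_b):
    """Cutting score for unequal groups: (N_A * Z_B + N_B * Z_A) / (N_A + N_B)."""
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)

# Centroids from the output; the group sizes (100 and 27) are hypothetical
z_cu = cutting_score(n_a=100, z_a=-0.236, n_b=27, z_b=1.069)

# Hypothetical respondent scoring 4.0 on all three scales
z = discriminant_score(4.0, 4.0, 4.0)
level = "High" if z > z_cu else "Low"
```

When the two groups are the same size the formula reduces to the midpoint between the two centroids.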
Predictive Validity
Classification Results(b,c,d)
Cases Selected: Original (Count, %) and Cross-validated(a) (Count, %), each by Level (Low, High)
Cases Not Selected: Original (Count, %), by Level (Low, High)
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 84.3% of selected original grouped cases correctly classified.
c. 90.8% of unselected original grouped cases correctly classified.
d. 84.3% of selected cross-validated grouped cases correctly classified.
Benchmark for Comparison
How good is the Hit Ratio? Compute the Hit Ratio for the split sample and compare it against:
Maximum Chance Criterion: This is just the share of cases in the largest group. It is the minimum criterion to be met by the Hit Ratio.
Proportional Chance Criterion: Should be used when group sizes are unequal. For two groups it is given as follows:
$C_{pro} = p^2 + (1 - p)^2$
p = proportion in group
Press's Q: Compares the number of correct classifications (n) against the total sample (N) and the number of groups (k)
$\text{Press's } Q = \frac{[N - (n \cdot k)]^2}{N(k - 1)}$
Press's Q ~ $\chi^2$ with 1 degree of freedom (critical values: 3.84 at p = 0.05, 6.64 at p = 0.01).
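The three benchmarks are short enough to compute by hand, but a sketch makes the comparison explicit. The function names, group sizes and number of correct classifications below are hypothetical.

```python
def maximum_chance(group_sizes):
    """Share of cases in the largest group."""
    return max(group_sizes) / sum(group_sizes)

def proportional_chance(p):
    """Proportional chance criterion for two groups, p = proportion in one group."""
    return p**2 + (1 - p) ** 2

def press_q(total, correct, groups):
    """Press's Q: (N - n*k)^2 / (N * (k - 1))."""
    return (total - correct * groups) ** 2 / (total * (groups - 1))

sizes = [60, 40]                               # hypothetical group sizes
c_max = maximum_chance(sizes)                  # 0.60
c_pro = proportional_chance(0.6)               # 0.52
q = press_q(total=100, correct=80, groups=2)   # 36.0
```

In this made-up case a hit ratio of 80% exceeds both chance criteria, and Q = 36.0 is far above the 6.64 critical value.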
Logistic Regression- Command
Initial Output
Case Processing Summary
Unweighted Cases(a)                      N     Percent
Selected Cases   Included in Analysis    192   100.0
                 Missing Cases           0     .0
                 Total                   192   100.0
Unselected Cases                         0     .0
Total                                    192   100.0
a. If weight is in effect, see classification table for the total number of cases.
No missing cases
Dependent Variable Encoding
Original Value   Internal Value
Low              0
High             1
Classification Table(a,b)
                                  Predicted Sharing Level
Step 0   Observed                 Low    High   Percentage Correct
         Sharing Level   Low      0      84     .0
                         High     0      108    100.0
         Overall Percentage                     56.3
a. Constant is included in the model.
b. The cut value is .500
Correctly classifies all those with high values but misses all those with low values.
56.3 is the proportion of respondents in the high sharing category.
Output
Variables in the Equation
                    B      S.E.   Wald    df   Sig.   Exp(B)
Step 0   Constant   .251   .145   2.984   1    .084   1.286
The Wald statistic is like the t-value. The constant by itself does not significantly improve prediction.
Variables not in the Equation
                                Score    df   Sig.
Step 0   Variables   Attitude   30.588   1    .000
                     SN         38.624   1    .000
                     PBC        .120     1    .729
         Overall Statistics     41.833   3    .000
The constant is entered first; the other variables are not included.
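The Step 0 numbers follow directly from the group counts in the classification table (84 low, 108 high). With no predictors, the constant is the log of the odds of being in the high group. A small sketch:

```python
from math import exp, log

low, high = 84, 108  # observed counts from the Step 0 classification table

b0 = log(high / low)  # constant of the intercept-only model
odds = exp(b0)        # Exp(B): odds of high versus low sharing
baseline_accuracy = max(low, high) / (low + high)  # always predict the larger group
```

These reproduce B = .251, Exp(B) = 1.286 and the 56.3% overall percentage, the baseline that the full model needs to beat.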
Output
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    48.073       3    .000
         Block   48.073       3    .000
         Model   48.073       3    .000
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      215.088(a)          .221                   .297
The model accounts for 29.7% of the variance.
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
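Both pseudo R² values can be derived from the model chi-square, the -2 log likelihood and the sample size (N = 192). The sketch below uses the standard definitions, with assumed function names.

```python
from math import exp

def cox_snell_r2(model_chi2, n):
    """Cox & Snell R-squared from the model chi-square and sample size."""
    return 1 - exp(-model_chi2 / n)

def nagelkerke_r2(model_chi2, neg2ll_model, n):
    """Nagelkerke R-squared: Cox & Snell rescaled by its maximum attainable value."""
    neg2ll_null = neg2ll_model + model_chi2  # -2LL of the constant-only model
    max_r2 = 1 - exp(-neg2ll_null / n)
    return cox_snell_r2(model_chi2, n) / max_r2

cs = cox_snell_r2(48.073, 192)
nk = nagelkerke_r2(48.073, 215.088, 192)
```

Cox & Snell cannot reach 1 for a binary outcome, so Nagelkerke's rescaled value is the one usually quoted as variance accounted for.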
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      71.722       8    .000
A significant chi-square indicates that the predicted probabilities do not match the observed probabilities. This is not what we usually want.
Contingency Table – Hosmer Lemeshow
Contingency Table for Hosmer and Lemeshow Test
This is a more detailed assessment of the Hosmer-Lemeshow Test. We need to look at how close or how different the observed and expected values are for each group.