Definition of 'Z-Test'

A statistical test used to determine whether two population means are different when the variances are known and the sample size is large. The test statistic is assumed to have a normal distribution, and nuisance parameters such as the standard deviation should be known in order for an accurate z-test to be performed.

One-Sample z-test

Requirements: Normally distributed population, σ known

Test for population mean

Hypothesis test

Formula:

$$z = \frac{\bar{x} - \Delta}{\sigma / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, Δ is a specified value to be tested, σ is the population standard deviation, and n is the size of the sample. Look up the significance level of the z-value in the standard normal table.

Example 1

A herd of 1,500 steer was fed a special high-protein grain for a month. A random sample of 29 were weighed and had gained an average of 6.7 pounds. If the standard deviation of weight gain for the entire herd is 7.1, test the hypothesis that the average weight gain per steer for the month was more than 5 pounds.

null hypothesis: H0: μ = 5
alternative hypothesis: Ha: μ > 5

$$z = \frac{6.7 - 5}{7.1 / \sqrt{29}} = \frac{1.7}{1.318} \approx 1.289$$

Tabled value for z ≤ 1.28 is 0.8997
1 - 0.8997 = 0.1003

So, the conditional probability that a sample from the herd gains at least 6.7 pounds per steer, given that the null hypothesis is true, is p = 0.1003. Should the null hypothesis of a mean weight gain of no more than 5 pounds for the population be rejected? That depends on how conservative you want to be. If you had decided beforehand on a significance level of p < 0.05, the null hypothesis could not be rejected.

Example 2

In national use, a vocabulary test is known to have a mean score of 68 and a standard deviation of 13. A class of 19 students takes the test and has a mean score of 65. Is the class typical of others who have taken the test? Assume a significance level of p < 0.05.

There are two possible ways that the class may differ from the population. Its scores may be lower than, or higher than, the population of all students taking the test; therefore, this problem requires a two-tailed test. First, state the null and alternative hypotheses:

null hypothesis: H0: μ = 68
alternative hypothesis: Ha: μ ≠ 68
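The one-sample z statistic and its tail probability can also be checked numerically. The sketch below is a minimal illustration that uses only Python's standard library; the function names `one_sample_z` and `p_value` are hypothetical helpers introduced here, not part of the original notes. It is applied to the steer example above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Return the z statistic for a one-sample z-test with known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def p_value(z, tail="upper"):
    """Return the p-value of z under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return 1.0 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    if tail == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two-sided'")


# Steer example: n = 29, sample mean 6.7, sigma 7.1, H0: mu = 5, Ha: mu > 5
z_steer = one_sample_z(6.7, 5.0, 7.1, 29)
p_steer = p_value(z_steer, tail="upper")
print(f"z = {z_steer:.3f}, one-tailed p = {p_steer:.4f}")
```

The p-value computed this way comes out slightly below the tabled 0.1003, because the table lookup truncates z to 1.28 while the code uses the unrounded statistic. The conclusion at the 0.05 level is the same.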

Because you have specified a significance level, you can look up the critical z-value in the standard normal table before computing the statistic. This is a two-tailed test, so the 0.05 must be split such that 0.025 is in the upper tail and another 0.025 is in the lower. The z-value that corresponds to a lower-tail area of 0.025 is -1.96, which is the lower critical z-value. The upper value corresponds to 1 - 0.025, or 0.975, which gives a z-value of 1.96. The null hypothesis of no difference will be rejected if the computed z statistic falls outside the range of -1.96 to 1.96. Next, compute the z statistic:

$$z = \frac{65 - 68}{13 / \sqrt{19}} = \frac{-3}{2.982} \approx -1.006$$

Because -1.006 is between -1.96 and 1.96, the null hypothesis that the population mean is 68 cannot be rejected. That is, there is no evidence that this class can be considered different from others who have taken the test.

Confidence interval for population mean using z

Formula:

$$(a, b) = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

where a and b are the limits of the confidence interval, $\bar{x}$ is the sample mean, $z_{\alpha/2}$ is the upper (or positive) z-value from the standard normal table corresponding to half of the desired alpha level (because all confidence intervals are two-tailed), σ is the population standard deviation, and n is the size of the sample.

Example 3

A sample of 12 machine pins has a mean diameter of 1.15 inches, and the population standard deviation is known to be 0.04. What is a 99 percent confidence interval for the mean diameter of the population?

First, determine the z-value. A 99 percent confidence level is equivalent to an alpha level of 0.01. Half of 0.01 is 0.005. The z-value corresponding to an upper-tail area of 0.005 is 2.58. The interval may now be calculated:

$$(a, b) = 1.15 \pm 2.58 \cdot \frac{0.04}{\sqrt{12}} = 1.15 \pm 0.030 = (1.12, 1.18)$$

The interval is (1.12, 1.18). We have 99 percent confidence that the population mean of pin diameters lies between 1.12 and 1.18 inches. Note that this is not the same as saying that 99 percent of the machine pins have diameters between 1.12 and 1.18 inches, which would be an incorrect conclusion from this test.

Choosing a sample size

Because surveys cost money to administer, researchers often want to calculate how many subjects will be needed to determine a population mean using a fixed confidence interval and significance level. The formula is

$$n = \left( \frac{2 z_{\alpha/2} \sigma}{w} \right)^2$$

where n is the number of subjects needed, $z_{\alpha/2}$ is the critical z-value corresponding to the desired significance level, σ is the population standard deviation, and w is the desired confidence interval width.

Example 4

How many subjects will be needed to find the average age of students at Fisher College plus or minus a year, with a 95 percent confidence level and a population standard deviation of 3.5?
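The confidence-interval and sample-size calculations can be evaluated directly rather than from a printed table. The following is a minimal sketch using Python's standard library; the helper names `z_critical`, `z_confidence_interval`, and `required_sample_size` are hypothetical and introduced only for illustration. It recomputes the machine-pin interval and the Fisher College sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Upper critical z-value for a two-tailed interval at this confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Return (a, b), the limits of the confidence interval for the mean."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def required_sample_size(sigma, width, confidence):
    """Return (exact n, n rounded up); width is the FULL interval width w."""
    n_exact = (2.0 * z_critical(confidence) * sigma / width) ** 2
    return n_exact, ceil(n_exact)


# Machine pins: n = 12, mean 1.15, sigma 0.04, 99 percent confidence
low, high = z_confidence_interval(1.15, 0.04, 12, 0.99)
print(f"99% interval: ({low:.2f}, {high:.2f})")

# Student ages: plus or minus one year means a full width of 2
n_exact, n_needed = required_sample_size(3.5, 2.0, 0.95)
print(f"n = {n_exact:.2f}, rounded up to {n_needed}")
```

The code uses unrounded critical values (about 2.576 and 1.960) instead of the tabled 2.58 and 1.96, so intermediate figures can differ in the third decimal place from a hand calculation.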

$$n = \left( \frac{2 (1.96)(3.5)}{2} \right)^2 = (6.86)^2 \approx 47.06$$

Rounding up, a sample of 48 students would be sufficient to determine students' mean age plus or minus one year. Note that the confidence interval width is always double the “plus or minus” figure.

The T-Test

The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis for the posttest-only two-group randomized experimental design.

Figure 1. Idealized distributions for treated and comparison group posttest values.

Figure 1 shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the idealized distribution -- the actual distribution would usually be depicted with a histogram or bar graph. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether the means are statistically different.

What does it mean to say that the averages for two groups are statistically different? Consider the three situations shown in Figure 2. The first thing to notice about the three situations is that the difference between the means is the same in all three. But, you should also notice that the three situations don't look the same -- they tell very different stories. The top example shows a case with moderate variability of scores within each group. The second situation shows the high variability case. The third shows the case with low variability. Clearly, we would conclude that the two groups appear most different or distinct in the bottom or low-variability case. Why? Because there is relatively little overlap between the two bell-shaped curves. In the high variability case, the group difference appears least striking because the two bell-shaped distributions overlap so much.

Figure 2. Three scenarios for differences between means.

This leads us to a very important conclusion: when we are looking at the differences between scores for two groups, we have to judge the difference between their means relative to the spread or variability of their scores. The t-test does just this.

Statistical Analysis of the t-test

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference. Figure 3 shows the formula for the t-test and how the numerator and denominator are related to the distributions.

Figure 3. Formula for the t-test.

The top part of the formula is easy to compute -- just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in that group. We add these two values and then take their square root. The specific formula is given in Figure 4:

Figure 4. Formula for the Standard error of the difference between the means.
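Written out from the verbal description above, the standard error of the difference can be expressed as follows. The subscripts T (treated) and C (control) and the symbols var and n for each group's variance and size are notation assumed here for illustration.

$$SE(\bar{x}_T - \bar{x}_C) = \sqrt{\frac{var_T}{n_T} + \frac{var_C}{n_C}}$$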

Remember that the variance is simply the square of the standard deviation. The final formula for the t-test is shown in Figure 5:

Figure 5. Formula for the t-test.
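As a rough sketch of the ratio just described (difference between the means on top, standard error of the difference on the bottom), the t-value can be computed in a few lines of Python. The function name `t_statistic` and the scores are made up for illustration and do not come from any study.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Difference between the group means divided by its standard error."""
    # Standard error: variance of each group divided by its size,
    # added together, then square-rooted.
    se_diff = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se_diff


# Hypothetical posttest scores, invented for this example only
treated = [52, 55, 49, 58, 54, 51]
control = [48, 50, 47, 53, 49, 46]

t = t_statistic(treated, control)
print(f"t = {t:.3f}")
```

Here `variance` is the sample variance (the square of the sample standard deviation). Swapping the two arguments flips the sign of the result without changing its size.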

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value you have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding. To test the significance, you need to set a risk level (called the alpha level). In most social research, the "rule of thumb" is to set the alpha level at .05. This means that five times out of a hundred you would find a statistically significant difference between the means even if there was none (i.e., by "chance"). You also need to determine the degrees of freedom (df) for the test. In the t-test, the degrees of freedom is the sum of the persons in both groups minus 2. Given the alpha level, the df, and the t-value, you can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether the t-value is large enough to be significant. If it is, you can conclude that the means for the two groups are different (even given the variability). Fortunately, statistical computer programs routinely print the significance test results and save you the trouble of looking them up in a table.

The t-test, one-way Analysis of Variance (ANOVA) and a form of regression analysis are mathematically equivalent when two groups are compared and would yield identical results.

The Analysis of Variance, popularly known as ANOVA, can be used in cases where there are more than two groups. When we have only two samples we can use the t-test to compare their means, but the t-test is not suited to comparing more than two samples at once. If we only compare two means, then the t-test (independent samples) will give the same results as the ANOVA.

ANOVA is used to compare the means of more than two samples. This can be understood better with the help of an example.

One Way Anova

EXAMPLE: Suppose we want to test the effect of five different exercises. For this, we recruit 20 men and assign each type of exercise to 4 men (5 groups). Their weights are recorded after a few weeks. We can find out whether the effects of these exercises differ significantly by comparing the weights of the 5 groups of 4 men each.

The example above is a case of one-way balanced ANOVA. It is termed one-way because there is only one category whose effect has been studied, and balanced because the same number of men has been assigned to each exercise. Thus the basic idea is to test whether the samples are all alike or not.

Why Not Multiple T-Tests?

As mentioned above, the t-test can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using many t-tests. But conducting such multiple t-tests inflates the overall chance of a false positive (a Type I error), because every additional test is another opportunity to find a difference by chance, and in such circumstances we use ANOVA. Thus, this technique is used whenever an alternative procedure is needed for testing hypotheses concerning means when there are several populations.

One Way and Two Way Anova

Now some questions may arise as to what means we are talking about and why variances are analyzed in order to draw conclusions about means. The whole procedure can be made clear with the help of an experiment. Let us study the effect of fertilizers on the yield of wheat. We apply five fertilizers, each of a different quality, to five plots of wheat each. The yield from each plot of land is recorded and the difference in yield among the plots is observed. Here, fertilizer is a factor and the different qualities of fertilizers are called levels. This is a case of one-way or one-factor ANOVA since there is only one factor, fertilizer.

We may also be interested in studying the effect of the fertility of the plots of land. In such a case we would have two factors, fertilizer and fertility. This would be a case of two-way or two-factor ANOVA. Similarly, a third factor may be incorporated to have a case of three-way or three-factor ANOVA.
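Returning briefly to the question of multiple t-tests: one commonly cited complication is that the chance of at least one false positive grows with the number of comparisons. The sketch below is a simplified illustration that assumes the tests are independent (pairwise comparisons on the same groups are not strictly independent, so the figure is only a rough guide). The function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(k_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one false positive).

    Assumes every pair of groups is compared once and, as a simplification,
    that the tests are independent of one another.
    """
    n_tests = comb(k_groups, 2)
    return n_tests, 1.0 - (1.0 - alpha) ** n_tests


# Five exercise groups, each pair compared with its own t-test at alpha = .05
n_tests, fwer = familywise_error_rate(5)
print(f"{n_tests} pairwise tests, familywise error rate about {fwer:.2f}")
```

Under these assumptions, five groups require ten pairwise tests, and the chance of at least one spurious "significant" result is far above the nominal .05.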

Chance Cause and Assignable Cause

In the above experiment the yields obtained from the plots may be different and we may be tempted to conclude that the differences exist due to the differences in quality of the fertilizers. But this difference may also be the result of certain other factors which are attributed to chance and which are beyond human control. This factor is termed “error”. Thus, the variation that exists among plots treated with the same fertilizer may be attributed to error. Estimates of the amount of variation due to assignable causes (or variance between the samples) and due to chance causes (or variance within the samples) are therefore obtained separately and compared using an F-test, and conclusions are drawn using the value of F.

Assumptions

There are four basic assumptions used in ANOVA:

1. the expected values of the errors are zero
2. the variances of all errors are equal to each other
3. the errors are independent
4. the errors are normally distributed
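The comparison of between-sample and within-sample variance described above can be sketched in plain Python. This is a minimal illustration, not a full ANOVA routine: the function name `one_way_anova_f` is hypothetical, and the weights for the five exercise groups are invented for the example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation due to assignable causes: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation due to chance causes: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical weights (kg) for 5 exercise groups of 4 men each
exercise_groups = [
    [82, 85, 79, 88],
    [78, 80, 77, 83],
    [90, 86, 89, 92],
    [81, 84, 80, 79],
    [75, 78, 74, 80],
]

f_ratio, df_between, df_within = one_way_anova_f(exercise_groups)
print(f"F = {f_ratio:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

The resulting F is then compared with the critical value from an F table for the same degrees of freedom. With only two equal-sized groups, this F equals the square of the t-value from the t-test described earlier, which is one way to see the equivalence of the two procedures.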

SPSS for Windows

A brief tutorial

This tutorial is a brief look at what SPSS for Windows is capable of doing. Examples will come from Statistical Methods for Psychology by David C. Howell. It is not our intention to teach you about statistics in this tutorial. For that you should rely on your classes in statistics and/or a good textbook. If you're a novice this tutorial should give you a feel for the programme and how to navigate through the many options. Beyond that, the SPSS Help Files should be used as a resource. Further, SPSS sells a number of very good manuals.

The Basics

SPSS for Windows has the same general look and feel as most other programmes for Windows. Virtually any statistic that you wish to compute can be obtained by pointing and clicking on the menus and various interactive dialog boxes. You may have noted that the examples in the Howell textbook are performed/analyzed via code. That is, SPSS, like many other packages, can be accessed by programming short scripts, instead of pointing and clicking. We will not cover any programming in this tutorial.

Presumably, SPSS is already installed on your computer. If you don't have a shortcut on your desktop go to the [Start => Programs] menu and start the package by clicking on the SPSS icon.

Before proceeding I should say a few words about a very simple convention that will be used in this tutorial. In this point and click environment one often has to navigate through many layers of menu items before encountering the required option. In the above paragraph the prescribed task was to locate the SPSS icon in the [Start] menu structure. To get to that icon, one must first click on [Start] then move the pointer to the [Programs] option, before locating the SPSS icon. This sequence of events can be conveyed by typing [Start => Programs]. That is, one must move from the outer layer of the menu structure to some inner layer in sequence....

Now, back to the tutorial. Once you've clicked on the SPSS icon a new window will appear on the screen. The appearance is that of a standard programme for Windows with a spreadsheet-like interface.

As you can see, there are a number of menu options relating to statistics, on the menu bar. There are also shortcut icons on the toolbar. These serve as quick access to often used options. Holding your mouse over one of these icons for a second or two will result in a short function description for that icon. The current display is that of an empty data sheet. Clearly, data can either be entered manually, or it can be read from an existing data file.

Browsing the [File] menu, below, reveals nothing too surprising - many of the options are familiar, although the details are specific to SPSS. For example, the [New] option is used to specify the type of window to open. The various options under the [New] heading are,

[Data] Default window with a blank data sheet ready for analyses.

[Syntax] One can write scripts like those present in the Howell text, instead of using the menus. See the SPSS manuals for help on this topic.

[Output] Whenever a procedure is run, the output is directed to a separate window. One can also have multiple [Output] windows open to organize the various analyses that might be conducted. Later, these results can be saved and/or printed.

[Script] This window provides the opportunity to write full-blown programmes in a BASIC-like language. These programmes have access to the functions that make up SPSS. With such access it is possible to write user-defined procedures, that is, procedures that are not part of SPSS but take advantage of the SPSS functions. Again, this is beyond the scope of this tutorial.

Also present in the [File] menu are two separate avenues for reading data from existing files. The first is the [Open] option. Like other application packages (e.g., WordPerfect, Excel, ....) SPSS has its own format for saving data. In this case, the accepted extension for any file saved using the proprietary format is "sav". So, one can have a datafile saved as "data1.sav". This format is binary, so it is not readable with a text editor. The benefits are that all formatting changes are maintained and the file can be read faster. The [Open] option is specifically meant for files saved in the SPSS format.

The second option, [Read ASCII Data], as the name suggests, is for reading files that are saved in ASCII format. As can be seen, there are two choices - [Freefield] and [Fixed Columns]. Clicking on one of these options will produce a dialog box. One must specify a number of parameters before a file can be read successfully.

Reading ASCII files requires that the user know something about the format of the data file. Otherwise, one is likely to get stuck in the process of reading, or the result may be a costly error. The more restrictive format is [Fixed Columns]. One must know how many variables there are, whether a variable is in numeric or string format, and the first and last column of each variable. For example, think of the following as an excerpt from an ASCII datafile.

male    37 102
male    22 115
male    27  99
....    .. ...
female  48 107
female  21 103
female  28 122
......  .. ...

An examination of the datafile provides several key pieces of information,

1. There are 3 variables
2. Variable 1 is a string, Variable 2 and 3 are numeric
3. Variable 1: first column=1, last column=6
   o Notice that none of the columns overlap. The longest case for column one is the name "female", which spans from the first column to the sixth - or, the letter e. As you can see, one has to manually locate the first and last column of each variable.
4. Variable 2: first column=9, last column=10
5. Variable 3: first column=12, last column=14

One needs all of the above information, in addition to a name for each of the three variables. It is a highly structured way of setting up and describing the data. For such files I would suggest becoming comfortable with a good text editor. Failing that, you may wish to try Notepad or WordPad in Win95, but ensure that you save as a textfile with WordPad. A full-fledged word processor like Word or WordPerfect will also work provided that you remember to save as a textfile. These same editors will allow you to figure out the column locations for each of the variables.

The [Freefield] option is less restrictive. Essentially, the columns can be ragged (i.e., overlapping). One need only preserve the order of each variable across all of the cases.

male 37 102
male 22 115
male 27 99
.... .. ...
female 48 107
female 21 103
female 28 122
...... .. ...
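The difference between the two ASCII layouts can be illustrated outside SPSS. The Python sketch below is only an analogy for what the [Fixed Columns] and [Freefield] dialogs ask for, not a description of how SPSS reads files. The function names are hypothetical, and the column positions are the ones listed above (1-6, 9-10, 12-14).

```python
def parse_fixed_columns(line):
    """Read one case using fixed column positions (1-based, inclusive)."""
    var1 = line[0:6].strip()   # columns 1-6: string variable
    var2 = int(line[8:10])     # columns 9-10: numeric variable
    var3 = int(line[11:14])    # columns 12-14: numeric variable
    return var1, var2, var3


def parse_freefield(line):
    """Read one case where only the order of the variables is fixed."""
    var1, var2, var3 = line.split()
    return var1, int(var2), int(var3)


fixed_lines = [
    "male    37 102",
    "male    27  99",
    "female  48 107",
]
ragged_lines = [
    "male 37 102",
    "male 27 99",
    "female 48 107",
]

print([parse_fixed_columns(line) for line in fixed_lines])
print([parse_freefield(line) for line in ragged_lines])
```

Both calls recover the same cases. The fixed-column reader depends on every value sitting in exactly the stated columns, whereas the freefield reader only needs the values separated by whitespace and kept in the same order.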

Experiment with creating datafiles and reading them with this method. As for the SPSS format, there are a large number of sample datafiles included in your package. Just click on [Open] and find the SPSS home directory. Make sure the filetype in the dialog box associated with [Open] is set to "*.sav" - the default...

Before we move on to actual data, click on [Statistics]. The menu that appears reveals many classes of statistics available for use. Each class is further subdivided into other options, as denoted by the little arrow at the right side of the menu selector. Explore what is offered by moving your mouse over the various procedures listed.

Data

To begin the process of adding data, just click on the first cell that is located in the upper left corner of the datasheet. It's just like a spreadsheet. You can enter your data as shown. Enter each datapoint then hit [Enter]. Once you're done with one column of data you can click on the first cell of the next column. These data are taken from Table 2.1 in Howell's text. The first column represents "Reaction Time in 100ths of a second" and the second column indicates "Frequency".

If you're entering data for the first time, like the above example, the variable names will be automatically generated (e.g., var00001, var00002,....). They are not very informative. To change these names, click on the variable name button. For example, double click on the "var00001" button. Once you have done that, a dialog box will appear. The simplest option is to change the name to something meaningful. For instance, replace "var00001" in the textbox with "RT" (see figure below).

In addition to changing the variable name one can make changes specific to [Type], [Labels], [Missing Values], and [Column Format].

[Type] One can specify whether the data are in numeric or string format, in addition to a few more formats. The default is numeric format.

[Labels] Using the labels option can enhance the readability of the output. A variable name is limited to a length of 8 characters; however, by using a variable label the length can be as much as 256 characters. This provides the ability to have very descriptive labels that will appear in the output. Often, there is a need to code categorical variables in numeric format. For example, male and female can be coded as 1 and 2, respectively. To reduce confusion, it is recommended that one use value labels. For the example of gender coding, Value: 1 would have a corresponding Value Label: male. Similarly, Value: 2 would be coded with Value Label: female. (Click on the [Labels] button to verify the above.)

[Missing Values] See the accompanying help. This option provides a means to code for various types of missing values.

[Column Format] The column format dialog provides control over several features of each column (e.g., width of column).

The next image reflects the variable name change.

Once data has been entered or modified, it is advisable to save. In fact, save as often as possible [File => SaveAs].

SPSS offers a large number of possible formats, including its own. A list of the available formats can be viewed and selected by clicking on Save as type: in the SaveAs dialog box. If your intention is to only work in SPSS, then there may be some benefit to saving in the SPSS (*.sav) format. I assume that this format allows for faster reading and writing of the data file. However, if your data will be analyzed and looked at by other packages (e.g., a spreadsheet), it would be advisable to save in a more universal format (e.g., Excel (*.xls), 1-2-3 Rel 3.0 (*.wk3)).

Once the type of file has been selected, enter a filename, minus the extension (e.g., sav, xls). You should also save the file in a meaningful directory on your hard drive or floppy. That is, for any given project a separate directory should be created. You don't want your data to get mixed up.

The process of reading already saved data can be painless if the saved format is the SPSS format or a spreadsheet format. All one has to do is,

o click on [File => New => Data]
o click on [File => Open] : a dialog box will appear
o navigate to the desired directory using the Look in: menu at the top of the dialog box
o select the file type in the Files of type: menu
o click on the filename that is needed.

The process of reading existing files is slightly more involved if the format is ASCII/plain text (see the earlier description of [Freefield] and [Fixed Columns]). As an example, the ASCII data from Table 2.1 in the Howell text will be used. A file containing the data should be included in the accompanying disk for the text. [Note: It was not present on my disk, so I downloaded the file from Howell's webpage.] I've placed the files on my hard drive at c:\ascdat. In the case of this set of data, there are four columns representing observation number, reaction time, setsize, and the presence or absence of the target stimulus. This information can be found in the readme.txt file that is also on the disk. Typically, we are aware of the contents of our own data files; however, it doesn't hurt to keep a record of the contents of such files. To make life easier, the [File => Read ASCII Data => Freefield] option will be used.

The resulting dialog box requires that a File, a Name and a Data Type be specified for each variable, or column of data. The desired file is accessed by clicking on the [Browse] button, and then navigating to the desired location. Since the extension for the sought-after file is dat, there is no need to change the Files of type: selection. However, if the extension is something else (e.g., *.txt) then it would be necessary to select All files (*.*) from the Files of type: menu. Since there are 4 variables in this data set, 4 names with the corresponding type information must be specified. To add the first variable, observations, to the list,

o type "obs" in the Name box
o the Data Type is set to Numeric by default. If "obs" were a string variable, then one would have to click on String
o click on the Add button to include this variable in the list.
o repeat the above procedure with new names and data types for each of the remaining variables. It is important that all variables be added to the list. Otherwise, the data will be scrambled.

(Please explore the various options by clicking on any accessible menu item.)

The resulting data file appears in the data editor like the following.

Descriptive Statistics

We can replicate the frequency analyses that are described in chapter 2 of the text by using the file that was just read into the data editor - tab2-1.dat. These analyses were conducted on the reaction time data. Recall that we have labelled this variable RT.

To begin, click on [Statistics=>Summarize=>Frequencies]....

The result is a new dialog box that allows the user to select the variables of interest. Also, note the other clickable buttons along the border of the dialog box. The buttons labelled [Statistics...] and [Charts...] are of particular importance. Since we're interested in the reaction time data, click on rt followed by a mouse click on the arrow pointing right. This moves the rt variable to the Variables list. At this point, clicking on the [OK] button would spawn an output window with the frequency information for each of the reaction times. However, more information can be gathered by exploring the options offered by the [Statistics...] and [Charts...] buttons.

[Statistics...] offers a number of summary statistics. Any statistic that is selected will be summarized in the output window.

As for the options under [Charts...] click on Bar Charts to replicate the graph in the text.

Once the options have been selected, click on [OK] to run the procedure. The results are then displayed in an output window. In this particular instance the window will include summary statistics for the variable RT, the frequency distribution, and the bar chart. You can see all of this by scrolling down the window. The results should also be identical to those in the text.
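For readers who want to see what a frequency analysis involves behind the menus, here is a small Python sketch. It is not SPSS output and makes no claim to match its layout. The reaction times are invented (they are not the Table 2.1 data), and `frequency_table` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical reaction times in 100ths of a second (NOT the Table 2.1 data)
rt = [36, 37, 38, 38, 39, 39, 39, 40, 40, 41, 42, 42, 44, 46, 51]


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value],
                     100.0 * counts[value] / n, 100.0 * cumulative / n))
    return rows


print("value  freq   pct  cum pct")
for value, freq, pct, cum_pct in frequency_table(rt):
    print(f"{value:>5} {freq:>5} {pct:5.1f} {cum_pct:8.1f}")

# A few of the summary statistics offered alongside the table
print(f"mean = {mean(rt):.2f}, median = {median(rt)}, sd = {stdev(rt):.2f}")
```

Each row pairs a distinct value with how often it occurs, which is the same information a bar chart of the variable displays graphically.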

You may have gathered from the above that calculating summary statistics requires nothing more than selecting variables, and then selecting the desired statistics. The frequency example allowed us to generate frequency information plus measures of central tendency and dispersion. These statistics can also be had by clicking directly on [Statistics=>Summarize=>Descriptives]. Not surprisingly, another dialog box is attached to this procedure. To control the type of statistics produced, click on the [Options...] button. Once again, the options include the typical measures of central tendency and dispersion.

Each time a statistical procedure is run, like [Frequencies...] and [Descriptives...], the results are posted to an Output window. If several procedures are run during one session the results will be appended to the same window. However, greater organization can be achieved by opening new Output windows before running each procedure - [File=>New=>Output]. Further, the contents of each of these windows can be saved for later review, or in the case of charts saved to be later included in formatted documents.

[Explore by left mouse clicking on any of the output objects (e.g., a frequency table, a chart, ...) followed by a right button click. The left button click will highlight the desired object, while the right button click will pop up a new menu. The next step is to click on the copy option. This action will store the object on the clipboard so that it can be pasted into a Word for Windows document, for example.....]

Chi-Square & T-Test

The computation of the Chi-Square statistic can be accomplished by clicking on [Statistics => Summarize => Crosstabs...]. This particular procedure will be your first introduction to coding of data in the data editor. To this point data have been entered in a column format. That is, one variable per column. However, that method is not sufficient in a number of situations, including the calculation of Chi-Square, Independent T-tests, and any Factorial ANOVA design with between subjects factors. I'm sure there are many other cases, but they will not be covered in this tutorial. Essentially, the data have to be entered in a specific format that makes the analysis possible. The format typically reflects the design of the study, as will be demonstrated in the examples.

In your text, the following data appear in section 6.????. Please read the text for a description of the study. Essentially, the table - below - includes the observed data and the expected data in parentheses.

              Fault: Low     Fault: High    Total
Guilty        153 (127.559)  105 (130.441)  258
Not Guilty     24 (49.441)    76 (50.559)   100
Total         177            181            358

In the hopes of minimizing the load time for remaining pages, I will make use of the built in table facility of HTML to simulate the Data Editor in SPSS. This will reduce the number of images/screen captures to be loaded. For the Chi-Square statistic, the table of data can be coded by indexing the column and row of the observations. For example, the count for being guilty with Low fault is 153. This specific cell can be indexed as coming from row=1 and column=1. Similarly, Not Guilty with High fault is coded as row=2 and column=2. For each observation, four in this instance, there is a unique code for its location in the table. These can be entered as follows,

Row  Column  Count
1    1       153
1    2        24
2    1       105
2    2        76

So, 2 rows * 2 columns equals 4 observations. That should be clear. For each of the rows, there are 2 corresponding columns, and that is reflected in the Count column. The Count column represents the number of times each unique combination of Row and Column occurs.

The above presents the data in an unambiguous manner. Once entered, the analysis is a matter of selecting the desired menu items, and perhaps selecting additional options for that statistic. [Don't forget to use the labelling facilities, as mentioned earlier, to meaningfully identify the columns/variables. The labels that are chosen will appear in the output window.] To perform the analysis,

The first step is to inform SPSS that the COUNT variable represents the frequency for each unique coding of ROW and COLUMN, by invoking the WEIGHT command. To do this, click on [Data => Weight Cases]. In the resultant dialog box, enable the Weight cases by option, then move the COUNT variable into the Frequency Variable box. If this step is forgotten, the count for each cell will be 1 for the table.

Now that the COUNT variable has been processed as a weighting variable, select [Statistics => Summarize => Crosstabs...] to launch the controlling dialog box. At the bottom of the dialog box are three buttons, with the most important being the [Statistics...] button. You must click on the [Statistics...] button and then select the Chi-square option, otherwise the statistic will not be calculated. Exploring this dialog box makes it clear that SPSS can be asked to calculate a number of other statistics in conjunction with Chi-square. For example, one can select the various measures of association (e.g., contingency coefficient, phi and Cramer's V,...), among others. Move the ROW variable into the Row(s): box, and the COLUMN variable into the Column(s): box, then click [OK] to perform the analysis. A subset of the output looks like the following,

Although simple, the calculation of the Chi-square statistic is very particular about all the required steps being followed. More generally, as we enter hypothesis testing, the user should be very careful and should make use of manuals for the programme and textbooks for statistics.
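As a cross-check on what Crosstabs reports, the expected counts and the Pearson Chi-square for the table above can be reproduced by hand. The following is a minimal Python sketch, not SPSS syntax; the function name `chi_square` is an illustrative choice, and the p-value shortcut used here is valid only for 1 degree of freedom (a 2 by 2 table) with no continuity correction.

```python
import math

# Observed counts from the table above
# (rows: Guilty, Not Guilty; columns: Low, High).
observed = [[153, 105],
            [24, 76]]


def chi_square(table):
    """Return (Pearson chi-square statistic, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return stat, expected


chi2, expected = chi_square(observed)

# For df = 1 only, the upper-tail chi-square probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print([[round(e, 3) for e in row] for row in expected])
print(round(chi2, 2), p_value)
```

The expected counts printed by the sketch should agree with the values in parentheses in the table, which is a quick way to confirm that the data were coded and weighted correctly.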

T-tests

By now, you should know that there are two forms of the t-test, one for dependent observations and one for independent observations. To inform SPSS, or any stats package for that matter, of the type of design, it is necessary to have two different ways of laying out the data. For the dependent design, the two variables in question must be entered in two columns. For independent t-tests, the observations for the two groups must be uniquely coded with a Group variable. Like the calculation of the Chi-square statistic, these calculations will reinforce the practice of thinking about, and laying out, the data in the correct format.

Dependent T-Test

To calculate this statistic, one must select [Statistics => Compare Means => Paired-Samples T Test...] after entering the data. For this analysis, we'll use the data from Table 7.3, in Howell.

Enter the data into a new datafile. Your data should look a bit like the following. That is, the two variables should occupy separate columns...

Mnths_6 124 94 115 110 116 139 116 110 129 120 105 88 120 120 116 105 ... ... 123

Mnths_24 114 88 102 2 2 2 2 2 2 2 2 2 2 2 2 2 ... ... 132

Note that the variable names start with a letter and are no more than 8 characters long. This is a bit constraining; however, one can use the variable label option to label the variable with a longer name. This more descriptive name will then be reproduced in the output window.

To calculate the t statistic click on [Statistics => Compare Means => Paired-Samples T Test...], then select the two variables of interest. To select the two variables, hold the [Shift] key down while using the mouse for selection. You will note that the selection box requires that variables be selected two at a time. Once the two variables have been selected, move them to the Paired Variables: list. This procedure can be repeated for each pair of variables to be analyzed. In this case, select MNTHS_6 and MNTHS_24 together, then move them to the Paired Variables list. Finally, click the [OK] button. The critical result for the current analysis will appear in the output window as follows,

As you can see, an exact t-value is provided along with an exact p-value, and this p-value is greater than the cutoff of 0.025 for a two-tailed assessment. Closer examination indicates that several other statistics are presented in the output window. Quite simply, such calculations require very little effort!

Independent T-tests

When calculating an independent t-test, the only difference involves the way the data are formatted in the datasheet. The datasheet must include both the raw data and the group coding for each observation. For this example, the data from Table 7.5 will be used. As an added bonus, the number of observations is unequal for this example. Take a look at the following table to get a feel for how to code the data.

Group 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

Exp_Con 96 127 127 119 109 143 ... ... 106 109 114 88 104 104 91 96 ... ... 114 132

From the above you can see that we used the "Group" variable to code for the two groups. The value of 1 was used to code for "LBW-Experimental", while a value of 2 was used to code for "LBW-Control". If you're confused, please study the table above. To generate the t-statistic,

1. Click on [Statistics => Compare Means => Independent-Samples T Test] to launch the appropriate dialog box.
2. Select "exp_con" - the dependent variable - and move it to the Test Variable(s): box.
3. Select "group" - the grouping variable - and move it to the Grouping Variable: box.
4. The final step requires that the groups be defined. That is, one must specify that Group1 - the experimental group in this case - is coded as 1, and Group2 - the control group in this case - is coded as 2. To do this, click on the [Define Groups...] button.
5. Click on the [Continue] button to return to the controlling dialog box.
6. Run the analysis by clicking on the [OK] button.

The output for the current analysis, extracted from the output window, looks like the following.

The p-value of .004 is well below the cutoff of 0.025, which suggests that the means are significantly different. Further, a Levene's Test for equality of variances is performed so that the appropriate set of results can be read. In this case the variances are equal; however, the calculations for unequal variances are also presented, along with some other statistics, not all of which are shown here.
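To make the dependent (paired) layout concrete, here is a minimal Python sketch of the calculation behind a paired-samples t: the two columns are differenced subject by subject, and the mean difference is divided by its standard error. The scores are hypothetical and are not the Howell data; the function name `paired_t` is likewise only illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t: mean of the differences divided by its standard error."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1  # t statistic, degrees of freedom


# Hypothetical scores (NOT the Howell data): two columns, one row per subject.
before = [12, 15, 11, 14, 13]
after = [10, 14, 10, 11, 12]

t, df = paired_t(before, after)
print(round(t, 3), df)
```

Because each subject contributes one value to each column, the two lists must line up row by row, which is exactly why the dependent design is entered as two columns rather than with a group code.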

Correlations and Regression

This will be a brief tutorial, since there is very little that is required to calculate correlations and linear regressions. To calculate a simple correlation matrix, one must use [Statistics => Correlate => Bivariate...], and [Statistics => Regression => Linear] for the calculation of a linear regression. For this section, the analyses presented in the computer section of the Correlation and Regression chapter will be replicated. To begin, enter the data as follows,

IQ 102 108 109 118 79 88 ... ... 85

GPA 2.75 4.00 2.25 3.00 1.67 2.25 ... ... 2.50

Simple Correlation

Click on [Statistics => Correlate => Bivariate...], then select and move "IQ" and "GPA" to the Variables: list. [Explore the options presented on this controlling dialog box.] Click on [OK] to generate the requested statistics.

The results from the output window should look like the following,

As you can see, r=0.702, and p=.000 (SPSS prints .000 when the p-value is smaller than .0005). The results suggest that the correlation is significant.

Note: In the above example we only created a correlation matrix based on two variables. The process of generating a matrix based on more than two variables is no different. That is, if the dataset consisted of 10 variables, they could all have been placed in the Variables: list. The resulting matrix would include all the possible pairwise correlations.
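The idea of a matrix of pairwise correlations can be sketched in a few lines of Python. The data below are hypothetical (the full IQ and GPA columns are not reproduced in this tutorial), and the helper names `pearson_r` and `correlation_matrix` are illustrative, not part of SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict of {variable name: list of values}."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical data, for illustration only.
data = {
    "iq": [95, 100, 105, 110, 120],
    "gpa": [2.1, 2.6, 2.4, 3.1, 3.6],
    "hours": [5, 3, 6, 8, 9],
}

matrix = correlation_matrix(data)
print(round(matrix[("iq", "gpa")], 3))
```

With three variables the dictionary holds nine entries: the diagonal is 1, and each off-diagonal value appears twice, just as in the square matrix SPSS prints.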

Correlation and Regression

With linear regression, it is possible to output the regression coefficients needed to predict one variable from the other with the least error. To do so, one must select the [Statistics => Regression => Linear...] option. Further, there is a need to know which variable will be used as the dependent variable and which will be used as the independent variable(s). In our current example, GPA will be the dependent variable, and IQ will act as the independent variable. Specifically,

1. Initiate the procedure by clicking on [Statistics => Regression => Linear...].
2. Select and move GPA into the Dependent: variable box.
3. Select and move IQ into the Independent(s): variable box.
4. Click on [OK] to generate the statistics.

Note: A variety of options can be accessed via the buttons on the bottom half of this controlling dialog box (e.g., Statistics, Plots,...). Many more statistics can be generated by exploring the additional options via the Statistics button.

Some of the results of this analysis are presented below,

The correlation is still 0.702, and the p value is still 0.000. The additional statistics are "Constant", or a from the text, and "Slope", or B from the text. If you recall, the dependent variable is GPA, in this case. As such, one can predict GPA with the following,

GPA = -1.777 + 0.0448*IQ

The next section will discuss the calculation of the ANOVA.
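The prediction equation can be applied directly. The sketch below, in Python, uses the constant and slope reported above; the least-squares helper `fit_line` is an illustrative addition that shows where such coefficients come from and is demonstrated on made-up numbers rather than the tutorial's data.

```python
# Coefficients reported in the SPSS output above.
INTERCEPT = -1.777
SLOPE = 0.0448


def predict_gpa(iq):
    """Predicted GPA for a given IQ, using the fitted line above."""
    return INTERCEPT + SLOPE * iq


def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


print(round(predict_gpa(100), 3))    # an IQ of 100 predicts a GPA of about 2.703
print(fit_line([1, 2, 3], [3, 5, 7]))  # made-up points lying on y = 1 + 2x
```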

One-Way ANOVA

As in the independent t-test datasheet, the data must be coded with a group variable. The data that will be used for the first part of this section is from Table 11.2, of Howell. There are 5 groups of 10 observations each - resulting in a total of 50 observations. The group variable will be coded from 1 to 5, for each group. Take a look at the following to get an idea of the coding.

Groups 1 1 1 ... 1 2 2 2 ... ... ... 5 5 ... 5

Scores 9 8 6 ... 7 7 9 6 ... ... ... 10 19 ... 11

The coding scheme uniquely identifies the origin of each observation. To complete the analysis,

1. Select [Statistics => Compare Means => One-Way ANOVA...] to launch the controlling dialog box.
2. Select and move "Scores" into the Dependent list: box.
3. Select and move "Groups" into the Factor: list.
4. Click on [OK].

The preceding is a complete specification of the design for this one-way ANOVA. The simple presentation of the results, as taken from the output window, will look like the following,

The analysis that was just performed provides minimal details with regard to the data. If you take a look at the controlling dialog box, you will find 3 additional buttons on the bottom half - [Contrasts...], [Post Hoc..], and [Options...].

Selecting [Options...] you will find,

If Descriptive is enabled, then the descriptive statistics for each condition will be generated. Making Homogeneity-of-variance active forces a Levene's test on the data. The statistics from both of these analyses will be reproduced in the output window. Selecting [Post Hoc] will launch the following dialog box,

One can activate one or multiple post hoc tests to be performed. The results will then be placed in the output window. For example, performing an R-E-G-W F statistic on the current data would produce the following,

Finally, one can use the [Contrasts...] option to specify linear and/or orthogonal sets of contrasts. One can also perform trend analysis via this option. For example, we may wish to contrast the third condition with the fifth,

For each contrast, the coefficients must be entered individually, and in order. One can also enter multiple contrasts by using the [Next] button present in the dialog box. The result for the example contrast would look like the following,

Further, one can use the Polynomial option to test whether a specific trend in the data exists. Factorial designs will be covered in the next section.

Factorial ANOVA

To conduct a Factorial ANOVA one only need extend the logic of the oneway design. Table 13.2 presents the data for a 2 by 5 factorial ANOVA. The first factor, AGE, has two levels, and the second factor, CONDITION, has five levels. So, once again each observation can be uniquely coded.

AGE         Old = 1        Young = 2

CONDITION   Counting = 1   Rhyming = 2   Adjective = 3   Imagery = 4   Intentional = 5

For each pairing of AGE and CONDITION, there are 10 observations. That is, 2*5 conditions by 10 observations per condition results in 100 observations, which can be coded as follows. [Note that the names for the factors are meaningful.]

AGE 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

CONDITIO 1 1 1 ... 1 2 2 2 ... ... ... 5 5 ... 5 1 1 1 ... 1 2 2 2 ... ... ... 5 5 ... 5

Scores 9 8 6 ... 7 7 9 6 ... ... ... 10 19 ... 11 8 6 4 ... 7 10 7 8 ... ... ... 21 19 ... 21

Examine the table carefully, until you understand how the coding has been implemented. Note: one can enhance the readability of the output by using Value Labels for the two factors.
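The coding scheme in the table can also be generated mechanically, which may help in seeing the pattern. This Python sketch only builds the AGE and CONDITION codes for the 2 by 5 design with 10 observations per cell; the scores themselves would still have to be typed in from Table 13.2. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Codes as defined above.
AGE = {1: "Old", 2: "Young"}
CONDITION = {1: "Counting", 2: "Rhyming", 3: "Adjective", 4: "Imagery", 5: "Intentional"}
N_PER_CELL = 10

# One (age, condition) pair per observation; AGE varies slowest,
# matching the order of the table above.
codes = [(age, condition)
         for age, condition in product(AGE, CONDITION)
         for _ in range(N_PER_CELL)]

print(len(codes))           # total number of observations
print(codes[0], codes[-1])  # first and last coded observation
cell_sizes = Counter(codes)  # observations per AGE x CONDITION cell
```

Every one of the 10 cells should contain exactly 10 coded rows; a count that differs is a sign of a data entry slip.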

To compute the relevant statistics - a simple approach,

1. Select [Statistics => General Linear Model => Simple Factorial...].
2. Select and move "Scores" into the Dependent: box.
3. Select and move "Age" into the Factor(s): box.
4. Click on [Define Range...] to specify the range of coding for the Age factor. Recall that 1 is used for Old and 2 is used for Young. So, the Minimum: value is 1, and the Maximum: value is 2. Click on [Continue].
5. Select and move "Conditio" into the Factor(s): box.
6. Click on [Define Range...] to specify the range of the Condition factor. In this case the Minimum: value is 1 and the Maximum: value is 5.
7. By clicking on the [Options...] button one has the opportunity to select the Method used. According to the online help,

"Method: Allows you to choose an alternate method for decomposing sums of squares. Method selection controls how the effects are assessed."

For our purposes, selecting the Hierarchical or the Experimental method will make available the option to output Means and counts. Note: I don't know the details of these methods; however, they are probably documented.

8. Under [Options...] activate Hierarchical or Experimental, then activate Means and counts, and click [Continue].
9. Click on [OK] to generate the output.

As you can see, the use of the Means and counts option produces a nice summary table, with all the Variable Labels and Value Labels that were incorporated into the datasheet. Again, the use of those options makes the output a great deal more readable.

The output is a complete source table with the factors identified with Variable Labels. As noted earlier, the analysis that was just conducted is the simplest approach to performing a Factorial ANOVA. If one uses [Statistics => General Linear Model => GLM - General Factorial...], then more options become available. The specification of the Dependent and Independent factors is the same as the method used for the Simple Factorial analysis. Beyond that, the options include,

By selecting [Model...], one can specify a Custom model. The default is a Fully Factorial model; however, with the Custom option one can explicitly determine the effects to look at.
The Contrasts option allows one to "test the differences among the levels of a factor" (see the manual for greater detail).
Various graphs can be specified with the [Plots...] option. For example, one can plot "Conditio" on the Horizontal Axis:, and "Age" on Separate Lines:, to generate a simple "conditio*age" plot (see the dialog box for [Plots...]),

The standard post-hoc tests for each factor can be calculated by selecting the desired options under [Post Hoc...]. All one has to do is select the factors to analyze and the appropriate post-hoc(s). The [Options...] dialog box provides a number of diagnostic and descriptive features. One can generate descriptive statistics, estimates of effect size, and tests for homogeneity of variance - among others. An example source table using some of these options would look like the following,

The use of the GLM - General Factorial procedure offers a great deal more than the Simple Factorial. Depending on your needs, the former procedure may provide greater insight into your data. Explore these options! Higher order factorial designs are carried out in the same manner as the two factor analysis presented above. One need only code the factors appropriately, and enter the corresponding observations. Repeated measures designs will be discussed in the next section.

One-Sample z-test

Requirements: Normally distributed population, σ known

Test for population mean

Hypothesis test

Formula: $z = \dfrac{\bar{x} - \Delta}{\sigma / \sqrt{n}}$

where $\bar{x}$ is the sample mean, Δ is a specified value to be tested, σ is the population standard deviation, and n is the size of the sample. Look up the significance level of the z-value in the standard normal table (Table in Appendix ).

Example 1 A herd of 1,500 steer was fed a special high-protein grain for a month. A random sample of 29 were weighed and had gained an average of 6.7 pounds. If the standard deviation of weight gain for the entire herd is 7.1, test the hypothesis that the average weight gain per steer for the month was more than 5 pounds.

null hypothesis: H0: μ = 5
alternative hypothesis: Ha: μ > 5

$z = \dfrac{6.7 - 5}{7.1 / \sqrt{29}} = \dfrac{1.7}{1.318} = 1.289$

Tabled value for z ≤ 1.28 is 0.8997

1 – 0.8997 = 0.1003

So, the conditional probability that a sample from the herd gains at least 6.7 pounds per steer is p = 0.1003. Should the null hypothesis that the mean weight gain for the population is 5 pounds be rejected? That depends on how conservative you want to be. If you had decided beforehand on a significance level of p < 0.05, the null hypothesis could not be rejected.

Example 2 In national use, a vocabulary test is known to have a mean score of 68 and a standard deviation of 13. A class of 19 students takes the test and has a mean score of 65. Is the class typical of others who have taken the test? Assume a significance level of p < 0.05. There are two possible ways that the class may differ from the population. Its scores may be lower than, or higher than, the population of all students taking the test; therefore, this problem requires a two-tailed test. First, state the null and alternative hypotheses:

null hypothesis: H0: μ = 68
alternative hypothesis: Ha: μ ≠ 68

Because you have specified a significance level, you can look up the critical z-value in Table of Appendix before computing the statistic. This is a two-tailed test, so the 0.05 must be split such that 0.025 is in the upper tail and another 0.025 in the lower. The z-value that corresponds to a lower-tail area of 0.025 is –1.96, which is the lower critical z-value. The upper value corresponds to 1 – 0.025, or 0.975, which gives a z-value of 1.96. The null hypothesis of no difference will be rejected if the computed z statistic falls outside the range of –1.96 to 1.96. Next, compute the z statistic:

$z = \dfrac{65 - 68}{13 / \sqrt{19}} = \dfrac{-3}{2.982} = -1.006$

Because –1.006 is between –1.96 and 1.96, the null hypothesis that the population mean is 68 cannot be rejected. That is, there is no evidence that this class can be considered different from others who have taken the test.

Confidence interval for population mean using z

Formula: $(a, b) = \bar{x} \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$

where a and b are the limits of the confidence interval, $\bar{x}$ is the sample mean, $z_{\alpha/2}$ is the upper (or positive) z-value from the standard normal table corresponding to half of the desired alpha level (because all confidence intervals are two-tailed), σ is the population standard deviation, and n is the size of the sample.

Example 3 A sample of 12 machine pins has a mean diameter of 1.15 inches, and the population standard deviation is known to be 0.04. What is a 99 percent confidence interval for the mean pin diameter in the population? First, determine the z-value. A 99 percent confidence level is equivalent to p < 0.01. Half of 0.01 is 0.005. The z-value corresponding to an upper-tail area of 0.005 is 2.58. The interval may now be calculated:

$(a, b) = 1.15 \pm 2.58 \cdot \dfrac{0.04}{\sqrt{12}} = 1.15 \pm 0.03$

The interval is (1.12, 1.18). We have 99 percent confidence that the population mean of pin diameters lies between 1.12 and 1.18 inches. Note that this is not the same as saying that 99 percent of the machine pins have diameters between 1.12 and 1.18 inches, which would be an incorrect conclusion from this test.

Choosing a sample size

Because surveys cost money to administer, researchers often want to calculate how many subjects will be needed to determine a population mean using a fixed confidence interval and significance level. The formula is

$n = \left( \dfrac{2 \, z_{\alpha/2} \, \sigma}{w} \right)^2$

where n is the number of subjects needed, $z_{\alpha/2}$ is the critical z-value corresponding to the desired significance level, σ is the population standard deviation, and w is the desired confidence interval width.

Example 4 How many subjects will be needed to find the average age of students at Fisher College plus or minus a year, with a 95 percent confidence level and a population standard deviation of 3.5?

$n = \left( \dfrac{2 \cdot 1.96 \cdot 3.5}{2} \right)^2 = 6.86^2 = 47.06$

Rounding up, a sample of 48 students would be sufficient to determine students' mean age plus or minus one year. Note that the confidence interval width is always double the “plus or minus” figure.

The T-Test

The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis for the posttest-only two-group randomized experimental design.
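Before moving further into the t-test, the z-based arithmetic from the steer, vocabulary-test, machine-pin, and sample-size examples above can be checked numerically. The following Python sketch uses only the standard library; the function names are illustrative choices. Because it uses exact normal probabilities rather than a printed table, the steer p-value comes out slightly below the tabled 0.1003, which was read off at z = 1.28.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """One-sample z: distance from the hypothesized mean in standard-error units."""
    return (sample_mean - hypothesized_mean) / (sigma / math.sqrt(n))


def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def sample_size(sigma, width, confidence):
    """Subjects needed for an interval of total width `width` (twice the +/- figure)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((2 * z * sigma / width) ** 2)


# Steer example: one-tailed test of H0: mu = 5 against Ha: mu > 5.
z_steer = z_statistic(6.7, 5, 7.1, 29)
p_steer = 1 - STD_NORMAL.cdf(z_steer)

# Vocabulary example: two-tailed test of H0: mu = 68.
z_vocab = z_statistic(65, 68, 13, 19)
p_vocab = 2 * STD_NORMAL.cdf(-abs(z_vocab))

# Machine-pin example: 99 percent confidence interval.
low, high = z_confidence_interval(1.15, 0.04, 12, 0.99)

# Sample-size example: plus or minus one year means a total width of 2.
n_needed = sample_size(3.5, 2, 0.95)

print(round(z_steer, 3), round(p_steer, 4))
print(round(z_vocab, 3), round(p_vocab, 4))
print(round(low, 2), round(high, 2), n_needed)
```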

Figure 1. Idealized distributions for treated and comparison group posttest values.

Figure 1 shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the idealized distribution -- the actual distribution would usually be depicted with a histogram or bar graph. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether the means are statistically different.

What does it mean to say that the averages for two groups are statistically different? Consider the three situations shown in Figure 2. The first thing to notice about the three situations is that the difference between the means is the same in all three. But you should also notice that the three situations don't look the same -- they tell very different stories. The top example shows a case with moderate variability of scores within each group. The second situation shows the high variability case. The third shows the case with low variability. Clearly, we would conclude that the two groups appear most different or distinct in the bottom or low-variability case. Why? Because there is relatively little overlap between the two bell-shaped curves. In the high variability case, the group difference appears least striking because the two bell-shaped distributions overlap so much.

Figure 2. Three scenarios for differences between means.

This leads us to a very important conclusion: when we are looking at the differences between scores for two groups, we have to judge the difference between their means relative to the spread or variability of their scores. The t-test does just this.

Statistical Analysis of the t-test

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference. Figure 3 shows the formula for the t-test and how the numerator and denominator are related to the distributions.

Figure 3. Formula for the t-test.

The top part of the formula is easy to compute -- just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in that group. We add these two values and then take their square root. The specific formula is given in Figure 4:

Figure 4. Formula for the Standard error of the difference between the means.

Remember that the variance is simply the square of the standard deviation. The final formula for the t-test is shown in Figure 5:

Figure 5. Formula for the t-test.
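Since the formula figures are described only in words here, a short Python sketch may help. It follows the description above literally: the standard error of the difference is the square root of each group's variance divided by its group size, summed, and t is the difference between the means divided by that standard error. The data are hypothetical and the function name `t_statistic` is an illustrative choice; the degrees of freedom use the sum of both group sizes minus 2.

```python
import math
from statistics import mean, variance


def t_statistic(treated, control):
    """t = difference between the group means / standard error of the difference."""
    standard_error = math.sqrt(variance(treated) / len(treated)
                               + variance(control) / len(control))
    t = (mean(treated) - mean(control)) / standard_error
    df = len(treated) + len(control) - 2  # persons in both groups minus 2
    return t, df


# Hypothetical posttest scores, for illustration only.
treated = [5, 7, 9]
control = [2, 4, 6]

t, df = t_statistic(treated, control)
print(round(t, 4), df)
```

Swapping the two arguments flips the sign of t but leaves its size unchanged, which is why the order of the groups matters only for interpretation.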

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value you have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding. To test the significance, you need to set a risk level (called the alpha level). In most social research, the "rule of thumb" is to set the alpha level at .05. This means that five times out of a hundred you would find a statistically significant difference between the means even if there was none (i.e., by "chance"). You also need to determine the degrees of freedom (df) for the test. In the t-test, the degrees of freedom is the sum of the persons in both groups minus 2. Given the alpha level, the df, and the t-value, you can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether the t-value is large enough to be significant. If it is, you can conclude that the means for the two groups are different (even given the variability). Fortunately, statistical computer programs routinely print the significance test results and save you the trouble of looking them up in a table.

For a comparison of two groups, the t-test, one-way Analysis of Variance (ANOVA) and a form of regression analysis are mathematically equivalent and would yield identical results.

The Analysis Of Variance, popularly known as the ANOVA, can be used in cases where there are more than two groups. When we have only two samples we can use the t-test to compare the means of the samples, but it might become unreliable in the case of more than two samples. If we only compare two means, then the t-test (independent samples) will give the same results as the ANOVA.

It is used to compare the means of more than two samples. This can be understood better with the help of an example.

One Way Anova

EXAMPLE: Suppose we want to test the effect of five different exercises. For this, we recruit 20 men and assign each type of exercise to 4 men (5 groups). Their weights are recorded after a few weeks. We can find out whether the effects of these exercises are significantly different by comparing the weights of the 5 groups of 4 men each. The example above is a case of one-way balanced ANOVA. It is termed one-way because there is only one category whose effect has been studied, and balanced because the same number of men has been assigned to each exercise. Thus the basic idea is to test whether the samples are all alike or not.

Why Not Multiple T-Tests?

As mentioned above, the t-test can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using many t-tests. But each additional test carries its own chance of a false positive, so conducting many t-tests inflates the overall risk of finding a difference that is not really there; in such circumstances we use ANOVA. Thus, this technique is used whenever an alternative procedure is needed for testing hypotheses concerning means when there are several populations.

One Way and Two Way Anova

Now some questions may arise as to what means we are talking about and why variances are analyzed in order to derive conclusions about means. The whole procedure can be made clear with the help of an experiment. Let us study the effect of fertilizers on the yield of wheat. We apply five fertilizers, each of different quality, to five plots of wheat each. The yield from each plot of land is recorded and the difference in yield among the plots is observed. Here, fertilizer is a factor and the different qualities of fertilizers are called levels. This is a case of one-way or one-factor ANOVA since there is only one factor, fertilizer.

We may also be interested in studying the effect of the fertility of the plots of land. In such a case we would have two factors, fertilizer and fertility. This would be a case of two-way or two-factor ANOVA. Similarly, a third factor may be incorporated to have a case of three-way or three-factor ANOVA.
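Returning briefly to the question of multiple t-tests raised above, the size of the problem can be illustrated with a little arithmetic. The Python sketch below counts the pairwise comparisons among a set of group means and estimates the chance of at least one false positive. It assumes, as a simplification, that the tests are independent, which pairwise t-tests on shared groups are not, so the figure is a rough guide only; the function name is illustrative.

```python
from math import comb


def familywise_error(groups, alpha=0.05):
    """Number of pairwise comparisons and the approximate chance of
    at least one false positive, assuming independent tests."""
    comparisons = comb(groups, 2)
    return comparisons, 1 - (1 - alpha) ** comparisons


comparisons, rate = familywise_error(5)
print(comparisons, round(rate, 3))
```

With five groups, as in the exercise example, there are ten pairwise comparisons, and under this simplification the chance of at least one spurious "significant" result is far above the nominal .05 of a single test.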

Chance Cause and Assignable Cause

In the above experiment the yields obtained from the plots may be different and we may be tempted to conclude that the differences exist due to the differences in quality of the fertilizers. But this difference may also be the result of certain other factors which are attributed to chance and which are beyond human control. This factor is termed “error”. Thus, the differences or variations that exist within a plot of land may be attributed to error. Thus, estimates of the amount of variation due to assignable causes (or variance between the samples) as well as due to chance causes (or variance within the samples) are obtained separately and compared using an F-test, and conclusions are drawn using the value of F.

Assumptions

There are four basic assumptions used in ANOVA.
1. The expected values of the errors are zero.
2. The variances of all errors are equal to each other.
3. The errors are independent.
4. The errors are normally distributed.
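The comparison of between-sample and within-sample variance described above can be written out in a few lines. This Python sketch computes the one-way ANOVA F ratio from a list of groups; the numbers are hypothetical and the function name `one_way_anova` is an illustrative choice. It returns F together with its two degrees of freedom, and leaves the lookup of the p-value to a table or a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation due to assignable causes: group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation due to chance causes: observations around their own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)

    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical yields for three groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_ratio, df_between, df_within = one_way_anova(groups)
print(round(f_ratio, 3), df_between, df_within)
```

A large F means the group means differ by more than the within-group scatter would lead one to expect by chance.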

SPSS for Windows

A brief tutorial

This tutorial is a brief look at what SPSS for Windows is capable of doing. Examples will come from Statistical Methods for Psychology by David C. Howell. It is not our intention to teach you about statistics in this tutorial. For that you should rely on your classes in statistics and/or a good textbook. If you're a novice, this tutorial should give you a feel for the programme and how to navigate through the many options. Beyond that, the SPSS Help Files should be used as a resource. Further, SPSS sells a number of very good manuals.

The Basics

SPSS for Windows has the same general look and feel as most other programmes for Windows. Virtually any statistic that you wish to calculate can be obtained by pointing and clicking on the menus and various interactive dialog boxes. You may have noted that the examples in the Howell textbook are performed/analyzed via code. That is, SPSS, like many other packages, can be accessed by programming short scripts, instead of pointing and clicking. We will not cover any programming in this tutorial.

Presumably, SPSS is already installed on your computer. If you don't have a shortcut on your desktop, go to the [Start => Programs] menu and start the package by clicking on the SPSS icon.

Before proceeding I should say a few words about a very simple convention that will be used in this tutorial. In this point and click environment one often has to navigate through many layers of menu items before encountering the required option. In the above paragraph the prescribed task was to locate the SPSS icon in the [Start] menu structure. To get to that icon, one must first click on [Start], then move the pointer to the [Programs] option, before locating the SPSS icon. This sequence of events can be conveyed by typing [Start => Programs]. That is, one must move from the outer layer of the menu structure to some inner layer in sequence.... Now, back to the tutorial.

Once you've clicked on the SPSS icon a new window will appear on the screen. The appearance is that of a standard programme for Windows with a spreadsheet-like interface.

As you can see, there are a number of menu options relating to statistics, on the menu bar. There are also shortcut icons on the toolbar. These serve as quick access to often used options. Holding your mouse over one of these icons for a second or two will result in a short function description for that icon. The current display is that of an empty data sheet. Clearly, data can either be entered manually, or it can be read from an existing data file.

Browsing the file menu, below, reveals nothing too surprising - many of the options are familiar, although the details are specific to SPSS. For example, the [New] option is used to specify the type of window to open. The various options under the [New] heading are,

[Data] Default window with a blank data sheet ready for analyses.
[Syntax] One can write scripts like those present in the Howell text, instead of using the menus. See the SPSS manuals for help on this topic.
[Output] Whenever a procedure is run, the output is directed to a separate window. One can also have multiple [Output] windows open to organize the various analyses that might be conducted. Later, these results can be saved and/or printed.
[Script] This window provides the opportunity to write full-blown programmes in a BASIC-like language. These programmes have access to the functions that make up SPSS. With such access it is possible to write user-defined procedures - those not part of SPSS - by taking advantage of the SPSS functions. Again, this is beyond the scope of this tutorial.

Also present in the [File] menu are two separate avenues for reading data from existing files. The first is the [Open] option. Like other application packages (e.g., WordPerfect, Excel, ....) SPSS also has it's own format for saving data. In this case, the accepted extension for any file saved using the proprietary format is "sav". So, one can have a datafile saved as "data1.sav". Anyways, this format is not readable with a text editor, it is a binary format. The benefits are that all formatting changes are maintained and the file can be read faster, hence the [Open] option. It is specifically meant for files saved in the SPSS format. The second option, [Read ASCII Data], as the name suggests is to read files that are saved in ASCII format. As can be seen, there are two choices - [Freefield] and [Fixed Columns]. Clicking on one of these options will produce a dialog box. One must specify a number of parameters before a file can be read successfully.

Reading ASCII files requires that the user know something about the format of the data file. Otherwise, one is likely to get stuck in the process of reading, or the result may be a costly error. The more restrictive format is [Fixed Columns]. One must know how many variables there are, whether each variable is in numeric or string format, and the first and last column of each variable. For example, think of the following as an excerpt from an ASCII datafile.

male    37 102
male    22 115
male    27  99
....    .. ...
female  48 107
female  21 103
female  28 122
......  .. ...

An examination of the datafile provides several key pieces of information,

1. There are 3 variables.
2. Variable 1 is a string; Variables 2 and 3 are numeric.
3. Variable 1: first column=1, last column=6
o Notice that none of the columns overlap. The longest value for Variable 1 is the name "female", which spans from the first column to the sixth - or, the letter e. As you can see, one has to manually locate the first and last column of each variable.
4. Variable 2: first column=9, last column=10
5. Variable 3: first column=12, last column=14

One needs all of the above information, in addition to a name for each of the three variables. It is a highly structured way of setting up and describing the data. For such files I would suggest becoming comfortable with a good text editor. Failing that, you may wish to try Notepad or WordPad in Win95, but ensure that you save as a textfile with WordPad. A full-fledged word processor like Word or WordPerfect will also work, provided that you remember to save as a textfile. These same editors will allow you to figure out the column locations for each of the variables.

The [Freefield] option is less restrictive. Essentially, the columns can be ragged (i.e., overlapping). One need only preserve the order of each variable across all of the cases.

male 37 102
male 22 115
male 27 99
.... .. ...
female 48 107
female 21 103
female 28 122
...... .. ...
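To make the difference between the two layouts concrete, here is a minimal Python sketch of what a reader has to do for each one. It is not how SPSS reads files internally; the function names `read_fixed` and `read_freefield`, the variable names `sex`, `age` and `score`, and the three sample records are all assumptions made for illustration. The column positions are the ones worked out above (1-6, 9-10, 12-14).

```python
# Hypothetical sample records. The fixed layout puts variable 1 in columns 1-6,
# variable 2 in columns 9-10 and variable 3 in columns 12-14.
FIXED_TEXT = (
    "male    37 102\n"
    "male    27  99\n"
    "female  48 107\n"
)
# The same records in ragged (freefield) form: only the order of values matters.
FREEFIELD_TEXT = "male 37 102\nmale 27 99\nfemale 48 107\n"

# (name, first column, last column, converter); columns are 1-based and inclusive.
LAYOUT = [("sex", 1, 6, str), ("age", 9, 10, int), ("score", 12, 14, int)]


def read_fixed(text, layout=LAYOUT):
    """Slice each line at the stated column positions."""
    records = []
    for line in text.splitlines():
        record = {}
        for name, first, last, convert in layout:
            field = line[first - 1:last].strip()  # convert 1-based columns to a slice
            record[name] = convert(field)
        records.append(record)
    return records


def read_freefield(text, names=("sex", "age", "score"), converters=(str, int, int)):
    """Split each line on whitespace; values are matched to names by position."""
    records = []
    for line in text.splitlines():
        values = line.split()
        records.append({n: c(v) for n, c, v in zip(names, converters, values)})
    return records


fixed_records = read_fixed(FIXED_TEXT)
free_records = read_freefield(FREEFIELD_TEXT)
print(fixed_records[1])
print(fixed_records == free_records)
```

The fixed-column reader needs the full layout table, while the freefield reader needs only the variable order and types, which is why the freefield dialog asks for less information.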

Experiment with creating datafiles and reading them with this method. As for the SPSS format, there are a large number of sample datafiles included in your package. Just click on [Open] and find the SPSS home directory. Make sure the filetype in the dialog box associated with [Open] is set to "*.sav" - the default.

Before we move on to actual data, click on [Statistics]. The menu that appears reveals the many classes of statistics available for use. Each class is further subdivided into other options, as denoted by the little arrow at the right side of the menu selector. Explore what is offered by moving your mouse over the various procedures listed.

Data

To begin the process of adding data, just click on the first cell, located in the upper left corner of the datasheet. It's just like a spreadsheet. You can enter your data as shown. Enter each datapoint, then hit [Enter]. Once you're done with one column of data you can click on the first cell of the next column. These data are taken from Table 2.1 in Howell's text. The first column represents "Reaction Time in 100ths of a second" and the second column indicates "Frequency".

If you're entering data for the first time, like the above example, the variable names will be automatically generated (e.g., var00001, var00002,....). They are not very informative. To change these names, click on the variable name button. For example, double click on the "var00001" button. Once you have done that, a dialog box will appear. The simplest option is to change the name to something meaningful. For instance, replace "var00001" in the textbox with "RT" (see figure below).

In addition to changing the variable name one can make changes specific to [Type], [Labels], [Missing Values], and [Column Format].

[Type] One can specify whether the data are in numeric or string format, in addition to a few more formats. The default is numeric format.

[Labels] Using the labels option can enhance the readability of the output. A variable name is limited to a length of 8 characters; however, by using a variable label the length can be as much as 256 characters. This provides the ability to have very descriptive labels that will appear in the output. Often, there is a need to code categorical variables in numeric format. For example, male and female can be coded as 1 and 2, respectively. To reduce confusion, it is recommended that one use value labels. For the example of gender coding, Value: 1 would have a corresponding Value Label: male. Similarly, Value: 2 would be coded with Value Label: female. (Click on the [Labels] button to verify the above.)

[Missing Values] See the accompanying help. This option provides a means to code for various types of missing values.

[Column Format] The column format dialog provides control over several features of each column (e.g., width of column).

The next image reflects the variable name change.

Once data have been entered or modified, it is advisable to save. In fact, save as often as possible [File => SaveAs].

SPSS offers a large number of possible formats, including its own. A list of the available formats can be viewed and selected by clicking on the Save as type: menu in the SaveAs dialog box. If your intention is to work only in SPSS, then there may be some benefit to saving in the SPSS (*.sav) format. I assume that this format allows for faster reading and writing of the data file. However, if your data will be analyzed and examined in other packages (e.g., a spreadsheet), it would be advisable to save in a more universal format (e.g., Excel (*.xls), 1-2-3 Rel 3.0 (*.wk3)).

Once the type of file has been selected, enter a filename, minus the extension (e.g., sav, xls). You should also save the file in a meaningful directory, on your harddrive or floppy. That is, for any given project a separate directory should be created. You don't want your data to get mixed-up.

The process of reading already saved data can be painless if the saved format is in the SPSS or a spreadsheet format. All one has to do is,

o click on [File => New => Data]
o click on [File => Open] : a dialog box will appear
o navigate to the desired directory using the Look in: menu at the top of the dialog box
o select the file type in the Files of type menu
o click on the filename that is needed.

The process of reading existing files is slightly more involved if the format is ASCII/plain text (see the earlier description of [Freefield] and [Fixed Columns]). As an example, the ASCII data from Table 2.1 in the Howell text will be used. A file containing the data should be included on the disk accompanying the text. [Note: It was not present on my disk, so I downloaded the file from Howell's webpage.] I've placed the files on my harddrive at c:\ascdat. In the case of this set of data, there are four columns representing observation number, reaction time, setsize, and the presence or absence of the target stimulus. This information can be found in the readme.txt file that is also on the disk. Typically, we are aware of the contents of our own data files; however, it doesn't hurt to keep a record of the contents of such files. To make life easier, [File => Read ASCII Data => Freefield] will be used.

The resulting dialog box requires that a File, a Name and a Data Type be specified for each variable, or column of data. The desired file is accessed by clicking on the [Browse] button, and then navigating to the desired location. Since the extension for the sought-after file is dat, there is no need to change the Files of type: selection. However, if the extension is something else (e.g., *.txt) then it would be necessary to select All files (*.*) from the Files of type: menu. Since there are 4 variables in this data set, 4 names with the corresponding type information must be specified. To Add the first variable, observations, to the list,

o type "obs" in the Name box
o the Data Type is set to Numeric by default. If "obs" were a string variable, then one would have to click on String
o click on the Add button to include this variable in the list.
o repeat the above procedure with new names and data types for each of the remaining variables. It is important that all variables be added to the list. Otherwise, the data will be scrambled.

(Please explore the various options by clicking on any accessible menu item.)

The resulting data file appears in the data editor like the following.

Descriptive Statistics

We can replicate the frequency analyses that are described in chapter 2 of the text by using the file that was just read into the data editor - tab2-1.dat. These analyses were conducted on the reaction time data. Recall that we have labelled these data RT.

To begin, click on [Statistics=>Summarize=>Frequencies]....

The result is a new dialog box that allows the user to select the variables of interest. Also, note the other clickable buttons along the border of the dialog box. The buttons labelled [Statistics...] and [Charts...] are of particular importance. Since we're interested in the reaction time data, click on rt followed by a mouse click on the arrow pointing right. The consequence of this action is a transference of the rt variable to the Variables list. At this point, clicking on the [OK] button would spawn an output window with the Frequency information for each of the reaction times. However, more information can be gathered by exploring the options offered by the [Statistics...] and [Charts...].

[Statistics...] offers a number of summary statistics. Any statistic that is selected will be summarized in the output window.

As for the options under [Charts...] click on Bar Charts to replicate the graph in the text.

Once the options have been selected, click on [OK] to run the procedure. The results are then displayed in an output window. In this particular instance the window will include summary statistics for the variable RT, the frequency distribution, and the bar chart. You can see all of this by scrolling down the window. The results should also be identical to those in the text.
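For readers who want to see what the Frequencies procedure is tabulating, the following Python sketch builds a frequency table and a few summary statistics by hand. The reaction times in `rt` are made-up values, not the data from Table 2.1, and the column layout of the printed table is only an approximation of the SPSS output.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical reaction times in 100ths of a second (NOT the Table 2.1 data).
rt = [36, 37, 37, 38, 38, 38, 39, 40, 40, 45]

freq = Counter(rt)  # value -> number of occurrences
n = len(rt)

# Frequency table: value, count, percent, cumulative percent.
cumulative = 0.0
print(f"{'RT':>4} {'Freq':>5} {'Percent':>8} {'Cum %':>7}")
for value in sorted(freq):
    percent = 100 * freq[value] / n
    cumulative += percent
    print(f"{value:>4} {freq[value]:>5} {percent:>8.1f} {cumulative:>7.1f}")

# Measures of central tendency and dispersion, as offered under [Statistics...].
summary = {
    "n": n,
    "mean": mean(rt),
    "median": median(rt),
    "mode": mode(rt),
    "std_dev": stdev(rt),  # sample standard deviation (n - 1 denominator)
}
print(summary)
```

Comparing a hand computation like this against the output window is a quick way to confirm that the data were read in correctly.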

You may have gathered from the above that calculating summary statistics requires nothing more than selecting variables, and then selecting the desired statistics. The frequency example allowed us to generate frequency information plus measures of central tendency and dispersion. These statistics can also be had by clicking directly on [Statistics=>Summarize=>Descriptives]. Not surprisingly, another dialog box is attached to this procedure. To control the type of statistics produced, click on the [Options...] button. Once again, the options include the typical measures of central tendency and dispersion.

Each time a statistical procedure is run, like [Frequencies...] and [Descriptives...], the results are posted to an Output Window. If several procedures are run during one session the results will be appended to the same window. However, greater organization can be achieved by opening new Output windows before running each procedure - [File=>New=>Output]. Further, the contents of each of these windows can be saved for later review or, in the case of charts, saved to be included later in formatted documents.

[Explore by left mouse clicking on any of the output objects (e.g., a frequency table, a chart, ...) followed by a right button click. The left button click will highlight the desired object, while the right button click will pop up a new menu. The next step is to click on the copy option. This action will store the object on the clipboard so that it can be pasted into a Word for Windows document, for example.]

Chi-Square & T-Test

The computation of the Chi-Square statistic can be accomplished by clicking on [Statistics => Summarize => Crosstabs...]. This particular procedure will be your first introduction to the coding of data in the data editor. To this point data have been entered in a column format, that is, one variable per column. However, that method is not sufficient in a number of situations, including the calculation of Chi-Square, Independent T-tests, and any Factorial ANOVA design with between-subjects factors. I'm sure there are many other cases, but they will not be covered in this tutorial. Essentially, the data have to be entered in a specific format that makes the analysis possible. The format typically reflects the design of the study, as will be demonstrated in the examples. In your text, the following data appear in section 6.????. Please read the text for a description of the study. Essentially, the table - below - includes the observed data and the expected data in parentheses.

Fault        Low            High           Total
Guilty       153 (127.559)  105 (130.441)  258
Not Guilty   24 (49.441)    76 (50.559)    100
Total        177            181            358

In the hopes of minimizing the load time for the remaining pages, I will make use of the built-in table facility of HTML to simulate the Data Editor in SPSS. This will reduce the number of images/screen captures to be loaded. For the Chi-Square statistic, the table of data can be coded by indexing the column and row of the observations. For example, the count for being Guilty with Low fault is 153. This specific cell can be indexed as coming from row=1 and column=1. Similarly, Not Guilty with High fault is coded as row=2 and column=2. For each observation, four in this instance, there is a unique code for its location in the table. These can be entered as follows,

Row  Column  Count
1    1       153
1    2       105
2    1       24
2    2       76

So, 2 rows * 2 columns equals 4 observations. That should be clear. Each of the 2 rows is paired with each of the 2 columns, and each pairing has an entry in the Count column. The Count column represents the number of times each unique combination of Row and Column occurs.
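The arithmetic that the weighted Crosstabs analysis performs on this coding can be sketched in a few lines of Python. This is an illustration of the standard Pearson chi-square formula, not SPSS's own code; the cell counts follow the observed table above (row 1 = Guilty, row 2 = Not Guilty; column 1 = Low fault, column 2 = High fault), and the variable names are assumptions.

```python
# One record per cell of the 2 x 2 table: (row code, column code, count).
# Row 1 = Guilty, row 2 = Not Guilty; column 1 = Low fault, column 2 = High fault.
cases = [(1, 1, 153), (1, 2, 105), (2, 1, 24), (2, 2, 76)]

# "Weighting" by the count: each record stands for that many observations.
observed = {}
row_totals = {}
col_totals = {}
for row, col, count in cases:
    observed[(row, col)] = observed.get((row, col), 0) + count
    row_totals[row] = row_totals.get(row, 0) + count
    col_totals[col] = col_totals.get(col, 0) + count
total = sum(row_totals.values())

# Expected count for a cell = (row total * column total) / grand total.
expected = {
    (r, c): row_totals[r] * col_totals[c] / total
    for r in row_totals
    for c in col_totals
}

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[cell] - expected[cell]) ** 2 / expected[cell] for cell in observed
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"expected Guilty/Low = {expected[(1, 1)]:.3f}")
print(f"chi-square = {chi_square:.2f} on {df} df")
```

If the counts were not used as weights, every cell would contribute a single observation, which is the symptom described for a forgotten Weight Cases step.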

The above presents the data in an unambiguous manner. Once entered, the analysis is a matter of selecting the desired menu items, and perhaps selecting additional options for that statistic. [Don't forget to use the labelling facilities, as mentioned earlier, to meaningfully identify the columns/variables. The labels that are chosen will appear in the output window.] To perform the analysis,

The first step is to inform SPSS that the COUNT variable represents the frequency for each unique coding of ROW and COLUMN, by invoking the WEIGHT command. To do this, click on [Data => Weight Cases]. In the resultant dialog box, enable the Weight cases by option, then move the COUNT variable into the Frequency Variable box. If this step is forgotten, the count for each cell will be 1 for the table.

Now that the COUNT variable has been processed as a weighted variable, select [Statistics => Summarize => Crosstabs...] to launch the controlling dialog box. At the bottom of the dialog box are three buttons, with the most important being the [Statistics...] button. You must click on the [Statistics...] button and then select the Chi-square option, otherwise the statistic will not be calculated. Exploring this dialog box makes it clear that SPSS can be asked to calculate a number of other statistics in conjunction with Chi-square. For example, one can select the various measures of association (e.g., contingency coefficient, phi and Cramer's V, ...), among others. Move the ROW variable into the Row(s): box, and the COLUMN variable into the Column(s): box, then click [OK] to perform the analysis. A subset of the output looks like the following,

Although simple, the calculation of the Chi-square statistic is very particular about all the required steps being followed. More generally, as we enter hypothesis testing, the user should be very careful and should make use of manuals for the programme and textbooks for statistics.

T-tests

By now, you should know that there are two forms of the t-test, one for dependent variables and one for independent variables, or observations. To inform SPSS, or any stats package for that matter, of the type of design, it is necessary to have two different ways of laying out the data. For the dependent design, the two variables in question must be entered in two columns. For independent t-tests, the observations for the two groups must be uniquely coded with a Group variable. Like the calculation of the Chi-square statistic, these calculations will reinforce the practice of thinking about, and laying out, the data in the correct format.

Dependent T-Test

To calculate this statistic, one must select [Statistics => Compare Means => Paired-Samples T Test...] after entering the data. For this analysis, we'll use the data from Table 7.3, in Howell.

Enter the data into a new datafile. Your data should look a bit like the following. That is, the two variables should occupy separate columns...

Mnths_6 124 94 115 110 116 139 116 110 129 120 105 88 120 120 116 105 ... ... 123

Mnths_24 114 88 102 2 2 2 2 2 2 2 2 2 2 2 2 2 ... ... 132

Note that the variable names start with a letter and are no more than 8 characters long. This is a bit constraining; however, one can use the variable label option to label the variable with a longer name. This more descriptive name will then be reproduced in the output window.

To calculate the t statistic click on [Statistics => Compare Means => Paired-Samples T Test...], then select the two variables of interest. To select the two variables, hold the [Shift] key down while using the mouse for selection. You will note that the selection box requires that variables be selected two at a time. Once the two variables have been selected, move them to the Paired Variables: list. This procedure can be repeated for each pair of variables to be analyzed. In this case, select MNTHS_6 and MNTHS_24 together, then move them to the Paired Variables list. Finally, click the [OK] button. The critical result for the current analysis will appear in the output window as follows,

As you can see, an exact t-value is provided along with an exact p-value, and this p-value is greater than the cutoff of 0.025, for a two-tailed assessment. Closer examination indicates that several other statistics are presented in the output window. Quite simply, such calculations require very little effort!

Independent T-tests

When calculating an independent t-test, the only difference involves the way the data are formatted in the datasheet. The datasheet must include both the raw data and the group coding for each observation. For this example, the data from Table 7.5 will be used. As an added bonus, the number of observations is unequal for this example. Take a look at the following table to get a feel for how to code the data.

Group  Exp_Con
1      96
1      127
1      127
1      119
1      109
1      143
1      ...
1      ...
1      106
1      109
2      114
2      88
2      104
2      104
2      91
2      96
2      ...
2      ...
2      114
2      132

From the above you can see that we used the "Group" variable to code for the two groups. The value of 1 was used to code for "LBW-Experimental", while a value of 2 was used to code for "LBW-Control". If you're confused, please study the table above. To generate the t-statistic,

o Click on [Statistics => Compare Means => Independent-Samples T Test] to launch the appropriate dialog box.
o Select "exp_con" - the dependent variable - and move it to the Test Variable(s): box.
o Select "group" - the grouping variable - and move it to the Grouping Variable: box.
o The final step requires that the groups be defined. That is, one must specify that Group1 - the experimental group in this case - is coded as 1, and Group2 - the control group in this case - is coded as 2. To do this, click on the [Define Groups...] button.
o Click on the [Continue] button to return to the controlling dialog box.
o Run the analysis by clicking on the [OK] button.

The output for the current analysis, extracted from the output window, looks like the following.

The p-value of .004 is well below the cutoff of 0.025, and that suggests that the means are significantly different. Further, Levene's Test is performed so that the appropriate set of results can be read. In this case the variances are equal; however, the calculations for unequal variances are also presented, along with some other statistics - some not shown here.
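The two data layouts map directly onto the two textbook t formulas, which the Python sketch below spells out. It is only an illustration under stated assumptions: the function names `paired_t` and `independent_t` are invented here, the small data sets are made up (they are not the values from Tables 7.3 or 7.5), and the independent version uses the pooled-variance (equal variances) formula. P-values are left out because the Python standard library has no t distribution; SPSS reports them directly.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(first, second):
    """Dependent design: two columns, one row per subject."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom


def independent_t(records):
    """Independent design: one score column plus a group code (1 or 2)."""
    group1 = [score for group, score in records if group == 1]
    group2 = [score for group, score in records if group == 2]
    n1, n2 = len(group1), len(group2)
    # Pooled variance, i.e. the "equal variances assumed" calculation.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical paired scores (two columns).
t_paired, df_paired = paired_t([10, 12, 9, 11], [8, 11, 7, 10])

# Hypothetical coded scores (group, score) with unequal group sizes.
coded = [(1, 5), (1, 7), (1, 9), (2, 2), (2, 4), (2, 6), (2, 8)]
t_indep, df_indep = independent_t(coded)

print(f"paired: t({df_paired}) = {t_paired:.3f}")
print(f"independent: t({df_indep}) = {t_indep:.3f}")
```

Note how the paired function never needs a group code, while the independent function cannot work without one; that is the reason for the two datasheet layouts.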

Correlations and Regression

This will be a brief tutorial, since there is very little that is required to calculate correlations and linear regressions. To calculate a simple correlation matrix, one must use [Statistics => Correlate => Bivariate...], and [Statistics => Regression => Linear] for the calculation of a linear regression. For this section, the analyses presented in the computer section of the Correlation and Regression chapter will be replicated. To begin, enter the data as follows,

IQ    GPA
102   2.75
108   4.00
109   2.25
118   3.00
79    1.67
88    2.25
...   ...
...   ...
85    2.50

Simple Correlation

Click on [Statistics => Correlate => Bivariate...], then select and move "IQ" and "GPA" to the Variables: list. [Explore the options presented on this controlling dialog box.] Click on [OK] to generate the requested statistics.

The results from the output window should look like the following,

As you can see, r=0.702, and p=.000. The results suggest that the correlation is significant.

Note: In the above example we only created a correlation matrix based on two variables. The process of generating a matrix based on more than two variables is not different. That is, if the dataset consisted of 10 variables, they could have all been placed in the Variables: list. The resulting matrix would include all the possible pairwise correlations.
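The pairwise nature of the correlation matrix is easy to see in a small Python sketch. The helper names `pearson_r` and `correlation_matrix` are assumptions, and the four IQ/GPA pairs are made-up values rather than the data set entered above, so the printed r will not match the 0.702 reported by SPSS.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict of {variable name: values}."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a in columns
        for b in columns
    }


# Hypothetical data (not the values from the text).
data = {
    "IQ": [90, 100, 110, 120],
    "GPA": [2.0, 2.5, 2.5, 3.5],
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    print(f"{a:>3} x {b:<3} r = {r:.3f}")
```

Adding a third key to `data` would add a row and a column to the matrix without any other change, which mirrors moving more variables into the Variables: list.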

Correlation and Regression

With linear regression, it is possible to output the regression coefficients needed to predict one variable from the other - the coefficients that minimize error. To do so, one must select the [Statistics => Regression => Linear...] option. Further, there is a need to know which variable will be used as the dependent variable and which will be used as the independent variable(s). In our current example, GPA will be the dependent variable, and IQ will act as the independent variable. Specifically,

o Initiate the procedure by clicking on [Statistics => Regression => Linear...]
o Select and move GPA into the Dependent: variable box
o Select and move IQ into the Independent(s): variable box
o Click on [OK] to generate the statistics.

Note: A variety of options can be accessed via the buttons on the bottom half of this controlling dialog box (e.g., Statistics, Plots, ...). Many more statistics can be generated by exploring the additional options via the Statistics button.

Some of the results of this analysis are presented below,

The correlation is still 0.702, and the p value is still 0.000. The additional statistics are "Constant", or a in the text, and "Slope", or B in the text. If you recall, the dependent variable is GPA in this case. As such, one can predict GPA with the following,

GPA = -1.777 + 0.0448*IQ

The next section will discuss the calculation of the ANOVA.
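As a rough illustration of where the constant and slope come from, and of how the prediction equation is used, here is a short Python sketch. The `fit_line` helper and its four data points are assumptions made for the example (they are not the IQ/GPA data from the text); `predict_gpa` simply plugs an IQ score into the equation reported above.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept (constant) and slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return intercept, slope


def predict_gpa(iq, intercept=-1.777, slope=0.0448):
    """Apply the equation reported in the output: GPA = -1.777 + 0.0448*IQ."""
    return intercept + slope * iq


# Hypothetical data, used only to show the fitting step.
constant, b = fit_line([90, 100, 110, 120], [2.0, 2.5, 2.5, 3.5])
print(f"fitted on made-up data: constant = {constant:.3f}, slope = {b:.4f}")

# Using the coefficients from the SPSS output in the text.
print(f"predicted GPA for IQ 100: {predict_gpa(100):.3f}")
```

The same two numbers appear in the SPSS coefficients table, so a hand prediction like `predict_gpa(100)` is a quick sanity check on how the table is being read.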

One-Way ANOVA

As in the independent t-test datasheet, the data must be coded with a group variable. The data that will be used for the first part of this section is from Table 11.2, of Howell. There are 5 groups of 10 observations each - resulting in a total of 50 observations. The group variable will be coded from 1 to 5, for each group. Take a look at the following to get an idea of the coding.

Groups  Scores
1       9
1       8
1       6
...     ...
1       7
2       7
2       9
2       6
...     ...
...     ...
...     ...
5       10
5       19
...     ...
5       11

The coding scheme uniquely identifies the origin of each observation. To complete the analysis,

o Select [Statistics => Compare Means => One-Way ANOVA...] to launch the controlling dialog box.
o Select and move "Scores" into the Dependent list:
o Select and move "Groups" into the Factor: list
o Click on [OK]

The preceding is a complete specification of the design for this one-way ANOVA. The simple presentation of the results, as taken from the output window, will look like the following,

The analysis that was just performed provides minimal details with regard to the data. If you take a look at the controlling dialog box, you will find 3 additional buttons on the bottom half - [Contrasts...], [Post Hoc..], and [Options...].
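To show what the basic source table contains, the Python sketch below computes the between-groups and within-groups sums of squares and the F ratio from group-coded data. The function name `one_way_anova` and the nine records are assumptions for illustration; they are not the Table 11.2 data, and no p-value is computed because the standard library has no F distribution.

```python
from statistics import mean

# Hypothetical (group code, score) records, laid out like the coded datasheet.
records = [
    (1, 4), (1, 5), (1, 6),
    (2, 7), (2, 8), (2, 9),
    (3, 1), (3, 2), (3, 3),
]


def one_way_anova(records):
    """Return the pieces of a one-way ANOVA source table as a dict."""
    groups = {}
    for group, score in records:
        groups.setdefault(group, []).append(score)

    all_scores = [score for _, score in records]
    grand_mean = mean(all_scores)

    # Between groups: group size times squared distance of group mean from grand mean.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values()
    )
    # Within groups: squared distance of each score from its own group mean.
    ss_within = sum(
        (score - mean(scores)) ** 2 for scores in groups.values() for score in scores
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_ratio,
    }


table = one_way_anova(records)
print(table)
```

The group code does the same job here as the Factor: variable in the dialog box: it is the only thing that tells the procedure which scores belong together.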

Selecting [Options...] you will find,

If Descriptive is enabled, then the descriptive statistics for each condition will be generated. Making Homogeneity-of-variance active forces a Levene's test on the data. The statistics from both of these analyses will be reproduced in the output window. Selecting [Post Hoc] will launch the following dialog box,

One can activate one or multiple post hoc tests to be performed. The results will then be placed in the output window. For example, performing an R-E-G-W F statistic on the current data would produce the following,

Finally, one can use the [Contrasts...] option to specify linear and/or orthogonal sets of contrasts. One can also perform trend analysis via this option. For example, we may wish to contrast the third condition with the fifth,

For each contrast, the coefficients must be entered individually, and in order. One can also enter multiple contrasts by using the [Next] button present in the dialog box. The result for the example contrast would look like the following,

Further, one can use the Polynomial option to test whether a specific trend in the data exists. Factorial designs will be covered in the next section.

Factorial ANOVA

To conduct a Factorial ANOVA one only need extend the logic of the oneway design. Table 13.2 presents the data for a 2 by 5 factorial ANOVA. The first factor, AGE, has two levels, and the second factor, CONDITION, has five levels. So, once again each observation can be uniquely coded.

AGE         CONDITION
Old = 1     Counting = 1
Young = 2   Rhyming = 2
            Adjective = 3
            Imagery = 4
            Intentional = 5

For each pairing of AGE and CONDITION, there are 10 observations. That is, 2*5 conditions by 10 observations per condition results in 100 observations, which can be coded as follows. [Note that the names for the factors are meaningful.]

AGE  CONDITIO  Scores
1    1         9
1    1         8
1    1         6
1    ...       ...
1    1         7
1    2         7
1    2         9
1    2         6
1    ...       ...
1    ...       ...
1    ...       ...
1    5         10
1    5         19
1    ...       ...
1    5         11
2    1         8
2    1         6
2    1         4
2    ...       ...
2    1         7
2    2         10
2    2         7
2    2         8
2    ...       ...
2    ...       ...
2    ...       ...
2    5         21
2    5         19
2    ...       ...
2    5         21

Examine the table carefully, until you understand how the coding has been implemented. Note: one can enhance the readability of the output by using Value Labels for the two factors.
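If the pattern of codes is still unclear, it may help to see it generated mechanically. The Python sketch below produces the AGE and CONDITION codes for all 100 rows in the order used in the table; the scores themselves would still have to be typed in from Table 13.2. The variable names are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# Value labels for the two factors, as given in the coding key above.
AGE_LABELS = {1: "Old", 2: "Young"}
CONDITION_LABELS = {
    1: "Counting",
    2: "Rhyming",
    3: "Adjective",
    4: "Imagery",
    5: "Intentional",
}
PER_CELL = 10  # observations in each AGE x CONDITION cell

# One (AGE, CONDITIO) pair per observation: AGE changes slowest, as in the table.
rows = [
    (age, condition)
    for age, condition in product(AGE_LABELS, CONDITION_LABELS)
    for _ in range(PER_CELL)
]

cell_counts = Counter(rows)

print(len(rows), "rows in total")
for (age, condition), count in sorted(cell_counts.items()):
    print(f"{AGE_LABELS[age]:<5} {CONDITION_LABELS[condition]:<11} {count}")
```

Every one of the 10 cells receives exactly 10 rows, which is a handy check to run on a real datasheet before starting the analysis.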

To compute the relevant statistics - a simple approach,

o Select [Statistics => General Linear Model => Simple Factorial...]
o Select and move "Scores" into the Dependent: box
o Select and move "Age" into the Factor(s): box.
o Click on [Define Range...] to specify the range of coding for the Age factor. Recall that 1 is used for Old and 2 is used for Young. So, the Minimum: value is 1, and the Maximum: value is 2. Click on [Continue].
o Select and move "Conditio" into the Factor(s): box
o Click on [Define Range...] to specify the range of the Condition factor. In this case the Minimum: value is 1 and the Maximum: value is 5.
o By clicking on the [Options...] button one has the opportunity to select the Method used. According to the online help,

"Method: Allows you to choose an alternate method for decomposing sums of squares. Method selection controls how the effects are assessed."

For our purposes, selecting the Hierarchical or the Experimental method will make available the option to output Means and counts. --- Note: I don't know the details of these methods, however, they are probably documented.

o Under [Options...] activate Hierarchical, or Experimental, then activate Means and counts - Click [Continue]
o Click on [OK] to generate the output.

As you can see, the use of the Means and counts option produces a nice summary table, with all the Variable Labels and Value Labels that were incorporated into the datasheet. Again, the use of those options makes the output a great deal more readable.

The output is a complete source table with the factors identified with Variable Labels.

As noted earlier, the analysis that was just conducted is the simplest approach to performing a Factorial ANOVA. If one uses [Statistics => General Linear Model => GLM - General Factorial...], then more options become available. The specification of the Dependent and Independent factors is the same as the method used for the Simple Factorial analysis. Beyond that, the options include,

o By selecting [Model...], one can specify a Custom model. The default is for a Fully Factorial model; however, with the Custom option one can explicitly determine the effects to look at.
o The Contrasts option allows one to "test the differences among the levels of a factor" (see the manual for greater detail).
o Various graphs can be specified with the [Plots...] option. For example, one can plot "Conditio" on the Horizontal Axis:, and "Age" on Separate Lines:, to generate a simple "conditio*age" plot (see the dialog box for [Plots...]).

o The standard post-hoc tests for each factor can be calculated by selecting the desired options under [Post Hoc...]. All one has to do is select the factors to analyze and the appropriate post-hoc(s).
o The [Options...] dialog box provides a number of diagnostic and descriptive features. One can generate descriptive statistics, estimates of effect size, and tests for homogeneity of variance - among others.

An example source table using some of these options would look like the following,

The use of the GLM - General Factorial procedure offers a great deal more than the Simple Factorial. Depending on your needs, the former procedure may provide greater insight into your data. Explore these options!

Higher order factorial designs are carried out in the same manner as the two-factor analysis presented above. One need only code the factors appropriately, and enter the corresponding observations. Repeated measures designs will be discussed in the next section.