Neural Network for Cancer Diagnosis

EAI 320 Practical 3 Report
Neural Networks

Supervised Learning in an Expert System
GJ Fouche
u13004019
June, 2015

DECLARATION OF ORIGINALITY
UNIVERSITY OF PRETORIA
The University of Pretoria places great emphasis upon integrity and ethical conduct in the
preparation of all written work submitted for academic evaluation.
While academic staff teach you about referencing techniques and how to avoid plagiarism,
you too have a responsibility in this regard. If you are at any stage uncertain as to what is
required, you should speak to your lecturer before any written work is submitted.
You are guilty of plagiarism if you copy something from another author’s work (e.g. a
book, an article or a website) without acknowledging the source and pass it off as your own.
In effect you are stealing something that belongs to someone else. This is not only the case
when you copy work word-for-word (verbatim), but also when you submit someone else’s work
in a slightly altered form (paraphrase) or use a line of argument without acknowledging it. You
are not allowed to use work previously produced by another student. You are also not allowed
to let anybody copy your work with the intention of passing it off as his/her work.
Students who commit plagiarism will not be given any credit for plagiarised work. The
matter may also be referred to the Disciplinary Committee (Students) for a ruling. Plagiarism
is regarded as a serious contravention of the University’s rules and can lead to expulsion from
the University.
The declaration which follows must accompany all written work submitted while you are a
student of the University of Pretoria. No written work will be accepted unless the declaration
has been completed and attached.

Full names of student:
Student number:
Topic of work:

Declaration
1. I understand what plagiarism is and am aware of the University’s policy in this regard.
2. I declare that this assignment report is my own original work. Where other people’s work
has been used (either from a printed source, Internet or any other source), this has been
properly acknowledged and referenced in accordance with departmental requirements.
3. I have not used work previously produced by another student or any other person to
hand in as my own.
4. I have not allowed, and will not allow, anyone to copy my work with the intention of
passing it off as his or her own work.

SIGNATURE:

DATE:

Contents

1.1 Introduction
1.2 Problem Definition and Methodology
    1.2.1 Hidden Layer size and amount
    1.2.2 Learning Rate
1.3 Results and Graphs
    1.3.1 Instability of using single neuron hidden layer
    1.3.2 Results for different hidden layer sizes
    1.3.3 The effect of learning rate
1.4 Discussion and Conclusion
    1.4.1 Effects of hidden layer size and the dataset
    1.4.2 The effect of learning rate
    1.4.3 Final Properties and Stop Condition Notes
1.5 Appendix A: Hidden Neurons
1.6 Appendix B: Learning Rate Results

1.1 Introduction

Artificial Neural Networks (ANNs) are a topic of paramount importance in
artificial intelligence and intelligent systems. In effect, ANNs attempt, to
some degree, to computationally model the inner workings of biological neural
networks such as those in the brain. They represent one of the most
significant early attempts to model "intelligence" in the field, especially
in deep learning [1].
The architecture of an ANN generally consists of a network of parallel,
multi-layered, interconnected neurons joined by weighted inputs and outputs.
Many architectures exist, but only the most standard of neural networks is
explored in this report. One characteristic that ANNs share with their
biological counterparts is the need for some kind of learning (or training)
process to adapt them to the dataset at hand. Neural networks are well suited
to problems involving pattern recognition and data interpolation in noisy
spaces, and, if sufficiently trained, they can generalize beyond the data
used for training.
This report explores the fundamental aspects and problems of supervised
learning and neural-network topology, within the context of a simple expert
system that diagnoses malignant growths based on 9 characteristic inputs.
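
To make this topology concrete, the following is a minimal sketch (in Python
with NumPy) of a network of the kind examined in this report: 9 characteristic
inputs, a single hidden layer, and a sigmoid output. The single output neuron,
the sigmoid activation and the uniform weight initialization are assumptions
for illustration; this is not the implementation used for the experiments.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Assumed topology: 9 inputs, one hidden layer, one output giving the
    # probability that a growth is malignant. The hidden size of 6 anticipates
    # the value settled on in Section 1.4.
    n_inputs, n_hidden, n_outputs = 9, 6, 1

    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_inputs))   # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (n_outputs, n_hidden))  # hidden -> output weights
    b2 = np.zeros(n_outputs)

    def forward(x):
        """Forward pass through the single-hidden-layer network."""
        hidden = sigmoid(W1 @ x + b1)
        output = sigmoid(W2 @ hidden + b2)
        return output

    x = rng.uniform(0.0, 1.0, n_inputs)  # one dummy 9-feature sample
    print(forward(x))                    # untrained output, typically near 0.5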

1.2 Problem Definition and Methodology

1.2.1 Hidden Layer size and amount

Number of hidden neurons
According to some sources, the optimal number of neurons in the hidden layer
lies somewhere between the number of inputs to the network and the number of
outputs [2]. For this reason, this report focuses on analysing networks with
2 to 9 hidden neurons. For interest's sake, the case of a single hidden
neuron will also be tested.
Number of hidden layers
In this case, only a single hidden layer will be used. Although networks with
more layers do exist, the potential gain in generalization is minimal in most
cases. Since zero hidden layers suffice for linearly separable data, and it
is not known in advance whether this dataset is linearly separable, only
experiment can establish exactly how many hidden layers are required. There
is no well-defined formula or process for choosing the number of hidden
layers, so the general practice of starting with one layer and only adding
more if really needed will have to do [3].

1.2.2 Learning Rate

A relatively slow but clearly visible learning rate (α) is about 0.01 [4].
This rate will be used for initial observations, with limited stop conditions,
to determine the optimal network size; further learning rates will then be
simulated in neural networks of the best size(s) found.
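
To illustrate exactly where α enters the training process, the following is a
minimal sketch of a single backpropagation update for a one-hidden-layer
sigmoid network with a squared-error loss. The report does not list its
implementation, so the loss, the delta-rule form and the variable names here
are assumptions for illustration only.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, t, W1, b1, W2, b2, alpha=0.01):
        """One backpropagation update on a single sample (x, t) using a
        squared-error loss; every weight change is scaled by alpha."""
        # Forward pass
        h = sigmoid(W1 @ x + b1)   # hidden activations
        y = sigmoid(W2 @ h + b2)   # network output

        # Backward pass (delta rule for sigmoid units)
        delta_out = (y - t) * y * (1.0 - y)             # output-layer error signal
        delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # hidden-layer error signal

        # Gradient-descent updates, scaled by the learning rate alpha
        W2 -= alpha * np.outer(delta_out, h)
        b2 -= alpha * delta_out
        W1 -= alpha * np.outer(delta_hid, x)
        b1 -= alpha * delta_hid
        return W1, b1, W2, b2

Because every weight change is proportional to α, a small value such as 0.01
gives slow but steady learning, while a value that is too large makes
successive updates overshoot, which is the oscillation discussed in
Section 1.4.2.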

1.3 Results and Graphs

1.3.1 Instability of using single neuron hidden layer

Sample   Prediction Accuracy
1        58 %
2        96 %
3        78 %
Mean     77.3 %

Figure 1.1: A table of samples for single hidden neuron.

1.3.2 Results for different hidden layer sizes

The body of results in Appendix A is combined to form the following graph.

Figure 1.2: A summary plot of measured results from simulations with
differing hidden neuron counts (see Appendix A).

1.3.3 The effect of learning rate

The following plot is based on the raw data from Appendix B.

Figure 1.3: Plot of training rate vs α, based on the Appendix B results.

1.4 Discussion and Conclusion

1.4.1 Effects of hidden layer size and the dataset

Findings for fewer than 2 neurons
At first glance, the accuracy of a single-neuron hidden layer in the results
appears surprisingly good; only on further investigation does it become
evident that the average accuracy over many samples is actually very low.
Given this inconsistency between samples, and the fact that the only
differing qualities are the weight initialization and the training-data
sampling (both random processes), one is led to believe that the
effectiveness of the single-neuron hidden layer is exceptionally sensitive to
the particular training data and initial weights. This is the hallmark of a
poor network, and the inadequacy of such a low-neuron-count network suggests
that the Proben1 cancer dataset and its resultant generalization curve are
non-linear.
For more than 2 hidden neurons
In the data presented in Figure 1.2 it can be seen that, between 2 and 4
neurons in the hidden layer, the number of epochs required for the best test
error decreases; this suggests that more hidden neurons make pattern
recognition easier. On closer investigation, the plot data suggests the
following trade-offs:

Number of Hidden Neurons   Advantage          Disadvantage
2-4                        Fast train time    Poor performance
5-6                        Good performance   Increased train times
7-9                        Good performance   Very long train times

Figure 1.4: A table of trade-offs for small and large hidden layers

We can conclude, then, that the best compromise is a hidden layer with 6-7
neurons.

1.4.2 The effect of learning rate

The learning rate (α) has a profound influence on the training of the neural
network. The results show that it directly controls the rate at which the
network adapts to the data, but also that the network is very prone to
oscillation when the learning rate is too high. The ideal learning rate for
this network sits somewhere in α ∈ (0.01, 0.03).

1.4.3 Final Properties and Stop Condition Notes

The data suggest the following neural network properties:
1. Hidden layers = 1
2. Hidden neurons ∈ [6, 7]
3. Learning rate α ∈ (0.01, 0.03)
Due to the oscillatory nature of the validation error on this dataset, the
stop condition implemented only detects positive gradients (validation error
rising again) once a certain desired level of validation error has been
reached; a sketch of this rule is given below. This worked well in practice.
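
The following is a minimal sketch of such a stop condition, assuming the
validation error is recorded once per epoch; the threshold value is
illustrative and not the one used in the experiments.

    def should_stop(val_errors, threshold=0.05):
        """Stop once the validation error has dropped below `threshold` at
        least once and the latest epoch shows a positive gradient (the error
        is rising again). `val_errors` holds one value per epoch, oldest
        first."""
        if len(val_errors) < 2:
            return False
        reached_target = min(val_errors) <= threshold  # desired error level reached
        rising = val_errors[-1] > val_errors[-2]       # positive gradient
        return reached_target and rising

In the training loop, each epoch's validation error would be appended to the
history and training stopped as soon as should_stop(history) returns True;
before the threshold is reached, transient upward swings of the oscillating
validation error are ignored.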

Bibliography
[1] R. Collobert, “Deep learning for efficient discriminative parsing,” in
International Conference on Artificial Intelligence and Statistics,
no. EPFL-CONF-192374, 2011.
[2] J. Heaton, Introduction to Neural Networks with Java. Heaton Research,
Inc., 2008.
[3] P. McCullagh and J. A. Nelder, Generalized Linear Models, vol. 2.
Chapman and Hall, London, 1989.
[4] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach.
Pearson, third ed., 2010.

1.5 Appendix A: Hidden Neurons

Figure 1.5: Results for nHidden = 2

Figure 1.6: Results for nHidden = 3

Figure 1.7: Results for nHidden = 4

Figure 1.8: Results for nHidden = 5

Figure 1.9: Results for nHidden = 6

Figure 1.10: Results for nHidden = 7

Figure 1.11: Results for nHidden = 8

Figure 1.12: Results for nHidden = 9

1.6 Appendix B: Learning Rate Results

Figure 1.13: Results for alpha = 0.01

Figure 1.14: Results for alpha = 0.02

Figure 1.15: Results for alpha = 0.03

Figure 1.16: Results for alpha = 0.04

Figure 1.17: Results for alpha = 0.05
