Computer Science and Information Technology 2(1): 30-39, 2014 http://www.hrpub.org
DOI: 10.13189/csit.2014.020103
Support Vector Machine and Least Square Support Vector Machine Stock Forecasting Models
Lucas Lai*, James Liu
Computer Department, University of Polytechnic, Hong Kong, China
*Corresponding Author: [email protected]
Copyright © 2014 Horizon Research Publishing All rights reserved.
Abstract This paper explores the Support Vector Machine and Least Square Support Vector Machine models in stock forecasting. Three prevailing forecasting techniques, Generalized Autoregressive Conditional Heteroskedasticity (GARCH), Support Vector Regression (SVR) and Least Square Support Vector Machine (LSSVM), are combined with the wavelet kernel to form three novel algorithms, Wavelet-based GARCH (WL_GARCH), Wavelet-based SVR (WL_SVR) and Wavelet-based Least Square Support Vector Machine (WL_LSSVM), to solve the non-linear and non-parametric financial time series problem. This paper presents a platform for comparing the wavelet-based algorithms using the Hang Seng Index, the Dow Jones and the Shanghai Composite Index, which have significant influence on each other. We find that the wavelet-based models are not as good as the LSSVM model: the best result comes from LSSVM without the wavelet-based kernel.
Keywords Autoregressive Conditional Heteroskedasticity; Support Vector Regression; Least Square Support Vector Machine; Wavelet Transform; Daubechies Wavelets; Symlet Wavelets

1. Introduction
The practical use of Artificial Intelligence to forecast financial markets is a sensitive and controversial issue. According to [4], under the Efficient Market Hypothesis (EMH) the prices of securities fully reflect available information. Investors buying securities in an efficient market should expect to obtain an equilibrium rate of return. Weak-form EMH asserts that stock prices already reflect all information contained in the history of past prices. The semistrong-form hypothesis asserts that stock prices already reflect all publicly available information. The strong-form hypothesis asserts that stock prices reflect all relevant information. Under EMH, it is possible to extract information from the historical prices of the stock, as an input to the forecasting tools to project the future value. The argument here is that almost everyone, particularly securities players, has access to different forecasting news. Once the news is available, the market digests its impact, so there is no advantage in using it and it becomes useless. However, the issue here is the accuracy of the forecasting news and which form of EMH the target market follows. The US stock market is a typical strong-form EMH market, while the Chinese stock market is a weak-form EMH market. Moody's and Standard and Poor's ratings and their forecast news are very popular, but how often do we rely on their forecast news to trade?
Reference [3] explained that US equity returns have been predictable for many years, especially in the long run. Earnings yields have had clear empirical advantages over dividend yields. The earnings yield is a benchmark of how well the company performs, while the dividend yield reflects the ability of the company to distribute its profit. The latter is not always a good indicator, as the banking and utilities sectors have steady dividend yields while a new Initial Public Offering (IPO) will not be so generous. The use of dividend yield as a predictive variable leads to a bias in the forecasting regression. [31] proved that a random walk is not a necessary and sufficient condition for EMH. [30] found that the Chinese stock market cannot be classified as weak-form EMH. [9] showed that the β parameter of a company (the ratio between stock returns and market moves) did not show a significant relationship. The Capital Asset Pricing Model (CAPM) is based on the market portfolio, which in reality is difficult to find. [15] stated that CAPM is not applicable to the recent Chinese stock market. He also mentioned that CAPM is robust but Arbitrage Pricing Theory (APT) easily analyses all factors affecting the stock price. The proof of CAPM is rigid but that of APT is not. In 1992, using NYSE, AMEX and NASDAQ data, he found that β has nothing to do with company size. All these findings from modern investment theories can be confusing, as it is difficult to draw conclusions on how to use them. This is probably because a market is not easy to define and no single market is unaffected by others. Today's economic model is quite different from that of 10 or maybe 20 years ago, which makes financial forecasting even more challenging. It is necessary to develop new tools and methodologies for financial forecasting as the markets are becoming more robust and complicated.


The objective of this paper is to review wavelet-based forecasting models, test their predictability and compare them with models without the wavelet-based kernel. The models are based on GARCH, SVR and LSSVM. They are set to forecast the actual daily close value of the Hong Kong Hang Seng Index (HSI) given the past 5-year records. HSI has been selected because it reflects semi-strong-form EMH [2]. Hong Kong, the third largest financial trading centre, cannot be compared with the US market, which has a very long history, enormous trading volume, a pioneering role in financial reform and impeccable securities law. Hong Kong used to be a follower of the US market, until recently the Chinese market came to have a significant impact on it. A Hong Kong investment advisor [27] has pointed out that the Hong Kong stock market is not efficient and lacks the volume of the US stock market to support the development of other approaches such as artificial intelligence methods. This paper challenges his theory and shows that the proposed models can accurately predict the Hong Kong stock market using the latest forecasting techniques. [7] forecasted the volatility of a stock index and [18] predicted stock returns, both of which are indirect approaches to the actual index value. The actual index value derived from these approaches may not be useful. It is well known throughout the literature that financial time series, particularly stock indexes, are non-linear. The three main components of such time series are trend, seasonal and stochastic. These three components affect the prediction result for a stock index, as it is impossible to develop a model that integrates all of them. [13] used Chaotic Oscillatory-based Neural Networks and the Lee Oscillator to successfully catch the variability period of HSI between 2007 and 2008, but it was a pattern prediction rather than an actual value forecast. The application of the stochastic component in stock forecasting is limited, hence we focus on the trend and seasonal components, and our challenge is to find the best model for the prediction task. Although stock index forecasting has been conducted for many decades, the latest artificial intelligence techniques such as GARCH, SVR and LSSVM have improved prediction accuracy. Our objective is to find the best algorithm among the current techniques and apply it to recent financial time series.
This paper explores the prediction performance of wavelet-based models such as WL_GARCH, WL_SVR and WL_LSSVM in predicting exact stock prices on the Hang Seng Index (HSI) over 4-day and 20-day forecasting horizons. There are 5 trading days in a week, but the wavelet-based models can only deal with an even number of days, hence a 4-day cycle is chosen to represent a week. To compare with the 4-day short-term forecast, a 20-day long-term forecast is selected, which is 4 weeks and represents a month. The models give a 4-day-ahead and a 20-day-ahead forecast respectively. In addition, the same datasets were employed in GARCH, SVR and LSSVM without the wavelet-based kernel for comparison. This paper extends the work of [14] on using SVR in stock forecasting by introducing the wavelet-based kernel. SVR was conducted with the software system from [16], LSSVM was conducted using the LS-SVMLAB toolbox provided by Katholieke Universiteit Leuven [26], and the GARCH experiment was conducted with the MATLAB GARCH toolbox. The three wavelet-based algorithms, WL_GARCH, WL_SVR and WL_LSSVM, were developed by the authors in the MATLAB environment using GARCH, SVR and LSSVM as the basic kernel.
The rest of this paper is organized as follows. Section 2 describes the methods: GARCH, SVR, LSSVM and the Wavelet Transform. Section 3 presents the empirical modeling and the empirical results. Section 4 gives the conclusion and outlines our future work.
2. Methods
Historical data from three markets, DJ, HSI and SH, are input into the above six forecasting models. The objective of this paper is to test the accuracy of the forecasting results using hybrid kernel-based functions.
2.1. GARCH
GARCH (Generalized Autoregressive Conditional Heteroskedasticity) by Bollerslev is a linear time series prediction method. It is standard textbook material in econometrics and finance [6]. There are many families of GARCH models, as described in [11], and they are applied throughout financial institutions. GARCH models are designed to capture characteristics commonly associated with financial time series, such as fat tails, volatility clustering and leverage effects. One branch of GARCH called NGARCH, described in [22], is an alternative approach to the famous Black-Scholes model. ARFIMA-FIGARCH from [25] can accurately predict Indian stock data for the period 3 July 1990 to 18 September 2009. In [12], GARCH prediction on NK225 has an RMSE of 0.2013, while that of the pure SVM is 0.1820 and the best RMSE from the wavelet-based RVM is 0.0202.
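The section names GARCH without writing out a recursion. As a minimal sketch, assuming the textbook GARCH(1,1) form $\sigma_t^2 = \omega_0 + a\,r_{t-1}^2 + \beta\,\sigma_{t-1}^2$ (the paper does not state which order or variant its MATLAB toolbox runs used), the following shows how one large return raises the next conditional variance, which is the volatility clustering mentioned above. The function name, parameter values and return series are illustrative assumptions, not taken from the paper.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model, one per return.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    The recursion starts at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1")
    if not returns:
        return []
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Illustrative daily returns in percent (made-up values, not the paper's data).
returns = [0.5, -2.0, 0.1, 3.0, -0.4]
variances = garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
```

In this toy series the variance following the -2.0 return is higher than the one before it, regardless of the sign of the return.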
2.2. Support Vector Regression
The following is a brief description of SVR for nonlinear function estimation, such as for financial time series. In the primal weight space the model takes the form

$$f(x) = \omega^T \varphi(x) + b \tag{1}$$

with the given training data $\{x_k, y_k\}_{k=1}^{N}$ and $\varphi(\cdot): \mathbb{R}^{n} \to \mathbb{R}^{n_h}$ a mapping to a high dimensional feature space, which can be infinite dimensional and is only implicitly defined. Note that in this nonlinear case the vector $\omega$ can also become infinite dimensional. The optimization problem in the primal weight space becomes

$$\min_{\omega, b, \xi, \xi^*} J_P(\omega, \xi, \xi^*) = \frac{1}{2}\omega^T\omega + C\sum_{k=1}^{N}(\xi_k + \xi_k^*) \tag{2}$$

subject to:

$$y_k - \omega^T\varphi(x_k) - b \le \varepsilon + \xi_k, \quad k = 1, \dots, N$$
$$\omega^T\varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*, \quad k = 1, \dots, N$$
$$\xi_k, \xi_k^* \ge 0, \quad k = 1, \dots, N.$$
Applying the Lagrangian and the conditions for optimality gives the following dual problem

$$\max_{\alpha, \alpha^*} J_D(\alpha, \alpha^*) = -\frac{1}{2}\sum_{k,l=1}^{N}(\alpha_k - \alpha_k^*)(\alpha_l - \alpha_l^*)K(x_k, x_l) - \varepsilon\sum_{k=1}^{N}(\alpha_k + \alpha_k^*) + \sum_{k=1}^{N} y_k(\alpha_k - \alpha_k^*) \tag{3}$$

subject to:

$$\sum_{k=1}^{N}(\alpha_k - \alpha_k^*) = 0, \qquad \alpha_k, \alpha_k^* \in [0, C].$$

Here the kernel trick has been applied with $K(x_k, x_l) = \varphi(x_k)^T\varphi(x_l)$ for k, l = 1,...,N. The dual representation of the model becomes

$$f(x) = \sum_{k=1}^{N}(\alpha_k - \alpha_k^*)K(x, x_k) + b \tag{4}$$
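To make the dual representation (4) concrete, here is a small sketch that evaluates $f(x)$ from given multipliers. It assumes an RBF kernel $K(x, z) = \exp(-\gamma\|x - z\|^2)$, which the text does not specify; the function names and all numeric values are hypothetical, and no solver for the dual problem (3) is included.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svr_predict(x, support_x, alpha, alpha_star, b, gamma):
    """Evaluate f(x) = sum_k (alpha_k - alpha_k^*) K(x, x_k) + b, as in (4)."""
    return sum(
        (a - a_s) * rbf_kernel(x, xk, gamma)
        for a, a_s, xk in zip(alpha, alpha_star, support_x)
    ) + b


# Hypothetical multipliers, chosen so that sum(alpha - alpha_star) = 0.
support_x = [[0.0], [1.0]]
alpha = [0.7, 0.0]
alpha_star = [0.0, 0.7]
prediction = svr_predict([0.0], support_x, alpha, alpha_star, b=2.0, gamma=1.0)
```

Only training points with $\alpha_k \ne \alpha_k^*$ contribute to the sum, which is why the prediction depends on the support vectors alone.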
Consider the following Vapnik's ε-insensitive loss function

$$L_\varepsilon(y - f(x)) = \begin{cases} 0 & \text{if } |y - f(x)| \le \varepsilon \\ L(y - f(x)) - \varepsilon & \text{otherwise} \end{cases} \tag{5}$$

Eq. (5) is a convex cost function where L(·) is convex.
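As a quick illustration of (5), the sketch below takes the inner convex function to be the absolute value, $L(e) = |e|$, which gives Vapnik's original ε-insensitive loss; that choice and the function name are assumptions made for the example.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Epsilon-insensitive loss with L(e) = |e|: zero inside the tube, linear outside."""
    residual = abs(y - f_x)
    return 0.0 if residual <= eps else residual - eps


inside_tube = eps_insensitive_loss(10.0, 10.3, eps=0.5)   # residual 0.3 is within the tube
outside_tube = eps_insensitive_loss(10.0, 11.0, eps=0.5)  # residual 1.0 exceeds the tube by 0.5
```

Errors smaller than ε cost nothing, so the fitted function is not pulled toward every small fluctuation in the training data.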
Primal problem

$$\min_{\omega, b, \xi, \xi^*} \frac{1}{2}\omega^T\omega + C\sum_{k=1}^{N}\left(L(\xi_k) + L(\xi_k^*)\right) \tag{6}$$

subject to

$$y_k - \omega^T\varphi(x_k) - b \le \varepsilon + \xi_k$$
$$\omega^T\varphi(x_k) + b - y_k \le \varepsilon + \xi_k^*$$
$$\xi_k, \xi_k^* \ge 0$$

where $\xi_k, \xi_k^*$ are slack variables. Here, $x_k$ is mapped to a higher dimensional space by the function $\varphi$, and $\xi_k$ is the upper training error ($\xi_k^*$ is the lower) subject to the ε-insensitive tube $|y_k - \omega^T\varphi(x_k) - b| \le \varepsilon$. The parameters which control the regression quality are the cost of error C, the width of the tube ε, and the mapping function $\varphi$.
The constraints imply that we should put most data $x_k$ in the tube $|y_k - \omega^T\varphi(x_k) - b| \le \varepsilon$. If $x_k$ is not in the tube, there is an error $\xi_k$ or $\xi_k^*$, which the objective function must minimize. SVR avoids under-fitting or over-fitting the training data by minimizing the training error $C\sum_{k=1}^{N}\left(L(\xi_k) + L(\xi_k^*)\right)$ as well as the regularization term $\frac{1}{2}\omega^T\omega$.
The Lagrangian for this problem is

$$\mathcal{L}(\omega, b, \xi, \xi^*; \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\omega^T\omega + C\sum_{k=1}^{N}\left(L(\xi_k) + L(\xi_k^*)\right) - \sum_{k=1}^{N}\alpha_k\left(\varepsilon + \xi_k - y_k + \omega^T\varphi(x_k) + b\right) - \sum_{k=1}^{N}\alpha_k^*\left(\varepsilon + \xi_k^* + y_k - \omega^T\varphi(x_k) - b\right) - \sum_{k=1}^{N}\left(\eta_k\xi_k + \eta_k^*\xi_k^*\right) \tag{7}$$

with Lagrange multipliers $\alpha_k, \alpha_k^*, \eta_k, \eta_k^* \ge 0$ for k = 1,…,N.
Dual problem

$$\max_{\alpha, \alpha^*, \eta, \eta^*} J_D(\alpha, \alpha^*, \eta, \eta^*) \tag{8}$$

subject to

$$\sum_{k=1}^{N}(\alpha_k - \alpha_k^*) = 0$$
$$C\,L'(\xi_k) - \alpha_k - \eta_k = 0, \quad k = 1, \dots, N$$
$$C\,L'(\xi_k^*) - \alpha_k^* - \eta_k^* = 0, \quad k = 1, \dots, N$$
$$\alpha_k, \alpha_k^*, \eta_k, \eta_k^* \ge 0, \quad k = 1, \dots, N.$$

Equations (1) to (8), the SVR estimation function combined with the loss function, form the foundation of SVR.
Support Vector Machine (SVM) is used in many machine learning tasks such as pattern recognition and object classification, and, as Support Vector Regression (SVR), in regression analysis for time series prediction. SVR is a methodology in which a function is estimated from observed data, which in turn is used to train the SVM. It differs from traditional time series prediction methodologies in that there is no model in the strict sense: the data drives the prediction. [19] used SVR to determine the minimum enclosing zone and [10] used SVR to predict country investment risk. SVR has also been used in long-term stock market forecasting. [21] used an accelerated Levenberg-Marquardt algorithm to predict the stock market series of the Jakarta Stock Indices over 10 months, achieving an RMSE of 1.96%. [2] applied SVR to forecast the price trend for a single Chinese stock. [20] used SVR to predict the first-day returns of US stock market IPOs, but the predictions were accurate in only 18% of cases. [29] claimed a profit over two months using a methodology that combined news and technical indicators. [12] used SVR to forecast the direction of stock movements, which was correct 73% of the time. [24] reported the use of SVR in financial time series prediction over a 5-day forecasting horizon.
2.3. Least Square Support Vector Machine
LSSVM regression is closely related to regularization networks, Gaussian processes and reproducing kernel Hilbert spaces, but emphasizes primal-dual interpretations in the context of constrained optimization problems. It is a relatively new tool, and there is very little research on financial forecasting using LSSVM, such as [28].
The following is a brief description of the LSSVM mechanism for regression problems. Given training data $\{x_k, y_k\}_{k=1}^{N}$, we can formulate the following optimization problem in the primal weight space

$$\min_{\omega, b, e} J_P(\omega, e) = \frac{1}{2}\omega^T\omega + C\,\frac{1}{2}\sum_{k=1}^{N} e_k^2 \tag{9}$$

such that $y_k = \omega^T\varphi(x_k) + b + e_k$, k = 1,…,N. This formulation is modified at two points compared with the SVR primal problem (2) in the section above. First, instead of inequality constraints one takes equality constraints, where the value $y_k$ on the left-hand side is considered a target value rather than a threshold value. An error variable $e_k$ is allowed on this target value, such that misclassifications can be tolerated in the case of overlapping distributions. These error variables play a similar role to the slack variables $\xi_k$ in SVR. Second, a squared loss function $e_k^2$ is taken for this error variable. These modifications greatly simplify the problem.
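One way to see the simplification: with equality constraints and squared loss, the optimality conditions of (9) give $\sum_k \alpha_k = 0$ and $e_k = \alpha_k / C$, so training reduces to one linear system in $(b, \alpha)$ instead of a quadratic program. The sketch below is a standard-library illustration of that standard LSSVM result, not the LS-SVMLAB code used in the paper; the RBF kernel, the small Gaussian-elimination solver and the toy data are all assumptions.

```python
import math


def rbf(x, z, gamma):
    """RBF kernel (an assumed choice): exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def lssvm_fit(X, y, C, gamma):
    """Solve [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y] and return (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(x, X, alpha, b, gamma):
    """f(x) = sum_k alpha_k K(x, x_k) + b."""
    return sum(a * rbf(x, xk, gamma) for a, xk in zip(alpha, X)) + b


# Toy data (made up): four one-dimensional inputs.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.8, 0.9, 0.1]
b, alpha = lssvm_fit(X_train, y_train, C=10.0, gamma=0.5)
fitted = [lssvm_predict(x, X_train, alpha, b, gamma=0.5) for x in X_train]
```

At each training point the fitted value differs from the target by exactly $e_k = \alpha_k / C$, so a larger C forces a closer fit.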
2.4. Wavelet Transform
The wavelet transform (WT) has been found to be particularly useful for analyzing signals which can best be described as aperiodic, noisy, intermittent and transient [1]. Wavelets really began in the mid-1980s, when they were developed to interrogate seismic signals. The application of wavelet transform analysis in science and engineering began to take off at the beginning of the 1990s. WT and the Fourier transform (FT) are very similar in nature, although FT has been around since the 1800s. FT is built from sine and cosine functions, which are periodic waves that continue forever. This approach is only good for signals that have time-independent wave-like features; signals with more localized features are not modeled well by sines and cosines. WT provides a different set of building blocks to model these types of signals [2]. In this paper, WT is tested to see whether it can improve the forecasting accuracy of financial time series, which by definition do not have time-independent wave-like features. A wavelet is a mathematical function used to divide a given function or continuous-time signal into different scale components. A wavelet transform is the representation of a function by wavelets. The wavelets are scaled and translated copies (known as "daughter wavelets") of a finite-length or fast-decaying oscillating waveform (known as the "mother wavelet"). It is widely applicable to time series analysis. In [8], multi-resolution discrete wavelet transforms combined with the SVR technique were applied to forecast the opening cash index of the Nikkei 225, with a MAPE of 0.31, which is a very good result. [23] forecasted GDP growth one and two quarters ahead for Germany, France, Italy and Spain using multi-resolution discrete wavelet transforms. The best mean squared error was 65% better than the autoregressive benchmark in Spain, but it was 10% worse in Italy. However, GDP growth cannot be compared with a financial index, as the latter is more volatile.
A discrete wavelet transform (DWT) is any wavelet transform for which the wavelets are discretely sampled. The first DWT was invented by the Hungarian mathematician Alfréd Haar. The most commonly used set of DWTs was formulated by the Belgian mathematician Ingrid Daubechies in 1988, and it is one of the methods considered in this paper. This formulation is based on the use of recurrence relations to generate progressively finer discrete samplings of an implicit mother wavelet function; each resolution is twice that of the previous scale. There are a number of families of Daubechies wavelets, and Haar is the first one. Daubechies wavelets are quite asymmetric; to improve symmetry while retaining simplicity, Daubechies proposed Symmlets (also called symlets) as a modification to her original wavelets. The Daubechies and Symmlet wavelets are employed in this paper.
Reference [23] described the conventional factor model, in which the data-generating process of each variable is the sum of two components: a component associated with factors common to all series and an idiosyncratic term. The underlying idea is that one can summarize the large information set in a small number of variables, the common factors, which retain the main features. Wavelet multi-resolution analysis allows one to decompose a time series into a low-frequency base scale and higher-frequency scales. Those frequency components can be analyzed individually or compared across variables. The procedure is: A) the time series is decomposed into orthogonal components of different frequencies; B) a model is fitted at each time scale; C) the overall forecast is obtained by recombining the components. [23] only used the Symlet wavelet at level 4. Here, we used Symlet wavelet functions with coefficients from 2 to 8 and Daubechies wavelet functions with coefficients from 1 to 20 for comparison. The selection of these coefficients is based on the work of [14].
The discrete wavelet transform (DWT) can be written as:

$$T_{m,n} = \int_{-\infty}^{\infty} x(t)\,\psi_{m,n}(t)\,dt \tag{10}$$

where the integers m and n control the wavelet dilation and translation respectively. By choosing an orthonormal wavelet basis, $\psi_{m,n}(t)$, we can reconstruct the original signal in terms of the wavelet coefficients, $T_{m,n}$, using the inverse discrete wavelet transform as follows:

$$x(t) = \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t) \tag{11}$$

The orthonormal discrete wavelets are associated with scaling functions and their dilation equations as follows:

$$\phi_{m,n}(t) = 2^{-m/2}\,\phi(2^{-m}t - n) \tag{12}$$

They have the property

$$\int_{-\infty}^{\infty} \phi_{0,0}(t)\,dt = 1 \tag{13}$$

The scaling function can be convolved with the signal to produce approximation coefficients as follows:

$$S_{m,n} = \int_{-\infty}^{\infty} x(t)\,\phi_{m,n}(t)\,dt \tag{14}$$

We can represent a signal x(t) with a combined series expansion using both the approximation coefficients and the wavelet coefficients as follows:

$$x(t) = \sum_{n=-\infty}^{\infty} S_{m_0,n}\,\phi_{m_0,n}(t) + \sum_{m=-\infty}^{m_0}\sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t) \tag{15}$$
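For a sampled series, the simplest instance of this decomposition is one level of the Haar transform, the first member of the Daubechies family mentioned earlier. The sketch below is a standard-library illustration rather than the MATLAB wavelet functions used in the paper: pairwise scaled sums play the role of the approximation coefficients $S_{m,n}$, pairwise scaled differences play the role of the wavelet coefficients $T_{m,n}$, and the inverse recombines them as in (15). The function names, the sign convention for the detail coefficients and the sample values are assumptions.

```python
import math


def haar_dwt(x):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half as long as x.
    """
    if len(x) % 2 != 0:
        raise ValueError("Haar DWT needs an even number of samples")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the samples from the two coefficient lists."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


# Four made-up closing values standing in for a 4-day window.
closes = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_dwt(closes)
rebuilt = haar_idwt(approx, detail)
```

Because the transform pairs up neighbouring samples, it needs an even number of them, which is consistent with the paper's choice of a 4-day rather than a 5-day horizon.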
3. Empirical Modeling
3.1. Data
The objective of this paper is to predict the HSI closing value over 4-day and 20-day horizons given the historical data of HSI. Following the winner of the EUNITE competition [17], the benchmark for forecasting accuracy is the mean absolute percentage error (MAPE), and our aim is to keep it below 2. The historical data of HSI from August 2003 to June 2009 were downloaded from the Yahoo financial website and separated into two datasets. The first set, from 5 July 2007 to 30 June 2009 with 488 records, is used for the 4-day prediction with a sliding window of 248 days, which is roughly one year of data. The first window, from 5 July 2007 to 8 July 2008, is used to predict the next 4 days from 9 July 2008 onward. The next window, from 11 July 2007 to 14 July 2008, is used to predict the next 4 days from 15 July 2008 onward. In total, there are 60 results. The second set, from 18 August 2003 to 30 June 2009 with 1448 records, is used for the 20-day prediction with a sliding window of 248 days. The first window, from 18 August 2003 to 16 August 2004, is used to predict the next 20 days from 17 August 2004 onward. The next window, from 16 September 2003 to 13 September 2004, is used to predict the next 20 days from 14 September 2004 onward. In total, there are again 60 results. This data range tests the robustness of the models in a highly volatile market, as it ends near the financial tsunami. In summary, a one-year sliding window of 248 days is applied to the 488 records (5.7.2007-30.6.2009) to predict the stock price in the next 4 days, and to the 1448 records (18.8.2003-30.6.2009) to predict the stock price in the next 20 days. The purpose is to test the general forecasting ability of each model.
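The window arithmetic can be checked with a short sketch. It assumes, as the quoted dates suggest, that the window advances by one forecast horizon at a time; the function name and the half-open index convention are illustrative choices.

```python
def sliding_windows(n_records, window, horizon):
    """List (train_start, train_end, test_start, test_end) index tuples.

    Indices are 0-based and half-open; the window advances by `horizon`
    records each step, so consecutive test blocks do not overlap.
    """
    splits = []
    start = 0
    while start + window + horizon <= n_records:
        train_end = start + window
        splits.append((start, train_end, train_end, train_end + horizon))
        start += horizon
    return splits


short_term = sliding_windows(n_records=488, window=248, horizon=4)
long_term = sliding_windows(n_records=1448, window=248, horizon=20)
```

Both calls yield 60 splits, since (488 - 248) / 4 = 60 and (1448 - 248) / 20 = 60, matching the 60 results reported for each horizon.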
Using the same methodology, two sets of index values for the Shanghai Composite Index and the Dow Jones Index, with the same record lengths and roughly the same periods (Shanghai Composite Index 17.7.2003-30.6.2009 and 3.7.2007-30.6.2009; Dow Jones 30.9.2003-30.6.2009 and 25.7.2007-30.6.2009), were analyzed by these models. As mentioned in the introduction, the Shanghai Composite Index (China stock market) is weak-form EMH, HSI (Hong Kong stock market) is semi-strong-form EMH and the Dow Jones Index (US stock market) is strong-form EMH. Our purpose is to put these three markets to the test under the above models, with the hypothesis that a strong-form EMH market should perform better than a weak-form EMH market. It also provides a foundation for showing that our models can handle all kinds of markets and are robust in handling extreme data values during the financial tsunami. The unprecedented financial tsunami was a once-in-a-lifetime experience for all financial institutions. Compared with the last financial crisis in 1997 due to the collapse of Long Term Capital Management, its magnitude is far greater. The following figures show the characteristics of these data ranges.

Figure 1. Shanghai, HSI and Dow indexes 2007 to 2009
[Panels: SH, HSI and DOW log-levels and returns, 2007-09; horizontal axis ticks at records 1, 244 and 488]

Figure 2. Shanghai, HSI and Dow indexes 2003 to 2009
[Panels: SH, HSI and DOW log-levels and returns, 2003-09; horizontal axis ticks at records 1, 724 and 1448]

A 4-day instead of a 5-day forecast horizon is applied in this paper, because the discrete wavelet transform (DWT) function only accepts an even number of days. In this paper, only one parameter, the daily close value, is used, and a new data pre-processing technique, windowize, is considered. It makes a nonlinear autoregressive predictor with a nonlinear regressor. The last elements of the resulting matrix contain the future values of the time series, and the others contain the past inputs. The following is a simple example.
$$A = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \\ d_1 & d_2 & d_3 \\ e_1 & e_2 & e_3 \\ f_1 & f_2 & f_3 \\ g_1 & g_2 & g_3 \end{bmatrix}$$

W=windowize(A,[1 2 3])

$$W = \begin{bmatrix} a_1 & a_2 & a_3 & b_1 & b_2 & b_3 & c_1 & c_2 & c_3 \\ b_1 & b_2 & b_3 & c_1 & c_2 & c_3 & d_1 & d_2 & d_3 \\ c_1 & c_2 & c_3 & d_1 & d_2 & d_3 & e_1 & e_2 & e_3 \\ d_1 & d_2 & d_3 & e_1 & e_2 & e_3 & f_1 & f_2 & f_3 \\ e_1 & e_2 & e_3 & f_1 & f_2 & f_3 & g_1 & g_2 & g_3 \end{bmatrix}$$
In windowize, the second argument gives the relative indices of the data points in matrix A that are selected to make a window. Each window is put in a row of matrix W. The matrix W contains as many rows as there are different windows selected in A. We found that this method outperforms RDP, as it is easier to apply. [14] employed RDP5, RDP10, RDP15 and RDP20 to perform the same function as windowize.
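The example can be reproduced with a few lines of Python. This is a sketch of the behaviour shown in the matrices A and W only, not the LS-SVMLAB implementation; the function signature, the 1-based offsets and the string placeholders are assumptions.

```python
def windowize(A, window):
    """Stack consecutive rows of A into one row per window.

    `window` holds 1-based relative row offsets, mirroring the call
    windowize(A, [1 2 3]) in the text.
    """
    n_windows = len(A) - max(window) + 1
    W = []
    for start in range(n_windows):
        row = []
        for offset in window:
            row.extend(A[start + offset - 1])
        W.append(row)
    return W


# The 7 x 3 matrix A from the text, with strings standing in for values.
A = [[f"{name}{j}" for j in (1, 2, 3)] for name in "abcdefg"]
W = windowize(A, [1, 2, 3])
```

With a single close-price column, the last entry of each row of W would be the value to predict and the preceding entries the lagged inputs.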
3.2. Forecasting Models and Parameters
Six algorithms have been developed in this paper. Each model has parameters that the algorithm must search over to get the best result. The C parameter is set to 500, 1,000, 5,000, 10,000, 20,000 and 40,000, and g is set to 1 and 2, for the SVR and WL_SVR models, based on the work of [14]. C is the constant in (2) and g is the parameter of the mapping function φ. For the wavelet-based kernel, the discrete wavelet transform is used with two types of wavelets: the first is Daubechies with coefficients from 1 to 20 and the other is Symlet with coefficients from 2 to 8.
3.3. Empirical Results

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\frac{|A_i - P_i|}{A_i}$$

MAPE stands for Mean Absolute Percentage Error, a statistical measure of the accuracy of fitted time series values, used specifically for trending. A and P are the real and the predicted close values of the HSI respectively, and n is the time frame or number of days.
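A direct transcription of the MAPE definition, as a minimal sketch; the function name and the sample values are made up for illustration and are not results from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error: (100 / n) * sum(|A_i - P_i| / |A_i|)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Two made-up close values and forecasts: errors of 10% and 5% average to 7.5%.
example_mape = mape([100.0, 200.0], [110.0, 190.0])
```

Because each error is divided by the actual value, MAPE is comparable across indexes quoted at very different levels, such as HSI and the Shanghai Composite Index.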
Table 1. Empirical results in forecasting the Hang Seng Index, expressed in MAPE

| Data sets range | 2005-2010 | 2005-2010 | 2006-2011 | 2006-2011 | 2010 | 2010 | 2011 | 2011 |
| Forecast horizon | 4 days | 20 days | 4 days | 20 days | 4 days | 20 days | 4 days | 20 days |
|---|---|---|---|---|---|---|---|---|
| SVR | 0.4937 | 1.8217 | 0.4037 | 1.9353 | 0.6787 | 1.2924 | 0.4037 | 1.8291 |
| WL_db_svm | 3.2709 | 9.2519 | 2.1003 | 4.4247 | 1.2170 | 2.1459 | 1.3914 | 2.1887 |
| WL_sym_svm | 1.4682 | 10.1778 | 2.1571 | 4.4247 | 0.5890 | 2.9880 | 2.1503 | 2.7954 |
| LSSVM | 0.8372 | 1.1407 | 0.8397 | 6.1239 | 1.7730 | 2.7961 | 0.8101 | 100 |
| WL_db_lssvm | 1.4045 | 3.7927 | 1.7167 | 1.7666 | 1.4045 | 3.7927 | 1.7167 | 1.7666 |
| WL_sym_lssvm | 2.1936 | 2.6297 | 1.9534 | 1.7368 | 2.1936 | 2.6297 | 1.9534 | 1.7368 |
| Garch | 4.3704 | 12.8672 | 1.7703 | 12.6837 | 4.4057 | 13.3005 | 1.5819 | 11.2425 |
| WL_db_garch | 3.3246 | 6.6324 | 2.1701 | 1.5146 | 2.2104 | 1.5966 | 0.5192 | 1.8205 |
| WL_sym_garch | 2.1939 | 5.237 | 2.1542 | 1.7941 | 2.3885 | 1.6668 | 0.5192 | 1.8267 |





From Table 1, SVR has the best MAPE in 4 cases, the wavelet transform models in 3 and LSSVM in only 1, so the winner appears to be the SVR model. In general, the wavelet transform improved the accuracy of the GARCH models, except for the data range 2006-2011. However, applying the wavelet transform to SVR and LSSVM does not produce the same result. Most likely, this is because SVR and LSSVM use the windowize method to pre-process the data and then map the data into a higher dimension, whereas in WL_db_svm, WL_sym_svm, WL_db_lssvm and WL_sym_lssvm the windowize method cannot be applied to the data transformed by the wavelet functions, and only normalization is used. For GARCH, the log return method is used, while its wavelet models use normalization. It seems that SVR and LSSVM are more robust because of the windowize technique. In each of the basic models SVR, LSSVM and GARCH, wavelet-based transform models have improved the accuracy. This confirms that the application of wavelet-based models in previous work brings a significant improvement to financial time series forecasting. The best MAPE results come from the short-term 4-day forecast; in fact, they all come from the SVR model. For the long-term 20-day forecast, each model has its merits. The best result in the above table is 0.4037, for the 4-day forecast horizon of 2011. In the EUNITE competition, Lin (2001) achieved the best forecasting result with a MAPE of 1.9, and it is our target to keep the MAPE within 2. The HSI close value on 30 June 2011 was 22398, so a MAPE of 0.4037 implies that the next 4-day value is within +/- 22398*0.4037/100, or about 90 points. This is a very useful investment benchmark, as the daily fluctuation of the HSI close value is usually more than 150 points, and accurately forecasting the next 4 days to within about 90 points is extremely difficult. The drawback is the selection of parameters in each model, which has already been explained for each algorithm. The above experiments have verified that the parameters selected in the above algorithms are correct. Once we know the current best MAPE and its parameters, it is easy to feed them into the next forecasting horizon.
In Table 2, the average MAPE of the 60 results in each
model is displayed and LSSVM gives the best result because
4 out of 6 MAPE values are the lowest. The improvement of
MAPE accuracy in the use of wavelet functions only happens
in the GARCH model. The sum of the best 4-day and 20-day
MAPE results is 5.3249 for the Shanghai Composite Index,
5.2561 for the HSI and 4.4379 for the Dow Jones. The
prediction result for the Dow Jones therefore outperforms the
other indexes in this exercise, as it has the lowest MAPE
figure of 4.4379. This is consistent with the speculation that
a strong-form EMH market should get better results in the
above models. The Shanghai Composite Index and HSI MAPE
values are very close, suggesting that the China and Hong
Kong security markets are closely related. In general, the
wavelet function improves accuracy only in the GARCH models.
The accuracy of GARCH and its wavelet variants is poor
compared with that of SVR and LSSVM. As explained in our
data section, the data pre-processing for GARCH cannot use
the windowize method, which is very likely why its result is
so poor. The strength of GARCH is its flexible adaptation to
the dynamics of volatilities and its ease of estimation when
compared to other models. It is a return-based model, but it
might neglect important intraday information. For example,
when today’s closing price equals the previous day’s closing
price, the price return will be zero, but the price variation
during the day might be volatile. [15] explained that the model
is not able to capture this information. Despite the reputation
of GARCH and previous work on the successful application of
GARCH with a wavelet-based kernel to financial time series,
our experiment cannot attain the same result. However, the
effect of the wavelet-based kernel is still a major contributing
factor in the overall result of the GARCH model. Perhaps
another type of GARCH model should be employed to achieve a
better result. This will be addressed in our future work and is
outside the scope of this paper. In this section, the focus is to
compare and identify the fundamental factors that cause the
differences between models and markets. We simply provide
the best model for the above exercises based on our findings.
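The sums quoted above can be checked directly against Table 2. The following is a minimal sketch, assuming the Table 2 entries are average MAPE values and that "best" means the lowest value in each column; the names `TABLE_2` and `best_sum` are illustrative and not part of the original experiments.

```python
# Average MAPE values copied from Table 2.
# Each entry is (4-day forecast, 20-day forecast) for one model and index.
TABLE_2 = {
    "SVR":          {"SH": (1.3755, 6.4903),  "HSI": (2.8785, 4.1385),  "DJ": (2.0899, 2.5235)},
    "WL_db_svm":    {"SH": (1.5095, 21.4870), "HSI": (3.5369, 9.2184),  "DJ": (2.5781, 6.0027)},
    "WL_sym_svm":   {"SH": (1.6965, 22.7968), "HSI": (4.5380, 8.9549),  "DJ": (4.0757, 6.5828)},
    "LSSVM":        {"SH": (2.0918, 3.9494),  "HSI": (2.4693, 2.7868),  "DJ": (1.9144, 3.9494)},
    "WL_db_lssvm":  {"SH": (2.7785, 7.7177),  "HSI": (3.6038, 5.3428),  "DJ": (2.3008, 7.7177)},
    "WL_sym_lssvm": {"SH": (3.1976, 6.9102),  "HSI": (3.9298, 4.5821),  "DJ": (2.5853, 6.9102)},
    "GARCH":        {"SH": (6.3198, 20.6941), "HSI": (7.9502, 16.4846), "DJ": (6.5432, 12.6895)},
    "WL_db_garch":  {"SH": (8.0719, 24.5217), "HSI": (6.7212, 12.5457), "DJ": (4.5167, 7.1925)},
    "WL_sym_garch": {"SH": (3.1996, 20.5802), "HSI": (3.3281, 10.6222), "DJ": (2.4473, 6.1495)},
}


def best_sum(index):
    """Sum of the lowest 4-day MAPE and the lowest 20-day MAPE for one index."""
    best_4_day = min(row[index][0] for row in TABLE_2.values())
    best_20_day = min(row[index][1] for row in TABLE_2.values())
    return round(best_4_day + best_20_day, 4)


sums = {index: best_sum(index) for index in ("SH", "HSI", "DJ")}
print(sums)
print("lowest combined MAPE:", min(sums, key=sums.get))
```

Under these assumptions the three sums agree with the figures quoted in the text, and the Dow Jones has the lowest combined value.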
Table 2. Various markets performance (average MAPE)
Index             Sh Composite  Sh Composite  Hang Seng     Hang Seng     Dow Jones     Dow Jones
Data sets range   2007 to 2009  2003 to 2009  2007 to 2009  2003 to 2009  2007 to 2009  2003 to 2009
Forecast horizon  4 days        20 days       4 days        20 days       4 days        20 days
SVR               1.3755        6.4903        2.8785        4.1385        2.0899        2.5235
WL_db_svm         1.5095        21.4870       3.5369        9.2184        2.5781        6.0027
WL_sym_svm        1.6965        22.7968       4.5380        8.9549        4.0757        6.5828
LSSVM             2.0918        3.9494        2.4693        2.7868        1.9144        3.9494
WL_db_lssvm       2.7785        7.7177        3.6038        5.3428        2.3008        7.7177
WL_sym_lssvm      3.1976        6.9102        3.9298        4.5821        2.5853        6.9102
GARCH             6.3198        20.6941       7.9502        16.4846       6.5432        12.6895
WL_db_garch       8.0719        24.5217       6.7212        12.5457       4.5167        7.1925
WL_sym_garch      3.1996        20.5802       3.3281        10.6222       2.4473        6.1495

Table 3. Various markets performance: difference between maximum and minimum MAPE
Index             Sh Composite  Sh Composite  Hang Seng     Hang Seng     Dow Jones     Dow Jones
Data sets range   2007 to 2009  2003 to 2009  2007 to 2009  2003 to 2009  2007 to 2009  2003 to 2009
Forecast horizon  4 days        20 days       4 days        20 days       4 days        20 days
SVR               7.3422        19.0784       21.1762       32.1048       18.6788       21.2680
WL_db_svm         6.2279        95.1529       15.7930       44.2048       14.7343       28.0634
WL_sym_svm        8.2657        91.0760       23.3956       42.9311       15.5045       38.8243
LSSVM             9.1604        18.4490       7.6761        9.7202        12.8402       18.4490
WL_db_lssvm       8.1484        24.5040       14.6272       29.5486       12.9265       24.5040
WL_sym_lssvm      8.5131        17.5376       18.0802       21.6681       15.0719       17.5379
GARCH             16.0702       51.9487       27.1573       69.7003       20.3877       74.6821
WL_db_garch       19.6848       96.6332       26.0193       66.3760       18.0263       39.0890
WL_sym_garch      6.7354        81.8148       13.7217       56.9161       14.9553       32.6312
Table 4. Descriptive statistics for various stock indexes during 2007 to 2009
Returns     SH Composite Index            Hang Seng Index               Dow Jones Index
            Statistics  p-value  h-value  Statistics  p-value  h-value  Statistics  p-value  h-value
Mean        -0.0567                       -0.0393                       -0.1006
Variance    6.1035                        7.8655                        4.2002
Skewness    -0.0332                       0.1709                        0.1807
Kurtosis    4.1061                        6.1697                        7.1703
Normality   24.9141     0        1        206.2428    0        1        355.5439    0        1
Q(6)        6.0892      0.4133   0        5.427       0.4903   0        29.6717     0        1
Q(6)*       13.2112     0.0398   1        191.5078    0        1        195.1023    0        1
ARCH(6)     11.7167     0.0686   0        96.4186     0        1        112.366     0        1
Table 5. Descriptive statistics for various stock indexes during 2003 to 2009
Returns     SH Composite Index            Hang Seng Index               Dow Jones Index
            Statistics  p-value  h-value  Statistics  p-value  h-value  Statistics  p-value  h-value
Mean        -0.0452                       0.0385                        -0.0065
Variance    3.536                         3.2458                        1.7012
Skewness    -0.2169                       0.0918                        0.0575
Kurtosis    5.999                         12.3643                       14.7956
Normality   553.6119    0        1        5289        0        1        8390        0        1
Q(6)        19.0444     0.0041   1        9.8543      0.1309   0        63.3866     0        1
Q(6)*       128.4139    0        1        852.7444    0        1        839.8699    0        1
ARCH(6)     83.7537     0        1        366.6877    0        1        412.3289    0        1
Notes: Normality is the Bera-Jarque (1981) normality test; Q(6) is the Ljung-Box Q test at order 6 for raw returns; Q(6)* is the
Ljung-Box Q test for squared returns; ARCH(6) is Engle’s (1982) LM test for ARCH effects.

Table 3 shows the difference between the maximum and
minimum MAPE of the 60 results. This is crucial when
selecting which model to use in forecasting. Remember that
these results are from the extremely volatile period caused by
the financial tsunami. Combining Tables 2 and 3, the SVR
model for the Shanghai Composite Index has the best average
(1.3755) and one of the smallest differences (7.3422) in the
4-day forecast. It is very likely that the China stock market is
still a closed market and the impact of the financial tsunami is
small. In the HSI experiment, the LSSVM model has the best
average (2.4693) and the least difference (7.6761) for 4 days,
and the best average (2.7868) and the least difference (9.7202)
for 20 days. It should be noted that SVR, with an average of
2.8785 and a difference of 21.1762 for 4 days and an average
of 4.1385 and a difference of 32.1048 for 20 days, is second to
LSSVM in terms of accuracy. As far as the objective of this
paper is concerned, we need to find out which is the best
model for the HSI forecast. From Tables 2 and 3, the obvious
choice is LSSVM, but Table 1 points to SVR. As Table 1 is
from the most current data while Tables 2 and 3 are not, our
final recommendation is SVR: despite a bigger difference
value, it has the smallest MAPE, 0.4037. The difference value
is a test of model robustness, and the criterion here is having
a reasonable value. As the second choice, LSSVM is a good
candidate for financial advisors in their decision making.
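The selection step described above, for the Table 2 and Table 3 evidence only, can be sketched as follows. This is an illustration rather than the procedure used in the experiments: it assumes that models are ranked by average MAPE and that the max-min difference is used as a robustness check. The helper names `rank_by_average` and `most_robust` are hypothetical, and Table 1 is not included.

```python
# (average MAPE, max-min difference) for the HSI, copied from Tables 2 and 3.
HSI_4_DAY = {
    "SVR": (2.8785, 21.1762), "WL_db_svm": (3.5369, 15.7930), "WL_sym_svm": (4.5380, 23.3956),
    "LSSVM": (2.4693, 7.6761), "WL_db_lssvm": (3.6038, 14.6272), "WL_sym_lssvm": (3.9298, 18.0802),
    "GARCH": (7.9502, 27.1573), "WL_db_garch": (6.7212, 26.0193), "WL_sym_garch": (3.3281, 13.7217),
}
HSI_20_DAY = {
    "SVR": (4.1385, 32.1048), "WL_db_svm": (9.2184, 44.2048), "WL_sym_svm": (8.9549, 42.9311),
    "LSSVM": (2.7868, 9.7202), "WL_db_lssvm": (5.3428, 29.5486), "WL_sym_lssvm": (4.5821, 21.6681),
    "GARCH": (16.4846, 69.7003), "WL_db_garch": (12.5457, 66.3760), "WL_sym_garch": (10.6222, 56.9161),
}


def rank_by_average(results):
    """Model names sorted from lowest to highest average MAPE."""
    return sorted(results, key=lambda model: results[model][0])


def most_robust(results):
    """Model with the smallest difference between maximum and minimum MAPE."""
    return min(results, key=lambda model: results[model][1])


for label, results in (("4 days", HSI_4_DAY), ("20 days", HSI_20_DAY)):
    ranking = rank_by_average(results)
    print(label, "best:", ranking[0], "second:", ranking[1],
          "smallest difference:", most_robust(results))
```

On these figures LSSVM ranks first and SVR second for both horizons, which matches the reading of Tables 2 and 3 given in the text.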
Tables 4 and 5 report the summary of the descriptive
statistics for the various stock indexes during the two periods,
based on log-return analysis. If skewness is negative, the
distribution is skewed to the left, and vice versa. For a normal
distribution, kurtosis is 3. When kurtosis is greater than 3, the
distribution is more outlier-prone than the normal distribution,
and vice versa. When normality h = 1, the null hypothesis of a
normal distribution is rejected. When Q(6) h = 1, the statistic
of raw returns indicates significant autocorrelation. When
Q(6)* h = 1, the statistic of squared raw returns indicates
significant autocorrelation. When ARCH(6) h = 1, the test
shows significant evidence in support of GARCH effects (i.e.
heteroscedasticity). Except for the 2007 to 2009 Shanghai
Composite series, the series are typically characterized by
excess kurtosis and asymmetry. It can be concluded that the
above series are characterized by heteroscedasticity and
time-varying autocorrelation; therefore, GARCH class models
should be suitable for forecasting. As seen from Figure 1,
Figure 2, Table 4 and Table 5, all series exhibit more
variability, skewness, kurtosis and volatility clustering, such
that the nonlinear asymmetric EGARCH model should fit them
more accurately. In Table 2, all values for the GARCH model
are from the EGARCH model with parameters R = 1, M = 1,
P = 1, Q = 2. The result is consistent with the statistical
findings.
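The statistics in Tables 4 and 5 can be illustrated with a short, standard-library sketch. It is not the code used for the experiments. It assumes the textbook forms of the Jarque-Bera and Ljung-Box statistics, a 5% significance level for the h-value, and a chi-square reference distribution whose survival function has a closed form for even degrees of freedom. All function names are illustrative.

```python
import math


def central_moment(x, k):
    """k-th central moment of a sample (divides by n)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    """Raw kurtosis; equals 3 for a normal distribution."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def chi2_sf_even(stat, dof):
    """P(X > stat) for a chi-square variable with an even number of d.o.f."""
    if dof % 2 != 0:
        raise ValueError("closed form only holds for even degrees of freedom")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(dof // 2))


def jarque_bera(x):
    """Jarque-Bera statistic and p-value (chi-square, 2 d.o.f.)."""
    n = len(x)
    stat = n / 6.0 * (skewness(x) ** 2 + (kurtosis(x) - 3.0) ** 2 / 4.0)
    return stat, chi2_sf_even(stat, 2)


def ljung_box(x, lags=6):
    """Ljung-Box Q statistic and p-value; pass squared returns for Q(6)*."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q, chi2_sf_even(q, lags)


def h_value(p_value, alpha=0.05):
    """1 if the null hypothesis is rejected at level alpha (assumed 5%), else 0."""
    return 1 if p_value < alpha else 0


# p-values implied by three statistics in Table 4 (SH Composite column),
# assuming a chi-square reference distribution with 6 degrees of freedom.
for name, stat in [("Q(6)", 6.0892), ("Q(6)*", 13.2112), ("ARCH(6)", 11.7167)]:
    p = chi2_sf_even(stat, 6)
    print(name, round(p, 4), h_value(p))
```

Under these assumptions the implied p-values are close to those reported in Table 4, and the h-values follow from comparing each p-value with 0.05.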
4. Conclusion and Future Work
Based on EMH, the above models have been tested in 3
markets. The winner is the SVR model, as it produces the best
MAPE for the HSI and performs equally well in the 3 markets.
Accurate long-term forecasting (20 days, or one month) is
always difficult, but the results have demonstrated that it is
still possible to get a MAPE under 2. It is a significant
improvement and a very useful tool in financial time series
analysis. Decision makers can use our models to analyse the
market trend or as a benchmark for an investment portfolio.
As in the experiment, it is a tedious task to search for the
right parameters for the models, and so far there is no simple
solution to this problem. The science of forecasting still relies
on a trial-and-error approach. However, the experiments have
provided a consistent approach, which is to search for the
parameters as explained in the above sections using recent
historical data. The disadvantage is that this can be time
consuming, but the ends seem to justify the means if the
objective is achieved.
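The trial-and-error search over recent historical data can be sketched in a few lines. This is only an illustration of the idea. A simple exponential smoothing forecaster stands in for SVR or LSSVM, the smoothing weight `alpha` stands in for the real model parameters, and the hold-out length and candidate grid are arbitrary assumptions.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def smooth_forecasts(series, alpha):
    """One-step-ahead exponential smoothing forecasts for series[1:]."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast is made before `value` is seen
        level = alpha * value + (1 - alpha) * level
    return forecasts


def search_alpha(series, grid, holdout):
    """Return the grid value with the lowest MAPE on the last `holdout` points."""
    scores = {}
    for alpha in grid:
        forecasts = smooth_forecasts(series, alpha)
        scores[alpha] = mape(series[-holdout:], forecasts[-holdout:])
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical index levels with a steady upward trend.
prices = [100.0 + day for day in range(21)]
best_alpha, all_scores = search_alpha(prices, grid=[0.2, 0.5, 1.0], holdout=4)
print(best_alpha, all_scores)
```

The same loop applies to any model whose forecasts can be scored by MAPE on a recent hold-out window. The cost grows with the size of the grid, which is the time-consuming aspect noted above.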
The consistent performance of the Least Square Support
Vector Machine forecasting model has been demonstrated in
the experiments, especially in Table 2. The above approaches
are limited to three forecasting techniques: GARCH, SVR and
LSSVM. In order to increase the predictability of the SVR
model, chart patterns are another approach that will be
explored. In addition, the chaotic factors of the above markets
have not been scrutinized. They will be included in future
work on these models. For the time being, it is believed that
the above models are useful for handling current market
demand, even under extreme conditions such as the financial
tsunami.
Acknowledgements
The authors would like to acknowledge the partial support
of the CRG grant G-YM07 of The Hong Kong Polytechnic
University.

REFERENCES
[1] P. S. Addison. The Illustrated Wavelet Transform Handbook.
Institute of Physics Publishing Limited, Dirac House, Temple
Back, Bristol BS1 6BE, UK, 2002.
[2] Y. K. Bao, Z. T. Liu, L. Guo, W. Wang. Forecasting Stock
Composite Index by Fuzzy Support Vector Machines
Regression, Fourth International Conference on Machine
Learning and Cybernetics, pp. 3535-3540, Guangzhou, China,
18-21 August 2005.
[3] Bjorn. Questioning the Inefficient Market Hypothesis: Theory
and Econometrics, PhD paper, Yale University, USA, 2003.
[4] Z. Bodie, A. Kane, A. J. Marcus. Investments, McGraw-Hill
International, USA, 2003.
[5] A. Boggess, J. F. Narcowich. A First Course in Wavelets with
Fourier Analysis. John Wiley & Sons, Inc., Hoboken, New
Jersey, USA, 2009.
[6] T. Bollerslev. Glossary to ARCH (GARCH), School of
Economics & Management, University of Aarhus, Denmark,
June, 2007.
[7] C. Chen, W. K. Hardle, K. Jeong. Forecasting Volatility with
SVM-Based GARCH Model. Journal of Forecasting,
406-422, (DOI:10.1002/for.1134), 2009.
[8] W. Dai, C. J. Lu. Financial Time Series Forecasting using a
Compound Model Based on Wavelet Frame and Support
Vector Regression, 4th International Conference on Natural
Computation, IEEE, 329-332, 2008.
[9] E. Fama. Efficient Capital Markets: II. Journal of Finance,
1575-1616, 1991.
[10] I. Becerra-Fernandez, S. H. Zanakis, S. Walczak.
Knowledge discovery techniques for predicting country
investment risk, Computers & Industrial Engineering 43, pp.
787-800, 2002.
[11] L. Hentschel. Nesting symmetric and asymmetric GARCH
models, Journal of Financial Economics 39, 71-104, 1995.
[12] S. C. Huang, T. K. Wu. Wavelet-Based Relevance Vector
Machines for Stock Index Forecasting, 2006 International
Joint Conference on Neural Networks, Vancouver, BC,
Canada, pp. 603-609, July 2008.
[13] K. M. Kong, H. Y. Wong, S. Lee, J. Liu. Fuzz-IEEE, IEEE,
2009.
[14] K. C. Lai, N. K. Liu. Stock Forecasting Using Support Vector
Machine, International Conference on Machine Learning
and Cybernetics (ICMLC), Vol. 4, pp. 1607-1614, DOI:
10.1109/ICMLC.2010.5580999, Qingdao, China, Print
ISBN 978-1-4244-6526-2, INSPEC Accession Number
11536134.
[15] Y. F. Li. Research on Stock Value Investment Based on
Artificial Intelligence, Dissertation for the Doctoral Degree in
Management, Harbin Institute of Technology, 2008.
[16] C. J. Lin, C. C. Chang. LIBSVM: a library for support vector
machines, 2001. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[17] C. J. Lin, B. J. Chen, M. W. Chang. Load Forecasting
Using Support Vector Machines: A Study on EUNITE
Competition 2001, Department of Computer Science and
Information Engineering, National Taiwan University, 2001.
[18] D. Olson, C. Mossman. Cross-correlations and Predictability
of Stock Return, Journal of Forecasting, 145-160, 2001.
[19] A. M. Malyscheff, T. B. Trafalis, S. Raman. From support
vector machine learning to the determination of the
minimum enclosing zone, Computers & Industrial
Engineering 42, pp. 59-74, 2002.
[20] R. Mitsdorffler, J. Diederich. Prediction of First-Day Returns
of Initial Public Offering in the US Stock Market Using Rule
Extraction from Support Vector Machines, Studies in
Computational Intelligence (SCI) 80, 185-203, 2008.
[21] F. Pasila, S. Ronni, L. H. Wijaya. Long-term Forecasting in
Financial Stock Market using accelerated LMA on
Neuro-Fuzzy structure and additional Fuzzy C-Means
Clustering for optimizing the GMFs, International Joint
Conference on Neural Networks, 3960-3965, 2008.
[22] P. Posedel. Analysis of the exchange rate and pricing foreign
currency options on the Croatian market: The NGARCH
model as an alternative to the Black-Scholes model,
Financial Theory and Practice 30(4), 347-368, 2006.
[23] A. Rua. A wavelet approach for factor-augmented forecasting.
Journal of Forecasting, (DOI:10.1002/for.1200), 2010.
[24] N. I. Sapankevych, R. Sankar. Time Series Prediction Using
Support Vector Machines: A Survey, IEEE Computational
Intelligence Magazine, pp. 25-38, May 2009.
[25] P. Bagavathi Sivakumar, V. P. Mohandas. Modeling and
Predicting Stock Returns using the ARFIMA-FIGARCH: a
case study on Indian Stock data, 2009 World Congress on
Nature & Biologically Inspired Computing (NaBIC 2009),
pp. 896-901, 2009.
[26] J. Suykens. Katholieke Universiteit Leuven. Software
available at http://www.esat.kuleuven.be/sista/lssvmlab, 2011.
[27] A. Wong. Winner of Bull and Bear, China People’s
University Publisher, 2010.
[28] Y. Zhang, W. Shen. Stock Yield Forecast based on LS-SVM in
Bayesian Inference, 2009 ETP International Conference on
Future Computer and Communication, pp. 8-11, DOI:
10.1109/FCC.2009.34, IEEE, 2009.
[29] Y. Zhai, A. Hsu, S. K. Halgamuge. Combining News
and Technical Indicators in Daily Stock Price Trends
Prediction, Springer-Verlag Berlin Heidelberg, pp.
1087-1096, 2007.
[30] J. G. Zhou, J. M. Tian. Predicting Corporate Financial Distress
Based on Rough Sets and Wavelet Support Vector Machine.
2007 International Conference on Wavelet Analysis and
Pattern Recognition, 602-607, 2007.
[31] R. E. Lucas. Asset Prices in an Exchange Economy.
Econometrica, 1429-1445, 1978.

