CHAPTER 1
INTRODUCTION
1.1 Overview
The stock market plays an important role in the rapid economic growth of developing countries such as India. A developing nation's growth depends heavily on the performance of its stock market: when the market rises, economic growth tends to be high, and when it falls, growth tends to slow. In other words, a country's growth is tightly bound to the performance of its stock market. Even so, only about 10% of people engage in stock market investment, largely because of the market's dynamic nature. There is also a common misconception that buying and selling shares is an act of gambling; this misconception can be changed by raising awareness among people.
The attempt to solve the investor's problem of what to buy and when to buy led, in the early period of stock market study, to the emergence of two distinct schools of thought regarding security valuation and stock price behaviour, popularly referred to as Fundamental Analysis and Technical Analysis. Recent advances in computer science and the accessibility of the Internet have made it possible to apply newer methods to stock market prediction. The ARIMA model is one of the standard models for predicting the future direction of a time series, while the recurrent neural network (RNN) is a model drawn from computer science.
Fundamental analysts maintain that at any instant an individual security has an intrinsic value, which should equal the present value of the future stream of income from that security, discounted at an appropriate risk-related rate of interest. The actual price of a security is considered a function of a set of anticipated returns and anticipated capitalization rates. The real worth of a security is estimated by considering key economic and financial variables such as earnings, dividends, growth in earnings, capital structure, and the size of the company.
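The intrinsic-value idea above can be sketched numerically. This is an illustrative example only; the dividend figures and discount rate below are assumptions, not data from this report:

```python
# Illustrative sketch: intrinsic value as the present value of a projected
# dividend stream, discounted at a risk-related rate of interest.
def intrinsic_value(dividends, rate):
    """Discount each future dividend back to today and sum the results."""
    return sum(d / (1 + rate) ** t for t, d in enumerate(dividends, start=1))

# Hypothetical example: five years of growing dividends, 10% discount rate
projected = [2.00, 2.10, 2.21, 2.32, 2.43]
print(round(intrinsic_value(projected, 0.10), 2))   # → 8.31
```

A security trading below this figure would be considered undervalued by a fundamental analyst, and one trading above it overvalued.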
Predicting the stock market has been the bane and goal of investors since its existence. Every day billions of dollars are traded on the exchange, and behind each dollar is an investor hoping to profit in one way or another. Entire companies rise and fall daily based on the behaviour of the market. An investor who could accurately predict market movements would command a tantalizing promise of wealth and influence. It is no wonder, then, that the stock market and its associated challenges find their way into the public imagination every time it misbehaves. The 2008 financial crisis was no different, as evidenced by the flood of films and documentaries based on the crash. If there was a common theme among those productions, it was that few people knew how the market worked or reacted.
Forecasting techniques play an important role in the stock market: they can uncover hidden patterns and reach a level of accuracy where traditional statistical methods fall short. The huge amount of data generated by stock markets has pushed researchers to apply forecasting techniques when making investment decisions.
The credit for originating the concept of investment value goes to John B. Williams [1938], who also presented an actual formula for determining the intrinsic value of stocks. However, the concept of intrinsic value was popularized by B. Graham and D. Dodd [1934] in their classic book 'Security Analysis'. Many researchers have since suggested further developments in the theory of intrinsic value.
1.3 Objectives
1.4 Limitations
• Handling time series data in neural networks is very complicated.
• Predicting the stock market with time series analysis techniques requires high volumes of training data.
• The performance studies neglect some important features of financial markets, such as transaction costs and limited volume at given prices.
• Non-quantifiable factors such as natural disasters, changes in the company board, company mergers, etc. cannot be considered.
Chapter 1 gives a brief description of the stock market, how it works, and the models that can be used to predict stock prices, along with the problem statement, objectives, and limitations.
Chapter 2 includes the literature survey that was referred to while developing the model.
Chapter 3 deals with the analysis part of development. Details of the existing system and its drawbacks, the proposed system and its advantages, and the functional, non-functional, hardware, and software requirements are specified.
CHAPTER 2
LITERATURE SURVEY
I. Svalina et al. [1] observe that stock markets are one of the important parts of a country's economy; in fact, they are the most important way for companies to raise capital. Not only investors but also common people find them an investment tool. As the stock market heavily influences individual and national economies, predicting its future values is an essential task when deciding whether to buy or sell a share [3]. It is, however, very difficult to predict stock price trends [14] efficiently, because many factors such as economics, politics, and the environment are deciding parameters.
An Adaptive Network-Based Fuzzy Inference System (ANFIS) has been used for stock prediction on the Istanbul Stock Exchange [2], and [1] also uses an ANFIS-based model for stock price prediction. A three-stage stock market prediction system is introduced in [13]. [5] presents an integrated system in which wavelet transforms and a recurrent neural network (RNN) based on the artificial bee colony algorithm (called ABC-RNN) are combined for stock price forecasting. A review of the data mining techniques used for this purpose is given in [10].
The development process of the TS fuzzy model can be achieved in two steps: 1) determining the membership functions in the rule antecedents using the model input data; 2) estimating the consequence parameters. The authors used least-squares estimation for these parameters, and the results were promising. M. H. Fazel Zarandi et al. [8] developed a type-2 fuzzy rule-based expert system for stock price analysis. The interval type-2 fuzzy logic system permitted modelling rule uncertainties, with every membership value of an element being itself an interval. The proposed type-2 fuzzy model applied technical and fundamental indexes as the input variables. S. Abdulsalam Sulaiman Olaniyi et al. [11] proposed a linear regression method for analysing the coupled behaviour of stocks in the market.
T.-J. Hsieh et al. [4] note that the modelling functions of neural networks are being applied to a widely expanding range of applications beyond the traditional areas of pattern recognition and control. Their non-linear learning and smooth interpolation capabilities give neural networks an edge over standard computers and expert systems for certain problems; accurate stock market prediction is one such problem. Several mathematical models have been developed, but the results have been unsatisfying. The authors chose this application as a means to check whether neural networks could produce a successful model whose generalization capabilities could be used for stock market prediction.
Fujitsu and Nikko Securities worked together to develop a buying and selling prediction system for TOPIX. The input consists of several technical and economic indexes. In their system, several modular neural networks learned the relationships between past technical and economic indexes and the timing of when to buy and sell. A prediction system made up of modular neural networks was found to be accurate; simulation of buying and selling stocks using the prediction system showed an excellent profit, and stock price fluctuation factors could be extracted by analysing the networks.
CHAPTER 3
ANALYSIS
3.1.1 Description
3.1.2 Drawbacks
The accuracy of prediction by Linear Regression is not high enough to support good decisions on stock trading. Linear Regression is limited to linear relationships: the algorithm assumes the system follows a straight line, whereas in stock trading values may rise, drop, or remain constant, and the data values are scattered and fluctuating. Apart from that, Linear Regression is not a complete description of the relationships among variables; it only provides the means to investigate the mean of the dependent variable given the independent variable. It is therefore not applicable to the situations encountered in the stock market, and the prediction is suppressed by this constraint.
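The straight-line limitation can be demonstrated on synthetic data. The series below is made up for illustration: a degree-1 (linear) fit leaves large residuals whenever prices rise and then fall.

```python
import numpy as np

# Synthetic, non-linear "price" series (an assumption for illustration):
# it rises and then falls, which no straight line can capture.
days = np.arange(20, dtype=float)
prices = 100 + 5 * days - 0.5 * days**2

slope, intercept = np.polyfit(days, prices, 1)   # degree-1 (linear) fit
fitted = slope * days + intercept
residuals = prices - fitted

# Large residuals at the ends and the middle show the linearity assumption failing.
print(round(float(np.abs(residuals).max()), 2))  # → 28.5
```

For a series with a reversal, even the best-fitting line misses the actual prices by tens of units, which is exactly the drawback described above.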
3.2.1 Description
Analysts making forecasts often have extensive domain knowledge about the
quantity they are forecasting, but limited statistical knowledge. In the Prophet model
specification, there are several places where analysts can alter the model to apply their
expertise and external knowledge without requiring any understanding of the underlying
statistics.
Capacities: Analysts may have external data for the total market size and can
apply that knowledge directly by specifying capacities. Changepoints: Known dates of
changepoints, such as dates of product changes, can be directly specified. Holidays and
seasonality: Analysts that we work with have experience with which holidays impact
growth in which regions, and they can directly input the relevant holiday dates and the
applicable time scales of seasonality. Smoothing parameters: By adjusting τ an analyst
can select from within a range of more global or locally smooth models.
The seasonality and holiday smoothing parameters allow the analyst to tell the
model how much of the historical seasonal variation is expected in the future. With good
visualization tools, analysts can use these parameters to improve the model fit. When the
model fit is plotted over historical data, it is quickly apparent if there were changepoints
that were missed by the automatic changepoint selection.
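These analyst-facing knobs map onto Prophet's configuration roughly as follows. This is a sketch, assuming the fbprophet package is available; the holiday name, dates, and parameter values are hypothetical:

```python
import pandas as pd

# Hypothetical analyst inputs expressed as Prophet configuration.
# Holidays are supplied as a dataframe of named dates with effect windows.
holidays = pd.DataFrame({
    'holiday': 'product_launch',                       # made-up event name
    'ds': pd.to_datetime(['2017-09-12', '2018-09-12']),
    'lower_window': 0,
    'upper_window': 1,
})

prophet_params = {
    'changepoints': ['2017-01-03'],       # known changepoint dates
    'changepoint_prior_scale': 0.05,      # tau: more global vs. more local smoothing
    'holidays': holidays,                 # analyst-supplied holiday effects
    'yearly_seasonality': True,
    'weekly_seasonality': True,
}

# With fbprophet installed, the model would then be built as:
# import fbprophet
# model = fbprophet.Prophet(**prophet_params)
print(sorted(prophet_params))
```

Each entry corresponds to one of the places where the analyst can inject domain knowledge without touching the underlying statistics.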
3.2.2 Advantages
The non-functional requirement specifies the criteria that can be used to judge the
operation of a system, rather than specific behaviours.
• User Interfaces: The external users are the clients. All the clients can use
this software for indexing and searching.
• Hardware Interfaces: The external hardware interface used for indexing and searching is the clients' personal computers. The PCs may be laptops with wireless LAN, as the internet connections provided will be wireless.
• Software Interfaces: The Operating System can be any version of Windows.
• Performance Requirements: The PCs used must have at least an i5 processor so that they can give optimum performance of the product.
• Hard Disk : 10 GB
• RAM : 8 GB RAM
• Processor : Multicore processor, i5-i7
• Processor speed : 2.6 GHz and above
CHAPTER 4
DESIGN
[Figure 4.4: dataflow between the global dataset, the Prophet model, the testing set, and the results]
As illustrated in Figure 4.4, the uncleaned training data is sent to the data processing model, which produces parsed, processed data; machine learning techniques are then applied and the data is sent to the training model. The results are predicted, and with the help of the matplotlib library the product attribute graph is obtained.
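The final plotting step can be sketched with matplotlib. Both series below are synthetic stand-ins for the model's output, and the Agg backend is used so no display is required:

```python
import matplotlib
matplotlib.use('Agg')            # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic observed and predicted series (stand-ins for real model output)
days = np.arange(30)
observed = 100 + np.cumsum(np.random.RandomState(0).normal(0, 1, 30))
predicted = observed + np.random.RandomState(1).normal(0, 0.5, 30)

plt.style.use('fivethirtyeight')
plt.plot(days, observed, 'k-', label='Observed')
plt.plot(days, predicted, 'b--', label='Predicted')
plt.legend(prop={'size': 10})
plt.title('Predicted vs. observed (synthetic data)')
plt.savefig('prediction.png')    # write the graph to a file
```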
CHAPTER 5
IMPLEMENTATION
• Data Collection
• Data Pre-processing
• Normalization
• Model Fitting
• Testing/Validation
Quandl's data products come in many forms and contain various objects, including
time-series and tables. Through our APIs and various tools (R, Python, Excel, etc.), users
can access/call the premium data to which they have subscribed. (Our free data can be
accessed by anyone who has registered for an API key.)
5.1.3 Normalization
Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. Not every dataset requires normalization; it is required only when features have different ranges.
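A minimal sketch of min-max normalization, a common choice for bringing numeric columns onto a [0, 1] scale; the sample prices are made up for illustration:

```python
import numpy as np

# Min-max normalization: rescale a column to the [0, 1] range
# (illustrative values, not the project's dataset).
prices = np.array([152.0, 148.5, 160.2, 155.7, 149.9])

scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled.min(), scaled.max())   # 0.0 1.0
```

After scaling, the lowest price maps to 0 and the highest to 1, while the relative spacing between values is preserved.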
5.1.5 Testing/Validation
Test Dataset: The sample of data used to provide an unbiased evaluation of a final
model fit on the training dataset. The Test dataset provides the gold standard used to
evaluate the model. It is only used once a model is completely trained (using the train and
validation sets).
Some of the main packages used in this project are as mentioned below:
1. Fbprophet
Prophet follows the sklearn model API: an instance of the Prophet class is created, and its fit and predict methods are then called.
2. pytrends
Unofficial API for Google Trends. It provides a simple interface for automating the download of reports from Google Trends; its main feature is logging in to Google on the user's behalf to enable a higher rate limit.
3. Pandas
Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series.
4. NumPy
NumPy is a package in Python used for Scientific Computing. NumPy package is
used to perform different operations.
5. from pytrends.request import TrendReq
It is used to connect to Google Trends.
6. matplotlib. pyplot as plt
Matplotlib is a Python 2D plotting library which produces publication quality figures
in a variety of hardcopy formats and interactive environments across platforms.
7. Quandl
The API can be used to deliver more complex datasets. For example, a call can retrieve the quarterly percentage change in AAPL stock between 1985 and 1997, closing prices only, in JSON format.
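Before fitting, Prophet expects a two-column frame with the date in `ds` and the value in `y`. A minimal sketch of that preparation step, using made-up prices in place of the Quandl data (the Stocker code below builds the same columns):

```python
import pandas as pd

# Sketch: build the 'ds' (date) and 'y' (value) columns Prophet expects.
# The sample prices here are made up for illustration.
stock = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04']),
    'Adj. Close': [172.26, 172.23, 173.03],
})

stock['ds'] = stock['Date']          # Prophet's date column
stock['y'] = stock['Adj. Close']     # Prophet's value column

print(list(stock.columns))   # ['Date', 'Adj. Close', 'ds', 'y']
```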
import pytrends
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import quandl
import fbprophet
class Stocker():
    def __init__(self, ticker):
        # Enforce capitalization
        ticker = ticker.upper()
        self.symbol = ticker
        try:
            # Retrieve the historical prices from Quandl's WIKI database
            stock = quandl.get('WIKI/%s' % ticker)
        except Exception as e:
            print(e)
            return
        stock = stock.reset_index(level=0)
        stock['ds'] = stock['Date']
        stock['y'] = stock['Adj. Close']
        self.stock = stock.copy()
self.min_date = min(stock['Date'])
self.max_date = max(stock['Date'])
        # Find max and min prices and dates on which they occurred
        self.max_price = np.max(self.stock['y'])
        self.min_price = np.min(self.stock['y'])
        self.min_price_date = self.stock[self.stock['y'] == self.min_price]['Date']
        self.min_price_date = self.min_price_date[self.min_price_date.index[0]]
        self.max_price_date = self.stock[self.stock['y'] == self.max_price]['Date']
        self.max_price_date = self.max_price_date[self.max_price_date.index[0]]
self.round_dates = True
self.training_years = 3
# Prophet parameters
self.changepoint_prior_scale = 0.05
self.weekly_seasonality = True
self.daily_seasonality = True
self.monthly_seasonality = True
self.yearly_seasonality = True
self.changepoints = None
self.min_date.date(), self.max_date.date()))
"""
Make sure start and end dates are in the range and can be
"""
# Default start and end date are the beginning and end of data
if start_date is None:
start_date = self.min_date
if end_date is None:
end_date = self.max_date
try:
start_date = pd.to_datetime(start_date)
end_date = pd.to_datetime(end_date)
except Exception as e:
print(e)
return
valid_start = False
valid_end = False
# User will continue to enter dates until valid dates are met
valid_end = True
valid_start = True
valid_end = False
valid_start = False
else:
valid_end = False
valid_start = False
"""
"""
        # Truth-testing a DataFrame raises an error; check for None instead
        if df is None:
            df = self.stock.copy()
# keep track of whether the start and end dates are in the data
start_in = True
end_in = True
if self.round_dates:
start_in = False
end_in = False
else:
else:
if (not start_in):
else:
valid_start = False
valid_end = False
# No round dates, if either data not in, print message and return
if (start_date in list(df['Date'])):
valid_start = True
if (end_date in list(df['Date'])):
valid_end = True
print('Start Date not in data (either out of range or not a trading day.)')
print('End Date not in data (either out of range or not a trading day.)')
return trim_df
self.reset_plot()
if start_date is None:
start_date = self.min_date
if end_date is None:
end_date = self.max_date
stat_min = min(stock_plot[stat])
stat_max = max(stock_plot[stat])
stat_avg = np.mean(stock_plot[stat])
date_stat_min = date_stat_min[date_stat_min.index[0]].date()
date_stat_max = date_stat_max[date_stat_max.index[0]].date()
stat], self.max_date.date())
# Percentage y-axis
if plot_type == 'pct':
# Simple Plot
plt.style.use('fivethirtyeight');
else:
label = stat)
plt.legend(prop={'size':10})
# Stat y-axis
plt.style.use('fivethirtyeight');
plt.legend(prop={'size':10})
plt.show();
@staticmethod
def reset_plot():
matplotlib.rcParams.update(matplotlib.rcParamsDefault)
matplotlib.rcParams['figure.figsize'] = (8, 5)
matplotlib.rcParams['axes.labelsize'] = 10
matplotlib.rcParams['xtick.labelsize'] = 8
matplotlib.rcParams['ytick.labelsize'] = 8
matplotlib.rcParams['axes.titlesize'] = 14
matplotlib.rcParams['text.color'] = 'k'
dataframe = dataframe.set_index('ds')
dataframe = dataframe.resample('D')
dataframe = dataframe.reset_index(level=0)
dataframe = dataframe.interpolate()
return dataframe
dataframe = dataframe.reset_index(drop=True)
weekends = []
weekends.append(i)
return dataframe
# Calculate and plot profit from buying and holding shares for specified date range
self.reset_plot()
# Total profit
print('{} Total buy and hold profit from {} to {} for {} shares = ${:.2f}'.format
plt.style.use('dark_background')
plt.text(x = text_location,
s = '$%d' % total_hold_profit,
plt.grid(alpha=0.2)
plt.show();
def create_model(self):
model = fbprophet.Prophet(daily_seasonality=self.daily_seasonality,
weekly_seasonality=self.weekly_seasonality,
yearly_seasonality=self.yearly_seasonality,
changepoint_prior_scale=self.changepoint_prior_scale,
changepoints=self.changepoints)
if self.monthly_seasonality:
return model
pd.DateOffset(years=self.training_years)).date())]
self.changepoint_prior_scale = prior
model = self.create_model()
model.fit(train)
if i == 0:
predictions = future.copy()
future = model.predict(future)
future = model.predict(future)
if days > 0:
else:
fig, ax = plt.subplots(1, 1)
ax.fill_between(future['ds'].dt.to_pydatetime(), future['yhat_upper'],
# Plot formatting
plt.title(title);
plt.show()
if start_date is None:
if end_date is None:
end_date = self.max_date
# Training data starts self.training_years years before start date and goes up to start
end_date.date())]
test['in_range'] = False
for i in test.index:
        if (test.loc[i, 'y'] < test.loc[i, 'yhat_upper']) & (test.loc[i, 'y'] > test.loc[i, 'yhat_lower']):
if not nshares:
end_date.date()))
future.ix[len(future) - 1, 'yhat']))
${:.2f}.'.format(train_mean_error))
# Direction accuracy
print('When the model predicted an increase, the price increased {:.2f}% of the )
print('When the model predicted a decrease, the price decreased {:.2f}% of the )
print('The actual value was within the {:d}% confidence interval {:.2f}% of the
self.reset_plot()
fig, ax = plt.subplots(1, 1)
'Observations')
'Observations')
ax.fill_between(future['ds'].dt.to_pydatetime(), future['yhat_upper'],
plt.vlines(x=min(test['ds']).date(), ymin=min(future['yhat_lower']),
'Prediction Start')
# Plot formatting
start_date.date(), end_date.da
if start_date is None:
if end_date is None:
start_date = pd.to_datetime(start_date)
end_date = pd.to_datetime(end_date)
start_date.date())]
plt.legend(prop={'size':10})
plt.show();
self.reset_plot()
plt.grid(color='k', alpha=0.3)
plt.xticks(results['cps'], results['cps'])
plt.legend(prop={'size':10})
plt.show();
CHAPTER 6
TESTING
In this chapter, an overview of testing is provided to verify the correctness and functionality of the system. Software testing is the process of analysing a software item to detect the differences between existing and required conditions and to evaluate the features of the software item. It is an activity that should be carried out throughout the development process, and it is intended to detect defects by contrasting a computer program's expected results with its actual results for a given set of inputs.
Test Environment
Test Case
A set of test inputs, execution conditions, and expected results developed for a particular objective, such as exercising a particular program path or verifying compliance with a specific requirement. It includes the following.
• Features to be tested
• Items to be tested
• Purpose of testing
• Pass/Fail criteria
As mentioned above, traditional unit/integration testing does not work on machine learning models; the model is therefore tested on its accuracy and predictions.
For binary classification, accuracy can also be calculated in terms of positives and
negatives as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
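The formula can be written directly as code; the confusion-matrix counts below are hypothetical:

```python
# Accuracy from confusion-matrix counts: correct predictions (true positives
# plus true negatives) divided by all predictions.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts for a binary up/down classifier
print(accuracy(tp=45, tn=30, fp=15, fn=10))   # 0.75
```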
When it comes to forecasting, models are evaluated on the expected results they predict. For stock market forecasting, we divided the data into a training set and a testing set; the training set is then further split into a training dataset and a validation dataset. We train our model using the training dataset, and the validation dataset is used to test the trained model. A validation dataset is a sample of data held back from training that is used to estimate model skill while tuning the model's hyperparameters.
A test dataset is a dataset that is independent of the training dataset but follows the same probability distribution. If a model fit to the training dataset also fits the test dataset well, minimal overfitting has taken place.
As can be seen in the above graph, the dotted vertical line marks the point on the x-axis from which the prediction starts; the blue line depicts the predicted stock values and the black line the observed values. By comparing predicted and observed values we can tell how well the model works.
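The chronological split described above can be sketched as follows. The 70/15/15 proportions are an assumption for illustration; for time series the split must preserve order rather than shuffle:

```python
# Chronological train/validation/test split for time series data.
data = list(range(100))      # stand-in for 100 daily closing prices, in order

n = len(data)
train = data[: int(0.70 * n)]                      # fit the model here
validation = data[int(0.70 * n): int(0.85 * n)]    # tune hyperparameters here
test = data[int(0.85 * n):]                        # final, one-shot evaluation

print(len(train), len(validation), len(test))      # 70 15 15
```

Keeping the splits in time order ensures the model is always evaluated on data that lies strictly in the future relative to what it was trained on.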
Conclusion
Future Enhancements
In the future, the stock market prediction system can be further improved by using a much bigger dataset than the one currently utilized, which would help increase the accuracy of the prediction models. Furthermore, other machine learning models could be studied to compare the accuracy they achieve.
SNAPSHOTS
ANNEXURE A
GLOSSARY
Accuracy
Accuracy is a metric by which one can examine how good the machine learning model is.
Assets
Everything a company or person owns, including money, securities, equipment and real
estate. Assets include everything that is owed to the company or person.
Bar Chart
A bar chart is a type of graph used to display and compare numbers, frequencies, or other measures.
Classification
The identification of which of two or more categories an item falls under; a classic
machine learning task. Deciding whether an email message is spam or not classifies it
among two categories, and analysis of data about movies might lead to classification of
them among several genres.
Confidence Interval
Covariance
A measure of the relationship between two variables whose values are observed at the same time; specifically, the average value of the product of the two variables diminished by the product of their average values.
Capital Stock
All shares representing ownership of a company, including preferred and common shares.
Close Price
The price of the last board lot trade executed at the close of trading
Dependent variable
The value of a dependent variable "depends" on the value of the independent variable. If you're measuring the effect of different sizes of an advertising budget on total sales, then the advertising budget is the independent variable and total sales is the dependent variable.
Reinforcement Learning
A class of machine learning algorithms in which the process is not given specific goals to
meet but, as it makes decisions, is instead given indications of whether it’s doing well or
not.
Root Mean Squared Error
Also, RMSE. The square root of the Mean Squared Error. This is more popular than the Mean Squared Error because taking the square root of a figure built from the squares of the observation errors gives a number that is easier to understand in the units used to measure the original observations.
Index
A statistical measure of the state of the stock market, based on the performance of certain stocks. Examples include the S&P/TSX Composite Index and the S&P/TSX Venture Composite Index.
ANNEXURE B
ACRONYMS
BIBLIOGRAPHY
[1] I. Svalina, V. Galzina, R. Lujić, and G. Šimunović, "An adaptive network-based fuzzy inference system (ANFIS) for the forecasting: The case of close price indices," Expert Systems with Applications, vol. 40, no. 15, pp. 6055–6063, 2013.
[2] M. A. Boyacioglu and D. Avci, "An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul Stock Exchange," Expert Systems with Applications, vol. 37, no. 12, pp. 7908–7912, 2010.
[3] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, vol. 33, no. 1, pp. 3–56, 1993.
[4] T.-J. Hsieh, H.-F. Hsiao, and W.-C. Yeh, "Forecasting stock markets using wavelet transforms and recurrent neural networks: An integrated system based on artificial bee colony algorithm," Applied Soft Computing, vol. 11, no. 2, pp. 2510–2525, 2011.
[5] J. W. Hall, "Adaptive selection of US stocks with neural nets," in G. J. Deboeck (ed.), Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets. New York: Wiley, 1994, pp. 45–65.
[6] F. E. H. Tay and L. J. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, pp. 309–317, 2001.
[7] E. F. Fama, "The behavior of stock market prices," The Journal of Business, vol. 38, no. 1, pp. 34–105, January 1965.
[8] L. J. Cao and F. E. H. Tay, "Financial forecasting using support vector machines," Neural Computing & Applications, vol. 10, pp. 184–192, 2001.
[9] Zhen Hu, Jibe Zhu, and Ken Tse, "Stocks market prediction using support vector machine," International Conference on Information Management, Innovation Management and Industrial Engineering, 2013.
[12] K.-j. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, 2003.
[13] Debashish Das and Mohammad Shorif Uddin, "Data mining and neural network techniques in stock market prediction: a methodological review," International Journal of Artificial Intelligence & Applications, vol. 4, no. 1, January 2013.