CHAPTER 1
INTRODUCTION
1.1 Overview
The stock market plays an important role in the rapid economic growth of developing countries such as India. A developing nation's growth depends heavily on the performance of its stock market: when the market rises, economic growth tends to be high, and when it falls, growth tends to slow. In other words, a country's growth is tightly bound to the performance of its stock market. Even so, only about 10% of people engage in stock market investment, largely because of the market's dynamic nature. There is also a common misconception that buying and selling shares is an act of gambling; this misconception can be changed by raising awareness among people.
The attempt to solve the investor's problem of what to buy and when to buy led, in the early period of stock market study, to the emergence of two distinct schools of thought regarding security valuation and stock price behaviour, popularly referred to as Fundamental Analysis and Technical Analysis. Recent advances in computer science and the accessibility of the Internet have made it possible to apply newer methods to stock market prediction. The ARIMA model is one of the standard models for predicting the future direction of a time series, while the recurrent neural network (RNN) is a model drawn from computer science.
Fundamental analysts maintain that at any instant an individual security has an intrinsic value, which should equal the present value of the future stream of income from that security, discounted at an appropriate risk-related rate of interest. The actual price of a security is considered a function of a set of anticipated returns and anticipated capitalization rates. The real worth of a security is estimated by considering key economic and financial variables such as earnings, dividends, growth in earnings, capital structure, and the size of the company.
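The intrinsic-value idea above can be sketched numerically. This is an illustrative example only; the dividend figures and discount rate below are assumptions, not data from this report:

```python
# Illustrative sketch: intrinsic value as the present value of a projected
# dividend stream, discounted at a risk-related rate of interest.
def intrinsic_value(dividends, rate):
    """Discount each future dividend back to today and sum the results."""
    return sum(d / (1 + rate) ** t for t, d in enumerate(dividends, start=1))

# Hypothetical example: five years of growing dividends, 10% discount rate
projected = [2.00, 2.10, 2.21, 2.32, 2.43]
print(round(intrinsic_value(projected, 0.10), 2))   # → 8.31
```

A security trading below this figure would be considered undervalued by a fundamental analyst, and one trading above it overvalued.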
Predicting the stock market has been the bane and goal of investors since its existence. Every day billions of dollars are traded on the exchange, and behind each dollar is an investor hoping to profit in one way or another. Entire companies rise and fall daily based on the behaviour of the market. An investor who could accurately predict market movements would command a tantalizing promise of wealth and influence. It is no wonder, then, that the stock market and its associated challenges find their way into the public imagination every time it misbehaves. The 2008 financial crisis was no different, as evidenced by the flood of films and documentaries based on the crash. If there was a common theme among those productions, it was that few people knew how the market worked or reacted.
Forecasting techniques play an important role in the stock market: they can uncover hidden patterns and reach a level of accuracy where traditional statistical methods fall short. The huge amount of data generated by stock markets has pushed researchers to apply forecasting techniques when making investment decisions.
The credit for originating the concept of investment value goes to John B. Williams [1938], who also presented an actual formula for determining the intrinsic value of stocks. However, the concept of intrinsic value was popularized by B. Graham and D. Dodd [1934] in their classic book 'Security Analysis'. Many researchers have since suggested further developments in the theory of intrinsic value.
1.3 Objectives
1.4 Limitations
• Handling time series data in neural networks is very complicated.
• Predicting the stock market with time series analysis techniques requires high volumes of training data.
• The performance studies neglect some important features of financial markets, such as transaction costs and limited volume at given prices.
• Non-quantifiable factors such as natural disasters, changes in the company board, company mergers, etc. cannot be considered.
Chapter 1 gives a brief description of the stock market, how it works, and the models that can be used to predict stock prices, along with the problem statement, objectives, and limitations.
Chapter 2 includes the literature survey that was referred to while developing the model.
Chapter 3 deals with the analysis part of development. Details of the existing system and its drawbacks, the proposed system and its advantages, and the functional, non-functional, hardware, and software requirements are specified.
CHAPTER 2
LITERATURE SURVEY
I. Svalina et al. [1] observe that stock markets are one of the important parts of a country's economy; in fact, they are the most important way for companies to raise capital. Not only investors but also common people find them an investment tool. As the stock market heavily influences individual and national economies, predicting its future values is an essential task when deciding whether to buy or sell a share [3]. It is, however, very difficult to predict stock price trends [14] efficiently, because many factors such as economics, politics, and the environment are deciding parameters.
An Adaptive Network-Based Fuzzy Inference System (ANFIS) has been used for stock prediction on the Istanbul Stock Exchange [2], and [1] also uses an ANFIS-based model for stock price prediction. A three-stage stock market prediction system is introduced in [13]. [5] presents an integrated system in which wavelet transforms and a recurrent neural network (RNN) based on the artificial bee colony algorithm (called ABC-RNN) are combined for stock price forecasting. A review of the data mining techniques used for this purpose is given in [10].
The development process of the TS fuzzy model can be achieved in two steps: 1) determining the membership functions in the rule antecedents using the model input data; 2) estimating the consequence parameters. The authors used least-squares estimation for these parameters, and the results were promising. M. H. Fazel Zarandi et al. [8] developed a type-2 fuzzy rule-based expert system for stock price analysis. The interval type-2 fuzzy logic system permitted modelling rule uncertainties, with every membership value of an element being itself an interval. The proposed type-2 fuzzy model applied technical and fundamental indexes as the input variables. S. Abdulsalam Sulaiman Olaniyi et al. [11] proposed a linear regression method for analysing the coupled behaviour of stocks in the market.
T.-J. Hsieh et al. [4] note that the modelling functions of neural networks are being applied to a widely expanding range of applications beyond the traditional areas of pattern recognition and control. Their non-linear learning and smooth interpolation capabilities give neural networks an edge over standard computers and expert systems for certain problems; accurate stock market prediction is one such problem. Several mathematical models have been developed, but the results have been unsatisfying. The authors chose this application as a means to check whether neural networks could produce a successful model whose generalization capabilities could be used for stock market prediction.
Fujitsu and Nikko Securities worked together to develop a buying and selling prediction system for TOPIX. The input consists of several technical and economic indexes. In their system, several modular neural networks learned the relationships between past technical and economic indexes and the timing of when to buy and sell. A prediction system made up of modular neural networks was found to be accurate; simulation of buying and selling stocks using the prediction system showed an excellent profit, and stock price fluctuation factors could be extracted by analysing the networks.
CHAPTER 3
ANALYSIS
3.1.1 Description
3.1.2 Drawbacks
The accuracy of prediction by Linear Regression is not high enough to support good decisions on stock trading. Linear Regression is limited to linear relationships: the algorithm assumes the system follows a straight line, whereas in stock trading values may rise, drop, or remain constant, and the data values are scattered and fluctuating. Apart from that, Linear Regression is not a complete description of the relationships among variables; it only provides the means to investigate the mean of the dependent variable given the independent variable. It is therefore not applicable to the situations encountered in the stock market, and the prediction is suppressed by this constraint.
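The straight-line limitation can be demonstrated on synthetic data. The series below is made up for illustration: a degree-1 (linear) fit leaves large residuals whenever prices rise and then fall.

```python
import numpy as np

# Synthetic, non-linear "price" series (an assumption for illustration):
# it rises and then falls, which no straight line can capture.
days = np.arange(20, dtype=float)
prices = 100 + 5 * days - 0.5 * days**2

slope, intercept = np.polyfit(days, prices, 1)   # degree-1 (linear) fit
fitted = slope * days + intercept
residuals = prices - fitted

# Large residuals at the ends and the middle show the linearity assumption failing.
print(round(float(np.abs(residuals).max()), 2))  # → 28.5
```

For a series with a reversal, even the best-fitting line misses the actual prices by tens of units, which is exactly the drawback described above.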
3.2.1 Description
Analysts making forecasts often have extensive domain knowledge about the
quantity they are forecasting, but limited statistical knowledge. In the Prophet model
specification, there are several places where analysts can alter the model to apply their
expertise and external knowledge without requiring any understanding of the underlying
statistics.
Capacities: Analysts may have external data for the total market size and can
apply that knowledge directly by specifying capacities. Changepoints: Known dates of
changepoints, such as dates of product changes, can be directly specified. Holidays and
seasonality: Analysts that we work with have experience with which holidays impact
growth in which regions, and they can directly input the relevant holiday dates and the
applicable time scales of seasonality. Smoothing parameters: By adjusting τ an analyst
can select from within a range of more global or locally smooth models.
The seasonality and holiday smoothing parameters allow the analyst to tell the
model how much of the historical seasonal variation is expected in the future. With good
visualization tools, analysts can use these parameters to improve the model fit. When the
model fit is plotted over historical data, it is quickly apparent if there were changepoints
that were missed by the automatic changepoint selection.
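These analyst-facing knobs map onto Prophet's configuration roughly as follows. This is a sketch, assuming the fbprophet package is available; the holiday name, dates, and parameter values are hypothetical:

```python
import pandas as pd

# Hypothetical analyst inputs expressed as Prophet configuration.
# Holidays are supplied as a dataframe of named dates with effect windows.
holidays = pd.DataFrame({
    'holiday': 'product_launch',                       # made-up event name
    'ds': pd.to_datetime(['2017-09-12', '2018-09-12']),
    'lower_window': 0,
    'upper_window': 1,
})

prophet_params = {
    'changepoints': ['2017-01-03'],       # known changepoint dates
    'changepoint_prior_scale': 0.05,      # tau: more global vs. more local smoothing
    'holidays': holidays,                 # analyst-supplied holiday effects
    'yearly_seasonality': True,
    'weekly_seasonality': True,
}

# With fbprophet installed, the model would then be built as:
# import fbprophet
# model = fbprophet.Prophet(**prophet_params)
print(sorted(prophet_params))
```

Each entry corresponds to one of the places where the analyst can inject domain knowledge without touching the underlying statistics.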
3.2.2 Advantages
The non-functional requirement specifies the criteria that can be used to judge the
operation of a system, rather than specific behaviours.
• User Interfaces: The external users are the clients. All the clients can use
this software for indexing and searching.
• Hardware Interfaces: The external hardware interface used for indexing and searching is the clients' personal computers. The PCs may be laptops with wireless LAN, as the internet connections provided will be wireless.
• Software Interfaces: The Operating System can be any version of Windows.
• Performance Requirements: The PCs used must have at least an i5 processor so that they can give optimum performance of the product.
• Hard Disk : 10 GB
• RAM : 8 GB RAM
• Processor : Multicore processor, i5-i7
• Processor speed : 2.6 GHz and above
CHAPTER 4
DESIGN
[Figure 4.4: dataflow between the global dataset, the Prophet model, the testing set, and the results]
As illustrated in Figure 4.4, the uncleaned training data is sent to the data processing model, which produces parsed, processed data; machine learning techniques are then applied and the data is sent to the training model. The results are predicted, and with the help of the matplotlib library the product attribute graph is obtained.
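The final plotting step can be sketched with matplotlib. Both series below are synthetic stand-ins for the model's output, and the Agg backend is used so no display is required:

```python
import matplotlib
matplotlib.use('Agg')            # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic observed and predicted series (stand-ins for real model output)
days = np.arange(30)
observed = 100 + np.cumsum(np.random.RandomState(0).normal(0, 1, 30))
predicted = observed + np.random.RandomState(1).normal(0, 0.5, 30)

plt.style.use('fivethirtyeight')
plt.plot(days, observed, 'k-', label='Observed')
plt.plot(days, predicted, 'b--', label='Predicted')
plt.legend(prop={'size': 10})
plt.title('Predicted vs. observed (synthetic data)')
plt.savefig('prediction.png')    # write the graph to a file
```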
CHAPTER 5
IMPLEMENTATION
• Data Collection
• Data Pre-processing
• Normalization
• Model Fitting
• Testing/Validation
Quandl's data products come in many forms and contain various objects, including
time-series and tables. Through our APIs and various tools (R, Python, Excel, etc.), users
can access/call the premium data to which they have subscribed. (Our free data can be
accessed by anyone who has registered for an API key.)
5.1.3 Normalization
Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. Not every dataset requires normalization; it is required only when features have different ranges.
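A minimal sketch of min-max normalization, a common choice for bringing numeric columns onto a [0, 1] scale; the sample prices are made up for illustration:

```python
import numpy as np

# Min-max normalization: rescale a column to the [0, 1] range
# (illustrative values, not the project's dataset).
prices = np.array([152.0, 148.5, 160.2, 155.7, 149.9])

scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled.min(), scaled.max())   # 0.0 1.0
```

After scaling, the lowest price maps to 0 and the highest to 1, while the relative spacing between values is preserved.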
5.1.5 Testing/Validation
Test Dataset: The sample of data used to provide an unbiased evaluation of a final
model fit on the training dataset. The Test dataset provides the gold standard used to
evaluate the model. It is only used once a model is completely trained (using the train and
validation sets).
Some of the main packages used in this project are as mentioned below:
1. Fbprophet
Prophet follows the sklearn model API: an instance of the Prophet class is created, and its fit and predict methods are then called.
2. pytrends
Unofficial API for Google Trends. It provides a simple interface for automating the download of reports from Google Trends; its main feature is logging in to Google on the user's behalf to enable a higher rate limit.
3. Pandas
Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series.
4. NumPy
NumPy is a package in Python used for Scientific Computing. NumPy package is
used to perform different operations.
5. from pytrends.request import TrendReq
It is used to connect to Google Trends.
6. matplotlib. pyplot as plt
Matplotlib is a Python 2D plotting library which produces publication quality figures
in a variety of hardcopy formats and interactive environments across platforms.
7. Quandl
The API can be used to deliver more complex datasets. For example, a call can retrieve the quarterly percentage change in AAPL stock between 1985 and 1997, closing prices only, in JSON format.
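Before fitting, Prophet expects a two-column frame with the date in `ds` and the value in `y`. A minimal sketch of that preparation step, using made-up prices in place of the Quandl data (the Stocker code below builds the same columns):

```python
import pandas as pd

# Sketch: build the 'ds' (date) and 'y' (value) columns Prophet expects.
# The sample prices here are made up for illustration.
stock = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04']),
    'Adj. Close': [172.26, 172.23, 173.03],
})

stock['ds'] = stock['Date']          # Prophet's date column
stock['y'] = stock['Adj. Close']     # Prophet's value column

print(list(stock.columns))   # ['Date', 'Adj. Close', 'ds', 'y']
```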
import pytrends
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import quandl
import fbprophet
class Stocker():
    def __init__(self, ticker):
        # Enforce capitalization
        ticker = ticker.upper()
        self.symbol = ticker
        try:
            # Retrieve the historical prices from Quandl's WIKI database
            stock = quandl.get('WIKI/%s' % ticker)
        except Exception as e:
            print(e)
            return
        stock = stock.reset_index(level=0)
        stock['ds'] = stock['Date']
        stock['y'] = stock['Adj. Close']
        self.stock = stock.copy()
self.min_date = min(stock['Date'])
self.max_date = max(stock['Date'])
        # Find max and min prices and dates on which they occurred
        self.max_price = np.max(self.stock['y'])
        self.min_price = np.min(self.stock['y'])
        self.min_price_date = self.stock[self.stock['y'] == self.min_price]['Date']
        self.min_price_date = self.min_price_date[self.min_price_date.index[0]]
        self.max_price_date = self.stock[self.stock['y'] == self.max_price]['Date']
        self.max_price_date = self.max_price_date[self.max_price_date.index[0]]
self.round_dates = True
self.training_years = 3
# Prophet parameters
self.changepoint_prior_scale = 0.05
self.weekly_seasonality = True
self.daily_seasonality = True
self.monthly_seasonality = True
self.yearly_seasonality = True
self.changepoints = None
self.min_date.date(), self.max_date.date()))
"""
Make sure start and end dates are in the range and can be
"""
# Default start and end date are the beginning and end of data
if start_date is None:
start_date = self.min_date
if end_date is None:
end_date = self.max_date
try:
start_date = pd.to_datetime(start_date)
end_date = pd.to_datetime(end_date)
except Exception as e:
print(e)
return
valid_start = False
valid_end = False
# User will continue to enter dates until valid dates are met
valid_end = True
valid_start = True
valid_end = False
valid_start = False
else:
valid_end = False
valid_start = False
"""
"""
        # Truth-testing a DataFrame raises an error; check for None instead
        if df is None:
            df = self.stock.copy()
# keep track of whether the start and end dates are in the data
start_in = True
end_in = True
if self.round_dates:
start_in = False
end_in = False
else:
else:
if (not start_in):
else:
valid_start = False
valid_end = False
# No round dates, if either data not in, print message and return
if (start_date in list(df['Date'])):
valid_start = True
if (end_date in list(df['Date'])):
valid_end = True
print('Start Date not in data (either out of range or not a trading day.)')
print('End Date not in data (either out of range or not a trading day.)')
return trim_df
self.reset_plot()
if start_date is None:
start_date = self.min_date
if end_date is None:
end_date = self.max_date
stat_min = min(stock_plot[stat])
stat_max = max(stock_plot[stat])
stat_avg = np.mean(stock_plot[stat])
date_stat_min = date_stat_min[date_stat_min.index[0]].date()
date_stat_max = date_stat_max[date_stat_max.index[0]].date()
stat], self.max_date.date())
# Percentage y-axis
if plot_type == 'pct':
# Simple Plot
plt.style.use('fivethirtyeight');
else:
label = stat)
plt.legend(prop={'size':10})
# Stat y-axis
plt.style.use('fivethirtyeight');
plt.legend(prop={'size':10})
plt.show();
@staticmethod
def reset_plot():
matplotlib.rcParams.update(matplotlib.rcParamsDefault)
matplotlib.rcParams['figure.figsize'] = (8, 5)
matplotlib.rcParams['axes.labelsize'] = 10
matplotlib.rcParams['xtick.labelsize'] = 8
matplotlib.rcParams['ytick.labelsize'] = 8
matplotlib.rcParams['axes.titlesize'] = 14
matplotlib.rcParams['text.color'] = 'k'
dataframe = dataframe.set_index('ds')
dataframe = dataframe.resample('D')
dataframe = dataframe.reset_index(level=0)
dataframe = dataframe.interpolate()
return dataframe
dataframe = dataframe.reset_index(drop=True)
weekends = []
weekends.append(i)
return dataframe
# Calculate and plot profit from buying and holding shares for specified date range
self.reset_plot()
# Total profit
print('{} Total buy and hold profit from {} to {} for {} shares = ${:.2f}'.format
plt.style.use('dark_background')
plt.text(x = text_location,
s = '$%d' % total_hold_profit,
plt.grid(alpha=0.2)
plt.show();
def create_model(self):
model = fbprophet.Prophet(daily_seasonality=self.daily_seasonality,
weekly_seasonality=self.weekly_seasonality,
yearly_seasonality=self.yearly_seasonality,
changepoint_prior_scale=self.changepoint_prior_scale,
changepoints=self.changepoints)
if self.monthly_seasonality:
return model
pd.DateOffset(years=self.training_years)).date())]
self.changepoint_prior_scale = prior
model = self.create_model()
model.fit(train)
if i == 0:
predictions = future.copy()
future = model.predict(future)
future = model.predict(future)
if days > 0:
else:
fig, ax = plt.subplots(1, 1)
ax.fill_between(future['ds'].dt.to_pydatetime(), future['yhat_upper'],
# Plot formatting
plt.title(title);
plt.show()
if start_date is None:
if end_date is None:
end_date = self.max_date
# Training data starts self.training_years years before start date and goes up to start
end_date.date())]
test['in_range'] = False
for i in test.index:
        if (test.loc[i, 'y'] < test.loc[i, 'yhat_upper']) & (test.loc[i, 'y'] > test.loc[i, 'yhat_lower']):
if not nshares:
end_date.date()))
future.ix[len(future) - 1, 'yhat']))
${:.2f}.'.format(train_mean_error))
# Direction accuracy
print('When the model predicted an increase, the price increased {:.2f}% of the )
print('When the model predicted a decrease, the price decreased {:.2f}% of the )
print('The actual value was within the {:d}% confidence interval {:.2f}% of the
self.reset_plot()
fig, ax = plt.subplots(1, 1)
'Observations')
'Observations')
ax.fill_between(future['ds'].dt.to_pydatetime(), future['yhat_upper'],
plt.vlines(x=min(test['ds']).date(), ymin=min(future['yhat_lower']),
'Prediction Start')
# Plot formatting
start_date.date(), end_date.da
if start_date is None:
if end_date is None:
start_date = pd.to_datetime(start_date)
end_date = pd.to_datetime(end_date)
start_date.date())]
plt.legend(prop={'size':10})
plt.show();
self.reset_plot()
plt.grid(color='k', alpha=0.3)
plt.xticks(results['cps'], results['cps'])
plt.legend(prop={'size':10})
plt.show();
CHAPTER 6
TESTING
In this chapter, an overview of testing is provided to verify the correctness and functionality of the system. Software testing is the process of analysing a software item to detect the differences between existing and required conditions and to evaluate the features of the software item. It is an activity that should be carried out throughout the development process, and it is intended to detect defects by contrasting a computer program's expected results with its actual results for a given set of inputs.
Test Environment
Test Case
A set of test inputs, execution conditions, and expected results developed for a particular objective, such as exercising a particular program path or verifying compliance with a specific requirement. It includes the following.
• Features to be tested
• Items to be tested
• Purpose of testing
• Pass/Fail criteria
As mentioned above, traditional unit/integration testing does not work on machine learning models; the model is therefore tested on its accuracy and predictions.
For binary classification, accuracy can also be calculated in terms of positives and
negatives as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
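The formula can be written directly as code; the confusion-matrix counts below are hypothetical:

```python
# Accuracy from confusion-matrix counts: correct predictions (true positives
# plus true negatives) divided by all predictions.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts for a binary up/down classifier
print(accuracy(tp=45, tn=30, fp=15, fn=10))   # 0.75
```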
When it comes to forecasting, models are evaluated on the expected results they predict. For stock market forecasting, we divided the data into a training set and a testing set; the training set is then further split into a training dataset and a validation dataset. We train our model using the training dataset, and the validation dataset is used to test the trained model. A validation dataset is a sample of data held back from training that is used to estimate model skill while tuning the model's hyperparameters.
A test dataset is a dataset that is independent of the training dataset but follows the same probability distribution. If a model fit to the training dataset also fits the test dataset well, minimal overfitting has taken place.
As can be seen in the above graph, the dotted vertical line marks the point on the x-axis from which the prediction starts; the blue line depicts the predicted stock values and the black line the observed values. By comparing predicted and observed values we can tell how well the model works.
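The chronological split described above can be sketched as follows. The 70/15/15 proportions are an assumption for illustration; for time series the split must preserve order rather than shuffle:

```python
# Chronological train/validation/test split for time series data.
data = list(range(100))      # stand-in for 100 daily closing prices, in order

n = len(data)
train = data[: int(0.70 * n)]                      # fit the model here
validation = data[int(0.70 * n): int(0.85 * n)]    # tune hyperparameters here
test = data[int(0.85 * n):]                        # final, one-shot evaluation

print(len(train), len(validation), len(test))      # 70 15 15
```

Keeping the splits in time order ensures the model is always evaluated on data that lies strictly in the future relative to what it was trained on.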
Conclusion
Future Enhancements
In the future, the stock market prediction system can be further improved by using a much bigger dataset than the one currently utilized, which would help increase the accuracy of the prediction models. Furthermore, other machine learning models could be studied to compare the accuracy they achieve.
SNAPSHOTS
ANNEXURE A
GLOSSARY
Accuracy
Accuracy is a metric by which one can examine how good the machine learning model is.
Assets
Everything a company or person owns, including money, securities, equipment and real
estate. Assets include everything that is owed to the company or person.
Bar Chart
A bar chart is a type of graph used to display and compare numbers, frequencies, or other measures.
Classification
The identification of which of two or more categories an item falls under; a classic
machine learning task. Deciding whether an email message is spam or not classifies it
among two categories, and analysis of data about movies might lead to classification of
them among several genres.
Confidence Interval
Covariance
A measure of the relationship between two variables whose values are observed at the same time; specifically, the average value of the product of the two variables diminished by the product of their average values.
Capital Stock
All shares representing ownership of a company, including preferred and common shares.
Close Price
The price of the last board lot trade executed at the close of trading
Dependent variable
The value of a dependent variable "depends" on the value of the independent variable. If you're measuring the effect of different sizes of an advertising budget on total sales, then the advertising budget is the independent variable and total sales is the dependent variable.
Reinforcement Learning
A class of machine learning algorithms in which the process is not given specific goals to
meet but, as it makes decisions, is instead given indications of whether it’s doing well or
not.
Root Mean Squared Error
Also, RMSE. The square root of the Mean Squared Error. This is more popular than the Mean Squared Error because taking the square root of a figure built from the squares of the observation errors gives a number that is easier to understand in the units used to measure the original observations.
Index
A statistical measure of the state of the stock market, based on the performance of certain stocks. Examples include the S&P/TSX Composite Index and the S&P/TSX Venture Composite Index.
ANNEXURE B
ACRONYMS
BIBLIOGRAPHY
[1] I. Svalina, V. Galzina, R. Lujić, and G. Šimunović, "An adaptive network-based fuzzy inference system (ANFIS) for the forecasting: The case of close price indices," Expert Systems with Applications, vol. 40, no. 15, pp. 6055–6063, 2013.
[2] M. A. Boyacioglu and D. Avci, "An adaptive network-based fuzzy inference system (ANFIS) for the prediction of stock market return: the case of the Istanbul Stock Exchange," Expert Systems with Applications, vol. 37, no. 12, pp. 7908–7912, 2010.
[3] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, vol. 33, no. 1, pp. 3–56, 1993.
[4] T.-J. Hsieh, H.-F. Hsiao, and W.-C. Yeh, "Forecasting stock markets using wavelet transforms and recurrent neural networks: An integrated system based on artificial bee colony algorithm," Applied Soft Computing, vol. 11, no. 2, pp. 2510–2525, 2011.
[5] J. W. Hall, "Adaptive selection of US stocks with neural nets," in G. J. Deboeck (ed.), Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets. New York: Wiley, 1994, pp. 45–65.
[6] F. E. H. Tay and L. J. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, pp. 309–317, 2001.
[7] E. F. Fama, "The behavior of stock market prices," The Journal of Business, vol. 38, no. 1, pp. 34–105, January 1965.
[8] L. J. Cao and F. E. H. Tay, "Financial forecasting using support vector machines," Neural Computing & Applications, vol. 10, pp. 184–192, 2001.
[9] Zhen Hu, Jibe Zhu, and Ken Tse, "Stocks market prediction using support vector machine," International Conference on Information Management, Innovation Management and Industrial Engineering, 2013.
[12] K.-j. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, 2003.
[13] Debashish Das and Mohammad Shorif Uddin, "Data mining and neural network techniques in stock market prediction: a methodological review," International Journal of Artificial Intelligence & Applications, vol. 4, no. 1, January 2013.