

A Deep Learning Algorithm to Forecast Sales of Pharmaceutical Products

Oscar Chang, Research and Innovation, IT Empresarial, Yachay, Ecuador. oscarchang@it-empresarial.com
Ivan Naranjo, Centro de Despacho, FARMAENLACE, Yachay, Ecuador. ivannaranjo@farmaenlace.com
Christian Guerron, Sistemas, IT Empresarial, Yachay, Ecuador. christianguerron@it-empresarial.com
Dennys Criollo, Gerencia de Sistemas, IT Empresarial, Yachay, Ecuador. denniscriollo@farmaenlace.com
Sistemas, IT Empresarial, Yachay, Ecuador. jeffersonguerron@it-empresarial.com
Centro de Despacho, FARMAENLACE, Quito, Ecuador. gemosquera@usfq.edu.ec

Abstract— This work proposes a deep neural network (DNN) algorithm that accomplishes consistent sales forecasting for weekly data of pharmaceutical products. The resultant time series is used to train a DNN with backpropagation, step by step, where shallow nets face selected scenarios with different space-time data considerations. In each step, by using a sum of square differences and a peak-search procedure, a reasonable quality in the obtained abstract representations is pursued. First, an autoencoder is trained so as to develop in its hidden layer neural data abstractions about a random moving window. Thereafter, the autoencoder abstractions are used to train a second shallow net which operates in a restricted area and specializes in one-week-ahead predictions. Lastly, by using the abstractions of this second net plus recently captured information, a third shallow net is trained to produce its own one-week-ahead estimates, using new timing and data procedures. After training, the whole stacked system can produce stable weekly forecasting with hit rates between 55% and 91%, for assorted products and periods. The system has been tested in real time with real data.

Keywords— Deep Learning, Time Series Prediction, Sales Forecasting.

I. INTRODUCTION

In the deep learning world, state-of-the-art performance has gained a good reputation in fields like object recognition [1], speech recognition [2], natural language processing [3], physiological affect modelling [4] and many others. More recently, papers on time-series prediction or classification with deep neural networks have been reported [5] [6] [7] [8].

The search for depth

Both in biology and in circuit complexity theory it is maintained that deep architectures can be much more efficient (even exponentially more efficient) than shallow ones in terms of computational power and abstract representation of some functions [10] [11]. Unfortunately, well-established gradient descent methods such as backpropagation, which have proved effective when applied to shallow architectures, do not work well when applied to deep architectures.

In previous works [12] [13] [14] we have shown an innovative line of deep learning algorithms, with its own set of advantages and disadvantages, but eventually producing efficient neural computing processors. We have taken these ideas further, and in this paper we propose a DNN specialized in forecasting the sales of pharmaceutical products. The general problem is to find, for each outlet and for each product, an ideal balance that minimizes inventory costs and maximizes customer attention. For a distribution center with hundreds of outlets and thousands of products, this becomes a most entangled and important operation, where deep learning could contribute practical solutions.

Our methodology contemplates the training with backpropagation of shallow networks inside explicit scenarios, with specialized tasks, where predictive information about sales circulates freely and is used as immediate targets or rewards for local neural training. The final objective is to produce reliable abstract representations of the data behavior, at both short-term and long-term influences, codified in hidden layers, and then stack them together so as to produce forecasting information.

We also propose a primordial method to measure the quality of the abstract representations generated in the different hidden layers used, by monitoring the neural activity of hidden neurons while training is in progress. This procedure requires a quadratic sum of differences over a selected period.

II. DEEP NETWORK

A. Deep architecture

The proposed stacked network utilizes five layers of sigmoidal neurons organized as one input, three hidden and one output layer. To combine the higher-level features of different data behaviors, hidden layers are trained separately and then stacked, on top of which the output layer is added. The third hidden layer also incorporates as input recently captured information, such as the last eight weeks' average and some fresh peak and valley values (Fig. 1).

For operative purposes, the proposed stacked architecture is derived from three shallow networks called Autoencoder, Precursor and Gambler.

B. Data Handling

Given three years of daily sales grouped in weeks, the network unravels the problem of predicting sales one week ahead of the current input window (one product, one outlet). The dataset is taken from the database of a real pharmaceutical distributor in Ecuador. For training purposes, the available data is divided into three mobile zones (Fig. 2), where time moves to the right.

The first zone, to the left, is reserved to train the first two shallow nets, the "autoencoder-precursor", which work as a coordinated duet.

The next zone, of about 10 weeks, is reserved to train the net "gambler", which holds the final, solitary network output and provides the final prediction information. Finally, the "unknown future" zone is used to test the performance of the system and to make a real prediction, when the unknown future line reaches the end of the data. At any time, more data can be added and the system responds by creating new predictions.

C. Input Vector

The input vector is composed of a moving window of 16 consecutive weeks plus three other elements defined by the day/month/year at which the top right of the moving window stays at a given time instant (Fig. 2). All 19 entries are normalized to neural values inside the analog segment [0, 1]. When a target is needed, it is taken as the sale value of the week next to the right of the sample window (the near future). The shown data ranges from January 2014 to April 2017.
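As an illustration, the 19-entry input vector can be assembled as follows. The paper does not state the exact normalization constants, so the scaling used here (sales divided by a per-product maximum, dates divided by their ranges) is an assumption:

```python
def build_input_vector(sales_16_weeks, day, month, year,
                       max_sale, year_min=2014, year_max=2017):
    """Assemble the 19-entry input vector: a 16-week sales window
    plus the day/month/year of the window's top-right corner, all
    normalized into the analog segment [0, 1].
    The scaling constants here are illustrative assumptions."""
    if len(sales_16_weeks) != 16:
        raise ValueError("window must hold 16 consecutive weeks")
    window = [s / float(max_sale) for s in sales_16_weeks]
    date = [day / 31.0,
            month / 12.0,
            (year - year_min) / float(year_max - year_min)]
    return window + date
```

A window of raw weekly sales and its date stamp then maps to 19 values ready for the sigmoidal input layer.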

The quadratic variation V is measured over all the outputs of the hidden neurons in two consecutive randomly selected images, at times t and t-1. That is:

    V_t = SUM_{i=1..n} (o_{i,t} - o_{i,t-1})^2    (1)

Where:
V_t = hidden outputs variation between two consecutive inputs
n = number of hidden neurons
o_{i,t} = output of hidden neuron i at time t

Figure 2. Data handling and input vector. Weekly sales behavior of a typical pharmaceutical product, with an erratic pattern of consumption, and a moving window of 16 weeks of sales data plus window location date information.

For training purposes, the moving window travels in different space-time patterns for diverse training scenarios.

In a typical run, with small initial random weights in the hidden layer, V starts from a small value and then grows into a random oscillatory time series. We use this outcome and introduce a selective peak-search procedure where the last found peak value of V is stored until a bigger peak value is found. In pseudo code:

    cycles_count = 0;
    do
    {
        calculate net;
        backpropagation;
        calculate Vn;
        if (Vn > peak) { peak = Vn; cycles_count = 0; }
        cycles_count++;
    } while (cycles_count < 3000);

It turns out that after a while the period required to reach the next peak value (cycles_count) grows exponentially, probably as an overtraining signal. So, for our purposes, if in 3000 consecutive cycles no new, bigger peak originates, the net is assumed to be done and training stops. In our experiments, this peak-search scheme produces small output error and high hidden layer activity in the autoencoder, taking an average of 50k backpropagation cycles to complete.

III. FIRST SCENARIO: THE AUTOENCODER

Our autoencoder has 19 input, 11 hidden and 19 output neurons. To train it, the moving window is located at a random position inside the autoencoder zone and the same input vector is used as target. The job of the trained autoencoder is to reproduce in its output, as exactly as possible, the image of the moving window just loaded in its inputs, for any random position in the allowed area. Since there are fewer hidden neurons than input neurons, data compression and abstract representations must occur during training. Our stacked system works with abstractions that travel from layer to layer as the main source of information, so we take special care with the quality of these abstractions.
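The variation measure (1) and the peak-search stopping loop can be sketched in runnable form. Here `run_cycle` is a hypothetical callback standing in for one feed-forward plus backpropagation cycle of the net; it is not part of the paper:

```python
def hidden_activity_variation(out_t, out_prev):
    """Eq. (1): quadratic variation V between the hidden outputs
    produced by two consecutive randomly selected inputs."""
    return sum((a - b) ** 2 for a, b in zip(out_t, out_prev))

def train_with_peak_search(run_cycle, max_stall=3000):
    """Repeat training cycles until no new, bigger peak of V appears
    for max_stall consecutive cycles (taken as an overtraining signal).
    run_cycle() performs one backpropagation cycle and returns V."""
    peak = float("-inf")
    cycles_count = 0       # cycles since the last new peak
    total = 0              # total cycles executed
    while cycles_count < max_stall:
        v = run_cycle()
        total += 1
        if v > peak:
            peak = v
            cycles_count = 0
        else:
            cycles_count += 1
    return peak, total
```

With a V series that rises and then stalls, the loop stops exactly max_stall cycles after the last peak, mirroring the pseudocode above.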

The Precursor is a three-layer neural net whose inputs are the nonlinear abstractions generated by the trained Autoencoder. Its only output is trained to predict, inside the precursor zone, the sale value for the week next to the window position. The Precursor never sees the actual data patterns captured by the window but only the resulting abstraction generated by the Autoencoder, which is why good quality abstractions were required.

Figure 3. The Autoencoder and the Precursor. Once the Autoencoder is trained, its hidden layer becomes the input to the Precursor, which never sees the real input windows but only abstractions created by hidden1. Also, the learning cycles of the Precursor do not affect the weights of the Autoencoder.

We tried several metric methods to avoid overfitting-underfitting problems [15] and at the same time to guarantee quality abstractions from the involved hidden layers. We finally adopted the scheme based on the quadratic variation V of the hidden outputs, as given by (1), together with the peak-search procedure.

To train the Precursor, the moving window is located at a random position inside the allowed zone (Fig. 2) and the value of the week next to the moving window is used as target. After feeding forward all the participating layers, one backpropagation cycle is carried out, only for the Precursor layers. As before, we check the hidden layer activity using (1) and the peak-search algorithm mentioned above.

A. Trained Stacked Behavior

The behavior of the trained Autoencoder-Precursor duet is shown in Fig. 4. The quadratic error inside the allowed training zone shrinks to a minimum. The two curves look almost identical, so it is expected that the hidden layers of the Precursor convey valuable feature abstractions about predictions in the allowed training zone. Noticeably, the predictive capacity of the duet fades away when it moves into the future, toward never-seen data (Fig. 4).

Figure 4. The behavior of the stacked Autoencoder-Precursor duet. After training, the quadratic error inside the allowed training zone shrinks to a minimum. The two curves look almost identical, so the hidden layers of the Precursor should convey valuable feature abstractions about predicting. The predictive capacity fades away when the duet moves into the unknown future, with never-seen data.
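A minimal pure-Python sketch of one Precursor training scheme follows. The `encode()` callback is hypothetical and stands in for the frozen Autoencoder's hidden layer; the layer sizes, learning rate and epoch count are illustrative, not the paper's values:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    # Dense sigmoidal layer; each row holds [bias, w1, w2, ...].
    return [sigmoid(w[0] + sum(wi * x for wi, x in zip(w[1:], inputs)))
            for w in weights]

def train_precursor(encode, samples, hidden_n=7, lr=0.5,
                    epochs=5000, seed=0):
    """Train only the Precursor on frozen autoencoder abstractions.
    encode(window) -> abstraction; samples are (window, target) pairs
    with targets in [0, 1]. One backpropagation cycle per epoch."""
    rng = random.Random(seed)
    k = len(encode(samples[0][0]))
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(k + 1)]
           for _ in range(hidden_n)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden_n + 1)]
    for _ in range(epochs):
        window, t = rng.choice(samples)
        a = encode(window)                    # frozen abstraction
        h = layer(w_h, a)
        y = sigmoid(w_o[0] + sum(w * x for w, x in zip(w_o[1:], h)))
        d_o = (t - y) * y * (1.0 - y)         # output delta
        for j in range(hidden_n):             # hidden-layer updates
            d_h = d_o * w_o[j + 1] * h[j] * (1.0 - h[j])
            w_h[j][0] += lr * d_h
            for i in range(k):
                w_h[j][i + 1] += lr * d_h * a[i]
        w_o[0] += lr * d_o                    # output-layer updates
        for j in range(hidden_n):
            w_o[j + 1] += lr * d_o * h[j]
    return w_h, w_o

def precursor_predict(encode, w_h, w_o, window):
    h = layer(w_h, encode(window))
    return sigmoid(w_o[0] + sum(w * x for w, x in zip(w_o[1:], h)))
```

Because `encode` is never updated, the Autoencoder's weights stay intact while the Precursor learns to map abstractions to next-week sales, matching the freezing described in Fig. 3.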

The Gambler is a third neural network, responsible for predicting the sales value of the week immediately after the moving window, in a new, never-seen training zone. As shown in Fig. 1, this shallow network accepts as input the abstractions generated by the Precursor (7 inputs), a selected group of recent events defined by the average value of the last 8 weeks (1 input), and the peaks and valleys of the last 8 weeks (16 inputs). This totalizes 24 signals that carry both recent and old information.

A. The Gambler Training

To train the Gambler, the moving window moves step by step strictly into the future, beginning in the gambler training zone (Fig. 2, Fig. 4) and going forward in successive steps, up to the border of the unknown future, carrying out one backpropagation cycle in each step. Once the unknown future is touched, the window returns to the beginning of the gambler training zone. This left-to-right sweep is repeated while the hidden layer activity of the Gambler is monitored with the peak-search method previously described. Termination occurs in an average of 25k backpropagation cycles.

We work with a group of several different pharmaceutical products, and for illustrative purposes we selected four products A, B, C, D with different natures and behaviors. For training purposes, the data covers more than 170 weeks, from 2014 to 2017. In the graphics shown, the one-week-ahead predictions include 22 consecutive weeks from December 2016 to June 2017. Weeks 16 to 150 are used to train the Autoencoder-Precursor duet, and weeks 150 to 160 are used to train the Gambler. After each prediction, an error measure is compared against a threshold obtained as 10% of the average value of the given data. If the threshold is satisfied, the prediction is declared "true" and stored. If the threshold is not met, the duet Autoencoder-Gambler is retrained for up to four more opportunities; after this the prediction is declared "false" and stored. Once one prediction is finished, the whole system advances one week toward the unknown future, and so on, until the end of the data is reached.

The resultant time series (blue) and their respective one-week-ahead predictions (red) are shown in Fig. 5; time runs to the right. Black vertical lines represent fails, where the predictions did not satisfy the 10% threshold and re-training of the duet Autoencoder-Precursor was done. When new weekly information is added to the data, the process fires again and a new prediction is produced.
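The paper does not spell out how the 8 weeks' peaks and valleys become 16 inputs; the sketch below assumes one peak flag and one valley flag per week, which is only one plausible reading of that encoding:

```python
def gambler_inputs(precursor_abstraction, last8_weeks):
    """Build the Gambler's 24 input signals: 7 Precursor abstractions,
    the 8-week average (1 input) and per-week peak/valley flags
    (16 inputs). The flag encoding is an assumed interpretation."""
    if len(precursor_abstraction) != 7 or len(last8_weeks) != 8:
        raise ValueError("expected 7 abstractions and 8 weeks")
    avg = sum(last8_weeks) / 8.0
    peaks, valleys = [], []
    for i, v in enumerate(last8_weeks):
        prev = last8_weeks[i - 1] if i > 0 else v
        nxt = last8_weeks[i + 1] if i < 7 else v
        peaks.append(1.0 if v >= prev and v >= nxt else 0.0)
        valleys.append(1.0 if v <= prev and v <= nxt else 0.0)
    return list(precursor_abstraction) + [avg] + peaks + valleys
```

The 24-element result is what the text calls the mix of "recent and old information" feeding the Gambler.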

Figure 5. The Results. From left to right: high, medium, low-low and low rotation product sales (blue) and their respective one-week-ahead predictions (red), from December 2016 to June 2017. Black vertical lines represent failures, where the prediction did not satisfy the required 10% threshold and retraining of the whole network was required. For three out of four randomly selected products, the hit rate is noticeably high.

Product A. High rotation, average sale 72 units per week. There are 3 fails in 22 trials. The hit rate for this run is 86.4%. Some peaks and valleys are correctly anticipated. Other runs may go down to 66.3%.

Product B. Medium rotation, average sale 9.6 units per week. There are 4 fails in 22 trials. The hit rate is 81.8%. Some peaks and valleys are correctly anticipated.

Product C. Low-low rotation, average sale 0.6 units per week. There are 10 fails in 22 trials. The hit rate is 54.5%.

Product D. Low rotation, average sale 110 units per week. There are 4 fails in 22 trials. The hit rate is 81.8%.
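The hit-rate accounting above can be reproduced with a small helper. The threshold definition (10% of the series' average) follows the text; the function name is ours:

```python
def hit_rate(actual, predicted, frac=0.10):
    """Count a prediction as a hit when its absolute error is within
    frac * (average of the actual series): the paper's 10% threshold."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be equal-length and non-empty")
    threshold = frac * (sum(actual) / float(len(actual)))
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) <= threshold)
    return hits / float(len(actual))
```

For Product A, 19 hits in 22 trials give 19/22, about 86.4%, matching the reported figure.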

VII. DISCUSSION

For some products, the system produces good hit-rate forecasting, where some peaks and valleys are well predicted. For other kinds of products, the hit rate barely stays above 54%. Further parameters and training strategies should yet be developed for these cases.

According to our results, the proposed peak-search scheme produces good enough abstractions that convey important information, raising the hit rates well above 50%.

VIII. CONCLUSIONS

We presented a sales forecasting deep learning model that makes sales predictions by stacking abstract representations, whose quality is monitored by using a sum of square differences and a peak-search scheme.

Abstractions are produced by three different shallow networks, Autoencoder, Precursor and Gambler, trained inside explicit scenarios, with focused tasks, timing and reward procedures. Our training algorithm accomplishes a temporal analysis that uses both short-term and long-term features and, in experiments with real-world data, delivers good results for products with different consumption behaviors. Due to the many possibilities in training strategies, linking with other training techniques such as reinforcement learning and genetic algorithms is foreseen.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems, pp. 1-9, 2012. (object recognition)
[2] O. Abdel-Hamid and A. Mohamed. "Applying convolutional neural network concepts to hybrid NN-HMM model for speech recognition." Acoustics, Speech, and Signal Processing, 2012. (speech recognition)
[3] R. Collobert and J. Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th International Conference on Machine Learning, 2008. (natural language processing)
[4] H. P. Martinez. "Learning deep physiological models of affect." Computational Intelligence Magazine, April 2013, pp. 20-33. (physiological affect modelling)
[5] A. F. Atiya, N. El Gayar, and H. El-Shishiny. "An empirical comparison of machine learning models for time series forecasting." Econometric Reviews, 29(5):594-621, 2010.
[6] S. Prasad and P. Prasad. "Deep Recurrent Neural Networks for Time Series Prediction."
[7] M. Dalto. "Deep neural networks for time series prediction with applications in ultra-short-term wind forecasting." 2015 IEEE International Conference on Industrial Technology (ICIT). University of Zagreb, Faculty of Electrical Engineering and Computing. E-mail: mladen.dalto@fer.hr.
[8] X. Ding, Y. Zhang, T. Liu, and J. Duan. "Deep Learning for Event-Driven Stock Prediction."
[9] C. Deep Prakash et al. "Data Analytics based Deep Mayo Predictor for IPL-9." International Journal of Computer Applications (0975-8887), Volume 152, No. 6, October 2016.
[10] Y. Bengio, A. Courville, and P. Vincent. "Representation learning: A review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[11] Y. Bengio. "Learning Deep Architectures for AI." Foundations and Trends in Machine Learning, 2(1):1-127, 2009.
[12] O. Chang, P. Constante, A. Gordon, and M. Singaña. "A Novel Deep Neural Network that Uses Space-Time Features for Tracking and Recognizing a Moving Object." Journal of Artificial Intelligence and Soft Computing Research, Poland, 2017. (in press)
[13] O. Chang, P. Constante, A. Gordon, M. Singaña, and F. Acuna. "A deep architecture for visually analyzing Pap cells." IEEE 2nd Colombian Conference on Automatic Control (CCAC), Oct. 2015. DOI: 10.1109/CCAC.2015.7345210.
[14] O. Chang. "A Bio-Inspired Robot with Visual Perception of Affordances." Computer Vision - ECCV 2014 Workshops, vol. 8926, Lecture Notes in Computer Science, pp. 420-426, Springer International Publishing, 2015. http://link.springer.com/chapter/10.1007
[15] MathWorks. "Improve Neural Network Generalization and Avoid Overfitting." https://www.mathworks.com/help/nnet/ug/improve-neural-network-generalization-and-avoid-overfitting.html. 2017.
