
Introduction

“No man is an island, entire of itself; every man is a piece of the continent, a part of the main” – John Donne. In many ways, this quote emblemizes the true nature of human capacity,

specifically in terms of the architecture of our intelligence. Though each individual is endowed

with his or her own intellectual ability, the true power of human intelligence comes from its

status as a social intelligence. Social intelligences are marked by their reliance on collaboration,

by their vested interest in the coherent contributions of many. Humanity has made remarkable

progress as a function of this social intelligence, and thus one is compelled to test the effects of a “social” system in computational settings. Generally, this study focuses on the computational intelligence implications of collaboration among reinforcement learning networks.

The successes of reinforcement learning networks in recent history have been profound. The achievements of DQN agents in a variety of Atari games (Mnih et al.) have shown the flexibility and utility of single-agent value-based algorithms. Furthermore, the profound achievements of DeepMind’s AlphaGo Zero showed the world the capacity and potential of adversarial self-play learning; specifically, its strategic innovations in the game of Go resembled many of the features of human intelligence. As the natural next step of exploration, one is motivated to seek a path that lies both outside the single-agent framework and apart from the adversarial setting, ergo to pursue a model of collaboration.

Given the breadth of this field, this paper focuses on the particular phenomenon of learning from imperfect or incomplete knowledge sources.

To understand the problem, imagine this scenario. A young boy has an older brother who

takes to a sport early in the younger boy’s life. When the younger brother comes of age, he also

wishes to play this sport, and, as such, his older brother teaches him what he knows. Given that

the sport is complicated enough, it is fair to assume that the older brother does not have a

complete or comprehensive understanding of the sport, specifically when it comes to

determining the relatively optimal action in every possible scenario of the game. After some

finite period of teaching, the boy knows everything that the older brother knows, and then it is

possible for them to learn in concert.

This hypothetical enlivens a variety of interesting questions concerning the particular

dynamics of learning that occur. For instance, is it fair to assume that learning from a teacher is

more efficient than exploring on one’s own? If so, by what metric? If not, why? Does learning

from incomplete knowledge sources serve as a hindrance or an assistant in most cases? Is this

proportional to the level of knowledge that the “amateur” teacher has? The answer to each of

these questions may provide insight into new ways of maximizing the computational ability of

reinforcement learning systems and motivate the topic of this investigation.

Such an investigation into this “Big Brother” scenario has yet to be produced in the

community; however, different regimes of exploration have provided insights into questions

which may assist this investigation indirectly. For instance, significant literature exists

concerning the computational effectiveness of parallelizing systems, specifically in terms of the

most effective ways of utilizing one’s hardware. Such advances provide a framework for how


to best implement these sorts of student-teacher networks. Along these lines, some progress has been made on the dynamics of teacher-student networks. For instance, Rusu et al. showed that policy distillation from a teacher network could yield increased robustness and efficiency in student networks. Such investigations highlight interesting dynamics in socially collaborative settings but do not attend to the specific quandary of this investigation.

Methods

To address this problem, this study uses the “Cartpole Problem” as the basis of training.

The goal of the task is for the neural network in question to balance a pole on a cart. The neural

net receives only the state information of the system, such as the velocity and position of the cart

and the dynamics of the pole, and must choose to either apply a fixed force to the right or left at

every time step. This particular problem defines a relatively simple task; the degrees of freedom of the system are relatively few, and the DQN framework has empirically been shown to converge on this problem in a short amount of time. Thus, it is the optimal sort of task for this Big Brother scenario, as the learning dynamics of the interactions can be truly elucidated.
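The dynamics the networks must control can be sketched as follows. This is a minimal sketch of the classic cart-pole formulation of Barto, Sutton, and Anderson; the constants, Euler integration, and termination thresholds below are assumptions, since the paper does not specify its environment implementation.

```python
import math

# Physical constants of the classic cart-pole task (assumed values).
GRAVITY, CART_MASS, POLE_MASS, POLE_HALF_LEN, FORCE, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def step(state, action):
    """Advance one time step. state = (x, x_dot, theta, theta_dot);
    action is 0 (push left) or 1 (push right)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total_mass = CART_MASS + POLE_MASS
    temp = (force + POLE_MASS * POLE_HALF_LEN * theta_dot ** 2 * math.sin(theta)) / total_mass
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * math.cos(theta) ** 2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LEN * theta_acc * math.cos(theta) / total_mass
    # Forward-Euler integration of the four state variables.
    new_state = (x + TAU * x_dot, x_dot + TAU * x_acc,
                 theta + TAU * theta_dot, theta_dot + TAU * theta_acc)
    # Episode ends when the cart leaves the track or the pole tips too far.
    done = abs(new_state[0]) > 2.4 or abs(new_state[2]) > 12 * math.pi / 180
    return new_state, done
```

The four-dimensional state tuple is exactly the information the network receives at each step, and the binary push-left/push-right choice is its entire action space.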

As mentioned briefly before, this study uses the neural network architecture of a Deep Q-Network (DQN), popularized by Mnih et al. in their famous work on neural networks mastering Atari games with an architecture of this type. DQNs use classical neural networks to approximate Q-values, which functionally equate to value functions of states. Although this is not a study into the particulars of the DQN framework, it is important to recognize why they are useful in solving this type of problem. DQNs are a particular instance of a more general class of architectures, reinforcement learning systems, which, in practice, means they learn “by themselves” by reinforcing certain behaviors and dissuading others. Considering the pattern of reinforcement that defines early learning in children, one could say that this framework best emulates how human beings learn, even if the methods are not perfectly analogous to any physiological process in the human body.
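The value the network regresses toward can be written down compactly. The sketch below shows the standard Bellman target used in Q-learning; the discount factor and the representation of the successor state's Q-values as a plain list are illustrative assumptions.

```python
GAMMA = 0.99  # discount factor (assumed value, not stated in the paper)

def td_target(reward, q_next, done):
    """Bellman target for a transition: r + gamma * max_a' Q(s', a'),
    truncated to the bare reward when the episode has ended."""
    if done:
        return reward
    return reward + GAMMA * max(q_next)

td_target(1.0, [0.2, 0.7], False)  # 1.0 + 0.99 * 0.7
```

The network's weights are updated so that its Q-value for the taken action moves toward this target, which is how "reinforcing certain behaviors and dissuading others" is realized numerically.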

To model the “Big Brother” hypothetical discussed above, the system proceeds as follows. First, a neural network designated as the “Big Brother” is trained on the Cartpole problem for some finite number of training sessions. A training session is defined to be 100 separate runs of the task during which the network is able to update in response to the final results. Afterward, another neural network, the “Little Brother,” is trained on this same task for the same number of training sessions as the Big Brother; however, his options for training are different. At each time step, the Little Brother has the option of either choosing his action based on his own Q-values or “asking” what his Big Brother would do and doing that. This “asking” ability comes in the form of supplying the Big Brother network with the state information and returning the action given by its neural network. The Little Brother makes this decision probabilistically, in this study favoring asking the Big Brother with 90% probability. After this period, the Little Brother ceases to be able to sample from the Big Brother network, and both networks train independently. During this training, the performance of each network is tracked, and each proceeds to train until it either achieves mastery of the task or attains the maximum allowable number of training sessions. The maximum number of allowable training sessions is 80 plus the number of initial training sessions. Mastery in this task is equivalent to achieving an average of over 999 steps in 100 testing sessions.
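The asking rule can be sketched as follows. This is a minimal illustration: the two trained networks are reduced to hypothetical callables that map a state to a list of Q-values, and the function names are assumptions of this sketch.

```python
import random

# With probability ASK_PROB the Little Brother defers to the Big Brother
# network; otherwise he acts greedily on his own Q-values.
ASK_PROB = 0.9  # the study favors asking the Big Brother with 90% probability

def little_brother_action(state, little_q, big_q, rng=random):
    q = big_q(state) if rng.random() < ASK_PROB else little_q(state)
    return max(range(len(q)), key=q.__getitem__)  # greedy argmax over actions
```

Note that the Little Brother receives only the chosen action, not the Big Brother's Q-values themselves, which matches the "asking" mechanism described above.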

Hopefully it is evident how the setup described above models the phenomenon closely but imperfectly. Assigning the dynamics of the Big Brother sampling function to a fixed probability is likely an oversimplification, but we hope that it captures the interplay between tutelage and autonomy. Likewise, many of the parameters, such as the value of this probability, had to be chosen by intuition rather than analysis, and this could limit the generality of this study. However, we are encouraged to proceed despite these limitations, as it allows for at least some exploration into this regime of thought.

One parameter does have a canonical choice of value: the epsilon parameter of the DQN is set to its maximum. Colloquially, this means that the neural networks have a maximum tendency toward exploration. This choice was made for two reasons, one philosophical, the other pragmatic. Philosophically, exploration is maximized because that value best models the exploratory nature of children when they first begin to learn a new task. This exploration is the computational equivalent of having no “fear of failure” and appears to be a self-evident decision when modelling early learning. The second reason is pragmatic: higher degrees of exploration have empirically been shown to reduce convergence time. Thus, by maximizing the exploration, one can produce more results in less time.
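The role epsilon plays can be sketched with the standard epsilon-greedy rule. Starting epsilon at 1.0 corresponds to the "maximum tendency towards exploration" above; the decay schedule shown is an assumption, since the paper states only that exploration is maximized.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon take a uniformly random action (explore);
    otherwise take the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

epsilon, EPS_MIN, EPS_DECAY = 1.0, 0.01, 0.995  # start fully exploratory
epsilon = max(EPS_MIN, epsilon * EPS_DECAY)     # anneal toward exploitation
```

At epsilon = 1.0 every action is random, which is the "no fear of failure" behavior; as epsilon anneals, the agent increasingly trusts its learned Q-values.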

In analyzing the results of this experiment, it is important to clarify one’s motivations in the endeavor. In the opinion of this author, the most important aspect of the analysis is to clarify how the learning dynamics of the Big Brother/Little Brother system affect the computational efficiency and robustness of convergence. To capture this, the analysis of the effects is derived from graphs comparing the “relative” training trials and the average number of steps over the course of 100 trials. The term “relative” implies a certain conditionality to this measure; in order to fairly compare the computational efficiency of the teaching from the Big Brother, the relative training trials for the Little Brother account for the number of training sessions that the Big Brother went through. Essentially, this corresponds to shifting the graph of the Little Brother to the right by the number of initial training trials for the Big Brother. The effect of this is to allow the graph comparing the learning curves of both entities to adequately describe the utility of the Little Brother learning from his older brother.

Results

The charts for initial training sessions of 5 through 35 are shown below. In analyzing them, three distinct regimes of behavior are evident for the Little Brother: comparable/robust, lagging/robust, and simply lagging.

[Figures: learning trajectories of the Big and Little Brother for each initial training length; not reproduced here.]

Comparable/Robust

The trials with 5-15 initial training sessions display what is described as comparable and robust behavior. To earn this label, the average trajectory of the Little Brother must act on timescales similar to the Big Brother’s and be more likely to converge by the termination point. The timescale similarity is evident in the similar slopes early in the learning process, while the convergence is apparent in the superior average steps achieved by the termination point.

Lagging/Robust

The trials with 20-25 initial training sessions exhibit what is defined as lagging yet robust behavior in the Little Brother. This behavior is marked by a significant delay before the final behavioral pattern is reached (the point after which learning ceases to augment performance), paired with superior performance once that pattern is achieved.

Simply Lagging

Trials with initial training sessions above 25 exhibit simply lagging behavior in the Little Brother, in which the Big and Little Brother show very similar learning trajectories, but the Little Brother, in terms of relative training, takes far more invested time.

In order to appreciate these results, one must consider the effect of the initial training trials. We note distinct patterns of failure for the Big Brother, specifically in the comparable/robust and lagging/robust regimes, which are likely functions of limited experience in the state space. Essentially, the Big Brother has not yet seen much of the problem and thus fails often in his initial confrontations with it. This behavior ceases to occur after 30 initial training trials, and thus we conclude that there is some average limit to the breadth of initial state-space exploration needed to converge in the allotted time. If the entire trajectory were plotted, including data from the initial training, the Big Brother would learn in an alternating series of plateaus and positive slopes.

With this in mind, one can fully consider the utility of training one neural network with the ability to sample from an incomplete teacher. First, we consider the best-performing regime, comparable and robust. In this regime, we found significantly faster convergence on similar timescales in terms of relative training sessions. Essentially, the Little Brother does not experience the same plateau in learning as the Big Brother, and we assert this is a product of the teaching. Because the Big Brother has seen a significant portion of the initial state space (at least in comparison to no experience), the ability to sample from him allows the Little Brother to avoid the initial pitfalls of blind exploration. This is analogous to real life: in the hypothetical posed, the Big Brother has been through the experience of learning from scratch and lets the Little Brother learn from his mistakes. An interesting question is whether this is an artifact of the size of the initial state space; this will be addressed in the discussion section.

Though the utility of the lagging/robust and simply lagging regimes is less than that of the comparable/robust regime, it is interesting to probe them in order to understand the effect of increased duration of initial trials. If one studies the graphs across all initial training trials, a pattern of displacement in the learning trajectory becomes evident. Essentially, as the initial training increases, the steep positive slope of the Little Brother’s learning trajectory is shifted to the right. In the lagging/robust case, the negative impact of this lag is rectified by an increased rate of convergence. In the simply lagging case, there is enough exploration time in the initial training sessions for the Big Brother to converge with success rates similar to the Little Brother’s, and to do so more efficiently. Both of these regimes affirm two conclusions. First, for the Little Brother, the increased initial training does not directly impact the rate of growth, i.e. it does not augment the slope; however, it does shift the onset of the steep positive slope to the right. This is evident from the extremely similar learning trajectories in the lagging/robust and simply lagging cases, distinguishable only by when the rapid increase in performance occurs. Second, the initial training does have a palpable impact on the stability of convergence for the Little Brother, and the utility of this impact is greatest in the regime of relatively low initial training sessions.

Discussion

In reflecting on the achievements of this study, it is important to revisit the questions that

were raised in the beginning.

Is it fair to assume that learning from a teacher is more efficient than exploring on one’s

own? If so, by what metric? If not, why?

We define efficiency as greater rates of convergence in less time. Thus, according to the data in this study, it is more efficient to learn from a teacher than to explore on one’s own when that learning is focused on “early” behavior. This is evident in the convergence of the learning trajectory for the Little Brother at earlier periods than the Big Brother in the comparable/robust and lagging/robust regimes. However, as the initial training sessions increase, one achieves similar convergence patterns in the Little Brother with less efficient time expenses.

Does learning from incomplete knowledge sources serve as a hindrance or an assistant in

most cases? Is this proportional to the level of knowledge that the “amateur” teacher has?

The presence of a teacher never seems to serve as a hindrance; however, the utility of the teacher declines as the number of initial training sessions increases. We hypothesize that this results from the teacher’s high utility in helping to avoid “amateur” mistakes; once these early mistakes are evaded, the Little Brother follows an otherwise standard learning trajectory.

Limitations

The greatest limitation of this study was the availability of data. Training the dual neural networks took significant amounts of time, and as such, we were able to capture relatively few trials from which to draw conclusions. In order to augment the validity and generality of this study, we will replicate the experiments and increase the total number of trials.

Further Extensions

In many ways, this experiment has generated more questions than answers. For instance,

how did the specifics of the sampling probabilities affect the behavior of this system? To address

this, one avenue of study would be to test different “personalities” of Little Brother, essentially

meaning that one changes the probability that the Little Brother will sample from the Big

Brother; for example, if the Little Brother is “obedient” he will sample often, but if he is

“rebellious,” he may sample relatively sporadically. In doing so, a fuller picture of the efficiency

metric could develop and would better elucidate the effects of the incomplete teacher. Another

question is whether there is some analytical means of determining the optimal initial training

length that maximizes robustness and speed in relation to the size of the state space. Such

endeavors would be far more mathematical in nature but would be an interesting investigation into

the relationship between task complexity and performance. A final potential direction could be

the use of a different sort of sampling method, such as the policy distillation methods discussed

at the beginning of this study. Such methods may increase the efficiency of the initial training for

the Little Brother, and in doing so augment the utility of this method even further.

Conclusion

Investigations such as this work to challenge the methodologies of human beings in an analytical way, with hopes that such challenges will bear the fruit of more effective systems. In addition, there is the corollary that Mother Nature has encouraged the development of efficient systems for survival, and thus it makes intuitive sense that, as we attempt to develop our own learning systems, we take inspiration from her. The utility of such action is evident here: the learning dynamics of this coupled system led to greater efficiency and robustness than a singular learning entity. In general, we find this utility present and as such are encouraged to continually take lessons from the empirical dynamics of the human experience and augment our systems accordingly.
