The process of behavioral training

8/8. The process of 'training' will now be shown in its relation to ultrastability. All training involves some use of 'punishment' or 'reward', and we must translate these concepts into our form. 'Punishment' is simple, for it means that some sensory organs or nerve endings have been stimulated with an intensity high enough to cause step-function changes in the nervous system. The concept of 'reward' is more complex. It usually involves the supplying of some substance (e.g. food) or condition (e.g. escape) whose absence would act as 'punishment'. The chief difficulty is that the evidence suggests that the nervous system, especially the mammalian, contains intricate and specialised mechanisms which give the animals properties not to be deduced from basic principles alone. Thus it has been shown that dogs with an oesophageal fistula, deprived of water for some hours, would, when offered water, drink approximately the quantity that would correct the deprivation, and would then stop drinking; they would stop although no water had entered stomach or system. The properties of these mechanisms have not yet been fully elucidated, so training by reward uses mechanisms of unknown properties. Here we shall ignore these complications. We shall assume that the training is by pain, i.e. by some change which threatens to drive the essential variables outside their normal limits, and we shall assume that training by reward is not essentially dissimilar.
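The link between 'punishment' and step-function change can be illustrated with a toy ultrastable system. This is a minimal sketch under stated assumptions, not a model of any real nervous system: a single essential variable `x` obeys the hypothetical linear rule `x <- k * x + disturbance`, and the parameter `k` behaves as a step-function, holding its value until `x` leaves its normal limits and then jumping to the next value in a fixed list (loosely in the spirit of a uniselector). The function name, the rule, and all numbers are illustrative assumptions.

```python
from typing import List, Tuple


def ultrastable_run(values: List[float], x0: float = 1.0, disturbance: float = 1.0,
                    limit: float = 10.0, steps: int = 50) -> Tuple[float, float, int]:
    """Toy ultrastable system (all names and dynamics are hypothetical).

    x is the essential variable, updated by x <- k * x + disturbance.
    k is a step-function parameter: it keeps its value while |x| <= limit,
    and jumps to the next entry of `values` (cycling) whenever |x| > limit.
    Returns (final k, final x, number of step-function jumps).
    """
    index = 0
    k = values[index]
    x = x0
    jumps = 0
    for _ in range(steps):
        x = k * x + disturbance
        if abs(x) > limit:          # essential variable driven out of its limits
            index = (index + 1) % len(values)
            k = values[index]       # step-function change in the parameter
            jumps += 1
    return k, x, jumps


if __name__ == "__main__":
    # 2.0 and -3.0 make x run away; 0.05 gives a stable field, so it is retained.
    print(ultrastable_run([2.0, -3.0, 0.05]))
```

With these assumed values the first two settings drive `x` outside its limits and are each discarded after one violation; the third keeps `x` near its equilibrium of about 1.05, so no further step-function change occurs and the system stays in that configuration.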

In other training experiments, the regularity of action 2 (supplied above by the constant physical properties of glass) may be supplied by an assistant who constantly obeys the rules laid down by the experimenter. Grindley, for instance, kept a guinea-pig in a silent room in which a buzzer was sounded from time to time. If and only if its head turned to the right did a tray swing out and present it with a piece of carrot; after a few nibbles the carrot was withdrawn and the process repeated. Feedback is demonstrably present in this system, for the diagram of immediate effects is:


The buzzer, omitted for clarity, comes in as a parameter and serves merely to call this dynamic system into functional existence, for only when the buzzer sounds does the linkage 2 exist. The essential dynamic structure of this type of experiment shows more clearly when it is contrasted with elementary Pavlovian conditioning. Grindley's and Pavlov's experiments both use the sequence '… buzzer, animal's response, food'. In Grindley's experiment, the value of the variable 'food' depended on the animal's response: if the head turned to the left, 'food' was 'no carrot', while if the head turned to the right, 'food' was 'carrot given'. But in Pavlov's experiments the nature of every stimulus throughout the session was already determined before the session commenced. The Pavlovian experiment therefore allows no effect from the variable 'animal's behaviour' to 'quantity of food given'; there is no functional circuit and no feedback.
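The contrast between the two procedures can be sketched as a short trial-by-trial simulation. The sketch assumes a hypothetical 'animal' whose response ('left' or 'right') is a step-function variable: it keeps its current response until an assumed 'hunger' count, standing in for an essential variable, exceeds `hunger_limit`, and then switches. The experimenter's rule is passed in as `food_rule`; in the Grindley-style rule food depends on the response, while in the Pavlov-style rule food comes from a schedule fixed before the session. The names, the hunger rule, and the schedule are illustrative assumptions only.

```python
from typing import Callable, List, Tuple

Response = str  # "left" or "right"


def run_session(food_rule: Callable[[int, Response], bool],
                trials: int = 12, hunger_limit: int = 2) -> Tuple[List[Response], List[bool]]:
    """Simulate one session with a hypothetical step-function 'animal'."""
    response: Response = "left"
    hunger = 0
    responses: List[Response] = []
    foods: List[bool] = []
    for t in range(trials):               # each trial begins when the buzzer sounds
        fed = food_rule(t, response)      # the experimenter's rule decides 'food'
        responses.append(response)
        foods.append(fed)
        hunger = 0 if fed else hunger + 1
        if hunger > hunger_limit:         # essential variable out of bounds
            response = "right" if response == "left" else "left"  # step-function change
            hunger = 0
    return responses, foods


# Grindley-style rule: carrot if and only if the head turns to the right.
def grindley(t: int, r: Response) -> bool:
    return r == "right"


# Pavlov-style rule: food fixed in advance (here, every fourth trial), response ignored.
SCHEDULE = [t % 4 == 3 for t in range(12)]


def pavlov(t: int, r: Response) -> bool:
    return SCHEDULE[t]


if __name__ == "__main__":
    print(run_session(grindley))
    print(run_session(pavlov))
```

Under the Grindley-style rule the response settles on 'right' after one switch, because a change in behaviour changes the food. Under the Pavlov-style rule the record of food given is exactly the predetermined schedule, whatever the animal does, so its repeated switching never alters what it receives.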

It may be thought that the distinction (which corresponds to that made by Hilgard and Marquis between 'conditioning' and 'instrumental learning') is purely verbal. This is not so, for the description given above shows that the distinction may be made objectively by examining the structure of the experiment, that is, by asking whether the animal's behaviour has any effect on the stimuli it subsequently receives.
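One way to make the structural test mechanical is to write each experiment's diagram of immediate effects as a directed graph and ask whether it contains a cycle; a cycle is a functional circuit, and hence feedback. This is a sketch under assumptions: the node labels below are hypothetical stand-ins, since the original diagrams are not reproduced here, and the cycle check is an ordinary depth-first search rather than anything taken from the source.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # variable -> variables it immediately affects


def has_feedback(graph: Graph) -> bool:
    """Return True if the diagram of immediate effects contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}
    for targets in graph.values():
        for node in targets:
            colour.setdefault(node, WHITE)

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:    # back edge: a closed circuit of effects
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(colour))


# Hypothetical labels; the buzzer is treated as a parameter and left out.
GRINDLEY: Graph = {
    "nervous system": ["head position"],
    "head position": ["food"],          # the assistant's rule (linkage 2)
    "food": ["nervous system"],
}

# Everything the animal receives flows from a schedule fixed in advance.
PAVLOV: Graph = {
    "schedule": ["buzzer", "food"],
    "buzzer": ["nervous system"],
    "food": ["nervous system"],
    "nervous system": ["response"],
}

if __name__ == "__main__":
    print(has_feedback(GRINDLEY), has_feedback(PAVLOV))
```

The check only looks at the arrows, not at what the variables mean, which is the sense in which the distinction is structural rather than verbal: removing the arrow from 'head position' to 'food' would turn the Grindley diagram into one without feedback.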