Positive Reinforcement (+R): When something is added to the environment after a behavior that will increase the occurrence of the behavior (examples: treats, praise, play, and getting to do something the animal wants to do, like sniffing something interesting or running; cues and clicks are also reinforcers when used correctly).
Negative Reinforcement (-R): When an aversive is applied and then eliminated to increase the occurrence of a behavior (example: a leash correction is applied and then ceased when the desired behavior occurs).
Positive Punishment (+P): When an aversive is introduced during or immediately after a behavior to reduce the occurrence of the behavior (examples: shocks, leash corrections, hitting, yelling).
Negative Punishment (-P): When something rewarding is taken away to reduce the occurrence of a behavior (example: walking away when a dog jumps on you).
For more information and examples, visit: http://www.clickersolutions.com/articles/2001/ocguide.htm
The two days I spent in Columbia, Maryland listening to Kathy Sdao talk about cues and speaking with other positive reinforcement trainers gave me enough to think and write about for at least six months.
One of the things I keep reflecting on was a fascinating conversation I had with one seminar attendee after the last day. As we were gathering our things to leave, Kathy’s co-instructor, Carolyn Barney, asked me if I would give this woman my herding trainer’s information and tell her a little bit about what we do.
During the course of the conversation, I found out that this woman was doing some herding with her Border Collies, but she didn’t seem thrilled with her trainer. She described the training as “nothing bad.” When I told her that we never verbally correct our dogs in training (much less physically), the woman responded that they didn’t either, “Except, you know, she’ll yell at the dog when it’s gripping the sheep or something.”
I really wish I weren’t so afraid to upset people, because that statement bothered me, and I later kicked myself for not speaking up. First of all, this woman didn’t know what tending was. I’m pretty green, but I know the basics, which told me she was greener, which means her dog is too. There is no reason I can see why a dog that early in its training would be put in a situation where it would be gripping sheep. It seemed like setting the dog up to fail and then punishing it. Not to mention the potential for the correction itself to cause redirected aggression toward the stock, or fear or anxiety in the dog when it was around sheep.
Interestingly, this woman wasn’t concerned about the yelling or the gripping, which were red flags to me. Instead, she was concerned about her trainer’s use of “pressure” to get the dog to change direction.
“Pressure,” as I understand it and as I have seen it utilized with driving and fetching dogs, is where the handler uses his/her body (and specifically body position and angle) to turn the dog without ever making contact with the dog. Basically, it is invading the dog’s flight zone or interfering with his path of travel to get him to turn. When he turns, the handler backs away. Patricia McConnell calls this maneuver a “body block”. Carolyn calls it “dancing with the dog”.
This particular woman seemed very concerned about pressure. Now, she had to hurry off, so I never got a chance to ask her what exactly she meant by her phrase, though I did get to briefly explain how I’d seen it used. Judging by her long, “Ohhhh” and the relief in her eyes, I think her trainer might have been using “pressure” differently than I’ve seen it used, but the question and the conversation led me to finally address this ugly beast.
When a trainer says they use “positive reinforcement,” does that mean that’s all they use all the time?
I believe the answer is an overwhelming, resounding – no.
I can only speak for myself, but I think it’s impossible to strip away a dog’s experiences to only positive reinforcement. I think it’s a good goal, to get as close to that as you can, but I also think it’s important to keep in mind that you probably will never get there (I’ll probably never win the Powerball either, but I keep playing).
The fact of the matter is that negative reinforcement, positive punishment and negative punishment happen. Those of us who are mostly +R junkies wish they didn’t, and sometimes we beat ourselves up when they do, but they happen. All we can do is try to mitigate them.
For example, beating your child until he is black and blue because he got a bad grade is positive punishment, but so is yelling at him or just giving him a disappointed frown. Some of those things are going to have more of a long-term effect on him than others. Likewise, shocking a dog when he gets out of a “heel” position is positive punishment, but so is just standing still while he pulls on the end of a leash attached to a harness (not to be confused with the application of collar cues, where the light pressure is an antecedent to the desired behavior). Again, some methods have more serious long-term effects.
When positive reinforcement trainers say that they are positive reinforcement trainers, I don’t think they mean that that’s all that they use. If that is what they mean, they are most likely misinformed. If that’s truly what they mean and they’re not misinformed, I’d like to train with them. I think what most trainers really mean is that they try to use as much positive reinforcement as is humanly possible and that they will choose a +R solution over a +P solution most of the time. I also think the label “positive reinforcement trainer” was established so that the trainers who use mostly +R methods could distinguish themselves from “balanced trainers” and other trainers who take a more even-keeled approach to the operant conditioning quadrants (e.g., they may use positive punishment 25% of the time and positive reinforcement 25% of the time, etc., whereas a “positive reinforcement trainer” may try to use positive reinforcement more like 80 or 90% of the time). Besides, going through this whole long spiel that I’m going through right now doesn’t exactly fit well on a business card.
The other differentiating factor I find in “positive reinforcement trainers” versus various other trainers is the severity of the use of methods that fall into the other quadrants. Positive reinforcement trainers try to use the least severe positive punishment methods available if they are forced to use them at all. For example, in the pressure situation I described above, the scenario actually plays out something like this at Raspberry Ridge:
Slight pressure is added to the dog’s flight zone (the handler invades the dog’s flight zone from several feet away without ever making physical contact with the dog). That’s +P where the positive punishment is about a level 1 on the punishment scale I just made up in my head. When the dog moves, the handler backs away, relieving the pressure. That’s –R where the negative reinforcement is rewarding the dog’s correct movement by backing out of its personal bubble. Then, when the dog is performing the right behavior, you click or verbally mark, praise and the dog gets to continue working the stock. That’s +R where the positive reinforcement is at about a level 8 on the positive reinforcement scale I just made up in my head (where any good herding dog considers just getting to chase the sheep willy-nilly around the fields all day to be at least a level 10). In this one fluid scenario you are applying three quadrants.
The difference here is that “pressure” is being applied by a positive reinforcement trainer. The positive punishment is relatively mild and avoided if at all possible (by using management tools), and the positive reinforcement is a very high value reward.
Here’s a scenario where you could teach the same behavior, but I would not consider you a +R trainer. (Note: I do not train and have never trained a dog to herd this way; this scenario comes only from texts I have read. Please do not rely on it to train your dog. If you want to herd this way, consult a professional trainer, but better yet, try a positive reinforcement one.) The dog is headed toward the stock in a direction you don’t want him to go. Lucky you have that electronic shock collar. As the dog moves to grip the sheep, you shock the dog. That’s +P where the positive punishment is about a level 8 on my punishment scale. When the dog moves, the shock stops. That’s –R. When the dog is performing the correct behavior, you do or say nothing. That’s actually –R as well, where the aversive has been removed and by performing the correct behavior the dog gets to avoid punishment. Lucky him. Positive reinforcement is not utilized at all. The behavior can be taught both ways. Which one do you think is more pleasant?
Let’s take another quick example that most people are more familiar with. Scenario one – positive reinforcement trainer teaching a dog to lie down.
Dog looks at the ground, click and treat with a good treat. +R where the positive reinforcement is about a level 5. Dog looks further at the ground, click and treat with a good treat. +R. Dog looks at the sky. No click and treat. –P where the dog doesn’t get to win the treat that round (arguably this isn’t even –P but for the sake of this post I’m going to consider it as such). Dog tries again and puts his paws on the ground. Click and jackpot treat. +R where the positive reinforcement is about a level 6. Dog lies down. Click and treat and praise profusely. +R where the positive reinforcement is about a level 7.
Scenario two – punitive trainer teaching a dog to lie down.
Dog is standing there, trainer takes his/her hands and forces the dog to lie down by pushing the dog to the ground. +P where the positive punishment is about a level 4. Dog gets up. Trainer forces dog to the ground again and sternly says, “Lie down”. +P where the positive punishment is about a level 5. Dog gets up. Trainer says, “Lie down”. Dog stands there. Trainer pushes the dog to the ground and again sternly says, “Lie down”. +P where the positive punishment is about a level 5. Rinse and repeat. Eventually, when the dog lies down on command, the trainer may use verbal praise or a pet. This is actually a combination of –R where the dog is attempting to avoid a punishment and +R where the positive reinforcement is only about a level 2.
See the difference?
So when people say that clicker trainers and positive reinforcement trainers only use positive reinforcement and we only stick to “our” methods, that’s not exactly true. The difference between positive reinforcement trainers and other trainers is that we will sit down and think creatively and try to find ways to use +R as much as possible. If my dog runs out into the street and isn’t responding to my recall, I’m not going to sit back and think, “Hm…how can I make this rewarding?” I’m going to do what every sane dog owner in the world would do: I’m going to stop wasting my breath calling the dog and go get the dog. Later, I might beat myself up about yanking her collar or even grabbing her tail if I have to, but as long as she’s breathing at the end of the day, I’m going to be okay, and so is she. Afterward, however, I’ll think long and hard about how to make sure that never happens again (like using management tools in the future or cracking down on recall and boundary training in a controlled situation): one, because it’s not safe, and two, because I don’t like to use positive punishment if there is another way.
I’m always trying my best to amplify positive reinforcement and minimize positive punishment, and in my mind, that’s the definition of a positive reinforcement trainer.