Donald Kirkpatrick published his four-level training evaluation model in 1959. Sixty-five years later, the majority of corporate L&D teams are still measuring primarily at Level 1. We ask learners how they felt about the training. We call it evaluation. And then we wonder why no one at the executive table takes L&D seriously.
The four levels are: Reaction (did learners enjoy it?), Learning (did they acquire knowledge or skill?), Behaviour (did they apply it?), Results (did it move the business?). Level 4 — Results — is the one that justifies training budgets. It is also the one that most teams treat as aspirational rather than operational.
This is not an accident. Level 4 is genuinely hard. But it is achievable, and the teams that achieve it have a fundamentally different relationship with their organisations than those that do not.
Why Most L&D Stops at Level 1
Level 1 surveys are easy. Every LMS has them built in. You send a five-question form after the course, learners rate their satisfaction, you tally the responses and report that 87% of participants found the training "useful" or "very useful." The report looks professional. It takes about four hours to produce. And it tells you almost nothing about whether the training worked.
The problem is not that Reaction data is useless. A highly negative reaction to a course can signal genuine design problems. But high satisfaction scores correlate only weakly with learning outcomes and show essentially no correlation with behaviour change. People enjoy training they find easy. Easy training is rarely transformative.
Level 2 — Learning — is where most evaluation ambition stops. Assessment scores, pre/post test comparisons, observed skill demonstrations. These are meaningful. But they still measure performance in a training environment, under optimal conditions, immediately after instruction. They do not measure what happens when a learner gets back to their desk, their inbox, their demanding manager, and the hundred other priorities competing for their attention.
The Transfer Gap
Research on training transfer consistently shows that only 10–20% of what is learned in formal training is ever applied on the job in a sustained way. This is not a failure of learner intelligence or motivation. It is a structural failure. Training environments are controlled. Work environments are not. Training content is presented in isolation. Work requires integrating new knowledge with existing habits, competing demands, and imperfect information.
Level 3 — Behaviour — is where the transfer gap becomes visible. Are people actually doing, on the job, what you trained them to do? This requires observation, manager input, or data from the work environment itself. It is harder than a survey. It is also where most of the real information lives.
Level 4 goes further: did the behaviour change produce business results? Did the sales training increase close rates? Did the compliance training reduce incidents? Did the leadership programme reduce voluntary turnover on the teams whose managers went through it?
How to Actually Get to Level 4
The path to Level 4 evaluation starts before the course is designed, not after it is delivered. You cannot measure results if you have not agreed, in advance, what results you are trying to produce and how you will measure them.
This requires a different kind of stakeholder conversation. Instead of asking "what do you want your people to learn?", ask "what business metric are you trying to move, and what is it currently sitting at?" The answer to that question gives you your Level 4 target. Everything else — the course design, the learning objectives, the assessment strategy — should be traceable back to that number.
Establish a baseline. Before training launches, record the current state of whatever metric you are targeting. Customer satisfaction score, safety incident frequency, sales conversion rate, employee satisfaction on manager effectiveness — whatever the outcome is, you need a before number to compare your after number to.
Then measure at meaningful intervals. Behaviour change and business results do not appear immediately after training. They emerge over weeks and months, as new behaviours become habits and habits produce outcomes. Measure at 30 days, 60 days, and 90 days. Look for trend lines, not single data points.
Finally, control for confounding variables as well as you reasonably can. If sales went up after your sales training, was it the training or a new product launch? These questions cannot always be perfectly answered, but acknowledging them and designing simple controls — such as comparing trained and untrained teams if possible — adds rigour that executives will respect.
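The three steps above — record a baseline, measure at intervals, compare against an untrained group — can be sketched in a few lines of code. This is a minimal, illustrative sketch only: the turnover figures, group names, and interval labels are invented, and the comparison shown is a simple difference-in-differences, not the author's prescribed method.

```python
# Compare a trained group and an untrained (control) group against their
# own pre-training baselines. All numbers below are illustrative.

def percent_change(baseline, current):
    """Relative change from baseline, e.g. -0.12 means a 12% reduction."""
    return (current - baseline) / baseline

# Hypothetical voluntary-turnover rates (annualised %) per group.
# "baseline" is recorded before the programme launches; the later keys
# are follow-up measurements at 30, 60, and 90 days.
trained = {"baseline": 18.0, "day_30": 17.5, "day_60": 16.6, "day_90": 15.8}
control = {"baseline": 17.5, "day_30": 17.4, "day_60": 17.2, "day_90": 17.0}

for interval in ("day_30", "day_60", "day_90"):
    t = percent_change(trained["baseline"], trained[interval])
    c = percent_change(control["baseline"], control[interval])
    # Difference-in-differences: the trained group's change minus the
    # control group's change, netting out shared external factors
    # (product launches, seasonality, market shifts).
    net = t - c
    print(f"{interval}: trained {t:+.1%}, control {c:+.1%}, net effect {net:+.1%}")
```

Reporting the trend across all three intervals, rather than a single number, is what distinguishes a defensible Level 4 claim from a coincidence.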
The Conversation That Changes Everything
There is a moment in every L&D team's development when they stop reporting on activity (courses delivered, completion rates, learner hours) and start reporting on outcomes (behaviours changed, business metrics moved). That moment changes how the organisation perceives the learning function.
When you can walk into a quarterly business review and say: "The leadership development programme correlated with a 12% reduction in voluntary turnover on participating teams over six months, compared to 3% on non-participating teams," you are not reporting on training. You are reporting on business impact. That is a fundamentally different conversation, and it opens doors that completion rate reports never will.