Behavior Change: The Complete Science-Based Guide

Roughly half of the people who decide to change a behavior will fail to do it. They’ll set the goal, feel the motivation, maybe even start, and then nothing sticks. Sheeran (2002) put the number at 47%: nearly half of people who form a genuine intention to change never follow through at all.

That’s not a failure of willpower. It’s not a failure of information. It’s a failure of the models we’ve been using to think about behavior change.

Most guides on this topic hand you a framework, a set of “hacks,” and a pep talk. They assume that if you just understand the right principles and try hard enough, you can change anything. This guide is different because it takes the full evidence seriously, including the parts most guides ignore. Like the fact that your personality traits are roughly 50% heritable and remarkably stable across your adult life. Or that the celebrated nudge literature collapses to near-zero effect sizes once you correct for publication bias. Or that the willpower-as-a-limited-resource theory, which shaped a decade of interventions, essentially failed to replicate.

If you want to actually change behavior (yours, your team’s, your users’) you need to understand what the evidence really shows, not the cleaned-up version.

That’s what this guide does.

How to Read This Guide

This is long. You don’t need to read it start to finish. If you want the practical playbook, skip to “How to Design for Behavior Change.” If you want the scientific foundation that makes the playbook make sense, start here and read through. Jump around. Come back to sections when you need them.

A note on numbers. This guide reports effect sizes from meta-analyses throughout. Here’s what the key metrics mean. d (Cohen’s d) measures how much an intervention group differs from a control group in standard deviation units. By Cohen’s conventions, d = 0.20 is small, 0.50 is medium, 0.80 is large. Most behavior change interventions fall between 0.15 and 0.50. r (correlation) measures how strongly two things are related; it runs from −1 to +1, and it’s the magnitude that matters: in psychology, r = .30 is a meaningful relationship. R² (variance explained) tells you what percentage of the differences between people a predictor accounts for; with a single predictor, it’s simply r squared. For perspective: the average effect size across all of social psychology is about d = 0.40 (Richard, Bond, & Stokes-Zoota, 2003). Many of the numbers in this guide are smaller than that, which is part of the story.
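
If it helps to see the arithmetic rather than read it, here is a minimal sketch in plain Python of how these metrics are computed. The sample numbers are invented purely for illustration; nothing here comes from any study cited in this guide.

```python
import math

def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

def pearson_r(xs, ys):
    """Correlation between paired variables; runs from -1 to +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented example: an intervention group vs. a control group on some behavior measure.
intervention = [14, 16, 15, 17, 13, 18]
control = [12, 13, 14, 12, 13, 15]
print(round(cohens_d(intervention, control), 2))   # standardized difference between the groups

# With one predictor, R-squared is just r squared: r = .30 means the predictor
# accounts for about 9% of the differences between people.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]) ** 2, 2))
```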

On scope. This guide covers the science of behavior change as studied in social, health, and organizational psychology, behavioral economics, and personality science. It does not cover clinical psychotherapy (CBT, ACT, DBT), trauma-informed approaches, or structural determinants of health, each of which has its own substantial evidence base and deserves its own treatment.


Table of Contents

  1. What Is Behavior Change?
  2. The Science of How Behavior Works
  3. Major Behavior Change Frameworks
  4. What Actually Works (And What Doesn’t)
  5. Why Behavior Change Is So Hard
  6. How to Tell Signal from Story
  7. The Matching Framework: Fit Before Force
  8. Behavior Change Across Domains
  9. How to Design for Behavior Change
  10. Common Myths About Behavior Change
  11. Frequently Asked Questions
  12. References

What Is Behavior Change?

Here’s a scenario most of us know well. You decide, genuinely, that you’re going to exercise regularly. You buy the shoes. You download the app. You block out time on your calendar. Two weeks later, the shoes are in the closet and the app is sending notifications you swipe away without reading.

You didn’t lack the goal. You didn’t lack the information. Something else went wrong, and understanding what went wrong is the entire point of behavior change science.

Behavior change is any measurable modification in what a person does: their actions, routines, habits, or patterns of conduct. It includes:

  • Starting new behaviors (beginning an exercise routine, adopting a meditation practice)
  • Stopping existing behaviors (quitting smoking, cutting back on alcohol)
  • Shifting how behaviors are performed (eating more slowly, switching from driving to cycling)

That sounds simple. It isn’t. Because the science is clear on something most people overlook: behavior change is not the same as attitude change. You can completely change your beliefs about exercise without exercising more. This gap, between what people intend to do and what they actually do, is one of the most robust findings in the field (Sheeran, 2002; Webb & Sheeran, 2006). We’ll get into the numbers shortly. They’re striking.

Not All Behavior Change Is the Same

Here’s something that most frameworks gloss over: different types of behavior require fundamentally different strategies. Lumping them together is a mistake.

One-off decisions are behaviors you perform once (or rarely) and the consequences play out over time. Signing up for a 401(k). Choosing a health insurance plan. Getting vaccinated. For these, the dominant challenge is initiation: getting the person to act at the decision point. Simplification and friction reduction can help here, though it’s worth noting that defaults (like auto-enrollment) aren’t really behavior change at all. They’re administrative actions performed on people, not by people. We’ll come back to this distinction, because it matters more than the field admits.

Repeated behaviors are things you need to do again and again: exercising, eating well, taking medication, meditating. Here, the challenge shifts from initiation to maintenance. Simplifying a signup flow gets you enrolled at the gym; it doesn’t get you there on a Tuesday in February when it’s raining and you’re tired. An important nuance: most of these behaviors are goal-directed, not habitual (we’ll unpack this distinction shortly). You can automate small initiatory cues (laying out your gym clothes the night before, setting a consistent alarm) but the core behavior itself requires ongoing conscious effort. That means repeated behaviors require sustained motivation, environmental design, and, critically, compatibility with who you are as a person. The fit between you and the behavior determines sustainability far more than any automation strategy.

Complex behavioral patterns are multi-component, context-dependent bundles of action: managing a team effectively, parenting well, navigating a chronic illness. These aren’t single behaviors at all; they’re repertoires that require judgment, adaptation, and ongoing skill development. No single framework handles these well, and anyone who tells you their model covers all three types equally is oversimplifying.

The strategy that works for getting someone to check a box on an enrollment form is not the strategy that works for getting them to exercise four times a week for the rest of their life. Knowing which type of behavior change you’re dealing with is the first design decision you need to get right.

A Brief History

The scientific study of behavior change has roots in multiple traditions, each with genuine insights, and each with oversimplified claims the evidence doesn’t support.

Behaviorism (1900s-1960s). B.F. Skinner, Ivan Pavlov, and others established that behavior is shaped by reinforcement, punishment, and environmental contingencies. This tradition emphasized observable behavior over internal mental states, a useful corrective to armchair theorizing, but it overcorrected by treating the person as a black box.

Social cognition (1970s-1990s). Albert Bandura’s Social Cognitive Theory, Icek Ajzen’s Theory of Planned Behavior, and the Health Belief Model shifted focus to beliefs, attitudes, and self-efficacy as drivers of behavior. This was the era of “if we change how people think, we change what they do.” It was partly right, and the parts where it was wrong are exactly the intention-action gap we’ll cover below.

Behavioral economics (2000s-present). Daniel Kahneman, Richard Thaler, and Cass Sunstein brought attention to cognitive biases, heuristics, and choice architecture, the idea that the structure of decisions shapes behavior as much as individual motivation. This unlocked real design insights, but also generated enormous hype that the real-world evidence has since moderated considerably.

Individual differences (ongoing). The behavioral genetics and personality science traditions, drawing on twin studies and longitudinal research, have increasingly demonstrated that stable individual differences (personality traits, cognitive ability, and interests) powerfully constrain and channel behavior change. This is the tradition most behavior change guides ignore. This guide doesn’t.

Understanding what the evidence actually shows requires looking across all of them, and being honest about where each tradition overstepped.


The Science of How Behavior Works

Before you can change behavior, you need to understand how it operates in the first place. And the first thing to understand is that your brain doesn’t run behavior through a single system.

Your Brain Has Two Operating Modes

Imagine you’re driving home from work. You take a different route because you heard about construction on the radio; that’s one system. You arrive at your usual destination without remembering a single turn; that’s the other.

The brain has two distinct systems for controlling action, supported by different neural circuits in the basal ganglia (Yin & Knowlton, 2006; Graybiel, 2008):

Goal-directed behavior is controlled by the dorsomedial striatum and prefrontal cortex. It’s flexible, deliberate, and sensitive to outcomes. You consider the options, evaluate the consequences, and choose. It’s powerful but expensive; it requires attention and cognitive resources.

Habitual behavior is controlled by the dorsolateral striatum. It’s automatic, triggered by context cues, and largely insensitive to changes in outcomes. Once a habit forms, it fires in response to the cue regardless of whether the outcome is still valuable. This is why you sometimes drive to your old apartment months after moving.

The transition between these systems is gradual, not a switch. With repeated practice in a stable context, behavioral control shifts from prefrontal deliberation to striatal automaticity (Wood & Neal, 2007). Your brain essentially “chunks” the behavior, packaging multi-step sequences into single units that require less cognitive overhead (Graybiel, 2008).

Here’s the critical catch: the old habit trace is never fully erased. It’s overridden by new learning, not deleted. This is why relapse is so common: the original habit can be reactivated by the original context cues, especially under stress or cognitive load. Anyone who has quit smoking and then lit up at a party where everyone was smoking understands this at a visceral level.

Most Behaviors People Want to Change Aren’t Habits

Here’s where a critical distinction gets lost in nearly every popular treatment of this topic: the scientific definition of a habit is far narrower than the everyday one.

In neuroscience, a habit has a precise operational test called outcome devaluation (Dickinson, 1985; Daw & O’Doherty, 2014). You train an animal to perform an action for a reward. Then you devalue the reward, for example by letting the animal eat it to satiety or by pairing it with mild nausea, so the food is no longer desirable. Then you test whether the animal still performs the action. If it stops, the behavior was goal-directed: it was being maintained by an ongoing evaluation of the action’s consequences. If it persists despite the reward being worthless, it’s a true habit: a stimulus-response reflex running on autopilot, disconnected from current goals.

This test has been replicated extensively in both animals and humans, and here’s what it reveals: true habits only emerge after extensive overtraining on simple behaviors in stable contexts (Tricomi, Balleine, & O’Doherty, 2009). In human studies, genuine devaluation-insensitive responding required multiple days of repetitive training on a basic stimulus-response task. The behaviors were simple. The environments were controlled. And even then, the transition was gradual.
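
Daw and O’Doherty (2014), cited above, describe this distinction computationally: goal-directed control consults the outcome’s current value at the moment of choice, while habitual control runs on a cached stimulus-response strength built by past reinforcement. The toy simulation below (illustrative Python, invented parameters, not any specific study’s model) shows why only the cached controller keeps responding after devaluation.

```python
class GoalDirectedAgent:
    """Re-evaluates the outcome's current worth every time it chooses."""
    def __init__(self, outcome_value):
        self.outcome_value = outcome_value   # current valuation of the reward

    def devalue_outcome(self):
        self.outcome_value = 0.0             # e.g., the food has been paired with nausea

    def will_respond(self):
        return self.outcome_value > 0        # acts only if the outcome is still worth it


class HabitualAgent:
    """Acts on a cached stimulus-response strength stamped in by past reinforcement."""
    def __init__(self):
        self.sr_strength = 0.0

    def train(self, trials, learning_rate=0.1, reward=1.0):
        for _ in range(trials):              # extensive repetition builds the cue-response link
            self.sr_strength += learning_rate * (reward - self.sr_strength)

    def will_respond(self):
        # The cached strength never consults the outcome's current value.
        return self.sr_strength > 0.5


goal_directed = GoalDirectedAgent(outcome_value=1.0)
habitual = HabitualAgent()
habitual.train(trials=100)

# Devaluation test: make the reward worthless, then probe behavior.
goal_directed.devalue_outcome()
print(goal_directed.will_respond())   # False -> behavior was goal-directed
print(habitual.will_respond())        # True  -> behavior persists: a true habit
```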

Now think about the behaviors people actually care about changing: exercising, eating well, writing, studying, building a business, managing stress. Every one of these is complex, effortful, and requires ongoing conscious evaluation. Would you stop exercising if a doctor told you it was harming your joints? Of course, because exercise is goal-directed behavior. You’re continually evaluating whether the outcomes are worth the effort. That’s the opposite of a habit.

This means that most of what the self-help industry calls “habit formation” is actually an attempt to automate behaviors that are, by their nature, resistant to automation. You can build small automaticities around initiatory cues: putting on your running shoes when the alarm goes off, opening your journal when you sit down with coffee. These micro-sequences can become genuinely habitual. But the core behavior itself (the run, the writing, the focused study session) remains goal-directed. It requires conscious engagement, draws on motivation, and responds to changes in your goals and circumstances. It will never happen on autopilot.

The practical implication cuts deep: if the behavior you’re trying to change is fundamentally goal-directed (and it almost certainly is), then the most important question isn’t “how do I make this automatic?” It’s “have I picked a behavior I can sustain with ongoing conscious effort?” That means the fit between you and the behavior (whether you find it tolerable, meaningful, or even enjoyable) matters far more than any habit hack. Because you’re going to be consciously choosing this behavior, day after day, for as long as you do it. If you hate it, no cue-routine-reward loop will save you.

This doesn’t mean habits are irrelevant. True habits (checking your phone, grabbing a snack, biting your nails) are real and powerful, and the science of breaking and replacing them is valuable. But the science of forming habits has been wildly oversold as a strategy for complex self-improvement behaviors that were never going to become automatic in the first place.

How Long Habits Actually Take to Form (Not 21 Days)

Let’s kill a myth right now. The “21-day habit” figure has no scientific basis. Zero. It traces to a 1960 self-help book by plastic surgeon Maxwell Maltz, who noticed his patients took about 21 days to adjust to their new appearance after surgery (Maltz, 1960). That’s a finding about self-image adaptation, not habit formation. It got picked up by self-help authors, lost its caveats, and became gospel.

The most-cited empirical study on habit formation is Lally et al. (2010), which tracked 96 volunteers attempting to form new daily behaviors. The headline finding (“it takes 66 days to form a habit”) has been repeated in thousands of articles, books, and apps. Almost none of them mention what the behaviors actually were.

Here’s what participants were trying to make habitual: drinking a glass of water with lunch. Eating a piece of fruit. Doing a set of sit-ups after morning coffee. Taking a 10-minute walk after breakfast. These are about as simple as a behavior can get: single, bounded actions performed once a day in a fixed context. No one in this study was trying to “build an exercise habit” or “eat healthy” or “write every day.” They were trying to drink water and eat fruit.

And here’s how long even that took, among the minority who succeeded:

  • Median time to reach automaticity: 66 days, but only for the 48% of participants who showed the expected habit-formation pattern
  • Range: 18 to 254 days
  • Drinking a glass of water (arguably the simplest voluntary behavior a person can perform) took a median of 59 days, and a quarter of participants were projected to need more than 75 days
  • Eating a piece of fruit took a median of 65 days, with a quarter of participants projected to need over 106 days, well beyond the study’s 84-day observation window
  • Physical activities (sit-ups, short walks, 15-minute runs, not “exercise” in the sense most people use the word) took a median of 91 days, meaning most participants in this category hadn’t even plateaued by the time the study ended
  • Missing a single day did not significantly derail the process

Consider what that means: it took two months for motivated volunteers to make drinking a glass of water feel automatic, and a quarter of them needed even longer. For physical activities, the timeline stretched to three months, with the majority of those participants still on an upward trajectory when the observation window closed at 84 days.

And the majority of participants didn’t get there at all. The asymptotic curve (the statistical signature of habit formation) was a good fit for only 39 of 82 participants (48%). In a study specifically designed to form habits, with motivated volunteers who chose their own behaviors and tracked them for 12 weeks, more than half did not show the expected habit-formation pattern. Eight participants showed no automaticity increase whatsoever: a flat line across the entire study. Twelve more produced data that the statistical model couldn’t fit at all; of those, five appeared to be increasing linearly at day 84, with no plateau in sight.

The picture is even starker for physical activities. Of the 34 participants who chose activities like sit-ups or short walks, only 13 (38%) showed the expected asymptotic curve. The other 21 (roughly two-thirds) either dropped out, showed no change, or produced data inconsistent with habit formation. The study’s authors speculated that these participants were “relatively slow” and might have reached a plateau with more time. Perhaps. But the data themselves do not demonstrate this, and twenty-one non-converging cases out of thirty-four is not a hopeful minority. It is the majority.
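
For readers who want to see what “the expected asymptotic curve” looks like concretely: Lally et al. modeled each participant’s self-reported automaticity as a curve that rises steeply at first and then flattens toward a plateau. The sketch below uses a generic exponential-plateau form of that kind; the parameter values are invented for illustration and are not the paper’s estimates.

```python
import math

def automaticity(day, start, plateau, rate):
    """Exponential rise toward a plateau, the general curve shape Lally et al. fitted
    per participant (parameter values here are invented, not the paper's estimates)."""
    return start + (plateau - start) * (1 - math.exp(-rate * day))

def days_to_reach(fraction_of_gap_closed, rate):
    """Days until automaticity has closed a given fraction of the gap to its plateau."""
    return -math.log(1 - fraction_of_gap_closed) / rate

# Two hypothetical participants with the same start and plateau but different rates:
print(round(automaticity(day=66, start=10, plateau=40, rate=0.05), 1))   # fast learner, near plateau by ~2 months
print(round(automaticity(day=66, start=10, plateau=40, rate=0.015), 1))  # slow learner, still climbing at 2 months

print(round(days_to_reach(0.95, rate=0.05)))    # ~60 days to close 95% of the gap
print(round(days_to_reach(0.95, rate=0.015)))   # ~200 days for the slower learner

# The study's key diagnostic was whether a curve of this shape fit at all: a flat line
# (no gain) or a line still rising steadily at day 84 is not evidence of a formed habit.
```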

The variation is enormous, and the variation itself is the point. How fast you form a habit depends on what the behavior is, how consistent your context is, and (we’ll get to this) who you are.

But there is an even deeper issue with this study that the habit formation literature almost never acknowledges:

The study measured “automaticity,” not habit, and the authors said so explicitly. The researchers used the Self-Report Habit Index (SRHI), which asks participants to rate statements like “I do this automatically” and “I do without thinking.” But as the study’s own authors noted in their discussion, Wood and Neal (2007) had already proposed that complex behaviors develop goal-directed automaticity: a subjective sense of routine that is fundamentally different from true habit. Goal-directed automaticity is flexible and remains tied to conscious goals; if the goal is devalued, the behavior stops. True habit persists regardless. The SRHI cannot distinguish between these two types. The authors wrote: “We have assessed the development of automaticity for performing these behaviours rather than specifically habit. New measures will be needed to disentangle these two forms of automaticity.”

This is arguably the study’s most important finding, and it is almost never cited. Even within the minority who showed the expected curve, we cannot say with certainty that they formed true habits rather than comfortable routines, familiar patterns that still depend on ongoing motivation and would stop the moment the person decided they weren’t worth doing anymore.

The implication for anyone trying to change a meaningful behavior is worth stating plainly. If it takes two months for half of motivated people to automate drinking a glass of water (and the study’s own authors aren’t certain that’s a true habit rather than a routine), the prospect of making genuinely complex behaviors automatic through repetition alone is remote. Exercise, healthy eating, creative work, financial discipline: these aren’t one-step behaviors performed in a single context. They require ongoing judgment, adaptation, effort management, and conscious engagement. They are goal-directed by nature. The Lally data is often cited as evidence that habits just take patience. Read carefully, it is better understood as evidence that true habit formation is limited to very simple behaviors, and that for everything else, we need a different framework entirely.

The Intention-Action Gap: Why Wanting Isn’t Enough

One of the most consistent findings in behavioral science deserves more attention than it gets: wanting to change is necessary but nowhere near sufficient.

Sheeran (2002) found that intentions account for approximately 28% of the variance in behavior. In practical terms, 47% of people who form intentions to change a behavior fail to act on them. Nearly half. These aren’t people who are ambivalent; they’ve formed a genuine intention. They still don’t do it.

Webb and Sheeran (2006) quantified the leakage further. Across 47 experimental studies, a medium-to-large change in intention (d = 0.66) produced only a small-to-medium change in behavior (d = 0.36). Even when you successfully move the intention needle, behavior follows only partially.

Why the gap? Sheeran and Webb (2016) identified three failure modes:

Failing to start. You forget. You miss the right moment. You can’t muster the activation energy to begin. The intention is sitting there in your head, but it never connects to action.

Failing to maintain. You start, but competing goals interfere. You’re emotionally disrupted. Your automatic habits override your new intentions. That second week of the diet, when stress hits and you default to the pantry, that’s this category.

Failing to stop the old behavior. Old habits get triggered by context cues and override new intentions, especially under cognitive load or stress. The intention-action gap is wider under conditions of fatigue and cognitive load, exactly the conditions of real life (Sheeran & Webb, 2016).

This is why one of the most studied strategies for closing the gap isn’t about motivation at all. Implementation intentions, Gollwitzer’s (1999) “if-then” plans (“If situation X arises, then I will perform behavior Y”), work by pre-loading a cue-response link. They effectively delegate behavioral initiation to environmental triggers rather than relying on effortful deliberation. They bypass the gap rather than trying to bridge it with more wanting. As we’ll see below, the original effect size estimates for this technique have shrunk substantially under scrutiny, but the underlying logic of reducing decision load at the moment of action remains sound.

The Role of Individual Differences: Where This Guide Diverges

Here’s where this guide diverges from every other one you’ll read.

Most behavior change guides treat the person attempting to change as a blank slate, a generic human who just needs the right framework, the right motivation, the right environment. Swap in the technique, get the result.

The evidence says otherwise. Who you are matters at least as much as what you do.

A massive meta-analysis of 2,748 publications covering over 14 million twin pairs found that the average heritability of human behavioral traits is 49% (Polderman et al., 2015). Half. Personality traits specifically show heritabilities of 40-60%, and when you measure them properly, using combined self- and peer-report rather than self-report alone, estimates rise to .66-.79 (Riemann, Angleitner, & Strelau, 1997).

Personality traits are also remarkably stable across adulthood. Roberts and DelVecchio (2000) found that rank-order stability rises from about .54 in the college years to a plateau of approximately .74 between ages 50 and 70, approaching the reliability ceiling of the measures themselves. At that plateau, much of the remaining apparent “change” is measurement error rather than genuine shifts in personality. The person who was the least conscientious in their cohort at age 25 tends to still be the least conscientious at age 55.

(Yes, there is modest mean-level change: people become slightly more agreeable and conscientious with age, the so-called “maturity principle.” But these shifts are small, typically d = 0.1 to 0.3 per decade (Roberts, Walton, & Viechtbauer, 2006). And crucially, the rankings hold even as the averages drift.)

Now here’s the nuance that prevents this from being deterministic: Fleeson’s Whole Trait Theory (Fleeson, 2001; Fleeson & Jayawickreme, 2015) shows that people vary considerably around their stable mean. Using experience sampling methods, Fleeson demonstrated that each person’s moment-to-moment behavior fluctuates across a real range; an introvert can act extraverted at a party, a disorganized person can be meticulous on a project that matters to them. But each person has a characteristic distribution with a stable center. Think of it as ranges, not rails. You have flexibility within your range. You don’t have unlimited flexibility to become someone else entirely.
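
One way to picture “ranges, not rails” is to treat each person’s momentary behavior as draws from a distribution whose center is their stable trait level. The simulation below is a toy illustration with invented numbers, not Fleeson’s actual experience-sampling data.

```python
import random

def momentary_extraversion(trait_mean, within_person_sd=1.2, moments=50, seed=1):
    """Simulate experience-sampling reports for one person: momentary behavior varies
    widely around a stable trait mean. All numbers are invented for illustration."""
    rng = random.Random(seed)
    return [rng.gauss(trait_mean, within_person_sd) for _ in range(moments)]

introvert = momentary_extraversion(trait_mean=3.0)            # low trait level on a 1-7 style scale
extravert = momentary_extraversion(trait_mean=5.5, seed=2)

# The introvert's most extraverted moments overlap the extravert's range (the party
# where they act outgoing), but the centers of the two distributions stay put, and
# those centers are what rank-order stability tracks.
print(round(max(introvert), 1), round(sum(introvert) / len(introvert), 1))
print(round(min(extravert), 1), round(sum(extravert) / len(extravert), 1))
```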

What does this mean for behavior change? Three things that most guides won’t tell you:

1. The person attempting to change is not a blank slate. They bring a set of stable dispositions (personality traits, cognitive abilities, interests, and values) that create real constraints on which behaviors are easy to adopt, which are effortful, and which are essentially unsustainable.

2. The same intervention works differently on different people. An exercise program designed for extraverts who thrive on group energy will be a different experience for an introvert who finds gym environments draining. The behavior looks identical; the person-behavior fit is completely different.

3. Acting against your disposition has real costs. Jacques-Hamilton, Sun, and Smillie (2019) ran a randomized controlled trial where participants were instructed to “act extraverted” for a week. Dispositional introverts who did this reported increased negative affect, fatigue, and feelings of inauthenticity. The benefits of counter-dispositional behavior are front-loaded; the costs are back-loaded. Studies that only measure immediate outcomes systematically overestimate the benefits and underestimate the costs.

This doesn’t mean people can’t change. They can. But it means that the fit between the person and the target behavior is a design variable, not a nuisance factor. Ignoring it is why so many behavior change programs produce short-lived results for some people and no results for others.

The evidence points toward a different approach: instead of forcing people to override who they are, design for fit between the person and the behavior. Match first. Force only when you must.

We’ll build out exactly what that looks like in practice later in this guide. But first, we need to look at the major frameworks that have been proposed: what each one gets right, what each one misses, and how they fit together.


Major Behavior Change Frameworks

There are dozens of behavior change frameworks. Most of them are pretty good at describing what influences behavior. Very few of them tell you what to actually do.

This distinction matters more than it sounds. A framework that says “motivation, ability, and environment all matter” is technically correct. But it’s about as useful as telling a drowning person that swimming involves arm movements. The question practitioners actually face is: given this specific person in this specific context trying to do this specific thing, what intervention should I design?

Here are the frameworks that have earned the most scientific attention. For each, I’ll tell you what it does well, what it doesn’t, and what the evidence actually shows. If you want to skip the individual reviews, the comparison table and the commentary following it summarize what matters most.

COM-B Model and the Behaviour Change Wheel

The COM-B model (Michie, van Stralen, & West, 2011) is one of the most widely adopted frameworks in behavioral science, particularly in UK public health. It says behavior arises from the interaction of three necessary conditions:

  • Capability (physical and psychological): Can the person actually do it? Do they have the skills, knowledge, and cognitive bandwidth?
  • Opportunity (physical and social): Does the environment allow it? Are the right tools, resources, and social supports in place?
  • Motivation (reflective and automatic): Does the person want to do it, both at the level of conscious planning and at the level of habits, emotions, and impulses?

The Behaviour Change Wheel wraps around COM-B, linking these three conditions to nine intervention functions (education, persuasion, incentivization, training, environmental restructuring, and so on) and seven policy categories. It was derived from a systematic review of 19 existing frameworks, which is part of why it feels comprehensive; it essentially absorbed its competitors.

What it does well: COM-B gives you a diagnostic checklist. Before you design an intervention, you ask: Is this a capability problem, an opportunity problem, or a motivation problem? That question alone prevents the most common design mistake, which is defaulting to “just educate people harder” when the real barrier is environmental.

What it doesn’t do: COM-B has several limitations worth being honest about.

First, the capability-opportunity-motivation triad is not original to COM-B. The same three-factor structure appears in Blumberg and Pringle (1982), who modeled workplace performance as a function of capacity, willingness, and opportunity. Triandis (1977) proposed a similar decomposition of behavior. A 1991 US consensus meeting on physical activity intervention independently identified the same triad. COM-B’s genuine contribution is the systematic review linking these constructs to specific intervention types, but the constructs themselves have been circulating in the behavioral sciences for decades.

Second, COM-B is close to tautological. Stating that behavior requires the capability to perform it, the opportunity to perform it, and the motivation to perform it is not far from stating that behavior occurs when the conditions for behavior are met. The framework identifies categories of barriers but does not specify mechanisms of change or predict effect sizes. You can diagnose that someone has a “motivation problem,” but COM-B won’t tell you which motivational intervention will work or how large the effect will be.

Third, the three categories are quite coarse-grained. Personality, emotions, beliefs, and social status are all empirically distinct determinants of behavior, but COM-B compresses them into just three buckets: personality and emotions get folded into “motivation,” beliefs get split across “capability” and “motivation,” and social status is subsumed under “opportunity.” When meaningfully distinct constructs are collapsed together, the diagnostic value of the framework is reduced. A practitioner who identifies a “motivation problem” may be looking at a personality mismatch, an emotional barrier, a belief deficit, or a genuine lack of desire, each of which requires a fundamentally different intervention.

None of this makes COM-B useless. As a first-pass diagnostic tool, it’s a reasonable starting point, and the Behaviour Change Wheel provides a structured process that many practitioners find helpful. But treating a low-resolution map as a precision instrument leads to interventions that are correspondingly imprecise.

Fogg Behavior Model (B=MAP)

BJ Fogg’s Behavior Model (Fogg, 2009) proposes that behavior occurs when three elements converge at the same moment: Motivation (the desire to act), Ability (the capacity to act easily), and a Prompt (a cue that triggers action now). If any element is missing or insufficient, the behavior doesn’t happen.

The model is represented as a two-dimensional space with motivation on the y-axis and ability on the x-axis, divided by an “action line.” When prompted, behaviors above the line happen; behaviors below it don’t. The core practical insight: you can move a behavior above the action line either by increasing motivation or by making the behavior easier. Fogg’s work emphasizes the latter, on the principle that reducing friction is more reliable than trying to amplify motivation.
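
Fogg draws the action line as a curve rather than writing an equation, so any formula is an interpretation. Still, a toy version of the compensatory logic (an assumed multiplicative tradeoff, invented numbers) makes the design implication concrete: when motivation is modest, shrinking the behavior is usually the cheaper way to get above the line.

```python
def behavior_occurs(motivation, ability, prompt_present, action_line=1.0):
    """Toy illustration of B=MAP, not Fogg's formal specification (he draws the action
    line as a curve, not an equation). Behavior fires only when a prompt arrives while
    the motivation-ability combination is above the line."""
    if not prompt_present:
        return False  # no prompt, no behavior, regardless of motivation or ability
    return motivation * ability >= action_line

# Same prompt, different design strategies for a workout that currently doesn't happen:
print(behavior_occurs(motivation=0.4, ability=1.0, prompt_present=True))  # False: below the line
print(behavior_occurs(motivation=0.4, ability=3.0, prompt_present=True))  # True: shrink the behavior ("two push-ups")
print(behavior_occurs(motivation=2.6, ability=1.0, prompt_present=True))  # True in principle, but sustaining motivation this high is unreliable
```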

This insight led to Fogg’s most influential practical contribution: Tiny Habits (Fogg, 2020). Instead of committing to “exercise for 30 minutes,” you commit to “do two push-ups after I use the bathroom.” The behavior is so small that ability barriers nearly disappear, and the prompt is anchored to an existing routine. Once the tiny behavior is established, it can scale up naturally. The approach has the significant merit of being immediately actionable, something many academic frameworks lack.

It’s worth noting that Fogg published B=MAP in 2009, two years before COM-B appeared. Both frameworks identify something like motivation, ability/capability, and environmental conditions as necessary for behavior. But where COM-B is primarily a diagnostic framework (identify the barrier category, then consult a taxonomy of interventions), B=MAP is a design framework. It tells you what to do: make the behavior smaller, attach it to an existing prompt, and work with motivation rather than against it.

What it does well: The explicit inclusion of prompts as a necessary condition is a genuine contribution: many behavioral failures aren’t motivational or ability-related; people simply forget or are never cued to act. Fogg’s emphasis on shrinking the behavior rather than amplifying motivation is well-supported by the friction-reduction literature and produces more immediately actionable design guidance than most academic frameworks.

What it doesn’t do: Like COM-B, B=MAP treats the person attempting to change as relatively generic. It doesn’t incorporate personality, stable individual differences, or the question of whether a behavior is a good fit for the specific person attempting it. Two push-ups after using the bathroom is a fine tiny habit for someone who finds push-ups tolerable; for someone who doesn’t, it’s an irritant they’ll abandon within a week. B=MAP is excellent at the mechanics of behavior initiation. It is less equipped to address the deeper question of which behaviors are sustainable for which people, the person-behavior fit problem that, as we’ve discussed, is central to long-term change.

Theory of Planned Behavior (TPB)

Ajzen’s Theory of Planned Behavior (1991) is one of the most tested models in all of social psychology. Its core claim is straightforward: behavioral intention, determined by your attitudes toward the behavior, the social pressure you feel, and how much control you believe you have, is the best predictor of whether you’ll actually do the thing.

The evidence base is massive. Armitage and Conner (2001) found across 185 studies that TPB accounts for about 27% of variance in behavior and 39% of variance in intention. McEachan et al. (2011), using only prospective designs, found TPB explained 19.3% of behavioral variance. TPB-based interventions produce a medium effect on actual behavior (δ = 0.50, artifact-corrected; Steinmetz et al., 2016).

Those numbers are real. They replicate. And they reveal the model’s fundamental limitation.

Here’s the honest take: TPB is essentially a model of rational decision-making. It assumes people think about whether they want to do something, consider what others think, assess whether they can pull it off, form an intention, and then act on that intention. This is a reasonable description of how people decide to, say, sign up for a class or schedule a doctor’s appointment. It is a terrible description of how people eat lunch, check their phone, or pour a drink after work.

TPB doesn’t account for habits. It doesn’t account for emotions. It doesn’t account for the fact that a “medium-to-large” change in intention produces only a “small-to-medium” change in behavior (Webb & Sheeran, 2006), the famous intention-behavior gap. Subjective norms are consistently the weakest predictor in the model, suggesting the social pressure component is underspecified. And the model is entirely static: it doesn’t capture how behavior unfolds over time, how feedback loops work, or how context shifts everything.

TPB is excellent for understanding the predictors of behavior. It is mediocre as a guide for changing it.

Social Cognitive Theory (SCT) and Self-Efficacy

Imagine you’ve never run a mile in your life. Someone signs you up for a 5K race next month. Two things will determine whether you actually show up to train: whether you think training will lead to good outcomes (outcome expectations), and whether you believe you’re capable of following through (self-efficacy).

That second part, self-efficacy, is the beating heart of Bandura’s Social Cognitive Theory (1986). It’s your belief in your own capability to perform a specific behavior, and it is one of the most replicated findings in psychology.

Self-efficacy correlates with behavior at r = .30-.45 across domains (Stajkovic & Luthans, 1998; Moritz et al., 2000). Experimentally increasing self-efficacy produces d = 0.47 changes in behavior (Sheeran et al., 2016). The effect is robust across health, education, work, and sport.

Bandura identified four sources of self-efficacy, and the ranking matters: mastery experiences (actually succeeding at the thing) are the most powerful. Then vicarious learning (watching someone similar to you succeed), then verbal persuasion (someone telling you that you can do it), and finally physiological states (interpreting your arousal as excitement rather than anxiety).

What it does well: SCT gives you concrete intervention targets. Want to increase someone’s exercise behavior? Don’t lecture them about the benefits of exercise. Help them succeed at a small version of the behavior first. The mastery-experience pathway is well-supported and deeply practical.

What it doesn’t do: The broader SCT framework, with its triadic reciprocal determinism and comprehensive scope, is so broad it’s difficult to test as a whole. Most studies test self-efficacy in isolation, not the full model. There’s also a chicken-and-egg problem: self-efficacy correlates with behavior, but it may partly reflect past behavior rather than independently causing future behavior. People who’ve exercised before feel more capable of exercising. That’s not self-efficacy driving behavior; that’s behavior driving self-efficacy.

Self-Determination Theory (SDT)

Here’s something most behavior change guides get wrong: they treat motivation as a single dial that goes from low to high. Crank it up and people change. Keep it cranked and they maintain.

Deci and Ryan’s Self-Determination Theory (1985; Ryan & Deci, 2000) says that’s the wrong model entirely. What matters is not how much motivation you have, but what kind.

SDT proposes three basic psychological needs, autonomy (feeling that you’re choosing the behavior, not being coerced), competence (feeling capable), and relatedness (feeling connected to others), and a motivation continuum from purely external to fully intrinsic. The core insight: autonomous motivation (doing something because you genuinely value it) predicts sustained behavior change. Controlled motivation (doing it for external rewards or to avoid guilt) predicts short-term compliance followed by abandonment.

The evidence supports this. Autonomous motivation correlates with health behaviors at r = .26 across 184 studies (Ng et al., 2012). And here’s the finding that should make every incentive designer nervous: expected tangible rewards can actually undermine intrinsic motivation (d = -0.36 for free-choice behavior; Deci, Koestner, & Ryan, 1999).

The practical implication: Paying people to exercise may work for those who had zero interest to begin with. But for people who already exercise because they enjoy it? The payment can shift their motivation from intrinsic to extrinsic, and when the payment stops, so does the behavior. This is the overjustification effect, and it’s one of the most important boundary conditions in incentive design.

What SDT doesn’t tell you: The three-need structure, while robust, can be hard to operationalize for specific behaviors. And the motivation continuum has mixed structural support; some studies find the neat progression from external to intrinsic, others don’t. SDT is excellent theory. Translating it into specific intervention components requires additional work.

Nudge Theory and Choice Architecture

Madrian and Shea (2001) documented what happened when a large company changed the default enrollment option for its 401(k) retirement plan. Instead of requiring employees to opt in, the new system enrolled them automatically and let them opt out. Participation among newly hired employees jumped from roughly 50% to nearly 90%.

This became the flagship example of nudge theory (Thaler & Sunstein, 2008): the idea that the structure of choices (defaults, framing, salience, simplification) can predictably shift behavior without restricting options or requiring anyone to become a better person.

The field exploded. Nudge units opened in governments worldwide. Behavioral economics became the hottest thing in social science. And the meta-analytic evidence looked impressive: Mertens et al. (2022) found an overall effect of d = 0.43 across 455 effect sizes, with defaults producing the largest effects (d = 0.68).

Then the evidence started collapsing.

Problem 1: Publication bias destroys the effect. When researchers recalculated Mertens et al.’s meta-analysis after adjusting for publication bias using PET-PEESE methods, the effect sizes fell to d = 0.01-0.02 for most nudge categories (Maier et al., 2022). Mertens et al. responded that PET-PEESE can overcorrect when true effect sizes vary substantially across studies, and the methodological debate is ongoing. But even taking the most generous reading of the contested estimates, the original d = 0.43 headline is substantially inflated. Only “structure” nudges (physical rearrangement) retained a measurable effect under any correction method, and even that was d = 0.12, below the threshold for a “small” effect.
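
For readers who want the mechanics of that correction: PET-PEESE is, at bottom, a meta-regression of observed effect sizes on their standard errors (PET) or variances (PEESE), with the intercept read as the effect a perfectly precise study would show. The sketch below uses made-up numbers chosen to mimic a biased literature; it is not the Maier et al. analysis, and it omits the weighting and the PEESE stage of the full procedure.

```python
def pet_estimate(effects, standard_errors):
    """PET sketch: regress effect size on standard error; the intercept estimates the
    effect a perfectly precise (SE = 0) study would show. Unweighted and simplified,
    unlike the full PET-PEESE procedure."""
    n = len(effects)
    mean_se = sum(standard_errors) / n
    mean_d = sum(effects) / n
    slope = (sum((se - mean_se) * (d - mean_d) for se, d in zip(standard_errors, effects))
             / sum((se - mean_se) ** 2 for se in standard_errors))
    intercept = mean_d - slope * mean_se
    return intercept, slope

# Made-up pattern typical of a biased literature: small, noisy studies (large SE)
# report big effects; large, precise studies report small ones.
observed_d = [0.80, 0.65, 0.55, 0.30, 0.15, 0.10]
standard_errors = [0.40, 0.35, 0.30, 0.15, 0.08, 0.05]

corrected, slope = pet_estimate(observed_d, standard_errors)
print(round(sum(observed_d) / len(observed_d), 2))  # naive average: ~0.43
print(round(corrected, 2))                          # bias-corrected intercept: essentially zero
```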

Problem 2: Real-world effects are a fraction of academic claims. DellaVigna and Linos (2022) compared academic nudge trials with at-scale nudge unit trials, the kind run by actual government behavioral insights teams working with real populations. Academic studies showed average effects of 8.7 percentage points. At-scale trials? 1.4 percentage points. That’s roughly a six-fold reduction. If a financial advisor promised you 10% returns and delivered 1.6%, you’d call it what it is.

I’ve been running behavioral experiments since 2011, and I have always been struck by how weak nudges are in the field. They usually fail to have any recognizable impact at all. The academic literature creates a misleading picture because journals favor surprising positive results, and null findings rarely get published. The incentive structure of academic publishing systematically inflates what nudges appear to deliver.

Problem 3: The most celebrated nudge isn’t behavior change. Here’s the deeper issue that the field avoids. A behavior is a mental or physical action that someone performs. When you auto-enroll someone in a 401(k), they haven’t done anything. An administrative system acted on them without their awareness or active decision-making. Calling this “behavior change” is a category error. If your employer automatically deducted 1% of your paycheck for a charity you’d never heard of, would you call that a change in your charitable behavior? Of course not. You’d call it a payroll policy.

The same logic applies to opt-out organ donation, which is the other flagship example. Arshad et al. (2019), comparing 35 OECD countries, found no significant difference in deceased organ donation rates between opt-out and opt-in countries. Opt-out countries actually had fewer living donors (4.8 vs. 15.7 per million population). Spain, which has the highest deceased donation rate globally, achieves it through systemic infrastructure (trained coordinators, ICU protocols, public education), not through a default checkbox.

The honest assessment: Nudges were the most overhyped development in behavioral science. After correcting for publication bias, the evidence that informational and framing nudges change behavior is close to zero. Default “nudges” can change administrative outcomes, but calling that behavior change requires redefining behavior to include things people never consciously chose. The field excels at public relations. It has yet to demonstrate that nudges reliably change what people actually do.

Implementation Intentions

You’ve decided to start exercising. You’re motivated. You’ve even bought running shoes. But somehow, weeks pass and you never actually go for a run.

The problem isn’t motivation. It’s the gap between intending to do something and actually doing it. Gollwitzer’s (1999) implementation intentions, specific “if-then” plans, are one of the most widely studied strategies for bridging that gap.

The format is simple: “If [situation X arises], then I will [perform behavior Y].” If it’s Monday morning and I’ve finished my coffee, then I will put on my running shoes and go out the door. If I’m offered dessert at dinner, then I will ask for fruit instead.

The mechanism is intuitive: forming an if-then plan creates a mental link between a situational cue and a planned response, delegating behavioral control from effortful deliberation to something closer to automatic activation. You’re essentially pre-programming a response so you don’t have to make a decision in the moment.
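
As a loose analogy only (a toy sketch of the logic, not a claim about cognitive architecture): an if-then plan behaves like a lookup table consulted before deliberation, so when the cue appears the response is already selected.

```python
# Toy analogy for implementation intentions: the plan pre-binds a cue to a response,
# so encountering the cue retrieves the action without in-the-moment deliberation.
if_then_plans = {
    "monday morning, coffee finished": "put on running shoes and go out the door",
    "offered dessert at dinner": "ask for fruit instead",
}

def deliberate(situation):
    return f"weigh options for '{situation}' (slow, effortful, easily derailed)"

def act(situation):
    planned = if_then_plans.get(situation)
    if planned:
        return planned                  # cue-triggered: no decision needed in the moment
    return deliberate(situation)        # no plan covers this, fall back to effortful weighing

print(act("offered dessert at dinner"))
print(act("tuesday evening, tired after work"))  # uncovered situation, so deliberation kicks in
```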

Evidence, and a cautionary tale about effect size inflation: Gollwitzer and Sheeran’s (2006) meta-analysis across 94 studies reported d = 0.65 for goal attainment, a medium-to-large effect that made implementation intentions look like one of the most powerful tools in behavioral science. That number has been cited thousands of times.

But the original meta-analysis was published as a book chapter in an edited volume (Advances in Experimental Social Psychology), not a standard journal article, and included no publication bias correction. Many included studies came from small lab samples, often from Gollwitzer’s own research group. As the literature has matured (with larger samples, independent replications, pre-registered designs, and domain-specific meta-analyses) the effect has shrunk considerably:

  • Physical activity: g = 0.31 post-intervention, declining to g = 0.24 at follow-up (Belanger-Gravel et al., 2013). A later RCT-only meta-analysis found d = 0.15, which was not statistically significant (da Silva et al., 2018).
  • Unhealthy eating reduction: d = 0.18 (Carrero, Vila & Redondo, 2019, 70 interventions)
  • Mental contrasting with implementation intentions (MCII): g = 0.34 before publication bias correction, dropping to g = 0.24 after trim-and-fill adjustment, with Egger’s test confirming significant funnel plot asymmetry (Wang, Wang & Gai, 2021)
  • Pre-registered field experiment on gym attendance (N = 877): A precisely estimated null effect. Participants engaged with their plans; the plans simply did not change behavior (Carrera et al., 2018).
  • Two large government-funded screening trials (N = 34,633 and N = 39,734): Both found that the implementation-intention-based planning tools had no significant effect on screening uptake (Wilding et al., 2020; Robb et al., 2025 [TEMPO trial, The Lancet]).

A 2024 update by Sheeran, Listrom, and Gollwitzer (the field’s own leading researchers) analyzed 642 independent tests and reported effects ranging from d = 0.27 to d = 0.66 depending on outcome type, with behavioral outcomes at the lower end.

The pattern is clear: the original d = 0.65 was derived from small lab studies and has not held up in larger, more rigorous, real-world tests. The realistic effect for health behaviors is approximately d = 0.15-0.31 (small by Cohen’s conventions) and for repeated behaviors like exercise, it is frequently null.

The critical boundary condition is one-time vs. repeated behaviors. Implementation intentions show their clearest effects for discrete, one-time actions with a narrow window of opportunity: getting a flu shot (+4.2 percentage points; Milkman et al., 2011), returning a screening kit, showing up to vote (+4.1 pp among those contacted; Nickerson & Rogers, 2010). For these fleeting-opportunity behaviors, pre-loading a cue-response link makes sense, you’re bridging a single moment of inaction.

But for repeated behaviors (exercise, diet, creative work) the picture is far weaker. When you can always do it “tomorrow,” the if-then trigger doesn’t carry the same urgency. And as the Milkman megastudy (2021, Nature) demonstrated with 61,293 gym members across 54 conditions (all of which included a planning baseline), only 8% of interventions produced effects that persisted after the four-week program ended. Experts’ predictions overestimated the actual effects by a factor of 9.1.

Bottom line: Implementation intentions are a real technique, not a mirage. They can help with discrete, one-time actions where the window is narrow. But the d = 0.65 headline figure is a textbook example of early effect size inflation, and for the repeated behavioral changes that most readers of this guide are trying to make, the evidence base is far weaker than the popular literature suggests. Use them as one tool among many, not as a silver bullet.

The BCT Taxonomy: A Shared Language for What We Do

Before leaving frameworks, it’s worth mentioning the Behaviour Change Technique Taxonomy v1 (Michie et al., 2013), which identified and standardized 93 distinct behavior change techniques organized into 16 groupings. These range from goal setting and self-monitoring to feedback, social support, habit formation, and reward.

The taxonomy solved an important problem: researchers and practitioners had been describing their interventions so vaguely that it was nearly impossible to compare them or figure out which “active ingredients” were actually driving results. Now there’s a shared language.

But having a taxonomy doesn’t tell you which techniques to use when. Ninety-three techniques is a lot of options, and the taxonomy doesn’t specify how they should be combined, sequenced, or matched to specific barriers. Think of it as a periodic table of behavior change elements, essential for cumulative science, but you still need chemistry to know which elements to combine.

Framework Comparison

Framework | Predictive Power | Intervention Effect | Replication | Best For
COM-B / BCW | N/A (diagnostic tool) | Moderate (utility studies) | N/A | First-pass barrier diagnosis
Fogg Behavior Model (B=MAP) | N/A (design tool) | N/A (limited RCT evidence) | N/A | Behavior design and initiation
TPB | R² = .19-.39 | δ = 0.50 (artifact-corrected) | Excellent (correlational) | Understanding predictors of behavior
SCT / Self-Efficacy | r = .30-.45 | d = 0.47 | Excellent | Building confidence and capability
SDT | r = .26 | Moderate | Excellent | Motivation quality and sustainability
Nudge / Choice Architecture | N/A | d = 0.01-0.02 (bias-corrected) | Poor (collapses under bias correction) | Administrative defaults; limited for actual behavior
Implementation Intentions | N/A | d = 0.65 (original); d = 0.15-0.31 (domain-specific) | Mixed (see below) | One-time actions; weaker for repeated behaviors
BCT Taxonomy v1 | N/A (classification) | N/A (identifies active ingredients) | N/A | Standardizing intervention content

A few things jump out from this table. First, the intervention effect sizes are in the same general range across the frameworks that replicate: δ = 0.50 for TPB (artifact-corrected) and d = 0.47 for self-efficacy, both solidly medium effects. Davis et al. (2015) reviewed 82 theories of behavior and behavior change and found that the most frequently recurring constructs across all of them were the same handful: self-efficacy, outcome expectations, intentions, reinforcement, social norms, and environmental context. The frameworks package these constructs differently, but they’re largely drawing from the same pool. Both COM-B and B=MAP identify motivation, ability, and environmental conditions as necessary for behavior. But the resemblance is surface-level. B=MAP translates directly into actionable design steps (shrink the behavior, anchor it to a prompt, work with motivation rather than against it), while COM-B identifies barrier categories without specifying what to do about them. This is the difference between a design framework and a classification system.

Second, notice that two intervention paradigms in this table have undergone significant evidentiary corrections. The original implementation intentions effect (d = 0.65) has shrunk to d = 0.15-0.31 in domain-specific meta-analyses of health behaviors, with pre-registered field experiments sometimes finding null effects. And nudge effects have largely collapsed: the d = 0.43 headline figure from Mertens et al. (2022) drops to d = 0.01-0.02 after publication bias correction (Maier et al., 2022). These corrections are not reasons to dismiss these techniques entirely, but they should calibrate your expectations. When a technique’s effect size drops by half or more under rigorous conditions, you’re looking at a useful tool, not a transformative one.

Third, the broader pattern here is sobering. DellaVigna and Linos (2022) compared 126 RCTs from two major US nudge units against published academic studies and found that real-world effects were roughly one-sixth the size of academic publication effects, a gap driven primarily by selective publication and underpowered studies. This isn’t unique to nudging; it’s a structural feature of behavioral science that should make you skeptical of any headline effect size that hasn’t been tested at scale.

And fourth, none of these frameworks alone is sufficient. Asking “is this a capability problem, an opportunity problem, or a motivation problem?” is useful, but you don’t need a formal framework to do that. Fogg’s B=MAP gives you a practical design process for behavior initiation. TPB and SCT help you understand predictors. SDT tells you about motivation quality. Implementation intentions can help bridge specific moments of inaction. The best practitioners draw from multiple frameworks rather than pledging allegiance to one, but they also recognize that the most important question, the one most frameworks skip, is whether the target behavior fits the person attempting it.

One gap that runs through nearly every framework in this table: none of them explicitly incorporate personality or stable individual differences, despite these being among the strongest and most consistent predictors of behavior (Polderman et al., 2015; Roberts & DelVecchio, 2000). COM-B folds personality into “motivation.” Fogg’s model doesn’t address it. TPB treats it as background noise. This is a significant blind spot. As the individual differences section of this guide documents, who the person is (their traits, preferences, and natural behavioral tendencies) powerfully shapes which behaviors are sustainable and which require permanent scaffolding. A framework that ignores this is working with an incomplete picture.

A note on my own work: I’ve developed a framework called the Behavioral State Model that attempts to address this gap by decomposing behavior into eight specific components (personality, perception, emotions, abilities, social status, motivations, physical environment, and social environment) rather than compressing them into three broad categories. It grew out of the observation, central to this guide, that personality and individual differences are absent from most behavior change frameworks despite being among the strongest predictors of what people actually do. I mention it here for transparency. The model has not yet been tested in peer-reviewed empirical studies, and I hold my own work to the same evidentiary standards I apply throughout this guide.


What Actually Works (And What Doesn’t)

The evidence base for behavior change is large. The honest picture is humbling.

If you’ve read the frameworks section above, you know that behavioral scientists have built increasingly sophisticated models of why people do what they do. But modeling behavior and changing it are different problems. And the evidence for changing it, durably, at scale, in real-world conditions, is more modest than most popular accounts suggest.

Here’s what the research actually shows, organized by strength of evidence.

Tier 1: Strong Evidence of Effectiveness

Environmental restructuring. The most durable behavior changes don’t try to change the person. They change the persistent environment around the person so that different behavior becomes the natural path.

Removing candy from the break room. Putting the stairs in front of the elevator. Redesigning a workflow so documentation happens inside the tool people already use. These are physical or structural changes to the environment that make desired behavior easier and undesired behavior harder. They work because they require no ongoing effort, insight, or motivation from the individual. The environment does the work.

This is the most important pattern in all of behavior change research: genuine environmental restructuring, where the person still acts but the context makes the right action easier, outperforms interventions that try to change the person.

A note on defaults: auto-enrollment in retirement plans (Madrian & Shea, 2001) and green energy opt-out defaults (Ebeling & Lotz, 2015) produced large participation increases. These are often cited as the crown jewels of behavioral science. But they deserve a critical distinction. In a default, the person doesn’t do anything. An administrative system acts on them. Calling this “behavior change” stretches the definition past the point of usefulness. It’s a policy change that produces an administrative outcome, not a change in what people consciously choose or do. Environmental restructuring that changes what people actually do (like putting healthier food at eye level in a cafeteria) is the real Tier 1 finding. Defaults that bypass people’s agency entirely are better understood as policy levers than behavior change.

Pharmacological interventions (where applicable). GLP-1 receptor agonists produce weight loss of 15-22% of body weight (Wilding et al., 2021; Jastreboff et al., 2022), 3-5x larger than behavioral interventions alone. Varenicline roughly doubles smoking quit rates (Cahill et al., 2013). These are real, large effects.

But the STEP 4 trial tells the rest of the story: when semaglutide was withdrawn, participants rapidly regained weight (Rubino et al., 2021). The drug didn’t “teach” people to eat differently. It altered their physiology while they took it, and when the alteration stopped, the behavior reverted.

This isn’t a knock on pharmacological interventions. It’s a clarification of their mechanism. They work through ongoing physiological alteration, not permanent behavior change. Chronic conditions require chronic intervention. That’s true for hypertension medication, and it’s true for GLP-1 agonists.

Contingency management. If you provide tangible reinforcement (money, vouchers, prizes) contingent on verified behavior (like drug-free urine samples), you get the most effective psychosocial treatment for stimulant use disorders (d = 0.42; Prendergast et al., 2006; De Crescenzo et al., 2018). This approach outperforms CBT, community reinforcement, and twelve-step facilitation for stimulant addiction.

The mechanism is straightforward: external reinforcement contingent on objectively verified behavior. It doesn’t require insight, attitude change, or motivation. It works while the contingencies operate.

The catch is equally straightforward: when the contingencies are removed, effects typically diminish. The behavior was maintained by external reinforcement, not internal change. Like pharmacotherapy, contingency management may need to be long-term for chronic conditions, which creates obvious challenges for how treatment is funded and structured.

Tier 2: Moderate Evidence

Self-monitoring. In meta-analyses, self-monitoring shows up more consistently than any other single technique as a component of effective interventions. Michie et al.’s (2009) meta-regression found that interventions including self-monitoring combined with another self-regulation technique produced d = 0.42, versus d = 0.26 without. Harkin et al. (2016) found d = 0.40 for self-monitoring of goal progress specifically. That said, “most effective on average” is exactly the kind of claim this guide cautions against. Self-monitoring works well for people and behaviors where the feedback loop matters (diet, spending, physical activity). It is less relevant for behaviors where the barrier is not awareness of the gap but capability, fit, or environmental structure.

Why does tracking what you eat, how much you move, or how you spend your money actually work? The answer comes from control theory (Carver & Scheier, 1982): self-monitoring creates a feedback loop. It closes the gap between what you’re actually doing and what you intended to do. Without monitoring, that gap is invisible; you drift without noticing. With monitoring, the discrepancy becomes salient, and your self-regulatory system kicks in to reduce it.

This is why self-monitoring is especially powerful when combined with goal setting and feedback: goals define the target, monitoring reveals the current position, and feedback highlights the gap. It’s a complete control loop.
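
To make the control loop concrete, here is a minimal sketch in Python. Everything in it (the weekly goal, the logging format, the threshold for a message) is a hypothetical illustration rather than anything prescribed by the cited literature; the point is only that a goal, monitoring, and feedback together produce a salient discrepancy signal.

```python
# Minimal control-loop sketch of self-monitoring (illustrative only).
# The goal defines the target, monitoring supplies the current position,
# and feedback is the salient discrepancy between the two.

def weekly_discrepancy(goal_per_week: float, logged_sessions: list[float]) -> float:
    """Return the gap between intended and observed behavior for one week."""
    observed = sum(logged_sessions)
    return goal_per_week - observed

def feedback_message(gap: float) -> str:
    """Turn the discrepancy into the kind of salient feedback the loop needs."""
    if gap <= 0:
        return "On target: no corrective action needed."
    return f"Behind by {gap:.1f} sessions this week: schedule the next one now."

# Example: a goal of 3 workouts, 2 logged so far this week.
print(feedback_message(weekly_discrepancy(3, [1, 1])))
```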

The limitation is that self-monitoring degrades when monitoring stops. If you stop tracking your food intake, the feedback loop breaks. The behavior often drifts back. But while active, self-monitoring is one of the most reliable tools we have.

Implementation intentions. Forming specific “if-then” plans can help bridge the gap between intending to act and actually acting. The original meta-analytic estimate of d = 0.65 (Gollwitzer & Sheeran, 2006) has not held up under scrutiny; domain-specific meta-analyses find d = 0.15-0.31 for health behaviors, and pre-registered field experiments have found null effects for repeated behaviors like exercise (Carrera et al., 2018). The technique is most useful for discrete, one-time actions (scheduling a screening, getting a flu shot) and least useful for ongoing behavioral patterns. If you already want to change but keep failing to follow through on a specific, time-bound action, implementation intentions can help. For sustained behavioral change, other tools are more important.

Motivational interviewing. A client-centered counseling approach that produces d = 0.22 across outcomes (Lundahl et al., 2010). Most effective for alcohol use, least for smoking and diet. Motivational interviewing works better as a catalyst for change initiation than as a maintenance strategy; it helps people find their own reasons for changing, but finding reasons is only the first step.

Combined interventions. The best outcomes typically come from combining multiple approaches. Pharmacotherapy plus behavioral counseling for smoking cessation outperforms either alone. Self-monitoring plus feedback plus goal setting outperforms self-monitoring alone. This makes intuitive sense: behavior is multiply determined, so interventions that address multiple determinants simultaneously tend to do better than those targeting a single lever.

Tier 3: Weak or Inconsistent Evidence

Information and education alone. The assumption that people behave poorly because they lack knowledge (the “information deficit model”) is one of the most persistent and well-refuted ideas in behavioral science. Health education alone produces consistently small or null effects on behavior.

You already know that exercise is good for you. You already know that vegetables are healthier than chips. You already know you should save for retirement. The problem was never information. Telling people facts they already know, with more conviction, does not produce behavior change. It produces head-nodding.

Fear appeals without efficacy. Tannenbaum et al. (2015) found only a small positive effect (d = 0.15), and only when fear messages were paired with information about how to reduce the threat. Fear appeals without accompanying efficacy information (“smoking kills, and there’s nothing easy you can do about it”) can trigger denial and avoidance rather than action. Scaring people is not a behavior change strategy. Scaring people and giving them a clear, doable response path is a modest one.

Sustained counter-dispositional behavior. This one cuts deep. Acting against your personality disposition, such as an introvert forcing themselves to network aggressively or a spontaneous person forcing rigid routines, produces immediate mood gains but delayed fatigue (Leikas & Ilmarinen, 2017). Leikas and Ilmarinen found this fatigue effect was universal: it was not moderated by personality traits, meaning extraverts and introverts were equally affected. And when Jacques-Hamilton, Sun, and Smillie (2019) ran a week-long randomized trial, dispositional introverts who were instructed to “act extraverted” also experienced increased negative affect and feelings of inauthenticity, costs that partially or fully offset the mood gains.

The implication: behavior change strategies that require people to chronically act against their nature face a fundamental sustainability problem. The energetic cost of swimming against your own temperament is real, measurable, and cumulative. This is why the most durable behavior changes either align with existing dispositions or modify the environment rather than requiring ongoing effortful self-regulation against type.

One-shot programs without follow-up. The fade-out effect is one of the most consistent findings in intervention science. Initial gains from interventions typically decline by 50% or more within 1-2 years (Bailey et al., 2017). IQ gains from early childhood interventions fade to near zero within several years (Protzko, 2015). Physical activity effects decay similarly (Fjeldsoe et al., 2011).

The programs that maintain gains? They’re the ones where the environmental change persists: permanently smaller class sizes, ongoing medication, physically restructured environments. One-shot programs for complex behaviors rarely produce durable change, because they’re essentially temporary perturbations of a system that has a strong tendency to return to baseline.

The Timing Window Most People Miss

There’s one evidence-based insight that deserves more attention than it gets: habit discontinuity.

Research by Verplanken and Wood (2006) and Wood, Tam, and Witt (2005) shows that life transitions (moving to a new city, starting a new job, going through a divorce, having a child) create natural windows where old habits are disrupted and new ones form more easily. During these transitions, the contextual cues that maintained old habits are gone, and behavior becomes temporarily more deliberate and malleable.

This is profoundly underused in intervention design. Most behavior change programs are delivered at arbitrary times, with no consideration of whether the person is in a transition window. But the evidence suggests that the same intervention delivered during a life transition may be substantially more effective than one delivered during a period of contextual stability.

If you’re designing an intervention, ask: can we reach people at moments when their habits are already disrupted? New employee onboarding, post-hospital discharge, the first weeks after a move: these are the moments when people are most receptive to forming new patterns, because the old patterns have already been broken by circumstances.

The Evidence Hierarchy

Across all domains, the interventions with the strongest evidence follow a consistent pattern:

Rank | Approach | Mechanism | Durability
1 | Environmental restructuring | Persistent context change (person still acts, but context favors the right action) | High (as long as environment stays changed)
2 | Pharmacological / physiological interventions | Ongoing biological alteration | Only while treatment continues
3 | Self-monitoring + feedback | Feedback loops (control theory) | Moderate (degrades when monitoring stops)
4 | Implementation intentions | Environmental cueing via if-then plans | Moderate for one-time actions; weak-to-null for repeated behaviors
5 | Social norms and accountability | Social pressure and identity | Moderate but can backfire
6 | Nudges (informational, framing) | Priming, anchoring, salience | Near-zero after publication bias correction
7 | Information, education, persuasion | Knowledge transfer | Weak (rarely sufficient alone)

A caveat before reading too much into the rankings: these are averages across heterogeneous studies and populations. As the Matching Framework section will argue, the most important determinant of whether a technique works is the fit between the technique, the person, and the specific behavioral barrier. Self-monitoring is powerful when the barrier is a gap between intention and awareness. It is irrelevant when the barrier is capability or environmental structure. No technique is universally “best.” The ranking reflects the meta-analytic record, not a prescription for every case.

That said, one pattern does emerge clearly, and it aligns with a principle that runs through this entire guide: interventions that change the persistent environment outperform interventions that try to change the person. Removing the candy from the break room beats telling people about the glycemic index. Redesigning the workflow beats running a training session.

This isn’t because people are dumb or lack willpower. It’s because the environment exerts a constant, effortless influence on behavior, while personal effort is a finite resource. The best behavior change strategies work with this reality rather than against it.

Note what’s near the bottom of the hierarchy. After correcting for publication bias, nudges (informational framing, social norms messages, calorie labels) show effects close to zero (Maier et al., 2022). Administrative defaults produce measurable enrollment changes but shouldn’t be confused with behavior change, since the person hasn’t actively done anything differently.

The practical question, then, is not “how do I motivate this person to be different?” It’s “how do I redesign the environment so the desired behavior becomes the path of least resistance?”


Why Behavior Change Is So Hard

Most behavior change books skip this chapter. They go straight to the tips, the frameworks, the five-step plans. That’s like handing someone a map without mentioning that the terrain is mostly quicksand.

Here’s the thing: if behavior change were as simple as the bestsellers suggest, the self-help industry would have put itself out of business decades ago. It hasn’t, because changing behavior is genuinely, structurally difficult, and the reasons are more interesting (and more useful) than “you just need more willpower.”

Let me walk you through what actually gets in the way.

1. The Fade-Out Effect

You know that January motivation high? It has a half-life. And it’s shorter than you think.

Bailey et al. (2017) reviewed educational interventions and found that initial effects (d = 0.2–0.4) typically declined by 50% or more within 1–2 years. Protzko (2015) found that IQ gains from childhood interventions faded to near zero. Physical activity effects show the same decay pattern (Fjeldsoe et al., 2011).

This isn’t a failure of any particular program. It’s the base rate. The default trajectory for any behavior change intervention is: gains emerge, gains peak, gains erode.
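
As a rough illustration of what that base rate implies, here is a back-of-the-envelope projection that assumes exponential decay with an 18-month half-life, loosely consistent with “declines by 50% or more within 1–2 years.” The numbers are illustrative, not a fitted model of any particular study.

```python
# Illustrative fade-out projection (not a fitted model): assume an initial
# effect decays exponentially with a half-life of roughly 18 months.

def projected_effect(initial_d: float, months_elapsed: float, half_life_months: float = 18.0) -> float:
    return initial_d * 0.5 ** (months_elapsed / half_life_months)

for months in (0, 12, 24, 36):
    print(months, round(projected_effect(0.35, months), 2))
# 0 -> 0.35, 12 -> ~0.22, 24 -> ~0.14, 36 -> ~0.09
```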

The programs that actually maintain gains have one thing in common: the environmental change persists. Permanently smaller class sizes. Ongoing medication. Physically restructured environments that stay restructured. The change lives in the system, not in the person’s willpower reserves.

The implication is blunt: any intervention that relies on ongoing effort, motivation, or scaffolding will see gains erode when those supports are removed. This is not a failure of implementation. It is the expected outcome.

2. Effect Sizes Shrink from Lab to Field

DellaVigna and Linos (2022) documented one of the most important findings for anyone working in applied behavioral science, and one of the least discussed at conferences.

The same types of interventions that produce average effects of 8.7 percentage points in academic publications produce average effects of just 1.4 percentage points when implemented at scale by nudge units.

That’s roughly a six-fold reduction.

The reasons: publication bias (journals favor significant results), less controlled conditions in the real world, and implementation fidelity losses. Researchers test interventions on motivated samples in pristine conditions, then publish only the winners. Practitioners deploy those same ideas on messy populations in messy contexts, and the magic evaporates.

This doesn’t mean interventions are worthless at scale. A 1.4 percentage point effect applied to millions of people can produce large aggregate impact at near-zero marginal cost. But if you’re planning based on the published effect size, you’re budgeting with someone else’s money. Discount lab results substantially before projecting real-world impact.
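
A minimal sketch of that discounting, using the DellaVigna and Linos averages as the discount factor and a hypothetical pilot result; the specific pilot effect and population size are placeholders for your own numbers.

```python
# Back-of-the-envelope field discount (illustrative): plan on the
# at-scale estimate, not the published one.

published_effect_pp = 8.7   # average academic publication effect (percentage points)
at_scale_effect_pp = 1.4    # average nudge-unit effect at scale (DellaVigna & Linos, 2022)
field_discount = at_scale_effect_pp / published_effect_pp  # ~0.16, i.e. roughly one-sixth

pilot_effect_pp = 11.0       # hypothetical pilot result for your own intervention
reached_population = 500_000

conservative_additional_actions = reached_population * (pilot_effect_pp * field_discount) / 100
print(round(field_discount, 2), int(conservative_additional_actions))
# ~0.16 discount; ~8,850 additional actions rather than the 55,000 the pilot implies
```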

3. Selection Masquerades as Transformation

Many apparent “behavior change successes” are actually selection effects in disguise. The people who join weight loss programs, exercise regimens, or smoking cessation programs are not random draws from the population. They’re more motivated, more likely to have social support, and may already have higher baseline capacity for change.

The Finnish Twin Cohort studies (Kujala et al., 1998; 2002) make this vivid. Population-level observational studies show large health benefits from regular exercise, hazard ratios of 0.5–0.7 versus sedentary individuals. Impressive. But look at discordant twin pairs (one twin exercises, the other doesn’t, shared genetics controlled for) and the mortality benefit is substantially attenuated. A large portion of the apparent “exercise benefit” reflects genetic confounding: the same factors that make people inclined to exercise also make them healthier.

Similarly, Macnamara, Hambrick, and Oswald (2014) found that deliberate practice explains only 26% of variance in games, 21% in music, 18% in sports, 4% in education, and less than 1% in professions. The “10,000 hours” narrative implies that practice is nearly everything. The data say it’s a piece, sometimes a small piece.

Before crediting an intervention, always ask: who got in, who stayed, and who dropped out? If you only measure the people who completed the program, you’re measuring the survivors, not the intervention.

4. Traits Are Stickier Than Interventions Assume

The cross-lagged panel model literature has delivered a critical methodological correction that most practitioners still haven’t absorbed.

Hamaker et al. (2015) demonstrated that traditional longitudinal models confound stable between-person differences with within-person change. When more appropriate models (like the Random Intercept Cross-Lagged Panel Model) are applied, many previously reported “causal” effects shrink dramatically or disappear (Orth et al., 2022).

What does that look like in practice? Take the claim that “building self-esteem reduces depression.” Sounds dynamic, sounds like something you can intervene on. But when you separate the stable component from the changing component, the picture shifts. People who have chronically higher self-esteem also tend to have chronically lower depression, not because one causes the other moment-to-moment, but because both reflect stable underlying dispositions.

The dynamic causal story is often a stable trait difference wearing a causal costume.

5. Counter-Dispositional Behavior Has Real Costs

This might be the most underappreciated finding in the entire behavior change literature: faking it has a hidden price tag.

Leikas and Ilmarinen (2017) found that acting more extraverted produced immediate positive affect but was followed by increased fatigue three hours later. Critically, this fatigue effect was not moderated by personality traits: introverts and extraverts were equally affected. The delayed cost of extraverted behavior appears to be universal, not specific to introverts. Jacques-Hamilton, Sun, and Smillie (2019) extended this in a week-long randomized controlled trial and found that dispositional introverts instructed to act extraverted additionally experienced increased negative affect and inauthenticity that partially or fully offset the mood gains.

The benefits are front-loaded. The costs are back-loaded. In the moment, counter-dispositional behavior feels productive. The bill arrives later.

This has profound implications for program design. Behavior change interventions that require sustained counter-dispositional behavior (asking introverts to network aggressively, asking low-conscientiousness individuals to maintain rigid schedules, asking high-openness people to follow inflexible protocols) are fighting against the person’s natural energetic equilibrium. The further a target behavior sits from someone’s dispositional mean, the more effort it requires, the greater the accumulating costs, and the less likely the behavior is to stick.

6. Relapse Is Learning, Not Moral Failure

Here’s something most behavior change programs get completely wrong: they treat relapse as a sign that the intervention failed or the person is weak. Neither is true.

Old behavior traces are never fully erased. They’re suppressed. Your brain doesn’t delete the neural pathways for smoking when you quit. It builds new pathways that inhibit the old ones. But under stress, fatigue, or a context shift back to familiar territory, those old pathways reactivate. In learning science, this is called spontaneous recovery: the reappearance of a previously extinguished behavior when conditions change.

This is not a moral failure. It’s how learning works. Extinction doesn’t erase the original association; it creates a new, competing association. The old one is always there, waiting for the right trigger.

That’s why the alcoholic who’s been sober for five years walks into the bar where they used to drink and feels the pull. The context re-activates the suppressed behavior trace. It’s not weakness. It’s neuroscience.

The practical implication: design for relapse. Expect it. Build recovery protocols into the intervention from day one. The question isn’t “how do we prevent relapse?” It’s “how do we make recovery from relapse fast and automatic?”

7. Chronic Conditions Require Chronic Intervention

Weight loss, addiction, depression, and most behavioral health challenges are better modeled as chronic conditions requiring ongoing management than as problems solvable by a time-limited intervention.

The evidence is consistent:

  • Weight loss: Average maintained loss drops to 3–4% at 4 years (Franz et al., 2007). GLP-1 agonists produce large losses but rapid regain when discontinued (Rubino et al., 2021).
  • Addiction: 1-year relapse rates are 40–60% for alcohol, 80–90% for opioids without medication, 70–80% for stimulants (McLellan et al., 2000).
  • Depression: 50%+ relapse within 2 years of treatment termination.

McLellan et al.’s key insight is that relapse rates for addiction are comparable to non-adherence rates for other chronic conditions (diabetes: 30–50%; hypertension: 50–70%). We don’t call a diabetic a failure for needing ongoing insulin. We shouldn’t call a person struggling with addiction a failure for needing ongoing support.

The evidence for “cure” via behavioral intervention is weak. The evidence for “management” via sustained support is much stronger. If your program has a graduation date, ask yourself what happens the day after.


Evidence Standards: How to Tell Signal from Story

Before you believe any behavior change claim (including mine), run these checks.

The central question is not “Did the metric move?” but “What caused the metric to move, and will it persist?” Most of what passes for evidence in behavior change wouldn’t survive a week of scrutiny in a well-run product analytics team.

Here’s how to separate signal from story.

The 5 Quick Checks (Pocket Card)

Keep these five questions loaded. Use them on every claim, every case study, every “we increased engagement by 40%” pitch:

  1. Selection: Who entered, who dropped out, and who remained?
  2. Counterfactual: Compared with what baseline or control?
  3. Time horizon: Did the effect hold after the novelty window?
  4. Transportability: Will this effect shrink in real-world deployment?
  5. Behavioral outcome: Did behavior change, or just attitudes and self-report?

If you can’t get clear answers to all five, the result is marketing, not evidence.

1. Separate Treatment Effects from Selection Effects

A gym reports that “members who attend 3x/week lose an average of 15 pounds.” But members who attend 3x/week are a self-selected group: they were probably more motivated and healthier to begin with.

Many intervention wins are partly compositional: higher-capacity people opt in, lower-capacity people churn out, and the post-period average rises even if treatment impact is modest.

At minimum, report:

  • Entry filter: Who qualified and who was excluded before treatment began?
  • Retention profile: Who stayed, who dropped, and how outcomes differ across those groups?
  • Intent-to-treat estimate: What is the effect when all assigned participants are counted?
  • Sensitivity checks: Do conclusions hold under plausible missing-data assumptions?

Selection check: Before crediting transformation, ask who got in, who stayed, and who left.
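
A toy example of why this matters, with fabricated numbers: the completers-only average looks far better than the intent-to-treat average, even though both describe the same program.

```python
# Toy illustration of a selection effect (fabricated numbers, for intuition only):
# completers-only analysis flatters the program relative to intent-to-treat.

enrolled = [
    # (completed_program, weight_change_lbs)
    (True, -15), (True, -12), (True, -10), (True, -8),
    (False, -2), (False, 0), (False, +1), (False, -1),
]

completers = [chg for done, chg in enrolled if done]
all_assigned = [chg for _, chg in enrolled]

print("Completers-only average:", sum(completers) / len(completers))      # -11.25
print("Intent-to-treat average:", sum(all_assigned) / len(all_assigned))  # -5.875
```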

2. Separate Within-Person Change from Stable Between-Person Differences

A study claims that “increasing gratitude journaling improved well-being over 12 months.” But people who journal more might simply be the kind of people who have higher well-being to begin with.

Traditional cross-lagged models mix up two different things:

  • Between-person differences: Person A is usually higher/lower than Person B
  • Within-person change: The same person actually changed over time

Modern methods separate these two signals (Hamaker et al., 2015; Orth et al., 2022). When they are separated, many earlier “causal” claims shrink or vanish.

If your design cannot cleanly distinguish trait-like stability from within-person change, downgrade causal claims accordingly.
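
A minimal sketch of the underlying move, with hypothetical well-being scores: split each person’s repeated measures into a stable person-level mean (the between-person component) and deviations from that mean (the within-person component), and reserve causal language for the deviations.

```python
# Sketch of separating between-person level from within-person change
# (hypothetical well-being scores, three waves per person).

scores = {
    "A": [7.0, 7.5, 7.2],   # chronically high well-being
    "B": [3.0, 3.4, 3.1],   # chronically low well-being
}

for person, waves in scores.items():
    person_mean = sum(waves) / len(waves)          # stable between-person component
    deviations = [w - person_mean for w in waves]  # within-person change around that level
    print(person, round(person_mean, 2), [round(d, 2) for d in deviations])

# Causal claims about change should rest on the deviations, not on the fact
# that person A is always higher than person B.
```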

3. Apply Transportability Discounts from Lab to Field

A university study finds that a text-message nudge increased flu vaccination by 11 percentage points. A state health department deploys the same nudge and gets 1.2 percentage points.

Average effects shrink when interventions move from controlled contexts to real-world systems. DellaVigna and Linos (2022) found average nudge effects around 8.7 percentage points in academic publications versus 1.4 percentage points at scale.

Use a “field discount” when planning:

  • Budget impact on conservative, field-realistic effect sizes
  • Expect heterogeneity by subgroup and implementation quality
  • Treat pilot effects as upper bounds, not forecasts

4. Aggregate Behavior Before You Infer Traits or Capacity

A manager concludes an employee “isn’t detail-oriented” based on one sloppy report. But that employee was sleep-deprived that week and is meticulous 90% of the time.

Single observations are noisy. Aggregating behavior across occasions and contexts raises signal quality and often materially increases predictive validity (Epstein, 1979; Epstein, 1983).

That means:

  • Prefer repeated measures over one-time assessments
  • Evaluate trajectories, not point estimates
  • Avoid claims based on anecdotal snapshots
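
A small simulation makes the aggregation point concrete. The numbers below are arbitrary (a standard-normal trait plus heavy situational noise), but the pattern is the one Epstein documented: the single-occasion correlation is modest, and the aggregated correlation is substantially higher.

```python
# Simulation of Epstein-style aggregation (illustrative): averaging noisy
# single-occasion observations raises the trait-behavior correlation.

import random
import statistics

random.seed(0)

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys) * len(xs))

n_people, n_occasions = 500, 10
traits = [random.gauss(0, 1) for _ in range(n_people)]
# Each occasion: behavior = trait + lots of situational noise.
occasions = [[t + random.gauss(0, 2) for _ in range(n_occasions)] for t in traits]

single = [occ[0] for occ in occasions]
aggregated = [statistics.mean(occ) for occ in occasions]

print("single-occasion r:", round(pearson(traits, single), 2))      # roughly .45
print("aggregated r:     ", round(pearson(traits, aggregated), 2))  # roughly .85
```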

5. Report Measurement Like an Operator, Not a Marketer

“Our wellness program improved employee health” means nothing without knowing: whose health, measured how, over what period, compared to what.

For each primary outcome, specify:

  • Denominator: Who is in scope?
  • Time window: Over what exact interval?
  • Cohort logic: Which entry cohorts are compared?
  • Decision threshold: What effect size would change a decision?

If these are missing, the result may be persuasive, but it is not decision-grade.
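
One way to enforce this is to refuse to record an outcome until all four fields are filled in. A minimal sketch, with hypothetical field names and example values:

```python
# A minimal reporting spec (hypothetical field names and values) that forces
# the four questions to be answered before a result is treated as decision-grade.

from dataclasses import dataclass

@dataclass
class OutcomeSpec:
    denominator: str          # who is in scope
    time_window: str          # over what exact interval
    cohort_logic: str         # which entry cohorts are compared
    decision_threshold: str   # what effect size would change a decision

wellness_claim = OutcomeSpec(
    denominator="All employees enrolled as of Jan 1, including later dropouts",
    time_window="Weeks 1-26 after enrollment",
    cohort_logic="Q1 enrollees vs. matched Q1 non-enrollees",
    decision_threshold="At least 2 additional activity sessions/week sustained at 6 months",
)
print(wellness_claim)
```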

6. Interpret Heritability Correctly

“Intelligence is 80% heritable” does NOT mean 80% of your intelligence is genetic. It means that 80% of the variation in intelligence across a population is associated with genetic differences, in that population, at that time.

Heritability is:

  • A population-level variance estimate, not an individual destiny score
  • Context-dependent (can vary by environment and cohort)
  • Compatible with meaningful environmental effects, especially when environments change opportunity structure (Tucker-Drob & Bates, 2016)

The right frame is constraints and affordances, not determinism.

7. Build Falsification Into the Intervention Plan

If you can’t say what would convince you the program isn’t working, you don’t have an intervention, you have a religion.

Before launching, define failure conditions:

  • What pattern would count as “no effect”?
  • What level of fade-out makes continuation unjustified?
  • What subgroup harm triggers redesign or shutdown?

Most behavior programs fail slowly because they never define failure upfront.

Minimum Reporting Standard

Element | Required Question | Why It Matters
Assignment logic | How were people assigned/exposed? | Prevents hidden selection bias
Attrition | Who dropped and why? | Detects survivorship distortion
Outcome definition | What exact behavior changed? | Avoids proxy theater
Time horizon | Immediate, 3-month, 12-month effects? | Captures durability vs novelty
Counterfactual | Compared with what baseline/control? | Anchors causal inference
Transportability | Where should effect sizes be discounted? | Improves real-world forecasting

The Matching Framework: Fit Before Force

Most behavior change asks: “How do we get people to do X?”

Better question: “What version of X fits who these people actually are?”

That question changes everything. It shifts you from forcing a predetermined behavior onto a population to designing behaviors that a population can actually sustain. And the evidence overwhelmingly says: the second approach wins.

The Core Thesis

Most behavior change frameworks start with a target behavior and then search for ways to push people toward it, through motivation, incentives, nudges, willpower training, you name it. The matching framework flips the sequence. It starts with the person (their dispositions, their context, their constraints) and asks what behavior fits.

This isn’t defeatism. It’s engineering. An engineer doesn’t curse gravity; they design for it. Matching does the same with human nature.

This approach is formalized in the Behavioral State Model, which decomposes the determinants of behavior into eight specific components: personality, perception, emotions, abilities, social status, motivations, physical environment, and social environment. Where frameworks like COM-B compress all of these into three broad categories (losing critical diagnostic resolution in the process), and B=MAP focuses on the mechanics of behavior initiation, the BSM is built around the question that drives this entire section: what is the fit between this specific person and this specific behavior? The model’s central concept, Behavior Market Fit, is the behavioral analogue of product-market fit. Just as a product succeeds when it meets a real need in a way the market actually wants, a behavior sustains when it meets a real goal in a way the person can actually perform. The eight-component decomposition exists to make that fit assessment precise rather than impressionistic.

The logic for the matching approach rests on five converging lines of evidence, and each one adds a brick.

Five Evidence Lines: Building the Case

1. Stable individual differences are real and consequential.

Personality traits are 40–60% heritable (Polderman et al., 2015), with multi-rater estimates reaching .66–.79 (Riemann et al., 1997). And heritability doesn’t decline with age; it increases, from about 41% at age 9 to 66% at age 17 for cognitive ability (Haworth et al., 2010), approaching .80 in adulthood (Bouchard, 2013).

Why? Because as people gain autonomy, they increasingly select environments that match their genetic predispositions. This amplifies individual differences rather than dampening them. The world doesn’t sand people down into sameness. It gives them room to become more themselves.

Rank-order personality stability plateaus at approximately .74 in middle adulthood (Roberts & DelVecchio, 2000). While modest mean-level changes occur (people become slightly more agreeable and conscientious with age), the rankings hold: the least conscientious person in a cohort at 20 tends to remain the least conscientious at 50 (Roberts, Walton, & Viechtbauer, 2006).

2. Environment amplifies more than it invents.

Turkheimer’s (2000) three laws of behavior genetics: (1) all human behavioral traits are heritable, (2) the effect of shared family environment is smaller than the effect of genes, (3) a substantial portion of behavioral variation is not accounted for by genes or families.

Plomin and Daniels (1987) showed that for most psychological traits, the environmental variance is overwhelmingly nonshared: unique to each individual, not common to family members. Kong et al. (2018) demonstrated “genetic nurture”: parental genes that aren’t transmitted to the child still influence outcomes through the environment parents create. Nature and nurture aren’t separable because genes shape the environments that shape behavior.

The practical implication: environments are enormously important, but they work by magnifying or suppressing existing tendencies rather than installing new ones from scratch. You can give everyone in a school the same curriculum, but the students who emerge from it will be more different from each other, not less.

3. Person-environment fit predicts outcomes with large effect sizes.

The Kristof-Brown, Zimmerman, and Johnson (2005) meta-analysis found that person-job fit correlated with job satisfaction at r = .56 and organizational commitment at r = .47. Person-organization fit predicted satisfaction at r = .44 and commitment at r = .51.

Read those numbers again. These effect sizes far exceed what most behavior change interventions achieve. And they come not from changing people, but from putting the right people in the right environments.

Vocational interests are as heritable as personality traits, as stable across adulthood, and predict occupational choice, persistence, and satisfaction with effect sizes as large or larger than cognitive ability (Rounds & Su, 2014). General mental ability tests predict job performance at r = .31–.51 across all job types (Schmidt & Hunter, 1998; Sackett et al., 2022).

The message is consistent: fit predicts outcomes better than forcing does.

4. Counter-dispositional behavior has real, measurable costs.

As documented above, acting against your disposition produces immediate benefits but delayed fatigue (Leikas & Ilmarinen, 2017) and, for introverts acting extraverted, reduced authenticity and increased negative affect (Jacques-Hamilton et al., 2019). Sustainability is inversely related to dispositional distance. The further you push from someone’s baseline, the higher the ongoing tax.

5. People naturally self-select matching environments.

Active gene-environment correlation (the tendency of people to seek environments that match their genetic predispositions) is the mechanism behind the Wilson Effect (increasing heritability with age). Left to their own devices, people naturally gravitate toward fit. The matching framework doesn’t fight this tendency; it accelerates and optimizes it.

When someone finds work they love, a social group that fits, a routine that feels natural, they didn’t change who they are. They found where who they are works well. Matching is the science of doing that deliberately rather than by accident.

Traits Create Ranges, Not Destinies

Let me be precise here, because this is where people get twitchy. Matching is not determinism. Heritability does not mean immutability.

What the evidence establishes is that genetic endowment sets a reaction range: a bounded zone of likely outcomes. Environment determines where within that range a person falls.

A person at the 20th percentile of extraversion may, with effort and the right context, function at the 35th percentile, but they will not become a 90th-percentile extravert. Interventions within the range are realistic. Interventions requiring someone to leave their range are not.

Fleeson’s (2001) Whole Trait Theory provides the nuanced framework. Using experience sampling, Fleeson showed that each person’s moment-to-moment behavior varies around a stable personal mean. An introvert can act extraverted at a party. They have real flexibility. But their average level of expressed extraversion is stable, and acting at the extremes of their distribution requires effort and carries costs (Fleeson & Jayawickreme, 2015).

Think of it as a rubber band. You can stretch it. But it has a resting state it keeps returning to, and the further you stretch, the more force it takes and the faster it snaps back.

Importantly, averages can move even when rank order stays fairly stable. Schooling reforms can raise average measured cognitive performance while relative differences remain similar (Brinch & Galloway, 2012; Roberts & DelVecchio, 2000). That’s exactly what a constraints-and-affordances model predicts: you can raise the floor and shift the whole distribution without changing who’s where within it.

This is a probabilistic, design-oriented view. Traits shift probabilities, not certainties. Opportunity structure can widen or narrow the range of viable outcomes. Chronic underperformance is often a system-person mismatch, not a character defect. If someone is consistently struggling, your first question should be about fit, not about effort.

Heterogeneity of Treatment Effects: Why Averages Lie

Here’s something that should change how you think about every behavior change study you’ve ever read: different people respond differently to the same intervention.

This sounds obvious when you say it out loud. But the entire behavior change field is built on reporting average treatment effects: a single number that summarizes what happened across all participants. That average hides enormous variation.

In any intervention study, some people improve dramatically. Some don’t change at all. Some actually get worse. The average effect is a fiction that describes no one in particular.

This is the heterogeneity of treatment effects problem, and it’s why so many evidence-based programs produce disappointing results in practice. The program worked, on average, in the study. But your population isn’t the study’s population, and the people you’re trying to reach may be precisely the ones for whom the average effect doesn’t apply.

Matching is designing for this heterogeneity. Instead of deploying one intervention and hoping the average holds, you segment your population, understand the variation, and design different paths for different people. It’s the difference between prescribing the same dose of the same drug to every patient and actually doing diagnostics first.

The average treatment effect is useful for journal publications. For practice, you need to know for whom and under what conditions the effect holds. That’s what matching gives you.
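
A toy example with fabricated numbers shows how this plays out: the average treatment effect is positive, while one subgroup improves substantially and the other gets slightly worse.

```python
# Toy example (fabricated numbers) of how an average treatment effect can
# hide opposite responses in different subgroups.

results = [
    # (subgroup, change_in_outcome)
    ("high_baseline_motivation", +8), ("high_baseline_motivation", +6),
    ("low_baseline_motivation", -1), ("low_baseline_motivation", -3),
]

overall = sum(chg for _, chg in results) / len(results)
print("Average treatment effect:", overall)  # +2.5: describes no one in particular

by_group = {}
for group, chg in results:
    by_group.setdefault(group, []).append(chg)
for group, changes in by_group.items():
    print(group, sum(changes) / len(changes))  # +7.0 vs -2.0
```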

Situations vs. Traits: Correcting the Record

A common claim in behavioral science is that “situations are as powerful as traits” because both show correlations with behavior around r = .20–.30. This comparison gets repeated at conferences, in textbooks, and in TED talks. And it is misleading for three reasons.

1. Temporal scope: snapshot vs. lifetime.

Situational effects are measured immediately after a manipulation. You prime someone, you measure the effect, you write the paper. Trait effects operate across all situations in a person’s life, every day, every context, every year. A trait’s per-occasion effect may be small, but its cumulative effect over a lifetime is enormous. Comparing a trait correlation from a single observation to a situational effect from a single observation is like comparing a day’s interest on a savings account to a day’s return on a slot machine. The slot machine looks better on any given day. The savings account wins over 30 years (Funder & Ozer, 2019).

2. Aggregation: noise vs. signal.

Epstein (1979, 1983) showed that when behavior is aggregated across multiple occasions (averaging out situational noise) trait-behavior correlations rise to r = .40–.60, substantially higher than situational effects. Single behavioral observations are noisy. Averaged behavior reveals the true predictive power of traits. Comparing a single-occasion trait correlation to a single-occasion situation effect and calling them “equal” is a measurement artifact, not a scientific finding.

3. Durability: fade vs. persist.

Situational effects typically fade. That’s what the entire fade-out literature documents. Trait effects, by definition, persist. Even small trait effects (r = .10–.20) compound to very large cumulative consequences over a lifetime (Funder & Ozer, 2019).

One more thing most people miss: most adult life consists of weak situations: contexts that allow broad latitude for dispositional expression (Meyer, Dalal, & Hermida, 2010). Laboratory experiments overrepresent strong situations (controlled paradigms with clear demands), which makes situational effects look more powerful than they are in everyday life. You leave the lab, the manipulation disappears, and you go back to being you.

The Fit-First Sequence

In applied settings, matching works best when used as an ordered operating model:

  1. Problem Market Fit: Is the outcome a real, painful, high-priority problem for the target population? (If people don’t care about the problem, nothing else matters.)
  2. Behavior Market Fit: Is the target behavior something this population is naturally willing and able to do in context? (This is where most programs fail: they pick behaviors that look good on a whiteboard but require sustained counter-dispositional effort. And since most meaningful target behaviors are goal-directed rather than habitual, people will be consciously choosing to perform them every time. That makes fit the dominant factor in sustainability: not automation, not habit hacks, but whether the person can tolerate or enjoy the behavior over the long run.)
  3. Solution Market Fit: Does the intervention materially reduce friction and increase reward for that behavior? (Not “is the intervention clever?” but “does it actually make the right behavior easier?”)
  4. Product Market Fit: Does the behavior sustain under real market constraints, competition, cost, attention, churn? (The intervention that works in a funded pilot with a dedicated team doesn’t necessarily survive first contact with reality.)

The sequence matters: Problem → Behavior → Solution → Product. Most teams skip directly to solution design. They build an elegant app or a clever nudge for a behavior people were never likely to sustain. That’s not an intervention problem. It’s a fit problem.

A Practical Decision Rule

Use this quick test when selecting target behaviors:

  • If a behavior needs constant pressure to happen, it is probably low-fit.
  • If a behavior happens more reliably as friction drops, it is probably high-fit.
  • If output improves while effort stays stable or falls, fit is improving.

These are rough heuristics, not precision instruments. But they’ll save you from investing in behaviors that require permanent life support.

The Practical Framework: 8 Steps

Here’s how matching works in practice. This is the operating manual.

Step 1: Map the profile with enough fidelity.

Use validated measures of personality (Big Five), interests, cognitive ability, and values. When feasible, combine self-report with observer ratings to reduce measurement error. You don’t need a clinical assessment; you need a directionally accurate picture of who you’re designing for. If you’re working with a population rather than an individual, use representative sample data to understand the distribution.

Step 2: Run a Fit Dial audit.

Score five levers from 0 to 100. Think of each as a dial you can turn:

  • Role: Is the core task mix aligned with the person’s natural strengths? A high-extraversion person doing solo data entry all day has a Role dial set to about 15.
  • Reward: Are incentives reinforcing the desired behavior, or subsidizing the wrong one? If your bonus structure rewards individual performance but you need team collaboration, the Reward dial is off.
  • Rhythm: Does the work cadence match attentional and energy constraints? A deep-focus worker forced into 30-minute meeting blocks has a mismatched Rhythm.
  • Relationships: Do social dynamics amplify or suppress the target behavior? A shy person in a team of dominant personalities has a Relationships mismatch.
  • Rules: Do constraints create helpful structure or chronic friction? Some people thrive with clear procedures; others suffocate under them.

If effort keeps rising while output stalls, at least one dial is usually off.
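
A minimal sketch of what a Fit Dial audit might look like in code. The scores and the flagging threshold are hypothetical; the audit itself is a judgment call, and the code only keeps it honest and comparable across cases.

```python
# Sketch of a Fit Dial audit (scores and threshold are hypothetical,
# not validated cutoffs): score each dial 0-100 and flag likely mismatches.

fit_dials = {
    "role": 15,           # high-extraversion person doing solo data entry
    "reward": 70,
    "rhythm": 40,
    "relationships": 65,
    "rules": 80,
}

FLAG_BELOW = 50  # illustrative threshold only

flagged = {dial: score for dial, score in fit_dials.items() if score < FLAG_BELOW}
print("Dials likely off:", flagged)               # {'role': 15, 'rhythm': 40}
print("Lowest dial first:", min(fit_dials, key=fit_dials.get))
```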

Step 3: Estimate behavior-distance cost.

For each target behavior, estimate how far it sits from the person’s dispositional mean. Near-mean behaviors are cheap to sustain: they feel natural, require minimal self-regulation, and persist without scaffolding. Far-from-mean behaviors require ongoing external support, willpower, and accountability structures that must be permanently maintained.

A useful thought experiment: if you removed all external support tomorrow, would this behavior continue? If yes, it’s close to the mean. If no, you’re paying a distance tax.

Step 4: Generate multiple behavior paths.

For any outcome, there are usually several candidate behaviors that could get you there. Design multiple options and score each on four criteria:

  • Compelling: Does it connect to identity or immediate value? Will the person actually want to do this?
  • Simple: Is activation energy low enough for repeated execution? Can someone do this on their worst day?
  • Rewarding: Is reinforcement proximal enough to support repetition? Does it feel good soon enough?
  • Useful: Does it produce a meaningful downstream outcome? Does it actually move the needle on the problem?

Select behaviors with the strongest overall fit, not the ones that look most impressive on paper. The boring behavior that people actually do beats the exciting behavior that people abandon.
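
A sketch of that scoring step, with hypothetical candidate behaviors and made-up scores; equal weighting of the four criteria is an assumption, not a finding.

```python
# Sketch of scoring candidate behaviors on the four fit criteria
# (scores and equal weights are hypothetical; the point is the comparison).

candidates = {
    # behavior: (compelling, simple, rewarding, useful), each scored 1-5
    "daily 60-min gym session": (3, 1, 2, 5),
    "10-min walk after lunch": (4, 5, 4, 3),
    "weekly personal-training class": (4, 3, 4, 4),
}

def overall_fit(scores: tuple[int, int, int, int]) -> float:
    return sum(scores) / len(scores)

ranked = sorted(candidates.items(), key=lambda kv: overall_fit(kv[1]), reverse=True)
for behavior, scores in ranked:
    print(round(overall_fit(scores), 2), behavior)
# The boring, highly doable behavior tends to win on overall fit.
```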

Step 5: Change context before trying to change personality.

Tune defaults, tooling, task design, collaboration patterns, and accountability structures so the desired behavior is easier than the undesired behavior. This is cheaper, faster, and more durable than personality change, because the context stays changed even when attention moves elsewhere.

Example: if you want a team to document their work, don’t run a “documentation culture” workshop. Embed documentation prompts into the tools they already use. Make documenting the path of least resistance.

Step 6: Use strong situations selectively.

In high-cost error domains (safety, compliance, finance) use tighter rules and immediate feedback. These are contexts where you want to override individual differences, because the cost of variance is too high.

In creative, exploratory, or innovation-oriented domains, preserve flexibility so beneficial variance can emerge. Not every context calls for constraint.

The operating principle: constrain where variance is dangerous, liberate where variance is valuable.

Step 7: Measure in aggregates over time.

Judge systems by repeated behavioral outcomes across contexts, not isolated wins or self-report. Use clear denominators, time windows, and cohort logic. Separate measured claims from illustrative examples.

A single data point is an anecdote. A trend line is evidence. Design your measurement to produce trend lines.

Step 8: Accept limits and re-route.

Some person-behavior combinations are too far apart to sustain. Recognize this early. Treat it as a design constraint, not a moral judgment: re-route behavior, redesign the role or context, or reduce reliance on that specific behavior.

The best practitioners I know are the ones willing to say: “This isn’t a motivation problem. This is a fit problem. Let’s find a different path to the same outcome.”


Behavior Change Across Domains

Theory is nice. Here’s what the evidence says in the messy real world.

Each domain below has its own research tradition, its own heroes, and its own favorite interventions. But the same patterns show up everywhere: small effect sizes, stubborn fade-out, selection effects masquerading as transformation, and the recurring dominance of environmental change over individual willpower.

Health

Weight loss is where the gap between popular expectation and scientific reality is widest. People expect to lose 25-30% of their body weight. The actual trajectory: behavioral interventions produce 5-9% loss at 6 months, declining to 3-4% net at 4 years (Franz et al., 2007). That’s remarkably consistent regardless of diet type.

GLP-1 agonists (semaglutide, tirzepatide) produce dramatically larger losses, 15-22% of body weight (Wilding et al., 2021; Jastreboff et al., 2022). But when the drug stops, the weight returns (Rubino et al., 2021). This isn’t a failure of the patient. It’s confirmation that obesity operates as a chronic condition requiring chronic management, not a behavior problem awaiting the right motivational speech.

Exercise interventions produce an average effect of d = 0.19 (Conn, Hafdahl, & Mehr, 2011). That’s roughly the difference between “not exercising” and “exercising slightly more.” It’s not the transformative overhaul promised by intervention brochures. And the Finnish Twin Cohort studies suggest that a substantial portion of the observational “exercise benefit” reflects genetic confounding and selection; people who exercise are already different from people who don’t in ways that precede the exercise itself (Kujala et al., 1998; 2002).

Smoking cessation is often cited as the behavior change success story. Look closer. For any individual smoker, single-attempt quit rates remain stubbornly low: roughly 3-5% unaided at 12 months, and 12-16% even with the best pharmacotherapy (Cahill et al., 2013). The dramatic population declines in smoking? Those are driven primarily by policy (taxes, advertising bans, smoke-free laws) and cohort replacement, not individual behavior change interventions.

Workplace and Organizations

Person-environment fit provides some of the strongest applied evidence for the matching framework. Kristof-Brown et al. (2005) found that person-job fit correlates with job satisfaction at r = .56, far higher than what most organizational behavior change interventions achieve. That single number tells you something important: getting the right person in the right role matters more than trying to reshape whoever happens to be there.

Behavior-Based Safety programs that use peer observation and behavioral feedback reduce workplace injuries by 50-75% (Sulzer-Azaroff & Austin, 2000). But notice why they work: the target behaviors are simple, observable, and repeated. The feedback is immediate. The environment stays changed. BBS is a success story for precisely the conditions most behavior change programs don’t have.

A critical distinction worth flagging: change management (Kotter, Prosci/ADKAR) focuses on organizational transitions and is largely a practitioner discipline with limited experimental evidence. Behavior change science specifies precise behaviors, their determinants, the techniques most likely to shift them, and measurable outcomes. Most change management initiatives specify desired outcomes (“improve collaboration”) without specifying the behaviors that constitute collaboration. This is why so many of them fail.

Products and Markets

For product teams, the core question isn’t “How do we increase engagement?” It’s “Are we asking for a behavior users are naturally willing and able to repeat?” This is the Behavior Market Fit test.

Well-known product pivots are often interpreted as growth hacks, but many are better explained as behavior-fit corrections. Instagram’s early pivot toward photo sharing is the canonical case: product growth accelerated when the required behavior became simpler, more rewarding, and more socially legible. Airbnb scaled by structuring exchanges around behaviors both hosts and guests could realistically perform.

I should be honest: these examples are illustrative, not randomized causal proof. We can’t run RCTs on product pivots. The broader point is robust, though: when your target behavior is low-fit, product optimization mostly tunes friction around a fundamentally weak loop. No amount of onboarding polish fixes a behavior people don’t want to do.

Public Policy

Public policy is where nudge theory built its reputation. The UK Behavioural Insights Team’s tax compliance trials added a social norms message to reminder letters and increased timely payment from roughly 67% to 83% (Hallsworth et al., 2017). Simplified FAFSA increased college enrollment by 8 percentage points (Bettinger et al., 2012). Auto-enrollment in retirement savings increased participation from roughly 50% to 90% (Madrian & Shea, 2001).

These numbers are real. But they deserve more scrutiny than they typically get.

First, the DellaVigna and Linos (2022) finding applies in full force: when the same types of interventions move from academic publications to real-world deployment at scale, effect sizes shrink to roughly one-sixth of their published size. The tax letter result is from a published trial. The question is what happens when you deploy it across every municipality with variable implementation quality and less motivated populations.

Second, the biggest “successes” (like auto-enrollment) aren’t behavior change. They’re administrative policy changes that produce enrollment outcomes without anyone consciously deciding anything. Simplification (like the FAFSA redesign) has more legitimacy as a behavior change tool because it actually reduces friction on a decision people are trying to make. The distinction matters: making it easier for someone to do something they’re already trying to do is fundamentally different from silently enrolling them in something they never thought about.

Third, social norms messages (the workhorse of nudge unit trials) show effects that collapse after publication bias correction (Maier et al., 2022). A social norms message about energy use fades as soon as the letter hits the recycling bin. The systemic infrastructure changes that actually move population-level behavior in policy contexts (tobacco taxes, smoke-free laws, building codes, zoning) are policy design, not nudges. They deserve the credit that nudges have been getting.

Education

The National Study of Learning Mindsets (Yeager et al., 2019) was a preregistered RCT with 12,490 students, one of the most rigorous tests of any psychological intervention in education. The result: a brief growth mindset intervention raised GPAs by 0.10 grade points for lower-achieving students, but only in schools with supportive peer norms. In unsupportive schools, the intervention had no effect.

A broader meta-analysis puts this in perspective: the mindset-achievement correlation is r = .10, and intervention effects average d = 0.08 (Sisk et al., 2018). Let me be direct. An effect of d = 0.08 is barely distinguishable from zero in practice. Believing you can improve is better than believing you can’t. But mindset is not a general-purpose lever for academic transformation.

The most robust findings in educational psychology, retrieval practice and distributed (spaced) practice, are about learning efficiency, not personality transformation (Dunlosky et al., 2013). They work because they align with how memory naturally functions. They are also, somewhat tragically, among the most underused techniques in actual classrooms.

Addiction

Here’s the uncomfortable truth: no single treatment works for a majority of individuals. The most effective combinations, pharmacotherapy plus behavioral therapy, achieve roughly 40-50% success at 1 year for most substances. Contingency management produces the largest psychosocial effects (d = 0.42), but primarily during the active reinforcement period (Prendergast et al., 2006). When the external contingencies stop, relapse rates climb.

McLellan et al. (2000) made the argument that should have reframed the entire field: addiction relapse rates (40-60% for alcohol, 70-90% for opioids without medication) are comparable to non-adherence rates for other chronic medical conditions (type 1 diabetes non-adherence runs 30-50%, hypertension 50-70%, asthma 60-80%). The parallel is close. Addiction should be treated as a chronic condition requiring ongoing management, not as a character flaw solvable by a 28-day program.

The most durable addiction treatment isn’t a behavior change technique at all. It’s medication-assisted treatment (buprenorphine, methadone, naltrexone) maintained indefinitely. Behavioral interventions are useful adjuncts, but they are adjuncts.


How to Design for Behavior Change

Based on the evidence reviewed above, this is a fit-first operating system for designing interventions that hold up outside controlled pilots. It works for products, programs, policies, and personal goals.

Quick Start (30-Minute Version)

You don’t have time to read the full 8-step system right now? Here’s the compressed version:

  1. Pick one concrete behavior. Not an outcome, not an attitude. A behavior.
  2. Run Identity / Capability / Context fit checks. If it fails on any, redesign the behavior before building anything.
  3. Remove one friction point and add one feedback loop.
  4. Track behavior weekly for 6 weeks. Not satisfaction. Behavior.
  5. Keep what compounds, drop what needs constant force.

That’s the 80/20 version. If you want the full system, keep reading.

Step 1: Work in the Right Sequence

Use this order every time:

  1. Problem: Is the outcome real, costly, and actively felt by the people you’re designing for?
  2. Behavior: What exact repeated actions produce that outcome?
  3. Solution: What intervention makes those actions easier and more rewarding?
  4. Product/System: Does this hold under real constraints (attention, cost, churn, staffing)?

Most teams jump straight to solutions. They design an app, a training program, a campaign, and then work backward to figure out what behavior it’s supposed to change. This is the fastest path to elegant but low-impact programs.

Step 2: Specify the Target Behavior Precisely

If you can’t observe it and count it, you can’t change it.

Define behavior in operational terms before choosing any tactics:

  • Actor: Who must do it?
  • Action: What exact action?
  • Context: Under what conditions?
  • Frequency: How often?
  • Quality threshold: What counts as success?

“Improve employee wellness” is not a behavior. “Each employee logs at least 3 physical activity sessions per week using the company portal” is a behavior. One of these you can measure and manage. The other generates nice slide decks and no outcomes.
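
If it helps to force that precision, the specification can be written as structured data so missing pieces are obvious. A minimal sketch, with field names mirroring the checklist above and purely illustrative example values:

```python
from dataclasses import dataclass

@dataclass
class BehaviorSpec:
    actor: str      # who must do it
    action: str     # the exact, observable action
    context: str    # under what conditions
    frequency: str  # how often
    quality: str    # what counts as success

portal_logging = BehaviorSpec(
    actor="each employee",
    action="log a physical activity session in the company portal",
    context="any workday, web or mobile",
    frequency="at least 3 sessions per week",
    quality="session lasts 20+ minutes and is logged the same day",
)
# If any field can't be filled with something observable and countable,
# you don't have a target behavior yet -- you have an aspiration.
```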

Step 3: Run a Fit Diagnostic

Before designing any intervention, evaluate whether the proposed behavior is sustainable for your target population:

  • Identity Fit: Does this align with how people see themselves and what they value?
  • Capability Fit: Are skills, resources, and cognitive bandwidth sufficient?
  • Context Fit: Does the environment actually support execution?

Then audit the Fit Dial, five contextual factors that determine whether a behavior has a natural home:

  • Role: Does the behavior belong in this person’s role?
  • Reward: Does something reinforcing happen soon after the behavior?
  • Rhythm: Does it fit naturally into existing routines and time patterns?
  • Relationships: Do the people around the actor support the behavior?
  • Rules: Do formal and informal rules enable the behavior?

Low fit predicts low adoption, high fatigue, and fragile outcomes. In low-fit cases, your first move is to redesign the behavior or the context, not to add more motivation.
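
For teams auditing several candidate behaviors at once, the three fit checks and five Fit Dial factors can be scored and flagged in one pass. A hedged sketch: the 1-5 scale and the below-3 cutoff are illustrative choices, not validated thresholds.

```python
FIT_CHECKS = ["identity", "capability", "context"]
FIT_DIAL = ["role", "reward", "rhythm", "relationships", "rules"]

def flag_low_fit(scores: dict[str, int], cutoff: int = 3) -> list[str]:
    """Return every dimension (scored 1-5) that falls below the cutoff.
    Any flagged dimension means: redesign the behavior or the context first."""
    return [dim for dim in FIT_CHECKS + FIT_DIAL if scores.get(dim, 0) < cutoff]

scores = {"identity": 4, "capability": 2, "context": 3,
          "role": 4, "reward": 2, "rhythm": 3, "relationships": 4, "rules": 3}
print(flag_low_fit(scores))  # ['capability', 'reward'] -> fix these before adding motivation
```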

Step 4: Choose Levers by Failure Mode

Different failures require different levers. Diagnosing the failure mode correctly is half the battle.

  • Not starting. Primary lever: activation energy reduction. Typical techniques: defaults, simplification, if-then plans (for discrete actions).
  • Starting but not sustaining. Primary lever: reinforcement + feedback loops. Typical techniques: self-monitoring, progress feedback, social accountability.
  • Reverting under stress. Primary lever: context redesign. Typical techniques: friction management, environmental restructuring, cue redesign.
  • Chronic low-fit behavior. Primary lever: behavior re-specification. Typical techniques: task redesign, role redesign, alternative behavior paths.

Start with environmental and structural levers. Add motivational levers only after fit and context are addressed. If you find yourself reaching for motivational strategies before you’ve fixed the environment, you’re probably solving the wrong problem.
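
The failure-mode mapping above is small enough to encode directly if you are triaging many behaviors at once. A minimal lookup sketch; the labels mirror the list above, everything else is illustrative:

```python
LEVERS = {
    "not starting": ("activation energy reduction",
                     ["defaults", "simplification", "if-then plans for discrete actions"]),
    "not sustaining": ("reinforcement + feedback loops",
                       ["self-monitoring", "progress feedback", "social accountability"]),
    "reverting under stress": ("context redesign",
                               ["friction management", "environmental restructuring", "cue redesign"]),
    "chronic low fit": ("behavior re-specification",
                        ["task redesign", "role redesign", "alternative behavior paths"]),
}

def recommend(failure_mode: str) -> str:
    lever, techniques = LEVERS[failure_mode]
    return f"Primary lever: {lever}. Typical techniques: {', '.join(techniques)}."

print(recommend("reverting under stress"))
```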

Step 5: Design Strong vs. Flexible Situations Deliberately

Not every situation should be designed the same way.

Use strong situations where variation is expensive (safety, legal compliance, financial controls):

  • Clear rules
  • Immediate feedback
  • Low ambiguity
  • Consistent consequences

Use flexible situations where variation is productive (innovation, strategy, creative work):

  • Wider choice sets
  • Higher autonomy
  • Task-interest matching
  • Exploratory feedback loops

Most systems fail by being too loose where precision is needed and too rigid where adaptability is needed. A nuclear power plant should be a strong situation. A design studio should be a flexible one. A shocking number of organizations get this backward.

Step 6: Build Measurement Architecture Before Launch

You need instrumentation, not just an intervention. Minimum viable measurement:

  • Primary behavioral outcomes (not attitudes, not satisfaction surveys, but behavior)
  • Denominators, time windows, and cohort logic (who was eligible, over what period, in what groups)
  • Intent-to-treat tracking (include dropouts as failures, not as missing data)
  • Durability checkpoints (immediate, 6 weeks, 6 months, 12 months)

Add a simple operational Fit Score tracked weekly for 6+ weeks: if output rises while subjective effort drops or stays stable, fit is likely improving. If effort rises faster than output, the system is over-forcing.
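
Two of these pieces trip teams up most often: the intent-to-treat denominator and the weekly fit read. A minimal sketch of both, with illustrative numbers and a deliberately crude trend heuristic:

```python
def intent_to_treat_rate(enrolled: int, succeeded: int) -> float:
    """Success rate over everyone who enrolled. Dropouts count as failures,
    not missing data -- otherwise attrition quietly inflates your results."""
    return succeeded / enrolled

def fit_trend(weekly_output: list[float], weekly_effort: list[float]) -> str:
    """Directional read over 6+ weeks: compare the first and last weeks.
    (An illustrative heuristic, not a validated metric.)"""
    output_up = weekly_output[-1] > weekly_output[0]
    effort_up = weekly_effort[-1] > weekly_effort[0]
    if output_up and not effort_up:
        return "fit likely improving"
    if effort_up and not output_up:
        return "over-forcing: effort rising without output"
    return "ambiguous -- keep watching"

print(intent_to_treat_rate(enrolled=200, succeeded=58))   # 0.29, not 58 over the subset who finished
print(fit_trend([3, 4, 4, 5, 5, 6], [7, 6, 6, 5, 5, 5]))  # fit likely improving
```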

Step 7: Pilot as a Learning System, Not a Victory Lap

Run short, disciplined cycles:

  1. Launch the smallest viable intervention
  2. Observe behavior trajectories and subgroup heterogeneity
  3. Keep components with durable lift
  4. Remove components that produce only novelty spikes

Define continuation criteria in advance. If you decide what counts as success after seeing the data, you are not piloting; you are storytelling.
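
One way to keep yourself honest is to write the continuation criteria as an executable check before launch and freeze it. A sketch; the thresholds are placeholders you would set for your own context, not recommended values:

```python
# Defined and committed BEFORE any pilot data arrives.
CRITERIA = {
    "min_week6_rate": 0.40,      # share of eligible people doing the behavior in week 6
    "min_week6_vs_week1": 0.60,  # durable lift, not a novelty spike
}

def should_continue(week1_rate: float, week6_rate: float) -> bool:
    durable = week6_rate >= CRITERIA["min_week6_rate"]
    retained = (week6_rate / week1_rate if week1_rate else 0.0) >= CRITERIA["min_week6_vs_week1"]
    return durable and retained

print(should_continue(week1_rate=0.55, week6_rate=0.45))  # True: above floor, ~82% retained
print(should_continue(week1_rate=0.55, week6_rate=0.25))  # False: a novelty spike that faded
```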

Step 8: Plan Maintenance from Day One

Given everything we know about fade-out, assume maintenance is the hard part. Not the afterthought. The hard part.

  • Persistent environmental changes where possible (these are the most durable lever you have)
  • Ongoing support structures when the behavior is genuinely effortful
  • Transition plans from external control to autonomous motivation where feasible (which is less often than people assume)
  • Chronic-condition framing where the evidence supports ongoing management rather than one-off cure

For genuinely simple behaviors (drinking water, taking a pill), expect 66+ days for automaticity (Lally et al., 2010). For complex, goal-directed behaviors (exercise, diet management, skill development), don’t plan for automaticity at all, plan for sustained conscious engagement, which means the behavior must be tolerable and well-fitted enough to survive ongoing deliberate choice. If someone tells you their behavior change program works in two weeks, ask them what the 6-month data looks like. Then watch them change the subject.
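
For the simple-behavior case, Lally et al. (2010) fit an asymptotic curve to daily automaticity ratings: steep gains early, a long flattening tail, and large individual differences in how fast the curve rises. The sketch below reproduces that general shape with made-up parameters; it illustrates why "66 days" is a median rather than a switch that flips, not a model of any particular person.

```python
import math

def automaticity(day: int, plateau: float = 100.0, rate: float = 0.02) -> float:
    """Asymptotic growth curve of the general form fit in Lally et al. (2010).
    The plateau and rate constant here are illustrative, not estimated values."""
    return plateau * (1 - math.exp(-rate * day))

for day in (7, 21, 66, 120, 254):
    print(f"day {day:3d}: ~{automaticity(day):.0f}% of this person's plateau")
# Gains are front-loaded and then flatten; people differ widely in the rate constant,
# which is why observed times to automaticity ranged from 18 to 254 days.
```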


Common Myths About Behavior Change

Myth: “It takes 21 days to form a habit”

The median time to habit automaticity is 66 days, with a range of 18 to 254 days (Lally et al., 2010). The 21-day figure traces to a plastic surgeon’s observations about patient self-image, not habit science. It has no experimental basis whatsoever.

Myth: “You can turn any behavior into a habit with enough repetition”

This is arguably the most consequential myth in all of self-improvement, because it shapes how millions of people approach behavior change, and sets most of them up for failure.

In neuroscience, a habit has a precise definition: a behavior that persists even when the outcome is devalued, meaning the reward is no longer desirable (Dickinson, 1985; Daw & O’Doherty, 2014). True habits are stimulus-response reflexes, controlled by the dorsolateral striatum, that run independently of your current goals. They emerge only after extensive repetition of simple behaviors in stable contexts (Tricomi, Balleine, & O’Doherty, 2009). Drinking a glass of water every morning can become a habit. Checking your phone when you sit down can become a habit. These are low-complexity, low-effort actions performed in consistent environments.

But exercising, writing, studying, eating a carefully managed diet, building a business, these are complex, effortful, goal-directed behaviors controlled by a different neural system entirely (dorsomedial striatum and prefrontal cortex). They require ongoing conscious evaluation: Is this still worth the effort? Am I doing it correctly? Should I adjust my approach? That ongoing evaluation is what makes them effective, and what makes them impossible to automate. You can build small habits around initiatory cues (laying out gym clothes, opening your laptop at the same time each day), but the core behavior will always require conscious engagement.

The popular habit literature has taken a real finding (that simple behaviors can become automatic) and extrapolated it to complex behaviors where it doesn’t apply. The result is that people blame themselves when “good habits” fail to form around behaviors that were never habit candidates to begin with. The failure isn’t theirs. The model was wrong. For complex behaviors, the question isn’t “how do I make this automatic?” It’s “have I picked a version of this behavior that I can consciously sustain?”

Myth: “Willpower is like a muscle that can be trained”

The ego depletion effect, the idea that self-control is a limited resource depleted by use, failed to replicate in a 23-laboratory Registered Replication Report (d = 0.04; Hagger et al., 2016). Bias-corrected meta-analysis found the true effect was not significantly different from zero (Carter et al., 2015). The foundational metaphor of the entire willpower industry doesn’t hold up.

Myth: “10,000 hours of practice makes anyone an expert”

Deliberate practice explains only 12% of performance variance on average, and less than 1% in professions (Macnamara et al., 2014). Individual differences in aptitude, personality, and cognitive ability account for far more. Practice matters, but it is nowhere near sufficient, and the original claim was always based on studying people who were already pre-selected for talent.

Myth: “Anyone can change anything with the right mindset”

Growth mindset interventions produce an average effect of d = 0.08 on academic achievement (Sisk et al., 2018). That’s practically negligible for most individuals. Believing you can improve is better than believing you can’t. But mindset is not a master key that unlocks transformation. The gap between “mindset matters a little” and “mindset changes everything” is enormous, and the evidence lands firmly on the first side.

Myth: “Information changes behavior”

The information deficit model has been refuted across decades of research. Knowing that smoking causes cancer, that exercise is beneficial, or that saving for retirement is important does not, by itself, change behavior. If information were sufficient, no doctor would smoke and no financial advisor would carry credit card debt. The intention-action gap ensures that knowledge translates into action only partially, and often barely at all.

Myth: “Just fake it till you make it”

This one is especially sneaky because it seems to work at first. Acting counter to your disposition produces immediate mood gains. But the bill comes later: increased fatigue, reduced authenticity, and net negative outcomes for people whose dispositions are furthest from the target behavior (Jacques-Hamilton et al., 2019). The costs are back-loaded and invisible in momentary snapshots, which is exactly why this myth survives. The people who tell you it worked are the ones for whom the target behavior wasn’t that far from their disposition in the first place. For the people it fails (the deep introverts forced to “be more outgoing,” the conscientious plodders told to “think big”) the costs accumulate silently until they burn out.

Myth: “The environment determines behavior”

Environment matters enormously, but it interacts with stable individual differences. The shared family environment accounts for less than 10% of variance in adult personality (Plomin & Daniels, 1987). Heritability increases with age as people self-select environments matching their dispositions (Bouchard, 2013). Situations matter most when they are strong: clear, constrained. Most of adult life consists of weak, self-selected situations where traits dominate.

Myth: “Matching means giving up on growth”

Matching is not resignation. It’s leverage. It means selecting behavior paths that are sustainable for a specific person or population, then engineering context so progress compounds. A person who builds their career around their natural strengths isn’t “settling”; they’re positioning themselves where effort converts into results instead of being absorbed by friction. This is probabilistic optimization, not trait determinism.

Myth: “Selection is cheating”

In applied systems, selection isn’t a nuisance variable to ignore; it’s often the dominant mechanism. When a company hires well, retention improves and performance rises, and everyone credits the culture. When a therapy program attracts motivated clients, outcomes look great, and everyone credits the method. Good design treats selection transparently: define entry criteria, track attrition, and separate selection effects from treatment effects. Pretending selection doesn’t exist is how you build a program that only works on people who didn’t need it.


Frequently Asked Questions

Basics

What is the difference between behavior change and habit formation? Behavior change is the broad category: modifying any action or pattern. Habit formation is one narrow mechanism within it: a simple, repeated behavior in a stable context becoming automatic, meaning it fires in response to a cue regardless of whether you’re currently motivated or even thinking about it. Scientifically, a true habit is defined by the devaluation test: it persists even when the outcome is no longer desirable (Dickinson, 1985). Most behaviors people actually want to change (exercising, eating well, writing, studying, managing stress) are not habits and never will be. They’re goal-directed behaviors that require ongoing conscious engagement, evaluation, and effort. The popular conflation of “behavior I do regularly” with “habit” has caused enormous confusion. Regularity doesn’t make something a habit. A behavior you consciously choose to perform every day because you value the outcome is goal-directed, no matter how routine it feels.

Why do people find it so hard to change their behavior? Multiple factors stack against you. The intention-action gap means intentions explain only about 28% of behavioral variance. Counter-dispositional behavior burns energy and creates fatigue. The fade-out effect means intervention gains decay when supports are removed. Habitual responses override new intentions, especially under stress. And stable individual differences create real constraints on which behaviors are sustainable for which people.

How long does it actually take to change a behavior? It depends on the behavior’s complexity, the person, and the context. For simple daily habits, the median is 66 days to reach automaticity (Lally et al., 2010), with a range of 18 to 254 days. For complex behavioral patterns, much longer. And for changes requiring chronic counter-dispositional effort, “how long” may be the wrong question entirely, because some changes require ongoing environmental support rather than a finite formation period.

What is the most effective behavior change technique? There isn’t one. That question contains the same assumption this entire guide argues against: that a single technique works best for all people, all behaviors, and all contexts. The evidence on heterogeneity of treatment effects tells us the opposite. Different people respond differently to the same intervention, and the fit between the person, the behavior, and the technique determines what works. Self-monitoring shows up frequently as a component of effective interventions (d = 0.42 when combined with other self-regulation techniques; Michie et al., 2009), but it works by closing a feedback loop, which means it helps most when the barrier is a gap between intention and awareness of current behavior. If the barrier is capability, environmental structure, or person-behavior fit, self-monitoring alone won’t move the needle. Implementation intentions (d = 0.15-0.31 for health behaviors in domain-specific meta-analyses) can help bridge specific moments of inaction for one-time actions, but they are weak to null for repeated behaviors. The honest answer: diagnose the failure mode first, then match the technique to the barrier. “What’s the best tool?” depends entirely on what you’re building.

Science and Evidence

What is the intention-action gap? The well-documented disconnect between what people intend to do and what they actually do. About 47% of people who form behavioral intentions fail to act on them (Sheeran, 2002). Even when interventions successfully change intentions, behavior changes only partially. A medium-to-large intention shift (d = 0.66) produces only a small-to-medium behavior shift (d = 0.36; Webb & Sheeran, 2006). This is why motivation-only approaches consistently disappoint.

Do nudges actually work? The evidence is far weaker than the field advertises. The headline meta-analytic effect (d = 0.43; Mertens et al., 2022) collapses to d = 0.01-0.02 after correcting for publication bias (Maier et al., 2022). Academic estimates of nudge effects are roughly 8x larger than what nudge units achieve at scale (DellaVigna & Linos, 2022). Default-based “nudges” (like auto-enrollment) can produce large administrative outcomes, but these aren’t really behavior change: the person hasn’t actively done anything differently. Informational and framing nudges show effects close to zero after bias correction. This has been one of the most overhyped areas in all of social science.

Is willpower a limited resource? The original ego depletion theory has largely been undermined. The 23-lab Registered Replication Report found d = 0.04 (Hagger et al., 2016). Bias-corrected meta-analysis found the effect was not distinguishable from zero (Carter et al., 2015). Self-control performance may decline with sustained effort, but the mechanism is likely motivational and attentional rather than metabolic. You don’t run out of willpower like you run out of gas.

How heritable are personality traits? Twin studies consistently estimate personality heritability at 40-60% using self-report (Polderman et al., 2015), and at 66-79% using multi-rater designs (Riemann et al., 1997). Heritability sets a reaction range, not a destiny; environment determines where within the range you fall. But the range itself is a real constraint that no amount of motivation overcomes.

What is the replication crisis, and how does it affect behavior change? The Open Science Collaboration (2015) found that only 36% of 100 psychology studies replicated successfully, with effect sizes roughly halved. Several popular behavior change tools (ego depletion, certain priming effects, some nudge findings) have been affected. Trait-based findings (Big Five structure, heritability, stability) have proven far more replicable than situational manipulation effects. The lesson: be skeptical of any single study, especially one with a surprising or large effect.

Practical Applications

What’s the best way to start exercising consistently? First, a reframe: exercise is not a habit in the scientific sense. It’s a goal-directed behavior, effortful, conscious, and responsive to your current goals and values. It will never become automatic the way checking your phone or grabbing your keys does. Every session requires you to consciously choose to do something your body finds demanding. Accepting this upfront prevents the discouragement that comes from expecting it to “get easy” and wondering what’s wrong with you when it doesn’t.

With that said, here’s what the evidence supports: (1) Pick the right exercise for you, not what’s “optimal” on paper. Fit between you and the activity predicts persistence far better than any habit hack. If you hate running, no cue-routine-reward structure will make you a runner. Find movement you genuinely tolerate or enjoy. (2) Start small enough that initiation isn’t the bottleneck. (3) Automate the initiatory cue: anchor the start to a consistent time, place, and routine so you don’t have to decide when each day. That micro-sequence can become habitual even though the exercise itself won’t. (4) Track it: self-monitoring closes the feedback loop between what you intend to do and what you actually do, and it consistently shows up as a component of effective exercise interventions. (5) Be honest with yourself about whether you actually like moving your body this way, because you’re going to be consciously choosing it every time you do it.

Why do diets fail? Most diets produce 5-9% weight loss at 6 months, with gradual regain to 3-4% at 4 years (Franz et al., 2007). This trajectory is remarkably consistent regardless of diet type. The intervention produces a temporary perturbation, not a permanent shift in weight regulation. GLP-1 agonists produce larger losses (15-22%) but weight returns when the drug stops (Rubino et al., 2021). Obesity is best modeled as a chronic condition requiring ongoing management, not a willpower problem with a motivational solution.

Can personality change? Modestly. Roberts et al. (2017) found an average personality change of d = 0.37 from intervention. But most studies lacked adequate controls and long-term follow-up, and changes largely occurred during active treatment, raising concerns about demand characteristics. Mean-level changes occur naturally with age (people become slightly more agreeable and conscientious), but rank-order stability is high. Your relative standing among peers tends to persist.

What is person-environment fit and why does it matter? Person-environment fit is the compatibility between your characteristics (personality, abilities, values, interests) and your environment (job, organization, relationships, lifestyle). Kristof-Brown et al. (2005) found that fit correlates with job satisfaction at r = .44-.56, substantially larger effects than most behavior change interventions achieve. Designing for fit produces better outcomes than trying to force people to adapt to mismatched environments. This is one of the most underused insights in all of applied behavioral science.

How do I know if I should try to change a behavior or change my environment? Ask: how far is this behavior from my natural dispositional tendencies? If the target behavior is within your natural range (a stretch, not a contortion) behavior change techniques can help. If it requires chronic counter-dispositional effort, consider changing your environment instead: your role, routines, relationships, tools, or incentive structures. Make the desired outcome emerge from behavior that feels natural, rather than grinding against your grain indefinitely.

Is it true that behavior change programs mostly benefit people who were already going to change? Partly, and this is a bigger deal than most practitioners admit. Selection effects are real. People who join programs are often more motivated, more socially supported, and more resourced than non-joiners. Intent-to-treat analysis (counting dropouts as failures) and within-family/twin designs help control this bias, but published effects can still overstate general-population impact (DellaVigna & Linos, 2022; Kujala et al., 2002).

What role does motivation play in behavior change? Motivation matters, but its quality matters more than its quantity. Self-Determination Theory shows that autonomous motivation (doing something because you genuinely value it) predicts sustained behavior change, while controlled motivation (external rewards, guilt) predicts short-term compliance but poor maintenance (Ryan & Deci, 2000). More importantly, context and fit often dominate outcomes. Raising motivation without addressing fit usually yields short-lived gains that look great in a quarterly report and evaporate by the next one.

How should organizations approach behavior change? Start with behavior selection, not intervention design. Identify the specific behaviors needed, then run a fit diagnostic: does this population have the capability, the environmental support, and the motivation to perform these behaviors sustainably? If not, which of those is the binding constraint? Then redesign the context before attempting to change the people. Use person-environment fit as an organizing principle, including in hiring and role design, where selection effects are a feature, not a bug. The Fit-First Sequence (Problem → Behavior → Solution → Product) applies to organizations just as it does to individuals. Most organizational behavior change programs fail because they skip straight to solution design without asking whether the target behavior is a good fit for the people who have to perform it.

What is Behavior Market Fit in practical terms? Behavior Market Fit means the target population is both willing and able to perform the target behavior repeatedly in real context. In practice, test candidate behaviors for identity alignment, capability requirements, friction, and reward timing. If repeated execution requires constant external force, you likely don’t have Behavior Market Fit yet, and no amount of product polish will create it.

What’s the difference between behavior change and behavioral design? Behavior change is the outcome: a measurable modification in what someone does. Behavioral design is the process of intentionally structuring environments, products, and experiences to make certain behaviors more likely. The best behavioral design works with human nature rather than against it, reducing friction for desired behaviors and leveraging natural tendencies rather than relying on willpower.

Are there behaviors that are essentially impossible to change? “Impossible” is too strong, but some person-behavior combinations are so far apart that sustained change is unrealistic without ongoing external support. The evidence shows that the further a target behavior is from someone’s dispositional mean, the higher the effort cost, the faster the fade-out, and the less likely long-term maintenance becomes. Acknowledging these limits isn’t defeatism; it’s the starting point for finding a version of the goal that actually fits.


References

Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211.

Armitage, C. J., & Conner, M. (2001). Efficacy of the theory of planned behaviour: A meta-analytic review. British Journal of Social Psychology, 40(4), 471–499.

Arshad, A., Anderson, B., & Sharif, A. (2019). Comparison of organ donation and transplantation rates between opt-out and opt-in systems. Kidney International, 95(6), 1453–1460.

Bailey, D. H., Duncan, G. J., Odgers, C. L., & Yu, W. (2017). Persistence and fadeout in the impacts of child and adolescent interventions. Journal of Research on Educational Effectiveness, 10(1), 7–39.

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215.

Bandura, A. (1986). Social Foundations of Thought and Action: A Social Cognitive Theory. Prentice-Hall.

Belanger-Gravel, A., Godin, G., & Amireault, S. (2013). A meta-analytic review of the effect of implementation intentions on physical activity. Health Psychology Review, 7(1), 23–54.

Bettinger, E. P., Long, B. T., Oreopoulos, P., & Sanbonmatsu, L. (2012). The role of application assistance and information in college decisions. Quarterly Journal of Economics, 127(3), 1205–1242.

Blumberg, M., & Pringle, C. D. (1982). The missing opportunity in organizational research: Some implications for a theory of work performance. Academy of Management Review, 7(4), 560–569.

Bouchard, T. J., Jr. (2013). The Wilson Effect: The increase in heritability of IQ with age. Twin Research and Human Genetics, 16(5), 923–930.

Brinch, C. N., & Galloway, T. A. (2012). Schooling in adolescence raises IQ scores. Proceedings of the National Academy of Sciences, 109(2), 425–430.

Cahill, K., Stevens, S., Perera, R., & Lancaster, T. (2013). Pharmacological interventions for smoking cessation: An overview and network meta-analysis. Cochrane Database of Systematic Reviews, (5).

Carter, E. C., Kofler, L. M., Forster, D. E., & McCullough, M. E. (2015). A series of meta-analytic tests of the depletion effect. Journal of Experimental Psychology: General, 144(4), 796–815.

Carrera, M., Royer, H., Stehr, M., Sydnor, J., & Taubinsky, D. (2018). The limits of simple implementation intentions: Evidence from a field experiment on making plans to exercise. Journal of Health Economics, 62, 95–104.

Carrero, I., Vila, I., & Redondo, R. (2019). What makes implementation intention interventions effective for promoting healthy eating behaviours? A meta-regression. Appetite, 140, 239–247.

Carver, C. S., & Scheier, M. F. (1982). Control theory: A useful conceptual framework for personality-social, clinical, and health psychology. Psychological Bulletin, 92(1), 111–135.

Conn, V. S., Hafdahl, A. R., & Mehr, D. R. (2011). Interventions to increase physical activity among healthy adults: Meta-analysis of outcomes. American Journal of Public Health, 101(4), 751–758.

Daw, N. D., & O’Doherty, J. P. (2014). Multiple systems for value learning. In P. W. Glimcher & E. Fehr (Eds.), Neuroeconomics: Decision Making and the Brain (2nd ed., pp. 393–410). Academic Press.

De Crescenzo, F., et al. (2018). Comparative efficacy and acceptability of psychosocial interventions for individuals with cocaine and amphetamine addiction. PLoS Medicine, 15(12), e1002715.

Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 627–668.

da Silva, M. A. V., São-João, T. M., Brizon, V. C., Franco, D. H., & Mialhe, F. L. (2018). Impact of implementation intentions on physical activity practice in adults: A systematic review and meta-analysis of randomized clinical trials. PLOS ONE, 13(11), e0206294.

Deci, E. L., & Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Plenum Press.

Dickinson, A. (1985). Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 308(1135), 67–78.

DellaVigna, S., & Linos, E. (2022). RCTs to scale: Comprehensive evidence from two nudge units. Econometrica, 90(1), 81–116.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques. Psychological Science in the Public Interest, 14(1), 4–58.

Ebeling, F., & Lotz, S. (2015). Domestic uptake of green energy promoted by opt-out tariffs. Nature Climate Change, 5, 868–871.

Epstein, S. (1979). The stability of behavior: I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 37(7), 1097–1126.

Epstein, S. (1983). Aggregation and beyond: Some basic issues on the prediction of behavior. Journal of Personality, 51(3), 360–392.

Fogg, B. J. (2009). A behavior model for persuasive design. In Proceedings of the 4th International Conference on Persuasive Technology (Article 40). ACM.

Fogg, B. J. (2020). Tiny Habits: The Small Changes That Change Everything. Houghton Mifflin Harcourt.

Fjeldsoe, B., Neuhaus, M., Winkler, E., & Eakin, E. (2011). Systematic review of maintenance of behavior change following physical activity and dietary interventions. Health Psychology, 30(1), 99–109.

Fleeson, W. (2001). Toward a structure and process integrated view of personality: Traits as density distributions of states. Journal of Personality and Social Psychology, 80(6), 1011–1027.

Fleeson, W., & Jayawickreme, E. (2015). Whole Trait Theory. Journal of Research in Personality, 56, 82–92.

Franz, M. J., et al. (2007). Weight-loss outcomes: A systematic review and meta-analysis of weight-loss clinical trials with a minimum 1-year follow-up. Journal of the American Dietetic Association, 107(10), 1755–1767.

Funder, D. C., & Ozer, D. J. (2019). Evaluating effect sizes in personality and social psychology. Advances in Methods and Practices in Psychological Science, 2(2), 156–168.

Gollwitzer, P. M. (1999). Implementation intentions: Strong effects of simple plans. American Psychologist, 54(7), 493–503.

Gollwitzer, P. M., & Sheeran, P. (2006). Implementation intentions and goal achievement: A meta-analysis of effects and processes. Advances in Experimental Social Psychology, 38, 69–119.

Graybiel, A. M. (2008). Habits, rituals, and the evaluative brain. Annual Review of Neuroscience, 31, 359–387.

Hagger, M. S., et al. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573.

Hallsworth, M., List, J. A., Metcalfe, R. D., & Vlaev, I. (2017). The behavioralist as tax collector. Journal of Public Economics, 148, 14–31.

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. P. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102–116.

Harkin, B., Webb, T. L., Chang, B. P. I., Prestwich, A., Conner, M., Kellar, I., Benn, Y., & Sheeran, P. (2016). Does monitoring goal progress promote goal attainment? A meta-analysis of the experimental evidence. Psychological Bulletin, 142(2), 198–229.

Haworth, C. M. A., et al. (2010). The heritability of general cognitive ability increases linearly from childhood to young adulthood. Molecular Psychiatry, 15(11), 1112–1120.

Jacques-Hamilton, R., Sun, J., & Smillie, L. D. (2019). Costs and benefits of acting extraverted: A randomized controlled trial. Journal of Experimental Psychology: General, 148(9), 1538–1556.

Jastreboff, A. M., et al. (2022). Tirzepatide once weekly for the treatment of obesity (SURMOUNT-1). New England Journal of Medicine, 387(3), 205–216.

Kong, A., et al. (2018). The nature of nurture: Effects of parental genotypes. Science, 359(6374), 424–428.

Kristof-Brown, A. L., Zimmerman, R. D., & Johnson, E. C. (2005). Consequences of individuals’ fit at work. Personnel Psychology, 58(2), 281–342.

Kujala, U. M., Kaprio, J., Sarna, S., & Koskenvuo, M. (1998). Relationship of leisure-time physical activity and mortality: The Finnish twin cohort. International Journal of Sports Medicine, 19(1), 22–28.

Kujala, U. M., Kaprio, J., Sarna, S., & Koskenvuo, M. (2002). Relationship of leisure-time physical activity and mortality. American Journal of Epidemiology, 156(11), 985–992.

Lally, P., van Jaarsveld, C. H. M., Potts, H. W. W., & Wardle, J. (2010). How are habits formed: Modelling habit formation in the real world. European Journal of Social Psychology, 40(6), 998–1009.

Leikas, S., & Ilmarinen, V.-J. (2017). Happy now, tired later? Extraverted and conscientious behavior are related to immediate mood gains, but to later fatigue. Journal of Personality, 85(5), 603–615.

Lundahl, B. W., Kunz, C., Brownell, C., Tollefson, D., & Burke, B. L. (2010). A meta-analysis of motivational interviewing. Research on Social Work Practice, 20(2), 137–160.

Macnamara, B. N., Hambrick, D. Z., & Oswald, F. L. (2014). Deliberate practice and performance in music, games, sports, education, and professions: A meta-analysis. Psychological Bulletin, 140(4), 1608–1640.

Madrian, B. C., & Shea, D. F. (2001). The power of suggestion: Inertia in 401(k) participation and savings behavior. Quarterly Journal of Economics, 116(4), 1149–1187.

Maier, M., Bartoš, F., Stanley, T. D., Shanks, D. R., Harris, A. J. L., & Wagenmakers, E.-J. (2022). No evidence for nudging after adjusting for publication bias. Proceedings of the National Academy of Sciences, 119(31), e2200300119.

Maltz, M. (1960). Psycho-Cybernetics. Prentice-Hall.

Milkman, K. L., Beshears, J., Choi, J. J., Laibson, D., & Madrian, B. C. (2011). Using implementation intentions prompts to enhance influenza vaccination rates. Proceedings of the National Academy of Sciences, 108(26), 10415–10420.

Milkman, K. L., Gromet, D., Ho, H., Kay, J. S., Lee, T. W., Pandiloski, P., Park, Y., Rai, A., Bazerman, M., Beshears, J., Bonds, L., de la Vega, A., Dellavigna, S., Duckworth, A. L., et al. (2021). Megastudies improve the impact of applied behavioural science. Nature, 600, 478–483.

McEachan, R. R. C., Conner, M., Taylor, N. J., & Lawton, R. J. (2011). Prospective prediction of health-related behaviours with the Theory of Planned Behaviour: A meta-analysis. Health Psychology Review, 5(2), 97–144.

McLellan, A. T., Lewis, D. C., O’Brien, C. P., & Kleber, H. D. (2000). Drug dependence, a chronic medical illness. JAMA, 284(13), 1689–1695.

Mertens, S., Herberz, M., Hahnel, U. J. J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. Proceedings of the National Academy of Sciences, 119(1), e2107346118.

Meyer, R. D., Dalal, R. S., & Hermida, R. (2010). A review and synthesis of situational strength in the organizational sciences. Journal of Management, 36(1), 121–140.

Michie, S., Abraham, C., Whittington, C., McAteer, J., & Gupta, S. (2009). Effective techniques in healthy eating and physical activity interventions: A meta-regression. Health Psychology, 28(6), 690–701.

Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6, 42.

Michie, S., Richardson, M., Johnston, M., Abraham, C., Francis, J., Hardeman, W., Eccles, M. P., Cane, J., & Wood, C. E. (2013). The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: Building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine, 46(1), 81–95.

Moritz, S. E., Feltz, D. L., Fahrbach, K. R., & Mack, D. E. (2000). The relation of self-efficacy measures to sport performance: A meta-analytic review. Research Quarterly for Exercise and Sport, 71(3), 280–294.

Ng, J. Y. Y., et al. (2012). Self-determination theory applied to health contexts: A meta-analysis. Perspectives on Psychological Science, 7(4), 325–340.

Nickerson, D. W., & Rogers, T. (2010). Do you have a voting plan? Implementation intentions, voter turnout, and organic plan making. Psychological Science, 21(2), 194–199.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Orth, U., Clark, D. A., Donnellan, M. B., & Robins, R. W. (2022). Testing prospective effects in longitudinal research: New guidelines for the cross-lagged panel model. Psychological Methods, 27(5), 771–792.

Plomin, R., & Daniels, D. (1987). Why are children in the same family so different from one another? Behavioral and Brain Sciences, 10(1), 1–16.

Polderman, T. J. C., et al. (2015). Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nature Genetics, 47(7), 702–709.

Prendergast, M., Podus, D., Finney, J., Greenwell, L., & Roll, J. (2006). Contingency management for treatment of substance use disorders: A meta-analysis. Addiction, 101(11), 1546–1560.

Protzko, J. (2015). The environment in raising early intelligence: A meta-analysis of the fadeout effect. Intelligence, 53, 202–210.

Richard, F. D., Bond, C. F., Jr., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331–363.

Riemann, R., Angleitner, A., & Strelau, J. (1997). Genetic and environmental influences on personality. Journal of Personality, 65(3), 449–475.

Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age. Psychological Bulletin, 126(1), 3–25.

Roberts, B. W., Luo, J., Briley, D. A., Chow, P. I., Su, R., & Hill, P. L. (2017). A systematic review of personality trait change through intervention. Psychological Bulletin, 143(2), 117–141.

Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life course. Psychological Bulletin, 132(1), 1–25.

Robb, K. A., Young, B., Murphy, M. K., Duklas, P., McConnachie, A., Hollands, G. J., McCowan, C., Macdonald, S., O’Carroll, R. E., O’Connor, R. C., & Steele, R. J. C. (2025). Behavioural interventions to increase uptake of FIT colorectal screening in Scotland (TEMPO): A nationwide, eight-arm, factorial, randomised controlled trial. The Lancet, 405(10484), 1081–1092.

Rounds, J., & Su, R. (2014). The nature and power of interests. Current Directions in Psychological Science, 23(2), 98–103.

Rubino, D., et al. (2021). Effect of continued weekly subcutaneous semaglutide vs placebo on weight loss maintenance (STEP 4). JAMA, 325(14), 1414–1425.

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78.

Sackett, P. R., Zhang, C., Berry, C. M., & Lievens, F. (2022). Revisiting meta-analytic estimates of validity in personnel selection. Journal of Applied Psychology, 107(12), 2040–2068.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.

Sheeran, P. (2002). Intention-behavior relations: A conceptual and empirical review. European Review of Social Psychology, 12(1), 1–36.

Sheeran, P., et al. (2016). The impact of changing attitudes, norms, and self-efficacy on health-related intentions and behavior: A meta-analysis. Health Psychology, 35(11), 1178–1188.

Sheeran, P., & Webb, T. L. (2016). The intention-behavior gap. Social and Personality Psychology Compass, 10(9), 503–518.

Sheeran, P., Listrom, O., & Gollwitzer, P. M. (2024). The when and how of planning: Meta-analysis of the scope and components of implementation intentions in 642 tests. European Review of Social Psychology, 36(1), 162–194.

Sisk, V. F., Burgoyne, A. P., Sun, J., Butler, J. L., & Macnamara, B. N. (2018). To what extent and under what circumstances are growth mind-sets important to academic achievement? Psychological Science, 29(4), 549–571.

Stajkovic, A. D., & Luthans, F. (1998). Self-efficacy and work-related performance: A meta-analysis. Psychological Bulletin, 124(2), 240–261.

Steinmetz, H., Knappstein, M., Ajzen, I., Schmidt, P., & Kabst, R. (2016). How effective are behavior change interventions based on the theory of planned behavior? Zeitschrift für Psychologie, 224(3), 216–233.

Sulzer-Azaroff, B., & Austin, J. (2000). Does BBS work? Professional Safety, 45(7), 19–24.

Tannenbaum, M. B., Hepler, J., Zimmerman, R. S., Saul, L., Jacobs, S., Wilson, K., & Albarracin, D. (2015). Appealing to fear: A meta-analysis of fear appeal effectiveness and theories. Psychological Bulletin, 141(6), 1178–1204.

Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press.

Triandis, H. C. (1977). Interpersonal Behavior. Brooks/Cole.

Tricomi, E., Balleine, B. W., & O’Doherty, J. P. (2009). A specific role for posterior dorsolateral striatum in human habit learning. European Journal of Neuroscience, 29(11), 2225–2232.

Tucker-Drob, E. M., & Bates, T. C. (2016). Large cross-national differences in gene × socioeconomic status interaction on intelligence. Psychological Science, 27(2), 138–149.

Turkheimer, E. (2000). Three laws of behavior genetics and what they mean. Current Directions in Psychological Science, 9(5), 160–164.

Verplanken, B., & Wood, W. (2006). Interventions to break and create consumer habits. Journal of Public Policy & Marketing, 25(1), 90–103.

Wang, G., Wang, Y., & Gai, X. (2021). A meta-analysis of the effects of mental contrasting with implementation intentions on goal attainment. Frontiers in Psychology, 12, 565202.

Webb, T. L., & Sheeran, P. (2006). Does changing behavioral intentions engender behavior change? A meta-analysis. Psychological Bulletin, 132(2), 249–268.

Wilding, S., Tsipa, A., Branley-Bell, D., Greenwood, D. C., Vargas-Palacios, A., Yaziji, N., Addison, C., Kelly, P., Day, F., Horsfall, K., Conner, M., & O’Connor, D. B. (2020). Cluster randomized controlled trial of volitional and motivational interventions to improve bowel cancer screening uptake: A population-level study. Social Science & Medicine, 265, 113331.

Wilding, J. P. H., et al. (2021). Once-weekly semaglutide in adults with overweight or obesity (STEP 1). New England Journal of Medicine, 384(11), 989–1002.

Wood, W., & Neal, D. T. (2007). A new look at habits and the habit-goal interface. Psychological Review, 114(4), 843–863.

Wood, W., Tam, L., & Witt, M. G. (2005). Changing circumstances, disrupting habits. Journal of Personality and Social Psychology, 88(6), 918–933.

Yeager, D. S., et al. (2019). A national experiment reveals where a growth mindset improves achievement. Nature, 573, 364–369.

Yin, H. H., & Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7(6), 464–476.