RE: From Hindsight to Foresight: How Evaluation Can Become Future-Informed

Stephanie Jill Hodge

United States of America

Posted on 29/04/2026

From Hindsight to Foresight: How Evaluation Becomes Useful Again
Reflections from working inside systems that don’t hold still long enough to be measured

There is a quiet tension at the centre of most evaluation work, and if you’ve spent any time inside complex environmental or climate programmes, you feel it almost immediately. We are trained—carefully, rigorously—to look backward. To assess what was delivered, what worked, what didn’t, and whether it aligned with what was originally promised. And yet the systems we are working in—food systems, climate adaptation, biodiversity governance, circular economies—are not standing still long enough for that backward glance to remain relevant.

In my own work across Global Environment Facility (GEF)-linked portfolios and parallel systems, I have watched this tension play out repeatedly. The formal questions—relevance, effectiveness, efficiency, sustainability—remain constant. But the world they are meant to interpret keeps shifting beneath them.

Take the United Nations Environment Programme / GEF ISLANDS programme across fourteen Pacific small island developing states (SIDS). On paper, it was doing what it said it would do. Infrastructure was being installed—systems for managing persistent organic pollutants, mercury, e-waste, used oil. Policies were drafted. Coordination mechanisms were established. If you stayed within the traditional frame, the evaluation could confirm delivery.

But when you step back and look at the system as it actually operates, the question changes. It becomes less about whether outputs were delivered, and more about whether the system being built will still function when the conditions change—which they inevitably will. Tourism fluctuates. Fiscal space contracts. Climate shocks disrupt infrastructure and supply chains. In that context, the real question decision-makers had was not “did this work?” but “will this hold?”

That question sits outside retrospective evaluation unless you deliberately bring it in.

The same pattern appeared in the circular economy work with the Asian Development Bank and GEF across Southeast Asia. We were looking at Extended Producer Responsibility-type systems—policy readiness, institutional arrangements, pilot implementation. All the right ingredients. But again, the fragility was not in the design. It was in the future conditions under which that design would have to operate. Commodity price fluctuations, regulatory enforcement cycles, political turnover—these are not edge cases. They are the operating environment. And yet they rarely sit at the centre of evaluation design.

Even in earlier GEF project design and evaluation work, including Project Identification Form (PIF)-level advisory aligned with the Food and Agriculture Organization, the Theory of Change followed a familiar and comforting logic: outputs lead to capacity, capacity leads to improved management, improved management leads to environmental outcomes. It is clean. It is logical. It is also, in most cases, incomplete. Because it assumes a relatively stable enabling environment. It does not ask, in any structured way, under what future conditions that chain holds—and where it breaks.

Across all of this work, the same limitation keeps surfacing. Retrospective evaluation is very good at validating what has been delivered. It is much weaker at assessing what will endure. And it is largely silent on what is about to fail.

This is where foresight comes in—not as an abstract add-on, but as a practical necessity. And in my experience, it becomes most powerful not at the end of a programme, but in the middle of it—during course correction, during mid-term reviews, in those messy, uncomfortable moments where systems are clearly not behaving as expected but have not yet fully failed.

That is where I now do most of this work.

In a mid-term review, the temptation is always to stabilise the narrative. To explain variance. To adjust ratings. To recommend incremental fixes. But if you treat a mid-term review as a static checkpoint, you miss its real value. A mid-term review is the last credible moment to change direction before a programme locks itself into its own logic.

So the way I approach it is different.

I start by mapping the system not as a logframe (logical framework), but as a pathway. Evidence to decision. Decision to pipeline. Pipeline to finance. Finance to implementation. Implementation to outcomes. And then I ask a simple question at each step: where is this moving, and where is it stuck?

Not in theory. In practice.

Where are decisions not being taken, even though evidence exists? Where are project concepts sitting without moving into investment-ready pipelines? Where is finance not flowing, even though priorities are clear? Where is implementation breaking down because legitimacy—particularly at the community level—is not secured?

This is not traditional evaluation terrain. But it is where programmes actually succeed or fail.

Once you see the system this way, foresight enters naturally. Because the next question is not “what has happened?” but “what happens next if nothing changes?” And then, “what happens under different plausible futures?”

In practical terms, that means stress-testing the system. Not through elaborate models, but through structured questioning. What happens to this financing model if public budgets contract? What happens to this delivery mechanism under extreme weather disruption? What happens to this policy if enforcement weakens after political change? You do not need perfect scenarios. You need plausible ones.

And then you bring that back into the evaluation.

Recommendations stop being generic—“strengthen capacity,” “improve coordination”—and become directional. Shift this part of the pipeline because it will not hold under foreseeable conditions. Rebalance this financing structure because it is too exposed to a single risk. Invest in this relationship or legitimacy mechanism because without it, implementation will stall regardless of technical design.

In other words, evaluation becomes less about judging the past and more about redirecting the future.

This is exactly the space I am now working in, within the Papua New Guinea (PNG) Country Package context. Here, the issue is not a lack of activity. It is that movement along the system is uneven and often invisible. Decisions do not consistently translate into pipelines. Pipelines do not consistently translate into finance. Finance does not consistently translate into implementation at scale. And underlying all of it is a critical factor that traditional evaluation often underplays: legitimacy, particularly in a context where customary land ownership defines what is possible.

If you look at this through a retrospective lens, you will produce a perfectly reasonable evaluation that does very little to change outcomes. If you look at it through a forward lens—tracking where the system is likely to stall next—you begin to see where intervention actually matters.

This is not about abandoning the OECD-DAC criteria. It is about stretching them. Relevance becomes prospective—will this remain relevant under plausible futures? Sustainability becomes conditional—under what conditions does this hold? Effectiveness becomes dynamic—not just whether outcomes were achieved, but whether the system is capable of continuing to produce them.

And perhaps most importantly, evaluation shifts function. It stops being primarily a reporting mechanism and becomes a decision-support tool.

That sounds like a small shift. It is not. It requires evaluators to be more explicit about uncertainty, more engaged with system dynamics, and more willing to step slightly outside the comfort zone of purely retrospective judgment. It also requires institutions to accept that the most useful evaluation is not always the most certain one.

If I am honest, many of the most valuable insights in my work have come from the moments where we did exactly that—where we stopped asking “what was?” and started asking “what if?” Where we followed the system forward instead of backward. Where we treated uncertainty not as something to be minimised, but as something to be worked with.

That is where evaluation becomes useful again.

Because in a world that is no longer stable—and is not going to be—the question is not whether we can perfectly understand the past.

It is whether we can act, intelligently and in time, in the face of what comes next.