<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ramesh-arvind.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ramesh-arvind.github.io/" rel="alternate" type="text/html" /><updated>2026-05-05T10:56:41+00:00</updated><id>https://ramesh-arvind.github.io/feed.xml</id><title type="html">Ramesh Arvind Naagarajan</title><subtitle>Personal website of Ramesh Arvind Naagarajan, PhD researcher at the Chair of Automatic Control and System Dynamics, TU Chemnitz, working on explainable AI for control systems, mechanistic interpretability, and large language models.</subtitle><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><entry><title type="html">Our ICML 2026 paper</title><link href="https://ramesh-arvind.github.io/blog/2026/icml-2026-acceptance/" rel="alternate" type="text/html" title="Our ICML 2026 paper" /><published>2026-05-05T00:00:00+00:00</published><updated>2026-05-05T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2026/icml-2026-acceptance</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2026/icml-2026-acceptance/"><![CDATA[<p>Quick note that our paper, Hierarchical Causal Abduction: A Foundation Framework for Explainable Model Predictive Control, has been accepted at ICML 2026. The conference is in Seoul, at COEX, from the 6th to the 11th of July 2026. The paper is joint work with Zühal Wagner and my supervisor, Prof. Dr. Stefan Streif, at the Professorship of Automatic Control and System Dynamics, TU Chemnitz.</p>

<p><img src="/assets/img/hca-icml2026.png" alt="Hierarchical Causal Abduction: a one-figure summary of the framework, the three evidence streams, and the headline results." /></p>

<p>The figure above is the one-page version of the paper. The black-box MPC controller on the top left is the problem, the operators next to it are the audience, and the three panels in the middle are the framework. Panel one is the physics and domain knowledge encoded as a graph. Panel two is the optimiser’s own KKT structure used as evidence about which constraints actually drove the action. Panel three is the temporal causal graph learned from recent data with PCMCI. The hierarchical reasoning engine in the centre is the part that combines the three streams into a single auditable chain. The bottom row is what comes out the other side, a 53 percent improvement over LIME, validation across three domains, and an expert clarity score of 4.3 out of 5. The before-and-after on the right is the part that matters most to me, an operator going from “why did heating activate” to a forward-looking explanation about preventing a constraint violation two hours from now.</p>

<p>ICML 2026 received 23,918 submissions and accepted 6,352 of them, which puts the acceptance rate around 26.6 percent. I am genuinely thankful that this work made it through. The community of reviewers around explainable control at top ML venues is small and rigorous, and getting useful feedback there is itself a privilege.</p>

<p>The short version of the paper. We propose a framework that builds explanations for model predictive control by combining three things, a physics knowledge graph that captures what the operator already knows, the KKT structure of the controller’s own optimisation as evidence about which constraints actually drove the action, and a temporal causal graph learned from recent operating data. A hierarchical causal abduction procedure puts the three together and produces explanations that are auditable at three different levels, the binding constraint, the physical mechanism, and the data-grounded check. We test it on greenhouse climate, building HVAC, and chemical process control, and we improve explanation accuracy by 53 percent over LIME on our benchmark.</p>

<p>There is a longer companion post on this blog that walks through the framework section by section, including the failure modes the design was meant to block and the limitations we are still working on. If you came here for the technical version, that one is the right next click.</p>

<p>I will be at ICML in Seoul. If you work on explainable control, causality, mechanistic interpretability of domain-adapted models, or trustworthy AI for safety-critical systems, please come and find me. Conferences are mostly hallway conversations for me, and the hallway is where the most useful arguments tend to happen.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="ICML" /><category term="Explainable AI" /><category term="Announcement" /><summary type="html"><![CDATA[A short note on the acceptance, what the paper is about, and what comes next.]]></summary></entry><entry><title type="html">Reading constraints inside a controller’s language model</title><link href="https://ramesh-arvind.github.io/blog/2026/mechinterp-reading-constraints/" rel="alternate" type="text/html" title="Reading constraints inside a controller’s language model" /><published>2026-04-15T00:00:00+00:00</published><updated>2026-04-15T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2026/mechinterp-reading-constraints</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2026/mechinterp-reading-constraints/"><![CDATA[<p>The earlier post on this blog, the mechanistic interpretability primer, ended on a promise. I said the techniques transfer cleanly to any transformer, including ones that have been adapted to reason about constrained control problems, and that I would write up what we actually find when we run the toolkit on such a model. This is that write-up. It is deliberately concrete and a little narrow, because the most useful thing I can offer here is a worked example, not a survey.</p>

<p>The setup, briefly. We take an open-weights base language model, fine-tune it on a curated corpus of control-theory papers, MPC textbooks, and worked examples of constrained optimisation, and use the resulting model as a reasoning interface inside the explainer described in our ICML 2026 paper. The model never replaces the optimiser. It reads the optimiser’s artefacts, like the active constraints and the multipliers, and turns them into language an operator can interrogate. The question of this post is what is actually inside that model when it does that job.</p>

<h2 id="three-questions-worth-asking">Three questions worth asking</h2>

<p>The mechinterp toolkit is most useful when the question you bring to it is precise. Vague questions like “is the model interpretable” rarely produce useful answers. The three questions I keep coming back to in our setting are these.</p>

<p>Where does the model represent the notion of a constraint being binding. This is a binary distinction the optimiser already produces, and we want to know whether the language model has internalised it as a recognisable structure or whether it is reasoning about active constraints case by case. If there is a clean binding-or-not feature, we can use it as a probe everywhere we need to. If there is not, we are doing something more like prompt engineering than interpretation.</p>

<p>Where does the model represent the prediction horizon. MPC is fundamentally about acting now to prevent something later, and a faithful explanation of an MPC action almost always reaches across time. We want to know whether the language model carries any internal sense of “this is a stage-k consideration” or whether everything collapses into one undifferentiated “future.”</p>

<p>Where does the model store the difference between an objective term and a constraint. From the operator’s point of view, these are the same kind of thing, a number you push around. From the optimiser’s point of view, they are very different. A faithful explainer needs to keep them distinct, and we want to know whether the language model is doing that automatically or only when explicitly prompted.</p>

<h2 id="what-we-find-in-plain-terms">What we find, in plain terms</h2>

<p>For the first question, the answer is mostly yes. Sparse autoencoders trained on the residual stream at middle layers surface features that fire on tokens describing binding constraints across very different surface forms. The same feature lights up on phrases like “the cooling capacity limit is hit,” “the ventilation valve is saturated,” and the more textbook-style “the inequality multiplier is positive.” Ablating this feature degrades the explanations in a specific way, the model still names the constraint correctly but loses the language of it being binding. That pattern of degradation is the kind of evidence the field treats as compelling.</p>
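
<p>For readers who want the mechanics, the ablation above is a small amount of code once you have the SAE weights. The sketch below is deliberately generic: the weights are random stand-ins for a trained autoencoder and the feature index is a placeholder, but the subtract-one-feature’s-contribution operation has exactly this shape.</p>

<pre><code class="language-python">import torch

def sae_features(resid, W_enc, b_enc):
    # Feature activations of a standard ReLU sparse autoencoder.
    return torch.relu(resid @ W_enc + b_enc)                      # [seq, d_sae]

def ablate_feature(resid, W_enc, b_enc, W_dec, feature_id):
    # Remove one feature's contribution from the residual stream.
    acts = sae_features(resid, W_enc, b_enc)                      # [seq, d_sae]
    contribution = acts[:, feature_id:feature_id + 1] @ W_dec[feature_id:feature_id + 1, :]
    return resid - contribution                                   # [seq, d_model]

# Toy shapes; in practice these come from the trained SAE and the hooked model.
d_model, d_sae, seq = 512, 4096, 32
resid = torch.randn(seq, d_model)
W_enc = 0.02 * torch.randn(d_model, d_sae)
b_enc = torch.zeros(d_sae)
W_dec = 0.02 * torch.randn(d_sae, d_model)

patched = ablate_feature(resid, W_enc, b_enc, W_dec, feature_id=123)
print(patched.shape)   # torch.Size([32, 512]); feed this back in and regenerate
</code></pre>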

<p>For the second question, the answer is a soft yes. There is a cluster of features that correlate with stage indices in the prediction horizon, but the cluster is smeared and the features are not as monosemantic as the binding-constraint feature. The model has a notion of “now versus later,” but it does not seem to have crisp internal coordinates for stage-1, stage-2, stage-3, and so on. This is roughly what I expected, and it is consistent with the model having seen many descriptions of horizons but few examples of explicit per-stage reasoning.</p>

<p>For the third question, the answer is the most interesting one, because the answer is “kind of, and the failures are diagnostic.” There are features that distinguish objective contributions from constraint contributions, but they are layer-specific. In early layers the distinction is clean. In middle layers it gets confused, especially on prompts where the user has phrased a soft constraint as a cost penalty. In late layers it cleans up again, but only after the model has used the surrounding context to disambiguate. That layered trajectory is itself a finding. It tells us where in the network the real disambiguation happens, which tells us where to look first when an explanation goes wrong.</p>

<h2 id="what-this-is-good-for">What this is good for</h2>

<p>None of the above is meant to be a self-contained scientific result. The point is methodological. If you treat a domain-adapted language model as a black box that “either explains things well or it does not,” you have nothing to do when it fails except retrain it. If you treat it as a network with features and circuits that you can name, probe, and ablate, every failure becomes a localisable bug. The binding-constraint feature is robust. The horizon features are smeared. The objective-versus-constraint distinction is layer-dependent. Each of those statements is actionable in a way “the model sometimes hallucinates” is not.</p>

<p>The other thing this work is good for, and the reason it is on this blog rather than in a paper yet, is that it gives a concrete answer to a question I get a lot. The question is whether mechanistic interpretability is “ready for engineering.” My honest answer has been “yes for transformers, with caveats, and only if you ask precise questions.” The work above is the version of that answer with receipts. Sparse autoencoders, residual-stream probes, and targeted ablations are not magic, but they are tractable, and on a model that has been trained on a domain you understand, they tell you things you can use.</p>

<h2 id="what-is-next">What is next</h2>

<p>The next step is to use the binding-constraint feature as a hard signal inside the explainer itself, rather than as a diagnostic on the side. If the language model has a clean internal indicator of when a constraint is binding, we should be able to lift that indicator out and use it as a verification check on the explanations the model produces, in addition to the KKT-based check the optimiser already gives us. Two independent signals on the same property is the kind of redundancy that turns a research prototype into something a practitioner can trust.</p>
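
<p>To make the idea concrete, here is the shape of that redundancy check, with every number a placeholder. The optimiser’s side of the test is the usual complementary-slackness reading of the multiplier and the slack; the model’s side is whether the binding-constraint feature fires above a threshold. Agreement is cheap to verify, and disagreement is exactly the case you want surfaced.</p>

<pre><code class="language-python">def optimiser_says_binding(multiplier, slack, tol=1e-6):
    # KKT evidence: positive multiplier and (numerically) zero slack.
    return multiplier &gt; tol and abs(slack) &lt; tol

def model_says_binding(feature_activation, threshold=0.5):
    # Internal evidence: the SAE binding-constraint feature fires strongly.
    return feature_activation &gt; threshold

def cross_check(feature_activation, multiplier, slack):
    if model_says_binding(feature_activation) == optimiser_says_binding(multiplier, slack):
        return "agree"
    return "flag for review"   # the disagreement is the auditable, interesting case

print(cross_check(feature_activation=0.9, multiplier=0.4, slack=0.0))   # agree
</code></pre>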

<p>I will write that up when we have the numbers. The mechinterp primer remains the right starting point if you are coming to this fresh, and the <a href="/blog/2026/hierarchical-causal-abduction-walkthrough/">paper-walkthrough post</a> is the right next read if you want to see how this language-model interpretability work fits into the larger framework.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Interpretability" /><category term="LLM" /><category term="Control" /><category term="Mechanistic Interpretability" /><summary type="html"><![CDATA[A follow-up to the mechanistic interpretability primer, focused on what the standard toolkit actually finds when applied to a domain-adapted LLM that reasons about constrained control problems.]]></summary></entry><entry><title type="html">Evaluating explanations: why LIME and SHAP are not enough</title><link href="https://ramesh-arvind.github.io/blog/2025/evaluating-explanations-beyond-shap/" rel="alternate" type="text/html" title="Evaluating explanations: why LIME and SHAP are not enough" /><published>2025-12-19T00:00:00+00:00</published><updated>2025-12-19T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/evaluating-explanations-beyond-shap</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/evaluating-explanations-beyond-shap/"><![CDATA[<p>LIME and SHAP have done the field a great service. They made
explainability operational. Before them, “the model is interpretable”
was a vibe. After them, it was a number you could put in a paper.
That was real progress.</p>

<p>The cost of that progress is that we now treat the number as the
goal. A SHAP value is a measure of marginal contribution under a
specific game-theoretic assumption. It is one slice of what an
explanation could mean. Used as the only slice, it misleads.</p>

<p>Three axes are missing from the SHAP-shaped picture, and any serious
evaluation of an explanation system has to address all three.</p>

<p><strong>Faithfulness.</strong> Does the explanation describe what the model
actually did, or does it describe what a simpler surrogate model
would have done in its place? LIME explicitly fits a local linear
surrogate, which is honest but limits the faithfulness ceiling. SHAP
estimates contributions under feature-coalition reasoning, which
makes specific assumptions about feature independence that are
routinely violated. For a deployed control system the right
faithfulness test is operational, not statistical. Take the
explanation, perturb the input in the way the explanation says
matters, and check that the model’s output changes the way the
explanation predicts. If it does not, the explanation is not
faithful, regardless of what its SHAP values say.</p>
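
<p>A minimal sketch of that operational test, assuming the explanation can be reduced to “this feature matters, in this direction.” The toy model and the feature indices are illustrative; the point is that the test is a perturbation and a sign check, nothing more.</p>

<pre><code class="language-python">import numpy as np

def faithfulness_check(model, x, feature_idx, expected_sign, delta=0.1):
    # Perturb the feature the explanation names; check the output moves
    # in the direction the explanation predicts.
    x_perturbed = x.copy()
    x_perturbed[feature_idx] += delta
    change = model(x_perturbed) - model(x)
    return np.sign(change) == expected_sign

# Toy model: the output depends on feature 0 and ignores feature 1.
toy_model = lambda z: 2.0 * z[0] + 0.0 * z[1]
x = np.array([1.0, 5.0])

print(faithfulness_check(toy_model, x, feature_idx=0, expected_sign=1))   # True
print(faithfulness_check(toy_model, x, feature_idx=1, expected_sign=1))   # False
</code></pre>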

<p><strong>Stability.</strong> Does a small change in input produce a small change
in explanation? In safety-critical settings, an explanation that
flips between contradictory stories on neighbouring inputs is
dangerous. It trains operators to ignore the explanation entirely.
Stability is testable. Sample neighbouring inputs, generate
explanations, and measure the distance between explanations under a
sensible metric. If the distance is high while the model output
barely moved, the explanation system is unstable, and you have a
problem.</p>
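
<p>And the corresponding sketch for stability, again with stand-ins for the real model and attribution method. The diagnostic is the ratio of the two numbers it returns: large explanation movement against small output movement is the warning sign.</p>

<pre><code class="language-python">import numpy as np

def stability_probe(model, explain, x, n_neighbours=20, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    base_expl = explain(model, x)
    expl_dist, out_dist = [], []
    for _ in range(n_neighbours):
        x_nb = x + eps * rng.standard_normal(x.shape)
        expl_dist.append(np.linalg.norm(explain(model, x_nb) - base_expl))
        out_dist.append(abs(model(x_nb) - model(x)))
    return float(np.mean(expl_dist)), float(np.mean(out_dist))

# Toy linear model with a gradient-times-input style attribution.
toy_model = lambda z: float(z @ np.array([1.0, -2.0]))
toy_explain = lambda m, z: z * np.array([1.0, -2.0])

print(stability_probe(toy_model, toy_explain, np.array([0.5, 0.3])))
</code></pre>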

<p><strong>Operator utility.</strong> Does the explanation actually help a human do
their job better, or does it just feel informative? This is the test
that gets skipped most often, because it is the most expensive. It
needs a study with real operators, real tasks, and a measurable
outcome. Decision time. Override rate. Detection of induced faults.
The literature on this is thin and mostly comes from medical AI.
Control systems need their own version of this work, and not enough
of it exists yet.</p>

<p>The methods we developed in our 2025 papers were tested on the first
two axes during paper review. The third axis, operator utility, is
where I want the next chunk of work to live. It is harder, slower,
and less publishable per unit time. It is also the only axis that
matters when the system is actually running in a glasshouse with a
real grower making real decisions.</p>

<p>A short note on tooling. SHAP is not the enemy. I still use it as one
diagnostic among several, and I would still default to it for tabular
models in low-stakes settings. The mistake is to treat it as the only
diagnostic, the way too many papers do. The right stance is closer to
how a control engineer thinks about Bode plots: useful, well
understood, decisive in some questions, silent on others. You would
not certify a controller on a Bode plot alone. You should not certify
an explanation system on SHAP alone.</p>

<p>What I would like to see in the next year of explainability papers,
in roughly priority order. More operator-in-the-loop studies. More
stability analyses, especially in time-series and control settings.
More work on explanation methods that are faithful by construction,
like our optimiser-grounded approach, instead of faithful by
post-hoc approximation. And less benchmarking against MNIST.</p>

<p>The benchmark for an explanation is whether it changes a human
decision in the right direction. Everything else is a proxy.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Explainability" /><category term="Evaluation" /><category term="SHAP" /><summary type="html"><![CDATA[Faithfulness, stability, and operator utility, three axes the standard tools do not measure.]]></summary></entry><entry><title type="html">Symbolic constraints, optimisation, and what LLMs miss</title><link href="https://ramesh-arvind.github.io/blog/2025/symbolic-constraints-llms-miss/" rel="alternate" type="text/html" title="Symbolic constraints, optimisation, and what LLMs miss" /><published>2025-11-28T00:00:00+00:00</published><updated>2025-11-28T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/symbolic-constraints-llms-miss</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/symbolic-constraints-llms-miss/"><![CDATA[<p>Ask a modern frontier model to state the Karush-Kuhn-Tucker
conditions for a constrained optimisation problem. It will give you
a clean answer. Stationarity, primal feasibility, dual feasibility,
complementary slackness. It can recite the textbook. Ask it to
identify the active set in a small numerical example, and it
sometimes gets that right too.</p>

<p>Now embed the same problem inside a control loop, give the model the
sensor readings and the cost weights and the constraint bounds, and
ask it to predict which constraint will become binding at the next
sample period. The accuracy collapses.</p>

<p>This is not a knowledge gap. The model knows the math. It is a
reasoning gap, and it is structural.</p>

<p>Optimisation reasoning has a particular shape that does not match how
language models compute. Three patterns make this concrete.</p>

<p>The first pattern is global feasibility. To know whether a candidate
solution is feasible, you have to evaluate every constraint, not just the
ones that look relevant. Language models are very good at attending to
the most relevant tokens, which is the wrong attention pattern for
feasibility checking. They will quietly skip over a constraint that
looks numerically uninteresting and miss exactly the one that
matters.</p>
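
<p>In code, the correct behaviour is almost embarrassingly simple, which is part of the point: feasibility is a conjunction over all constraints, not a ranking of the interesting-looking ones. The constraint functions below are placeholders, written in the standard g(x) ≤ 0 form.</p>

<pre><code class="language-python">def feasible(x, constraint_fns, tol=1e-8):
    # Every g_i(x) must satisfy g_i(x) &lt;= tol; there is no "relevant subset".
    return all(g(x) &lt;= tol for g in constraint_fns)

# Toy constraints on a two-dimensional decision.
constraint_fns = [
    lambda x: x[0] + x[1] - 4.0,    # shared capacity
    lambda x: -x[0],                # x0 nonnegative
    lambda x: -x[1],                # x1 nonnegative
]

print(feasible([2.5, 1.5], constraint_fns))   # True
print(feasible([3.0, 2.0], constraint_fns))   # False: the capacity limit is violated
</code></pre>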

<p>The second pattern is the active set. In a constrained optimum, only
some constraints are tight. Identifying the active set is the central
combinatorial step in QP and NLP solvers, and there are mature
algorithms for it. Asking an LLM to do this implicitly, by reasoning
through it in natural language, is asking it to simulate a solver. It
can do this for very small problems. It does not scale. The error
mode is interesting: the model picks a plausible-looking active set,
then writes a confident justification for it, regardless of whether
the active set is actually correct.</p>

<p>The third pattern is the duality argument. KKT logic flows in both
directions. From the primal you can reason about the dual, and the
dual gives you the shadow prices that explain the primal. Language
models tend to flatten this into a single direction. They will
explain a primal decision in primal terms (we did X because the cost
of X was lowest) and skip the dual reasoning (we did X because the
shadow price on the constraint that would have ruled out Y was
higher than the shadow price on the constraint that would have ruled
out X). The dual story is often the more useful one for an operator,
and it is the one most likely to be lost.</p>

<p>These three patterns are not unique to LLMs. They show up in any
system that tries to reason about optimisation without actually
solving the optimisation. The difference is that a numerical solver
will tell you when it cannot find a feasible point. An LLM will
generate a fluent paragraph that sounds like it found one.</p>

<p>The engineering response, in our work, is to never ask the LLM to do
the optimisation reasoning by itself. The optimiser does the
optimisation. The LLM reads the optimiser’s output, the active set,
the dual variables, the slack values, and translates that into a
human-readable explanation. The LLM is a translator, not a solver.</p>
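
<p>Here is what that split looks like in miniature, using cvxpy on a toy quadratic programme. The problem itself is meaningless; the extraction pattern is the point. The solver produces the active set, the multipliers, and the slacks, and those are the artefacts the language layer is given to translate.</p>

<pre><code class="language-python">import cvxpy as cp
import numpy as np

x = cp.Variable(2)
cost = cp.sum_squares(x - np.array([3.0, 2.0]))   # toy objective: get close to (3, 2)

# Constraints in g_i(x) &lt;= 0 form, so slack_i = -g_i(x*) at the solution.
g = [x[0] + x[1] - 4.0,    # shared capacity (this one will bind)
     -x[0],                # x0 nonnegative
     -x[1]]                # x1 nonnegative
constraints = [gi &lt;= 0 for gi in g]
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()

for name, gi, con in zip(["capacity", "x0_bound", "x1_bound"], g, constraints):
    slack = -float(gi.value)
    dual = float(con.dual_value)
    binding = slack &lt; 1e-6 and dual &gt; 1e-6     # complementary slackness, numerically
    print(f"{name}: slack={slack:.3f}, multiplier={dual:.3f}, binding={binding}")
# Expected: the capacity constraint binds with multiplier 1.0; the bounds do not.
</code></pre>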

<p>Once you accept that split, a lot of the disappointment with LLMs in
optimisation contexts goes away. The model is being used inside its
competence, on the linguistic and compositional side. The numerical
heavy lifting stays where it has always been good, in the solver.</p>

<p>The interesting research question that remains is the one in the
middle. Can a language model, given access to a solver as a tool,
reliably decide when to call the solver, what problem to pose to it,
and how to interpret its output? That is a non-trivial reasoning
problem in its own right, and it is closer to where the field is
going than the “just prompt the model harder” line.</p>

<p>I am cautiously optimistic about that direction. I am not optimistic
about LLMs as standalone optimisers, and I do not think any amount of
scaling alone fixes the three patterns above.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Optimisation" /><category term="LLM" /><category term="Control" /><summary type="html"><![CDATA[Why a model that can quote the textbook on KKT conditions still cannot reliably reason about them.]]></summary></entry><entry><title type="html">From causal discovery to causal reasoning</title><link href="https://ramesh-arvind.github.io/blog/2025/causal-discovery-to-causal-reasoning/" rel="alternate" type="text/html" title="From causal discovery to causal reasoning" /><published>2025-11-07T00:00:00+00:00</published><updated>2025-11-07T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/causal-discovery-to-causal-reasoning</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/causal-discovery-to-causal-reasoning/"><![CDATA[<p>Causal discovery is having a moment. Constraint-based methods like PC
and FCI, score-based methods like NOTEARS, and time-series methods like
PCMCI and Granger-style approaches are all in active use. Given
enough data and a tolerable set of assumptions, you can recover a
plausible directed graph over your variables. The literature treats
that graph as the output.</p>

<p>In a control setting, the graph is the input. The output is a
decision an operator will live with.</p>

<p>That gap, between having a causal graph and using it to reason, is
where most of the engineering effort actually goes, and it is
underexposed in the literature.</p>

<p>Three problems show up the moment you try to deploy a discovered
graph in a working system.</p>

<p>The first problem is that the graph is wrong. Not catastrophically
wrong, just wrong in the way real models are wrong. An edge points
the wrong direction. Two variables that should be connected are not.
A latent confounder shows up as a spurious direct link. If
you take the graph at face value and feed it to a downstream
reasoning system, the system will produce confidently wrong answers.
The honest fix is to never let the graph stand alone. It needs a
domain expert in the loop, and the system has to make it cheap for
that expert to inspect, edit, and version the graph.</p>

<p>The second problem is that the graph is static and the world is not.
Greenhouse dynamics in summer are not the same as in winter. The
right causal structure for tomato in flowering is not the same as in
fruiting. A single discovered graph collapses time-varying causal
structure into one frozen picture. You can address this with regime
detection, with windowed discovery, with hierarchical graphs that
distinguish slow-varying structure from fast-varying parameters. None
of these fixes are free. They all add complexity and they all need
their own validation story.</p>

<p>The third problem, the deepest one, is that even a correct,
up-to-date graph does not tell you what to do. It tells you how
variables relate. The leap from “ventilation flap causes humidity”
to “should I open the ventilation flap right now” is a planning
problem, not a discovery problem. The graph is a constraint on the
planner, not a substitute for it.</p>

<p>Our 2025 <em>Frontiers in Agronomy</em> paper sits in this gap. We use
constraint-based causal discovery on greenhouse sensor streams to
recover a graph over climate, plant, and control variables. Then we
expose that graph to an LLM as a structured reference. The LLM does
not do causal discovery. It reads the discovered graph and uses it as
a scaffold for plain-language recommendations that a grower can
follow.</p>
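
<p>The hand-off between the two components is mundane in the best sense. A minimal sketch, with made-up edges rather than actual PCMCI output: the discovered graph is kept as explicit lagged triples and serialised into a reference block the prompt includes, so the model reads relations rather than inventing them.</p>

<pre><code class="language-python"># Illustrative lagged edges; a real run would produce these from data.
discovered_edges = [
    ("ventilation_opening", "humidity", 1),          # lag in sample periods
    ("heating_power", "air_temperature", 1),
    ("air_temperature", "transpiration", 2),
    ("transpiration", "humidity", 1),
]

def graph_as_reference(edges):
    lines = ["Causal relations discovered from recent operating data:"]
    for cause, effect, lag in edges:
        lines.append(f"- {cause} affects {effect} after {lag} sample period(s)")
    return "\n".join(lines)

print(graph_as_reference(discovered_edges))
</code></pre>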

<p>The split matters. The causal-discovery algorithm is good at finding
relations from data. It is not good at deciding what to do with them.
The LLM is good at composing readable, contextual recommendations.
It is not good at separating correlation from causation. Putting them
in series, with the graph as the bridge, lets each component do what
it is actually good at.</p>

<p>There is an interesting open question hiding in this setup. How
should the LLM handle disagreement with the graph? In some queries
the model’s pretraining tells it one thing and the discovered graph
tells it another. The conservative answer is “always defer to the
graph.” The more useful answer is probably “flag the disagreement,
explain both views, let the operator decide.” The right policy is
not obvious and we are still learning it.</p>

<p>The bigger picture. Causal discovery alone is not the goal. It is
one tool in a longer pipeline that ends with a human making a
decision. The papers that move the field forward in the next few
years will, I think, be the ones that take that pipeline seriously
end to end.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Causality" /><category term="LLM" /><category term="Reasoning" /><summary type="html"><![CDATA[Discovering a graph is the easy half. Reasoning over it is the rest of the problem.]]></summary></entry><entry><title type="html">Domain knowledge graphs as scaffolds for LLM reasoning</title><link href="https://ramesh-arvind.github.io/blog/2025/knowledge-graphs-llm-scaffolds/" rel="alternate" type="text/html" title="Domain knowledge graphs as scaffolds for LLM reasoning" /><published>2025-10-17T00:00:00+00:00</published><updated>2025-10-17T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/knowledge-graphs-llm-scaffolds</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/knowledge-graphs-llm-scaffolds/"><![CDATA[<p>Retrieval-augmented generation is the default answer to “how do I make
an LLM stop hallucinating.” Index your documents, retrieve the top-k
chunks, stuff them into the context window, and let the model
generate. It works surprisingly well on broad domains, customer
support, legal search, internal wikis. It works much less well on
narrow, technical, control-heavy domains. There is a structural
reason for this, and it points to a different design.</p>

<p>Retrieval over a corpus assumes that the right answer is somewhere in
the corpus, expressed in roughly the right words. In a narrow domain
like greenhouse climate control, the right answer is almost never
expressed in the corpus. The corpus has fragments. It has a paper on
vapour-pressure deficit, a manual on a specific climate computer, a
PhD thesis on tomato transpiration, an FAQ on dehumidification. The
operator’s actual question, “why did the controller open the
ventilation flap right now,” is a composition of those fragments, and
the composition is the hard part.</p>

<p>Knowledge graphs are good at exactly that compositional layer. A
graph is a set of entities and a set of typed relations between them.
For a greenhouse, the entities are sensors, actuators, climate
variables, plant physiology states, and constraints. The relations
are things like “ventilation flap actuator influences humidity
variable,” “humidity variable affects fungal-disease risk,”
“fungal-disease risk is constrained below threshold X for cultivar
Y.” That is a small graph, a few hundred nodes, a few thousand edges.
You can build it by hand with a domain expert in two afternoons.</p>
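
<p>To make the scale concrete, here is a toy fragment of such a graph in networkx. The nodes and relation names below are illustrative, not our actual ontology, but the whole thing really is this mundane: typed nodes, typed edges, nothing clever.</p>

<pre><code class="language-python">import networkx as nx

kg = nx.MultiDiGraph()
kg.add_node("ventilation_flap", kind="actuator")
kg.add_node("humidity", kind="climate_variable")
kg.add_node("fungal_disease_risk", kind="plant_state")
kg.add_node("humidity_upper_bound", kind="constraint")

kg.add_edge("ventilation_flap", "humidity", relation="influences")
kg.add_edge("humidity", "fungal_disease_risk", relation="affects")
kg.add_edge("humidity_upper_bound", "humidity", relation="constrains")

for u, v, data in kg.edges(data=True):
    print(u, data["relation"], v)
</code></pre>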

<p>The interesting move is what you do with the graph at inference time.</p>

<p>Naive use is bad. If you simply dump the entire graph into the
context window, you have just made the LLM read a long, structured
document, and you are back to the corpus-retrieval problem with extra
steps.</p>

<p>The right use is constrained. The graph becomes a vocabulary that the
LLM is allowed to talk about. When the model generates an
explanation, it is required to express the explanation in terms of
graph nodes and edges. Anything that does not reduce to the graph is
flagged as ungrounded. The graph is not extra context. It is a
contract about what the model is allowed to claim.</p>

<p>This is the move our 2025 <em>Smart Agricultural Technology</em> paper
makes. We pair a model predictive controller with an LLM, and the LLM
is forced to stay inside the domain knowledge graph when it explains
a control action. The controller decides what to do. The graph
decides what the explanation is allowed to say. The LLM does the
linguistic gluing.</p>

<p>Three things become easier once you do this.</p>

<p><strong>Verification.</strong> You can check that every entity and every relation
in the explanation actually exists in the graph. If the model invents
a new variable, the verifier catches it. This eliminates a whole
class of confident-sounding hallucinations.</p>
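
<p>Reusing the toy graph from the sketch above, the verifier itself is a few lines. The claim triples stand in for whatever structured output the model is prompted to emit alongside its prose; anything that does not resolve to a node and an edge comes back as ungrounded.</p>

<pre><code class="language-python">def has_relation(kg, subj, relation, obj):
    data = kg.get_edge_data(subj, obj) or {}
    return any(attrs.get("relation") == relation for attrs in data.values())

def ungrounded_claims(claims, kg):
    # claims: (subject, relation, object) triples extracted from the explanation.
    bad = []
    for subj, rel, obj in claims:
        if subj not in kg or obj not in kg or not has_relation(kg, subj, rel, obj):
            bad.append((subj, rel, obj))
    return bad

claims = [("ventilation_flap", "influences", "humidity"),       # in the graph
          ("ventilation_flap", "influences", "leaf_wetness")]   # not in the graph
print(ungrounded_claims(claims, kg))   # [('ventilation_flap', 'influences', 'leaf_wetness')]
</code></pre>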

<p><strong>Editing.</strong> When the domain expert disagrees with an explanation,
they can change the graph. They cannot easily change a 70-billion-parameter
language model. The graph gives the human a steering wheel that the
model cannot ignore.</p>

<p><strong>Cross-domain reuse.</strong> The LLM stays the same. The graph swaps. Move
from greenhouse to building HVAC and you swap the entities and
relations, you do not retrain anything.</p>

<p>The cost is real. Building the graph is the unglamorous part of the
work. It needs domain interviews, careful ontology decisions, and
upkeep as the underlying plant changes. It also caps the system’s
expressivity at whatever the graph contains. If the graph does not
have a node for “leaf wetness,” the system cannot explain in terms of
leaf wetness, even if the underlying physics involves it. That is a
feature, not a bug, in safety-critical contexts. The system fails
visibly, in the graph, where a human can see it, rather than
invisibly, inside a transformer, where they cannot.</p>

<p>The pattern generalises beyond control. Any domain where the corpus
is sparse, the variables are well-defined, and the cost of
hallucination is high, is a domain where knowledge graphs as
scaffolds beat retrieval over text. Medicine, manufacturing, energy
systems, all of them fit. The trick is having the patience to build
the graph.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Knowledge Graphs" /><category term="LLM" /><category term="Grounding" /><summary type="html"><![CDATA[Why a small, hand-built graph beats a large, retrieved corpus in narrow domains.]]></summary></entry><entry><title type="html">Mechanistic interpretability for non-NLP people, a primer</title><link href="https://ramesh-arvind.github.io/blog/2025/mechinterp-for-non-nlp/" rel="alternate" type="text/html" title="Mechanistic interpretability for non-NLP people, a primer" /><published>2025-09-26T00:00:00+00:00</published><updated>2025-09-26T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/mechinterp-for-non-nlp</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/mechinterp-for-non-nlp/"><![CDATA[<p>If you work in control, robotics, or any other field where neural
networks are used as components rather than as the whole product, the
mechanistic interpretability literature looks intimidating. There is
a vocabulary problem. Circuits, features, superposition, induction
heads, monosemanticity, sparse autoencoders, probing, patching. Each
of these is a real concept with a real reason to exist, but the way
the field talks about them assumes you have read the last three years
of papers on transformer internals.</p>

<p>This post is a translation. It is the version I wish had existed when
I started reading this literature seriously.</p>

<p>Start with the basic question. Mechanistic interpretability is not
asking “which input features mattered for this output.” That is the
SHAP question, and it has a different shape. Mechanistic
interpretability asks “which internal computations did the model
actually run to get from input to output.” It treats the network as a
program and tries to reverse-engineer the program.</p>

<p>The unit of analysis is the circuit. A circuit is a small subset of
neurons, attention heads, and connections that together implement a
recognisable computation. The classic example is the induction head,
a two-attention-head circuit in transformers that implements pattern
completion of the form “A B … A -&gt; B.” Once you know the circuit
exists, you can find it, ablate it, and watch the model fail at the
task. That is the kind of evidence the field treats as compelling.</p>

<p>Three concepts do most of the work.</p>

<p><strong>Features.</strong> A feature is a direction in activation space that
corresponds to a human-interpretable concept. “This input mentions
the Eiffel Tower.” “This token is the first one of a list.”
“Temperature is rising.” Features are the alphabet of the network’s
internal language.</p>

<p><strong>Superposition.</strong> Networks have far more features than neurons. They
solve this by storing features in overlapping linear combinations.
This is why you cannot just look at one neuron and read off “this
neuron is the temperature neuron.” It almost never works that way.
The temperature feature is spread across many neurons, and many other
features share those same neurons.</p>

<p><strong>Sparse autoencoders.</strong> The current best tool for getting around
superposition. You train a wide, sparse autoencoder on the model’s
activations and let it discover an over-complete basis. Each basis
direction is a candidate feature. With enough scale, many of those
features turn out to be human-interpretable.</p>
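
<p>The idea fits in a few lines of PyTorch. This is the generic recipe, not any particular published implementation: an over-complete ReLU dictionary trained to reconstruct activations under an L1 sparsity penalty, with shapes and coefficients that are placeholders rather than tuned values.</p>

<pre><code class="language-python">import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_sae):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)   # over-complete: d_sae is several times d_model
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, acts):
        features = torch.relu(self.enc(acts))  # candidate features, mostly zero
        return self.dec(features), features

def sae_loss(acts, recon, features, l1_coeff=1e-3):
    recon_loss = (recon - acts).pow(2).mean()  # reconstruct the activation
    sparsity = features.abs().mean()           # push most feature activations to zero
    return recon_loss + l1_coeff * sparsity

sae = SparseAutoencoder(d_model=512, d_sae=8192)
acts = torch.randn(64, 512)                    # a batch of residual-stream activations
recon, feats = sae(acts)
print(sae_loss(acts, recon, feats).item())
</code></pre>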

<p>Now the part that matters for non-NLP people.</p>

<p>Almost every mechanistic interpretability technique developed for
language models transfers to any transformer. If you have a
transformer-based controller, a transformer-based world model, a
transformer-based perception module, the same toolkit applies. You
can probe activations, find features, identify circuits, ablate them,
and check whether your causal story holds.</p>
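
<p>The cheapest entry point is the linear probe, and it is worth seeing how little machinery it needs. The sketch below uses synthetic “activations” with a property planted along one direction; with a real model you would swap in cached activations and a label you actually care about.</p>

<pre><code class="language-python">import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n = 256, 2000
direction = rng.standard_normal(d_model)              # a planted "feature direction"
X = rng.standard_normal((n, d_model))                 # stand-in for cached activations
y = (X @ direction &gt; 0).astype(int)                   # property, linearly readable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(probe.score(X_te, y_te))   # near 1.0 here; on a real model this is the finding
</code></pre>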

<p>What does not transfer cleanly is the intuition. Language models have
a token vocabulary and a discrete, compositional structure that makes
features feel natural. A controller that reads continuous sensor
inputs does not have tokens. The “alphabet” is harder to even define.
That does not mean features are absent. It means we have to do more
work to find them.</p>

<p>This is where my current work goes. I take a domain-adapted language
model that has been trained to reason about constrained control
problems, and I ask the standard mechinterp questions of it. Which
features encode the notion of a constraint being binding? Which
circuits handle the prediction horizon? Where does the model store
the difference between an objective and a constraint? Some of these
questions have promising preliminary answers. The full version is in
review and I will write about it once it is out.</p>

<p>The takeaway for now is not “go read 40 papers.” The takeaway is this.
Mechanistic interpretability is a tractable engineering discipline,
not a philosophy. It has its own tools, its own evidence standards,
and its own failure modes. If you are deploying a neural network in a
loop with humans or hardware, knowing what is inside it is a
reasonable engineering expectation, not a research luxury.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Interpretability" /><category term="LLM" /><category term="Primer" /><summary type="html"><![CDATA[What the interpretability community has actually figured out, translated for engineers.]]></summary></entry><entry><title type="html">Why LLMs need to explain themselves: a control-systems perspective</title><link href="https://ramesh-arvind.github.io/blog/2025/llms-explain-themselves-control/" rel="alternate" type="text/html" title="Why LLMs need to explain themselves: a control-systems perspective" /><published>2025-09-05T00:00:00+00:00</published><updated>2025-09-05T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/llms-explain-themselves-control</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/llms-explain-themselves-control/"><![CDATA[<p>There is a quiet assumption in most modern AI deployments. The model
makes a decision, the operator accepts the decision, and the
explanation, if anyone bothers, is generated afterwards by a separate
post-hoc tool. SHAP, LIME, attention maps, the usual suspects. The
explanation is treated as a kind of receipt. Optional. Cosmetic.</p>

<p>In a control system this assumption falls apart immediately.</p>

<p>A controller is not a one-shot predictor. It runs in a loop. Every
sample period it picks an action, the plant moves, sensors update, the
controller picks the next action. The grower, the operator, the plant
manager, are not external auditors who occasionally check on it. They
are inside the loop. They override it, they retune it, they switch it
off when something feels wrong. The explanation is the channel through
which the human and the controller stay in sync. If the channel is
broken, the loop is broken.</p>

<p>That single observation reframes the whole explainability problem.
Explanations stop being a UX nicety. They become a control-loop
requirement, with the same status as observability or robustness
margins.</p>

<p>A few consequences follow.</p>

<p>First, latency matters. A 30-second explanation is useless when the
sample period is one minute. The explanation has to live on the same
timescale as the decision, otherwise the operator falls back to the
old habit of trusting their gut and ignoring the controller.</p>

<p>Second, faithfulness matters more than fluency. A confident,
plausible-sounding explanation that does not actually reflect what the
optimiser did is worse than no explanation at all. It teaches the
operator a wrong mental model of the plant. Every time the LLM smooths
over a constraint that was actually binding, it widens the gap between
the human’s mental model and the real system. That gap is where
accidents happen.</p>

<p>Third, the explanation has to be grounded in something. Free-form text
out of a base LLM is not grounded in anything except its training
corpus. In a control loop the natural grounding is the optimiser
itself: the constraints, the cost terms, the active set, the
prediction horizon, the reference signal. The LLM should be reading
from those, not from a vibe.</p>

<p>Fourth, the explanation should be testable. If a controller says “I
turned the heater down because the predicted humidity was about to hit
the upper bound,” that statement is either true or false in the
optimisation problem. We can check it. Explanations that are
not checkable are not engineering artefacts, they are decoration.</p>
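
<p>Checkable means something like this. The claim in the example above names a constraint, a predicted variable, and an action, so the test is a comparison against the optimiser’s own predicted trajectory and bounds. The numbers below are placeholders, but the shape of the check is the whole point.</p>

<pre><code class="language-python">import numpy as np

def claim_holds(predicted_humidity, humidity_upper_bound, heater_now, heater_prev,
                margin=0.02):
    near_bound = np.max(predicted_humidity) &gt;= humidity_upper_bound - margin
    heater_reduced = heater_now &lt; heater_prev
    return near_bound and heater_reduced

predicted_humidity = np.array([0.78, 0.82, 0.86, 0.89])   # relative humidity over the horizon
print(claim_holds(predicted_humidity, humidity_upper_bound=0.90,
                  heater_now=0.3, heater_prev=0.6))        # True: the claim is consistent
</code></pre>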

<p>When you stack those four requirements, latency, faithfulness,
grounding, and testability, you end up in a very specific design
corner. Post-hoc methods like SHAP cannot satisfy faithfulness because
they approximate a different model. Pure prompt-engineered LLMs cannot
satisfy grounding because they do not have privileged access to the
optimiser. Attention maps cannot satisfy testability because there is
no map from “this attention head lit up” to “this constraint was
binding.”</p>

<p>What does fit is a tighter coupling: an LLM that reads structured
state from the optimiser, a domain knowledge graph that constrains
which entities and relations the LLM is allowed to talk about, and a
verifier that checks every generated explanation back against the
optimisation problem before it is shown to a human.</p>

<p>That stack is what our 2025 <em>Smart Agricultural Technology</em> paper
prototypes. The greenhouse is incidental. The same stack would apply
to any plant where decisions are made by an optimiser and consumed by
a human. Building HVAC, district heating, autonomous trains,
fuel-cell stacks, all of them have the same shape.</p>

<p>The deeper claim is this. We have spent ten years arguing about
whether neural networks should be allowed near safety-critical
control. The right question turns out to be different. The question
is whether they can stay in the loop with humans, not whether they
can replace them. And that question is an explainability question
first.</p>

<p>Most of what comes next on this blog is about the engineering of that
loop.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Explainability" /><category term="Control" /><category term="LLM" /><summary type="html"><![CDATA[Explanations are not a UX feature. They are a control-loop requirement.]]></summary></entry><entry><title type="html">How we made greenhouse controllers explain themselves</title><link href="https://ramesh-arvind.github.io/blog/2025/explainable-greenhouse-control/" rel="alternate" type="text/html" title="How we made greenhouse controllers explain themselves" /><published>2025-08-15T00:00:00+00:00</published><updated>2025-08-15T00:00:00+00:00</updated><id>https://ramesh-arvind.github.io/blog/2025/explainable-greenhouse-control</id><content type="html" xml:base="https://ramesh-arvind.github.io/blog/2025/explainable-greenhouse-control/"><![CDATA[<p>If you have ever stood in front of an industrial controller and asked
“why did you just do that?”, you already know the problem this post is about.</p>

<p>Modern greenhouses run on model predictive control. The controller looks a few
hours into the future, predicts what the plants and the building will do under
different ventilation, lighting, irrigation, and CO₂ choices, and picks the
sequence of actions that minimizes a cost function. It is genuinely good at
its job. The trouble is that the cost function does not speak English, and the
grower does.</p>

<p>This is the gap our two 2025 papers were trying to close, from two different
sides.</p>

<h2 id="paper-one-read-the-setpoint-before-you-criticize-the-controller">Paper one: read the setpoint before you criticize the controller</h2>

<p>Before you can explain a controller’s actions, you have to understand what it
was being asked to do. In greenhouses, that “ask” is a reference trajectory
for temperature, humidity, and CO₂ that shifts across the day and the season.
These trajectories are not flat. They have ramps, dwell periods, plateaus,
diurnal cycles, slow seasonal drifts, and the occasional anomaly when someone
opened a vent at the wrong time.</p>

<p>Our <em>Frontiers in Agronomy</em> paper, <em>Automated analysis of reference signals</em>,
is essentially a piece of plumbing nobody had built carefully: an automated
pipeline that takes a real reference trajectory and decomposes it into the
components a human grower would describe if you asked them to. Diurnal pattern
here, weekly trend there, this hour is the ramp into night setback, that hour
is a recovery from a humidity excursion. The output is a structured,
operator-readable description of the setpoint, not a black-box model of it.</p>
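
<p>To give a flavour of that first decomposition step, here is a deliberately simplified stand-in using a classical seasonal decomposition on a synthetic setpoint. It is not the paper’s pipeline, but it shows the kind of separation the pipeline starts from: diurnal pattern, slow drift, and a residual where the excursions end up.</p>

<pre><code class="language-python">import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

samples_per_day, days = 96, 14        # 15-minute samples over two weeks
t = np.arange(samples_per_day * days)
setpoint = (20.0
            + 3.0 * np.sin(2 * np.pi * t / samples_per_day)   # diurnal cycle
            + 0.05 * t / samples_per_day                       # slow drift
            + np.where(t == 500, 2.5, 0.0))                    # one anomalous excursion

parts = seasonal_decompose(pd.Series(setpoint), model="additive", period=samples_per_day)

# parts.seasonal, parts.trend and parts.resid are the operator-readable components;
# the excursion around t = 500 lands in the residual, not in the diurnal pattern.
print(float(parts.resid.dropna().abs().max()))
</code></pre>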

<p>The reason this matters is that any explanation of controller behaviour is
only as good as the description of what the controller was being asked to
track. Without this layer, you end up explaining noise. With it, you can have
a real conversation about whether the setpoint design itself was reasonable.</p>

<p><a href="https://doi.org/10.3389/fagro.2025.1536998">Read the paper.</a></p>

<h2 id="paper-two-let-the-grower-talk-to-the-controller">Paper two: let the grower talk to the controller</h2>

<p>Our <em>Smart Agricultural Technology</em> paper, <em>Enhancing greenhouse management
with interpretable AI</em>, is the second half of the story. Once you can describe
what the controller is being asked to do, you can build something that lets a
grower ask, in plain English, why the controller is doing what it is doing.</p>

<p>The naive way to do this is to hand the question to a large language model
and hope. That does not work for two reasons. First, the model has no idea
what is actually happening inside the optimizer. Second, even when it sounds
confident, its answers are not grounded in the controller’s own reasoning,
which means they cannot be audited.</p>

<p>What we built is a thin language layer that does three things in sequence.
It maps the operator’s natural-language question onto the structured
artefacts the optimizer already produces, things like the active constraints,
the multipliers, and the per-step contributions to the cost. It uses a domain
knowledge graph, encoding what plants, vents, lights, and humidity actually
do to each other, to constrain which explanations are even physically
plausible. And it returns the answer in the operator’s own language, with
the underlying evidence inline so a sceptical grower can dig into the numbers.</p>
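
<p>Compressed into a few lines of Python, the sequence looks roughly like this. Every name below is an illustrative placeholder rather than the paper’s actual interface; the point is the order of operations and what each stage is allowed to touch.</p>

<pre><code class="language-python">def answer_operator_question(question, optimizer_state, knowledge_graph, llm):
    # 1. Map the question onto artefacts the optimizer already produces.
    evidence = {
        "active_constraints": optimizer_state["active_constraints"],
        "multipliers": optimizer_state["multipliers"],
        "cost_terms": optimizer_state["cost_terms"],
    }
    # 2. Restrict the explanation vocabulary to the domain knowledge graph.
    allowed_entities = set(knowledge_graph.nodes)
    # 3. Let the language model phrase the answer, with the evidence attached inline.
    draft = llm(question=question, evidence=evidence, vocabulary=allowed_entities)
    return {"answer": draft, "evidence": evidence}
</code></pre>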

<p>The headline result is that the system produces explanations that match
expert annotations far better than off-the-shelf attribution methods. But the
honest reason we are happy with it is more boring: when we sat with growers
and watched them use it, they trusted it. They argued with it. They
occasionally won the argument. That is what an interpretable system should
feel like.</p>

<p><a href="https://doi.org/10.1016/j.atech.2025.101041">Read the paper.</a></p>

<h2 id="what-ties-them-together">What ties them together</h2>

<p>These are two papers, but they are really one thesis. If you want a controller
to explain itself in language a domain expert will accept, you need both an
honest description of the goal it was given and a faithful translation of the
reasoning it actually used. Either half on its own is a demo. Together, they
start to look like a real tool.</p>

<h2 id="whats-next">What’s next</h2>

<p>I am extending these ideas in two directions. One is mechanistic
interpretability for the language layer itself, treating the explainer as an
object of study, not just a wrapper. The other is moving beyond greenhouses
to controlled-environment systems with stricter safety requirements, where
the cost of an opaque decision is much higher.</p>

<p>If any of this resonates, or if you are working on something nearby, I would
genuinely like to hear from you. Email is the fastest way.</p>]]></content><author><name>Ramesh Arvind Naagarajan</name><email>ramesh.naagarajan@etit.tu-chemnitz.de</email></author><category term="Explainable AI" /><category term="Model Predictive Control" /><category term="LLM" /><category term="Greenhouse" /><summary type="html"><![CDATA[If you have ever stood in front of an industrial controller and asked “why did you just do that?”, you already know the problem this post is about.]]></summary></entry></feed>