Autonomous process discovery through reinforcement learning (RL) has achieved remarkable results, notably SynGameZero (Goettl et al., 2021), which discovers chemical process flowsheets via self-play without prior engineering knowledge. However, self-play architectures suffer from a fundamental limitation rooted in Goedel's incompleteness theorem: a single architecture cannot detect its own systematic blind spots. We propose Multi-Architecture Collision (MAC), a framework in which N agents with heterogeneous neural architectures independently solve the same process design task, then undergo pairwise collision to identify what no single agent discovered alone. We introduce the friction-optimal point (delta_opt) as a universal, domain-agnostic reward signal derived from tribological optimization theory. For N architectures, MAC produces N(N-1)/2 collision pairs, each yielding a blind spot detection. The framework requires no modification to existing RL pipelines and can wrap any self-play system as a single head within a multi-head architecture.
Keywords: Reinforcement Learning, Autonomous Process Design, Multi-Agent Systems, Blind Spot Detection, Friction Optimization, Flowsheet Synthesis
Recent advances in reinforcement learning for chemical process design have demonstrated that artificial agents can synthesize process flowsheets without relying on predefined engineering rules (Goettl et al., 2021; Goettl et al., 2023; Stops et al., 2023). The SynGameZero framework reformulates flowsheet synthesis as a two-player game where an agent learns through self-play, discovering fundamental process engineering paradigms — such as heteroazeotropic distillation — autonomously.
This line of research addresses a critical challenge: process synthesis is inherently creative work that resists formalization. Traditional approaches encode expert knowledge as rules, limiting discovery to what is already known. RL-based approaches break this ceiling by learning from interaction with process simulators.
However, self-play carries a structural limitation that has received insufficient attention: a system that plays against itself cannot detect its own systematic errors.
This is not merely a practical concern but a theoretical one. Goedel's first incompleteness theorem (1931) establishes that any consistent, effectively axiomatized formal system capable of expressing elementary arithmetic contains true statements that cannot be proved within that system. Applied by analogy to neural architectures: any single network topology induces systematic biases in its solution space that it cannot detect through self-evaluation.
Consider a concrete example: A transformer-based agent trained via self-play on distillation problems may consistently overlook membrane-based separation strategies — not because membranes are suboptimal, but because the architecture's attention mechanism encodes an implicit bias toward sequential unit operations. Self-play cannot reveal this bias because both players share it.
We call this the architectural blind spot: the set of solutions that are structurally invisible to a given network topology, regardless of training duration or data quantity.
A related challenge in autonomous discovery is data heterogeneity. When integrating data from multiple sources — genomic, imaging, spectroscopic, process — the standard approach treats heterogeneity as a problem to be solved through harmonization or embedding.
We propose the opposite view: heterogeneity is signal. The distance between two representations of the same phenomenon contains information about what each representation fails to capture. This reframing transforms the data integration problem from "how to make sources compatible" into "how to extract insight from their incompatibility."
We present Multi-Architecture Collision (MAC), a framework that:
Theorem (Architectural Incompleteness). Let A be a neural architecture trained via self-play on task T. Let S_A denote the set of solutions to T discoverable by A. Then there exist solutions s to T such that s is not in S_A, and A cannot determine that s is not in S_A.
Informal proof. The architecture A induces an inductive bias I_A that constrains the hypothesis space. Self-play training explores S_A exhaustively (given sufficient time), but I_A is fixed. Since I_A is not itself a learnable parameter of the self-play loop, biases introduced by I_A are invisible to the training process.
Corollary. For two architectures A, B with I_A ≠ I_B, the set S_A \ S_B (solutions visible to A but not B) is generally non-empty, and likewise S_B \ S_A. Whenever both differences are non-empty, the union S_A ∪ S_B ∪ (S_A × S_B) strictly contains both individual solution sets.
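The corollary can be illustrated with plain set operations. The sketch below is a toy, assuming each solution set is a finite set of labels; the flowsheet labels and the `blind_spots` helper are hypothetical and are not part of the framework as stated.

```python
def blind_spots(s_a: set[str], s_b: set[str]) -> dict[str, set[str]]:
    """Return what each architecture misses relative to the other."""
    return {
        "missed_by_a": s_b - s_a,  # S_B minus S_A: visible to B only
        "missed_by_b": s_a - s_b,  # S_A minus S_B: visible to A only
    }


# Hypothetical solution sets for two heads solving the same task.
s_a = {"distillation", "heteroazeotropic_distillation"}
s_b = {"distillation", "membrane_separation"}

spots = blind_spots(s_a, s_b)
union = s_a | s_b  # strictly larger than either set when both differences are non-empty
```

Here each head contributes one solution the other lacks, so the union is a strict superset of both `s_a` and `s_b`.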
In tribology, the Stribeck curve describes friction as a function of operating parameters. Every tribological system has a Stribeck minimum — the point of minimal friction where the system operates most efficiently.
We observe that process optimization problems universally exhibit this structure:
Friction(x) = F_boundary(x) + F_viscous(x)
F_boundary(x) → ∞ as x → 0 (insufficient separation)
F_viscous(x) → ∞ as x → ∞ (excessive energy input)
delta_opt = argmin_x Friction(x) (optimal operating point)
| Process Type | F_boundary (too little) | F_viscous (too much) | delta_opt |
|---|---|---|---|
| Distillation | Impurity (low reflux) | Energy waste (high reflux) | Optimal reflux ratio |
| Reaction | Low conversion (low T) | Side products (high T) | Optimal temperature |
| Extraction | Low yield (few stages) | Solvent cost (many stages) | Optimal stage count |
| Crystallization | Amorphous product (fast) | Time cost (slow) | Optimal cooling rate |
| Membrane sep. | Low flux (low dP) | Fouling (high dP) | Optimal pressure drop |
| Enzyme kinetics | Substrate limitation | Inhibition | Optimal substrate concentration |
The key insight: delta_opt is domain-agnostic. It depends only on the existence of two competing loss terms — which every real process has. This makes delta_opt a universal reward signal.
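As a worked illustration of the two-term structure, the sketch below assumes a specific toy form, F_boundary(x) = a/x and F_viscous(x) = b·x, which is not given in the text but has the stated limiting behaviour. For this form the minimum is known analytically, delta_opt = sqrt(a/b), so a numerical search can be checked against it. The function names and the golden-section search are illustrative choices.

```python
import math


def friction(x: float, a: float, b: float) -> float:
    """Toy two-term cost: a/x diverges as x -> 0, b*x diverges as x -> infinity."""
    return a / x + b * x


def delta_opt(a: float, b: float, lo: float = 1e-6, hi: float = 1e6,
              tol: float = 1e-9) -> float:
    """Golden-section search for the minimiser of the (unimodal) friction curve."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if friction(c, a, b) < friction(d, a, b):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


x_star = delta_opt(a=4.0, b=1.0)  # analytic optimum for this toy form: sqrt(4/1) = 2
```

Any other pair of competing terms with a single interior minimum could be substituted for `friction` without changing the search.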
                +------------------+
                |      Task T      |
                +--------+---------+
                         |
          +--------------+--------------+
          |              |              |
    +-----v-----+  +-----v-----+  +-----v-----+
    |  Head A   |  |  Head B   |  |  Head C   |
    | (e.g. GNN |  | (e.g.     |  | (e.g.     |
    |  + MCTS)  |  |  Transf.) |  |  MLP+RL)  |
    +-----+-----+  +-----+-----+  +-----+-----+
          |              |              |
          +--------------+--------------+
                         |
                +--------v---------+
                | Collision Engine |
                |  N(N-1)/2 pairs  |
                +--------+---------+
                         |
                +--------v---------+
                | delta_opt Scorer |
                +------------------+
Each head must satisfy only two interface requirements:
No constraint on internal architecture, training method, or model family. The heterogeneity of architectures is not a weakness to be managed but the primary source of blind spot detection capability.
# Existing SynGameZero agent becomes one head
head_a = MACHead(name="SynGame", solve_fn=syngamezero.solve)
# Add heterogeneous heads
head_b = MACHead(name="Transformer", solve_fn=transformer_agent.solve)
head_c = MACHead(name="Symbolic", solve_fn=symbolic_solver.solve)
# Run MAC
mac = MultiArchitectureCollision(heads=[head_a, head_b, head_c])
result = mac.run(task=separation_task)
print(result.blind_spots) # What individual heads miss
print(result.meta_blind_spot) # What ALL approaches miss
No retraining. No architectural changes. Pure composition.
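To make the composition concrete, here is one possible minimal shape for the wrapper classes used above. Everything in it is an assumption made for illustration: a head is modelled as a function from a task to a set of solution labels, a collision is taken to be the symmetric difference of two heads' solution sets, and the meta-blind-spot is computed against a hypothetical `candidates` reference set that the original snippet does not have.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Optional


@dataclass
class MACHead:
    name: str
    solve_fn: Callable[[str], set[str]]  # assumed interface: task -> solution labels


@dataclass
class MACResult:
    blind_spots: dict[tuple[str, str], set[str]]  # one entry per head pair
    meta_blind_spot: set[str]                     # candidates no head found


class MultiArchitectureCollision:
    def __init__(self, heads: list[MACHead]):
        self.heads = heads

    def run(self, task: str, candidates: Optional[set[str]] = None) -> MACResult:
        # Each head solves the task independently.
        solutions = {h.name: h.solve_fn(task) for h in self.heads}
        # Pairwise collision: solutions seen by exactly one head of the pair.
        blind_spots = {
            (a, b): solutions[a] ^ solutions[b]
            for a, b in combinations(solutions, 2)
        }
        found = set().union(*solutions.values())
        return MACResult(blind_spots, (candidates or set()) - found)


# Stand-in heads returning fixed, made-up solution labels.
heads = [
    MACHead("SynGame", lambda task: {"distillation", "decanter"}),
    MACHead("Transformer", lambda task: {"distillation", "membrane"}),
    MACHead("Symbolic", lambda task: {"distillation"}),
]
result = MultiArchitectureCollision(heads).run(
    task="separation",
    candidates={"distillation", "decanter", "membrane", "adsorption"},
)
```

With three heads this yields three collision pairs, and `adsorption` is the only candidate that none of the stand-in heads proposes.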
| N (heads) | Collisions | Blind spots | Meta-blind-spots |
|---|---|---|---|
| 2 | 1 | 1 | 0 |
| 3 | 3 | 3 | 1 |
| 6 | 15 | 15 | 1 |
| 10 | 45 | 45 | 1 |
Three heads is the minimum for meaningful meta-analysis. MAC's overhead is modest: O(N²) calls when run sequentially (N solves, N(N-1)/2 collisions, and one meta-analysis), and O(N) time with parallel execution. For N=3: 3 solves + 3 collisions + 1 meta = 7 calls. Compared to thousands of self-play episodes, this is negligible.
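The call counts can be reproduced with a few lines of arithmetic. The helper below is a sketch; its name is hypothetical, and it assumes, following the table, that a meta-analysis call only happens once there are at least three heads.

```python
from math import comb


def mac_call_count(n_heads: int) -> dict[str, int]:
    """Count sequential calls for one MAC run with n_heads heads."""
    solves = n_heads
    collisions = comb(n_heads, 2)    # N(N-1)/2 unordered pairs
    meta = 1 if n_heads >= 3 else 0  # assumed: no meta-analysis below three heads
    return {
        "solves": solves,
        "collisions": collisions,
        "meta": meta,
        "total": solves + collisions + meta,
    }


counts = {n: mac_call_count(n) for n in (2, 3, 6, 10)}
```

For N=3 this gives the 7 calls stated above; for N=10 the 45 collisions dominate the total.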
SynGameZero (Goettl et al., 2021) is a natural single head within MAC. Its GNN + MCTS architecture excels at structural process reasoning. MAC does not replace it — it completes it by adding architectures with complementary inductive biases.
Specifically: SynGameZero's tree search explores deeply within its architecture's hypothesis space. A transformer-based head explores broadly across sequence-representable solutions. The collision between them reveals the boundary of both spaces.
Task: Binary azeotropic mixture separation (following Goettl et al., 2024)
Hypotheses:
Autonomous scientific discovery requires more than better individual agents. It requires agents that can see each other's blind spots. Multi-Architecture Collision provides a concrete, implementable path to this capability.
The framework is minimal (wraps existing agents without modification), scalable (combinatorial blind spot detection), and theoretically grounded (Goedel's incompleteness applied to neural architectures).
Goedel, K. (1931). Ueber formal unentscheidbare Saetze der Principia Mathematica und verwandter Systeme I. Monatshefte fuer Mathematik und Physik, 38(1), 173-198.
Goettl, Q., Grimm, D. G., & Burger, J. (2021). Using reinforcement learning in a game-like setup for automated process synthesis without prior process knowledge. Computer Aided Chemical Engineering, 50, 1583-1588.
Goettl, Q., Pirnay, J., Burger, J., & Grimm, D. G. (2023). Automated synthesis of steady-state continuous processes using reinforcement learning. Frontiers of Chemical Science and Engineering.
Goettl, Q., Pirnay, J., Burger, J., & Grimm, D. G. (2024). Deep reinforcement learning enables conceptual design of processes for separating azeotropic mixtures without prior knowledge. Computers & Chemical Engineering, 190.
Stops, L., et al. (2023). Flowsheet generation through hierarchical reinforcement learning and graph neural networks. AIChE Journal, 69(1).
Stribeck, R. (1902). Die wesentlichen Eigenschaften der Gleit- und Rollenlager. Zeitschrift des VDI, 46, 1341-1348.
Corresponding author: Julian Guggeis (julian@guggeis-it.de) — Straubing, Germany — March 2026