Beyond Self-Play: Multi-Architecture Collision for Autonomous Process Discovery

Abstract

Autonomous process discovery through reinforcement learning (RL) has achieved remarkable results, notably SynGameZero (Goettl et al., 2021), which discovers chemical process flowsheets via self-play without prior engineering knowledge. However, self-play architectures suffer from a fundamental limitation rooted in Goedel's incompleteness: a single architecture cannot detect its own systematic blind spots. We propose Multi-Architecture Collision (MAC), a framework where N agents with heterogeneous neural architectures independently solve the same process design task, then undergo pairwise collision to identify what no single agent discovered alone. We introduce the friction-optimal point (delta_opt) as a universal, domain-agnostic reward signal derived from tribological optimization theory. For N architectures, MAC produces N(N-1)/2 collision pairs, each yielding a blind spot detection. The framework requires no modification to existing RL pipelines and can wrap any self-play system as a single head within a multi-head architecture.

Keywords: Reinforcement Learning, Autonomous Process Design, Multi-Agent Systems, Blind Spot Detection, Friction Optimization, Flowsheet Synthesis

1. Introduction

1.1 The Promise of Knowledge-Free Process Discovery

Recent advances in reinforcement learning for chemical process design have demonstrated that artificial agents can synthesize process flowsheets without relying on predefined engineering rules (Goettl et al., 2021; Goettl et al., 2023; Stops et al., 2023). The SynGameZero framework reformulates flowsheet synthesis as a two-player game where an agent learns through self-play, discovering fundamental process engineering paradigms — such as heteroazeotropic distillation — autonomously.

This line of research addresses a critical challenge: process synthesis is inherently creative work that resists formalization. Traditional approaches encode expert knowledge as rules, limiting discovery to what is already known. RL-based approaches break this ceiling by learning from interaction with process simulators.

1.2 The Self-Play Blind Spot Problem

However, self-play carries a structural limitation that has received insufficient attention: a system that plays against itself cannot detect its own systematic errors.

This is not merely a practical concern but a theoretical one. Goedel's first incompleteness theorem (1931) establishes that any consistent, effectively axiomatized formal system capable of expressing elementary arithmetic contains true statements that it cannot prove. We apply this by analogy to neural architectures: any single network topology induces systematic biases in its solution space that it cannot detect through self-evaluation.

Consider an illustrative example. A transformer-based agent trained via self-play on distillation problems may consistently overlook membrane-based separation strategies, not because membranes are suboptimal, but because the architecture's attention mechanism encodes an implicit bias toward sequential unit operations. Self-play cannot reveal this bias because both players share it.

We call this the architectural blind spot: the set of solutions that are structurally invisible to a given network topology, regardless of training duration or data quantity.

1.3 Data Heterogeneity as Signal, Not Noise

A related challenge in autonomous discovery is data heterogeneity. When integrating data from multiple sources — genomic, imaging, spectroscopic, process — the standard approach treats heterogeneity as a problem to be solved through harmonization or embedding.

We propose the opposite view: heterogeneity is signal. The distance between two representations of the same phenomenon contains information about what each representation fails to capture. This reframing transforms the data integration problem from "how to make sources compatible" into "how to extract insight from their incompatibility."

1.4 Contribution

We present Multi-Architecture Collision (MAC), a framework that:

Wraps any existing RL agent as a single "head" within a multi-head system
Deploys N heads with heterogeneous architectures on the same task
Performs pairwise collision between all N(N-1)/2 head pairs
Extracts blind spots — solutions visible to one head but invisible to another
Synthesizes a meta-blind-spot: what ALL heads miss collectively
Uses a universal reward signal (delta_opt) derived from friction optimization theory

2. Theoretical Foundation

2.1 Goedel's Theorem Applied to Neural Architectures

Theorem (Architectural Incompleteness). Let A be a neural architecture trained via self-play on task T. Let S_A denote the set of solutions discoverable by A. Then there exist solutions s to T such that s is not in S_A, and A cannot determine that s is not in S_A.

Informal proof. The architecture A induces an inductive bias I_A that constrains the hypothesis space. Self-play training explores S_A exhaustively (given sufficient time), but I_A is fixed. Since I_A is not itself a learnable parameter of the self-play loop, biases introduced by I_A are invisible to the training process.

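Corollary. For two architectures A, B with I_A ≠ I_B, the set S_A \ S_B (solutions visible to A but not B) is generally non-empty, and vice versa. Whenever both differences are non-empty, the union S_A ∪ S_B strictly contains each individual solution set. Solutions that emerge only from the collision of the two heads, written S_A × S_B here (a collision product, not a Cartesian product), can enlarge it further.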
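To make the corollary concrete, the set relations can be sketched on toy data. In this minimal sketch each head's discoveries are represented as a Python set of flowsheet labels; the labels and the three heads are hypothetical and do not come from any experiment.

```python
from itertools import combinations

# Hypothetical solution sets S_A, S_B, S_C: one label per flowsheet a head found.
solutions = {
    "A": {"distillation", "decanter+distillation"},
    "B": {"distillation", "membrane"},
    "C": {"distillation", "extraction"},
}


def blind_spots(own: set, other: set) -> set:
    """Return the solutions in `other` that are missing from `own`."""
    return other - own


# Every unordered pair is one collision; each collision is read in both directions.
for a, b in combinations(sorted(solutions), 2):
    print(f"{a} misses {sorted(blind_spots(solutions[a], solutions[b]))}")
    print(f"{b} misses {sorted(blind_spots(solutions[b], solutions[a]))}")

# The union is a strict superset of each set because every difference is non-empty.
union = set().union(*solutions.values())
print(sorted(union))
```

Plain set difference only captures solutions that some head already found; it says nothing about solutions outside every S_X, which is the role of the meta-blind-spot.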

2.2 The Friction-Optimal Point (delta_opt)

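In tribology, the Stribeck curve (Stribeck, 1902) describes the friction coefficient of a lubricated contact as a function of its operating parameters, commonly combined into the Hersey number (viscosity × speed / load). The curve has a minimum between the boundary-lubrication and hydrodynamic regimes: the point of minimal friction, where the system operates most efficiently.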

We observe that process optimization problems universally exhibit this structure:

Friction(x) = F_boundary(x) + F_viscous(x)

  F_boundary(x) → ∞  as x → 0    (insufficient separation)
  F_viscous(x)  → ∞  as x → ∞  (excessive energy input)

  delta_opt = argmin_x Friction(x)    (optimal operating point)
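As a worked instance of the definition above, assume the simplest pair of terms with the required limits: F_boundary(x) = a/x and F_viscous(x) = b x, with a, b > 0. These functional forms are an illustrative assumption, not taken from any specific process model.

```latex
\begin{align}
\mathrm{Friction}(x) &= \frac{a}{x} + b\,x, \qquad a, b > 0,\; x > 0 \\
\frac{d\,\mathrm{Friction}}{dx} &= -\frac{a}{x^{2}} + b = 0
  \;\Longrightarrow\; \delta_{\mathrm{opt}} = \sqrt{a/b} \\
\frac{d^{2}\,\mathrm{Friction}}{dx^{2}} &= \frac{2a}{x^{3}} > 0
  \quad \text{(the stationary point is a minimum)} \\
\mathrm{Friction}(\delta_{\mathrm{opt}}) &= 2\sqrt{ab}
\end{align}
```

Under this assumption the optimum depends only on the ratio of the two loss coefficients, and the residual friction at the optimum on their product.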

Process Type	F_boundary (too little)	F_viscous (too much)	delta_opt
Distillation	Impurity (low reflux)	Energy waste (high reflux)	Optimal reflux ratio
Reaction	Low conversion (low T)	Side products (high T)	Optimal temperature
Extraction	Low yield (few stages)	Solvent cost (many stages)	Optimal stage count
Crystallization	Amorphous product (fast)	Time cost (slow)	Optimal cooling rate
Membrane sep.	Low flux (low dP)	Fouling (high dP)	Optimal pressure drop
Enzyme kinetics	Substrate limitation (low [S])	Substrate inhibition (high [S])	Optimal substrate concentration

The key insight is that delta_opt is domain-agnostic. It depends only on the existence of two competing loss terms, one dominating at each extreme of the operating range, which we argue every real process exhibits. This is what makes delta_opt usable as a universal reward signal.
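One way delta_opt could be turned into a reward is sketched below. The cost function `friction`, the golden-section search, and the normalisation used in `reward` are all assumptions made for illustration; the paper does not prescribe a particular functional form or optimiser.

```python
import math


def friction(x: float, a: float = 16.0, b: float = 1.0) -> float:
    """Assumed two-term cost: a/x diverges as x -> 0, b*x**2 diverges as x -> infinity."""
    return a / x + b * x ** 2


def golden_section_argmin(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Locate the minimiser of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


def reward(x: float, f, delta_opt: float) -> float:
    """Assumed normalisation: 1.0 at delta_opt, falling toward 0 far from it."""
    return f(delta_opt) / f(x)


delta_opt = golden_section_argmin(friction, 1e-3, 100.0)
print(delta_opt, reward(delta_opt, friction, delta_opt), reward(4.0, friction, delta_opt))
```

The same three functions apply unchanged to any row of the table once `friction` is replaced by the relevant cost model, which is the sense in which the signal is domain-agnostic.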

3. Framework Architecture

                    +-------------------+
                    |      Task T       |
                    +---------+---------+
                              |
              +---------------+---------------+
              |               |               |
        +-----v------+  +-----v------+  +-----v------+
        |  Head A    |  |  Head B    |  |  Head C    |
        | (e.g. GNN  |  | (e.g.      |  | (e.g.      |
        |  + MCTS)   |  |  Transf.)  |  |  MLP+RL)   |
        +-----+------+  +-----+------+  +-----+------+
              |               |               |
              +---------------+---------------+
                              |
                    +---------v---------+
                    | Collision Engine  |
                    |  N(N-1)/2 pairs   |
                    +---------+---------+
                              |
                    +---------v---------+
                    | delta_opt Scorer  |
                    +-------------------+

3.1 Head Requirements

Each head must satisfy only two interface requirements:

solve(task) → solution: Produce a solution for the given task
explain(solution) → representation: Produce a parseable representation of its reasoning

No constraint on internal architecture, training method, or model family. The heterogeneity of architectures is not a weakness to be managed but the primary source of blind spot detection capability.
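The two-method interface can be written down as a structural type. The sketch below uses Python's `typing.Protocol`; the names `Head` and `RuleBasedHead`, and the dictionary-shaped task, are hypothetical and serve only to show that any object exposing `solve` and `explain` qualifies, whatever is inside it.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Head(Protocol):
    """Structural interface: any object with these two methods is a valid head."""

    def solve(self, task: Any) -> Any: ...

    def explain(self, solution: Any) -> str: ...


class RuleBasedHead:
    """Toy head with no neural network at all (hypothetical, for illustration)."""

    def solve(self, task):
        # Assumed task format: a dict with a boolean "azeotrope" flag.
        if task.get("azeotrope"):
            return ["decanter", "distillation"]
        return ["distillation"]

    def explain(self, solution):
        return " -> ".join(solution)


head = RuleBasedHead()
solution = head.solve({"azeotrope": True})
print(isinstance(head, Head), head.explain(solution))
```

`runtime_checkable` only verifies that the methods exist, not their signatures, so it is a cheap sanity check when registering heads rather than a guarantee of correctness.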

3.2 Integration with Existing Frameworks

# Sketch of the proposed MAC API. MACHead and MultiArchitectureCollision are the
# wrapper classes introduced by this framework; syngamezero, transformer_agent,
# symbolic_solver, and separation_task are assumed to be defined by the user.

# Existing SynGameZero agent becomes one head
head_a = MACHead(name="SynGame", solve_fn=syngamezero.solve)

# Add heterogeneous heads
head_b = MACHead(name="Transformer", solve_fn=transformer_agent.solve)
head_c = MACHead(name="Symbolic", solve_fn=symbolic_solver.solve)

# Run MAC
mac = MultiArchitectureCollision(heads=[head_a, head_b, head_c])
result = mac.run(task=separation_task)

print(result.blind_spots)      # What individual heads miss
print(result.meta_blind_spot)  # What ALL approaches miss

No retraining. No architectural changes. Pure composition.
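A minimal, self-contained sketch of what such wrapper classes might look like is given below. It is not a reference implementation: it assumes each `solve_fn` returns a collection of hashable solution labels, treats a collision as a two-way set difference, and computes the meta-blind-spot against a user-supplied `candidates` set, which is an extra parameter introduced here for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Any, Callable, Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class MACHead:
    """Wraps an existing agent. Assumption: solve_fn returns hashable solution labels."""
    name: str
    solve_fn: Callable[[Any], Iterable[str]]


@dataclass
class MACResult:
    blind_spots: Dict[Tuple[str, str], FrozenSet[str]]  # (head, other) -> what head missed
    meta_blind_spot: FrozenSet[str]                      # candidates that no head found


class MultiArchitectureCollision:
    def __init__(self, heads: List[MACHead], candidates: Iterable[str] = ()):
        self.heads = list(heads)
        # Assumption: a reference set of known candidate solutions is available.
        self.candidates = frozenset(candidates)

    def run(self, task: Any) -> MACResult:
        # Each head solves the task independently; no head is modified or retrained.
        found = {h.name: frozenset(h.solve_fn(task)) for h in self.heads}
        blind: Dict[Tuple[str, str], FrozenSet[str]] = {}
        for a, b in combinations(found, 2):  # N(N-1)/2 unordered pairs
            blind[(a, b)] = found[b] - found[a]
            blind[(b, a)] = found[a] - found[b]
        covered = frozenset().union(*found.values())
        return MACResult(blind, self.candidates - covered)


# Toy heads standing in for real agents (hypothetical outputs).
heads = [
    MACHead("SynGame", lambda task: {"distillation", "decanter"}),
    MACHead("Transformer", lambda task: {"distillation", "membrane"}),
    MACHead("Symbolic", lambda task: {"distillation", "extraction"}),
]
known = {"distillation", "decanter", "membrane", "extraction", "crystallization"}
mac = MultiArchitectureCollision(heads=heads, candidates=known)
result = mac.run(task="separate binary azeotrope")
print(result.blind_spots[("SynGame", "Transformer")])
print(result.meta_blind_spot)
```

A real collision engine would compare the heads' `explain` outputs rather than bare labels; the set difference is only the simplest stand-in that preserves the pairwise structure.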

4. Scaling Properties

N (heads)	Collisions	Blind spots	Meta-blind-spots
2	1	1	0
3	3	3	1
6	15	15	1
10	45	45	1

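Three heads is the minimum for meaningful meta-analysis. MAC's overhead is modest: the N solves and N(N-1)/2 collisions amount to O(N²) calls when run sequentially, and to O(N) wall-clock time when spread across N parallel workers. For N=3: 3 solves + 3 collisions + 1 meta = 7 calls. Compared to the thousands of self-play episodes needed to train a single head, this is negligible.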
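The call counts in the table follow directly from the pair formula. The helper below, whose name `mac_call_count` is hypothetical, reproduces them under the table's own convention that a meta-analysis step exists only for three or more heads.

```python
from math import comb


def mac_call_count(n_heads: int) -> dict:
    """Count the calls in one MAC run: solves, pairwise collisions, meta-analysis."""
    if n_heads < 2:
        raise ValueError("MAC needs at least two heads to form a collision pair")
    solves = n_heads
    collisions = comb(n_heads, 2)  # N(N-1)/2 unordered pairs
    meta = 1 if n_heads >= 3 else 0
    return {
        "solves": solves,
        "collisions": collisions,
        "meta": meta,
        "total": solves + collisions + meta,
    }


for n in (2, 3, 6, 10):
    print(n, mac_call_count(n))
```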

5. Relation to SynGameZero

SynGameZero (Goettl et al., 2021) is a natural single head within MAC. Its GNN + MCTS architecture excels at structural process reasoning. MAC does not replace it — it completes it by adding architectures with complementary inductive biases.

Specifically: SynGameZero's tree search explores deeply within its architecture's hypothesis space. A transformer-based head explores broadly across sequence-representable solutions. The collision between them reveals the boundary of both spaces.

6. Proposed Experimental Validation

Task: Binary azeotropic mixture separation (following Goettl et al., 2024)

Hypotheses:

H1: MAC discovers at least one structurally novel flowsheet not found by any individual head.
H2: The delta_opt reward generalises across mixture types without re-engineering.
H3: Three heterogeneous heads outperform three homogeneous heads on solution diversity.
H4: The meta-blind-spot identifies a shared assumption of all RL-based approaches.
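H3 needs an operational definition of solution diversity. One candidate, offered here as an assumption rather than as the planned metric, is the mean pairwise Jaccard distance between flowsheets, each represented as the set of unit operations it contains. The example flowsheets are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_diversity(flowsheets: List[Set[str]]) -> float:
    """Average Jaccard distance over all unordered pairs of flowsheets."""
    pairs = list(combinations(flowsheets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs: three identical heads versus three heterogeneous heads.
homogeneous = [{"distillation", "decanter"}] * 3
heterogeneous = [
    {"distillation", "decanter"},
    {"membrane"},
    {"extraction", "distillation"},
]
print(mean_pairwise_diversity(homogeneous), mean_pairwise_diversity(heterogeneous))
```

A set-based distance ignores flowsheet topology, so a graph-based distance would be a natural refinement if two flowsheets share unit operations but connect them differently.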

7. Conclusion

Autonomous scientific discovery requires more than better individual agents. It requires agents that can see each other's blind spots. Multi-Architecture Collision provides a concrete, implementable path to this capability.

The framework is minimal (wraps existing agents without modification), scalable (combinatorial blind spot detection), and theoretically grounded (Goedel's incompleteness applied to neural architectures).

References

Goedel, K. (1931). Ueber formal unentscheidbare Saetze der Principia Mathematica und verwandter Systeme I. Monatshefte fuer Mathematik und Physik, 38(1), 173-198.

Goettl, Q., Grimm, D. G., & Burger, J. (2021). Using reinforcement learning in a game-like setup for automated process synthesis without prior process knowledge. Computer Aided Chemical Engineering, 50, 1583-1588.

Goettl, Q., Pirnay, J., Burger, J., & Grimm, D. G. (2023). Automated synthesis of steady-state continuous processes using reinforcement learning. Frontiers of Chemical Science and Engineering.

Goettl, Q., Pirnay, J., Burger, J., & Grimm, D. G. (2024). Deep reinforcement learning enables conceptual design of processes for separating azeotropic mixtures without prior knowledge. Computers & Chemical Engineering, 190.

Stops, L., et al. (2023). Flowsheet generation through hierarchical reinforcement learning and graph neural networks. AIChE Journal, 69(1).

Stribeck, R. (1902). Die wesentlichen Eigenschaften der Gleit- und Rollenlager. Zeitschrift des VDI, 46, 1341-1348.

Corresponding author: Julian Guggeis (julian@guggeis-it.de) — Straubing, Germany — March 2026
