Component-Centric Placement Using Deep Reinforcement Learning

This paper proposes a component-centric deep reinforcement learning framework that discretizes placement space and leverages prior knowledge to automate PCB component placement, achieving near-human performance in wirelength and feasibility across diverse real-world boards.

Kart Leong Lim

Published 2026-03-02

Imagine you are an architect tasked with designing the interior of a very busy, tiny apartment (the Printed Circuit Board, or PCB). You have a large, central piece of furniture (the main microchip) that must stay right in the middle of the room. Around it, you have dozens of smaller items (resistors, capacitors, and other "passive" components) that need to be placed.

The rules are strict:

  1. No Overlaps: Nothing can sit on top of anything else.
  2. Short Cables: The smaller items need to be plugged into the big central piece with wires. The shorter the wires, the better the apartment works.
  3. Specific Neighbors: Some small items must be near specific power outlets on the central piece.

Doing this by hand is hard because there are millions of ways to arrange the furniture, and finding the perfect one takes forever. This paper introduces a smart robot (AI) that learns how to arrange this furniture automatically using a technique called Deep Reinforcement Learning.

Here is how the paper solves the problem, explained simply:

1. The "Grid" Trick (Discrete Action Space)

The Problem: If you tell a robot, "Put this item anywhere in the room," the choices are effectively infinite. It could dither between nearly identical coordinates like (10.001, 20.002) and (10.002, 20.002), so the search space balloons into an unmanageable number of options.
The Solution: The authors tell the robot to only look at a pre-drawn grid of spots around the central piece. Think of it like a game of Battleship or a chessboard. The robot doesn't choose "anywhere"; it just chooses "Spot A," "Spot B," or "Spot C."

  • Why it helps: It turns a chaotic, infinite puzzle into a manageable game with a fixed number of moves, making the AI learn much faster.
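The grid trick can be sketched in a few lines. This is a minimal illustration, not the paper's actual geometry: the grid pitch, ring size, and coordinates below are made-up values, and the key idea is only that the agent's action becomes an index into a fixed, finite list of candidate spots.

```python
# Sketch of the "grid trick": the continuous board is replaced by a fixed
# menu of candidate slots around the central chip. Pitch and ring size
# are illustrative, not from the paper.

def make_candidate_slots(center_x, center_y, pitch=2.0, ring=3):
    """Enumerate lattice points on a (2*ring+1) x (2*ring+1) grid
    around the central component, skipping the center itself."""
    slots = []
    for i in range(-ring, ring + 1):
        for j in range(-ring, ring + 1):
            if i == 0 and j == 0:
                continue  # the main chip occupies the center
            slots.append((center_x + i * pitch, center_y + j * pitch))
    return slots

slots = make_candidate_slots(50.0, 50.0)

# The agent no longer outputs raw coordinates; its action is just an
# index into this fixed list of spots.
action = 5
chosen_xy = slots[action]
```

With `ring=3` the robot has exactly 48 possible moves instead of an infinite continuum, which is what makes standard discrete-action RL algorithms like DQN applicable at all.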

2. The "Smart Neighbor" Rule (Net Proximity)

The Problem: A dumb robot might try to put a battery-powered item on the opposite side of the room from its power source, creating a long, messy wire.
The Solution: The researchers gave the robot a "cheat sheet" (prior knowledge). They told it: "Hey, if this item needs power from Pin 1, it should probably be placed near Pin 1."

  • The Reward System: When the robot places an item near its correct power source, it gets a "gold star" (positive reward). If it places it far away, it gets no points. This stops the robot from wasting time trying impossible or silly arrangements.
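A "gold star" reward of this kind might look like the following sketch. The distance threshold, weights, and overlap penalty are assumptions for illustration; the paper's exact reward shaping may differ.

```python
import math

# Hedged sketch of a net-proximity reward: +1 if the component lands
# within a threshold distance of the pin it connects to, 0 otherwise,
# with a penalty for overlapping another part. All numbers illustrative.

def proximity_reward(comp_xy, pin_xy, threshold=5.0, overlap=False):
    dist = math.dist(comp_xy, pin_xy)
    reward = 1.0 if dist <= threshold else 0.0
    if overlap:
        reward -= 1.0  # sitting on top of another part is penalized
    return reward

# Near its pin: gold star.
r_near = proximity_reward((51.0, 52.0), (50.0, 50.0))
# Far across the board: no points.
r_far = proximity_reward((80.0, 80.0), (50.0, 50.0))
```

Because far-away placements earn nothing, the agent quickly stops exploring them, which is exactly how the prior knowledge prunes the search space.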

3. The "ID Card" System (Token-Based Input)

The Problem: In the past, AI tried to learn by looking at raw numbers (coordinates, distances). This is like trying to learn a language by memorizing the chemical composition of the letters. It's inefficient.
The Solution: The authors changed how they talk to the AI. Instead of just saying "Component #5," they say "Component #5, which belongs to the 'Power Group'."

  • The Analogy: Imagine a party. If you just say "Put the guest in a chair," they might sit anywhere. But if you say "Put the guest from the 'Marketing Team' near the 'Marketing Team' table," they naturally cluster together. By grouping components by their electrical connections (nets), the AI understands the relationships between items, not just their physical locations.
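The "ID card" idea can be sketched as a tiny tokenizer: each component is described by discrete tokens (its own ID plus the ID of the net it belongs to) rather than raw coordinates. The component names and nets below are invented for illustration.

```python
# Sketch of token-based input: components become (component-token,
# net-token) pairs, so the model sees group membership explicitly.
# Names and nets are made up for illustration.

component_net = {
    "C5": "PWR",   # decoupling capacitor on the power net
    "R1": "PWR",
    "R7": "GPIO",
}

net_vocab = {"PWR": 0, "GPIO": 1}
comp_vocab = {name: i for i, name in enumerate(component_net)}

def tokenize(name):
    """Turn a component into a (component-token, net-token) pair."""
    return (comp_vocab[name], net_vocab[component_net[name]])

tokens = [tokenize(c) for c in component_net]
```

In a real model these token pairs would be fed through embedding layers, letting the network learn that "C5" and "R1" share the PWR net and should cluster near the same pins.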

4. The Training Methods (The Coaches)

The paper tested three different "coaches" to teach the robot:

  • Simulated Annealing (SA): Like a human trying random moves, occasionally making a "bad" move to escape a bad spot, hoping to find a better one later.
  • DQN (Deep Q-Network): A strict coach that learns by trial and error, building up an estimate (a "Q-value") of how good each specific move is and favoring the moves with the best scores.
  • A2C (Advantage Actor-Critic): A coach with two voices. One voice (the Actor) tries new moves, while the other (the Critic) judges how good those moves were. This is often the most flexible teacher.
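The DQN-style trial-and-error loop can be shown with a drastically simplified, tabular stand-in. A real DQN replaces the value table with a neural network and handles multi-step episodes; this sketch (with invented slot count, rewards, and learning rate) only demonstrates the core update rule: try a move, observe the reward, nudge that move's value estimate toward it.

```python
import random

# Minimal tabular stand-in for the DQN "coach". Slot 3 is pretended to
# be the spot near the correct pin; everything here is illustrative.

random.seed(0)
n_slots = 8
q = [0.0] * n_slots           # value estimate for each grid slot
good_slot = 3                 # assumption: slot 3 is near the right pin

def reward(slot):
    return 1.0 if slot == good_slot else 0.0

alpha, epsilon = 0.5, 0.2
for _ in range(200):
    # epsilon-greedy: mostly exploit the best-known slot, sometimes explore
    if random.random() < epsilon:
        a = random.randrange(n_slots)
    else:
        a = max(range(n_slots), key=lambda s: q[s])
    q[a] += alpha * (reward(a) - q[a])   # one-step Q-value update

best = max(range(n_slots), key=lambda s: q[s])
```

After a couple of hundred trials the value table singles out the rewarding slot, which is the same mechanism (scaled up with a network and experience replay) that lets a DQN learn good placements.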

The Results: Did it work?

The team tested this on 9 real-world circuit boards of varying complexity.

  • The Winner: The method that combined the Grid Trick, the Smart Neighbor Rule, and the ID Card System (specifically using a DQN with net information) performed the best.
  • The Outcome: The AI achieved wirelengths nearly matching a human expert's, while working much faster and producing fewer violations (such as overlapping parts).

Summary

This paper is about teaching an AI to be a master interior designer for circuit boards. By simplifying the choices (using a grid), giving the AI common sense (knowing neighbors should be close), and teaching it to recognize groups (ID cards), they created a system that can automatically design complex electronics layouts as well as, or better than, human engineers.
