Idempotent Slices with Applications to Code-Size Reduction

Imagine you are a chef running a massive, chaotic kitchen. You have a huge recipe book (the computer program) with thousands of steps. Some of these steps are repeated over and over again, but they are scattered all over the place. Sometimes, the same three steps appear in the middle of a soup recipe, then again in a cake recipe, and even inside a loop that happens every time you bake a batch of cookies.

Currently, your kitchen is full of duplicate instructions. The chef has to write out "chop onions," "sauté garlic," and "add salt" three separate times, even though the action is exactly the same. This makes the recipe book huge and hard to read.

This paper introduces a new way to organize that recipe book. The authors call it "Idempotent Slices." Here is how it works, broken down into simple concepts:

1. What is an "Idempotent Slice"?

Think of a slice of bread. If you toast it once, it's toast. If you toast it again, it's still toast (assuming you don't burn it to a crisp). It doesn't change the world; it just produces a result based on what you put in.

In computer science, an Idempotent Slice is a chunk of code that acts like that toast.

It's safe: You can run it once, or run it a thousand times, and it will always give you the exact same result without messing up the rest of the kitchen (the program's memory or state).
It's a "Slice": It's a specific piece of the recipe that calculates one specific thing.
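
To make the "toast" idea concrete, here is a minimal Python sketch. The function names and the counter are made up for illustration and do not come from the paper. The first function only reads its inputs, so calling it again changes nothing; the second touches shared state, so it would not qualify.

```python
def scaled_offset(base: int, index: int) -> int:
    """Idempotent: reads its inputs and writes nothing outside itself."""
    stride = 8
    return base + index * stride


call_counter = {"calls": 0}  # shared state standing in for "the rest of the kitchen"


def scaled_offset_counting(base: int, index: int) -> int:
    """Not idempotent: every call mutates shared state and changes the result."""
    call_counter["calls"] += 1
    return base + index * 8 + call_counter["calls"]


# Running the idempotent version twice gives the same answer.
print(scaled_offset(100, 3) == scaled_offset(100, 3))  # True

# Running the other version twice does not.
print(scaled_offset_counting(100, 3) == scaled_offset_counting(100, 3))  # False
```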

The problem the authors found is that previous tools tried to find these chunks but were like a clumsy chef: they would miss chunks if the recipe had complex loops or weird branching paths (like "if the oven is hot, do X, else do Y"). They would also get confused if the recipe wasn't written in a perfectly tidy format.

2. The New Tool: The "Gated" Map

To fix this, the authors built a new map of the kitchen called GSA (Gated Static Single Assignment).

Imagine the recipe book is a city.

Old Maps (SSA): These maps show you the streets, but they don't tell you why you are taking a specific turn. They just say, "You are at the intersection."
The New Map (GSA): This map adds "gates" or "traffic lights" to every intersection. It explicitly says, "You only take this road if the light is green AND the rain is falling."

By using this super-detailed map, the authors' algorithm can trace the exact path of ingredients (data) and decisions (control flow) without getting lost. It can find those scattered "chop onion" steps even if they are hidden inside a complex loop or a weird "if/else" structure that confused the old tools.
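
A rough way to picture the difference between the two maps, sketched in Python. The `gamma` helper below is a stand-in for a gated merge and is an assumption made for illustration; it is not the notation or the implementation used in the paper.

```python
def gamma(predicate: bool, if_true, if_false):
    """Gated merge: the condition that picks the value is an explicit operand.

    Note: Python evaluates both candidate values before the call, which is
    fine here because both are side-effect free.
    """
    return if_true if predicate else if_false


def clamp_ssa_style(x: int, limit: int) -> int:
    # In SSA the join would read r = phi(r1, r2). The merge records which
    # values meet, but the condition lives elsewhere, in the branch.
    if x > limit:
        r = limit
    else:
        r = x
    return r


def clamp_gsa_style(x: int, limit: int) -> int:
    # In the gated form the predicate p travels with the merge itself, so
    # following the operands of r reaches both the data and the decision.
    p = x > limit
    r = gamma(p, limit, x)
    return r


print(clamp_ssa_style(12, 10), clamp_gsa_style(12, 10))  # 10 10
```

Both functions compute the same value; the point is only that in the second one a backward walk from `r` finds `p` directly.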

3. The Magic Trick: Outlining and Merging

Once the algorithm finds these safe, repeatable chunks (the slices), it does two things:

Outlining (Cutting it out): It takes that scattered chunk of code and cuts it out of the main recipe. It turns it into a standalone "mini-recipe" (a separate function).
Merging (The Library): It looks at all the mini-recipes it just cut out. If it finds two that are identical (or very similar), it throws away the duplicates and replaces them with a single reference: "Go to the 'Chop Onions' library and get the result."

The Analogy:
Imagine you have a 100-page instruction manual for a toy.

Before: On page 5, it says "Attach wheel A." On page 20, it says "Attach wheel A." On page 50, it says "Attach wheel A." You have to read the whole sentence every time.
After: The authors cut out the "Attach wheel A" instructions. They put them in a small "Wheel Guide" booklet. Now, on pages 5, 20, and 50, the manual just says: "See Wheel Guide, Step 1."
Result: The main manual is much shorter. The total amount of paper (code size) is reduced.
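
The same before-and-after picture in code, as a small Python sketch. All function names here are invented for the example. Notice that in the "before" version the repeated steps are neither next to each other nor in the same order, which is exactly the case the paper targets.

```python
# Before: the same computation is scattered through two functions.
def describe_before(x: int, y: int) -> str:
    sq_x = x * x
    label = "point"          # unrelated step sitting between the slice's steps
    sq_y = y * y
    total = sq_x + sq_y
    return f"{label}:{total}"


def score_before(x: int, y: int) -> int:
    sq_y = y * y             # same steps, different order
    sq_x = x * x
    total = sq_x + sq_y
    return total * 2


# After: the slice is outlined once, and both callers refer to it.
def sum_of_squares(x: int, y: int) -> int:
    return x * x + y * y


def describe_after(x: int, y: int) -> str:
    label = "point"
    return f"{label}:{sum_of_squares(x, y)}"


def score_after(x: int, y: int) -> int:
    return sum_of_squares(x, y) * 2


print(describe_before(3, 4) == describe_after(3, 4))  # True
print(score_before(3, 4) == score_after(3, 4))        # True
```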

4. Why is this a big deal?

It finds hidden gems: Previous tools could only find code that was right next to each other (contiguous). This new tool can find code that is scattered across different parts of the program, even inside loops, as long as it's "safe" to move.
It saves space: In their tests, they took programs that were already highly optimized and managed to shrink them by up to 12% in specific cases. That's a lot of space saved in a world where storage and bandwidth matter.
It's fast: Even though the worst case on paper sounds scary (quadratic time complexity), in the real world it behaves almost linearly. It's like a librarian who could, in theory, take forever to find a book but actually finds it in seconds because most books are in the right place.

5. The Catch

Is it perfect? Not quite.
Sometimes, cutting out a small piece of code and turning it into a separate function adds a tiny bit of "overhead" (like the time it takes to walk to the library to get the guide). If the piece of code is too small or used too rarely, the overhead might make the program slightly bigger or slower.

The authors built a "cost model" (a smart calculator) to decide: "Is this chunk big enough and used often enough to be worth cutting out?" If the answer is no, they leave it alone.

Summary

The authors created a smarter way to find duplicate, safe-to-move code in computer programs. By using a more detailed map (GSA) and a careful cutting-and-pasting strategy, they can shrink program sizes significantly without breaking anything. It's like decluttering a messy garage by realizing you have three identical screwdrivers scattered in different boxes, so you throw two away and just keep one in a central spot.

Here is a detailed technical summary of the paper "Idempotent Slices with Applications to Code-Size Reduction".

1. Problem Statement

The paper addresses the challenge of reducing code size in compiled binaries by identifying and eliminating redundant computations. While previous techniques exist, they suffer from specific limitations:

Incompleteness of Existing Algorithms: A prior algorithm by Guimarães and Pereira (2023) for identifying "idempotent backward slices" (subprograms that can be safely extracted and called lazily) fails in two key scenarios:
1. Programs that do not satisfy the "conventional" Static Single Assignment (CSSA) property (e.g., overlapping live ranges of $\phi$-function variables).
2. Control-flow graphs (CFGs) lacking a "hammock" structure (single-entry, single-exit regions), specifically failing to capture certain control dependencies in complex branching patterns.
Limitations of Current Code-Size Optimizers: Existing tools like the LLVM IROutliner and function merging by sequence alignment (FMSA) are limited. IROutliner only handles contiguous instruction sequences within basic blocks, while FMSA struggles to merge non-contiguous or non-ordered instructions within the same function.

The core problem is the lack of a sound, efficient, and general algorithm to extract idempotent backward slices from arbitrary control-flow graphs to enable aggressive, sparse code-size reduction.

2. Methodology

The authors propose a new framework based on Gated Static Single Assignment (GSA) form to solve the identification and extraction of idempotent slices.

A. Formalization of Idempotent Backward Slices

The paper defines an Idempotent Backward Slice as a maximal subprogram $S$ with respect to a criterion variable $v$ that satisfies:

Single-Entry: The basic blocks in $S$ form a single-entry region (all paths from the program entry to $S$ pass through a unique entry block).
Idempotency: Executing $S$ multiple times with the same inputs yields the same results and does not alter the observable program state (no side effects, no exceptions, no mutable memory writes).

Crucially, unlike dense slices, idempotent slices are restricted to the loop in which the criterion is defined to ensure they compute a single value rather than multiple values per iteration.

B. The GSA-Based Algorithm

To overcome the limitations of previous sparse dependence graph algorithms, the authors utilize the Gated Static Single Assignment (GSA) form:

GSA Conversion: They convert the program from SSA to GSA using the algorithm by Tu and Padua. This replaces $\phi$-functions with gate instructions ($\gamma$, $\mu$, $\eta$) that explicitly encode control predicates alongside data dependencies.
- $\mu$-instructions handle loop headers.
- $\gamma$-instructions handle simple junctions with explicit predicates.
- $\eta$-instructions handle value gating at loop exits.
Slice Identification: The algorithm performs a backward traversal on the GSA dependence graph. Because GSA makes control dependencies explicit in the syntax, the algorithm does not need to separately infer control dependence (a common source of errors in previous methods).
Stop Criteria: The traversal stops when it encounters:
- Function parameters (intra-procedural limit).
- $\mu$-instructions at the same loop depth as the criterion (ensuring the slice does not escape the defining loop).
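
The traversal and its stop criteria can be sketched as a worklist walk over a dependence graph. This is a minimal Python illustration under stated assumptions, not the paper's implementation: the `Node` record, the string-keyed graph, and the choice to report stopped-at values as slice inputs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str              # "param", "mu", "gamma", "eta", or "op" (assumed labels)
    operands: tuple = ()   # names of the values this node depends on
    loop_depth: int = 0


def backward_slice(graph: dict, criterion: str):
    """Walk dependencies backward from the criterion.

    Stops at function parameters and at mu nodes whose loop depth equals
    the criterion's depth. Returns (nodes in the slice, boundary inputs).
    """
    depth = graph[criterion].loop_depth
    in_slice, inputs = set(), set()
    worklist = [criterion]
    while worklist:
        name = worklist.pop()
        if name in in_slice or name in inputs:
            continue
        node = graph[name]
        at_boundary = node.kind == "param" or (
            node.kind == "mu" and node.loop_depth == depth
        )
        if at_boundary and name != criterion:
            inputs.add(name)       # do not look past a stop point
            continue
        in_slice.add(name)
        worklist.extend(node.operands)
    return in_slice, inputs


# A toy loop body: v is selected by predicate p, both derived from i and n.
graph = {
    "n": Node("param"),
    "zero": Node("op"),
    "i": Node("mu", ("zero", "i_next"), loop_depth=1),
    "i_next": Node("op", ("i",), loop_depth=1),
    "t": Node("op", ("i", "n"), loop_depth=1),
    "p": Node("op", ("t",), loop_depth=1),
    "v": Node("gamma", ("p", "t", "n"), loop_depth=1),
}

in_slice, inputs = backward_slice(graph, "v")
print(sorted(in_slice))  # ['p', 't', 'v']
print(sorted(inputs))    # ['i', 'n']
```

Because the predicate `p` is an operand of the gate `v`, the walk picks it up with no separate control-dependence pass, and the loop-header value `i` is left outside the slice as an input.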

C. Slice-Based Code-Size Reduction (SBCR)

The paper outlines a four-step optimization pipeline:

Identification: Find all idempotent backward slices in the program.
Outlining: Extract identified slices into standalone functions. This involves reconstructing the Control Flow Graph (CFG) of the slice using "Transposition" (preserving internal edges) and "Attraction" (redirecting external edges to the first dominator within the slice).
Common Slice Identification: Use structural hashing and canonical comparison (via LLVM's mergefunc) to identify isomorphic (identical) slice functions.
Merging & Cost Modeling: Merge identical slices into a single function. A cost model determines profitability based on:
- $I$: Number of instructions in the slice.
- $P$: Number of parameters.
- $C$: Number of occurrences.
- Heuristic: Only outline if $3 < I \le 20$, $P \le 1$, and $C \ge 10$.
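
The last two steps can be illustrated with a short Python sketch. It groups outlined slices by an exact structural key and then applies the stated heuristic. The `OutlinedSlice` record and the string encoding of instructions are assumptions made for the example; the real pipeline relies on LLVM's mergefunc, whose internals are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OutlinedSlice:
    name: str
    instructions: tuple   # canonical instruction strings (hypothetical encoding)
    num_params: int


def is_profitable(I: int, P: int, C: int) -> bool:
    """Heuristic from the text: 3 < I <= 20, P <= 1, C >= 10."""
    return 3 < I <= 20 and P <= 1 and C >= 10


def group_identical(slices):
    """Bucket slices by structure; dict lookup hashes the key, then compares it."""
    groups = defaultdict(list)
    for s in slices:
        groups[(s.instructions, s.num_params)].append(s)
    return list(groups.values())


def select_merges(slices):
    """Return (representative, occurrence count) for each profitable group."""
    selected = []
    for group in group_identical(slices):
        rep = group[0]
        if is_profitable(len(rep.instructions), rep.num_params, len(group)):
            selected.append((rep, len(group)))
    return selected


body4 = ("mul a a", "mul b b", "add", "ret")
body3 = ("mul a a", "add", "ret")
body5 = ("load", "mul", "add", "shl", "ret")

candidates = (
    [OutlinedSlice(f"s4_{k}", body4, 1) for k in range(10)]    # I=4, P=1, C=10
    + [OutlinedSlice(f"s3_{k}", body3, 1) for k in range(10)]  # I=3: too small
    + [OutlinedSlice(f"s5_{k}", body5, 1) for k in range(9)]   # C=9: too rare
)

for rep, count in select_merges(candidates):
    print(rep.name, count)  # s4_0 10
```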

3. Key Contributions

Formal Definition: A rigorous definition of idempotent backward slices that ensures referential transparency and single-entry properties.
Sound Algorithm: A linear-time algorithm (relative to CFG edges) for extracting slices from GSA form, which correctly handles non-hammock CFGs and non-CSSA programs where previous methods failed.
Sparse Code-Size Reduction: A novel optimization (SBCR) capable of merging non-contiguous and non-ordered instruction sequences within a single function or across functions, a capability lacking in IROutliner and FMSA.
Implementation & Evaluation: A robust implementation within LLVM 17.0.6, tested on the entire LLVM Test Suite (2,007 programs).

4. Results

The authors evaluated SBCR against two baselines: FMSA (function merging by sequence alignment) and LLVM IROutliner.

Code Size Reduction:
- On the full test suite, SBCR showed a slight geometric mean increase in .text size (0.11%) because not all slices are profitable.
- However, on the 29 programs where SBCR successfully reduced code size, it achieved a geometric mean reduction of 7.24% in instruction count and 8.39% in .text size.
- Best Case: On the AMGmk benchmark, SBCR reduced .text size by 12.49% (compared to 5.59% for FMSA and 0.94% for IROutliner).
- Complementarity: SBCR, FMSA, and IROutliner are complementary. Combining them (IROutliner $\to$ SBCR $\to$ FMSA) yielded the best results, reducing .text size by 9.68% on 86 benchmarks.
Performance Impact:
- Runtime overhead was statistically insignificant (geometric mean +0.06%).
- In specific cases (e.g., GlobalDataFlow-dbl), SBCR reduced runtime by 3.39% due to better instruction cache locality.
Compilation Overhead:
- SBCR increased compilation time by an average of 4.22%.
- Despite a theoretical $O(N^2)$ worst-case complexity, empirical results showed near-linear scaling with program size due to the small size of most slices and the low number of candidates meeting the cost model.
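
Several figures above are geometric means of per-program changes. As a reading aid, the sketch below shows one common way such a figure is computed from before and after sizes. The sizes are made-up numbers, and the assumption that the paper aggregates per-program size ratios in exactly this way is not confirmed by the text.

```python
from statistics import geometric_mean


def geomean_change_percent(before_sizes, after_sizes):
    """Geometric mean of after/before ratios, expressed as a percent change."""
    ratios = [after / before for before, after in zip(before_sizes, after_sizes)]
    return (geometric_mean(ratios) - 1.0) * 100.0


# Hypothetical .text sizes in bytes for three programs (not from the paper).
before = [1000, 2000, 4000]
after = [900, 2000, 3200]

change = geomean_change_percent(before, after)
print(f"{change:.2f}%")  # a negative value means the programs shrank on average
```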

5. Significance

This paper makes a significant contribution to compiler optimization by:

Solving a Theoretical Gap: It corrects the flaws in previous slice extraction algorithms regarding control-flow complexity and SSA properties, providing a sound theoretical basis for idempotent slicing.
Enabling New Optimizations: By allowing the extraction of non-contiguous code regions, SBCR unlocks redundancy elimination opportunities that contiguous-sequence-based optimizers (like IROutliner) miss.
Practical Viability: The implementation demonstrates that complex semantic transformations can be integrated into modern compiler pipelines (LLVM) with manageable compilation overhead and significant code-size benefits, particularly for embedded systems or bandwidth-constrained environments.
Future Direction: It establishes idempotent slices as a general abstraction for redundancy elimination, opening avenues for further research in profile-guided optimization and more aggressive profitability heuristics.