Boosting Go Performance with Stack-Allocated Slices

Introduction

Go developers constantly seek ways to optimize program performance. A significant source of slowdown in many applications is heap allocation. Each allocation from the heap requires complex memory management and adds pressure on the garbage collector. Even with recent improvements like the Green Tea garbage collector, heap operations still carry overhead. This article explores a powerful technique available in Go 1.24+: stack allocation of slices when their size is known at compile time. Stack allocations are dramatically cheaper – often free – and produce no garbage collector load, making them ideal for hot code paths.

Source: blog.golang.org

The Cost of Heap Allocations in Slice Growth

Consider a function that builds a slice of tasks from a channel:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Every time append needs more capacity, it allocates a new backing array on the heap. The growth pattern (usually doubling) means the first few appends cause multiple small allocations and leave behind garbage. For example, starting from nil:

  • Iteration 1: allocate backing array of size 1 (heap)
  • Iteration 2: allocate size 2, free size 1
  • Iteration 3: allocate size 4, free size 2
  • Iteration 4: no allocation (capacity 4 already accommodates length 4)
  • Iteration 5: allocate size 8, and so on

This ramp-up phase is expensive, especially if the slice never grows large. The heap allocator is invoked many times, and short-lived objects are created that the garbage collector must later reclaim.
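The ramp-up cost is easy to observe with testing.AllocsPerRun, which reports the average number of heap allocations per call. A minimal sketch (the function name naive is illustrative; the //go:noinline directive prevents the compiler from inlining and constant-propagating the length, which could otherwise optimize the allocations away):

```go
package main

import (
	"fmt"
	"testing"
)

// naive appends into a nil slice, forcing the backing array to be
// reallocated each time capacity is exhausted during growth.
//
//go:noinline
func naive(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	// Growing from nil to 7 elements requires several backing
	// arrays (roughly sizes 1, 2, 4, 8), each a heap allocation.
	allocs := testing.AllocsPerRun(100, func() {
		_ = naive(7)
	})
	fmt.Println("allocations per call:", allocs)
}
```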

Stack Allocation When Size Is Known at Compile Time

Starting with Go 1.24, the compiler can detect situations where the maximum size of a slice is known at compile time. If the compiler can prove both that the slice will never exceed a certain capacity and that it does not escape the function, it allocates the entire backing array on the stack. This eliminates heap allocations and garbage collector overhead for that slice.

How the Compiler Determines Maximum Size

The analysis looks for loops that append to a slice where the number of iterations is bounded by a compile-time constant. For example:

func process(c chan task) {
    const maxTasks = 1000
    tasks := make([]task, 0, maxTasks)  // hint: maximum size
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Here the explicit capacity hint maxTasks tells the compiler exactly how much space is needed. If the number of iterations is limited by a constant (e.g., for i := 0; i < 100; i++), the compiler infers the maximum size without a hint.

The optimization works for slices built in any way that yields a predictable maximum length, including:

  • Loops with constant iteration counts
  • Slices initialized with make([]T, n) where n is a constant
  • Slices created by copying from a fixed-size array
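Here is a small self-contained sketch of the constant-bound case (sumSquares is an illustrative name, not from the original). Because the loop count is a constant and the slice is consumed locally rather than returned, the backing array is a stack-allocation candidate; you can check the compiler's decision with go build -gcflags=-m, which prints its escape-analysis diagnostics:

```go
package main

import "fmt"

// sumSquares builds a slice whose length is bounded by the constant
// loop count. The slice never leaves the function, so the compiler
// can place its backing array on the stack.
func sumSquares() int {
	s := make([]int, 0, 100) // constant capacity, never escapes
	for i := 0; i < 100; i++ {
		s = append(s, i*i)
	}
	total := 0
	for _, v := range s {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares()) // prints 328350
}
```

Note that returning the slice itself (rather than a value derived from it) would make it escape and force the backing array onto the heap.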

Benefits of Stack Allocation for Slices

Stack-allocated slices bring several advantages:

  • Zero heap allocation: The backing array lives on the stack, so append never calls the heap allocator.
  • No garbage collection pressure: Stack memory is freed automatically when the function returns, without GC involvement.
  • Excellent cache locality: Stack frames are contiguous and typically hot in CPU caches, making subsequent accesses very fast.
  • Lower overhead per append: The only cost is a bounds check or pointer bump; no allocation bookkeeping.

Example: Converting Heap Allocation to Stack Allocation

Let's revisit the original process function. If the channel is known to deliver at most, say, 256 tasks, we can write:

func process(c chan task) {
    const maxIter = 256
    tasks := make([]task, 0, maxIter)
    for i := 0; i < maxIter; i++ {
        t, ok := <-c
        if !ok {
            break
        }
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

The compiler now sees that the backing array needs space for at most 256 elements. As long as escape analysis shows the slice does not outlive the call (for example, processAll does not retain a reference to it), the array is allocated on the stack, each append reuses that same stack memory, and no heap allocations occur during the loop.
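The escape condition is worth seeing in action. In this sketch (the names local, escapes, and sink are illustrative), both functions build an identically bounded slice, but one stores it in a package-level variable, which forces the backing array onto the heap:

```go
package main

import (
	"fmt"
	"testing"
)

var sink []int // package-level: anything stored here escapes

// local keeps its bounded slice entirely within the function, so
// the backing array can live on the stack.
//
//go:noinline
func local() int {
	s := make([]int, 0, 8)
	for i := 0; i < 8; i++ {
		s = append(s, i)
	}
	return len(s)
}

// escapes builds the same bounded slice but publishes it through
// sink, so the backing array must be heap-allocated.
//
//go:noinline
func escapes() {
	s := make([]int, 0, 8)
	for i := 0; i < 8; i++ {
		s = append(s, i)
	}
	sink = s
}

func main() {
	fmt.Println("local:", testing.AllocsPerRun(100, func() { _ = local() }))
	fmt.Println("escapes:", testing.AllocsPerRun(100, func() { escapes() }))
}
```

On current toolchains the first call reports zero allocations and the second reports one per call.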

Special Case: Zero-Size Slices

If the bound is zero (e.g., make([]T, 0, 0)), the backing array occupies zero bytes. The runtime already handles this case cheaply by returning a pointer to a shared zero-size base, so no real allocation happens at all; the optimization simply makes this degenerate case uniform with the nonzero ones.

Limitations and When to Use Stack Allocation

Stack allocation of slices works only when the maximum capacity is a compile-time constant and escape analysis can prove the slice does not outlive its function. Dynamic sizes (e.g., reading from a file of unknown length) must still use heap allocation. Very large backing arrays are also excluded: the compiler caps implicitly stack-allocated objects at a small fixed size (64 KiB in current toolchains), so slices that could exceed that bound go to the heap regardless.

Use this optimization primarily in hot loops where you can bound the data size. Common candidates include:

  • Parsing fixed-format input lines
  • Building intermediate slices in recursive algorithms with depth limits
  • Processing batches of tasks where batch size is constant
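The fixed-format parsing case might look like the following sketch (sumCSV and maxFields are illustrative names; strings.SplitN itself still allocates its []string result, but the bounded scratch slice of integers is a stack-allocation candidate because it never leaves the function):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const maxFields = 8

// sumCSV parses up to maxFields integer fields from a
// comma-separated line and sums them. The scratch slice has a
// constant capacity and is consumed locally, so its backing array
// can stay on the stack.
func sumCSV(line string) int {
	nums := make([]int, 0, maxFields)
	for _, f := range strings.SplitN(line, ",", maxFields) {
		if n, err := strconv.Atoi(strings.TrimSpace(f)); err == nil {
			nums = append(nums, n)
		}
	}
	total := 0
	for _, n := range nums {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sumCSV("1, 2, 3, 4")) // prints 10
}
```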

Conclusion

Stack allocation of slices is a clear win for performance in Go 1.24 and later. By moving the backing array from heap to stack, you eliminate both allocation overhead and garbage collector pressure. The compiler does the hard work; you just need to provide a capacity hint or write a loop with a constant bound. When performance matters, consider whether your slices can live on the stack.
