Understanding -O1, -O2, -O3, -Os, and -Og in Embedded Firmware

When compiling embedded C or C++ firmware, you will often see compiler flags such as -O0, -O1, -O2, -O3, -Os, and -Og. These are compiler optimization levels. They tell the compiler how much effort it should spend improving the generated machine code.

For desktop applications, optimization usually means “make the program run faster.” In embedded systems, optimization has a wider meaning. You may care about speed, flash usage, RAM usage, interrupt latency, power consumption, boot time, or debugging reliability. Choosing the wrong optimization level can produce larger firmware, make debugging harder, or expose bugs that stayed hidden at lower optimization levels.

This tutorial explains what each optimization level does, when to use it, and what embedded developers should be aware of.


What Is Compiler Optimization?

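Compiler optimization is the process of transforming your source code into more efficient machine code without changing the observable behavior of a correct program. The compiler preserves what the language rules guarantee, not what the programmer intended, which is why optimization can expose latent bugs.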

For example, this code:

int x = 5 * 10;

may be compiled as if it were written like this:

int x = 50;

The compiler can also remove unused code, simplify loops, inline small functions, reduce memory accesses, and store variables in CPU registers instead of RAM.
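
As a rough illustration of the transformations listed above, the function below sums a fixed range. The name sum_one_to_ten is made up for this sketch. With optimization enabled, a compiler such as GCC will typically evaluate the loop at build time and emit code that simply returns the constant, although the exact output depends on the compiler version and flags.

```c
/* Hypothetical helper: the loop bounds are compile-time constants, so an
   optimizing compiler is free to replace the whole body with "return 55". */
int sum_one_to_ten(void)
{
    int total = 0;
    for (int i = 1; i <= 10; i++) {
        total += i;
    }
    return total;
}
```

Comparing the disassembly of this function at -O0 and -O2 is a quick way to see the difference for your own toolchain.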

In embedded systems, these optimizations can make a big difference. A small 8-bit AVR, an ARM Cortex-M0, or an ESP32 has limited flash, limited RAM, and real-time timing requirements. The compiler’s optimization level affects all of these.


Common Optimization Levels

-O0: No Optimization

-O0 disables almost all optimization. GCC uses this level by default when no -O flag is provided.

Example:

arm-none-eabi-gcc main.c -O0 -g -o firmware.elf

This level is commonly used during early debugging because the generated code closely follows the source code.

Advantages:

- Fast compile time
- Easiest debugging
- Variables are easier to inspect
- Breakpoints behave more predictably

Disadvantages:

- Larger code
- Slower execution
- Higher power consumption in some cases
- Timing may be very different from release builds

For embedded development, -O0 is useful while bringing up hardware, checking peripheral initialization, or stepping through code line by line. However, you should not assume that firmware tested only at -O0 will behave the same way at release optimization levels.


-O1: Basic Optimization

-O1 enables a basic set of optimizations. It improves code quality without being too aggressive.

Example:

arm-none-eabi-gcc main.c -O1 -g -o firmware.elf

At this level, the compiler may remove unused code, simplify expressions, reduce redundant memory loads, and make better use of CPU registers.

For embedded use, -O1 is a good first step when you want better performance than -O0 but still want debugging to remain somewhat manageable.

Use -O1 when:

- You want light optimization
- You are debugging a timing-sensitive issue
- `-O0` is too slow or too large
- You are not yet ready to use full release optimization

However, many embedded projects skip -O1 and use either -Og for debugging or -O2 / -Os for release builds.


-O2: Common Release Optimization

-O2 is one of the most common optimization levels for production firmware.

Example:

arm-none-eabi-gcc main.c -O2 -o firmware.elf

This level enables many optimizations that improve speed without usually causing a huge increase in code size.

In embedded systems, -O2 is often a good default for release builds when performance matters.

Typical effects of -O2 include:

- Faster loops
- Better register allocation
- Dead code elimination
- Common subexpression elimination
- Function inlining where reasonable
- Improved instruction scheduling

For microcontrollers like STM32, SAMD, ESP32, RP2040, and many ARM Cortex-M devices, -O2 often gives a good balance between speed and size.

Use -O2 when:

- You are building release firmware
- Execution speed matters
- Your flash size is not extremely tight
- You want a stable general-purpose optimization level

One important warning: debugging optimized code can be confusing. Variables may appear as “optimized out,” breakpoints may not behave exactly as expected, and the compiler may rearrange instructions.


-O3: Aggressive Speed Optimization

-O3 enables more aggressive optimizations than -O2.

Example:

arm-none-eabi-gcc main.c -O3 -o firmware.elf

It may perform more aggressive inlining, loop transformations, and other speed-focused optimizations.

However, -O3 is not always better for embedded firmware.

Why?

Because embedded systems often have limited flash, limited cache, and strict timing requirements. Aggressive inlining can make the firmware larger. Larger code may reduce instruction cache efficiency or exceed flash limits. In some cases, -O3 can make firmware slower than -O2.

Use -O3 only when:

- You have measured a real performance problem
- You have benchmarked `-O3` against `-O2`
- Your flash size is still acceptable
- You have tested timing-sensitive code carefully

Do not assume that -O3 is automatically the best release setting. In embedded work, measurement matters more than the optimization number.
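
One way to make that measurement is a small timing harness that reads a free-running counter before and after the code under test. The sketch below is an assumption-laden example, not a vendor API: measure_ticks, counter_fn, and work_fn are hypothetical names, and the stub functions exist only so the harness can be exercised on a host PC. On a real target the counter callback would read a hardware timer (on Cortex-M3/M4/M7 parts the DWT cycle counter is one common choice, where available).

```c
#include <stdint.h>

typedef uint32_t (*counter_fn)(void);  /* reads a free-running tick counter */
typedef void (*work_fn)(void);         /* the code being measured */

/* Returns the number of counter ticks spent inside work().
   Unsigned subtraction stays correct across a single counter wraparound. */
uint32_t measure_ticks(counter_fn read_counter, work_fn work)
{
    uint32_t start = read_counter();
    work();
    uint32_t end = read_counter();
    return end - start;
}

/* Host-side stand-ins (hypothetical): a fake counter and a fake workload
   that "costs" 42 ticks. Replace both with real code on the target. */
uint32_t stub_ticks;

uint32_t stub_counter(void)
{
    return stub_ticks;
}

void stub_work(void)
{
    stub_ticks += 42u;
}
```

Build the same harness once with -O2 and once with -O3, run each several times, and compare the tick counts together with the firmware size before deciding.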


-Os: Optimize for Size

-Os tells the compiler to optimize for smaller code size.

Example:

arm-none-eabi-gcc main.c -Os -o firmware.elf

This is very useful in embedded systems because flash memory is often limited.

For example, if your firmware is close to the flash limit of an ATmega328P, STM32F030, PIC32, or other small microcontroller, -Os may help the firmware fit.

Use -Os when:

- Flash memory is limited
- You are building for small MCUs
- Code size matters more than maximum speed
- You are near the firmware size limit

For many embedded projects, -Os is a better release choice than -O2, especially for small devices.

Common examples:

AVR/Arduino Uno: often use -Os
Small ARM Cortex-M0/M0+: often use -Os
Bootloaders: often use -Os
Tiny sensor nodes: often use -Os

However, smaller code is not always slower. In some MCUs, smaller code can perform well because it fits better in flash or cache.


-Og: Optimize for Debugging

-Og is designed to improve debugging while still enabling some optimizations.

Example:

arm-none-eabi-gcc main.c -Og -g -o firmware.elf

For embedded development, -Og is often better than -O0 once your project becomes more complex.

It keeps debugging usable while allowing the compiler to perform optimizations that do not heavily interfere with the debugging experience.

Use -Og when:

- You are actively debugging firmware
- You want better code than -O0
- You still need meaningful breakpoints and variable inspection
- You want your debug build to behave closer to release builds

A good embedded workflow is:

Debug build:   -Og -g
Release build: -O2 or -Os

Recommended Optimization Levels for Embedded Projects

A practical setup looks like this:

Development / debugging:
-Og -g

Early hardware bring-up:
-O0 -g

Normal release build:
-O2

Flash-limited release build:
-Os

Performance-critical release build:
-O3 only after benchmarking

For example, an STM32 project might use:

CFLAGS_DEBUG = -Og -g3
CFLAGS_RELEASE = -O2

An AVR project might use:

CFLAGS_DEBUG = -Og -g3
CFLAGS_RELEASE = -Os

A bootloader might use:

CFLAGS_RELEASE = -Os

Why Code Works at -O0 but Fails at -O2

This is one of the most common embedded firmware problems.

A developer writes code, tests it at -O0, and everything works. Then they enable -O2 or -Os, and the firmware stops working.

It is tempting to blame the compiler, but the real cause is usually a bug in the code. Optimization often exposes bugs that were already present.

Common causes include:

- Missing volatile
- Undefined behavior
- Bad pointer usage
- Stack overflow
- Race conditions
- Timing assumptions
- Uninitialized variables
- Incorrect delay loops
- Memory-mapped registers accessed incorrectly
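
Undefined behavior is often the hardest item on this list to spot, because the code can look reasonable. A minimal sketch, using a hypothetical helper name: checking for signed overflow after the addition has already happened relies on undefined behavior, so an optimizer is allowed to delete the check. Testing the range before adding avoids the problem.

```c
#include <limits.h>
#include <stdbool.h>

/* Risky pattern (shown only as a comment): for signed int, "a + b < a"
   overflows before the comparison runs. Signed overflow is undefined
   behavior, so an optimizing compiler may treat the test as always false
   when b is non-negative. */

/* Hypothetical helper: checks the range first, so no overflow ever occurs. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    return a < INT_MIN - b;
}
```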

The Importance of volatile

In embedded C, volatile tells the compiler that a variable can change outside the normal program flow.

This is important for:

- Hardware registers
- Interrupt service routines
- Flags shared between ISR and main code
- Memory-mapped peripherals

Consider this example:

int button_pressed = 0;

void EXTI0_IRQHandler(void)
{
    button_pressed = 1;
}

int main(void)
{
    while (button_pressed == 0)
    {
        // wait
    }

    // continue when button is pressed
}

At higher optimization levels, the compiler may assume that button_pressed does not change inside the while loop because it cannot see any code inside the loop modifying it. The compiler may optimize the loop into an infinite loop.

The correct version is:

volatile int button_pressed = 0;

void EXTI0_IRQHandler(void)
{
    button_pressed = 1;
}

int main(void)
{
    while (button_pressed == 0)
    {
        // wait
    }

    // continue when button is pressed
}

Now the compiler knows it must reload button_pressed from memory each time.

Use volatile for variables that can change due to interrupts or hardware.

Do not use volatile as a general fix for all optimization problems. It is not a replacement for proper locking, atomic access, or good program design.
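
Where a flag must be read and cleared as one step, C11 atomics are one option. The sketch below assumes a toolchain that ships <stdatomic.h>; the names button_event, button_isr, and take_button_event are hypothetical. Whether these operations compile to plain instructions depends on the target: on cores without exclusive-access instructions (Cortex-M0, for example), read-modify-write operations such as atomic_exchange may need library support or interrupt masking instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Event flag written by the ISR and consumed by the main loop. */
static atomic_bool button_event = false;

/* Hypothetical ISR body: records that the button was pressed. */
void button_isr(void)
{
    atomic_store(&button_event, true);
}

/* Returns true once per recorded press. The flag is read and cleared in a
   single atomic step, so a press arriving between "read" and "clear"
   cannot be lost. */
bool take_button_event(void)
{
    return atomic_exchange(&button_event, false);
}
```

The main loop would call take_button_event() on each pass and handle the press when it returns true.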


Hardware Registers and Optimization

Peripheral registers must usually be accessed through volatile-qualified pointers or structs.

Example:

#include <stdint.h>

#define GPIOA_ODR (*(volatile uint32_t *)0x48000014)

void led_on(void)
{
    GPIOA_ODR |= (1U << 5);  // set bit 5 with a read-modify-write
}

Without volatile, the compiler might remove or combine register accesses in ways that are valid for normal memory but incorrect for hardware registers.

Most vendor libraries already handle this correctly. For example, STM32 HAL, CMSIS, AVR headers, and ESP-IDF register definitions normally mark hardware registers as volatile.
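
Vendor headers usually describe a peripheral as a struct of volatile members rather than one macro per register. The sketch below imitates that style. The type gpio_regs_t, the helper names, and the register layout are assumptions made for illustration; check the offsets against your device's reference manual. Taking the port as a pointer parameter also lets the helpers be tested on a host against an ordinary struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed register layout for illustration only. */
typedef struct {
    volatile uint32_t MODER;    /* offset 0x00 */
    volatile uint32_t OTYPER;   /* offset 0x04 */
    volatile uint32_t OSPEEDR;  /* offset 0x08 */
    volatile uint32_t PUPDR;    /* offset 0x0C */
    volatile uint32_t IDR;      /* offset 0x10 */
    volatile uint32_t ODR;      /* offset 0x14 */
} gpio_regs_t;

/* Catch layout mistakes at compile time instead of on the bench. */
_Static_assert(offsetof(gpio_regs_t, ODR) == 0x14, "ODR offset mismatch");

/* Hypothetical helper: sets one output bit with a read-modify-write. */
void gpio_set_pin(gpio_regs_t *port, unsigned pin)
{
    port->ODR |= (UINT32_C(1) << pin);
}

/* Hypothetical helper: clears one output bit with a read-modify-write. */
void gpio_clear_pin(gpio_regs_t *port, unsigned pin)
{
    port->ODR &= ~(UINT32_C(1) << pin);
}
```

On the target, the port argument would be a cast of the peripheral base address, for example (gpio_regs_t *)0x48000000 if ODR at 0x48000014 sits at offset 0x14 as assumed here.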


Delay Loops Can Break Under Optimization

This is bad embedded code:

void delay(void)
{
    for (int i = 0; i < 100000; i++)
    {
    }
}

At higher optimization levels, the compiler may remove the loop because it does nothing.

A slightly better version is:

void delay(void)
{
    for (volatile int i = 0; i < 100000; i++)
    {
    }
}

But the best solution is to use a hardware timer, SysTick, RTOS delay, or vendor-provided delay function.

Better examples:

HAL_Delay(100);

or:

vTaskDelay(pdMS_TO_TICKS(100));

or a timer-based delay function.

In embedded firmware, timing should not rely on empty loops unless you fully understand the compiler, CPU clock, and generated assembly.
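
A timer-based delay usually reduces to comparing a free-running tick counter against a start value. The helper below is a minimal sketch with a hypothetical name, ticks_elapsed. It assumes a 32-bit counter that increments at a known rate and wraps around to zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true once at least `duration` ticks have passed since `start`.
   Unsigned subtraction keeps the comparison correct when the counter
   wraps from 0xFFFFFFFF back to 0 between `start` and `now`. */
bool ticks_elapsed(uint32_t start, uint32_t now, uint32_t duration)
{
    return (uint32_t)(now - start) >= duration;
}
```

With the STM32 HAL, for example, start could be captured from HAL_GetTick() and the loop would poll ticks_elapsed(start, HAL_GetTick(), 100). Because the tick source is a hardware-driven counter, the delay does not depend on the optimization level.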


Code Size: -O2 vs -Os

In embedded work, smaller code is often better.

A typical comparison might look like this:

-O0:  42 KB
-O1:  31 KB
-O2:  28 KB
-Os:  24 KB
-O3:  36 KB

This is not universal, but it shows a common pattern.

-O0 is often large because the compiler does not simplify much.

-O2 often reduces size while improving speed.

-Os usually gives the smallest output.

-O3 may increase size because of aggressive inlining and loop optimizations.

Always check your firmware size after changing optimization levels.

For GCC-based embedded builds, you may see output like:

text    data     bss     dec     hex
24576   128      2048    26752   6880

Where:

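text = code and constants in flash
data = initialized variables (initial values stored in flash, copied to RAM at startup)
bss  = zero-initialized variables in RAM
dec / hex = text + data + bss, in decimal and hexadecimal

As a rule of thumb, flash usage is text + data and static RAM usage is data + bss.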
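
The sketch below shows how ordinary declarations usually map onto those columns on a typical ARM Cortex-M GCC setup. The variable and function names are invented for this example, and placement can differ on other architectures or with custom linker scripts. The two helpers apply the usual budget arithmetic to the figures in the sample output.

```c
#include <stdint.h>

/* const data: usually placed in .rodata, which `size` counts under text. */
const uint16_t sine_table[4] = { 0, 12539, 23170, 30273 };

/* Initialized, non-const: counted under data. The initial value lives in
   flash and the startup code copies it into RAM. */
uint32_t baud_rate = 115200;

/* No initializer at file scope: counted under bss and zeroed at startup. */
uint8_t rx_buffer[2048];

/* Flash holds the code and constants plus the initial values of .data. */
uint32_t flash_bytes(uint32_t text, uint32_t data)
{
    return text + data;
}

/* Static RAM holds the .data copy plus the zeroed .bss area. */
uint32_t ram_bytes(uint32_t data, uint32_t bss)
{
    return data + bss;
}
```

For the sample output, that gives 24704 bytes of flash and 2176 bytes of static RAM. Stack and heap are not included in these figures.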

Useful Size Optimization Flags

For embedded firmware, -Os is commonly combined with section garbage collection.

Compiler flags:

-ffunction-sections -fdata-sections

Linker flag:

-Wl,--gc-sections

Example:

arm-none-eabi-gcc main.c \
  -Os \
  -ffunction-sections \
  -fdata-sections \
  -Wl,--gc-sections \
  -o firmware.elf

These options allow the linker to remove unused functions and data from the final firmware image.

This is especially useful when using large libraries where only a small part of the library is actually needed.


Optimization and Interrupts

Optimization can affect interrupt-related code if shared variables are not handled correctly.

Example:

uint8_t rx_ready = 0;

void USART_IRQHandler(void)
{
    rx_ready = 1;
}

int main(void)
{
    while (!rx_ready)
    {
    }

    process_data();
}

This should be:

volatile uint8_t rx_ready = 0;

void USART_IRQHandler(void)
{
    rx_ready = 1;
}

int main(void)
{
    while (!rx_ready)
    {
    }

    process_data();
}

For multi-byte variables shared with interrupts, volatile alone may not be enough.

Example:

volatile uint32_t tick_count;

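On an 8-bit MCU, reading a 32-bit value requires multiple instructions. An interrupt could update the value halfway through the read, so the main code would see a mix of old and new bytes. The same problem can occur with 64-bit values on a 32-bit core.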

In that case, you may need to temporarily disable interrupts or use an atomic access method.

Example:

uint32_t get_tick_count(void)
{
    uint32_t value;

    __disable_irq();
    value = tick_count;
    __enable_irq();

    return value;
}

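The exact method depends on your platform. The __disable_irq() and __enable_irq() calls above are the CMSIS intrinsics for ARM Cortex-M; on AVR, avr-libc provides cli() and sei(), along with ATOMIC_BLOCK in <util/atomic.h>. If the function can be called while interrupts are already disabled, save and restore the previous interrupt state instead of re-enabling interrupts unconditionally.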
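
Where masking interrupts is undesirable, one alternative sometimes used for a counter that only an ISR writes is to read it twice and retry until both reads agree. This is a sketch under stated assumptions rather than a general-purpose solution: read_counter_consistent is a hypothetical name, and the approach assumes the counter changes much more slowly than the two reads take.

```c
#include <stdint.h>

/* Reads a counter that an interrupt may update mid-read.
   If an update lands during either read, the two values differ and the
   loop retries, so a torn value is not returned under the stated
   assumption that updates are infrequent. */
uint32_t read_counter_consistent(const volatile uint32_t *counter)
{
    uint32_t first;
    uint32_t second;

    do {
        first = *counter;
        second = *counter;
    } while (first != second);

    return first;
}
```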


Optimization and Debugging

When optimization is enabled, debugging can become confusing.

You may see:

- Variables shown as <optimized out>
- Breakpoints skipped
- Source lines executed out of order
- Functions inlined and not visible in the call stack
- Loops transformed into different assembly

This does not necessarily mean the debugger is broken. It means the compiler changed the code structure while preserving the intended behavior.

For debugging, prefer:

-Og -g3

Instead of:

-O2 -g

You can still debug -O2 builds, but the experience is harder.


Per-Function Optimization

Sometimes you may want most of the firmware optimized normally, but one function optimized differently.

With GCC, you can use function attributes.

Example:

__attribute__((optimize("O0")))
void debug_sensitive_function(void)
{
    // Easier to debug
}

Or:

__attribute__((optimize("O3")))
void performance_critical_function(void)
{
    // Speed-critical code
}

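Use this carefully. The optimize attribute is specific to GCC, and the GCC documentation describes it as intended for debugging rather than production code. Per-function optimization can be useful, but it can also make the build harder to understand and maintain.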
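
GCC also documents pragmas that change the optimization level for a region of a file, which can be convenient when several adjacent functions need the same treatment. The sketch below assumes GCC; other compilers may ignore these pragmas, possibly with a warning. The function checksum_debug is a hypothetical example.

```c
#include <stddef.h>
#include <stdint.h>

#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Hypothetical function kept unoptimized so it is easy to step through. */
uint32_t checksum_debug(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

#pragma GCC pop_options
/* Functions defined after this point use the command-line level again. */
```

As with the attribute form, treat this as a temporary debugging aid and remove it once the investigation is finished.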


Practical Embedded Build Recommendations

For most embedded firmware projects, use separate debug and release configurations.

Debug Build

-Og -g3

Good for:

- Stepping through code
- Inspecting variables
- Debugging peripheral setup
- Testing logic

Release Build for Speed

-O2

Good for:

- General production firmware
- Motor control
- Communication stacks
- Real-time applications
- DSP-like code, if size is acceptable

Release Build for Size

-Os

Good for:

- Small microcontrollers
- Bootloaders
- Arduino/AVR projects
- Battery-powered sensor nodes
- Firmware near the flash limit

Aggressive Performance Build

-O3

Use only after testing and benchmarking.


Example Makefile Setup

MCU = cortex-m4

CC = arm-none-eabi-gcc

COMMON_FLAGS = \
    -mcpu=$(MCU) \
    -mthumb \
    -Wall \
    -Wextra \
    -ffunction-sections \
    -fdata-sections

DEBUG_FLAGS = -Og -g3
RELEASE_FLAGS = -O2

LDFLAGS = -Wl,--gc-sections

.PHONY: debug release size

debug:
	$(CC) $(COMMON_FLAGS) $(DEBUG_FLAGS) main.c $(LDFLAGS) -o firmware_debug.elf

release:
	$(CC) $(COMMON_FLAGS) $(RELEASE_FLAGS) main.c $(LDFLAGS) -o firmware_release.elf

size:
	arm-none-eabi-size firmware_release.elf

For a size-focused release build, change:

RELEASE_FLAGS = -O2

to:

RELEASE_FLAGS = -Os

How to Choose the Right Optimization Level

A good decision flow is:

Are you debugging?
Use -Og -g3.

Are you doing early board bring-up?
Use -O0 -g or -Og -g3.

Are you building production firmware?
Use -O2.

Are you running out of flash?
Use -Os.

Are you chasing maximum speed?
Try -O3, but compare it against -O2.

Did the code break when optimization was enabled?
Look for missing volatile, undefined behavior, race conditions, timing assumptions, or stack problems.

Final Thoughts

Compiler optimization is not just a performance setting. In embedded systems, it affects code size, timing, debugging, interrupt behavior, and hardware access.

For most projects, a good default is:

Debug:   -Og -g3
Release: -O2 or -Os

Use -O2 when performance matters. Use -Os when flash size matters. Use -O3 only after measuring. Avoid relying on -O0 behavior for final firmware.

Most importantly, when optimized firmware behaves differently from unoptimized firmware, do not immediately blame the compiler. In embedded C and C++, optimization often reveals hidden bugs such as missing volatile, unsafe interrupt sharing, undefined behavior, or timing assumptions.

A reliable embedded project should be tested using the same optimization level that will be used in production.
