知从木牛瑞萨RH850 P1M-C软件算法优化实践CyberSecurity Application of ZC.MuNiu on Renesas RH850 ICUM

Renesas RH850 P1M-C Software Algorithm Optimization Practice_页面_01.jpg

1  项目背景 Introduction

在嵌入式安全通信领域,AES-CMAC(RFC 4493)是报文完整性校验的核心算法之一。本文记录了在瑞萨 RH850 P1H-C(R7F701372)平台上,利用 ABCA0 硬件加速器对 aes_cmac() 函数进行五轮深度性能优化的完整过程——从原始的 1.2 秒,最终压缩至 90 毫秒,实现 约 13 倍的整体性能提升。

In the field of embedded secure communication, AES-CMAC (RFC 4493) is one of the core algorithms for message integrity verification. This paper documents the complete process of conducting five rounds of in-depth performance optimization for the aes_cmac() function using the ABCA0 hardware accelerator on the Renesas RH850 P1H-C (R7F701372) platform—reducing the original 1.2 seconds to 90 milliseconds, achieving approximately a 13-fold overall performance improvement.


 

2  算法原理简述   Algorithm Overview

AES-CMAC 的核心是一个 CBC-MAC 变体,对输入数据按 16 字节分块,逐块执行 XOR → AES-ECB 加密 的链式运算。最后一块根据填充情况选择不同的子密钥(subkey_1 或 subkey_2)参与异或。

The core of AES-CMAC is a variant of CBC-MAC, which processes input data in 16-byte blocks, performing a chained operation of XOR followed by AES-ECB encryption. The final block selects different subkeys (subkey_1 or subkey_2) for XOR based on the padding scenario.

M_1 ⊕ IV → AES → X_1

M_i ⊕ X_(i-1) → AES → X_i

M_n ⊕ X_(n-1) → AES → X_n  ← 最终 MAC (final MAC)

算法本身是串行的(每一块的输出是下一块的输入),因此无法通过并行化加速。优化空间全部集中在 减少每块的 CPU 侧指令开销 和 最小化硬件交互轮次 上。

The algorithm is inherently sequential: each block's output is the next block's input, so it cannot be accelerated by parallelizing across blocks. All of the optimization potential lies in reducing the CPU-side instruction overhead per block and minimizing the number of hardware interaction rounds.
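The chaining described above can be sketched in portable C. This is a minimal illustration of the RFC 4493 block structure under stated assumptions, not the article's implementation: `cmac_chain`, `block_encrypt_fn`, and `toy_copy` are hypothetical names, the AES block operation is passed in as a callback, and the toy stand-in provides no security and exists only so the data flow can be checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16u

/* Hypothetical callback standing in for one AES-ECB block encryption. */
typedef void (*block_encrypt_fn)(const uint8_t in[BLK], uint8_t out[BLK]);

/* Toy stand-in (NOT a cipher): copies input to output so the chaining is checkable. */
static void toy_copy(const uint8_t in[BLK], uint8_t out[BLK])
{
    memcpy(out, in, BLK);
}

/* Chains XOR -> encrypt over the message following the RFC 4493 structure.
 * k1/k2 are the precomputed subkeys; mac receives the 16-byte result. */
static void cmac_chain(block_encrypt_fn enc, const uint8_t *msg, size_t len,
                       const uint8_t k1[BLK], const uint8_t k2[BLK],
                       uint8_t mac[BLK])
{
    size_t n = (len + BLK - 1u) / BLK;             /* number of blocks */
    int complete = (n != 0u) && (len % BLK == 0u); /* final block is full */
    uint8_t x[BLK] = {0};                          /* chaining value, IV = 0 */
    uint8_t y[BLK];
    uint8_t last[BLK] = {0};
    size_t i, j, tail;

    if (n == 0u) { n = 1u; }                       /* empty message: one padded block */

    /* All blocks except the last: each output feeds the next input. */
    for (i = 0u; i + 1u < n; i++) {
        for (j = 0u; j < BLK; j++) { y[j] = x[j] ^ msg[i * BLK + j]; }
        enc(y, x);
    }

    /* Last block: pad if partial, then mix in subkey_1 or subkey_2. */
    tail = len - (n - 1u) * BLK;
    if (tail != 0u) { memcpy(last, msg + (n - 1u) * BLK, tail); }
    if (!complete) { last[tail] = 0x80u; }         /* 10...0 padding */
    for (j = 0u; j < BLK; j++) {
        y[j] = x[j] ^ last[j] ^ (complete ? k1[j] : k2[j]);
    }
    enc(y, mac);
}
```

Because every `enc` call consumes the previous output, the loop cannot be split across parallel workers; only the work around each call can shrink.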


 

3  性能基线分析 Performance Baseline Analysis

原始实现中,aes_cmac() 在循环内对每一块调用 R_AES_HW_ECB_Encrypt(),而该函数每次都会完整执行初始化流程:

In the original implementation, aes_cmac() calls R_AES_HW_ECB_Encrypt() for every block inside the loop, and that function runs the full initialization sequence each time:

每块执行 Per block: Init → Configure → LoadKey → ProcessBlock

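原始每块指令分解 Per-block instruction breakdown of the original implementation:

| 操作 Operation | 指令数(估) Instructions (est.) | 说明 Notes |
|---|---|---|
| R_AES_HW_ECB_Encrypt 函数调用/返回 (call/return) | 10 | call/ret + 参数传递 (argument passing) |
| R_AES_HW_Init | 15 | 每块检查 s_initialized + wait (checked on every block) |
| R_AES_HW_Configure | 20 | 参数校验 + MD 寄存器写入 (parameter validation + MD register write) |
| R_AES_HW_LoadKey | 60 | bytes_to_words(~45) + di/ei + CTL + IDAT + wait |
| R_AES_HW_ProcessBlock | 90 | bytes_to_words(~45) + di/ei + CTL + IDAT + wait + di/ei + ODAT + words_to_bytes(~45) |
| aes_cmac 循环: block_xor_triple 调用 (loop: block_xor_triple call) | 16 | 函数调用 + 4×XOR (function call + 4×XOR) |
| aes_cmac 循环: if/else if 分支 (loop: if/else if branches) | 6 | 每块判断 last block (last-block test on every block) |
| 总计 Total | 250+ | |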


 

APP 数据 CMAC 计算实测耗时:1.2 秒。

Measured CMAC computation time over the APP data: 1.2 s.

 

4  优化过程 Optimization Process

4.1     第 1 轮:循环外一次性完成硬件初始化 Complete hardware initialization in one go outside the loop

问题分析 Problem analysis :

R_AES_HW_ECB_Encrypt() 每次调用都执行 Init + Configure + LoadKey,但这些操作的结果在连续的 CMAC 运算中是不变的——密钥不变,模式不变,硬件只需配置一次。

Every call to R_AES_HW_ECB_Encrypt() executes Init + Configure + LoadKey, yet the results of these steps do not change during a continuous CMAC computation: the key and the mode stay the same, so the hardware only needs to be configured once.

改动modify:

1.png
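The change shown in the figure is not reproduced in this text, so the following is only a sketch of the idea under stated assumptions. The stub functions (`hw_setup_stub`, `hw_process_block_stub`) are hypothetical stand-ins for the driver's Init + Configure + LoadKey and ProcessBlock steps, and they count their calls so the effect of hoisting can be observed on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stubs standing in for the driver steps; each counts its calls. */
static unsigned g_setup_calls;
static unsigned g_block_calls;

/* Stands in for Init + Configure + LoadKey. */
static void hw_setup_stub(const uint8_t *key)
{
    (void)key;
    g_setup_calls++;
}

/* Stands in for one hardware block operation (placeholder: plain copy). */
static void hw_process_block_stub(const uint8_t in[16], uint8_t out[16])
{
    size_t j;
    for (j = 0u; j < 16u; j++) { out[j] = in[j]; }
    g_block_calls++;
}

/* Before: the full setup sequence is repeated for every block. */
static void mac_blocks_naive(const uint8_t *key, const uint8_t *msg,
                             size_t n_blocks, uint8_t x[16])
{
    size_t i, j;
    uint8_t y[16];
    for (i = 0u; i < n_blocks; i++) {
        hw_setup_stub(key);
        for (j = 0u; j < 16u; j++) { y[j] = x[j] ^ msg[i * 16u + j]; }
        hw_process_block_stub(y, x);
    }
}

/* After: setup runs once, outside the loop; only the block work remains inside. */
static void mac_blocks_hoisted(const uint8_t *key, const uint8_t *msg,
                               size_t n_blocks, uint8_t x[16])
{
    size_t i, j;
    uint8_t y[16];
    hw_setup_stub(key);
    for (i = 0u; i < n_blocks; i++) {
        for (j = 0u; j < 16u; j++) { y[j] = x[j] ^ msg[i * 16u + j]; }
        hw_process_block_stub(y, x);
    }
}
```

Both variants produce the same chaining value; the hoisted one simply performs the setup once instead of once per block.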

效果effect:

每块消除 Init(~15) + Configure(~20) + LoadKey(~60) = ~95 条指令。

实测时间:1.2 s → 0.75 s,提升约 1.6×。

Each block saves about 95 instructions: Init (~15) + Configure (~20) + LoadKey (~60).

Measured time: 1.2 s → 0.75 s, a speedup of about 1.6×.

要点key points:

这是最"显而易见"的优化,但很多基于硬件加速器的加密库默认就是逐块完整初始化的——原因通常是安全性考量(防止密钥残留)或 API 简洁性。在裸机可控环境下,循环外初始化是安全且收益显著的第一步。

This is the most obvious optimization, yet many crypto libraries built on hardware accelerators fully re-initialize for every block by default, usually for security reasons (avoiding key residue) or for API simplicity. In a controlled bare-metal environment, hoisting the initialization out of the loop is safe and is a high-payoff first step.

4.2     第 2 轮:bytes↔words 转换优化 bytes/words Conversion Optimization

问题分析 Problem analysis:

R_AES_HW_ProcessBlock() 内部的处理流程为:

The internal processing flow of R_AES_HW_ProcessBlock() is as follows:

bytes → bytes_to_words() → 写 IDAT 寄存器 (write IDAT register) → HW 加密 (HW encryption) → 读 ODAT 寄存器 (read ODAT register) → words_to_bytes() → bytes

其中 bytes_to_words() 和 words_to_bytes() 各约 45 条指令,合计 ~90 条指令——是为了将 16 字节数组与 ABCA0 的 32 位寄存器接口适配。

RH850 是小端(Little-Endian)架构。对于 4 字节对齐的 uint32 指针,直接内存读取的结果与手动拼接完全等价:

bytes_to_words() and words_to_bytes() each take about 45 instructions, roughly 90 in total, solely to adapt the 16-byte array to the 32-bit register interface of ABCA0.

RH850 is a Little Endian architecture. For a 4-byte aligned uint32 pointer, the result of direct memory reading is completely equivalent to manual concatenation:

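/* Manual byte assembly in C (the casts avoid shifting a promoted signed int) */

uint32 val = (uint32)bytes[0] | ((uint32)bytes[1]<<8) | ((uint32)bytes[2]<<16) | ((uint32)bytes[3]<<24);

/* Direct read on a little-endian platform (bytes must be 4-byte aligned) */

uint32 val = *(uint32 *)(bytes);    /* identical result */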
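The equivalence claimed above can be checked on any host with a small sketch. The function names are hypothetical, and `memcpy` is used for the native read instead of a pointer cast so the example stays well-defined regardless of alignment; the comparison only holds on a little-endian CPU, which the helper detects at run time.

```c
#include <stdint.h>
#include <string.h>

/* Assembles a 32-bit word from 4 bytes, least significant byte first. */
static uint32_t load_le32_manual(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Reads 4 bytes in the CPU's native byte order. */
static uint32_t load_native32(const uint8_t *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
    return v;
}

/* Returns 1 on a little-endian CPU, 0 otherwise. */
static int cpu_is_little_endian(void)
{
    const uint32_t one = 1u;
    uint8_t first;
    memcpy(&first, &one, 1u);
    return first == 1u;
}
```

On a little-endian target the two loads return the same value, which is why the conversion functions reduce to plain 32-bit loads and stores.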

改动modify:

将 bytes_to_words 和 words_to_bytes 用 RH850 汇编重写,充分利用 RH850 的 32 位 load/store 指令,消除逐字节移位拼接:

Rewrite bytes_to_words and words_to_bytes in RH850 assembly, making full use of the RH850 32-bit load/store instructions to eliminate the byte-by-byte shift-and-OR assembly:

; bytes_to_words, assembly version (illustrative)

; ld.w reads 32 bits directly, replacing the per-byte shift+or

ld.w    r6, 0[r10]      ; read 4 bytes directly

st.w    r6, 0[r11]

ld.w    r6, 4[r10]

st.w    r6, 4[r11]

; ...

同时在 aes_cmac.c 中调用改为:

At the same time, the call in aes_cmac.c is changed to:

R_AES_HW_ProcessBlock_Aligned((const uint32 *)temp, (uint32 *)prev);

效果effect:

每块消除 2 次转换函数 = ~90 条指令。

实测时间:0.75 s → 0.4 s,提升约 1.9×。

Each block saves 2 conversion calls, about 90 instructions.

Measured time: 0.75 s → 0.4 s, a speedup of about 1.9×.

第二轮选择用汇编重写转换函数,而非简单的 C 语言指针强转,原因在于工程需要保证:严格的数据对齐假设显式化,汇编版本明确控制了 load/store 的地址对齐;汇编版本可精确控制指令序列,避免编译器生成多余的 load/store 对;在后续优化中,这些汇编片段可以直接内联到更大的优化函数中。

Round 2 rewrites the conversion functions in assembly rather than using a plain C pointer cast, for three reasons. First, the strict data-alignment assumption becomes explicit, because the assembly version controls the address alignment of every load/store. Second, the assembly version fixes the exact instruction sequence, so the compiler cannot emit redundant load/store pairs. Third, these assembly fragments can be inlined directly into larger optimized functions in later rounds.

4.3     第 3 轮:CMAC 主循环下沉到驱动层 Move the CMAC main loop down into the driver layer

问题分析 Problem analysis:

即使使用了汇编优化的 ProcessBlock_Aligned,每块仍存在大量逐块冗余开销,这些开销的本质原因是 抽象层边界切割不当——CMAC 的逐块逻辑(XOR、分支判断、HW 交互)分散在 aes_cmac.c 和 icum_d_aes_hw.c 两个编译单元中,编译器无法跨文件内联和优化。

Even with the assembly-optimized ProcessBlock_Aligned, each block still carries substantial redundant overhead. The root cause is a poorly placed abstraction boundary: the per-block CMAC logic (XOR, branch decisions, HW interaction) is split across two compilation units, aes_cmac.c and icum_d_aes_hw.c, so the compiler cannot inline or optimize across files.

改动modify:

核心思路:将整个 CMAC 主循环(XOR → HW ECB → ODAT 读取 → last block 处理)合并为驱动层的单个函数,一次性接收全部参数。

Core idea: merge the entire CMAC main loop (XOR → HW ECB → ODAT read → last-block handling) into a single driver-layer function that receives all parameters at once.

aes_hw_error_t R_AES_HW_CMAC_ProcessBlocks(

    const uint8  *p_input,

    uint32        length,

    const uint8  *p_subkey1,

    const uint8  *p_subkey2,

    uint8        *p_mac_out)

 

(1)函数内部结构 Internal structure of function:

1. 参数检查(仅一次)

2. 计算 n_blocks, remainder, prefix_count

3. 预计算 CTL 值

   - ctrl_first = START | NEW_KEY(首块)

   - ctrl_rest  = 0x0000(后续块)

4. 首块单独处理(循环外),使用 ctrl_first

5. 后续块紧凑循环(无分支),使用 ctrl_rest

6. 最后一块:subkey XOR + padding 处理

1. Parameter check (once only)

2. Compute n_blocks, remainder, prefix_count

3. Precompute the CTL values

   - ctrl_first = START | NEW_KEY (first block)

   - ctrl_rest  = 0x0000 (subsequent blocks)

4. Process the first block separately (outside the loop) using ctrl_first

5. Tight loop over the subsequent blocks (branch-free) using ctrl_rest

6. Last block: subkey XOR + padding handling
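Step 2 of the list above can be sketched as a small planning function. The struct and function names are hypothetical, and the meaning given to `prefix_count` (the number of blocks handled before the final one) is an assumption, since the article does not define it.

```c
#include <stdint.h>

typedef struct {
    uint32_t n_blocks;      /* total 16-byte blocks, at least 1 */
    uint32_t remainder;     /* bytes in a partial final block, 0 if it is full */
    uint32_t prefix_count;  /* ASSUMED meaning: blocks processed before the last */
    int      use_subkey2;   /* 1 when the final block needs padding */
} cmac_plan_t;

/* Computes the block layout once, before any hardware interaction. */
static cmac_plan_t cmac_plan(uint32_t length)
{
    cmac_plan_t p;
    p.remainder = length % 16u;
    p.n_blocks = length / 16u + ((p.remainder != 0u) ? 1u : 0u);
    if (p.n_blocks == 0u) {
        p.n_blocks = 1u;    /* an empty message still yields one padded block */
    }
    p.use_subkey2 = (length == 0u) || (p.remainder != 0u);
    p.prefix_count = p.n_blocks - 1u;
    return p;
}
```

Deciding the subkey and the loop bound up front is what lets the middle loop run without a per-block last-block test.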

(2)每块热循环内的实际操作 Actual operations inside the per-block hot loop:

2.png

 

(3)aes_cmac.c 主循环替换为单次调用 Replace the aes_cmac.c main loop with a single call:

hw_ret = R_AES_HW_CMAC_ProcessBlocks(input, length, subkey_1, subkey_2, mac_value);

if (hw_ret == AES_HW_OK) {

    return;

}/* otherwise fall through to the pure-software fallback path */

效果effect:

每块消除 ~37 条指令。

实测时间:0.4 s → 0.2 s,提升约 2×。

Each block saves about 37 instructions.

Measured time: 0.4 s → 0.2 s, a speedup of about 2×.

设计考量 design considerations:

将 CMAC 逻辑下沉到驱动层违背了"驱动只做硬件抽象"的传统分层原则。但在嵌入式性能敏感场景中,这种权衡是合理的:CMAC 是该硬件加速器的主要使用场景,不是边缘用例;裸机环境下没有多进程竞争,不需要严格的抽象隔离;保留了R_AES_HW_ProcessBlock_Aligned() 作为通用接口,其他算法仍可使用。

Pushing the CMAC logic down into the driver layer violates the traditional layering principle that a driver should only abstract hardware. In performance-sensitive embedded scenarios, however, the trade-off is reasonable: CMAC is the main use case for this hardware accelerator, not an edge case; a bare-metal environment has no multi-process contention and does not need strict abstraction isolation; and R_AES_HW_ProcessBlock_Aligned() is retained as a general-purpose interface that other algorithms can still use.

4.4     第 4 轮:循环内 PSW 寄存器读取优化 Optimization of PSW register reading within the loop

问题分析 Problem analysis:

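DISABLE_INTERRUPT_WITH_CHECK 宏的展开逻辑 Expansion of the DISABLE_INTERRUPT_WITH_CHECK macro:

#define DISABLE_INTERRUPT_WITH_CHECK(saved) do { \
    saved = STSR(PSW);              /* read system register, ~3 instructions */ \
    if (0 == (saved & 0x20)) {      /* conditional branch, ~2 instructions */ \
        __DI();                     /* ~1 instruction */ \
    } \
} while(0)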

每块执行 2 次(写 IDAT 前 + 读 ODAT 前),仅 STSR(PSW) 就是 ~6 条指令。STSR 是特权系统寄存器读取,开销高于普通内存访问,循环执行期间,PSW.ID 位(中断禁止标志)不会被外部代码改变——裸机环境下没有其他线程会在我们的循环中间修改中断状态。

This runs twice per block (before writing IDAT and before reading ODAT), and STSR(PSW) alone accounts for about 6 instructions. STSR is a privileged system-register read that costs more than an ordinary memory access. While the loop runs, the PSW.ID bit (interrupt-disable flag) is not changed by outside code: in a bare-metal environment no other thread modifies the interrupt state in the middle of the loop.

改动modify:

3.png
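The change shown in the figure is not reproduced in this text, so the following is only a host-side sketch of the idea. The PSW, DI and EI operations are simulated with plain variables and hypothetical helper names (`sim_read_psw`, `sim_di`, `sim_ei`); on the target they would map to the STSR, DI and EI instructions. The sketch compares reading PSW before every register access with reading it once before the loop.

```c
#include <stdint.h>

#define PSW_ID 0x20u  /* interrupt-disable flag, the bit tested by the macro above */

/* Simulated CPU state (assumption): stands in for the real system register. */
static uint32_t sim_psw;
static unsigned sim_psw_reads;

static uint32_t sim_read_psw(void) { sim_psw_reads++; return sim_psw; }
static void sim_di(void) { sim_psw |= PSW_ID; }
static void sim_ei(void) { sim_psw &= ~PSW_ID; }

/* Before: PSW is read ahead of both register accesses of every block. */
static void run_blocks_checked(unsigned n_blocks)
{
    unsigned i, k;
    for (i = 0u; i < n_blocks; i++) {
        for (k = 0u; k < 2u; k++) {              /* IDAT write, then ODAT read */
            uint32_t saved = sim_read_psw();
            if ((saved & PSW_ID) == 0u) { sim_di(); }
            /* ... register access would go here ... */
            if ((saved & PSW_ID) == 0u) { sim_ei(); }
        }
    }
}

/* After: PSW is read once; the cached result drives DI/EI inside the loop. */
static void run_blocks_hoisted(unsigned n_blocks)
{
    const int was_enabled = ((sim_read_psw() & PSW_ID) == 0u);
    unsigned i, k;
    for (i = 0u; i < n_blocks; i++) {
        for (k = 0u; k < 2u; k++) {
            if (was_enabled) { sim_di(); }
            /* ... register access would go here ... */
            if (was_enabled) { sim_ei(); }
        }
    }
}
```

The hoisted variant relies on the article's premise that nothing else changes PSW.ID while the loop runs; it would not be safe where an interrupt handler or another context can alter that flag.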

效果effect:

每块消除 2 次 STSR(PSW) = ~8 条指令。

实测时间:0.2 s → 180 ms。

Each block saves 2 STSR(PSW) reads, about 8 instructions.

Measured time: 0.2 s → 180 ms.

4.5     第 5 轮:开启编译器优化(-Ospeed) Enable compiler optimization (-Ospeed)

问题分析 Problem analysis:

经过前四轮的手工优化,CPU 侧的"显性冗余"已被基本消除。此时进一步的手动代码重构收益递减。但回顾整个优化过程,有一个维度始终未被触及——编译器优化级别。工程中 CMAC 和 AES 硬件驱动相关的 .gpj 文件此前可能使用的是默认或较低的优化级别(如 -O0 或 -O1)。编译器在低优化级别下:不进行函数内联(即使在同一编译单元内);不消除冗余的 load/store;不进行循环展开或指令调度;不利用 RH850 的寻址模式优化。

After the first four rounds of manual optimization, the obvious redundancy on the CPU side had largely been eliminated, and further manual refactoring showed diminishing returns. Looking back over the whole process, however, one dimension had never been touched: the compiler optimization level. The .gpj files for the CMAC and AES hardware driver code may previously have used the default or a low optimization level (such as -O0 or -O1). At low optimization levels the compiler does not inline functions (even within the same compilation unit), does not eliminate redundant load/store operations, does not unroll loops or schedule instructions, and does not exploit the RH850 addressing modes.

改动modify:

对以下 .gpj 文件添加 -Ospeed 编译选项:

CMAC 相关的 .gpj(aes_cmac.c 所在工程文件)

AES 硬件驱动的 .gpj(icum_d_aes_hw.c 所在工程文件)

Add the -Ospeed compiler option to the following .gpj files:

The CMAC-related .gpj (the project file containing aes_cmac.c)

The AES hardware driver .gpj (the project file containing icum_d_aes_hw.c)

效果effect:

实测时间:180 ms → 90 ms,提升约 2×。

Measured time: 180 ms → 90 ms, a speedup of about 2×.

这是五轮优化中 单轮收益最大 的一步,说明前四轮手工优化后的代码中存在一些可被编译器消除的低效模式——特别是函数调用边界处的寄存器保存/恢复、冗余的条件分支、以及循环中的不变量计算。

This is the single most rewarding round of the five, which shows that the code left after four rounds of manual optimization still contained inefficient patterns the compiler could remove, in particular register save/restore at function-call boundaries, redundant conditional branches, and loop-invariant computations.





发布于 2026-04-21 18:12:03
RH850系列芯片ICUM与PE核间通信介绍 Communication Between ICUM And PE Core In RH850 Series Chips

Communication Between ICUM and PE Core In RH850 Series Chips_页面_01.jpg

简介 Introduction

在汽车电子系统的信息安全应用中,多核芯片的核间通信至关重要。瑞萨RH850系列MCU采用了主核(PE)与安全核(ICUM)的双核架构,两者如何高效、可靠地进行数据交换与任务同步?本文将以RH850 P1x-C芯片ICUM与PE核间中断建立为例,深入浅出地讲解其核心原理与配置步骤。

In automotive Cyber Security applications, inter-core communication in multi-core chips is crucial. The Renesas RH850 series MCUs feature a dual-core architecture with a main core (PE) and a security core (ICUM). How do they efficiently and reliably exchange data and synchronize tasks? This article uses the establishment of interrupts between ICUM and PE of the RH850 P1x-C as an example to explain the core principles and configuration steps in an easy-to-understand manner.


1  核间中断的两种跳转方式TWO METHODS FOR INTER-CORE INTERRUPT VECTORING

当PE核需要触发ICUM核的中断(反之亦然),CPU如何找到对应的处理程序?RH850提供了两种方式:直接向量法 (Direct Vector Method) 和表引用法 (Table Reference Method)。

When the PE core needs to trigger an interrupt on the ICUM core (or vice versa), how does the CPU find the corresponding handler? The RH850 provides two methods: Direct Vector Method and Table Reference Method.

直接向量法:中断处理程序的地址由基地址(Base Address)加上固定偏移量计算得出。基地址由 PSW.EBV 位决定(选择 RBASE 或 EBASE 寄存器),而偏移量则由 RINT 位控制(通常为 0x100)。这种方法简单直接,适用于中断源较少或优先级固定的场景。

Direct Vector Method: The address of the Interrupt Service Routine (ISR) is calculated as Base Address + Offset Address. The base address is selected by the PSW.EBV bit (choosing the RBASE or EBASE register), and the offset is controlled by the RINT bit (typically 0x100). This method is straightforward, suitable for scenarios with fewer interrupt sources or fixed priorities.

可以用以下公式来进行计算:

The following formula can be used for calculation:

Exception Handler Address = base address + offset address

base address:在 PSW.EBV 位中选择是使用 RBASE 寄存器还是 EBASE 寄存器作为基地址。当 PSW.EBV 位被设置为 1 时,使用 EBASE 寄存器的值作为基地址。当 PSW.EBV 位被清零为 0 时,使用 RBASE 寄存器的值作为基地址。

base address:In the PSW.EBV bit, choose whether to use the RBASE register or the EBASE register as the base address. When the PSW.EBV bit is set to 1, the value of the EBASE register is used as the base address. When the PSW.EBV bit is cleared to 0, the value of the RBASE register is used as the base address.

offset address:如果 RBASE.RINT 位或 EBASE.RINT 位被设置为 1,则所有用户中断都使用 100H 的偏移量进行处理。如果该位被清零为 0,则偏移地址随中断优先级变化,根据下表来确定。

offset address: If the RBASE.RINT bit or the EBASE.RINT bit is set to 1, all user interrupts are handled at an offset of 100H. If the bit is cleared to 0, the offset varies with the interrupt priority and is determined according to the following table.


1.png

对于直接向量方法,每个中断优先级都有一个用户中断异常处理程序,具有相同优先级的多个中断分支会指向同一中断处理程序,如果希望从一开始就为每个中断处理程序使用不同的代码区域,可以通过Table Reference Method解决。

For the direct vector method, each interrupt priority has a user interrupt exception handler. Multiple interrupt branches with the same priority will point to the same interrupt handler. If it is desired to use different code regions for each interrupt handler from the very beginning, the Table Reference Method can be used to solve this problem.

表引用法:中断处理程序的地址存储在一个中断向量表中。CPU通过 INTBP 寄存器(中断表基址指针)加上中断通道号乘以4(INTBP + channel × 4)来读取表中的地址,然后跳转执行。这种方式更为灵活,允许为每个中断通道指定独立的处理函数。

Table Reference Method: The ISR addresses are stored in an interrupt vector table. The CPU reads the address from the table using the INTBP register (Interrupt Table Base Pointer) plus the interrupt channel number multiplied by 4 (INTBP + channel × 4), then jumps to it. This method is more flexible, allowing independent handlers for each interrupt channel.

可以用以下公式来进行计算:

The following formula can be used for calculation:

Exception Handler Address = INTBP register value + EI level maskable interrupt channel number * 4
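The two address formulas above can be written as a pair of small C helpers. This is a sketch with hypothetical function names; the base addresses are assumed to be already-aligned address values rather than raw register contents, and in the table reference case the computed address is that of the table entry from which the CPU reads the handler address.

```c
#include <stdint.h>

/* Direct vector method: handler address = base address + offset address.
 * PSW.EBV = 1 selects EBASE, PSW.EBV = 0 selects RBASE. */
static uint32_t direct_vector_addr(int psw_ebv, uint32_t rbase,
                                   uint32_t ebase, uint32_t offset)
{
    return (psw_ebv ? ebase : rbase) + offset;
}

/* Table reference method: address of the 4-byte table entry for a channel. */
static uint32_t table_entry_addr(uint32_t intbp, uint32_t channel)
{
    return intbp + channel * 4u;
}
```

For example, channel 3 lands at offset 0x0C from INTBP, because each table entry is 4 bytes wide.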




2.png


2  中断注册INTERRUPT REGISTRATION

核间通信需要两条“通道”:一条从PE到ICUM,一条从ICUM到PE。

Inter-core communication requires two "channels": one from PE to ICUM, and one from ICUM to PE.

PE 到 ICUM:使用 EIINT0 中断。需要在ICUM端的异常处理文件(如 exception.850)中注册该中断,并将其与处理函数绑定。当中断触发时,ICUM会先判断中断源,然后调用预先注册的回调函数执行具体功能。

PE to ICUM: Uses the EIINT0 interrupt. You need to register this interrupt in the ICUM's exception handling file (e.g., exception.850) and bind it to a handler function. When the interrupt triggers, the ICUM first identifies the source and then calls the pre-registered callback function.


3.png


ICUM 到 PE:使用 EIC3 中断。在PE端的启动文件中注册该中断,并指定其处理函数。根据上文所述,使用表引用法时,其向量在表中的偏移量为 0x0C。

ICUM to PE: Uses the EIC3 interrupt. Register this interrupt in the startup file at the PE end and specify its handling function. According to the above description, when using the table reference method, the offset of the vector in the table is 0x0C.

4.png

3  中断使能INTERRUPT ENABLE

注册完成后,必须使能中断,通信才算真正准备就绪。使能过程分为几个关键步骤:

After registration, the interrupts must be enabled before communication is truly ready. Enabling involves several key steps:

1.    使能整体中断掩码:

PE到ICUM:设置 PE2ICUIE 寄存器中对应的位为 1。

ICUM到PE:设置 ICU2PEIE 寄存器中对应的位为 1。

2.    配置中断控制寄存器:为中断设置优先级(Priority)和确认中断向量选择方法(TBn位)。同时,必须将对应的掩码位(MKn)清零,以取消对该中断的屏蔽。

3.    清除CPU级中断掩码:操作 IMR(中断掩码寄存器),将对应中断的掩码位置 0。

4.    全局使能中断:最后,调用 ENABLE_INTERRUPT() 等类似函数,开放CPU的中断总开关。

1.      Set the inter-core interrupt enable bits:

From PE to ICUM: set the corresponding bit in the PE2ICUIE register to 1.

From ICUM to PE: set the corresponding bit in the ICU2PEIE register to 1.

2.      Configure the interrupt control (EIC) register: set the interrupt priority and confirm the interrupt vector selection method (TBn bit). The corresponding mask bit (MKn) must also be cleared to unmask the interrupt.

3.      Clear the CPU-level interrupt mask: in the IMR (Interrupt Mask Register), clear the mask bit of the corresponding interrupt to 0.

4.      Enable interrupts globally: finally, call ENABLE_INTERRUPT() or a similar function to open the CPU's master interrupt switch.
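Step 2 above can be sketched as a pure function that computes the new control-register value. The bit positions and the priority field width used here are assumptions chosen for illustration only and must be checked against the device hardware manual; the function name is hypothetical, and the function works on a plain value rather than touching real hardware.

```c
#include <stdint.h>

/* ASSUMED bit layout, for illustration only; confirm in the hardware manual. */
#define EIC_PRIO_MASK 0x000Fu       /* priority field */
#define EIC_TB_BIT    (1u << 6)     /* vector method: 1 = table reference */
#define EIC_MK_BIT    (1u << 7)     /* mask bit: 1 = interrupt masked */

/* Returns the EIC value with priority set, vector method chosen, and mask cleared. */
static uint16_t eic_configure(uint16_t eic, uint8_t priority, int use_table)
{
    uint16_t v = eic;
    v = (uint16_t)((v & ~EIC_PRIO_MASK) | (priority & EIC_PRIO_MASK));
    v = use_table ? (uint16_t)(v | EIC_TB_BIT) : (uint16_t)(v & ~EIC_TB_BIT);
    v = (uint16_t)(v & ~EIC_MK_BIT);    /* unmask the interrupt */
    return v;
}
```

Computing the value first and writing it once keeps the register update to a single store, which is convenient when the register sits behind a guarded peripheral bus.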


4  中断触发INTERRUPT TRIGGER

配置完成后,软件可以通过向特定寄存器写“1”来软件触发一个核间中断。

Once configured, software can trigger an inter-core interrupt by writing a "1" to a specific register.

ICUM 触发 PE 中断:向ICU2PEFS寄存器对应位置写1。

ICUM triggers PE interrupt: write 1 to the corresponding bit of the ICU2PEFS register.

PE 触发 ICUM 中断:向PE2ICUFS寄存器对应位置写1。

PE triggers ICUM interrupt: write 1 to the corresponding bit of the PE2ICUFS register.


5.png

5  权限控制ACCESS CONTROL

在尝试读写上述控制寄存器时,如果发现无法写入,通常是PBG(外设总线保护)权限未开启。PBG是一种硬件保护机制,防止未授权的访问破坏关键配置。

If you find you cannot write to the control registers mentioned above, it is often because the PBG (Peripheral Bus Guard) permissions are not enabled. PBG is a hardware protection mechanism that prevents unauthorized access from corrupting critical configurations.

例如,ICU_CMDREG 属于 PBG5 保护组。要开启PE核对其的访问权限,需要配置相应的保护寄存器(如 FSGD5APROT02 和 FSGD5APROT03),将对应 PEID 的权限位置 1。

For example, ICU_CMDREG belongs to the PBG5 protection group. To grant the PE core access, you need to configure the corresponding guard registers (e.g., FSGD5APROT02 and FSGD5APROT03) and set the permission bit for the corresponding PEID to 1.


6.png

通过以上五个步骤,开发者就能在RH850 P1x-C的双核之间建立起一条稳定、安全的“通信桥梁”。这种核间中断机制是实现安全启动、安全诊断、安全升级等复杂功能的基础,确保了主核与安全核可以协同工作,共同守护车辆的信息安全。

Through the five steps outlined above, developers can establish a stable and secure "communication bridge" between the dual cores of the RH850 P1x-C. This inter-core interrupt mechanism is the foundation for implementing complex functions like Secure Boot, Secure Diagnostics, and Secure Update, ensuring the main core and the security core can work together to safeguard the vehicle's cybersecurity.

发布于 2026-03-16 16:56:28
知从木牛瑞萨RH850P1X-C安全调试过程介绍SecureDebug of ZC.MuNiu CyberSecurity on Renesas RH850 P1X-C

SecureDebug of ZC.MuNiu CyberSecurity  on Renesas RH850 P1X-C_页面_1.jpg

1  引言 Introduction

       

随着汽车电子电气架构向集中化、网联化、智能化演进,车载控制器的信息安全已从附加特性变为核心需求,是保障车辆功能安全、数据主权与用户隐私的基石。瑞萨电子(Renesas)推出的RH850/P1x-C系列微控制器(MCU)凭借其高性能多核架构与集成化的硬件安全模块(HSM/ICU-M),已成为汽车底盘、动力及先进车身控制器的主流平台。该平台不仅在运行时提供加密、认证与隔离保护,其调试接口与流程本身也嵌入了基于芯片级硬件安全机制(如OCDID认证)的严密设计,有效防止固件泄露、IP盗取及非授权代码注入。

With automotive E/E architectures evolving toward centralization, connectivity, and intelligence, the information security of in-vehicle controllers has shifted from an add-on feature to a core requirement and is the cornerstone of vehicle functional safety, data sovereignty, and user privacy. The Renesas RH850/P1x-C series microcontrollers (MCUs), with their high-performance multi-core architecture and integrated hardware security module (HSM/ICU-M), have become a mainstream platform for automotive chassis, powertrain, and advanced body controllers. The platform provides encryption, authentication, and isolation at runtime, and its debug interface and debug flow also build on chip-level hardware security mechanisms (such as OCDID authentication), effectively preventing firmware leakage, IP theft, and unauthorized code injection.


在量产与售后维护阶段,如何在确保系统整体安全性的前提下,对已部署安全保护机制的控制器进行授权调试,是工程实践中的关键难题。传统的调试接口管理方式往往与最终的安全策略存在矛盾。为解决此问题,知从提出并实现了一种基于服务化接口的安全调试方案。该方案通过在RH850/P1x-C芯片的主核与ICU-M核之间构建标准化的安全调试服务接口(如IcumIf_JtagPwd_Enable、IcumIf_JtagPwd_Disable),将调试接口的使能与关闭操作封装为可控、可审计的安全服务。其核心是运用芯片提供的片上调试ID硬件认证机制,通过向特定地址(如RH850 P1X-C芯片的0xFF280050与0xFF2800B0)写入认证密钥,动态管理调试接口的访问权限,从而在开发调试便利性与产品生命周期安全之间取得最佳平衡。

In mass production and after-sales maintenance, a key engineering challenge is how to perform authorized debugging on controllers that already have security protection deployed, without weakening overall system security. Traditional debug-interface management often conflicts with the final security policy. To address this, ZC proposed and implemented a secure debugging solution based on service-oriented interfaces. The solution builds a standardized secure-debug service interface (such as IcumIf_JtagPwd_Enable and IcumIf_JtagPwd_Disable) between the main core and the ICU-M core of the RH850/P1x-C, encapsulating the enabling and disabling of the debug interface as controllable, auditable security services. At its core it uses the chip's on-chip debug ID hardware authentication mechanism: by writing authentication keys to specific addresses (0xFF280050 and 0xFF2800B0 on the RH850 P1X-C), it dynamically manages access to the debug interface, striking the best balance between development convenience and product-lifecycle security.


2  安全调试功能介绍 Secure Debug Function Overview


RH850 P1X-C芯片的安全调试机制基于片上调试ID(On-Chip Debug ID, OCDID)硬件认证机制实现。该机制要求在调试器连接时,必须通过预先设置的OCDID进行身份验证,方可开启调试接口的访问权限。实现该功能的关键,在于准确掌握OCDID在芯片内存映射中的存储地址及其编程方法。

The secure debug mechanism of RH850 P1X-C chip is implemented based on the On Chip Debug ID (OCDID) hardware authentication mechanism. This mechanism requires authentication through pre-set OCDID when the debugger connects, in order to grant access to the debugging interface. The key to implementing this function lies in accurately mastering the storage address and programming method of OCDID in the chip memory mapping.

通过对RH850/P1x-C系列芯片技术手册的解析,确认P1X-C型号芯片的OCDID存储地址分为两段独立配置:

第一段配置地址:0xFF280050

第二段配置地址:0xFF2800B0

需向上述两段地址分别写入相应的128位认证密钥,方可完整配置OCDID。该配置通常需在芯片初始化阶段,通过特定的安全服务或底层驱动函数完成。

Through the analysis of the RH850/P1x-C series chip technical manual, it is confirmed that the OCDID storage address of the P1X-C model chip is divided into two independent configurations:

First section configuration address: 0xFF280050

Second section configuration address: 0xFF2800B0

To fully configure OCDID, corresponding 128 bit authentication keys need to be written to the two addresses mentioned above. This configuration usually needs to be completed during the chip initialization phase through specific security services or underlying driver functions.


1.png

       

在知从的设计方案中,安全调试功能的启用与禁用机制通过诊断通信协议进行远程管理。具体而言,设计了两个专用的诊断服务:

服务 0x31 0xFFBB:此服务用于启用调试接口。当通过CAN总线接收到此诊断请求报文时,系统将调用安全服务接口 IcumIf_JtagPwd_Enable()。该函数内部执行逻辑为,通过发送的20字节MasterPassWord,生成32字节SecureJtagPassWord,向芯片指定的OCDID配置地址(0xFF280050与 0xFF2800B0)写入生成的SecureJtagPassWord,从而激活基于OCDID的调试访问认证机制。此后,调试器(如iSYSTEM)连接时必须提供匹配的密钥方可建立调试会话。

服务 0x31 0xFFCC:此服务用于禁用调试接口。当接收到此诊断请求时,系统将调用安全服务接口 IcumIf_JtagPwd_Disable()。该函数执行的操作是通过发送的20字节MasterPassWord,同样通过核间通信将数据发送到ICUM核,在ICUM核生成32字节SecureJtagPassWord,并与芯片中设置的OCDID进行比较,如果比较成功则向上述OCDID地址写入默认值(如全F),以清除已配置的密钥,从而关闭调试接口的认证保护,或将其恢复至默认无效状态。

In the design scheme of ZC, the enabling and disabling mechanism of security debugging function is remotely managed through diagnostic communication protocol. Specifically, two dedicated diagnostic services have been designed:

Service 0x31 0xFFBB: This service enables the debug interface. When this diagnostic request is received over the CAN bus, the system calls the security service interface IcumIf_JtagPwd_Enable(). Internally, the PE core forwards the 20-byte MasterPassWord carried in the request to the ICUM core via inter-core communication. The ICUM core derives a 32-byte SecureJtagPassWord from it and writes the result to the chip's OCDID configuration addresses (0xFF280050 and 0xFF2800B0), activating OCDID-based debug access authentication. From then on, a debugger (such as iSYSTEM) must present the matching key when connecting in order to establish a debug session.

Service 0x31 0xFFCC: This service disables the debug interface. When this diagnostic request is received, the system calls the security service interface IcumIf_JtagPwd_Disable(). The function likewise forwards the 20-byte MasterPassWord to the ICUM core via inter-core communication. The ICUM core derives the 32-byte SecureJtagPassWord and compares it with the OCDID configured in the chip. If they match, the default value (such as all Fs) is written to the OCDID addresses above to clear the configured key, which turns off the authentication protection of the debug interface or restores it to its default, unconfigured state.
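The routing of the two diagnostic services can be sketched as a small dispatcher. Everything below is illustrative: the signatures of IcumIf_JtagPwd_Enable() and IcumIf_JtagPwd_Disable() are not given in the article, so hypothetical stubs (`jtag_enable_stub`, `jtag_disable_stub`) stand in for them, and the result codes are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define RID_JTAG_ENABLE   0xFFBBu   /* routine identifier from the text */
#define RID_JTAG_DISABLE  0xFFCCu   /* routine identifier from the text */
#define MASTER_PWD_LEN    20u       /* MasterPassWord length in bytes */

typedef enum {
    DIAG_OK = 0,
    DIAG_BAD_LENGTH,
    DIAG_UNKNOWN_RID,
    DIAG_REJECTED
} diag_result_t;

/* Hypothetical stand-ins for the ICUM-side services; 0 means success. */
static int g_last_service;          /* 1 = enable, 2 = disable */

static int jtag_enable_stub(const uint8_t *pwd, size_t len)
{
    (void)pwd; (void)len;
    g_last_service = 1;
    return 0;
}

static int jtag_disable_stub(const uint8_t *pwd, size_t len)
{
    (void)pwd; (void)len;
    g_last_service = 2;
    return 0;
}

/* Routes a 0x31 request to the matching service after basic validation. */
static diag_result_t handle_routine(uint16_t rid, const uint8_t *data, size_t len)
{
    int rc;
    if (rid != RID_JTAG_ENABLE && rid != RID_JTAG_DISABLE) {
        return DIAG_UNKNOWN_RID;
    }
    if (data == NULL || len != MASTER_PWD_LEN) {
        return DIAG_BAD_LENGTH;
    }
    rc = (rid == RID_JTAG_ENABLE) ? jtag_enable_stub(data, len)
                                  : jtag_disable_stub(data, len);
    return (rc == 0) ? DIAG_OK : DIAG_REJECTED;
}
```

Validating the identifier and the password length on the PE side before any inter-core call keeps malformed requests away from the ICUM core.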


2.png

       此设计将底层的芯片安全调试硬件控制(OCDID配置)封装为标准化的、可远程调用的诊断服务,实现了调试权限的按需、可控管理,显著提升了在量产部署和售后维护场景下进行授权诊断与调试的安全性和便利性。

This design encapsulates the underlying chip security debugging hardware control (OCDID configuration) into standardized, remotely callable diagnostic services, achieving on-demand and controllable management of debugging permissions, significantly improving the security and convenience of authorized diagnosis and debugging in mass production deployment and after-sales maintenance scenarios.

在通过诊断服务 0x31 0xFFBB启用安全调试功能后,调试器(如 iSYSTEM winIDEA)连接芯片时必须通过OCDID认证,即输入正确的32字节SecureJtagPassWord。该密码并非明文存储,而是由受保护的20字节MasterPassWord通过特定算法派生得出,以确保根密钥的安全性。

After secure debugging has been enabled through diagnostic service 0x31 0xFFBB, a debugger (such as iSYSTEM winIDEA) must pass OCDID authentication when connecting to the chip, that is, supply the correct 32-byte SecureJtagPassWord. This password is not stored in plain text; it is derived from the protected 20-byte MasterPassWord by a specific algorithm, which keeps the root key secure.

密钥派生与获取可以通过知从啸天工具来获取:输入20字节的MasterPassWord到PassWord,然后输入预设的Salt值,设置好Key Length,就可以得到派生好的SecureJtagPassWord。

Key derivation and acquisition can be obtained through the ZC XiaoTian tool: enter a 20 byte MasterPassWord into PassWord, then enter the preset Salt value, set the Key Length, and you can obtain the derived SecureJtagPassWord.


3.png

发布于 2026-03-10 15:32:15
知从木牛英飞凌TRAVEO CYT4BB SECUREDEBUG介绍SECUREDEBUG OF ZC.MUNIU ON INFINEON TRAVEO CYT4BB



发布于 2026-01-21 09:29:47
INFINEON TC2XX SECURE APPLICATION INTRODUCTION FROM ZC.MUNIU CYBERSECURITY

发布于 2025-07-23 19:59:14
Application of ZC.MuNiu CyberSecurity on Infineon Traveo CYT4BB

1      Introduction of CYT4BB and CyberSecurity Applications

The CYT4BB TRAVEO™ T2G 32-bit automotive MCU targets automotive systems such as high-end body control units. The CYT4BB features two Arm® Cortex®-M7 CPUs (for main processing) and one Arm Cortex-M0+ CPU (for peripheral and security processing). The CM0 core integrates a variety of hardware algorithm functions, including AES, CHACHA, CMAC, CRC, DES/TDES, SHA1/SHA2/SHA3, HMAC, TRNG/PRNG, and RSA, on which a range of information security applications can be built, such as SecureBoot, SecureUpdate, SecureDiagnostics, and other functions.

In addition, ZC Muniu CryptoLibrary integrates a variety of software algorithms and extended applications based on the original hardware algorithm function, such as: SM2/SM3/SM4, ECDSA 256R1, RSASSA-PKCS-v1_5(4096bits Key), RSASSA-PSS-(4096bits Key), SHE/ USERKEY Load, GetUid, KDF, DebugHandling, etc., to achieve secure storage.

image.png

1.1   Communication Feature


The IPC hardware contains register structures for IPC channels and IPC interrupts. The IPC channel registers implement the lock/release mechanisms and messaging. The IPC interrupt structure registers generate interrupts to each CPU for messaging events and lock/release events. The IPC channel structure's ACQUIRE register provides the lock feature, and IPC_STRUCTx_LOCK_STATUS indicates the lock status. The IPC_STRUCTx_NOTIFY register generates a notification event, and the IPC_STRUCTx_RELEASE register releases the IPC channel structure and generates a release event.

After the CM0 core releases the CM7 core through secure boot, the CM7 core initialises the IPC channel and interrupts, and calls Cy_Crypto_Enable to notify the CM0 core to initialise the IPC channel and related interrupts. When both initialisations are complete, other algorithmic tasks can be invoked for processing.

image.png

image.png

2      Secure Boot

Secure Boot is a fundamental function of the MCU, implemented through hardware encryption modules. This mechanism must operate independently of user programs and cannot be compromised. As the foundation of the entire secure boot trust chain, Secure Boot is mainly used to verify the integrity and authenticity of key programs defined by users in Flash memory after the MCU starts and before user programs execute, to determine if they have been tampered with. If the verification fails, it indicates that the MCU is in an untrusted state, and some functions or even the entire program cannot run.

image.png

On power-up, the CM0 core of the CYT4BB uses RSASSA-PKCS-v1_5 (2048-bit key): it reads the Public Key pre-stored in SFLASH and the Signature value in Code Flash, then, through the pre-programmed RSA Verify interface integrated in SFLASH, computes Boot_Hash and Calc_Hash and compares them to establish the reliability of the secure boot root of trust. The CM7 core is then started to verify the rest of the secure boot trust chain. The root-of-trust code verifies the first-stage bootloader; once that verification succeeds, the bootloader is allowed to execute and in turn verifies the software in the subsequent boot stages.
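The final comparison of Boot_Hash and Calc_Hash can be illustrated with a short helper. The article does not say how the comparison is implemented; the sketch below uses a constant-time comparison, which is a common choice for security checks and is this example's assumption, and `digest_equal` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when both digests are equal, 0 otherwise.
 * Every byte is always examined, so timing does not reveal where they differ. */
static int digest_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0u;
    size_t i;
    for (i = 0u; i < len; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff == 0u;
}
```

A boot stage would continue only when the function returns 1 for the stored and the recomputed digest.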

image.png

3      SECURE DIAGNOSTICS

Secure Diagnostic is an important means of protecting the internal data security of ECUs (Electronic Control Units). It is primarily used for diagnostic services that require identity verification, such as when programs or data are downloaded to or uploaded from a server and when specific memory locations are read from the server. Improper program uploads or downloads could damage electronic devices or other vehicle components, or violate vehicle emission or safety standards. Likewise, retrieving data from the server could compromise data security. The upper computer must therefore prove its identity before these services are executed, and access to data and diagnostic services is granted only after its identity has been confirmed as legitimate.


CYT4BB uses hardware algorithms such as TRNG/PRNG, AES128/256, AES-CMAC, and HMAC to confirm the identity of the client and decide whether the client is allowed access. The keys used by AES, CMAC, HMAC, and similar algorithms can be stored in advance through the LoadKey interface and are then used on the CM0 side by KeyID, which ensures the reliability of the authentication information.
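A challenge-response exchange of this kind can be sketched as follows. This is a host-side Python model, not the CYT4BB API: the `KeyStore` and `Ecu` classes and their methods are hypothetical, `secrets.token_bytes` stands in for the TRNG, and HMAC-SHA256 is chosen from the listed algorithms only because it is available in the Python standard library.

```python
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical stand-in for CM0-side key slots addressed by KeyID."""

    def __init__(self):
        self._slots = {}

    def load_key(self, key_id: int, key: bytes) -> None:
        # Models storing a key in advance; the key is never read back out.
        self._slots[key_id] = bytes(key)

    def hmac_sha256(self, key_id: int, data: bytes) -> bytes:
        # Callers pass only the KeyID; key material stays inside the store.
        return hmac.new(self._slots[key_id], data, hashlib.sha256).digest()


class Ecu:
    """Server side: issues a random seed and checks the client's response."""

    def __init__(self, store: KeyStore, key_id: int):
        self._store = store
        self._key_id = key_id
        self._seed = None
        self.unlocked = False

    def request_seed(self) -> bytes:
        self._seed = secrets.token_bytes(16)   # stands in for the TRNG
        return self._seed

    def send_key(self, response: bytes) -> bool:
        if self._seed is None:
            return False                       # no outstanding challenge
        expected = self._store.hmac_sha256(self._key_id, self._seed)
        self._seed = None                      # each seed is valid for one attempt
        self.unlocked = hmac.compare_digest(expected, response)
        return self.unlocked
```

The client computes the same HMAC over the received seed with its copy of the key. Because each seed is discarded after one attempt, a recorded response cannot be replayed.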

4      SECURE UPDATE

As network environments become increasingly complex, it is becoming more and more important during a software update to ensure that the update package comes from a valid source, has not been tampered with, loses no data, and cannot be maliciously obtained. In traditional update processes, the package data is essentially transmitted in plaintext, and data verification relies on the less secure CRC algorithm.
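The weakness of a CRC-only check can be illustrated with a short Python sketch. Note the substitution: a keyed HMAC-SHA256 tag stands in for a digital signature here, because the Python standard library has no signature scheme. The point illustrated is the same in both cases (without the key, an attacker cannot produce a valid tag), but this is not the update format of any real product, and all function names are hypothetical.

```python
import hashlib
import hmac
import zlib


def crc_package(payload: bytes) -> bytes:
    """Append a CRC-32 to the payload, as in a traditional update package."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_check(package: bytes) -> bool:
    payload, crc = package[:-4], package[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def mac_package(key: bytes, payload: bytes) -> bytes:
    """Append a keyed tag; only a holder of the key can compute it."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def mac_check(key: bytes, package: bytes) -> bool:
    payload, tag = package[:-32], package[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# A CRC needs no secret, so an attacker simply recomputes it after tampering.
forged_crc = crc_package(b"malicious firmware")
print(crc_check(forged_crc))

# Without the key the attacker can only reuse an old tag, which no longer matches.
key = b"\x42" * 32
genuine = mac_package(key, b"firmware v2")
forged_mac = b"malicious firmware" + genuine[-32:]
print(mac_check(key, genuine), mac_check(key, forged_mac))
```

A CRC detects accidental corruption only. Authenticity requires a check that depends on a secret or private key.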

CYT4BB can use the RSASSA-PKCS-v1_5 (2048/3072-bit key) algorithm built into its CM0 core. It can also use Ed25519, ECDSA 256R1, RSASSA-PKCS-v1_5 (4096-bit key), RSASSA-PSS (4096-bit key), and other algorithms from the CryptoLibrary integrated by ZC.MuNiu, which meets users' needs for multiple algorithms. The PublicKey and PrivateKey used by these asymmetric algorithms can likewise be pre-stored at a specified location in SFLASH through the LoadUserKey interface and invoked by KeyID, which ensures the authenticity and integrity of the secure update process.

In addition, CYT4BB supports certificate storage. The SROM-related interfaces of the CM0 core can be called to store the required certificates at the corresponding location in Work Flash, and a certificate-based secure update mechanism can then be implemented by calling the X.509 certificate parsing interface and configuring the corresponding algorithms.



5      SECURE STORAGE

Secure Storage protects the contents of data areas from being stolen, preventing stored keys, certificates, and similar content from being copied through forced access to the data storage area via the controller. Mainstream chips today can protect data by configuring storage areas such as Flash, NVM, and RAM, and enabling this feature effectively prevents the situations described above.

CYT4BB provides the foundation for secure storage by designating an internal SFLASH area for storing keys and user information. On this basis, ZC.MuNiu implements secure storage of SHE keys in accordance with the AUTOSAR_TR_SecureHardwareExtensions specification and encrypts the keys deposited in SFLASH, which ensures the reliability of the secure boot, secure diagnostics, and secure update functions.



Published on 2025-04-16 10:29:29
Introduction to Cybersecurity Applications of ZC.MuNiu on Infineon TRAVEO CYT4BB


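1      CYT4BB Overview and Cybersecurity Applications

The CYT4BB TRAVEO™ T2G is a 32-bit automotive MCU aimed at automotive systems such as high-end body control units. It has two Arm® Cortex®-M7 CPUs (for main processing) and one Arm Cortex-M0+ CPU (for peripheral and security processing). The CM0 core integrates a range of hardware algorithm functions, including AES, CHACHA, CMAC, CRC, DES/TDES, SHA1/SHA2/SHA3, HMAC, TRNG/PRNG, and RSA. A variety of cybersecurity applications can be built on these, such as secure boot, secure update, and secure diagnostics.

In addition, on top of the existing hardware algorithm functions, the ZC.MuNiu cybersecurity library integrates a number of software algorithms and extended applications, such as SM2/SM3/SM4, ECDSA 256R1, RSASSA-PKCS-v1_5 (4096-bit key), RSASSA-PSS (4096-bit key), SHE/USERKEY Load, GetUid, KDF, and DebugHandling, and implements the secure storage function.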



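1.1   Communication Feature

The IPC hardware contains register structures for IPC channels and IPC interrupts. The channel registers implement the Lock/Release mechanism and message passing. The IPC interrupt structure registers generate interrupts to each CPU for message-passing events and Lock/Release events. Within an IPC channel structure, the ACQUIRE register provides the Lock feature, and IPC_STRUCTx_LOCK_STATUS indicates the Lock state. The IPC_STRUCTx_NOTIFY register generates a notify event, and the IPC_STRUCTx_RELEASE register releases the IPC channel structure and generates a release event.

After the CM0 core releases the CM7 core through secure boot, the CM7 core initializes its IPC channel and interrupts and calls Cy_Crypto_Enable to notify the CM0 core to initialize its IPC channel and related interrupts. Other algorithm tasks can be invoked only after initialization has completed on both sides.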
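The Lock, Notify, and Release sequence can be modelled with a small single-threaded Python class. This is a conceptual sketch only: real IPC channels are memory-mapped registers with interrupt lines, and the class, method, and handler names below are hypothetical.

```python
class IpcChannel:
    """Simplified model of one IPC channel structure."""

    def __init__(self):
        self.locked = False      # models the state shown by IPC_STRUCTx_LOCK_STATUS
        self.data = None         # models the channel's message payload
        self.on_notify = None    # handler on the receiving CPU (notify event)
        self.on_release = None   # handler on the sending CPU (release event)

    def acquire(self) -> bool:
        """Models the ACQUIRE register: succeeds only if the channel is free."""
        if self.locked:
            return False
        self.locked = True
        return True

    def notify(self, message) -> None:
        """Models IPC_STRUCTx_NOTIFY: raise a notify event to the receiver."""
        if not self.locked:
            raise RuntimeError("channel must be acquired before notify")
        self.data = message
        if self.on_notify:
            self.on_notify(self)

    def release(self) -> None:
        """Models IPC_STRUCTx_RELEASE: free the channel, raise a release event."""
        self.locked = False
        if self.on_release:
            self.on_release(self)


def send_request(channel: IpcChannel, message) -> bool:
    """Sender side: lock the channel, then pass the message to the receiver."""
    if not channel.acquire():
        return False             # channel busy; the caller may retry later
    channel.notify(message)
    return True
```

In this model the receiving side (for example the CM0 crypto handler) reads `channel.data` in its notify handler, writes back a result, and calls `release()`, which lets the sender's release handler collect the result.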



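2      Secure Boot

Secure Boot is a fundamental MCU function implemented with hardware cryptographic modules. The mechanism must run independently of user programs and must not be compromised. As the foundation of the entire secure boot chain of trust, Secure Boot verifies, after the MCU starts and before user programs execute, the integrity and authenticity of the user-defined key programs in Flash to determine whether they have been tampered with. If verification fails, the MCU is in an untrusted state, and some functions or even the entire program must not run.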



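CYT4BB uses RSASSA-PKCS-v1_5 with a 2048-bit key. When the CM0 core powers on and starts, it reads the Public Key pre-stored in SFLASH and the Signature value in Code Flash, then uses the preset RSA Verify interface integrated in SFLASH to compute Boot_Hash and Calc_Hash. Comparing the two determines whether the secure boot root of trust is reliable. The CM7 core is then started to verify the secure boot chain of trust. The root-of-trust code verifies the first-stage bootloader, and on success the verified software executes and continues to verify the software of the subsequent boot stages.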



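3      SECURE DIAGNOSTICS

Secure Diagnostics is an important means of protecting the data inside an ECU. It applies to diagnostic services that require identity verification: downloading or uploading programs or data to the server, and reading specific memory locations from the server. An improper program or data download to the server could damage electronic devices or other vehicle components, or violate vehicle emission or safety standards. Conversely, retrieving data from the server could compromise data security. The host computer must therefore prove its identity before these services are executed, and it is allowed to access data and diagnostic services only after its identity has been confirmed as legitimate.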



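CYT4BB can use hardware algorithms such as TRNG/PRNG, AES128/256, AES-CMAC, and HMAC to confirm the identity of the client and decide whether the client is allowed access. The keys used by AES, CMAC, HMAC, and similar algorithms can all be stored in advance through the LoadKey interface and are then used on the CM0 side by KeyID, which ensures the reliability of the authentication information.

4      SECURE UPDATE

As network environments become increasingly complex, it is becoming more and more important during a software update to ensure that the update package comes from a valid source, has not been tampered with, loses no data, and cannot be maliciously obtained. In traditional update processes, the package data is essentially transmitted in plaintext, and data verification relies on the less secure CRC algorithm.

CYT4BB can use the RSASSA-PKCS-v1_5 (2048/3072-bit key) algorithm built into its CM0 core, or it can use Ed25519, ECDSA 256R1, RSASSA-PKCS-v1_5 (4096-bit key), RSASSA-PSS (4096-bit key), and other algorithms from the CryptoLibrary integrated by ZC.MuNiu, which meets users' needs for multiple algorithms. The PublicKey and PrivateKey used by these asymmetric algorithms can likewise be pre-stored at a specified location in SFLASH through the LoadUserKey interface and invoked by KeyID, which ensures the authenticity and integrity of the secure update process.

In addition, CYT4BB supports certificate storage. The SROM-related interfaces of the CM0 core can be called to store the required certificates at the corresponding location in Work Flash, and a certificate-based secure update mechanism can then be implemented by calling the X.509 certificate parsing interface and configuring the corresponding algorithms.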



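5      SECURE STORAGE

Secure Storage protects the contents of data areas from being stolen, preventing stored keys, certificates, and similar content from being copied through forced access to the data storage area via the controller. Mainstream chips today can protect data by configuring storage areas such as Flash, NVM, and RAM, and enabling this feature effectively prevents the situations described above.

CYT4BB provides the foundation for secure storage by designating an internal SFLASH area for storing keys and user information. On this basis, ZC.MuNiu implements secure storage of SHE keys in accordance with the AUTOSAR_TR_SecureHardwareExtensions specification and encrypts the keys deposited in SFLASH, which ensures the reliability of the secure boot, secure diagnostics, and secure update functions.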



Published on 2025-04-16 10:28:05
What cybersecurity products does ZC Technology (知从科技) offer?

Which chips do the cybersecurity products support?

Published on 2023-07-07 09:43:09