Key Point 1: What are the Three Layers?
A BMS is organized into three layers: the module-level BMU, the cluster-level BCU, and the system-level BAU, which reports upward to the PCS and EMS. Information does not flow between these layers at random. The lower the layer, the more detailed, faster, and more abundant the data; the higher the layer, the more refined, slower, and more aggregated it becomes. This “layered aggregation” design is the fundamental logic of the entire BMS architecture, not an arbitrary choice.
Key Point 2: Why Layering Matters
For example, a single energy storage container may hold more than 10,000 cell-level data points (voltage, temperature, internal resistance, and so on). If all of this raw data were pushed to the EMS, its processor could not keep up and the communication bus would be congested almost immediately. The essence of layering is to compress data in the right place: the BMU handles acquisition, the BCU handles filtering, and the BAU handles decision-making and reporting.
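To make “compression in the right place” concrete, here is a minimal Python sketch of the kind of reduction a BCU might perform: a list of cell voltages goes in, a six-field summary comes out. The class and field names, and the choice of statistics, are illustrative assumptions rather than fields from any specific BMS protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClusterVoltageSummary:
    """Hypothetical cluster-level record sent upward instead of raw cell data."""
    max_mv: int
    max_cell: int   # index of the highest cell
    min_mv: int
    min_cell: int   # index of the lowest cell
    avg_mv: float
    spread_mv: int  # max minus min, a typical input to balancing decisions


def summarize_cells(cell_mv: list[int]) -> ClusterVoltageSummary:
    """Collapse per-cell voltages (mV) into a handful of cluster-level fields."""
    if not cell_mv:
        raise ValueError("no cell voltages to summarize")
    max_cell = max(range(len(cell_mv)), key=cell_mv.__getitem__)
    min_cell = min(range(len(cell_mv)), key=cell_mv.__getitem__)
    return ClusterVoltageSummary(
        max_mv=cell_mv[max_cell],
        max_cell=max_cell,
        min_mv=cell_mv[min_cell],
        min_cell=min_cell,
        avg_mv=mean(cell_mv),
        spread_mv=cell_mv[max_cell] - cell_mv[min_cell],
    )
```

Four cells in, one record out: `summarize_cells([3301, 3298, 3310, 3295])` reports the highest cell at index 2 (3310 mV), the lowest at index 3 (3295 mV), and a 15 mV spread. The same reduction applied to hundreds of cells still yields six fields.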
Engineers who miss this logic often make a classic mistake during integration and debugging: they require the BAU to pass all low-level data straight through to the EMS. The result is longer communication cycles, slower alarm responses and, in the worst case, a 300-millisecond delay in the thermal runaway protection action, a margin that can be fatal at the safety boundary.
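A rough back-of-the-envelope calculation shows why pass-through stretches the cycle. The only hard fact used below is that Modbus function code 0x03 (Read Holding Registers) returns at most 125 registers per request; the one-register-per-point layout and the 15 ms round trip per request are hypothetical assumptions chosen for illustration.

```python
import math

# Modbus function code 0x03 (Read Holding Registers) returns at most 125 registers per request.
MAX_REGS_PER_READ = 125


def full_scan_time_s(points: int, regs_per_point: int = 1, round_trip_s: float = 0.015) -> float:
    """Time for the EMS to poll every point once, issuing requests back to back.

    Hypothetical assumptions: each point occupies regs_per_point registers,
    each request costs round_trip_s, and requests are not pipelined.
    """
    requests = math.ceil(points * regs_per_point / MAX_REGS_PER_READ)
    return requests * round_trip_s


print(full_scan_time_s(10_000))  # raw pass-through: 80 requests
print(full_scan_time_s(80))      # aggregated BAU summary: 1 request
```

Under these assumptions a full pass-through scan takes about 1.2 s, more than 100 times the 10 ms acquisition cycle, while an 80-point summary fits in a single request.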
Key Point 3: What’s Inside the Container?
BMU↔BCU: CAN bus (Controller Area Network, a highly reliable industrial communication protocol). The wiring harness is soldered directly to the battery modules. The communication cycle is on the order of 10 ms, carrying individual cell voltages (1 mV accuracy), temperatures (0.1 ℃ accuracy), and equalization (cell balancing) status. When you open the container, the densely packed thin orange or black wires you see are the CAN wiring harness.
BCU↔BAU: Also CAN, but with a slower cycle of 100 to 500 ms. This link carries cluster-level SOC (State of Charge), SOH (State of Health), and the maximum charge/discharge current limit (CLIM).
BAU↔PCS/EMS: Switches to Modbus TCP or IEC 61850 (a power-system communication standard) over Ethernet, with a cycle of 500 ms to 1 s. This link carries system-level alarms, total power limits, and remote control commands.
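The cycle ranges above can be combined into a crude latency budget. This sketch encodes the three links as a small table and sums the cycles, under the simplifying assumption that each hop adds at most one full cycle of waiting while processing and wire time are ignored. The `Link` type and `worst_case_wait_s` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    protocol: str
    min_cycle_s: float
    max_cycle_s: float


# Cycle ranges as listed in the text above.
LINKS = [
    Link("BMU-BCU", "CAN", 0.010, 0.010),
    Link("BCU-BAU", "CAN", 0.100, 0.500),
    Link("BAU-PCS/EMS", "Modbus TCP / IEC 61850", 0.500, 1.000),
]


def worst_case_wait_s(links: list[Link], fastest: bool = False) -> float:
    """Upper bound on how long a cell-level change may wait before reaching the EMS.

    Assumes each hop can add up to one full cycle; fastest=True uses the
    shortest configured cycle of each range instead of the longest.
    """
    return sum(l.min_cycle_s if fastest else l.max_cycle_s for l in links)


print(worst_case_wait_s(LINKS))                # slowest configuration
print(worst_case_wait_s(LINKS, fastest=True))  # fastest configuration
```

With the slowest cycles the bound is about 1.51 s, and even with the fastest it is about 0.61 s, which illustrates why data that must drive a fast reaction is costly to route all the way up before anything acts on it.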
Why CAN at the lower layers and Modbus/Ethernet at the top? CAN offers strong real-time performance and noise immunity, but the number of nodes per bus is limited (usually ≤110). Modbus over Ethernet offers long transmission distances and universal interfaces, which suits integration across different devices. Choosing the wrong protocol either leaves real-time performance short or doubles the cabling cost.
[Application Scenarios]
▶ Scenario 1: Writing BMS Tender Documents
When writing BMS tender documents, procurement staff often specify only “must support CAN/Modbus communication.” Anyone unfamiliar with the three-layer architecture may think that is enough. Without the communication cycle, data point mapping table, and message format version for each layer, however, semantic-layer conflicts are almost certain when BMS units from different vendors interface with the EMS: the data is “readable,” but the values do not mean the same thing on both sides. With the architecture in mind, the specification can state explicit requirements: the BAU reports at least 80 data points, the SOC reporting cycle is ≤1 s, and the alarm response delay is ≤200 ms.
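Once requirements are numeric, they can be checked mechanically against each bid. The sketch below uses the three example thresholds from the paragraph above; the `VendorDeclaration` structure and `check_tender_spec` function are hypothetical, and a real evaluation would also cover the per-layer point mapping table and message format versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorDeclaration:
    """Hypothetical summary of what a vendor's bid claims for the BAU interface."""
    bau_point_count: int
    soc_cycle_ms: int
    alarm_delay_ms: int


# Example thresholds from the tender requirements described above.
MIN_BAU_POINTS = 80
MAX_SOC_CYCLE_MS = 1000
MAX_ALARM_DELAY_MS = 200


def check_tender_spec(decl: VendorDeclaration) -> list[str]:
    """Return one message per unmet requirement; an empty list means compliant."""
    problems = []
    if decl.bau_point_count < MIN_BAU_POINTS:
        problems.append(f"BAU reports {decl.bau_point_count} points; at least {MIN_BAU_POINTS} required")
    if decl.soc_cycle_ms > MAX_SOC_CYCLE_MS:
        problems.append(f"SOC cycle is {decl.soc_cycle_ms} ms; at most {MAX_SOC_CYCLE_MS} ms allowed")
    if decl.alarm_delay_ms > MAX_ALARM_DELAY_MS:
        problems.append(f"alarm delay is {decl.alarm_delay_ms} ms; at most {MAX_ALARM_DELAY_MS} ms allowed")
    return problems


print(check_tender_spec(VendorDeclaration(bau_point_count=96, soc_cycle_ms=500, alarm_delay_ms=150)))
```

Keeping durations as integer milliseconds avoids floating-point edge cases when a declared value sits exactly on a threshold.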
▶ Scenario 2: System Integration and Troubleshooting
During on-site integration, engineers find that the SOC displayed by the EMS disagrees with the charge level seen on the PCS side. Someone unfamiliar with the three-layer architecture might start inspecting battery cells and waste considerable time. The problem often lies in accumulation error at the BCU↔BAU layer: the BCU reports capacity increments every 500 ms, and if the BAU aggregates them without synchronized timestamps, the integral drifts. Knowing this, you can go straight to the BAU’s clock synchronization configuration and locate the problem within about 10 minutes.
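The drift mechanism can be reproduced with a toy model. As a simplification of the capacity increments described above, this sketch has the BCU report timestamped current samples and lets the BAU do the integration; the deliberately exaggerated 1% BAU clock error, and all names, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BcuSample:
    bcu_time_s: float  # time stamped by the BCU when the current was measured
    current_a: float   # cluster current, positive = charging


def integrate_ah(samples: list[BcuSample], bau_clock_skew: float = 0.0,
                 trust_bcu_timestamps: bool = True) -> float:
    """Accumulate charge (Ah) from periodic BCU current reports.

    If trust_bcu_timestamps is False, the interval between reports is measured
    with the BAU's own clock, modelled as the true interval * (1 + bau_clock_skew).
    """
    total_ah = 0.0
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.bcu_time_s - prev.bcu_time_s
        if not trust_bcu_timestamps:
            dt *= 1.0 + bau_clock_skew
        total_ah += prev.current_a * dt / 3600.0  # rectangle rule, A*s -> Ah
    return total_ah


# One hour of 100 A charging reported every 500 ms: 7201 samples, 7200 intervals.
samples = [BcuSample(i * 0.5, 100.0) for i in range(7201)]
print(integrate_ah(samples))  # uses BCU timestamps
print(integrate_ah(samples, bau_clock_skew=0.01, trust_bcu_timestamps=False))
```

With BCU timestamps the result is about 100 Ah; with the skewed BAU clock it is about 101 Ah, and the gap keeps growing the longer the system runs, which is why the clock configuration is the first place to look.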
▶ Scenario 3: After-sales SOH Degradation Anomaly Judgment
Maintenance staff find that one battery cluster’s SOH is degrading 15% faster than the other clusters. Someone unfamiliar with the architecture might suspect a cell quality problem and request replacement right away. In reality, the cause could be a temperature sensor mounted at the wrong point in the BMU, which biases the balancing strategy over the long term. With the three layers in mind, you can pull the raw temperature distribution from the BMU layer and compare it with the historical balancing records to determine whether the fault lies in the acquisition layer or in the cells themselves, avoiding the cost of replacing cells unnecessarily.
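A first pass over the raw BMU temperature data can flag sensors that read consistently above or below their peers, which points toward the acquisition layer rather than the cells. This sketch compares each sensor with the median of all sensors at every instant; the function name, the 2 ℃ threshold, and the input layout are hypothetical, and the flagged sensors would still need to be cross-checked against the balancing records.

```python
from statistics import mean, median


def persistent_offsets(readings: dict[str, list[float]], threshold_c: float = 2.0) -> dict[str, float]:
    """Flag sensors whose readings sit consistently above or below their peers.

    readings maps a sensor ID to a time-aligned list of temperatures (deg C).
    Each sample is compared with the median of all sensors at that instant;
    sensors whose mean deviation reaches threshold_c are returned with that mean.
    """
    ids = list(readings)
    n = min(len(v) for v in readings.values())
    deviations: dict[str, list[float]] = {sid: [] for sid in ids}
    for t in range(n):
        ref = median(readings[sid][t] for sid in ids)
        for sid in ids:
            deviations[sid].append(readings[sid][t] - ref)
    offsets = {sid: mean(d) for sid, d in deviations.items()}
    return {sid: off for sid, off in offsets.items() if abs(off) >= threshold_c}


history = {
    "T1": [25.0, 26.0, 27.0],
    "T2": [25.2, 26.1, 27.0],
    "T3": [28.0, 29.0, 30.0],
}
print(persistent_offsets(history))
```

Here only `T3` is flagged, with a mean offset of about +2.9 ℃. Using the median as the reference keeps a single faulty sensor from dragging the baseline toward itself.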
[Real Case]
Background: A 1 MWh project had been installed and was in the commissioning phase, with grid connection scheduled within 3 days. The contract specified a penalty of $10,000 per day of delay.
Process: The engineer received the BMS vendor’s communication protocol document, which listed the BAU’s SOC at register address 0x0001 with a value range of 0-1000 (representing 0%-100%). The EMS’s default parsing rule maps 0-100 directly to 0%-100%. The engineer did not check the data scaling factor and connected using the EMS’s default configuration. After power-on, the EMS displayed an SOC of 8% while the system was actually fully charged. Assuming a BMS calibration issue, he asked the BMS vendor to recalibrate for two consecutive days, but the vendor could not pin down the problem. On the third day, a more experienced engineer stepped in and found within five minutes that the EMS parsing had omitted a factor-of-10 scaling coefficient. This is a typical semantic-layer error: the communication link worked, but the two sides interpreted the values differently.
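One way to catch this class of error is to make the scaling factor an explicit part of every point definition and to range-check the decoded value. This sketch decodes a single big-endian unsigned 16-bit Modbus register with Python’s standard `struct` module; the `PointDef` type, the `decode_register` function, and the 0.1 %/LSB entry (mirroring the 0-1000 range in the case) are hypothetical illustrations, not the vendor’s actual point table.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class PointDef:
    name: str
    address: int
    scale: float       # engineering units per raw LSB, from the vendor point table
    unit: str
    min_value: float   # plausible engineering range for the semantic check
    max_value: float


# Hypothetical entry: raw 0-1000 means 0-100 %, i.e. 0.1 % per LSB.
SOC_POINT = PointDef("BAU_SOC", 0x0001, 0.1, "%", 0.0, 100.0)


def decode_register(point: PointDef, payload: bytes) -> float:
    """Decode one big-endian unsigned 16-bit register and apply the point's scaling."""
    (raw,) = struct.unpack(">H", payload)
    value = round(raw * point.scale, 6)  # strip binary floating-point noise
    if not point.min_value <= value <= point.max_value:
        raise ValueError(f"{point.name} = {value} {point.unit} is outside the plausible range")
    return value


print(decode_register(SOC_POINT, struct.pack(">H", 1000)))  # full charge with correct scaling
```

If the point is configured with a scale of 1.0 instead of 0.1, the same payload decodes to 1000 % and the range check raises immediately, turning a silent semantic mismatch into a visible commissioning error.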
Result: The project was delayed by two days, incurring $20,000 in penalties. Disputes over liability followed, damaging future collaboration.
Lesson Learned: Communication “connectivity” is only a syntactic success; data “reliability” is a semantic-layer success. The two must be verified separately.