#70 Modem Init Failure

Open
Opened 3 months ago by rborisov · 0 comments

AT+CGDCONT / vzwadmin APN: Modem Init Failure Investigation

Date: 2026-03-13
Branch: VOD-HWCamera-Dash-MTK


Symptom

Modem initialization never completes when vzwadmin is present in the APN list (/etc/mvcamera/config.json); the system enters an infinite re-init loop instead. Renaming the APN to any other string lets initialization succeed.


Root Cause: vzwadmin Rejection → Infinite Init Loop

The modem rejects AT+CGDCONT=2,"IPV4V6","vzwadmin"\r with ERROR (carrier-reserved APN name). This triggers a failure cascade:

[init] clear context 2 → Temp2      ✓
[init] set context 2  → vzwadmin    ✗  (modem returns ERROR)
  ↓ 5 retries exhausted
[jump] → index 41: AT+CFUN=1,1     → reboot
[init] restart from index 1 (isRebootCommandAlreadySent reset by outer loop)
  ↓ vzwadmin fails again
[jump] → index 41: CFUN, but isRebootCommandAlreadySent=true
  ↓ → MODEM_PROCESS_FAILURE
[outer] MODEM_CLEAN_DEVICE_FOR_FAILURE → MODEM_SERIAL_PORT_SCANNING → restart
  ↓ isRebootCommandAlreadySent reset again
  ↓ loop repeats forever

The clearing phase succeeds (Temp2 works), but after the CFUN reboot the modem has Temp2 in context 2, not vzwadmin. Validation in isQuectelModemInit then sees the mismatch and triggers re-init, compounding the loop.


Design Issues

1. CGDCONT errors are fatal — they should not be (primary issue)

ModemManager.cpp:72-79: Every APN-set command has nextCommandIndexIfError = MODEM_AT_COMM_INDEX_AFTER_CFUN. One rejection by the modem for a single APN aborts the entire sequence and reboots. There is no way to skip a failed APN and continue setting the rest.

2. No response accumulation — single read decides success/failure

AtCommandInterface.cpp:93-103: When select() signals data, there is one read() call. If the modem's OK arrives in a second serial fragment, that single read yields no OK → immediate AT_COMMAND_RESPONSE_ERROR, before the timeout even expires. This makes response detection brittle for larger payloads (e.g., AT+CGDCONT? query responses).

3. Clearing APNs before writing creates a destructive intermediate state

ModemManager.cpp:61-69: All 8 contexts are cleared to TempN first. If a subsequent set command fails, the modem reboots with Temp names instead of the original APN configuration. The post-reboot validation then always fails because it compares against the JSON-expected names.

4. No "read before write" — blindly overwrites even correct APNs

AT+CGDCONT? is only queried after reboot for validation. It is never queried at the start of initialization to check whether the APNs are already correct. If they are (for example, because the carrier pre-provisioned them), the entire clear → set → reboot cycle is unnecessary and destructive.

5. Validation iterates currentApnList (modem) not availableApnList (config)

ModemManager.cpp:861-872: The outer loop is over currentApnList (what the modem reported). If the modem reports fewer contexts than expected, or reports them in a different order, the index pairing between currentApnList[n] and availableApnList[n] breaks silently.


Concrete Improvements

Fix 1: Query AT+CGDCONT? at init start, skip APNs already correct

Add AT+CGDCONT? as one of the first init commands with a callback that populates currentApnList. Then in modemInterfaceInit, before generating an APN-set command, check if currentApnList[n] already matches availableApnList[n]. If it does, skip both the clear and the set for that context and advance initAPNlistIndex.

This avoids clearing pre-provisioned APNs (like vzwadmin) that the modem will reject being re-written.
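The read-before-write check could be sketched as below. This is a minimal illustration, not the project's code: `parseCgdcontResponse` and `apnAlreadyCorrect` are hypothetical helper names, and the parser assumes the usual `+CGDCONT: <cid>,"<PDP_type>","<APN>",...` line layout of the query response. The real callback would fill currentApnList rather than a `std::map`.

```cpp
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: parse an AT+CGDCONT? response into {context id -> APN}.
// Assumes lines of the form  +CGDCONT: <cid>,"<PDP_type>","<APN>",...
std::map<int, std::string> parseCgdcontResponse(const std::string& response) {
    std::map<int, std::string> apnByCid;
    std::istringstream lines(response);
    std::string line;
    const std::string prefix = "+CGDCONT:";
    while (std::getline(lines, line)) {
        const std::size_t start = line.find(prefix);
        if (start == std::string::npos) continue;  // skip echo, blank lines, OK
        const std::string fields = line.substr(start + prefix.size());
        const int cid = std::atoi(fields.c_str());  // leading integer is the cid
        // The APN is the second quoted field, i.e. between quotes 3 and 4.
        std::size_t pos = 0;
        std::size_t quote[4];
        int found = 0;
        while (found < 4 && (pos = fields.find('"', pos)) != std::string::npos) {
            quote[found++] = pos++;
        }
        if (found < 4 || cid <= 0) continue;  // malformed line: ignore it
        apnByCid[cid] = fields.substr(quote[2] + 1, quote[3] - quote[2] - 1);
    }
    return apnByCid;
}

// True when context `cid` already holds `expectedApn` (exact, case-sensitive
// match), in which case both the clear and the set can be skipped.
bool apnAlreadyCorrect(const std::map<int, std::string>& current,
                       int cid, const std::string& expectedApn) {
    const auto it = current.find(cid);
    return it != current.end() && it->second == expectedApn;
}
```

If the modem normalizes APN case, the comparison would need to be case-insensitive; that behavior has not been checked here.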

Fix 2: CGDCONT set failure → skip to next APN, not to CFUN

Change the error routing for indices 17–24:

index 17 error → index 18  (next APN)
index 18 error → index 19
...
index 24 error → index 25  (AT+CTZU=1, first post-APN command)

After all APNs have been attempted (skipping failures), add an explicit validation command (AT+CGDCONT?) to check that all critical APNs (especially IMS, required for VoLTE) are present. Only trigger CFUN if a critical APN is genuinely missing.
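A sketch of the rerouting, assuming a simplified stand-in for the init command array. The `InitCommand` struct and `routeApnErrorsToNext` are hypothetical; only the `nextCommandIndexIfError` field name and the index values come from this issue.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for one entry of the init command array.
struct InitCommand {
    int index;
    int nextCommandIndexIfError;
};

// Route each APN-set failure to the following command instead of to CFUN.
// In this issue firstApnIndex is 17 and lastApnIndex is 24, so a failure at
// index 24 falls through to index 25 (the first post-APN command).
void routeApnErrorsToNext(std::vector<InitCommand>& commands,
                          int firstApnIndex, int lastApnIndex) {
    for (InitCommand& cmd : commands) {
        if (cmd.index >= firstApnIndex && cmd.index <= lastApnIndex) {
            cmd.nextCommandIndexIfError = cmd.index + 1;
        }
    }
}
```

Commands outside the APN range keep their existing error target, so only the CGDCONT set commands change behavior.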

Fix 3: Accumulate response bytes until terminal line

In AtCommandInterface.cpp:81-103, stay in AT_COMMAND_RESPONSE_WAIT and append to modemResponseBuffer on each read() until either the expected response substring or ERROR is found — or the timeout fires. A single-read strategy works only if the modem always sends the full response atomically, which serial ports do not guarantee.

// Accumulate approach:
modemResponseBuffer += ConvertedATResponseBuffer;
if (modemResponseBuffer.find(deviceAtCommand.atCmdExpectedResponse) != std::string::npos) {
    stateMachineStatus = AT_COMMAND_RESPONSE_OK;
} else if (modemResponseBuffer.find("ERROR") != std::string::npos) {
    stateMachineStatus = AT_COMMAND_RESPONSE_ERROR;
}
// else stay in RESPONSE_WAIT for next read

This also eliminates the false ERROR that is reported when the first fragment of a long response arrives before the OK.
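For trying the idea off-target, the same logic can be wrapped in a small self-contained class. `ResponseAccumulator` and `AtResult` are hypothetical names for this sketch; the timeout is assumed to stay with the caller, which would also call `reset()` before sending each new command.

```cpp
#include <string>
#include <utility>

enum class AtResult { Wait, Ok, Error };

// Accumulates serial fragments until the expected substring or "ERROR" is seen.
class ResponseAccumulator {
public:
    explicit ResponseAccumulator(std::string expected)
        : expected_(std::move(expected)) {}

    // Append one read() worth of data and classify the response so far.
    AtResult feed(const std::string& fragment) {
        buffer_ += fragment;
        if (buffer_.find(expected_) != std::string::npos) return AtResult::Ok;
        // Also matches extended forms such as "+CME ERROR: <n>".
        if (buffer_.find("ERROR") != std::string::npos) return AtResult::Error;
        return AtResult::Wait;  // incomplete: keep waiting until the timeout
    }

    // Discard buffered data before the next command is sent.
    void reset() { buffer_.clear(); }

private:
    std::string expected_;
    std::string buffer_;
};
```

Plain substring matching can still misfire if the payload itself contains the expected text (an APN name containing "OK", for instance). Matching only a complete terminal line would be stricter, at the cost of a little more parsing.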

Fix 4: Separate "clear" and "set" only for contexts that need changing

Instead of blindly clearing all 8 contexts to TempN, clear a context only if its current value differs from the expected one and the expected value can actually be set. If a clear succeeds but the subsequent set fails (as with vzwadmin), restore the original value. This prevents the destructive Temp2-after-reboot problem.
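One possible shape for the per-context write with rollback is sketched below. `writeApnWithRestore`, `ApnWriteResult`, and the `sendSet` callback are hypothetical: the callback stands in for whatever issues AT+CGDCONT=<cid>,... and reports whether the modem answered OK. The intermediate clear to TempN is left out of the sketch; if the existing flow needs it, it would go just before the first set.

```cpp
#include <functional>
#include <string>

enum class ApnWriteResult { Unchanged, Written, RestoredOriginal, Failed };

// sendSet(cid, apn) is a hypothetical callback that sends the CGDCONT set
// command for one context and returns true when the modem answers OK.
ApnWriteResult writeApnWithRestore(
        int cid, const std::string& current, const std::string& expected,
        const std::function<bool(int, const std::string&)>& sendSet) {
    if (current == expected) {
        return ApnWriteResult::Unchanged;  // nothing to clear or set
    }
    if (sendSet(cid, expected)) {
        return ApnWriteResult::Written;
    }
    // The set was rejected: put the original name back so that a later
    // reboot does not come up with a placeholder in this context.
    return sendSet(cid, current) ? ApnWriteResult::RestoredOriginal
                                 : ApnWriteResult::Failed;
}
```

A `RestoredOriginal` result tells the caller that the context is intact but not as configured, which is the case the later validation step has to judge as critical or not.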

Fix 5: Validate by iterating availableApnList (config) not currentApnList (modem)

In isQuectelModemInit, iterate the expected list and search the modem's reported list for each entry. This is resilient to the modem reporting APNs in a different order or reporting more/fewer contexts than expected.
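The lookup could be sketched as follows. `findMissingApns` is a hypothetical helper, and plain string vectors stand in for availableApnList and currentApnList, whose real element type (ModemAPNList) is not shown in this issue.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Return every expected APN that the modem did not report, regardless of the
// order or number of contexts the modem returned.
std::vector<std::string> findMissingApns(
        const std::vector<std::string>& expected,
        const std::vector<std::string>& reported) {
    std::vector<std::string> missing;
    for (const std::string& apn : expected) {
        if (std::find(reported.begin(), reported.end(), apn) == reported.end()) {
            missing.push_back(apn);
        }
    }
    return missing;
}
```

This checks presence only. If an APN must sit in a specific context ID, the comparison would need to match on (context ID, name) pairs instead.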


Recommended Immediate Fix (Minimal Change)

The smallest change that breaks the infinite loop: change nextCommandIndexIfError for CGDCONT SET commands (indices 17–24) from MODEM_AT_COMM_INDEX_AFTER_CFUN to the next APN index. Add AT+CGDCONT? as a post-APN-set validation command with a callback that checks for critical APNs before deciding whether to proceed or reboot. This stops the loop while keeping the overall structure intact.


Key Files

| File | Relevance |
|------|-----------|
| initmodem/src/ModemManager.cpp | Init command array, APN generation, validation logic |
| initmodem/src/AtCommandInterface.cpp | AT state machine, response parsing |
| initmodem/src/JsonFileParser.cpp | APN config parsing from /etc/mvcamera/config.json |
| initmodem/src/ModemManager.h | ModemAPNList struct, class interface |
| initmodem/include/user_config.h | MODEM_TOTAL_NUMBER_OF_APN_AVAILABLE, timing constants |