'wifi' task crashes issue (IDFGH-12738) #13721
Comments
Here is my SDK config file:
We have found that when there are more than 12 devices on the same network, the WiFi crash seems to happen more frequently. With 25 devices, 2 out of 25 crash daily at random times. Sometimes a device takes 4 hours to come back, sometimes 3 minutes, even without a reboot and without any explanation. Some crashes don't recover: the WiFi interface shows as connected, but there is no TCP/IP or UDP reply. BLE keeps working perfectly in this case (we use NimBLE). We are using v5.2.1 and have not found the root cause yet.
Hi @filzek @InterfacerCompany, so your crash happens on both v5.1.3 and v5.2.1?
Could you describe your application scenario? I think you are using NimBLE and WiFi coexistence, connecting to a TCP server, and uploading data at regular intervals?
Hi @Xiehanxin, we've been working hard to make sure the WiFi driver and the lwIP stack don't cause any issues. We've made some intriguing discoveries, but we're not ready to share details yet. Tomorrow we'll run an extensive debugging session across 57 devices that have been specially configured via sdkconfig. This setup aims to stabilize the system and prevent crashes, and we hope it will resolve the issues. So far, the latest firmware build, which only changes the sdkconfig, has been running without any problems for the past 48 hours. Although these adjustments may seem unusual or counterintuitive, they appear to be effective. We'll provide a more detailed update once we verify the fixes in tomorrow's session. We have also added these boards to ESP Insights; we can share the results if you want to be part of the dashboard, or, if you have access, it is dashboard-id=18558382-e648-46c0-9a8e-f9243b2f0dd0. We still have a problem with the power on NimBLE: sometimes it gets very low, around -50 to -80 dBm.
Hi @Xiehanxin. I haven't tested on v5.2.1 yet, but my project previously used v5.0.6 and the situation was similar: it crashed randomly. I updated it to v5.1.3 hoping it would be stable. The logic of my application is as follows:
You can see the decoded core dump from after the system crashed. I will provide more from different devices. My devices are always in WiFi STA mode and connected to a router.
Hi @InterfacerCompany @filzek, it seems that it ran out of memory. Could you add a task that periodically calls esp_get_minimum_free_heap_size to print the remaining memory?
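A minimal sketch of the periodic heap report suggested above, assuming ESP-IDF v5.x. esp_get_free_heap_size() and esp_get_minimum_free_heap_size() are ESP-IDF functions from esp_system.h; the helper heap_watch_update, the task name, the 10 s period, and the stack size and priority are illustrative assumptions. The ESP-IDF-specific part is wrapped in `#ifdef ESP_PLATFORM` so the bookkeeping logic can also be compiled and tested on a host.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: remembers the previous low-water mark so the log can
 * flag when the minimum-ever free heap has dropped since the last sample. */
typedef struct {
    size_t last_min_free; /* 0 means "no sample yet" */
    unsigned drops;       /* number of samples in which the low-water mark fell */
} heap_watch_t;

/* Returns true if min_free is lower than the previous sample. */
bool heap_watch_update(heap_watch_t *w, size_t min_free)
{
    bool dropped = (w->last_min_free != 0) && (min_free < w->last_min_free);
    if (dropped) {
        w->drops++;
    }
    w->last_min_free = min_free;
    return dropped;
}

#ifdef ESP_PLATFORM
#include "esp_log.h"
#include "esp_system.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static const char *TAG = "heap_mon";

/* Logs current and minimum-ever free heap every 10 s (period is an assumption). */
static void heap_monitor_task(void *arg)
{
    (void)arg;
    heap_watch_t watch = {0};
    for (;;) {
        uint32_t cur = esp_get_free_heap_size();
        uint32_t min = esp_get_minimum_free_heap_size();
        bool dropped = heap_watch_update(&watch, min);
        ESP_LOGI(TAG, "free=%" PRIu32 " min_ever=%" PRIu32 "%s",
                 cur, min, dropped ? " (new low)" : "");
        vTaskDelay(pdMS_TO_TICKS(10000));
    }
}

/* Call once from app_main(); stack size (bytes) and priority are assumptions. */
void heap_monitor_start(void)
{
    xTaskCreate(heap_monitor_task, "heap_mon", 3072, NULL, 1, NULL);
}
#endif
```

A low-water mark that keeps falling over hours would point to a leak; a stable one supports the view that the crashes are not caused by running out of memory.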
Hi @Xiehanxin. I don't think so. My app constantly monitors the free heap size and fragmentation. I also have special logic that reboots the ESP32 if memory becomes too low or fragmentation too high, but I don't see that logic trigger and reboot the board. That logic is only for critical cases; normally my app does not run out of memory. Here is the memory status during runtime:
I have a similar issue on ESP32 (ESP32-D0WDQ6 v1.0) and ESP32-S3. The symptom is that everything looks normal on the serial output, but the device can no longer deliver data and no longer responds to ping over WiFi. If I don't enable NimBLE at startup, the issue does not occur, so it feels like a coexistence issue.
Continuing this topic.
// 87 ---------------------------------
// 89 ---------------------------------
// 90 ---------------------------------
// 91 ---------------------------------
Does anybody have an idea how it can be fixed?
@Xiehanxin Any feedback about the log in #13721 (comment)?
showMemoryRAMStatus =======================================================
The memory is not the problem: the heap is constant and always has free memory. We use ESP32 3.0 with PSRAM enabled and allocate all of the heap in SPIRAM, so the internal/default memory always stays constant and free. We suspect that the BLE and WiFi drivers sometimes run into a concurrency problem and crash. We have also found some very odd behavior in the WiFi connection:
So, something related to the WiFi is crashing the boards. |
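For readers who want to reproduce the PSRAM setup described in the comment above, here is a minimal sketch of allocating application buffers from SPIRAM with a fallback to internal RAM, assuming an ESP-IDF project with PSRAM enabled. heap_caps_malloc() and the MALLOC_CAP_* flags are ESP-IDF APIs; the helper name app_alloc_prefer_psram is hypothetical, and this is not necessarily how the commenter's firmware does it.

```c
#include <stdlib.h>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#endif

/* Hypothetical helper: place large application buffers in PSRAM so internal RAM
 * stays available for the WiFi/BLE drivers; fall back to internal RAM if the
 * PSRAM allocation fails. Release the result with free(). */
void *app_alloc_prefer_psram(size_t size)
{
#ifdef ESP_PLATFORM
    void *p = heap_caps_malloc(size, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
    if (p == NULL) {
        p = heap_caps_malloc(size, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
    }
    return p;
#else
    /* Host build (no ESP-IDF): plain malloc so the helper can be tested. */
    return malloc(size);
#endif
}
```

ESP-IDF can also let malloc() itself draw from PSRAM (CONFIG_SPIRAM_USE_MALLOC), which avoids changing call sites.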
Hi guys. I am still facing this problem.
Devices 85, 86, 87:
Device 88:
Devices 90, 91:
So it seems that during operation some memory becomes corrupted, and the firmware crashes when trying to free a broken pointer: block_trim_free tlsf.c:496 (block_is_free(block) && "block must be free"), or during other operations such as insert_free_block tlsf.c:358 (current && "free list cannot have a null entry") in the mDNS code. Here are the decoded core dump files as well:
Also, it seems that in the v5.2.2 core-dump file the 'used stack' size for the task is not correct. Examples:
v5.2.2: 0x3ffd929c wifi 23/1073582736 1073581360/4760
v5.1.4: 0x3ffd9228 wifi 23/23 528/5612
(In the v5.2.2 line, 1073582736 and 1073581360 are 0x3FFD9290 and 0x3FFD8D30, which look like RAM addresses rather than a priority and a used-stack size.)
Can somebody check the decoded core dump files and help me with this problem? Any ideas are welcome ))) Maybe somebody from the esp-idf team is here.
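Both asserts fire inside the TLSF allocator when a heap block is already corrupted, which is usually long after the code that corrupted it has run. One debugging approach, assuming ESP-IDF v5.x, is to call heap_caps_check_integrity_all() periodically so corruption is reported closer to its source. heap_caps_check_integrity_all() is an ESP-IDF API; the rate-limiting helper integrity_check_due, the interval of 10 calls, and the log tag are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true once every `every_n` calls, to rate-limit
 * the (slow) full heap walk. every_n == 0 disables the check. */
bool integrity_check_due(uint32_t *counter, uint32_t every_n)
{
    if (every_n == 0) {
        return false;
    }
    *counter += 1;
    if (*counter >= every_n) {
        *counter = 0;
        return true;
    }
    return false;
}

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#include "esp_log.h"

/* Call from a periodic place in the application, e.g. an existing main loop. */
void heap_integrity_tick(void)
{
    static uint32_t counter;
    if (integrity_check_due(&counter, 10)) {
        /* Walks every heap region; with print_errors=true it logs details of
         * any corrupt block it finds. */
        if (!heap_caps_check_integrity_all(true)) {
            ESP_LOGE("heap_chk", "heap corruption detected");
        }
    }
}
#endif
```

The full check walks every heap region and is slow, hence the rate limiting. It reports more detail when heap poisoning is enabled (CONFIG_HEAP_POISONING_LIGHT or CONFIG_HEAP_POISONING_COMPREHENSIVE).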
Some additional information about my project: I'm trying to keep the free heap as large as possible. Unfortunately, I can't add SPI RAM to my current hardware, so here are the options I have enabled to save RAM:
LWIP_TCPIP_CORE_LOCKING and LWIP_TCPIP_CORE_LOCKING_INPUT do not reduce heap usage, but the system seems to become more stable when they are enabled.
"mem":{"u8": {"cur": "87.68 KBytes", "min": "67.14 KBytes", "max": "111.09 KBytes", "maxBlock": "72.00 KBytes", "frag":"18%"},"u32": {"cur":"14.03 KBytes"}}
Does anybody see problems with such a configuration?
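The "frag" figure above is consistent with fragmentation defined as 100 - 100 * maxBlock / cur; this definition is inferred from the numbers, not confirmed by the commenter. A minimal sketch of the reboot-on-low-memory logic described in this thread, assuming ESP-IDF v5.x: heap_caps_get_free_size(), heap_caps_get_largest_free_block() and esp_restart() are ESP-IDF APIs, while heap_frag_pct, heap_needs_reboot and the 30 KiB / 60 % thresholds are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fragmentation in percent: 100 - 100 * largest_free_block / free_bytes.
 * The formula is an assumption inferred from the reported "frag" value. */
unsigned heap_frag_pct(size_t free_bytes, size_t largest_block)
{
    if (free_bytes == 0) {
        return 100; /* nothing free: treat as fully fragmented */
    }
    if (largest_block > free_bytes) {
        largest_block = free_bytes; /* guard against inconsistent samples */
    }
    return (unsigned)(100 - (largest_block * 100) / free_bytes);
}

/* Hypothetical reboot policy: too little free heap, or too fragmented. */
bool heap_needs_reboot(size_t free_bytes, size_t largest_block,
                       size_t min_free_bytes, unsigned max_frag_pct)
{
    return free_bytes < min_free_bytes ||
           heap_frag_pct(free_bytes, largest_block) > max_frag_pct;
}

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
#include "esp_log.h"
#include "esp_system.h"

/* Call periodically; the 30 KiB / 60 % thresholds are placeholders. */
void heap_guard_check(void)
{
    size_t free_bytes = heap_caps_get_free_size(MALLOC_CAP_8BIT);
    size_t largest = heap_caps_get_largest_free_block(MALLOC_CAP_8BIT);
    if (heap_needs_reboot(free_bytes, largest, 30 * 1024, 60)) {
        ESP_LOGE("heap_guard", "free=%u largest=%u: restarting",
                 (unsigned)free_bytes, (unsigned)largest);
        esp_restart();
    }
}
#endif
```

Assuming 1 KByte = 1024 bytes, the reported values (cur 87.68 KBytes, about 89,784 bytes; maxBlock 72.00 KBytes = 73,728 bytes) give heap_frag_pct(89784, 73728) == 18, matching "frag":"18%".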
@AxelLin hello there! Today we made a new discovery about the WiFi and BLE stacks in ESP-IDF on the ESP32. Under a certain catastrophic sequence of events (still to be fully reproduced), the system enters what we're calling a ZOMBIE state.
What Is a Zombie State?
BLE and WiFi appear dead:
WiFi stack illusion:
Misleading connection status:
Our Findings and Workarounds
Handler Code:
Temporary Fix:
Logging for Analysis:
Additional Logging:
Next Steps
We need the driver to provide a mechanism that tells us when it gets into this state, so that we can react to it.
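Until the driver exposes such a signal, an application-level watchdog can detect the zombie state from the outside: the station still reports an AP association while application-level probes keep failing. A minimal sketch, assuming ESP-IDF v5.x: esp_wifi_sta_get_ap_info() and esp_restart() are ESP-IDF APIs, while zombie_watch_update, zombie_check, the failure threshold, and app_network_probe() (your own round trip to the gateway or server) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical detector for the "zombie" state: the station reports that it is
 * associated with the AP, but application-level probes keep failing. */
typedef struct {
    unsigned failures;  /* consecutive failed probes while "connected" */
    unsigned threshold; /* failures tolerated before declaring a zombie state */
} zombie_watch_t;

/* Returns true once `threshold` consecutive probes have failed while connected. */
bool zombie_watch_update(zombie_watch_t *z, bool link_connected, bool probe_ok)
{
    if (!link_connected || probe_ok) {
        /* Healthy, or genuinely disconnected (left to the normal reconnect path). */
        z->failures = 0;
        return false;
    }
    z->failures++;
    return z->failures >= z->threshold;
}

#ifdef ESP_PLATFORM
#include "esp_log.h"
#include "esp_system.h"
#include "esp_wifi.h"

/* HYPOTHETICAL: your own reachability check (e.g. a TCP/UDP round trip to
 * the gateway or server). Not an ESP-IDF API. */
extern bool app_network_probe(void);

/* Call periodically, e.g. every 10 s, from an application task. */
void zombie_check(zombie_watch_t *z)
{
    wifi_ap_record_t ap;
    /* ESP_OK means the station is currently associated with an AP. */
    bool connected = (esp_wifi_sta_get_ap_info(&ap) == ESP_OK);
    if (zombie_watch_update(z, connected, connected && app_network_probe())) {
        ESP_LOGE("zombie", "STA reports connected but network is unreachable; restarting");
        esp_restart();
    }
}
#endif
```

Restarting is a blunt recovery; it trades a few seconds of downtime for not staying stuck in a state where the interface looks connected but carries no traffic.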
Hi @filzek |
It's on ESP Insights. There is no error log from LOGE, but we have been able to record the logic behind it. Yes, I can send the log we have collected so far.
Answers checklist.
IDF version.
v5.1.3
Espressif SoC revision.
ESP32-D0WD-V3
Operating System used.
Windows
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
CMD
Development Kit.
custom PCB
Power Supply used.
External 5V
What is the expected behavior?
I expected that my device would work continuously for a long period of time without crashes and reboots.
What is the actual behavior?
After working for a while, my device crashes and reboots.
Steps to reproduce.
Unfortunately, I don't have steps to reproduce. It just happens periodically: it can crash a few times a day, or it can run for 24 hours and then crash.
Debug Logs.
More Information.
I have several devices in the field, and they crashed at different times. Most of them crashed in the 'wifi' task, sometimes in the 'tiT' (lwIP TCP/IP) task.
I use WI-FI and BLE (NimBLE host, BLE only) communication at the same time. The WI-FI power-save mode is WIFI_PS_MIN_MODEM.
I have also enabled LWIP_TCPIP_CORE_LOCKING and LWIP_TCPIP_CORE_LOCKING_INPUT. In my opinion, my devices have been running more stably since I enabled them (the time before a crash has increased).