Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address several ESP32 network driver issues #1137

Merged
merged 3 commits into from
Oct 3, 2024

Conversation

UncleGrumpy
Copy link
Collaborator

@UncleGrumpy UncleGrumpy commented Apr 23, 2024

These changes close the following issues for the ESP32 network driver:

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

#ifdef CONFIG_AVM_ENABLE_NETWORK_PORT_DRIVER

REGISTER_PORT_DRIVER(network, network_driver_init, NULL, network_driver_create_port)
REGISTER_PORT_DRIVER(network, network_driver_init, network_driver_stop, network_driver_create_port)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

network_driver_stop will not be called when calling network:stop/0. It gets called during tests (when GlobalContext is destoryed and recreated).
So this change sounds correct, but it doesn't do what you are saying.

@UncleGrumpy UncleGrumpy marked this pull request as draft April 28, 2024 18:44
@UncleGrumpy UncleGrumpy force-pushed the esp32_network_issues branch 4 times, most recently from 908bd03 to 40b8959 Compare May 1, 2024 02:40
@UncleGrumpy
Copy link
Collaborator Author

Thank you for the explanation of the REGISTER_PORT_DRIVER destroy_cb in the Telegram chat, this did not work as advertised! I was also several missing steps, so even when the VM exited and the destroy_cb would execute this code, netif and IP callbacks would still be triggered.

These problems have all been fixed, and the driver shuts down and exits cleanly. There is still one ESP-IDF message (in some builds, depending on the IDF version) in the console logs that is being reported as an error:

E (8246) wifi:NAN WiFi stop

I looked into this and it is actually a known ESP-IDF issue, that will be changed to an information message in a future update. See espressif/esp-idf#12473.

@UncleGrumpy UncleGrumpy marked this pull request as ready for review May 1, 2024 03:03
@UncleGrumpy UncleGrumpy changed the title Address several ESP32 network issues Address several ESP32 network driver issues May 1, 2024
@UncleGrumpy UncleGrumpy force-pushed the esp32_network_issues branch from 40b8959 to 8eb624e Compare May 12, 2024 03:01
@UncleGrumpy UncleGrumpy force-pushed the esp32_network_issues branch 2 times, most recently from a222bd4 to dfa2f88 Compare May 25, 2024 21:51
@UncleGrumpy UncleGrumpy force-pushed the esp32_network_issues branch 3 times, most recently from 6063cca to 2623f6d Compare June 6, 2024 01:11
@@ -723,7 +744,6 @@ static void start_network(Context *ctx, term pid, term ref, term config)
if ((err = esp_wifi_set_config(ESP_IF_WIFI_AP, ap_wifi_config)) != ESP_OK) {
ESP_LOGE(TAG, "Error setting AP mode config %d", err);
free(ap_wifi_config);
free(sta_wifi_config);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this causes a crash. Can you elaborate further?

Copy link
Collaborator Author

@UncleGrumpy UncleGrumpy Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pointer has already been free'd between lines 703 and 715. Freeing it again will cause a double-free crash, this is an error condition already - but we want the user to get a nice Erlang stack trace (or catch the error), rather than a hard crash of the VM.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look at the sources, but I'm not understanding what you mean here. Can you elaborate further?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bettio I understand the crash could also be avoided if sta_wifi_config was set to NULL after line 716 (old) 736 (new). If sta_wifi_config was not NULL, it is freed anyway in the block above.

Copy link
Collaborator Author

@UncleGrumpy UncleGrumpy Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is from release-0.6:
https://github.com/atomvm/AtomVM/blob/eb406a82e15c53d77304d30ceabcfc67a2d5e593/src/platforms/esp32/components/avm_builtins/network_driver.c#L724C1-L738C6

    if (!IS_NULL_PTR(sta_wifi_config)) {
        if ((err = esp_wifi_set_config(ESP_IF_WIFI_STA, sta_wifi_config)) != ESP_OK) {
            ESP_LOGE(TAG, "Error setting STA mode config %d", err);
            free(sta_wifi_config);
            if (!IS_NULL_PTR(ap_wifi_config)) {
                free(ap_wifi_config);
            }
            term error = port_create_error_tuple(ctx, term_from_int(err));
            port_send_reply(ctx, pid, ref, error);
            return;
        } else {
            ESP_LOGI(TAG, "STA mode configured");
            free(sta_wifi_config);
        }
    }

Either path through this code will free(sta_wifi_config). Freeing again below will cause a hard fault.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bettio I understand the crash could also be avoided if sta_wifi_config was set to NULL after line 716 (old) 736 (new). If sta_wifi_config was not NULL, it is freed anyway in the block above.

This might be an easier way to avoid the bug, but I still am not a fan of freeing the same variable multiple times, it always makes me have to stop an re-examine the code the next time I’m working on a feature, because it looks suspicious.

@petermm
Copy link
Contributor

petermm commented Jun 15, 2024

This is great!

Can anecdotally confirm this resolves esp32s2 OOM'ing with the forthcoming wifi test CI (oom only on esp-idf v5.1.4, on the test after wifi).. and adding network:stop() to the code resolved it. (3 failed runs OOM, 3 success runs with network:stop())

Have tested full Sim CI suite - all is green, the systems with wifi exercised network:stop().

Screenshot 2024-06-15 at 23 48 39

@bettio
Copy link
Collaborator

bettio commented Jun 18, 2024

Can anecdotally confirm this resolves esp32s2 OOM'ing with the forthcoming wifi test CI (oom only on esp-idf v5.1.4, on the test after wifi).. and adding network:stop() to the code resolved it. (3 failed runs OOM, 3 success runs with network:stop())

That is awesome, however I need a complete understanding about how this fix is working during the review...

@UncleGrumpy UncleGrumpy force-pushed the esp32_network_issues branch 2 times, most recently from d2bbaaf to 0d68f33 Compare June 28, 2024 00:37
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 8, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 8, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 9, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 9, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 9, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 9, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 22, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 24, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 25, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 25, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
UncleGrumpy added a commit to UncleGrumpy/AtomVM that referenced this pull request Sep 28, 2024
This driver offers resource based nif APIs using unit and channel resources, as
well as a set of convenient gen_server based APIs, which use pin numbers, as an
alternative to using the nif resource handles.

Unit 2 is currently disabled for the ESP32C3 by default due to known hardware
limitations, see:
https://docs.espressif.com/projects/esp-idf/en/latest/esp32c3/api-reference/peripherals/adc_oneshot.html#hardware-limitations

Unit 2 is currently disabled by default for the ESP32 (classic) due to WiFi
conflicts. This decision can be revisited after PR atomvm#1137 is merged, and the
WiFi driver can be successfully stopped after it has been started.

Unit 2 is enabled by default for all other chips with more than one ADC unit,
as the chips released after the ESP32 classic have a functional ADC arbitrator
peripheral that allows ADC unit 2 use while WiFi is enabled.

Signed-off-by: Winford <winford@object.stream>
Removes several possible uses of free() on memory that had been previously free'd. This would
happen under specific error conditions in the `start_network` function in network_driver.c
that `network:start/1` uses to configure and start the network.

Signed-off-by: Winford <winford@object.stream>
…SP32

Adds a destroy callback to the ESP32 network driver to completely stop the
driver and free all network resources when network:stop/0 is used.  Previosly
the driver was not being stopped internally and resources were not freed when
the gen_server was stopped, causing instability, and possible crashes when
event callbacks were triggered, but there was no process alive to handle them.

Closes atomvm#643

Signed-off-by: Winford <winford@object.stream>
Adds an event handler for `event 21` the `WIFI_EVENT_STA_BEACON_TIMEOUT` event
and an option to add an Erlang callback handler for the event. The event will
be logged with an info level message that includes a suggestion about the two
most likely causes, poor rssi and network congestion. A callback config option
`{beacon_timeout, fun()}` may be added to the `sta` config.

Closes atomvm#1100

Signed-off-by: Winford <winford@object.stream>
@UncleGrumpy UncleGrumpy force-pushed the esp32_network_issues branch from 0d68f33 to ec7602c Compare October 3, 2024 22:48
@bettio bettio merged commit 3c80b39 into atomvm:release-0.6 Oct 3, 2024
82 of 87 checks passed
@UncleGrumpy UncleGrumpy deleted the esp32_network_issues branch November 20, 2024 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants