Feature Request - Better WiFi Management #1643

Open
derei opened this issue May 20, 2025 · 32 comments

@derei

derei commented May 20, 2025

@openshwprojects

Issue: Lack of Granular WiFi Recovery and Monitoring in OBK Scripts
Device: RGBW LED Controller iH001 with CB3S (BK7231N)

Problem Summary

During my learning and testing with OBK scripting, I ran into repeated problems related to unreliable WiFi handling:

  • WiFiState == 4 is not a trustworthy indicator of actual connectivity. In many cases, it was 4 but the device still failed to communicate or respond over network.
  • The restart command appears ineffective in restoring WiFi connectivity. It does not seem to reinitiate the WiFi subsystem.
  • The only workaround that restores WiFi is a physical power cycle. However, even this isn't deterministic - it often takes 2 or more cycles before a connection is achieved.
  • The time-to-connect after boot is highly variable and appears sensitive to race conditions.

Script Context

I built a full WiFi state indicator script using OBK scripting and PingHost watchdog. However, even with visual indicators and ping tests, the device behavior remains unreliable when WiFi fails.

clearAllHandlers
clearRepeatingEvents
// init visual indicator
backlog setChannel 1 0; setChannel 2 5; setChannel 3 5; setChannel 4 0
// state visual indicator (will blink until connection is successful)
addRepeatingEventID 1 -1 99 backlog ToggleChannel 4; ClampChannel 4 0 1

SetPinRole 9 Btn

alias good_to_go backlog cancelRepeatingEvent 99; setChannel 1 10; setChannel 2 7; setChannel 3 3; setChannel 4 5; echo === GOOD TO GO!
alias restart_device echo NOW IT SHOULD RESTART...

// WiFi state color indicators
addChangeHandler WiFiState == 1 backlog setChannel 1 0; setChannel 2 0; setChannel 3 5; setChannel 4 0
addChangeHandler WiFiState == 2 backlog setChannel 1 6; setChannel 2 0; setChannel 3 2; setChannel 4 0
addChangeHandler WiFiState == 3 backlog setChannel 1 5; setChannel 2 0; setChannel 3 0; setChannel 4 0
addChangeHandler WiFiState == 4 startScript autoexec.bat pingCheck

return

pingCheck:
echo PING CHECKING...
PingHost 192.168.0.1
PingInterval 10
good_to_go
waitFor $noPingTime > 600
restart_device
return

Suggestions / Feature Requests

  1. Reliable WiFi Reconnect Command
    A new command such as wifi_reset that properly shuts down and reinitializes the WiFi subsystem (as if it were freshly booted) would be extremely helpful. restart doesn’t serve this purpose.

  2. WiFi Connection Health Variable
    We need a more truthful state than WiFiState == 4. A $WiFiAlive or $HasIP constant could help scripts verify actual operability (e.g. successful DHCP, ping responsiveness, etc.).

  3. Expose $noPingTime to if conditions, not just waitFor
    Currently, $noPingTime appears to be usable only inside waitFor blocks. This limits scripting flexibility. It would give much more control over wifi state actions if real time if ... then logic could apply to it.

  4. Built-in WiFi Watchdog
    Extend PingHost or create a native WiFiWatchdog command that can:

    • auto-check both WiFiState and ping success
    • auto-retry reconnection routines
    • optionally force reconnection reboot after timeout
  5. Expose Reconnect Failure Count or Attempts
    Exposing how many connection attempts failed in a boot cycle would help scripts decide when to escalate (e.g. trigger external power relay).

  6. Boot process is hindered by race conditions
    Not just WiFi, but various parts of the boot process are affected by race conditions, leading to unpredictable start-up behaviour and scripting failures.
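
To make suggestions 4 and 5 more concrete, here is a minimal Python sketch of the decision logic such a watchdog might follow. It is not OBK code and says nothing about how PingHost works internally; the class name, the thresholds, and the action names (`wait`, `reconnect`, `reboot`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WifiWatchdog:
    """Hypothetical escalation logic: reconnect first, reboot after repeated failures."""

    no_ping_limit: int = 30     # seconds without a ping reply before acting (assumed value)
    max_reconnects: int = 3     # reconnect attempts allowed before a reboot (assumed value)
    failed_reconnects: int = 0  # the "failure count" from suggestion 5

    def decide(self, wifi_state: int, no_ping_time: int) -> str:
        """Return the action for this tick: 'ok', 'wait', 'reconnect' or 'reboot'."""
        if no_ping_time < self.no_ping_limit:
            if wifi_state == 4:
                # Connected and pings are answered recently enough.
                self.failed_reconnects = 0
                return "ok"
            # Not connected yet, but the timeout has not expired.
            return "wait"
        # Timeout expired: count the failure and escalate if needed.
        self.failed_reconnects += 1
        if self.failed_reconnects > self.max_reconnects:
            return "reboot"
        return "reconnect"


watchdog = WifiWatchdog(no_ping_limit=30, max_reconnects=2)
for state, silent_for in [(4, 0), (1, 10), (4, 30), (4, 45), (4, 60)]:
    print(state, silent_for, watchdog.decide(state, silent_for))
```

The caller is expected to restart its own no-ping timer after acting on `reconnect`. The point of the sketch is only that the health check combines the state value with ping results, and that the failure count drives the escalation.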


P.S.
OBK scripting feels extremely unintuitive - full of exceptions, selectively applied conditions, and arbitrary restrictions. Many things that should logically work simply don’t, with no clear reason. This makes the system frustrating and error-prone, especially for a project that aims to promote privacy, decentralization, and empowerment through open technology.

Right now, OBK feels accessible only to a very narrow group of technically obsessive users. Anyone with less experience or patience is likely to give up - pushed away not by the complexity of the hardware, but by the unpredictability of the scripting environment itself.

After spending several days wrestling with this LED controller, I’m convinced that building a system from scratch (e.g. ESP32 with ArduinoIDE or PlatformIO) would produce something far cleaner, more predictable, and more sustainable. It’s not the hardware that’s the barrier - it’s the inconsistent behaviour and lack of scriptability that drains time and motivation.

@openshwprojects
Owner

I can add those things to scripting, but keep in mind that we also have Berry support now: https://www.elektroda.com/rtvforum/topic4117238.html
It's not fully documented yet, and it's in development, but it should help us normalize the OBK scripts.

Apart from that, I'm surprised by what you say about restart. Restart (reboot) is a fresh boot, no variables are kept. Any ways to reproduce the problem? Also pinging @NonPIayerCharacter @divadiow

Maybe could you provide an UART log showing what is printed when you do software reboot and it does not connect to WiFi?

@derei
Author

derei commented May 20, 2025

@openshwprojects
I won’t be providing a UART log. My USB-to-TTL adapter is unreliable - it works for flashing, but I’ve never gotten any readable serial output. Just random garbage characters, especially if I mess with VCC (yes, I did adjust baud rate). So unless I switch hardware, I’m blind on that front.

Now, about restart: yes, the command executes, but every single time I use it, WiFi dies - no exceptions. The device reboots into a state where it never reconnects. Not once has it recovered from a software restart. Only manual power-cycling fixes it, and even that’s not always reliable on the first try.

So while restart works in a mechanical sense, it guarantees a non-functional network state afterward. That makes it useless in scripts meant for recovery.

Also, Berry isn’t a solution here. It’s promising, and I’ve started exploring it, but it’s still experimental, not fully documented, and doesn’t replace OBK core logic. Critical operations like startup, recovery, connection handling still depend on OBK. So unless Berry eventually replaces OBK at the base level, it can’t fix the current reliability problems. At best, it’s something to use after the device is stable.

As long as OBK scripting is what boots the device, restores function after failure, handles offline conditions, and keeps automation, that must be the focus - to make it reliable.

Let me know if you'd like help testing anything without UART. But I’ll be focusing on black-box behavior and scripting edge cases - not low-level output I can’t access for now (yeah, I cheaped out on an AliExpress UART adapter).


P.S.
After digging through Elektroda, I’ve seen I’m not the only one hitting these issues. But the reports are scattered, inconsistent, and easy to dismiss in isolation. Unless someone actively connects the dots, the scope of the problem stays hidden - and perhaps that is why it was never addressed.

@openshwprojects
Owner

How many devices do you have and how many devices have this problem?

I have devices used everyday and they always seem to connect...

What kind of info is missing for berry, what can I add? We can start with most basic things, and I'll add them to berry as requested.

@openshwprojects
Owner

Look, daily usage LED strip:

Image
And here is RGBCW bulb that survived many router restarts, check uptime:

Image
I need help to find the issue because it does not happen on my WiFi...

@derei
Author

derei commented May 20, 2025

@openshwprojects
I'm currently testing with a single device - the RGBW LED strip controller (iH001 with CB3S). I do have other devices (Gosund sockets, lights) that I plan to port eventually, but I’m starting here to ensure I can get at least one unit running reliably before scaling up.

And while this controller may be on the finicky side, the issues I’m facing are not edge cases. Basic WiFi recovery, stable boot, and reliable state detection should be considered core functionality. Right now, they're not dependable.

About Berry
As mentioned earlier, I just started looking into Berry, and it does look promising. But from what I’ve seen so far, it still relies on OBK underneath. Even scripting an autoexec.be appears to route through OBK callbacks. That makes sense, but it means Berry can’t currently act as a full scripting replacement, especially for boot logic and recovery.

My plan was this:

  • Make the device rock-solid and fast at boot (ideally <5s to usable state).
  • Add AP/SafeMode handling before WiFi to avoid lockups.
  • Layer Berry for logic, automation, and Home Assistant integration after the core is reliable.

In practice though, boot times vary wildly - from ~5s to 30s+ - and at least half of the times when I get WiFiState == 4, the device isn’t actually connected.

Your reliable device
You mentioned an RGBCW bulb with 118 days uptime. Can you share the script or config used on that device? Even if it's not public, I’d really appreciate a look - I'd like to adapt it for my controller and at least see whether the hardware is to blame, or the issue is between keyboard and chair (lol).

Core Issues That Still Need Solving

  • Boot race conditions - not just the WiFi problem; the whole process feels inconsistent and non-deterministic.
  • Lack of WiFi control:
    • No way to trigger reconnect manually (connect/reconnect)
    • $noPingTime is only usable in waitFor, not in if ... then
  • No true indicator of real connectivity (beyond misleading WiFiState == 4)
  • Documentation and syntax consistency - OBK scripting often behaves in ways that defy intuition, with scattered exceptions and inconsistent command behaviour. It’s usable, but getting to reliability takes trial-and-error. This isn’t a hard problem to fix, just a tedious one. That’s why I see it as a third priority, behind fixing boot timing and improving WiFi control.

Helping
I'd love to help - I'm genuinely interested in freeing people from any kind of monopoly or centralization. But just to be upfront: my C skills are basically zero.

That said, I can absolutely help in other areas. Whether it's:

  • doing the grunt work of rewriting or restructuring documentation (as long as there's someone to clarify behaviour when needed),
  • running diagnostics or edge-case tests on my device(s),
  • testing debug-ready firmware if you have builds that expose deeper logs or internal state.

@openshwprojects
Owner

You are correct that the scripting system is not mature, and I will do my best to improve it for you, but you are most likely somewhat wrong in regard to the WiFi/restart, or, to be precise, your issue does not seem to be a common one.

My devices are mostly stock and they reboot reliably. I literally flashed hundreds of devices: https://openbekeniot.github.io/webapp/devicesList.html and I don't recognize "reboot not cleaning" problem.
For example, this RGBCW lamp is just 5x PWM lamp with no autoexec.bat.
Here's another plug:
Image
Again, no WiFi problem.
Maybe the issue is somewhere else - for example, can you take a 2MB flash backup of your device and then try doing "Restore RF partition"? But maybe not yet, it could be done as a last resort...

Or maybe you have this flag enabled and it has some instability?

37 | [WiFi] Quick connect to WiFi on reboot (TODO: check if it works for you and report on github

You don't need C knowledge to help, your suggestions are already helpful, we can go through scripting issues together later, but first we need to sort out that WiFi thing because it's certainly not standard. There may be some bug or issue, but it's not by design...

@DeDaMrAzR
Contributor

@derei

First of all - welcome to the world of open source :)

On the connection issues - BK7231 based devices are extremely cheap and are known to be of poor quality, so it may just be the case that you've gotten a bad chip/module in your device. OBK has a signal strength indicator on its landing/home page, what is your signal reading there?

I've got 50+ devices running at any given time, some of them for more than 100 days without any problems as they are reporting to HA for monitoring and testing purposes. Try a different device/module and see if the problem persists.

Scripting is never a beginner-friendly activity in any type of environment, especially in something like OBK, which has a rapid development process and a very high rate of porting to new platforms. We are constantly trying to improve the scripting documentation and examples, and we actively participate in the Elektroda forum community to resolve such issues.

Again, keep in mind that this is an open source project, widespread (many platforms) and with very rapid development, so barging in with a negative attitude will not make things any better. Some of your suggestions are spot-on, we have internally discussed some of them, but this is a small team of enthusiasts trying our best to make something work. Stick around and give it a chance, the team will help you to the best of our ability.

Some of your initial comments are out of place. You confirmed that your programming skills are lacking but still demand immediate changes to something that is not to your liking. You have also jumped way ahead of yourself by trying to script something without understanding it first or knowing how any of the processes work (like reboot or WiFi connect, for example). You are mentioning boot racing conditions being problematic for wifi connectivity - which ones?? What have you noticed/confirmed is in racing condition??
For something to be deterministic you will need to know how it operates in the first place - again you (we all) are dealing with a 2$ WiFi module here, bear that in mind - it is not ideal.

Again, welcome to the project, we will help you out as much as possible, lower your expectations to your skill level and try to be more precise with your problems and we will get involved to solve them if possible.

Thank you.

@derei
Author

derei commented May 20, 2025

@openshwprojects
Before flashing OBK I made a complete backup of stock FW.
I did suspect RF partition issue especially after meddling with the device for a while (flashing various fw variants and not knowing why it wasn't connecting reliably). So, what I did, I reflashed Stock (that rewrote RF), and then flashed 1.18.101_Berry.

Flags
I made sure to not have any unnecessary flags enabled. The only flag I used was 3 (raw controls). So, that one is excluded - but now that you mentioned, I looked over all flags, and when reading about 26 and 31, I realised the problem with not getting any uart logs was that I never used uart2 - I'll need a 2nd usb adapter, so I can monitor both uart ports at the same time, and then perhaps I can find more. Thanks for this unintended trigger.

  • as for flag 37 + 51 - I tried them sometime early-on, but they made things even worse.

Boot speed tests
As you got me intrigued, I deleted autoexec.bat and did some boot tests:

  • Power-Cycling - boots in 45-55s (yeah, I know...), and Wi-Fi mostly succeeds on the 1st attempt.
  • reboot / restart - never succeeds in re-establishing Wi-Fi

Could it be RF Partition? - Definitely a possibility. But given that it's module-locked, all I can do is rely on my initial backup. I doubt I can cannibalise it from a different module.

@NonPIayerCharacter
Contributor

"restart" should really work, unless it isn't executed. A good substitution for it is "deepsleep 1", since it's instant.

WiFiState == 4 is triggered when we receive IP, or when connected to AP if static IP is configured. On BK7231 it's triggered only in one place: https://github.com/openshwprojects/OpenBK7231T_App/blob/master/src/hal/bk7231/hal_wifi_bk7231.c#L294, so it should be very reliable.

As a test I would also suggest rebooting your router. I remember that I once had a problem with a BK7231T device that would never connect on the first try. Router reboot helped with that. (AX3600 Openwrt 24.10).

What happens if you configure ping watchdog in device web page? I use it on several non-beken devices and it works fine for me.
Instead of waitFor noPingTime, just configure 'Take action after this number of seconds with no reply'. It doesn't reset the device, but it forcibly disconnects from AP.

Boot times are strange, UART logs would be really helpful. You can enter command 'logPort 1' to print OpenBeken logs to UART1, but the downside is that only OpenBeken logs are printed, logs from SDK will still print to UART2.

@derei
Author

derei commented May 20, 2025

@DeDaMrAzR

Thank you for the welcome and apologies if I left that impression to you - that wasn't my intention.
And definitely NOT demanding. My intention was to make some observations based on my brief experience (believe me, initial experience is often more valuable than "used to the system" kind of experience - it shows what others got too familiar with to observe anymore).

Regarding Programming Skills - I may not be able to program in C, but I have experience with other programming and scripting languages. I am not new to system-level troubleshooting, scripting logic, hw diagnosis, automation.

“You are mentioning boot racing conditions being problematic for wifi connectivity - which ones?? What have you noticed/confirmed is in racing condition??”

I inferred from device behaviour:

  • Timing from power-on to network readiness varies from ~5s to 50s+, even on the same firmware version and script.
  • WiFiState == 4 is frequently reached when there is no actual connection
  • After issuing restart, WiFi consistently fails to come back. This is 100% reproducible on my iH001 device. Only power-cycling restores it.
  • Occasionally, even a single power cycle isn’t enough. Two or more are needed before connection succeeds.
    If I knew why these symptoms happened, I’d submit a potential solution. Since I don’t, I’m reporting it in the only way I can: as structured, observable failure behaviour.

“Some of your initial comments are out of place...”

I understand where you're coming from, and I made sure to address it first thing in this reply. I care about this project enough to engage with it despite the full stack of frustrations that hit me from day one. I can see the work that has been put into it from a functional standpoint and I understand that refinements may not be at the top of the priority list (yet) - hence I offered to help with making at least some of the documentation user-friendly.

I'm battling my own health issues and limited energy, but despite that I had to speak up, not as criticism to the team, but to show that users who may want to join OBK are often deterred by this kind of hard to pinpoint behaviour of OBK and cheap HW.

@DeDaMrAzR
Contributor

@derei

Just to repeat, welcome 😃 any help is highly appreciated. But we can cross that bridge once we burn it 😄

Would you mind if we delve into some basics first? I will ask a couple of super basic questions to determine what you can do, so we can better help you out.

  • can you reliably flash/interact with OBK module via UART?
  • what is the network type you are trying to hook up to? (N modules support only 2.4GHz WPA-WPA2 networks)
  • if at any point your module is connected successfully, can you share an OBK home page screenshot?

I believe we can move from here and determine what your issue is, ok?

@DeDaMrAzR
Contributor

DeDaMrAzR commented May 21, 2025

Just to share my iH001 device home page, note the uptime and signal strength (WiFI RSSI), granted not on the latest build but it was a work longevity check.

Image

@derei
Author

derei commented May 21, 2025

@NonPIayerCharacter
Good points.

  • router reboot, already done several times. "Power it off and on" is the first thing one learns 😆
  • ping watchdog: I just enabled it. Will let you know if anything relevant changes (or doesn't)
  • WiFiState - from my non-C programmer stance I looked over the source and it looks like a passive check (it gets RW_EVT_STA_GOT_IP from hw driver, I'm assuming, and it considers it a done deal). My first thought would be: HTTP Server is not always ready (why?)

@DeDaMrAzR
Contributor

Just did the OTA on that same module to test the latest build.

Image

@derei
Author

derei commented May 21, 2025

@DeDaMrAzR

  1. Yes. Already done, with no issues. But I am also learning (to pay attention, mostly to the fact that module has a 2nd TX pin)
  2. It’s a standard 2.4GHz WPA2 personal network (Virgin Media router, latest model). No mesh, no guest isolation, no MAC filtering, no dual-band handoff issues. The OBK module is about 2 meters from the router.

Image

Boot Log:

Info:MAIN:Main_Init_Before_Delay
Info:CFG:####### Boot Count 448 #######
Warn:CFG:CFG_InitAndLoad: Correct config has been loaded with 6 changes count.
Error:CMD:no file early.bat err -2
Info:GEN:PIN_SetupPins pins have been set up.
Info:MAIN:Main_Init_Before_Delay done
Info:MAIN:Main_Init_Delay
Info:MAIN:Main_Init_Delay done
Info:MAIN:Main_Init_After_Delay
Info:MAIN:Using SSID ##################
Info:MAIN:Using Pass ##################
Error:HTTP:Created HTTP SV thread with (stack=2048)
Info:MQTT:MQTT_RegisterCallback called for bT obkC9E0CDCC/ subT obkC9E0CDCC/+/set
Info:MQTT:MQTT_RegisterCallback called for bT bekens_n/ subT bekens_n/+/set
Info:MQTT:MQTT_RegisterCallback called for bT cmnd/obkC9E0CDCC/ subT cmnd/obkC9E0CDCC/+
Info:MQTT:MQTT_RegisterCallback called for bT cmnd/bekens_n/ subT cmnd/bekens_n/+
Info:MQTT:MQTT_RegisterCallback called for bT obkC9E0CDCC/ subT obkC9E0CDCC/+/get
Info:CMD:CMD_StartScript: started @startup at the beginning
Info:CMD:LFS_ReadFile: failed to file autoexec.bat
Info:CMD:CMD_StartScript: failed to get file autoexec.bat
Info:BERRY:[berry init]
Info:BERRY:[berry start]
Info:BERRY:[berry end]
Info:BERRY:[berry start]
Info:BERRY:be_pcall fail, retcode 3
Info:BERRY:top=3
Info:BERRY:stack traceback:
Info:BERRY:	
Info:BERRY:string
Info:BERRY::1:
Info:BERRY: in function `
Info:BERRY:main
Info:BERRY:`
Info:BERRY:stack[1] = type='function' ()
Info:BERRY:stack[2] = type='string' (import_error)
Info:BERRY:stack[3] = type='string' (module 'autoexec' not found)
Info:BERRY:[berry end]
Info:MAIN:Main_Init_After_Delay done
Info:MAIN:Time 1, idle 215338/s, free 74808, MQTT 0(0), bWifi 0, secondsWithNoPing -1, socks 2/38 
... 
Info:MAIN:Time 5, idle 186924/s, free 74808, MQTT 0(0), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:MAIN:WiFi SSID: waiting for SSID switch 1/3 (using SSID1)
Info:MAIN:Registered for wifi changes
Info:MAIN:Connecting to SSID ##################
Info:MAIN:Time 6, idle 180850/s, free 69472, MQTT 0(0), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:MAIN:Boot complete time reached (5 seconds)
Info:CFG:####### Set Boot Complete #######
Info:MAIN:Time 7, idle 178620/s, free 69472, MQTT 0(0), bWifi 0, secondsWithNoPing -1, socks 2/38 
...
Info:MAIN:Time 10, idle 0/s, free 69712, MQTT 0(0), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:GEN:dhcp=0 ip=0.0.0.0 gate=0.0.0.0 mask=0.0.0.0 mac=##################
Info:GEN:sta: 0, softap: 0, b/g/n
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTING - 1
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTED - 4
Info:MAIN:Time 11, idle 84565/s, free 58168, MQTT 0(0), bWifi 1, secondsWithNoPing -1, socks 3/38 
Info:MAIN:Time 12, idle 190444/s, free 69904, MQTT 0(0), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:MQTT:mqtt_host empty, not starting mqtt
Info:MAIN:Time 13, idle 186435/s, free 69904, MQTT 0(1), bWifi 1, secondsWithNoPing -1, socks 2/38 
... 
Info:MAIN:Time 20, idle 186771/s, free 69688, MQTT 0(1), bWifi 1, secondsWithNoPing -1, socks 3/38 
Info:GEN:dhcp=0 ip=################## gate=192.168.0.1 mask=################## mac=##################
Info:GEN:sta: 1, softap: 0, b/g/n
Info:GEN:sta:rssi=-47,ssid=##################,bssid=##################,channel=6,cipher_type:CCMP
Info:MAIN:Time 21, idle 186940/s, free 69904, MQTT 0(1), bWifi 1, secondsWithNoPing -1, socks 2/38 
...
Info:MAIN:Time 28, idle 187821/s, free 69904, MQTT 0(1), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:MQTT:mqtt_host empty, not starting mqtt
Info:MAIN:Time 29, idle 185071/s, free 69904, MQTT 0(2), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:MAIN:Time 30, idle 186134/s, free 69688, MQTT 0(2), bWifi 1, secondsWithNoPing -1, socks 3/38 
Info:GEN:dhcp=0 ip=################## gate=192.168.0.1 mask=################## mac=##################
Info:GEN:sta: 1, softap: 0, b/g/n
Info:GEN:sta:rssi=-55,ssid=##################,bssid=##################,channel=6,cipher_type:CCMP
id=##################,channel=6,cipher_type:CCMP
Info:MQTT:mqtt_host empty, not starting mqtt
Info:MAIN:Time 61, idle 182198/s, free 69904, MQTT 0(4), bWifi 1, secondsWithNoPing -1, socks 2/38 
...
Info:MAIN:Time 70, idle 186976/s, free 69904, MQTT 0(4), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:GEN:dhcp=0 ip=################## gate=192.168.0.1 mask=################## mac=##################
Info:GEN:sta: 1, softap: 0, b/g/n
Info:GEN:sta:rssi=-46,ssid=##################,bssid=##################,channel=6,cipher_type:CCMP
Info:MAIN:Time 71, idle 185041/s, free 69904, MQTT 0(4), bWifi 1, secondsWithNoPing 1, socks 2/38 
...
Info:MAIN:Time 74, idle 186351/s, free 69904, MQTT 0(4), bWifi 1, secondsWithNoPing 4, socks 2/38 
Info:MAIN:[Ping watchdog] No ping replies within 5 seconds. Will try to reconnect.
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_AUTH_FAILED - 3
Info:MAIN:Time 75, idle 190229/s, free 74560, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
...
Info:MAIN:Time 80, idle 186921/s, free 74560, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:GEN:dhcp=0 ip=0.0.0.0 gate=0.0.0.0 mask=0.0.0.0 mac=##################
Info:GEN:sta: 0, softap: 0, b/g/n
Info:MAIN:Time 81, idle 187608/s, free 74560, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
... 
Info:MAIN:Time 84, idle 189282/s, free 74560, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:MAIN:WiFi SSID: waiting for SSID switch 1/3 (using SSID1)
Info:MAIN:Registered for wifi changes
Info:MAIN:Connecting to SSID [##################]
Info:MAIN:Time 85, idle 179900/s, free 70600, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
...
Info:MAIN:Time 88, idle 84467/s, free 69440, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:MAIN:Time 89, idle 0/s, free 69440, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:MAIN:Time 90, idle 0/s, free 69440, MQTT 0(4), bWifi 0, secondsWithNoPing -1, socks 2/38 
Info:GEN:dhcp=0 ip=0.0.0.0 gate=0.0.0.0 mask=0.0.0.0 mac=##################
Info:GEN:sta: 0, softap: 0, b/g/n
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTING - 1
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTED - 4
Info:MAIN:Time 91, idle 75972/s, free 69848, MQTT 0(4), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:MAIN:Time 92, idle 185379/s, free 69888, MQTT 0(4), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:MQTT:mqtt_host empty, not starting mqtt
Info:MAIN:Time 93, idle 188709/s, free 61280, MQTT 0(5), bWifi 1, secondsWithNoPing -1, socks 3/38 
...
Info:MAIN:Time 100, idle 184552/s, free 69888, MQTT 0(5), bWifi 1, secondsWithNoPing -1, socks 2/38 
Info:GEN:dhcp=0 ip=################## gate=192.168.0.1 mask=################## mac=##################
Info:GEN:sta: 1, softap: 0, b/g/n
Info:GEN:sta:rssi=-51,ssid=##################,bssid=##################,channel=6,cipher_type:CCMP
Info:MAIN:Time 101, idle 184157/s, free 69888, MQTT 0(5), bWifi 1, secondsWithNoPing -1, socks 2/38 
...
Info:MAIN:Time 108, idle 189629/s, free 69888, MQTT 0(5), bWifi 1, secondsWithNoPing -1, socks 2/38

  • OBK stack initializes fast: by Time 6, all scripts, Berry, MQTT stubs are initialized. (still, I wouldn't mind it faster)
  • WIFI_STA_CONNECTED - 4 isn’t logged until Time 11 ... but ...
  • IP = 0.0.0.0 at Time 10 - 20
  • OBK shows WIFI_STA_CONNECTED - 4 even though:
    • DHCP has not completed
    • Device is not reachable
    • Gateway is 0.0.0.0
      (So the current implementation treats “associated with AP” or "got IP callback" too early)
  • IP assigned at Time 20 - 21, but still not yet reachable via browser
  • ping watchdog triggers at Time 74
  • WIFI_STA_AUTH_FAILED - 3 at Time 75, connection reset
  • Reconnect attempt begins at Time 84
  • Valid IP regained by Time 100
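
For anyone who wants to line these events up without reading the whole log by eye, here is a small Python sketch that tags each WiFi state change and each gateway report with the last `Time N` counter printed before it. The log lines are copied from the boot log above (masked fields left as posted); the function name and regular expressions are my own and only match the line shapes seen in this log.

```python
import re

# Excerpt of the boot log above, masked fields kept exactly as posted.
LOG = """\
Info:MAIN:Time 10, idle 0/s, free 69712, MQTT 0(0), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:GEN:dhcp=0 ip=0.0.0.0 gate=0.0.0.0 mask=0.0.0.0 mac=##################
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTING - 1
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTED - 4
Info:MAIN:Time 11, idle 84565/s, free 58168, MQTT 0(0), bWifi 1, secondsWithNoPing -1, socks 3/38
Info:MAIN:Time 20, idle 186771/s, free 69688, MQTT 0(1), bWifi 1, secondsWithNoPing -1, socks 3/38
Info:GEN:dhcp=0 ip=################## gate=192.168.0.1 mask=################## mac=##################
"""

TIME_RE = re.compile(r"^Info:MAIN:Time (\d+),")
STATE_RE = re.compile(r"Main_OnWiFiStatusChange - (\w+) - (\d+)")
GATE_RE = re.compile(r"^Info:GEN:dhcp=\d+ ip=\S+ gate=(\S+)")


def timeline(log: str) -> list[tuple[int, str]]:
    """Tag each WiFi event with the most recent 'Time N' counter seen before it."""
    events = []
    last_time = 0
    for line in log.splitlines():
        if m := TIME_RE.match(line):
            last_time = int(m.group(1))
        elif m := STATE_RE.search(line):
            events.append((last_time, m.group(1)))
        elif m := GATE_RE.match(line):
            events.append((last_time, f"gateway={m.group(1)}"))
    return events


for seconds, what in timeline(LOG):
    print(seconds, what)
```

Because the tag is the last counter seen, an event tagged 10 happened somewhere between the `Time 10` and `Time 11` lines, and the order within one tag is the order in which the lines were logged.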

@NonPIayerCharacter
Contributor

IP 0.0.0.0 at 10 sec is normal, since you have yet to connect. At 20 sec everything looks ok. It logs every ten seconds, so if you had connected at 19 sec, the log at 20 sec would have shown your ip, gw, etc.
WIFI_STA_CONNECTED event is received the moment DHCP is completed.
WIFI_STA_AUTH_FAILED is after ping watchdog forcibly disconnects, so this is normal behaviour.

Got IP callback was present in earlier versions, but it was fixed. (there were 2 WIFI_STA_CONNECTED events, one for connection, the other for IP).
You can try to configure static IP to see if it fixes anything.
And if you enable flag 37 (without 51), then it will try to connect at boot, not at 5 sec uptime.
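
The ten-second logging cadence mentioned above can be put as a tiny Python sketch. It assumes the status line is printed whenever the uptime counter is a multiple of the period, which matches the log in this thread (10, 20, 30, ...) but is not taken from the firmware source; the function name is hypothetical.

```python
import math


def first_status_log_after(event_time: float, period: int = 10) -> int:
    """Time of the first periodic status line printed at or after event_time."""
    return math.ceil(event_time / period) * period


for connected_at in (10.5, 19, 20):
    print(connected_at, "->", first_status_log_after(connected_at))
```

Under that assumption, a connection made just after the 10 sec status line would not show an IP in the log until the 20 sec line, so the log alone can lag the real state by almost a full period.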

@derei
Author

derei commented May 21, 2025

@NonPIayerCharacter
Just to clarify: my device was never using DHCP. It had a static IP configured from the beginning.
So the explanation about IP showing up late due to DHCP delay doesn’t apply here.

Despite the static config, the IP, gateway, and netmask were all reported as 0.0.0.0 until Time 20, and the device was not reachable via ping or browser before that - so there’s clearly a delay somewhere between internal WiFi stack readiness and full network usability, even in static IP mode.

Also worth noting: WIFI_STA_CONNECTED is still logged at Time 11, but IP = 0.0.0.0 at that point. That suggests the event is still triggered before full IP stack confirmation - even if the device had full static IP configured.

Takeaway
WiFiState == 4 does not currently guarantee that the device is reachable. Static or DHCP, that gap needs to be detectable somehow.
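
Until something like that exists on the device, the gap can at least be measured from the outside. Below is a minimal Python sketch, run from a PC on the same network, that polls a TCP port until it accepts a connection and reports how long that took. The function name and defaults are my own; port 80 is assumed because the device serves its web page over HTTP.

```python
import socket
import time
from typing import Optional


def wait_until_reachable(host: str, port: int = 80, timeout: float = 60.0,
                         interval: float = 0.5) -> Optional[float]:
    """Poll a TCP port until it accepts a connection.

    Returns the seconds elapsed until the first successful connect,
    or None if the port never answered within `timeout`.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            # A successful connect means the IP stack and the server are both up.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable or timed out: wait a little and try again.
            time.sleep(interval)
    return None
```

Calling `wait_until_reachable("192.168.0.50")` right after power-on (the address is a placeholder for the device's static IP) gives a time-to-reachable figure that can be compared against the moment WiFiState == 4 appears in the log.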

@openshwprojects
Owner

@derei , just to be sure, can you please check with no scripts in littlefs? you have some autoexec.bat or be (berry) and maybe, just maybe, it is slow to execute or something and causes instability? Just to be sure. Make a backup first. Remove all scripts, do full power off and on, and check, is WiFi still slow and problematic?

@openshwprojects
Owner

On a side note, @derei , apart from WIFi issues (which are specific to your device mostly, not a generic problem), can you tell me which features, scripts, would you suggest as most important for next Berry tutorial? So I can help you setting up your automations with Berry?

@derei
Author

derei commented May 21, 2025

@openshwprojects
Before the log and screenshot I already did lfs_format, cleanAll, and reapplied wifi and ip settings anew.

So, cleaner than that it's not possible.

@openshwprojects
Owner

Are you sure? What is that then?

Image

@derei
Author

derei commented May 21, 2025

@openshwprojects
That log doesn’t contradict what I said.

I believe the errors simply show it tried to run an autoexec.bat file, but the file wasn’t found.
I'm guessing that should be expected behavior, as lfs was wiped.

I see berry tried to init and threw an import_error, but again: that's because there was nothing to import. This doesn’t mean there's leftover logic - it means Berry is active, but has nothing to execute.

So yes: it was a clean state. No .bat, no .be.
What you see in that log is consistent with an empty file system.

@openshwprojects
Owner

Ah, I see. You may be right here. It's just a message caused by the missing import.

@DeDaMrAzR
Contributor

Talking about the connectivity issue - I've seen a somewhat similar situation where a static IP was in conflict with another device on the network. Can you try switching that device to DHCP and see if that resolves the connectivity issue? There is nothing unusual in the log provided... 🤔

To do that, set all addresses to 0.0.0.0 in the IP settings and reboot.

Also, for completeness, and so we can eliminate more things, try flashing your device with the standard (non-Berry) build and see if the connectivity issue remains. Not that it should matter, but we can use a process of elimination to get to the bottom of the issue.

@derei
Author

derei commented May 21, 2025

@DeDaMrAzR I know all my devices. Plus, if it were an IP conflict, it wouldn't connect at all.
I'll test DHCP, and then I'm thinking of testing the stock fw to see how fast it connects to the app. While we won't have conventional OBK-like logs, we can infer from how quickly the device becomes available in the Gosund app. I'll also try to get some UART logs.
But this is unlikely to happen today or tomorrow (I have some urgent priorities).
Thanks guys for this dedication! 🙌

@DeDaMrAzR
Contributor

Any reports on DHCP test or connectivity improvements?

Did you try to flash it with standard binary (non BERRY)?

Can you try any other similar device?

@openshwprojects
Owner

I have seen this happen with a MAC conflict. In OBK on BK (Beken) devices, you can change the MAC.

@MaxineMuster
Contributor

> Despite the static config, the IP, gateway, and netmask were all reported as 0.0.0.0 until Time 20, and the device was not reachable via ping or browser before that - so there’s clearly a delay somewhere between internal WiFi stack readiness and full network usability, even in static IP mode.
>
> Also worth noting: WIFI_STA_CONNECTED is still logged at Time 11, but IP = 0.0.0.0 at that point. That suggests the event is still triggered before full IP stack confirmation - even if the device had full static IP configured.
>
> Takeaway
> WiFiState == 4 does not currently guarantee that the device is reachable. Static or DHCP, that gap needs to be detectable somehow.

I can't see that in your logs. Network state, including the IP, is displayed every 10 seconds.
At time 10, the first IP (0.0.0.0) is printed; only after(!) this are STA_CONNECTING and STA_CONNECTED logged, so the first time the IP can actually be displayed in the network state output is at time 20, where it is shown.
It might be an idea to log the network state additionally whenever a network-related state change occurs.
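
The timing argument can be sketched in a few lines of Python. This is only an illustration under assumptions taken from the description above: the state dump runs on a fixed 10-second tick, and the connect event lands at time 11. The function name `first_tick_showing_ip` is hypothetical and is not part of OBK.

```python
import math


def first_tick_showing_ip(connect_time: float, log_interval: float = 10.0) -> float:
    """Return the first periodic log tick at or after the connect event.

    Assumes the network state is printed at multiples of `log_interval`
    seconds, so an IP assigned between two ticks only becomes visible
    in the log at the next tick.
    """
    return math.ceil(connect_time / log_interval) * log_interval


# STA_CONNECTED logged at time 11, periodic state dump every 10 seconds
visible_at = first_tick_showing_ip(11)
print(f"IP first visible in the periodic log at time {visible_at:.0f}")
```

With these numbers the IP appears in the periodic log at time 20 even if it was assigned at time 11, so the periodic dump alone cannot show when the IP actually became valid.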

The last point is totally valid, though not necessarily an OBK issue.
First, there can be other issues preventing reachability: with a duplicate IP or MAC, or even a static IP from another network, the device will still show WiFiState 4 and you will not be able to reach it. I don't want to suggest that this is the case here, just to point it out.
That's why something like the ping command exists: to actually test connectivity.
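
As a rough sketch of that idea (in Python rather than OBK script, purely for illustration), a "reachable" verdict could combine the WiFi state with a non-zero IP and an actual ping result. The function name `looks_reachable` and its parameters are hypothetical; how the ping result is obtained on the device is left open here.

```python
import ipaddress


def looks_reachable(wifi_state: int, ip: str, ping_ok: bool) -> bool:
    """Combine three signals instead of trusting the WiFi state alone.

    wifi_state -- value of WiFiState (4 is treated as "connected", as in this thread)
    ip         -- the address the device currently reports
    ping_ok    -- outcome of a real connectivity test, e.g. a ping reply
    """
    has_ip = not ipaddress.ip_address(ip).is_unspecified  # rejects 0.0.0.0
    return wifi_state == 4 and has_ip and ping_ok


# State 4 alone is not enough: IP still 0.0.0.0, or the ping gets no reply
print(looks_reachable(4, "0.0.0.0", True))
print(looks_reachable(4, "192.168.1.50", False))
print(looks_reachable(4, "192.168.1.50", True))
```

The address `192.168.1.50` is a made-up example. Note that a duplicate IP or MAC would still pass the first two checks, which is why the ping result is the deciding signal.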

A last point to accept is that we have only limited options for controlling or inspecting the WiFi driver. We always rely on a library from the SDK to implement the functions properly. We call something like "reset wifi" and cannot tell whether the result is exactly the same as WiFi after a fresh reboot.
Searching for the WiFi SSID, connecting, WPA: all of this is achieved only by calling the manufacturer's library functions. So, there might be some issues in how the OBK code uses the library, but it might also be related to the wifi-library.

@divadiow
Contributor

> but it might also be related to the wifi-library.

worth seeing if the new SDK libs are any different perhaps?

@derei
Author

derei commented Jun 2, 2025

@openshwprojects

So, I did some tests. I reflashed stock fw, collected uart logs, then did the same on a fresh OpenBK7231N_QIO_1.18.109_berry.
After flashing OBK, I only configured WiFi, Static IP, Pins. No autoexec.bat or any other kind of intervention.

Below are extracts from UART logs, for comparison:

Stock FW (configured and used with Gosund app):

| Event | Time (approx) | Log evidence |
|---|---|---|
| Bootloader Start | T+0.0s | V:BK7231N_1.0.1 and J 0x10000 |
| BLE Stack OK | T+1.0s | BLE_STACK_OK, CREATE DB SUCCESS |
| SDK Init Complete | T+1.5s | [mf_init succ] |
| Wi-Fi Connected | T+2.5s | SM_CONNECT_IND_ok, ip_addr: <little-endian> |
| MQTT Cloud Connect Complete | T+3.0s | mqtt connect success |
| BLE Connected + DP exchange | T+3–7s | Ble Connected, ble_send_data_to_app ... |

Total boot to full cloud/BLE readiness: ~3–4 seconds

OpenBK7231N_QIO_1.18.109_berry (DNS 1.1.1.1)

| Event | Time (approx) | Log evidence |
|---|---|---|
| Start of boot / RF init | T+0s | bandgap_calm_in_efuse=0x68 |
| Wi-Fi scan initiated | T+5s | Info:MAIN:Boot complete time reached (5 seconds) |
| First Wi-Fi connect attempt | T+9–11s | SM_CONNECT_IND_fail, WIFI_STA_DISCONNECTED |
| Auth failures + retries | T+11–31s | Repeated WIFI_STA_AUTH_FAILED and scan/connect loops |
| Wi-Fi connected successfully | T+31s | WIFI_STA_CONNECTED, sta_ip_start |
| IP + signal info finalized | T+40s | ip=192.168.xxx.xxx, rssi=-52, full Wi-Fi info dump |

OpenBK7231N_QIO_1.18.109_berry (DNS 192.168.xxx.xxx)

| Event | Time (approx) | Log evidence |
|---|---|---|
| Start of boot / RF init | T+0s | bandgap_calm_in_efuse=0x68 |
| Boot delay messages begin | T+0.1s–T+0.9s | #Startup delayed Xms# (0...90ms) |
| Main_Init_Delay done | T+1s | Main_Init_Delay done |
| Wi-Fi scan initiated | T+5s | Info:MAIN:Boot complete time reached (5 seconds) |
| First Wi-Fi connect attempt | T+9–11s | WIFI_STA_CONNECTING, WIFI_STA_DISCONNECTED, SM_CONNECT_IND_fail |
| Auth failures + scan retries | T+11–31s | WIFI_STA_AUTH_FAILED, multiple supplicant_main_exiting loops |
| Wi-Fi connected successfully | T+31s | WIFI_STA_CONNECTED, sta_ip_start |
| Static IP configured | T+32s | configuring interface mlan (with Static IP) |
| IP + signal info finalized | T+40s | ip=192.168.xxx.xxx, rssi=-56, full Wi-Fi info dump |

The ~30+ second delay before Wi-Fi connection in OBK is not caused by MQTT config (I'm running the device alone, no HA yet) or observable DNS resolution behavior. While I tested both Cloudflare DNS (1.1.1.1) and local router DNS (192.168.xxx.xxx), the logs show no evidence of DNS queries.

Instead, the delay stems from repeated Wi-Fi authentication failures (WIFI_STA_AUTH_FAILED) and supplicant_main_exiting loops, suggesting OBK’s supplicant retries handshake multiple times before succeeding. The same device connects immediately under stock fw, which implies OBK’s Wi-Fi stack may be less tolerant of strict or timing-sensitive routers. Worth investigating WPA2 handshake compatibility and retry logic.
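
To put rough numbers on the comparison, here is a small Python sketch that uses only the approximate timings from the tables above (stock Wi-Fi connected at T+2.5s, OBK at T+31s, OBK retry window T+11s to T+31s). The helper name `compare_connect_times` is made up for this illustration.

```python
from typing import Tuple


def compare_connect_times(stock_s: float, obk_s: float) -> Tuple[float, float]:
    """Return (extra seconds, slowdown factor) of OBK relative to stock fw."""
    return obk_s - stock_s, obk_s / stock_s


# Approximate values read from the UART log tables (seconds after boot)
extra, factor = compare_connect_times(stock_s=2.5, obk_s=31.0)
retry_window = 31.0 - 11.0  # span covered by auth failures and retries

print(f"OBK connects {extra:.1f}s later than stock ({factor:.1f}x slower)")
print(f"{retry_window:.0f}s of the OBK boot is spent in the retry loop")
```

These are approximate, hand-extracted timestamps, so the result is an order-of-magnitude comparison rather than a measurement.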

@MaxineMuster
Contributor

Wasn't there a "quick connect" for Beken devices?
Flag 37 iirc. Did you try that?

@derei
Author

derei commented Jun 2, 2025

@MaxineMuster I tried it before and it caused connection issues.
But thanks for reminding me of it (37+51) - I just tried it and it works now.

Possibly there were some artifacts remaining from previous flashing, and now they're gone - I did an Erase All at 9600 baud before flashing OBK, which may have helped.

Regardless, the fact that the connection keeps dropping and retrying during boot (if 37+51 are not enabled) still points to an unoptimised network protocol. It's worth looking into - something is definitely not working as it should, and covering it up is not the way.
