Yeelight color bulbs offline

OK, so this explains the behavior I have been following over at the Home Assistant forums, which was originally thought to be caused by Music Mode being on. I turned it off, but my lights still become unreachable from time to time. Following.

Hi,
I read your PM and noticed that several devices show very poor signal strength (<= -70 dBm). Please check whether they are the devices that go offline frequently.

We have observed similar issues where low RSSI causes devices to go offline and stay offline. This can be caused by a long distance from the router, surroundings that weaken the signal, or sometimes a hardware defect.

If that’s the case, please move the devices nearer to the router and check the RSSI again. If the RSSI stays low, the hardware is probably defective. If the RSSI looks good in the new location, check whether the offline issue is gone.

Overheating can also drive a device offline. This only occurs on devices that have faulty temperature sensors installed, and only while they are turned on. In this case, turn the light off and see whether the issue goes away.

@liufei
I don’t think that is a fair representation of the results I sent you.

Of the results I sent:

5 were -40 to -49 dBm (classed as excellent)
9 were -50 to -59 dBm (classed as good)
12 were -60 to -69 dBm (classed as fair)
3 were -70 dBm or weaker (classed as weak)
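To make the banding above concrete, here is a minimal Python sketch that sorts RSSI readings into the same four bands. The exact boundaries (excellent above -50 dBm, good down to -59, fair down to -69, weak at -70 and below) are an assumption read off the list, and `classify_rssi` and `tally` are hypothetical helper names, not part of any Yeelight or router tool.

```python
from collections import Counter
from typing import Iterable


def classify_rssi(dbm: int) -> str:
    """Map an RSSI reading in dBm to the bands used in the list above.

    Assumed boundaries: > -50 excellent, -50..-59 good, -60..-69 fair,
    -70 or below weak.
    """
    if dbm > -50:
        return "excellent"
    if dbm > -60:
        return "good"
    if dbm > -70:
        return "fair"
    return "weak"


def tally(readings: Iterable[int]) -> Counter:
    """Count how many readings fall into each band."""
    return Counter(classify_rssi(r) for r in readings)
```

For example, `tally([-45, -52, -63, -71])` yields one reading in each band, which is the kind of summary quoted in the list.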

The reason for this is that I previously had 2 routers serving the house and turned one off to debug this problem, which you keep blaming on your customers’ routers.

What I will do now is turn the second one back on, which will push up the signal strength in the other half of the house. I will send you the updated screenshots, and then I expect you either to investigate this problem once and for all or to refund my money for this defective product.

I am 100% sure that RSSI is not the issue here. I am sure of that because the devices which have failed in the last 2 weeks are nowhere near the lowest-RSSI devices I showed you; those are on the other side of the house.

Also with your comment here:

Overheating issue can also drive the device offline, this only occurs on devices that have bad temperature sensors installed, and when they are turned on. In this case, turn the light off and see if the issue does not stay.

I can’t do anything once the wifi connection is lost; I have no control over the bulb. Of course, power-cycling it fixes the problem every single time, but that doesn’t mean it overheated and needed to cool down first: I don’t let it sit before I power it back on. I just know by now that there’s a fault in your product which causes it to disconnect from the wifi after a couple of weeks of continuous operation, and that if I power it off and straight back on again, it will reconnect.

I also get the best experience with your product after a power outage, since it powers off all of the devices at the same time, so I know I won’t have to reset a light for another 2-3 weeks. If there’s no power outage, I know it will be one light every few days. So here is a review of your product: “works best after a power outage, before it goes back to being buggy, with no fix after almost 2 years”.

@liufei I also see nothing addressing the router logs that I showed you above. I will reset that device now and give you its particular RSSI value, and then you can explain the log entries I shared above which show the bulb disconnecting.

You also haven’t shared an email address with me so I can send you my entire unredacted and unfiltered set of router logs for the last 8 days to help in your analysis.

Edit: Sent you a PM with a screenshot. The RSSI value of the bulb after the reboot, and before I turn the second router back on, is -52 dBm. This is considered a good value (https://www.netspotapp.com/what-is-rssi-level.html), only 2 dB away from excellent.

Why did this bulb just stop responding the day before yesterday and why did I need to turn it back on again today?

Finally, why are these other devices not affected?

I have never had to reset any of these devices. Even if the RSSI level is low for a couple of the bulbs you saw, that does not explain the following:

  • Why a low-data-rate device like a Yeelight would even care
  • Why a lossy wifi signal would cause a bulb to disconnect from the network and never reconnect until it is powered off and on
  • Why bulbs with perfectly good RSSI values are affected
  • Why a reconnect interval can’t simply be coded into your bulb firmware so that it at least tries to renegotiate with the router once it loses the connection
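The reconnect question can be illustrated with a minimal Python sketch of a retry loop with exponential backoff. It assumes a hypothetical `try_associate` callback that attempts to rejoin the access point and returns `True` on success; real bulb firmware would run on the Wi-Fi module's own SDK, so this only shows the shape of the logic, not Yeelight's or Marvell's code.

```python
import time
from typing import Callable, Optional


def reconnect_with_backoff(
    try_associate: Callable[[], bool],
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Keep trying to re-associate, doubling the wait after each failure.

    try_associate is a hypothetical hook; max_attempts=None retries forever.
    Returns True once association succeeds, False if attempts run out.
    """
    delay = base_delay
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        if try_associate():
            return True
        sleep(delay)                          # wait before the next attempt
        delay = min(delay * 2, max_delay)     # exponential backoff, capped
    return False
```

Injecting `sleep` keeps the loop testable: passing `delays.append` records the waits instead of blocking, so three failures followed by a success produce waits of 1, 2 and 4 seconds.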

Can you explain the above for me?

@ngardiner I googled the key phrase ‘deauthentication due to local deauth request’ and found the link below:
https://www.raspberrypi.org/forums/viewtopic.php?t=191287
which says “This problem is related the config option “wpa_group_rekey”, because reducing the value to 600 (10 minutes) make the issue appear after 1 hour. Starting hostapd directly shows the following not really helpful”.
This seems to match the log you captured on the router, because it happens when the fourth ‘group key handshake’ takes place.
Please, if possible, check whether this setting can be increased on your router, to see if that fixes the issue. We will do some local testing of this setting to find out more.

This is not true. There have been hundreds of group key handshake events; I just showed a subset from a single log file, and the logs rotate daily. I have now offered twice to send them to you for analysis if you just give me a way to do so privately.

I have changed this parameter as you asked, and I will see if it has any effect. Either way, however, I still feel I am making an AP-side change to work around a device-side bug, and you have no documentation explaining why this is or what others should set on their side to avoid triggering the same bug.

Hi,

Thanks for helping to test this parameter. We have modified this setting on our router and set up a test bed to collect logs on the device side, hoping to capture the offline event. The first step in narrowing down the root cause of such an issue is to find the pattern and reproduce it. The next step is to fix it on the device side.

If the setting turns out to have nothing to do with this issue, we will try purchasing a C7 router.

My email address is liufei@yeelight.com. Sorry for missing your request twice.

@liufei

I have disabled this feature (group key handshake) and I have some preliminary findings to share.

First, I would like to ask you 2 questions:

  1. Are any of the devices in your test lab using hostapd for the AP functionality? (I would think this is quite common)
  2. If so, can you please check the hostapd.conf file and confirm the setting of the following variable, which I have changed on my routers:

wpa_group_rekey=0

Please note the following description from the hostapd.conf documentation:

Time interval for rekeying GTK (broadcast/multicast encryption keys) in
seconds. (dot11RSNAConfigGroupRekeyTime)
This defaults to 86400 seconds (once per day) when using CCMP/GCMP as the
group cipher and 600 seconds (once per 10 minutes) when using TKIP as the
group cipher.
wpa_group_rekey=86400

Note that even with this turned off, rekey events still occur when a device leaves the wifi network:

Rekey GTK when any STA that possesses the current GTK is leaving the BSS.
(dot11RSNAConfigGroupRekeyStrict)
wpa_strict_rekey=1

So this setting only affects scheduled rekeying, not the strict rekeying that happens when a device leaves the network.
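To check which scheduled-rekey behaviour a given hostapd.conf ends up with, here is a minimal Python sketch that applies the defaults quoted above. It assumes the group cipher is known and passed in by the caller (hostapd itself derives it from its cipher settings), treats `wpa_group_rekey=0` as disabling scheduled rekeying, and deliberately ignores `wpa_strict_rekey`, which triggers rekeys on departures regardless. `parse_conf` and `scheduled_rekey_interval` are hypothetical helper names.

```python
from typing import Dict, Optional

# Defaults as quoted from the hostapd.conf documentation above.
DEFAULT_GROUP_REKEY = {"CCMP": 86400, "GCMP": 86400, "TKIP": 600}


def parse_conf(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def scheduled_rekey_interval(settings: Dict[str, str], group_cipher: str) -> Optional[int]:
    """Return the scheduled GTK rekey interval in seconds, or None if disabled.

    group_cipher is supplied by the caller (assumption); an explicit
    wpa_group_rekey overrides the cipher-dependent default.
    """
    if "wpa_group_rekey" in settings:
        seconds = int(settings["wpa_group_rekey"])
    else:
        seconds = DEFAULT_GROUP_REKEY[group_cipher]
    return seconds or None  # 0 means no scheduled rekeying
```

With the configuration described in this post (`wpa_group_rekey=0`), the sketch reports no scheduled interval, while an untouched CCMP network falls back to the daily default.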

My findings so far:

  • It is not possible to tell in such a short timeframe whether the problem is solved; however,
  • My Yeelights are noticeably more responsive now: in the past they required several events to turn them on, but now they (seem to?) come on immediately

That said, I want to make it clear that:

  • I don’t consider this fixed: I have not confirmed that it will not happen again, and it could simply be a placebo effect
  • I still don’t understand why I need to change a default hostapd security setting to stop a single device from going offline and never reconnecting. As I have said ad nauseam, no other device on my network suffers from this issue
  • I will correct any attempt to call this a router-side or customer-side bug until it is proven to be so
  • I still expect Yeelight to fix this, or to come up with appropriate guidance for other customers who cannot tune this setting on their side (or for the case where this setting turns out to be a significant security risk, which I am not fully sure of yet), as hostapd is a widely used platform

Just some other food for thought. Searching on this topic turns up a few very similar stories:

https://forums.whirlpool.net.au/archive/1089217

I think a rekey value of zero actually prevents the AP from changing keys at all. This is bad because it reduces the security. It would prevent drop outs from wireless cards that aren’t properly implementing TKIP / AES though…

Also, one of the simplest questions:

  • Group key is used for broadcast and multicast only. Unicast traffic is not affected. Why is this key causing a full loss of wifi connectivity?
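The distinction behind that question can be sketched in Python: in WPA2, frames the AP sends to a group address (broadcast or multicast, identified by the least significant bit of the first octet being set) are protected with the group temporal key (GTK), while individually addressed frames use the station's pairwise key (PTK). This is a simplified model of key selection only, with hypothetical helper names; it does not explain why a failed group key update would take a bulb fully offline, which is exactly the open question.

```python
def is_group_address(mac: str) -> bool:
    """True if the MAC is group-addressed (broadcast/multicast): I/G bit set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def key_for_frame(dest_mac: str) -> str:
    """Which key class protects a frame the AP sends to dest_mac (simplified)."""
    return "GTK" if is_group_address(dest_mac) else "PTK"
```

For example, `key_for_frame("ff:ff:ff:ff:ff:ff")` returns `"GTK"`, while a unicast address such as `"02:00:00:00:00:01"` maps to `"PTK"`.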

Same problem: the bulb is offline or missing from the list.

Please fix the problem.

Edit: Yeelink-Light-color2_miapbf69

Six hours ago I had no problems, but now I can’t control it.

  1. Resetting the device (Success)
  2. Connecting to Wi-Fi / the Internet (Success)
  3. Connection steps (Success)
  4. Connection complete (Success)
  5. Connected, but the bulb does not appear in the control list (Failed)

@liufei

Although I have noticed faster response times from my Yeelights after making the change (though again, as I mentioned, that may just be because I am paying more attention now), today another light went offline:


In the logs I see:

Dec 14 17:59:27 xxxx hostapd: wlan1-1: AP-STA-DISCONNECTED xx:xx:xx:xx:xx:9d
Dec 14 17:59:27 xxxx hostapd: wlan1-1: STA xx:xx:xx:xx:xx:9d IEEE 802.11: disassociated due to inactivity
Dec 14 17:59:28 xxxx hostapd: wlan1-1: STA xx:xx:xx:xx:xx:9d IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
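When correlating entries like these with a particular bulb across days of rotated logs, a small parser helps. The Python sketch below matches only the three line shapes quoted above (hostname and MAC left redacted exactly as shown), groups events per station, and lets you measure the gap between them. The regular expression is an assumption built from these samples, not a specification of hostapd's log format, and because syslog omits the year a placeholder year is used.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Built from the sample lines above; the hostname is matched as \S+.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ hostapd: (?P<iface>[^:]+): "
    r"(?:AP-STA-DISCONNECTED (?P<mac_a>\S+)"
    r"|STA (?P<mac_b>\S+) IEEE 802\.11: (?P<msg>.+))$"
)

Event = Tuple[datetime, str, str]  # (timestamp, station MAC, event text)


def parse_events(lines: Iterable[str], year: int = 2000) -> List[Event]:
    """Extract station events; lines that don't match are ignored."""
    events: List[Event] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # syslog has no year, so prepend a placeholder to parse the timestamp
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        mac = m["mac_a"] or m["mac_b"]
        event = "AP-STA-DISCONNECTED" if m["mac_a"] else m["msg"]
        events.append((ts, mac, event))
    return events


def events_by_station(events: Iterable[Event]) -> Dict[str, List[Tuple[datetime, str]]]:
    """Group events by station MAC, preserving log order."""
    grouped: Dict[str, List[Tuple[datetime, str]]] = {}
    for ts, mac, event in events:
        grouped.setdefault(mac, []).append((ts, event))
    return grouped
```

Applied to the three lines above, this yields one station with three events spanning one second, from the disconnect to the deauthentication.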

I believe I have read about this message as well, and about yet another option I will need to set to work around this problem, namely:

disassoc_low_ack=0

Again, this seems to be a workaround for device issues but I will try it and report back.

Edit: Just to confirm, the setting has been applied to both routers, and I had to power the bulb off and back on again to get it back on the network. We will see whether any other bulbs go offline.

@liufei

Just an update on what I see as a result of the change I made above. All other devices appear to work as normal and so far no bulbs have gone offline (however, this tends to happen every few days, so at this point it is still inconclusive), but I would like to add one observation:

All of the responsiveness improvement I mentioned in previous posts after changing the first setting is now gone after making the second change. I can assure you both settings are active at the same time on my router; the second setting has not cancelled out the first. I am not sure exactly what the implications of these settings are for the bulb response times, but they are back to how they were originally, which I am fine with since that’s how they have been for their entire lifetime. I’m just trying to gather some data points here.

I am again seeing the behaviour I always have: if I walk into a room with motion sensors (I use Xiaomi Aqara motion sensors) and a group of lights, only some of the lights may illuminate immediately, while others may take another second to light up. I don’t personally care about this and have no reason to pursue it (I see no timeouts in the logs and no warnings in Home Assistant or the app; I think it’s just the way it is with a lot of bulbs on a wifi network). It’s the bulbs going offline that I need solved, as I plan to remove all the light switches and replace them with touchscreens, and if I do so, I won’t have a way to reset the lights if they go offline.

Hi @ngardiner, thanks for the information. This will help a lot in reproducing and analyzing this issue.
We have referred the issues reported in this topic to Marvell, the MCU provider for this product, and they asked for ‘the sniffer log in 802.11 plus radiotap header as below shown’. I wonder if you would be willing to help collect the data for them.

Hi @liufei,

Sure, I will run this capture. Would you just like a general capture or should I leave it running until the bulb that I am capturing goes offline?

Hi @liufei

Can you please ask Marvell whether they want a capture from a working or a non-working device, and how long I should capture for?

Also, what shall I do with the resulting file? Email to you? Do you have a size limit for emails?

Edit 22/12 - Any updates? It’s been a few days now.

Hi @liufei,

Just an update. I have been running this configuration for a week now and no Yeelights have gone offline, which I am happy with. I have, however, found that the proposed settings affected a different device (my Fronius solar power inverter), causing it to drop off the network.

I have split the devices over different SSIDs so I can set these settings for the Yeelights only to avoid impact to my other devices.
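The SSID split amounts to shared defaults plus overrides that apply to one BSS only. Here is a minimal Python sketch of that idea, assuming hypothetical SSID names and a plain key=value representation; how a particular router firmware actually stores per-SSID options is not modelled here.

```python
from typing import Dict, List

# Hypothetical SSID name; the overrides are the two workaround settings
# discussed earlier in this thread.
OVERRIDES_BY_SSID: Dict[str, Dict[str, str]] = {
    "iot-yeelight": {"wpa_group_rekey": "0", "disassoc_low_ack": "0"},
}


def bss_settings(base: Dict[str, str], ssid: str) -> Dict[str, str]:
    """Merge per-SSID overrides on top of shared defaults without mutating them."""
    merged = dict(base)
    merged.update(OVERRIDES_BY_SSID.get(ssid, {}))
    return merged


def render(settings: Dict[str, str]) -> List[str]:
    """Render settings as sorted hostapd-style key=value lines."""
    return [f"{key}={value}" for key, value in sorted(settings.items())]
```

Any SSID without an entry, such as the one the inverter now uses, simply keeps the shared defaults.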

So I now have a configuration just for one type of device, which has to be applied to a specific SSID to keep the Yeelights working. That is not a solution, but it does let me work for now. I will continue monitoring it.

Note that I have never had to change the default settings for a device before, and I am still not 100% sure what these changes mean. This is not a case of one setting affecting one device and another setting affecting a different device: I have only had to change these settings because the Yeelights go offline often, and in the process of changing them I have knocked my inverter (and all of my power monitoring) offline. It’s the first time it has been offline since installation 1 year ago, and I’m not really pleased, but that is what happens when I mess around with settings on the live network to test what works with the Yeelights. Given all the effort and troubleshooting I’ve put in, I expect that you have at least tested and made some progress on replicating my problem on your side? You know which software (hostapd) and which settings caused it, so it’s a 1 or 2 hour job at most to set that up in a lab. Let me know when you have hostapd running in your lab with a wifi device (any device; I doubt it matters) and let’s see what you see.

Hi @ngardiner,
Marvell needs to capture the data exchange at the moment the issue happens, but I guess you will need to run the capture for hours before the issue reproduces.
I’ve set up a router with similar settings but have not observed the issue yet. I wonder if other factors are playing a role in this case. I’ll keep it running, and I have asked Marvell to set up this test bed as well.
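Running a capture for hours until the issue reproduces is usually done with a rolling buffer, so that only the segments around the failure are kept; tcpdump, for instance, can rotate a fixed number of capture files with its `-C` and `-W` options. The Python sketch below shows only the retention logic: `segments` stands for rotated capture file names, and `is_trigger` is a hypothetical check (for example, against the router log for the bulb's MAC address) that reports whether the offline event fell inside a segment.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional


def capture_until_trigger(
    segments: Iterable[str],
    is_trigger: Callable[[str], bool],
    keep_last: int = 6,
) -> Optional[List[str]]:
    """Retain only the newest keep_last segments; return them (oldest first)
    as soon as a segment triggers, or None if the trigger never fires."""
    window: Deque[str] = deque(maxlen=keep_last)
    for segment in segments:
        window.append(segment)  # the deque drops the oldest segment automatically
        if is_trigger(segment):
            return list(window)
    return None
```

In a real setup the segments that fall out of the window would also be deleted from disk, which keeps a multi-day capture within a fixed storage budget.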

Hi @liufei,

I will run an ongoing capture for a single bulb and wait for an issue with that bulb. It may take some time. In the meantime, can you give a bit more detail on your testbed? I just want to make sure it is similar to my environment.

Did you see my earlier note? The two settings I had to set for Yeelight stability caused my Fronius inverter to go offline, and I had to move everything else off that SSID. I lost a week’s worth of inverter monitoring after making the change:

So, even if this does fix my Yeelight issue I need to flag that it may be incompatible with other devices which worked fine until I made this change.

I have exactly the same problem with an ASUS router and a UniFi AP, but only with the LED bulbs (the LED ceiling lights are fine).