Another connection issue thread with lots of research and will to help and test!

Thanks, I do not know how to use it right now, but will consider it for next steps.

Thanks for this, btw. I think commenting out the throwing of the exception is not helpful, because that’s just ignoring the issue :smiley: but force-closing the socket may fix it. It’s not nice, but what can you do if the Yeelights just can’t handle longer-running socket connections? I’ll give it a try!

That’s why I said it’s a quick and sloppy solution.
But for me it works for using all the basic functionalities.
Power, Colour, Brightness, Temperature, Flows.
In my usage scenarios I don’t need to keep sockets open.

Thank you for leaving with your useless information Dalanim

Yeah, same. Using a different / own fork of the yeelight py lib is not so trivial with hassio. I had a quick look, but didn’t yet find an easy way. Maybe with a custom component that ships directly with the py lib… Will give it a try in the next few days.

However @Yeelight: This still doesn’t solve the initial issue and I would still appreciate any feedback to my initial post! Thanks!

Hello,
I actually came to discover something interesting.

Telnet works perfectly fine with commands which do an action. Like Power ON and Power OFF.
But when trying {"id": 100000000, "method": "get_prop", "params": ["power", "bright"]}
it will just not answer.
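For anyone who wants to repeat this without telnet, here is a minimal Python sketch of the same request. It assumes the bulb listens on TCP port 55443 and expects each command as one JSON line ending in CRLF, as described in Yeelight's LAN control documentation; `build_command` and `send_get_prop` are hypothetical helper names, not part of any library.

```python
import json
import socket

# Assumption: LAN control port from Yeelight's inter-operation documentation.
YEELIGHT_PORT = 55443


def build_command(cmd_id, method, params):
    """Serialize one command as a single JSON line terminated by CRLF."""
    payload = {"id": cmd_id, "method": method, "params": params}
    return (json.dumps(payload) + "\r\n").encode("utf-8")


def send_get_prop(host, props, timeout=5.0):
    """Send get_prop and return the raw reply, or None if the bulb stays silent."""
    with socket.create_connection((host, YEELIGHT_PORT), timeout=timeout) as sock:
        sock.sendall(build_command(1, "get_prop", props))
        try:
            return sock.recv(4096)
        except socket.timeout:
            # This is the symptom described above: the command is accepted
            # but no answer ever arrives.
            return None
```

Calling `send_get_prop("<bulb ip>", ["power", "bright"])` on an affected bulb should return `None` after the timeout, while a healthy bulb returns a JSON line.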

Yeelight Team, any ideas?

best regards,
Thomas

This is interesting, though we failed to reproduce it. How does this issue happen? Does it happen the moment you telnet to it or after some exchange of commands and responses? When the device failed to respond to ‘get_prop’, did ‘set_prop’ have any effect on the device, and did it send back any response message?

Hi, Andy. Sorry for all the troubles and pains you’ve gone through. We are working to understand and find a solution for the issues you mentioned.

It seems the key issue here is that the bulbs go offline much more easily when internet access is cut off. Although the main intention of LAN mode was not to let the device run cloud-less, but to allow 3rd party integrations, the bulbs should not behave like that (go offline).
We will try to reproduce this in our lab and find a solution. Be back soon.

As for the HA issue: we are not aware of how the HA plugin works, as we did not develop it. Something that may ring a bell is that, as a method of traffic control, the bulbs only allow 60 requests per minute. Does HA flood the bulb with a bunch of requests by any chance before it complains about connection loss? Or does HA have any tuning of the TCP layer, like changing the keepalive parameters? Does HA read from the socket from time to time even when it’s not sending any requests? If it does not, that could cause congestion in the bulbs’ sending queue, which in time affects all traffic when the bulbs’ (very limited) memory runs out.
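To make the 60 requests per minute limit concrete, here is a small client-side sketch in Python that a 3rd party integration could use to stay under it. How the bulb itself counts requests is not stated here, so the sliding window is an assumption, and `RequestBudget` is a hypothetical name.

```python
import time
from collections import deque


class RequestBudget:
    """Client-side guard that allows at most max_requests per sliding window."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the logic can be tested without sleeping
        self._sent = deque()     # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record the request if the budget allows it, else False."""
        now = self._clock()
        # Forget requests that have fallen out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True
```

A caller would check `budget.try_acquire()` before each command and delay or drop the command when it returns `False`.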

Your Wi-Fi setup is of no significance to this issue, I do agree :slight_smile:

Hello,

let me answer step by step.
First of all I want to say that I’m really happy with those bulbs. I like how the light looks, the connectivity, the many commands we can use, and the built-in Wi-Fi. These lights are far better than many of their competitors.

How does this issue happen?
Reproduction is fairly easy.

  1. Open up 4 terminals to the same bulb and test the get_prop command on each terminal. Works fine.
  2. Now open a 5th connection, which gets refused as per documentation. Close that terminal.
  3. Close all other 4 terminals.
  4. Open up a new terminal and get_prop will not work anymore. All other commands work fine.

So I think the connection limit function within the firmware is somehow making the response for get_prop return either an empty string or none/null.

SET_PROP gets an answer, as do all other commands.
set_prop returns {"id":100000000, "error":{"code":-1, "message":"method not supported"}}

This is also what makes bulbs become unavailable in Homeassistant, thus making all people cry on the forum.
Homeassistant executes this command (get_prop) and waits for the response in order to show lights as available.
I was able to work my way around this by changing the part with get_prop to use SSDP discovery to get the properties, which works fine.

I’m not writing on this forum because I can’t get past the issues, but because there are many people who don’t have either the technical knowledge or the time to get past this issue.
If you are able to change this part in the firmware, people will be very, very happy and you won’t have the forum clogged with topics regarding Homeassistant.

best regards,
Thomas

Hello,

the rate is not what is affecting Homeassistant.
Homeassistant opens a socket and listens to it. It does not ask from time to time for changes in the bulb, it just receives them. That means it sends just 1 single command at startup/initial connection.
Every time it changes the light it sends 1 command, not multiple.

So there is no way it sends more than 60 commands per minute, except when somebody is pressing the buttons like crazy.

There are also no changes in keepalive parameters which could flood the connection.

best regards,
Thomas Barbut

There is another thing which might help to narrow down the issue.

So for example if you send
{"id": 1, "method": "set_power", "params":["on", "smooth", 500]}
the bulb will send {"method":"props","params":{"power":"on"}} to all open sockets.

If you send the same command a second time, there will be no notification, as the lights are already on, which is completely fine.

If you then switch the power off, {"method":"props","params":{"power":"off"}} is sent to all sockets again.

Maybe get_prop behaves like the other commands and somehow has some variable (state) saved in the firmware which gets stuck.
get_prop shouldn’t work on the same principle as the set commands; it should always answer without conditions. Maybe there’s a condition.

Hi, Thomas,

I tried your procedure but no luck, it did not reproduce. The commands work just fine.

I’m now curious about how 3rd parties process the messages received from the bulbs.
There are two types of messages (in JSON format)

  1. Response to requests. The bulbs send one response to each line of input from the clients
  2. Notifications of state change. Whenever the state on the bulbs changes, the bulbs broadcast notifications to all connected clients.

I guess HA waits on the response from the bulbs to make sure they are ‘present’ or ‘available’. But please note that, after sending out the request, it should not expect the next message received to be the response; instead, it may need to go through several notifications until it meets the response message.
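The point about skipping notifications can be sketched in Python as follows. This is only an illustration, not how HA or any library actually does it: it assumes responses carry the request's `id` (as the error response quoted earlier in this thread does) and notifications carry `"method": "props"`; the `result` field in the usage example and the name `find_response` are assumptions.

```python
import json


def find_response(lines, request_id):
    """Scan incoming JSON lines for the response matching request_id.

    Returns (response, notifications): the matching message (or None if the
    input ends first) and the params of every "props" notification seen on the way.
    """
    notifications = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        message = json.loads(raw)
        if message.get("id") == request_id:
            return message, notifications
        if message.get("method") == "props":
            # A state-change broadcast, not an answer to our request.
            notifications.append(message["params"])
    return None, notifications
```

For example, feeding it a notification line followed by `{"id":1,"result":["on","100"]}` returns that second message as the response and the first one's params as a collected notification.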

This is just a wild guess. If anyone knows how HA interworks with the bulbs, it really would help a lot.

br,
Fei Liu

Hello,

There is no way the procedure does not work. I tested it with a total of 5 bulbs.
Maybe you are on a different firmware version?
I’m on 2.0.6_0030 connected to Frankfurt.

Here is a recording of the procedure.
https://www.youtube.com/watch?v=AFaTAUqau4k&feature=youtu.be
Make sure to open 5 simultaneous connections after you power on the bulb.
You need to hit the connection limit in order to trigger the behaviour. So 5.

Make sure to use GET_PROP command. Other commands work fine.

best regards,
Thomas

Hi, Thomas,

Very nice video! The key difference is that in my test, the telnet sessions were closed ‘gracefully’ by the client software, but in your video, the putty windows were closed ‘abruptly’. Anyway, the issue can be reproduced in the lab now. A patch will be ready soon. Please hold on.

br,
Fei Liu

Hello,

awesome.

I didn’t answer you on the inner workings of Homeassistant because I wanted the issue with the firmware itself to get addressed first.

So in Homeassistant there are 2 ways in which it gets data from the bulb.

  1. It sends a request with an ID and waits for an answer; there is a 5 second timeout.
  2. It gets notifications of state change from the open socket and processes them.

The first thing Homeassistant does when it starts up is the following. As long as availablevariable is false it will retry the entire procedure.

I’ll try to write it in pseudocode.

availablevariable = FALSE (bulb will show unavailable)
IF Socket OPEN THEN
{
     IF send_command('get_prop') = TRUE THEN
     {
          IF get_response = TRUE within 5 seconds THEN
          {
               availablevariable = TRUE
          }
          ELSE
          {
               close the socket
          }
     }
     ELSE
     {
          close the socket
     }
}

So this means that if the socket is opened it tries to send the command, and if the command is sent successfully it checks for an answer. If either send_command or get_response is not true it will close the socket and reattempt later. It always starts with availablevariable = false.
I do not really see a problem in this approach.
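For readers who prefer real code, here is a rough Python rendering of the same procedure. It is my own sketch, not the actual Homeassistant or yeelight library code; `check_available` is a made-up name, port 55443 and the CRLF line ending are taken from the Yeelight LAN documentation, and the `connect` parameter exists only so the function can be exercised without a bulb.

```python
import json
import socket


def check_available(host, port=55443, timeout=5.0, connect=socket.create_connection):
    """Return True if the bulb answers get_prop in time, else False."""
    try:
        sock = connect((host, port), timeout=timeout)
    except OSError:
        return False  # socket could not be opened; the caller retries later
    try:
        request = {"id": 1, "method": "get_prop", "params": ["power"]}
        sock.sendall((json.dumps(request) + "\r\n").encode("utf-8"))
        buffer = b""
        while True:
            # Per-read timeout; a stricter version would track one overall deadline.
            chunk = sock.recv(4096)
            if not chunk:
                return False  # bulb closed the connection
            buffer += chunk
            *lines, buffer = buffer.split(b"\r\n")
            for line in lines:
                # Notifications have no matching id and are simply skipped here.
                if line and json.loads(line).get("id") == 1:
                    return True
    except (OSError, ValueError):
        return False  # send failed, read timed out, or the reply was not valid JSON
    finally:
        sock.close()  # this sketch always closes; HA keeps the socket open on success
```

The caller would keep the bulb marked unavailable and call this again later whenever it returns `False`.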

The other thing, regarding state change: it uses the open socket (so only 1 socket per bulb) and sends the different set commands, and then the bulb notifies all open sockets about the change.
So when I change state on Device A, Device B will show the change. This behaviour of Homeassistant is as expected per the Yeelight documentation.

So long story short. Homeassistant Yeelight Plugin respects the documentation provided by Yeelight for the bulb. It is also not spamming new sockets.

best regards,
Thomas B

Hi, Thomas,

We have located the root cause of the issue with get_prop. I wonder if you would prefer a beta version, or would rather wait for a ‘release’ version, which would need several more weeks.

The HA process looks impeccable, though there is still one thing I want to be sure of. Does get_response ignore all notifications and wait until a response is received? Or if a notification comes before the expected response, will HA deem it a fault?

br,
Fei Liu

Hello,

Thanks for the quick answers.
I can imagine that it would need intensive testing before releasing it to that many devices around the globe. I would be more than happy to receive the update and test it on my side as well. My MI ID is: 6382797779

Regarding the thing you want to be sure, I’m more than happy to answer that.
HA starts an asynchronous request.
When it sends out GET_PROP it sends it with an ID and expects an answer with the same ID. If other notifications come in in the meantime, it will just process them without affecting this request.
So, no. It will not deem it a fault.


Given that we have such good communication, I just remembered there is 1 more issue with the firmware which should be addressed and which you can replicate easily.

Just unplug the Wi-Fi router and plug it back in. The bulb will not reconnect until the next physical power off and on.
The real world scenario is:
There is an electrical outage. The bulb firmware will load in maybe 3 seconds, whereas most ISP home routers will take at least 1 minute.

If I’m not mistaken you are using an ESP8266 or something similar in the bulb.
You could simply check for Wi-Fi and reconnect in the loop(){} function. That’s what I am doing with all my custom sensors and devices. It might be resource intensive to do it on every loop, but you could, for example, time it. Like once every minute or something like this.
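The timing idea looks roughly like this. It is written in Python only to show the logic (real bulb firmware would be C or similar), and `PeriodicCheck`, `loop_once`, `is_connected` and `reconnect` are all hypothetical names standing in for whatever the firmware provides.

```python
import time


class PeriodicCheck:
    """Reports 'due' at most once per interval, so a busy loop can stay cheap."""

    def __init__(self, interval_seconds=60.0, clock=time.monotonic):
        self.interval_seconds = interval_seconds
        self._clock = clock
        self._last = None  # time of the last check, None until the first one

    def due(self):
        now = self._clock()
        if self._last is None or now - self._last >= self.interval_seconds:
            self._last = now
            return True
        return False


def loop_once(check, is_connected, reconnect):
    """One pass of the main loop: reconnect only when a check is due and Wi-Fi is down."""
    if check.due() and not is_connected():
        reconnect()
        return True
    return False
```

With a 60 second interval, a router that needs a minute to boot after an outage would be picked up on one of the following checks instead of requiring a manual power cycle.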

Maybe there is actually a reconnect function and I didn’t wait long enough? Not sure.

It is not a really big issue for me, but imagine using only Yeelight bulbs in a big project. Let’s say you have a hotel. Imagine having to go around the building switching all bulbs off and on. You could solve it by mounting “smart wall switches” and firing an off/on procedure if bulbs are not responsive, just like @com_wolf mentioned above, but again this would be an additional investment (again, imagine a hotel, it can get very expensive) for something which could maybe be solved by a few lines of code.

best regards,
Thomas

OK. It could be related to some AUTOIP functions. One thing I’m sure of: there ARE scanning and reconnect procedures before the bulbs finally connect to Wi-Fi. But if the DHCP server is not present (soon enough), autoip will come into play. I’m not sure though. I need to confirm with the team after going to the office tomorrow (not sure what time it is for you). Be back soon.

Sure.
Thanks! Appreciated.

Hey @liufei,

I’m posting so late, as I didn’t want to interrupt the conversation between you and @tomb92!
And thanks for the reply, really appreciated!!

And thanks for looking into the internet issue, that is awesome! Let me know if I can provide any more details, like about my network or anything!

About your HA questions, I think @tomb92 answered a lot already. I’m also not the expert on how HA works, but I’m pretty proficient in Python, so I do understand most of the things it does. So I’ll try to respond:

Does HA flood the bulb with a bunch of requests by any chance before it complains about connection loss?

No, HA only does one request to each bulb every 30 seconds (by default), which is get_properties(). And that’s exactly the point where it throws the error message I posted.

Or does HA have any tuning of the TCP layer, like changing the keepalive parameters?

As far as I can see, no. The HA plugin definitely not; maybe the Python Yeelight lib, but I couldn’t spot that there either.

Does HA read from the socket from time to time even when it’s not sending any requests? If it does not, that could cause congestion in the bulbs’ sending queue, which in time affects all traffic when the bulbs’ (very limited) memory runs out.

As far as I can understand, yes. I am not 100% sure how it handles the socket connections as in: I don’t 100% understand if it keeps the socket open forever or recreates it at certain events. But I think @tomb92 already responded to that.

I had a quick chat with @tomb92 and he also thinks my issue could be the get_properties issue he mentioned.
Would it be possible for me to also get this fixed build? As I mentioned, I do have quite a few different lamps I could test it on! Not only the bulbs, but also different ceiling lights and the Mi bedside lamps that throw this error every few minutes.
My MI Account ID: 1890771080

Thanks and greetings,

Andy!