TCP Stack issue (might help with the fw 51 issues)

During the last two days, I use 3 telnet sessions to monitor the connection and 1 script to send and receive commands from the bulb, and I didn’t see any issue. :frowning_face:

Sorry I didn’t get back to you sooner. I went to bed at a reasonable hour :slight_smile:
I’m letting the bulbs idle all night with no connectivity. I’ll take half of them tonight and keep a tcp connection open. Perhaps that will turn up some difference. I really hope to find a way to trigger this so it can be easily identified.

I see in another chain there is a .57 version. my last test was on .56. I’ll get all the bulbs up to .57 first.

I’m seeing some “lan_err” properties when using music mode.
This program connects to a light, gets a music connection, uses the music connection to show a few colors, then turns music mode off (which often yields the “lan_err”) then disconnect from the bulb. It’s output should be easy to read with the hint that ip addresses with a (M) after them represent music sockets.

I don’t know if this is related to any other issues but I presume someone put the lan_err stuff in so it can be reported.

02:12:45.871850 10.1.0.212 connect ‘begins’
02:12:45.873522 10.1.0.212 connection ‘made’
02:12:45.873558 10.1.0.212 connect ‘completes’
02:12:45.873579 10.1.0.212 send {‘id’: 1, ‘method’: ‘set_power’, ‘params’: [‘on’, ‘smooth’, 100]}
02:12:45.890466 10.1.0.212 recv {‘id’: 1, ‘result’: [‘ok’]}
02:12:45.892749 10.1.0.212 recv {‘method’: ‘props’, ‘params’: {‘power’: ‘on’}}
02:12:45.974130 10.1.0.212(M) connect ‘begins’
02:12:45.974495 10.1.0.212 send {‘id’: 1, ‘method’: ‘set_music’, ‘params’: [1, ‘10.1.0.3’, 9500]}
02:12:45.987346 10.1.0.212(M) connection ‘made’
02:12:45.987507 10.1.0.212(M) connect ‘completes’
02:12:45.987609 10.1.0.212(M) send {‘id’: 1, ‘method’: ‘set_rgb’, ‘params’: [255, ‘smooth’, 1000]}
02:12:45.987776 10.1.0.212 recv {‘method’: ‘props’, ‘params’: {‘music_on’: 1}}
02:12:45.989165 10.1.0.212 recv {‘id’: 1, ‘result’: [‘ok’]}
02:12:46.988759 10.1.0.212(M) send {‘id’: 1, ‘method’: ‘set_rgb’, ‘params’: [65280, ‘smooth’, 1000]}
02:12:47.990475 10.1.0.212(M) send {‘id’: 1, ‘method’: ‘set_rgb’, ‘params’: [16711680, ‘smooth’, 1000]}
02:12:48.991984 10.1.0.212 send {‘id’: 1, ‘method’: ‘set_music’, ‘params’: [0]}
02:12:49.002922 10.1.0.212(M) connection ‘eof’
02:12:49.003282 10.1.0.212(M) connection ‘lost’
02:12:49.003559 10.1.0.212 recv {‘id’: 1, ‘result’: [‘ok’]}
02:12:49.004840 10.1.0.212 recv {‘method’: ‘props’, ‘params’: {‘lan_err’: ‘lan_clnt_recv-649-128,lan_cmd_ack-780–1,lan_cmd_ack-780–1,lan_cmd_ack-780–1’, ‘music_on’: 0}}
02:12:49.994386 10.1.0.212 send {‘id’: 1, ‘method’: ‘set_power’, ‘params’: [‘off’, ‘smooth’, 100]}
Sleeping to receive trailing messages
02:12:50.011106 10.1.0.212 recv {‘id’: 1, ‘result’: [‘ok’]}
02:12:50.012554 10.1.0.212 recv {‘method’: ‘props’, ‘params’: {‘power’: ‘off’}}

Yes, we put some calling path info in the property so we could trace the internal error. You can just ignore it for now.

This one should help a lot!

If you do not allow a bulb to talk to the internet, just your inside network, every 15 minutes it resets it’s connections and wireless association. When this happens we run the risk of coming back up (apparently) without lan mode enabled. This is like clockwork, the bulb resets itself every single 15 minutes and you can probably get a bulb ‘hung’ (as we’ve been calling it) overnight.

After configuring my bulbs and enable lan_mode. From my firewall for packets coming from the bulbs I drop all that are not destined for my internal network, I also drop all packets for name services. With an open connection I query the bulb every few seconds (over the same open connection). Meanwhile, I watch my access point for the uptime on the bulbs association, when that uptime hits 15 minutes the connection breaks, however, we don’t see that the connection broke until the bulb is back up and running and it sends a reset packet after we attempt a send. Until then we just send with no response, it’s that reset that tips off the client that the connection is broken which makes it tough to nail down when it happens.

Here is my python script to test this. Again, you must block the bulb from getting dns or talking back to any of the internet servers to guarentee the every 15 minute reset and increase the likelyhood that the bulb will come back (apparently) without lan mode enabled.

I suppose there is some code there that resets because the bulb can’t “phone home” which I would point out will cause far more problems than it will ever solve but that’s just my opinion being demonstrated. Regardless, I hope this helps you find the bug that’s affecting the others.

#!/usr/bin/python3

import socket
import json
import time
import threading
import traceback
import argparse


def option_parser():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-a",
        "--address",
        help="address of bulb",
        action="store",
        required=True,
    )

    parser.add_argument(
        "-p",
        "--pulse",
        help="pulse time for get_prop",
        action="store",
        type=float,
        default=5.0,
    )

    return parser.parse_args()


OPTS = option_parser()


def tstamp():
    now = time.time()
    ts = time.strftime("%H:%M:") + "{:09.6f}".format(now % 60)
    return ts


STATUS_PROPS = [
    "bright",
    "color_mode",
    "ct",
    "delayoff",
    "flowing",
    "hue",
    "music_on",
    "name",
    "power",
    "rgb",
    "sat",
]


def cmd(method, params=None, id=1):
    if params is None:
        params = []
    msgdict = dict(id=id, method=method, params=params)
    msg = json.dumps(msgdict) + "\r\n"
    return msg.encode()


def log(*args, **kwargs):
    print(
        "{} {:^15s}".format(tstamp(), OPTS.address),
        *args,
        flush=True,
        **kwargs
    )


class BulbCom:
    def __init__(self, addr):
        self.addr = addr
        self.sock = socket.socket()
        log("connecting")
        self.sock.connect((self.addr, 55443))
        log("connected")

        self.listener = threading.Thread(target=self.reader)
        self.listener.start()

    def reader(self):
        log("reader begins")

        while True:
            try:
                result = self.sock.recv(8192)
            except ConnectionError:
                log("connection error {}".format(traceback.format_exc()))
                break
            else:
                log("read {!r}".format(result))

        log("reader ends")

    def send(self, method, params=None, id=1):
        message = cmd(method, params, id)
        log("send {!r}".format(message))
        self.sock.send(message)


def main():
    bcom = BulbCom(OPTS.address)

    while True:
        bcom.send("get_prop", STATUS_PROPS)
        time.sleep(OPTS.pulse)


main()

Thanks a lot for your information! The “phone home” behavior is by design, this is to solve some wired offline issue. The good news is that we provide a beta firmware to another user who can easily reproduce the problem in his network and util now, his device is working fine.
We will provide the firmware to all whitelisted users soon and hopefully it can solve the problem.