14:01 PM
ruaok
let me put them in.
2021-05-04 12423, 2021
14:01 PM
alastairp
one sec, I just shut down the container again
2021-05-04 12452, 2021
14:01 PM
alastairp
template was missing the files, I may have started the wrong image. started again
2021-05-04 12454, 2021
14:01 PM
ruaok
oh that is why I fell over. lol
2021-05-04 12409, 2021
14:02 PM
alastairp
now empty configs are there
2021-05-04 12447, 2021
14:02 PM
alastairp
logs are showing failures due to get_playlists_created_for_user
2021-05-04 12455, 2021
14:02 PM
ruaok
heh, gaga is up. must be in rescue mode.
2021-05-04 12456, 2021
14:02 PM
alastairp
(reads playlists from timescale)
2021-05-04 12408, 2021
14:03 PM
alastairp
but listens unavailable page is showing correctly now
2021-05-04 12414, 2021
14:03 PM
ruaok
yeah, I totally forgot about playlists. :(
2021-05-04 12442, 2021
14:04 PM
ruaok
still, I looked for timescale connections in all view modules. I wonder how that slipped through.
2021-05-04 12439, 2021
14:05 PM
ruaok
views. not db modules. 🤦‍♂️
2021-05-04 12454, 2021
14:05 PM
alastairp
maybe because it uses the engine instead of ListenStore?
2021-05-04 12430, 2021
14:06 PM
alastairp
yeah right - semi related to my comment on the PR, and I'm writing a new issue to unify connections to external services in a single place. I'll discuss it with _lucifer after this
2021-05-04 12409, 2021
14:07 PM
alastairp
incoming queue is climbing. at 350 now
2021-05-04 12428, 2021
14:07 PM
ruaok
> maybe because it uses the engine instead of ListenStore?
2021-05-04 12429, 2021
14:07 PM
ruaok
yep.
2021-05-04 12450, 2021
14:07 PM
ruaok
350? np. 350k more a problem. lol.
2021-05-04 12441, 2021
14:12 PM
ruaok
12 minutes and not a lot of time off. maybe the fan was switched out first thing.
2021-05-04 12458, 2021
14:13 PM
zas
weird it pings, is it in rescue mode?
2021-05-04 12402, 2021
14:14 PM
BrainzGit
2021-05-04 12415, 2021
14:14 PM
alastairp
typescale? *sigh*
2021-05-04 12436, 2021
14:14 PM
ruaok
zas: that is all I can guess. it's been pingable all this time, but up with another SSH key.
2021-05-04 12457, 2021
14:14 PM
ruaok
alastairp: I feel yer pain.
2021-05-04 12436, 2021
14:15 PM
zas
we didn't receive a report for hetzner yet though
2021-05-04 12402, 2021
14:16 PM
ruaok
yeah. I wonder if the whole server is being swapped.
2021-05-04 12414, 2021
14:17 PM
zas
they perhaps detected another issue
2021-05-04 12440, 2021
14:17 PM
BrainzGit
2021-05-04 12428, 2021
14:21 PM
alastairp
yvanzo: hi, do you have any thoughts on where in a Dockerfile to add LABEL? I added it to the beginning, but as one of our labels includes the git commit hash (passed in as a build arg), it means we have to rebuild the whole image each time we release:
https://github.com/metabrainz/listenbrainz-server…
2021-05-04 12420, 2021
14:22 PM
alastairp
I know you use labels in a few different places. do you think it's OK to put this at the end of the Dockerfile? Do you know if it has an effect on the metadata? (I don't know if labels are attached to all intermediate layers or only the final image)
2021-05-04 12436, 2021
14:22 PM
BrainzGit
2021-05-04 12457, 2021
14:23 PM
alastairp
I guess one option is to add main labels at the beginning and varying labels at the end
2021-05-04 12427, 2021
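
The rebuild cost alastairp describes can be illustrated with a toy model of the Docker build cache. This is only a sketch under simplifying assumptions: each instruction is compared as a plain string, the first mismatch invalidates everything after it, and file checksums for COPY are ignored. The instruction lists, the `git_commit` label name, and the helper names are hypothetical and not taken from the ListenBrainz Dockerfile.

```python
def layers_to_rebuild(previous, current):
    """Return the instructions that miss the cache in a simplified model.

    Once one instruction differs from the previous build, that instruction
    and every instruction after it are rebuilt.
    """
    for i, (old, new) in enumerate(zip(previous, current)):
        if old != new:
            return current[i:]
    # No mismatch in the shared prefix: only newly appended instructions run.
    return current[len(previous):]


def label_first(commit):
    # Hypothetical Dockerfile with the varying label near the top.
    return [
        "FROM base-image",
        f"LABEL git_commit={commit}",
        "RUN install-dependencies",
        "COPY . /code",
    ]


def label_last(commit):
    # Same hypothetical Dockerfile with the varying label moved to the end.
    return [
        "FROM base-image",
        "RUN install-dependencies",
        "COPY . /code",
        f"LABEL git_commit={commit}",
    ]


if __name__ == "__main__":
    print(layers_to_rebuild(label_first("aaa"), label_first("bbb")))
    print(layers_to_rebuild(label_last("aaa"), label_last("bbb")))
```

In this model, a commit-hash label at the top forces every later step to rerun on each release, while the same label at the end only rebuilds the label step itself. That matches the idea of putting stable labels first and varying labels last.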
14:25 PM
ruaok
pings timing out.
2021-05-04 12427, 2021
14:27 PM
ruaok
zas: GO!
2021-05-04 12432, 2021
14:27 PM
ruaok
gaga is up.
2021-05-04 12442, 2021
14:27 PM
zas
ok, I'll upgrade docker
2021-05-04 12420, 2021
14:28 PM
alastairp
should I re-deploy with playlist redirects in place, or are we OK to leave it just a few minutes more?
2021-05-04 12454, 2021
14:28 PM
ruaok
just leave it.
2021-05-04 12457, 2021
14:28 PM
alastairp
sure
2021-05-04 12458, 2021
14:28 PM
ruaok
2021-05-04 12404, 2021
14:29 PM
ruaok
doh, all for naught.
2021-05-04 12420, 2021
14:29 PM
ruaok
why didn't you fucking replace the fan anyway???
2021-05-04 12440, 2021
14:29 PM
zas
ok...
2021-05-04 12450, 2021
14:29 PM
ruaok
you done, zas?
2021-05-04 12454, 2021
14:29 PM
zas
yup
2021-05-04 12456, 2021
14:29 PM
ruaok
k
2021-05-04 12412, 2021
14:30 PM
ruaok
starting services
2021-05-04 12416, 2021
14:30 PM
shivam-kapila
Why is the unique queue so big? 99.7k
2021-05-04 12425, 2021
14:30 PM
alastairp
shivam-kapila: because nothing consumes from it
2021-05-04 12455, 2021
14:30 PM
alastairp
uwsgi bouncing
2021-05-04 12404, 2021
14:31 PM
shivam-kapila
Ah like that. Thanks alastairp
2021-05-04 12410, 2021
14:31 PM
alastairp
listens pages up again
2021-05-04 12411, 2021
14:31 PM
ruaok
alastairp: give it a shot, timescale is back up.
2021-05-04 12421, 2021
14:31 PM
alastairp
ruaok: I didn't do anything, it did it itself!
2021-05-04 12425, 2021
14:31 PM
alastairp
at least one thing works
2021-05-04 12430, 2021
14:31 PM
ruaok
sweet! very nice. :)
2021-05-04 12451, 2021
14:31 PM
alastairp
restarting ts writer
2021-05-04 12416, 2021
14:32 PM
alastairp
mmm, bounced again. did you do anything ruaok?
2021-05-04 12425, 2021
14:32 PM
ruaok
no
2021-05-04 12435, 2021
14:32 PM
alastairp
odd. something to look into later
2021-05-04 12413, 2021
14:33 PM
ruaok
2021-05-04 12431, 2021
14:33 PM
alastairp
ruaok: yeah, we saw a few weeks ago that he had left pandora
2021-05-04 12441, 2021
14:33 PM
zas
well, at least it serves as a test of LB degraded mode, and the machine needed a bunch of upgrades anyway
2021-05-04 12450, 2021
14:34 PM
alastairp
timescale service appears to be degraded again
2021-05-04 12436, 2021
14:35 PM
alastairp
ruaok: zas
2021-05-04 12449, 2021
14:35 PM
ruaok
I just saw that.
2021-05-04 12451, 2021
14:35 PM
ruaok
wtf?
2021-05-04 12420, 2021
14:36 PM
alastairp
timescale-writer is failing because no config (old image), which means that the service isn't in consul
2021-05-04 12440, 2021
14:36 PM
ruaok
the server is up though.
2021-05-04 12441, 2021
14:36 PM
alastairp
2021-05-04 14:30:17.162 UTC [1] LOG: background worker "Continuous Aggregate Background Job" (PID 39) exited with exit code 1
2021-05-04 12446, 2021
14:36 PM
ruaok
let me make a connection.
2021-05-04 12455, 2021
14:36 PM
alastairp
last log item in timescale docker logs
2021-05-04 12427, 2021
14:37 PM
ruaok
it's accepting connections and answering queries.
2021-05-04 12433, 2021
14:37 PM
ruaok
timescale is up.
2021-05-04 12451, 2021
14:37 PM
ruaok
and that error message just means that the regular background job finished.
2021-05-04 12441, 2021
14:38 PM
ruaok
zas: any idea why timescale is still marked bad?
2021-05-04 12403, 2021
14:39 PM
zas
no idea, let me check if I find something
2021-05-04 12455, 2021
14:39 PM
zas
registrator complains about it
2021-05-04 12459, 2021
14:39 PM
zas
2021-05-04 12412, 2021
14:40 PM
alastairp
timescale writer finally rendered config
2021-05-04 12416, 2021
14:40 PM
alastairp
oops, and just crashed again
2021-05-04 12441, 2021
14:40 PM
zas
my guess: it crashes over and over, and is marked bad for a good reason
2021-05-04 12429, 2021
14:41 PM
alastairp
zas: from what we can tell, the timescale server is up and running. we can make psql connection to it and it responds to queries
2021-05-04 12449, 2021
14:41 PM
ruaok
the container has not restarted.
2021-05-04 12408, 2021
14:42 PM
ruaok
timescale continues to work fine. no loss of connections.
2021-05-04 12423, 2021
14:42 PM
zas
hmmm weird
2021-05-04 12435, 2021
14:42 PM
alastairp
where does the 'postgres-health-check' app run from?
2021-05-04 12453, 2021
14:42 PM
zas
inside the container I think
2021-05-04 12401, 2021
14:43 PM
ruaok
2021/05/04 14:41:28 register failed: &{gaga:listenbrainz-timescale:5432 timescale-listenbrainz 13046 10.2.2.31 [] map[check_tcp:true check_interval:15s check_timeout:3s check_script:postgres-health-check 'host=10.2.2.31 port=13046 user=postgres password=postgres dbname=template1 sslmode=disable'] 0 {13046 10.2.2.31 5432 172.17.0.3 tcp 450c31e98368 450c31e983685d774550fd99022a0d1c7312732e6ab2637d0271c4bc6c5aab76 0xc2080fc8c0}} Unexpected
2021-05-04 12401, 2021
14:43 PM
ruaok
response code: 500 (Scripts are disabled on this agent; to enable, configure 'enable_script_checks' to true)
2021-05-04 12421, 2021
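
The registration metadata above asks for both a TCP check (`check_tcp:true`, `check_timeout:3s`) and a script check, and the 500 response says the agent refused the script check. As a rough sketch of what the TCP half amounts to, the function below only tests whether a port accepts a connection within a timeout. It is not the `postgres-health-check` program and not Consul's implementation; the function name is hypothetical.

```python
import socket


def tcp_check(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port as they appear in the registrator log line above.
    print(tcp_check("10.2.2.31", 13046))
```

A check like this only shows that something is listening. It says nothing about whether the database answers queries, which is what the manual psql connection confirmed.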
14:43 PM
zas
ah this is consul config
2021-05-04 12424, 2021
14:45 PM
ruaok
did the health check service not come back up ok?
2021-05-04 12434, 2021
14:45 PM
zas
2021-05-04 12434, 2021
14:46 PM
ruaok
do we need to restart consul with that flag?
2021-05-04 12441, 2021
14:46 PM
zas
yes
2021-05-04 12452, 2021
14:46 PM
alastairp
but that can't be in the timescale image... we're using off-the-shelf image from docker hub with no custom MeB configuration right?
2021-05-04 12455, 2021
14:46 PM
zas
and we should add it to services script
2021-05-04 12419, 2021
14:47 PM
ruaok
2021-05-04 12436, 2021
14:47 PM
ruaok
alastairp: yes.
2021-05-04 12423, 2021
14:48 PM
alastairp
consulagent, then?
2021-05-04 12434, 2021
14:48 PM
alastairp
zas: can you restart consulagent, or should one of us do it?
2021-05-04 12400, 2021
14:49 PM
zas
2021-05-04 12410, 2021
14:49 PM
zas
ruaok: I can do it if you want
2021-05-04 12416, 2021
14:49 PM
ruaok
please do.
2021-05-04 12423, 2021
14:49 PM
ruaok
I'm not sure exactly what needs doing.
2021-05-04 12448, 2021
14:49 PM
alastairp
yes, seems likely. good catch zas
2021-05-04 12410, 2021
14:52 PM
ruaok
"start_consul_agent_with_postgres_health_check" ?
2021-05-04 12422, 2021
14:52 PM
zas
I did it
2021-05-04 12431, 2021
14:52 PM
zas
I'll commit the patch
2021-05-04 12455, 2021
14:52 PM
ruaok
k
2021-05-04 12423, 2021
14:53 PM
zas
2021-05-04 12428, 2021
14:53 PM
zas
does it work now?
2021-05-04 12438, 2021
14:53 PM
akashgp09 has quit
2021-05-04 12451, 2021
14:53 PM
zas
2021/05/04 14:50:45 added: 450c31e98368 gaga:listenbrainz-timescale:5432
2021-05-04 12459, 2021
14:53 PM
zas
registrator looks happier
2021-05-04 12437, 2021
14:54 PM
ruaok
the main web container hasn't caught this yet.
2021-05-04 12408, 2021
14:55 PM
alastairp
hmm
2021-05-04 12401, 2021
14:56 PM
ruaok
no consul is not happy yet.
2021-05-04 12433, 2021
14:56 PM
ruaok
is it just a matter of time, zas?
2021-05-04 12455, 2021
14:56 PM
zas
nope, it should work, but there is perhaps another issue
2021-05-04 12403, 2021
14:57 PM
alastairp
let me kick web again manually
2021-05-04 12425, 2021
14:57 PM
zas
at least registrator log shows it added the service
2021-05-04 12427, 2021
14:57 PM
ruaok
won't help, alastairp. listenbrainz-mbid-mapping-writer fails still.
2021-05-04 12402, 2021
14:58 PM
alastairp
we restarted consulagent, but not timescale. what if we try and restart that?
2021-05-04 12446, 2021
14:58 PM
zas
you should restart all containers, perhaps consul-template didn't like consul restart (we experienced issues in the past)
2021-05-04 12455, 2021
14:58 PM
ruaok
ok, will restart
2021-05-04 12404, 2021
14:59 PM
zas
but consul agent & registrator should be ok
2021-05-04 12404, 2021
14:59 PM
alastairp
mm
2021-05-04 12411, 2021
14:59 PM
zas
don't restart those
2021-05-04 12427, 2021
14:59 PM
alastairp
zas: please check volume mount of postgres-health-check
2021-05-04 12435, 2021
14:59 PM
alastairp
in gaga consulagent it's a directory, should be a binary
2021-05-04 12444, 2021
14:59 PM
zas
huh?
2021-05-04 12456, 2021
14:59 PM
ruaok
restarted, not better.
2021-05-04 12402, 2021
15:00 PM
alastairp
# ls -l /usr/local/bin/
2021-05-04 12402, 2021
15:00 PM
alastairp
drwxr-xr-x 2 root root 4096 May 4 14:50 postgres-health-check
2021-05-04 12420, 2021
15:00 PM
alastairp
inside `docker exec -it consulagent sh`
2021-05-04 12422, 2021
15:00 PM
zas
looks like this patch was buggy
2021-05-04 12418, 2021
15:02 PM
zas
2021-05-04 12439, 2021
15:02 PM
zas
ok, let's see how we can fix this mess
2021-05-04 12455, 2021
15:02 PM
ruaok is ready to help
2021-05-04 12419, 2021
15:03 PM
zas
". It is always created as a directory."
2021-05-04 12403, 2021
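
The directory alastairp found at /usr/local/bin/postgres-health-check fits the behaviour zas quotes: when the host side of a `-v` bind mount does not exist, Docker creates it, and it creates a directory. A pre-flight check along the following lines could catch that before the container starts. This is a minimal sketch; the function name and the idea of calling it from a start script are assumptions, not something taken from docker-server-configs.

```python
import os


def check_bind_mount_source(path):
    """Classify a host path that is meant to be bind-mounted as an executable."""
    if not os.path.exists(path):
        # Mounting a missing path with -v would create a directory here.
        return "missing"
    if os.path.isdir(path):
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        print(candidate, check_bind_mount_source(candidate))
```

Anything other than "ok" for the health check path would mean the consul agent container cannot run the script, whether or not script checks are enabled.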
15:04 PM
alastairp
where is the file with the docker run command?
2021-05-04 12423, 2021
15:04 PM
_lucifer
scripts/services.sh probably
2021-05-04 12424, 2021
15:05 PM
_lucifer
2021-05-04 12420, 2021
15:06 PM
ruaok
should we turn off the health check for now?
2021-05-04 12435, 2021
15:06 PM
ruaok
and get things running again and then debug the health check?
2021-05-04 12413, 2021
15:07 PM
alastairp
/home/zas/docker-server-configs/postgres-health-check/postgres-health-check on gaga has nothing in it
2021-05-04 12414, 2021
15:08 PM
alastairp
ruaok: yes, we could attempt to change SERVICE_5432_CHECK_SCRIPT in start_listenbrainz_timescale to something truthy
2021-05-04 12429, 2021
15:08 PM
ruaok
can we just remove the line?
2021-05-04 12440, 2021
15:08 PM
alastairp
I don't know the answer to that