in #metabrainz

1:15 AM
vogen joined the channel
1:45 AM
vogen has quit
1:53 AM
MusicbrainzB0T has quit
1:54 AM
MusicbrainzB0T joined the channel
2:54 AM
thomasross has quit
5:04 AM
reosarevok

CatQuest: yeah, that's all it does. Now that's all sortname guess does too :p
5:27 AM
_lucifer

ruaok, i tried to test again on michael but it seems not to be picking up requests from the queue again.
5:27 AM
however pinging prince on the 62673 succeeds.
5:42 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #2066 (beta…MBS-11583): MBS-11583: Use sanitized context in hydrated component https://github.com/metabrainz/musicbrainz-serve...
6:00 AM
[musicbrainz-server] reosarevok merged pull request #2066 (beta…MBS-11583): MBS-11583: Use sanitized context in hydrated component https://github.com/metabrainz/musicbrainz-serve...
6:08 AM
[musicbrainz-server] reosarevok merged pull request #2033 (master…MBS-11542): MBS-11542 / MBS-11552: Allow and cleanup new Classical Archives links + add validation https://github.com/metabrainz/musicbrainz-serve...
6:38 AM
_lucifer

ok i figured due to some reason yarn is trying to connect to marlon instead of worker-marlon. i think that might be the case if its tries public ip instead of internal ip.
6:39 AM
interesting hdfs cli works as expected though reports the internal ips and worker-*
7:29 AM
ok. the ip thing is fixed now but it now fails with user application exited with exit code 1.
7:30 AM
alastairp

CatQuest: thanks! blank spaces is OK, the only problem we might have is an empty string. Anything that is actually represented by characters is fine 👍
7:32 AM
sumedh joined the channel
7:44 AM
reosarevok

bitmap, yvanzo: for MBS-1658, I was thinking at least one of the places to add a comment to the entry should be from the list itself
7:44 AM
BrainzBot

MBS-1658: My Collection: add free text comment field https://tickets.metabrainz.org/browse/MBS-1658
7:44 AM
reosarevok

https://usercontent.irccloud-cdn.com/file/pajfO...
7:46 AM
CatQuest

alastairp: iirc there was an issue with an artist credit that included https://beta.musicbrainz.org/artist/3f0bdf7f-3f... some time back..
7:48 AM
https://beta.musicbrainz.org/edit/17556643
7:48 AM
reosarevok

bitmap, yvanzo: So that last column should have like an edit icon somehow, and I guess allow inline editing that would get sent to the DB? Do you know if we do anything like that anywhere else?
7:49 AM
Or if you think that's a bad idea, how would you do it?
7:49 AM
alastairp

CatQuest: hah, nice
7:49 AM
that's not a problem either though, but I can see how it could be a problem
7:49 AM
CatQuest

it was a nice bobby tables thing
7:49 AM
.. niche
7:49 AM
damn englich
7:57 AM
sumedh has quit
8:19 AM
D4RK joined the channel
8:21 AM
D4RK-PH0ENiX has quit
8:34 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #2067 (master…MBS-10899): MBS-10899: Report for releases with catnos that look like ISRCs https://github.com/metabrainz/musicbrainz-serve...
8:59 AM
ruaok

moooin!
9:01 AM
Mr_Monkey: alastairp : so after more experimenting last night, I'm able to get rid of the min/max ts cont aggs by simply creating a 5 days cont agg with compound index on user/listened_at. Thats 18M rows less for starters.
9:02 AM
and I think we can replace those with month and year cont aggs for the graphs you two would like.
9:02 AM
alastairp

right. get all data from the same table?
9:02 AM
ruaok

yeah, it was already there. just the index was missing to make it faster.
9:02 AM
alastairp

sweet, if a month and year aggregate is possible then that sounds like it should be perfect
9:02 AM
great
9:04 AM
ruaok

basically we swapped doing a table scan on the DB with an index scan. not sure we can do much better than that -- but with increased cache times, this should work well.
9:06 AM
alastairp

_lucifer: ^ remember how I told you to add indexes to tables where you want to select some data?
9:11 AM
_lucifer

yup, i'll keep it in mind :D
9:11 AM
ruaok

its the rookie mistake that keeps on giving. #going25yearsstrong
9:15 AM
sumedh joined the channel
9:29 AM
woah. https://restofworld.org/2021/the-rise-and-fall-...
9:37 AM
sumedh has quit
9:45 AM
_lucifer

ruaok, apparently `0.0.0.0` is causing hostname resolution errors. 0.0.0.0 is resolving to the server's name michael instead of leader.
9:46 AM
https://www.irccloud.com/pastebin/Uuitqvql/
9:46 AM
the same error was happening on workers leading to resolving as tito instead of worker-tito so on.
9:47 AM
sumedh joined the channel
9:47 AM
i fixed that by changing the configurations of various files here https://github.com/metabrainz/hadoop-cluster-do...
9:48 AM
but for michael it seems it is picking up some default we didn't use to define in the earlier setup. any guesses which one it could be?
10:00 AM
ruaok

_lucifer: I suspect that is because the canonical name of the machine is michael and has its reverse DNS set like that.
10:00 AM
so, for config purposes you should always use michael. leader is just a shorthand/convention for us to log into the cluster.
10:01 AM
_lucifer

i think what is happening is that michael is used when it tries to bind an interface on 0.0.0.0.
10:01 AM
ruaok

I would change the /etc/hosts and change leader to michael.
10:02 AM
_lucifer

makes sense. i'll try that.
10:02 AM
ruaok

not 100% sure that will work. the last paste -- which container is that from? is it up?
10:03 AM
_lucifer

no it goes down after that.
10:03 AM
ruaok

from the inside of that container can you ping michael:31171 ?
10:03 AM
try the bash trick again, get the container up and then see what you can or cannot connect to.
10:08 AM
_lucifer

just tried that fails with unknown host
10:08 AM
ping michael works but with the port doesn't
10:09 AM
ruaok

sorry wget michael:31171
10:12 AM
_lucifer

Aaannd it worked!
10:12 AM
ruaok

!m _lucifer
10:12 AM
BrainzBot

You're doing good work, _lucifer!
10:12 AM
_lucifer

https://www.irccloud.com/pastebin/9mkggfTg/
10:12 AM
ruaok

yisss!
10:12 AM
_lucifer

changing to michael didn't work
10:12 AM
but changing a spark default to the vlan ip did
10:12 AM
ruaok

so, lets do all the loading of data (mapping, incrementals) then we can fire off some jobs.
10:12 AM
that makes sense.
10:12 AM
very very good.
10:13 AM
_lucifer

two things left to do. one is define memory defaults and second is update the new configuration in syswiki.
10:21 AM
monitoring this cluster is easier than the docker one. one tunnel is sufficient
10:22 AM
ruaok

that was exactly the goal.
10:22 AM
and each server is being monitored by all of zas' magic.
10:22 AM
_lucifer

:D
11:14 AM
alastairp: available to talk about the GH actions PR?
12:00 PM
ruaok, zas: do you know if any service we run on j5 might listen on port 5666?
12:00 PM
zas

nagios
12:00 PM
well, its agent
12:01 PM
_lucifer

👍 thanks!
12:03 PM
sumedh has quit
12:06 PM
sumedh joined the channel
12:10 PM
zas, i saw a commit in syswiki renaming germaine to jermaine so wanted to let you know that i noticed /etc/hosts on jermaine still contains has a couple of entries referring germaine.
12:12 PM
zas

oh, ok, I'll fix it
12:21 PM
https://data.musicbrainz.org (ftp / williams over http(s|2))
12:22 PM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #2068 (master…MBS-10711): MBS-10711: Convert report lists to react-table [WIP] https://github.com/metabrainz/musicbrainz-serve...
12:24 PM
reosarevok

bitmap, yvanzo ^ would really appreciate some feedback on whether the way I'm approaching this seems sensible, improvements etc, before I keep working on other lists
12:44 PM
zas

https://blog.metabrainz.org/2021/04/15/picard-2...
13:19 PM
BrainzGit

[bookbrainz-site] akashgp09 opened pull request #601 (master…browser-compatibility): FIX(BB-615): Copy/Paste annotation text in FireFox <= 60 https://github.com/bookbrainz/bookbrainz-site/p...
13:51 PM
scory joined the channel
13:58 PM
scory

Hello everyone. I would like to ask about Lucene Search syntax of the musicbrainz database. What kind of instance is it running on? If i would like to have a musicbrainz database (mbdata) with ElasticSearch instance, how do you recommend to integrate these two. I just started looking into ElasticSearch but what I found out i would need some kind of
13:58 PM
data set to import it to ElasticSearch (*.json for example). Do you have some kind of method to import musicbrainz database into a Lucene Search intance, a data set I can use, or i should generate it myself? How do you keep it updated?
13:58 PM
ruaok

hi scory!
13:58 PM
why must is be elasticsearch?
13:59 PM
because we have a perfectly working search infrastructure that you can use without having to reinvent the wheel.
14:04 PM
scory

That infrastructure currently doesn't support what I need, last time I was here that was the conclusion for me. That's the reason I am currently running a mbdata server locally, and can run graphql queries against it, with batching. But I would like to implement an ElasticSearch instance on graphql. But currently i am just investigating.
14:05 PM
ruaok

you could look at the denormalized JSON dumps we have: ftp://ftp.eu.metabrainz.org/pub/musicbrainz/dat...
14:06 PM
those fit for importing into a document store.
14:07 PM
scory

Thank you very much.
14:08 PM
scory has quit
14:24 PM
sumedh has quit
14:54 PM
bitmap

reosarevok: I can't think of anything else like that offhand, but doesn't seem like a bad idea. we could add a small endpoint to /ws/js for it
15:03 PM
sumedh joined the channel
15:52 PM
vardan has quit
16:17 PM
adhi001 joined the channel
16:20 PM
adhi001

Sorry ruaok , I was sick the last week and was not able to submit a proposal for GSoC. Still part of the community :)
16:21 PM
ruaok

oh, bummer. that sucks. at least you're better, right?
16:21 PM
adhi001

yeah
16:23 PM
Thank you for your concern
16:24 PM
alastairp

_lucifer: hi, sorry - had a hectic day. still around?
16:25 PM
_lucifer

alastairp: hi! no worries. yup, i am available.
16:26 PM
alastairp

so I was suggesting using test.sh in the actions?
16:26 PM
_lucifer

yes
16:26 PM
alastairp

so we already have things like `./test.sh -b` to build, and `test.sh -u` to bring up containers
16:26 PM
test.sh fe to run frontend tests
16:27 PM
_lucifer

yes there's also test.sh spark
16:28 PM
alastairp

great, so it sounds like it's probably a good fit that we can use the actions files for specifying the orders in which to run things, but reusing test.sh for the actual commands allows us to share 100% test code between local developement and CI, right?
16:29 PM
_lucifer

should we use separate build steps? like there's a ./test.sh -u to just bring up supporting containers. or should we just to do ./test.sh which does it in one go
16:30 PM
alastairp

but we need to separate pull / cache / build / run, in CI, right?
16:30 PM
_lucifer

yes, mostly. we'll still have to pull manually
16:30 PM
that step won't change
16:31 PM
alastairp

one question - if a test generates files (e.g. the junit xml), will it be cached? Or does the cache action only cache docker layers?
16:32 PM
_lucifer

i'll need to check that.
16:33 PM
i expect docker layers only but we can confirm it by generating some files and looking at the actions output
16:33 PM
alastairp

it seems that satackey/action-docker-layer-caching works explicitly on layers (looking at the output, it uses docker commands to generate the archives)
16:34 PM
so yeah, ./test.sh pull; restore cache; run test.sh; save cache
16:34 PM
if that works, then I'm all for it!
16:34 PM
_lucifer

makes sense. i'll try that.
16:34 PM
i also opened https://github.com/metabrainz/brainzutils-pytho...
16:34 PM
the junit action works fine but as i mentioned it might comment excessively
16:35 PM
alastairp

neat. did you see what happens if one fails? does it only update the comment or does it also add an annotation to the failing test?
16:36 PM
and it'll make a new comment on every push (i.e. even if they all pass?)
16:36 PM
_lucifer

no i haven't, let me do that right now.
16:36 PM
yes
16:36 PM
alastairp

or only if the results of the test run change?
16:36 PM
interesting
16:36 PM
you're right that this could get a bit annoying
16:36 PM
_lucifer

it'll hide the existing one and add a new one
16:37 PM
for LB there are going to be 4 comments on each push
16:37 PM
alastairp

oh, that's quite annoying. merging tests together would help (e.g. get it down to 2), but I suspect that this might be too much
16:37 PM
one thing that ruaok was suggesting back on jenkins was that it seems stupid to run tests on _every_ push, perhaps there could be a way to run them less often. once a day? on request based on a comment? just before merge?
16:38 PM
ruaok

anything, really.
16:38 PM
alastairp

let's not spend too much more time on this, but perhaps there is an action or a flag for `on:` that lets us decide to run them less often
16:38 PM
_lucifer

i'll look into that should be possible i think
16:38 PM
alastairp

on: comment: contains: "test please"
16:38 PM
that'd be great