#metabrainz

      • MRiddickW joined the channel
      • gcrkrause has quit
      • gcrkrause joined the channel
      • -- BotBot disconnected, possible missing messages --
      • BrainzBot joined the channel
      • gavinatkinson has quit
      • pprkut has quit
      • rdrg109 has quit
      • bitmap has quit
      • atj has quit
      • atj joined the channel
      • reosarevok
        yvanzo: will release prod, since the only known bug is "timelines load very slowly" which seems like it's fine to fix for next release
      • Anything special about this docker release?
      • BrainzGit
        [musicbrainz-server] 14reosarevok merged pull request #2332 (03beta…MBS-12082-tidal-favicon): MBS-12082: Move Tidal favicon to external-favicons https://github.com/metabrainz/musicbrainz-serve...
      • MRiddickW has quit
      • dseomn has quit
      • reosarevok
        Updating beta
      • dseomn joined the channel
      • Updating prod
      • zas
        I love the smell of upgrades in the morning.... Moooin reosarevok
      • reosarevok
        Did the alerts wake you up? :D
      • zas
        yup ;)
      • but that's my time (and yours it seems)
      • reosarevok
        We should probably try to get https://github.com/metabrainz/musicbrainz-serve... in for next time :)
      • pprkut joined the channel
      • alastairp
        hello
      • 4pm today for more gaga stuff, right?
      • ruaok
        mooin!
      • yes, gaga stuff 4pm.
      • reosarevok
        yvanzo: blog post ready to review, draft release of docker ready, didn't tag docker yet in case we have something else to add, although I'm expecting not since there's nothing new in master
      • Do let me know
      • zas: see support about ip addresses
      • zas
        I'll check
      • ruaok
        lucifer: that list of releases looks very promising -- I really want to listen to it.
      • lucifer
        nice :D
      • ruaok
        is there any way to get this data into a dataset hoster or something that is playable?
      • I think I will embed BP into data set hoster once that is possible.
      • lucifer
        i think i can get this on LB prod this week.
      • ruaok
        another thought on this: would it be possible to differentiate on album type?
      • lucifer
        once this data is in LB db we can present it in any way we want.
      • can you elaborate on that?
      • ruaok
        I care less about singles releases or EPs, but albums, those I really want to know about
      • lucifer
        i see. yes should be possible i think.
      • do you mean to just select albums or include album type in the final json?
      • *just select albums before in the sql query itself
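The "just select albums in the SQL query itself" idea could be sketched as below. The join path follows the MusicBrainz schema (`release` → `release_group` → `release_group_primary_type`), but the query as a whole is hypothetical, not the actual code being discussed; selecting `rgpt.name` also covers the "include album type in the final json" variant:

```python
# Hedged sketch of filtering to albums in SQL. Table and column names
# follow the MusicBrainz schema; the surrounding query is hypothetical.
ALBUMS_ONLY_SQL = """
    SELECT rel.gid   AS release_mbid
         , rgpt.name AS album_type
      FROM release rel
      JOIN release_group rg
        ON rel.release_group = rg.id
      JOIN release_group_primary_type rgpt
        ON rg.type = rgpt.id
     WHERE rgpt.name = 'Album'
"""
```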
      • ruaok
        I think selecting the column is a good start.
      • then the question is: how do we present this?
      • do we want to make a list of releases and a "selected recordings from this list of releases" playlist
      • ?
      • I can make the latter happen if you can get this data into the data set hoster.
      • reosarevok
        Keep in mind EPs very often are also new music that doesn't end up in albums, so I'd want to be able to at least choose to see those :)
      • lucifer
        we'll get a release mbid as the output. show the users the list of releases. if they click one, load the recordings from MB WS and feed it to BP. thoughts?
      • much like how huesound actually works: clicking a color gets release MBIDs, then we fetch the recordings from the db.
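The huesound-style flow described above (release MBID → recordings from the MusicBrainz web service → feed to BP) might look roughly like this sketch. `fetch_release` and `extract_recordings` are hypothetical helper names, and rate limiting and error handling are omitted:

```python
import json
import urllib.request

MB_WS = "https://musicbrainz.org/ws/2"

def fetch_release(release_mbid: str) -> dict:
    # inc=recordings makes the release lookup embed each track's recording.
    url = f"{MB_WS}/release/{release_mbid}?inc=recordings&fmt=json"
    req = urllib.request.Request(
        url, headers={"User-Agent": "lb-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def extract_recordings(release_json: dict) -> list:
    # Flatten media -> tracks -> recording into {mbid, title} dicts
    # suitable for handing to a player.
    return [
        {"mbid": t["recording"]["id"], "title": t["recording"]["title"]}
        for medium in release_json.get("media", [])
        for t in medium.get("tracks", [])
    ]
```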
      • ruaok
        reosarevok: claro, I just want to be able to skew it.
      • lucifer: yes, I think that is good.
      • lucifer
        another possibility is a playlist of random recordings from these releases.
      • ruaok
        that means we need to write such an endpoint, which shouldn't be hard.
      • lucifer
        indeed.
      • ruaok
        lucifer: the random recordings is what I was suggesting for a playlist. we could do both.
      • lucifer
        sounds good to me.
      • ruaok
        a lot of my top discoveries of 2021 haven't been matched to MBIDs.
      • so now I am trying to make sure that they exist in MB and am working on a "retry no_match" pass for mbid mapping entries.
      • lucifer
        ah, that's not good. matching is a key part of much of the stuff we are doing.
      • ruaok
        so we can tell people to go and get their missing albums in before we run the data for real.
      • lucifer
        noice!
      • ruaok
        I really need to make the mapping match this album: https://musicbrainz.org/release/87f10206-f3fe-4...
      • it's a great case of bad data in MB.
      • I thought it should already match this -- not sure why not. I'll check.
      • lucifer
        do you have a sample listen that is not matching this?
      • ruaok
        not to hand, no.
      • see one track not matched near the 6s. "the river bend"
      • lucifer
        i see.
      • the one in MB uses feat. while the listen has `,`
      • do we ignore join phrases while trying to match?
      • ruaok
        no
      • lucifer
        thoughts on doing that?
      • ruaok
        but we attempt to find common endings on tracks like "feat. XXXX"
      • and if we remove them and get a match, it's a low- or medium-quality match.
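The "common endings on tracks" stripping described above could look something like this minimal sketch; `strip_feat_suffix` is a hypothetical name, not the mapper's actual function:

```python
import re

# Hypothetical sketch: strip a trailing "feat. <artist>" clause (bare or
# bracketed) from a track name before retrying the match. Not the
# mapper's actual code.
FEAT_RE = re.compile(r"\s*[\(\[]?\bfeat\.?\s+.*$", re.IGNORECASE)

def strip_feat_suffix(track_name: str) -> str:
    return FEAT_RE.sub("", track_name).strip()
```

If the stripped form matches where the original did not, the mapper would then record it as a lower-quality match, as described above.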
      • lucifer
        makes sense
      • ruaok
        well, I wouldn't spend a lot of time thinking about join phrases. the extra guff is usually search-engine stop words that get filtered out.
      • lucifer
        ruaok: running the steps the mbid mapper goes through manually, i see typesense find the mbid match but evaluate_hit discards it.
      • ruaok
        sounds about right.
      • picking the right threshold for such an operation is tricky to say the least.
      • lucifer
      • this is the original comparison without detuning
      • ruaok
        yeah.
      • I wonder if there should be a detune that removes everything past the first comma.
      • from the artist name.
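A minimal sketch of the proposed comma detune; `detune_artist_comma` is a hypothetical name:

```python
# Sketch of the proposed detuning: keep only the part of the artist name
# before the first comma, falling back to the full name if that part
# would be empty. Hypothetical helper, not the mapper's actual code.
def detune_artist_comma(artist: str) -> str:
    head = artist.split(",", 1)[0].strip()
    return head if head else artist
```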
      • lucifer
        uh, that looks wrong. i probably messed something up; the artist should not be The River Bend.
      • i'll retry more carefully but is this code working as expected https://github.com/metabrainz/listenbrainz-serv...
      • detune_query_string('The River Bend') returns ''
      • ruaok
        we need to add a lot more detunings, I think.
      • lucifer
      • should it just return the whole thing back if there's no cruft?
      • ruaok
        it should return the whole thing if there is no cruft, returning "" is wrong
      • well, but clearly that is what the code does. let me look closer.
      • lucifer
        👍
      • ruaok
        yeah, that seems wrong.
      • it should return the whole string. time to add more detailed test cases that catch this.
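One way the fixed behavior could look (return the input unchanged when there is no cruft, never `''`). This is a hedged reconstruction, not the real `detune_query_string` in listenbrainz-server:

```python
import re

# Hedged reconstruction of the bug above: a detuner that strips trailing
# bracketed cruft, with an explicit fallback to the original string when
# stripping leaves nothing, instead of returning ''.
CRUFT_RE = re.compile(r"\s*[\(\[].*?[\)\]]\s*$")

def detune_query_string(text: str) -> str:
    stripped = CRUFT_RE.sub("", text).strip()
    # Guard against the "returns ''" bug for inputs like 'The River Bend'.
    return stripped if stripped else text
```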
      • lucifer
        cool, let's fix that bug. it should hopefully improve some matches.
      • ruaok
        possibly.
      • I'll dive into that once the migration doc for the afternoon is worked up.
      • alastairp
        I'm copying a 500gb file over a usb2 hard drive enclosure
      • I now remember that usb2 max speed is 400mbps
      • my home internet connection is faster
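A back-of-the-envelope check of the copy above, using the ~400 Mbit/s figure quoted in the chat (USB 2.0's nominal signalling rate is 480 Mbit/s, and sustained real-world throughput is lower still):

```python
# Rough transfer-time estimate for the 500 GB copy mentioned above,
# using the ~400 Mbit/s figure quoted in the chat.
file_gb = 500     # file size in gigabytes
link_mbps = 400   # assumed link speed in megabits per second
seconds = file_gb * 8 * 1000 / link_mbps  # GB -> gigabits -> megabits
hours = seconds / 3600
print(f"~{hours:.1f} hours")  # ~2.8 hours at that rate
```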
      • lucifer
        another thing i was thinking: we detune only query strings, not the mb data (makes sense because mb data is mostly correct), but Levenshtein distance is symmetric so it should not matter anyway, right?
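The symmetry point is easy to verify with a textbook edit-distance implementation (illustrative only; the mapper presumably uses a library):

```python
def levenshtein(a: str, b: str) -> int:
    # Textbook dynamic-programming edit distance, included only to show
    # that the measure is symmetric.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

Since `levenshtein(a, b) == levenshtein(b, a)`, detuning only the query side changes which string pairs get compared, not the distance of any given pair.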
      • ruaok
        lucifer: if you have more examples that the mapping should match, but doesn't, let me know. I'll add them to the tests
      • lucifer
        sure will do
      • ruaok
        detuning MB data is intended to catch the subculture example
      • I think that is still valid.
      • lucifer
        uh right, i got confused.
      • ruaok
        alastairp: lucifer zas : how do we feel about doing backups before today's migration? In theory today's operations are a little less error-prone, but still. what should our level of backups be today?
      • lucifer: this shit does get confusing.
      • lucifer
        thinking again. yes, i think another step that detunes the MB data after the detuned query string has no match, and assigns a low match quality in that case, sounds good.
      • indeed 😓
      • alastairp
        ruaok: incremental backup and playlist/mapping backup took less than 5 minutes, I don't have a problem doing them again just in case
      • ruaok
        let's gather more cases that didn't match and see if we can make a prioritized list of things to add
      • lucifer
        +1 on both
      • ruaok
        lucifer: we didn't happen to run a full dump last night, did we?
      • lucifer
        let me check
      • alastairp
        oh, or more specifically, there isn't a full dump running right now, is there?
      • ruaok
        even better question.
      • zas
        ruaok: we should be ready for the worst, so back up whatever is needed
      • lucifer
        uh no..
      • we are in the middle of one
      • ruaok
        zas: yeah, ok.
      • lucifer
      • won't be done anytime soon.
      • ruaok
        2016? 4 hours? hmm. could be possible.
      • "won't be done" is different from "done dumping from gaga".
      • if we're done dumping from gaga, we can proceed.
      • (and onto compressing dumps and cp'ing them)
      • lucifer
        yeah, i checked the dump timings from the run on the 11th. it's like
      • and this is the normal listen dump. after this spark listen dump will start
      • ruaok
        crap
      • delay by a day or abort dumps?
      • do we have full incremental dumps since the last full dump?
      • lucifer
        yes
      • ruaok
        (read: could we recover from this, by dumping out incremental dumps and backing up the other tables?)
      • lucifer
        i'd suggest we abort the dumps. deploy alastairp's no listen flag PR. dump all tables except listens fully and do an inc listen dump.
      • ruaok
        I'm ok with that.
      • alastairp
        lucifer: good idea
      • ruaok
        alastairp: ?
      • lucifer
        ok let's do that then.
      • alastairp
        I tested a full end-to-end dump + import of listens, user table, playlist table, mappings table (didn't test stats, but nothing changed there)
      • ruaok
        is that PR misnamed?
      • does it create private and public dumps?