in #metabrainz

1:43 AM
darkstarx has quit
1:44 AM
darkstarx joined the channel
6:02 AM
reosarevok

lucifer or mayhem: can you check the "Statistics for the user have not been calculated" email, when you have some time? :) Not sure whether there's a way to force that or something.
6:05 AM
lucifer

reosarevok: thanks 👍 asked for their username to look into it
6:07 AM
reosarevok

Heh, ok, yeah, seems the email isn't in use in MB so maybe it's a different one in their account
6:14 AM
piwu has quit
6:19 AM
piwu joined the channel
6:32 AM
bitmap, yvanzo: I was wondering, what's the use for $c->user_exists in Perl? We don't use it in JS. Is it faster than just checking $c->user?
6:32 AM
Or are there cases where we set user_exists without loading user?
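For context, the Catalyst::Plugin::Authentication documentation describes `user_exists` as returning true when a user is logged in even if the user has not yet been retrieved from the storage backend. Depending on the store, that can make it cheaper than `user`. Below is a rough Python analogy of that lazy-loading split. The `Context` class, the `session` dict and the `load_user` callback are all hypothetical and are not MusicBrainz code.

```python
from typing import Any, Callable, Dict, Optional


class Context:
    """Hypothetical per-request context that loads the user lazily."""

    def __init__(self, session: Dict[str, Any], load_user: Callable[[int], dict]):
        self.session = session
        self._load_user = load_user  # stands in for a database lookup
        self._user: Optional[dict] = None

    def user_exists(self) -> bool:
        # Cheap: only inspects the session and never touches the user store.
        return "user_id" in self.session

    def user(self) -> Optional[dict]:
        # Expensive on the first call: restores the full user object from the store.
        if self._user is None and self.user_exists():
            self._user = self._load_user(self.session["user_id"])
        return self._user
```

Under this model, `user_exists` only helps when nothing else in the request needs the user object. Once `user` has been called, both checks cost about the same.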
6:33 AM
aerozol

mayhem: I have no idea what it means, but I get those suggestions too, with no results if I search for them
6:58 AM
saturday7 has quit
6:58 AM
saturday79 joined the channel
7:01 AM
tykling has quit
7:07 AM
tykling joined the channel
7:10 AM
alastairp

morning
7:10 AM
https://twitter.com/jherskowitz/status/15875894...
7:14 AM
CatQuest

[22:04] <aerozol> Am I the only person in the world who doesn’t feel okay with getting a Spotify account?
7:14 AM
lmao
7:14 AM
I was stumped because you had to put in a gender
7:14 AM
and I didn't really want to at the time
7:14 AM
now that I've figured myself out I *can't*, I'd have to lie (unless they've stopped using it, have nb as an option, or realised that sometimes people don't want to give away "gender" to something)
7:14 AM
I had one but i never used it, and i can't remember the password
7:15 AM
heh. I think music, as in the artists and such, actually *thrived* because of piracy. I would have *never* heard (or heard of!) most of the music i've later *bought* if it wasn't for piracy/freeblogs/etc
7:16 AM
morn alastairp
7:17 AM
hah. yet another reason I think it's important to include december too :D
7:17 AM
hohohoho
7:18 AM
... wait, so people listen to christmas music from november 1st now? that's ridiculous
7:19 AM
(lol one could simply make an algorithm, especially if one were spotify, since you have all the data about which releases *were* christmas, to just, you know, *exclude* christmas music)
7:20 AM
alastairp

yeah, when I spoke to the echonest about this years ago, they were talking about how identifying christmas music (and kids' music) was really important in order to work out the correct context in which to recommend it
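Here is a minimal sketch of the kind of seasonal filter being described. It assumes you already have a set of IDs known to be christmas music. It also assumes a hypothetical season window running from November 1st through December 31st, which can be adjusted. None of the names below come from any real recommender.

```python
from datetime import date
from typing import Iterable, List, Set


def in_christmas_season(day: date, start_month: int = 11, start_day: int = 1) -> bool:
    """Hypothetical window: from start_month/start_day through December 31st."""
    return (day.month, day.day) >= (start_month, start_day)


def filter_recommendations(candidates: Iterable[str], christmas_ids: Set[str], today: date) -> List[str]:
    """Drop known christmas recordings from recommendations outside the season."""
    if in_christmas_season(today):
        return list(candidates)
    return [c for c in candidates if c not in christmas_ids]
```

This only affects recommendations. Listening statistics can still count every christmas listen.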
7:20 AM
CatQuest

anyway, i say it now and I say it always: data is data. and it *is* interesting data that people play christmas music in december. it's *OK* to include that statistic (it's also ok to exclude that statistic for music recommendations :))
7:20 AM
yea!
7:22 AM
alastairp

I don't know when the switchover date(s) are, but I understand that there are some
7:22 AM
maybe it's nov 1? in north america I had always heard "thanksgiving" (last thurs in nov?)
7:22 AM
CatQuest

yea apparently https://eu.usatoday.com/story/money/2022/11/01/...
7:22 AM
it's just so.. no one does "thanksgiving" but americans, so to the rest of the world it's ???
7:23 AM
anyway I hadn't noticed Spotify "Wrapped" before last year, when LB did a thing and people here kept talking about it :D
7:24 AM
I'm happy we also did the recap of december in early jan. I think being *later* but *more complete* can be our selling point tbh
7:24 AM
i'd rather have that
7:25 AM
alastairp

well, i mean - if americans use thanksgiving as an informal "start of christmas" indicator then that's fine
7:25 AM
CatQuest

sure!
7:25 AM
alastairp

I just started using it because it's an easy to identify part of the year
7:25 AM
anyway
7:25 AM
CatQuest

i mean.. for you. i have no idea when "thanksgiving" is :D
7:25 AM
alastairp -> officebrainz
7:25 AM
for me, witches' thanksgiving is the autumn equinox
7:25 AM
:D
7:25 AM
alastairp

sure, I've had plenty of exposure to US friends and culture, so...
7:26 AM
CatQuest

yep :)
8:09 AM
alastairp

hi Pratha-Fish, good luck with your exams! 🙊
8:09 AM
when will they start instead?
8:21 AM
piwu1 joined the channel
8:23 AM
piwu has quit
8:23 AM
piwu1 is now known as piwu
9:37 AM
mayhem

alastairp: officially thanksgiving in the US is the 4th thursday of Nov.
9:44 AM
alastairp

is it possible to have 5 thursdays in november?
9:44 AM
oh yes, in fact next year the 30th is the 5th thursday
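The "4th Thursday" rule is easy to compute. The sketch below uses only the standard library and also lists every Thursday in November, which shows when a 5th one occurs.

```python
import calendar
import datetime
from typing import List


def thanksgiving(year: int) -> datetime.date:
    """US Thanksgiving: the 4th Thursday of November."""
    first = datetime.date(year, 11, 1)
    # Days from November 1st to the first Thursday (calendar.THURSDAY == 3).
    offset = (calendar.THURSDAY - first.weekday()) % 7
    # Three more weeks after the first Thursday gives the 4th one.
    return first + datetime.timedelta(days=offset + 21)


def november_thursdays(year: int) -> List[int]:
    """Day numbers of every Thursday in November (November has 30 days)."""
    return [d for d in range(1, 31)
            if datetime.date(year, 11, d).weekday() == calendar.THURSDAY]
```

For 2023, `november_thursdays(2023)` gives `[2, 9, 16, 23, 30]`, so Thanksgiving falls on the 23rd and the 30th is the extra 5th Thursday.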
10:23 AM
atj

yvanzo: did you work out what caused the random SOLR slowdown on Monday?
10:24 AM
piwu7 joined the channel
10:25 AM
piwu has quit
10:25 AM
piwu7 is now known as piwu
10:28 AM
alastairp

hi lucifer, I have some questions
10:29 AM
https://github.com/metabrainz/listenbrainz-serv... I don't understand what you mean here
10:29 AM
https://github.com/metabrainz/listenbrainz-serv... did this change since I wrote it?
10:34 AM
lucifer

alastairp: hi! i meant that the canonical data tables and, as a consequence, the typesense index are built from the musicbrainz replica on the aretha server.
10:34 AM
alastairp

oh right. "the json dump database", not "the json dump"
10:34 AM
lucifer

also we have 2 mapping containers, one mbid-mapping-writer-prod and mbid-mapping.
10:35 AM
the latter has the cron jobs to rebuild those indexes and canonical tables
10:36 AM
alastairp

"these indexes" - the mbid_mapping?
10:42 AM
lucifer

whereas the former runs the process which actually consumes listens and does the mapping, utilising the typesense index and canonical tables
10:42 AM
alastairp

yeah, right
10:42 AM
lucifer

"these indexes" - the typesense mbid mapping index
10:42 AM
alastairp

which Dockerfiles build each of these?
10:43 AM
lucifer

the writer container uses the same dockerfile as web container. so dockerfile at root of the repo
10:43 AM
the listenbrainz-mbid-mapping one uses the dockerfile from listenbrainz/mbid_mapping dir
10:45 AM
the recheck has indeed been added since then
10:45 AM
https://github.com/metabrainz/listenbrainz-serv...
10:46 AM
alastairp

how does this work?
10:46 AM
Initially if a match is not found, check again a day later. After that retry, every 2 * (NOW() - last_updated) INTERVAL later till no match is found.
10:46 AM
until _no_ match is found?
10:46 AM
this is in the mapping writer container?
10:47 AM
monkey

https://github.com/metabrainz/listenbrainz-serv...
10:47 AM
Some poetry
10:47 AM
lucifer

yes, the logic is: first, no match. recheck after 1 day. still no match? then recheck after 2, then 4, 8, 16, 32. max is 32: if there's still no match, recheck again every 32 days
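A minimal sketch of the backoff schedule as lucifer describes it: 1 day, then doubling, capped at 32 days. The constant and function names are hypothetical. The sketch models the doubling described in chat, not the SQL expression quoted from the code.

```python
from datetime import timedelta
from typing import List, Optional

INITIAL_RECHECK = timedelta(days=1)   # first retry a day after a failed match
MAX_RECHECK = timedelta(days=32)      # cap on the retry interval


def next_recheck_interval(previous: Optional[timedelta]) -> timedelta:
    """Double the previous interval, starting at 1 day and capping at 32 days."""
    if previous is None:
        return INITIAL_RECHECK
    return min(previous * 2, MAX_RECHECK)


def schedule(attempts: int) -> List[int]:
    """Interval in days before each of the next `attempts` rechecks."""
    days, prev = [], None
    for _ in range(attempts):
        prev = next_recheck_interval(prev)
        days.append(prev.days)
    return days
```

`schedule(8)` gives `[1, 2, 4, 8, 16, 32, 32, 32]`.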
10:48 AM
note that this is not a cron job. it works like if a listen with this msid comes again then recheck.
10:48 AM
before that pr we never rechecked a msid which was present in the mbid mapping table, regardless of whether there was a match or not.
10:49 AM
what happens now is that when listens come in, we check whether their msids are matched already or not.
10:49 AM
if it's not matched, then we check when was the last time we attempted a match.
10:50 AM
if the current time is more than the check again time in the table do the recheck otherwise ignore the msid.
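The per-listen decision described above can be sketched as a filter over incoming msids. In this sketch, an in-memory dict stands in for the mapping table, and `MappingRow` with its `check_again` field is a hypothetical shape, not the real schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass
class MappingRow:
    matched: bool
    check_again: Optional[datetime]  # earliest time a failed match may be retried


def msids_to_process(incoming: Iterable[str], table: Dict[str, MappingRow], now: datetime) -> List[str]:
    """Return msids from an incoming batch that need a (re)match attempt."""
    todo = []
    for msid in dict.fromkeys(incoming):  # de-duplicate while keeping order
        row = table.get(msid)
        if row is None:
            todo.append(msid)            # never seen before: first match attempt
        elif row.matched:
            continue                     # already matched: nothing to do
        elif row.check_again is not None and now > row.check_again:
            todo.append(msid)            # unmatched and the recheck time has passed
        # otherwise: unmatched but not due yet, so ignore it
    return todo
```

Nothing runs on a timer here. An msid is only reconsidered when a listen carrying it arrives again.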
10:51 AM
alastairp

how do we know if a listen with this msid comes in again?
10:51 AM
is it triggered from the listen writer?
10:51 AM
lucifer

it's a bit convoluted. if it helps, i can try to write this up with some examples in the docs.
10:51 AM
alastairp

sure, how about I push my changes as they are and you fill in this part?
10:52 AM
lucifer

the mbid mapping writer container consumes the unique listens queue
10:52 AM
the timescale writer writes all listens it inserts in the db to that queue
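A toy model of that hand-off is shown below. Python's in-process `queue.Queue` stands in for the real message queue. The duplicate-detection key `(user, listened_at, track)` and both function names are hypothetical. The point is only that the mapper sees listens the writer actually inserted.

```python
import queue
from typing import Dict, List, Set, Tuple


def timescale_writer(listens: List[Dict], db: Set[Tuple], out_queue: "queue.Queue[Dict]") -> List[Dict]:
    """Insert new listens into `db` and publish only the inserted ones."""
    inserted = []
    for listen in listens:
        key = (listen["user"], listen["listened_at"], listen["track"])  # hypothetical dedup key
        if key in db:
            continue  # duplicate: not inserted, so not published
        db.add(key)
        inserted.append(listen)
    for listen in inserted:
        out_queue.put(listen)
    return inserted


def mapping_writer_drain(in_queue: "queue.Queue[Dict]") -> List[str]:
    """Consume everything currently on the unique queue and collect msids."""
    msids = []
    while True:
        try:
            listen = in_queue.get_nowait()
        except queue.Empty:
            break
        msids.append(listen["msid"])
    return msids
```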
10:52 AM
sure sounds good
10:53 AM
alastairp

maybe https://listenbrainz.readthedocs.io/en/latest/d... is a bit out of date too?
10:55 AM
BrainzGit

[listenbrainz-server] alastair closed pull request #1996 (master…mapping-docs): Add initial mapping documentation for developers and maintainers https://github.com/metabrainz/listenbrainz-serv...
10:55 AM
alastairp

lucifer: I merged this into LB#2157
10:55 AM
BrainzBot

Add dumps for musicbrainz metadata tables: https://github.com/metabrainz/listenbrainz-serv...
10:55 AM
alastairp

I'm just applying your feedback on that now
10:58 AM
lucifer

sounds good thanks
10:59 AM
alastairp: those architecture docs look up to date to me. what seems outdated to you?
11:00 AM
alastairp

lucifer: just based on your comments about how msids are matched - in the Listen Flow section. if certain conditions when a listen comes in cause processes such as a re-match to happen, then perhaps that should be in the docs
11:00 AM
lucifer

ah ok. i see
11:00 AM
alastairp

I guess "The MBID mapper also consumes from the unique queue and builds a MSID->MBID mapping using these listens." is part of that
11:00 AM
lucifer

that architecture doc is mainly how the listen flows in the system
11:01 AM
recheck can probably go in mapping specific docs
11:01 AM
alastairp

yeah, I'm unsure where the explanation should have gone
11:01 AM
lucifer

but mostly a matter of preference i guess
11:01 AM
alastairp

I was thinking about the developer/maintainer split - who needs to know about this
11:02 AM
lucifer

makes sense. sounds like developer to me.
11:02 AM
alastairp

that being said, we don't really have a development environment for this part of the stack, right?
11:02 AM
lucifer

maintainer is for very specific server-related things, consul or dumps rsync stuff imo.
11:02 AM
yeah true that.
11:02 AM
alastairp

yes, agreed
11:09 AM
lucifer: one more: https://github.com/metabrainz/listenbrainz-serv...
11:11 AM
Pratha-Fish

alastairp: Hi, the exam has been postponed to 14th Nov
11:12 AM
I just have normal classes and practicals till 14th Nov
11:13 AM
*practical exams / writeups
11:13 AM
lucifer

alastairp: just keeping an explicit transaction would be good. otherwise feel free to update as preference
11:13 AM
alastairp

lucifer: thanks
11:14 AM
just testing this again now, perhaps we can deploy on beta and try and make a dump ;)
11:14 AM
Pratha-Fish: excellent!
11:14 AM
I hope you enjoyed the "official" part of SoC!
11:14 AM
Pratha-Fish

Yes haha
11:14 AM
alastairp

as we said before, happy for you to stick around as long as you want
11:14 AM
Pratha-Fish

I'd be happy to stick around!
11:15 AM
alastairp

Pratha-Fish: so, on Monday I started having a play around with your conversion code and came up with a handful of interesting things
11:15 AM
11:16 AM
first of all (and we couldn't have predicted this ahead of time), python 3.11 was released with a bunch of speed improvements
11:17 AM
check out this, for example: https://gist.github.com/alastair/fe8fd0ae0e7a01...
11:17 AM
Pratha-Fish

Oh yes, I've heard it has become at least 10% faster in most cases. Especially with stuff involving for loops
11:17 AM
alastairp

(there are 2 files in that gist)
11:18 AM
in this case, it's almost 2x faster just looping through some mlhd files and counting blank recording rows
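A plain-Python sketch of that kind of loop is shown below. It assumes MLHD rows are tab-separated with the recording MBID in the 4th column, and that the files are gzip-compressed text. Check both against the actual files; the function names are hypothetical and this is not the code in the gist.

```python
import gzip
from typing import Iterable

RECORDING_COLUMN = 3  # assumption: 4th tab-separated field is the recording MBID


def count_blank_recordings(lines: Iterable[str]) -> int:
    """Count rows whose recording MBID field is missing or empty."""
    blank = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= RECORDING_COLUMN or not fields[RECORDING_COLUMN].strip():
            blank += 1
    return blank


def count_blank_in_file(path: str) -> int:
    """Stream one gzip-compressed file line by line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_blank_recordings(f)
```

This is exactly the sort of tight per-line loop that Python 3.11's interpreter speedups target, which fits with the large difference seen here.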
11:18 AM
Pratha-Fish

_W o w_
11:19 AM
That was so unexpected
11:19 AM
alastairp

reload that page, I just uploaded another file to the gist
11:19 AM
stats-pandas-vs-python.txt
11:19 AM
to me this is the even more interesting one - trying to count empty rows in 1000 mlhd files
11:19 AM
with python 3.9, plain python takes ~50 seconds, and pandas ~30
11:20 AM
but with python 3.11, python is just as fast as pandas!
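One way to reproduce a comparison like this is to run the same script under each interpreter (e.g. `python3.9` and `python3.11`) and take the best of several runs, which reduces noise from disk caching. A minimal harness using the standard library's `timeit` is sketched below. The function names are hypothetical.

```python
import sys
import timeit
from typing import Callable


def best_time(func: Callable[[], object], repeat: int = 5, number: int = 1) -> float:
    """Best-of-`repeat` wall time, in seconds, for `number` calls of func."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


def report(label: str, func: Callable[[], object], repeat: int = 5) -> float:
    """Print the timing tagged with the running interpreter's version."""
    version = ".".join(map(str, sys.version_info[:3]))
    t = best_time(func, repeat=repeat)
    print(f"python {version} {label}: {t:.3f}s")
    return t
```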
11:20 AM
Pratha-Fish

!!!
11:20 AM
alastairp

for me this is a really interesting final result
11:21 AM
Pratha-Fish

Wow, and here I was thinking Python 3.11 would only bring ~10% improvements lol
11:21 AM
alastairp

I spent a whole bunch of time experimenting with pandas on monday. it's interesting - I see that it makes some things faster, but honestly I'm unsure what the tradeoff is between time spent learning how to use it, and how much faster plain python code runs (especially with these speed improvements)
11:21 AM
yes, right. it's important to keep in mind that these are very simple changes
11:21 AM
so, I did one more set of experiments, doing the full mlhd conversion process on 1000 files
11:22 AM
and it turns out that my basic loop + dictionaries + sets is basically exactly the same speed as the pandas code that you wrote
11:22 AM
I suspect that's because they are basically the same thing - the dataframe.map function just iterates through the dataframe and does an operation on each row
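To illustrate the "loop + dictionaries + sets" style, here is a hypothetical conversion step. It rewrites each recording MBID through a lookup dict and collects the IDs that had no replacement. The column layout, the `canonical` dict and the choice to keep unmatched IDs unchanged are assumptions for the example. In pandas, `df["recording"].map(canonical)` performs the same dict lookup per value, except that values missing from the dict become NaN.

```python
from typing import Dict, Iterable, List, Set, Tuple


def convert_rows(rows: Iterable[Tuple[int, str]],
                 canonical: Dict[str, str]) -> Tuple[List[Tuple[int, str]], Set[str]]:
    """Map each recording ID through `canonical`, tracking IDs with no entry."""
    converted: List[Tuple[int, str]] = []
    missing: Set[str] = set()
    for timestamp, recording in rows:
        new = canonical.get(recording)
        if new is None:
            missing.add(recording)
            new = recording  # assumption: keep the original ID when there's no mapping
        converted.append((timestamp, new))
    return converted, missing
```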
11:22 AM
Pratha-Fish

That's a certified bruh moment