lucifer or mayhem: can you check the "Statistics for the user have not been calculated" email, when you have some time? :) Not sure whether there's a way to force that or something.
lucifer
reosarevok: thanks 👍 asked for their username to look into it
reosarevok
Heh, ok, yeah, seems the email isn't in use in MB so maybe it's a different one in their account
piwu has quit
piwu joined the channel
bitmap, yvanzo: I was wondering, what's the use for $c->user_exists in Perl? We don't use it in JS. Is it faster than just checking $c->user?
Or are there cases where we set user_exists without loading user?
aerozol
mayhem: I have no idea what it means, but I get those suggestions too, with no results if I search for them
[22:04] <aerozol> Am I the only person in the world who doesn’t feel okay with getting a Spotify account?
lmao
I was stumped because you had to put in a gender
and I didn't really want to at the time
now that I figured myself out I *can't*, I'd have to lie (unless they've stopped using it/have nb as an option/realised that sometimes people don't want to give away "gender" to some thing)
I had one but i never used it, and i can't remember a password
heh. I think music, as in the artists and such, actually *thrived* because of piracy. I would have *never* heard (or heard of!) most of the music i've later *bought* if it wasn't for piracy/freeblogs/etc
morn alastairp
hah. yet another reason I think it's important to include december yo :D
hohohoho
... wait so people listen to christmas music from november 1st now? that's ridiculous
(lol one could simply make an algorithm, especially if one was spotify since you had all the data about what releases *were* christmas) to just. you know, *exclude* christmas music
alastairp
yeah, when I spoke to the echonest about this years ago, they were talking about how identifying christmas music (and kid's music) was really important in order to work out when the correct context to recommend it was
CatQuest
anyway i say it now and I say it always. data is data. and it *is* interesting data that people play christmas music in december. it's *OK* to include that statistic (it's also ok to exclude that statistic for music recommendations :))
yea!
alastairp
I don't know when the switchover date(s) are, but I understand that there are some
maybe it's nov 1? in north america I had always heard "thanksgiving" (last thurs in nov?)
yes the logic is: first no match, recheck after 1 day. still no match, then recheck after 2. then 4, 8, 16, 32. max is 32. if no match is found then recheck after 32 days again
note that this is not a cron job. it works like if a listen with this msid comes again then recheck.
before that pr we never rechecked a msid which was present in the mbid mapping table regardless of whether there was a match or not.
what happens now is that when listens come in, we check whether their msids are matched already or not.
if it's not matched, then we check when was the last time we attempted a match.
if the current time is more than the check again time in the table, do the recheck, otherwise ignore the msid.
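The recheck rule described above can be sketched roughly as follows. This is only an illustration under assumptions: the state is kept in a plain dataclass instead of the mbid mapping table, and the names `MatchState`, `next_interval`, `should_recheck` and `record_failed_attempt` are hypothetical, not taken from the ListenBrainz code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_INTERVAL_DAYS = 32  # the wait stops doubling once it reaches 32 days


@dataclass
class MatchState:
    """Hypothetical per-msid record; the real table layout may differ."""
    matched: bool
    last_checked: datetime
    interval_days: int = 1  # after the first failed match, wait 1 day


def next_interval(interval_days: int) -> int:
    """Double the wait (1, 2, 4, 8, 16, 32), capped at 32 days."""
    return min(interval_days * 2, MAX_INTERVAL_DAYS)


def should_recheck(state: Optional[MatchState], now: datetime) -> bool:
    """Called when a listen with this msid arrives; there is no cron job."""
    if state is None:
        return True  # msid never seen before: attempt a first match
    if state.matched:
        return False  # already matched, nothing to do
    # Unmatched: only retry once the "check again" time has passed.
    return now > state.last_checked + timedelta(days=state.interval_days)


def record_failed_attempt(state: MatchState, now: datetime) -> None:
    """After another failed match, note the attempt and back off further."""
    state.last_checked = now
    state.interval_days = next_interval(state.interval_days)
```

Because the check only runs when a listen arrives, an msid that is never listened to again is never rechecked, no matter how much time passes.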
alastairp
how do we know if a listen with this msid comes in again?
is it triggered from the listen writer?
lucifer
it's a bit convoluted. if it helps, i can try to write this up with some examples in the docs.
alastairp
sure, how about I push my changes as they are and you fill in this part?
lucifer
the mbid mapping writer container consumes the unique listens queue
the timescale writer writes all listens it inserts in the db to that queue
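A toy model of that flow, to show how a repeated msid reaches the mapper again. It is a sketch under assumptions: a standard-library `Queue` stands in for the unique listens queue, a set stands in for the listens table, and the function names and the `(user, listened_at, msid)` uniqueness key are hypothetical.

```python
from queue import Queue

unique_queue: Queue = Queue()  # stand-in for the unique listens queue
stored = set()                 # stand-in for the listens table
seen_msids = []                # msids the mapping writer was asked to look at


def timescale_write(listens):
    """Insert listens and forward only the ones that were actually inserted."""
    for listen in listens:
        # Assumed uniqueness key; the real constraint may differ.
        key = (listen["user"], listen["listened_at"], listen["msid"])
        if key not in stored:
            stored.add(key)
            unique_queue.put(listen)


def mapping_writer_drain():
    """Consume the queue; each msid seen here is a chance to (re)check a match."""
    while not unique_queue.empty():
        listen = unique_queue.get()
        seen_msids.append(listen["msid"])
```

In this model, listening to the same track at a later time produces a new row, so its msid passes through the queue again and the mapper gets another chance to apply its recheck rule.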
alastairp: that architecture doc looks up to date to me. what seems outdated to you?
alastairp
lucifer: just based on your comments about how msids are matched - in the Listen Flow section. if certain conditions when a listen comes in cause processes to happen, such as a re-match, then perhaps that should be in the docs
lucifer
ah ok. i see
alastairp
I guess "The MBID mapper also consumes from the unique queue and builds a MSID->MBID mapping using these listens." is part of that
lucifer
that architecture doc is mainly how the listen flows in the system
recheck can probably go in mapping specific docs
alastairp
yeah, I'm unsure where the explanation should have gone
lucifer
but mostly a matter of preference i guess
alastairp
I was thinking about the developer/maintainer split - who needs to know about this
lucifer
makes sense. sounds like developer to me.
alastairp
that being said, we don't really have a development environment for this part of the stack, right?
lucifer
maintainer is very specific things server related, consul or dumps rsync stuff imo.
Oh yes, I've heard it has become at least 10% faster in most cases. Especially with stuff involving for loops
alastairp
(there are 2 files in that gist)
in this case, it's almost 2x faster just looping through some mlhd files and counting blank recording rows
Pratha-Fish
_W o w_
That was so unexpected
alastairp
reload that page, I just uploaded another file to the gist
stats-pandas-vs-python.txt
to me this is the even more interesting one - trying to count empty rows in 1000 mlhd files
with python 3.9, python is ~50 seconds, and pandas 30
but with python 3.11, python is just as fast as pandas!
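For reference, the plain-Python side of such a comparison could look something like the sketch below. It is not the code from the gist. It assumes tab-separated rows with the recording MBID in the fourth column, and `count_blank_recordings` and `RECORDING_COLUMN` are hypothetical names.

```python
import csv
import io

RECORDING_COLUMN = 3  # assumed position of the recording MBID (0-based)


def count_blank_recordings(lines) -> int:
    """Count rows whose recording MBID field is missing or empty."""
    blank = 0
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip completely empty lines
        if len(row) <= RECORDING_COLUMN or not row[RECORDING_COLUMN].strip():
            blank += 1
    return blank


# Made-up sample rows: timestamp, artist, release, recording.
sample = io.StringIO(
    "\t".join(["1", "artist-a", "release-a", "rec-a"]) + "\n"
    + "\t".join(["2", "artist-a", "release-a", ""]) + "\n"
    + "\t".join(["3", "artist-b", "", "rec-b"]) + "\n"
    + "\t".join(["4", "artist-b", "release-b", ""]) + "\n"
)
blank_count = count_blank_recordings(sample)
```

The function takes any iterable of lines, so it can be fed an open file handle as well as the in-memory sample used here.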
Pratha-Fish
!!!
alastairp
for me this is a really interesting final result
Pratha-Fish
Wow, and here I was thinking Python 3.11 would only bring ~10% improvements lol
alastairp
I spent a whole bunch of time experimenting with pandas on monday. it's interesting - I see that it makes some things faster, but honestly I'm unsure what the tradeoff is between time spent learning how to use it, and how much faster plain python code runs (especially with these speed improvements)
yes, right. it's important to keep in mind that these are very simple changes
so, I did one more set of experiments, doing the full mlhd conversion process on 1000 files
and it turns out that my basic loop + dictionaries + sets is basically exactly the same speed as the pandas code that you wrote
I suspect that's because they are basically the same thing - the dataframe.map function just iterates through the dataframe and does an operation on each row