lucifer or mayhem: can you check the "Statistics for the user have not been calculated" email, when you have some time? :) Not sure whether there's a way to force that or something.
2022-11-02 30638, 2022
lucifer
reosarevok: thanks 👍 asked for their username to look into it
2022-11-02 30610, 2022
reosarevok
Heh, ok, yeah, seems the email isn't in use in MB so maybe it's a different one in their account
2022-11-02 30610, 2022
piwu has quit
2022-11-02 30646, 2022
piwu joined the channel
2022-11-02 30639, 2022
reosarevok
bitmap, yvanzo: I was wondering, what's the use for $c->user_exists in Perl? We don't use it in JS. Is it faster than just checking $c->user?
2022-11-02 30650, 2022
reosarevok
Or are there cases where we set user_exists without loading user?
2022-11-02 30622, 2022
aerozol
mayhem: I have no idea what it means, but I get those suggestions too, with no results if I search for them
[22:04] <aerozol> Am I the only person in the world who doesn’t feel okay with getting a Spotify account?
2022-11-02 30645, 2022
CatQuest
lmao
2022-11-02 30645, 2022
CatQuest
I was stumped because you had to put in a gender
2022-11-02 30645, 2022
CatQuest
and I didn't really want to at the time
2022-11-02 30645, 2022
CatQuest
now that I've figured myself out I *can't*, I'd have to lie (unless they've stopped using it/have nb as an option/realised that sometimes people don't want to give away "gender" to some thing)
2022-11-02 30647, 2022
CatQuest
I had one but I never used it, and I can't remember the password
2022-11-02 30653, 2022
CatQuest
heh. I think music, as in the artists and such, actually *thrived* because of piracy. I would have *never* heard (or heard of!) most of the music i've later *bought* if it wasn't for piracy/freeblogs/etc
2022-11-02 30636, 2022
CatQuest
morn alastairp
2022-11-02 30641, 2022
CatQuest
hah. yet another reason I think it's important to include december yo :D
2022-11-02 30643, 2022
CatQuest
hohohoho
2022-11-02 30648, 2022
CatQuest
... wait so people listen to christmas music from november 1st now? that's ridiculous
2022-11-02 30627, 2022
CatQuest
(lol one could simply make an algorithm, especially if one was spotify since you had all the data about what releases *were* christmas) to just, you know, *exclude* christmas music
2022-11-02 30615, 2022
alastairp
yeah, when I spoke to the echonest about this years ago, they were talking about how identifying christmas music (and kid's music) was really important in order to work out when the correct context to recommend it was
2022-11-02 30624, 2022
CatQuest
anyway I say it now and I say it always. data is data. and it *is* interesting data that people play christmas music in december. it's *OK* to include that statistic (it's also ok to exclude that statistic for music recommendations :))
2022-11-02 30635, 2022
CatQuest
yea!
2022-11-02 30612, 2022
alastairp
I don't know when the switchover date(s) are, but I understand that there are some
2022-11-02 30612, 2022
alastairp
maybe it's nov 1? in north america I had always heard "thanksgiving" (last thurs in nov?)
yes, the logic is: first no match, recheck after 1 day. still no match, then recheck after 2 days. then 4, 8, 16, 32. max is 32. if no match is found by then, recheck after 32 days again
2022-11-02 30612, 2022
lucifer
note that this is not a cron job. it works like if a listen with this msid comes again then recheck.
2022-11-02 30653, 2022
lucifer
before that pr we never rechecked a msid which was present in the mbid mapping table, regardless of whether there was a match or not.
2022-11-02 30620, 2022
lucifer
what happens now is that when listens come in, we check whether their msids are matched already or not.
2022-11-02 30649, 2022
lucifer
if it's not matched, then we check when we last attempted a match.
2022-11-02 30631, 2022
lucifer
if the current time is later than the check-again time in the table, do the recheck, otherwise ignore the msid.
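A minimal sketch of the recheck schedule described above, not the actual ListenBrainz code. The function names, the `check_again_at` argument, and the use of `None` for "never attempted" are assumptions made for illustration; only the 1, 2, 4, 8, 16, 32 day sequence and the 32 day cap come from the chat.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_INTERVAL_DAYS = 32  # cap mentioned in the chat


def next_interval_days(previous_days: Optional[int]) -> int:
    """Return the wait before the next recheck: 1 day after the first
    failed match, then double each time, never exceeding the cap."""
    if previous_days is None:
        return 1
    return min(previous_days * 2, MAX_INTERVAL_DAYS)


def should_recheck(matched: bool,
                   check_again_at: Optional[datetime],
                   now: datetime) -> bool:
    """Decide what to do when a listen with a known msid arrives.

    This is evaluated only when a listen comes in, not on a timer.
    """
    if matched:
        return False          # already mapped, nothing to do
    if check_again_at is None:
        return True           # hypothetical: no attempt recorded yet
    return now > check_again_at


def schedule_after_failed_match(previous_days: Optional[int],
                                now: datetime) -> tuple[int, datetime]:
    """Compute the new interval and check-again time after a failed match."""
    days = next_interval_days(previous_days)
    return days, now + timedelta(days=days)
```

Because the check runs only when a listen arrives, an msid that nobody listens to again is never rechecked, however much time has passed.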
2022-11-02 30635, 2022
alastairp
how do we know if a listen with this msid comes in again?
2022-11-02 30640, 2022
alastairp
is it triggered from the listen writer?
2022-11-02 30643, 2022
lucifer
it's a bit convoluted. if it helps, i can try to write this up with some examples in the docs.
2022-11-02 30659, 2022
alastairp
sure, how about I push my changes as they are and you fill in this part?
2022-11-02 30608, 2022
lucifer
the mbid mapping writer container consumes the unique listens queue
2022-11-02 30625, 2022
lucifer
the timescale writer writes all listens it inserts in the db to that queue
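A rough sketch of that flow, assuming an in-process `queue.Queue` as a stand-in for the real message queue. The function names, the listen dictionary fields, and the duplicate check are hypothetical; the chat only states that the timescale writer forwards the listens it inserts and that the mbid mapping writer consumes them.

```python
from queue import Queue


def timescale_writer(incoming, stored, unique_queue):
    """Insert listens and forward only the ones that were actually inserted.

    `stored` is a set standing in for the database; the dedup key below
    is an assumption made for this sketch.
    """
    for listen in incoming:
        key = (listen["user"], listen["listened_at"], listen["msid"])
        if key in stored:
            continue              # duplicate: not inserted, not forwarded
        stored.add(key)
        unique_queue.put(listen)


def mbid_mapping_writer(unique_queue, handle_msid):
    """Consume the unique listens queue and hand each msid to the mapper.

    `handle_msid` is where the matched / check-again decision would run.
    """
    while not unique_queue.empty():
        listen = unique_queue.get()
        handle_msid(listen["msid"])
```

This is why a recheck can be triggered by an incoming listen: each newly inserted listen passes its msid through the mapping writer.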
alastairp: those architecture docs look up to date to me. what seems outdated to you?
2022-11-02 30616, 2022
alastairp
lucifer: just based on your comments about how msids are matched - in the Listen Flow section. if certain conditions when a listen comes in cause processes to happen, such as a re-match, then perhaps that should be in the docs
2022-11-02 30631, 2022
lucifer
ah ok. i see
2022-11-02 30641, 2022
alastairp
I guess "The MBID mapper also consumes from the unique queue and builds a MSID->MBID mapping using these listens." is part of that
2022-11-02 30651, 2022
lucifer
that architecture doc is mainly how the listen flows in the system
2022-11-02 30602, 2022
lucifer
recheck can probably go in mapping specific docs
2022-11-02 30617, 2022
alastairp
yeah, I'm unsure where the explanation should have gone
2022-11-02 30620, 2022
lucifer
but mostly a matter of preference i guess
2022-11-02 30641, 2022
alastairp
I was thinking about the developer/maintainer split - who needs to know about this
2022-11-02 30613, 2022
lucifer
makes sense. sounds like developer to me.
2022-11-02 30644, 2022
alastairp
that being said, we don't really have a development environment for this part of the stack, right?
2022-11-02 30645, 2022
lucifer
maintainer is for very specific server-related things, consul or dumps rsync stuff imo.
Oh yes, I've heard it has become at least 10% faster in most cases. Especially with stuff involving for loops
2022-11-02 30650, 2022
alastairp
(there are 2 files in that gist)
2022-11-02 30610, 2022
alastairp
in this case, it's almost 2x faster just looping through some mlhd files and counting blank recording rows
2022-11-02 30641, 2022
Pratha-Fish
_W o w_
2022-11-02 30607, 2022
Pratha-Fish
That was so unexpected
2022-11-02 30614, 2022
alastairp
reload that page, I just uploaded another file to the gist
2022-11-02 30619, 2022
alastairp
stats-pandas-vs-python.txt
2022-11-02 30637, 2022
alastairp
to me this is the even more interesting one - trying to count empty rows in 1000 mlhd files
2022-11-02 30647, 2022
alastairp
with python 3.9, python is ~50 seconds, and pandas 30
2022-11-02 30601, 2022
alastairp
but with python 3.11, python is just as fast as pandas!
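For reference, the plain-Python side of a comparison like this might look roughly like the sketch below. It is not the code from the gist. It assumes tab-separated rows with the recording MBID in the fourth column, and the function name and column index are hypothetical.

```python
import csv

RECORDING_COLUMN = 3  # assumption: timestamp, artist, release, recording


def count_blank_recordings(lines) -> int:
    """Count rows whose recording column is empty or missing.

    `lines` is any iterable of text lines, for example an open file.
    """
    blank = 0
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip completely empty lines
        if len(row) <= RECORDING_COLUMN or row[RECORDING_COLUMN] == "":
            blank += 1
    return blank


sample = [
    "1000\tartist-1\trelease-1\trecording-1\n",
    "1001\tartist-1\trelease-1\t\n",
    "1002\tartist-2\t\t\n",
]
print(count_blank_recordings(sample))  # prints 2
```

A loop like this is dominated by interpreter overhead, which is the kind of code the Python 3.11 speedups target.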
2022-11-02 30606, 2022
Pratha-Fish
!!!
2022-11-02 30629, 2022
alastairp
for me this is a really interesting final result
2022-11-02 30611, 2022
Pratha-Fish
Wow, and here I was thinking Python 3.11 would only bring ~10% improvements lol
2022-11-02 30625, 2022
alastairp
I spent a whole bunch of time experimenting with pandas on monday. it's interesting - I see that it makes some things faster, but honestly I'm unsure what the tradeoff is between time spent learning how to use it, and how much faster plain python code runs (especially with these speed improvements)
2022-11-02 30635, 2022
alastairp
yes, right. it's important to keep in mind that these are very simple changes
2022-11-02 30653, 2022
alastairp
so, I did one more set of experiments, doing the full mlhd conversion process on 1000 files
2022-11-02 30619, 2022
alastairp
and it turns out that my basic loop + dictionaries + sets is basically exactly the same speed as the pandas code that you wrote
2022-11-02 30652, 2022
alastairp
I suspect that's because they are basically the same thing - the dataframe.map function just iterates through the dataframe and does an operation on each row