#metabrainz

      • darkstarx has quit
      • darkstarx joined the channel
      • reosarevok
        lucifer or mayhem: can you check the "Statistics for the user have not been calculated" email, when you have some time? :) Not sure whether there's a way to force that or something.
      • lucifer
        reosarevok: thanks 👍 asked for their username to look into it
      • reosarevok
        Heh, ok, yeah, seems the email isn't in use in MB so maybe it's a different one in their account
      • piwu has quit
      • piwu joined the channel
      • bitmap, yvanzo: I was wondering, what's the use for $c->user_exists in Perl? We don't use it in JS. Is it faster than just checking $c->user?
      • Or are there cases where we set user_exists without loading user?
      • aerozol
        mayhem: I have no idea what it means, but I get those suggestions too, with no results if I search for them
      • saturday7 has quit
      • saturday79 joined the channel
      • tykling has quit
      • tykling joined the channel
      • alastairp
        morning
      • CatQuest
        [22:04] <aerozol> Am I the only person in the world who doesn’t feel okay with getting a Spotify account?
      • lmao
      • I was stumped because you had to put in a gender
      • and I didn't really want to at the time
      • now that I figured myself out I *can't*, I'd have to lie (unless they've stopped using it/have nb as an option/realised that sometimes people don't want to give away "gender" to some thing)
      • I had one but I never used it, and I can't remember the password
      • heh. I think music, as in the artists and such, actually *thrived* because of piracy. I would have *never* heard (or heard of!) most of the music i've later *bought* if it wasn't for piracy/freeblogs/etc
      • morn alastairp
      • hah. yet another reason I think it's important to include december yo :D
      • hohohoho
      • ... wait so people listen to christmas music from november 1st now? that's ridiculous
      • (lol one could simply make an algorithm, especially if one was spotify, since you'd have all the data about which releases *were* christmas) to just, you know, *exclude* christmas music
      • alastairp
        yeah, when I spoke to the echonest about this years ago, they were talking about how identifying christmas music (and kid's music) was really important in order to work out when the correct context to recommend it was
      • CatQuest
        anyway i say it now and I say it always. data is data. and it *is* interesting data that people play christmas music in december. it's *OK* to include that statistic (it's also ok to exclude that statistic for music recommendations :))
      • yea!
      • alastairp
        I don't know when the switchover date(s) are, but I understand that there are some
      • maybe it's nov 1? in north america I had always heard "thanksgiving" (last thurs in nov?)
      • CatQuest
      • it's just so.. no one does "thanksgiving" but americans. so it's like to the rest of the world it's ???
      • anyway I hadn't noticed spotify "wrapped" before last year when LB did a thing and people here kept talking about it :D
      • I'm happy we also did the recap of december in early jan too. I think being *later* but *more complete* can be our selling point tbh
      • i'd rather have that
      • alastairp
        well, i mean - if americans use thanksgiving as an informal "start of christmas" indicator then that's fine
      • CatQuest
        sure!
      • alastairp
        I just started using it because it's an easy to identify part of the year
      • anyway
      • CatQuest
        i mean.. for you. i have no idea when "thanksgiving" is :D
      • alastairp -> officebrainz
      • for me witches thanksgiving is the autumn equinox
      • :D
      • alastairp
        sure, I've had plenty of exposure to US friends and culture, so...
      • CatQuest
        yep :)
      • alastairp
        hi Pratha-Fish, good luck with your exams! 🙊
      • when will they start instead?
      • piwu1 joined the channel
      • piwu has quit
      • piwu1 is now known as piwu
      • mayhem
        alastairp: officially thanksgiving in the US is the 4th thursday of Nov.
      • alastairp
        is it possible to have 5 thursdays in november?
      • oh yes, in fact next year the 30th is the 5th thursday
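
The "4th Thursday" rule and the question of a 5th Thursday can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and 2023 is used simply as an example of a year whose November has five Thursdays.

```python
from datetime import date

THURSDAY = 3  # date.weekday() counts Monday as 0, so Thursday is 3


def thursdays_in_november(year):
    """Return every Thursday in November of the given year, in order."""
    return [
        date(year, 11, day)
        for day in range(1, 31)  # November has 30 days
        if date(year, 11, day).weekday() == THURSDAY
    ]


def us_thanksgiving(year):
    """US Thanksgiving is the 4th Thursday of November (not always the last)."""
    return thursdays_in_november(year)[3]


if __name__ == "__main__":
    thursdays = thursdays_in_november(2023)
    print(len(thursdays), "Thursdays, the last one on", thursdays[-1])
    print("Thanksgiving falls on", us_thanksgiving(2023))
```

When the month has five Thursdays, Thanksgiving is a week before the last one, which is why "last Thursday in November" is only an approximation.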
      • atj
        yvanzo: did you work out what caused the random SOLR slowdown on Monday?
      • piwu7 joined the channel
      • piwu has quit
      • piwu7 is now known as piwu
      • alastairp
        hi lucifer, I have some questions
      • https://github.com/metabrainz/listenbrainz-serv... I don't understand what you mean here
      • https://github.com/metabrainz/listenbrainz-serv... did this change since I wrote it?
      • lucifer
        alastairp: hi! i meant that the canonical data tables, and as a consequence the typesense index, are built from the musicbrainz replica on the aretha server.
      • alastairp
        oh right. "the json dump database", not "the json dump"
      • lucifer
        also we have 2 mapping containers, one mbid-mapping-writer-prod and mbid-mapping.
      • the latter has the cron jobs to rebuild those indexes and canonical tables
      • alastairp
        "these indexes" - the mbid_mapping?
      • lucifer
        whereas the former runs the process which actually consumes listens and does the mapping utilising the typesense index and canonical tables
      • alastairp
        yeah, right
      • lucifer
        "these indexes" - the typesense mbid mapping index
      • alastairp
        which Dockerfiles build each of these?
      • lucifer
        the writer container uses the same dockerfile as web container. so dockerfile at root of the repo
      • the listenbrainz-mbid-mapping one uses the dockerfile from listenbrainz/mbid_mapping dir
      • the recheck has indeed been added since then
      • alastairp
        how does this work?
      • Initially if a match is not found, check again a day later. After that retry, every 2 * (NOW() - last_updated) INTERVAL later till no match is found.
      • until _no_ match is found?
      • this is in the mapping writer container?
      • monkey
      • Some poetry
      • lucifer
        yes the logic is: first no match, recheck after 1 day. still no match, then recheck after 2. then 4, 8, 16, 32. max is 32. if no match is found then, recheck after 32 days again
      • note that this is not a cron job. it works like this: if a listen with this msid comes in again, then recheck.
      • before that pr we never rechecked a msid which was present in the mbid mapping table, regardless of whether there was a match or not.
      • what happens now is that when listens come in, we check whether their msids are matched already or not.
      • if it's not matched, then we check when the last time we attempted a match was.
      • if the current time is more than the check again time in the table, do the recheck, otherwise ignore the msid.
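
The recheck schedule lucifer describes above (1 day, then 2, 4, 8, 16, 32, capped at 32) could be sketched roughly as follows. This is not the actual listenbrainz-server code: the function names, the `matched` flag and the `check_again` timestamp are hypothetical stand-ins for whatever the mbid mapping table really stores.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values taken from the description above; the names are assumptions.
INITIAL_INTERVAL = timedelta(days=1)
MAX_INTERVAL = timedelta(days=32)


def next_interval(previous: Optional[timedelta]) -> timedelta:
    """Double the wait after each failed match, capped at 32 days."""
    if previous is None:
        return INITIAL_INTERVAL
    return min(previous * 2, MAX_INTERVAL)


def should_recheck(matched: bool, check_again: datetime, now: datetime) -> bool:
    """Decide what to do when a listen with an already-seen msid arrives.

    Matched msids are never rechecked; unmatched ones are rechecked only
    once the current time is past the stored check-again time.
    """
    if matched:
        return False
    return now > check_again


if __name__ == "__main__":
    interval = None
    schedule = []
    for _ in range(7):
        interval = next_interval(interval)
        schedule.append(interval.days)
    print(schedule)  # [1, 2, 4, 8, 16, 32, 32]
```

Because nothing runs on a timer, `should_recheck` would only ever be called when a new listen with that msid shows up, so a rarely played msid may wait far longer than its nominal interval.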
      • alastairp
        how do we know if a listen with this msid comes in again?
      • is it triggered from the listen writer?
      • lucifer
        it's a bit convoluted. if it helps, i can try to write this up with some examples in the docs.
      • alastairp
        sure, how about I push my changes as they are and you fill in this part?
      • lucifer
        the mbid mapping writer container consumes the unique listens queue
      • the timescale writer writes all listens it inserts in the db to that queue
      • sure sounds good
      • BrainzGit
        [listenbrainz-server] alastair closed pull request #1996 (master…mapping-docs): Add initial mapping documentation for developers and maintainers https://github.com/metabrainz/listenbrainz-serv...
      • alastairp
        lucifer: I merged this into LB#2157
      • BrainzBot
        Add dumps for musicbrainz metadata tables: https://github.com/metabrainz/listenbrainz-serv...
      • alastairp
        I'm just applying your feedback on that now
      • lucifer
        sounds good thanks
      • alastairp: that architecture doc looks up to date to me. what seems outdated to you?
      • alastairp
        lucifer: just based on your comments about how msids are matched - in the Listen Flow section. if certain conditions when a listen comes in cause processes such as a re-match to happen, then perhaps that should be in the docs
      • lucifer
        ah ok. i see
      • alastairp
        I guess "The MBID mapper also consumes from the unique queue and builds a MSID->MBID mapping using these listens." is part of that
      • lucifer
        that architecture doc is mainly how the listen flows in the system
      • recheck can probably go in mapping specific docs
      • alastairp
        yeah, I'm unsure where the explanation should have gone
      • lucifer
        but mostly a matter of preference i guess
      • alastairp
        I was thinking about the developer/maintainer split - who needs to know about this
      • lucifer
        makes sense. sounds like developer to me.
      • alastairp
        that being said, we don't really have a development environment for this part of the stack, right?
      • lucifer
        maintainer is for very specific server-related things, consul or dumps rsync stuff imo.
      • yeah true that.
      • alastairp
        yes, agreed
      • Pratha-Fish
        alastairp: Hi, the exam has been postponed to 14th Nov
      • I just have normal classes and practicals till 14th Nov
      • *practical exams / writeups
      • lucifer
        alastairp: just keeping an explicit transaction would be good. otherwise feel free to update as preference
      • alastairp
        lucifer: thanks
      • just testing this again now, perhaps we can deploy on beta and try and make a dump ;)
      • Pratha-Fish: excellent!
      • I hope you enjoyed the "official" part of SoC!
      • Pratha-Fish
        Yes haha
      • alastairp
        as we said before, happy for you to stick around as long as you want
      • Pratha-Fish
        I'd be happy to stick around!
      • alastairp
        Pratha-Fish: so, on Monday I started having a play around with your conversion code and came up with a handful of interesting things
      • Pratha-Fish 👀
      • first of all (and we couldn't have predicted this ahead of time), python 3.11 was released with a bunch of speed improvements
      • Pratha-Fish
        Oh yes, I've heard it has become at least 10% faster in most cases. Especially with stuff involving for loops
      • alastairp
        (there are 2 files in that gist)
      • in this case, it's almost 2x faster just looping through some mlhd files and counting blank recording rows
      • Pratha-Fish
        _W o w_
      • That was so unexpected
      • alastairp
        reload that page, I just uploaded another file to the gist
      • stats-pandas-vs-python.txt
      • to me this is the even more interesting one - trying to count empty rows in 1000 mlhd files
      • with python 3.9, python is ~50 seconds, and pandas 30
      • but with python 3.11, python is just as fast as pandas!
      • Pratha-Fish
        !!!
      • alastairp
        for me this is a really interesting final result
      • Pratha-Fish
        Wow, and here I was thinking Python 3.11 would only bring ~10% improvements lol
      • alastairp
        I spent a whole bunch of time experimenting with pandas on monday. it's interesting - I see that it makes some things faster, but honestly I'm unsure what the tradeoff is between time spent learning how to use it, and how much faster plain python code runs (especially with these speed improvements)
      • yes, right. it's important to keep in mind that these are very simple changes
      • so, I did one more set of experiments, doing the full mlhd conversion process on 1000 files
      • and it turns out that my basic loop + dictionaries + sets is basically exactly the same speed as the pandas code that you wrote
      • I suspect that's because they are basically the same thing - the dataframe.map function just iterates through the dataframe and does an operation on each row
      • Pratha-Fish
        That's a certified bruh moment
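
For reference, the plain-Python side of the "count blank recording rows" experiment alastairp mentions could look something like the sketch below. It is not the code from the gist. It assumes each mlhd file is tab-separated with the recording MBID in the fourth column, and the function name and sample rows are invented for illustration.

```python
import csv
import io

RECORDING_COLUMN = 3  # assumed position of the recording MBID (0-based)


def count_blank_recordings(lines, column=RECORDING_COLUMN):
    """Count rows whose recording column is missing or empty.

    `lines` can be any iterable of text lines, e.g. an open file object.
    Completely empty lines are skipped rather than counted.
    """
    reader = csv.reader(lines, delimiter="\t")
    return sum(
        1
        for row in reader
        if row and (len(row) <= column or not row[column].strip())
    )


if __name__ == "__main__":
    # Made-up sample: three rows, two of them without a recording MBID.
    sample = (
        "1000\tartist-1\trelease-1\trecording-1\n"
        "1001\tartist-2\trelease-2\t\n"
        "1002\tartist-3\t\t\n"
    )
    print(count_blank_recordings(io.StringIO(sample)))  # 2
```

A loop like this does one pass per file with no intermediate data structures, which is the kind of code that benefits most directly from interpreter speedups.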