#metabrainz

      • pristine___
        iliekcomputers:
      • Let me know when you are around! This needs to be addressed asap, recs of people are not shown on the site.
      • supersandro2000 has quit
      • supersandro2000 joined the channel
      • tn5421 joined the channel
      • d4rkie has quit
      • Nyanko-sensei joined the channel
      • thomasross has quit
      • _lucifer
        pristine___: does the source get zipped and then imported by spark ?
      • yeah it does according to this
      • pristine___: i might be wrong but moving the `UserRecommendationsRecord` and `UserRecommendationsMessage` to somewhere inside `listenbrainz_spark` folder should probably fix the issue
      • right now, the `data` folder at the root is not included in the source zip, so spark is unable to find those files and hence errors
      • pristine___
      • _lucifer: this above file runs perfectly, so I am not sure if it's a zip issue
      • _lucifer
        pristine___: yeah in that case, you are right. it shouldn't be a zip issue then
      • iliekcomputers
        pristine___: hey
      • looking now
      • although the question is, if the errors are in request consumer, why is it affecting the site?
      • pristine___: i'm not completely sure what the issue is, i'll restart the consumer and spark workers too this time for safety.
      • pristine___: i also don't understand how this led to user facing errors? and how to remediate those
      • pristine___: I see errors for the userfacing issues, I figure those need to be handled? https://sentry.metabrainz.org/metabrainz/listen...
      • pristine___: i've triggered a new recommendations job, but i don't think i can do much else, the data validation errors need to be fixed.
      • pristine___: the job failed again, i'm not sure what the issue is, it'll need to be debugged in dev i guess. if this needs to be fixed quick, we should revert https://github.com/metabrainz/listenbrainz-serv... and deploy again.
      • other than that, things look reasonable to me, so i'm stepping away for now.
      • pristine___
        > although the question is, if the errors are in request consumer, why is it affecting the site?
      • It is affecting the site in the way that users see "recommendations for the user not generated, check back later". It's a valid message, but if the script runs successfully users will be able to see their recs.
      • iliekcomputers: ^
      • iliekcomputers
        Why is the site not showing the old recommendations
      • pristine___
        Because the older recs are not according to Pydantic format. I triggered generate recommendations so that the recs are in suitable format and will be shown on site, but the script failed because of `data module not found` error
      • iliekcomputers: I don't think there is a need to revert the PR, I will just open a PR to remove the data module usage from recommend.py and it will work.
      • iliekcomputers
        We should investigate why the import fails
      • Does it work in dev?
      • pristine___
        Though it is weird, the error. Because the data module works for one script and doesn't for the other
      • Yeah, works in dev.
      • iliekcomputers
        That is very weird
      • Let's open a ticket to investigate what exactly the issue is
      • pristine___
        Right. Also, regarding the site, it looks better with this PR in a way that check back later is a better message than ISE, imo
      • Cool. I will open a ticket
      • iliekcomputers
        Cool, I didn't really understand the urgency of this, but I'm happy with that plan
      • pristine___
        iliekcomputers: steps
      • 1 open a PR to remove data module usage from recommend.py ( your comment on the PR last night will fulfill this purpose)
      • 2. Merge the PR, restart request consumer.
      • 3 open a ticket to fix the issue.
      • iliekcomputers: Urgency of what?
      • iliekcomputers
        Like how urgent fixing the error was
      • Steps look good to me
      • pristine___
        iliekcomputers: So that users can see their recs. I am of the opinion that the rec feature is new, so users might be interested in checking their recs. I just don't want them to see the check back later message when a few days back we told them to go check their recs and give feedback.
      • iliekcomputers
        Makes sense, which is why I suggested reverting
      • pristine___
        iliekcomputers: But then a few users will get ISE, no? *Check back later* is better than ISE, and recs better than *check back later*. Give me an hour, have just woken up, I will make a PR in an hour or so.
      • iliekcomputers
        Sure.
      • v6lur joined the channel
      • _lucifer
        alastairp: there are some issues regarding gh:CB#311.
      • BrainzBot
        CB-382: You can sometimes create a review without a revision: https://github.com/metabrainz/critiquebrainz/pu...
      • _lucifer
        It is not working as expected because the create revision function itself calls other functions like `review.get_by_id` and `avg_rating.update`.
      • it seems that the created revision is not yet committed when the other two operations are executed hence there is a mismatch.
      • i think this can be probably fixed if we pass the connection to those functions as well but that means adding an optional connection to almost all of the db operations
      • i am not sure if there is a better solution
      • Gazooo794 has quit
      • Gazooo794 joined the channel
      • BrainzGit
        [listenbrainz-server] vansika opened pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-serv...
      • pristine___
        iliekcomputers: opening a ticket now :)
      • BrainzGit
        [listenbrainz-server] paramsingh merged pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-serv...
      • iliekcomputers
        pristine___: restarted
      • pristine___: please link the ticket here once, thanks!
      • ruaok: I decided to look at the checkout integration this weekend https://usercontent.irccloud-cdn.com/file/Gl7sr...
      • while I wait for reviews on the feed page thing
      • pristine___
        Cool
      • iliekcomputers: recs are on site now.
      • iliekcomputers
        awesome
      • pristine___
        And the ISE resolved!
      • Though I still don't know why recs for the user aren't in the expected format, at least the user will no longer see the ISE. In the meantime I will try to look into this.
      • iliekcomputers
        sounds good.
      • gr0uch0mars joined the channel
      • ruaok
        iliekcomputers: looks promising!
      • iliekcomputers
        i lifted the lichess text :D
      • ruaok
        all art is theft. :)
      • v6lur has quit
      • Glycem has quit
      • Glycem joined the channel
      • pristine___
        ruaok: what if postgres' `unaccent` and python's `unidecode` give different results for the same accented string
      • ruaok
        we'll miss matches.
      • I think it might be best for me to move to unidecode for my next round of mapping work.
      • pristine___
        > we'll miss matches.
      • ruaok: Right. We already miss matches since we are joining on msids rn, and since unidecode and unaccent results may differ, we will again miss matches. So I was wondering if devoting time to creating matchable fields rn for artist_name and track_name is a good step. I mean, shouldn't we wait till mapping also uses unidecode?
      • ruaok
        it's a matter of timing and severity of the problem
      • timing: I won't be doing mapping work until after the summit
      • severity: you're going to get many many more matches on text, but you're going to lose .0001% of those to funky decode mismatches. I bet you won't be able to tell.
      • pristine___
        Right
      • Cool. The missing mb data endpoint will tell us anyway the matches we missed
      • ruaok
        yep.
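One way to measure how often two normalizers disagree is to run both over the same names and collect the differences. The sketch below is an assumption-laden illustration: both normalizers are standard-library stand-ins built on `unicodedata`, not Postgres `unaccent` or the `unidecode` package, and the `EXTRA` table is hypothetical.

```python
import unicodedata


def strip_marks(text):
    """Stand-in normalizer A: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return kept.lower()


# Hypothetical extra rules that normalizer B applies and A does not.
EXTRA = {"ß": "ss", "ø": "o", "Ø": "O"}


def strip_marks_with_table(text):
    """Stand-in normalizer B: apply the extra table first, then behave like A."""
    replaced = "".join(EXTRA.get(ch, ch) for ch in text)
    return strip_marks(replaced)


def find_mismatches(names, norm_a, norm_b):
    """Return (name, a_result, b_result) for every name the two disagree on."""
    return [
        (name, norm_a(name), norm_b(name))
        for name in names
        if norm_a(name) != norm_b(name)
    ]


names = ["Beyoncé", "Björk", "Straße", "Mø"]
mismatches = find_mismatches(names, strip_marks, strip_marks_with_table)
for name, a_result, b_result in mismatches:
    print(name, a_result, b_result)
```

Swapping in the two real normalizers for `norm_a` and `norm_b` would give a count of the join misses caused purely by decoding differences.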
      • gr0uch0mars has quit
      • MajorLurker has quit
      • shivam-kapila
        iliekcomputers: whats your display res
      • Mineo has quit
      • Mineo joined the channel
      • _lucifer
        pristine___: ping
      • pristine___
        _lucifer: pong
      • _lucifer
        available for discussing as we decided the other day?
      • pristine___
        Yup 👍
      • _lucifer
        great!
      • pristine___
      • Let's start from here, normalization of the input?
      • _lucifer
        sure, i had a question before that
      • how does hdfs fit in the picture with spark?
      • pristine___
        Yeah, so spark does all the processing of data, and that data is stored in a distributed system, here that distributed system is HDFS
      • _lucifer
        ok makes sense, yes so let's continue
      • pristine___
        Nice
      • So remember you were talking about that medium blog?
      • _lucifer
        yup
      • pristine___
        Do you have a link?
      • _lucifer
        let me see if i can find it
      • pristine___
        Cool. Rn, all we do is just count the number of times a user has listened to a song, feed it as such in the recommender
      • I guess it is affecting user-user similarity
      • in a not so good way, no?
      • _lucifer
        yeah right, that is affecting the recs
      • in a bad way, at least theoretically
      • i am unable to find the link but the basic idea is this
      • |X - average| / mean
      • pristine___
        X here is the playcount?
      • _lucifer
        yes, and my bad, it should be | playcount - average | / std. deviation
      • average is here to counteract the user's own listening tendencies
      • and std deviation is to bring all users on a same rating scale
      • pristine___
        Right.
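A minimal sketch of the corrected formula, | playcount - average | / std. deviation, applied per user. It assumes playcounts arrive as a plain dict for one user and uses the population standard deviation; `normalize_playcounts` is a hypothetical helper name, not part of listenbrainz_spark.

```python
from statistics import mean, pstdev


def normalize_playcounts(playcounts):
    """Map {recording: playcount} to {recording: |playcount - mean| / std}."""
    values = list(playcounts.values())
    avg = mean(values)
    std = pstdev(values)
    if std == 0:
        # Every playcount is identical, so there is no spread to scale by.
        return {recording: 0.0 for recording in playcounts}
    return {
        recording: abs(count - avg) / std
        for recording, count in playcounts.items()
    }


# Two users with the same taste but very different listening volume.
heavy = normalize_playcounts({"a": 100, "b": 300, "c": 200})
light = normalize_playcounts({"a": 1, "b": 3, "c": 2})
print(heavy)
print(light)
```

Both users end up with the same scores, which is the "same rating scale" effect. Note that the absolute value gives the least played and most played recordings the same score; the usual z-score drops the absolute value and keeps the sign, which may be worth comparing.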
      • shivam-kapila
        I smell variance here
      • pristine___
        _lucifer: have you seen the rating beyond the limit error in Sentry?
      • _lucifer
        i do not have a sentry account 😅. is it open for all?
      • shivam-kapila
        You will need an invite
      • _lucifer
        you can share the stack trace for the time being i guess
      • pristine___
        Not sure. But I will tell you. The ratings given by the recommender to recordings belong to (-1, 3)
      • Though we initially thought them to be in (-1, 1)
      • I am still not sure about (-1, 3) but that's what I have seen till now.
      • These ratings don't make much sense
      • imo
      • And they are directly dependent on what we feed in, ig
      • i.e playcount
      • _lucifer
        yeah, (-1, 3) does not make sense at all, it's either open ended and the ratings should be considered relative to each other or yeah that
      • pristine___
        > |X - average| / mean
      • Is that the only metric we have? I am not really good at this stuff, but I guess if we have a few metrics/ways of normalization, we can compare and find the better one.
      • shivam-kapila
        Average == mean??
      • _lucifer
        not mean, the standard deviation, my mistake as I said above 😓
      • there are many ways to normalize yes
      • this is the most basic one imo
      • pristine___
        Hmm.. Okay. So to start, we know that the current way of feeding playcount isn't really cool since user interactions can only be interpreted as positive feedback (implicit feedback)
      • We can start with the formula you shared above
      • _lucifer
        yeah, we can experiment and compare the results
      • pristine___
        Would you like to work on this? I mean simply treat the generated playcounts with the above formula and compare results
      • You will need to have some data sets in hdfs on your local machine, and you are good to go
      • _lucifer
        yeah sure, i am currently setting up spark locally and will try to generate recs locally