#metabrainz


      • pristine___
        iliekcomputers:
      • 2020-09-26 27004, 2020

      • pristine___
      • 2020-09-26 27003, 2020

      • pristine___
        Let me know when you are around! This needs to be addressed ASAP, people's recs are not shown on the site.
      • 2020-09-26 27032, 2020

      • supersandro2000 has quit
      • 2020-09-26 27053, 2020

      • supersandro2000 joined the channel
      • 2020-09-26 27011, 2020

      • tn5421 joined the channel
      • 2020-09-26 27021, 2020

      • d4rkie has quit
      • 2020-09-26 27055, 2020

      • Nyanko-sensei joined the channel
      • 2020-09-26 27038, 2020

      • thomasross has quit
      • 2020-09-26 27042, 2020

      • _lucifer
        pristine___: does the source get zipped and then imported by spark ?
      • 2020-09-26 27009, 2020

      • _lucifer
        yeah it does according to this
      • 2020-09-26 27014, 2020

      • _lucifer
      • 2020-09-26 27050, 2020

      • _lucifer
        pristine___: i might be wrong but moving the `UserRecommendationsRecord` and `UserRecommendationsMessage` to somewhere inside `listenbrainz_spark` folder should probably fix the issue
      • 2020-09-26 27055, 2020

      • _lucifer
        right now, the `data` folder at the root is not included in the source zip, so spark is unable to find those files and hence errors
      • 2020-09-26 27043, 2020
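
One quick way to test the zip hypothesis is to list which top-level packages the source zip shipped to Spark actually contains (as the discussion goes on to show, it turned out not to be the cause here). A minimal standard-library sketch; the helper name `top_level_packages` and any zip path you pass to it are hypothetical:

```python
import zipfile


def top_level_packages(zip_path: str) -> set:
    """Return the names of top-level packages (dirs with an __init__.py) in a zip."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    packages = set()
    for name in names:
        parts = name.split("/")
        # A top-level package appears as "<pkg>/__init__.py" at the zip root.
        if len(parts) == 2 and parts[1] == "__init__.py":
            packages.add(parts[0])
    return packages
```

If `"data"` is missing from the returned set, modules under the root `data` folder cannot be imported from that zip on the Spark side.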

      • pristine___
      • 2020-09-26 27010, 2020

      • pristine___
        _lucifer: this above file runs perfectly, so I am not sure if it's a zip issue
      • 2020-09-26 27015, 2020

      • _lucifer
        pristine___: yeah in that case, you are right. it shouldn't be a zip issue then
      • 2020-09-26 27042, 2020

      • iliekcomputers
        pristine___: hey
      • 2020-09-26 27004, 2020

      • iliekcomputers
        looking now
      • 2020-09-26 27033, 2020

      • iliekcomputers
        although the question is, if the errors are in request consumer, why is it affecting the site?
      • 2020-09-26 27054, 2020

      • iliekcomputers
        pristine___: i'm not completely sure what the issue is, i'll restart the consumer and spark workers too this time for safety.
      • 2020-09-26 27014, 2020

      • iliekcomputers
        pristine___: i also don't understand how this led to user facing errors? and how to remediate those
      • 2020-09-26 27050, 2020

      • iliekcomputers
        pristine___: I see errors for the user-facing issues, I figure those need to be handled? https://sentry.metabrainz.org/metabrainz/listenbr…
      • 2020-09-26 27058, 2020

      • iliekcomputers
        pristine___: i've triggered a new recommendations job, but i don't think i can do much else, the data validation errors need to be fixed.
      • 2020-09-26 27046, 2020

      • iliekcomputers
        pristine___: the job failed again, i'm not sure what the issue is, it'll need to be debugged in dev i guess. if this needs to be fixed quick, we should revert https://github.com/metabrainz/listenbrainz-server… and deploy again.
      • 2020-09-26 27051, 2020

      • iliekcomputers
        other than that, things look reasonable to me, so i'm stepping away for now.
      • 2020-09-26 27037, 2020

      • pristine___
        > although the question is, if the errors are in request consumer, why is it affecting the site?
      • 2020-09-26 27033, 2020

      • pristine___
        It is affecting the site in that users see "recommendations for the user not generated, check back later". It's a valid message, but if the script runs successfully, users will be able to see their recs.
      • 2020-09-26 27039, 2020

      • pristine___
        iliekcomputers: ^
      • 2020-09-26 27036, 2020

      • iliekcomputers
        Why is the site not showing the old recommendations
      • 2020-09-26 27057, 2020

      • pristine___
        Because the older recs don't match the Pydantic format. I triggered generate recommendations so that the recs would be in a suitable format and shown on the site, but the script failed because of a `data module not found` error
      • 2020-09-26 27024, 2020

      • pristine___
        iliekcomputers: I don't think there is a need to revert the PR, I will just open a PR to remove the data module usage from recommend.py and it will work.
      • 2020-09-26 27011, 2020

      • iliekcomputers
        We should investigate why the import fails
      • 2020-09-26 27020, 2020

      • iliekcomputers
        Does it work in dev?
      • 2020-09-26 27020, 2020

      • pristine___
        The error is weird though, because the data module works for one script and doesn't for the other
      • 2020-09-26 27030, 2020

      • pristine___
        Yeah, works in dev.
      • 2020-09-26 27047, 2020

      • iliekcomputers
        That is very weird
      • 2020-09-26 27019, 2020

      • iliekcomputers
        Let's open a ticket to investigate what exactly the issue is
      • 2020-09-26 27034, 2020

      • pristine___
        Right. Also, regarding the site, it looks better with this PR in a way that check back later is a better message than ISE, imo
      • 2020-09-26 27041, 2020

      • pristine___
        Cool. I will open a ticket
      • 2020-09-26 27010, 2020

      • iliekcomputers
        Cool, I didn't really understand the urgency of this, but I'm happy with that plan
      • 2020-09-26 27006, 2020

      • pristine___
        iliekcomputers: steps
      • 2020-09-26 27006, 2020

      • pristine___
        1. Open a PR to remove data module usage from recommend.py (your comment on the PR last night will fulfill this purpose)
      • 2020-09-26 27006, 2020

      • pristine___
        2. Merge the PR, restart request consumer.
      • 2020-09-26 27006, 2020

      • pristine___
        3. Open a ticket to fix the issue.
      • 2020-09-26 27015, 2020

      • pristine___
        iliekcomputers: Urgency of what?
      • 2020-09-26 27016, 2020

      • iliekcomputers
        Like how urgent fixing the error was
      • 2020-09-26 27058, 2020

      • iliekcomputers
        Steps look good to me
      • 2020-09-26 27013, 2020

      • pristine___
        iliekcomputers: So that users can see their recs. I am of the opinion that since the rec feature is new, users might be interested in checking their recs. I just don't want them to see the check back later message when a few days back we told them to go check their recs and give feedback.
      • 2020-09-26 27009, 2020

      • iliekcomputers
        Makes sense, which is why I suggested reverting
      • 2020-09-26 27036, 2020

      • pristine___
        iliekcomputers: But then a few users will get ISE, no? *Check back later* is better than ISE, and recs better than *check back later*. Give me an hour, have just woken up, I will make a PR in an hour or so.
      • 2020-09-26 27011, 2020

      • iliekcomputers
        Sure.
      • 2020-09-26 27017, 2020

      • v6lur joined the channel
      • 2020-09-26 27040, 2020

      • _lucifer
        alastairp: there are some issues regarding gh:CB#311.
      • 2020-09-26 27041, 2020

      • BrainzBot
        CB-382: You can sometimes create a review without a revision: https://github.com/metabrainz/critiquebrainz/pull…
      • 2020-09-26 27058, 2020

      • _lucifer
        It is not working as expected because the create revision function itself calls other functions like `review.get_by_id` and `avg_rating.update`.
      • 2020-09-26 27001, 2020

      • _lucifer
        it seems that the created revision is not yet committed when the other two operations are executed, hence there is a mismatch.
      • 2020-09-26 27026, 2020

      • _lucifer
        i think this can probably be fixed if we pass the connection to those functions as well, but that means adding an optional connection to almost all of the db operations
      • 2020-09-26 27054, 2020

      • _lucifer
        i am not sure if there is a better solution
      • 2020-09-26 27001, 2020
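
A minimal sketch of the optional-connection idea, using the standard library's `sqlite3` as a stand-in for CritiqueBrainz's actual database layer. The `revision` table and the `count_revisions` / `create_revision` helpers are hypothetical, not CritiqueBrainz functions. A helper that opens its own connection cannot see the caller's uncommitted insert; passing the caller's connection in makes the read part of the same transaction.

```python
import os
import sqlite3
import tempfile
from typing import Optional

# Hypothetical on-disk database (an in-memory one would be private to each connection).
DB_PATH = os.path.join(tempfile.mkdtemp(), "reviews.db")


def connect() -> sqlite3.Connection:
    return sqlite3.connect(DB_PATH)


def setup() -> None:
    conn = connect()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revision "
        "(id INTEGER PRIMARY KEY, review_id INTEGER, text TEXT)"
    )
    conn.commit()
    conn.close()


def count_revisions(review_id: int, connection: Optional[sqlite3.Connection] = None) -> int:
    """Reuse the caller's connection when given; otherwise open (and close) a new one."""
    conn = connection if connection is not None else connect()
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM revision WHERE review_id = ?", (review_id,)
        ).fetchone()
        return row[0]
    finally:
        if connection is None:
            conn.close()


def create_revision(review_id: int, text: str, share_connection: bool) -> int:
    """Insert a revision, then count revisions *before* committing.

    Returns the count the helper saw, which shows whether it could see
    the still-uncommitted row.
    """
    conn = connect()
    try:
        conn.execute(
            "INSERT INTO revision (review_id, text) VALUES (?, ?)", (review_id, text)
        )
        seen = count_revisions(review_id, conn if share_connection else None)
        conn.commit()
        return seen
    finally:
        conn.close()
```

After `setup()`, a first `create_revision(1, "draft", share_connection=False)` returns 0 because the helper's separate connection cannot see the pending row; a second call with `share_connection=True` returns 2 (the committed first revision plus its own pending one). The cost is the one noted above: every helper grows an optional connection parameter.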

      • Gazooo794 has quit
      • 2020-09-26 27048, 2020

      • Gazooo794 joined the channel
      • 2020-09-26 27057, 2020

      • BrainzGit
        [listenbrainz-server] vansika opened pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-server…
      • 2020-09-26 27050, 2020

      • pristine___
        iliekcomputers: opening a ticket now :)
      • 2020-09-26 27015, 2020

      • BrainzGit
        [listenbrainz-server] paramsingh merged pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-server…
      • 2020-09-26 27039, 2020

      • iliekcomputers
        pristine___: restarted
      • 2020-09-26 27047, 2020

      • iliekcomputers
        pristine___: please link the ticket here once, thanks!
      • 2020-09-26 27013, 2020

      • iliekcomputers
        ruaok: I decided to look at the checkout integration this weekend https://usercontent.irccloud-cdn.com/file/Gl7srXf…
      • 2020-09-26 27022, 2020

      • iliekcomputers
        while I wait for reviews on the feed page thing
      • 2020-09-26 27025, 2020

      • pristine___
        Cool
      • 2020-09-26 27004, 2020

      • pristine___
        iliekcomputers: recs are on site now.
      • 2020-09-26 27018, 2020

      • iliekcomputers
        awesome
      • 2020-09-26 27015, 2020

      • pristine___
        And the ISE resolved!
      • 2020-09-26 27008, 2020

      • pristine___
        Though I still don't know why the recs for the user aren't in the expected format, at least users will no longer see an ISE. In the meantime I will try to look into this.
      • 2020-09-26 27045, 2020

      • iliekcomputers
        sounds good.
      • 2020-09-26 27015, 2020

      • gr0uch0mars joined the channel
      • 2020-09-26 27054, 2020

      • ruaok
        iliekcomputers: looks promising!
      • 2020-09-26 27024, 2020

      • iliekcomputers
        i lifted the lichess text :D
      • 2020-09-26 27038, 2020

      • ruaok
        all art is theft. :)
      • 2020-09-26 27011, 2020

      • v6lur has quit
      • 2020-09-26 27047, 2020

      • Glycem has quit
      • 2020-09-26 27013, 2020

      • Glycem joined the channel
      • 2020-09-26 27002, 2020

      • pristine___
        ruaok: what if postgres' `unaccent` and python's `unidecode` give different results for the same accented string?
      • 2020-09-26 27044, 2020

      • ruaok
        we'll miss matches.
      • 2020-09-26 27003, 2020

      • ruaok
        I think it might be best for me to move to unidecode for my next round of mapping work.
      • 2020-09-26 27058, 2020

      • pristine___
        > we'll miss matches.
      • 2020-09-26 27037, 2020

      • pristine___
        ruaok: Right. We already miss matches since we are joining on msids rn, and since unidecode and unaccent results may differ, we will miss matches again. So I was wondering if devoting time to creating matchable fields for artist_name and track_name rn is a good step. I mean, shouldn't we wait till mapping also uses unidecode?
      • 2020-09-26 27013, 2020

      • ruaok
        it's a matter of timing and severity of the problem
      • 2020-09-26 27022, 2020

      • ruaok
        timing: I won't be doing mapping work until after the summit
      • 2020-09-26 27000, 2020

      • ruaok
        severity: you're going to get many many more matches on text, but you're going to lose .0001% of those to funky decode mismatches. I bet you won't be able to tell.
      • 2020-09-26 27010, 2020
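
To make the mismatch risk concrete, here is a minimal sketch of a "matchable" key built only from Unicode decomposition in the standard library. The function name `make_matchable` is hypothetical and this is not the ListenBrainz or mapping implementation. Characters with no decomposition, such as "Ø", pass through unchanged, whereas a transliteration library like unidecode maps "Ø" to a plain "O", so two normalizers can produce different keys for the same name.

```python
import unicodedata


def make_matchable(text: str) -> str:
    """Fold a name into a lowercase, accent-free, alphanumeric-only key.

    Uses only Unicode decomposition: characters with no decomposition
    (e.g. "Ø") are NOT transliterated, unlike unidecode.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks such as the acute accent split off from "é".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters and digits only, so spacing and punctuation don't block matches.
    return "".join(ch for ch in stripped.lower() if ch.isalnum())
```

Here `make_matchable("Beyoncé")` gives `"beyonce"`, but `make_matchable("Øystein Sevåg")` gives `"øysteinsevag"`, which would not equal a key built with unidecode. As ruaok says, such cases should be a tiny fraction of matches.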

      • pristine___
        Right
      • 2020-09-26 27050, 2020

      • pristine___
        Cool. The missing MB data endpoint will tell us which matches we missed anyway
      • 2020-09-26 27058, 2020

      • ruaok
        yep.
      • 2020-09-26 27041, 2020

      • gr0uch0mars has quit
      • 2020-09-26 27000, 2020

      • MajorLurker has quit
      • 2020-09-26 27025, 2020

      • shivam-kapila
        iliekcomputers: what's your display res?
      • 2020-09-26 27012, 2020

      • Mineo has quit
      • 2020-09-26 27022, 2020

      • Mineo joined the channel
      • 2020-09-26 27033, 2020

      • _lucifer
        pristine___: ping
      • 2020-09-26 27054, 2020

      • pristine___
        _lucifer: pong
      • 2020-09-26 27019, 2020

      • _lucifer
        available for discussing as we decided the other day?
      • 2020-09-26 27042, 2020

      • pristine___
        Yup 👍
      • 2020-09-26 27002, 2020

      • _lucifer
        great!
      • 2020-09-26 27058, 2020

      • pristine___
      • 2020-09-26 27032, 2020

      • pristine___
        Let's start from here, normalization of the input?
      • 2020-09-26 27046, 2020

      • _lucifer
        sure, i had a question before that
      • 2020-09-26 27000, 2020

      • _lucifer
        how does hdfs fit in the picture with spark?
      • 2020-09-26 27058, 2020

      • pristine___
        Yeah, so spark does all the processing of the data, and that data is stored in a distributed file system, here that distributed file system is HDFS
      • 2020-09-26 27034, 2020

      • _lucifer
        ok makes sense, yes so let's continue
      • 2020-09-26 27042, 2020

      • pristine___
        Nice
      • 2020-09-26 27059, 2020

      • pristine___
        So remember you were talking about that Medium blog?
      • 2020-09-26 27003, 2020

      • _lucifer
        yup
      • 2020-09-26 27007, 2020

      • pristine___
        Do you have a link?
      • 2020-09-26 27028, 2020

      • _lucifer
        let me see if i can find it
      • 2020-09-26 27011, 2020

      • pristine___
        Cool. Rn, all we do is count the number of times a user has listened to a song and feed that count as-is into the recommender
      • 2020-09-26 27037, 2020
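
As a rough illustration of the current input, assuming listens arrive as (user, recording) pairs (a simplification: the real pipeline does this in Spark over data in HDFS, and `playcounts` is a hypothetical name), the raw value fed to the recommender is just a count:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def playcounts(listens: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count listens per (user, recording) pair; this count is the raw recommender input."""
    return dict(Counter(listens))
```

A user who plays one album on repeat produces counts in the hundreds while a casual listener's counts stay in single digits, so the raw numbers are not on a common scale across users.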

      • pristine___
        I guess it is affecting user-user similarity
      • 2020-09-26 27054, 2020

      • pristine___
        in a not so good way, no?
      • 2020-09-26 27034, 2020

      • _lucifer
        yeah right, that is affecting the recs
      • 2020-09-26 27054, 2020

      • _lucifer
        in a bad way, at least theoretically
      • 2020-09-26 27029, 2020

      • _lucifer
        i am unable to find the link but the basic idea is this
      • 2020-09-26 27059, 2020

      • _lucifer
        |X - average| / mean
      • 2020-09-26 27044, 2020

      • pristine___
        X here is the playcount?
      • 2020-09-26 27051, 2020

      • _lucifer
        yes, and my bad, it should be |playcount - average| / std. deviation
      • 2020-09-26 27035, 2020

      • _lucifer
        average is here to counteract the user's own listening tendencies
      • 2020-09-26 27057, 2020

      • _lucifer
        and std deviation is to bring all users onto the same rating scale
      • 2020-09-26 27036, 2020
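
A minimal sketch of the per-user normalization described above, assuming one dict of recording to playcount per user and the population standard deviation (both assumptions; `normalize_user` is a hypothetical name). It uses the usual signed z-score, (count - user mean) / user std. deviation: with the absolute value, a recording played far less than the user's average would get the same score as one played far more.

```python
from statistics import mean, pstdev
from typing import Dict


def normalize_user(counts: Dict[str, int]) -> Dict[str, float]:
    """Z-score one user's playcounts: (count - user mean) / user std. deviation.

    Keeps the sign, so below-average recordings get negative scores.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All counts equal: no signal about relative preference.
        return {rec: 0.0 for rec in counts}
    return {rec: (c - mu) / sigma for rec, c in counts.items()}
```

For counts `{"a": 1, "b": 3, "c": 5}` the average recording `b` maps to 0.0, while `a` and `c` get scores of equal size and opposite sign.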

      • pristine___
        Right.
      • 2020-09-26 27048, 2020

      • shivam-kapila
        I smell variance here
      • 2020-09-26 27019, 2020

      • pristine___
        _lucifer: have you seen the rating beyond the limit error in Sentry?
      • 2020-09-26 27049, 2020

      • _lucifer
        i don't have a sentry account 😅. is it open for all?
      • 2020-09-26 27018, 2020

      • shivam-kapila
        You will need an invite
      • 2020-09-26 27042, 2020

      • _lucifer
        you can share the stack trace for the time being i guess
      • 2020-09-26 27001, 2020

      • pristine___
        Not sure. But I will tell you. The ratings given by the recommender to recordings belong to (-1, 3)
      • 2020-09-26 27022, 2020

      • pristine___
        Though we initially thought them to be in (-1, 1)
      • 2020-09-26 27044, 2020

      • pristine___
        I am still not sure about (-1, 3) but that's what I have seen till now.
      • 2020-09-26 27053, 2020

      • pristine___
        These ratings don't make much sense
      • 2020-09-26 27058, 2020

      • pristine___
        imo
      • 2020-09-26 27014, 2020

      • pristine___
        And they are directly dependent on what we feed in, ig
      • 2020-09-26 27025, 2020

      • pristine___
        i.e playcount
      • 2020-09-26 27036, 2020

      • _lucifer
        yeah, (-1, 3) does not make sense at all; either it's open-ended and the ratings should be considered relative to each other, or yeah, that
      • 2020-09-26 27050, 2020
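
If the scores are only meaningful relative to each other, one option is to rescale each user's predicted scores before storing or validating them. This is a hypothetical helper, not part of ListenBrainz. If the recommender is Spark's implicit-feedback ALS (an assumption here), its predictions are dot products of learned factors rather than ratings on a fixed scale, which would explain values outside the expected range.

```python
from typing import Dict


def rescale_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale one user's predicted scores into [0, 1].

    Only the ordering and relative gaps are kept; the raw range
    (e.g. (-1, 3)) no longer matters.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # A single score (or all equal): treat every recording as the top pick.
        return {rec: 1.0 for rec in scores}
    return {rec: (s - lo) / (hi - lo) for rec, s in scores.items()}
```

For example, scores of -1.0, 1.0 and 3.0 become 0.0, 0.5 and 1.0, so a fixed range check would always pass while the ranking stays unchanged.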

      • pristine___
        > |X - average| / mean
      • 2020-09-26 27041, 2020

      • pristine___
        Is that the only metric we have? I am not really good at this stuff, but I guess if we have a few metrics/ways of normalization, we can compare and find the better one.
      • 2020-09-26 27042, 2020

      • shivam-kapila
        Average == mean??
      • 2020-09-26 27001, 2020

      • _lucifer
        not mean, the standard deviation; my mistake, as I said above 😓
      • 2020-09-26 27034, 2020

      • _lucifer
        there are many ways to normalize yes
      • 2020-09-26 27050, 2020

      • _lucifer
        this is the most basic one imo
      • 2020-09-26 27004, 2020
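
For comparison with the z-score, two other common per-user transforms are sketched below; the helper names are hypothetical and which transform gives better recs would have to be checked experimentally. `log_scale` dampens heavy repeat listening, and `max_scale` divides by the user's top playcount so every user's favourite recording scores 1.0.

```python
import math
from typing import Dict


def log_scale(counts: Dict[str, int]) -> Dict[str, float]:
    """Dampen heavy repeat listening: log(1 + count)."""
    return {rec: math.log1p(c) for rec, c in counts.items()}


def max_scale(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide by the user's largest playcount so the top recording scores 1.0."""
    top = max(counts.values())
    return {rec: c / top for rec, c in counts.items()}
```

With counts `{"a": 1, "b": 100}`, `max_scale` gives 0.01 and 1.0, while `log_scale` gives roughly 0.69 and 4.62, a much smaller gap.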

      • pristine___
        Hmm.. Okay. So to start, we know that the current way of feeding playcount isn't really cool since user interactions can only be interpreted as positive feedback (implicit feedback)
      • 2020-09-26 27024, 2020
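
For implicit feedback, the usual formulation (Hu, Koren and Volinsky's implicit-feedback ALS; Spark's ALS follows the same idea when `implicitPrefs` is enabled) does not treat a playcount as a rating. It splits it into a binary preference (1 if the user listened at all) and a confidence that grows with the count. Whether the ListenBrainz pipeline uses this mode is an assumption here, `implicit_signal` is a hypothetical name, and `alpha = 40.0` is only an illustrative default.

```python
from typing import Tuple


def implicit_signal(count: int, alpha: float = 40.0) -> Tuple[int, float]:
    """Split a playcount into (preference, confidence) as in implicit-feedback ALS.

    preference: 1 if the user listened at all, else 0.
    confidence: 1 + alpha * count, so more listens mean more certainty,
    not a higher rating.
    """
    preference = 1 if count > 0 else 0
    confidence = 1.0 + alpha * count
    return preference, confidence
```

A count of 0 then means "no evidence" (preference 0 at the lowest confidence) rather than a negative rating, which matches the point that listens can only be read as positive feedback.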

      • pristine___
        We can start with the formula you shared above
      • 2020-09-26 27000, 2020

      • _lucifer
        yeah, we can experiment and compare the results
      • 2020-09-26 27011, 2020

      • pristine___
        Would you like to work on this? I mean simply treat the generated playcounts with the above formula and compare results
      • 2020-09-26 27043, 2020

      • pristine___
        You will need to have some data sets in hdfs on your local machine, and you are good to go
      • 2020-09-26 27043, 2020

      • _lucifer
        yeah sure, i am currently setting up spark locally and will try to generate recs locally