pristine___: i might be wrong but moving the `UserRecommendationsRecord` and `UserRecommendationsMessage` to somewhere inside `listenbrainz_spark` folder should probably fix the issue
right now, the `data` folder at the root is not included in the source zip, so spark is unable to find those files and hence errors out
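The failure is reproducible without Spark at all: Python can only import packages that are actually inside a zip placed on `sys.path`, which is how the source zip shipped to executors behaves. A minimal sketch (the package names here are made up):

```python
import os
import sys
import tempfile
import zipfile

# Build a zip that, like the Spark source zip, contains only the inner
# package and not a repo-root folder (names are made up for illustration).
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "source.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("inner_pkg/__init__.py", "VALUE = 42\n")

sys.path.insert(0, zip_path)

import inner_pkg                # inside the zip: imports fine
print(inner_pkg.VALUE)          # 42

try:
    import missing_pkg          # not in the zip, like the root `data` folder
except ModuleNotFoundError as err:
    print("missing:", err.name)  # missing: missing_pkg
```

Anything under the zipped package imports cleanly on the executors, while a module that only exists at the repo root raises `ModuleNotFoundError` there, even though it works locally where the full checkout is on the path.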
pristine___: i've triggered a new recommendations job, but i don't think i can do much else, the data validation errors need to be fixed.
pristine___: the job failed again, i'm not sure what the issue is, it'll need to be debugged in dev i guess. if this needs to be fixed quick, we should revert https://github.com/metabrainz/listenbrainz-serv... and deploy again.
other than that, things look reasonable to me, so i'm stepping away for now.
pristine___
> although the question is, if the errors are in request consumer, why is it affecting the site?
It affects the site in that users see "recommendations for the user not generated, check back later". It's a valid message, but if the script runs successfully users will be able to see their recs.
iliekcomputers: ^
iliekcomputers
Why is the site not showing the old recommendations
pristine___
Because the older recs don't match the Pydantic format. I triggered generate recommendations so that the recs are in the suitable format and will be shown on the site, but the script failed because of a `data module not found` error
iliekcomputers: I don't think there is a need to revert the PR. I will just open a PR to remove the data module usage from recommend.py and it will work.
iliekcomputers
We should investigate why the import fails
Does it work in dev?
pristine___
The error is weird, though, because the data module works for one script and doesn't for the other
Yeah, works in dev.
iliekcomputers
That is very weird
Let's open a ticket to investigate what exactly the issue is
pristine___
Right. Also, regarding the site, it looks better with this PR, in that *check back later* is a better message than an ISE, imo
Cool. I will open a ticket
iliekcomputers
Cool, I didn't really understand the urgency of this, but I'm happy with that plan
pristine___
iliekcomputers: steps:
1. Open a PR to remove the data module usage from recommend.py (your comment on the PR last night will fulfill this purpose).
2. Merge the PR, restart the request consumer.
3. Open a ticket to fix the issue.
iliekcomputers: Urgency of what?
iliekcomputers
Like how urgent fixing the error was
Steps look good to me
pristine___
iliekcomputers: So that users can see their recs. The rec feature is new, so users might be interested in checking their recs; I just don't want them to see the *check back later* message when a few days back we said go check your recs and give feedback.
iliekcomputers
Makes sense, which is why I suggested reverting
pristine___
iliekcomputers: But then a few users will get an ISE, no? *Check back later* is better than an ISE, and recs are better than *check back later*. Give me an hour, I have just woken up; I will make a PR in an hour or so.
iliekcomputers
Sure.
v6lur joined the channel
_lucifer
alastairp: there are some issues regarding gh:CB#311.
It is not working as expected because the create revision function itself calls other functions like `review.get_by_id` and `avg_rating.update`.
it seems that the created revision is not yet committed when the other two operations are executed, hence there is a mismatch.
i think this can probably be fixed if we pass the connection to those functions as well, but that means adding an optional connection parameter to almost all of the db operations
i am not sure if there is a better solution
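For illustration, the mismatch is reproducible with stdlib `sqlite3` as a stand-in for the real database code: a row inserted in one connection's open transaction is visible to reads on that same connection, but not to a second connection until commit. That is why threading the one connection through the helper functions would fix it.

```python
import os
import sqlite3
import tempfile

# One "writer" connection opens a transaction and inserts a revision row
# without committing -- like the create-revision function mid-flight.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE revision (id INTEGER PRIMARY KEY)")
writer.commit()
writer.execute("INSERT INTO revision (id) VALUES (1)")  # uncommitted

# A second connection -- like a helper that opens its own -- can't see it.
reader = sqlite3.connect(path)
seen_same = writer.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
seen_other = reader.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
print(seen_same, seen_other)  # 1 0
```

The same isolation applies in Postgres, so helpers that open their own connection read the pre-revision state until the outer transaction commits.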
Gazooo794 has quit
Gazooo794 joined the channel
BrainzGit
[listenbrainz-server] vansika opened pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-serv...
pristine___
Though I still don't know why the recs for the user aren't in the expected format, at least the user will no longer see an ISE. In the meantime I will try to look into this.
iliekcomputers
sounds good.
gr0uch0mars joined the channel
ruaok
iliekcomputers: looks promising!
iliekcomputers
i lifted the lichess text :D
ruaok
all art is theft. :)
v6lur has quit
Glycem has quit
Glycem joined the channel
pristine___
ruaok: what if postgres' `unaccent` and python's `unidecode` give different results for the same accented string?
ruaok
we'll miss matches.
I think it might be best for me to move to unidecode for my next round of mapping work.
pristine___
> we'll miss matches.
ruaok: Right. We already miss matches since we are joining on MSIDs rn, and given that unidecode and unaccent results may differ, we will again miss matches. So I was wondering whether devoting time rn to creating matchable fields for artist_name and track_name is a good step. I mean, shouldn't we wait till the mapping also uses unidecode?
ruaok
it's a matter of timing and the severity of the problem
timing: I won't be doing mapping work until after the summit
severity: you're going to get many many more matches on text, but you're going to lose .0001% of those to funky decode mismatches. I bet you won't be able to tell.
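A sketch of where such mismatches come from, using stdlib `unicodedata` as a rough stand-in for what Postgres' `unaccent` does (decompose, then drop combining marks):

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Rough stand-in for Postgres' unaccent: drop combining marks."""
    nfkd = unicodedata.normalize("NFKD", text)
    return "".join(c for c in nfkd if not unicodedata.combining(c))

print(strip_accents("Beyoncé"))    # Beyonce
print(strip_accents("Motörhead"))  # Motorhead

# Characters with no canonical decomposition pass through unchanged here,
# while unidecode transliterates them ("ø" -> "o", "ß" -> "ss") -- the kind
# of divergence that would silently drop a match on the normalized field.
print(strip_accents("ø"))          # still "ø"
```

For the common accented-Latin cases the two approaches agree, which is why the mismatch rate should be tiny.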
pristine___
Right
Cool. The missing MB data endpoint will tell us the matches we missed anyway
ruaok
yep.
gr0uch0mars has quit
MajorLurker has quit
shivam-kapila
iliekcomputers: what's your display res?
Mineo has quit
Mineo joined the channel
_lucifer
pristine___: ping
pristine___
_lucifer: pong
_lucifer
available to discuss as we decided the other day?
pristine___
Let's start from here, normalization of the input?
_lucifer
sure, i had a question before that
how does hdfs fit in the picture with spark?
pristine___
Yeah, so spark does all the processing of the data, and that data is stored in a distributed file system; here, that distributed file system is HDFS
_lucifer
ok makes sense, yes so let's continue
pristine___
Nice
So remember you were talking about that medium blog?
_lucifer
yup
pristine___
Do you have a link?
_lucifer
let me see if i can find it
pristine___
Cool. Rn, all we do is just count the number of times a user has listened to a song and feed it as such into the recommender
I guess it is affecting user-user similarity
in a not so good way, no?
_lucifer
yeah right, that is affecting the recs
in a bad way, at least theoretically
i am unable to find the link but the basic idea is this
|X - average| / mean
pristine___
X here is the playcount?
_lucifer
yes, and my bad, it should be |playcount - average| / std. deviation
average is here to counteract the user's own listening tendencies
and std deviation is to bring all users onto the same rating scale
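A sketch of that normalization, written without the absolute value so that below-average and above-average playcounts keep their sign (the standard per-user z-score):

```python
from statistics import mean, stdev

def normalize_playcounts(playcounts: list[int]) -> list[float]:
    """Per-user z-score: subtract the user's average playcount to cancel
    their overall listening volume, then divide by the standard deviation
    to put every user on the same rating scale."""
    mu = mean(playcounts)
    sigma = stdev(playcounts)
    return [(x - mu) / sigma for x in playcounts]

# A heavy listener and a light listener come out identical:
print(normalize_playcounts([100, 200, 300]))  # [-1.0, 0.0, 1.0]
print(normalize_playcounts([1, 2, 3]))        # [-1.0, 0.0, 1.0]
```

Feeding these scaled values instead of raw playcounts would stop heavy listeners from dominating the user-user similarity.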
pristine___
Right.
shivam-kapila
I smell variance here
pristine___
_lucifer: have you seen the rating beyond the limit error in Sentry?
_lucifer
i do have a sentry account 😅. is it open for all?
shivam-kapila
You will need an invite
_lucifer
you can share the stack trace for the time being i guess
pristine___
Not sure. But I will tell you. The ratings given by the recommender to recordings belong to (-1, 3)
Though we initially thought they'd be in (-1, 1)
I am still not sure about (-1, 3) but that's what I have seen till now.
These ratings don't make much sense
imo
And they are directly dependent on what we feed in, ig
i.e playcount
_lucifer
yeah, (-1, 3) does not make sense at all; it's either open-ended and the ratings should be considered relative to each other, or yeah, that
pristine___
> |X - average| / mean
Is that the only metric we have? I am not really good at this stuff, but I guess if we have a few metrics/ways of normalization, we can compare and find the better one.
shivam-kapila
Average == mean??
_lucifer
not the mean, the standard deviation; my mistake, as I said above 😓
there are many ways to normalize yes
this is the most basic one imo
pristine___
Hmm.. Okay. So to start, we know that the current way of feeding playcounts isn't really cool, since user interactions can only be interpreted as positive feedback (implicit feedback)
We can start with the formula you shared above
_lucifer
yeah, we can experiment and compare the results
pristine___
Would you like to work on this? I mean simply treat the generated playcounts with the above formula and compare results
You will need to have some data sets in HDFS on your local machine, and you are good to go
_lucifer
yeah sure, i am currently setting up spark locally and will try to generate recs locally