#metabrainz

0:14 AM
pristine___

iliekcomputers:

2020-09-26 27004, 2020

0:14 AM
pristine___

https://gist.github.com/vansika/cf1d7439c7c57deea…

2020-09-26 27003, 2020

0:18 AM
pristine___

Let me know when you are around! This needs to be addressed asap; recs of people are not shown on the site.

2020-09-26 27032, 2020

0:25 AM
supersandro2000 has quit

2020-09-26 27053, 2020

0:25 AM
supersandro2000 joined the channel

2020-09-26 27011, 2020

1:57 AM
tn5421 joined the channel

2020-09-26 27021, 2020

2:00 AM
d4rkie has quit

2020-09-26 27055, 2020

2:00 AM
Nyanko-sensei joined the channel

2020-09-26 27038, 2020

4:26 AM
thomasross has quit

2020-09-26 27042, 2020

4:31 AM
_lucifer

pristine___: does the source get zipped and then imported by spark ?

2020-09-26 27009, 2020

4:36 AM
_lucifer

yeah it does according to this

2020-09-26 27014, 2020

4:36 AM
_lucifer

https://github.com/metabrainz/listenbrainz-server…

2020-09-26 27050, 2020

4:37 AM
_lucifer

pristine___: i might be wrong but moving the `UserRecommendationsRecord` and `UserRecommendationsMessage` to somewhere inside `listenbrainz_spark` folder should probably fix the issue

2020-09-26 27055, 2020

4:38 AM
_lucifer

right now, the `data` folder at the root is not included in the source zip, so spark is unable to find those files and hence errors
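
The mechanism hypothesised above can be sketched in isolation. This is only an illustration, assuming a source zip that bundles one package and omits a sibling one; the names `lb_demo_spark_pkg` and `lb_demo_data_pkg` are hypothetical and not taken from the repository, and the sketch says nothing about whether this was the actual cause.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_source_zip(zip_path):
    """Zip only the hypothetical package; the sibling package is left out."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("lb_demo_spark_pkg/__init__.py", "NAME = 'demo'\n")


def can_import(module_name):
    """Return True if the module can be imported from the current sys.path."""
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


tmp_dir = tempfile.mkdtemp()
zip_path = str(Path(tmp_dir) / "source.zip")
build_source_zip(zip_path)

# Python can import directly from a zip archive placed on sys.path.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()

inside_zip = can_import("lb_demo_spark_pkg")   # bundled in the zip
outside_zip = can_import("lb_demo_data_pkg")   # never added to the zip
print(inside_zip, outside_zip)
```

A module that was never added to the archive fails with `ModuleNotFoundError`, which is the kind of failure a `data module not found` error would look like if the zip were the cause.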

2020-09-26 27043, 2020

5:20 AM
pristine___

https://github.com/metabrainz/listenbrainz-server…

2020-09-26 27010, 2020

5:21 AM
pristine___

_lucifer: this above file runs perfectly, so I am not sure if it's a zip issue

2020-09-26 27015, 2020

5:23 AM
_lucifer

pristine___: yeah in that case, you are right. it shouldn't be a zip issue then

2020-09-26 27042, 2020

6:00 AM
iliekcomputers

pristine___: hey

2020-09-26 27004, 2020

6:02 AM
iliekcomputers

looking now

2020-09-26 27033, 2020

6:02 AM
iliekcomputers

although the question is, if the errors are in request consumer, why is it affecting the site?

2020-09-26 27054, 2020

6:09 AM
iliekcomputers

pristine___: i'm not completely sure what the issue is, i'll restart the consumer and spark workers too this time for safety.

2020-09-26 27014, 2020

6:10 AM
iliekcomputers

pristine___: i also don't understand how this led to user facing errors? and how to remediate those

2020-09-26 27050, 2020

6:18 AM
iliekcomputers

pristine___: I see errors for the userfacing issues, I figure those need to be handled? https://sentry.metabrainz.org/metabrainz/listenbr…

2020-09-26 27058, 2020

6:20 AM
iliekcomputers

pristine___: i've triggered a new recommendations job, but i don't think i can do much else, the data validation errors need to be fixed.

2020-09-26 27046, 2020

6:29 AM
iliekcomputers

pristine___: the job failed again, i'm not sure what the issue is, it'll need to be debugged in dev i guess. if this needs to be fixed quick, we should revert https://github.com/metabrainz/listenbrainz-server… and deploy again.

2020-09-26 27051, 2020

6:30 AM
iliekcomputers

other than that, things look reasonable to me, so i'm stepping away for now.

2020-09-26 27037, 2020

6:54 AM
pristine___

> although the question is, if the errors are in request consumer, why is it affecting the site?

2020-09-26 27033, 2020

6:55 AM
pristine___

It is affecting the site in that users see "recommendations for the user not generated, check back later". It's a valid message, but if the script runs successfully users will be able to see their recs.

2020-09-26 27039, 2020

6:55 AM
pristine___

iliekcomputers: ^

2020-09-26 27036, 2020

6:56 AM
iliekcomputers

Why is the site not showing the old recommendations

2020-09-26 27057, 2020

6:57 AM
pristine___

Because the older recs are not according to Pydantic format. I triggered generate recommendations so that the recs are in suitable format and will be shown on site, but the script failed because of `data module not found` error

2020-09-26 27024, 2020

6:59 AM
pristine___

iliekcomputers: I don't think there is a need to revert the PR, I will just open a PR to remove the data module usage from recommend.py and it will work.

2020-09-26 27011, 2020

7:00 AM
iliekcomputers

We should investigate why the import fails

2020-09-26 27020, 2020

7:00 AM
iliekcomputers

Does it work in dev?

2020-09-26 27020, 2020

7:00 AM
pristine___

Though it is weird, the error. Because the data module works for one script and doesn't for the other

2020-09-26 27030, 2020

7:00 AM
pristine___

Yeah, works in dev.

2020-09-26 27047, 2020

7:00 AM
iliekcomputers

That is very weird

2020-09-26 27019, 2020

7:01 AM
iliekcomputers

Let's open a ticket to investigate what exactly the issue is

2020-09-26 27034, 2020

7:01 AM
pristine___

Right. Also, regarding the site, it looks better with this PR in a way that check back later is a better message than ISE, imo

2020-09-26 27041, 2020

7:01 AM
pristine___

Cool. I will open a ticket

2020-09-26 27010, 2020

7:03 AM
iliekcomputers

Cool, I didn't really understand the urgency of this, but I'm happy with that plan

2020-09-26 27006, 2020

7:04 AM
pristine___

iliekcomputers: steps

2020-09-26 27006, 2020

7:04 AM
pristine___

1 open a PR to remove data module usage from recommend.py ( your comment on the PR last night will fulfill this purpose)

2020-09-26 27006, 2020

7:04 AM
pristine___

2. Merge the PR, restart request consumer.

2020-09-26 27006, 2020

7:04 AM
pristine___

3 open a ticket to fix the issue.

2020-09-26 27015, 2020

7:04 AM
pristine___

iliekcomputers: Urgency of what?

2020-09-26 27016, 2020

7:05 AM
iliekcomputers

Like how urgent fixing the error was

2020-09-26 27058, 2020

7:05 AM
iliekcomputers

Steps look good to me

2020-09-26 27013, 2020

7:07 AM
pristine___

iliekcomputers: So that users can see their recs. I am of the opinion that the rec feature is new, so users might be interested in checking their recs. I just don't want them to see the check back later message when a few days back we told them to go check their recs and give feedback.

2020-09-26 27009, 2020

7:11 AM
iliekcomputers

Makes sense, which is why I suggested reverting

2020-09-26 27036, 2020

7:12 AM
pristine___

iliekcomputers: But then a few users will get ISE, no? *Check back later* is better than ISE, and recs better than *check back later*. Give me an hour, have just woken up, I will make a PR in an hour or so.

2020-09-26 27011, 2020

7:13 AM
iliekcomputers

Sure.

2020-09-26 27017, 2020

7:53 AM
v6lur joined the channel

2020-09-26 27040, 2020

8:40 AM
_lucifer

alastairp: there are some issues regarding gh:CB#311.

2020-09-26 27041, 2020

8:40 AM
BrainzBot

CB-382: You can sometimes create a review without a revision: https://github.com/metabrainz/critiquebrainz/pull…

2020-09-26 27058, 2020

8:41 AM
_lucifer

It is not working as expected because the create revision function itself calls other functions like `review.get_by_id` and `avg_rating.update`.

2020-09-26 27001, 2020

8:43 AM
_lucifer

it seems that the created revision is not yet committed when the other two operations are executed hence there is a mismatch.

2020-09-26 27026, 2020

8:44 AM
_lucifer

i think this can be probably fixed if we pass the connection to those functions as well but that means adding an optional connection to almost all of the db operations

2020-09-26 27054, 2020

8:44 AM
_lucifer

i am not sure if there is a better solution

2020-09-26 27001, 2020

9:05 AM
Gazooo794 has quit

2020-09-26 27048, 2020

9:06 AM
Gazooo794 joined the channel

2020-09-26 27057, 2020

9:15 AM
BrainzGit

[listenbrainz-server] vansika opened pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-server…

2020-09-26 27050, 2020

9:33 AM
pristine___

iliekcomputers: opening a ticket now :)

2020-09-26 27015, 2020

9:34 AM
BrainzGit

[listenbrainz-server] paramsingh merged pull request #1110 (master…redundant-recommend-code): remove redundant dict->pydantic->dict conversion from recommend.py https://github.com/metabrainz/listenbrainz-server…

2020-09-26 27039, 2020

9:51 AM
iliekcomputers

pristine___: restarted

2020-09-26 27047, 2020

9:51 AM
iliekcomputers

pristine___: please link the ticket here once, thanks!

2020-09-26 27013, 2020

9:52 AM
iliekcomputers

ruaok: I decided to look at the checkout integration this weekend https://usercontent.irccloud-cdn.com/file/Gl7srXf…

2020-09-26 27022, 2020

9:52 AM
iliekcomputers

while I wait for reviews on the feed page thing

2020-09-26 27025, 2020

9:52 AM
pristine___

Cool

2020-09-26 27004, 2020

9:59 AM
pristine___

iliekcomputers: recs are on site now.

2020-09-26 27018, 2020

9:59 AM
iliekcomputers

awesome

2020-09-26 27015, 2020

10:02 AM
pristine___

And the ISE resolved!

2020-09-26 27008, 2020

10:03 AM
pristine___

Though I still don't know why recs for the user aren't in the expected format, at least the user will no longer see ISE. In the meantime I will try to look into this.

2020-09-26 27045, 2020

10:03 AM
iliekcomputers

sounds good.

2020-09-26 27015, 2020

10:07 AM
gr0uch0mars joined the channel

2020-09-26 27054, 2020

10:10 AM
ruaok

iliekcomputers: looks promising!

2020-09-26 27024, 2020

10:11 AM
iliekcomputers

i lifted the lichess text :D

2020-09-26 27038, 2020

10:11 AM
ruaok

all art is theft. :)

2020-09-26 27011, 2020

10:26 AM
v6lur has quit

2020-09-26 27047, 2020

11:03 AM
Glycem has quit

2020-09-26 27013, 2020

11:10 AM
Glycem joined the channel

2020-09-26 27002, 2020

11:27 AM
pristine___

ruaok: what if postgres' `unaccent` and python's `unidecode` give different results for the same accented string

2020-09-26 27044, 2020

11:27 AM
ruaok

we'll miss matches.

2020-09-26 27003, 2020

11:28 AM
ruaok

I think it might be best for me to move to unidecode for my next round of mapping work.

2020-09-26 27058, 2020

11:28 AM
pristine___

> we'll miss matches.

2020-09-26 27037, 2020

11:32 AM
pristine___

ruaok: Right. We already miss matches since we are joining on msids rn, and since unidecode and unaccent results may differ, we will again miss matches. So I was wondering if devoting time to creating matchable fields rn for artist_name and track_name is a good step, I mean shouldn't we wait till mapping also uses unidecode?

2020-09-26 27013, 2020

11:33 AM
ruaok

it's a matter of timing and severity of the problem

2020-09-26 27022, 2020

11:33 AM
ruaok

timing: I won't be doing mapping work until after the summit

2020-09-26 27000, 2020

11:34 AM
ruaok

severity: you're going to get many many more matches on text, but you're going to lose .0001% of those to funky decode mismatches. I bet you won't be able to tell.

2020-09-26 27010, 2020

11:34 AM
pristine___

Right

2020-09-26 27050, 2020

11:34 AM
pristine___

Cool. The missing mb data endpoint will tell us anyway the matches we missed

2020-09-26 27058, 2020

11:34 AM
ruaok

yep.
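
To make the "missed matches" concern concrete, here is a small sketch of two normalizers disagreeing on the same string. Both functions are stand-ins written for illustration: `strip_combining_marks` and `transliterate` (with its tiny `TRANSLIT` table) are hypothetical and do not reproduce the actual output of PostgreSQL's `unaccent` or Python's `unidecode`.

```python
import unicodedata


def strip_combining_marks(text):
    """Decompose characters and drop combining accents, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


# Hypothetical transliteration table for letters that have no decomposition.
TRANSLIT = {"ø": "o", "ß": "ss", "æ": "ae"}


def transliterate(text):
    """Strip accents and additionally map a few letters to ASCII."""
    stripped = strip_combining_marks(text)
    return "".join(TRANSLIT.get(ch, ch) for ch in stripped)


names = ["Björk", "Røyksopp", "Beyoncé"]

# A join on the normalized name only matches when both sides agree.
mismatches = [n for n in names if strip_combining_marks(n) != transliterate(n)]
print(mismatches)
```

Names whose two normalized forms differ would not join, which is why using the same normalizer on both sides of the match avoids the problem entirely.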

2020-09-26 27041, 2020

12:16 PM
gr0uch0mars has quit

2020-09-26 27000, 2020

12:40 PM
MajorLurker has quit

2020-09-26 27025, 2020

13:22 PM
shivam-kapila

iliekcomputers: whats your display res

2020-09-26 27012, 2020

13:37 PM
Mineo has quit

2020-09-26 27022, 2020

13:37 PM
Mineo joined the channel

2020-09-26 27033, 2020

13:39 PM
_lucifer

pristine___: ping

2020-09-26 27054, 2020

13:47 PM
pristine___

_lucifer: pong

2020-09-26 27019, 2020

13:48 PM
_lucifer

available for discussing as we decided the other day?

2020-09-26 27042, 2020

13:48 PM
pristine___

Yup 👍

2020-09-26 27002, 2020

13:49 PM
_lucifer

great!

2020-09-26 27058, 2020

13:49 PM
pristine___

https://github.com/metabrainz/listenbrainz-server…

2020-09-26 27032, 2020

13:50 PM
pristine___

Let's start from here, normalization of the input?

2020-09-26 27046, 2020

13:50 PM
_lucifer

sure, i had a question before that

2020-09-26 27000, 2020

13:51 PM
_lucifer

how does hdfs fit in the picture with spark?

2020-09-26 27058, 2020

13:51 PM
pristine___

Yeah, so spark does all the processing of data, and that data is stored in a distributed system, here that distributed system is HDFS

2020-09-26 27034, 2020

13:52 PM
_lucifer

ok makes sense, yes so let's continue

2020-09-26 27042, 2020

13:52 PM
pristine___

Nice

2020-09-26 27059, 2020

13:52 PM
pristine___

So remember you were talking about that medium blog?

2020-09-26 27003, 2020

13:53 PM
_lucifer

yup

2020-09-26 27007, 2020

13:53 PM
pristine___

Do you have a link?

2020-09-26 27028, 2020

13:53 PM
_lucifer

let me see if i can find it

2020-09-26 27011, 2020

13:54 PM
pristine___

Cool. Rn, all we do is just count the number of times a user has listened to a song, feed it as such in the recommender

2020-09-26 27037, 2020

13:54 PM
pristine___

I guess it is affecting user-user similarity

2020-09-26 27054, 2020

13:54 PM
pristine___

in a not so good way, no?

2020-09-26 27034, 2020

13:57 PM
_lucifer

yeah right, that is affecting the recs

2020-09-26 27054, 2020

13:57 PM
_lucifer

in a bad way, at least theoretically

2020-09-26 27029, 2020

13:58 PM
_lucifer

i am unable to find the link but the basic idea is this

2020-09-26 27059, 2020

13:58 PM
_lucifer

|X - average| / mean

2020-09-26 27044, 2020

13:59 PM
pristine___

X here is the playcount?

2020-09-26 27051, 2020

14:00 PM
_lucifer

yes, and my bad it should be | playcount - average | / std. deviation

2020-09-26 27035, 2020

14:01 PM
_lucifer

average is here to counteract the user's own listening tendencies

2020-09-26 27057, 2020

14:01 PM
_lucifer

and std deviation is to bring all users on a same rating scale
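
The formula as stated above, | playcount - average | / std. deviation, can be sketched per user like this. It is only an illustration under assumptions: `normalize_playcounts` is a hypothetical helper, the population standard deviation is an arbitrary choice, and the sample playcounts are made up.

```python
from statistics import mean, pstdev


def normalize_playcounts(playcounts):
    """Map each recording's playcount to |playcount - average| / std. deviation.

    `playcounts` holds one user's recording -> playcount mapping.
    """
    avg = mean(playcounts.values())
    std = pstdev(playcounts.values())
    if std == 0:
        # Every recording was played equally often, so there is no spread.
        return {recording: 0.0 for recording in playcounts}
    return {
        recording: abs(count - avg) / std
        for recording, count in playcounts.items()
    }


# Made-up playcounts for a single user (mean 5, population std. deviation 2).
user_playcounts = {"a": 2, "b": 4, "c": 4, "d": 4, "e": 5, "f": 5, "g": 7, "h": 9}
normalized = normalize_playcounts(user_playcounts)
print(normalized)
```

Note that the absolute value gives the same score to a recording played 3 times below the user's average and one played 3 times above it, so dropping `abs` to keep the sign is a variant worth comparing.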

2020-09-26 27036, 2020

14:02 PM
pristine___

Right.

2020-09-26 27048, 2020

14:02 PM
shivam-kapila

I smell variance here

2020-09-26 27019, 2020

14:03 PM
pristine___

_lucifer: have you seen the rating beyond the limit error in Sentry?

2020-09-26 27049, 2020

14:03 PM
_lucifer

i don't have a sentry account 😅. is it open for all?

2020-09-26 27018, 2020

14:04 PM
shivam-kapila

You will need an invite

2020-09-26 27042, 2020

14:04 PM
_lucifer

you can share the stack trace for the time being i guess

2020-09-26 27001, 2020

14:05 PM
pristine___

Not sure. But I will tell you. The ratings given by the recommender to recordings belong to (-1, 3)

2020-09-26 27022, 2020

14:05 PM
pristine___

Though we initially thought them to be in (-1, 1)

2020-09-26 27044, 2020

14:05 PM
pristine___

I am still not sure about (-1, 3) but that's what I have seen till now.

2020-09-26 27053, 2020

14:05 PM
pristine___

These ratings don't make much sense

2020-09-26 27058, 2020

14:05 PM
pristine___

imo

2020-09-26 27014, 2020

14:06 PM
pristine___

And they are directly dependent on what we feed in, ig

2020-09-26 27025, 2020

14:06 PM
pristine___

i.e playcount

2020-09-26 27036, 2020

14:06 PM
_lucifer

yeah, (-1, 3) does not make sense at all, it's either open ended and the ratings should be considered relative to each other or yeah that

2020-09-26 27050, 2020

14:06 PM
pristine___

> |X - average| / mean

2020-09-26 27041, 2020

14:07 PM
pristine___

Is that the only metric we have? I am not really good at this stuff, but I guess if we have a few metrics/ways of normalization, we can compare and find the better one.

2020-09-26 27042, 2020

14:07 PM
shivam-kapila

Average == mean??

2020-09-26 27001, 2020

14:08 PM
_lucifer

not mean the standard deviation, my mistake as I said above 😓

2020-09-26 27034, 2020

14:08 PM
_lucifer

there are many ways to normalize yes

2020-09-26 27050, 2020

14:08 PM
_lucifer

this is the most basic one imo

2020-09-26 27004, 2020

14:10 PM
pristine___

Hmm.. Okay. So to start, we know that the current way of feeding playcount isn't really cool since user interactions can only be interpreted as positive feedback (implicit feedback)

2020-09-26 27024, 2020

14:10 PM
pristine___

We can start with the formula you shared above

2020-09-26 27000, 2020

14:11 PM
_lucifer

yeah, we can experiment and compare the results

2020-09-26 27011, 2020

14:11 PM
pristine___

Would you like to work on this? I mean simply treat the generated playcounts with the above formula and compare results

2020-09-26 27043, 2020

14:11 PM
pristine___

You will need to have some data sets in hdfs on your local machine, and you are good to go

2020-09-26 27043, 2020

14:11 PM
_lucifer

yeah sure, i am currently setting up spark locally and will try to generate recs locally