#metabrainz

      • alastairp
        I think that this is why even though we're running npm as root, you're seeing permission denied errors, I think it's trying to write or chown the files, but doing it as your user, not as root. I remember running into this issue in another project where npm was completely screwing up the owner of my database files...
      • I'm going to delete the prebuilt files that you have (../anshg1214/critiquebrainz/critiquebrainz/frontend/static/build) and re-run it, I think that this will correctly be able to create the directory and write the files with the correct owner
      • but I'll also add in the trick that we have in LB where we run the commands as a specific user id
      • lucifer
        chinmay: are you unable to create a GlobalAppContextT or do you want to add a new context?
      • chinmay
        I want to add a new context
      • lucifer
        i see, what data will be part of it and how do you intend to use it?
      • ansh
        alastairp: This is interesting! Did it work?
      • alastairp
        ansh: I deleted your build directory and re-ran it (as root), and it changed the owner to you
      • I tried with docker run's --user flag, and it failed to run (with the same error that lucifer encountered the other week)
      • it looks like npm now needs to check for your user id in /etc/passwd, and if it doesn't exist then it fails with an unexplained error
      • ansh: can you try and start it up again and check that the static builder loads and correctly re-builds when a file changes?
      • ansh
        Sure!
      • alastairp
        I'll open a ticket to suggest that we add a user to our images so that we can run npm
      • this is an issue for LB too
      • ansh
        It works without any error
      • alastairp
        great!
      • lucifer
        alastairp: why not run npm tests as root?
      • alastairp
        lucifer: this is for running `webpack`
      • or npm install
      • lucifer
        i see. but that works fine in LB, no?
      • alastairp
        it automatically chown's to the owner of . even if you run it as root
      • the specific issue we encountered is that when ansh last ran the container with node 12, it left the bundle files owned by root
      • and then when he re-ran it with node 16 (npm 8, I guess the issue was), it tried to chown the existing files and failed
      • when I deleted the files and tried again, it successfully builds and chown's
      • lucifer
        ah ok, if the fix is simple then we should probably do it, but if the issue won't come back again as long as we are running node 16 and fixing it is complex, it might be fine to leave as is.
      • alastairp
        this specific issue that we encountered is "you had existing build assets owned by root and then you upgraded from node 12 to node 16"
      • (and you're running on linux, and you don't have sudo)
      • so it's probably only affecting ansh in this specific case
      • lucifer
        makes sense, yeah it seems very specific and if it won't occur again (according to my understanding it won't), fine to leave as is i say
      • alastairp
        if we want to take our previous plans and start running everything in a container as a local user... it looks like npm requires a user in /etc/passwd to be able to run
      • chinmay
        lucifer: I want to clean the data from the API before taking it any further as there can be duplicates. There will be one array for this cleaned data. I also want to keep a list of release_mbids that don't have cover art when the page is loaded. So for example, if I want to show only releases with cover art, I'll filter the main data against the list of no-cover-art releases and pass it ahead to update the page.
      • Similar logic will apply for any filters.
      • What do you think? is there a simple way to implement filters?
      • lucifer
        chinmay: pass the entire data to the release page and then maintain multiple arrays/maps inside it. one for the full data, another for the filtered one, and so on?
      • for making a filter, i think a `.filter` or `if` should suffice.
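A rough illustration of the data handling lucifer suggests: keep the full, deduplicated release list plus a set of release_mbids that have no cover art, and derive each filtered view on demand. This is a Python sketch only, with assumed field names (`caa_id`); the actual code would live in the React frontend.

```python
# Python sketch of the filtering idea only; field names are assumptions.
def build_no_cover_art_index(releases):
    """Compute once on page load: release_mbids that have no cover art."""
    return {r["release_mbid"] for r in releases if not r.get("caa_id")}

def apply_filters(releases, no_cover_art, only_with_cover_art=False):
    """Return the view to render; the full list is never mutated."""
    if only_with_cover_art:
        return [r for r in releases if r["release_mbid"] not in no_cover_art]
    return releases
```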
      • ansh
        alastairp: I am unable to log in to sentry. I created an account and it gives me an error saying something went wrong.
      • lucifer
      • something like this should probably work.
      • chinmay
        lucifer: I'll go through it
      • alastairp
        ansh: interesting, let me see if sentry reported an error to itself :)
      • lucifer
        chinmay: 👍 feel free to ask if you have any other doubts.
      • chinmay
        yeah
      • lucifer
      • alastairp
        ansh: were you able to create an account?
      • chinmay
        Awesome! I'll check this one out too
      • alastairp
        or when you went to create it, it returned the error?
      • ansh
        When I went to create one, it gave me an error
      • alastairp
        right, because it says that your invite is still active
      • I see no errors in sentry
      • ansh
      • alastairp
        yeah, right. weird - so that should definitely be reported as an error
      • it's possible that we've not created an account for anyone since we last performed a sentry upgrade
      • I'll try and upgrade it again and we can try again
      • ansh
        yep
      • q3lont has quit
      • alastairp
      • mayhem: closing my office door (10 m^2) certainly makes CO2 peak
      • Pratha-Fish
        alastairp: I'm up for discussion whenever you're free 👀
      • mayhem
        Wow, quite a serious peak at that.
      • alastairp
        Pratha-Fish: ok, give me 10 minutes, just doing some server maintenance
      • Pratha-Fish
        sure
      • alastairp
        yeah, I seem to idle around 400-500 with the door to the rest of the house open. at the moment also got window open + fan on
      • yes, it felt sticky and heavy, but not sure how much of that was just hot + humid too
      • and I knew that I was making it go up, so maybe it was 90% psychosomatic too
      • a friend who has a monitor in his office says that he has a push notification to his phone at 2000 ppm to remind him to open a window
      • upgrading sentry, it'll be down temporarily, hopefully not more than 5 mins
      • lucifer
        mayhem: alastairp: apparently feedback dumps have been broken for a long while now. just discovered it when trying to make couchdb dumps.
      • mayhem
        Huh. How did the dump checker not catch that?
      • lucifer
      • dump exists but no data files inside
      • alastairp
        bug with the query/data collection?
      • but it didn't raise an exception?
      • lucifer
        data was dumped but never added to tarfile
      • alastairp
        oops
      • lucifer
      • it probably broke here
      • tables = [] for user feedback dumps
      • alastairp
        ah, right
      • lucifer
        i'll add an `if not tables:` check to fall back to the old way.
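A minimal sketch of that fallback, with hypothetical function names rather than the actual listenbrainz-server code:

```python
# Hypothetical names throughout; only illustrates the `if not tables` fallback.
def add_dump_to_archive(tar, dump_entry, tables):
    if not tables:
        # feedback-style dumps come from a manual SQL query, so there is no
        # table list to iterate over; use the old per-query code path instead
        dump_feedback_with_query(tar, dump_entry)
        return
    for table in tables:
        add_table_to_archive(tar, table)
```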
      • alastairp
        why is there no tables for feedback dump? because we have a different way of creating the dump?
      • lucifer
        yes those are dumped using a manual sql query.
      • alastairp
        feedback dump isn't designed to be imported into a db, right? so in that case the pre-sorting isn't an issue?
      • lucifer
        further, those are json dumps so the FK issue doesn't happen either.
      • yup right
      • alastairp
      • and I'm just reading the code for feedback dumps, the number and names of the files are variable based on the result of the query it seems (so we can't set `tables` to something ahead of time)
      • it'd be nice to try and set `tables` to None in this case so that we end up with an exception
      • because we already have a special if statement for making the feedback dumps, I think it's fine to do what you suggest. However, maybe it makes sense to move this to a separate codeflow anyway instead of having if statements everywhere - not sure how much of this code is shared and how much is different
      • ansh: I've upgraded sentry. can you try and sign in again?
      • lucifer
        makes sense, that's what i had done initially for couchdb dumps but then it failed so i changed to [] and it worked fine. then i saw the dump was empty so investigated. https://github.com/metabrainz/listenbrainz-serv...
      • yes +1 on separating json dumps from pg dumps.
      • alastairp
        in addition, it seems that feedback dumps write all files to a temp dir before adding to the archive? we don't dump/add/rm in a loop?
      • but I guess that's like we do it for all dumps anyway, I can't remember if we tried the previous way to try and minimise transient disk usage
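For reference, the dump/add/rm loop described above would look roughly like this (hypothetical helper names), keeping transient disk usage to one file at a time:

```python
import os
import tarfile

def dump_files_incrementally(tar: tarfile.TarFile, dump_jobs, tmp_dir: str):
    """dump_jobs: iterable of (name, write_file) pairs, where write_file(path)
    writes one dump file to disk. All names here are illustrative."""
    for name, write_file in dump_jobs:
        path = os.path.join(tmp_dir, name)
        write_file(path)                 # dump
        tar.add(path, arcname=name)      # add to the archive
        os.remove(path)                  # rm, freeing space before the next file
```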
      • lucifer: LB planning meeting?
      • lucifer
        yup
      • mayhem: around?
      • mayhem
        yep
      • lucifer
        👍
      • firstly, i am planning to finish the couchdb integration this week. currently working on dumps, will probably get those working today and start working on tests for dumps tomorrow onwards.
      • mayhem
        good.
      • lucifer
        after that i was thinking of restarting work on rabbitmq stuff.
      • we had talked about replacing pika with kombu some time ago, to see if that fixes our woes.
      • mayhem
        I think there is also some work that could be done on the similarity stuff...
      • lucifer
        recording similarity?
      • mayhem
        yes. right now I feel frustrated because it is hard to evaluate the results.
      • we have no guarantee of whether the algorithm is working or not.
      • lucifer
        i can add a datasethoster query if it helps.
      • yuzie joined the channel
      • fwiw these datasets should be present on wolf currently, https://docs.google.com/document/d/1BJbXFPqgu2x...
      • mayhem
        yes, I've looked at them, but just shrugged.
      • unclear on how to proceed.
      • what we need is a dummy test set with an expected set of results
      • lucifer
        i see. makes sense
      • mayhem
        so that we can verify that the algorithm is working. once we have that, then we can start tuning the alg
      • another thing on that front is the improvement of what constitutes a session.
      • lucifer
        we can create a dummy test but unsure how to determine the baseline set of expected results.
      • mayhem
        we will need to have access to track lengths in spark in order to improve those.
      • lucifer
        we can import release dumps for that i think.
      • mayhem
        well, the dummy test needs to have a clear and contrived set of data.
      • one where we know exactly what the result should be
      • and if the result is not what we expect, then we have a bug and need to fix it.
      • then we need to focus on improving "listening sessions".
      • only when we have reliable sessions, can we build better similarity data.
      • and once Pratha-Fish is done, we'll have cleaned-up data and we can use that for similarity data.
      • and THAT will push us over the edge of having sufficient data.
      • lucifer
        makes sense
      • mayhem
        so, when you run low on things to do, work on stuff that moves us down this path.
      • lucifer
        do you have thoughts on how to build the test set? we can pick tracks and their similar ones but there will likely be bias.
      • *pick manually ourselves
      • mayhem
        I would not even use real data.
      • alastairp
        I think we're going to have to generate synthetic data that we know the algorithm will throw back as similar
      • mayhem
        that.
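One possible shape for that synthetic data, sketched here with made-up identifiers: a few fake users who always play the same pair of recordings back to back, so the expected similarity output is known in advance.

```python
# Contrived listen history: every user plays rec-A then rec-B, and rec-C then
# rec-D, in separate sessions, so we expect similar(rec-A) to return rec-B and
# similar(rec-C) to return rec-D. Purely illustrative.
def make_synthetic_listens(num_users: int = 10, base_ts: int = 1_600_000_000):
    listens = []
    for user_id in range(num_users):
        ts = base_ts + user_id * 10_000
        for pair in (("rec-A", "rec-B"), ("rec-C", "rec-D")):
            for offset, mbid in enumerate(pair):
                listens.append({
                    "user_id": user_id,
                    "recording_mbid": mbid,
                    "listened_at": ts + offset * 180,  # tracks 3 minutes apart
                })
            ts += 3600  # an hour's gap separates the pairs into sessions
    return listens
```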
      • lucifer
        i see, makes sense. that should be doable.
      • about listening sessions, what do you have in mind? currently we use the listened_at difference between listens with a 30 min threshold. once we have track lengths, what should be done differently?
      • mayhem
        look for consecutive tracks and make sure there are no significant gaps.
      • I'd like to have a tool that allows us to review sessions. have it spit out the sessions for a user for the last week or so.
      • then we can see if they make sense.
      • right now, we made a basic assumption and we haven't verified it.
      • lucifer
        i see, so running difference should be less than a specified constant.
      • +1 to both.
      • i can work on adding the test set then the sessions tool then adding sessions to that spark query.
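A rough, non-Spark sketch of the gap rule under discussion: with track lengths available, two consecutive listens stay in the same session only if the gap between the end of one track and the start of the next is below a threshold (30 minutes here, mirroring the current listened_at rule; everything is illustrative).

```python
MAX_GAP_SECONDS = 30 * 60  # placeholder threshold, to be tuned

def split_into_sessions(listens, max_gap: int = MAX_GAP_SECONDS):
    """listens: dicts with listened_at (epoch seconds) and track_length (seconds),
    sorted by listened_at for a single user. Purely illustrative."""
    sessions, current = [], []
    for listen in listens:
        if current:
            prev = current[-1]
            gap = listen["listened_at"] - (prev["listened_at"] + prev["track_length"])
            if gap > max_gap:
                sessions.append(current)
                current = []
        current.append(listen)
    if current:
        sessions.append(current)
    return sessions
```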
      • mayhem
        perfect.
      • alastairp
        cool
      • lucifer
        when will you be back from vacation btw?
      • mayhem
        sept. :/
      • lucifer
        i see.
      • alastairp: what about you?
      • mayhem
        well, I'll be around a bit each day, but just to answer emails/questions
      • alastairp
        I expect to be back from 22 Aug
      • lucifer
        ok cool, lets discuss some more tickets then.