I think this is why you're seeing permission denied errors even though we're running npm as root: it's trying to write or chown the files as your user, not as root. I remember running into this issue in another project where npm completely screwed up the owner of my database files...
I'm going to delete the prebuilt files that you have (../anshg1214/critiquebrainz/critiquebrainz/frontend/static/build) and re-run it; I think this will create the directory and write the files with the correct owner
but I'll also add in the trick that we have in LB where we run the commands as a specific user id
lucifer
chinmay: are you unable to create a GlobalAppContextT or do you want to add a new context?
chinmay
I want to add a new context
lucifer
i see, what data will be part of it and how do you intend to use it?
ansh
alastairp: This is interesting! Did it work?
alastairp
ansh: I deleted your build directory and re-ran it (as root), and it changed the owner to you
I tried with docker run's --user flag, and it failed to run (with the same error that lucifer encountered the other week)
it looks like npm now needs to look up your user id in /etc/passwd, and if it doesn't exist then it fails with an unexplained error
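As a rough illustration of that lookup, here is a Python equivalent of what npm appears to do (npm's own check is internal JS; the `pwd` call below just shows the failure mode you hit with `docker run --user` and a uid that has no passwd entry):

```python
# Illustrative sketch only: an equivalent of the user lookup npm appears
# to perform. With a uid that has no /etc/passwd entry (as with
# docker run --user and an arbitrary id), the lookup fails.
import os
import pwd

try:
    entry = pwd.getpwuid(os.getuid())
    print(f"uid {entry.pw_uid} resolves to {entry.pw_name!r}")
except KeyError:
    # this is the situation that makes npm bail out with a cryptic error
    print(f"uid {os.getuid()} has no /etc/passwd entry")
```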
ansh: can you try and start it up again and check that the static builder loads and correctly re-builds when a file changes?
ansh
Sure!
alastairp
I'll open a ticket to suggest that we add a user to our images so that we can run npm
this is an issue for LB too
ansh
It works without any error
alastairp
great!
lucifer
alastairp: why not run npm tests as root?
alastairp
lucifer: this is for running `webpack`
or `npm install`
lucifer
i see. but that works fine in LB, no?
alastairp
it automatically chown's the files to the owner of `.` even if you run it as root
the specific issue we encountered is that when ansh last ran the container with node 12, it left the bundle files owned by root
and then when he re-ran it with node 16 (npm 8, I guess the issue was), it tried to chown the existing files and failed
when I deleted the files and tried again, it successfully builds and chown's
lucifer
ah ok. if the fix is simple then we should probably do it, but if the issue won't come back as long as we're running node 16 and fixing it is complex, it might be fine to leave as is.
alastairp
this specific issue that we encountered is "you had existing build assets owned by root and then you upgraded from node 12 to node 16"
(and you're running on linux, and you don't have sudo)
so it's probably only affecting ansh in this specific case
lucifer
makes sense, yeah. it seems very specific, and if it won't occur again (according to my understanding it won't), fine to leave as is i say
alastairp
if we want to take our previous plans and start running everything in a container as a local user... it looks like npm requires a user in /etc/passwd to be able to run
chinmay
lucifer: I want to clean the data from the API before taking it any further, as there can be duplicates. There will be one array for this cleaned data. I also want to keep a list of release_mbids that don't have cover art, built when the page is loaded. So, for example, if I want to show only releases with cover art, I'll filter the main data against the list of no-cover-art releases and pass it ahead to update the page.
Similar logic will apply for any filters.
What do you think? is there a simple way to implement filters?
lucifer
chinmay: pass the entire data to the release page and then maintain multiple arrays/maps inside it. one for the full data, others for the filtered ones, and so on?
for making a filter, i think a `.filter` or an `if` should suffice.
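Something like this, sketched in Python for brevity (the actual page is frontend code, and `caa_id` is an assumed field name used here to mark missing cover art):

```python
# Keep the full cleaned list once; derive filtered views from it instead
# of mutating it. `caa_id` (cover art id, None if missing) is assumed.
releases = [
    {"release_mbid": "mbid-1", "caa_id": 123},
    {"release_mbid": "mbid-2", "caa_id": None},  # no cover art
]
# computed once, when the page loads
no_coverart = {r["release_mbid"] for r in releases if r["caa_id"] is None}

def apply_filters(data, coverart_only=False):
    """Return a filtered view of `data` without touching the full list."""
    if coverart_only:
        return [r for r in data if r["release_mbid"] not in no_coverart]
    return data
```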
ansh
alastairp: I am unable to log in to sentry. I created an account and it gives me an error saying something went wrong.
why are there no tables for the feedback dump? because we have a different way of creating the dump?
lucifer
yes those are dumped using a manual sql query.
alastairp
feedback dump isn't designed to be imported into a db, right? so in that case the pre-sorting isn't an issue?
lucifer
further, those are json dumps so the FK issue doesn't happen either.
yup right
alastairp
and I'm just reading the code for feedback dumps, the number and names of the files are variable based on the result of the query, it seems (so we can't set `tables` to something ahead of time)
it'd be nice to try and set `tables` to None in this case so that we end up with an exception
because we already have a special if statement for making the feedback dumps, I think it's fine to do what you suggest. However, maybe it makes sense to move this to a separate code flow anyway, instead of having if statements everywhere - not sure how much of this code is shared and how much is different
ansh: I've upgraded sentry. can you try and sign in again?
lucifer
makes sense, that's what i had done initially for couchdb dumps but then it failed, so i changed to [] and it worked fine. then i saw the dump was empty so i investigated. https://github.com/metabrainz/listenbrainz-serv...
yes +1 on separating json dumps from pg dumps.
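The difference between the two sentinels, as a generic illustration (not the actual LB dump code):

```python
# Generic illustration, not the real dump code: iterating [] silently
# writes nothing (the empty-dump bug lucifer hit), while iterating None
# raises immediately, so the misconfiguration can't go unnoticed.
def dump_tables(tables):
    for table in tables:  # TypeError here if tables is None
        print(f"dumping {table}")

dump_tables([])      # runs "fine" but produces an empty dump
# dump_tables(None)  # raises TypeError: 'NoneType' object is not iterable
```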
alastairp
in addition, it seems that feedback dumps write all files to a temp dir before adding to the archive? we don't dump/add/rm in a loop?
but I guess that's how we do it for all dumps anyway, I can't remember if we tried the previous way to try and minimise transient disk usage
lucifer: LB planning meeting?
lucifer
yup
mayhem: around?
mayhem
yep
lucifer
👍
firstly, i am planning to finish the couchdb integration this week. currently working on dumps, will probably get those working today and start working on tests for dumps tomorrow onwards.
mayhem
good.
lucifer
after that i was thinking of restarting work on rabbitmq stuff.
we had talked about replacing pika with kombu some time ago, to see if that fixes our woes.
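For reference, a minimal kombu consumer sketch (the broker URL, exchange, and queue names here are assumptions, not LB's actual ones):

```python
# Minimal kombu consumer sketch; all names below are assumptions.
from kombu import Connection, Exchange, Queue

listens_exchange = Exchange("listens", type="fanout")
incoming_queue = Queue("incoming", exchange=listens_exchange)

def handle_listen(body, message):
    print("received:", body)
    message.ack()

with Connection("amqp://guest:guest@localhost//") as conn:
    with conn.Consumer(incoming_queue, callbacks=[handle_listen]):
        conn.drain_events(timeout=10)  # raises socket.timeout when idle
```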
mayhem
I think there is also some work that could be done on the similarity stuff...
lucifer
recording similarity?
mayhem
yes. right now I feel frustrated because it is hard to evaluate the results.
we have no guarantee about whether the algorithm is working or not.
what we need is a dummy test set with an expected set of results
lucifer
i see. makes sense
mayhem
so that we can verify that the algorithm is working. once we have that, then we can start tuning the alg
another thing on that front is the improvement of what constitutes a session.
lucifer
we can create a dummy test, but i'm unsure how to determine the baseline set of expected results.
mayhem
we will need to have access to track lengths in spark in order to improve those.
lucifer
we can import release dumps for that i think.
mayhem
well, the dummy test needs to have a clear and contrived set of data.
one where we know exactly what the result should be
and if the result is not what we expect, then we have a bug and need to fix it.
then we need to focus on improving "listening sessions".
only when we have reliable sessions, can we build better similarity data.
and once Pratha-Fish is done, we'll have cleaned up data and we can use that for similarity data.
and THAT will push us over the edge of having sufficient data.
lucifer
makes sense
mayhem
so, when you run low on things to do, work on stuff that moves us down this path.
lucifer
do you have thoughts on how to build the test set? we can pick tracks and their similar ones but there will likely be bias.
*pick manually ourselves
mayhem
I would not even use real data.
alastairp
I think we're going to have to generate synthetic data that we know the algorithm will throw back as similar
mayhem
that.
lucifer
i see, makes sense. that should be doable.
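One way to build such a contrived set (field names are assumed; the point is that the expected output is known by construction):

```python
# Synthetic listens where the right answer is known by construction:
# A and B always co-occur within a session, C never does, so a correct
# session-based similarity algorithm must rank sim(A, B) highest and
# report no A-C or B-C similarity. Field names are assumptions.
from datetime import datetime, timedelta

def make_synthetic_listens(n_users=10):
    listens = []
    start = datetime(2022, 1, 1)
    for user in range(n_users):
        t = start + timedelta(days=user)
        listens.append({"user_id": user, "recording": "A", "listened_at": t})
        listens.append({"user_id": user, "recording": "B",
                        "listened_at": t + timedelta(minutes=3)})
        # C is hours away, i.e. always in a different session
        listens.append({"user_id": user, "recording": "C",
                        "listened_at": t + timedelta(hours=6)})
    return listens
```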
about listening sessions, what do you have in mind? currently we use the listened_at difference between listens, with a 30 min threshold. once we have track lengths, what should be done differently?
mayhem
look for consecutive tracks and make sure there are no significant gaps.
I'd like to have a tool that allows us to review sessions. have it spit out the sessions for a user for the last week or so.
then we can see if they make sense.
right now, we made a basic assumption and we haven't verified it.
lucifer
i see, so the running difference should be less than a specified constant.
+1 to both.
i can work on adding the test set, then the sessions tool, then adding sessions to that spark query.
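A sketch of that session rule (plain Python rather than the actual spark query; `duration` is the track length field once it's available):

```python
# Plain-Python sketch of the gap rule, not the actual spark query.
# With track lengths available, the gap is measured from the end of the
# previous track (listened_at + duration) instead of from its start.
from datetime import timedelta

MAX_GAP = timedelta(minutes=30)  # the "specified constant"

def split_sessions(listens, max_gap=MAX_GAP):
    """Group one user's listens (sorted by listened_at) into sessions."""
    sessions, current = [], []
    for listen in listens:
        if current:
            prev = current[-1]
            prev_end = prev["listened_at"] + prev.get("duration", timedelta(0))
            if listen["listened_at"] - prev_end > max_gap:
                sessions.append(current)
                current = []
        current.append(listen)
    if current:
        sessions.append(current)
    return sessions
```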
mayhem
perfect.
alastairp
cool
lucifer
when will you be back from vacation btw?
mayhem
sept. :/
lucifer
i see.
alastairp: what about you?
mayhem
well, I'll be around a bit each day, but just to answer emails/questions