Hi Freso, I'll be somewhere today without any means of communication, so I'll have to skip today's meeting. For my update: I've been on vacation, and I had a look at yellowhatpro's work on the Android app! Thank you.
alastairp: 163k files checked so far. Nothing found. Do you need any other numbers while we're at it?
2022-07-18 19919, 2022
alastairp
nothing yet
2022-07-18 19920, 2022
alastairp
hmm
2022-07-18 19952, 2022
alastairp
Pratha-Fish: from what I can see of the code, you're just loading it in, checking the first column against your db tables, and then writing the dataframe out again?
2022-07-18 19918, 2022
Pratha-Fish
alastairp: yes that's right
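The pipeline described above (load the file, check the first column against the db tables, write the dataframe back out) can be sketched roughly like this; all names here are illustrative, not the real project code:

```python
# Minimal sketch of the load -> check -> write pipeline, assuming the first
# column holds a recording MBID and `known_mbids` came from the db tables.
import pandas as pd

def filter_known_mbids(df: pd.DataFrame, known_mbids: set) -> pd.DataFrame:
    """Keep only rows whose first column matches an MBID already in the db."""
    first_col = df.columns[0]
    return df[df[first_col].isin(known_mbids)]

# toy usage with invented data
df = pd.DataFrame({"recording_mbid": ["a", "b", "c"], "title": ["x", "y", "z"]})
kept = filter_known_mbids(df, {"a", "c"})
```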
2022-07-18 19929, 2022
alastairp
however, I just randomly sampled a few of your compressed zst files and compared them against the gzip version of the same file, and the resulting uncompressed data is different
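A quick way to verify that two compressed copies of the same file really decompress to identical bytes is to compare digests of the decompressed data. zstandard is not in the Python stdlib, so this sketch round-trips through gzip only; for the real check you would decompress the zst copy with a zstd library and feed both results to the same comparison:

```python
# Compare decompressed contents via digests, so large files don't need an
# in-memory byte-by-byte diff. gzip round-trip stands in for the zst side.
import gzip
import hashlib

def same_contents(decompressed_a: bytes, decompressed_b: bytes) -> bool:
    """True when both decompressed payloads are byte-identical."""
    return hashlib.sha256(decompressed_a).digest() == hashlib.sha256(decompressed_b).digest()

original = b"recording_mbid,artist\nabc,Someone\n"
gz_roundtrip = gzip.decompress(gzip.compress(original))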
Pratha-Fish: That’s what eventually pushed me to drop Windows for good. 🙃 IME Windows is as likely to break as Linux, but with Linux I at least have an idea of what’s going on and a fighting chance to fix it.
2022-07-18 19920, 2022
ansh
Is there any way to retest them on github before merging?
2022-07-18 19920, 2022
Pratha-Fish
Freso: relatable haha but the opposite
2022-07-18 19956, 2022
Pratha-Fish
The only reason why I am sticking with windows at this point is because of excellent software support, and force of habit
alastairp: What should I do while the computation is running?
2022-07-18 19906, 2022
Pratha-Fish
We could jump back on the artist conflation issue, or even start converting all pandas.isin() code to set queries
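For the `pandas.isin()` point above: membership tests against a Python `set` are O(1) per element, so passing a precomputed set (rather than a list or Series that gets rescanned) is usually the cheap win. A small sketch with invented data:

```python
import pandas as pd

ids = pd.Series(["a", "b", "c", "d"])
wanted = {"b", "d"}  # precomputed set of ids we care about

# isin accepts a set directly:
mask_isin = ids.isin(wanted)

# equivalent explicit set lookup, useful outside pandas as well:
mask_set = ids.map(lambda x: x in wanted)
```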
2022-07-18 19913, 2022
alastairp
Pratha-Fish: I think that the next interesting step is going to be a comparison of our two data lookup methods
2022-07-18 19942, 2022
alastairp
remember back at the beginning of the year when we were explaining that we might need to rewrite some lookup methods in spark or some other faster system?
2022-07-18 19952, 2022
Pratha-Fish
right
2022-07-18 19941, 2022
alastairp
so, given a recording mbid in the data file, we currently have 2 ways of looking up a canonical id: mbid -> canonical mbid table; or mbid -> text metadata -> mapper
2022-07-18 19944, 2022
alastairp
and the previous experiment you did a few weeks back shows that some items give different results
2022-07-18 19913, 2022
alastairp
what we're interested in doing is seeing why these results are different, and what we can do to make them the same
2022-07-18 19942, 2022
alastairp
because ideally we could continue to use the canonical mbid table, because it's super fast (otherwise we need to look up all 27 billion rows in the mapper, which is slow)
2022-07-18 19911, 2022
Pratha-Fish
the mapper method won't finish computing this year tbh
2022-07-18 19922, 2022
alastairp
so we need to decide if the mapper really is "better" (we don't know what the definition of better is here, we need to investigate the data and make a decision)
2022-07-18 19906, 2022
alastairp
and if it _is_ better, we need to move on to the next steps of seeing if we can re-implement in something faster (spark? something else) in order to do the processing in a reasonable time
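The two lookup paths being compared can be modelled as a toy example (all data invented): path 1 is a direct hit on the canonical MBID table, path 2 goes through the recording's text metadata into the mapper, which matches on normalised text, so the two can disagree:

```python
# Toy model of the two lookup paths. In the real system the "tables" are
# database tables, not dicts, and the mapper does fuzzy text matching.
canonical_table = {"mbid-1": "canon-A"}
metadata = {"mbid-1": ("Artist X", "Track Y"), "mbid-2": ("Artist X", "Track Y")}
mapper = {("artist x", "track y"): "canon-B"}  # keyed on normalised text

def lookup_canonical(mbid):
    """Path 1: mbid -> canonical mbid table."""
    return canonical_table.get(mbid)

def lookup_via_mapper(mbid):
    """Path 2: mbid -> text metadata -> mapper."""
    meta = metadata.get(mbid)
    if meta is None:
        return None
    artist, track = meta
    return mapper.get((artist.lower(), track.lower()))
```

Here `mbid-1` resolves to different canonical ids down the two paths, which is exactly the kind of disagreement the comparison is meant to surface.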
2022-07-18 19927, 2022
Pratha-Fish
very interesting :D
2022-07-18 19944, 2022
Pratha-Fish
So I'll take a look at the data first ig. Let's see if there are any patterns
mayhem: lgtm, thanks. it would be nice to add some tests with real mb data as well but currently we don't have MB db in LB tests so I'll open a ticket for it.
lucifer: is it time for us to chat about how to integrate the three separate branches of fresh releases work we've got going on?
2022-07-18 19913, 2022
lucifer
yes sure
2022-07-18 19921, 2022
mayhem
the fetching of user specific data needs to be added to the endpoint I just added, that is one thing I see.
2022-07-18 19957, 2022
mayhem
and now that I made space for the react work, chinmay can drop his work on top of the blank template that was just merged.
2022-07-18 19958, 2022
lucifer
we have a couple of options there, either spark calls the api or lb fetches the data from db and sends it as a part of the rmq message
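The second option above (LB fetches the user data from the db and embeds it in the RabbitMQ message, so Spark never calls back into the API) might look something like this; every field name here is made up for illustration:

```python
# Hypothetical sketch of option 2: LB reads user-specific data from its db
# first, then ships it inside the message body.
import json

def build_fresh_releases_message(user_id, user_releases):
    payload = {
        "type": "fresh_releases",
        "user_id": user_id,
        "releases": user_releases,  # already fetched from the db by LB
    }
    return json.dumps(payload).encode("utf-8")

body = build_fresh_releases_message(42, [{"release_mbid": "r1"}])
```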
2022-07-18 19959, 2022
mayhem
what else?
2022-07-18 19927, 2022
mayhem
I was expecting LB to fetch the data from couchdb/postgres.
2022-07-18 19957, 2022
mayhem
and for the endpoint to return sitewide fresh releases unless a user name was given.
2022-07-18 19900, 2022
lucifer
rest of the backend is almost done. most of the couchdb integration will be done when migrating stats to it. after that i'll finish the fresh releases pr.
2022-07-18 19930, 2022
mayhem
ok, maybe we should just wait for that to be done before doing more stuff.
2022-07-18 19932, 2022
lucifer
yes, that makes sense.
2022-07-18 19915, 2022
lucifer
a few tests and dumps are pending on that front fwiw.
2022-07-18 19920, 2022
mayhem
ok, ping me if you need anything. I'm going to see if I can classify tracks as high/low energy with the data we have at our disposal.... see if I can make another playlist for users.
2022-07-18 19940, 2022
lucifer
will do. sounds great! :D
2022-07-18 19937, 2022
mayhem
daily jams are making me pretty happy. looking quite nice now. I really need to make a point of listening to them each day to see how things shape up over time.
2022-07-18 19954, 2022
mayhem
BP makes that pretty hard though. It plays a handful of tracks and then halts. :(
2022-07-18 19938, 2022
lucifer
yeah spotify does not have any documentation on how to fix the issue and no one answered on forums either.
2022-07-18 19924, 2022
mayhem
yeah, it's fully meh.
2022-07-18 19944, 2022
mayhem
I think I might try my hand at the spotify cache using couchdb as the document store.
2022-07-18 19900, 2022
Sophist_UK has quit
2022-07-18 19925, 2022
Sophist-UK joined the channel
2022-07-18 19949, 2022
riksucks
hi lucifer, are you up?
2022-07-18 19918, 2022
lucifer
riksucks: yes. sorry, forgot to answer your question earlier. there are 2 things to consider here: 1) multiple notifications on feed 2) allowing individual recommendees to delete a personal notification they received without affecting others.
2022-07-18 19936, 2022
lucifer
also maybe allow the recommender to unsend the recommendation to a particular person without unsending it to others?
2022-07-18 19914, 2022
lucifer
for instance, Instagram allows you to send a post to multiple people at a time, but you can then unsend it to a particular person later if you want.
2022-07-18 19958, 2022
riksucks
true, I thought about 2), and realised that for a normal recommendation only the recommender can delete it, while the recommendees can hide it from their timelines. So maybe we can implement a similar feature. Similarly, for unsending to a particular person, we can try removing that specific ID from the array and updating it in the DB
2022-07-18 19915, 2022
lucifer
yes that's possible. alternative option is to keep 1 row per user and instead group all the notifications by recording id.
2022-07-18 19958, 2022
lucifer
mayhem, alastairp: thoughts on how to handle this: say a user sends a track recommendation to multiple people. should we create 1) 1 row per user or 2) 1 row with array containing all the users' ids.
2022-07-18 19914, 2022
mayhem
2
2022-07-18 19944, 2022
lucifer
in 1 it's easier to handle deletes/hiding, but more work to manage notifications in feed. vice versa in 2.
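The two schema options and their trade-off on "unsend to one person" can be sketched with toy rows (all field names invented):

```python
# Option 1: one row per (recommender, recipient) pair.
rows_per_user = [
    {"id": 1, "from": "riksucks", "to": "alice", "recording": "mbid-1"},
    {"id": 2, "from": "riksucks", "to": "bob", "recording": "mbid-1"},
]

# Option 2: a single row holding every recipient in an array.
row_with_array = {"id": 1, "from": "riksucks", "to": ["alice", "bob"], "recording": "mbid-1"}

# "Unsend to bob" under each option:
rows_per_user = [r for r in rows_per_user if r["to"] != "bob"]  # drop bob's row
row_with_array["to"].remove("bob")                              # shrink the array
```

Option 1 makes the per-user delete a plain row delete, while option 2 keeps the feed query simple but turns deletes into an array update.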
2022-07-18 19917, 2022
Pratha-Fish
alastairp: I've run the explainer API on 333 rows of faulty data. Now how do we debug it?
2022-07-18 19924, 2022
riksucks
also lucifer, I wanted to tell you another thing. I was reading up on how postgres handles JSONB, and what happens when we update certain keys or certain parts of that JSONB. Turns out, postgres always writes a new version of the whole row whenever we are updating. Do you think that would create overhead for lots of personal recommendation?