Hi Freso, I'll be somewhere today without any means of communication, so I'll have to skip today's meeting. For my update: I've been on vacation and had a look at yellowhatpro's work on the Android app! Thank you.
alastairp: 163k files checked so far. Nothing found. Do you need any other numbers while we're at it?
alastairp
nothing yet
hmm
Pratha-Fish: from what I can see of the code, you're just loading it in, checking the first column against your db tables, and then writing the dataframe out again?
Pratha-Fish
alastairp: yes that's right
alastairp
however, I just randomly sampled a few of your compressed zst files and compared them against the gzip version of the same file, and the resulting uncompressed data is different
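(For reference, one way to spot-check a pair like that is to decompress both streams and compare digests. A minimal sketch, assuming hypothetical file names data.zst and data.gz and the third-party zstandard package:)

```python
import gzip
import hashlib

import zstandard  # third-party: pip install zstandard


def digest(stream, chunk_size=1 << 20):
    """Return the SHA-256 of a decompressed stream, read in chunks."""
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()


with open("data.zst", "rb") as fh:
    zst_digest = digest(zstandard.ZstdDecompressor().stream_reader(fh))

with gzip.open("data.gz", "rb") as fh:
    gz_digest = digest(fh)

print("match" if zst_digest == gz_digest else "MISMATCH")
```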
Pratha-Fish: That’s what eventually pushed me to drop Windows for good. 🙃 IME Windows is as likely to break as Linux, but with Linux I at least have an idea of what’s going on and a fighting chance to fix it.
ansh
Is there any way to retest them on github before merging?
Pratha-Fish
Freso: relatable haha but the opposite
The only reason I'm sticking with Windows at this point is excellent software support, and force of habit
alastairp: What should I do while the computation is running?
We could jump back on the artist conflation issue, or even start converting all pandas.isin() code to set queries
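(A rough sketch of what that conversion could look like, with a made-up known_mbids set standing in for the IDs loaded from the database; the semantics stay the same, only the lookup strategy changes:)

```python
import pandas as pd

# Stand-in for the IDs loaded from the canonical tables.
known_mbids = {"8f3471b5-aaaa-bbbb-cccc-000000000000"}

df = pd.DataFrame({"recording_mbid": [
    "8f3471b5-aaaa-bbbb-cccc-000000000000",
    "not-in-the-database",
]})

# Current approach: vectorised membership test via pandas.isin()
mask_isin = df["recording_mbid"].isin(known_mbids)

# Set-based approach: plain Python membership check per value
mask_set = df["recording_mbid"].map(lambda mbid: mbid in known_mbids)

assert mask_isin.equals(mask_set)  # same rows selected either way
```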
alastairp
Pratha-Fish: I think that the next interesting step is going to be a comparison of our two data lookup methods
remember back at the beginning of the year when we were explaining that we might need to rewrite some lookup methods in spark or some other faster system?
Pratha-Fish
right
alastairp
so, given a recording mbid in the data file, we currently have 2 ways of looking up a canonical id: mbid -> canonical mbid table; or mbid -> text metadata -> mapper
and the previous experiment you did a few weeks back shows that some items give different results
what we're interested in doing is seeing why these results are different, and what we can do to make them the same
because ideally we could continue to use the canonical mbid table, because it's super fast (otherwise we need to look up all 27 billion rows in the mapper, which is slow)
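(In pseudo-Python, the two paths being compared look roughly like this; the function and store names are placeholders, not the actual mapping code:)

```python
def lookup_via_canonical_table(recording_mbid, canonical_table):
    """Path 1: direct mbid -> canonical mbid lookup (fast, a single table/dict hit)."""
    return canonical_table.get(recording_mbid)


def lookup_via_mapper(recording_mbid, metadata_store, mapper):
    """Path 2: mbid -> text metadata -> mapper (slow, needs a full mapper search)."""
    metadata = metadata_store[recording_mbid]  # artist name, track name, ...
    return mapper.search(metadata["artist"], metadata["track"])


# The interesting cases are the ones where the two paths disagree:
# lookup_via_canonical_table(mbid, ...) != lookup_via_mapper(mbid, ...)
```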
Pratha-Fish
the mapper method won't finish computing this year tbh
alastairp
so we need to decide if the mapper really is "better" (we don't know what the definition of better is here, we need to investigate the data and make a decision)
and if it _is_ better, we need to move on to the next steps of seeing if we can re-implement in something faster (spark? something else) in order to do the processing in a reasonable time
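(If it comes to that, the table-style lookup maps fairly naturally onto a join; a minimal PySpark sketch, with hypothetical file and column names:)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("canonical-lookup").getOrCreate()

# Hypothetical inputs: the data file and the canonical mbid table.
listens = spark.read.parquet("listens.parquet")            # has a recording_mbid column
canonical = spark.read.parquet("canonical_mbid.parquet")   # recording_mbid -> canonical_mbid

# Resolve every recording_mbid to its canonical mbid in one distributed join.
resolved = listens.join(canonical, on="recording_mbid", how="left")
resolved.write.parquet("listens_with_canonical.parquet")
```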
Pratha-Fish
very interesting :D
So I'll take a look at the data first ig. Let's see if there are any patterns
mayhem: lgtm, thanks. it would be nice to add some tests with real mb data as well but currently we don't have MB db in LB tests so I'll open a ticket for it.
lucifer: is it time for us to chat about how to integrate the three separate branches of fresh releases work we've got going on?
lucifer
yes sure
mayhem
the fetching of user specific data needs to be added to the endpoint I just added, that is one thing I see.
and now that I made space for the react work, chinmay can drop his work on top of the blank template that was just merged.
lucifer
we have a couple of options there, either spark calls the api or lb fetches the data from db and sends it as a part of the rmq message
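(For the second option, the idea is roughly that LB bundles the data into the RabbitMQ payload itself; a sketch with a made-up queue name and payload shape:)

```python
import json

import pika  # RabbitMQ client

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="fresh_releases", durable=True)

# Hypothetical payload: LB fetches the data from the db itself and ships it
# along in the message, so spark never has to call back into the API.
payload = {
    "user_id": 42,
    "releases": [{"release_mbid": "release-mbid-here", "artist": "artist-here"}],
}
channel.basic_publish(
    exchange="",
    routing_key="fresh_releases",
    body=json.dumps(payload),
)
connection.close()
```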
mayhem
what else?
I was expecting LB to fetch the data from couchdb/postgres.
and the endpoint to return sitewide fresh releases unless a user name was given.
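(i.e. roughly this shape for the endpoint; a hypothetical Flask sketch, with the route path and helper names made up for illustration:)

```python
from flask import Blueprint, jsonify, request

fresh_releases_bp = Blueprint("fresh_releases", __name__)


def get_sitewide_fresh_releases():
    # Stub: would query the sitewide fresh-releases data in couchdb/postgres.
    return []


def get_fresh_releases_for_user(user_name):
    # Stub: would query the per-user fresh-releases data.
    return []


@fresh_releases_bp.route("/fresh-releases")
def fresh_releases():
    """Return sitewide fresh releases unless a user_name is given."""
    user_name = request.args.get("user_name")
    if user_name:
        return jsonify({"user_name": user_name,
                        "releases": get_fresh_releases_for_user(user_name)})
    return jsonify({"releases": get_sitewide_fresh_releases()})
```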
lucifer
rest of the backend is almost done. most of the couchdb integration will be done when migrating stats to it. after that i'll finish the fresh releases pr.
mayhem
ok, maybe we should just wait for that to be done before doing more stuff.
lucifer
yes, that makes sense.
a few tests and dumps are pending on that front fwiw.
mayhem
ok, ping me if you need anything. I'm going to see if I can classify tracks as high/low energy with the data we have at our disposal.... see if I can make another playlist for users.
lucifer
will do. sounds great! :D
mayhem
daily jams are making me pretty happy. looking quite nice now. I really need to make a point of listening to them each day to see how things shape up over time.
BP makes that pretty hard though. It plays a handful of tracks and then halts. :(
lucifer
yeah spotify does not have any documentation on how to fix the issue and no one answered on forums either.
mayhem
yeah, it's fully meh.
I think I might try my hand at the spotify cache using couchdb as the document store.
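(For what it's worth, a bare-bones version of that idea against CouchDB's HTTP API could look like the following; the database name and document shape are invented for the example:)

```python
import requests

COUCHDB = "http://localhost:5984"
DB = "spotify_cache"  # hypothetical database name

# Create the database if it doesn't exist yet (CouchDB answers 412 if it does).
requests.put(f"{COUCHDB}/{DB}")


def cache_spotify_metadata(track_id, metadata):
    """Store one Spotify metadata document, keyed by the track id.

    Updating an existing document would additionally need its current _rev.
    """
    doc = dict(metadata, _id=track_id)
    resp = requests.put(f"{COUCHDB}/{DB}/{track_id}", json=doc)
    resp.raise_for_status()


def get_cached_metadata(track_id):
    """Fetch a cached document, or None on a cache miss."""
    resp = requests.get(f"{COUCHDB}/{DB}/{track_id}")
    return resp.json() if resp.ok else None
```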
riksucks
hi lucifer, are you up?
lucifer
riksucks: yes. sorry forgot to answer your question earlier. there are 2 things to consider here: 1) multiple notifications on feed 2) allowing individual recommendees to delete a personal notification they received without affecting others.
also maybe allow the recommender to unsend the recommendation to a particular person without unsending it to others?
for instance, Instagram allows you to send a post to multiple persons at a time but then you can unsend to a particular person later if you want.
riksucks
true, I thought about 2), and realised that for a normal recommendation only the recommender can delete it, while the recommendees can hide it from their timelines. So maybe we can implement a similar feature. Similarly, for unsending to a particular person, we can try removing that specific ID from the array and updating it in the DB
lucifer
yes that's possible. alternative option is to keep 1 row per user and instead group all the notifications by recording id.
mayhem, alastairp: thoughts on how to handle this: say a user sends a track recommendation to multiple people. should we create 1) 1 row per user or 2) 1 row with array containing all the users' ids.
mayhem
2
lucifer
in 1 it's easier to handle deletes/hiding, but more work to manage notifications in the feed. vice versa in 2.
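(For concreteness, the two shapes might look like this; illustrative DDL only, with table, column, and database names made up, run here through psycopg2:)

```python
import psycopg2

OPTION_1 = """
    -- one row per recommendee: easy per-user delete/hide,
    -- but the feed has to group rows by recording to show one notification
    CREATE TABLE personal_recommendation_v1 (
        id              SERIAL PRIMARY KEY,
        recommender_id  INTEGER NOT NULL,
        recommendee_id  INTEGER NOT NULL,
        recording_mbid  UUID    NOT NULL
    );
"""

OPTION_2 = """
    -- one row per recommendation with an array of recommendees: the feed
    -- query is simple, but unsending/hiding means rewriting the array
    CREATE TABLE personal_recommendation_v2 (
        id               SERIAL PRIMARY KEY,
        recommender_id   INTEGER   NOT NULL,
        recommendee_ids  INTEGER[] NOT NULL,
        recording_mbid   UUID      NOT NULL
    );
"""

with psycopg2.connect("dbname=listenbrainz_test") as conn:
    with conn.cursor() as cur:
        cur.execute(OPTION_1)
        cur.execute(OPTION_2)
```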
Pratha-Fish
alastairp: I've run the explainer API on 333 rows of faulty data. Now how do we debug it?
riksucks
also lucifer, I wanted to tell you another thing. I was reading up on how postgres handles JSONB, and what happens when we update certain keys or certain parts of that JSONB. Turns out, postgres always writes a new version of the whole row whenever we update. Do you think that would create overhead for lots of personal recommendations?
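(As a point of reference: even a targeted jsonb_set() change to one key still produces a complete new row version under PostgreSQL's MVCC. A small sketch with hypothetical table, column, and key names:)

```python
import psycopg2

UPDATE_ONE_KEY = """
    -- jsonb_set() only changes one key in the document, but PostgreSQL's
    -- MVCC still writes a complete new version of the row.
    UPDATE personal_recommendation
       SET metadata = jsonb_set(metadata, '{blurb_content}', to_jsonb(%s::text))
     WHERE id = %s;
"""

with psycopg2.connect("dbname=listenbrainz_test") as conn:
    with conn.cursor() as cur:
        cur.execute(UPDATE_ONE_KEY, ("updated note", 1))
```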