#metabrainz


      • akshaaatt
        Hi Freso, I will be in a location today which doesn’t have any communication means, so will have to skip today’s meeting. For my update, I’ve been on a vacation and had a look at yellowhatpro’s work on the android app! Thank you.
      • v6lur joined the channel
      • v6lur has quit
      • Pratha-Fish
      • alastairp: looks like there has been a _slight miscalculation_
      • MLHD has 600k files, not 60k. So the estimated total time is 50 hours, not 5 hours ⚰️
      • However the good news is, the processing is going fine so far with 0.3s avg testing time per file
      • No track-MBIDs detected in recording-MBID so far
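The revised 50-hour estimate is easy to verify from the figures in the chat (a quick check, nothing more):

```python
files = 600_000          # actual MLHD file count (not the assumed 60k)
secs_per_file = 0.3      # measured average per-file processing time

total_hours = files * secs_per_file / 3600
print(round(total_hours))  # → 50
```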
      • BrainzGit
        [critiquebrainz] anshg1214 opened pull request #446 (master…CB_440): CB-440: Recording entity [unknown artist] https://github.com/metabrainz/critiquebrainz/pu...
      • Freso
        akshaaatt: Noted. Thanks. :)
      • trolley has quit
      • trolley joined the channel
      • alastairp
        Pratha-Fish: and imagine if we were taking 3 seconds per file instead of 0.3!
      • Pratha-Fish
        alastairp: that would've been 500 hours 🥶
      • alastairp: Also, looks like the process is constantly using > 90% CPU on wolf. I hope it doesn't disturb other users' work
      • alastairp
        Pratha-Fish: note that this is 90% of 1 CPU core
      • we have 12 CPU cores
      • Pratha-Fish
        _wow_
      • alastairp
        btw, one thing we could have done is start up 6-8 parallel workers for the same process
      • get it done in 7 hours
      • no worries, let's just leave it to do its thing
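The parallel-worker idea could be sketched with a process pool; `process_file` here is a stand-in for the real per-file check, which isn't shown in the log:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_file(path: Path) -> int:
    """Stand-in for the real check: load the file, look for track MBIDs
    in the recording-MBID column, write the result back out."""
    return 0  # number of suspect MBIDs found

def run_all(paths: list[Path], workers: int = 7) -> int:
    # With ~50 h of serial work, 7 CPU-bound workers bring the wall-clock
    # time down to roughly 50 / 7 ≈ 7 h, assuming near-linear scaling.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_file, paths, chunksize=1000))
```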
      • Pratha-Fish
        Damn that would've been nice
      • alastairp
        Pratha-Fish: any indication so far about track ids?
      • Pratha-Fish: are you also saving the zst files? in what location?
      • Pratha-Fish
        alastairp: Nothing as of today morning
      • alastairp: I am saving it all in snaek/MLHD/rec_track_checker/MLHD
      • alastairp
        I see it, great
      • Pratha-Fish
        Also, all logs are being written in 1 level above the dir where MLHD is being written. So far so good
      • alastairp: 163k files checked so far. Nothing found. Do you need any other numbers while we're at it?
      • alastairp
        nothing yet
      • hmm
      • Pratha-Fish: from what I can see of the code, you're just loading it in, checking the first column against your db tables, and then writing the dataframe out again?
      • Pratha-Fish
        alastairp: yes that's right
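A minimal sketch of the loop as alastairp describes it (the column layout and the `known_track_mbids` set are assumptions; the actual script isn't shown in the log):

```python
import pandas as pd

def check_file(in_path: str, out_path: str, known_track_mbids: set) -> int:
    # MLHD rows are tab-separated; the first column holds the recording MBID
    df = pd.read_csv(in_path, sep="\t", header=None)

    # count rows whose "recording" MBID is really a track MBID
    hits = int(df[0].isin(known_track_mbids).sum())

    # write back out without the pandas row index (pandas infers zstd
    # compression from a .zst suffix when the zstandard package is installed)
    df.to_csv(out_path, sep="\t", header=False, index=False)
    return hits
```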
      • alastairp
        however, I just randomly sampled a few of your compressed zst files and compared them against the gzip version of the same file, and the resulting uncompressed data is different
      • Pratha-Fish
        How exactly?
      • alastairp
        good question
      • I ran this:
      • diff <(zstdcat /home/snaek/MLHD/rec_track_checker/MLHD/0a/0a118981-15b5-46df-8666-080ca5a1af62.csv.zst) <(zcat /data/mlhd/0a/0a118981-15b5-46df-8666-080ca5a1af62.txt.gz)
      • which should have no output (indicating that the files are the same)
      • oh wait -
      • sorry, of course, we're using csv for zstd and tsv for txt
      • Pratha-Fish
        ah right that could be the reason
      • alastairp
        or did you use tabs in the end?
      • Pratha-Fish
        I think I ended up using tabs
      • alastairp
      • yes, right. so I would expect these files to be identical
      • Pratha-Fish
        Hmm
      • Lemme load up a few files in python and cross check
      • I hope the difference is only limited to something trivial like row indices being written with the data
      • ansh
        moin!
      • alastairp
      • Pratha-Fish: at least the first 10 lines of the files are the same
      • oh, hmm. interesting
      • one sec
      • Pratha-Fish: right, so the original files have \r\n line terminators, and the ones that we generated have only \n
      • phew, that's less terrible than I thought
      • Pratha-Fish
        alastairp: Phew
      • pandas confirms that too
      • alastairp: does not having \r make a significant difference?
      • alastairp
        no, it just tends to appear more often on files created from windows
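The CRLF-vs-LF difference is easy to confirm from Python by reading a line in binary mode (a sketch; the actual comparison in the chat used `diff`):

```python
def line_ending(path: str) -> str:
    """Classify a text file's first line as 'CRLF' or 'LF'."""
    with open(path, "rb") as f:   # binary mode, so \r isn't translated away
        line = f.readline()
    return "CRLF" if line.endswith(b"\r\n") else "LF"
```

For the compressed files discussed here, substitute a `gzip.open` or zstandard reader for plain `open`.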
      • Pratha-Fish
        thank god
      • alastairp
        in fact, I believe that the way that we are doing it is more correct
      • oh cool, there's a flag to `diff`:
      • diff --strip-trailing-cr <(zstdcat /home/snaek/MLHD/rec_track_checker/MLHD/0a/0a118981-15b5-46df-8666-080ca5a1af62.csv.zst) <(zcat /data/mlhd/0a/0a118981-15b5-46df-8666-080ca5a1af62.txt.gz)
      • that correctly outputs nothing
      • Pratha-Fish
        Nicee
      • Linux CLI is surprisingly powerful NGL. Makes me wanna switch back to arch
      • I just couldn't live with the constantly breaking system as a daily driver tbh
      • ansh
        alastairp: The tests are passing on CB#445. I tried running them locally.
      • BrainzBot
        Remove script to update Bookbrainz Database: https://github.com/metabrainz/critiquebrainz/pu...
      • Freso
        Pratha-Fish: That’s what eventually pushed me to drop Windows for good. 🙃 IME Windows is as likely to break as Linux, but with Linux I at least have an idea of what’s going on and a fighting chance to fix it.
      • ansh
        Is there any way to retest them on github before merging?
      • Pratha-Fish
        Freso: relatable haha but the opposite
      • The only reason why I am sticking with windows at this point is because of excellent software support, and force of habit
      • alastairp
        ansh: I can trigger it again
      • should be running again
      • ansh
        yes it started running
      • Pratha-Fish
        alastairp: What should I do while the computation is running?
      • We could jump back on the artist conflation issue, or even start converting all pandas.isin() code to set queries
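On the `pandas.isin()` point: the usual change is to build a `set` once and test membership against it, instead of handing `isin` a fresh list on every call. A sketch with illustrative data (whether it actually helps depends on the workload):

```python
import pandas as pd

df = pd.DataFrame({"mbid": ["a", "b", "c", "a"]})

wanted_list = ["a", "c"]
mask_isin = df["mbid"].isin(wanted_list)        # pandas hashes the list per call

wanted = set(wanted_list)                       # hash once, reuse everywhere
mask_set = df["mbid"].map(lambda m: m in wanted)

assert mask_isin.equals(mask_set)               # same result either way
```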
      • alastairp
        Pratha-Fish: I think that the next interesting step is going to be a comparison of our two data lookup methods
      • remember back at the beginning of the year when we were explaining that we might need to rewrite some lookup methods in spark or some other faster system?
      • Pratha-Fish
        right
      • alastairp
        so, given a recording mbid in the data file, we currently have 2 ways of looking up a canonical id: mbid -> canonical mbid table; or mbid -> text metadata -> mapper
      • and the previous experiment you did a few weeks back shows that some items give different results
      • what we're interested in doing is seeing why these results are different, and what we can do to make them the same
      • because ideally we could continue to use the canonical mbid table, because it's super fast (otherwise we need to look up all 27 billion rows in the mapper, which is slow)
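The comparison alastairp proposes could be framed per MBID along these lines (both lookups are stubbed here; the real ones are the canonical-MBID table and the metadata mapper in ListenBrainz):

```python
def compare_lookups(mbid, canonical_table, mapper_lookup):
    """Run one recording MBID through both paths and report disagreement.

    canonical_table: dict of mbid -> canonical mbid (the fast table lookup)
    mapper_lookup:   callable doing mbid -> text metadata -> mapper (slow)
    """
    fast = canonical_table.get(mbid)
    slow = mapper_lookup(mbid)
    return {"mbid": mbid, "canonical": fast, "mapper": slow,
            "agree": fast == slow}
```

Running this over the rows that previously disagreed would yield a table of mismatches to investigate.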
      • Pratha-Fish
        the mapper method won't finish computing this year tbh
      • alastairp
        so we need to decide if the mapper really is "better" (we don't know what the definition of better is here, we need to investigate the data and make a decision)
      • and if it _is_ better, we need to move on to the next steps of seeing if we can re-implement in something faster (spark? something else) in order to do the processing in a reasonable time
      • Pratha-Fish
        very interesting :D
      • So I'll take a look at the data first ig. Let's see if there's any patterns
      • alastairp
        Pratha-Fish: this dataset endpoint should be useful: https://labs.api.listenbrainz.org/explain-mbid-...
      • you give it a single artist and recording, and it'll return debugging about how it finds the item
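Mapping the endpoint over the faulty rows might start with building the query payload. The endpoint URL is truncated in the log and the parameter names below are assumptions, not confirmed against the labs API, so treat this as a sketch:

```python
import json

def build_explain_queries(rows):
    """rows: iterable of (artist_name, recording_name) pairs.

    Returns a JSON-serializable payload; field names are assumed,
    not confirmed against the labs API.
    """
    return [{"artist_name": a, "recording_name": r} for a, r in rows]

payload = build_explain_queries([("Some Artist", "Some Recording")])
body = json.dumps(payload)  # POST this to the explain endpoint
```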
      • Pratha-Fish
        Sounds good. I can try mapping the API to some data
      • mayhem
        moooin!
      • for anyone who knows about Don Norman's "Design of Everyday Things" but hasn't been able to read/finish it, these notes look quite cool: https://elvischidera.com/2022-06-24-design-ever...
      • Pratha-Fish
        mayhem: What a coincidence, I started reading that one today :))
      • BrainzGit
        [critiquebrainz] alastair merged pull request #445 (master…remove_temp_script): Remove script to update Bookbrainz Database https://github.com/metabrainz/critiquebrainz/pu...
      • CatQuest
        "Perceived affordances help people figure out what actions are possible without the need for labels or instructions."
      • this is the bullshit mentality that makes everything have icons and boxes now instead of CLEAR LABELS AND INSTRUCTIONS
      • I *LIKE* Labels and Instructions!!!!
      • aaaaaaaaaaaaaaaaa
      • .. but later on they say that a simple "you are offline" would suffice as a notification that the connection is broken..
      • i'm confused
      • but in short: please label things with text, & write succinct instructions where needed. Thanks
      • mayhem
      • is now ready with PR feedback and missing test file added.
      • ansh
        alastairp: The tests passed successfully after rebasing CB#441 last time. If there are any more changes required, pls let me know :)
      • BrainzBot
        CB-437: Add entity metadata to review get endpoints: https://github.com/metabrainz/critiquebrainz/pu...
      • lucifer
        mayhem: lgtm, thanks. it would be nice to add some tests with real mb data as well but currently we don't have MB db in LB tests so I'll open a ticket for it.
      • mayhem
        k
      • BrainzGit
        [listenbrainz-server] mayhem merged pull request #2065 (master…add-upcoming-releases-backend): Add fresh releases backend https://github.com/metabrainz/listenbrainz-serv...
      • mayhem
        lucifer: is it time for us to chat about how to integrate the three separate branches of fresh releases work we've got going on?
      • lucifer
        yes sure
      • mayhem
        the fetching of user specific data needs to be added to the endpoint I just added, that is one thing I see.
      • and now that I made space for the react work, chinmay can drop his work on top of the blank template that was just merged.
      • lucifer
        we have a couple of options there, either spark calls the api or lb fetches the data from db and sends it as a part of the rmq message
      • mayhem
        what else?
      • I was expecting for LB to fetch the data from couchdb/postgres.
      • and for the endpoint to return sitewide fresh releases unless a username was given.
      • lucifer
        rest of the backend is almost done. most of the couchdb integration will be done when migrating stats to it. after that i'll finish the fresh releases pr.
      • mayhem
        ok, maybe we should just wait for that to be done before doing more stuff.
      • lucifer
        yes, that makes sense.
      • a few tests and dumps are pending on that front fwiw.
      • mayhem
        ok, ping me if you need anything. I'm going to see if I can classify tracks as high/low energy with the data we have at our disposal.... see if I can make another playlist for users.
      • lucifer
        will do. sounds great! :D
      • mayhem
        daily jams are making me pretty happy. looking quite nice now. I really need to make a point of listening to them each day to see how things shape up over time.
      • BP makes that pretty hard though. It plays a handful of tracks and then halts. :(
      • lucifer
        yeah spotify does not have any documentation on how to fix the issue and no one answered on forums either.
      • mayhem
        yeah, its fully meh.
      • I think I might try my hand at the spotify cache using couchdb as the document store.
      • Sophist_UK has quit
      • Sophist-UK joined the channel
      • riksucks
        hi lucifer, are you up?
      • lucifer
        riksucks: yes. sorry forgot to answer your question earlier. there are 2 things to consider here: 1) multiple notifications on feed 2) allowing individual recommendees to delete a personal notification they received without affecting others.
      • also maybe allow the recommender to unsend the recommendation to a particular person without unsending it to others?
      • for instance, Instagram allows you to send a post to multiple persons at a time but then you can unsend to a particular person later if you want.
      • riksucks
        true, I thought about the 2) one, and realised that in normal recommendation, only the recommender can delete it, and the recommendees can hide it from their timelines. So maybe we can implement a similar feature. Similarly for unsending for a particular person, we can try removing that specific ID from the array, and update it in the DB
      • lucifer
        yes that's possible. alternative option is to keep 1 row per user and instead group all the notifications by recording id.
      • mayhem, alastairp: thoughts on how to handle this: say a user sends a track recommendation to multiple people. should we create 1) 1 row per user or 2) 1 row with array containing all the users' ids.
      • mayhem
        2
      • lucifer
        in 1 it's easier to handle deletes/hiding, but more work to manage notifications in feed. vice versa in 2.
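The two options under discussion can be illustrated with row shapes (field names invented for illustration, not the real schema):

```python
# Option 1: one row per recipient. Per-user delete/hide is a plain row
# delete, but the feed has to group sibling rows into one event.
rows_per_user = [
    {"recording_mbid": "abc", "sender": "u1", "recipient": "u2"},
    {"recording_mbid": "abc", "sender": "u1", "recipient": "u3"},
]

# Option 2: one row holding all recipients. The feed gets one event for
# free, but "unsend to one person" means rewriting the array in place.
row_with_array = {
    "recording_mbid": "abc", "sender": "u1", "recipients": ["u2", "u3"],
}

row_with_array["recipients"].remove("u3")   # unsend for u3 only
```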
      • Pratha-Fish
        alastairp: I've run the explainer API on 333 rows of faulty data. Now how do we debug it?
      • riksucks
        also lucifer, I wanted to tell you another thing. I was reading up on how postgres handles JSONB, and what happens when we update certain keys or certain parts of that JSONB. Turns out, postgres always writes a new version of the whole row whenever we update. Do you think that would create overhead for lots of personal recommendations?
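riksucks is describing Postgres MVCC: an UPDATE produces a new version of the whole row, so changing one key inside a large JSONB document rewrites the entire document (TOASTed values are rewritten wholesale too). A rough write-amplification estimate, with illustrative numbers:

```python
def write_amplification(row_bytes: int, changed_bytes: int) -> float:
    """Bytes physically rewritten per byte logically changed."""
    return row_bytes / changed_bytes

# e.g. removing one 36-byte UUID from a ~4 KB JSONB recipients array
amp = write_amplification(4096, 36)
print(f"{amp:.0f}x")   # roughly 114x
```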