#metabrainz

0:36 AM
MRiddickW has quit

2021-06-28 17916, 2021

2:11 AM
MRiddickW joined the channel

2021-06-28 17958, 2021

2:18 AM
wargreen has quit

2021-06-28 17903, 2021

2:22 AM
wargreen joined the channel

2021-06-28 17918, 2021

3:28 AM
wargreen has quit

2021-06-28 17926, 2021

6:41 AM
reosarevok

ruaok: thanks for taking care of support! Back now :)

2021-06-28 17945, 2021

6:41 AM
reosarevok

Let's see if I remember how work works

2021-06-28 17946, 2021

6:43 AM
yyoung

Is this an intended behavior of external links editor? https://imgur.com/a/MN51UDa

2021-06-28 17913, 2021

6:44 AM
yyoung

To reproduce, enter two links and then clear the first one.

2021-06-28 17934, 2021

6:52 AM
reosarevok

yyoung: I don't think it's intended as such, no

2021-06-28 17958, 2021

6:52 AM
reosarevok

I guess it does allow you to Ctrl-Z and bring the old link back, maybe? If it doesn't, then it's completely useless and 100% buggy

2021-06-28 17939, 2021

6:53 AM
reosarevok

ruaok: can you unsubscribe from notifications from LB's twitter? Unless you're using those emails

2021-06-28 17952, 2021

6:53 AM
reosarevok

(otherwise, it's just effectively stuff to remove)

2021-06-28 17907, 2021

6:54 AM
opal has quit

2021-06-28 17924, 2021

6:54 AM
yyoung

reosarevok: Yes Ctrl-Z will work

2021-06-28 17938, 2021

6:54 AM
reosarevok

Then we could keep it, but with better UI

2021-06-28 17947, 2021

6:54 AM
reosarevok

If it makes sense for what you're building

2021-06-28 17907, 2021

6:55 AM
reosarevok

Not that I know what the better UI would be :D

2021-06-28 17910, 2021

6:56 AM
opal joined the channel

2021-06-28 17908, 2021

6:59 AM
yyoung

Since I'm changing the implementation, this bug is likely to disappear

2021-06-28 17903, 2021

7:00 AM
reosarevok

Ok

2021-06-28 17934, 2021

7:00 AM
yyoung

Just found it when testing some tricky actions, which are prone to introduce bugs :)

2021-06-28 17945, 2021

7:41 AM
ruaok

mooin!

2021-06-28 17941, 2021

7:42 AM
ruaok

reosarevok: the twitter notifications are really stupid. you can't just get notifications for direct messages and mentions. you only get these useless scattershot notifications...

2021-06-28 17949, 2021

7:42 AM
ruaok

I can't decide if they are useful or not....

2021-06-28 17933, 2021

8:00 AM
reosarevok

Yeah, dunno. We seem to have them disabled for MB (but I didn't do that myself)

2021-06-28 17958, 2021

8:16 AM
MRiddickW has quit

2021-06-28 17948, 2021

8:24 AM
akashgp09 joined the channel

2021-06-28 17908, 2021

8:25 AM
zas

yvanzo: sir-prod is stuck again with "maximum recursion depth exceeded in cmp"

2021-06-28 17923, 2021

8:40 AM
outsidecontext

akshaaatt[m], lucifer: hi, regarding your chat yesterday: I too think adding this to the release activity directly is the best solution. One advantage of the app is that it has the search and barcode lookup functionality already fully working, so adding the "send to picard" on the release activity automatically makes both use cases work

2021-06-28 17900, 2021

8:42 AM
lucifer

+1, thanks! :D

2021-06-28 17914, 2021

8:56 AM
ruaok

moin lucifer. care to take a look at #1514? Looks mergeable to me. :)

2021-06-28 17920, 2021

8:58 AM
lucifer

ruaok: sure, will do. i am currently working on adding the consul values for that PR but since we need booleans not strings need to add a consul define.

2021-06-28 17915, 2021

8:59 AM
ruaok

oh, thanks!

2021-06-28 17901, 2021

9:12 AM
ruaok

lucifer: I just had a thought...

2021-06-28 17953, 2021

9:12 AM
ruaok

right now we scale user similarities on a global scale, meaning that only one person ever will get a perfect 1.0 score. everyone else is going to get less. like we are seeing now.

2021-06-28 17907, 2021

9:13 AM
ruaok

should we scale users individually?

2021-06-28 17931, 2021

9:13 AM
ruaok

not 100% sure how to do that best.

2021-06-28 17913, 2021

9:14 AM
ruaok

I guess if a user has only one match and it is a low match at that, we don't want to say "this is your 100% match". that's crap.

2021-06-28 17929, 2021

9:14 AM
lucifer

yeah, we could scale individually. but then if the highest similarity is only 0.009-something, we give it a 1.0, which is bad.

2021-06-28 17937, 2021

9:14 AM
lucifer

right that.

2021-06-28 17950, 2021

9:14 AM
ruaok

what if we say define 3-5 levels: Great, good, ok, so-so, weak.

2021-06-28 17905, 2021

9:15 AM
lucifer

that makes sense.

2021-06-28 17905, 2021

9:15 AM
ruaok

and that represents the top end to the bottom end.

2021-06-28 17942, 2021

9:15 AM
ruaok

should I make a PR that scales the users individually so we can see what the ratings become?

2021-06-28 17911, 2021

9:16 AM
ruaok

if we like that approach, make an LB (not spark) PR for changing the display from numeric to test level?

2021-06-28 17926, 2021

9:16 AM
ruaok

*text*

2021-06-28 17957, 2021

9:16 AM
lucifer

let's scale like that but also put a threshold on min rating below which we don't raise it to 1.

2021-06-28 17921, 2021

9:17 AM
lucifer

say if the rating is below 0.01, don't make it 1 but rather 0.5 something.

2021-06-28 17947, 2021

9:17 AM
ruaok

I see what you are suggesting and I agree, but...

2021-06-28 17949, 2021

9:17 AM
lucifer

but that can be done as a follow up, we can probably just test the individual scale first

2021-06-28 17903, 2021

9:18 AM
ruaok

so far hard coded thresholds have been problematic to say the least.

2021-06-28 17940, 2021

9:18 AM
ruaok

actually, now that I think about it, you have a different approach than I do.

2021-06-28 17903, 2021

9:19 AM
ruaok

I am saying that in the context of a user we are describing them in a relative way, rather than an absolute way.

2021-06-28 17933, 2021

9:19 AM
ruaok

in my way, if there is one match, it's your best (and worst) match.

2021-06-28 17951, 2021

9:19 AM
ruaok

in your way, there is still some global threshold of quality between users.

2021-06-28 17907, 2021

9:20 AM
ruaok

which I think is problematic.

2021-06-28 17909, 2021

9:20 AM
lucifer

i see. makes sense.

2021-06-28 17933, 2021

9:20 AM
ruaok

let me make a PR for user scaling and then we can examine the results. from there we can decide next steps.

2021-06-28 17938, 2021

9:20 AM
lucifer

+1

2021-06-28 17956, 2021
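
[Editor's sketch] The two ideas discussed above (scaling each user's matches against that user's own best match, then showing a text level instead of a number) could look roughly like the following. This is a minimal illustration under stated assumptions, not the code from the eventual PR: the function names, the tuple layout, and the five evenly sized buckets are all hypothetical.

```python
from collections import defaultdict

# Hypothetical level names, ordered from the bottom end to the top end.
LEVELS = ["weak", "so-so", "ok", "good", "great"]


def scale_per_user(similarities):
    """Rescale raw scores relative to each user's own best match.

    `similarities` is a list of (user, other_user, raw_score) tuples
    (an assumed layout, chosen only for this example).
    """
    best = defaultdict(float)
    for user, _, score in similarities:
        best[user] = max(best[user], score)

    scaled = []
    for user, other, score in similarities:
        top = best[user]
        # A user's best match always becomes 1.0, however low the raw score.
        scaled.append((user, other, score / top if top > 0 else 0.0))
    return scaled


def to_level(scaled_score):
    """Map a score in [0, 1] onto one of the evenly sized text levels."""
    index = min(int(scaled_score * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]
```

Note that a user whose only match has a raw score of 0.009 still ends up with a scaled score of 1.0 and a "great" label, which is exactly the trade-off between the relative and the absolute view debated above.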

9:20 AM
ruaok

yay, I managed to procrastinate working on dumps!

2021-06-28 17905, 2021

9:21 AM
lucifer

lol

2021-06-28 17918, 2021

9:21 AM
lucifer

what were you planning to work on in dumps btw?

2021-06-28 17941, 2021

9:21 AM
ruaok

mapped MBIDs into spark dumps.

2021-06-28 17952, 2021

9:21 AM
ruaok

there are two approaches:

2021-06-28 17955, 2021

9:21 AM
lucifer

oh nice!

2021-06-28 17908, 2021

9:22 AM
ruaok

1. Add MBIDs into full dumps and then transmogrify

2021-06-28 17923, 2021

9:22 AM
ruaok

2. Make a separate spark dump, that moves a lot less data, but duplicates more code.

2021-06-28 17950, 2021

9:22 AM
ruaok

#1 rubs me wrong, because we're putting generated data into the user data. that feels wrong.

2021-06-28 17900, 2021

9:23 AM
ruaok

but it would be the fastest solution.

2021-06-28 17904, 2021

9:23 AM
lucifer

i would be in favor of 2. because i also wanted to move spark dumps to parquet format.

2021-06-28 17924, 2021

9:23 AM
ruaok

I agree, 2 is better.

2021-06-28 17936, 2021

9:23 AM
ruaok

but, aren't the dumps in parquet format now?

2021-06-28 17939, 2021

9:23 AM
ruaok

what needs to change?

2021-06-28 17909, 2021

9:24 AM
lucifer

acc to my understanding, dumps are output in json. spark reads json and writes in hdfs as parquet.

2021-06-28 17941, 2021

9:24 AM
lucifer

my suggestion: we write in parquet at the first step.

2021-06-28 17944, 2021

9:24 AM
ruaok

so we incur an extra pass over the data during import?

2021-06-28 17955, 2021

9:24 AM
lucifer

right

2021-06-28 17904, 2021

9:25 AM
ruaok

ok, that makes sense. let's do that.

2021-06-28 17945, 2021

9:25 AM
lucifer

cool, so i can take up this task do it using approach 2 then?

2021-06-28 17950, 2021

9:25 AM
ruaok

because that lets us play with DuckDB: https://duckdb.org/2021/06/25/querying-parquet.ht…

2021-06-28 17915, 2021

9:26 AM
ruaok

yes, once I submit the similarities PR, I'll work on the parquet based dumps.

2021-06-28 17955, 2021

9:26 AM
lucifer

never heard of duckdb before will take a look.

2021-06-28 17908, 2021

9:27 AM
alastairp

hello good morning

2021-06-28 17918, 2021

9:27 AM
ruaok

it sounds quite cool. run SQL queries on parquet files.

2021-06-28 17921, 2021

9:27 AM
ruaok

moin alastairp

2021-06-28 17924, 2021

9:27 AM
ruaok

how is the 5G?

2021-06-28 17932, 2021

9:27 AM
lucifer

moin!

2021-06-28 17942, 2021

9:27 AM
alastairp

it's all gone away and I feel back to normal

2021-06-28 17954, 2021

9:27 AM
lucifer

i got first vaccine dose this weekend too :)

2021-06-28 17959, 2021

9:27 AM
ruaok

fucking bill gates. over promise and under deliver.

2021-06-28 17905, 2021

9:28 AM
ruaok

yayayayayaya, lucifer !

2021-06-28 17907, 2021

9:28 AM
ruaok

very good.

2021-06-28 17922, 2021

9:28 AM
ruaok

our team is well underway to getting fully vaxxed.

2021-06-28 17952, 2021

9:28 AM
lucifer

nice!! :DD

2021-06-28 17937, 2021

9:29 AM
ruaok

my mum mentioned that a lot of people who now end up in hospitals (in the US) with covid are anti-vaxxers.

2021-06-28 17947, 2021

9:29 AM
ruaok

there is some poetic justice in that.

2021-06-28 17938, 2021

9:31 AM
alastairp

ruaok: just looking through your changes in https://github.com/metabrainz/listenbrainz-server… again

2021-06-28 17902, 2021

9:32 AM
loujine has quit

2021-06-28 17924, 2021

9:32 AM
alastairp

to confirm - the behaviour that you're going for here is that any time a dump is running, you want to use the lock file to block other processes from running

2021-06-28 17937, 2021

9:32 AM
alastairp

so periodically refresh_listen_count_aggregate won't run because there's a dump happening

2021-06-28 17911, 2021

9:40 AM
loujine joined the channel

2021-06-28 17957, 2021

9:42 AM
ruaok

no, that wasn't the intent. the lock should not prevent processes running. it should only prevent the container from being killes.

2021-06-28 17901, 2021

9:43 AM
ruaok

*killed.

2021-06-28 17934, 2021

9:43 AM
ruaok

but I think I see your point.

2021-06-28 17947, 2021

9:43 AM
ruaok

the lock needs to have a refcount to accomplish what I want to do.

2021-06-28 17918, 2021

9:44 AM
ruaok

because right now one process gets precluded.

2021-06-28 17921, 2021

9:44 AM
alastairp

ok, right

2021-06-28 17937, 2021

9:44 AM
BrainzGit

[listenbrainz-server] 14mayhem closed pull request #1513 (03master…terminate-cron): Terminate cron https://github.com/metabrainz/listenbrainz-server…

2021-06-28 17940, 2021

9:44 AM
alastairp

what about multiple lock files, does that introduce too much complexity?

2021-06-28 17951, 2021

9:44 AM
ruaok

let me get back to that -- I closed the PR for now.

2021-06-28 17900, 2021

9:45 AM
alastairp

have a lock file per process, then the killer checker can look for any *.lock, for example

2021-06-28 17902, 2021

9:45 AM
ruaok

multiple lock files could work.

2021-06-28 17914, 2021

9:45 AM
alastairp

OK, cool. lmk when you have another solution

2021-06-28 17931, 2021
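
[Editor's sketch] The "one lock file per process" idea suggested above could be sketched as follows: each long-running job drops its own `<name>.lock` file, and the check that decides whether the container may be killed looks for any `*.lock`. This is only an illustration of the idea, not the code from the closed PR; the function names and the lock directory layout are assumptions.

```python
import os
from pathlib import Path


def acquire_lock(lock_dir, name):
    """Create <lock_dir>/<name>.lock holding the current PID."""
    path = Path(lock_dir) / f"{name}.lock"
    path.write_text(str(os.getpid()))
    return path


def release_lock(path):
    """Remove a lock file; a missing file is not an error (Python 3.8+)."""
    Path(path).unlink(missing_ok=True)


def safe_to_kill(lock_dir):
    """The container may be killed only when no job holds a lock."""
    return not any(Path(lock_dir).glob("*.lock"))
```

Because every job owns a separate file, one job never blocks another from running, and the set of files plays the role of the refcount mentioned earlier. A real version would also need to deal with stale locks left behind by crashed processes.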

9:45 AM
ruaok

yeah, makes sense. I'll do that once I finish the user similarity tweak lucifer and I just discussed.

2021-06-28 17919, 2021

9:55 AM
ruaok

lucifer: question about the similarity matrix....

2021-06-28 17941, 2021

9:56 AM
ruaok

I know that one user is a row of data. but a user is also a column of data.

2021-06-28 17928, 2021

9:58 AM
ruaok

I guess my question hinges on this line: https://github.com/metabrainz/listenbrainz-server…

2021-06-28 17942, 2021

9:58 AM
ruaok

how is the dataframe generated from the matrix?

2021-06-28 17952, 2021

10:02 AM
ruaok

I'm starting to think we should remove the thresholding function and send the raw data to the LB server.

2021-06-28 17908, 2021

10:03 AM
ruaok

and at LB we do the scaling on the fly -- at least until we know what we're doing.

2021-06-28 17920, 2021

10:03 AM
lucifer

ruaok, the line you shared does not involve the matrix.

2021-06-28 17949, 2021

10:03 AM
lucifer

indeed, i think it's better to do scaling in LB for now as it is simpler.

2021-06-28 17904, 2021

10:04 AM
lucifer

once we are sure what to do, we can move that to spark side.

2021-06-28 17923, 2021

10:04 AM
ruaok

oh. right. the threshold similar users actually does that extraction from the matrix!

2021-06-28 17930, 2021

10:04 AM
ruaok

it returns a list. heh.

2021-06-28 17941, 2021

10:04 AM
lucifer

https://github.com/metabrainz/listenbrainz-server…

2021-06-28 17904, 2021

10:05 AM
ruaok

yeah. duh.

2021-06-28 17908, 2021

10:05 AM
lucifer

that said, i think moving these two declarations into the first loop should be enough to scale individually.

2021-06-28 17912, 2021

10:05 AM
ruaok

that actually makes it easy to scale in spark.

2021-06-28 17911, 2021

10:06 AM
ruaok

yes, you're right. it is that simple.

2021-06-28 17904, 2021

10:07 AM
ruaok

well, not quite. it needs more changes, but those are not hard.

2021-06-28 17958, 2021

10:07 AM
lucifer

right, just saw there's another loop after it.

2021-06-28 17905, 2021

10:08 AM
ruaok nods

2021-06-28 17916, 2021

10:16 AM
BrainzGit

[listenbrainz-server] 14amCap1712 opened pull request #1528 (03master…consul-load): Add consul_template KEY_JSON to use when the value of key needs to parsed before used in Python https://github.com/metabrainz/listenbrainz-server…

2021-06-28 17925, 2021

10:17 AM
lucifer

alastairp: ^, tested above on test.lb seems to work fine.

2021-06-28 17912, 2021

10:18 AM
lucifer

I need to add a bunch of flags (2 for email and another for pin apis) so added above to reduce duplication.

2021-06-28 17949, 2021

10:19 AM
alastairp

ah, nice. so the idea is that consul will always have _encoded json_, and we decode it in config.py?

2021-06-28 17912, 2021

10:20 AM
alastairp

should we consider having a try/except for ValueError in case the json is invalid?

2021-06-28 17928, 2021

10:20 AM
lucifer

yes, right.

2021-06-28 17949, 2021

10:20 AM
lucifer

if the json is invalid, it should crash i think and report to sentry, right?

2021-06-28 17928, 2021

10:22 AM
lucifer

it does. https://sentry.metabrainz.org/metabrainz/service-…

2021-06-28 17930, 2021

10:22 AM
alastairp

actually, we probably already have it. If the json is invalid, uwsgi will fail to load the app, and will quit. our standard reporting system will take over from here

2021-06-28 17949, 2021

10:22 AM
lucifer

yup exactly that happened.

2021-06-28 17906, 2021
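
[Editor's sketch] The exchange above is about storing JSON-encoded values in consul so that the Python config gets real booleans rather than strings, and about what should happen when the JSON is invalid. A minimal sketch of that decoding step, assuming a hypothetical helper named `load_flag` (this is not the `KEY_JSON` template from the PR, whose contents are not shown in this log):

```python
import json


def load_flag(raw, default=None, strict=True):
    """Decode a JSON-encoded config value.

    With strict=True, invalid JSON raises ValueError so the app fails
    to start and the error gets reported. With strict=False, the given
    default is returned instead and startup continues.
    """
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        if strict:
            raise
        return default
```

For example, the stored text `true` decodes to the Python boolean `True`, while `"true"` (with quotes) decodes to a string, which is the distinction that motivated the change. The strict mode matches the crash-and-report behaviour described above.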

10:23 AM
alastairp

additional question is if we want to be able to start up even in the case of invalid data

2021-06-28 17944, 2021

10:23 AM
lucifer

i was thinking about that, in case of missing services yes but invalid config not sure

2021-06-28 17959, 2021

10:27 AM
lucifer

ruaok: alastairp: should we enable the pinned rec api in beta/test so that we can test it easily?

2021-06-28 17942, 2021

10:28 AM
ruaok

good idea!

2021-06-28 17946, 2021

10:28 AM
alastairp

yeah, right. we could special-case this: if there's an invalid youtube config then selectively disable youtube playback. but honestly, it's less work to just fix the config if it's broken

2021-06-28 17950, 2021

10:28 AM
alastairp

I agree, let's do it