#metabrainz

      • strider has quit
      • strider joined the channel
      • Shubh joined the channel
      • strider has quit
      • strider joined the channel
      • MRiddickW joined the channel
      • lucifer
        alastairp: the timeout thing might be doable, not completely sure though. so, like, store the latest message in ts writer, and if it times out and is redelivered, do something about it? or do you mean rabbitmq could automatically do this for us?
      • chinmay joined the channel
      • skelly37 joined the channel
      • chinmay has quit
      • SothoTalKer_ joined the channel
      • tykling_ joined the channel
      • pprkut_ joined the channel
      • loujine_ joined the channel
      • pprkut has quit
      • loujine has quit
      • SothoTalKer has quit
      • tykling has quit
      • rdswift has quit
      • SothoTalKer_ is now known as SothoTalKer
      • rdswift joined the channel
      • Shubh has quit
      • rdswift is now known as Guest7859
      • bitmap has quit
      • chinmay joined the channel
      • bitmap joined the channel
      • everdred has quit
      • Etua joined the channel
      • everdred joined the channel
      • Etua has quit
      • trolley has quit
      • trolley joined the channel
      • BrainzGit
        [guidelines] zas opened pull request #19 (master…zas-patch-1): Typo fix Reccomended -> Recommended https://github.com/metabrainz/guidelines/pull/19
      • Shubh joined the channel
      • [guidelines] reosarevok merged pull request #19 (master…zas-patch-1): Typo fix Reccomended -> Recommended https://github.com/metabrainz/guidelines/pull/19
      • mayhem
        moooin!
      • lucifer
        morning
      • mayhem
        reosarevok: are you back today?
      • reosarevok
        Yes, in an hour or two max
      • mayhem
        ok, I'm forwarding a schema change question to support@ then
      • reosarevok
        Works for me
      • mayhem
        atj: akshaaatt yvanzo bitmap alastairp monkey outsidecontext and anyone with a @meb email address. If you'd like a metabrainz dropbox account with loads of storage, go ahead and use your @meb email address to sign up for the dropbox account and you'll be automatically added to the team.
      • akshaaatt
        Sounds cool mayhem!
      • mayhem
        yeah, go for it. our friend at dropbox gave us a free business account.
      • alastairp
        thanks mayhem
      • mayhem
        np!
      • alastairp
        lucifer: hmm, adding some extra checks via tswriter sounds like trouble and complexity
      • I was just thinking out loud that it'd be great if rmq had a "you're _about_ to be disconnected" callback, rather than just "you've been disconnected"
      • any thoughts on the comment that I added on LB#2009?
      • BrainzBot
        Allow a maximum number of listens per import: https://github.com/metabrainz/listenbrainz-serv...
      • lucifer
        ah i see. i don't think that thing exists.
      • alastairp
        the documentation about max listen payload and the check that we do is different
      • so we should update one or the other
      • any thoughts on how we can select the largest listen payload from the database? can you do strlen on a json column in pg?
      • mayhem
        alastairp: slowly, yes.
      • atj
        so is my understanding correct, in that the issue last night was someone submitting a request with 70k listens which effectively DoS'd the application?
      • alastairp
        atj: more or less, yes
      • mayhem
      • alastairp
        there is a timing issue where rmq expects an acknowledgement of the message within [timespan], and it took us longer than that to process the listens
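For illustration, a minimal pika-based sketch of the consumer contract being described (the queue name and the `process_listens` handler are hypothetical): the broker closes the channel if a delivered message is not acked within its configured delivery acknowledgement timeout, so the handler has to finish within that window.

```python
# Minimal pika consumer sketch; "incoming" and process_listens are
# hypothetical. If the work takes longer than the broker's delivery
# acknowledgement timeout, the channel is closed and the message is
# redelivered -- the failure mode discussed above.
import pika

def process_listens(body: bytes) -> None:
    ...  # placeholder for the actual listen-processing work

def on_message(ch, method, properties, body):
    process_listens(body)  # must finish before the broker's ack timeout
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_consume(queue="incoming", on_message_callback=on_message)
channel.start_consuming()
```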
      • mayhem
        `select id, pg_column_size(datab) from data.items;`
      • atj
        and the consumer was taking too long to process the request, so it timed out?
      • right
      • mayhem
        yes.
      • alastairp
        current fix in my PR ^ is to limit the number of items in a single payload
      • but I'm thinking out loud about how we could preempt this issue if we get into it again
      • atj
        yes, it sounds like a good idea to have some sort of limit :)
      • how big was the actual request?
      • I guess you can fit 70k listens in a few MB?
      • alastairp
        ahh, yesterday I could have told you that because I had the size in the rmq admin pane when I popped the message off of the queue
      • but I don't think we have that info any more
      • our "sample" listen is 700 bytes (https://listenbrainz.readthedocs.io/en/latest/u...)
      • we have a 10kb per listen upper limit, but our check is that the average size of all listens in the payload is less than 10kb, not the full message
      • lucifer
        re size of data, yeah pg_column_size may work. probably use it to find the largest listen and then take the actual length of it, because pg might be compressing jsonb and the reported size might be less than the actual one.
      • regarding the check, i am not sure why it was added. is it there to prevent overwhelming something api side or db side?
      • alastairp
        or maybe we can estimate
      • atj
        seems a weird check
      • "the average size of all listens in the payload is less than 10kb"
      • alastairp
        I think this may have come about because of our changes in technologies
      • lucifer
        if the former, then a per-document size limit makes sense, but if the latter, then a per-listen size limit is sensible.
      • alastairp
        that check may have been added before we started using rabbitmq (and so before we could have multiple listens per message)
      • lucifer
        we still have multiple listens per message
      • alastairp
        right, but did we beforehand?
      • lucifer
        ah sorry, i misread your message
      • alastairp
      • I guess it's just a simpler way of doing the check
      • of "each listen < 10k in size"
      • because otherwise you'd need to convert body json -> py dict, then iterate through it, then for each item convert back to json to check its size in bytes
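A hedged sketch of that trade-off (the 10 KB limit comes from the discussion; the function names and payload shape are illustrative):

```python
import json

MAX_LISTEN_SIZE = 10 * 1024  # the 10 KB per-listen limit from the docs

def average_size_ok(raw_body: bytes, num_listens: int) -> bool:
    # the existing-style check: average listen size over the whole payload,
    # computed without parsing anything
    return len(raw_body) / num_listens <= MAX_LISTEN_SIZE

def each_listen_ok(listens: list) -> bool:
    # the stricter per-listen check: requires the dict -> json round-trip
    # per item that is described above
    return all(len(json.dumps(listen).encode("utf-8")) <= MAX_LISTEN_SIZE
               for listen in listens)
```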
      • lucifer
        checks seems to have been added in this commit. https://github.com/metabrainz/listenbrainz-serv...
      • so multiple listens existed at that time too. probably done that way for easier checking, indeed
      • alastairp: `select *, pg_column_size(data) AS size from listen ORDER BY size DESC LIMIT 5;` to find largest listen. then `select length(data::text) from listen where listened_at = 1651504402 AND user_id = 8741;` for its length. length is 9331.
      • alastairp
        commit message by mayhem: "That is probably enough sanity checking for this minute."
      • I mean, at the time it probably was
      • lucifer: yes, multiple listens in an API payload existed. but I was talking about what happens after the API endpoint
      • I think that at that time we may have split the payload up into messages of 1 listen each
      • lucifer
        i see, makes sense.
      • yup it splits the payload into separate listens
      • MRiddickW has quit
      • atj
        seems like it might be a good idea to check the payload size before parsing it
      • alastairp
        atj: hmm, right. so we could say "per-listen 10k, max 1000 listens -> max message payload 10mb" and reject early if the body is over that
      • then reject again after parsing if there are >1000 listens (e.g. maybe the user submitted 10k small listens which were under the 10mb limit)
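A sketch of that two-stage rejection, assuming a Flask-style handler (the limits, names, and payload shape are illustrative, not the actual ListenBrainz values):

```python
import ujson
from flask import abort

MAX_LISTEN_SIZE = 10 * 1024                                # 10 KB per listen
MAX_LISTENS_PER_REQUEST = 1000                             # illustrative cap
MAX_BODY_SIZE = MAX_LISTENS_PER_REQUEST * MAX_LISTEN_SIZE  # ~10 MB

def validate_submission(raw_body: bytes) -> dict:
    # stage 1: reject on raw size before paying the JSON-parsing cost
    if len(raw_body) > MAX_BODY_SIZE:
        abort(413)
    data = ujson.loads(raw_body)
    # stage 2: reject again if there are too many (small) listens
    if len(data.get("payload", [])) > MAX_LISTENS_PER_REQUEST:
        abort(400)
    return data
```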
      • atj
        exactly
      • you could open yourself up to DoS if someone decides to send lots of large payloads
      • I doubt "ujson.loads()" is a particularly cheap call, CPU or memory wise
      • alastairp
        it's certainly cheaper than json.loads()!
      • atj
        that's probably a low bar though isn't it? :)
      • alastairp
      • but you're right - failing, if necessary, before doing the json parsing sounds like a good idea too
      • thanks
      • atj
        fail early and all that
      • isn't that just null?
      • ah, I see further down that Postgres doesn't like it and you have a performance check for it
      • alastairp
        I think you're right that it is just "null" check
      • I guess perhaps it's "null in a string" rather than "null anywhere else in the payload"
      • lucifer: did we ever check the performance of that span in sentry?
      • lucifer
        yup, it was minuscule impact.
      • for instance
      • alastairp
        cool
      • 100x faster than round-trip to redis
      • maybe we can remove the sentry span, then?
      • lucifer
        yes sounds good
      • alastairp
        that being said, great feature if we want to benchmark other stuff
      • lucifer: I'm looking into this requirements.txt -> setup.py install_requires change for BU
      • I think the least effort solution that'll work well is just to rewrite any lines that are git repos, right?
      • or we can try pyproject.toml :/
      • jesus2099 joined the channel
      • looks like this is still all a bit of a mess
      • lucifer
        alastairp: that's what the SO answer from yesterday did?
      • alastairp
        hi jesus2099!
      • lucifer: yes, the answer from yesterday suggested a new format for source dependencies (PEP 508 I understand), but that's still different from what requirements.txt needs
      • lucifer
        i see, makes sense to rewrite then
      • alastairp
        so... we either need to manage these lists manually, convert automatically, or switch to poetry/pyproject.toml for local development so that we can reuse the same file for local dev and remote installs
      • agreed that automatically converting when inserting into setup.py is probably the easiest idea for now - and it's only temporary until new mbdata is released, I guess
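One way that automatic conversion could look (a sketch; the helper name and regex are assumptions, and it only handles the common `#egg=` form of VCS requirement):

```python
import re

def to_pep508(req: str) -> str:
    # rewrite a requirements.txt-style VCS line, e.g.
    #   git+https://github.com/org/repo.git@ref#egg=name
    # into the PEP 508 direct-reference form that install_requires accepts:
    #   name @ git+https://github.com/org/repo.git@ref
    m = re.match(r"(git\+\S+?)#egg=([A-Za-z0-9_.-]+)", req)
    return f"{m.group(2)} @ {m.group(1)}" if m else req

with open("requirements.txt") as f:
    install_requires = [to_pep508(line.strip()) for line in f
                        if line.strip() and not line.startswith("#")]
```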
      • lucifer
        if poetry/pyproject.toml handle this better then makes sense to migrate to those in future.
      • CatQuest
        ..
      • lucifer
        indeed, the current situation should be temporary
      • CatQuest
        no. i refuse. you can't call a foss project "poetry" that's just .. how even
      • sigh
      • alastairp
        yeah, though I'm just worried that we'll move to a new dependency tool just as a new "better" one gets released
      • lol
      • skelly37
        outsidecontext, zas: FIFO seems to work for unix, you might want to take a look and review the bare protocol and ideas behind it: https://github.com/skelly37/pipethon. In the evening I'll take care of doing the same but with the Windows API, and then it's ready to be implemented in Picard.
      • alastairp
        skelly37: great work getting this far so soon!
      • of course, as they say the first 90% is easy, it's the second 90% that takes most of the time :)
      • atj
        pipes work on windows?
      • skelly37
        alastairp: thanks :)
      • alastairp
        atj: "but with windows API" - I guess not
      • ansh: by the way, I didn't confirm with you the other day, but as mayhem pointed out starting early is fine. let me know when you want to discuss this. I was chatting with monkey and we think that it makes sense that I work with you directly when you're working on CB parts, and he works with you when doing BB parts. perhaps the 3 of us could get together in the next week or so and go over your plan again
      • skelly37
        atj: It requires pywin32 module, os.mkfifo() is unfortunately unix only.
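For context, a sketch of that platform split (the paths and pipe parameters are illustrative; pywin32's win32pipe module provides the Windows side):

```python
import os
import sys

FIFO_PATH = "/tmp/picard_pipe"       # illustrative POSIX FIFO path
PIPE_NAME = r"\\.\pipe\picard_pipe"  # Windows named-pipe namespace

if sys.platform == "win32":
    # the pywin32 route mentioned above; buffer sizes etc. are illustrative
    import win32pipe
    handle = win32pipe.CreateNamedPipe(
        PIPE_NAME,
        win32pipe.PIPE_ACCESS_DUPLEX,
        win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_READMODE_MESSAGE
        | win32pipe.PIPE_WAIT,
        1, 65536, 65536, 0, None)
else:
    os.mkfifo(FIFO_PATH)             # POSIX only, hence the Windows branch
```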