alastairp: the timeout thing might be doable not completely sure though. so like store the latest message in ts writer, if it times out and is redelivered do something about it? or do you mean if rabbitmq could automatically do this for us?
ok, I'm forwarding a schema change question to support@ then
reosarevok
Works for me
mayhem
atj: akshaaatt yvanzo bitmap alastairp monkey outsidecontext and anyone with a @meb email address. If you'd like a metabrainz dropbox account with loads of storage, go ahead and use your @meb email address to sign up for the dropbox account and you'll be automatically added to the team.
akshaaatt
Sounds cool mayhem!
mayhem
yeah, go for it. our friend at dropbox gave us a free business account.
alastairp
thanks mayhem
mayhem
np!
alastairp
lucifer: hmm, adding some extra checks via tswriter sounds like trouble and complexity
I was just thinking out loud that it'd be great if rmq had a "you're _about_ to be disconnected" callback, rather than just "you've been disconnected"
any thoughts on the comment that I added on LB#2009?
the documentation about the max listen payload size and the check that we do are different
so we should update one or the other
any thoughts on how we can select the largest listen payload from the database? can you do strlen on a json column in pg?
mayhem
alastairp: slowly, yes.
atj
so is my understanding correct, in that the issue last night was someone submitting a request with 70k listens which effectively DoS'd the application?
we have a 10kb per listen upper limit, but our check is that the average size of all listens in the payload is less than 10kb, not the full message
lucifer
re size of data, yeah pg_column_size may work. probably use it to find the largest listen and then do actual length on it, because pg might be compressing jsonb and then the reported size might be less than the actual one.
regarding the check, i am not sure why it was added. is it there to prevent overwhelming something api side or db side?
alastairp
or maybe we can estimate
atj
seems a weird check
"the average size of all listens in the payload is less than 10kb"
alastairp
I think this may have come about because of our changes in technologies
lucifer
if the former then a per-document size limit makes sense, but if the latter then a per-listen size limit is sensible.
alastairp
that check may have been added before we started using rabbitmq (and so before we could have multiple listens per message)
I guess it's just a simpler way of doing the check
of "each listen < 10k in size"
because otherwise you'd need to convert body json -> py dict, then iterate through it, then for each item convert back to json to check its size in bytes
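to illustrate the trade-off being described, here's a minimal sketch of the two checks side by side. the function names, constants, and payload shape are hypothetical, not ListenBrainz's actual code; the point is just that an average-based check lets one oversized listen hide behind many small ones:

```python
import json

MAX_LISTEN_SIZE = 10 * 1024  # the documented 10 KiB per-listen limit


def average_size_check(raw_body: bytes, num_listens: int) -> bool:
    """The simpler check: average bytes per listen across the whole payload."""
    return len(raw_body) / num_listens <= MAX_LISTEN_SIZE


def per_listen_size_check(raw_body: bytes) -> bool:
    """The stricter check: parse the body, re-serialise each listen,
    and measure every one individually (json -> dict -> json round trip)."""
    listens = json.loads(raw_body)
    return all(
        len(json.dumps(listen).encode("utf-8")) <= MAX_LISTEN_SIZE
        for listen in listens
    )


# 99 tiny listens plus one huge one: fine on average, bad per listen
payload = [{"track": "a"}] * 99 + [{"track": "x" * 20_000}]
body = json.dumps(payload).encode("utf-8")
print(average_size_check(body, len(payload)))  # True
print(per_listen_size_check(body))             # False
```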
lucifer
so multiple listens existed at that time too. probably that way for easier checking indeed
alastairp: `select *, pg_column_size(data) AS size from listen ORDER BY size DESC LIMIT 5;` to find largest listen. then `select length(data::text) from listen where listened_at = 1651504402 AND user_id = 8741;` for its length. length is 9331.
alastairp
commit message by mayhem: "That is probably enough sanity checking for this minute."
I mean, at the time it probably was
lucifer: yes, multiple listens in an API payload existed. but I was talking about what happens after the API endpoint
I think that at that time we may have split the payload up into messages of 1 listen each
lucifer
alastairp: that's what the SO answer from yesterday did?
alastairp
hi jesus2099!
lucifer: yes, the answer from yesterday suggested a new format for source dependencies (PEP 508 I understand), but that's still different from what requirements.txt needs
lucifer
i see, makes sense to rewrite then
alastairp
so... we either need to manage these lists manually, convert automatically, or switch to poetry/pyproject.toml for local development so that we can reuse the same file for local dev and remote installs
agreed that automatically converting when inserting into setup.py is probably the easiest idea for now - and it's only temporary until new mbdata is released, I guess
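for the record, a rough sketch of what such an automatic conversion could look like. this is an illustration only: the function name is made up, and the mbdata URL/tag shown is an assumed example, not the project's actual pin. it turns a PEP 508 direct reference (`name @ url`) into the older pip requirements.txt VCS form (`url#egg=name`):

```python
def pep508_to_requirements_line(dep: str) -> str:
    """Convert a PEP 508 direct reference, e.g.
        'mbdata @ git+https://github.com/metabrainz/mbdata.git@v27'
    into the legacy requirements.txt VCS form, e.g.
        'git+https://github.com/metabrainz/mbdata.git@v27#egg=mbdata'

    Splitting on the *first* '@' is safe here because the package name
    cannot contain one, even though the URL's ref pin can.
    """
    name, _, url = dep.partition("@")
    return f"{url.strip()}#egg={name.strip()}"


print(pep508_to_requirements_line(
    "mbdata @ git+https://github.com/metabrainz/mbdata.git@v27"
))
# git+https://github.com/metabrainz/mbdata.git@v27#egg=mbdata
```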
lucifer
if poetry/pyproject.toml handle this better then makes sense to migrate to those in future.
CatQuest
..
lucifer
indeed, the current situation should be temporary
CatQuest
no. i refuse. you can't call a foss project "poetry" that's just .. how even
sigh
alastairp
yeah, though I'm just worried that we'll move to a new dependency tool just as a new "better" one gets released
lol
skelly37
outsidecontext, zas: FIFO seems to work for unix, you might want to take a look and review the bare protocol and ideas behind it: https://github.com/skelly37/pipethon. In the evening I'll take care of doing the same but with Windows API and then it's ready to be implemented in Picard.
alastairp
skelly37: great work getting this far so soon!
of course, as they say the first 90% is easy, it's the second 90% that takes most of the time :)
atj
pipes work on windows?
skelly37
alastairp: thanks :)
alastairp
atj: "but with windows API" - I guess not
ansh: by the way, I didn't confirm with you the other day, but as mayhem pointed out starting early is fine. let me know when you want to discuss this. I was chatting with monkey and we think that it makes sense that I work with you directly when you're working on CB parts, and he works with you when doing BB parts. perhaps the 3 of us could get together in the next week or so and go over your plan again
skelly37
atj: It requires pywin32 module, os.mkfifo() is unfortunately unix only.
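a tiny illustration of the portability split being discussed (the function name and the Windows branch are assumptions for illustration, not pipethon's actual code): `os.mkfifo()` exists only on POSIX, so a Windows build has to go through pywin32's named-pipe API instead.

```python
import os
import stat
import sys
import tempfile


def make_ipc_pipe(path: str) -> None:
    """Create a named pipe at `path`. os.mkfifo() is unix-only; on Windows
    you would call the win32 API via pywin32 (e.g. win32pipe.CreateNamedPipe)
    instead, which uses \\\\.\\pipe\\ paths rather than filesystem paths."""
    if sys.platform == "win32":
        raise NotImplementedError("use win32pipe via pywin32 on Windows")
    os.mkfifo(path)


# POSIX-only demo: the created node is a FIFO, not a regular file
with tempfile.TemporaryDirectory() as d:
    pipe_path = os.path.join(d, "demo_pipe")
    make_ipc_pipe(pipe_path)
    print(stat.S_ISFIFO(os.stat(pipe_path).st_mode))  # True on unix
```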