MBS-11586: Show How Many Times a Recording Has Been Listened to with ListenBrainz on Recording's Page
_lucifer
reosarevok: No, I think it'll be difficult to do. one major issue is that listens are linked to an `msid`, not an `mbid`.
secondly, i am not sure, but given the current structure of the data, such queries might be expensive.
zas: ruaok: the worker nodes are failing some jobs due to lack of disk space. I see the nodes have a lot of free space but during processing some data is stored at `/tmp`. this path property is configurable. should i point the nodes to store data somewhere else like `/data/tmp` maybe?
the persistent data is being stored at `/data/datanode` and `/data/namenode`. let me know if I should modify that as well.
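The chat doesn't name the exact property being changed; assuming the workers are Spark executors (the "local dir" mentioned later fits), the usual knob for scratch space is `spark.local.dir` -- a hypothetical sketch of the change:

```properties
# spark-defaults.conf -- hypothetical; the exact property depends on
# what the worker nodes actually run. Moves shuffle/spill scratch
# space off the small root disk onto the large /data volume:
spark.local.dir /data/tmp
```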
ruaok
_lucifer: yes. exactly that.
/data seems fine
also, moooin!
iliekcomputers
good morning!
ruaok
moin, how are things in dublin?
iliekcomputers
it's been sunny this week!
ruaok
said like a true northerner. :)
alastairp
hello
ruaok: interesting to see that the materialised view also has lots of subtables
how many rows does listen_count_5day have? hypothesis: if it was just a plain old postgres table with a bunch of indexes, would it be faster?
ruaok
yeah. the thing that flummoxes me is that a search on the materialized view with start and end constraints still does a full index scan on all chunks. which kills performance.
> if it was just a plain old postgres table with a bunch of indexes, would it be faster?
it should be faster. right now it has 2M rows or so. with a proper index it is faster.
my conclusion: cont aggs are too flaky to be a mission critical element of our infrastructure.
but, I have a plan!
alastairp
yeah, I was surprised to see all of those scans on each chunk, when I guess we put all of this effort into minimising the number of chunks that we wanted to scan back when we started this optimisation process
ruaok
first, we should store listen count, min ts and max ts in redis -- perhaps even in one key, since they are often needed in the same place. Since we have ONE place where we update the listen table (timescale_writer), we can ensure that the values in redis are kept accurate. And this has been working well from what I can tell. So, let's not expire those keys anymore -- the primary store is redis, but the values can be recomputed when needed by expiring the keys.
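The single-key scheme could be sketched like this (a hypothetical sketch: the key name and field names are made up, and a plain dict stands in for the redis client so the snippet is self-contained):

```python
import json

# Stand-in for Redis so the sketch is runnable; in production this
# would be a redis client (store.set / store.get on the same key).
store = {}

def _key(user_id):
    # hypothetical key name -- one key holds all three values together
    return f"lb.listen_meta.{user_id}"

def update_on_write(user_id, listened_at):
    """Called only from the single writer (timescale_writer in the chat),
    so the cached values stay accurate without ever expiring."""
    raw = store.get(_key(user_id))
    if raw is None:
        data = {"count": 0, "min_ts": listened_at, "max_ts": listened_at}
    else:
        data = json.loads(raw)
    data["count"] += 1
    data["min_ts"] = min(data["min_ts"], listened_at)
    data["max_ts"] = max(data["max_ts"], listened_at)
    store[_key(user_id)] = json.dumps(data)

def get_cached(user_id):
    """Redis is the primary store; a miss means the values must be
    recomputed from the listen table."""
    raw = store.get(_key(user_id))
    return json.loads(raw) if raw is not None else None
```

A cache miss (deleted key) is the recompute trigger: expire the key and the next writer pass rebuilds it.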
_lucifer
ruaok: 👍, i'll restart the cluster after changing the local dir and we should be able to do more requests.
ruaok
that cuts out a 2.7s query. not hard to accomplish.
alastairp
right, so instead of having to do a scan, be it indexed or table scan, we just do a redis get
great
_lucifer
btw, i checked, and it did process some stats jobs yesterday.
ruaok
next, I took a look at our usage patterns. and I think we should make sure that fetching recent listens is FAST and older listens reasonably fast. ancient listens can be slower. but most of our queries will be recent shit, so make them fast.
I'm finishing debugging an approach that starts with a 30d window and then does an exponential backoff on the query bounds.
alastairp
great, that makes sense too
ruaok
if not found in 30 days, save those results and adjust bounds to be wider and just outside the last bounds.
worst case, 3-5 queries that never overlap, doing a full table index scan.
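The widening-window idea might look roughly like this (a sketch with assumed parameters -- window growth factor and pass count are guesses at the tuning, not the PR's actual values):

```python
from datetime import datetime, timedelta, timezone

def query_windows(max_passes=3, initial_days=30, factor=4, now=None):
    """Yield (start, end) query bounds for successive passes.

    Each window sits just outside the previous one, so passes never
    overlap; the span grows exponentially, and the final pass is
    unbounded so ancient listens are still reachable."""
    end = now or datetime.now(timezone.utc)
    span = timedelta(days=initial_days)
    for i in range(max_passes):
        if i == max_passes - 1:
            # worst case: one last full scan of everything older
            yield None, end
        else:
            start = end - span
            yield start, end
            end = start        # next window starts where this one ended
            span *= factor     # exponential backoff of the bounds
```

The caller queries the first window, and if it hasn't collected enough listens, keeps the partial results and moves to the next, wider window.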
my goal is to fetch listens in under 100ms on avg.
hopefully better than that.
hopefully we can test that in a minute.
does any of that sound like a bad approach?
furthermore, cont aggs don't support month buckets for the data view you and Mr_Monkey wanted.
but I created a 30 day bucket, which will never align right, but gets the overall job done -- a month- or year-aligned query on top of it will still fetch the data in a reasonable manner.
30 day bucket cont agg.
alastairp
will the exponential backoff still hit chunk boundaries?
ruaok
no.
alastairp
but the idea would be that the first few chunks will get most of the data that we want anyway, so it should be fast?
ruaok
see the paste for how the timestamps change over time. right now I have it tuned to give max 3 queries.
alastairp
yeah right, nice
ruaok
yisss. tests pass.
alastairp
one question: from the pg explain above, do you know what `listened_at_bucket < COALESCE(_timescaledb_internal.cagg_watermark('35'::oid), '-9223372036854775808'::bigint))` is?
I assume that this is some internal timescale thing, but it's interesting that the bigint argument is the same on all chunks
ruaok
oh yes, and when we cache timestamps, then the lower bound of the query will be the timestamps of the user, further making it more efficient. (new users will never paw through 20 years of data)
I don't know. has confused me too.
shivam-kapila
We used to cache min max timestamps during Influx time iirc
ruaok
we still may. I just want to shore them up and store them in one redis key.
I haven't dealt with that yet.
awww yiss: `2021-04-16 09:30:38,546 INFO fetch listens 0.03s in 1 passes`
alastairp
can you reject badly formatted tags at a github level?
I think it's fine to just match this action on a regex
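For the regex approach, a check like this would do (the tag pattern here is a guess at a date-style release tag -- substitute whatever scheme the repo actually uses):

```python
import re

# Hypothetical pattern: date-style release tags such as "v-2021-04-16.0".
# Adjust the regex to the project's real tagging scheme.
TAG_RE = re.compile(r"^v-\d{4}-\d{2}-\d{2}\.\d+$")

def tag_ok(tag: str) -> bool:
    """Return True if the tag matches the expected release format."""
    return TAG_RE.match(tag) is not None
```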
ruaok
alastairp: could you please glance at https://github.com/metabrainz/listenbrainz-serv... and tell me if we should finish this and merge or if I should do the timestamp improvements before finishing this PR?
Mr_Monkey
Nice work ruaok !
ruaok
I would prefer to finish the timestamp improvements in the same PR.
Mr_Monkey: thanks. :)
Mr_Monkey: please check the branch to see if you find any issues.
Mr_Monkey
Looking now
ruaok
the PR is already long, but mostly because of deleted code.
Mr_Monkey
We're talking of PR 1390 you posted above, correct?
ruaok
yes
which is live on test
did you read my comments on the month and year aggregated views?
_lucifer
alastairp: i don't think we can reject such tags directly; we'd need to write some webhook and wire up BrainzBot maybe. probably not worth it. the other solution, matching with a regex, is easy. yup, let's do that.
alastairp
ruaok: a query about that PR: `CREATE VIEW listen_count_5day WITH ...`. I'm not super familiar with this part of postgres
this is a materialised view (I understand, yes, because you can only create indexes on materialised views?), in that case, what's the difference between CREATE VIEW and CREATE MATERIALIZED VIEW?
ruaok
that part will likely still change. I've not settled on what to do yet.
but I may just replace the listen_count with listen_count_30days and listen_count_365days.
alastairp: that's how things are done in TS 1.7. In 2.x+ you create a materialized view. I suspect that an upgrade will help us on a few fronts, but it still won't make the approach from yesterday viable.
outsidecontext
zas: oh, great. so we could already change the links for the picard download page?
alastairp
ruaok: ok, thanks for clarifying
ruaok
!m zas
BrainzBot
You're doing good work, zas!
zas
outsidecontext: I guess so, I still have a few bits to tune, but it is here to stay
ruaok
alastairp: so thoughts on continue vs new PR for the timestamps work?
alastairp
ruaok: I'm reading through the PR now
outsidecontext
cool. shall I submit a PR for the website?
zas
for now, it supports http & https, should I just redirect http to https?
ruaok
zas: I don't think so.
zas
k
alastairp
ruaok: can you clarify - this PR includes the changes that we discussed on wednesday: adds the 5 day aggregate, removes the need for making multiple queries on the frontend. query times are on par with what they were before this change. your new proposed improvements include caching of the min/max and the increasing range query, which gets us down to your example 30ms query?
ruaok
correct. however, the "adds the 5 day aggregate" will likely still change a bit.
it is no longer crucial to have the cont agg in lock step with the hypertable.
I am leaning toward replacing the listen_count with listen_count_30day, rather than creating a new listen_count_5day.
it should make fetching timestamps a bit faster.
so the 30day cont agg and the timestamp work are the only bits left.
alastairp
ah, I see. this is why it also makes sense to merge your timestamp work into this same PR
ruaok
yes.
and then we have one larger PR, but this whole thicket of issues gets solved.
and the basis for 30day listen overview is in place for future work.
alastairp
this PR is already quite large, though I see a lot of it is just changing arguments in tests.
the timestamp stuff is going to be pretty self-contained in a single file, right?
ruaok
exactly why I am asking before proceeding.
2, but yeah.
yeah
I will create a new agg in this code, but we won't need an SQL update script yet. I've done it by hand.
alastairp
go ahead with putting both in this PR then, as the timestamp stuff isn't going to directly affect the other 21 files
ruaok
once deployed I will open a small PR for removing the old cont aggs.
alastairp
I've got a meeting now, but after lunch I'll be available for review
ruaok
kewl. I'll finish that this afternoon with some luck.
I suspect that PR won't be ready until monday, so no rush.
alastairp
ok cool
ruaok
I worked from 10am until 1:30 in the morning with few breaks, so I'd like a bike ride this afternoon. :)
alastairp
can I pre-review this? I see a few suggestions already, or would you prefer to wait until you finish?
theoretically comarca restrictions get lifted on Monday!
ruaok
don't go digging into too much detail and skip comments on the SQL scripts, but yes feedback would be welcome.
ohh, really?? that would be fantastic.
ruaok wants to go to olot.
alastairp
to "Veguerias", which are circa 14th century political divisions, lol
so our one includes Barceloneces, Maresme, and the 2 Valles
"the feudal administrative territorial jurisdiction of the Principality of Catalonia (to the Crown of Aragon) during the Middle Ages and into the Modern Era until the Nueva Planta decrees of 1716."