MBS-11586: Show How Many Times a Recording Has Been Listened to with ListenBrainz on Recording's Page
_lucifer
reosarevok: No, I think it'll be difficult to do. one major issue is that listens are linked to an `msid`, not an `mbid`.
secondly, i am not sure, but given the current structure of the data, such queries might be expensive.
zas: ruaok: the worker nodes are failing some jobs due to lack of disk space. I see the nodes have a lot of free space but during processing some data is stored at `/tmp`. this path property is configurable. should i point the nodes to store data somewhere else like `/data/tmp` maybe?
the persistent data is being stored at `/data/datanode` and `/data/namenode`. let me know if I should modify that as well.
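The chat doesn't name the exact property being changed; assuming the workers are Spark executors (the "local dir" mentioned later fits), the usual knob for scratch space is `spark.local.dir` -- a hypothetical sketch of the change:

```properties
# spark-defaults.conf -- hypothetical; the exact property depends on
# what the worker nodes actually run. Moves shuffle/spill scratch
# space off the small root disk onto the large /data volume:
spark.local.dir /data/tmp
```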
ruaok
_lucifer: yes. exactly that.
/data seems fine
also, moooin!
iliekcomputers
good morning!
ruaok
moin, how are things in dublin?
iliekcomputers
it's been sunny this week!
ruaok
said like a true northerner. :)
alastairp
hello
ruaok: interesting to see that the materialised view also has lots of subtables
how many rows does listen_count_5day have? hypothesis: if it was just a plain old postgres table with a bunch of indexes, would it be faster?
ruaok
yeah. the thing that flummoxes me is that a search on the materialized view with start and end constraints still does a full index scan on all chunks. which kills performance.
> if it was just a plain old postgres table with a bunch of indexes, would it be faster?
it should be faster. right now it has 2M rows or so. with a proper index it is faster.
my conclusion: cont aggs are too flaky to be a mission critical element of our infrastructure.
but, I have a plan!
alastairp
yeah, I was surprised to see all of those scans on each chunk, when I guess we put all of this effort into minimising the number of chunks that we wanted to scan back when we started this optimisation process
ruaok
first, we should store listen count, min ts and max ts in redis -- perhaps even in one key, since they are often needed in the same place. Since we have ONE place where we update the listen table (timescale_writer), we can ensure that the values in redis are kept accurate. And this has been working well from what I can tell. So, let's not expire those keys anymore -- the primary store is redis, but the values can be recomputed when needed by expiring the keys.
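The single-key scheme could be sketched like this (a hypothetical sketch: the key name and field names are made up, and a plain dict stands in for the redis client so the snippet is self-contained):

```python
import json

# Stand-in for Redis so the sketch is runnable; in production this
# would be a redis client (store.set / store.get on the same key).
store = {}

def _key(user_id):
    # hypothetical key name -- one key holds all three values together
    return f"lb.listen_meta.{user_id}"

def update_on_write(user_id, listened_at):
    """Called only from the single writer (timescale_writer in the chat),
    so the cached values stay accurate without ever expiring."""
    raw = store.get(_key(user_id))
    if raw is None:
        data = {"count": 0, "min_ts": listened_at, "max_ts": listened_at}
    else:
        data = json.loads(raw)
    data["count"] += 1
    data["min_ts"] = min(data["min_ts"], listened_at)
    data["max_ts"] = max(data["max_ts"], listened_at)
    store[_key(user_id)] = json.dumps(data)

def get_cached(user_id):
    """Redis is the primary store; a miss means the values must be
    recomputed from the listen table."""
    raw = store.get(_key(user_id))
    return json.loads(raw) if raw is not None else None
```

A cache miss (deleted key) is the recompute trigger: expire the key and the next writer pass rebuilds it.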
_lucifer
ruaok: 👍, i'll restart the cluster after changing the local dir and we should be able to do more requests.
ruaok
that cuts out a 2.7s query. not hard to accomplish.
alastairp
right, so instead of having to do a scan, be it indexed or table scan, we just do a redis get
great
_lucifer
btw, i checked, and it did process some stats jobs yesterday.
ruaok
next, I took a look at our usage patterns. and I think we should make sure that fetching recent listens is FAST and older listens reasonably fast. ancient listens can be slower. but most of our queries will be recent shit, so make them fast.
I'm finishing debugging an approach that starts with a 30d window and then does an exponential backoff on the query bounds.
alastairp
great, that makes sense too
ruaok
if not found in 30 days, save those results and adjust bounds to be wider and just outside the last bounds.
worst case, 3-5 queries that never overlap, doing a full table index scan.
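The widening-window idea might look roughly like this (a sketch with assumed parameters -- window growth factor and pass count are guesses at the tuning, not the PR's actual values):

```python
from datetime import datetime, timedelta, timezone

def query_windows(max_passes=3, initial_days=30, factor=4, now=None):
    """Yield (start, end) query bounds for successive passes.

    Each window sits just outside the previous one, so passes never
    overlap; the span grows exponentially, and the final pass is
    unbounded so ancient listens are still reachable."""
    end = now or datetime.now(timezone.utc)
    span = timedelta(days=initial_days)
    for i in range(max_passes):
        if i == max_passes - 1:
            # worst case: one last full scan of everything older
            yield None, end
        else:
            start = end - span
            yield start, end
            end = start        # next window starts where this one ended
            span *= factor     # exponential backoff of the bounds
```

The caller queries the first window, and if it hasn't collected enough listens, keeps the partial results and moves to the next, wider window.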
my goal is to fetch listens in under 100ms on avg.
hopefully better than that.
hopefully we can test that in a minute.
does any of that sound like a bad approach?
furthermore, cont aggs don't support month buckets for the data view you and Mr_Monkey wanted.
but I created a 30 day bucket, which will never align right, but gets the overall job done -- a month- or year-aligned query on top of it will still fetch the data in a reasonable manner.
30 day bucket cont agg.
alastairp
will the exponential backoff still hit chunk boundaries?
ruaok
no.
alastairp
but the idea would be that the first few chunks will get most of the data that we want anyway, so it should be fast?
ruaok
see the paste for how the timestamps change over time. right now I have it tuned to give max 3 queries.
alastairp
yeah right, nice
ruaok
yisss. tests pass.
alastairp
one question: from the pg explain above, do you know what `listened_at_bucket < COALESCE(_timescaledb_internal.cagg_watermark('35'::oid), '-9223372036854775808'::bigint))` is?
I assume that this is some internal timescale thing, but it's interesting that the bigint argument is the same on all chunks
ruaok
oh yes, and when we cache timestamps, then the lower bound of the query will be the timestamps of the user, further making it more efficient. (new users will never paw through 20 years of data)
I don't know. has confused me too.
shivam-kapila
We used to cache min max timestamps during Influx time iirc
ruaok
we still may. I just want to shore them up and store them in one redis key.
I haven't dealt with that yet.
awww yiss: `2021-04-16 09:30:38,546 INFO fetch listens 0.03s in 1 passes`
alastairp
can you reject badly formatted tags at a github level?
I think it's fine to just match this action on a regex
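For the regex approach, a check like this would do (the tag pattern here is a guess at a date-style release tag -- substitute whatever scheme the repo actually uses):

```python
import re

# Hypothetical pattern: date-style release tags such as "v-2021-04-16.0".
# Adjust the regex to the project's real tagging scheme.
TAG_RE = re.compile(r"^v-\d{4}-\d{2}-\d{2}\.\d+$")

def tag_ok(tag: str) -> bool:
    """Return True if the tag matches the expected release format."""
    return TAG_RE.match(tag) is not None
```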
ruaok
alastairp: could you please glance at https://github.com/metabrainz/listenbrainz-serv... and tell me if we should finish this and merge or if I should do the timestamp improvements before finishing this PR?
Mr_Monkey
Nice work ruaok !
ruaok
I would prefer to finish the timestamp improvements in the same PR.
Mr_Monkey: thanks. :)
Mr_Monkey: please check the branch to see if you find any issues.
Mr_Monkey
Looking now
ruaok
the PR is already long, but mostly because of deleted code.
Mr_Monkey
We're talking of PR 1390 you posted above, correct?
ruaok
yes
which is live on test
did you read my comments on the month and year aggregated views?
_lucifer
alastairp: i don't think we can reject such tags directly; we'd need to write some webhook and wire up BrainzBot maybe. probably not worth it. the other solution, matching with a regex, is easy. yup, let's do that.
alastairp
ruaok: a query about that PR: `CREATE VIEW listen_count_5day WITH ...`. I'm not super familiar with this part of postgres
this is a materialised view (I understand, yes, because you can only create indexes on materialised views?), in that case, what's the difference between CREATE VIEW and CREATE MATERIALIZED VIEW?
ruaok
that part will likely still change. I've not settled on what to do yet.
but I may just replace the listen_count with listen_count_30days and listen_count_365days.
alastairp: that's how things are done in TS 1.7. In 2.x+ you create a materialized view. I suspect that an upgrade will help us on a few fronts, but it still won't make the approach from yesterday viable.
outsidecontext
zas: oh, great. so we could already change the links for the picard download page?
alastairp
ruaok: ok, thanks for clarifying
ruaok
!m zas
BrainzBot
You're doing good work, zas!
zas
outsidecontext: I guess so, I still have a few bits to tune, but it is here to stay
ruaok
alastairp: so thoughts on continue vs new PR for the timestamps work?
alastairp
ruaok: I'm reading through the PR now
outsidecontext
cool. shall I submit a PR for the website?
zas
for now, it supports http & https, should I just redirect http to https?
ruaok
zas: I don't think so.
zas
k
alastairp
ruaok: can you clarify - this PR includes the changes that we discussed on wednesday: adds the 5 day aggregate, removes the need for making multiple queries on the frontend. query times are on par with what they were before this change. your new proposed improvements include caching of the min/max and the increasing range query, which gets us down to your example 30ms query?
ruaok
correct. however, the "adds the 5 day aggregate" will likely still change a bit.
it is no longer crucial to have the cont agg in lock step with the hypertable.
I am leaning toward replacing the listen_count with listen_count_30day, rather than creating a new listen_count_5day.
it should make fetching timestamps a bit faster.
so the 30day cont agg and the timestamp work are the only bits left.
alastairp
ah, I see. this is why it also makes sense to merge your timestamp work into this same PR
ruaok
yes.
and then we have one larger PR, but this whole thicket of issues gets solved.
and the basis for 30day listen overview is in place for future work.
alastairp
this PR is already quite large, though I see a lot of it is just changing arguments in tests.
the timestamp stuff is going to be pretty self-contained in a single file, right?
ruaok
exactly why I am asking before proceeding.
2, but yeah.
yeah
I will create a new agg in this code, but we won't need an SQL update script yet. I've done it by hand.
alastairp
go ahead with putting both in this PR then, as the timestamp stuff isn't going to directly affect the other 21 files
ruaok
once deployed I will open a small PR for removing the old cont aggs.
alastairp
I've got a meeting now, but after lunch I'll be available for review
ruaok
kewl. I'll finish that this afternoon with some luck.
I suspect that PR won't be ready until monday, so no rush.
alastairp
ok cool
ruaok
I worked from 10am until 1:30 in the morning with few breaks, so I'd like a bike ride this afternoon. :)
alastairp
can I pre-review this? I see a few suggestions already, or would you prefer to wait until you finish?
theoretically comarca restrictions get lifted on Monday!
ruaok
don't go digging into too much detail and skip comments on the SQL scripts, but yes feedback would be welcome.
ohh, really?? that would be fantastic.
ruaok wants to go to olot.
alastairp
to "Veguerias", which are circa 14th century political divisions, lol
so our one includes Barceloneces, Maresme, and the 2 Valles
"the feudal administrative territorial jurisdiction of the Principality of Catalonia (to the Crown of Aragon) during the Middle Ages and into the Modern Era until the Nueva Planta decrees of 1716."