lucifer, mayhem: Hello! We have a new user whose dashboard is completely broken due to the timescale sporadic listens issue. In this case it is drastic: the request completely fails and the page does not load.... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/QCjuPSaEMDwRUfRUSODNAAOI>)
2025-08-21 23316, 2025
d4rkie has quit
2025-08-21 23345, 2025
d4rkie joined the channel
2025-08-21 23309, 2025
lucifer[m]
[@monkey:chatbrainz.org](https://matrix.to/#/@monkey:chatbrainz.org) yes that would work, i think the point of contention was whether the same behaviour should be enforced for api users.
2025-08-21 23329, 2025
monkey[m]
Yes, API is also going to be an issue in this (albeit extreme) case
2025-08-21 23336, 2025
monkey[m]
Not sure what we should do then, but I assume something similar.
2025-08-21 23336, 2025
monkey[m]
There was talk of storing timestamps of when the next listen starts, if memory serves?
2025-08-21 23336, 2025
monkey[m]
Then we could return a pagination-style next_ts or something in the API response
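A minimal sketch of what such a response could look like; the next_ts field name and the payload shape are assumptions for illustration, not the actual ListenBrainz API contract:

```python
# Hypothetical shape for a pagination-style listens response.
def build_listens_response(listens, next_listen_ts):
    """Return a partial page of listens plus a cursor for the next fetch."""
    return {
        "listens": listens,          # whatever was found within the scan budget
        "count": len(listens),
        # Timestamp of the next known listen, so the client can resume
        # there instead of paging through empty time ranges.
        "next_ts": next_listen_ts,
    }

# Example: the scan found only 3 listens before hitting its pass limit,
# but the next listen is known to be at 1325376000 (2012-01-01 UTC).
page = build_listens_response([{"listened_at": 1735689600}] * 3, 1325376000)
assert page["next_ts"] == 1325376000
```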
2025-08-21 23354, 2025
lucifer[m]
yes right
2025-08-21 23351, 2025
Karlifornia
i love that people still use IRC for work purposes/to contact one another to confer about things
2025-08-21 23303, 2025
mayhem[m]
> For the dashboard we had talked about having a max number of passes and returning early with a bool indicator if we didn't reach 25 listens, then we can show a load more button on the front-end. Would that work?
2025-08-21 23311, 2025
mayhem[m]
that is what we did previously and everyone hated it.
2025-08-21 23330, 2025
mayhem[m]
Karlifornia (IRC): most of us are now on matrix, but we have holdouts on IRC.
2025-08-21 23354, 2025
mayhem[m]
monkey: I suggested to lucifer that we insert hints into the DB on a periodic basis. so, if we're fetching dates for one year and we're approaching a large gap, say 2 years, the system should insert a record of sorts that says, next listen is at <this> timestamp.
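A minimal sketch of that hint idea, assuming a hypothetical listen_gap_hint table and a psycopg2-style connection; all names, the schema, and the threshold are illustrative:

```python
# Scan a user's sorted listen timestamps and record large gaps, so future
# range queries can jump straight to the next listen instead of crawling
# empty buckets. Assumes a unique constraint on (user_id, gap_start_ts).
GAP_THRESHOLD = 365 * 24 * 3600  # e.g. only hint gaps larger than one year

def record_gap_hints(conn, user_id, listen_timestamps):
    ts = sorted(listen_timestamps)
    with conn.cursor() as cur:
        for earlier, later in zip(ts, ts[1:]):
            if later - earlier > GAP_THRESHOLD:
                cur.execute(
                    """
                    INSERT INTO listen_gap_hint (user_id, gap_start_ts, next_listen_ts)
                    VALUES (%s, %s, %s)
                    ON CONFLICT (user_id, gap_start_ts)
                    DO UPDATE SET next_listen_ts = EXCLUDED.next_listen_ts
                    """,
                    (user_id, earlier, later),
                )
    conn.commit()
```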
2025-08-21 23341, 2025
lucifer[m]
yes we can do that but it will be hard to keep that up to date because we allow historical imports.
2025-08-21 23302, 2025
lucifer[m]
say a hint says the next listen is in 2022 but then the user submits listens for 2023.
2025-08-21 23329, 2025
monkey[m]
Yes, I believe this is probably how this particular case ended up happening: looks like imported data from 2007-2012, then they started sending new listens in 2025
2025-08-21 23332, 2025
lucifer[m]
unless we invalidate the 2022 hint properly in the database we will miss the 2023 listens.
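A sketch of the invalidation lucifer describes, reusing the hypothetical listen_gap_hint table from above: a listen that lands inside a hinted gap must shrink the hint, or queries that trust it will skip the new data (the 2023 listens in this example):

```python
def on_listen_inserted(conn, user_id, listened_at):
    # Shrink any hint whose gap contains the new listen: the gap now ends
    # at the new listen, so next_listen_ts must move earlier. (The tail of
    # the old gap, new listen -> old next_listen_ts, can be re-hinted later.)
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE listen_gap_hint
               SET next_listen_ts = %s
             WHERE user_id = %s
               AND gap_start_ts < %s
               AND next_listen_ts > %s
            """,
            (listened_at, user_id, listened_at, listened_at),
        )
    conn.commit()
```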
2025-08-21 23305, 2025
mayhem[m]
lucifer[m]: the script that manages these hints needs to handle this case.
2025-08-21 23309, 2025
monkey[m]
mayhem[m]: Wait did we do this already? You mean a while back, or?
2025-08-21 23316, 2025
mayhem[m]
and we should have a way to recompute them all for a user, after an import.
2025-08-21 23328, 2025
lucifer[m]
mayhem: it would need to happen as soon as the listen insert happens imo
2025-08-21 23332, 2025
mayhem[m]
monkey: that was the very first implementation.
2025-08-21 23338, 2025
monkey[m]
I see
2025-08-21 23359, 2025
mayhem[m]
lucifer[m]: if we can insert them without a background process, even better!
2025-08-21 23308, 2025
lucifer[m]
iiuc the hints will be a nullable column in the listens table?
2025-08-21 23342, 2025
mayhem[m]
I think that could work. when we do an insert for a user, we know their min/max timestamps already, yes? if a new insert is more than X days away, insert a hint. at least, that is the easy case.
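The "easy case" as a sketch: at insert time the user's current max timestamp is already known, so a listen arriving far beyond it marks both edges of a new gap. X and all names here are illustrative assumptions:

```python
X_DAYS = 90                      # gap threshold; the real value is a tuning choice
GAP_SECONDS = X_DAYS * 24 * 3600

def maybe_hint_on_insert(conn, user_id, new_ts, known_max_ts):
    # If the new listen lands well past everything we have, the stretch
    # between the old max and the new listen is a known-empty gap.
    if new_ts - known_max_ts > GAP_SECONDS:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO listen_gap_hint (user_id, gap_start_ts, next_listen_ts) "
                "VALUES (%s, %s, %s) ON CONFLICT DO NOTHING",
                (user_id, known_max_ts, new_ts),
            )
        conn.commit()
```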
2025-08-21 23300, 2025
mayhem[m]
lucifer[m]: separate table, I would think.
2025-08-21 23301, 2025
monkey[m]
Could a hybrid solution work? 1. scan a max number of times, return early if we hit the max, 2. despite having returned the response, keep scanning the DB until we hit the next listen 3. store that listen's ts temporarily as a hint for future table scans
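A self-contained toy model of this hybrid, using an in-memory listen store; MAX_PASSES, the bucket size, and the hint cache are all illustrative assumptions:

```python
BUCKET = 30 * 24 * 3600   # per lucifer below, the current scan window is 30 days
MAX_PASSES = 5

hint_cache = {}           # (user_id, before_ts) -> ts of the next older listen

def fetch_page(listens_by_user, user_id, before_ts, want=25):
    """Scan back one bucket at a time, giving up after MAX_PASSES."""
    ts = sorted(listens_by_user.get(user_id, []), reverse=True)
    found, passes, lo = [], 0, before_ts
    while len(found) < want and passes < MAX_PASSES:
        hi, lo = lo, lo - BUCKET
        found.extend(t for t in ts if lo <= t < hi)
        passes += 1
    if len(found) < want:
        # Steps 2-3: keep looking past the response and remember where the
        # next listen actually is, so a later request can jump straight there.
        older = [t for t in ts if t < lo]
        if older:
            hint_cache[(user_id, before_ts)] = max(older)
    return found[:want]
```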
2025-08-21 23319, 2025
mayhem[m]
what if all listens have a previous_ts field?
2025-08-21 23331, 2025
mayhem[m]
and that is always kept up to date?
2025-08-21 23335, 2025
mayhem[m]
could we manage that?
2025-08-21 23356, 2025
mayhem[m]
because this problem becomes easy if you have previous_ts
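A toy illustration of why previous_ts makes this easy: each listen points at the one before it, so paging follows the chain and never scans empty years. The dict-based store and timestamps are purely illustrative:

```python
listens = {
    1735689600: {"track": "c", "previous_ts": 1325376000},  # 2025-01-01
    1325376000: {"track": "b", "previous_ts": 1167609600},  # 2012-01-01
    1167609600: {"track": "a", "previous_ts": None},        # 2007-01-01
}

def walk_back(latest_ts, count):
    """Follow the previous_ts chain instead of scanning time buckets."""
    out, ts = [], latest_ts
    while ts is not None and len(out) < count:
        out.append(ts)
        ts = listens[ts]["previous_ts"]
    return out

assert walk_back(1735689600, 3) == [1735689600, 1325376000, 1167609600]
```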
2025-08-21 23341, 2025
monkey[m]
I hate to think about adding that to the giant SQL queries we have, but if lucifer says it's possible...
2025-08-21 23359, 2025
lucifer[m]
the issue is keeping it up to date. delete and insert need to be updated.
2025-08-21 23311, 2025
lucifer[m]
doable, yes; not sure how fast it will be.
2025-08-21 23312, 2025
d4rkie has quit
2025-08-21 23315, 2025
mayhem[m]
yes, keeping it up to date is the tricky part.
2025-08-21 23333, 2025
lucifer[m]
i would rather do a separate table on the granularity of a month or year.
2025-08-21 23342, 2025
d4rkie joined the channel
2025-08-21 23343, 2025
mayhem[m]
sure.
2025-08-21 23351, 2025
lucifer[m]
that maintains info as to whether a user has listens in that month or year.
2025-08-21 23307, 2025
lucifer[m]
not as optimal but easier to keep up to date
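A sketch of that per-month presence idea: one entry per user and month that has at least one listen. Coarser than exact hints, but a plain insert keeps it correct; all names are illustrative:

```python
from datetime import datetime, timezone

def month_of(ts):
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return d.year * 12 + (d.month - 1)        # simple monotonic month key

def mark_listen(presence, user_id, listened_at):
    # Called from the insert path; idempotent, so imports can re-mark months
    # without special cases. A delete only ever leaves a false positive,
    # which costs one wasted scan, never missed data.
    presence.setdefault(user_id, set()).add(month_of(listened_at))

def months_to_scan(presence, user_id, before_month, limit):
    """Only these months need a real table scan, newest first."""
    return sorted((m for m in presence.get(user_id, ()) if m < before_month),
                  reverse=True)[:limit]
```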
2025-08-21 23310, 2025
mayhem[m]
delete is not too hard. we have the previous_ts, we'd need to scan for next_ts, which is problematic.
2025-08-21 23317, 2025
mayhem[m]
so previous_ts and next_ts?
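With both fields the listens form a doubly linked list, so a delete becomes a local relink instead of a scan; a toy sketch over a dict store like the one above, assuming each node also carries next_ts:

```python
def delete_listen(listens, ts):
    node = listens.pop(ts)
    prev_ts, next_ts = node["previous_ts"], node["next_ts"]
    if prev_ts is not None:
        listens[prev_ts]["next_ts"] = next_ts      # skip over the deleted listen
    if next_ts is not None:
        listens[next_ts]["previous_ts"] = prev_ts  # and link back the other way
```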
2025-08-21 23319, 2025
monkey[m]
Ahhh, and use that as a hint to skip some time buckets?
2025-08-21 23339, 2025
mayhem[m]
lucifer: oh! I like that idea.
2025-08-21 23341, 2025
monkey[m]
(I meant lucifer's proposal)
2025-08-21 23300, 2025
mayhem[m]
how many buckets can we shove into 64 bits?
2025-08-21 23305, 2025
mayhem[m]
how future proof is that?
2025-08-21 23315, 2025
monkey[m]
That's a sentence I never thought I'd read.
2025-08-21 23321, 2025
mayhem[m]
what if we had just that to decode into a map of where we need to query?
2025-08-21 23307, 2025
mayhem[m]
what is our current bucket size, lucifer ?
2025-08-21 23338, 2025
lucifer[m]
30 days
2025-08-21 23337, 2025
monkey[m]
So 5 years of monthly ticks in 64 bits?
2025-08-21 23305, 2025
mayhem[m]
ok, let's make it future proof then.
2025-08-21 23325, 2025
mayhem[m]
we use a binary field. and we keep appending bytes to it as we progress in time.
2025-08-21 23359, 2025
mayhem[m]
each bit represents one month since LAST_FM_START
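A sketch of that growing bitfield: one bit per month since LAST_FM_START, stored as bytes that can be extended forever. A fixed 64 bits would only cover 64 months (about 5.3 years), hence the append-as-we-go design. LAST_FM_START is assumed here to be 2002-01-01; substitute the real epoch:

```python
from datetime import datetime, timezone

LAST_FM_START = datetime(2002, 1, 1, tzinfo=timezone.utc)   # assumed epoch

def month_index(ts):
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return (d.year - LAST_FM_START.year) * 12 + (d.month - LAST_FM_START.month)

def set_bit(bitfield: bytearray, ts):
    """Got a new listen: set the right bit, growing the field if needed."""
    i = month_index(ts)
    byte, bit = divmod(i, 8)
    if byte >= len(bitfield):
        bitfield.extend(b"\x00" * (byte - len(bitfield) + 1))
    bitfield[byte] |= 1 << bit

def has_listens(bitfield, ts):
    i = month_index(ts)
    byte, bit = divmod(i, 8)
    return byte < len(bitfield) and bool(bitfield[byte] & (1 << bit))
```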
2025-08-21 23318, 2025
monkey[m]
We also have the user's earliest TS; we could calculate based on that, but we'd need to update it when an import/deletion changes it
2025-08-21 23305, 2025
mayhem[m]
the large bitfield I was describing is easy to update and easy to fetch/store.
2025-08-21 23311, 2025
lucifer[m]
if we are doing a separate table, then it's just one row per user and we can store ints or datetimes, etc.
2025-08-21 23314, 2025
mayhem[m]
got a new listen, just set the right bit.
2025-08-21 23321, 2025
lucifer[m]
bitfield is fine too sure.
2025-08-21 23345, 2025
mayhem[m]
the bitfield is super dense and we don't need to know much more than these search hints.
2025-08-21 23359, 2025
mayhem[m]
yeah, I think this idea has legs.
2025-08-21 23312, 2025
lusciouslover has quit
2025-08-21 23302, 2025
lusciouslover joined the channel
2025-08-21 23315, 2025
monkey[m]
How does that translate on the front-end and API side? does it mean we transparently continue scanning the DB, skipping the empty buckets, so it's seamless to users?
2025-08-21 23314, 2025
mayhem[m]
it should be transparent to the UI. just ask for data, the backend will know how to fetch it.
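One way the skipping could stay invisible, building on month_index from the bitfield sketch above: the fetch loop consults the hints to pick the next non-empty month, and the endpoint's request/response shape never changes. The helper name is an assumption:

```python
def next_nonempty_window(bitfield, before_ts):
    """Walk months backwards from before_ts, skipping hinted-empty ones."""
    i = month_index(before_ts)        # from the bitfield sketch above
    while i >= 0:
        byte, bit = divmod(i, 8)
        if byte < len(bitfield) and bitfield[byte] & (1 << bit):
            return i                  # only this month needs a real DB scan
        i -= 1
    return None                       # no older listens exist at all
```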
2025-08-21 23355, 2025
monkey[m]
That sounds perfect, definitely better than the other solutions mentioned
2025-08-21 23313, 2025
fettuccinae[m] joined the channel
2025-08-21 23313, 2025
fettuccinae[m]
Would there be any difference if we let timescale scan the db for x number of latest listens of a user instead of changing the scanning window manually for each pass?
2025-08-21 23342, 2025
fettuccinae[m]
* Would there be any difference if we let timescale db scan for x number of latest listens of a user instead of changing the scanning window manually for each pass?
2025-08-21 23357, 2025
mayhem[m]
letting timescale search for x listens is exactly the problem. if there are gaps in the listen history this could take quite some time.
2025-08-21 23313, 2025
fettuccinae[m]
i meant we run the query multiple times before finding x listens; will there be any difference if we run it only once? Also, what if we create an index on (user_id, listened_at)?
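For reference, fettuccinae's index suggestion as SQL (wrapped in Python only to keep the examples in one language); the table and index names are assumptions. A composite index lets one user's listens be read newest-first inside each chunk, though it doesn't by itself stop a sparse history from touching many chunks, which is the gap problem above:

```python
CREATE_INDEX = """
CREATE INDEX IF NOT EXISTS listen_user_ts_idx
    ON listen (user_id, listened_at DESC);
"""
```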