#metabrainz


      • inkasso0815 joined the channel
      • inkasso0815 has quit
      • reosarevok
        ruaok, _lucifer: is this something that is easy to do now? https://tickets.metabrainz.org/browse/MBS-11586
      • BrainzBot
        MBS-11586: Show How Many Times a Recording Has Been Listened to with ListenBrainz on Recording's Page
      • _lucifer
        reosarevok: No, I think it'll be difficult to do. one major issue is that listens are linked to an `msid`, not an `mbid`.
      • secondly, i am not sure, but given the current structure of the data, such queries might be expensive.
      • MRiddickW has quit
      • zas: ruaok: the worker nodes are failing some jobs due to lack of disk space. I see the nodes have a lot of free space but during processing some data is stored at `/tmp`. this path property is configurable. should i point the nodes to store data somewhere else like `/data/tmp` maybe?
      • the persistent data is being stored at `/data/datanode` and `/data/namenode`. let me know if i should modify that as well.
      • ruaok
        _lucifer: yes. exactly that.
      • /data seems fine
      • also, moooin!
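
The log above doesn't name the "path property" _lucifer wants to change; assuming the scratch space in question is governed by Spark's `spark.local.dir` (the cluster runs the stats jobs, and the datanode/namenode paths suggest a Spark-on-HDFS setup), a minimal sketch of pointing it at `/data/tmp` might look like this (the app name is illustrative):

```python
# Minimal sketch: point Spark's scratch space at /data/tmp instead of /tmp.
# Assumes the configurable property is spark.local.dir; the log does not
# name it, so treat this as illustrative only.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().set("spark.local.dir", "/data/tmp")  # shuffle/spill files land here

spark = (
    SparkSession.builder
    .appName("listenbrainz-stats")   # hypothetical app name
    .config(conf=conf)
    .getOrCreate()
)
```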
      • iliekcomputers
        good morning!
      • ruaok
        moin, how are things in dublin?
      • iliekcomputers
        it's been sunny this week!
      • ruaok
        said like a true northerner. :)
      • alastairp
        hello
      • ruaok: interesting to see that the materialised view also has lots of subtables
      • how many rows does listen_count_5day have? hypothesis: if it was just a plain old postgres table with a bunch of indexes, would it be faster?
      • ruaok
        yeah. the thing that flummoxes me is that a search on the materialized view with start and end constraints still does a full index scan on all chunks. which kills performance.
      • > if it was just a plain old postgres table with a bunch of indexes, would it be faster?
      • it should be faster. right now it has 2M rows or so. with a proper index it is faster.
      • my conclusion: cont aggs are too fragile to be a mission-critical element of our infrastructure.
      • but, I have a plan!
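
As a sketch of the plain-table alternative alastairp hypothesizes above: at ~2M rows, an ordinary table with a btree index covering the lookup lets the planner do a single index scan instead of touching every chunk. Table, column, and index names here are illustrative, not the project's actual schema:

```python
# Sketch of the "plain old postgres table with a bunch of indexes" idea.
# Names and the DSN are illustrative; this is not the listenbrainz schema.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://lb@localhost/listenbrainz")  # hypothetical DSN
with engine.begin() as connection:
    connection.execute(text("""
        CREATE TABLE listen_count_plain (
            listened_at_bucket BIGINT NOT NULL,
            user_name          TEXT   NOT NULL,
            listen_count       BIGINT NOT NULL
        )
    """))
    connection.execute(text("""
        CREATE INDEX listen_count_plain_idx
                  ON listen_count_plain (user_name, listened_at_bucket)
    """))
```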
      • alastairp
        yeah, I was surprised to see all of those scans on each chunk, given all the effort we put into minimising the number of chunks to scan back when we started this optimisation process
      • ruaok
        first, we should store listen count, min ts, and max ts in redis -- perhaps even in one key, since they are often needed in the same place. Since we have ONE place where we update the listen table (timescale_writer), we can ensure that the values in redis are kept accurate. And this has been working well from what I can tell. So, let's not expire those keys anymore -- the primary store is redis, but the values can be recomputed when needed by expiring the keys.
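
A minimal sketch of that scheme, assuming redis-py and hypothetical key/field names (one hash per user holding count, min ts, and max ts together):

```python
# One Redis hash per user: listen count + min/max listened_at in a single key.
# Only the single writer (timescale_writer) updates it, so it stays accurate;
# deleting the key forces a recompute from the listen table on the next read.
import redis

r = redis.Redis(decode_responses=True)

def update_listen_metadata(user_name: str, listened_at: int) -> None:
    key = f"listen_meta:{user_name}"       # hypothetical key name
    meta = r.hgetall(key)
    if not meta:
        return                             # missing key: recomputed lazily on read
    pipe = r.pipeline()
    pipe.hincrby(key, "count", 1)
    if listened_at < int(meta["min_ts"]):
        pipe.hset(key, "min_ts", listened_at)
    if listened_at > int(meta["max_ts"]):
        pipe.hset(key, "max_ts", listened_at)
    pipe.execute()
```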
      • _lucifer
        ruaok: 👍, i'll restart the cluster after changing the local dir and we should be able to do more requests.
      • ruaok
        that cuts out a 2.7s query. not hard to accomplish.
      • alastairp
        right, so instead of having to do a scan, be it indexed or table scan, we just do a redis get
      • great
      • _lucifer
        btw, i checked: it did process some stats jobs yesterday.
      • ruaok
        next, I took a look at our usage patterns. and I think we should make sure that fetching recent listens is FAST and older listens reasonably fast. ancient listens can be slower. but most of our queries will be recent shit, so make them fast.
      • I'm finishing debugging an approach that starts with a 30d window and then does an exponential backoff on the query bounds.
      • alastairp
        great, that makes sense too
      • ruaok
        if not found in 30 days, save those results and adjust bounds to be wider and just outside the last bounds.
      • worst case, 3-5 queries that never overlap, doing a full table index scan.
      • my goal is to fetch listens in under 100ms on avg.
      • hopefully better than that.
      • hopefully we can test that in a minute.
      • does any of that sound like a bad approach?
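
A sketch of that widening-window fetch (not the actual listenbrainz-server code; `query_listens` is a hypothetical DB helper): start 30 days back, and on each miss widen the window exponentially, querying only the range not yet covered so the passes never overlap:

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60   # initial window, in seconds
MAX_PASSES = 3                    # worst case: a few non-overlapping queries

def fetch_recent_listens(user_name, limit, min_ts):
    """min_ts is the user's cached earliest listen timestamp, so new users
    never paw through 20 years of empty history."""
    listens = []
    upper = int(time.time())
    window = THIRTY_DAYS
    for _ in range(MAX_PASSES):
        lower = max(upper - window, min_ts)
        # query_listens(...) is a hypothetical helper that runs the actual SQL
        listens += query_listens(user_name, lower, upper, limit - len(listens))
        if len(listens) >= limit or lower <= min_ts:
            break
        upper = lower      # next bounds sit just outside the last ones
        window *= 2        # exponential backoff on the window size
    return listens
```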
      • furthermore, cont aggs don't support month buckets for the data view you and Mr_Monkey wanted.
      • but I created a 30 day bucket, which will never align right but gets the overall job done -- a month or year aligned query on top of that will still fetch the data in a reasonable manner.
      • 30 day bucket cont agg.
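
Under TimescaleDB 1.7's `CREATE VIEW ... WITH (timescaledb.continuous)` syntax (the one quoted later in this discussion), a 30 day bucket continuous aggregate could look roughly like the sketch below. View and column names are illustrative, and 2592000 seconds = 30 days for an integer-time hypertable:

```python
# Rough sketch of a 30 day bucket continuous aggregate, TS 1.7 syntax.
# Names, types, and the DSN are illustrative, not the actual migration.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://lb@localhost/listenbrainz")  # hypothetical DSN
with engine.begin() as connection:
    connection.execute(text("""
        CREATE VIEW listen_count_30day
        WITH (timescaledb.continuous) AS
            SELECT time_bucket(2592000::bigint, listened_at) AS listened_at_bucket,
                   user_name,
                   count(*) AS listen_count
              FROM listen
          GROUP BY time_bucket(2592000::bigint, listened_at), user_name
    """))
```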
      • alastairp
        will the exponential backoff still hit chunk boundaries?
      • ruaok
        no.
      • alastairp
        but the idea would be that the first few chunks will get most of the data that we want anyway, so it should be fast?
      • ruaok
        yes, exactly that.
      • see the paste for how the timestamps change over time. right now I have it tuned to give max 3 queries.
      • alastairp
        yeah right, nice
      • ruaok
        yisss. tests pass.
      • alastairp
        one question: from the pg explain above, do you know what `listened_at_bucket < COALESCE(_timescaledb_internal.cagg_watermark('35'::oid), '-9223372036854775808'::bigint))` is?
      • I assume that this is some internal timescale thing, but it's interesting that the bigint argument is the same on all chunks
      • ruaok
        oh yes, and when we cache timestamps, then the lower bound of the query will be the timestamps of the user, further making it more efficient. (new users will never paw through 20 years of data)
      • I don't know. has confused me too.
      • shivam-kapila
        We used to cache min max timestamps during Influx time iirc
      • ruaok
        we still may. I just want to shore them up and store them in one redis key.
      • I haven't dealt with that yet.
      • awww yiss: `2021-04-16 09:30:38,546 INFO fetch listens 0.03s in 1 passes`
      • shivam-kapila
        That looks goood
      • Damn, that's at least 2x faster for me
      • ruaok
        yes, we clearly cache timestamps. once they are in cache, things are fast.
      • 30ms. nice. I was shooting for under 100ms.
      • yesterday was a loong day, but in the end worth it.
      • shivam-kapila
        Strange enough that the cont. agg. is so slow
      • alastairp
        sweet, nice work
      • ruaok
        wooooo, wooo fucking hooo!
      • ruaok does a little dance
      • man, that was a long slog.
      • _lucifer
        !m ruaok
      • BrainzBot
        You're doing good work, ruaok!
      • ruaok
      • shivam-kapila
        But the performance boost is clearly noticeable.
      • ruaok
        anything that is slow is the ts fetch. once that is in cache, booom!
      • :D
      • BrainzGit
        [listenbrainz-server] mayhem opened pull request #1390 (master…be-gone-time-ranges): Remove time ranges and refactor listen fetching https://github.com/metabrainz/listenbrainz-serv...
      • sumedh joined the channel
      • _lucifer
        alastairp: hi! regarding https://github.com/metabrainz/listenbrainz-serv..., do you mean to reject invalid format tags or just not build and push the image to docker hub?
      • alastairp
        good question
      • can you reject badly formatted tags at a github level?
      • I think it's fine to just match this action on a regex
      • ruaok
        alastairp: could you please glance at https://github.com/metabrainz/listenbrainz-serv... and tell me if we should finish this and merge or if I should do the timestamp improvements before finishing this PR?
      • Mr_Monkey
        Nice work ruaok !
      • ruaok
        I would prefer to finish the timestamp improvements in the same PR.
      • Mr_Monkey: thanks. :)
      • Mr_Monkey: please check the branch to see if you find any issues.
      • Mr_Monkey
        Looking now
      • ruaok
        the PR is already long, but mostly because of deleted code.
      • Mr_Monkey
        We're talking of PR 1390 you posted above, correct?
      • ruaok
        yes
      • which is live on test
      • did you read my comments on the month and year aggregated views?
      • _lucifer
        alastairp: i don't think we can reject such tags directly, we'd need to write some webhook and wire up BrainzBot maybe. probably not worth it. the other solution, matching the regex, is easy. yup, let's do that.
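
For reference, the match being discussed could be as simple as the sketch below, assuming release tags follow a `v-YYYY-MM-DD.N` shape (the exact pattern the action should match is an assumption here):

```python
import re

# Assumed tag shape: v-YYYY-MM-DD.N (e.g. v-2021-04-16.0); adjust as needed.
TAG_RE = re.compile(r"^v-\d{4}-\d{2}-\d{2}\.\d+$")

def should_build_and_push(tag: str) -> bool:
    """Skip the docker hub build for badly formatted tags."""
    return bool(TAG_RE.match(tag))

assert should_build_and_push("v-2021-04-16.0")
assert not should_build_and_push("some-random-tag")
```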
      • alastairp
        ruaok: a query about that PR: `CREATE VIEW listen_count_5day WITH ...`. I'm not super familiar with this part of postgres
      • this is a materialised view (I understand yes, because you can only create indexes on materialised views?), in that case, what's the difference between CREATE VIEW and CREATE MATERIALIZED VIEW?
      • ruaok
        that part will likely still change. I've not settled on what to do yet.
      • but I may just replace the listen_count with listen_count_30days and listen_count_365days.
      • outsidecontext
      • zas
      • ruaok
        alastairp: that's how things are done in TS 1.7. In 2.x+ you create a materialized view. I suspect that an upgrade will help us on a few fronts, but it still won't make the approach from yesterday viable.
      • outsidecontext
        zas: oh, great. so we could already change the links for the picard download page?
      • alastairp
        ruaok: ok, thanks for clarifying
      • ruaok
        !m zas
      • BrainzBot
        You're doing good work, zas!
      • zas
        outsidecontext: I guess so, I have a few bits to tune, but it is here to stay
      • ruaok
        alastairp: so thoughts on continue vs new PR for the timestamps work?
      • alastairp
        ruaok: I'm reading through the PR now
      • outsidecontext
        cool. shall I submit a PR for the website?
      • zas
        for now, it supports http & https, should I just redirect http to https?
      • ruaok
        zas: I don't think so.
      • zas
        k
      • alastairp
        ruaok: can you clarify - this PR includes the changes that we discussed on wednesday: adds the 5 day aggregate, removes the need for making multiple queries on the frontend. query times are on par with what they were before this change. your new proposed improvements include caching of the min/max and the increasing range query, which gets us down to your example 30ms query?
      • ruaok
        correct. however, the "adds the 5 day aggregate" will likely still change a bit.
      • it is no longer crucial to have the cont agg in lock step with the hypertable.
      • I am leaning toward replacing the listen_count with listen_count_30day, rather than creating a new listen_count_5day.
      • it should make fetching timestamps a bit faster.
      • so the 30day cont agg and the timestamp work are the only bits left.
      • alastairp
        ah, I see. this is why it also makes sense to merge your timestamp work into this same PR
      • ruaok
        yes.
      • and then we have one larger PR, but this whole thicket of issues gets solved.
      • and the basis for 30day listen overview is in place for future work.
      • alastairp
        this PR is already quite large, though I see a lot of it is just changing arguments in tests.
      • the timestamp stuff is going to be pretty self-contained in a single file, right?
      • ruaok
        exactly why I am asking before proceeding.
      • 2, but yeah.
      • yeah
      • I will create a new agg in this code, but we won't need an SQL update script yet. I've done it by hand.
      • alastairp
        go ahead with putting both in this PR then, as the timestamp stuff isn't going to directly affect the other 21 files
      • ruaok
        once deployed I will open a small PR for removing the old cont aggs.
      • alastairp
        I've got a meeting now, but after lunch I'll be available for review
      • ruaok
        kewl. I'll finish that this afternoon with some luck.
      • I suspect that PR won't be ready until monday, so no rush.
      • alastairp
        ok cool
      • ruaok
        I worked from 10am until 1:30 in the morning with few breaks, so I'd like a bike ride this afternoon. :)
      • alastairp
        can I pre-review this? I see a few suggestions already, or would you prefer to wait until you finish?
      • theoretically comarca restrictions get lifted on Monday!
      • ruaok
        don't go digging into too much detail and skip comments on the SQL scripts, but yes feedback would be welcome.
      • ohh, really?? that would be fantastic.
      • ruaok wants to go to olot.
      • alastairp
        to "Veguerias", which are circa 14th century political divisions, lol
      • so our one includes Barceloneces, Maresme, and the 2 Valles
      • "the feudal administrative territorial jurisdiction of the Principality of Catalonia (to the Crown of Aragon) during the Middle Ages and into the Modern Era until the Nueva Planta decrees of 1716."
      • ruaok
        crap.
      • oh well, a trip up to Arenys would also be nice.
      • Sophist-UK joined the channel