in #metabrainz

20:26 PM
prabal joined the channel
20:29 PM
yvanzo

reosarevok: agreed, seems to describe the same issue(s), maybe with variations.
20:29 PM
reosarevok: yes it is totally spurious (the PR just changes MD files)
20:31 PM
iliekcomputers

shivam-kapila: sounds good!
20:31 PM
yvanzo

reosarevok: I don’t think we should go for a more recent version, maintenance is planned from Apr 2020 to Apr 2021.
20:32 PM
that would be an artificial requirement and discourage people (if it doesn't already) from installing it on a system that comes with stock v10.
20:33 PM
reosarevok

Is there no actual improvement that would be useful to us in the newer versions?
20:36 PM
yvanzo

no idea, even if it would improve perfs (for example), it would still not be needed
20:39 PM
iliekcomputers

ruaok: hey
20:39 PM
ruaok

hey
20:39 PM
iliekcomputers

So, to remove the serialization/deserialization stuff
20:40 PM
ruaok

in the ingestion process, you mean?
20:40 PM
iliekcomputers

I wonder if it's worth it to investigate using protobuf instead of json
20:40 PM
Yes
20:40 PM
Or in general
20:40 PM
ruaok

quite possibly.
20:40 PM
rdswift joined the channel
20:40 PM
but, let me throw a spanner into the works.
20:41 PM
I've been reading up on influxdb and what exactly cardinality entails in that world.
20:41 PM
right now our series cardinality is 980441 (estimated)
20:42 PM
rdswift has quit
20:42 PM
from what I understand that is the product of measurements X fields (in our case).
20:42 PM
and series cardinality of <1M is considered medium use.
20:43 PM
>1M is considered high level of use.
20:43 PM
rdswift joined the channel
20:43 PM
>10M is not practical.
20:43 PM
iliekcomputers

We're already there
20:43 PM
👏🏽
20:43 PM
ruaok

so, if we hit 80k users, we're into the "impractical" zone.
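(A rough sketch of the arithmetic behind the 80k figure, not something posted in the channel. It assumes series cardinality grows linearly with the number of users, and it derives the per-user figure from the 10M threshold and the 80k users quoted above; the function names are hypothetical.)

```python
# Figures quoted in the chat.
MEDIUM_LIMIT = 1_000_000         # below this: "medium use"
IMPRACTICAL_LIMIT = 10_000_000   # beyond this: "not practical"
CURRENT_CARDINALITY = 980_441    # estimated series cardinality
USERS_AT_IMPRACTICAL = 80_000    # user count said to hit the impractical zone

# ASSUMPTION: cardinality scales linearly with users, so the per-user
# figure is implied by the two numbers above rather than measured.
SERIES_PER_USER = IMPRACTICAL_LIMIT / USERS_AT_IMPRACTICAL


def projected_cardinality(users):
    """Estimate series cardinality for a given number of users."""
    return users * SERIES_PER_USER


def usage_tier(cardinality):
    """Map a cardinality onto the tiers mentioned in the chat."""
    if cardinality >= IMPRACTICAL_LIMIT:
        return "not practical"
    if cardinality >= MEDIUM_LIMIT:
        return "high"
    return "medium"


if __name__ == "__main__":
    print(SERIES_PER_USER)
    print(usage_tier(CURRENT_CARDINALITY))
    print(usage_tier(projected_cardinality(USERS_AT_IMPRACTICAL)))
```

Under that linear assumption, the implied figure is about 125 series per user, which is why dropping most fields in favour of a single JSON string would cut the cardinality so sharply.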
20:43 PM
yeah.
20:44 PM
so, that opens a giant can of worms.
20:44 PM
we could, thinking out loud, stay with influx DB.
20:44 PM
if we get rid of most fields and store a JSON string, then our cardinality would be greatly reduced.
20:44 PM
and we'd be good for a while, I think.
20:45 PM
and that is fine -- we never really query on anything but listened at and user.
20:45 PM
the alternative is to check out timescale.
20:45 PM
iliekcomputers

That sounds reasonable to me. It's an archive anyways
20:45 PM
ruaok

https://www.timescale.com/
20:46 PM
timescale is an addition to postgres.
20:46 PM
an extension, to be exact. it brings hypertables to postgres with 100% of the postgres functionality.
20:46 PM
we connect to it like postgres and it has the schema of postgres, but supposedly performance is that of influx.
20:46 PM
iliekcomputers

I'm not sure I see the justification for completely changing the database yet.
20:47 PM
ruaok

well, it looks like we have to re-write the whole DB.
20:47 PM
switching DBs might be a small extra cost... aside from rewriting a pile of SQL.
20:47 PM
but into a saner syntax we know and love.
20:47 PM
BUT, the big thing?
20:47 PM
we can modify old rows.
20:48 PM
delete a user? trivial.
20:48 PM
delete individual listens? cake.
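(A minimal sketch of why deletes become trivial once listens live in ordinary SQL rows; this was not posted in the channel. It uses Python's built-in sqlite3 purely as a runnable stand-in: against Timescale the same DELETE statements would be sent to postgres instead. The table name, column names and helper functions are hypothetical, chosen to match the "user, listened at, JSON string" layout discussed above.)

```python
import json
import sqlite3

# Stand-in database; in the scenario discussed this would be a postgres
# connection with the listen table turned into a Timescale hypertable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listen ("
    " user_name TEXT NOT NULL,"
    " listened_at INTEGER NOT NULL,"
    " data TEXT NOT NULL)"  # everything else kept as a JSON string
)
conn.executemany(
    "INSERT INTO listen VALUES (?, ?, ?)",
    [
        ("alice", 1581000000, json.dumps({"track": "A"})),
        ("alice", 1581000300, json.dumps({"track": "B"})),
        ("bob", 1581000600, json.dumps({"track": "C"})),
    ],
)


def delete_listen(conn, user_name, listened_at):
    """Delete one listen; returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM listen WHERE user_name = ? AND listened_at = ?",
        (user_name, listened_at),
    )
    return cur.rowcount


def delete_user(conn, user_name):
    """Delete every listen of a user; returns the number of rows removed."""
    cur = conn.execute("DELETE FROM listen WHERE user_name = ?", (user_name,))
    return cur.rowcount
```

Calling `delete_listen(conn, "alice", 1581000300)` removes a single row, and `delete_user(conn, "alice")` then removes whatever is left for that user.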
20:48 PM
iliekcomputers

That sounds very nice
20:48 PM
ruaok

too good to be true, honestly.
20:48 PM
iliekcomputers

How old is timescale
20:48 PM
ruaok

so my plan is to do the migration to 1.7.9 tomorrow and see how that performs.
20:49 PM
should buy us some time
20:49 PM
then I will try a trial migration to timescale and see how it goes.
20:49 PM
iliekcomputers

ruaok: I think an exploration on how to make our shit more resource efficient might be a great gsoc project.
20:49 PM
ruaok

I could easily connect a timescale_write to the rabbitmq and also write to timescale.
20:49 PM
in parallel.
20:49 PM
good point.
20:50 PM
I don't think a test to kick the tires should take more than a day to knock out.
20:51 PM
and if we have a full DB, running in parallel we can migrate bits of code on beta and roll it out later.
20:51 PM
iliekcomputers

Mhmm, that's the right way to do database migrations
20:51 PM
(so I learned after the AB migration)
20:52 PM
So the gsoc project
20:52 PM
ruaok

not sure how old it is, but it seems to have enough cred and it is postgres. I get the same postgres ethos vibe from their stuff.
20:52 PM
iliekcomputers

Could contain the following:
20:52 PM
1. Rewrite influx writer in go/rust
20:52 PM
2. Use protobuf instead of json
20:53 PM
3. Do the stuff we need for migration
20:53 PM
ruaok: what do you think is the urgency of the migration?
20:53 PM
ruaok

if we do the schema creation and the initial DB mirror, (which are the tough bits) then this totally makes sense.
20:54 PM
I think a gsoc project would be a perfect fit. I think the migration to 1.7.9 will buy us that time.
20:54 PM
if not, we can re-write the DB, if we have to.
20:54 PM
iliekcomputers

(famous last words) (kidding)
20:55 PM
ruaok

true that.
20:55 PM
there are many little things about influx that bug me.
20:55 PM
iliekcomputers

Influx has been a PITA since the beginning tbh
20:55 PM
ruaok

we would be getting rid of a lot of code designed to deal with influx and its idiosyncrasies.
20:56 PM
API wise yes. stability and speed, until as of late, was good.
20:56 PM
iliekcomputers still remembers the shitty escape logic I wrote
20:56 PM
DING
20:56 PM
that.
20:56 PM
and we KNOW how to postgres.
20:56 PM
however, this puts us at odds with an already suggested GSoC project. or two.
20:57 PM
the "add ability to delete listens" one, and it also impacts like/dislike.
20:57 PM
naw, not really, actually.
20:57 PM
if the prereq is that we have the parallel DB in place before coding starts, the new features can code against TS, not influx.
20:58 PM
iliekcomputers

Does delete listens even remain a project?
20:58 PM
ruaok

it could -- we should expand the scope a bit.
20:58 PM
iliekcomputers

I guess there's the UI to do still
20:58 PM
ruaok

like automated user rename/delete when a user gets deleted. and UI components for delete listens.
20:59 PM
iliekcomputers

And if you add the idea of deleting listens in the incremental dumps too
20:59 PM
ruaok

I think that just broke my head.
20:59 PM
ruaok puts it back together.
21:00 PM
yes, agreed
21:00 PM
iliekcomputers

If we have a deadline on this migration however, either you or pristine__ will have to take point on it.
21:00 PM
ruaok

me
21:00 PM
it's a weekend project. so 3 months should be a good fit.
21:01 PM
iliekcomputers

Heh
21:01 PM
ruaok

not really weekend project.
21:01 PM
I mean you and I could hammer it out during our hack day.
21:01 PM
I'm positive.
21:01 PM
and not just a partial migration, but migrating *everything*
21:02 PM
it is mostly porting queries. and our queries are pretty simple and the postgres query format is simpler.
21:02 PM
most of it would be massaging code around.
21:02 PM
iliekcomputers

Hmm.
21:02 PM
Yeah, I think that sounds reasonable if we aim for a first cut
21:03 PM
ruaok

yeah.
21:03 PM
I still think a quick and dirty proof of concept is in order.
21:03 PM
iliekcomputers

Yeah, agreed.
21:03 PM
ruaok

ingest the LB data dump into TS -- how long does that take?
21:03 PM
iliekcomputers

Considering that weekend is like a month away anyways
21:04 PM
ruaok

we could even run it against influx -- we have the code for ingesting into influx.
21:04 PM
side by side comparison.
21:04 PM
also, did you see what I posted earlier?
21:04 PM
I read in the influx docs that appending near current time timestamps is fast.
21:04 PM
but adding older timestamps incurs a significant overhead in influx.
21:05 PM
iliekcomputers

Yeah, that's a pain.
21:05 PM
So
21:05 PM
ruaok

and I noticed that the spikes in our queue backing up are.. people importing data.
21:05 PM
like a last.fm import.
21:05 PM
if those are problematic now, we're... screwed.
21:06 PM
iliekcomputers

How long have we had the queue backed up
21:06 PM
Did we have more people importing in the last few days?
21:07 PM
ruaok

first spike appeared 3-feb
21:07 PM
yes, I noticed 4 people ended up causing the big backlog
21:08 PM
this parallel DB is a really good idea. we can compare real world cases.
21:08 PM
iliekcomputers

Yeah
21:08 PM
bukwurm has quit
21:09 PM
I don't want to decide that we're migrating databases too quickly.
21:09 PM
ruaok

influx has made me nervous for a while now.
21:09 PM
iliekcomputers

I'd like it if we had actual numbers
21:09 PM
ruaok

agreed.
21:11 PM
iliekcomputers

Because in the end, we are time constrained and even if it's pretty easy to implement it'll still take effort that could be spent developing features.
21:13 PM
I do think that the json low hanging fruit is something that should go into a gsoc project.
21:14 PM
ruaok

let me add it to the trello.
21:14 PM
yvanzo

reosarevok: selenium failure on 1379 is spurious too
21:14 PM
ruaok

https://trello.com/b/lFuXCRkz/metabrainz-roadmap
21:14 PM
iliekcomputers: ^^
21:15 PM
yvanzo

trello resurrected?
21:15 PM
ruaok

I'm using trello for the new MeB roadmap and for LB planning.
21:16 PM
yvanzo: do you have a roadmap of features needed for the VM?
21:16 PM
it could use adding to that trello.
21:17 PM
amCap1712

CatQuest: hi
21:17 PM
https://usercontent.irccloud-cdn.com/file/jUV0w...
21:18 PM
yvanzo

cool, there are some needed features, I created tickets for some already
21:18 PM
ruaok

great -- I've been adding lists called "related tickets" and adding links to jira for tasks.
21:18 PM
iliekcomputers

I have the road map on my list of things to look at.
21:19 PM
amCap1712

CatQuest: the dll is ready to be tested on windows. ping me when you're available
21:19 PM
ruaok

yvanzo: do you have a trello account? you're not part of the team.