-
prabal joined the channel
-
yvanzo
reosarevok: agreed, seems to describe the same issue(s), maybe with variations.
-
reosarevok: yes it is totally spurious (the PR just changes MD files)
-
iliekcomputers
shivam-kapila: sounds good!
-
yvanzo
reosarevok: I don't think we should go for a more recent version, maintenance is planned from Apr 2020 to Apr 2021.
-
that would be an artificial requirement and would discourage people (if it doesn't already) from installing it on a system that comes with stock v10.
-
reosarevok
Is there no actual improvement that would be useful to us in the newer versions?
-
yvanzo
no idea; even if it improved performance (for example), it still wouldn't be needed
-
iliekcomputers
ruaok: hey
-
ruaok
hey
-
iliekcomputers
So, to remove the serialization/deserialization stuff
-
ruaok
in the ingestion process, you mean?
-
iliekcomputers
I wonder if it's worth it to investigate using protobuf instead of json
-
Yes
-
Or in general
-
ruaok
quite possibly.
-
rdswift joined the channel
-
but, let me throw a spanner into the works.
-
I've been reading up on influxdb and what exactly cardinality entails in that world.
-
right now our series cardinality is 980441 (estimated)
-
rdswift has quit
-
from what I understand that is the product of measurements X fields (in our case).
-
and series cardinality of <1M is considered medium use.
-
>1M is considered high level of use.
-
rdswift joined the channel
-
>10M is not practical.
-
iliekcomputers
We're already there
-
ruaok
so, if we hit 80k users, we're into the "impractical" zone.
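The 80k-user figure follows from a simple linear model. A minimal sketch, assuming (as stated above) that series cardinality is measurements × fields with one measurement per user; the per-user factor of 125 is back-calculated from "10M at 80k users" and is an assumption, not a measured number:

```python
# Rough InfluxDB series-cardinality model, following the reading above:
# each user is a measurement, so series ~= users * series per user.
# SERIES_PER_USER is back-calculated from "10M at 80k users" (an assumption).
SERIES_PER_USER = 10_000_000 // 80_000  # = 125, implied, not measured

# Thresholds quoted in the discussion.
MEDIUM = 1_000_000
IMPRACTICAL = 10_000_000


def estimated_series(users: int, series_per_user: int = SERIES_PER_USER) -> int:
    """Estimated series cardinality for a given number of users."""
    return users * series_per_user


def users_at_limit(limit: int, series_per_user: int = SERIES_PER_USER) -> int:
    """Number of users at which the estimated cardinality reaches `limit`."""
    return limit // series_per_user


if __name__ == "__main__":
    print(users_at_limit(MEDIUM))       # 8000
    print(users_at_limit(IMPRACTICAL))  # 80000
```

Under this model the only lever besides user count is the per-user factor, which is why cutting the number of fields comes up next.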
-
yeah.
-
so, that opens a giant can of worms.
-
we could, thinking out loud, stay with influx DB.
-
if we get rid of most fields and store a JSON string, then our cardinality would be greatly reduced.
-
and we'd be good for a while, I think.
-
and that is fine -- we never really query on anything but listened_at and user.
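A minimal sketch of the "store a JSON string" idea, assuming each user stays a measurement and every other listen attribute is folded into a single string field, so each user contributes one series instead of one per field. The point layout is shaped roughly like the dicts the influxdb Python client accepts; the field name `data` and the sample metadata keys are hypothetical.

```python
import json
from typing import Any, Dict


def pack_listen(user_name: str, listened_at: int, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Build a point whose only field is a JSON string (one series per user)."""
    return {
        "measurement": user_name,  # one measurement per user, as in the discussion
        "time": listened_at,       # the other attribute that is actually queried
        # Hypothetical field name; everything else lives inside this string.
        "fields": {"data": json.dumps(metadata, sort_keys=True)},
    }


def unpack_listen(point: Dict[str, Any]) -> Dict[str, Any]:
    """Recover the original metadata from a packed point."""
    return json.loads(point["fields"]["data"])


point = pack_listen("alice", 1580000000, {"artist_name": "Björk", "track_name": "Jóga"})
```

The trade-off is that nothing inside the JSON string can be filtered on in InfluxQL any more, which is acceptable only because queries use listened_at and user alone.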
-
the alternative is to check out timescale.
-
iliekcomputers
That sounds reasonable to me. It's an archive anyways
-
ruaok
-
timescale is an addition to postgres.
-
an extension, to be exact. it brings hypertables to postgres with 100% of the postgres functionality.
-
we connect to it like postgres and it has the schema of postgres, but supposedly the performance is that of influx.
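To illustrate what "hypertables with 100% of the postgres functionality" means in practice: a hypertable starts as an ordinary table and is then converted with TimescaleDB's `create_hypertable()` function (the classic two-argument form is shown). The table and column names below are hypothetical, and the statements are only built as strings here, not executed against a database.

```python
# Hypothetical listen table; names and types are illustrative, not the LB schema.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE listen (
    listened_at TIMESTAMPTZ NOT NULL,
    user_name   TEXT        NOT NULL,
    data        JSONB       NOT NULL
);
"""


def hypertable_sql(table: str, time_column: str) -> str:
    """Return the statement converting `table` into a hypertable on `time_column`.

    Identifiers are interpolated directly, so only pass trusted, fixed names.
    """
    return f"SELECT create_hypertable('{table}', '{time_column}');"


# Because it is still Postgres, reads are plain SQL with a bound parameter.
RECENT_LISTENS_SQL = (
    "SELECT listened_at, data FROM listen "
    "WHERE user_name = %s ORDER BY listened_at DESC LIMIT 25"
)

if __name__ == "__main__":
    print(SCHEMA_SQL)
    print(hypertable_sql("listen", "listened_at"))
```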
-
iliekcomputers
I'm not sure I see the justification for completely changing the database yet.
-
ruaok
well, it looks like we have to re-write the whole DB.
-
switching DBs might be a small extra cost... aside from rewriting a pile of SQL.
-
but into a saner syntax we know and love.
-
BUT, the big thing?
-
we can modify old rows.
-
delete a user? trivial.
-
delete individual listens? cake.
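With a row-oriented table, both deletions are single `DELETE` statements. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for Postgres/Timescale so it runs anywhere; the schema, and identifying a listen by (user, timestamp), are assumptions for illustration.

```python
import sqlite3

# sqlite3 stands in for Postgres/Timescale here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listen ("
    " user_name TEXT NOT NULL,"
    " listened_at INTEGER NOT NULL,"
    " data TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO listen VALUES (?, ?, ?)",
    [
        ("alice", 1580000000, "{}"),
        ("alice", 1580000300, "{}"),
        ("bob", 1580000100, "{}"),
    ],
)


def delete_listen(conn: sqlite3.Connection, user_name: str, listened_at: int) -> None:
    """Delete a single listen, identified (by assumption) by user and timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM listen WHERE user_name = ? AND listened_at = ?",
            (user_name, listened_at),
        )


def delete_user(conn: sqlite3.Connection, user_name: str) -> None:
    """Delete every listen belonging to a user."""
    with conn:
        conn.execute("DELETE FROM listen WHERE user_name = ?", (user_name,))


delete_listen(conn, "alice", 1580000000)
delete_user(conn, "bob")
print(conn.execute("SELECT user_name, listened_at FROM listen").fetchall())
# [('alice', 1580000300)]
```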
-
iliekcomputers
That sounds very nice
-
ruaok
too good to be true, honestly.
-
iliekcomputers
How old is timescale?
-
ruaok
so my plan is to do the migration to 1.7.9 tomorrow and see how that performs.
-
should buy us some time
-
then I will try a trial migration to timescale and see how it goes.
-
iliekcomputers
ruaok: I think an exploration on how to make our shit more resource efficient might be a great gsoc project.
-
ruaok
I could easily connect a timescale_write to the rabbitmq and also write to timescale.
-
in parallel.
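One way to read "connect a timescale_write to the rabbitmq": bind a second queue to the same fanout exchange, so each writer receives its own copy of every batch. The sketch below simulates that in memory rather than using a real RabbitMQ client; the queue names and the JSON message format are hypothetical.

```python
import json
from collections import deque
from typing import Callable, Dict, List


class FanoutExchange:
    """In-memory stand-in for a RabbitMQ fanout exchange: every bound queue
    receives its own copy of each published message."""

    def __init__(self) -> None:
        self.queues: Dict[str, deque] = {}

    def bind(self, queue_name: str) -> deque:
        """Declare (or reuse) a queue and bind it to this exchange."""
        return self.queues.setdefault(queue_name, deque())

    def publish(self, body: bytes) -> None:
        """Append a copy of `body` to every bound queue."""
        for queue in self.queues.values():
            queue.append(body)


def consume(queue: deque, write: Callable[[List[dict]], None]) -> None:
    """Drain `queue`, decoding each JSON batch and handing it to `write`."""
    while queue:
        write(json.loads(queue.popleft()))


exchange = FanoutExchange()
influx_queue = exchange.bind("influx_write")        # hypothetical queue names
timescale_queue = exchange.bind("timescale_write")

influx_rows: List[dict] = []
timescale_rows: List[dict] = []

batch = [{"user_name": "alice", "listened_at": 1580000000}]
exchange.publish(json.dumps(batch).encode("utf-8"))

consume(influx_queue, influx_rows.extend)
consume(timescale_queue, timescale_rows.extend)
```

Separate queues mean a slow or failing Timescale writer only backs up its own queue, leaving the existing Influx writes unaffected while the two are compared.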
-
good point.
-
I don't think a test to kick the tires should take more than a day to knock out.
-
and if we have a full DB, running in parallel we can migrate bits of code on beta and roll it out later.
-
iliekcomputers
Mhmm, that's the right way to do database migrations
-
(so I learned after the AB migration)
-
So the gsoc project
-
ruaok
not sure how old it is, but it seems to have enough cred and it is postgres. I get the same postgres ethos vibe from their stuff.
-
iliekcomputers
Could contain the following:
-
1. Rewrite influx writer in go/rust
-
2. Use protobuf instead of json
-
3. Do the stuff we need for migration
-
ruaok: what do you think is the urgency of the migration?
-
ruaok
if we do the schema creation and the initial DB mirror, (which are the tough bits) then this totally makes sense.
-
I think a gsoc project would be a perfect fit. I think the migration to 1.7.9 will buy us that time.
-
if not, we can re-write the DB, if we have to.
-
iliekcomputers
(famous last words) (kidding)
-
ruaok
true that.
-
there are many little things about influx that bug me.
-
iliekcomputers
Influx has been a PITA since the beginning tbh
-
ruaok
we would be getting rid of a lot of code designed to deal with influx and its idiosyncrasies.
-
API-wise yes. stability and speed, until recently, were good.
-
iliekcomputers still remembers the shitty escape logic I wrote
-
DING
-
that.
-
and we KNOW how to postgres.
-
however, this puts us at odds with an already suggested GSoC project. or two.
-
the one to add the ability to delete listens, and it also impacts like/dislike.
-
naw, not really, actually.
-
if the prereq is that we have the parallel DB in place before coding starts, the new features can code against TS, not influx.
-
iliekcomputers
Does delete listens even remain a project?
-
ruaok
it could -- we should expand the scope a bit.
-
iliekcomputers
I guess there's the UI to do still
-
ruaok
like automated user rename/delete when a user gets deleted, and UI components for deleting listens.
-
iliekcomputers
And if you add the idea of deleting listens from the incremental dumps too
-
ruaok
I think that just broke my head.
-
ruaok puts it back together.
-
yes, agreed
-
iliekcomputers
If we have a deadline on this migration however, either you or pristine__ will have to take point on it.
-
ruaok
me
-
it's a weekend project. so 3 months should be a good fit.
-
iliekcomputers
Heh
-
ruaok
not really weekend project.
-
I mean you and I could hammer it out during our hack day.
-
I'm positive.
-
and not just a partial migration, but migrating *everything*
-
it is mostly porting queries. and our queries are pretty simple and the postgres query format is simpler.
-
most of it would be massaging code around.
-
iliekcomputers
Hmm.
-
Yeah, I think that sounds reasonable if we aim for a first cut
-
ruaok
yeah.
-
I still think a quick and dirty proof of concept is in order.
-
iliekcomputers
Yeah, agreed.
-
ruaok
ingest the LB data dump into TS -- how long does that take?
-
iliekcomputers
Considering that the weekend is like a month away anyways
-
ruaok
we could even run it against influx -- we have the code for ingesting into influx.
-
side by side comparison.
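A side-by-side comparison only needs the same dump fed to each backend under the same timer. A minimal harness, assuming each backend is wrapped in a hypothetical `ingest(listens) -> rows written` function (the real ingestion code is not shown here):

```python
import time
from typing import Callable, Dict, Sequence, Tuple

IngestFn = Callable[[Sequence[dict]], int]


def time_ingest(ingest: IngestFn, listens: Sequence[dict]) -> Tuple[int, float]:
    """Run one ingestion and return (rows written, seconds elapsed)."""
    start = time.perf_counter()
    written = ingest(listens)
    return written, time.perf_counter() - start


def compare_backends(backends: Dict[str, IngestFn],
                     listens: Sequence[dict]) -> Dict[str, Tuple[int, float]]:
    """Ingest the same listens into every backend and collect the timings."""
    return {name: time_ingest(fn, listens) for name, fn in backends.items()}


if __name__ == "__main__":
    sample = [{"user_name": "alice", "listened_at": 1580000000 + i} for i in range(1000)]
    # Placeholder backends; real ones would write to influx and to timescale.
    results = compare_backends({"influx": len, "timescale": len}, sample)
    for name, (written, seconds) in results.items():
        print(f"{name}: {written} rows in {seconds:.4f}s")
```

Checking the row count alongside the time guards against a "fast" backend that silently dropped data.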
-
also, did you see what I posted earlier?
-
I read in the influx docs that appending near current time timestamps is fast.
-
but adding older timestamps incurs a significant overhead in influx.
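If the slow path is writes far from the current time, a batch from a last.fm-style import should look very different from live scrobbles. A minimal check, assuming listens arrive as Unix timestamps; the seven-day "recent" window is a hypothetical cutoff, since the discussion does not say where InfluxDB's fast path ends:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

RECENT_WINDOW = timedelta(days=7)  # hypothetical cutoff, not an InfluxDB constant


def backfill_ratio(listened_at: Iterable[int],
                   now: Optional[datetime] = None,
                   window: timedelta = RECENT_WINDOW) -> float:
    """Fraction of timestamps older than `window` before `now` (0.0 for an empty batch)."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - window).timestamp()
    stamps = list(listened_at)
    if not stamps:
        return 0.0
    old = sum(1 for ts in stamps if ts < cutoff)
    return old / len(stamps)
```

A batch where most timestamps fall outside the window is most likely an import, which would be a cheap signal to log when the queue starts backing up.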
-
iliekcomputers
Yeah, that's a pain.
-
So
-
ruaok
and I noticed that the spikes in our queue backing up are.. people importing data.
-
like a last.fm import.
-
if those are problematic now, we're... screwed.
-
iliekcomputers
How long have we had the queue backed up?
-
Did we have more people importing in the last few days?
-
ruaok
first spike appeared 3-feb
-
yes, I noticed 4 people ended up causing the big backlog
-
this parallel DB is a really good idea. we can compare real world cases.
-
iliekcomputers
Yeah
-
bukwurm has quit
-
I don't want to decide that we're migrating databases too quickly.
-
ruaok
influx has made me nervous for a while now.
-
iliekcomputers
I'd like it if we had actual numbers
-
ruaok
agreed.
-
iliekcomputers
Because in the end, we are time constrained and even if it's pretty easy to implement it'll still take effort that could be spent developing features.
-
I do think that the json low hanging fruit is something that should go into a gsoc project.
-
ruaok
let me add it to the trello.
-
yvanzo
reosarevok: selenium failure on 1379 is spurious too
-
ruaok
-
iliekcomputers: ^^
-
yvanzo
trello resurrected?
-
ruaok
I'm using trello for the new MeB roadmap and for LB planning.
-
yvanzo: do you have a roadmap of features needed for the VM?
-
it could use adding to that trello.
-
amCap1712
CatQuest: hi
-
yvanzo
cool, there are some needed features, I created tickets for some already
-
ruaok
great -- I've been adding lists called "related tickets" and adding links to jira for tasks.
-
iliekcomputers
I have the road map on my list of things to look at.
-
amCap1712
CatQuest: the dll is ready to be tested on windows. ping me when you're available
-
ruaok
yvanzo: do you have a trello account? you're not part of the team.