-
prabal joined the channel
2020-02-19 05017, 2020
-
yvanzo
reosarevok: agreed, seems to describe the same issue(s), maybe with variations.
2020-02-19 05058, 2020
-
yvanzo
reosarevok: yes it is totally spurious (the PR just changes MD files)
2020-02-19 05007, 2020
-
iliekcomputers
shivam-kapila: sounds good!
2020-02-19 05007, 2020
-
yvanzo
reosarevok: I don’t think we should go for a more recent version, maintenance is planned from Apr 2020 to Apr 2021.
2020-02-19 05035, 2020
-
yvanzo
that would be an artificial requirement and would discourage people (if it does not already) from installing it on a system that comes with stock v10.
2020-02-19 05038, 2020
-
reosarevok
Is there no actual improvement that would be useful to us in the newer versions?
2020-02-19 05009, 2020
-
yvanzo
no idea, even if it would improve perfs (for example), it would still not be needed
2020-02-19 05028, 2020
-
iliekcomputers
ruaok: hey
2020-02-19 05039, 2020
-
ruaok
hey
2020-02-19 05045, 2020
-
iliekcomputers
So, to remove the serialization/deserialization stuff
2020-02-19 05009, 2020
-
ruaok
in the ingestion process, you mean?
2020-02-19 05010, 2020
-
iliekcomputers
I wonder if it's worth it to investigate using protobuf instead of json
2020-02-19 05014, 2020
-
iliekcomputers
Yes
2020-02-19 05020, 2020
-
iliekcomputers
Or in general
2020-02-19 05038, 2020
-
ruaok
quite possibly.
2020-02-19 05042, 2020
-
rdswift joined the channel
2020-02-19 05049, 2020
-
ruaok
but, let me throw a spanner into the works.
2020-02-19 05014, 2020
-
ruaok
I've been reading up on influxdb and what exactly cardinality entails in that world.
2020-02-19 05042, 2020
-
ruaok
right now our series cardinality is 980441 (estimated)
2020-02-19 05016, 2020
-
rdswift has quit
2020-02-19 05019, 2020
-
ruaok
from what I understand that is the product of measurements X fields (in our case).
2020-02-19 05056, 2020
-
ruaok
and series cardinality of <1M is considered medium use.
2020-02-19 05013, 2020
-
ruaok
>1M is considered high level of use.
2020-02-19 05017, 2020
-
rdswift joined the channel
2020-02-19 05023, 2020
-
ruaok
>10M is not practical.
2020-02-19 05032, 2020
-
iliekcomputers
We're already there
2020-02-19 05040, 2020
-
iliekcomputers
👏🏽
2020-02-19 05046, 2020
-
ruaok
so, if we hit 80k users, we're into the "impractical" zone.
2020-02-19 05051, 2020
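A rough back-of-the-envelope check of the numbers above, assuming series cardinality grows linearly with the number of users. That linearity is an assumption made for illustration, and the function and variable names are made up; only the thresholds and counts come from the discussion.

```python
# Figures quoted in the discussion above.
CURRENT_SERIES = 980_441        # estimated series cardinality today
HIGH_USE = 1_000_000            # above this: "high level of use"
IMPRACTICAL = 10_000_000        # above this: "not practical"
USERS_AT_IMPRACTICAL = 80_000   # user count said to reach the impractical zone

# Assumption: cardinality scales linearly with the number of users.
series_per_user = IMPRACTICAL / USERS_AT_IMPRACTICAL
implied_current_users = CURRENT_SERIES / series_per_user


def usage_tier(cardinality):
    """Classify a series cardinality using the thresholds quoted above."""
    if cardinality > IMPRACTICAL:
        return "impractical"
    if cardinality > HIGH_USE:
        return "high"
    return "medium"


def projected_cardinality(users):
    """Projected series cardinality for a given user count (linear model)."""
    return users * series_per_user


print(series_per_user, round(implied_current_users), usage_tier(CURRENT_SERIES))
```

Under this linear model the quoted figures work out to about 125 series per user, which is why dropping most fields in favour of a single JSON string would shrink the number so sharply.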
-
ruaok
yeah.
2020-02-19 05005, 2020
-
ruaok
so, that opens a giant can of worms.
2020-02-19 05024, 2020
-
ruaok
we could, thinking out loud, stay with influx DB.
2020-02-19 05044, 2020
-
ruaok
if we get rid of most fields and store a JSON string, then our cardinality would be greatly reduced.
2020-02-19 05057, 2020
-
ruaok
and we'd be good for a while, I think.
2020-02-19 05026, 2020
-
ruaok
and that is fine -- we never really query on anything but listened at and user.
2020-02-19 05042, 2020
-
ruaok
the alternative is to check out timescale.
2020-02-19 05049, 2020
-
iliekcomputers
That sounds reasonable to me. It's an archive anyways
2020-02-19 05052, 2020
-
ruaok
2020-02-19 05005, 2020
-
ruaok
timescale is an addition to postgres.
2020-02-19 05025, 2020
-
ruaok
an extension, to be exact. it brings hypertables to postgres with 100% of the postgres functionality.
2020-02-19 05048, 2020
-
ruaok
we connect to it like postgres and it has the schema of postgres, but supposedly performance is that of influx.
2020-02-19 05056, 2020
-
iliekcomputers
I'm not sure I see the justification for completely changing the database yet.
2020-02-19 05021, 2020
-
ruaok
well, it looks like we have to re-write the whole DB.
2020-02-19 05042, 2020
-
ruaok
switching DBs might be a small extra cost... aside from rewriting a pile of SQL.
2020-02-19 05050, 2020
-
ruaok
but into a saner syntax we know and love.
2020-02-19 05053, 2020
-
ruaok
BUT, the big thing?
2020-02-19 05058, 2020
-
ruaok
we can modify old rows.
2020-02-19 05006, 2020
-
ruaok
delete a user? trivial.
2020-02-19 05013, 2020
-
ruaok
delete individual listens? cake.
2020-02-19 05021, 2020
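A minimal sketch of what "modify old rows" could look like once listens live in a plain SQL table. It uses Python's built-in sqlite3 purely so that it runs anywhere; it is not TimescaleDB. The table and column names are hypothetical, loosely following the idea above of keeping only listened-at and user as real columns and storing the rest as a JSON string.

```python
import json
import sqlite3

# In-memory SQLite stands in for Postgres/TimescaleDB here (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listen (
        listened_at INTEGER NOT NULL,  -- Unix timestamp of the listen
        user_name   TEXT    NOT NULL,
        data        TEXT    NOT NULL   -- remaining metadata as a JSON string
    )
    """
)
# On TimescaleDB the table would additionally be converted into a hypertable
# with create_hypertable(); that step has no SQLite equivalent and is omitted.

rows = [
    (1582100000, "alice", json.dumps({"track_name": "Song A"})),
    (1582100100, "bob", json.dumps({"track_name": "Song B"})),
    (1582100200, "alice", json.dumps({"track_name": "Song C"})),
]
conn.executemany("INSERT INTO listen VALUES (?, ?, ?)", rows)

# Delete an individual listen.
conn.execute(
    "DELETE FROM listen WHERE user_name = ? AND listened_at = ?",
    ("alice", 1582100000),
)

# Delete a user and all of their listens.
conn.execute("DELETE FROM listen WHERE user_name = ?", ("bob",))

remaining = conn.execute(
    "SELECT user_name, listened_at FROM listen ORDER BY listened_at"
).fetchall()
print(remaining)
```

Both deletions are ordinary DELETE statements keyed on the two columns that are actually queried, which is the appeal being described here.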
-
iliekcomputers
That sounds very nice
2020-02-19 05030, 2020
-
ruaok
too good to be true, honestly.
2020-02-19 05041, 2020
-
iliekcomputers
How old is timescale
2020-02-19 05056, 2020
-
ruaok
so my plan is to do the migration to 1.7.9 tomorrow and see how that performs.
2020-02-19 05002, 2020
-
ruaok
should buy us some time
2020-02-19 05014, 2020
-
ruaok
then I will try a trial migration to timescale and see how it goes.
2020-02-19 05030, 2020
-
iliekcomputers
ruaok: I think an exploration on how to make our shit more resource efficient might be a great gsoc project.
2020-02-19 05032, 2020
-
ruaok
I could easily connect a timescale_write to the rabbitmq and also write to timescale.
2020-02-19 05037, 2020
-
ruaok
in parallel.
2020-02-19 05052, 2020
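The parallel-write idea can be sketched as a fan-out: every incoming listen is copied to each bound consumer, so a second writer can fill the new database without touching the existing one. This is a toy in-process stand-in, not RabbitMQ client code, and the class and queue names are hypothetical.

```python
class FanoutExchange:
    """Toy stand-in for a fanout exchange: each bound queue gets every message."""

    def __init__(self):
        self.queues = {}

    def bind(self, queue_name):
        # Register a new queue; it only receives messages published after binding.
        self.queues[queue_name] = []

    def publish(self, message):
        # Deliver an independent copy of the message to every bound queue.
        for queue in self.queues.values():
            queue.append(dict(message))


exchange = FanoutExchange()
exchange.bind("influx_writer")      # existing consumer
exchange.bind("timescale_writer")   # new consumer running in parallel

exchange.publish({"user_name": "alice", "listened_at": 1582100000})
exchange.publish({"user_name": "bob", "listened_at": 1582100100})

for name, queue in exchange.queues.items():
    print(name, len(queue))
```

Because each consumer has its own queue, a slow or failing trial writer would back up only its own queue, which is what makes a side-by-side comparison on real traffic feasible.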
-
ruaok
good point.
2020-02-19 05037, 2020
-
ruaok
I don't think a test to kick the tires should take more than a day to knock out.
2020-02-19 05019, 2020
-
ruaok
and if we have a full DB, running in parallel we can migrate bits of code on beta and roll it out later.
2020-02-19 05045, 2020
-
iliekcomputers
Mhmm, that's the right way to do database migrations
2020-02-19 05057, 2020
-
iliekcomputers
(so I learned after the AB migration)
2020-02-19 05016, 2020
-
iliekcomputers
So the gsoc project
2020-02-19 05020, 2020
-
ruaok
not sure how old it is, but it seems to have enough cred and it is postgres. I get the same postgres ethos vibe from their stuff.
2020-02-19 05028, 2020
-
iliekcomputers
Could contain the following:
2020-02-19 05040, 2020
-
iliekcomputers
1. Rewrite influx writer in go/rust
2020-02-19 05048, 2020
-
iliekcomputers
2. Use protobuf instead of json
2020-02-19 05002, 2020
-
iliekcomputers
3. Do the stuff we need for migration
2020-02-19 05050, 2020
-
iliekcomputers
ruaok: what do you think is the urgency of the migration?
2020-02-19 05059, 2020
-
ruaok
if we do the schema creation and the initial DB mirror, (which are the tough bits) then this totally makes sense.
2020-02-19 05025, 2020
-
ruaok
I think a gsoc project would be a perfect fit. I think the migration to 1.7.9 will buy us that time.
2020-02-19 05035, 2020
-
ruaok
if not, we can re-write the DB, if we have to.
2020-02-19 05055, 2020
-
iliekcomputers
(famous last words) (kidding)
2020-02-19 05007, 2020
-
ruaok
true that.
2020-02-19 05026, 2020
-
ruaok
there are many little things about influx that bug me.
2020-02-19 05048, 2020
-
iliekcomputers
Influx has been a PITA since the beginning tbh
2020-02-19 05049, 2020
-
ruaok
we would be getting rid of a lot of code designed to deal with influx and its idiosyncrasies.
2020-02-19 05004, 2020
-
ruaok
API-wise, yes. stability and speed, until lately, were good.
2020-02-19 05007, 2020
-
iliekcomputers still remembers the shitty escape logic I wrote
2020-02-19 05010, 2020
-
ruaok
DING
2020-02-19 05011, 2020
-
ruaok
that.
2020-02-19 05021, 2020
-
ruaok
and we KNOW how to postgres.
2020-02-19 05046, 2020
-
ruaok
however, this puts us at odds with an already suggested GSoC project. or two.
2020-02-19 05012, 2020
-
ruaok
the one to add the ability to delete listens, and it also impacts like/dislike.
2020-02-19 05030, 2020
-
ruaok
naw, not really, actually.
2020-02-19 05053, 2020
-
ruaok
if the prereq is that we have the parallel DB in place before coding starts, the new features can code against TS, not influx.
2020-02-19 05011, 2020
-
iliekcomputers
Does delete listens even remain a project?
2020-02-19 05034, 2020
-
ruaok
it could -- we should expand the scope a bit.
2020-02-19 05037, 2020
-
iliekcomputers
I guess there's the UI to do still
2020-02-19 05049, 2020
-
ruaok
like automated user rename/delete when a user gets deleted. and UI components for delete listens.
2020-02-19 05004, 2020
-
iliekcomputers
And if you add the idea of deleting listens in the incremental dumps too
2020-02-19 05033, 2020
-
ruaok
I think that just broke my head.
2020-02-19 05058, 2020
-
ruaok puts it back together.
2020-02-19 05001, 2020
-
ruaok
yes, agreed
2020-02-19 05031, 2020
-
iliekcomputers
If we have a deadline on this migration however, either you or pristine__ will have to take point on it.
2020-02-19 05039, 2020
-
ruaok
me
2020-02-19 05055, 2020
-
ruaok
it's a weekend project. so 3 months should be a good fit.
2020-02-19 05001, 2020
-
iliekcomputers
Heh
2020-02-19 05017, 2020
-
ruaok
not really a weekend project.
2020-02-19 05027, 2020
-
ruaok
I mean you and I could hammer it out during our hack day.
2020-02-19 05030, 2020
-
ruaok
I'm positive.
2020-02-19 05048, 2020
-
ruaok
and not just a partial migration, but migrating *everything*
2020-02-19 05007, 2020
-
ruaok
it is mostly porting queries. and our queries are pretty simple and the postgres query format is simpler.
2020-02-19 05023, 2020
-
ruaok
most of it would be massaging code around.
2020-02-19 05032, 2020
-
iliekcomputers
Hmm.
2020-02-19 05047, 2020
-
iliekcomputers
Yeah, I think that sounds reasonable if we aim for a first cut
2020-02-19 05010, 2020
-
ruaok
yeah.
2020-02-19 05029, 2020
-
ruaok
I still think a quick and dirty proof of concept is in order.
2020-02-19 05038, 2020
-
iliekcomputers
Yeah, agreed.
2020-02-19 05049, 2020
-
ruaok
ingest the LB data dump into TS -- how long does that take?
2020-02-19 05050, 2020
-
iliekcomputers
Considering that weekend is like a month away anyways
2020-02-19 05010, 2020
-
ruaok
we could even run it against influx -- we have the code for ingesting into influx.
2020-02-19 05016, 2020
-
ruaok
side by side comparison.
2020-02-19 05025, 2020
-
ruaok
also, did you see what I posted earlier?
2020-02-19 05040, 2020
-
ruaok
I read in the influx docs that appending near current time timestamps is fast.
2020-02-19 05057, 2020
-
ruaok
but adding older timestamps incurs a significant overhead in influx.
2020-02-19 05002, 2020
-
iliekcomputers
Yeah, that's a pain.
2020-02-19 05012, 2020
-
iliekcomputers
So
2020-02-19 05015, 2020
-
ruaok
and I noticed that the spikes in our queue backing up are.. people importing data.
2020-02-19 05022, 2020
-
ruaok
like a last.fm import.
2020-02-19 05034, 2020
-
ruaok
if those are problematic now, we're... screwed.
2020-02-19 05001, 2020
-
iliekcomputers
How long have we had the queue backed up
2020-02-19 05043, 2020
-
iliekcomputers
Did we have more people importing in the last few days?
2020-02-19 05015, 2020
-
ruaok
first spike appeared 3-feb
2020-02-19 05034, 2020
-
ruaok
yes, I noticed 4 people ended up causing the big backlog
2020-02-19 05022, 2020
-
ruaok
this parallel DB is a really good idea. we can compare real world cases.
2020-02-19 05036, 2020
-
iliekcomputers
Yeah
2020-02-19 05054, 2020
-
bukwurm has quit
2020-02-19 05001, 2020
-
iliekcomputers
I don't want to decide that we're migrating databases too quickly.
2020-02-19 05010, 2020
-
ruaok
influx has made me nervous for a while now.
2020-02-19 05014, 2020
-
iliekcomputers
I'd like it if we had actual numbers
2020-02-19 05026, 2020
-
ruaok
agreed.
2020-02-19 05015, 2020
-
iliekcomputers
Because in the end, we are time constrained and even if it's pretty easy to implement it'll still take effort that could be spent developing features.
2020-02-19 05057, 2020
-
iliekcomputers
I do think that the json low hanging fruit is something that should go into a gsoc project.
2020-02-19 05018, 2020
-
ruaok
let me add it to the trello.
2020-02-19 05019, 2020
-
yvanzo
reosarevok: selenium failure on 1379 is spurious too
2020-02-19 05037, 2020
-
ruaok
2020-02-19 05043, 2020
-
ruaok
iliekcomputers: ^^
2020-02-19 05027, 2020
-
yvanzo
trello resurrected?
2020-02-19 05052, 2020
-
ruaok
I'm using trello for the new MeB roadmap and for LB planning.
2020-02-19 05038, 2020
-
ruaok
yvanzo: do you have a roadmap of features needed for the VM?
2020-02-19 05046, 2020
-
ruaok
it could use adding to that trello.
2020-02-19 05029, 2020
-
amCap1712
CatQuest: hi
2020-02-19 05054, 2020
-
amCap1712
2020-02-19 05001, 2020
-
yvanzo
cool, there are some needed features, I created tickets for some already
2020-02-19 05033, 2020
-
ruaok
great -- I've been adding lists called "related tickets" and adding links to jira for tasks.
2020-02-19 05058, 2020
-
iliekcomputers
I have the road map on my list of things to look at.
2020-02-19 05022, 2020
-
amCap1712
CatQuest: the dll is ready to be tested on windows. ping me when you're available
2020-02-19 05054, 2020
-
ruaok
yvanzo: do you have a trello account? you're not part of the team.