#metabrainz

      • prabal joined the channel
      • yvanzo
        reosarevok: agreed, seems to describe the same issue(s), maybe with variations.
      • reosarevok: yes it is totally spurious (the PR just changes MD files)
      • iliekcomputers
        shivam-kapila: sounds good!
      • yvanzo
        reosarevok: I don’t think we should go for a more recent version, maintenance is planned from Apr 2020 to Apr 2021.
      • that would be an artificial requirement and discourage people (if it doesn't already) from installing it on a system that comes with stock v10.
      • reosarevok
        Is there no actual improvement that would be useful to us in the newer versions?
      • yvanzo
        no idea, even if it would improve perfs (for example), it would still not be needed
      • iliekcomputers
        ruaok: hey
      • ruaok
        hey
      • iliekcomputers
        So to remove the serialization/deserialization stuff
      • ruaok
        in the ingestion process, you mean?
      • iliekcomputers
        I wonder if it's worth it to investigate using protobuf instead of json
      • Yes
      • Or in general
      • ruaok
        quite possibly.
      • rdswift joined the channel
      • but, let me throw a spanner into the works.
      • I've been reading up on influxdb and what exactly cardinality entails in that world.
      • right now our series cardinality is 980441 (estimated)
      • rdswift has quit
      • from what I understand that is the product of measurements X fields (in our case).
      • and series cardinality of <1M is considered medium use.
      • >1M is considered high level of use.
      • rdswift joined the channel
      • >10M is not practical.
      • iliekcomputers
        We're already there
      • ๐Ÿ‘๐Ÿฝ
      • ruaok
        so, if we hit 80k users, we're into the "impractical" zone.
      • yeah.
      • so, that opens a giant can of worms.
      • we could, thinking out loud, stay with influx DB.
      • if we get rid of most fields and store a JSON string, then our cardinality would be greatly reduced.
      • and we'd be good for a while, I think.
      • and that is fine -- we never really query on anything but listened at and user.
      • the alternative is to check out timescale.
      • iliekcomputers
        That sounds reasonable to me. It's an archive anyways
      • ruaok
      • timescale is an addition to postgres.
      • an extension to be exact. it brings hypertables to postgres with 100% of the postgres functionality.
      • we connect to it like postgres and it has the schema of postgres, but supposedly performance is that of influx.
      • iliekcomputers
        I'm not sure I see the justification for completely changing the database yet.
      • ruaok
        well, it looks like we have to re-write the whole DB.
      • switching DBs might be a small extra cost... aside from rewriting a pile of SQL.
      • but into a saner syntax we know and love.
      • BUT, the big thing?
      • we can modify old rows.
      • delete a user? trivial.
      • delete individual listens? cake.
      • iliekcomputers
        That sounds very nice
      • ruaok
        too good to be true, honestly.
      • iliekcomputers
        How old is timescale
      • ruaok
        so my plan is to do the migration to 1.7.9 tomorrow and see how that performs.
      • should buy us some time
      • then I will try a trial migration to timescale and see how it goes.
      • iliekcomputers
        ruaok: I think an exploration on how to make our shit more resource efficient might be a great gsoc project.
      • ruaok
        I could easily connect a timescale_write to the rabbitmq and also write to timescale.
      • in parallel.
      • good point.
      • I don't think a test to kick the tires should take more than a day to knock out.
      • and if we have a full DB, running in parallel we can migrate bits of code on beta and roll it out later.
      • iliekcomputers
        Mhmm, that's the right way to do database migrations
      • (so I learned after the AB migration)
      • So the gsoc project
      • ruaok
        not sure how old it is, but it seems to have enough cred and it is postgres. I get the same postgres ethos vibe from their stuff.
      • iliekcomputers
        Could contain the following:
      • 1. Rewrite influx writer in go/rust
      • 2. Use protobuf instead of json
      • 3. Do the stuff we need for migration
      • ruaok: what do you think is the urgency of the migration?
      • ruaok
        if we do the schema creation and the initial DB mirror, (which are the tough bits) then this totally makes sense.
      • I think a gsoc project would be a perfect fit. I think the migration to 1.7.9 will buy us that time.
      • if not, we can re-write the DB, if we have to.
      • iliekcomputers
        (famous last words) (kidding)
      • ruaok
        true that.
      • there are many little things about influx that bug me.
      • iliekcomputers
        Influx has been a PITA since the beginning tbh
      • ruaok
        we would be getting rid of a lot of code designed to deal with influx and its idiosyncrasies.
      • API wise yes. stability and speed, until lately, were good.
      • iliekcomputers still remembers the shitty escape logic I wrote
      • DING
      • that.
      • and we KNOW how to postgres.
      • however, this puts us at odds with an already suggested GSoC project. or two.
      • the one to add the ability to delete listens, and it also impacts like/dislike.
      • naw, not really, actually.
      • if the prereq is that we have the parallel DB in place before coding starts, the new features can code against TS, not influx.
      • iliekcomputers
        Does delete listens even remain a project?
      • ruaok
        it could -- we should expand the scope a bit.
      • iliekcomputers
        I guess there's the UI to do still
      • ruaok
        like automated user rename/delete when a user gets deleted. and UI components for delete listens.
      • iliekcomputers
        And if you add the idea of deleting listens in the incremental dumps too
      • ruaok
        I think that just broke my head.
      • ruaok puts it back together.
      • yes, agreed
      • iliekcomputers
        If we have a deadline on this migration however, either you or pristine__ will have to take point on it.
      • ruaok
        me
      • it's a weekend project. so 3 months should be a good fit.
      • iliekcomputers
        Heh
      • ruaok
        not really a weekend project.
      • I mean you and I could hammer it out during our hack day.
      • I'm positive.
      • and not just a partial migration, but migrating *everything*
      • it is mostly porting queries. and our queries are pretty simple and the postgres query format is simpler.
      • most of it would be massaging code around.
      • iliekcomputers
        Hmm.
      • Yeah, I think that sounds reasonable if we aim for a first cut
      • ruaok
        yeah.
      • I still think a quick and dirty proof of concept is in order.
      • iliekcomputers
        Yeah, agreed.
      • ruaok
        ingest the LB data dump into TS -- how long does that take?
      • iliekcomputers
        Considering that weekend is like a month away anyway
      • ruaok
        we could even run it against influx -- we have the code for ingesting into influx.
      • side by side comparison.
      • also, did you see what I posted earlier?
      • I read in the influx docs that appending near current time timestamps is fast.
      • but adding older timestamps incurs a significant overhead in influx.
      • iliekcomputers
        Yeah, that's a pain.
      • So
      • ruaok
        and I noticed that the spikes in our queue backing up are.. people importing data.
      • like a last.fm import.
      • if those are problematic now, we're... screwed.
      • iliekcomputers
        How long have we had the queue backed up
      • Did we have more people importing in the last few days?
      • ruaok
        first spike appeared 3-feb
      • yes, I noticed 4 people ended up causing the big backlog
      • this parallel DB is a really good idea. we can compare real world cases.
      • iliekcomputers
        Yeah
      • bukwurm has quit
      • I don't want to decide that we're migrating databases too quickly.
      • ruaok
        influx has made me nervous for a while now.
      • iliekcomputers
        I'd like it if we had actual numbers
      • ruaok
        agreed.
      • iliekcomputers
        Because in the end, we are time constrained and even if it's pretty easy to implement it'll still take effort that could be spent developing features.
      • I do think that the json low hanging fruit is something that should go into a gsoc project.
      • ruaok
        let me add it to the trello.
      • yvanzo
        reosarevok: selenium failure on 1379 is spurious too
      • ruaok
      • iliekcomputers: ^^
      • yvanzo
        trello resurrected?
      • ruaok
        I'm using trello for the new MeB roadmap and for LB planning.
      • yvanzo: do you have a roadmap of features needed for the VM?
      • it could use adding to that trello.
      • amCap1712
        CatQuest: hi
      • yvanzo
        cool, there are some needed features, I created tickets for some already
      • ruaok
        great -- I've been adding lists called "related tickets" and adding links to jira for tasks.
      • iliekcomputers
        I have the road map on my list of things to look at.
      • amCap1712
        CatQuest: the dll is ready to be tested on windows. ping me when you're available
      • ruaok
        yvanzo: do you have a trello account? you're not part of the team.