#metabrainz


      • ruaok
        if we had the disk space, I would say, screw it, just write a duplicate table.
      • and then we can swap over in an instant.
      • but I don't think we can get away with that.
      • kinduff
        alrighty, thank you lucifer, reosarevok, ruaok
      • alastairp
        and even then, I suspect that recreating the continuous aggregate on a new datetime column at the same time that we have the one on the timestamp column will cause disk problems too
      • ruaok
        we wouldn't need that.
      • the new table would not be accessed until the actual switchover.
      • we could take the listenstore offline for an hour or so during the switchover, no big deal.
      • alastairp
        for both reads and writes?
      • and then during that time drop the existing aggregate and recreate it on the datetime column?
      • ruaok
        yes
      • alastairp
        I'm happy to try it, though this task has now ballooned in size a bit
      • ruaok
        could we run a faily simple trial as a proof of concept?
      • fairly == not fail-y
      • create table, start copying old rows, monitor for 10% and extrapolate how much disk space it would really take.
      • and only if it looks doable do we proceed with this approach.
      • alastairp
        the only reason to do both columns at the same time would be to avoid 2 schema changes, are we worried about that? (I'm not)
      • sure, let me put together a quick PR for that
      • ruaok
        i'm not.
      • but, I am worried about an exclusive table lock during the ALTER TABLE command.
      • and we have no idea how long ALTER TABLE will run for
      • alastairp
        I believe that add column default null in postgres now no longer needs a lock
      • however, moving to not null may need one
      • ruaok
        if we can avoid a table lock we should use your solution. clearly simpler.
      • alastairp
        OK, I'll do the following: 1) PR for moving to user_id, 2) PR for testing the change to a date time field - by making a new table and copying 10%, 3) verify that adding a column doesn't need a lock and check the time that adding a not null constraint requires
      • thanks for the discussion
      • ruaok
        1) does not need a table lock either?
      • alastairp
        for adding the column, no
      • but now I'm doubting the change of the constraint
      • ruaok
        let's do 3) first.
      • because that will really inform 1 and 2
      • alastairp
        perfect, let me finish my db import and I'll test that
      • ruaok
        thx
      • [1997kB] has quit
      • outsidecontext_
        alastairp: is this intentional or an oversight? https://tickets.metabrainz.org/projects/AB/issu...
      • ruaok
        alastairp: what exactly is broken about the public LB dumps?
      • BrainzBot
        AB-460: API: Missing feature tonal.chords_changes_rate
      • ruaok
        I see data.
      • wow. I just ran a query on timescale to extract spotify recording IDs from the new mapping.
      • anyone wanna guess how many rows it has?
      • reosarevok
        9387579!
      • (I might have generated a random number)
      • ruaok
        you might be withing 20%
      • -g
      • 11M rows!
      • but, those are mapped against MSIDs.
      • meaning it contains loads of dupes
      • 1.4M unique recordings.
      • lucifer: alastairp : quick ponderance.... for the parquet based LB dumps intended to be imported into spark...
      • those are mostly intended as internal use. does it make sense to spend all that time XZ compressing them just to move them to another server at the same datacenter?
      • (or cluster of datacenters)
      • I'm inclined to not compress at ALL.
      • lucifer
        sure i think we can get away without compressing to xz.
      • also, parquet files by default use "snappy" compression iirc so it might already be comparable to compressed json anyway.
      • ruaok
        oh. well, that makes everything easier then.
      • was it 64MB chunks?
      • I wonder how to estimate that.
      • lucifer
        128 MB chunks or a bit less than that.
      • ruaok
        k.
      • if there is compression in the mix, I'll have to play with it to see if I can get close without going over.
      • ruaok laughs at the thought that his first hard drive was 30MB large
      • BrainzBot
        MBS-11767: Track-level artists that differ from the release artist are no longer shown on multi-disc releases that aren't fully loaded
      • reosarevok
        I took a quick look but I'm not sure why medium is not being detected as changed by useMemo
      • Sophist-UK joined the channel
      • Sophist-UK has quit
      • Sophist-UK joined the channel
      • Sophist_UK has quit
      • lucifer
        ruaok: re lb public dumps, iiuc the `user` table schema of the public dumps is incorrect. we only import that table when there is no private dump so when we try to import public dump solely we get an error.
      • ruaok
        ah
      • BrainzGit
        [musicbrainz-android] akshaaatt opened pull request #81 (master…patch-1): Update README.md https://github.com/metabrainz/musicbrainz-andro...
      • akshaaatt[m]
        lucifer: I updated the readme of the github project. Will add more changes soon but this seems like a good start.
      • We should add the website, topics and tags as well to the repository
      • <akshaaatt[m] "We should add the website, topic"> in the github about section
      • ritiek joined the channel
      • revi has quit
      • revi joined the channel
      • akashgp09 joined the channel
      • lucifer
        akshaaatt[m]: we don't have a website for the app. adding topics and tags sounds good.
      • [1997kB] joined the channel
      • alastairp
        ruaok: sorry, I had to pop out. did lucifer answer your question about public dumps?
      • akshaaatt[m]
        <ruaok "https://juliareda.eu/2021/07/git"> This is really interesting!
      • ruaok
        alastairp: y
      • alastairp
        outsidecontext_: that's an oversight
      • akshaaatt[m]
        I really wish it were a free plugin though. Anyway, open sourced plugins similar to this will float eventually.
      • alastairp
        or more specifically, we made a list of things that we thought people might want to select, and that wasn't in our initial list
      • lucifer
        akshaaatt[m]: i think we can use the MB android app page but that means we also have to maintain it at two places. let's finalize the details of the readme and see how we want to do it.
      • akshaaatt[m]
        Okaaayyy boss!
      • lucifer
        alastairp: sklearn training now takes ~5m after fixing the groundtruth path.
      • alastairp
        🎉
      • lucifer
        time to move to next step now :D
      • alastairp
        did you find the model file?
      • lucifer
        we have a lot of files in dataset directory of sklearn. checking for pkl file.
      • yup we have it
      • `/home/acousticbrainz/acousticbrainz-server/data/datasets/8f9c452b-6cef-4f36-a4c9-f2b29d4f167b/8f9c452b-6cef-4f36-a4c9-f2b29d4f167b/best_clf_model.pkl`
      • alastairp
        great
      • why is the uuid there twice?
      • lucifer
        not sure, but that's part of the groundtruth path stuff. for some reason, the groundtruth path is used to calculate the dataset_dir path.
      • alastairp
        might be another error or perhaps an issue when selecting the groundtruth path, let's see if we can get rid of it
      • lucifer
        i added a separate arg to avoid messing with it. i'll read through the code and work on simplifying it.
      • alastairp
        ok, sounds great
      • lucifer
        do we need to keep the standalone scripts?
      • alastairp
        yes, I think they're useful to have
      • lucifer
        👍
      • another question unrelated to the PR, why is dataset eval page in react instead of jinja2?
      • alastairp
        dataset editor is in react too
      • having the editor in react was nice, as it made it interactive
      • and so all of that part of the site is in react, as it was able to reuse code
      • lucifer
        ah right. makes sense.
      • alastairp
        nice!
      • Lotheric_ joined the channel
      • lucifer
        i just saw two different creation time formats and found one is from jinja2 and the other from react.
      • alastairp
        ah, right
      • yeah, there are a lot of react/data display tickets open
      • lucifer
        ah! i see.
      • i have also added the tool column here https://similarity.acousticbrainz.org/datasets/...
      • alastairp
        perfect
      • lucifer
        also looked into failed status stuff, we have it already but are not catching all exceptions so sometimes it does not get updated.
      • BrainzGit
        [acousticbrainz-server] alastair opened pull request #405 (master…AB-460-chords_changes_rate): AB-460: Add tonal.chords_changes_rate to allowed lowlevel features https://github.com/metabrainz/acousticbrainz-se...
      • alastairp
        yes, I saw that. I think there's even a TODO saying to catch more exceptions, right? :)
      • Lotheric has quit
      • lucifer
        yup, how poetic.
      • outsidecontext_
        alastairp: thanks, makes sense. So I could submit a PR to add this
      • alastairp
        outsidecontext_: ^ I just did :)
      • outsidecontext_
        Thanks!
      • alastairp
        I'm just finishing up a few other features that I hope to merge soon, so expect to see this available some time this week
      • akshaaatt[m]
        <BrainzGit "[musicbrainz-android] akshaaatt "> lucifer: Changes made ✌️
      • BrainzGit
        [musicbrainz-android] amCap1712 merged pull request #81 (master…patch-1): Update README.md https://github.com/metabrainz/musicbrainz-andro...
      • lucifer
        akshaaatt[m]: thanks! looks much better now.
      • !m akshaaatt[m]
      • BrainzBot
        You're doing good work, akshaaatt[m]!
      • akshaaatt[m]
        Sweet! 💯
      • reosarevok
        bitmap: https://tickets.metabrainz.org/browse/MBS-11762 seems like another side effect of the assumption you had about the disc URLs
      • BrainzBot
        MBS-11762: Medium toolbox missing on disc URLs
      • reosarevok
        I'm not sure what's the best option with this
      • ritiek has quit
      • ritiek joined the channel
      • lucifer
        meeting time? :)
      • Freso: ping
      • ruaok
        wanna take a stab at running the meeting until Freso appears, lucifer ?
      • if not, I'm happy to kick things off.
      • lucifer
        i haven't done that anytime before. better if you do it :D
      • ruaok
        ok.
      • lucifer
        thanks!
      • ruaok
        <BANG>
      • meeting time!