#metabrainz

/

      • d4rkie has quit
      • 2020-05-21 14239, 2020

      • Nyanko-sensei joined the channel
      • 2020-05-21 14202, 2020

      • Lotheric joined the channel
      • 2020-05-21 14247, 2020

      • ishaanshah[m] has left the channel
      • 2020-05-21 14202, 2020

      • Gore has quit
      • 2020-05-21 14219, 2020

      • Gore joined the channel
      • 2020-05-21 14219, 2020

      • kori joined the channel
      • 2020-05-21 14202, 2020

      • Nyanko-sensei has quit
      • 2020-05-21 14232, 2020

      • BrainzGit
        [listenbrainz-server] ishaanshah opened pull request #872 (master…master): LB-595: Update API Endpoint to reflect support for multiple time ranges https://github.com/metabrainz/listenbrainz-server…
      • 2020-05-21 14233, 2020

      • BrainzBot
        LB-595: Update API Endpoint to reflect support for multiple time ranges https://tickets.metabrainz.org/browse/LB-595
      • 2020-05-21 14244, 2020

      • ishaanshah joined the channel
      • 2020-05-21 14232, 2020

      • ishaanshah
        iliekcomputers: Hi, the API endpoint PR is ready for review, please have a look when free :)
      • 2020-05-21 14254, 2020

      • Nyanko-sensei joined the channel
      • 2020-05-21 14218, 2020

      • thomasross has quit
      • 2020-05-21 14209, 2020

      • Chinmay3199 joined the channel
      • 2020-05-21 14219, 2020

      • BrainzGit
        [listenbrainz-server] ishaanshah opened pull request #873 (master…release_stats): [WIP] LB-577: Add "Top Releases" statistics (Spark + DB) https://github.com/metabrainz/listenbrainz-server…
      • 2020-05-21 14219, 2020

      • BrainzBot
        LB-577: Add "Top Releases" statistics (Spark + DB) https://tickets.metabrainz.org/browse/LB-577
      • 2020-05-21 14249, 2020

      • v6lur joined the channel
      • 2020-05-21 14225, 2020

      • ishaanshah
        iliekcomputers: Hi, I got the release statistics to work
      • 2020-05-21 14240, 2020

      • ishaanshah
        I had some points which I would like to discuss
      • 2020-05-21 14250, 2020

      • ishaanshah
        please ping me when you get time
      • 2020-05-21 14227, 2020

      • iliekcomputers
        I'm around
      • 2020-05-21 14250, 2020

      • ishaanshah
        Hi
      • 2020-05-21 14206, 2020

      • ishaanshah
        I got the release statistics to work
      • 2020-05-21 14225, 2020

      • ishaanshah
        However I noticed a lot of code duplication taking place
      • 2020-05-21 14216, 2020

      • ishaanshah
        For eg
      • 2020-05-21 14224, 2020

      • ishaanshah
      • 2020-05-21 14246, 2020

      • ishaanshah
        These functions are completely same as the ones for artists
      • 2020-05-21 14218, 2020

      • ishaanshah
        The only function different is the main get_realeses function, i.e the SQL query
      • 2020-05-21 14230, 2020

      • ishaanshah
        get_releases*
      • 2020-05-21 14258, 2020

      • iliekcomputers
        Right
      • 2020-05-21 14200, 2020

      • ishaanshah
        Similar things have happened in spark reader
      • 2020-05-21 14230, 2020

      • ishaanshah
        I was wondering wether we should wait until we calculate all user stats
      • 2020-05-21 14242, 2020

      • ishaanshah
        And then look into generalisation
      • 2020-05-21 14257, 2020

      • ishaanshah
        Or start now itself?
      • 2020-05-21 14237, 2020

      • iliekcomputers
        I see you've most fixed it in the spark reader side
      • 2020-05-21 14201, 2020

      • ishaanshah
        I fixed the insert_db query
      • 2020-05-21 14214, 2020

      • ishaanshah
        But handlers are duplicated
      • 2020-05-21 14230, 2020

      • iliekcomputers
        I'd say try to deduplicate it on the spark cluster side.
      • 2020-05-21 14245, 2020

      • iliekcomputers
        The recording thing would be really small then
      • 2020-05-21 14207, 2020

      • ishaanshah
        Yeah, we should get all the data needed from the query itself
      • 2020-05-21 14227, 2020

      • iliekcomputers
        Yeah, this is actually pretty cool
      • 2020-05-21 14237, 2020

      • ishaanshah
        I guess we should have a hierarchy like this
      • 2020-05-21 14211, 2020

      • ishaanshah
        stats -> user_stats -> graph_type -> further_info
      • 2020-05-21 14241, 2020

      • iliekcomputers
        What's this hierarchy for?
      • 2020-05-21 14248, 2020

      • ishaanshah
        The query
      • 2020-05-21 14222, 2020

      • ishaanshah
        for example for weekly artist stats
      • 2020-05-21 14225, 2020

      • iliekcomputers
        Please elaborate
      • 2020-05-21 14253, 2020

      • ishaanshah
        query = stats.user.entity.artist.week
      • 2020-05-21 14208, 2020

      • ishaanshah
        for sitewide graphs when we add them
      • 2020-05-21 14219, 2020

      • ishaanshah
        query = stats.sitewide.artist.week
      • 2020-05-21 14246, 2020

      • ishaanshah
        stats.sitewide.entity.artist.week*
      • 2020-05-21 14256, 2020

      • iliekcomputers
        How does this help with code deduplication here?
      • 2020-05-21 14246, 2020

      • ishaanshah
        So in our case
      • 2020-05-21 14210, 2020

      • ishaanshah
        We see user, we send it to a generic user handler function
      • 2020-05-21 14232, 2020

      • ishaanshah
        which sees entity, and sends it the entity handler
      • 2020-05-21 14250, 2020

      • ishaanshah
        entity handler sees artist and week
      • 2020-05-21 14214, 2020

      • ishaanshah
        it calls the get_week('artist')
      • 2020-05-21 14212, 2020

      • ishaanshah
        Right now we only have user.entity
      • 2020-05-21 14223, 2020

      • iliekcomputers
        First impression, That seems a bit overkill to me
      • 2020-05-21 14230, 2020

      • ishaanshah
        But later we will have user.daily_activity
      • 2020-05-21 14237, 2020

      • ishaanshah
        user.listening_history
      • 2020-05-21 14241, 2020

      • ishaanshah
        and so on
      • 2020-05-21 14206, 2020

      • iliekcomputers
        There's only one layer of handlers right now, so it's easy to see what message calls what function
      • 2020-05-21 14227, 2020

      • iliekcomputers
        If we do this, more layers are added, things get harder to understand.
      • 2020-05-21 14212, 2020

      • ishaanshah
        Hmm, that is true
      • 2020-05-21 14230, 2020

      • iliekcomputers
        I was mostly thinking of doing something similar to the entity thing you've done on the insert function, bring the common code out to a generic entity function, making the handlers really small.
      • 2020-05-21 14232, 2020

      • ishaanshah
        for now we can get away with modifying the current handler ig
      • 2020-05-21 14224, 2020

      • ishaanshah
        Yeah so basically function will be get_entity_week(<entity_name>)
      • 2020-05-21 14238, 2020

      • iliekcomputers
        Yeah. Something like that.
      • 2020-05-21 14212, 2020

      • iliekcomputers
        I don't want to do too much processing of the query string because it's very hard to document it later.
      • 2020-05-21 14236, 2020

      • ishaanshah
        Yeah, that was my concern too
      • 2020-05-21 14219, 2020

      • ishaanshah
        Also right now some places artists is used and someplaces artist is used
      • 2020-05-21 14219, 2020

      • iliekcomputers
        Let's keep the query string -> handler mapping 1<>1 and direct for now.
      • 2020-05-21 14252, 2020

      • iliekcomputers
        ishaanshah: I don't have any preferences there, the db column is artist by design, change the rest as you see fit.
      • 2020-05-21 14258, 2020

      • ishaanshah
        That makes it hard to deduplicate code
      • 2020-05-21 14224, 2020

      • ishaanshah
        I was thinking of making everything to artist
      • 2020-05-21 14235, 2020

      • ishaanshah
        So that we don;t have to modify the schema
      • 2020-05-21 14242, 2020

      • iliekcomputers
        If that's the case, change it to one of them in a separate PR, and base the release PR off of that.
      • 2020-05-21 14256, 2020

      • iliekcomputers
        That way they both are easy to review.
      • 2020-05-21 14229, 2020

      • ishaanshah
        Yeah, I will do that
      • 2020-05-21 14251, 2020

      • iliekcomputers
        Cool, thanks!
      • 2020-05-21 14227, 2020

      • iliekcomputers
        I'm glad that release and recording won't take much time, now that we've put the work in for artists. That's very ideal. :)
      • 2020-05-21 14242, 2020

      • ishaanshah
        I am done with the API PR, overestimated this time 😆
      • 2020-05-21 14235, 2020

      • iliekcomputers
        Heh
      • 2020-05-21 14248, 2020

      • iliekcomputers
        I took a quick look, looks sane to me, a few small changes.
      • 2020-05-21 14218, 2020

      • iliekcomputers
        alastairp: I think data classes will fit perfectly for our stats json in postgres.
      • 2020-05-21 14241, 2020

      • iliekcomputers
        I'm gonna play around with it tomorrow and see is it works.
      • 2020-05-21 14255, 2020

      • iliekcomputers
        *if I can get it to work.
      • 2020-05-21 14253, 2020

      • reosarevok
        bitmap: those reports worked before though or?
      • 2020-05-21 14250, 2020

      • bitmap
        yeah, they must've been I guess (otherwise our daily script would've been getting stuck like now)
      • 2020-05-21 14212, 2020

      • bitmap
        there's a hard statement_timeout of 56s on the database
      • 2020-05-21 14212, 2020

      • bitmap
        but I could see if increasing the statement_timeout helps
      • 2020-05-21 14226, 2020

      • reosarevok
        When you said you failed at optimizing it, do you mean it didn't improve at all, or not enough to run?
      • 2020-05-21 14233, 2020

      • reosarevok
        Because any improvements might be useful
      • 2020-05-21 14221, 2020

      • bitmap
        well, none of the improvements I tried allowed it to complete within 60s, so I didn't bother checking beyond that
      • 2020-05-21 14256, 2020

      • bitmap
        RecordingTrackDifferentName completes with a 120s statement_timeout, but RecordingsSameNameDifferentArtistsSameName still doesn't
      • 2020-05-21 14211, 2020

      • iliekcomputers
        bitmap: you're up late! (or early)
      • 2020-05-21 14239, 2020

      • bitmap
        I had a nap earlier :)
      • 2020-05-21 14249, 2020

      • bitmap
        (but should prob go back to sleep soon)
      • 2020-05-21 14205, 2020

      • reosarevok
        Oh god
      • 2020-05-21 14219, 2020

      • reosarevok
        Of course, we only run it once a day
      • 2020-05-21 14228, 2020

      • reosarevok
        Can we set a different timeout for reports than we do for other things?
      • 2020-05-21 14236, 2020

      • reosarevok
        Reports could have 5 min for all we care
      • 2020-05-21 14240, 2020

      • bitmap
        that's me running those report queries btw :P https://stats.metabrainz.org/d/000000067/postgres…
      • 2020-05-21 14248, 2020

      • diru1100
        Mo"ing !!
      • 2020-05-21 14248, 2020

      • bitmap
        I'll try with 300s, it's kinda ridiculous beyond that though
      • 2020-05-21 14227, 2020

      • reosarevok
        Agreed
      • 2020-05-21 14240, 2020

      • reosarevok
        So we can set it only for the report run?
      • 2020-05-21 14246, 2020

      • bitmap
        yup
      • 2020-05-21 14249, 2020

      • reosarevok
        Just have the script up the timeout then bring it back to normal, right?
      • 2020-05-21 14200, 2020

      • bitmap
        if it's in a transaction, SET LOCAL statement_timeout = '300s';
      • 2020-05-21 14240, 2020

      • reosarevok
        then I guess we might as well
      • 2020-05-21 14212, 2020

      • bitmap
        can do for RecordingTrackDifferentName I guess
      • 2020-05-21 14220, 2020

      • bitmap
        we'll see if RecordingsSameNameDifferentArtistsSameName finishes yet
      • 2020-05-21 14257, 2020

      • bitmap
        'ERROR: canceling statement due to statement timeout' after 5 mins too
      • 2020-05-21 14207, 2020

      • reosarevok
        jeez
      • 2020-05-21 14214, 2020

      • reosarevok
        Ok, that *must* be possible to optimize
      • 2020-05-21 14234, 2020

      • reosarevok
        Recording and track are bigger tables than recording and artist...
      • 2020-05-21 14225, 2020

      • reosarevok
        I guess it's because all of the ac joins
      • 2020-05-21 14240, 2020

      • bitmap
        yeaj, though it's also joining recording twice (and artist twice, and artist_credit and artist_credit_name both twice...)
      • 2020-05-21 14232, 2020

      • reosarevok
        I guess changing the order of the FROM tables doesn't change anything?
      • 2020-05-21 14240, 2020

      • reosarevok
        I'm just not sure how this works under the hood
      • 2020-05-21 14217, 2020

      • reosarevok
        Intuitively it would make sense that checking for matching ACs first and only then comparing their recordings would be more efficient, but I think this stuff is smart enough to do things the easiest way
      • 2020-05-21 14250, 2020

      • reosarevok
        I guess the new collate isn't much slower than the previous one?
      • 2020-05-21 14220, 2020

      • BrainzGit
        [musicbrainz-server] reosarevok merged pull request #1502 (master…MBS-10813): MBS-10813: Update the Bandcamp logo https://github.com/metabrainz/musicbrainz-server/…
      • 2020-05-21 14221, 2020

      • BrainzBot
        MBS-10813: Update the Bandcamp logo used in the sidebar https://tickets.metabrainz.org/browse/MBS-10813
      • 2020-05-21 14249, 2020

      • jmp_music joined the channel
      • 2020-05-21 14219, 2020

      • BrainzGit
        [musicbrainz-server] reosarevok merged pull request #1508 (master…MBS-10822): MBS-10822: Change tableColumns tables to use named parameters https://github.com/metabrainz/musicbrainz-server/…
      • 2020-05-21 14221, 2020

      • BrainzBot
        MBS-10822: Change tableColumns tables to use named parameters https://tickets.metabrainz.org/browse/MBS-10822
      • 2020-05-21 14222, 2020

      • bitmap
        right, I don't think changing the join order would have an effect there
      • 2020-05-21 14251, 2020

      • bitmap
        and I tried running it without the COLLATE (only ordering by id) and it still times out after 5 min
      • 2020-05-21 14231, 2020

      • reosarevok
        Siiigh
      • 2020-05-21 14236, 2020

      • reosarevok
        Oh well
      • 2020-05-21 14259, 2020

      • reosarevok
        There must be a better way of doing this, but
      • 2020-05-21 14229, 2020

      • jmp_music has quit
      • 2020-05-21 14201, 2020

      • jmp_music joined the channel
      • 2020-05-21 14228, 2020

      • ruaok
        moiiin!
      • 2020-05-21 14252, 2020

      • jmp_music
        moooiin!
      • 2020-05-21 14204, 2020

      • BrainzGit
        [musicbrainz-server] yvanzo merged pull request #1520 (master…react-table-7.1.0): Bump react-table version to 7.1.0 https://github.com/metabrainz/musicbrainz-server/…
      • 2020-05-21 14212, 2020

      • iliekcomputers
        moin.
      • 2020-05-21 14241, 2020

      • iliekcomputers
        we'll probably have weekly, monthly and yearly artist stats and an API endpoint by EOD if all goes well
      • 2020-05-21 14243, 2020

      • shivam-kapila
        Morning
      • 2020-05-21 14201, 2020

      • ruaok
        end of decade? 🏃‍♀️
      • 2020-05-21 14209, 2020

      • iliekcomputers
        heh
      • 2020-05-21 14216, 2020

      • ruaok
        more seriously 🎉
      • 2020-05-21 14217, 2020

      • iliekcomputers
        definitely has felt like that
      • 2020-05-21 14233, 2020

      • yvanzo
        hi reosarevok, can you please forward me the answer from muziekweb?
      • 2020-05-21 14240, 2020

      • reosarevok
        Sure
      • 2020-05-21 14226, 2020

      • alastairp
        muziekweb? technical questions?