#metabrainz


      • ericd[m]
        Maybe give me another 10 minutes. I want to have some more observations.
      • mayhem[m]
        ok, np
      • bitmap[m]
        reosarevok[m]: I was hoping for a way to reproduce the wide char one by just browsing to a page, but I think the `EditExternalLinks` one suffices too, since that goes through Catalyst
      • reosarevok[m]
        Yeah, I didn't look further since that one did hit it already :)
      • ericd[m]
        <ericd[m]> "I'll check" <- ah not a bug. it's just that MB has that many releases to return :D
      • mayhem[m]
        ericd[m]: Then maybe limit the number of items we return?
      • ericd[m]
        mayhem[m]: yeah I will change it to a more reasonable amount, or this may confuse users
      • mayhem[m]
        if the list is truncated, maybe we can add a link to where they can see the rest on the web?
      • ericd[m]
        <mayhem[m]> "if the list is truncated..." <- makes sense. I will add a link in the feed content.
      • reosarevok[m]
        yvanzo, bitmap: can at least one of you also check https://github.com/metabrainz/musicbrainz-serve... to see if you feel it's ready?
      • yvanzo: I understand you wanted a release today, was that prod or beta?
      • yvanzo[m]
        prod
      • It should have been on last Monday.
      • reosarevok[m]
        Ok, just making sure
      • Seems good to me
      • Also, if either of you have a problem with MBS-13681 let me know - if not I'll look into it this week
      • BrainzBot
        MBS-13681: Show recent event additions/artwork on the frontpage https://tickets.metabrainz.org/browse/MBS-13681
      • ericd[m]
        mayhem: I think we can restore test server now. Thanks!
      • mayhem[m]
        very good -- thanks for working with me on this.
      • monkey: social-share-image is coming back now.
      • monkey[m]
        Thank you
      • yvanzo[m]
        reosarevok: The two hot topics currently in my pipeline are still SolrCloud backups and Event edit page in React.
      • reosarevok[m]
        yvanzo: sure, I'm just sharing that one because it's not a team contributor and we usually try to fast-track those :)
      • But no worries if you don't have time rn
      • bitmap[m]
        I'm looking at the layout shift one
      • lucifer[m]
        [@mayhem](https://matrix.to/#/@mayhem:chatbrainz.org) Sure.
      • lucifer[m]
        <rimskii[m]> "lucifer: I opened a PR in troi..." <- Probably related to some outdated GitHub action we are using, feel free to ignore. I'll fix.
      • mayhem[m]
        lucifer[m]: what was this in ref to?
      • yvanzo[m]
        bitmap, yellowhatpro: ping 🔔
      • yellowhatpro[m]
        pongg
      • yvanzo[m]
        bitmap: I just addressed all of reosarevok ’s comments for the prod-needed PR about releasing.
      • yellowhatpro[m]
        This is what I am working on currently: https://github.com/yellowHatpro/mb-exurl-ia-ser...
      • I got stuck on it for quite some time. I had written a lot of code for error handling when I realised I was just overcomplicating stuff, so I went with a straightforward impl
      • lucifer[m]
        <mayhem[m]> "what was this in ref to?" <- Some GitHub action failure on a troi PR
      • mayhem[m]
        ah.
      • yvanzo[m]
        yellowhatpro: Ok, so you’re on using `this_error` atm?
      • theflash[m] joined the channel
      • theflash[m] uploaded an image: (407KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/DcGBWAPCriZamWJOFhWqglJe/IMG_7483.PNG >
      • yellowhatpro[m]
        I haven't yet, but as rustynova suggested, I will explore and use it
      • reosarevok[m]
        "We will rule over this error, and we will call it... this_error"
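For context, `thiserror` (the crate being discussed, spelled `this_error` above) mostly just derives the `Display` and `Error` impls you would otherwise write by hand. A std-only sketch of what that boils down to; the error variants are hypothetical, not taken from the actual service code:

```rust
use std::fmt;

// Hypothetical error type for the archiver; with `thiserror` this would be
// `#[derive(Error)]` plus `#[error("...")]` attributes on each variant.
#[derive(Debug)]
enum ArchivalError {
    RateLimited { retry_after_secs: u64 },
    SaveFailed(String),
}

impl fmt::Display for ArchivalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArchivalError::RateLimited { retry_after_secs } => {
                write!(f, "rate limited, retry after {retry_after_secs}s")
            }
            ArchivalError::SaveFailed(url) => write!(f, "failed to save {url}"),
        }
    }
}

impl std::error::Error for ArchivalError {}

fn main() {
    let err = ArchivalError::RateLimited { retry_after_secs: 30 };
    assert_eq!(err.to_string(), "rate limited, retry after 30s");
}
```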
      • theflash[m]
        akshaaatt[m]: hey, I have implemented pagination in the feed; when I am using LazyVStack, duplicate events are not being loaded at once
      • yellowhatpro[m]
        I will be focusing on these 2 points in the current pr:
      • - api mocking
      • - dealing with rate limiting
      • bitmap[m]
        you will add some tests for make_archival_network_request using the API mocking, yes?
      • yellowhatpro[m]
        Yes will add tests for this
      • bitmap[m]
        besides the lack of tests and RustyNova's suggestions I think it looks pretty good
      • but I'd make sure to remove your API key too
      • yvanzo[m]
        It should be retrieved from a configuration file instead.
      • yellowhatpro[m]
        aah yes. Will remove it soon.
      • Also, should we use some MeB account for archiving?
      • yvanzo[m]
        Maybe for deployment. Does the account matter for development?
      • yellowhatpro[m]
        Nope, for dev I am using my own id and key
      • yvanzo[m]
        Having a configuration file should probably be a priority as there are a number of hard-coded values in the code that would better fit that too.
      • yellowhatpro[m]
        yvanzo[m]: Yeah, I will add them in the env file itself; I thought I'd add it in the final commits of the PR. The current credentials don't matter much
      • bitmap[m]
        for MBS we wrote a small service that mocks the IA's S3 API, you could also write something similar here that mocks the /save endpoint (for development)
      • yellowhatpro[m]
        bitmap[m]: Oh, we are using IA's API in MBS? Are we dealing with rate limiting in that as well?
      • yvanzo[m]
        yellowhatpro: `.env` might be too limited. A TOML file might be more appropriate. See the crate `config` for example.
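A sketch of what such a TOML file might look like for this service; every key name here is illustrative, not from the actual repository:

```toml
# Hypothetical configuration for the archival service.
[polling]
interval_secs = 10          # gap between polls of the edit tables

[rate_limit]
requests_per_sec = 1        # crude limit, as discussed below

[internet_archive]
# Credentials read at startup rather than hard-coded in the source.
access_key = "CHANGE_ME"
secret_key = "CHANGE_ME"
```

The `config` crate can layer a file like this with environment-variable overrides, which covers the `.env` use case too.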
      • yellowhatpro[m]
        yvanzo[m]: ok will make it work soon
      • Okk, gonna explore this_error and config crates then
      • yvanzo[m]
        No, we aren’t using the same API from MBS.
      • bitmap[m]
        yellowhatpro[m]: not really, we just sleep 1s between each event (but each event may take 1-2s to process). if we hit the rate limit (which is rare), it's just retried later
      • yellowhatpro[m]
        ohh alright.
      • bitmap[m]
        but yes, it's a completely different API
      • yellowhatpro[m]
        bitmap[m]: Right. I should try something similar.
      • Maybe I should just apply some math; since I am mostly polling, the time can be configured
      • bitmap[m]
        for importing existing edits we may want something that takes better advantage of the rate limit though, since that process will take a while
      • yvanzo[m]
        It is okay to start with simplistic rate limiting indeed. It can be improved later on, once everything starts working together.
      • Ideally, it should be what bitmap mentioned: different threads or processes to handle polling and requesting.
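A minimal sketch of the simplistic approach described above (a fixed sleep between requests, as MBS does), using only the Rust standard library; the function and parameter names are made up for illustration:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical poll cycle: do one unit of work, then sleep a fixed
/// interval before the next request. `poll_interval` would come from
/// configuration rather than being hard-coded.
fn run_poll_cycle(process: impl Fn(), poll_interval: Duration) {
    process(); // archive / enqueue whatever this poll found
    sleep(poll_interval); // crude rate limiting: fixed gap between requests
}

fn main() {
    let interval = Duration::from_millis(50);
    let start = Instant::now();
    for _ in 0..3 {
        run_poll_cycle(|| { /* pretend work */ }, interval);
    }
    // three cycles must take at least 3 * interval in total
    assert!(start.elapsed() >= interval * 3);
}
```

This is deliberately single-threaded; moving polling and archiving to separate threads (as mentioned above) doesn't change the shape of the per-request gap.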
      • yellowhatpro[m]
        Ummm a doubt here
      • Ok nvm. I thought you meant we have to create multiple threads for requesting
      • yvanzo[m]: Yupp I am running polling and archiving in different threads
      • yvanzo[m]
        Threading isn’t in the main goals, so at most just make a note about it for the stretch goals if you want to remind about it.
      • Great if you have some kind of threading already. :)
      • Yes, multiple threads for requesting might be a thing if we can be allowed a higher rate limit.
      • yellowhatpro[m]
        <bitmap[m]> "for importing existing edits..." <- Regarding this, If we are rate limited, then should I focus on maximizing the requests.
      • For ex, if I don't have any URL to process in current poll (while making a request), should I devote that time to archive the existing ones?
      • yvanzo[m]: Ohh, did you mean archiving x URLs in parallel?
      • yvanzo[m]
        Yes (as stretch goals)
      • yellowhatpro[m]
        Got it ✅
      • bitmap[m]
        yellowhatpro[m]: > <@yellowhatpro:matrix.org> Regarding this, If we are rate limited, then should I focus on maximizing the requests.
      • > For ex, if I don't have any URL to process in current poll (while making a request), should I devote that time to archive the existing ones?
      • if there's still work to do you should maximize the requests you can do, ideally
      • but you can start with something simple as yvanzo said
      • yellowhatpro[m]
        btw there has to be another task that will do the cleanup/re-archival of URLs that couldn't get archived the first time. That will also repeat after x amount of time. It has to be done after I am done with the archival part
      • bitmap[m]
        not sure what you meant by "archive the existing ones" though, do you mean older edits?
      • yellowhatpro[m]
        <bitmap[m]> "for importing existing edits..." <- yupp older ones. I thought you were referring to them when you said importing existing edits
      • bitmap[m]
        yeah, I was, but I was under the impression that there was only one edit counter that is incremented; so it starts from the beginning, and doesn't process new edits until all previous ones have been processed
      • yellowhatpro[m]
        I mean it's configurable, we can either have it start from the beginning, or from the latest one as well
      • I haven't really thought about which is the better option. But later if we go with the trigger impl, we will have to start with the latest edits, which keep on incrementing.
      • But in any case, I will try to archive all the previous ones as well
      • yvanzo[m]
        yellowhatpro: There is a feature in GitHub to mark your PRs as drafts if needed.
      • yellowhatpro[m]
        ok, should I make the wip PR draft?
      • yvanzo[m]
        It seems to be synonymous indeed :)
      • yellowhatpro[m]
        Okii made it a draft one
      • bitmap[m]
        <yellowhatpro[m]> "I mean its configurable, we..." <- I assumed "if I don't have any URL to process in current poll (while making a request), should I devote that time to archive the existing ones" was within a single process -- i.e. how are you keeping track of which edits have been processed in that case
      • yellowhatpro[m]
        "Edits processed" means when I am adding them to the `internet_archive_urls` table, right?
      • I am just tracking the last edit in that case
      • Sorry I get suuper confused sometimes
      • yvanzo[m]
        No worries, it should become more clear once you have API requests in the loop.
      • bitmap[m]
        maybe I misunderstood you :) I thought you were talking about prioritizing the processing of new (recent) URLs, and then processing old (existing) URLs only if there are no recent ones polled -- which would require separate counters
      • btw, if the service is stopped, where are edit_note_start_idx and edit_data_start_idx read from such that it can continue from where it left off?
      • yellowhatpro[m]
        from internet_archive_urls table : https://github.com/yellowHatpro/mb-exurl-ia-ser...
      • Welp I didn't write a clear function doc for it
      • Will update that
      • akshaaatt[m]
        On it mayhem !
      • theflash__ (IRC): can you elaborate more on the issue?
      • bitmap[m]
        yellowhatpro[m]: hmm, doesn't that only track the last edit with a save-able URL? many edit notes won't have URLs for example
      • BrainzGit
        [listenbrainz-server] 14MonkeyDo opened pull request #2937 (03master…entity-stats-page): LB-1102: Revamp Top Entity stats pages https://github.com/metabrainz/listenbrainz-serv...
      • yellowhatpro[m]
        Yeah, right. But internet_archive_urls is the only place for now where I can look for the data. Is there any other way where I can keep the latest edit data and edit note id?
      • yvanzo[m]
        Probably a separate table last_processed_rows
      • bitmap[m]
        you could introduce a new table to store them
      • yellowhatpro[m]
        alright then, a new table coming right up ✅
      • bitmap[m]
        that's why I was asking about prioritizing recent edits (so that they are archived right away such that the state of the page at the time the edit or note was entered is preserved) over older ones
      • yellowhatpro[m]
        what do you refer to when you say the state of the page??
      • The recent rows ?
      • bitmap[m]
        the content of the page being archived
      • yvanzo[m]
        bitmap: Prioritizing recent edits certainly is a longer term goal.
      • bitmap[m]
        with the last_processed_rows table you could potentially keep separate counters for recent vs. historical edits later on
      • yellowhatpro[m]
        bitmap[m]: oh nice, now I am able to process things
      • yvanzo[m]
        It will have to be as flexible as possible, but if we just start with one row pointer per table, that would be a good start.
      • yellowhatpro[m]
        cool, each row in last_processed_rows pointing to the latest processed rows of different tables (edit_data and edit_note currently)
      • latest processed during polling, regardless of whether it contains a URL or not
      • yvanzo[m]
        yellowhatpro: At first glance, what columns do you imagine for this new table?
      • yellowhatpro[m]
        id, latest_row_processed, table_name
      • as of now
      • yvanzo[m]
        Yup, even though id is probably unneeded (or I’m missing the point).
      • yellowhatpro[m]
        yeah it's not needed
      • yvanzo[m]
        Or just use it to refer to the id in the other table?
      • pranav[m]
        akshaaatt: I’ll try to get the stats page in soon before mid term evals
      • yellowhatpro[m]
        yeah right
      • yvanzo[m]
        You might also need a `column` column, as not every table has an `id` column.
      • (or `id_column` if it helps with clarity)
      • yellowhatpro[m]
        id_column will refer to id in case of edit_note and edit in case of edit_data, right?
      • yvanzo[m]
        That should work.
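Putting the discussed columns together, the new table might look roughly like this (the DDL is a sketch following the discussion above, not the actual migration):

```sql
-- One row per polled source table, remembering how far polling got.
CREATE TABLE last_processed_rows (
    table_name           TEXT PRIMARY KEY,  -- e.g. 'edit_note' or 'edit_data'
    id_column            TEXT NOT NULL,     -- 'id' for edit_note, 'edit' for edit_data
    latest_row_processed BIGINT NOT NULL    -- last value of id_column seen during polling
);
```

Adding a second row pointer per table later (recent vs. historical edits, as bitmap suggests below in the log) would only need an extra discriminator column.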
      • discordbrainz
        <05rustynova> bitmap: "not really, we just sleep 1s between each event..." That's the easy part. Now deal with an async and parallel environment and it starts messing itself up in .23 femtoseconds. Either you do the clean way and use semaphores, holding permits until the next refresh window, or you do the ugly way and just hold a mutex until prev_request_start + 1. I had to do the latter one for MB_RS as it doesn't have the http headers needed to get the number of tickets per window: https://github.com/RustyNova016/musicbrainz_rs_...
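A std-only sketch of the "hold a mutex until prev_request_start + interval" scheme described above; the struct and method names are invented for illustration (the real MB_RS code uses async primitives rather than blocking threads):

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical blocking rate limiter: callers serialize on the mutex,
/// and each caller sleeps (while holding the lock) until one full
/// interval has passed since the previous permitted request.
struct RateLimiter {
    interval: Duration,
    last: Mutex<Option<Instant>>,
}

impl RateLimiter {
    fn new(interval: Duration) -> Self {
        RateLimiter { interval, last: Mutex::new(None) }
    }

    /// Blocks until at least `interval` has elapsed since the previous
    /// request, then records the new request start time.
    fn acquire(&self) {
        let mut last = self.last.lock().unwrap();
        if let Some(prev) = *last {
            let target = prev + self.interval;
            let now = Instant::now();
            if now < target {
                thread::sleep(target - now);
            }
        }
        *last = Some(Instant::now());
    }
}

fn main() {
    let limiter = RateLimiter::new(Duration::from_millis(20));
    let start = Instant::now();
    limiter.acquire();
    limiter.acquire();
    limiter.acquire();
    // three acquisitions: at least two full intervals must elapse
    assert!(start.elapsed() >= Duration::from_millis(40));
}
```

The semaphore variant rustynova mentions releases permits on a timer instead, which avoids holding a lock across the sleep.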
      • yellowhatpro[m]
        Cool
      • yvanzo[m]
        bitmap, yellowhatpro: Anything else to be discussed about the archiver project? :)
      • discordbrainz
        <05rustynova> As for the DB schema, why not just do an ORDER BY on an inserted_at column?
      • bitmap[m]
        yvanzo[m]: nothing else from me right now :) thanks!
      • yellowhatpro[m]
        yvanzo[m]: Nothing in particular, got the notes for today