#metabrainz


      • yvanzo[m]
        It is exactly what we discussed last week.
      • It is just another way to invoke the app.
      • bitmap[m]
        yvanzo[m]: right, I guess the idea is that the polling task would start from the very first edit (another scenario that triggers couldn't cover)
      • yvanzo[m]
        That wasn’t my idea, but it seems even better.
      • yellowhatpro[m]
        I was thinking of archiving the URLs that are there before the service starts, but should we be doing it simultaneously, while the archiver is archiving the current set of URLs?
      • yvanzo[m]
        That is a possibility.
      • yellowhatpro[m]
        But I am limiting my thoughts due to rate limiting as well.
      • I know that we won't be getting many URLs per edit/edit note
      • bitmap[m]
        I guess it depends on how fast it can archive the existing URLs
      • yellowhatpro[m]
        * edit/edit note, but the previous ones are still many right
      • yvanzo[m]
        It will be rate limited by the API.
      • yellowhatpro[m]
        bitmap[m]: But we are still capped by the IA rate limit, right?
      • yvanzo[m]
        I don’t see any test in the repo.
      • bitmap[m]
        yes, you can do a rough calculation based on the IA rate limit and number of existing URLs
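A minimal sketch of the rough calculation mentioned above, assuming a flat per-minute request budget. The function name and the figures used with it are hypothetical; the actual IA rate limit and the real number of existing URLs are not stated in this discussion.

```rust
use std::time::Duration;

/// Estimates how long it takes to submit `pending_urls` archive requests
/// when at most `requests_per_minute` requests may be sent.
/// Returns `None` when the rate is zero, since the backlog would never drain.
fn estimate_backlog_duration(pending_urls: u64, requests_per_minute: u64) -> Option<Duration> {
    if requests_per_minute == 0 {
        return None;
    }
    // Ceiling division so that a partially used time slot still counts.
    let seconds = (pending_urls * 60 + requests_per_minute - 1) / requests_per_minute;
    Some(Duration::from_secs(seconds))
}
```

With made-up inputs of 1,000,000 pending URLs and 10 requests per minute, this gives 6,000,000 seconds, roughly 69 days. Numbers of that order would be one argument for archiving the backlog alongside new edits rather than before them.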
      • yvanzo[m]
        Do you have a mock API for testing, or is it always running with the real API?
      • yellowhatpro[m]
        yvanzo[m]: yeah sorry, I was exploring tests this past week.
      • Didn't add any rn. Will start writing them
      • yvanzo[m]: Currently there is no mock API. I am assuming the URL to be saved, and then it would return some JOB_ID, which I would update in the internet_archive_urls table
      • But should I create any mock API?
      • yvanzo[m]
        It would make sense to start with unit and doc testing: https://doc.rust-lang.org/rust-by-example/testi...
      • yellowhatpro[m]
        yvanzo[m]: Alright, I'll start adding tests then ✅
      • I wanted to ask more about the URL filtering, like what URLs to exclude
      • bitmap[m]
        yellowhatpro[m]: a mock API is also useful for testing what happens when an archive request fails
      • yellowhatpro[m]
        bitmap[m]: I see. I will try to keep it in another repo then, that's fine right?
      • yellowhatpro[m]: Regarding this, I have an idea to keep a blacklist, and URLs matching its entries will not be inserted into the table
      • What other aspects should we cover in URLs filtering?
      • yvanzo[m]
        Some active crates to mock HTTP requests are listed at https://github.com/leoschwarz/reqwest_mock#readme
      • In MBS, we do record/replay HTTP requests for integration tests.
      • bitmap[m]
        <yellowhatpro[m]> "I see. I will try to keep it..." <- I wouldn't expect a separate repo to be needed; rather, use an existing mocking crate like the ones yvanzo referenced
      • yellowhatpro[m]
        Yupp, will add the required mocks then.
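One way to test what happens when an archive request fails, without a separate repo or a real HTTP call, is to hide the archive service behind a trait and substitute a fake in tests. This is only a sketch: `ArchiveClient`, `FakeArchive`, `SaveError`, `UrlStatus` and `archive_url` are hypothetical names, not taken from the project, and the mocking crates linked above would replace the hand-written fake for HTTP-level tests.

```rust
/// Ways a save request can fail (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum SaveError {
    RateLimited,
    Rejected,
}

/// Hypothetical abstraction over the archive service so tests can swap in a fake.
trait ArchiveClient {
    /// Returns a job identifier on success.
    fn save(&self, url: &str) -> Result<String, SaveError>;
}

/// Test double: fails with the configured error, or succeeds with a fixed job id.
struct FakeArchive {
    fail_with: Option<SaveError>,
}

impl ArchiveClient for FakeArchive {
    fn save(&self, _url: &str) -> Result<String, SaveError> {
        match &self.fail_with {
            Some(error) => Err(error.clone()),
            None => Ok("fake-job-1".to_string()),
        }
    }
}

/// What the archiver would record for a URL after one attempt.
#[derive(Debug, PartialEq)]
enum UrlStatus {
    Submitted { job_id: String },
    RetryLater,
    Failed,
}

/// Code under test: maps the outcome of one save attempt to a status.
fn archive_url(client: &dyn ArchiveClient, url: &str) -> UrlStatus {
    match client.save(url) {
        Ok(job_id) => UrlStatus::Submitted { job_id },
        Err(SaveError::RateLimited) => UrlStatus::RetryLater,
        Err(SaveError::Rejected) => UrlStatus::Failed,
    }
}
```

A test can then call `archive_url(&FakeArchive { fail_with: Some(SaveError::RateLimited) }, "http://example.com/")` and assert that the result is `UrlStatus::RetryLater`.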
      • Any tip for db tests btw?
      • Like what areas should I cover in that aspect?
      • yvanzo[m]
        Start with something very simple, such as hand-crafted examples if needed.
      • bitmap[m]
        <yellowhatpro[m]> "What other aspects should we..." <- how are you generating this list? automatically adding domains that fail consistently? allowing manual entries?
      • yellowhatpro[m]
        Currently I was thinking of entering them manually, but I will check if there is any existing domain filtering list
      • Adding domains that fail consistently is one good thing I can implement
      • yvanzo[m]
        If you need an example to start with, musicbrainz.org URLs should be filtered out.
      • yellowhatpro[m]
        Like if there is any site which fails to archive even after x retries, I can probably add it to some domain filtering table
      • yvanzo[m]
        A manually-curated list would be nice to have just in case some domain names must be temporarily ignored.
      • yellowhatpro[m]
        Right
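A sketch of the domain filter discussed above, using only string operations so that it stays self-contained. The function names are hypothetical, the host extraction is deliberately simplistic (no IPv6 literals, no scheme-less URLs), and a real implementation would more likely rely on a proper URL parser. The only entry taken from the discussion is musicbrainz.org.

```rust
/// Extracts the lowercase host from an absolute http(s) URL.
/// Simplified on purpose: IPv6 literals and scheme-less URLs are not handled.
fn host_of(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop optional userinfo ("user@") and port (":8080").
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Returns true if the URL's host is one of `blocked_domains` or a subdomain of one.
fn is_excluded(url: &str, blocked_domains: &[&str]) -> bool {
    match host_of(url) {
        Some(host) => blocked_domains
            .iter()
            .any(|domain| host == *domain || host.ends_with(&format!(".{domain}"))),
        // Unparseable input is left to other validation steps (an assumption).
        None => false,
    }
}
```

For example, `is_excluded("https://beta.musicbrainz.org/", &["musicbrainz.org"])` is true, while a look-alike host such as `notmusicbrainz.org` is not matched. The same slice could be filled from a manually curated list, a table of consistently failing domains, or both.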
      • discordbrainz
        <rustynova> DB tests depend on your DB driver. Sqlx is auto testing on your DB at compile time. Add the serde feature and it's done.
      • <rustynova> CDD FTW
      • fletchto99_ joined the channel
      • yellowhatpro[m]
        Yupp I am using sqlx
      • Oh that's cool
      • fletchto99 has quit
      • fletchto99_ is now known as fletchto99
      • yvanzo[m]
        bitmap, yellowhatpro: What else did you want to discuss?
      • discordbrainz
        <rustynova> I know that sea orm also uses sqlx under the hood, but I don't know about the compile time checking. Although for ORMs it's just a matter of checking the schema in the code against the schema in the DB.
      • yellowhatpro[m]
        yvanzo[m]: I also remember reo once said that we should exclude spam users' edits as well iirc.
      • yvanzo[m]
        Yes but that is a separate issue.
      • Basically, when retrieving rows from edit_data, it should be checked that the edit author isn’t marked as spammer.
      • Same for edit_note.
      • Also consider edit_note_change for later on.
      • yellowhatpro[m]
        <yvanzo[m]> "That is one more use case for..." <- Also this. Sorry, I think I misunderstood earlier, but it is just a CLI script to write, right? Sorta simple methods to write entries to `internet_archive_urls`?
      • Can you elaborate a bit on this? I might have misunderstood it from the earlier meeting as well.
      • yvanzo[m]
        yellowhatpro: Ok, in “week 1”, “a small, workable app, which can perform the above-mentioned tasks” and “will be tested individually” made me think that the app would be able to perform any of these tasks individually when requested through arguments.
      • atj[m]
        zas, yvanzo: just FYI, now that we have a firewall I've updated Solr to listen on the wildcard address, so you can make API requests to localhost as well as the LAN IP. The HAProxy backend configuration is now the same on all nodes.
      • yellowhatpro[m]
        Oh got it. What I meant was to work on them separately, and check whether I was able to get the correct result from each.
      • atj[m]
        I've force-pushed the updates to my solr branch in the metabrainz-ansible repo
      • yvanzo[m]
        For example: To queue a URL for archiving: `melba --queue-url http://example.com/`; to queue an edit for archiving the URLs it contains: `melba --queue-edit 42`
      • yellowhatpro[m]
        For that purpose, I thought working on separate branches would be fine
      • bitmap[m]
        <yvanzo[m]> "bitmap, yellowhatpro: What..." <- sorry I had to go afk for a moment, nothing specific from me, just to answer any of yellowhatpro's concerns
      • yvanzo[m]
        yellowhatpro[m]: It has to be in the same branch eventually.
      • yellowhatpro[m]
        yvanzo[m]: I didn't think it that way, but I can come up with cmd line args feature support
      • yvanzo[m]: Yupp got the point
      • bitmap[m]
        <yellowhatpro[m]> "Any tip for db tests btw?..." <- look at what DB queries the archiver is making, and make sure you have test data that satisfies each type of query. at least one example for each edit type
      • yvanzo[m]
        yellowhatpro[m]: See the crate clap
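To illustrate the two invocations suggested earlier (`--queue-url` and `--queue-edit`), here is a sketch that parses them with the standard library only, so it stays self-contained. clap, as suggested above, would normally replace this hand-written parsing. The `Command` enum, the `parse_command` function and the choice of `u64` for edit IDs are assumptions made for the example.

```rust
/// Task requested on the command line; flag names follow the examples in the chat.
#[derive(Debug, PartialEq)]
enum Command {
    QueueUrl(String),
    QueueEdit(u64),
}

/// Parses the arguments that follow the program name.
fn parse_command<I>(args: I) -> Result<Command, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let flag = args.next().ok_or("expected --queue-url or --queue-edit")?;
    let value = args
        .next()
        .ok_or_else(|| format!("{flag} requires a value"))?;
    match flag.as_str() {
        "--queue-url" => Ok(Command::QueueUrl(value)),
        "--queue-edit" => value
            .parse::<u64>()
            .map(Command::QueueEdit)
            .map_err(|err| format!("invalid edit id {value:?}: {err}")),
        other => Err(format!("unknown option {other}")),
    }
}
```

In a binary this would be called as `parse_command(std::env::args().skip(1))`, and each `Command` variant would then dispatch to the matching task, which also makes every task easy to trigger on its own while debugging.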
      • yellowhatpro[m]
        bitmap[m]: Got it, will work on this ✅
      • yvanzo[m]: yeah I have used it in the past so will impl it soon
      • So to summarize... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • I should cover these aspects then
      • yvanzo[m]
        Did you have any other question about transformations?
      • yellowhatpro[m]
        Not at the moment, I was only curious about the filtering part, and any other edit types
      • Oh btw
      • For extracting URL from text, I am using a crate Linkify: https://docs.rs/linkify/0.10.0/linkify/index.html
      • So I might have to filter emails as well.
      • And while I am speaking of emails, I realised there are other types such as media (images ending with .jpeg)
      • Do we filter media items as well?
      • yvanzo[m]
        yellowhatpro[m]: Linkify can be restricted to URLs only.
      • It can also handle scheme-less URLs.
      • yellowhatpro[m]
        yvanzo[m]: yeah, `finder.kinds(&[LinkKind::Url]);` ✅
      • yvanzo[m]
        Even .jpeg images can be used as references, so it is better to archive these too.
      • yellowhatpro[m]
        Alrightyy. media stays
      • yvanzo[m]
        About your summary: Tests can be split into documentation tests (the easiest to start with) and unit tests (simple to add as well and very useful). Integration tests can be delayed until there is proper mocking of SQL and HTTP.
      • I would suggest prioritizing tests first (as they can help with spotting basic issues), then the command line (as it can help with debugging), and then the rest.
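As an illustration of the two kinds of tests suggested above, here is a small sketch with a documentation test in the doc comment and a unit test module below it. The function `is_http_url` and the crate name `melba` used in the doc example are assumptions. Documentation tests are run by `cargo test` only for library crates, and the tilde fence is used here only to avoid nesting backtick fences.

```rust
/// Returns `true` if `candidate` starts with an `http://` or `https://` scheme.
///
/// The example below is picked up by `cargo test` as a documentation test,
/// assuming this is a public function of a library crate named `melba`.
///
/// ~~~
/// assert!(melba::is_http_url("https://example.com/"));
/// assert!(!melba::is_http_url("mailto:someone@example.com"));
/// ~~~
pub fn is_http_url(candidate: &str) -> bool {
    candidate.starts_with("http://") || candidate.starts_with("https://")
}

#[cfg(test)]
mod tests {
    use super::*;

    // Unit tests live next to the code and are compiled only for `cargo test`.
    #[test]
    fn accepts_both_schemes() {
        assert!(is_http_url("http://example.com/"));
        assert!(is_http_url("https://example.com/a.jpeg"));
    }

    #[test]
    fn rejects_other_schemes() {
        assert!(!is_http_url("ftp://example.com/"));
        assert!(!is_http_url("example.com"));
    }
}
```

Neither kind of test needs a database or network access, which is why they can come before integration tests that depend on SQL and HTTP mocking.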
      • yellowhatpro[m]
        Thanks. working on it ✅
      • yvanzo[m]
        You’re doing well, keep it up! 😎
      • bitmap, yellowhatpro: Thanks for your time!
      • atj[m]
        Sounds like this is going to be a really useful project
      • I've looked at IA in vain many times trying to see if edits were correct etc.
      • yellowhatpro[m]
        yvanzo[m]: Thanks. I didn't outline many things earlier in the proposal, but as we move I can see there are many things to work on. Thanks y'all for the help, will make things work
      • atj[m]: Oh cool. Well that's motivation
      • atj[m]
        yvanzo: I don't know what your thoughts are but I feel like we either do the Solr switch tomorrow or wait 2 weeks until I'm back from holiday.
      • yvanzo[m]
        atj, bitmap: I patched all the MB web containers to allow switching to the new cluster (using HTTPS).
      • v6lur joined the channel
      • atj[m]
        Is there a new mb-solr release? I'm using a snapshot extracted from the Docker build I think
      • yvanzo[m]
        I worked on upgrading to Solr 9.6.1 too.
      • atj[m]
        You can do the Solr upgrade tomorrow using Ansible :)
      • it was super easy when I upgraded to 9.6.0
      • v6lur has quit
      • BrainzGit
        [mbsssss] yvanzo opened pull request #66 (master…solr-9-config): Update the general configuration file to Solr 9 https://github.com/metabrainz/mbsssss/pull/66
      • [mmd-schema] yvanzo opened pull request #39 (master…update-java-deps): Update the dependencies of the Java Binding https://github.com/metabrainz/mmd-schema/pull/39
      • bitmap[m]
        reosarevok: yvanzo: MBS-13628 is caused by the production-cron container only having stub edit classes for the EAA edit types, which don't implement the proper `alter_edit_pending` methods.
      • I'm wondering if we should temporarily deploy the beta image there, since it would be difficult to extract all of the necessary bits of code needed to the production branch (and complicate reverting the changes later)
      • BrainzBot
        MBS-13628: Events with added event art remain hightlighted after all edits are applied. https://tickets.metabrainz.org/browse/MBS-13628
      • reosarevok[m]
        bitmap, yvanzo: when are we planning to just release prod?
      • bitmap[m]
        I mean, I'd be fine with doing that this week
      • yvanzo[m]
        There are two tickets reported about beta.
      • bitmap[m]
        we don't have any documentation prepared for the EAA yet though
      • yvanzo[m]
        MBS-13613
      • BrainzBot
        MBS-13613: Beta: Show secondary event art if no poster is available https://tickets.metabrainz.org/browse/MBS-13613
      • yvanzo[m]
        and MBS-13602 which is a reminder of https://github.com/metabrainz/musicbrainz-serve...
      • BrainzBot
        MBS-13602: Beta: docs for Event Art/Types needed https://tickets.metabrainz.org/browse/MBS-13602
      • reosarevok[m]
        We have type descriptions, so that's trivial, it just needs copying from https://beta.musicbrainz.org/admin/attributes/E... (ideally we'd actually make that public but the PR doing that is still not approved IIRC so we can put them on the wiki for now)
      • bitmap[m]
        BrainzBot: determining which images are marked as front is unfortunately a schema change
      • yvanzo[m]
        bitmap: It can be useful to deploy beta to production-cron to also test using the PG standby.
      • reosarevok[m]
        The non-poster one very much does not seem in any case like it needs to happen in beta
      • Oh, if it can't, more for that, but in any case, it's not a beta ticket really, it's a perfectly fine long term improvement suggestion
      • yvanzo[m]
        It isn’t complicated to implement either.
      • We can make a decision at least before forgetting about it.
      • bitmap[m]
        I was also thinking we were just querying /front, but that's not accurate, we should just be able to query for a fallback image from the database
      • but indexing them as front images is a schema change at least
      • reosarevok[m]
        Allowing something like flyer seems fine to me
      • yvanzo[m]
        front is for release? are we discussing events?
      • reosarevok[m]
        As fallback
      • Yes, events, but it's still the concept of "frontiest" because there's no better way :)
      • yvanzo[m]
        Ok, front is also used as internal verbiage by the EAA.
      • Or it is rather a copy/paste from the CAA. :)
      • So basically, just replace `type_id = 1` with `type_id IN (1, 2)` (or any relevant IDs).
      • bitmap[m]
        Yep, something like that
      • yvanzo[m]
        Banner too? Then 1, 4, 6?
      • v6lur joined the channel
      • reosarevok[m]
        Banner is... interesting, because it's usually horizontal so a weird fit for the sidebar
      • yvanzo[m]
        Would it be worse than no image at all?
      • We should also order by preferred type: Poster > Flyer (> Banner if retained)
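The preference order proposed above (Poster > Flyer > Banner) can be sketched as a ranking over art types. This is only an illustration in Rust, not how the server implements it: the `ArtType` and `EventArt` types, the `Other` variant, and the function names are hypothetical, the real type IDs are not used, and whether Banner is kept as a fallback was still undecided.

```rust
/// Event art types named in the discussion, plus a hypothetical catch-all.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArtType {
    Poster,
    Flyer,
    Banner,
    Other,
}

#[derive(Debug, PartialEq)]
struct EventArt {
    id: u64,
    art_type: ArtType,
}

/// Lower rank means more preferred; `None` means never used as a fallback.
fn preference(art_type: ArtType) -> Option<u8> {
    match art_type {
        ArtType::Poster => Some(0),
        ArtType::Flyer => Some(1),
        ArtType::Banner => Some(2),
        ArtType::Other => None,
    }
}

/// Picks the most preferred image; ties keep the earliest image in the slice.
fn pick_sidebar_image(images: &[EventArt]) -> Option<&EventArt> {
    images
        .iter()
        .filter_map(|image| preference(image.art_type).map(|rank| (rank, image)))
        .min_by_key(|(rank, _)| *rank)
        .map(|(_, image)| image)
}
```

Dropping Banner from the fallback would only require returning `None` for it in `preference`.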
      • BrainzGit
        [musicbrainz-server] mwiencek opened pull request #3291 (beta…mbs-13629): MBS-13629: Beta: ISE loading "Remove cover art" edit for removed release https://github.com/metabrainz/musicbrainz-serve...
      • [bookbrainz-site] MonkeyDo merged pull request #1097 (master…dependabot/npm_and_yarn/braces-3.0.3): chore(deps): bump braces from 3.0.2 to 3.0.3 https://github.com/metabrainz/bookbrainz-site/p...
      • v6lur has quit
      • [bookbrainz-site] MonkeyDo merged pull request #1099 (master…dependabot/npm_and_yarn/ws-7.5.10): chore(deps): bump ws from 7.5.6 to 7.5.10 https://github.com/metabrainz/bookbrainz-site/p...
      • [mbsssss] yvanzo merged pull request #66 (master…solr-9-config): Update the general configuration file to Solr 9 https://github.com/metabrainz/mbsssss/pull/66