      • reosarevok
        Updating beta
      • Updating prod
      • Update done
      • chaban
        And so it begins. WhatGear links are removed: https://musicbrainz.org/edit/78092748
      • reosarevok
        Are these links useful? Should we be storing them but under otherdbs or something?
      • Or are these just added by someone promoting their own site and not worth storing?
      • chaban
      • On one hand it looks like some SEO site, on the other hand the info looks truthful. There is often proof even.
      • Looks like I just found out what the eb_ suffix means in page source, e.g. eb_twitter on https://whatgear.com/pro/skrillex
      • https://equipboard.com/pros/skrillex that site and its URL structure look strikingly similar to WhatGear
      • Experimental new site? Copycat? ¯\_(ツ)_/¯
      • Speaking of gear: https://musicbrainz.org/edit/78065634 that information is from Sound Credit
      • Discovered it a few days ago and added it to the wiki: https://wiki.musicbrainz.org/index.php?title=Ot...
      • At the same time I learned that you can now get ISNIs for free: https://blog.soundcredit.com/post/Music-Industr...
      • nifemi joined the channel
      • rxrog
        Hi everyone
      • ROpdebee
      • chaban
        You made it. Welcome to the 80s
      • ROpdebee
      • i've been meaning to join for a while now, but never found the time for it :P
      • anyways, I've just finished some talks with ArchiveTeam, they're going to queue up all URLs from the latest MB dump later today into one of their projects, and set up a live data feed to archive new URLs as they are added
      • those will eventually be injected into the wayback machine
      • BrainzBot
        MBS-9009: Every time a Homepage/Blog/Discography/Biography URL is submitted to MB, it should also be submitted to the Wayback Machine
      • reosarevok
      • ROpdebee
        edit notes aren't included in the live data feed though, so those won't be archived automatically (yet). reosarevok: Is there any way we could get those in a feed too?
      • reosarevok
        ROpdebee: probably not, since they're not meant to be public-public (they require login)
        ah I see, that makes sense
      • URL entities are now being added: https://tracker.archiveteam.org/urls/
      • darwin
        pretty happy about this, for like when things get removed from beatport or bandcamp
      • useful to have an archive to be able to refer back to
      • musicfan
        I am attempting to add a band named Soraia (https://www.soraia.com/). There's already another unrelated entry with that name, so I'm attempting to disambiguate, but the disambiguation field remains red no matter what, which does not allow me to submit. Is this a known issue or am I doing something wrong?
      • finalsummer
        whatgear looks like a SEO spam site and should be blacklisted. equipboard looks legitimate and has actual user contributions, whatgear looks like just outdated scraped(?) info from the former
      • https://musicbrainz.org/edit/77426700 okay this is 100% a SEO spam account, ban? (ping reosarevok)
      • crism
        Yeah. Added a ton of “official home page” links which clearly aren’t. When called out on one, said, “Oops!”
      • User reported.
      • CatQuest
        [14:46] <ROpdebee> anyways, I've just finished some talks with ArchiveTeam, they're going to queue up all URLs from the latest MB dump later today into one of their projects, and set up a live data feed to archive new URLs as they are added
      • wait so urls in the edit notes or what?
      • because omg
      • BUT url entities automatically being put in the IA?
      • 👏 🎉 👏
      • ROpdebee: whooooo
      • [14:47] <ROpdebee> those will eventually be injected into the wayback machine
      • I'm very excited about this!!!!
      • (even if it's not edit note urls today)
      • musicfan: hi! can you give a screenshot of how you're doing it?
      • ROpdebee
        Well, currently most of the URL entities in the 2021-03-13 mbdump have been grabbed by archiveteam. they'll be uploaded to IA and injected into the wayback machine sometime soon. new or updated URL entities should be grabbed via the live data feed, I've been told that'll be set up later today
      • as for edit notes and annotations, those will likely be done every three days with new data from the dumps, but i'm still working on extracting the URLs as i'd like to replicate the way the MB server does it to make sure we're parsing them consistently
      • CatQuest
        like reo said I'm not sure edit notes are possible :/ but annotation ones should be
      • but this is *such* a help! so many old urls are gone and they were the proof or input for many releases. some releases you can't even find any more
      • I wonder 🤔 would it be possible to get a simple list of urls that no longer resolve + weren't already in the ia?
      • or rather maybe not a list but to see the number of them
      • ROpdebee
        probably not, the effort that would take would be equivalent to the actual archival
      • CatQuest
        hah, alright
      • ROpdebee
        or maybe a bit less, but would still take as many requests (>6M for URL entities alone)
      • CatQuest
        yea not viable
      • anyway this is excellent news! this exact thing is something I've always worried about. it should have been in function ages ago <3
      • but from now on new urls should be caught so no *new* urls are "lost"
      • oh. btw maybe you should also do this with BookBrainz.org
      • still fairly undeveloped but should also eventually be able to link to all kinds of things
      • would be great to have the url archiving from the get-go
      • ROpdebee
        what could be useful though, is a periodic dump of all recently "used" URLs on MB
      • CatQuest
        used how?
      • entered?
      • ROpdebee
        say once a day a file is uploaded to some FTP with URLs that have been entered into edit notes, annotations, URL entities which have been edited (or their ARs added/removed/edited)
      • CatQuest
      • reosarevok: ?
      • ROpdebee
        that file could be injected into AT's queue immediately with little effort, and you'd get a snapshot of the URLs in the exact state as when they were used
      • CatQuest
        reosarevok: how hard would it be to create a dump of the urls entered into edit notes?
      • i personally don't know the data (just a long standing editor+BBstylecat and MB Instrument Inserter), you'll wanna talk to reo, yvanzo, zas, etc
      • ROpdebee
        yeah i'm just throwing out ideas, now we'd have to download and process fairly large DB dumps to get just a couple thousand new links every three days
      • to be clear though, we can still get the urls in edit notes from one of the dump files
      • reosarevok
        bitmap: ^ does this seem like something that could be done kinda like with the json dumps?
      • ROpdebee
        also, couple of caveats: The project it's being inserted into currently doesn't archive page requisites (images, css, js, etc) but it's better than nothing I guess
      • and Spotify links probably aren't useful, since it loads data dynamically from the API, and those responses aren't captured either. so you'll just get broken pages :( I told them about this, but it's a wontfix situation
        reosarevok: sounds very plausible yes
        ROpdebee: awesome, archiving URLs is something we wanted for a long time, it's even one of ideas for GSoC: https://musicbrainz.org/doc/Development/Summer_...
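
The periodic "recently used URLs" file discussed above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the regular expression, the trailing-punctuation rule, and the names `extract_urls` and `daily_url_list` are hypothetical, and the MB server's own URL detection in edit notes and annotations may well differ (which is exactly the consistency concern ROpdebee raises).

```python
import re

# Assumption: a deliberately simple pattern. It is NOT the MusicBrainz
# server's URL detection, which may treat edge cases differently.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Punctuation that usually belongs to the surrounding sentence, not the URL.
# Caveat: stripping ")" will also truncate URLs that really end in ")".
TRAILING = ".,;:!?)]}"


def extract_urls(text):
    """Return the URLs found in free text, in order of appearance."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING)
        if url:
            urls.append(url)
    return urls


def daily_url_list(texts, already_queued=frozenset()):
    """Collect URLs from many texts, skipping duplicates and queued ones."""
    seen = set(already_queued)
    fresh = []
    for text in texts:
        for url in extract_urls(text):
            if url not in seen:
                seen.add(url)
                fresh.append(url)
    return fresh


if __name__ == "__main__":
    notes = [
        "Source: https://www.soraia.com/, see also https://equipboard.com/pros/skrillex.",
        "Same band as https://www.soraia.com/",
    ]
    # One URL per line, the shape a daily file for an archival queue might take.
    print("\n".join(daily_url_list(notes)))
```

Passing the previous day's URLs as `already_queued` would keep each file limited to links not yet handed over, which is the "couple thousand new links" case mentioned in the discussion.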
