#metabrainz

      • CatQuest
        put perhaps as an image or something, to prevent spambots uh, reading it
      • 2022-06-16 16744, 2022

      • alastairp
        CatQuest: I don't think that's an issue - the trick is to prevent a bot from finding the site, seeing that it's a mediawiki, and then trying to create an account automatically
      • 2022-06-16 16721, 2022

      • alastairp
        99.9% of spammers who encounter the slightest amount of resistance are going to move on to the next site and not bother. I bet no one will even try and load the create user page and see why it failed
      • 2022-06-16 16739, 2022

      • CatQuest
        hmmmm
      • 2022-06-16 16741, 2022

      • Pratha-Fish
        alastairp: i.e. 0.301% of ALL UNIQUE rec-MBIDs are unknown. (Don't belong to the recording table OR have a valid redirect.)
      • 2022-06-16 16710, 2022
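
A minimal sketch of how a figure like this can be derived, assuming the unique recording MBIDs, the recording table, and the redirect table are available as plain Python collections. The function name and the toy identifiers are hypothetical; only the rule (unknown means not in the recording table and no valid redirect) comes from the message above.

```python
def find_unknown_mbids(scrobbled_mbids, recording_mbids, redirects):
    """Return (unknown, fraction) for the unique MBIDs in scrobbled_mbids.

    recording_mbids: set of MBIDs present in the recording table.
    redirects: dict mapping an old MBID to the MBID it redirects to.
    """
    unique = set(scrobbled_mbids)
    unknown = set()
    for mbid in unique:
        if mbid in recording_mbids:
            continue  # still a live recording
        if redirects.get(mbid) in recording_mbids:
            continue  # merged into a recording that still exists
        unknown.add(mbid)
    fraction = len(unknown) / len(unique) if unique else 0.0
    return unknown, fraction


# Toy data with made-up identifiers (not real MBIDs).
scrobbled = ["a", "a", "b", "c", "d"]
recordings = {"a", "x"}
redirects = {"b": "x", "c": "gone"}

unknown, fraction = find_unknown_mbids(scrobbled, recordings, redirects)
print(sorted(unknown), f"{fraction:.1%}")
```

Deduplicating first matters here: the percentage is over unique MBIDs, not over listens.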

      • CatQuest
        huh i'd be interested in a report of those
      • 2022-06-16 16714, 2022

      • alastairp
        Pratha-Fish: perfect! (although 0.3% is higher than the 0.2% that I computed yesterday :(, oh well)
      • 2022-06-16 16736, 2022

      • alastairp
        CatQuest: yeah, because at some point in time lastfm thought that they were valid recordings. so, were they deleted from MB?
      • 2022-06-16 16757, 2022

      • Pratha-Fish
        Still reasonable ig :))
      • 2022-06-16 16701, 2022

      • alastairp
        Pratha-Fish: please make a list of those mbids and put them in a pastebin for CatQuest
      • 2022-06-16 16720, 2022

      • Pratha-Fish
        I already have them ready 😎
      • 2022-06-16 16734, 2022

      • Pratha-Fish
      • 2022-06-16 16734, 2022

      • Pratha-Fish
        here you go ^
      • 2022-06-16 16754, 2022

      • Pratha-Fish
        ^ CatQuest alastairp
      • 2022-06-16 16702, 2022

      • alastairp
        what's the first column? row id?
      • 2022-06-16 16707, 2022

      • CatQuest
        uhh.
      • 2022-06-16 16717, 2022

      • CatQuest
        *would* like a full url if it was possible tbh
      • 2022-06-16 16745, 2022

      • alastairp
        could you recompute this including the timestamp of the first scrobble with that id? (it'd be great to take the number and convert it to a full datetime representation too)
      • 2022-06-16 16745, 2022

      • Pratha-Fish
        Ah that first column is residual index. give me a sec
      • 2022-06-16 16712, 2022

      • CatQuest
        that first one tho. 666 :eyes:
      • 2022-06-16 16741, 2022

      • CatQuest
      • 2022-06-16 16750, 2022

      • CatQuest
        so far the ones i've checked have all been gone
      • 2022-06-16 16706, 2022

      • CatQuest
        what I'm guessing is that these were removed from mb for various reasons.
      • 2022-06-16 16729, 2022

      • CatQuest
        leftover recordings from releases where users wrongly changed the recordings (instead of reusing/merging)
      • 2022-06-16 16741, 2022

      • alastairp
        CatQuest: right, none of these mbids are in MB
      • 2022-06-16 16744, 2022

      • CatQuest
        most likely the culprit
      • 2022-06-16 16704, 2022

      • Pratha-Fish
        alastairp: I think computing it with the timestamp would require computing the whole (duplicated) dataset.
      • 2022-06-16 16704, 2022

      • Pratha-Fish
        To do that, we might have to map the unique ID's computations to their duplicated counterparts in the complete dataset.
      • 2022-06-16 16707, 2022

      • alastairp
        what I'd love to know though is if we can find out what happened to them. if we loaded a 1 year old db dump would some of them show up?
      • 2022-06-16 16724, 2022

      • CatQuest
        possible
      • 2022-06-16 16734, 2022

      • alastairp
        Pratha-Fish: I don't know how to do it in pandas, but in regular python I'd do something like this:
      • 2022-06-16 16701, 2022

      • CatQuest
        you shouldn't do things inside pandas, too much eucalyptus
      • 2022-06-16 16736, 2022

      • Pratha-Fish
        CatQuest: That one's reserved for Koalas tho. Pandas eat bamboos
      • 2022-06-16 16713, 2022

      • CatQuest
        🤦
      • 2022-06-16 16716, 2022

      • CatQuest
        yes you're right
      • 2022-06-16 16726, 2022

      • CatQuest
        you shouldn't do things inside pandas, too much bamboo
      • 2022-06-16 16757, 2022

      • CatQuest
        ruined my own joke, thanks brain
      • 2022-06-16 16707, 2022

      • Pratha-Fish
        heh
      • 2022-06-16 16717, 2022

      • alastairp
      • 2022-06-16 16730, 2022

      • alastairp
        Pratha-Fish: this should be pretty fast, even without using a vectorised lookup
      • 2022-06-16 16755, 2022
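
The snippet alastairp pasted is not preserved in this log. As a minimal sketch of one plain-Python way to find the first scrobble for each unknown MBID, assuming the listens are available as (recording_mbid, timestamp) pairs; the function and variable names are hypothetical.

```python
def first_scrobble_per_mbid(listens, unknown_mbids):
    """Map each unknown MBID to the smallest timestamp it was scrobbled with."""
    unknown = set(unknown_mbids)  # set membership checks are O(1)
    earliest = {}
    for mbid, timestamp in listens:
        if mbid not in unknown:
            continue
        if mbid not in earliest or timestamp < earliest[mbid]:
            earliest[mbid] = timestamp
    return earliest


# Toy data with made-up identifiers and timestamps.
listens = [("a", 300), ("b", 200), ("a", 100), ("c", 400), ("a", 250)]
earliest = first_scrobble_per_mbid(listens, ["a", "c"])
print(earliest)
```

This is a single pass over the full (duplicated) dataset with a dictionary lookup per row, which is why it should stay reasonably fast without vectorisation.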

      • Pratha-Fish
        Let's give it a shot
      • 2022-06-16 16752, 2022

      • alastairp
        I'm not sure if this will give interesting results - because I don't know if lastfm did the recording lookup when the scrobble was added, or if it was re-processed in bulk at some later time
      • 2022-06-16 16709, 2022

      • CatQuest
        ... dang what is the wiki page where people put their names for the meeting
      • 2022-06-16 16717, 2022

      • CatQuest
        i can't find a link to it on https://wiki.musicbrainz.org/MetaBrainz_Meeting either
      • 2022-06-16 16755, 2022

      • Pratha-Fish
      • 2022-06-16 16705, 2022

      • CatQuest
        oh great
      • 2022-06-16 16710, 2022

      • CatQuest
        thanks
      • 2022-06-16 16722, 2022

      • Pratha-Fish
        alastairp: iirc all rows have a corresponding timestamp to them, so it shouldn't be a problem
      • 2022-06-16 16750, 2022

      • Pratha-Fish
        alastairp: Oh nvm it was easy enough (2 lines of code in pandas! That too with Cython speeds :)))
      • 2022-06-16 16707, 2022

      • alastairp
        great
      • 2022-06-16 16740, 2022

      • alastairp
        how does that work? filter the data frame to only mbids in the list of unknown mbids, and then order by date and pick the first of each duplicate?
      • 2022-06-16 16758, 2022

      • Pratha-Fish
        It simply does the following:
      • 2022-06-16 16758, 2022

      • Pratha-Fish
        1) Takes in a list of unk IDs
      • 2022-06-16 16758, 2022

      • Pratha-Fish
        2) Makes a bool_map of where the IDs lie in the main dataset
      • 2022-06-16 16758, 2022

      • Pratha-Fish
        3) Uses that filter to get locations where these IDs lie
      • 2022-06-16 16712, 2022
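
The three steps above can be sketched in pandas as follows. The actual two lines are not shown in the log, so this is an assumption about their shape: the column names `recording_mbid` and `timestamp` and the toy values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the main dataset (made-up identifiers and timestamps).
listens = pd.DataFrame({
    "recording_mbid": ["a", "b", "a", "c", "b"],
    "timestamp": [300, 200, 100, 400, 500],
})

# 1) list of unknown IDs
unknown_ids = ["a", "c"]

# 2) boolean map: True wherever a row's MBID is in the unknown list
mask = listens["recording_mbid"].isin(unknown_ids)

# 3) use the map as a filter to keep only those rows
unknown_listens = listens[mask]
print(unknown_listens)
```

Note that this keeps every listen of an unknown MBID, so the same MBID can appear several times in the result.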

      • Pratha-Fish
      • 2022-06-16 16750, 2022

      • Pratha-Fish
        alastairp: CatQuest the updated lists with timestamps have been pushed to the git repo too (https://github.com/Prathamesh-Ghatole/MLHD/tree/m…)
      • 2022-06-16 16747, 2022

      • alastairp
        Pratha-Fish: https://github.com/Prathamesh-Ghatole/MLHD/blob/m… includes all timestamps for a given recording mbid, duplicating it. Can you pick just the earliest timestamp?
      • 2022-06-16 16708, 2022
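
One way to keep just the earliest timestamp per MBID in pandas is a group-by with `min`. This is a sketch under assumptions, not the code that was actually used: the column names and values are hypothetical.

```python
import pandas as pd

# Toy frame with one row per listen of an unknown MBID (made-up values).
unknown_listens = pd.DataFrame({
    "recording_mbid": ["a", "a", "c", "a"],
    "timestamp": [300, 100, 400, 200],
})

# Collapse duplicates: one row per MBID, holding its smallest timestamp.
first_seen = (
    unknown_listens.groupby("recording_mbid", as_index=False)["timestamp"].min()
)
print(first_seen)
```

Sorting by timestamp and then calling `drop_duplicates("recording_mbid")` would give the same rows; the group-by form just states the intent more directly.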

      • Pratha-Fish
        sure
      • 2022-06-16 16730, 2022

      • alastairp
        Pratha-Fish: btw, generally we don't put data files in git repositories because if they change often then the size of the repository gets larger and larger
      • 2022-06-16 16740, 2022

      • alastairp
        in this specific case since it's a testing/experiment repo this isn't as much of an issue
      • 2022-06-16 16742, 2022

      • Pratha-Fish
        Oh
      • 2022-06-16 16753, 2022

      • alastairp
        but we try very hard to not put this kind of stuff in the main code repos for our projects
      • 2022-06-16 16757, 2022

      • Pratha-Fish
        So where should it be hosted?
      • 2022-06-16 16702, 2022

      • alastairp
        great question
      • 2022-06-16 16720, 2022

      • alastairp
        for now it would be OK to just upload it to the irccloud pastebin
      • 2022-06-16 16724, 2022

      • alastairp
        or gist
      • 2022-06-16 16735, 2022

      • Pratha-Fish
        gist sounds good
      • 2022-06-16 16751, 2022

      • alastairp
        eventually we could set up some hosting space on wolf so that we can put files there and give them a public url
      • 2022-06-16 16719, 2022

      • Pratha-Fish
        alastairp: Please also give me a mini tutorial on how to do that if possible!
      • 2022-06-16 16759, 2022

      • CatQuest
        I mean irccloud paste works fine too
      • 2022-06-16 16718, 2022

      • Pratha-Fish
        Is IRCcloud paste persistent though? (Not that we need persistent paste anyway)
      • 2022-06-16 16734, 2022

      • Pratha-Fish
      • 2022-06-16 16723, 2022

      • Pratha-Fish
      • 2022-06-16 16723, 2022

      • Pratha-Fish
        Apparently irccloud paste doesn't support files larger than 50kb
      • 2022-06-16 16711, 2022

      • Pratha-Fish
        ^ CatQuest alastairp
      • 2022-06-16 16709, 2022

      • alastairp
        yes, irccloud paste is permanent. If you click the "attach" button to the right of the input box and then "Text snippets" you see a list of all the ones you created, and you can delete them if you want
      • 2022-06-16 16735, 2022

      • alastairp
        I suspect that if you had a larger file you could just upload it, the limit there should be larger
      • 2022-06-16 16705, 2022

      • alastairp
        Pratha-Fish: could you convert the timestamps to date objects? e.g. using datetime.fromtimestamp and datetime.isoformat (https://docs.python.org/3/library/datetime.html#d…) and make the recording-mbid start with https://musicbrainz.org/recording/
      • 2022-06-16 16726, 2022
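
A minimal sketch of that request, assuming the timestamps are Unix seconds and that UTC is an acceptable timezone for display (neither is stated in the log). The helper name and the all-zero placeholder MBID are hypothetical.

```python
from datetime import datetime, timezone

MB_RECORDING_URL = "https://musicbrainz.org/recording/"


def format_row(recording_mbid, unix_seconds):
    """Return (clickable recording URL, ISO 8601 datetime string)."""
    first_seen = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return MB_RECORDING_URL + recording_mbid, first_seen.isoformat()


# Placeholder MBID, not a real recording.
url, iso = format_row("00000000-0000-0000-0000-000000000000", 1339804800)
print(url, iso)
```

"Make the recording-mbid start with" simply means prefixing each MBID with the site URL so every row becomes a link that can be opened directly.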

      • Pratha-Fish
        yep
      • 2022-06-16 16700, 2022

      • Pratha-Fish
        I didn't quite get the "make the recording-MBID start with" part
      • 2022-06-16 16731, 2022

      • Pratha-Fish
      • 2022-06-16 16748, 2022

      • Pratha-Fish
        All timestamps are pointing to 1970
      • 2022-06-16 16757, 2022

      • Pratha-Fish
        What's the original format of these timestamps?
      • 2022-06-16 16744, 2022

      • Pratha-Fish
        nvm it's working now
      • 2022-06-16 16717, 2022
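
The log does not say what caused or fixed the 1970 dates. One common cause in pandas, offered here as an assumption rather than a diagnosis, is that `pd.to_datetime` treats bare integers as nanoseconds since the epoch unless a unit is given, so values stored in seconds all land in early 1970.

```python
import pandas as pd

# Assumed to be Unix seconds; this value is 2012-06-16 00:00:00 UTC.
ts = pd.Series([1339804800])

wrong = pd.to_datetime(ts)            # integers default to nanoseconds
right = pd.to_datetime(ts, unit="s")  # state the unit explicitly

print(wrong.dt.year.iloc[0], right.dt.year.iloc[0])
```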

      • Pratha-Fish
      • 2022-06-16 16702, 2022

      • Pratha-Fish
      • 2022-06-16 16709, 2022

      • alastairp
        great, well done. one more thing - can you order by timestamp?
      • 2022-06-16 16707, 2022

      • Pratha-Fish
        yes should be easy enough
      • 2022-06-16 16759, 2022

      • Pratha-Fish
        Timestamp sorted rec-mbids
      • 2022-06-16 16759, 2022

      • Pratha-Fish
      • 2022-06-16 16723, 2022

      • Pratha-Fish
      • 2022-06-16 16730, 2022

      • Pratha-Fish
        ^timestamp sorted artist-MBIDs
      • 2022-06-16 16702, 2022

      • CatQuest
        none of these are younger than 9 years...
      • 2022-06-16 16755, 2022

      • BrainzGit
        [musicbrainz-docker] yvanzo merged pull request #230 (master…fix-sir-dev-deps): Replace virtualenv with user site-packages directory for SIR development https://github.com/metabrainz/musicbrainz-docker/…
      • 2022-06-16 16717, 2022

      • alastairp
        Pratha-Fish: see how easy it is to make that kind of statement ("none of these are younger than 9 years") when we sort the data?
      • 2022-06-16 16742, 2022

      • alastairp
        so all of the things I have been asking for are to make it easy to take a glance at it and make some sort of statement
      • 2022-06-16 16713, 2022
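
As a small illustration of that point, sorting the first-seen timestamps puts the oldest and newest entries at the two ends of the list, so a statement about the date range can be read off directly. This is a sketch with made-up identifiers and timestamps, assuming Unix seconds in UTC.

```python
from datetime import datetime, timezone

# Hypothetical first-seen timestamps per MBID (Unix seconds).
first_seen = {"a": 1104537600, "b": 1339804800, "c": 1230768000}

# Order by timestamp, oldest first.
ordered = sorted(first_seen.items(), key=lambda item: item[1])
for mbid, ts in ordered:
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
    print(day, mbid)

# The last row is the most recent one, which bounds the whole list.
newest_mbid, newest_ts = ordered[-1]
newest_year = datetime.fromtimestamp(newest_ts, tz=timezone.utc).year
print("most recent first scrobble is from", newest_year)
```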

      • alastairp
        CatQuest: so now I'm tempted to import an old 2012-era dump (do those exist?!) and see how many of them are there
      • 2022-06-16 16707, 2022

      • mayhem
      • 2022-06-16 16716, 2022

      • mayhem
        silly office mate. :)
      • 2022-06-16 16705, 2022

      • Pratha-Fish
        alastairp: I see!
      • 2022-06-16 16705, 2022

      • Pratha-Fish
        That's the whole soul of analytics ig.
      • 2022-06-16 16705, 2022

      • Pratha-Fish
        I'll try to bring in some similar stats next time too
      • 2022-06-16 16721, 2022

      • CatQuest
        alastairp: hmm...
      • 2022-06-16 16733, 2022

      • CatQuest
        would it even be possible to just check with archive
      • 2022-06-16 16747, 2022

      • CatQuest
        i'm sure it has 2012 era snapshots of mb pages...
      • 2022-06-16 16755, 2022

      • lucifer
        chinmay: iiuc, for instance: the listens page for any user shows feedback for current user if user is logged in but if user is not logged in then feedback for the user whose page is open is shown.
      • 2022-06-16 16703, 2022

      • lucifer
        alastairp: i see, at one place in LB/CB also we were using copy_expert. it has been around for a while.
      • 2022-06-16 16732, 2022

      • BrainzGit
        [acousticbrainz-server] alastair merged pull request #297 (master…incremental-full-dumps): Incremental full dumps https://github.com/metabrainz/acousticbrainz-serv…
      • 2022-06-16 16734, 2022

      • aerozol
        Yay for biking brainz!!
      • 2022-06-16 16710, 2022
