#metabrainz

      • alastairp
        and then the explain endpoint remains html-only
      • 2022-07-20 20132, 2022

      • lucifer
      • 2022-07-20 20139, 2022

      • lucifer
        i think it could be made to work by printing log first then letting the dataset hoster handle the result formatting.
      • 2022-07-20 20157, 2022

      • lucifer
        or alternatively some code duplication for the short term is fine
      • 2022-07-20 20149, 2022

      • mayhem
        lucifer: alastairp and I were just chatting...
      • 2022-07-20 20135, 2022

      • alastairp
        lucifer: yeah, I was thinking about adding the formatting in the ds hoster, but the log output of the mapper is so specific (e.g. bold indicators in certain places) that I don't think we can make it generic
      • 2022-07-20 20152, 2022

      • mayhem
        last night I realized that if we added release support for the mapper, we could solve two problems that we are currently seeing: (1) listening to a compilation while watching data in the listening now viewer (it does the wrong thing), and (2) identifying albums in a listen stream.
      • 2022-07-20 20110, 2022

      • mayhem
        I didn't know how to deal with releases, but now I understand. so now I feel armed to go back and add release support.
      • 2022-07-20 20119, 2022

      • alastairp
        I worked out that there's actually no need to make this machine-readable. the key is just to make mapper + explain endpoints return the same data, which just means using consistent stop word handling
      • 2022-07-20 20135, 2022

      • mayhem
        which means that alastairp you and I should spend some time at the summit to work out how to make a better mapper.
      • 2022-07-20 20139, 2022

      • mayhem
      • 2022-07-20 20129, 2022

      • lucifer
        mayhem: alastairp: makes sense. sounds good to me.
      • 2022-07-20 20141, 2022

      • alastairp
        mayhem:
      • 2022-07-20 20141, 2022

      • alastairp
        CREATE TABLE mapping.release_group_secondary_type_sort ( secondary_type integer, sort integer )
      • 2022-07-20 20145, 2022

      • mayhem
        once these improvements are in, the mapper is going to be incredible.
      • 2022-07-20 20146, 2022

      • alastairp
        INSERT INTO mapping.release_group_secondary_type_sort values (%s, %s);", tuple((id, type_id)))
      • 2022-07-20 20157, 2022

      • alastairp
        is that inserting in the correct columns?
      • 2022-07-20 20115, 2022

      • mayhem
      • 2022-07-20 20119, 2022

      • mayhem
        this is the resultant table.
      • 2022-07-20 20147, 2022

      • mayhem
      • 2022-07-20 20151, 2022

      • mayhem
        with proper order.
      • 2022-07-20 20148, 2022

      • mayhem
        so, I guess so, but perhaps the variable names could be named better.
      • 2022-07-20 20114, 2022
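
The column-order question above can be sidestepped by naming the columns in the INSERT. A minimal sketch, assuming an in-memory SQLite table in place of the PostgreSQL `mapping` schema (so the placeholder is `?` rather than `%s`) and hypothetical sort positions for the two type ids mentioned in the chat:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema prefix "mapping."
# is dropped because SQLite has no such schema here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE release_group_secondary_type_sort "
    "(secondary_type INTEGER, sort INTEGER)"
)

# Hypothetical data: (secondary_type id, sort position).
rows = [(8, 1), (6, 2)]

# Naming the columns ties each value to a column explicitly, so the
# statement no longer depends on the column order in CREATE TABLE.
conn.executemany(
    "INSERT INTO release_group_secondary_type_sort (secondary_type, sort) "
    "VALUES (?, ?)",
    rows,
)

result = conn.execute(
    "SELECT secondary_type, sort "
    "FROM release_group_secondary_type_sort ORDER BY sort"
).fetchall()
print(result)
```

Giving the Python variables names that match the columns (for example `secondary_type` and `sort_position`) would address the naming concern in the same change.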

      • alastairp
        ah, I just saw the way that you were doing the join
      • 2022-07-20 20121, 2022

      • alastairp
        ON rgst.id = rgsts.secondary_type
      • 2022-07-20 20139, 2022

      • alastairp
        mm
      • 2022-07-20 20115, 2022

      • alastairp
        but does this mean that sec-type 8 (dj-mix) is sorted before type 6 (live)?
      • 2022-07-20 20153, 2022

      • lucifer
        each group should be assigned the same id so that the next sort gets a chance, no?
      • 2022-07-20 20106, 2022

      • lucifer
        *all items in a group
      • 2022-07-20 20107, 2022

      • mayhem
        the sec type only comes into play for groups that have the same primary type, no?
      • 2022-07-20 20142, 2022

      • alastairp
        you're saying so that you would get the earliest of (compilation or soundtrack or live), rather than the earliest compilation, or if there are no compilations the earliest soundtrack, or if there are no.... etc, lucifer?
      • 2022-07-20 20102, 2022

      • lucifer
        alastairp: yup.
      • 2022-07-20 20127, 2022

      • alastairp
        when I mentioned groups yesterday I didn't mean to imply that I considered them all the same, I think it's OK to have an explicit ordering
      • 2022-07-20 20104, 2022

      • lucifer
        ah ok, my understanding was the other way around. makes sense to have it this way then.
      • 2022-07-20 20122, 2022
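
The two readings discussed above give different results. A small sketch of the difference, using hypothetical rank values and made-up releases; it is not the mapper's actual sort:

```python
from datetime import date

# Hypothetical ranks: lower sorts first.
# Explicit ordering: each secondary type has its own position.
EXPLICIT_RANK = {"compilation": 1, "soundtrack": 2, "live": 3}
# Grouped ordering: every type in the group shares one position,
# so the next sort key (the date) decides.
GROUPED_RANK = {"compilation": 1, "soundtrack": 1, "live": 1}

# Made-up candidates: (title, secondary type, release date).
releases = [
    ("live album", "live", date(1990, 1, 1)),
    ("soundtrack album", "soundtrack", date(1995, 1, 1)),
    ("compilation album", "compilation", date(2000, 1, 1)),
]


def pick_first(candidates, rank):
    """Return the title that sorts first by (type rank, release date)."""
    return min(candidates, key=lambda r: (rank[r[1]], r[2]))[0]


explicit_choice = pick_first(releases, EXPLICIT_RANK)
grouped_choice = pick_first(releases, GROUPED_RANK)
print(explicit_choice, "|", grouped_choice)
```

With explicit ranks the compilation wins even though it is the newest; with a shared rank the earliest release of any of the three types wins.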

      • Pratha-Fish
        alastairp: Hi, I am back. Took longer than I expected lol
      • 2022-07-20 20140, 2022

      • Pratha-Fish
        Also, looks like the conversion is completed too :)
      • 2022-07-20 20145, 2022

      • Pratha-Fish
        It took 83.1 Hours in total
      • 2022-07-20 20153, 2022

      • alastairp
        Pratha-Fish: yeah, I was going to ask you about that
      • 2022-07-20 20100, 2022

      • mayhem
        merely a round off error from 50 hours, no worries. :)
      • 2022-07-20 20112, 2022

      • Pratha-Fish
        ☠️
      • 2022-07-20 20116, 2022

      • alastairp
        same order of magnitude, and still less than a week
      • 2022-07-20 20132, 2022

      • alastairp
        next time we need to do this, let's multi-thread it 8x
      • 2022-07-20 20141, 2022
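
One way to run the per-file work on 8 workers is a thread pool from the standard library. This is only a sketch: `convert_file` is a hypothetical stand-in for the real conversion step, and the file names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_file(path):
    """Hypothetical stand-in for the real per-file conversion."""
    return path, len(path)


def convert_all(paths, workers=8):
    # map() returns results in input order and re-raises any exception
    # raised inside a worker when its result is consumed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_file, paths))


results = convert_all([f"file_{i}.txt.gz" for i in range(20)])
print(len(results))
```

If the conversion is CPU-bound rather than I/O-bound, `ProcessPoolExecutor` offers the same interface and avoids contention on Python's global interpreter lock.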

      • alastairp
        Pratha-Fish: so... any track ids?
      • 2022-07-20 20150, 2022

      • Pratha-Fish
        alastairp: let's check!
      • 2022-07-20 20100, 2022

      • Pratha-Fish
        give me a sec
      • 2022-07-20 20128, 2022

      • Pratha-Fish
      • 2022-07-20 20134, 2022

      • Pratha-Fish
        alastairp: ^ Nothing found :)
      • 2022-07-20 20124, 2022

      • alastairp
        ah, this means that there are no files that have a log entry?
      • 2022-07-20 20138, 2022

      • alastairp
        so either it means that nothing was found, or it means that you have a bug in writing your logs ;)
      • 2022-07-20 20156, 2022

      • Pratha-Fish
        The logger returns None if the list of track-mbids in rec-mbids is empty; if it has anything in it, it just returns a list
      • 2022-07-20 20113, 2022

      • Pratha-Fish
        I also tested it out before running, so hopefully it worked well :)
      • 2022-07-20 20138, 2022

      • Pratha-Fish
        Also, the timelogs are quite interesting too https://usercontent.irccloud-cdn.com/file/JxDsqkr…
      • 2022-07-20 20140, 2022

      • alastairp
        that's great. I also tested the data independently myself, so I think that we're pretty safe in deciding this
      • 2022-07-20 20102, 2022

      • Pratha-Fish
        🎉
      • 2022-07-20 20111, 2022

      • alastairp
        it means that we can remove track lookups from everything (remember also to put this in our doc explaining why we no longer have it!)
      • 2022-07-20 20133, 2022

      • Pratha-Fish
        Definitely
      • 2022-07-20 20134, 2022

      • alastairp
        yeah, I imagine that some files are large (users who have a lot of scrobbles)
      • 2022-07-20 20138, 2022

      • lucifer
        you can also try adding an erroneous entry to a file manually, then run it on 3-4 files including the maligned file to confirm it wasn't a logger issue.
      • 2022-07-20 20129, 2022

      • Pratha-Fish
        lucifer: sure I'll try it out. I have the faulty data ready too
      • 2022-07-20 20126, 2022

      • alastairp
        Pratha-Fish: ./49/49dc8e61-67ca-4ad1-bf53-437856924777.txt.gz is the largest file
      • 2022-07-20 20140, 2022

      • alastairp
        you could try and run it as a one-off to see if it takes ~12 seconds
      • 2022-07-20 20147, 2022

      • Pratha-Fish
        okie
      • 2022-07-20 20132, 2022

      • alastairp
        Pratha-Fish: I'll move the rec_track_checker/MLHD directory to /data, don't panic when it disappears
      • 2022-07-20 20139, 2022

      • Pratha-Fish
        sure
      • 2022-07-20 20104, 2022

      • alastairp
      • 2022-07-20 20108, 2022

      • alastairp
        that's amazing
      • 2022-07-20 20117, 2022

      • alastairp
        !m Pratha-Fish
      • 2022-07-20 20118, 2022

      • BrainzBot
        You're doing good work, Pratha-Fish!
      • 2022-07-20 20149, 2022

      • Pratha-Fish
        !!!
      • 2022-07-20 20152, 2022

      • alastairp
        Pratha-Fish: I'm about to head back home, but I have a few other things that I want to run past you
      • 2022-07-20 20104, 2022

      • Pratha-Fish
        alastairp: yes please
      • 2022-07-20 20151, 2022

      • alastairp
        1) lookup reports: I used your code to do lookups of more metadata and then generated an html report. check out this:
      • 2022-07-20 20152, 2022

      • alastairp
      • 2022-07-20 20111, 2022

      • alastairp
        html is a bit nicer to view, as we can do links etc
      • 2022-07-20 20124, 2022

      • Pratha-Fish
        Oh that's interesting
      • 2022-07-20 20129, 2022

      • alastairp
      • 2022-07-20 20152, 2022

      • alastairp
        you can now host stuff on wolf, just make a 'public_html' directory on wolf and put the stuff there
      • 2022-07-20 20118, 2022

      • alastairp
        (and also, there are some weird unicode issues in that html file and I don't know why, we need to debug it further)
      • 2022-07-20 20105, 2022

      • Pratha-Fish
        Yea, noticed the unicode issue
      • 2022-07-20 20139, 2022

      • Pratha-Fish
        Also, the jinja part is pretty interesting. I heard a bit about it through flask, but didn't know it could be used like that :)
      • 2022-07-20 20118, 2022

      • alastairp
        yes right. there's no requirement to only use jinja as part of a webserver
      • 2022-07-20 20133, 2022

      • Pratha-Fish
        alastairp: Does the public_html dir need to be in my home directory or can it be hosted anywhere?
      • 2022-07-20 20135, 2022

      • alastairp
        we could have used any templating system, but I just reached for jinja because we have experience with it
      • 2022-07-20 20102, 2022

      • alastairp
        Pratha-Fish: the configuration is such that ~your_username/file.html maps to /home/your_username/public_html/file.html
      • 2022-07-20 20131, 2022

      • Pratha-Fish
        Ah I see
      • 2022-07-20 20110, 2022

      • Pratha-Fish
        Also, if let's say I hosted a txt file in that directory, would ~snaek/file.txt just send the txt file as a download?
      • 2022-07-20 20119, 2022

      • alastairp
        try it and see
      • 2022-07-20 20125, 2022

      • Pratha-Fish
        OMW
      • 2022-07-20 20103, 2022

      • Pratha-Fish
        alastairp: yes, looks like it works that way. txt files are previewed directly on the browser, while binary and markdown files are sent as downloads
      • 2022-07-20 20121, 2022

      • alastairp
        the markdown one is interesting. because that's text
      • 2022-07-20 20150, 2022

      • Pratha-Fish
        +1
      • 2022-07-20 20123, 2022

      • alastairp
        we use nginx as the webserver
      • 2022-07-20 20132, 2022

      • Pratha-Fish
        Is there any particular tradeoffs b/w nginx and apache?
      • 2022-07-20 20135, 2022

      • Pratha-Fish
        *are
      • 2022-07-20 20145, 2022

      • alastairp
        typically it will inspect the file and know what it is, therefore it will know if it should tell the browser to display it or download it
      • 2022-07-20 20152, 2022

      • alastairp
      • 2022-07-20 20105, 2022

      • alastairp
        includes: `content-type: text/html`
      • 2022-07-20 20118, 2022

      • alastairp
        whereas the same one on README.md shows `content-type: application/octet-stream`
      • 2022-07-20 20126, 2022

      • Pratha-Fish
        wow that's interesting
      • 2022-07-20 20130, 2022

      • alastairp
        so the browser sees that and says "whoops, I'd better download this"
      • 2022-07-20 20149, 2022

      • alastairp
        try a zip file or something as well, based on the content-type the browser will choose what to do
      • 2022-07-20 20101, 2022

      • Pratha-Fish
        yep
      • 2022-07-20 20107, 2022

      • alastairp
        if we could configure nginx to send `text/plain` as the content type then the browser would probably display it
      • 2022-07-20 20132, 2022

      • alastairp
        regarding nginx/apache, no great difference. I used to use apache, then one day I started using nginx. about 15 years ago it had some features that made it faster
      • 2022-07-20 20106, 2022

      • alastairp
        Pratha-Fish: anyway, back to what we were discussing
      • 2022-07-20 20118, 2022

      • alastairp
        I think that it is now a top priority to start moving some of these notebooks to scripts (as you said you had started doing). It's very difficult for me to share code with you in the notebook, because every time I run your notebook, the outputs change and it causes really annoying git diffs
      • 2022-07-20 20141, 2022

      • alastairp
        with a python script, I'd be able to open a pull request on your repo to add this html template, for example
      • 2022-07-20 20117, 2022

      • Pratha-Fish
        I see, I'll get that one done ASAP then
      • 2022-07-20 20148, 2022

      • alastairp
        so let's focus on that. I'd like a script that I can run which takes the list of files (df2_artist_rec_names_artist_list.txt), looks up the necessary data, does the mapping lookup, and then writes the debug html
      • 2022-07-20 20106, 2022

      • alastairp
        here's another interesting item which I found:
      • 2022-07-20 20122, 2022

      • alastairp
        you use requests_cache
      • 2022-07-20 20126, 2022

      • alastairp
      • 2022-07-20 20137, 2022

      • alastairp
        this loop is interesting
      • 2022-07-20 20104, 2022

      • Pratha-Fish
        yes, requests cache just made testing faster without putting load on the mapping API
      • 2022-07-20 20119, 2022

      • alastairp
        do you see that even if the lookup that you do returns data from the cache, you'll still sleep for 0.5 seconds?
      • 2022-07-20 20104, 2022

      • Pratha-Fish
        yes, that one was placed there to reduce load on the server
      • 2022-07-20 20124, 2022

      • alastairp
        but if you don't access the server, there's no reason to reduce load on it
      • 2022-07-20 20147, 2022

      • Pratha-Fish
        Hmmmmmmmmm
      • 2022-07-20 20154, 2022

      • Pratha-Fish
        never thought of it!
      • 2022-07-20 20102, 2022

      • alastairp
        mmmhm
      • 2022-07-20 20149, 2022

      • alastairp
        this is why I suggested an alternative, of saving the result of the lookup to a file, and skipping the lookup if the file with the result already exists
      • 2022-07-20 20124, 2022

      • alastairp
      • 2022-07-20 20131, 2022

      • alastairp
        so you only sleep if you do the lookup
      • 2022-07-20 20149, 2022
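
The save-to-file pattern described above might look like the sketch below. The function name, the JSON file layout, and the injected `do_lookup` callable are all assumptions made for illustration; the key is assumed to be safe to use as a file name.

```python
import json
import tempfile
import time
from pathlib import Path


def cached_lookup(key, cache_dir, do_lookup, delay=0.5, sleep=time.sleep):
    """Return the result for key, reusing a saved file when one exists."""
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        # Result already on disk: the server is not contacted, so no sleep.
        return json.loads(path.read_text())
    result = do_lookup(key)
    path.write_text(json.dumps(result))
    # Pause only after a real lookup, to limit load on the server.
    sleep(delay)
    return result


# Demonstration with a fake lookup and a recording "sleep".
calls, sleeps = [], []


def fake_lookup(key):
    calls.append(key)
    return {"key": key}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_lookup("abc", tmp, fake_lookup, sleep=sleeps.append)
    second = cached_lookup("abc", tmp, fake_lookup, sleep=sleeps.append)

print(calls, sleeps)
```

Because finished results stay on disk, an interrupted run picks up where it left off when restarted.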

      • alastairp
        there may be a method in requests_cache that tells you if a lookup was returned from the cache, I don't know
      • 2022-07-20 20131, 2022

      • Pratha-Fish
        Yes that's better, it'll even act as a "start where left" functionality if the test ends abruptly somewhere ig
      • 2022-07-20 20144, 2022

      • alastairp
        yep, exactly! that's why I use this pattern
      • 2022-07-20 20155, 2022

      • alastairp
      • 2022-07-20 20100, 2022

      • alastairp
        The following attributes are available on responses: from_cache: indicates if the response came from the cache
      • 2022-07-20 20112, 2022

      • alastairp
        so there we go, you could also use that
      • 2022-07-20 20120, 2022
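
Using the `from_cache` attribute quoted above, the sleep can be skipped for cached responses. A minimal sketch: `polite_get` is a hypothetical helper, and the fake session only imitates a `requests_cache.CachedSession` so the example runs without network access.

```python
import time


def polite_get(session, url, delay=0.5, sleep=time.sleep):
    """Fetch url and pause only if the response did not come from the cache."""
    response = session.get(url)
    # Responses without a from_cache attribute are treated as uncached.
    if not getattr(response, "from_cache", False):
        sleep(delay)
    return response


class FakeResponse:
    def __init__(self, from_cache):
        self.from_cache = from_cache


class FakeSession:
    """Imitates a caching session: the first request misses, repeats hit."""

    def __init__(self):
        self.seen = set()

    def get(self, url):
        hit = url in self.seen
        self.seen.add(url)
        return FakeResponse(from_cache=hit)


sleeps = []
session = FakeSession()
polite_get(session, "https://example.org/lookup", sleep=sleeps.append)
polite_get(session, "https://example.org/lookup", sleep=sleeps.append)
print(sleeps)
```

In real use, `session` would be a `requests_cache.CachedSession` and the `sleep` argument would be left at its default.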

      • Pratha-Fish
        Excellent
      • 2022-07-20 20128, 2022

      • alastairp
        anyway, it was just something that I noticed and wanted to tell you
      • 2022-07-20 20136, 2022

      • alastairp
        OK, one last thing - then I need to go home
      • 2022-07-20 20101, 2022

      • Pratha-Fish
        Thanks for informing :)
      • 2022-07-20 20105, 2022

      • alastairp
        after talking some stuff over with mayhem yesterday and today, I think that the query that I gave you to get the artist of a recording is incorrect
      • 2022-07-20 20159, 2022

      • alastairp
        https://musicbrainz.org/release-group/d406dc76-09… this is a good demonstration of the problem, it was one of the examples in the mis-matched report that I generated
      • 2022-07-20 20108, 2022

      • alastairp
        ~ Release group by onelinedrawing
      • 2022-07-20 20127, 2022

      • alastairp
        however, click on onelinedrawing and it takes you to an artist called "Jonah Matranga"
      • 2022-07-20 20124, 2022

      • Pratha-Fish
        Interesting
      • 2022-07-20 20126, 2022

      • alastairp
        This is because MB allows you to have "artist credits" for releases and recordings, which may be different to the name of the person (because of stylistic reasons, or maybe their name changed but they're the same person, or... there are many reasons)
      • 2022-07-20 20107, 2022

      • alastairp
        anyway, I think that the query I gave you uses the artist's "official" name, rather than the credit. but the mapping code uses the credit
      • 2022-07-20 20126, 2022

      • alastairp
        this is probably why we were getting so many empty matches
      • 2022-07-20 20133, 2022

      • alastairp
        so, let's fix the query
      • 2022-07-20 20100, 2022

      • Pratha-Fish
        🧠 ✨
      • 2022-07-20 20142, 2022

      • Pratha-Fish
        That makes me think, does the release-MBID column really help us with anything?
      • 2022-07-20 20155, 2022

      • alastairp
        not at the moment :)
      • 2022-07-20 20118, 2022

      • Pratha-Fish
        Great, one less thing to worry about
      • 2022-07-20 20130, 2022

      • alastairp
        let's talk about that in a few weeks. I came up with some ideas with mayhem yesterday. when we've finished these current tasks we can address it, but it's not useful for us yet
      • 2022-07-20 20139, 2022

      • alastairp
        ok, so
      • 2022-07-20 20140, 2022

      • alastairp
        select recording.gid as rec_gid, array_agg(artist.gid) as artist_credit_list from recording join artist_credit ac on ac.id=artist_credit join artist_credit_name acn on acn.artist_credit=ac.id join artist on artist.id = acn.artist group by recording.gid limit 10;
      • 2022-07-20 20105, 2022

      • alastairp
        I gave you a query like this, which gives you the artists on a recording, but then we used the `artist` table to look up the actual artist, right?
      • 2022-07-20 20135, 2022

      • Pratha-Fish
        that's right