#metabrainz

      • reosarevok
        aerozol: guidelines require a lot more community input :) but it might be a good time to start them!
      • If you have ideas, put them to paper (or well, to wiki) and we can open a discussion
      • lucifer
        Lotheric: goldenshimmer: not entirely sure but one possible lead I discussed with outsidecontext is the recently played endpoint returning tracks from spotify out of order. that can cause streams to go missing from LB.
      • mayhem: alastairp: it looks like the spotify endpoint might be returning streams out of order. for eg: consider a user continuously listening since 1 PM. suppose we query at 1:15 PM and we get no streams. we attempt again later but still no streams. then we query at 3 PM, and the endpoint returns streams till 2 PM + a recent stream listened around 2:50 PM. the endpoint hasn't yet returned streams between 2 PM - 2:50 PM. now we query the
      • endpoint again later, it has the entire listening history from 12-3 at this point. but since we already imported the 2:50 PM stream in an older run, LB will ignore all the streams prior to 2:50 PM. so the user's listening history between 2-2:50 PM goes missing.
      • we don't have a reproducing sample currently but i, outsidecontext and people on spotify forums have seen similar behaviour https://community.spotify.com/t5/Spotify-for-De...
      • thoughts on how we can confirm this behaviour and then work on a workaround/fix.
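The failure mode described above can be reproduced with a small simulation. This is only a sketch under assumptions: the listen times are made-up data (minutes after 12:00), and `import_new` is a hypothetical stand-in for a timestamp-cursor importer, not the actual ListenBrainz code.

```python
def import_new(api_response, last_imported_ts):
    """Keep only listens newer than the stored cursor (hypothetical importer step)."""
    return [ts for ts in api_response
            if last_imported_ts is None or ts > last_imported_ts]


# Made-up data: one listen every 10 minutes from 13:00 (60) to 15:00 (180).
played = list(range(60, 181, 10))

# Query at 15:00: listens up to 14:00 plus one recent listen at 14:50 (170).
first_response = [ts for ts in played if ts <= 120] + [170]
# A later query finally returns the complete history.
second_response = played

cursor = None
imported = []
for response in (first_response, second_response):
    new = import_new(response, cursor)
    imported.extend(new)
    if new:
        # Advance the cursor to the newest listen imported so far.
        cursor = max(new) if cursor is None else max(cursor, max(new))

# Listens that arrived late but sit behind the cursor are never imported.
missing = sorted(set(played) - set(imported))
print(missing)
```

In this sketch the listens between 14:00 and 14:50 are the ones that end up in `missing`, which matches the gap described in the example.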
      • outsidecontext
        I am also trying to monitor this today. Two days ago Spotify import worked smoothly for me. Yesterday it had big issues, importing only 3 of 34 tracks (see https://gist.github.com/phw/03ce9343f64cb57328d...)
      • today there seem to be issues again. The three tracks I listened to earlier today are not yet showing up in Spotify API over an hour later. Maybe I can see it happening again and catch one of the bad API responses
      • BrainzGit
        [listenbrainz-server] amCap1712 opened pull request #1822 (master…startup-bug): Fix LB server local development https://github.com/metabrainz/listenbrainz-serv...
      • lucifer
        treeshateorcs[m]: above change should fix the server startup error you were seeing yesterday. will merge after review but you can make the changes locally till then if you want.
      • CatQuest
        wait ill adds scrobble to lfm or lb?
      • riksucks
        lucifer: btw i wanted to ask you something, wouldn't it be better to manually just add headers to the handle_error function, rather than using the decorator. Pretty sure the code quality would take a hit. What do you say
      • lucifer
        riksucks: yes i am currently debugging that error, if we can get it working fine otherwise adding manually is fine.
      • riksucks
        Right I see, thanks
      • lucifer
        riksucks: no useful leads yet, lets add the header manually.
      • jsonify returns a Response object so you can set the header on that.
      • alastairp
        lucifer: hmm, interesting. we use a limit to decide to no longer import some items from the spotify stream, right?
      • given that we have deduplication during ingest, maybe we just grab all 50 all the time?
      • (morning)
      • lucifer
        that would significantly increase dupes. also 50x the load on ts writer.
      • mayhem
        moooin!
      • lucifer
        we currently store the timestamp of the latest listen imported for a user from spotify and then query the api for streams only after that timestamp.
      • mayhem
        how about we normally do an incremental (since last timestamp) fetch, but every 10 tries we do a full pull of all 50 listens?
      • lucifer
        interesting thought, that could work but also means increased load for 7mins every ~75mins.
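The "incremental fetch, with a full pull every 10 tries" idea can be sketched as a small function that builds the query parameters for one poll. This is a sketch under assumptions: `plan_fetch` is a hypothetical helper, and the `limit` and `after` parameter names are assumed to match what the recently played endpoint accepts.

```python
FULL_PULL_EVERY = 10  # from the chat: every 10th try is a full pull
MAX_ITEMS = 50        # from the chat: a full pull fetches all 50 listens


def plan_fetch(attempt, last_imported_ts):
    """Build query parameters for one poll (hypothetical helper).

    attempt is the 1-based number of polls made for this user so far.
    """
    params = {"limit": MAX_ITEMS}
    is_full_pull = last_imported_ts is None or attempt % FULL_PULL_EVERY == 0
    if not is_full_pull:
        # Incremental mode: only ask for listens after the stored cursor.
        params["after"] = last_imported_ts
    return params


print(plan_fetch(1, 1000))   # incremental poll
print(plan_fetch(10, 1000))  # every 10th poll drops the cursor
```

A full pull relies on deduplication during ingest to discard the listens that were already imported.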
      • mayhem
        ideally we would spread the larger checks out, so they don't bunch up.
      • I think we ought to build a whole class in the importer that sets the next check time for a given user.
      • and that class can contain a whole lot of logic or attempts at predicting things.
      • and perhaps that even has some things that could be tuned on the fly.
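One possible shape for such a class, as a rough sketch only: `NextCheckScheduler`, its method names, and the 450 second poll interval are all assumptions made for illustration. The per-user offset is one way to keep the larger checks from bunching up.

```python
import zlib


class NextCheckScheduler:
    """Hypothetical sketch: decides when to poll a user and when to do a full pull."""

    def __init__(self, poll_interval=450, full_pull_every=10):
        # Both values are plain attributes so they could be tuned on the fly.
        self.poll_interval = poll_interval      # seconds between polls (assumed value)
        self.full_pull_every = full_pull_every  # one full pull per this many cycles

    def _offset(self, user_id):
        # Stable per-user offset, so different users get their full pull
        # on different cycles instead of all at once.
        return zlib.crc32(str(user_id).encode()) % self.full_pull_every

    def is_full_pull(self, user_id, cycle):
        return (cycle + self._offset(user_id)) % self.full_pull_every == 0

    def next_check(self, now):
        return now + self.poll_interval


scheduler = NextCheckScheduler()
full_cycles = [c for c in range(10) if scheduler.is_full_pull("some_user", c)]
print(full_cycles)
```

Any prediction logic (for example, polling active users more often) would live behind `next_check` in this layout.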
      • alastairp
        just to confirm - does the ts writer ever back up with work?
      • lucifer
        i was thinking of doing some redis based dedup but that would be more work to get right. store the last 50 listens imported for the user in redis, then query all available spotify listens for the user, match against redis, send only the ones not in redis to ts writer, then update redis.
      • currently no.
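The dedup flow described above (remember the last 50 imported listens, match a full pull against them, forward only the unseen ones) can be sketched as follows. This is a sketch under assumptions: an in-memory dict stands in for Redis, and `RecentListenCache` and the `(listened_at, track_id)` tuple shape are hypothetical.

```python
from collections import deque

RECENT_LIMIT = 50  # from the chat: keep the last 50 imported listens per user


class RecentListenCache:
    """In-memory stand-in for the per-user Redis structure (assumption)."""

    def __init__(self):
        self._recent = {}  # user_id -> deque of (listened_at, track_id)

    def filter_new(self, user_id, listens):
        """Return the listens not seen before, and remember them."""
        recent = self._recent.setdefault(user_id, deque(maxlen=RECENT_LIMIT))
        seen = set(recent)
        new = [listen for listen in listens if listen not in seen]
        # The deque drops the oldest remembered entries once it holds 50.
        recent.extend(new)
        return new


cache = RecentListenCache()
first = [(100, "a"), (170, "b")]
print(cache.filter_new("user", first))
second = [(100, "a"), (130, "c"), (170, "b")]
print(cache.filter_new("user", second))  # only the late-arriving listen is new
```

Only the output of `filter_new` would be sent on to the ts writer in this sketch.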
      • mayhem
        say that this "out of order mode" from spotify is an aberration. maybe we can turn the full checks off when things run normally?
      • lucifer: I think adding extra work on that level will just end up being a pain. let TS sort it out, but lets try to be smart about how often we query a particular user.
      • alastairp
        ^ agreed about the complexity of another level
      • lucifer
        re disabling full checks, that needs us to be able to figure out when spotify is going out of order. for eg, outsidecontext noticed the issue but i usually don't monitor what spotify is importing for me so would never know.
      • makes sense.
      • outsidecontext
        yes. I also don't know how often it failed for me without noticing. Just yesterday I easily spotted it, and a few weeks ago there was another case I noticed.
      • mayhem
        lucifer: yes, perhaps this is something we check for and turn on automatically?
      • outsidecontext
        And today of course. Still waiting for my listens from 3 hours ago to show up in Spotify API
      • mayhem
        like do random long pulls and check for OOO issues?
      • lucifer
        +1 on the "the importer that sets the next check time for a given user."
      • alastairp
        we found forum posts on the spotify site asking about this problem too, so it's not just related to us
      • mayhem
        we could have a button somewhere that says: "spotify is being dumb, please try harder for a day"
      • lucifer
        yes we can do random long pulls but still need to compare it with something to figure out that listens are missing.
      • outsidecontext
        there are two aspects to this: 1. spotify sometimes having huge delays until listens show up. There is obviously not much to do about this
      • mayhem
        and if 3 people press the button, we try harder.
      • alastairp
        is there a way for us to independently identify this without user input?
      • mayhem
        oh, ah. big brain time. 🧠
      • outsidecontext
        and 2. likely spotify sometimes only shows more recent listens first, while not yet showing some older ones. at least that is the theory so far, I have not yet seen an actual API result showing this behavior
      • alastairp
        randomly sample n users, get their listens every hour, see if listens "turn up" in the middle compared to when we checked last
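The check for listens that "turn up in the middle" can be sketched by comparing two snapshots of the same user's listen timestamps. This is a sketch under assumptions: `late_arrivals` is a hypothetical helper and the timestamps are made-up.

```python
def late_arrivals(previous_snapshot, current_snapshot):
    """Timestamps that are new in the current snapshot but older than the
    newest listen already seen in the previous snapshot."""
    if not previous_snapshot:
        return []
    newest_before = max(previous_snapshot)
    unseen = set(current_snapshot) - set(previous_snapshot)
    return sorted(ts for ts in unseen if ts < newest_before)


hour_ago = [60, 120, 170]
now = [60, 120, 130, 140, 170, 180]
print(late_arrivals(hour_ago, now))
```

A non-empty result for sampled users would be evidence of the out-of-order behaviour without needing any user input.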
      • mayhem
        lets get a paid spotify account. lets create a bot that listens to music. and it listens to a short track every 3 minutes or so.
      • now we have a "clock frequency" we know that listens should be coming for this user every 3 minutes.
      • if reality differs from theory, try harder.
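With a bot listening at a known "clock frequency", missing listens show up as gaps larger than the expected interval. A minimal sketch, assuming timestamps in seconds; `find_gaps` and the 60 second tolerance are hypothetical choices.

```python
def find_gaps(listen_times, expected_interval=180, tolerance=60):
    """Return (start, end) pairs where consecutive listens are further
    apart than the expected interval plus a tolerance."""
    ordered = sorted(listen_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a > expected_interval + tolerance]


# Made-up data: a listen every 3 minutes, with a hole between 360 and 900.
print(find_gaps([0, 180, 360, 900, 1080]))
```

Any reported gap would be the signal to "try harder", for example by switching to full pulls for a while.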
      • lucifer
        sounds good
      • outsidecontext, have other listens for today shown up yet?
      • outsidecontext
        no, and I only did these three in the morning. I'll listen to something additional now for testing
      • lucifer
        outsidecontext, also i found another comment in the forums mentioning that it could be related to offline mode or iOS app. does that sound familiar to when you noticed the issue ?
      • outsidecontext
        have been using the desktop app on my laptops
      • lucifer
        👍
      • outsidecontext
        what's interesting is that the current playback endpoint is working well
      • alastairp
        yeah, I think we noticed that too
      • lucifer
        yeah the current playback endpoint usually works realtime but the recently played one may lag hours.
      • yvanzo
        O’Moin
      • reosarevok
        moin :)
      • yvanzo: someone said "nothing in the MusicBrainz documentation about work types, like Incidental Music and such. the only place I’ve found those descriptions is on the Create New Work pages. frankly, what we’ve got hasn’t helped me figure out if parts of a soundtrack are incidental or not"
      • This goes together with the ws ticket you shared yesterday
      • Should we look into having, say, /work-types on the site, and then maybe a /ws/2/work-types JSON representation of the same list?
      • mayhem
        reosarevok: a friend of mine, with whom I am chatting about a possible collaboration, asks:
      • "In the meantime can you send me a link to a musicbrainz entry that you think gives a good representation of full metadata? Like an artist with ISRCS and other unique identifiers, etc. So I can get a vibe on what the ultimate goal for artists would be."
      • got any artists in mind that fit this bill especially well?
      • reosarevok
        Hmm. Artists specifically
      • Lemme think
      • mayhem
        or an album...
      • where the artist is also pretty well pimped out.
      • reosarevok
        Well, as much as I dislike his bullshit, https://musicbrainz.org/artist/164f0d73-1234-4e... seems well-filled. Also, wtf, does the US really allow one to legally change their name to just "Ye", no surname? :D
      • For a classical example, https://musicbrainz.org/artist/ae0b2424-d4c5-4c... should work fine
      • (I see four recordings were added during the holidays that need fixing, heh, will change them)
      • Is the possible collaboration a seekrit?
      • ISRCs and whatnot though depend a ton on what people have sent and I can't guarantee most recordings for either artist have them, tbh
      • Since those usually require CD in hand and even then it's not always there
      • Kanye has a fair amount for albums, it seems, but not many for singles
      • mayhem
        ok, thanks for those links.
      • what would we consider ideal minimal metadata for an artist who wishes to add one release, or even perhaps one single track they just finished?
      • artist name, sort name, type, area, link to their own page, a link to where the artist can be supported, metadata for the release including a link where it can be obtained.
      • is kinda my thinking.
      • reosarevok
        Sort name can be confusing, but yes. Link to their own page (which often will mean social media), link to streaming pages or whatnot, track titles and durations
      • And ideally ISRCs since it's much easier for them to provide them than for anyone else to find them
      • IPI and ISNI in the form confuse a lot of people though, so if we have a section for ISRCs it should specify "it's ok to skip them if you don't have them"
      • mayhem
        "but if there is a way to make an MB add-on for these" <- there is fierce competition for doing this right now. and each of these tools is collecting data for their own "WE MUST WIN IT ALL OR WE PERISH" approach. which is.. uhm, not going to work.
      • yvanzo
        But they are publishing music on several digital streaming platforms at the same time. Might it be possible to add MB as a target? It would just grab metadata and not audio content.
      • mayhem
        anything is possible at this point in time.
      • it would be good for us to think about this in broad terms of what we would love to see happen.
      • yvanzo
        But you’re probably right that they may not easily allow creating add-ons for their software.
      • mayhem
        and then Marc and I can actually see about what is possible.
      • yvanzo: and the add-ons are usually for larger studios with established artists. byta deals more with tons of teeny artists.
      • which is actually great -- we want more teeny artists metadata in MB.
      • yvanzo
        Right.
      • reosarevok
        Yeah, established artist data we'll eventually get most of anyway
      • mayhem
        the easiest would be a cd stub like system, but... that doesn't do a lot of good in the grand scheme of things.
      • having switched on artists would be even better.
      • reosarevok
        I mean, it does do good if we or byta have people in charge of finishing the import
      • But it doesn't otherwise
      • mayhem
        agreed.
      • but I have no idea of the costs and scalability of this.
      • reosarevok
        Yeah, me neither :) Something like that would need to be tested for feasibility for sure
      • But the scalability issue will to some degree be there even if the artists do all the adding since the new additions will be seen by the community
      • My main worry with artists doing the adding is just "who deals with edit notes by the community"
      • The artist is likely gone by then
      • mayhem
        agreed. scalability remains the greatest concern.
      • perhaps we can fund one position of someone who is what you call a finisher.
      • reosarevok
        Of course, byta edits could clearly be marked as such, with a note like "this was added in this way, if you see something wrong here, be bold and make the changes, if you see something consistently wrong, get in touch so we can improve the system"
      • mayhem
        if we can identify the stream of users coming in from this, this person could be tasked with tidying up this incoming stream of data.
      • reosarevok
        But basically there should be clarity in any case on whether the community can expect an answer to notes or not
      • Yeah. That seems perfectly doable
      • (I do that at a small scale rn with BBC works)
      • The scalability is the main question
      • You'd need to decide with byta whether each artist would get one account, or all would go through one main byta account
      • mayhem
        a good question, that.
      • reosarevok
        If they have accounts for artists on their site, then giving them an automatic [byta_or_whatever_prefix]_username account is probably doable (needs implementing, but)
      • mayhem
        and ideally, this would be less of "marc and I deciding" but more of "reo telling us how it could be feasible"
      • reosarevok
        If they don't and each time the artist adds new music they do it on a form without them having an account over there as such, then a general account for them seems simpler
      • mayhem
        the "creating an account" seems to be a significant hurdle for such services.
      • reosarevok
        Yeah. We could in some way automate MB account creation for them if they have an account on their side
      • mayhem
        so, if they are required to have a byta account, then adding yet another one is a likely dealbreaker.
      • reosarevok
        I wasn't thinking of the artist filling in our captcha :)
      • Just that behind the scenes they'd get assigned an account
      • mayhem
        once we migrate to single oauth on MeB, then this is much easier.
      • reosarevok
        For submitting the data
      • mayhem
        agreed, not a bad idea.
      • reosarevok
        But that only works if they have byta accounts already
      • If not, they should go through one generic byta account
      • That'd probably be easier to monitor anyway, but it would be harder to find the issues with specific artists and support them if they need it
      • FWIW though we now have an edit search for edit note content, meaning different accounts wouldn't be a problem if we can just search for a string such as "Submitted via byta"
      • yvanzo
        One generic byta account seems to be more workable indeed.
      • mayhem
        yvanzo: thanks for the weblate email.