aerozol: guidelines require a lot more community input :) but it might be a good time to start them!
If you have ideas, put them to paper (or well, to wiki) and we can open a discussion
lucifer
Lotheric: goldenshimmer: not entirely sure but one possible lead I discussed with outsidecontext is the recently played endpoint returning tracks from spotify out of order. that can cause streams to go missing from LB.
mayhem: alastairp: it looks like the spotify endpoint might be returning streams out of order. for example, consider a user continuously listening since 1 PM. suppose we query at 1:15 PM and get no streams. we try again later and still get no streams. then we query at 3 PM and the endpoint returns streams up to 2 PM plus a recent stream listened to around 2:50 PM. the endpoint hasn't yet returned the streams between 2 PM and 2:50 PM. now we query the
endpoint again later, and it has the entire listening history from 12 to 3 at this point. but since we already imported the 2:50 PM stream in an older run, LB will ignore all the streams prior to 2:50 PM. so the user's listening history between 2 and 2:50 PM goes missing.
thoughts on how we can confirm this behaviour and then work on a workaround/fix.
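A minimal sketch of the failure mode described above, assuming the importer keeps a single "latest imported timestamp" cursor per user. The function name `import_new` and the use of minutes since noon as timestamps are illustrative assumptions, not the actual ListenBrainz importer code.

```python
def import_new(api_listens, last_imported_ts):
    """Keep only listens newer than the cursor, then advance the cursor.

    Hypothetical helper: mirrors the 'only import after the latest
    timestamp' behaviour described in the chat.
    """
    new = sorted(ts for ts in api_listens if ts > last_imported_ts)
    cursor = max(new, default=last_imported_ts)
    return new, cursor


# Timestamps are minutes since noon: 1 PM = 60, 2 PM = 120, 2:50 PM = 170.
# The user listens every 10 minutes from 1 PM to 2:50 PM.
full_history = list(range(60, 171, 10))

# 3 PM query: streams up to 2 PM plus the 2:50 PM stream, with a hole between.
first_response = [ts for ts in full_history if ts <= 120] + [170]
# Later query: the endpoint has caught up and returns everything.
second_response = full_history

imported, cursor = [], 0
for response in (first_response, second_response):
    new, cursor = import_new(response, cursor)
    imported.extend(new)

# Listens that the endpoint eventually returned but that were never imported.
missing = sorted(set(full_history) - set(imported))
print(missing)
```

Under these assumptions the second query imports nothing, because the cursor already sits at the 2:50 PM stream, and the listens between 2 PM and 2:50 PM are lost.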
outsidecontext
I'm also trying to monitor this today. Two days ago Spotify import worked smoothly for me. Yesterday it had big issues, importing only 3 of 34 tracks (see https://gist.github.com/phw/03ce9343f64cb57328d...)
today there seem to be issues again. The three tracks I listened to earlier today are not yet showing up in Spotify API over an hour later. Maybe I can see it happening again and catch one of the bad API responses
treeshateorcs[m]: above change should fix the server startup error you were seeing yesterday. will merge after review but you can make the changes locally till then if you want.
CatQuest
wait ill adds scrobble to lfm or lb?
riksucks
lucifer: btw i wanted to ask you something, wouldn't it be better to manually just add headers to the handle_error function, rather than using the decorator. Pretty sure the code quality would take a hit. What do you say
lucifer
riksucks: yes i am currently debugging that error, if we can get it working fine otherwise adding manually is fine.
riksucks
Right I see, thanks
texke joined the channel
lucifer
riksucks: no useful leads yet, lets add the header manually.
jsonify returns a Response object so you can set the header on that.
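A rough sketch of setting a header on the object returned by `jsonify`, assuming Flask is installed. The body of `handle_error`, its arguments, and the header name `X-Example-Header` are placeholders for illustration; the chat does not say which header is being added or what the real function looks like.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def handle_error(error, code):
    """Hypothetical error handler: build a JSON response and add a header."""
    response = jsonify({"code": code, "error": str(error)})
    response.status_code = code
    # jsonify returns a Response object, so headers can be set directly.
    response.headers["X-Example-Header"] = "value"  # placeholder header
    return response


# jsonify needs an application context when called outside a request.
with app.app_context():
    resp = handle_error("not found", 404)
```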
alastairp
lucifer: hmm, interesting. we use a limit to decide to no longer import some items from the spotify stream, right?
given that we have deduplication during ingest, maybe we just grab all 50 all the time?
(morning)
lucifer
that would significantly increase dupes. also 50x the load on ts writer.
mayhem
moooin!
Sophist_UK has quit
lucifer
we currently store the timestamp of the latest listen imported for a user from spotify and then query the api for streams only after that timestamp.
mayhem
how about we normally do an incremental (since last timestamp) fetch, but every 10 tries we do a full pull of all 50 listens?
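The proposal above could be sketched roughly like this. The function name `plan_fetch` and the shape of the returned dict are assumptions for illustration; only the numbers 10 and 50 come from the discussion.

```python
FULL_PULL_EVERY = 10  # from the chat: a full pull every 10 tries
MAX_ITEMS = 50        # from the chat: "all 50 listens"


def plan_fetch(attempt, last_imported_ts):
    """Return hypothetical query parameters for the given attempt number."""
    if attempt % FULL_PULL_EVERY == 0:
        # Full pull: ignore the cursor and rely on dedup during ingest.
        return {"limit": MAX_ITEMS, "after": None}
    # Incremental pull: only ask for listens newer than the cursor.
    return {"limit": MAX_ITEMS, "after": last_imported_ts}


full_pulls = [a for a in range(1, 31) if plan_fetch(a, 1000)["after"] is None]
print(full_pulls)
```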
lucifer
interesting thought, that could work but also means increased load for 7mins every ~75mins.
mayhem
ideally we would spread the larger checks out, so they don't bunch up.
I think we ought to build a whole class in the importer that sets the next check time for a given user.
and that class can contain a whole lot of logic or attempts at predicting things.
PrathameshG joined the channel
and perhaps that even has some things that could be tuned on the fly.
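One possible shape for such a class, as a sketch only. The names `ImportScheduler` and `CheckPlan`, the 450-second interval, and the idea of offsetting the full-pull round by user id so the larger checks do not bunch up are all assumptions, not an existing design.

```python
from dataclasses import dataclass


@dataclass
class CheckPlan:
    next_check_at: float  # when to poll this user next (seconds)
    full_pull: bool       # whether that poll should fetch everything


class ImportScheduler:
    """Hypothetical per-user scheduler with knobs that can be changed at runtime."""

    def __init__(self, interval=450.0, full_pull_every=10):
        self.interval = interval              # assumed polling interval
        self.full_pull_every = full_pull_every
        self.force_full = False               # "try harder" switch
        self.attempts = {}                    # user_id -> attempts so far

    def next_check(self, user_id, now):
        attempt = self.attempts.get(user_id, 0) + 1
        self.attempts[user_id] = attempt
        # Offset by user id so different users do full pulls on different rounds.
        staggered = (attempt + user_id) % self.full_pull_every == 0
        return CheckPlan(now + self.interval, self.force_full or staggered)


scheduler = ImportScheduler()
per_round = []
for _ in range(10):
    plans = [scheduler.next_check(uid, now=0.0) for uid in range(10)]
    per_round.append(sum(plan.full_pull for plan in plans))
print(per_round)
```

With ten users and a full pull every ten attempts, each round contains exactly one full pull instead of all ten landing in the same round. Any prediction logic would replace the fixed `interval`.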
alastairp
just to confirm - does the ts writer ever back up with work?
lucifer
i was thinking of doing some redis based dedup but that would be more work to get right: store the last 50 listens imported for the user in redis, then query all available spotify listens for the user, match them against redis, only send the ones not in redis to the ts writer, and update redis.
currently no.
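The dedup idea above could look roughly like the following. To stay self-contained the sketch uses an in-memory dict as a stand-in for Redis; the class name `RecentListenCache` and the `(listened_at, track_id)` tuple shape are assumptions.

```python
RECENT_SIZE = 50  # from the chat: keep the last 50 imported listens per user


class RecentListenCache:
    """In-memory stand-in for the per-user Redis structure (hypothetical)."""

    def __init__(self):
        self._recent = {}  # user_id -> set of (listened_at, track_id)

    def filter_new(self, user_id, listens):
        """Return listens not seen before and remember the newest RECENT_SIZE."""
        known = self._recent.get(user_id, set())
        # dict.fromkeys drops duplicates within the response, keeping order.
        fresh = [l for l in dict.fromkeys(listens) if l not in known]
        merged = sorted(known | set(fresh))[-RECENT_SIZE:]
        self._recent[user_id] = set(merged)
        return fresh  # only these would be sent on to the ts writer


cache = RecentListenCache()
first = [(60, "a"), (70, "b"), (170, "e")]
second = [(60, "a"), (70, "b"), (130, "c"), (140, "d"), (170, "e")]
print(cache.filter_new(1, first))
print(cache.filter_new(1, second))
```

In this sketch the late arrivals at 130 and 140 are passed through even though a newer listen was already imported, which is the case the timestamp cursor misses.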
mayhem
say that this "out of order mode" from spotify is an aberration. maybe we can turn the full checks off when things run normally?
lucifer: I think adding extra work on that level will just end up being a pain. let TS sort it out, but lets try to be smart about how often we query a particular user.
alastairp
^ agreed about the complexity of another level
lucifer
re disabling full checks, that needs us to be able to figure out when spotify is going out of order. for eg, outsidecontext noticed the issue but i usually don't monitor what spotify is importing for me so would never know.
makes sense.
outsidecontext
yes. I also don't know how often it failed for me without noticing. Just yesterday I easily spotted it, and a few weeks ago there was another case I noticed.
mayhem
lucifer: yes, perhaps this is something we check for and turn on automatically?
outsidecontext
And today of course. Still waiting for my listenings from 3 hours ago to show up in Spotify API
mayhem
like do random long pulls and check for OOO issues?
lucifer
+1 on the "the importer that sets the next check time for a given user."
alastairp
we found forum posts on the spotify site asking about this problem too, so it's not just related to us
mayhem
we could have a button somewhere that says: "spotify is being dumb, please try harder for a day"
lucifer
yes we can do random long pulls but still need to compare it with something to figure out that listens are missing.
outsidecontext
there are two aspects to this: 1. spotify sometimes having huge delays until listens show up. There is obviously not much to do about this
mayhem
and if 3 people press the button, we try harder.
alastairp
is there a way for us to independently identify this without user input?
mayhem
oh, ah. big brain time. 🧠
outsidecontext
and 2. spotify likely sometimes shows only the more recent listens first, while not yet showing some older ones. at least that's the theory so far, I have not yet seen an actual API result showing this behavior
alastairp
randomly sample n users, get their listens every hour, see if listens "turn up" in the middle compared to when we checked last
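A sketch of the comparison step for this sampling idea, assuming each hourly check stores the listen timestamps the API returned for a sampled user. The function name `late_arrivals` is hypothetical.

```python
def late_arrivals(previous, current):
    """Timestamps that are new in `current` but older than the newest
    timestamp already seen in `previous`, i.e. listens that 'turned up'
    in the middle of history we had already fetched."""
    if not previous:
        return []
    seen = set(previous)
    newest_seen = max(previous)
    return sorted(ts for ts in set(current)
                  if ts not in seen and ts < newest_seen)


last_hour = [60, 70, 170]
this_hour = [60, 70, 130, 140, 170, 180]
print(late_arrivals(last_hour, this_hour))
```

A non-empty result for any sampled user would be a signal that the endpoint is currently returning listens out of order.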
mayhem
lets get a paid spotify account. lets create a bot that listens to music. and it listens to a short track every 3 minutes or so.
now we have a "clock frequency" we know that listens should be coming for this user every 3 minutes.
if reality differs from theory, try harder.
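The "clock frequency" check could be sketched as follows, assuming the bot's listens are expected every 180 seconds and allowing some slack. The function name `missing_ticks` and the 60-second tolerance are assumptions.

```python
def missing_ticks(listen_times, start, end, period=180, tolerance=60):
    """Expected listen times (every `period` seconds from `start` to `end`)
    that have no observed listen within `tolerance` seconds."""
    expected = range(start, end + 1, period)
    return [tick for tick in expected
            if not any(abs(tick - seen) <= tolerance for seen in listen_times)]


# The bot should have listened at 0, 180, 360, 540 and 720 seconds.
observed = [0, 180, 540, 720]
print(missing_ticks(observed, start=0, end=720))
```

Any missing tick means reality differs from theory, which could be the trigger to "try harder" for all users.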
lucifer
sounds good
outsidecontext, have other listens for today shown up yet?
outsidecontext
no, and I only did these three in the morning. I'll listen to something additional now for testing
lucifer
outsidecontext, also i found another comment in the forums mentioning that it could be related to offline mode or iOS app. does that sound familiar to when you noticed the issue ?
outsidecontext
have been using the desktop app on my laptops
lucifer
👍
outsidecontext
interesting is that the current playback endpoint is working well
alastairp
yeah, I think we noticed that too
lucifer
yeah the current playback endpoint usually works realtime but the recently played one may lag hours.
yvanzo
O’Moin
PrathameshG has quit
reosarevok
moin :)
yvanzo: someone said "nothing in the MusicBrainz documentation about work types, like Incidental Music and such. the only place I’ve found those descriptions is on the Create New Work pages. frankly, what we’ve got hasn’t helped me figure out if parts of a soundtrack are incidental or not"
This goes together with the ws ticket you shared yesterday
Should we look into having, say, /work-types on the site, and then maybe a /ws/2/work-types JSON representation of the same list?
mayhem
reosarevok: a friend of mine, with whom I am chatting about a possible collaboration, asks:
"In the meantime can you send me a link to a musicbrainz entry that you think gives a good representation of full metadata? Like an artist with ISRCS and other unique identifiers, etc. So I can get a vibe on what the ultimate goal for artists would be."
got any artists in mind that fit this bill especially well?
reosarevok
Hmm. Artists specifically
Lemme think
mayhem
or an album...
where the artist is also pretty well pimped out.
reosarevok
Well, as much as I dislike his bullshit, https://musicbrainz.org/artist/164f0d73-1234-4e... seems well-filled. Also, wtf, does the US really allow one to legally change their name to just "Ye", no surname? :D
(I see four recordings were added during the holidays that need fixing, heh, will change them)
Is the possible collaboration a seekrit?
ISRCs and whatnot though depend a ton on what people have sent and I can't guarantee most recordings for either artist have them, tbh
Since those usually require CD in hand and even then it's not always there
Kanye has a fair amount for albums, it seems, but not many for singles
mayhem
ok, thanks for those links.
what would we consider ideal minimal metadata for an artist who wishes to add one release, or even perhaps one single track they just finished?
artist name, sortname, type, area, link to their own page, a link to where the artists can be supported, metadata for release including a link where it can be obtained.
is kinda my thinking.
reosarevok
Sort name can be confusing, but yes. Link to their own page (which often will mean social media), link to streaming pages or whatnot, track titles and durations
And ideally ISRCs since it's much easier for them to provide them than for anyone else to find them
IPI and ISNI in the form confuse a lot of people though, so if we have a section for ISRCs it should specify "it's ok to skip them if you don't have them"
mayhem
"but if there is a way to make an MB add-on for these" <- there is fierce competition for doing this right now. and each of these tools is collecting data for their own "WE MUST WIN IT ALL OR WE PERISH" approach. which is.. uhm, not going to work.
yvanzo
But they are publishing music on several digital streaming platforms at the same time. Might it be possible to add MB as a target? It would just grab metadata and not audio content.
mayhem
anything is possible at this point in time.
it would be good for us to think about this in broad terms of what we would love to see happen.
yvanzo
But you’re probably right that they may not easily allow creating add-ons for their software.
mayhem
and then Marc and I can actually see about what is possible.
yvanzo: and the add-ons are usually for larger studios with established artists. byta deals more with tons of teeny artists.
which is actually great -- we want more teeny artists metadata in MB.
yvanzo
Right.
reosarevok
Yeah, established artist data we'll eventually get most of anyway
mayhem
the easiest would be a cd stub like system, but... that doesn't do a lot of good in the grand scheme of things.
having switched on artists would be even better.
reosarevok
I mean, it does do good if we or byta have people in charge of finishing the import
But it doesn't otherwise
mayhem
agreed.
but I have no idea of the costs and scalability of this.
reosarevok
Yeah, me neither :) Something like that would need to be tested for feasibility for sure
But the scalability issue will to some degree be there even if the artists do all the adding since the new additions will be seen by the community
My main worry with artists doing the adding is just "who deals with edit notes by the community"
The artist is likely gone by then
mayhem
agreed. scalability remains the greatest concern.
perhaps we can fund one position of someone who is what you call a finisher.
reosarevok
Of course, byta edits could clearly be marked as such, with a note like "this was added in this way, if you see something wrong here, be bold and make the changes, if you see something consistently wrong, get in touch so we can improve the system"
mayhem
if we can identify the stream of users coming in from this, this person could be tasked with tidying up this incoming stream of data.
reosarevok
But basically there should be clarity in any case on whether the community can expect an answer to notes or not
Yeah. That seems perfectly doable
(I do that at a small scale rn with BBC works)
The scalability is the main question
You'd need to decide with byta whether each artist would get one account, or all would go through one main byta account
mayhem
a good question, that.
reosarevok
If they have accounts for artists on their site, then giving them an automatic [byta_or_whatever_prefix]_username account is probably doable (needs implementing, but)
mayhem
and ideally, this would be less of "marc and I deciding" but more of "reo telling us how it could be feasible"
reosarevok
If they don't and each time the artist adds new music they do it on a form without them having an account over there as such, then a general account for them seems simpler
mayhem
the "creating an account" seems to be a significant hurdle for such services.
reosarevok
Yeah. We could in some way automate MB account creation for them if they have an account on their side
mayhem
so, if they are required to have a byta account, then adding yet another one is a likely dealbreaker.
reosarevok
I wasn't thinking of the artist filling in our captcha :)
Just that behind the scenes they'd get assigned an account
mayhem
once we migrate to single oauth on MeB, then this is much easier.
reosarevok
For submitting the data
mayhem
agreed, not a bad idea.
reosarevok
But that only works if they have byta accounts already
If not, they should go through one generic byta account
That'd probably be easier to monitor anyway, but harder to find the issues with specific artists and support them if they need to
FWIW though we now have an edit search for edit note content, meaning different accounts wouldn't be a problem if we can just search for a string such as "Submitted via byta"
yvanzo
One generic byta account seems to be more workable indeed.