in #metabrainz

0:35 AM
Hobbyboy has quit
0:36 AM
JuniorJPDJ has quit
0:36 AM
treeshateorcs[m] has quit
0:39 AM
Hobbyboy joined the channel
0:54 AM
JuniorJPDJ joined the channel
0:56 AM
treeshateorcs[m] joined the channel
3:26 AM
MRiddickW has quit
4:49 AM
gcrkrause3 has quit
4:52 AM
gcrkrause3 joined the channel
8:21 AM
reosarevok

aerozol: guidelines require a lot more community input :) but it might be a good time to start them!
8:22 AM
If you have ideas, put them to paper (or well, to wiki) and we can open a discussion
8:36 AM
lucifer

Lotheric: goldenshimmer: not entirely sure, but one possible lead I discussed with outsidecontext is the recently played endpoint returning tracks from spotify out of order. that can cause streams to go missing from LB.
8:43 AM
mayhem: alastairp: it looks like the spotify endpoint might be returning streams out of order. for eg: consider a user continuously listening since 1 PM. suppose we query at 1:15 PM and get no streams. we attempt again later but still get no streams. then we query at 3 PM and the endpoint returns streams till 2 PM + a recent stream listened around 2:50 PM; it hasn't yet returned the streams between 2 PM and 2:50 PM. now we query the
8:43 AM
endpoint again later, and it has the entire listening history from 1-3 PM at this point. but since we already imported the 2:50 PM stream in an older run, LB will ignore all the streams prior to 2:50 PM. so the user's listening history between 2 and 2:50 PM goes missing.
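A minimal Python sketch of the failure mode described above, assuming a simplified importer that stores only the newest imported timestamp as a cursor (times are minutes after 12:00 PM; `Listen`, `import_after_cursor` and `make_listens` are hypothetical names, not LB code). It shows how a cursor advanced by one early out-of-order listen causes the older listens that show up later to be skipped:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listen:
    listened_at: int  # minutes after 12:00 PM, to keep the example readable
    track: str


def import_after_cursor(api_response, cursor):
    """Keep only listens newer than the stored cursor (hypothetical importer logic)."""
    imported = [l for l in api_response if l.listened_at > cursor]
    new_cursor = max([cursor] + [l.listened_at for l in imported])
    return imported, new_cursor


def make_listens(times):
    return [Listen(t, f"track@{t}") for t in times]


# Poll at 3 PM: listens from 1 PM (60) to 2 PM (120), plus one at 2:50 PM (170).
first_response = make_listens(list(range(60, 121, 10)) + [170])
# Later poll: the endpoint now returns the full 1-3 PM history.
second_response = make_listens(range(60, 181, 10))

cursor = 0
stored = []
for response in (first_response, second_response):
    new, cursor = import_after_cursor(response, cursor)
    stored.extend(new)

stored_times = {l.listened_at for l in stored}
missing = sorted(l.listened_at for l in second_response if l.listened_at not in stored_times)
print(missing)  # [130, 140, 150, 160] -> the 2:10-2:40 PM listens are never imported
```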
8:44 AM
we don't have a reproducing sample currently but i, outsidecontext and people on spotify forums have seen similar behaviour https://community.spotify.com/t5/Spotify-for-De...
8:44 AM
thoughts on how we can confirm this behaviour and then work on a workaround/fix.
8:46 AM
outsidecontext

I'm also trying to monitor this today. Two days ago the Spotify import worked smoothly for me. Yesterday it had big issues, importing only 3 of 34 tracks (see https://gist.github.com/phw/03ce9343f64cb57328d...)
8:49 AM
today there seem to be issues again. The three tracks I listened to earlier today are not yet showing up in Spotify API over an hour later. Maybe I can see it happening again and catch one of the bad API responses
8:51 AM
BrainzGit

[listenbrainz-server] amCap1712 opened pull request #1822 (master…startup-bug): Fix LB server local development https://github.com/metabrainz/listenbrainz-serv...
8:53 AM
lucifer

treeshateorcs[m]: the above change should fix the server startup error you were seeing yesterday. will merge after review, but you can make the changes locally till then if you want.
9:30 AM
CatQuest

wait, will it add scrobbles to lfm or lb?
9:33 AM
riksucks

lucifer: btw i wanted to ask you something: wouldn't it be better to just add the headers manually in the handle_error function, rather than using the decorator? Pretty sure the code quality would take a hit. What do you say?
9:35 AM
lucifer

riksucks: yes, i am currently debugging that error. if we can get it working, fine; otherwise adding them manually is fine.
9:35 AM
riksucks

Right I see, thanks
9:54 AM
texke joined the channel
9:58 AM
lucifer

riksucks: no useful leads yet, let's add the header manually.
10:00 AM
jsonify returns a Response object so you can set the header on that.
10:03 AM
alastairp

lucifer: hmm, interesting. we use a limit to decide to no longer import some items from the spotify stream, right?
10:04 AM
given that we have deduplication during ingest, maybe we just grab all 50 all the time?
10:04 AM
(morning)
10:06 AM
lucifer

that would significantly increase dupes. also 50x the load on ts writer.
10:07 AM
mayhem

moooin!
10:07 AM
Sophist_UK has quit
10:07 AM
lucifer

we currently store the timestamp of the latest listen imported for a user from spotify and then query the api for streams only after that timestamp.
10:08 AM
mayhem

moooin!
10:08 AM
how about we normally do an incremental (since last timestamp) fetch, but every 10 tries we do a full pull of all 50 listens?
10:12 AM
lucifer

interesting thought, that could work, but it also means increased load for 7 mins every ~75 mins.
10:13 AM
mayhem

ideally we would spread the larger checks out, so they don't bunch up.
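One way to sketch mayhem's "incremental fetch, but a full pull of all 50 listens every 10 tries", with the spreading mentioned here, assuming each user has an integer ID and a per-user attempt counter (`fetch_mode` and `FULL_PULL_EVERY` are hypothetical). Offsetting by the user ID means that even if all counters line up, only about 1 in 10 users does a full pull in any given round:

```python
FULL_PULL_EVERY = 10  # hypothetical: "every 10 tries"


def fetch_mode(user_id: int, attempt: int, every: int = FULL_PULL_EVERY) -> str:
    """Return "full" on one attempt out of every `every` for this user, else "incremental".

    The user-ID offset shifts which attempt is the full one, so users whose
    counters are in sync still don't all do a full pull in the same round.
    """
    return "full" if (attempt + user_id) % every == 0 else "incremental"


# "incremental" -> query with the stored timestamp as the lower bound
# "full"        -> fetch the whole recently played window and let dedup sort it out
print([fetch_mode(user_id=3, attempt=a) for a in range(10)].index("full"))  # 7
```

If user IDs were not integers, a stable hash of the ID could serve as the offset instead.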
10:14 AM
I think we ought to build a whole class in the importer that sets the next check time for a given user.
10:14 AM
and that class can contain a whole lot of logic or attempts at predicting things.
10:14 AM
PrathameshG joined the channel
10:14 AM
and perhaps that even has some things that could be tuned on the fly.
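A rough sketch of the per-user "next check time" class suggested here, under assumed numbers: poll every 5 minutes while a user is active, double the interval for each consecutive empty check up to one hour, and add ±10% jitter so checks don't bunch up. The class name and every parameter are hypothetical; since they are plain attributes, they can be tuned on the fly.

```python
import random
from dataclasses import dataclass, field


@dataclass
class NextCheckScheduler:
    """Hypothetical per-user scheduler: poll sooner when a user is active, back off when idle."""

    base_interval: float = 300.0  # seconds between checks for an active user (assumed)
    max_interval: float = 3600.0  # cap on the back-off for idle users (assumed)
    jitter: float = 0.1  # +/- fraction of randomness to spread checks out
    rng: random.Random = field(default_factory=random.Random)

    def next_interval(self, idle_checks: int) -> float:
        """idle_checks = consecutive checks that returned no new listens."""
        interval = min(self.base_interval * (2 ** idle_checks), self.max_interval)
        spread = interval * self.jitter
        return interval + self.rng.uniform(-spread, spread)

    def next_check_time(self, now: float, idle_checks: int) -> float:
        """Absolute time (same units as `now`) at which to check this user again."""
        return now + self.next_interval(idle_checks)


scheduler = NextCheckScheduler(rng=random.Random(42))
print(round(scheduler.next_interval(idle_checks=3)))  # somewhere in 2160-2640 s
```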
10:14 AM
alastairp

just to confirm - does the ts writer ever back up with work?
10:14 AM
lucifer

i was thinking of doing some redis-based dedup, but that would be more work to get right: store the last 50 listens imported for the user in redis, then query all available spotify listens for the user, match them against redis, send only the ones not in redis to the ts writer, and update redis.
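For comparison, the Redis-based dedup described here could be sketched as below. This is a hypothetical stand-in that keeps each user's last 50 imported listens in an in-memory `deque` instead of Redis, and it assumes a `(listened_at, track_id)` tuple is enough to tell two listens apart:

```python
from collections import deque

RECENT_LIMIT = 50  # "store last 50 listens imported for the user"


class RecentListenDeduper:
    """In-memory stand-in for a per-user Redis list of recently imported listens."""

    def __init__(self, limit: int = RECENT_LIMIT):
        self.limit = limit
        self._recent = {}  # user -> deque of (listened_at, track_id)

    def filter_new(self, user: str, listens: list[tuple[int, str]]) -> list[tuple[int, str]]:
        """Return only listens not seen recently (oldest first) and remember them."""
        recent = self._recent.setdefault(user, deque(maxlen=self.limit))
        seen = set(recent)
        new = []
        for listen in sorted(listens):
            if listen not in seen:  # also drops duplicates within the same batch
                seen.add(listen)
                new.append(listen)
        recent.extend(new)  # maxlen evicts the oldest entries automatically
        return new  # these would be the only ones sent on to the TS writer
```

Note that a listen older than the 50-entry window would be sent again, so the TS writer's own dedup would still be the backstop.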
10:15 AM
currently no.
10:15 AM
mayhem

say that this "out of order mode" from spotify is an aberration. maybe we can turn the full checks off when things run normally?
10:15 AM
lucifer: I think adding extra work on that level will just end up being a pain. let TS sort it out, but let's try to be smart about how often we query a particular user.
10:16 AM
alastairp

^ agreed about the complexity of another level
10:16 AM
lucifer

re disabling full checks, that needs us to be able to figure out when spotify is going out of order. for eg, outsidecontext noticed the issue but i usually don't monitor what spotify is importing for me so would never know.
10:17 AM
makes sense.
10:17 AM
outsidecontext

yes. I also don't know how often it failed for me without noticing. Just yesterday I easily spotted it, and a few weeks ago there was another case I noticed.
10:18 AM
mayhem

lucifer: yes, perhaps this is something we check for and turn on automatically?
10:18 AM
outsidecontext

And today of course. Still waiting for my listenings from 3 hours ago to show up in Spotify API
10:18 AM
mayhem

like do random long pulls and check for OOO issues?
10:19 AM
lucifer

+1 on the "the importer that sets the next check time for a given user."
10:19 AM
alastairp

we found forum posts on the spotify site asking about this problem too, so it's not just related to us
10:19 AM
mayhem

we could have a button somewhere that says: "spotify is being dumb, please try harder for a day"
10:19 AM
lucifer

yes, we can do random long pulls, but we still need to compare them with something to figure out that listens are missing.
10:20 AM
outsidecontext

there are two aspects to this: 1. spotify sometimes having huge delays until listens show up. There is obviously not much to do about this
10:20 AM
mayhem

and if 3 people press the button, we try harder.
10:20 AM
alastairp

is there a way for us to independently identify this without user input?
10:20 AM
mayhem

oh, ah. big brain time. 🧠
10:20 AM
outsidecontext

and 2. likely spotify sometimes only shows more recent listens first, while not yet showing some older ones. at least that's the theory so far; I haven't yet seen an actual API result showing this behavior
10:20 AM
alastairp

randomly sample n users, get their listens every hour, see if listens "turn up" in the middle compared to when we checked last
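alastairp's sampling check could look roughly like this: compare two snapshots of a sampled user's recently played timestamps and flag any listen that appears for the first time yet is older than the newest listen already seen. The function name and the use of bare timestamps are assumptions:

```python
def turned_up_in_middle(previous: set[int], current: set[int]) -> list[int]:
    """Timestamps that are new in `current` but older than the newest one in `previous`.

    Newly appearing listens should normally be newer than everything seen
    before; anything older suggests the endpoint returned listens out of order.
    """
    if not previous:
        return []
    newest_seen = max(previous)
    return sorted(t for t in current - previous if t < newest_seen)


print(turned_up_in_middle({100, 130}, {100, 110, 130, 140}))  # [110]
```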
10:21 AM
mayhem

let's get a paid spotify account. let's create a bot that listens to music, and it listens to a short track every 3 minutes or so.
10:21 AM
now we have a "clock frequency" we know that listens should be coming for this user every 3 minutes.
10:22 AM
if reality differs from theory, try harder.
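The "clock frequency" check could be sketched as follows, assuming the bot plays a track roughly every 180 seconds and allowing a hypothetical 60 seconds of slack; a hole between imported listens, or no listen within the expected window, would be the signal to try harder. Names and thresholds are illustrative:

```python
EXPECTED_PERIOD = 180  # seconds: the bot plays a short track about every 3 minutes (assumed)
TOLERANCE = 60  # hypothetical slack for track length and client jitter


def find_gaps(timestamps, period=EXPECTED_PERIOD, tolerance=TOLERANCE):
    """Return (start, end) pairs of consecutive listens that are further apart than expected."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > period + tolerance]


def should_try_harder(timestamps, now, period=EXPECTED_PERIOD, tolerance=TOLERANCE):
    """True if the bot's listens have a hole in them or have stopped arriving."""
    if not timestamps:
        return True
    stale = now - max(timestamps) > period + tolerance
    return stale or bool(find_gaps(timestamps, period, tolerance))


print(find_gaps([0, 180, 360, 900, 1080]))  # [(360, 900)]
```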
10:22 AM
lucifer

sounds good
10:23 AM
outsidecontext, have other listens for today shown up yet?
10:24 AM
outsidecontext

no, and I only did these three in the morning. I'll listen to something additional now for testing
10:28 AM
lucifer

outsidecontext, also i found another comment in the forums mentioning that it could be related to offline mode or the iOS app. does that sound familiar from when you noticed the issue?
10:29 AM
outsidecontext

have been using the desktop app on my laptops
10:29 AM
lucifer

👍
10:31 AM
outsidecontext

interesting is that the current playback endpoint is working well
10:32 AM
alastairp

yeah, I think we noticed that too
10:32 AM
lucifer

yeah the current playback endpoint usually works realtime but the recently played one may lag hours.
10:43 AM
yvanzo

O’Moin
10:47 AM
PrathameshG has quit
11:12 AM
reosarevok

moin :)
11:24 AM
yvanzo: someone said "nothing in the MusicBrainz documentation about work types, like Incidental Music and such. the only place I’ve found those descriptions is on the Create New Work pages. frankly, what we’ve got hasn’t helped me figure out if parts of a soundtrack are incidental or not"
11:25 AM
This goes together with the ws ticket you shared yesterday
11:25 AM
Should we look into having, say, /work-types on the site, and then maybe a /ws/2/work-types JSON representation of the same list?
11:33 AM
mayhem

reosarevok: a friend of mine, with whom I am chatting about a possible collaboration, asks:
11:33 AM
"In the meantime can you send me a link to a musicbrainz entry that you think gives a good representation of full metadata? Like an artist with ISRCS and other unique identifiers, etc. So I can get a vibe on what the ultimate goal for artists would be."
11:34 AM
got any artists in mind that fit this bill especially well?
11:34 AM
reosarevok

Hmm. Artists specifically
11:34 AM
Lemme think
11:35 AM
mayhem

or an album...
11:35 AM
where the artist is also pretty well pimped out.
11:38 AM
reosarevok

Well, as much as I dislike his bullshit, https://musicbrainz.org/artist/164f0d73-1234-4e... seems well-filled. Also, wtf, does the US really allow one to legally change their name to just "Ye", no surname? :D
11:38 AM
For a classical example, https://musicbrainz.org/artist/ae0b2424-d4c5-4c... should work fine
11:40 AM
(I see four recordings were added during the holidays that need fixing, heh, will change them)
11:41 AM
Is the possible collaboration a seekrit?
11:41 AM
ISRCs and whatnot though depend a ton on what people have sent and I can't guarantee most recordings for either artist have them, tbh
11:42 AM
Since those usually require CD in hand and even then it's not always there
11:42 AM
Kanye has a fair amount for albums, it seems, but not many for singles
11:43 AM
mayhem

ok, thanks for those links.
11:53 AM
what would we consider ideal minimal metadata for an artist who wishes to add one release, or even perhaps one single track they just finished?
11:54 AM
artist name, sortname, type, area, link to their own page, a link to where the artists can be supported, metadata for release including a link where it can be obtained.
11:54 AM
is kinda my thinking.
11:55 AM
reosarevok

Sort name can be confusing, but yes. Link to their own page (which often will mean social media), link to streaming pages or whatnot, track titles and durations
11:56 AM
And ideally ISRCs since it's much easier for them to provide them than for anyone else to find them
11:56 AM
IPI and ISNI in the form confuse a lot of people though, so if we have a section for ISRCs it should specify "it's ok to skip them if you don't have them"
11:57 AM
mayhem

"but if there is a way to make an MB add-on for these" <- there is fierce competition for doing this right now. and each of these tools is collecting data for their own "WE MUST WIN IT ALL OR WE PERISH" approach. which is.. uhm, not going to work.
11:58 AM
yvanzo

But they are publishing music on several digital streaming platforms at the same time. Might it be possible to add MB as a target? It would just grab metadata and not audio content.
11:59 AM
mayhem

anything is possible at this point in time.
12:00 PM
it would be good for us to think about this in broad terms of what we would love to see happen.
12:00 PM
yvanzo

But you’re probably right that they may not make it easy to create add-ons for their software.
12:00 PM
mayhem

and then Marc and I can actually see about what is possible.
12:00 PM
yvanzo: and the add-ons are usually for larger studios with established artists. byta deals more with tons of teeny artists.
12:01 PM
which is actually great -- we want more teeny artists metadata in MB.
12:01 PM
yvanzo

Right.
12:01 PM
reosarevok

Yeah, established artist data we'll eventually get most of anyway
12:02 PM
mayhem

the easiest would be a cd stub like system, but... that doesn't do a lot of good in the grand scheme of things.
12:02 PM
having switched on artists would be even better.
12:02 PM
reosarevok

I mean, it does do good if we or byta have people in charge of finishing the import
12:02 PM
But it doesn't otherwise
12:02 PM
mayhem

agreed.
12:03 PM
but I have no idea of the costs and scalability of this.
12:03 PM
reosarevok

Yeah, me neither :) Something like that would need to be tested for feasibility for sure
12:04 PM
But the scalability issue will to some degree be there even if the artists do all the adding since the new additions will be seen by the community
12:04 PM
My main worry with artists doing the adding is just "who deals with edit notes by the community"
12:04 PM
The artist is likely gone by then
12:04 PM
mayhem

agreed. scalability remains the greatest concern.
12:05 PM
perhaps we can fund one position of someone who is what you call a finisher.
12:05 PM
reosarevok

Of course, byta edits could clearly be marked as such, with a note like "this was added in this way, if you see something wrong here, be bold and make the changes, if you see something consistently wrong, get in touch so we can improve the system"
12:05 PM
mayhem

if we can identify the stream of users coming in from this, this person could be tasked with tidying up this incoming stream of data.
12:05 PM
reosarevok

But basically there should be clarity in any case on whether the community can expect an answer to notes or not
12:05 PM
Yeah. That seems perfectly doable
12:06 PM
(I do that at a small scale rn with BBC works)
12:06 PM
The scalability is the main question
12:06 PM
You'd need to decide with byta whether each artist would get one account, or all would go through one main byta account
12:07 PM
mayhem

a good question, that.
12:08 PM
reosarevok

If they have accounts for artists on their site, then giving them an automatic [byta_or_whatever_prefix]_username account is probably doable (needs implementing, but)
12:08 PM
mayhem

and ideally, this would be less of "marc and I deciding" and more of "reo telling us how it could be feasible"
12:08 PM
reosarevok

If they don't and each time the artist adds new music they do it on a form without them having an account over there as such, then a general account for them seems simpler
12:09 PM
mayhem

the "creating an account" seems to be a significant hurdle for such services.
12:09 PM
reosarevok

Yeah. We could in some way automate MB account creation for them if they have an account on their side
12:09 PM
mayhem

so, if they are required to have a byta account, then adding yet another one is a likely dealbreaker.
12:10 PM
reosarevok

I wasn't thinking of the artist filling in our captcha :)
12:10 PM
Just that behind the scenes they'd get assigned an account
12:10 PM
mayhem

once we migrate to single oauth on MeB, then this is much easier.
12:10 PM
reosarevok

For submitting the data
12:10 PM
mayhem

agreed, not a bad idea.
12:10 PM
reosarevok

But that only works if they have byta accounts already
12:10 PM
If not, they should go through one generic byta account
12:11 PM
That'd probably be easier to monitor anyway, but harder to find the issues with specific artists and support them if they need to
12:11 PM
FWIW though we now have an edit search for edit note content, meaning different accounts wouldn't be a problem if we can just search for a string such as "Submitted via byta"
12:11 PM
yvanzo

One generic byta account seems to be more workable indeed.
12:15 PM
mayhem

yvanzo: thanks for the weblate email.