alastairp: if my counts are correct then at max 60 laus relevant to us were deleted in the past month.
2022-09-29 27257, 2022
lucifer
i'd be fine with incrementally updating the rest and updating link fully every week or so. maybe also try to optimize the full metadata cache generation to the extent possible. i have some ideas to try out in that direction.
2022-09-29 27246, 2022
alastairp
lucifer: awesome
2022-09-29 27218, 2022
BrainzGit
[troi-recommendation-playground] 14amCap1712 merged pull request #67 (03main…daily-jams-accept-day): Accept jam_date as optional argument in daily-jams patch https://github.com/metabrainz/troi-recommendation…
2022-09-29 27233, 2022
alastairp
lucifer: interesting that you said that the query to get a new set of artist rels is pretty quick too - but I think artist rels are part of an artist, not a top-level key in the recording json?
2022-09-29 27209, 2022
alastairp
so we'd need to get a full artist blob for each recording, right? and then replace the "artists" key in _all_ recordings
2022-09-29 27229, 2022
lucifer
alastairp: yes, indeed. but we can restructure the cache or do a partial update instead of insert.
2022-09-29 27245, 2022
alastairp
lucifer: ok, yeah, right.
2022-09-29 27210, 2022
alastairp
so, I like this idea of moving things around to make updates easier, or change the way that we do updates
2022-09-29 27217, 2022
lucifer
we can also build a temp table of artist rels, index it on artist mbid, and then join to it instead of having it in a CTE.
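A rough sketch of the pattern described here: materialize the artist-rels subquery as an indexed temp table and join to it, instead of inlining it as a CTE. The schema and values below are invented for illustration, and sqlite3 stands in for Postgres (where the real mb_metadata_cache runs); the idea is the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recording (gid TEXT, artist_mbid TEXT, name TEXT);
INSERT INTO recording VALUES ('r1', 'a1', 'Song A'), ('r2', 'a2', 'Song B');

-- Instead of WITH artist_rels AS (SELECT ...), write the subquery's
-- results out once and index them on artist mbid:
CREATE TEMP TABLE tmp_artist_rels AS
SELECT 'a1' AS artist_mbid, 'wikidata' AS rel_type, 'https://example.org' AS url;
CREATE INDEX tmp_artist_rels_idx ON tmp_artist_rels (artist_mbid);
""")

# the final query now does an indexed join rather than re-evaluating a CTE
rows = conn.execute("""
    SELECT r.gid, ar.rel_type, ar.url
      FROM recording r
      JOIN tmp_artist_rels ar ON ar.artist_mbid = r.artist_mbid
""").fetchall()
print(rows)
```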
2022-09-29 27200, 2022
lucifer
and in any case everything should happen at once when we run the mb_metadata_cache script.
2022-09-29 27206, 2022
alastairp
yes, right
2022-09-29 27235, 2022
alastairp
you said that the cte for artist rels is pretty quick, does that speed carry over to actually writing the data to a table and building the index too?
2022-09-29 27210, 2022
alastairp
just putting another idea out there - is splitting each of these CTEs into a separate temp table and then running the final query as slow as running the entire query in one go?
2022-09-29 27207, 2022
lucifer
writing the data takes 1 hour for the full table. it only begins after the query has finished executing iiuc.
2022-09-29 27234, 2022
lucifer
yes, splitting each cte and indexing before running final query is one of the approaches i want to check.
mayhem: did you mean the normalizing email or the Italian one? :)
2022-09-29 27216, 2022
reosarevok
Anyway, I'm dealing with both, but can you check the cover art one?
2022-09-29 27220, 2022
reosarevok
(heh, or the Spanish answer)
2022-09-29 27217, 2022
mayhem
both. :)
2022-09-29 27257, 2022
reosarevok
Answered all 3 now
2022-09-29 27247, 2022
mayhem
thanks
2022-09-29 27244, 2022
lucifer
mayhem: i updated the PR to move the debug playlists back to the post-recommendation step, only daily jams run hourly now. that was the easiest way to keep debug playlists and not generate them every hour in my understanding.
2022-09-29 27207, 2022
mayhem
makes sense
2022-09-29 27215, 2022
lucifer
also, tested it just now on cron by changing my time zone to US/Hawaii. will merge once tests pass.
monkey: to be clear, i invalidated the match so that the mapper attempted to find a new match and this time it found the right one.
2022-09-29 27254, 2022
monkey
Ah, I see.
2022-09-29 27233, 2022
monkey
Any way we could improve the odd ones by using the release name in the listen?
2022-09-29 27246, 2022
lucifer
the mapper should be deterministic and find the same match each time unless there are some code changes or new additions to the database. it's neither in this case, so not sure why it chose the wrong one earlier.
2022-09-29 27224, 2022
lucifer
yes using release name in the listen is a planned improvement. mayhem can probably tell you more about the exact plan.
2022-09-29 27248, 2022
monkey
Thanks for the explanations !
2022-09-29 27202, 2022
lucifer
mayhem: can you review LB#2188. i'll do a release after that.
lucifer: hi, do you know which packet # that line is from by chance? and are you importing the packets into postgres first, or parsing the files directly?
2022-09-29 27208, 2022
lucifer
bitmap: parsing directly by reading the file as csv with excel-tab dialect and then trying to parse olddata/newdata column as json. i don't know the packet number currently but can find it.
currently, no UI available so have to use api directly.
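The parsing approach lucifer describes can be sketched roughly like this — the `olddata`/`newdata` column names come from the chat, but the sample line and the `seqid` column are invented for illustration:

```python
import csv
import io
import json

# invented sample in the shape described: tab-separated, with JSON
# payloads in the olddata/newdata columns
sample = 'seqid\tolddata\tnewdata\n1\t{"name": "old"}\t{"name": "new"}\n'

reader = csv.DictReader(io.StringIO(sample), dialect="excel-tab")
parsed = []
for row in reader:
    try:
        olddata = json.loads(row["olddata"])
        newdata = json.loads(row["newdata"])
    except json.JSONDecodeError:
        continue  # not every row's payload is guaranteed to be valid JSON
    parsed.append((row["seqid"], olddata, newdata))
print(parsed)
```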
2022-09-29 27208, 2022
alastairp
hah
2022-09-29 27212, 2022
alastairp
Tpken?
2022-09-29 27221, 2022
lucifer
ah yes, typo. should be Token
2022-09-29 27222, 2022
monkey
One pain point in my personal workflow is that I use the Pano Scrobbler app on my phone, which connects to LFM/LB, but loved tracks are only sent to LFM. Now I'll be able to import them :)
2022-09-29 27253, 2022
lucifer
currently we only import loved tracks that have a mbid assigned to them by LFM.
2022-09-29 27228, 2022
lucifer
we could change it to look up mbids ourselves from the mapper for tracks that do not have a mbid assigned, or maybe look it up for all.
2022-09-29 27258, 2022
lucifer
alastairp: LFM uses track mbids in this endpoint. maybe that's where the confusion between track mbid and recording mbid in MLHD came from?
2022-09-29 27239, 2022
alastairp
lucifer: hmm, no, we were definitely seeing both recording and track mbids in the same field from the response of some API
2022-09-29 27257, 2022
alastairp
I think the field name was "track_mbid" or something (as a fallback to pre-NGS)
2022-09-29 27207, 2022
lucifer
ah ok, i see.
2022-09-29 27230, 2022
alastairp
lucifer: I'll try and take a look at the BU PR when I get home
2022-09-29 27241, 2022
lucifer
for a couple of random users, i imported on a test account. i get: `{"inserted":3990,"invalid_mbid":0,"mbid_not_found":291,"missing_mbid":3745,"total":8026}`
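As a quick sanity check on that response: each listen should land in exactly one category, so the per-category counts should sum to the total. A minimal sketch using the numbers from the chat:

```python
import json

response = '{"inserted":3990,"invalid_mbid":0,"mbid_not_found":291,"missing_mbid":3745,"total":8026}'
summary = json.loads(response)

# every loved track falls into exactly one bucket, so the counts
# other than "total" should add up to the total
accounted = sum(v for k, v in summary.items() if k != "total")
print(accounted, summary["total"])
```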
2022-09-29 27258, 2022
alastairp
though, actually. maybe I'll just do it now, I trust all of your code :)
2022-09-29 27259, 2022
lucifer
it's possible some of those 291 not found are recording mbids
2022-09-29 27207, 2022
alastairp
ohhh, interesting
2022-09-29 27215, 2022
lucifer
hehe. no hurry :)
2022-09-29 27204, 2022
kepstin
i think i actually noticed recently that last.fm switched from returning recording ids to track ids in the same field, yeah
2022-09-29 27241, 2022
kepstin
so older tracks imported from last.fm before some date will have recording ids, after they'll have track ids.
then we decided to add them anyway, but after identifying this, we changed the importer to put them in another field which wasn't the main LB recording mbid field
2022-09-29 27237, 2022
alastairp
I don't think we've done anything since then - we could definitely go back and look at all the fields we have in our listens and see which ones can map to recordings and which to tracks
2022-09-29 27233, 2022
lucifer
with the mapper its much less relevant anyway.
2022-09-29 27239, 2022
alastairp
yes, right
2022-09-29 27248, 2022
alastairp
still, I'd be curious to see where lfm and the mapper differ
That's all the PRs I had ready lucifer, thanks for waiting
2022-09-29 27225, 2022
lucifer
np, thanks. will do a release later today
2022-09-29 27229, 2022
alastairp
!m monkey
2022-09-29 27229, 2022
BrainzBot
You're doing good work, monkey!
2022-09-29 27236, 2022
alastairp
monkey: how goes the upgrade?
2022-09-29 27252, 2022
alastairp
I saw you closed dependabot, not sure if it was getting noisy due to changes you made, or was just being annoying
2022-09-29 27254, 2022
monkey
Took a break from diving into nivo internals.
2022-09-29 27255, 2022
monkey
I closed one dependabot PR that was doing the same upgrade I've started, but without implementing the required changes (i.e. still considered a point release despite some APIs and props being completely changed)
2022-09-29 27203, 2022
monkey
But I'm getting there
2022-09-29 27235, 2022
alastairp
nice
2022-09-29 27237, 2022
monkey
I'm gonna have to deploy that one to test.LB to ensure all the graphs are working properly
zas: hey, when you are around can you check the openresty logs for prod.caa-redirect.access.log? there seem to be a lot of requests being spammed from a particular user agent