mayhem: having spent many hours trying to debug issues with building popularity data, it turns out to be a casing issue in the _mbid field. spark doesn't have a uuid type so it treats them as strings. i think we should lowercase all UUIDs when accepting listens in LB automatically, alternatively we need to lowercase them at dumps time or in each query in spark.
mayhem[m]
but we store them in PG which has a UUID type, which I would assume doesn't have a case, right?
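A rough illustration of the mismatch being discussed, using Python's standard `uuid` module as a stand-in for a typed UUID column. The MBID value is made up for the example. Compared as plain strings, the two casings are distinct keys; parsed as UUIDs, they are the same value.

```python
import uuid
from collections import Counter

# Hypothetical MBID, written in two different casings.
upper = "8F3471B5-7E6A-48DA-86D9-C1C8D7FB64E7"
lower = upper.lower()

# Compared as plain strings, the two spellings are different keys,
# so a group-by on them would produce two separate buckets.
string_counts = Counter([upper, lower])

# Parsed into UUID objects, both spellings are the same value.
uuid_counts = Counter([uuid.UUID(upper), uuid.UUID(lower)])

print(len(string_counts), len(uuid_counts))
```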
lucifer[m]
also, need to fix it for existing listens which is not fun but oh well
we store user submitted listen data (additional_info) as json
mayhem[m]
oh, this is in the JSON field, which doesn't have a UUID type.
I think we should do both. convert to lower case when ingesting, but also do so when we use them for stats work.
lucifer[m]
doing it when using them for stats work would likely make all queries slower.
if we do it at ingestion time, i think we should be fine, given that we also fix the listens already ingested.
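A minimal sketch of what lowercasing at ingestion time could look like, not the actual ListenBrainz code. The function name `lowercase_mbids` is hypothetical, and it assumes MBIDs live in `additional_info` under keys ending in `_mbid` (a single string) or `_mbids` (a list of strings).

```python
def lowercase_mbids(additional_info):
    """Return a copy of additional_info with MBID-like fields lowercased.

    Assumption: keys ending in "_mbid" hold one UUID string and keys ending
    in "_mbids" hold a list of UUID strings. Everything else is left alone.
    """
    normalized = {}
    for key, value in additional_info.items():
        if key.endswith("_mbid") and isinstance(value, str):
            normalized[key] = value.lower()
        elif key.endswith("_mbids") and isinstance(value, list):
            # Lowercase only the string entries; keep anything else untouched.
            normalized[key] = [v.lower() if isinstance(v, str) else v for v in value]
        else:
            normalized[key] = value
    return normalized


# Example with made-up values.
listen_info = {
    "recording_mbid": "8F3471B5-7E6A-48DA-86D9-C1C8D7FB64E7",
    "artist_mbids": ["0383DADF-2A4E-4D10-A46A-E9E041DA8EB3"],
    "media_player": "VLC",
}
cleaned = lowercase_mbids(listen_info)
```

This only changes casing and does not validate the values, so malformed MBIDs would pass through unchanged. The same function could in principle be reused for a one-off pass over listens that are already stored.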
mayhem[m]
fun. well, we need to add unique ids to the listen table, so might as well do it then
lucifer[m]
yeah, makes sense.
i'll update the popularity queries for now and open a ticket for the rest of the stuff.
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | BookBrainz: #bookbrainz | Channel is logged and not empty as it is bridged to IRC; see https://musicbrainz.org/doc/ChatBrainz for details | Agenda: Reviews, Hetzner mainboard repl. (zas)