in #metabrainz

18:46 PM
lucifer[m]

mayhem: having spent many hours trying to debug issues with building popularity data, it turns out to be a casing issue in the _mbid field. spark doesn't have a uuid type so it treats them as strings. i think we should lowercase all UUIDs automatically when accepting listens in LB; alternatively, we need to lowercase them at dump time or in each query in spark.
18:47 PM
mayhem[m]

but we store them in PG which has a UUID type, which I would assume doesn't have a case, right?
18:47 PM
lucifer[m]

also, need to fix it for existing listens which is not fun but oh well
18:48 PM
we store user submitted listen data (additional_info) as json
18:48 PM
mayhem[m]

oh, this is in the JSON field, which doesn't have a UUID type.
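
To illustrate the point above: a minimal sketch of why the casing only matters once the MBID is held as a plain string. The example MBID and the `recording_mbid` key are placeholders chosen for illustration, not taken from real listen data.

```python
import json
import uuid

# Hypothetical MBID, submitted in upper case by a client.
upper = "E97F805A-AB48-4C52-855E-07049142113D"
lower = upper.lower()

# A JSON round trip keeps the string exactly as submitted.
stored = json.loads(json.dumps({"recording_mbid": upper}))

# Compared as strings (what a system without a UUID type does), the two differ.
strings_match = stored["recording_mbid"] == lower

# Compared as UUID values, casing is irrelevant.
uuids_match = uuid.UUID(stored["recording_mbid"]) == uuid.UUID(lower)

print(strings_match, uuids_match)
```

A typed UUID column hides the difference, while a string join or group-by on the same values would treat them as two distinct keys.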
18:48 PM
I think we should do both: convert to lower case when ingesting, but also do so when we use them for stats work.
18:49 PM
lucifer[m]

doing it when using them for stats work would likely make all queries slower.
18:49 PM
if we do it at ingestion time, i think we should be fine, given that we also fix the listens already ingested.
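
A rough sketch of what ingestion-time normalization could look like, which could also be reused for a one-off pass over listens already stored. This is not the actual LB code: the function names are hypothetical, and it assumes MBIDs live in `additional_info` under keys ending in `_mbid` (single value) or `_mbids` (list of values).

```python
import uuid
from typing import Any


def normalize_mbid(value: Any) -> Any:
    """Return the canonical lowercase UUID string, or the value unchanged if it is not a UUID."""
    if not isinstance(value, str):
        return value
    try:
        # str(UUID) is always lowercase and hyphenated, so this also
        # canonicalizes hyphenation, not just casing.
        return str(uuid.UUID(value))
    except ValueError:
        # Leave invalid values alone; validation is a separate concern.
        return value


def normalize_additional_info(additional_info: dict) -> dict:
    """Return a shallow copy with every *_mbid / *_mbids field normalized."""
    normalized = dict(additional_info)
    for key, value in additional_info.items():
        if key.endswith("_mbid"):
            normalized[key] = normalize_mbid(value)
        elif key.endswith("_mbids") and isinstance(value, list):
            normalized[key] = [normalize_mbid(v) for v in value]
    return normalized
```

Doing this once per listen at write time keeps the stats queries free of per-row `lower()` calls, which is the slowdown mentioned above.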
18:50 PM
mayhem[m]

fun. well, we need to add unique ids to the listen table, so might as well do it then
18:50 PM
lucifer[m]

yeah, makes sense.
18:51 PM
i'll update the popularity queries for now and open a ticket for the rest of the stuff.
20:20 PM
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | BookBrainz: #bookbrainz | Channel is logged and not empty as it is bridged to IRC; see https://musicbrainz.org/doc/ChatBrainz for details | Agenda: Reviews, Hetzner mainboard repl. (zas)