so the idea is that it would match a recording from a release made in US before, say, Chile? (assuming the recording was in fact released in both countries)
2022-03-15 07436, 2022
mayhem
I see the point of this one and it would be nice, but might be hard to do.
2022-03-15 07441, 2022
alastairp
is this a year tiebreaker? or independent of the year?
2022-03-15 07457, 2022
mayhem
year is one of the much earlier sort columns.
2022-03-15 07401, 2022
lucifer
year tiebreaker
2022-03-15 07411, 2022
alastairp
right, so it'll happen in the case that the year is the same
2022-03-15 07417, 2022
mayhem
yes.
2022-03-15 07428, 2022
mayhem
and there is a case in the mapping later that puts this to a finer point.
we could also try including album name for mapping (since many listens have those) but that would complicate stuff.
2022-03-15 07434, 2022
mayhem
things we've decided: use PG unaccent, improve format sort for dj-mix/single, add country sort.
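A minimal sketch of the country tiebreaker, assuming year stays an earlier sort column and country only decides between releases from the same year. The `Release` type and the `COUNTRY_PRIORITY` list are hypothetical; no preference order was settled in this discussion.

```python
from typing import NamedTuple, Tuple

# Hypothetical preference order, for illustration only.
COUNTRY_PRIORITY = ["US", "GB"]


class Release(NamedTuple):
    title: str
    year: int
    country: str


def country_rank(country: str) -> int:
    """Lower rank sorts first; unlisted countries sort after listed ones."""
    try:
        return COUNTRY_PRIORITY.index(country)
    except ValueError:
        return len(COUNTRY_PRIORITY)


def sort_key(release: Release) -> Tuple[int, int]:
    # Year is compared first, so country only matters when years are equal.
    return (release.year, country_rank(release.country))


releases = [
    Release("Example", 2021, "CL"),
    Release("Example", 2021, "US"),
    Release("Example", 2020, "CL"),
]
ordered = sorted(releases, key=sort_key)
```

Here the 2020 Chilean release still sorts first, and the US release only wins over the Chilean one within 2021.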
2022-03-15 07452, 2022
mayhem
lucifer: it tends to make things worse, really.
2022-03-15 07428, 2022
mayhem
I would prefer to keep album out -- at least until a clear use case emerges.
2022-03-15 07440, 2022
lucifer
oh! yeah makes sense to leave out for now then
2022-03-15 07429, 2022
mayhem
ok, so this brings us to fixing existing issues - those will be tricky, but I can hammer those out in a few days' time.
2022-03-15 07437, 2022
mayhem
let's discuss de-tuning.
2022-03-15 07413, 2022
mayhem
right now we have an iterative approach to this process. try exact, detune, try fuzzy, etc.
2022-03-15 07439, 2022
mayhem
and that is too expensive to do if we are trying to make a better API endpoint.
2022-03-15 07406, 2022
mayhem
and given the timings, the exact lookup is MUCH faster and the most common match type, we should do this:
2022-03-15 07410, 2022
mayhem
1. Exact match.
2022-03-15 07412, 2022
mayhem
2. Fuzzy match
2022-03-15 07423, 2022
mayhem
3. Detune.
2022-03-15 07444, 2022
mayhem
4 (option A): Exact match, fuzzy match.
2022-03-15 07450, 2022
mayhem
4 (option B): Fuzzy match.
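The ordering above (with option A for step 4) can be sketched roughly as follows. The `exact`, `fuzzy` and `detune` callables are hypothetical stand-ins for the real lookups, and `"mbid-1"` is a placeholder value, not real data.

```python
from typing import Callable, Optional, Tuple

Lookup = Callable[[str, str], Optional[str]]
Detune = Callable[[str, str], Tuple[str, str]]


def lookup(artist: str, recording: str,
           exact: Lookup, fuzzy: Lookup, detune: Detune) -> Optional[Tuple[str, str]]:
    """Return (mbid, match_type), trying the cheapest lookup first."""
    mbid = exact(artist, recording)              # 1. exact match
    if mbid is not None:
        return mbid, "exact"
    mbid = fuzzy(artist, recording)              # 2. fuzzy match
    if mbid is not None:
        return mbid, "fuzzy"
    d_artist, d_recording = detune(artist, recording)   # 3. detune
    if (d_artist, d_recording) == (artist, recording):
        return None                              # detuning changed nothing
    mbid = exact(d_artist, d_recording)          # 4 (option A): exact again...
    if mbid is not None:
        return mbid, "detuned_exact"
    mbid = fuzzy(d_artist, d_recording)          # ...then fuzzy
    if mbid is not None:
        return mbid, "detuned_fuzzy"
    return None


# Toy stand-ins so the sketch runs on its own.
index = {("selena gomez", "baila conmigo"): "mbid-1"}
exact = lambda a, r: index.get((a.lower(), r.lower()))
fuzzy = lambda a, r: None
detune = lambda a, r: (a, r.split(" (")[0])

result = lookup("Selena Gomez", "Baila conmigo (with Rauw Alejandro)",
                exact, fuzzy, detune)
```

Option B would simply drop the second exact lookup and go straight to fuzzy on the detuned strings.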
2022-03-15 07413, 2022
reosarevok
bitmap: is your tags code for the schema change also solving MBS-11755? If not, we can run a script whenever to delete the extra tags, see comment there for a (hopefully relevant, I wrote it ages ago) query
Selena Gomez with Rauw Alejandro or Selena Gomez w/ Rauw Alejandro is the artist name in MB. Baila conmigo is recording name.
2022-03-15 07443, 2022
lucifer
spotify does. Baila conmigo (with Rauw Alejandro) as recording name and Selena Gomez as artist name.
2022-03-15 07408, 2022
lucifer
so this case may need detuning MB data to match.
2022-03-15 07448, 2022
mayhem
if we feel that MB needs detuning, then we should add detuned rows to the index.
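A rough sketch of what detuning could look like for the Selena Gomez case, assuming guest credits are stripped with regular expressions on both sides. The patterns and function names are hypothetical, and the artist pattern is naive: it would also mangle artist names that legitimately contain " with ".

```python
import re

# "(with X)", "(feat. X)", "[ft. X]" and similar inside a recording name.
_PAREN_GUEST = re.compile(
    r"\s*[\(\[](?:with|feat\.?|ft\.?|featuring)\s+[^\)\]]*[\)\]]",
    re.IGNORECASE,
)
# "Artist with X", "Artist w/ X", "Artist feat. X" in an artist credit.
_ARTIST_GUEST = re.compile(
    r"\s+(?:with|w/|feat\.?|ft\.?|featuring)\s+.*$",
    re.IGNORECASE,
)


def detune_recording(name: str) -> str:
    """Drop a bracketed guest credit from a recording name."""
    return _PAREN_GUEST.sub("", name).strip()


def detune_artist(name: str) -> str:
    """Keep only the lead artist of a credit with a guest suffix."""
    return _ARTIST_GUEST.sub("", name).strip()


spotify_recording = detune_recording("Baila conmigo (with Rauw Alejandro)")
mb_artist = detune_artist("Selena Gomez with Rauw Alejandro")
```

After detuning, both sources reduce to "Selena Gomez" and "Baila conmigo", which is the kind of detuned row that could be added to the index.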
2022-03-15 07406, 2022
lucifer
or maybe this gets caught by fuzzy match.
2022-03-15 07424, 2022
mayhem
the fuzzy match will match 2-3 characters at most.
2022-03-15 07433, 2022
lucifer
i see, makes sense.
2022-03-15 07436, 2022
mayhem
match a difference of 2-3 characters at most.
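To illustrate why a 2-3 character fuzzy match cannot absorb something like "(with Rauw Alejandro)", here is a small bounded Levenshtein check. It is a plain-Python sketch, not how Typesense or Postgres implement fuzzy matching.

```python
def within_edit_distance(a: str, b: str, max_dist: int) -> bool:
    """True if the Levenshtein distance between a and b is <= max_dist."""
    # A length gap larger than the budget can never be bridged.
    if abs(len(a) - len(b)) > max_dist:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        # Row minima never decrease, so we can stop early.
        if min(cur) > max_dist:
            return False
        prev = cur
    return prev[-1] <= max_dist


typo_ok = within_edit_distance("baila conmigo", "baila comigo", 2)
suffix_ok = within_edit_distance(
    "baila conmigo", "baila conmigo (with rauw alejandro)", 3
)
```

A one-letter typo fits within the budget, while the added guest credit is far beyond it, so that case needs detuning rather than fuzzy matching.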
2022-03-15 07441, 2022
lucifer
can we do fuzzy match on words?
2022-03-15 07444, 2022
mayhem
or otherwise it slows down.
2022-03-15 07457, 2022
mayhem
yes, that is supported and another order of magnitude slower than just letters
2022-03-15 07409, 2022
lucifer
artist_name + recording_name (incoming) fuzzy match on artist_name + recording_name MB data
2022-03-15 07417, 2022
mayhem
I put it in last night and took it back out immediately, since it was sooooo slow.
2022-03-15 07420, 2022
lucifer
oh :/
2022-03-15 07454, 2022
lucifer
can we do a faster endpoint and a slower background one?
2022-03-15 07420, 2022
mayhem
yes, I think that is a good approach.
2022-03-15 07426, 2022
lucifer
a background process that reads unmatched stuff from mapping table and looks it up via the slower means.
2022-03-15 07432, 2022
lucifer
cool sounds good
2022-03-15 07443, 2022
mayhem
I am already clear on the fact that I want to keep the pipeline for mapping around. it is working well.
2022-03-15 07429, 2022
mayhem
ok, given that we want to work on this stuff this week, where should we split the work?
2022-03-15 07429, 2022
lucifer
makes sense.
2022-03-15 07440, 2022
mayhem
I know how to work on the things already discussed.
2022-03-15 07449, 2022
mayhem
I wonder if you'd be open for working on a better detuning engine.
2022-03-15 07410, 2022
mayhem
you seem to have ideas on that front and it's a pretty separate piece of code.
2022-03-15 07454, 2022
lucifer
sure makes sense
2022-03-15 07429, 2022
mayhem
maybe draw up your thinking on a gist/doc so we can discuss?
2022-03-15 07452, 2022
lucifer
do we keep typesense around in the background pipeline?
2022-03-15 07438, 2022
mayhem
possibly. I have a feeling it performs better for fuzzy matching up to 5 characters.
2022-03-15 07457, 2022
mayhem
let me add some timing to the typesense based search and then we'll have a better idea.
2022-03-15 07404, 2022
reosarevok
yvanzo, bitmap: I added some descriptions for the different tickets to the schema change draft doc too, btw
2022-03-15 07406, 2022
mayhem
that will be my first task, I think.
2022-03-15 07411, 2022
lucifer
makes sense 👍
2022-03-15 07424, 2022
reosarevok
yvanzo: I also added the description from last year's blog post to your AC ticket, but do check if it's still correct
2022-03-15 07452, 2022
lucifer
mayhem, do we have a process that periodically rechecks unmatched listens automatically or is it manually invalidating some rows?
2022-03-15 07419, 2022
mayhem
right now I invalidate rows by hand. I just invalidated all no_match matches for 2021.
2022-03-15 07428, 2022
mayhem
this needs to be automated. not sure how yet.
2022-03-15 07445, 2022
lucifer
cron job to invalidate listens weekly, newer ones more frequently and older ones less so?
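One way to picture the "newer more often, older less so" idea is a recheck interval that grows with the age of the listen. The thresholds below are made-up assumptions, and `recheck_interval` / `is_due` are hypothetical names, not existing ListenBrainz code.

```python
from datetime import datetime, timedelta


def recheck_interval(listened_at: datetime, now: datetime) -> timedelta:
    """How long to wait between rechecks of an unmatched listen."""
    age = now - listened_at
    if age < timedelta(days=30):       # recent listens: recheck daily
        return timedelta(days=1)
    if age < timedelta(days=365):      # up to a year old: recheck weekly
        return timedelta(weeks=1)
    return timedelta(days=30)          # older: recheck roughly monthly


def is_due(listened_at: datetime, last_checked: datetime, now: datetime) -> bool:
    """True if the row should be invalidated and looked up again."""
    return now - last_checked >= recheck_interval(listened_at, now)


now = datetime(2022, 3, 15)
recent_due = is_due(datetime(2022, 3, 10), datetime(2022, 3, 13), now)
old_due = is_due(datetime(2020, 1, 1), datetime(2022, 3, 1), now)
```

A cron job could then invalidate only the rows for which `is_due` is true, instead of invalidating whole years by hand.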
2022-03-15 07401, 2022
mayhem
that
2022-03-15 07447, 2022
Dijia
Hi, I had a problem when submitting my listen record. I have successfully built the development environment, everything works well except the "listens" page. Every time I open this page, it shows "get /socket.io/?eio=4&transport=polling&t=nz-uqq1 http/1.1" 404 in git. When I tried to submit a record, the number of recent listens increases but no listen records are shown on the page. I have googled this bug but there seem to be no solutions. Is there
2022-03-15 07447, 2022
Dijia
anyone know what to do with that?
2022-03-15 07433, 2022
lucifer
Dijia: uh yeah, that's a known issue. can you try running `./develop.sh manage update_user_listen_data` ? (that 404 is unrelated)
2022-03-15 07440, 2022
mayhem
Dijia: hi. if you have errors like these, it is best to paste the error you're getting -- that helps us understand better.
2022-03-15 07410, 2022
mayhem
unless you're lucifer , who is clearly a mind-reader. :)
2022-03-15 07437, 2022
lucifer
being the devil comes with its perks :)
2022-03-15 07430, 2022
Dijia
Ah thank you lucifer!! It works!! Amazing!
2022-03-15 07430, 2022
mayhem
the dark side does seem to have better perks. sigh.
2022-03-15 07436, 2022
lucifer
this is the same issue that we discussed last week, happens in prod due to which listens don't appear until cron job runs but well it never runs in dev so listens would never appear. i'll get the fix out soon.
2022-03-15 07435, 2022
lucifer
mayhem: oh i forgot to tell you. that query downgraded to full chunk scan again :-(. the test reproducer i had created to compare 11/13 works fine on 13 but the actual query still doesn't. i think i have found another workaround (using a subquery instead of CTE). opening a TS bug as we speak.
Hi yellowhatpro! The design looks good to me so far. We can have a figma design for it if you're comfortable. Otherwise also I think you can proceed :)
2022-03-15 07440, 2022
lucifer
instead of JOINing to the CTE, select from it as a subquery and you get chunk exclusion.
>The last successful request was processed 71 days after the first email. The GDPR doesn’t define “without undue delay”, but I’m fairly certain that it requires companies to not stall for over 10 weeks.
2022-03-15 07410, 2022
lucifer
spotify assumes it to mean 3 months apparently
2022-03-15 07432, 2022
mayhem
once again, we're among the few who take this seriously.
2022-03-15 07449, 2022
lucifer
indeed
2022-03-15 07457, 2022
yellowhatpro
<akshaaatt> "Hi yellowhatpro! The design..." <- Thanks sempaiii.. I will be working on the Figma designs then. (/≧▽≦)/
2022-03-15 07419, 2022
mayhem
lucifer: alastairp : I guess we're keeping typesense then
pg_trgm can't touch that. just like mchammer can't touch THIS.
2022-03-15 07412, 2022
alastairp
fuzzy 16/sec
2022-03-15 07443, 2022
mayhem
16/sec at .6 which is about 2 edit distance on average.
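For a feel of what a 0.6 threshold means, here is an approximation of trigram similarity in plain Python. It follows the documented pg_trgm idea (lowercase, pad each word with two leading spaces and one trailing space, compare trigram sets), but it is only a sketch and may differ from Postgres in edge cases.

```python
import re


def trigrams(text: str) -> set:
    """Set of padded trigrams for every word in text."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = similarity("baila conmigo", "baila comigo")
```

With this approximation a single dropped letter in a short title still scores above 0.6, while unrelated strings score 0.0.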
2022-03-15 07452, 2022
mayhem
so, pg_trgm is quite a bit slower, sadly.
2022-03-15 07421, 2022
alastairp
is postgres memory settings on bono optimal?
2022-03-15 07414, 2022
reosarevok
mayhem: wow D:
2022-03-15 07423, 2022
ankes
Hi, I have seen that after Spotify's hiccup last week the listening logs collection in ListenBrainz stopped working for some accounts. I am monitoring a few users for an experiment that I am doing, and after asking them to disconnect / reconnect, still it's not working (I have written about this issue yesterday to the MetaBrainz contact email). Is
2022-03-15 07423, 2022
ankes
there any way to check if their ListenBrainz accounts are still linked to Spotify (and that the collection is working properly)?
2022-03-15 07428, 2022
lucifer
ankes: hi! yeah if you can share the username with us, we can check whether spotify is linked or not. other than that all data is public so you can see if listens are coming on website/api, then it's working.
2022-03-15 07449, 2022
lucifer
mayhem: oh nice! what about edit distance 2, 3? if typesense is fast there too then might as well not do fuzzy match in pg. also which version of typesense is this, probably should upgrade to latest for more enhancements.
2022-03-15 07439, 2022
ankes
lucifer thanks! for instance "draconisfirebolt" was working until last tue, then stopped, but then after disconnecting/reconnecting still is not working. The same goes for "ByeBye", "bigDart" and "Danysanak" (I doublecheck with the API)
2022-03-15 07456, 2022
CatQuest
whatever is 0 after that and remove it though
2022-03-15 07457, 2022
CatQuest
will this remove tags from search and the like that have literally no hits?
2022-03-15 07436, 2022
CatQuest
it annoys me to no end that misspelled tags I made one second exist forever because you can't permaremove tags
2022-03-15 07439, 2022
zas
atj: about ansible role for haproxy, I think we'll need quite a lot of specific settings, but we can use one as basis. I read some roles are not 100% compatible with most recent haproxy versions, we'll likely use one of the most recent versions (2.5.x) because we need some very recent features
2022-03-15 07448, 2022
lucifer
ankes: all of those disconnected on 8th (probably due to the spotify downtime), and haven't been reconnected since.
2022-03-15 07440, 2022
ankes
lucifer this is strange because they told me they did it. I will ask them to double-check. thanks!
2022-03-15 07434, 2022
lucifer
👍, i also disconnected/reconnected my account just to confirm that our part of the workflow is working fine.
2022-03-15 07440, 2022
lucifer
*just now
2022-03-15 07400, 2022
lucifer
alastairp: had you tried importing from the pg_dump you made the other day for ts? i am trying to dump my local db (~400 listens) and import to create a small sample for TS bug report and importing from it is failing.
2022-03-15 07431, 2022
alastairp
lucifer: I didn't make a pg_dump, we just copied the entire data directory