MBS-2278: Sorting collections by artist should use the "sort name" of the artist
reosarevok
(that's a lot of votes)
"using joined sort-names of the artists from the artist credit" is probably the best we can do since it makes no sense to add sort names to artist credits as a deeper concept
yvanzo
reosarevok, bitmap: On a related issue, I’m looking at replacing paging with react-window for collections.
reosarevok
yvanzo: is that endless scrolling? Tell me it isn't endless scrolling
(that wouldn't solve any of the main issues of pagination, such as making things findable in one go, and it would also make it impossible to just load multiple pages in different tabs to quickly check them all)
yvanzo
it doesn't need to be endless scrolling (and collections are not endless either), and it can make things findable in one go with anchors.
Hello! I've been trying to build a full index in Solr, with limited success. I'm using mb-solr@v3.0 and sir@schema-24. I'm consistently seeing failures in the Solr log for artists and recordings. For artists, I see the same error: "unknown field 'primary_alias'"; for recordings, I consistently see "missing required field: name". What's odd is that if I check a failed artist on search.musicbrainz.org, it's present, so you've clearly indexed it successfully, but it fails for me?
ruaok
morning Matthew_. yvanzo is the particular person you need to speak with. most of the rest of us only have cursory knowledge of how the search stuff works, let alone how to debug it.
Matthew_
Thanks ruaok.
mueslo has quit
mueslo joined the channel
pristine__
ruaok: hi
yvanzo
Hi Matthew_, I double-checked and our latest docker images seem to match these versions. Can you check mb-solr submodules are in sync in your clone?
Matthew_
Thanks, yvanzo. Will take a look. On a related note, how long typically does a full reindex take for you and what is the server spec?
yvanzo
Matthew_: I don't know, SIR is still beta; there is currently an issue where SIR doesn't return even though reindexing is complete.
Matthew_
Thanks, yvanzo. So how do you currently rebuild your indices?
yvanzo
I never had to run a full reindex (or any reindex) on prod servers for now, still learning from previous devs. :)
Matthew_
Fair enough! That might be a problem for us though. We require the ability to originate a full index on our slave implementation for the purposes of initial setup and also disaster recovery. Anyhow, I'll double check the dependencies / versions...
yvanzo
Matthew_: I successfully built search indexes locally on sample data without the primary_alias issue; I will make further tests with full data.
Matthew_
Thanks yvanzo. From what I've seen, the error appears pretty quickly once artists start being indexed. However, it's probably user error on my part and the deps are out of kilter. Will let you know for sure...
ruaok
hi pristine__ !
I'm feeling a lot better (though not 100%) so I am slowly catching up.
pristine__
that's good to hear :)
ruaok: there is good news.
ruaok
I could use some of those. :)
CatQuest
[07:54] <yvanzo> reosarevok, bitmap: On a related issue, I’m looking at replacing paging with react-window for collections.
NO please
pristine__
We were able to reduce lookup time from 12 hours to around 2 hours.
I have made a few changes, the script is running on leader now, I will forward the HTML files to you in some time.
ruaok
lookup meaning running the model?
pristine__
You can compare the lookup time of the script I sent you yesterday and of the one I will send you.
no
ruaok
training vs running.
pristine__
ruaok: "look up" : after predicting recommendations, we get recording ids of the recommended songs, then we lookup for relevant information (track name, artist name etc) corresponding to the recording ids.
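A minimal sketch of the "lookup" step described above: after the model predicts recommendations, we only have recording IDs, so each ID is resolved against a metadata source to get the track name, artist name, etc. The `recording_metadata` dict and `lookup` function below are illustrative stand-ins, not the actual ListenBrainz code.

```python
# Stand-in for the real MusicBrainz metadata source keyed by recording ID.
recording_metadata = {
    101: {"track_name": "Song A", "artist_name": "Artist X"},
    102: {"track_name": "Song B", "artist_name": "Artist Y"},
    103: {"track_name": "Song C", "artist_name": "Artist Z"},
}

def lookup(recommended_ids):
    """Resolve recommended recording IDs to human-readable metadata."""
    results = []
    for rec_id in recommended_ids:
        meta = recording_metadata.get(rec_id)
        if meta is not None:  # skip IDs with no known metadata
            results.append({"recording_id": rec_id, **meta})
    return results

recommendations = lookup([103, 101])
```

On real data this join is the expensive part, which is why reducing its runtime from 12 hours to around 2 matters.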
D4RK-PH0ENiX has quit
ruaok
oh, so the 10 hours didn't involve models at all?
pristine__
no. 2 hours to train the model plus 12 hours to predict tracks and lookup
in these two hours, we are training 8 models.
and computing each model's RMSE
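A hedged sketch of "training 8 models and computing each model's RMSE": train several candidate models and keep the one with the lowest root-mean-square error on validation data. The real scripts train collaborative-filtering models; here `candidate_params` and the constant predictors are trivial stand-ins just to show the selection logic.

```python
import math

# Validation data as (item, actual rating) pairs -- illustrative only.
validation = [(1, 3.0), (2, 4.0), (3, 5.0)]

def rmse(predict, data):
    """Root-mean-square error of a predictor over (x, y) pairs."""
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in data) / len(data))

# Stand-in for 8 hyperparameter sets: each "model" predicts a constant.
candidate_params = [2.0, 3.5, 4.0, 6.0]
models = [(p, (lambda x, p=p: p)) for p in candidate_params]

# Select the model with the lowest validation RMSE.
best_param, best_model = min(models, key=lambda m: rmse(m[1], validation))
```

The same pattern applies whatever the model class is: compute RMSE per trained model, then pick the minimum before generating recommendations.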
ruaok
yep
ok, so here will be the proof of your work.
when I get the chance to look at the HTML files I will want to understand what "12 hours to predict tracks and lookup" means.
because those are two very distinct steps and we should know which step takes how long
pristine__
did you read the HTML?
ruaok
not yet, but I remain hopeful. :)
let me do that right now.
pristine__
yeah, they will help you to understand (I hope)
okay
ruaok
yes, looking much better. nicely done.
but, there are still some things to improve.
pristine__
Okay
ruaok
on the model training page... you have roughly three sections of data.
model info, explanations for model info and the table of models generated.
the most useful things are at the bottom of the page.
reference / explanation which we will need once or at least infrequently is near the top.
the table bottom/middle.
I think the last 5 lines should be near the top, perhaps in a concise table as well.
along with "Preprocessing of playcounts-dataframe takes 105.73s. Of the preprocessed data, approx. 66% (15081669) listens have been used as training data, 17% (3773882) listens have been used as validation data and 17% (3772169) listens have been used as test data. After preprocessing, training phase starts. From the models trained, the best one is selected to generate recommendations."
then the table of model trainings and finally the reference stuff.
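The 66/17/17 split quoted above can be sketched as a simple shuffle-and-slice. The real pipeline presumably does this on a dataframe; plain Python lists stand in here, and the fractions are taken from the quoted report text.

```python
import random

def split_listens(listens, seed=42):
    """Split listens into ~66% training, ~17% validation, ~17% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = listens[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.66)
    n_val = int(n * 0.17)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return training, validation, test

train, val, test = split_listens(list(range(100)))
```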
Matthew_
yvanzo. I can confirm that I'm running mb-solr@v3.0, mbsssss@5e6153f, mmd-schema@40e2115, sir@d28c977
pristine__
yeah, right.
ruaok
this way, the most important stuff is near the top where we wish to see it and the less important stuff as we go down.
make sense?
Matthew_
(Solr version 7.7.1)
ruaok
but, it looks like you have all info relevant to this page now, which is good.
pristine__
yeah. I put reference stuff at the top because it will be needed to understand the table. but yeah, makes sense :)
It is like a story, so I tried to put everything in order of the script.
ruaok
I'm talking about the "data collection" page. I don't see that model ID on that page.
pristine__
because at that time
we have not trained the model
we are just collecting data to be able to preprocess it :)
ruaok
ohh, I see.
but, I can look at the two pages and not correlate them.
pristine__
The three HTMLs correspond to three scripts that we use in the whole process. and they are in that order. there is a link at the bottom to go to the next.
did you read about the playcounts-df in the "data collection" HTML?
I mean playcounts-dataframe
ruaok
hmm, ok.
but I see a problem.
pristine__
yeah?
ruaok
you're using dates to emit filenames.
there will be multiple runs on the same day.
and I suppose that the reports will be sufficiently linked if the reader can go bidirectionally.
pristine__
yeah. I have used dates to name the HTML files; it should be changed.
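One possible fix for the collision ruaok points out: include the time, not just the date, in the report filename, so multiple runs on the same day get distinct names. The `model-training` prefix below is an illustrative stand-in.

```python
from datetime import datetime

def report_filename(prefix, when=None):
    """Build a report filename that is unique down to the second."""
    when = when or datetime.now()
    return f"{prefix}-{when.strftime('%Y-%m-%d-%H%M%S')}.html"

name = report_filename("model-training", datetime(2019, 7, 15, 9, 30, 5))
```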