MBS-2278: Sorting collections by artist should use the "sort name" of the artist
2019-05-20 14010, 2019
reosarevok
(that's a lot of votes)
2019-05-20 14056, 2019
reosarevok
"using joined sort-names of the artists from the artist credit" is probably the best we can do since it makes no sense to add sort names to artist credits as a deeper concept
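Editor's sketch of the "joined sort-names" idea above. It builds one sort key per artist credit by concatenating each credited artist's sort name with the credit's join phrase. The class names, fields, and sample data are assumptions for illustration only and are not the MusicBrainz Server schema or code.

```python
from dataclasses import dataclass


@dataclass
class Artist:
    name: str
    sort_name: str


@dataclass
class CreditedName:
    # One entry of an artist credit: the artist plus the text that joins
    # it to the next entry (hypothetical field names).
    artist: Artist
    join_phrase: str = ""


@dataclass
class Release:
    title: str
    artist_credit: list  # list of CreditedName


def credit_sort_key(credit):
    """Join the artists' sort names using the credit's own join phrases."""
    joined = "".join(c.artist.sort_name + c.join_phrase for c in credit)
    return joined.casefold()  # case-insensitive comparison


def sort_by_artist(releases):
    return sorted(releases, key=lambda r: credit_sort_key(r.artist_credit))


beatles = Artist("The Beatles", "Beatles, The")
bowie = Artist("David Bowie", "Bowie, David")
queen = Artist("Queen", "Queen")

releases = [
    Release("Under Pressure", [CreditedName(queen, " & "), CreditedName(bowie)]),
    Release("Abbey Road", [CreditedName(beatles)]),
    Release("Low", [CreditedName(bowie)]),
]
ordered_titles = [r.title for r in sort_by_artist(releases)]
```

With this key, "The Beatles" files under B rather than T, and no sort name has to be stored on the artist credit itself.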
2019-05-20 14030, 2019
yvanzo
reosarevok, bitmap: On a related issue, I’m looking at replacing paging with react-window for collections.
2019-05-20 14053, 2019
reosarevok
yvanzo: is that endless scrolling? Tell me it isn't endless scrolling
2019-05-20 14028, 2019
reosarevok
(that'd not solve any of the main issues of pagination, such as making things findable in one go, but would also make it impossible to just load multiple pages in different tabs to quickly check them all)
2019-05-20 14029, 2019
yvanzo
it doesn't need to be endless scrolling (and collections are not endless either), and it can make things findable in one go with anchors.
Hello! I've been trying to build a full index in Solr, with limited success. I'm using mb-solr@v3.0 and sir@schema-24. I'm consistently seeing failures in the Solr log for artists and recordings. In the case of artists, I see the same error: "unknown field 'primary_alias'"; in the case of recordings, I consistently see "missing required field: name". What's odd is that if I check a failed artist on search.musicbrainz.org, it's present so
2019-05-20 14057, 2019
Matthew_
you've clearly successfully indexed it but it fails for me?
2019-05-20 14045, 2019
ruaok
morning Matthew_. yvanzo is the particular person you need to speak with. most of the rest of us only have cursory knowledge of how the search stuff works, let alone how to debug it.
2019-05-20 14038, 2019
Matthew_
Thanks ruaok.
2019-05-20 14058, 2019
mueslo has quit
2019-05-20 14029, 2019
mueslo joined the channel
2019-05-20 14004, 2019
pristine__
ruaok: hi
2019-05-20 14032, 2019
yvanzo
Hi Matthew_, I double-checked and our latest docker images seem to match these versions. Can you check mb-solr submodules are in sync in your clone?
2019-05-20 14030, 2019
Matthew_
Thanks, yvanzo. Will take a look. On a related note, how long typically does a full reindex take for you and what is the server spec?
Matthew_: I don’t know, SIR is still beta; there is currently an issue where SIR doesn’t return even though reindexing is complete.
2019-05-20 14036, 2019
Matthew_
Thanks, yvanzo. So how do you currently rebuild your indices?
2019-05-20 14026, 2019
yvanzo
I never had to run a full reindex (or any reindex) on prod servers for now, still learning from previous devs. :)
2019-05-20 14002, 2019
Matthew_
Fair enough! That might be a problem for us though. We require the ability to originate a full index on our slave implementation for the purposes of initial setup and also disaster recovery. Anyhow, I'll double check the dependencies / versions...
2019-05-20 14011, 2019
yvanzo
Matthew_: I successfully built search indexes locally on sample data without issue about primary alias, will make further tests with full data.
2019-05-20 14016, 2019
Matthew_
Thanks yvanzo. From what I've seen, the error appears pretty quickly once artists start being indexed. However, it's probably user error on my part and the deps are out of kilter. Will let you know for sure...
2019-05-20 14035, 2019
ruaok
hi pristine__ !
2019-05-20 14049, 2019
ruaok
I'm feeling a lot better (though not 100%) so I am slowly catching up.
2019-05-20 14001, 2019
pristine__
that's good to hear :)
2019-05-20 14019, 2019
pristine__
ruaok: there is good news.
2019-05-20 14031, 2019
ruaok
I could use some of those. :)
2019-05-20 14050, 2019
CatQuest
[07:54] <yvanzo> reosarevok, bitmap: On a related issue, I’m looking at replacing paging with react-window for collections.
2019-05-20 14051, 2019
CatQuest
NO please
2019-05-20 14059, 2019
pristine__
We were able to reduce lookup time from 12 hours to around 2 hours.
2019-05-20 14055, 2019
pristine__
I have made a few changes, the script is running on leader now, I will forward the HTML files to you in some time.
2019-05-20 14031, 2019
ruaok
lookup meaning running the model?
2019-05-20 14041, 2019
pristine__
You can compare the lookup time of the script I sent you yesterday and of the one I will send you.
2019-05-20 14043, 2019
pristine__
no
2019-05-20 14004, 2019
ruaok
training vs running.
2019-05-20 14050, 2019
pristine__
ruaok: "lookup": after predicting recommendations, we get the recording ids of the recommended songs, then we look up the relevant information (track name, artist name, etc.) for those recording ids.
2019-05-20 14008, 2019
D4RK-PH0ENiX has quit
2019-05-20 14016, 2019
ruaok
oh, so the 10 hours didn't involve models at all?
2019-05-20 14052, 2019
pristine__
no. 2 hours to train the model plus 12 hours to predict tracks and lookup
2019-05-20 14043, 2019
pristine__
in these two hours, we are training 8 models.
2019-05-20 14007, 2019
pristine__
and computing each model's RMSE
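Editor's sketch of the step described above: score every trained candidate by RMSE on held-out data and keep one of them. Picking the lowest RMSE as "best" is an assumption here, and the model names and numbers are made up for illustration. This is not the project's training script.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def pick_best(candidates, actual):
    """Score each candidate's predictions and return the lowest-RMSE name."""
    scored = {name: rmse(preds, actual) for name, preds in candidates.items()}
    best = min(scored, key=scored.get)
    return best, scored


# Hypothetical validation targets and per-model predictions.
actual = [3.0, 1.0, 4.0]
candidates = {
    "model_a": [3.0, 1.0, 2.0],
    "model_b": [2.0, 2.0, 5.0],
}
best_name, scores = pick_best(candidates, actual)
```

Keeping the full `scores` mapping around, rather than only the winner, is what makes a per-model table in the report possible.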
2019-05-20 14011, 2019
ruaok
yep
2019-05-20 14020, 2019
ruaok
ok, so here will be the proof of your work.
2019-05-20 14003, 2019
ruaok
when I get the chance to look at the HTML files I will want to understand what "12 hours to predict tracks and lookup" means.
2019-05-20 14022, 2019
ruaok
because those are two very distinct steps and we should know which step takes how long
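Editor's sketch of one way to report the two steps separately: wrap each step in a small timer and record its duration under its own name. The functions `predict_tracks` and `look_up_metadata` are hypothetical stand-ins and are not the functions in the actual script.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(step):
    """Accumulate wall-clock seconds spent inside the block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + time.perf_counter() - start


def predict_tracks(user_ids):
    # Hypothetical stand-in: returns fake recording ids per user.
    return {u: [u * 10, u * 10 + 1] for u in user_ids}


def look_up_metadata(recommendations):
    # Hypothetical stand-in: maps recording ids to display strings.
    return {u: [f"recording-{r}" for r in recs]
            for u, recs in recommendations.items()}


with timed("predict"):
    recs = predict_tracks([1, 2])
with timed("lookup"):
    details = look_up_metadata(recs)

for step, seconds in timings.items():
    print(f"{step}: {seconds:.6f}s")
```

Each entry in `timings` can then go into the report as its own line, so prediction time and lookup time are never lumped together.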
2019-05-20 14028, 2019
pristine__
did you read the HTML?
2019-05-20 14051, 2019
ruaok
not yet, but I remain hopeful. :)
2019-05-20 14058, 2019
ruaok
let me do that right now.
2019-05-20 14025, 2019
pristine__
yeah, they will help you to understand (I hope)
2019-05-20 14027, 2019
pristine__
okay
2019-05-20 14049, 2019
ruaok
yes, looking much better. nicely done.
2019-05-20 14059, 2019
ruaok
but, there are still some things to improve.
2019-05-20 14025, 2019
pristine__
Okay
2019-05-20 14031, 2019
ruaok
on the model training page... you have roughly three sections of data.
2019-05-20 14056, 2019
ruaok
model info, explanations for model info and the table of models generated.
2019-05-20 14022, 2019
ruaok
the most useful things are at the bottom of the page.
2019-05-20 14042, 2019
ruaok
reference / explanation which we will need once or at least infrequently is near the top.
2019-05-20 14050, 2019
ruaok
the table bottom/middle.
2019-05-20 14015, 2019
ruaok
I think the last 5 lines should be near the top, perhaps in a concise table as well.
2019-05-20 14027, 2019
ruaok
along with "Preprocessing of playcounts-dataframe takes 105.73s. Of the preprocessed data, approx. 66% (15081669) listens have been used as training data, 17% (3773882) listens have been used as validation data and 17% (3772169) listens have been used as test data. After preprocessing, training phase starts. From the models trained, the best one is selected to generate recommendations."
2019-05-20 14038, 2019
ruaok
then the table of model trainings and finally the reference stuff.
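Editor's sketch checking the split quoted from the report above. It uses only the three listen counts from that quote and recomputes each share, which could also feed the concise summary table being suggested. The dictionary keys are just labels chosen here.

```python
# Listen counts as quoted from the report text.
counts = {"training": 15081669, "validation": 3773882, "test": 3772169}

total = sum(counts.values())
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} listens ({pct:.2f}% of {total})")
```

The training share comes out between 66% and 67%, and the validation and test shares between 16% and 17% each, which is consistent with the approximate figures in the report.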
2019-05-20 14042, 2019
Matthew_
yvanzo. I can confirm that I'm running mb-solr@v3.0, mbsssss@5e6153f, mmd-schema@40e2115, sir@d28c977
2019-05-20 14053, 2019
pristine__
yeah, right.
2019-05-20 14004, 2019
ruaok
this way, the most important stuff is near the top where we wish to see it and the less important stuff as we go down.
2019-05-20 14013, 2019
ruaok
make sense?
2019-05-20 14016, 2019
Matthew_
(Solr version 7.7.1)
2019-05-20 14040, 2019
ruaok
but, it looks like you have all info relevant to this page now, which is good.
2019-05-20 14059, 2019
pristine__
yeah. I put reference stuff at the top because it will be needed to understand the table. but yeah, makes sense :)
2019-05-20 14036, 2019
pristine__
It is like a story, so I tried to put everything in order of the script.
I'm talking about the "data collection" page. I don't see that model ID on that page.
2019-05-20 14004, 2019
pristine__
because at that time
2019-05-20 14018, 2019
pristine__
we have not trained the model
2019-05-20 14049, 2019
pristine__
we are just collecting data to be able to preprocess it :)
2019-05-20 14030, 2019
ruaok
ohh, I see.
2019-05-20 14020, 2019
ruaok
but, I can look at the two pages and not correlate them.
2019-05-20 14034, 2019
pristine__
The three HTMLs correspond to three scripts that we use in the whole process. and they are in that order. there is a link at the bottom to go to the next.
2019-05-20 14024, 2019
pristine__
did you read about the playcounts-df in the "data collection" HTML?
2019-05-20 14044, 2019
pristine__
I mean playcounts-dataframe
2019-05-20 14015, 2019
ruaok
hmm, ok.
2019-05-20 14020, 2019
ruaok
but I see a problem.
2019-05-20 14027, 2019
pristine__
yeah?
2019-05-20 14030, 2019
ruaok
you're using dates to emit filenames.
2019-05-20 14038, 2019
ruaok
there will be multiple runs on the same day.
2019-05-20 14033, 2019
ruaok
and I suppose that the reports will be sufficiently linked if the reader can go bidirectionally.
2019-05-20 14006, 2019
pristine__
yeah. I have used dates to name the HTML files; that should be changed.
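Editor's sketch of one possible fix for the filename collision discussed above: include a full UTC timestamp and a short run identifier in each report name. The naming pattern, the `report_filename` helper, and the page names are assumptions for illustration and are not the project's actual convention.

```python
import uuid
from datetime import datetime, timezone


def report_filename(page, now=None, run_id=None):
    """Build a report name that stays unique across runs on the same day."""
    now = now or datetime.now(timezone.utc)
    run_id = run_id or uuid.uuid4().hex[:8]  # short random run identifier
    return f"{page}-{now:%Y%m%dT%H%M%SZ}-{run_id}.html"


# One run id shared by the three pages of a single run (hypothetical names).
run_id = uuid.uuid4().hex[:8]
started = datetime.now(timezone.utc)
pages = ["data-collection", "model-training", "recommendations"]
filenames = [report_filename(p, started, run_id) for p in pages]
```

Passing the same `run_id` to all three pages means two runs on the same day no longer overwrite each other, and the pages of one run can be matched up by name as well as by their links.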