#metabrainz

0:11 AM
Lotheric has quit

2019-05-20 14056, 2019

0:14 AM
Lotheric joined the channel

2019-05-20 14044, 2019

0:47 AM
Protab joined the channel

2019-05-20 14044, 2019

0:47 AM
Rotab has quit

2019-05-20 14052, 2019

0:57 AM
Protab is now known as Rotab

2019-05-20 14003, 2019

1:03 AM
Nyanko-sensei joined the channel

2019-05-20 14057, 2019

1:05 AM
D4RK-PH0ENiX has quit

2019-05-20 14042, 2019

1:47 AM
Nyanko-sensei has quit

2019-05-20 14019, 2019

1:48 AM
D4RK-PH0ENiX joined the channel

2019-05-20 14008, 2019

4:57 AM
Nyanko-sensei joined the channel

2019-05-20 14048, 2019

4:58 AM
D4RK-PH0ENiX has quit

2019-05-20 14021, 2019

5:15 AM
Nyanko-sensei has quit

2019-05-20 14016, 2019

5:16 AM
D4RK-PH0ENiX joined the channel

2019-05-20 14058, 2019

5:48 AM
reosarevok

bitmap, yvanzo: we should try to find a solution for https://tickets.metabrainz.org/browse/MBS-2278 one of these days

2019-05-20 14059, 2019

5:48 AM
BrainzBot

MBS-2278: Sorting collections by artist should use the "sort name" of the artist

2019-05-20 14010, 2019

5:49 AM
reosarevok

(that's a lot of votes)

2019-05-20 14056, 2019

5:51 AM
reosarevok

"using joined sort-names of the artists from the artist credit" is probably the best we can do since it makes no sense to add sort names to artist credits as a deeper concept

2019-05-20 14030, 2019
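
Note (illustrative, not from the channel): a minimal sketch of what "joined sort-names of the artists from the artist credit" could look like as a sort key. The `CreditedArtist` and `Release` classes and the `credit_sort_key` function are hypothetical names made up for this example, and this is not how MusicBrainz Server implements collection sorting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CreditedArtist:
    sort_name: str    # the artist's sort name, e.g. "Bowie, David"
    join_phrase: str  # text that follows this artist in the credit, e.g. " & "


@dataclass
class Release:
    title: str
    artist_credit: List[CreditedArtist]


def credit_sort_key(release: Release) -> str:
    """Join the sort names with the credit's join phrases, case-insensitively."""
    joined = "".join(a.sort_name + a.join_phrase for a in release.artist_credit)
    return joined.casefold()


releases = [
    Release("Under Pressure", [CreditedArtist("Queen", " & "),
                               CreditedArtist("Bowie, David", "")]),
    Release("Abbey Road", [CreditedArtist("Beatles, The", "")]),
    Release("Low", [CreditedArtist("Bowie, David", "")]),
]

ordered = sorted(releases, key=credit_sort_key)
for release in ordered:
    print(credit_sort_key(release), "|", release.title)
```

The key is computed from the credit on the fly, so the artist credit itself gains no sort-name concept of its own.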

5:54 AM
yvanzo

reosarevok, bitmap: On a related issue, I’m looking at replacing paging with react-window for collections.

2019-05-20 14053, 2019

6:04 AM
reosarevok

yvanzo: is that endless scrolling? Tell me it isn't endless scrolling

2019-05-20 14028, 2019

6:10 AM
reosarevok

(that'd not solve any of the main issues of pagination, such as making things findable in one go, but would also make it impossible to just load multiple pages in different tabs to quickly check them all)

2019-05-20 14029, 2019

6:19 AM
yvanzo

it doesn't need to be endless scrolling (and collections are not endless either) and it can make things findable in one go with anchors.

2019-05-20 14005, 2019

6:38 AM
reosarevok

Findable meaning "ctrl + f for a name", I meant

2019-05-20 14059, 2019

6:41 AM
Gore|work has quit

2019-05-20 14023, 2019

6:47 AM
amCap1712 has quit

2019-05-20 14023, 2019

6:47 AM
LordSputnik has quit

2019-05-20 14024, 2019

6:47 AM
RJ2 has quit

2019-05-20 14024, 2019

6:47 AM
akhilesh has quit

2019-05-20 14036, 2019

6:47 AM
akhilesh joined the channel

2019-05-20 14041, 2019

6:47 AM
amCap1712 joined the channel

2019-05-20 14044, 2019

6:47 AM
LordSputnik joined the channel

2019-05-20 14044, 2019

6:47 AM
RJ2 joined the channel

2019-05-20 14047, 2019

6:47 AM
spellew has quit

2019-05-20 14047, 2019

6:47 AM
Cyna has quit

2019-05-20 14000, 2019

6:48 AM
spellew joined the channel

2019-05-20 14001, 2019

6:48 AM
HorusHorrendus has quit

2019-05-20 14001, 2019

6:48 AM
Mr_Monkey has quit

2019-05-20 14001, 2019

6:48 AM
Cyna joined the channel

2019-05-20 14002, 2019

6:48 AM
discopatrick has quit

2019-05-20 14010, 2019

6:48 AM
xarph has quit

2019-05-20 14010, 2019

6:48 AM
alastairp has quit

2019-05-20 14014, 2019

6:48 AM
Mr_Monkey joined the channel

2019-05-20 14021, 2019

6:48 AM
alastairp joined the channel

2019-05-20 14023, 2019

6:48 AM
HorusHorrendus joined the channel

2019-05-20 14031, 2019

6:48 AM
xarph joined the channel

2019-05-20 14033, 2019

6:48 AM
discopatrick joined the channel

2019-05-20 14004, 2019

6:50 AM
reosarevok

https://www.bbc.com/news/business-48330310 well fuck

2019-05-20 14008, 2019

6:50 AM
Leftmost has quit

2019-05-20 14045, 2019

6:51 AM
modwizcode has quit

2019-05-20 14000, 2019

6:52 AM
modwizcode joined the channel

2019-05-20 14002, 2019

6:52 AM
Gore|work joined the channel

2019-05-20 14057, 2019

6:54 AM
Leftmost joined the channel

2019-05-20 14046, 2019

7:10 AM
Cyna has quit

2019-05-20 14046, 2019

7:10 AM
Cyna joined the channel

2019-05-20 14003, 2019

8:44 AM
ruaok

moooin

2019-05-20 14056, 2019

9:06 AM
Gazooo joined the channel

2019-05-20 14027, 2019

9:07 AM
Matthew_ joined the channel

2019-05-20 14056, 2019

9:10 AM
Matthew_

Hello! I've been trying to build a full index in Solr, with limited success. I'm using mb-solr@v3.0 and sir@schema-24. I'm consistently seeing failures in the Solr log for artists and recordings. In the case of artists, I see the same error: "unknown field 'primary_alias'"; in the case of recordings I consistently see "missing required field: name". What's odd is that if I check a failed artist on search.musicbrainz.org, it's present so

2019-05-20 14057, 2019

9:10 AM
Matthew_

you've clearly successfully indexed it but it fails for me?

2019-05-20 14045, 2019

9:11 AM
ruaok

morning Matthew_. yvanzo is the particular person you need to speak with. most of the rest of us only have cursory knowledge of how the search stuff works, let alone how to debug it.

2019-05-20 14038, 2019

9:12 AM
Matthew_

Thanks ruaok.

2019-05-20 14058, 2019

9:48 AM
mueslo has quit

2019-05-20 14029, 2019

9:50 AM
mueslo joined the channel

2019-05-20 14004, 2019

10:13 AM
pristine__

ruaok: hi

2019-05-20 14032, 2019

10:16 AM
yvanzo

Hi Matthew_, I double-checked and our latest docker images seem to match these versions. Can you check mb-solr submodules are in sync in your clone?

2019-05-20 14030, 2019

10:18 AM
Matthew_

Thanks, yvanzo. Will take a look. On a related note, how long typically does a full reindex take for you and what is the server spec?

2019-05-20 14042, 2019

10:19 AM
yvanzo

You can compare mbsssss and mmd-schema submodules with https://github.com/metabrainz/mb-solr/tree/v3.0

2019-05-20 14058, 2019

10:21 AM
yvanzo

Matthew_: I don’t know, SIR is still beta, and there is currently an issue with SIR where it doesn’t return even though reindexing is complete.

2019-05-20 14036, 2019

10:22 AM
Matthew_

Thanks, yvanzo. So how do you currently rebuild your indices?

2019-05-20 14026, 2019

10:23 AM
yvanzo

I have never had to run a full reindex (or any reindex) on prod servers so far, still learning from previous devs. :)

2019-05-20 14002, 2019

10:26 AM
Matthew_

Fair enough! That might be a problem for us though. We require the ability to originate a full index on our slave implementation for the purposes of initial setup and also disaster recovery. Anyhow, I'll double check the dependencies / versions...

2019-05-20 14011, 2019

10:29 AM
yvanzo

Matthew_: I successfully built search indexes locally on sample data without issue about primary alias, will make further tests with full data.

2019-05-20 14016, 2019

10:30 AM
Matthew_

Thanks yvanzo. From what I've seen, the error appears pretty quickly once artists start being indexed. However, it's probable that it's user error on my part and the deps are out of kilter. Will let you know for sure...

2019-05-20 14035, 2019

10:37 AM
ruaok

hi pristine__ !

2019-05-20 14049, 2019

10:37 AM
ruaok

I'm feeling a lot better (though not 100%) so I am slowly catching up.

2019-05-20 14001, 2019

10:46 AM
pristine__

that's good to hear :)

2019-05-20 14019, 2019

10:46 AM
pristine__

ruaok: there is some good news.

2019-05-20 14031, 2019

10:46 AM
ruaok

I could use some of those. :)

2019-05-20 14050, 2019

10:46 AM
CatQuest

[07:54] <yvanzo> reosarevok, bitmap: On a related issue, I’m looking at replacing paging with react-window for collections.

2019-05-20 14051, 2019

10:46 AM
CatQuest

NO please

2019-05-20 14059, 2019

10:46 AM
pristine__

We were able to reduce lookup time from 12 hours to around 2 hours.

2019-05-20 14055, 2019

10:47 AM
pristine__

I have made a few changes, the script is running on leader now, I will forward the HTML files to you in some time.

2019-05-20 14031, 2019

10:48 AM
ruaok

lookup meaning running the model?

2019-05-20 14041, 2019

10:48 AM
pristine__

You can compare the lookup time of the script I sent you yesterday and of the one I will send you.

2019-05-20 14043, 2019

10:48 AM
pristine__

no

2019-05-20 14004, 2019

10:49 AM
ruaok

training vs running.

2019-05-20 14050, 2019

10:50 AM
pristine__

ruaok: "look up" : after predicting recommendations, we get recording ids of the recommended songs, then we look up the relevant information (track name, artist name etc) corresponding to the recording ids.

2019-05-20 14008, 2019

10:51 AM
D4RK-PH0ENiX has quit

2019-05-20 14016, 2019

10:52 AM
ruaok

oh, so the 10 hours didn't involve models at all?

2019-05-20 14052, 2019

10:53 AM
pristine__

no. 2 hours to train the model plus 12 hours to predict tracks and lookup

2019-05-20 14043, 2019

10:54 AM
pristine__

in these two hours, we are training 8 models.

2019-05-20 14007, 2019

10:55 AM
pristine__

and computing each model's RMSE

2019-05-20 14011, 2019

10:55 AM
ruaok

yep

2019-05-20 14020, 2019

10:55 AM
ruaok

ok, so here will be the proof of your work.

2019-05-20 14003, 2019

10:56 AM
ruaok

when I get the chance to look at the HTML files I will want to understand what "12 hours to predict tracks and lookup" means.

2019-05-20 14022, 2019

10:56 AM
ruaok

because those are two very distinct steps and we should know which step takes how long

2019-05-20 14028, 2019
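
Note (illustrative, not from the channel): one way to report how long each step takes on its own is to wrap "predict" and "lookup" in separate timers. This is only a sketch; `predict_recording_ids` and `lookup_metadata` are hypothetical stand-ins for the real steps, which are not shown in this log.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> accumulated wall-clock seconds


@contextmanager
def timed(step):
    """Add the time spent inside the with-block to timings[step]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + (time.perf_counter() - start)


# Hypothetical stand-ins; they only mimic the shape of the data.
def predict_recording_ids(user_ids):
    return {user: [101, 102, 103] for user in user_ids}


def lookup_metadata(recording_ids):
    return [{"recording_id": rid, "track_name": "?", "artist_name": "?"}
            for rid in recording_ids]


users = ["alice", "bob"]
with timed("predict"):
    predicted = predict_recording_ids(users)
with timed("lookup"):
    recommendations = {user: lookup_metadata(ids)
                       for user, ids in predicted.items()}

for step, seconds in timings.items():
    print(f"{step}: {seconds:.3f}s")
```

Because the timer accumulates per step name, the same pattern still works if prediction and lookup are interleaved per user.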

10:56 AM
pristine__

did you read the HTML?

2019-05-20 14051, 2019

10:56 AM
ruaok

not yet, but I remain hopeful. :)

2019-05-20 14058, 2019

10:56 AM
ruaok

let me do that right now.

2019-05-20 14025, 2019

10:57 AM
pristine__

yeah, they will help you to understand (I hope)

2019-05-20 14027, 2019

10:57 AM
pristine__

okay

2019-05-20 14049, 2019

10:57 AM
ruaok

yes, looking much better. nicely done.

2019-05-20 14059, 2019

10:57 AM
ruaok

but, there are still some things to improve.

2019-05-20 14025, 2019

10:58 AM
pristine__

Okay

2019-05-20 14031, 2019

10:58 AM
ruaok

on the model training page... you have roughly three sections of data.

2019-05-20 14056, 2019

10:58 AM
ruaok

model info, explanations for model info and the table of models generated.

2019-05-20 14022, 2019

10:59 AM
ruaok

the most useful things are at the bottom of the page.

2019-05-20 14042, 2019

10:59 AM
ruaok

reference / explanation which we will need once or at least infrequently is near the top.

2019-05-20 14050, 2019

10:59 AM
ruaok

the table bottom/middle.

2019-05-20 14015, 2019

11:00 AM
ruaok

I think the last 5 lines should be near the top, perhaps in a concise table as well.

2019-05-20 14027, 2019

11:00 AM
ruaok

along with "Preprocessing of playcounts-dataframe takes 105.73s. Of the preprocessed data, approx. 66% (15081669) listens have been used as training data, 17% (3773882) listens have been used as validation data and 17% (3772169) listens have been used as test data. After preprocessing, training phase starts. From the models trained, the best one is selected to generate recommendations."

2019-05-20 14038, 2019

11:00 AM
ruaok

then the table of model trainings and finally the reference stuff.

2019-05-20 14042, 2019
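
Note (illustrative, not from the channel): the split percentages in the quoted report text can be re-derived from the listen counts it gives, which add up to 22,627,720. The snippet below just repeats that arithmetic; the variable names are made up for the example.

```python
# Listen counts copied from the quoted report text.
counts = {
    "training": 15081669,
    "validation": 3773882,
    "test": 3772169,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n} listens, {shares[name]:.1%} of {total}")
```

The training share comes out a little under two thirds and the other two at roughly one sixth each, in line with the approximate figures in the report.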

11:00 AM
Matthew_

yvanzo. I can confirm that I'm running mb-solr@v3.0, mbsssss@5e6153f, mmd-schema@40e2115, sir@d28c977

2019-05-20 14053, 2019

11:00 AM
pristine__

yeah, right.

2019-05-20 14004, 2019

11:01 AM
ruaok

this way, the most important stuff is near the top where we wish to see it and the less important stuff as we go down.

2019-05-20 14013, 2019

11:01 AM
ruaok

make sense?

2019-05-20 14016, 2019

11:01 AM
Matthew_

(Solr version 7.7.1)

2019-05-20 14040, 2019

11:01 AM
ruaok

but, it looks like you have all info relevant to this page now, which is good.

2019-05-20 14059, 2019

11:01 AM
pristine__

yeah. I put reference stuff at the top because it will be needed to understand the table. but yeah, makes sense :)

2019-05-20 14036, 2019

11:02 AM
pristine__

It is like a story, so I tried to put everything in order of the script.

2019-05-20 14004, 2019

11:03 AM
ruaok

ah, I see. not a bad approach to things, really.

2019-05-20 14008, 2019

11:03 AM
yvanzo

Matthew_: https://github.com/metabrainz/mb-solr/blob/v3.0/D… is Solr 7.5.0, but I don’t think it is related to your issue.

2019-05-20 14033, 2019

11:03 AM
ruaok

but we and our community are the target audience of the script and I know how we're going to look at it time and time again.

2019-05-20 14009, 2019

11:04 AM
Matthew_

Aye. It should be backwards compatible with a point release. We don't use the docker file - we build RPMs for deployment.

2019-05-20 14014, 2019

11:04 AM
pristine__

sure. I will bring important things to the top :)

2019-05-20 14025, 2019

11:04 AM
ruaok

the data collection page is great, btw. however, it gives us stats about a model that was trained, but no model ID.

2019-05-20 14038, 2019

11:04 AM
pristine__

there is.

2019-05-20 14008, 2019

11:05 AM
pristine__

" listenbrainz-recommendation-model-bf1155df-b926-45b9-a5dc-69938811dd73"

2019-05-20 14009, 2019

11:05 AM
ruaok

ok, I still haven't found it.

2019-05-20 14015, 2019

11:05 AM
pristine__

something like this.

2019-05-20 14048, 2019

11:05 AM
ruaok

I'm talking about the "data collection" page. I don't see that model ID on that page.

2019-05-20 14004, 2019

11:06 AM
pristine__

because at that time

2019-05-20 14018, 2019

11:06 AM
pristine__

we have not trained the model

2019-05-20 14049, 2019

11:06 AM
pristine__

we are just collecting data to be able to preprocess it :)

2019-05-20 14030, 2019

11:07 AM
ruaok

ohh, I see.

2019-05-20 14020, 2019

11:08 AM
ruaok

but, I can look at the two pages and not correlate them.

2019-05-20 14034, 2019

11:08 AM
pristine__

The three HTMLs correspond to three scripts that we use in the whole process. and they are in that order. there is a link at the bottom to go to the next.

2019-05-20 14024, 2019

11:09 AM
pristine__

did you read about the playcounts-df in the "data collection" HTML?

2019-05-20 14044, 2019

11:09 AM
pristine__

I mean playcounts-dataframe

2019-05-20 14015, 2019

11:10 AM
ruaok

hmm, ok.

2019-05-20 14020, 2019

11:10 AM
ruaok

but I see a problem.

2019-05-20 14027, 2019

11:10 AM
pristine__

yeah?

2019-05-20 14030, 2019

11:10 AM
ruaok

you're using dates to emit filenames.

2019-05-20 14038, 2019

11:10 AM
ruaok

there will be multiple runs on the same day.

2019-05-20 14033, 2019

11:11 AM
ruaok

and I suppose that the reports will be sufficiently linked if the reader can go bidirectionally.

2019-05-20 14006, 2019

11:12 AM
pristine__

yeah. I have used dates to name the HTML files; it should be changed.

2019-05-20 14011, 2019

11:12 AM
ruaok

back and forward, ya?

2019-05-20 14025, 2019

11:12 AM
ruaok

you might consider generating a UUID for a "run"

2019-05-20 14047, 2019

11:12 AM
pristine__

was thinking the same. thanks :)

2019-05-20 14052, 2019

11:12 AM
ruaok

data-collection-C98E3B93-EC03-482D-B4EC-31EDF86AB58E.html

2019-05-20 14059, 2019

11:12 AM
ruaok

recommendations-C98E3B93-EC03-482D-B4EC-31EDF86AB58E.html
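
Note (illustrative, not from the channel): a minimal sketch of the "UUID per run" idea using Python's standard `uuid` module. The `report_filenames` helper is hypothetical, and the "model-training" stage name is an assumption; only the "data-collection" and "recommendations" prefixes appear in the filenames above.

```python
import uuid


def report_filenames(run_id=None):
    """Return one HTML filename per report stage, all sharing one run UUID."""
    if run_id is None:
        run_id = uuid.uuid4()  # a fresh random UUID for each run
    tag = str(run_id).upper()
    # "model-training" is an assumed name for the middle report.
    stages = ("data-collection", "model-training", "recommendations")
    return {stage: f"{stage}-{tag}.html" for stage in stages}


# Two runs on the same day get different names.
first_run = report_filenames()
second_run = report_filenames()
print(first_run["data-collection"])
print(second_run["data-collection"])

# Passing a known UUID reproduces the filename pattern suggested above.
example = report_filenames(uuid.UUID("C98E3B93-EC03-482D-B4EC-31EDF86AB58E"))
print(example["recommendations"])
```

Since all reports from one run share the same UUID, the previous and next links between the pages can be built from the run ID alone.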