#metabrainz

13:53 PM
ruaok

jmp_music: send me a private message with your email and I'll invite you.

2020-05-08 12940, 2020

13:54 PM
alastairp

a great first step would be to train a model with gaia to ensure that you know how the process works, and then to reproduce that process in scikit learn

2020-05-08 12927, 2020

13:55 PM
jmp_music

alastairp: Ok thanks! I downloaded also the datasets via the link you provided to me a few days ago.

2020-05-08 12956, 2020

13:55 PM
alastairp

in gaia we do a grid search with about 700 C/gamma values and feature permutations. There's a configuration file which lists these parameters (https://github.com/MTG/gaia/blob/master/src/bindi…) It would be good to have something similar in scikit learn, but it doesn't have to use this configuration file

2020-05-08 12926, 2020

13:56 PM
alastairp

I understand that sklearn has a number of helper tools for grid search, so it seems like it would be a good idea to use that as much as possible

2020-05-08 12921, 2020

13:59 PM
jmp_music

I could test sklearn's built-in GridSearchCV as well as RandomizedSearchCV

2020-05-08 12934, 2020

13:59 PM
jmp_music

to see which one could provide better results

2020-05-08 12921, 2020

14:00 PM
alastairp

yeah, those were the ones that I was thinking of

2020-05-08 12950, 2020
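
The comparison discussed above could be sketched roughly as follows. This is a minimal sketch, not gaia's configuration: the power-of-two exponent ranges, the function names `build_param_grid` and `run_searches`, and the `n_iter`/`cv` defaults are all assumptions made for illustration. Only `GridSearchCV`, `RandomizedSearchCV` and `SVC` are actual scikit-learn classes.

```python
def build_param_grid():
    """Return a hypothetical C/gamma grid as powers of two (not gaia's actual values)."""
    return {
        "C": [2.0 ** e for e in range(-5, 11, 2)],      # 8 candidate values
        "gamma": [2.0 ** e for e in range(-15, 4, 2)],  # 10 candidate values
        "kernel": ["rbf"],
    }


def run_searches(X, y, n_iter=20, cv=5):
    """Fit an exhaustive and a randomized search; return both fitted search objects."""
    # Imported lazily so that build_param_grid() works without scikit-learn installed.
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    grid = build_param_grid()
    # Exhaustive search: tries every C/gamma combination (80 here).
    exhaustive = GridSearchCV(SVC(), grid, cv=cv, n_jobs=-1)
    # Randomized search: samples n_iter combinations from the same lists.
    randomized = RandomizedSearchCV(
        SVC(), grid, n_iter=n_iter, cv=cv, n_jobs=-1, random_state=0
    )
    exhaustive.fit(X, y)
    randomized.fit(X, y)
    return exhaustive, randomized
```

After fitting, `best_params_` and `best_score_` on each returned object give a direct way to compare the two strategies on the same data.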

14:00 PM
jmp_music

I agree that we should start by training a model similar to gaia's and comparing its results. As I saw from the datasets you provided to me, the problem is a multilabel classification and now a multiclass one

2020-05-08 12954, 2020

14:00 PM
jmp_music

not*

2020-05-08 12939, 2020

14:01 PM
jmp_music

Each row of the dataset includes its MBID, and some genres labeled to the track

2020-05-08 12950, 2020

14:01 PM
jmp_music

the genres are from 1 to 30

2020-05-08 12908, 2020

14:02 PM
alastairp

for those datasets, yes - but this isn't our only classification task. The main reason that I sent you those datasets was so that you could download them and have a local copy of a few thousand .json files

2020-05-08 12918, 2020

14:02 PM
alastairp

in case you need more data

2020-05-08 12936, 2020

14:02 PM
alastairp

here are our current datasets: https://acousticbrainz.org/datasets/accuracy

2020-05-08 12929, 2020

14:04 PM
alastairp

these are much more "traditional" single-label datasets

2020-05-08 12939, 2020

14:04 PM
alastairp

these are the ones that we should focus on first

2020-05-08 12912, 2020

14:05 PM
jmp_music

Ok! By this link I understand that it refers to a multiclass problem

2020-05-08 12959, 2020

14:05 PM
jmp_music

The outcomes are from the SVM's decision function probabilities. Am I right?

2020-05-08 12904, 2020

14:06 PM
alastairp

yes

2020-05-08 12934, 2020

14:06 PM
alastairp

see the output of a high-level model: https://acousticbrainz.org/4792f85c-ba03-43db-af4…

2020-05-08 12901, 2020

14:07 PM
alastairp

in acousticbrainz we use "low-level" to mean features - these are extracted from audio files

2020-05-08 12913, 2020

14:07 PM
alastairp

and "high-level" means the results of an ML model

2020-05-08 12935, 2020
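
To make the "high-level" idea concrete, here is a small sketch of picking the most probable class from one model's output. The layout (an `"all"` mapping from class label to probability) and the class labels in `example` are assumptions for illustration; check them against a real high-level document before relying on them.

```python
def top_class(model_output):
    """Return (label, probability) for the most probable class of one model."""
    probabilities = model_output["all"]  # assumed layout: label -> probability
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Hypothetical output of a single two-class model.
example = {"all": {"danceable": 0.25, "not_danceable": 0.75}}
label, probability = top_class(example)
```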

14:09 PM
jmp_music

I have figured out how this works. However, there are some questions that I would like to ask you.

2020-05-08 12914, 2020

14:10 PM
jmp_music

I checked the low-level data, and I saw that some of the features are lists (arrays)

2020-05-08 12919, 2020

14:11 PM
jmp_music

these lists have a standard length of values, except the "rhythm_beats_position"

2020-05-08 12942, 2020

14:11 PM
alastairp

right, because songs are different lengths 😅

2020-05-08 12956, 2020

14:11 PM
jmp_music

ahahaha

2020-05-08 12944, 2020

14:12 PM
jmp_music

that's right. My question is whether there is a post-processing step that derives a feature from these values

2020-05-08 12955, 2020

14:12 PM
jmp_music

e.g. taking the mean, etc.

2020-05-08 12926, 2020

14:13 PM
jmp_music

or the length of the list as a value (with Python's len() function)

2020-05-08 12954, 2020
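
The two ideas floated above (taking the mean, or using the list length) could be sketched like this. The function name `summarise_list_feature` and the returned keys are hypothetical, and nothing here reflects what gaia actually does with list-valued features.

```python
def summarise_list_feature(values):
    """Collapse a variable-length list into fixed-size summary values."""
    if not values:
        # An empty list still has to produce the same keys as a non-empty one.
        return {"count": 0, "mean": 0.0}
    return {"count": len(values), "mean": sum(values) / len(values)}


# Hypothetical beat positions (in seconds) for a very short track.
summary = summarise_list_feature([0.5, 1.0, 1.5])
```

Whether such summaries carry useful signal for a given classifier is a separate question; the sketch only shows how a list of any length maps to a fixed number of values.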

14:13 PM
jmp_music

I should start checking gaia to see whether this process takes place there

2020-05-08 12907, 2020

14:14 PM
alastairp

for that specific value, I don't think so

2020-05-08 12922, 2020

14:14 PM
alastairp

yes, good idea. I was just looking at https://github.com/MTG/gaia/blob/master/src/bindi…

2020-05-08 12945, 2020

14:14 PM
alastairp

and I don't see any specific behaviour for rhythm_beats_position

2020-05-08 12918, 2020

14:15 PM
alastairp

that's a good question though, I wonder if we should remove this from the data before building the model... it seems like it has the potential to introduce bad training data

2020-05-08 12903, 2020
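
Removing a feature before building the model might look like the sketch below. It assumes the low-level data is nested JSON that gets flattened with `_` as a separator, so that a nested `rhythm` / `beats_position` entry becomes `rhythm_beats_position`; the helper names and the `bpm` field in the sample record are hypothetical.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with '_'-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def drop_features(flat, ignored):
    """Return a copy of the flattened record without the ignored keys."""
    return {key: value for key, value in flat.items() if key not in ignored}


# Hypothetical low-level record with one fixed-size and one variable-length feature.
record = {"rhythm": {"bpm": 120.0, "beats_position": [0.5, 1.0, 1.5]}}
features = drop_features(flatten(record), {"rhythm_beats_position"})
```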

14:17 PM
jmp_music

Let me check about it and the other features too. Maybe some of them could be dropped before the training process and thus speed up the training time

2020-05-08 12916, 2020

14:18 PM
alastairp

good idea, but let's focus on that after we've reproduced the existing models

2020-05-08 12937, 2020

14:19 PM
jmp_music

yes of course.

2020-05-08 12941, 2020

14:21 PM
jmp_music

For the training process and the labeling of the data, should all of these classes be included?

2020-05-08 12942, 2020

14:21 PM
jmp_music

['danceability', 'gender', 'genre_dortmund', 'genre_electronic', 'genre_rosamerica', 'genre_tzanetakis', 'ismir04_rhythm', 'mood_acoustic', 'mood_aggressive', 'mood_electronic', 'mood_happy', 'mood_party', 'mood_relaxed', 'mood_sad', 'moods_mirex', 'timbre', 'tonal_atonal', 'voice_instrumental']

2020-05-08 12951, 2020

14:22 PM
alastairp

those are all individual models

2020-05-08 12926, 2020

14:23 PM
alastairp

but yes, we should build new models for all of these datasets

2020-05-08 12912, 2020
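
Since each name in the list is its own model, training can be organised as one independent run per dataset. In this sketch `train_all`, `load_fn` and `train_fn` are hypothetical placeholders: the caller supplies how a dataset is loaded and how a classifier is trained.

```python
def train_all(dataset_names, load_fn, train_fn):
    """Train one independent model per dataset and return them keyed by name."""
    models = {}
    for name in dataset_names:
        features, labels = load_fn(name)           # caller-supplied loader
        models[name] = train_fn(features, labels)  # caller-supplied trainer
    return models
```

Keeping the loader and trainer as parameters means the same loop can drive a gaia-based run and a scikit-learn run, which helps when comparing the two.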

14:24 PM
yvanzo

bitmap, reosarevok: Is v-2020-05-18 supposed to be pg12 only? If so, should we freeze master until then?

2020-05-08 12939, 2020

14:24 PM
reosarevok

Most of our schema changes have included unrelated code too

2020-05-08 12957, 2020

14:24 PM
reosarevok

So I'd expect no, but if bitmap thinks it's important to make it PG12 only, then we can

2020-05-08 12955, 2020

14:26 PM
jmp_music

alastairp: I'll be waiting for the dataset link and I'll come up with updates.

2020-05-08 12911, 2020

14:27 PM
jmp_music

Is the dataset already labeled?

2020-05-08 12915, 2020

14:27 PM
alastairp

yes

2020-05-08 12927, 2020

14:29 PM
jmp_music

Alastair, thank you very much for your introduction to the project and its needs

2020-05-08 12900, 2020

14:30 PM
alastairp

no problem. we're looking forward to your work

2020-05-08 12952, 2020

14:42 PM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #1503 (master…MBS-9340): MBS-9340: Only allow mul and zxx as the only work language https://github.com/metabrainz/musicbrainz-server/…

2020-05-08 12953, 2020

14:42 PM
BrainzBot

MBS-9340: Don't allow more languages if [No lyrics] is selected https://tickets.metabrainz.org/browse/MBS-9340

2020-05-08 12906, 2020

14:43 PM
yvanzo

reosarevok: sure, it is just that we won't have two weeks as usual.

2020-05-08 12916, 2020

14:43 PM
reosarevok

Oh, I see

2020-05-08 12932, 2020

14:43 PM
reosarevok

Well, then we could do it so that we only merge bugfixes or something?

2020-05-08 12934, 2020

14:43 PM
reosarevok

Dunno

2020-05-08 12934, 2020

14:47 PM
yvanzo

We could also merge pg12 instead of master into beta/production.

2020-05-08 12959, 2020

14:48 PM
yvanzo

Since tags are on production, that would not require freezing master at all.

2020-05-08 12947, 2020

14:56 PM
Cyna[m]

reosarevok:

2020-05-08 12957, 2020

14:56 PM
Cyna[m]

made changes and pushed :)

2020-05-08 12957, 2020

15:03 PM
reosarevok

I saw, I'm going to test

2020-05-08 12940, 2020

15:29 PM
jmp_music has quit

2020-05-08 12951, 2020

15:31 PM
ishaanshah[m]

iliekcomputers: Hi, can we do our meeting a bit earlier today?

2020-05-08 12913, 2020

15:33 PM
iliekcomputers

sure

2020-05-08 12930, 2020

15:34 PM
iliekcomputers

i haven't been able to look at the PR again

2020-05-08 12941, 2020

15:34 PM
ishaanshah[m]

I added another parameter to the api endpoint today, "offset"

2020-05-08 12941, 2020

15:34 PM
iliekcomputers

i'll try to do that tomorrow

2020-05-08 12900, 2020

15:35 PM
ishaanshah[m]

I figured we would need it for pagination

2020-05-08 12917, 2020

15:35 PM
iliekcomputers

true, that makes sense

2020-05-08 12920, 2020
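
Offset-based pagination of the kind mentioned above can be sketched as a plain slice. Only the `offset` parameter name comes from the discussion; the `count` parameter, its default, and the function name `paginate` are assumptions, and the real endpoint may behave differently.

```python
def paginate(items, offset=0, count=25):
    """Return at most `count` items, skipping the first `offset` items."""
    if offset < 0 or count < 0:
        raise ValueError("offset and count must be non-negative")
    return items[offset:offset + count]


# Hypothetical list of ten entries, requested in pages of three.
entries = list(range(10))
first_page = paginate(entries, offset=0, count=3)
last_page = paginate(entries, offset=9, count=3)
```

An offset past the end of the list yields an empty page rather than an error, which lets a client stop paging as soon as it receives no items.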

15:36 PM
ishaanshah[m]

Zastai proposed that we should use timestamp instead of all_time...

2020-05-08 12932, 2020

15:36 PM
iliekcomputers

right, i saw that

2020-05-08 12942, 2020

15:36 PM
iliekcomputers

i think the proposal makes sense

2020-05-08 12955, 2020

15:36 PM
ishaanshah[m]

Although, I am not sure we can do that because we calculate stats in batch

2020-05-08 12907, 2020

15:37 PM
iliekcomputers

eventually however, right now it isn't really feasible

2020-05-08 12941, 2020

15:37 PM
ishaanshah[m]

Yes, maybe later we can make spark work for on-demand queries

2020-05-08 12955, 2020

15:37 PM
ishaanshah[m]

I will open a ticket for that then

2020-05-08 12958, 2020

15:37 PM
iliekcomputers

i'm happy to open a ticket and think more about it

2020-05-08 12905, 2020

15:38 PM
iliekcomputers

sounds good

2020-05-08 12909, 2020

15:38 PM
iliekcomputers

one small thing

2020-05-08 12920, 2020

15:38 PM
iliekcomputers

the endpoint is `artist` rn

2020-05-08 12928, 2020

15:38 PM
iliekcomputers

`artists`

2020-05-08 12939, 2020

15:38 PM
iliekcomputers

would be better

2020-05-08 12903, 2020

15:39 PM
ishaanshah[m]

Ya sure I will change that

2020-05-08 12940, 2020

15:39 PM
ishaanshah[m]

Other than that I am done with the rendering and processing part for artist graph

2020-05-08 12905, 2020

15:40 PM
ishaanshah[m]

I have fixed LB-570 too

2020-05-08 12905, 2020

15:40 PM
BrainzBot

LB-570: Artist graph: long artist names should wrap https://tickets.metabrainz.org/browse/LB-570

2020-05-08 12914, 2020

15:40 PM
iliekcomputers

oh awesome

2020-05-08 12940, 2020

15:40 PM
Mr_Monkey

👍

2020-05-08 12944, 2020

15:40 PM
iliekcomputers

i think getting a review of the design from Mr_Monkey would be helpful, if he gets the time

2020-05-08 12947, 2020

15:40 PM
ishaanshah[m]

I figured I will open another PR for LB-547

2020-05-08 12948, 2020

15:40 PM
BrainzBot

LB-547: Artist chart bugs https://tickets.metabrainz.org/browse/LB-547

2020-05-08 12919, 2020

15:41 PM
ishaanshah[m]

Because the current PR has already become large

2020-05-08 12927, 2020

15:41 PM
Mr_Monkey

I'm going to be jumping back into LB more in the coming months, so I'll definitely be able to find some time, looking at it a lot :)

2020-05-08 12901, 2020

15:42 PM
ishaanshah[m]

Mr_Monkey thanks I will post a screenshot

2020-05-08 12958, 2020

15:43 PM
ishaanshah[m]

iliekcomputers The only part remaining in the graph PR is fetching the stats from the backend

2020-05-08 12915, 2020

15:44 PM
ishaanshah[m]

I will do that when the endpoint PR is merged

2020-05-08 12934, 2020

15:45 PM
iliekcomputers

ok. i'll try to merge over the weekend

2020-05-08 12903, 2020

15:46 PM
ishaanshah[m]

Cool, thanks a lot :)

2020-05-08 12908, 2020

15:46 PM
sumedh has quit

2020-05-08 12958, 2020

15:46 PM
Zastai has quit

2020-05-08 12903, 2020

15:48 PM
sumedh joined the channel

2020-05-08 12904, 2020

15:48 PM
ishaanshah[m]

Also a small bug

2020-05-08 12949, 2020

15:48 PM
ishaanshah[m]

https://listenbrainz.org/user/ishaanshah/

2020-05-08 12955, 2020

15:48 PM
ishaanshah[m]

This page shows 404

2020-05-08 12911, 2020

15:49 PM
shivam-kapila

Yeah that /

2020-05-08 12917, 2020

15:49 PM
ishaanshah[m]

Because of the extra slash

2020-05-08 12951, 2020

15:50 PM
iliekcomputers

open a ticket

2020-05-08 12923, 2020

15:51 PM
ishaanshah[m]

Ya, sure

2020-05-08 12931, 2020

15:52 PM
CatQuest joined the channel

2020-05-08 12931, 2020

15:52 PM
CatQuest has quit

2020-05-08 12931, 2020

15:52 PM
CatQuest joined the channel

2020-05-08 12916, 2020

15:59 PM
reosarevok

Cyna[m]: there are some issues with those historical edits, but I can work on that, since I've been dealing with historical edits anyway

2020-05-08 12945, 2020

16:01 PM
Cyna[m]

Once the two open PRs are merged... I'll continue with the next entity

2020-05-08 12919, 2020

16:31 PM
Freso

ruaok: Looking at https://test.listenbrainz.org/user/Freso vs. https://listenbrainz.org/user/Freso they both report the same Listen count, but the most recent listens are not the same. Is the listen count read from the same source for both sites?

2020-05-08 12900, 2020

16:32 PM
shivam-kapila

Freso: IIRC test.lb.org uses same redis as prod.

2020-05-08 12916, 2020

16:32 PM
Freso

shivam-kapila: Right.

2020-05-08 12957, 2020

16:35 PM
sumedh has quit

2020-05-08 12918, 2020

16:49 PM
BrainzGit

[bookbrainz-site] prabalsingh24 opened pull request #423 (master…add-user-in-search): search: add 'editor' type in the search result https://github.com/bookbrainz/bookbrainz-site/pul…

2020-05-08 12951, 2020

16:54 PM
supersandro20005 has quit

2020-05-08 12904, 2020

16:55 PM
supersandro2000 joined the channel

2020-05-08 12918, 2020

17:04 PM
lazka joined the channel

2020-05-08 12925, 2020

17:05 PM
lazka

Does anyone know who manages the MB Ubuntu PPA?

2020-05-08 12958, 2020

17:08 PM
supersandro2000

maybe this page helps you https://launchpad.net/~metabrainz/+members

2020-05-08 12915, 2020

17:09 PM
reosarevok

zas: that you? ^

2020-05-08 12930, 2020

17:09 PM
reosarevok

outsidecontext: ? ^ :)

2020-05-08 12955, 2020

17:09 PM
reosarevok

(assuming you mean the Picard one)

2020-05-08 12911, 2020

17:11 PM
lazka

I mean this one: https://launchpad.net/~musicbrainz-developers/+me…

2020-05-08 12908, 2020

17:13 PM
reosarevok

Yeah, I suspect zas and outsidecontext are the most likely to be involved

2020-05-08 12935, 2020

17:13 PM
lazka

ah, there is a history with user names for builds, so phillipp wolfer I guess

2020-05-08 12916, 2020

17:14 PM
reosarevok

That'd be outsidecontext then

2020-05-08 12920, 2020

17:14 PM
reosarevok

What did you need? :)

2020-05-08 12900, 2020

17:15 PM
lazka

outsidecontext, you copied some of my mutagen packages into the PPA but because I moved the mutagen tools from the python2 to python3 package you also need to copy the python2-mutagen variants

2020-05-08 12931, 2020

17:15 PM
lazka

(some user emailed me about it)

2020-05-08 12959, 2020

17:15 PM
lazka

I should have added a version conflict I guess, but didn't think of the copying to other PPAs case

2020-05-08 12932, 2020

17:16 PM
adhawkins has quit

2020-05-08 12946, 2020

17:17 PM
adhawkins joined the channel

2020-05-08 12946, 2020

17:17 PM
reosarevok

I'll send him an email about it too, in case he misses this :)

2020-05-08 12910, 2020

17:25 PM
BrainzGit

[musicbrainz-docker] yvanzo merged pull request #145 (mbvm-38-dev…recv-keys): Try reaching different PGP servers/pools if needed https://github.com/metabrainz/musicbrainz-docker/…

2020-05-08 12921, 2020

17:28 PM
outsidecontext

lazka: thanks for the info, I'll look into it.

2020-05-08 12949, 2020

17:30 PM
outsidecontext

lazka: do I get this right: the issue is a file conflict, if one has the python2 package installed?

2020-05-08 12931, 2020

17:39 PM
lazka

outsidecontext, if you install the py3 one it tries to install tools owned by the py2 package. To work around this I added a py2 variant which doesn't include the tools

2020-05-08 12916, 2020

17:40 PM
lazka

so, yes

2020-05-08 12920, 2020

17:49 PM
shivam-kapila

ruaok: The query is heavy, but we also tend to do too much in the /user/<user_name> route.

2020-05-08 12942, 2020

17:49 PM
shivam-kapila

We even call this heavy query twice. And the second time it's totally unbounded

2020-05-08 12956, 2020

17:49 PM
shivam-kapila

https://github.com/metabrainz/listenbrainz-server…

2020-05-08 12940, 2020

17:50 PM
ruaok

yeah, we need to rethink that. :)

2020-05-08 12900, 2020

17:51 PM
shivam-kapila

We also fetch min/max timestamps for the user

2020-05-08 12921, 2020

17:51 PM
shivam-kapila

Won't the latest_listen_ts be mostly equal to max_ts?

2020-05-08 12937, 2020

17:51 PM
shivam-kapila

If so, then we can totally nuke this query