#metabrainz

0:05 AM
supersandro2000 has quit

2020-11-10 31548, 2020

0:05 AM
supersandro2000 joined the channel

2020-11-10 31511, 2020

0:33 AM
d4rkie has quit

2020-11-10 31505, 2020

0:34 AM
D4RK-PH0ENiX joined the channel

2020-11-10 31545, 2020

1:36 AM
MajorLurker joined the channel

2020-11-10 31523, 2020

5:04 AM
BestSteve has quit

2020-11-10 31508, 2020

5:11 AM
BestSteve joined the channel

2020-11-10 31528, 2020

5:25 AM
shivam-kapila

Morning

2020-11-10 31548, 2020

5:25 AM
shivam-kapila

Jira finally has a mobile app.

2020-11-10 31538, 2020

6:17 AM
BestSteve has quit

2020-11-10 31526, 2020

6:18 AM
BestSteve joined the channel

2020-11-10 31505, 2020

6:20 AM
supersandro2000 has quit

2020-11-10 31523, 2020

6:20 AM
supersandro2000 joined the channel

2020-11-10 31522, 2020

7:15 AM
outsidecontext

does anyone know a release MBID that has been merged into another (so it would be redirected to the new one)?

2020-11-10 31553, 2020

7:38 AM
yvanzo

outsidecontext: 6814e030-34bd-402d-8661-8ff625062e45

2020-11-10 31541, 2020

7:41 AM
sumedh joined the channel

2020-11-10 31545, 2020

8:01 AM
outsidecontext

yvanzo: thanks a lot

2020-11-10 31525, 2020

8:22 AM
sumedh has quit

2020-11-10 31525, 2020

8:26 AM
v6lur joined the channel

2020-11-10 31534, 2020

8:47 AM
antlarr has quit

2020-11-10 31504, 2020

8:50 AM
antlarr joined the channel

2020-11-10 31534, 2020

9:00 AM
djinni` has quit

2020-11-10 31509, 2020

9:01 AM
Lartza has quit

2020-11-10 31501, 2020

9:02 AM
Lartza joined the channel

2020-11-10 31555, 2020

9:03 AM
djinni` joined the channel

2020-11-10 31521, 2020

9:41 AM
ruaok

alastairp: I took a look at PG text search last night. and full text search is very much not what is needed for this task.

2020-11-10 31549, 2020

9:41 AM
ruaok

fuzzy searching, not NLP. certainly, a language based solution isn't going to work.

2020-11-10 31550, 2020
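
The distinction ruaok draws here (fuzzy matching on strings rather than language-aware full text search) can be illustrated with a minimal sketch. It uses Python's standard `difflib` and is only an illustration of typo-tolerant lookup. It is not how Typesense or PostgreSQL work internally, and the candidate names and the `fuzzy_lookup` helper are hypothetical.

```python
import difflib

# Hypothetical candidate strings; in practice these would come from an index.
candidates = ["amanda palmer", "the beatles", "portishead"]


def fuzzy_lookup(query, names, cutoff=0.6):
    """Return up to three names whose similarity ratio to the query is >= cutoff."""
    return difflib.get_close_matches(query.lower(), names, n=3, cutoff=cutoff)


# A misspelled query still finds the intended entry.
matches = fuzzy_lookup("Amanda Palmr", candidates)
# A query sharing no characters with any candidate finds nothing.
no_matches = fuzzy_lookup("zzzz", candidates)
```

No stemming, stop words, or dictionaries are involved, which is the point: the match depends only on character-level similarity.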

9:43 AM
alastairp

for what? The messybrainz matching, or lookups for the playlists?

2020-11-10 31528, 2020

9:46 AM
ruaok

yes

2020-11-10 31551, 2020

9:46 AM
ruaok

this however, is very promising: https://typesense.org/

2020-11-10 31510, 2020

9:47 AM
reosarevok

iliekcomputers: if someone has listens in LB not imported from last.fm (so just dropping and reimporting isn't ideal) what is plan B again?

2020-11-10 31512, 2020

9:49 AM
alastairp

> Here's a live demo showing Typesense in action on a songs dataset from MusicBrainz: songs-search.typesense.org

2020-11-10 31520, 2020

9:49 AM
alastairp

haha, well, I guess they have our use-case wrapped up

2020-11-10 31539, 2020

9:49 AM
ruaok

wait, what?

2020-11-10 31542, 2020

9:49 AM
ruaok

holy shit.

2020-11-10 31546, 2020

9:49 AM
ruaok

i didn't see that example.

2020-11-10 31558, 2020

9:49 AM
alastairp

on the github repo in the readme

2020-11-10 31544, 2020

9:50 AM
alastairp

honestly, if it works out of the box, and if they have a MB demo, I'd definitely consider it at the top of the list

2020-11-10 31507, 2020

9:51 AM
ruaok

"several small spices gathered in a cave and grooving with a pict"

2020-11-10 31514, 2020

9:51 AM
ruaok

doesn't find anything though.

2020-11-10 31528, 2020

9:51 AM
ruaok

I would think it should. maybe some config tweaking and it would.

2020-11-10 31557, 2020

9:51 AM
ruaok

`groove amanda` does work. good.

2020-11-10 31537, 2020

9:52 AM
alastairp

as does 'several pict'

2020-11-10 31505, 2020

9:53 AM
alastairp

and it's fast

2020-11-10 31514, 2020

9:53 AM
ruaok

it is certainly low effort to try it.

2020-11-10 31513, 2020

9:55 AM
alastairp

gpl license, too! no problems for integration it seems

2020-11-10 31529, 2020

9:55 AM
ruaok

and docker images out of the box.

2020-11-10 31532, 2020

9:55 AM
alastairp

I wonder how good it is for typo-tolerance

2020-11-10 31554, 2020

9:55 AM
ruaok

that is one of the key stated goals.

2020-11-10 31513, 2020

9:56 AM
alastairp

should we just ask them for their import scripts too?!? :) or try and load it ourselves?

2020-11-10 31540, 2020

9:56 AM
ruaok

not for my use case. I have two columns I want to index.

2020-11-10 31510, 2020

9:57 AM
alastairp

do we need/want autocomplete on MB search?

2020-11-10 31528, 2020

9:58 AM
alastairp

ah, here's their MB loader: https://github.com/typesense/showcase-songs-search

2020-11-10 31551, 2020

9:58 AM
ruaok

I bet some people would, but let me kick the tyres first.

2020-11-10 31531, 2020

10:00 AM
alastairp

https://twitter.com/jasonbosco/status/13233248701…

2020-11-10 31510, 2020

10:01 AM
ruaok

wait, that recent? woah.

2020-11-10 31538, 2020

10:01 AM
alastairp

haha, yeah!

2020-11-10 31552, 2020

10:01 AM
alastairp

he's the cofounder

2020-11-10 31511, 2020

10:02 AM
alastairp

cool. this seems super exciting, let me know what you come up with

2020-11-10 31513, 2020

10:02 AM
ruaok

it would be cool to turn around and show us having built something with it.

2020-11-10 31505, 2020

10:03 AM
ruaok

will do. I think this slots in perfectly after the mapping is calculated. the mapping itself isn't that useful in context of ACRP, but it shows us "these are the tracks that can't be matched with exact matching". it's a perfect test dataset.

2020-11-10 31530, 2020

10:03 AM
ruaok

and I can feel the "add MBIDs to listens in timescale" getting closer to reality.

2020-11-10 31551, 2020

10:03 AM
ruaok

which would make so much more code on the spark side simpler for not having to match MBIDs as part of the process.

2020-11-10 31501, 2020

10:05 AM
Gazooo79494 has quit

2020-11-10 31544, 2020

10:06 AM
Gazooo79494 joined the channel

2020-11-10 31505, 2020

10:25 AM
HorusHorrendus has quit

2020-11-10 31554, 2020

10:25 AM
HorusHorrendus joined the channel

2020-11-10 31526, 2020

10:26 AM
mruszczyk has quit

2020-11-10 31512, 2020

10:28 AM
mruszczyk joined the channel

2020-11-10 31526, 2020

10:46 AM
BrainzGit

[troi-recommendation-playground] alastair opened pull request #28 (main…remove-runtimeerror): Remove runtimeerror https://github.com/metabrainz/troi-recommendation…

2020-11-10 31557, 2020

10:54 AM
BrainzGit

[docker-python] yvanzo merged pull request #8 (master…master): Update python minor versions and upgrade pip to the latest 20.2.3 https://github.com/metabrainz/docker-python/pull/8

2020-11-10 31556, 2020

10:55 AM
alastairp

thanks for working on that push script, yvanzo!

2020-11-10 31500, 2020

10:56 AM
alastairp takes it off his todo list

2020-11-10 31536, 2020

10:58 AM
pristine___

alastairp: ping me when you are up for the meeting!

2020-11-10 31522, 2020

11:01 AM
BrainzGit

[troi-recommendation-playground] mayhem merged pull request #28 (main…remove-runtimeerror): Remove runtimeerror https://github.com/metabrainz/troi-recommendation…

2020-11-10 31526, 2020

11:02 AM
alastairp

hi pristine___, I'm here. just revising the documents that you sent

2020-11-10 31517, 2020

11:04 AM
Nyanko-sensei has quit

2020-11-10 31516, 2020

11:06 AM
BrainzGit

[docker-python] yvanzo opened pull request #9 (master…date-tag): Use creation date for tagging and pushing Docker images https://github.com/metabrainz/docker-python/pull/9

2020-11-10 31508, 2020

11:07 AM
yvanzo

alastairp: not sure why I cannot request your review for this PR through GitHub, so I’m doing it here ^

2020-11-10 31529, 2020

11:07 AM
alastairp

maybe I'm not in an admin team on the repo. will look

2020-11-10 31538, 2020

11:07 AM
alastairp

pristine___: OK, I'm here. how are you?

2020-11-10 31558, 2020

11:07 AM
Nyanko-sensei joined the channel

2020-11-10 31515, 2020

11:08 AM
alastairp

Let's start with the artist recommendations doc

2020-11-10 31521, 2020

11:08 AM
pristine___

Great!

2020-11-10 31550, 2020

11:08 AM
alastairp

I haven't been following your discussions with ruaok. Can you give me a very quick 2-3 line overview of what you're planning on doing?

2020-11-10 31507, 2020

11:09 AM
pristine___

Sure

2020-11-10 31516, 2020

11:09 AM
discopatrick has quit

2020-11-10 31513, 2020

11:10 AM
pristine___

So rn, we are generating recording recommendations for users (the playlists), we thought of generating artist recommendations too, as in artists users might like. Roughly, it can help us in refining the daily jams/playlists.

2020-11-10 31536, 2020

11:12 AM
alastairp

and what are your thoughts on the artist recommendations? How are you planning on collecting the training data, and what will be the input and output to the model?

2020-11-10 31550, 2020

11:13 AM
pristine___

Umm... I have written that in the doc. The training data will be, as I plan, the playcounts/artistcounts, as in how many times a user has listened to a particular artist. It's implicit.

2020-11-10 31543, 2020

11:14 AM
pristine___

So we can use listens of past month/year to fetch these artist counts.

2020-11-10 31550, 2020

11:14 AM
pristine___

And train model on these

2020-11-10 31502, 2020

11:15 AM
alastairp

ok, cool

2020-11-10 31521, 2020

11:15 AM
alastairp

I'm looking at this 'create dataframes' section in the document

2020-11-10 31533, 2020

11:15 AM
pristine___

https://github.com/metabrainz/listenbrainz-server…

2020-11-10 31539, 2020

11:15 AM
pristine___

Something like this.

2020-11-10 31540, 2020

11:15 AM
alastairp

it's not clear to me if these are all new dataframes

2020-11-10 31548, 2020

11:15 AM
pristine___

New?

2020-11-10 31557, 2020

11:15 AM
alastairp

do they currently exist?

2020-11-10 31546, 2020

11:16 AM
pristine___

No.

2020-11-10 31556, 2020

11:16 AM
pristine___

The users df

2020-11-10 31558, 2020

11:17 AM
pristine___

Exists now, as in it is used for recording recs, but I will want to have a separate users df, stored in a separate dir for artist recs, something like `/recs/artist/df/users.parquet`

2020-11-10 31558, 2020

11:17 AM
alastairp

the names of these dataframes seem really generic - especially the one that you've named playcounts_df, this doesn't have anything in the name that says that they are _artist_ playcounts

2020-11-10 31542, 2020

11:18 AM
pristine___

Yeah, right I missed that. It can be users_df, listens_df, artists_df, and artistcount_df

2020-11-10 31556, 2020

11:18 AM
alastairp

given a listen, what are the steps for putting it into these tables?

2020-11-10 31527, 2020

11:19 AM
pristine___

Yeah, so the listens are first mapped with the mapping.

2020-11-10 31534, 2020

11:19 AM
alastairp

the input to the model will be a matrix of User / Artist, right? With counts in the cells

2020-11-10 31543, 2020

11:19 AM
pristine___

Yes

2020-11-10 31512, 2020

11:20 AM
pristine___

Then we fetch distinct users and assign them a user ID and prepare users_df

2020-11-10 31529, 2020

11:20 AM
alastairp

right. so the description of these dataframes doesn't explain to me why each of them is needed in this setup. it's difficult to follow how the data flows into these dfs

2020-11-10 31532, 2020

11:20 AM
pristine___

Fetch distinct artist, assign artist ID and prepare artist df

2020-11-10 31553, 2020

11:20 AM
alastairp

what's an artist ID, and why do you need it?

2020-11-10 31505, 2020

11:21 AM
pristine___

Users_df, artist_df and listens_df are needed to prepare artistcount_df

2020-11-10 31512, 2020

11:21 AM
pristine___

Yeah, so the IDs

2020-11-10 31558, 2020

11:21 AM
pristine___

The user ID and artist ID, it is assigned like this

2020-11-10 31501, 2020

11:22 AM
pristine___

https://github.com/metabrainz/listenbrainz-server…

2020-11-10 31531, 2020

11:22 AM
pristine___

The model takes input of the form (int, int bigint)

2020-11-10 31545, 2020

11:22 AM
pristine___

Int, int, bigint

2020-11-10 31552, 2020

11:22 AM
pristine___

So we cannot just pass an mbid

2020-11-10 31554, 2020

11:22 AM
pristine___

Or string

2020-11-10 31500, 2020

11:23 AM
pristine___

To identify user, artist

2020-11-10 31506, 2020

11:23 AM
pristine___

We assign them IDs

2020-11-10 31531, 2020

11:23 AM
alastairp

great. so it's a mapping from our input to the indexes in the matrix

2020-11-10 31526, 2020

11:24 AM
pristine___

I won't say indexes, but yeah we can use ids to later reference mbids/names etc

2020-11-10 31500, 2020
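
The ID assignment discussed above (strings such as user names or MBIDs cannot be passed to the model, so each gets an integer, and the integer is later used to look the original value up again) might look roughly like this. This is a plain-Python sketch, not the listenbrainz-server code; `build_mapping` and the placeholder MBID strings are hypothetical.

```python
def build_mapping(values):
    """Assign each distinct value a dense integer ID in first-seen order.

    Returns (forward, reverse): value -> ID and ID -> value.
    """
    forward = {}
    for value in values:
        # setdefault only inserts when the value has not been seen yet.
        forward.setdefault(value, len(forward))
    reverse = {idx: value for value, idx in forward.items()}
    return forward, reverse


# Placeholder identifiers standing in for real artist MBIDs.
artist_mbids = ["mbid-b", "mbid-a", "mbid-b"]
forward, reverse = build_mapping(artist_mbids)
```

The forward dictionary is what turns a listen into matrix coordinates; the reverse dictionary is what turns a recommended ID back into an MBID or name.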

11:25 AM
alastairp

can you please edit the document to make this clearer? Explicitly describe the matrix format, and explain that each of these other tables is for the mapping

2020-11-10 31523, 2020

11:25 AM
alastairp

index - I mean how to reference a particular row or column.

2020-11-10 31529, 2020

11:25 AM
pristine___

> Explicitly describe the matrix format, and explain that each of these other tables is for the mapping

2020-11-10 31550, 2020

11:25 AM
pristine___

So we generally use the term mapping for msid->mbid mapping

2020-11-10 31556, 2020

11:25 AM
pristine___

Sorry, I want clear on that

2020-11-10 31559, 2020

11:25 AM
pristine___

Wasn't

2020-11-10 31503, 2020

11:26 AM
pristine___

Earlier

2020-11-10 31506, 2020

11:26 AM
pristine___

Will edit

2020-11-10 31519, 2020

11:26 AM
pristine___

Do you want me to edit rn, or after the discussion?

2020-11-10 31529, 2020

11:26 AM
alastairp

anything that goes from one identifier to another identifier is a mapping

2020-11-10 31536, 2020

11:26 AM
pristine___

Yeah

2020-11-10 31539, 2020

11:26 AM
alastairp

users_df is a mapping from a username to an integer

2020-11-10 31545, 2020

11:26 AM
alastairp

I don't mind when you edit it

2020-11-10 31553, 2020

11:26 AM
pristine___

Cool

2020-11-10 31518, 2020

11:27 AM
alastairp

why do we need 3 tables? what's listens_df?

2020-11-10 31503, 2020

11:28 AM
pristine___

Can you have a look at this join

2020-11-10 31505, 2020

11:28 AM
pristine___

https://github.com/metabrainz/listenbrainz-server…

2020-11-10 31512, 2020

11:28 AM
pristine___

This explains the idea

2020-11-10 31547, 2020

11:28 AM
pristine___

Listens_df is nothing but the listens as such, I have just filtered the columns/fields I need

2020-11-10 31547, 2020

11:28 AM
alastairp

can you explain it to me?

2020-11-10 31552, 2020

11:28 AM
pristine___

Yeah

2020-11-10 31558, 2020

11:28 AM
alastairp

oh, it's an existing table?

2020-11-10 31508, 2020

11:30 AM
pristine___

We just do, listens_df = listens.select(`artist_credit_id`, `user_name`).

2020-11-10 31514, 2020

11:30 AM
alastairp

you just said before that these are all new dataframes

2020-11-10 31550, 2020

11:30 AM
pristine___

Yes. New in the sense, we are creating them from the listens table (Submitted to LB)

2020-11-10 31509, 2020

11:31 AM
alastairp

is there a specific technical reason why this is needed? So it seems like you're going from `listens` -> `listens_df` -> users_df and artists_df -> `playcounts_df`
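
The flow alastairp summarises (`listens` -> `listens_df` -> `users_df` and `artists_df` -> artist playcounts) can be sketched end to end in plain Python. This is an assumption-laden illustration rather than the Spark code under discussion: the sample rows, the extra `track_name` field, and the `with_ids` helper are all made up, and dictionaries and a `Counter` stand in for dataframes.

```python
from collections import Counter

# Hypothetical raw listens; real rows carry many more fields.
listens = [
    {"user_name": "alice", "artist_credit_id": 11, "track_name": "t1"},
    {"user_name": "alice", "artist_credit_id": 11, "track_name": "t2"},
    {"user_name": "bob", "artist_credit_id": 22, "track_name": "t3"},
    {"user_name": "alice", "artist_credit_id": 22, "track_name": "t3"},
]

# Step 1: listens_df keeps only the two columns that are needed.
listens_df = [(row["user_name"], row["artist_credit_id"]) for row in listens]


def with_ids(values):
    """Map distinct values to dense integer IDs in first-seen order."""
    return {value: idx for idx, value in enumerate(dict.fromkeys(values))}


# Step 2: users_df and artists_df are the identifier -> integer mappings.
users_df = with_ids(user for user, _ in listens_df)
artists_df = with_ids(artist for _, artist in listens_df)

# Step 3: join listens with both mappings and count per (user ID, artist ID).
artistcount_df = Counter(
    (users_df[user], artists_df[artist]) for user, artist in listens_df
)

# Rows in the (int, int, count) shape the model is said to expect.
rows = sorted((u, a, c) for (u, a), c in artistcount_df.items())
```

Seen this way, the intermediate tables exist only to translate identifiers; the counts are the actual model input.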