#metabrainz

      • supersandro2000 has quit
      • 2020-11-10 31548, 2020

      • supersandro2000 joined the channel
      • 2020-11-10 31511, 2020

      • d4rkie has quit
      • 2020-11-10 31505, 2020

      • D4RK-PH0ENiX joined the channel
      • 2020-11-10 31545, 2020

      • MajorLurker joined the channel
      • 2020-11-10 31523, 2020

      • BestSteve has quit
      • 2020-11-10 31508, 2020

      • BestSteve joined the channel
      • 2020-11-10 31528, 2020

      • shivam-kapila
        Morning
      • 2020-11-10 31548, 2020

      • shivam-kapila
        Jira finally has a mobile app.
      • 2020-11-10 31538, 2020

      • BestSteve has quit
      • 2020-11-10 31526, 2020

      • BestSteve joined the channel
      • 2020-11-10 31505, 2020

      • supersandro2000 has quit
      • 2020-11-10 31523, 2020

      • supersandro2000 joined the channel
      • 2020-11-10 31522, 2020

      • outsidecontext
        does anyone know a release MBID that has been merged into another (so it would be redirected to the new one)?
      • 2020-11-10 31553, 2020

      • yvanzo
        outsidecontext: 6814e030-34bd-402d-8661-8ff625062e45
      • 2020-11-10 31541, 2020

      • sumedh joined the channel
      • 2020-11-10 31545, 2020

      • outsidecontext
        yvanzo: thanks a lot
      • 2020-11-10 31525, 2020

      • sumedh has quit
      • 2020-11-10 31525, 2020

      • v6lur joined the channel
      • 2020-11-10 31534, 2020

      • antlarr has quit
      • 2020-11-10 31504, 2020

      • antlarr joined the channel
      • 2020-11-10 31534, 2020

      • djinni` has quit
      • 2020-11-10 31509, 2020

      • Lartza has quit
      • 2020-11-10 31501, 2020

      • Lartza joined the channel
      • 2020-11-10 31555, 2020

      • djinni` joined the channel
      • 2020-11-10 31521, 2020

      • ruaok
        alastairp: I took a look at PG text search last night. and full text search is very much not what is needed for this task.
      • 2020-11-10 31549, 2020

      • ruaok
        fuzzy searching, not NLP. certainly, a language based solution isn't going to work.
      • 2020-11-10 31550, 2020

      • alastairp
        for what? The messybrainz matching, or lookups for the playlists?
      • 2020-11-10 31528, 2020

      • ruaok
        yes
      • 2020-11-10 31551, 2020

      • ruaok
        this however, is very promising: https://typesense.org/
      • 2020-11-10 31510, 2020

      • reosarevok
        iliekcomputers: if someone has listens in LB not imported from last.fm (so just dropping and reimporting isn't ideal) what is plan B again?
      • 2020-11-10 31512, 2020

      • alastairp
        > Here's a live demo showing Typesense in action on a songs dataset from MusicBrainz: songs-search.typesense.org
      • 2020-11-10 31520, 2020

      • alastairp
        haha, well, I guess they have our use-case wrapped up
      • 2020-11-10 31539, 2020

      • ruaok
        wait, what?
      • 2020-11-10 31542, 2020

      • ruaok
        holy shit.
      • 2020-11-10 31546, 2020

      • ruaok
        i didn't see that example.
      • 2020-11-10 31558, 2020

      • alastairp
        on the github repo in the readme
      • 2020-11-10 31544, 2020

      • alastairp
        honestly, if it works out of the box, and if they have a MB demo, I'd definitely consider it at the top of the list
      • 2020-11-10 31507, 2020

      • ruaok
        "several small spices gathered in a cave and grooving with a pict"
      • 2020-11-10 31514, 2020

      • ruaok
        doesn't find anything though.
      • 2020-11-10 31528, 2020

      • ruaok
        I would think it should. maybe some config tweaking and it would.
      • 2020-11-10 31557, 2020

      • ruaok
        `groove amanda` does work. good.
      • 2020-11-10 31537, 2020

      • alastairp
        as does 'several pict'
      • 2020-11-10 31505, 2020

      • alastairp
        and it's fast
      • 2020-11-10 31514, 2020

      • ruaok
        it is certainly low effort to try it.
      • 2020-11-10 31513, 2020

      • alastairp
        gpl license, too! no problems for integration it seems
      • 2020-11-10 31529, 2020

      • ruaok
        and docker images out of the box.
      • 2020-11-10 31532, 2020

      • alastairp
        I wonder how good it is for typo-tolerance
      • 2020-11-10 31554, 2020

      • ruaok
        that is one of the key stated goals.
      • 2020-11-10 31513, 2020

      • alastairp
        should we just ask them for their import scripts too?!? :) or try and load it ourselves?
      • 2020-11-10 31540, 2020

      • ruaok
        not for my use case. I have two columns I want to index.
      • 2020-11-10 31510, 2020

      • alastairp
        do we need/want autocomplete on MB search?
      • 2020-11-10 31528, 2020

      • alastairp
      • 2020-11-10 31551, 2020

      • ruaok
        I bet some people would, but let me kick the tyres first.
      • 2020-11-10 31531, 2020

      • alastairp
      • 2020-11-10 31510, 2020

      • ruaok
        wait, that recent? woah.
      • 2020-11-10 31538, 2020

      • alastairp
        haha, yeah!
      • 2020-11-10 31552, 2020

      • alastairp
        he's the cofounder
      • 2020-11-10 31511, 2020

      • alastairp
        cool. this seems super exciting, let me know what you come up with
      • 2020-11-10 31513, 2020

      • ruaok
        it would be cool to turn around and show us having built something with it.
      • 2020-11-10 31505, 2020

      • ruaok
        will do. I think this slots in perfectly after the mapping is calculated. the mapping itself isn't that useful in context of ACRP, but it shows us "these are the tracks that can't be matched with exact matching". it's a perfect test dataset.
      • 2020-11-10 31530, 2020

      • ruaok
        and I can feel the "add MBIDs to listens in timescale" getting closer to reality.
      • 2020-11-10 31551, 2020

      • ruaok
        which would make a lot of the code on the spark side simpler, since it wouldn't have to match MBIDs as part of the process.
      • 2020-11-10 31501, 2020
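
The "exact matching first, fuzzy matching for the leftovers" idea discussed above can be sketched roughly as follows. This is a minimal illustration using Python's standard-library `difflib`, not Typesense and not the actual mapping code; the `INDEX` data, the `normalise` and `match` helpers, and the `0.8` threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical two-column index of (artist, recording) pairs; illustrative data only.
INDEX = [
    ("Portishead", "Glory Box"),
    ("Massive Attack", "Teardrop"),
    ("Boards of Canada", "Roygbiv"),
]


def normalise(artist, recording):
    """Lower-case and join the two columns into one comparable string."""
    return f"{artist} {recording}".lower().strip()


def similarity(a, b):
    """Similarity ratio between two strings, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def match(artist, recording, threshold=0.8):
    """Return (index entry, match kind), or (None, 'unmatched')."""
    query = normalise(artist, recording)

    # Step 1: exact match on the normalised string.
    for entry in INDEX:
        if normalise(*entry) == query:
            return entry, "exact"

    # Step 2: fuzzy fallback; accept the most similar entry if it clears the threshold.
    best = max(INDEX, key=lambda entry: similarity(query, normalise(*entry)))
    if similarity(query, normalise(*best)) >= threshold:
        return best, "fuzzy"
    return None, "unmatched"
```

Calling `match("Portishaed", "Glory Box")` falls through the exact step and is picked up by the fuzzy step. A linear scan like this would not scale to a full dataset, which is the gap a dedicated typo-tolerant search engine is meant to fill.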

      • Gazooo79494 has quit
      • 2020-11-10 31544, 2020

      • Gazooo79494 joined the channel
      • 2020-11-10 31505, 2020

      • HorusHorrendus has quit
      • 2020-11-10 31554, 2020

      • HorusHorrendus joined the channel
      • 2020-11-10 31526, 2020

      • mruszczyk has quit
      • 2020-11-10 31512, 2020

      • mruszczyk joined the channel
      • 2020-11-10 31526, 2020

      • BrainzGit
        [troi-recommendation-playground] alastair opened pull request #28 (main…remove-runtimeerror): Remove runtimeerror https://github.com/metabrainz/troi-recommendation…
      • 2020-11-10 31557, 2020

      • BrainzGit
        [docker-python] yvanzo merged pull request #8 (master…master): Update python minor versions and upgrade pip to the latest 20.2.3 https://github.com/metabrainz/docker-python/pull/8
      • 2020-11-10 31556, 2020

      • alastairp
        thanks for working on that push script, yvanzo!
      • 2020-11-10 31500, 2020

      • alastairp takes it off his todo list
      • 2020-11-10 31536, 2020

      • pristine___
        alastairp: ping me when you are up for the meeting!
      • 2020-11-10 31522, 2020

      • BrainzGit
        [troi-recommendation-playground] mayhem merged pull request #28 (main…remove-runtimeerror): Remove runtimeerror https://github.com/metabrainz/troi-recommendation…
      • 2020-11-10 31526, 2020

      • alastairp
        hi pristine___, I'm here. just revising the documents that you sent
      • 2020-11-10 31517, 2020

      • Nyanko-sensei has quit
      • 2020-11-10 31516, 2020

      • BrainzGit
        [docker-python] yvanzo opened pull request #9 (master…date-tag): Use creation date for tagging and pushing Docker images https://github.com/metabrainz/docker-python/pull/9
      • 2020-11-10 31508, 2020

      • yvanzo
        alastairp: not sure why I cannot request your review for this PR through GitHub, so I’m doing it here ^
      • 2020-11-10 31529, 2020

      • alastairp
        maybe I'm not in an admin team on the repo. will look
      • 2020-11-10 31538, 2020

      • alastairp
        pristine___: OK, I'm here. how are you?
      • 2020-11-10 31558, 2020

      • Nyanko-sensei joined the channel
      • 2020-11-10 31515, 2020

      • alastairp
        Let's start with the artist recommendations doc
      • 2020-11-10 31521, 2020

      • pristine___
        Great!
      • 2020-11-10 31550, 2020

      • alastairp
        I haven't been following your discussions with ruaok. Can you give me a very quick 2-3 line overview of what you're planning on doing?
      • 2020-11-10 31507, 2020

      • pristine___
        Sure
      • 2020-11-10 31516, 2020

      • discopatrick has quit
      • 2020-11-10 31513, 2020

      • pristine___
        So rn, we are generating recording recommendations for users (the playlists), we thought of generating artist recommendations too, as in artists users might like. Roughly, it can help us in refining the daily jams/playlists.
      • 2020-11-10 31536, 2020

      • alastairp
        and what are your thoughts on the artist recommendations? How are you planning on collecting the training data, and what will be the input and output to the model?
      • 2020-11-10 31550, 2020

      • pristine___
        Umm... I have written that in the doc. The training data will be, as I plan, the playcounts/artistcounts, as in how many times a user has listened to a particular artist. It's implicit.
      • 2020-11-10 31543, 2020

      • pristine___
        So we can use listens of past month/year to fetch these artist counts.
      • 2020-11-10 31550, 2020

      • pristine___
        And train model on these
      • 2020-11-10 31502, 2020

      • alastairp
        ok, cool
      • 2020-11-10 31521, 2020

      • alastairp
        I'm looking at this 'create dataframes' section in the document
      • 2020-11-10 31533, 2020

      • pristine___
      • 2020-11-10 31539, 2020

      • pristine___
        Something like this.
      • 2020-11-10 31540, 2020

      • alastairp
        it's not clear to me if these are all new dataframes
      • 2020-11-10 31548, 2020

      • pristine___
        New?
      • 2020-11-10 31557, 2020

      • alastairp
        do they currently exist?
      • 2020-11-10 31546, 2020

      • pristine___
        No.
      • 2020-11-10 31556, 2020

      • pristine___
        The users df
      • 2020-11-10 31558, 2020

      • pristine___
        Exists now, as in it is used for recording recs, but I will want to have a separate users df, stored in a separate dir for artist recs, something like `/recs/artist/df/users.parquet`
      • 2020-11-10 31558, 2020

      • alastairp
        the names of these dataframes seem really generic - especially the one that you've named playcounts_df, this doesn't have anything in the name that says that they are _artist_ playcounts
      • 2020-11-10 31542, 2020

      • pristine___
        Yeah, right I missed that. It can be users_df, listens_df, artists_df, and artistcount_df
      • 2020-11-10 31556, 2020

      • alastairp
        given a listen, what are the steps for putting it into these tables?
      • 2020-11-10 31527, 2020

      • pristine___
        Yeah, so the listens are first mapped with the mapping.
      • 2020-11-10 31534, 2020

      • alastairp
        the input to the model will be a matrix of User / Artist, right? With counts in the cells
      • 2020-11-10 31543, 2020

      • pristine___
        Yes
      • 2020-11-10 31512, 2020
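
The User / Artist matrix with counts in the cells, as confirmed above, could look something like this. It is a plain-Python sketch rather than the Spark code under discussion, and the listens, user names, and variable names are made up for illustration.

```python
from collections import Counter

# Hypothetical listens, already reduced to (user_name, artist) pairs.
listens = [
    ("alice", "Portishead"),
    ("alice", "Portishead"),
    ("alice", "Massive Attack"),
    ("bob", "Portishead"),
]

# Sparse form: one count per (user, artist) cell that is non-zero.
cell_counts = Counter(listens)

# Dense form: rows are users, columns are artists, cells are play counts.
users = sorted({user for user, _ in listens})
artists = sorted({artist for _, artist in listens})
matrix = [[cell_counts[(user, artist)] for artist in artists] for user in users]
```

In practice only the sparse form is likely to be stored, since most users have never listened to most artists and the dense matrix would be almost entirely zeros.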

      • pristine___
        Then we fetch distinct users and assign them a user ID and prepare users_df
      • 2020-11-10 31529, 2020

      • alastairp
        right. so the descriptions of these dataframes don't explain to me why each of them is needed in this setup. it's difficult to follow how the data flows into these dfs
      • 2020-11-10 31532, 2020

      • pristine___
        Fetch distinct artist, assign artist ID and prepare artist df
      • 2020-11-10 31553, 2020

      • alastairp
        what's an artist ID, and why do you need it?
      • 2020-11-10 31505, 2020

      • pristine___
        Users_df, artist_df and listens_df are needed to prepare artistcount_df
      • 2020-11-10 31512, 2020

      • pristine___
        Yeah, so the IDs
      • 2020-11-10 31558, 2020

      • pristine___
        The user ID and artist ID, it is assigned like this
      • 2020-11-10 31501, 2020

      • pristine___
      • 2020-11-10 31531, 2020

      • pristine___
        The model takes input of the form (int, int bigint)
      • 2020-11-10 31545, 2020

      • pristine___
        Int, int, bigint
      • 2020-11-10 31552, 2020

      • pristine___
        So we cannot just pass an mbid
      • 2020-11-10 31554, 2020

      • pristine___
        Or string
      • 2020-11-10 31500, 2020

      • pristine___
        To identify user, artist
      • 2020-11-10 31506, 2020

      • pristine___
        We assign them IDs
      • 2020-11-10 31531, 2020

      • alastairp
        great. so it's a mapping from our input to the indexes in the matrix
      • 2020-11-10 31526, 2020

      • pristine___
        I won't say indexes, but yeah we can use ids to later reference mbids/names etc
      • 2020-11-10 31500, 2020
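
The ID assignment described above, which turns user names and artist identifiers into integers so the model can take (int, int, bigint) rows, might be sketched like this. It is plain Python for illustration, not the Spark implementation; `assign_ids`, the placeholder identifiers such as `mbid-a`, and the choice to start IDs at 1 are all assumptions.

```python
from collections import Counter


def assign_ids(values):
    """Map each distinct value to a dense integer ID, starting at 1."""
    return {value: idx for idx, value in enumerate(sorted(set(values)), start=1)}


# Hypothetical listens as (user_name, artist identifier) pairs; placeholders, not real MBIDs.
listens = [
    ("alice", "mbid-a"),
    ("alice", "mbid-a"),
    ("bob", "mbid-b"),
    ("alice", "mbid-b"),
]

# The two lookup tables: string identifier -> integer ID.
user_ids = assign_ids(user for user, _ in listens)
artist_ids = assign_ids(artist for _, artist in listens)

# Model input rows of the form (user_id, artist_id, count).
rows = sorted(
    (user_ids[user], artist_ids[artist], count)
    for (user, artist), count in Counter(listens).items()
)

# Reverse lookup, so recommended integer IDs can be turned back into identifiers.
artist_by_id = {idx: artist for artist, idx in artist_ids.items()}
```

The reverse dictionary is what lets the integer IDs be used later to reference MBIDs or names, as mentioned in the message above.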

      • alastairp
        can you please edit the document to make this clearer? Explicitly describe the matrix format, and explain that each of these other tables is for the mapping
      • 2020-11-10 31523, 2020

      • alastairp
        index - I mean how to reference a particular row or column.
      • 2020-11-10 31529, 2020

      • pristine___
        > Explicitly describe the matrix format, and explain that each of these other tables is for the mapping
      • 2020-11-10 31550, 2020

      • pristine___
        So we generally use the term mapping for msid->mbid mapping
      • 2020-11-10 31556, 2020

      • pristine___
        Sorry, I want clear on that
      • 2020-11-10 31559, 2020

      • pristine___
        Wasn't
      • 2020-11-10 31503, 2020

      • pristine___
        Earlier
      • 2020-11-10 31506, 2020

      • pristine___
        Will edit
      • 2020-11-10 31519, 2020

      • pristine___
        Do you want me to edit rn, or after the discussion?
      • 2020-11-10 31529, 2020

      • alastairp
        anything that goes from one identifier to another identifier is a mapping
      • 2020-11-10 31536, 2020

      • pristine___
        Yeah
      • 2020-11-10 31539, 2020

      • alastairp
        users_df is a mapping from a username to an integer
      • 2020-11-10 31545, 2020

      • alastairp
        I don't mind when you edit it
      • 2020-11-10 31553, 2020

      • pristine___
        Cool
      • 2020-11-10 31518, 2020

      • alastairp
        why do we need 3 tables? what's listens_df?
      • 2020-11-10 31503, 2020

      • pristine___
        Can you have a look at this join
      • 2020-11-10 31505, 2020

      • pristine___
      • 2020-11-10 31512, 2020

      • pristine___
        This explains the idea
      • 2020-11-10 31547, 2020

      • pristine___
        Listens_df is nothing but the listens as such, I have just filtered the columns/fields I need
      • 2020-11-10 31547, 2020

      • alastairp
        can you explain it to me?
      • 2020-11-10 31552, 2020

      • pristine___
        Yeah
      • 2020-11-10 31558, 2020

      • alastairp
        oh, it's an existing table?
      • 2020-11-10 31508, 2020

      • pristine___
        We just do, listens_df = listens.select(`artist_credit_id`, `user_name`).
      • 2020-11-10 31514, 2020

      • alastairp
        you just said before that these are all new dataframes
      • 2020-11-10 31550, 2020

      • pristine___
        Yes. New in the sense, we are creating them from the listens table (Submitted to LB)
      • 2020-11-10 31509, 2020

      • alastairp
        is there a specific technical reason why this is needed? So it seems like you're going from `listens` -> `listens_df` -> users_df and artists_df -> `playcounts_df`