#metabrainz

      • ApeKattQuest joined the channel
      • lusciouslover has quit
      • lusciouslover joined the channel
      • Jigen has quit
      • Jigen joined the channel
      • Goemon joined the channel
      • ApeKattQuest has quit
      • HSOWA joined the channel
      • HSOWA has quit
      • HSOWA joined the channel
      • Jigen has quit
      • rozlav8 has quit
      • rozlav82 joined the channel
      • lucifer[m]
        [@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) what postgres queries and which metadata files?
      • holycow23[m]
        lucifer[m]: Let's say I want to write a Postgres query to fetch listens with their year of release. For that I need the metadata from HDFS, and for listens I am using the function defined in the gist.
      • lucifer[m]
        [@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) you won't need to write a postgres query, it would be a spark sql query. the distinction is also important because for some things the syntax of spark sql is different from postgres. for the hdfs metadata you can read the dataframe as in the example gist using dataframe api or spark sql query.
      • You shouldn't need to check the MB dumps in any case. Any metadata that you need should come from MB db. The format of MB dumps is different from MB DB so if you use the first it would create issues.
      • I'll check the release year YIM queries in a while and confirm if all the data you need for that one is available or not.
      • I have added release_metadata_cache to the sample dumps already. (Might need to update the codebase/container to import it successfully though).
      • holycow23[m]
        <lucifer[m]> "You shouldn't need to check..." <- Where can I see the format of the MB db?
      • lucifer[m]
      • Alternatively you can connect to the database on wolf using ssh port forwarding and browse the database with your choice of tool.
      • holycow23[m]
        But are all the tables present cause I could locate... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • lucifer[m]
        This is not MB db.
      • This is the list of metadata files imported into spark
      • holycow23[m]
        sorry this is the metadata part right?
      • Yeah my bad
      • So, I gotta use this right?
      • No need for MB db?
      • lucifer[m]
        Yes this data already exists and you can use it as needed
      • If there is some metadata that doesn't exist in these files, then you would need to write queries to create these files by reading data from MB db
      • I'll update the setup on wolf later today to add release_metadata_cache to spark.
      • holycow23[m]
        lucifer[m]: I didn't get this?
      • lucifer[m]
        There is one more metadata file available in production that is missing from your local setup because I only added it to sample dumps last week.
      • saumon has quit
      • I'll update your spark setup to add that file.
      • julian45[m]
        reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • not urgent, just a few thoughts i had while heading towards bed :)
      • holycow23[m]
        lucifer: I wrote this small script... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • dabeglavins60721 joined the channel
      • dabeglavins6072 has quit
      • Kladky joined the channel
      • reosarevok[m]
        <julian45[m]> "reosarevok: a while back we..." <- > <@julian45:julian45.net> reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
      • saumon joined the channel
      • Maxr1998 joined the channel
      • Maxr1998_ has quit
      • mayhem[m]
        <mayhem[m]> "lucifer: labs.api is running..." <- Did you take a look to see if anything was amiss with the data?
      • dabeglavins60721 has quit
      • lucifer[m]
        mayhem: missed that message yesterday, will take a look now.
      • <holycow23[m]> "lucifer: I wrote this small..." <- you can assume it will work fine in production without limiting, we have bigger queries that work fine there. do you still run out of memory with --driver-memory 8g?
      • pite_ has quit
      • pite joined the channel
      • holycow23[m]
        Yes I did run out of memory
      • mayhem[m] uploaded an image: (23KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/QxFQIEVftryIukGiTwUJvmZA/image.png >
      • mayhem[m]
        lucifer: my LB instance is throwing this error on login
      • keys verified, so without an error message, I am unsure how to proceed.
      • lucifer[m]
        mayhem: client id as well?
      • to confirm the OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET in your config, match the client on https://musicbrainz.org/new-oauth2/client/list ?
      • holycow23[m]
        <lucifer[m]> "you can assume it will work fine..." <- I did run out of memory. Also, does this type of querying work, or do I need SQL queries? SQL is what I wrote in the mock queries in the proposal, so I will either have to use that or just pandas filtering.
      • lucifer[m]
        holycow23: that type of querying works but i think for consistency's sake it's best to use SQL queries only.
      • holycow23[m]
        lucifer[m]: Okay that's what I thought of too but how do I test those?
      • lucifer[m]
        you can test those by passing the query to spark.sql(query)
      • that returns a dataframe.
      • for running out of memory, i'll take a look at it. there are different kinds of memory configurations in spark and it's possible another one needs to be increased to avoid the issue.
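The remark about different kinds of memory configuration can be illustrated with a minimal sketch. The property names below are standard Spark settings, but the values are placeholders and the helper `to_conf_flags` is hypothetical; nothing here is taken from the ListenBrainz setup, and which setting (if any) resolves the out-of-memory error is not established in this conversation.

```python
from typing import Dict, List

# Memory-related Spark properties that are commonly tuned.
# Values are illustrative placeholders, not recommendations.
MEMORY_SETTINGS: Dict[str, str] = {
    "spark.driver.memory": "8g",            # what --driver-memory 8g sets
    "spark.driver.maxResultSize": "4g",     # cap on results collected to the driver
    "spark.executor.memory": "4g",          # JVM heap per executor
    "spark.executor.memoryOverhead": "1g",  # non-heap memory per executor
}


def to_conf_flags(settings: Dict[str, str]) -> List[str]:
    """Render settings as `--conf key=value` arguments for pyspark or spark-submit.

    Hypothetical helper, used here only to show how the settings are passed.
    """
    flags: List[str] = []
    for key, value in settings.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


def command_line(settings: Dict[str, str]) -> str:
    """Build a full `pyspark ...` command line from the settings."""
    return "pyspark " + " ".join(to_conf_flags(settings))
```

Calling `command_line(MEMORY_SETTINGS)` yields a `pyspark` invocation with one `--conf` pair per setting, which makes it easy to raise one value at a time while looking for the limit that is being hit.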
      • mayhem[m]
        <lucifer[m]> "to confirm the OAUTH_CLIENT_ID..." <- yes, both match
      • lucifer[m]
        i'll try to reproduce the issue and fix it
      • mayhem[m]
        let me know if you need help.
      • holycow23[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/NwWeXkNYsOPcWVKEASzafweS
      • holycow23[m]
        This script ran quite well to map the songs with the genre
      • outsidecontext[m]
        reosarevok: is the tagger link fix for taglookup supposed to be deployed on beta? Because I still get the issue there
      • pite has quit
      • pite joined the channel
      • reosarevok[m]
        Hmm. I think so? Let me double check
      • outsidecontext[m]
        clicking on any tagger link on https://beta.musicbrainz.org/taglookup/index?ta... still makes the browser navigate instead of sending an XHR request
      • reosarevok[m]
        Does it?
      • It no longer opens a localhost tab for me at least...
      • (and I get the same error on console as on search)
      • reosarevok[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UWGxJbpzaowymWnFeQTNlXwt
      • outsidecontext[m]
        it does for me (well, it is the same tab for me, but it navigates to http://127.0.0.1:8001)
      • ok, sorry. was a cache issue. cleared the cache and now it works
      • holycow23[m]
        <lucifer[m]> "for running out of memory, i'..." <- I just wrote a query for count of listens per genre grouped by user, that worked well without any limits
      • lucifer[m]
        cool sounds good.
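A minimal sketch of testing such a query through `spark.sql(query)`, as suggested earlier in the conversation. The view name `listens_with_genre` and the columns `user_id` and `genre` are assumptions made for illustration; the actual query and table names used by holycow23 are not shown in this log.

```python
def build_genre_count_query(table: str = "listens_with_genre") -> str:
    """Spark SQL counting listens per genre for each user.

    The table and column names are hypothetical.
    """
    return f"""
        SELECT user_id
             , genre
             , count(*) AS listen_count
          FROM {table}
      GROUP BY user_id
             , genre
    """


def run_query(spark, query: str):
    """Run the query; spark.sql returns a DataFrame."""
    return spark.sql(query)


def demo() -> None:
    """Try the query on a tiny in-memory dataset (requires pyspark)."""
    # Imported here so the query builder can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("genre-count-check").getOrCreate()
    rows = [(1, "rock"), (1, "rock"), (1, "jazz"), (2, "rock")]
    df = spark.createDataFrame(rows, ["user_id", "genre"])
    df.createOrReplaceTempView("listens_with_genre")
    run_query(spark, build_genre_count_query()).show()
    spark.stop()
```

Calling `demo()` in an environment with pyspark installed registers the sample rows as a temporary view and prints the grouped counts, so the SQL can be checked before pointing it at real data.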
      • ijc has quit
      • ijc joined the channel
      • fettuccinae[m]
        hey lucifer Can you please review this pr https://github.com/metabrainz/metabrainz.org/pu...
      • rayyan_seliya123
        <lucifer[m]> "rayyan_seliya123, suvid, m...." <- hey lucifer, gentle reminder: can you please review this commit https://github.com/metabrainz/listenbrainz-serv... as we discussed, so that I can move ahead with what's pending?
      • holycow23[m]
        <lucifer[m]> "cool sounds good." <- Would it be possible for you to give me suggestions on how to move forward after this?
      • Since it's quite prod-based, with a base table, an aggregate table and cron jobs
      • lucifer[m]
        mayhem: similar recordings should load faster now.
      • holycow23: is the query for your stat ready?
      • rayyan_seliya123: will do
      • holycow23[m]
        yeah
      • lucifer[m]
        @fettuccinae:matrix.org: yes i had reviewed it earlier today, forgot to approve. done now
      • rayyan_seliya123
        lucifer[m]: Okk 👍
      • holycow23[m] uploaded an image: (28KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/RvmjipvvbPoocOZxGaPHmoLi/image.png >
      • fettuccinae[m]
        lucifer[m]: Thanks.
      • holycow23[m]
        The output is exactly what we need for the genre activity chart
      • lucifer[m]
        cool, take a look at: https://github.com/metabrainz/listenbrainz-serv... and create a similar class for your stat.
      • this will be used to execute your query and generate the results.
      • holycow23[m]
        Okay will look into it
      • mayhem[m]
        lucifer: was this dataset processed with the Beatles fix in place?
      • lucifer[m]
        mayhem: nope.
      • holycow23[m]
        lucifer[m]: Actually I did go through this in the early days but how do I test this?
      • lucifer[m]
        mayhem: i don't have the link to those video recordings with lfm guys. can you share it again?
      • mayhem[m]
        lucifer[m]: the data looks really nice, from the spot checks I've made. but artists like the beatles are featuring quite prominently in some results. so I would very much love to see this fixed for all of our similar data sets.
      • lucifer[m]
        fwiw, it might not be easily applicable here as to my best recollection their suggestion was to scale items in the collaborative filtering model.
      • mayhem[m]
        ah, yes. ok, in that case, I think this is workable for the start. I can't see any problems from my spot checking, but eventually others might. so, let's keep our ability to regenerate this data alive for the time being.
      • lucifer[m]
      • holycow23: above are the changes needed to add a new stat to spark.
      • holycow23[m]
        Okay will go through them thanks
      • lucifer[m]
        once all of this is in place, you will be able to run the command created in step 5 to send a request to the spark cluster (like we do for requesting existing stats or creating a new dump)
      • for testing purposes, when your class in step 1 is ready, you can import it in pyspark and run it directly.
      • the code will be similar to the function linked in step 2.
      • when you have written step 2, you can just import that function, call it with the desired arguments, and test 1 and 2 together, and so on.
      • holycow23[m]
        Okay, will go through them and if anything comes up I will get back to you
      • lucifer[m] posted a file: Debugging.ipynb (3703KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/ulXxuIqCvyTUGaSAkIwhAMel >
      • lucifer[m]
        this is a notebook that i use for similar testing and debugging of the spark cluster. i'll try to clean it up later and share it with you. but for now you can look at the raw version and see if it helps.
      • holycow23[m]
        okay
      • _BrainzGit
        [listenbrainz-server] 14amCap1712 opened pull request #3302 (03master…mlhd-labs-api): Add mlhd similar recordings to labs api https://github.com/metabrainz/listenbrainz-serv...
      • lucifer[m]
        monkey: hi! let me know if you can take a look at https://github.com/metabrainz/listenbrainz-serv... ? i tested it on test.lb and it seems to work fine fwiw.
      • rayyan_seliya123: you can combine the changes from both PRs into a single one and close the other one.
      • rayyan_seliya123
        lucifer[m]: The one in which I have added the sql files or tables? Should I close that one and merge it into the one in which I have the seeder file and indexer script?
      • lucifer[m]
        sure sounds good to me