in #metabrainz

12:37 AM
ApeKattQuest joined the channel
12:41 AM
lusciouslover has quit
12:41 AM
lusciouslover joined the channel
12:42 AM
Jigen has quit
1:27 AM
Jigen joined the channel
1:28 AM
Goemon joined the channel
1:29 AM
ApeKattQuest has quit
1:31 AM
HSOWA joined the channel
1:31 AM
HSOWA has quit
1:31 AM
HSOWA joined the channel
1:32 AM
Jigen has quit
2:41 AM
rozlav8 has quit
2:42 AM
rozlav82 joined the channel
4:01 AM
lucifer[m]

[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) what postgres queries and which metadata files?
4:02 AM
holycow23[m]

lucifer[m]: Let's say I want to write a postgres query to fetch listens along with their year of release. For that I need to use the metadata from the HDFS query, and for the listens I am using the function defined in the gist.
4:08 AM
lucifer[m]

[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) you won't need to write a postgres query, it would be a spark sql query. the distinction is also important because for some things the syntax of spark sql is different from postgres. for the hdfs metadata you can read the dataframe as in the example gist using dataframe api or spark sql query.
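A minimal PySpark sketch of the two routes lucifer mentions, assuming an existing `SparkSession` named `spark`. The parquet path, view names and columns (`release_mbid`, `first_release_date`) are hypothetical placeholders, not the real metadata schema. It also shows one concrete syntax difference: Spark SQL has a `year()` function, while Postgres has none and would need `EXTRACT(YEAR FROM ...)`.

```python
from typing import Any


def register_metadata_view(spark: Any, path: str, view_name: str) -> Any:
    """Read a parquet metadata file from HDFS and expose it to Spark SQL.

    `spark` is an existing pyspark.sql.SparkSession; path and view name
    are hypothetical placeholders.
    """
    df = spark.read.parquet(path)          # DataFrame API route
    df.createOrReplaceTempView(view_name)  # now queryable via spark.sql()
    return df


def listens_with_release_year_query(listens_view: str, release_view: str) -> str:
    """Build a Spark SQL query joining listens to release metadata.

    Column names are hypothetical. year() exists in Spark SQL; Postgres
    has no year() and would use EXTRACT(YEAR FROM ...) instead.
    """
    return f"""
        SELECT l.user_id,
               l.recording_mbid,
               year(r.first_release_date) AS release_year
          FROM {listens_view} l
          JOIN {release_view} r
            ON l.release_mbid = r.release_mbid
    """
```

In a pyspark shell this would be used as `df = spark.sql(listens_with_release_year_query("listens", "release_metadata"))`, which returns a DataFrame.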
4:10 AM
You shouldn't need to check the MB dumps in any case. Any metadata that you need should come from MB db. The format of the MB dumps is different from MB db, so if you use the dumps it would create issues.
4:11 AM
I'll check the release year YIM queries in a while and confirm if all the data you need for that one is available or not.
4:12 AM
I have added release_metadata_cache to the sample dumps already. (Might need to update the codebase/container to import it successfully though).
4:15 AM
holycow23[m]

<lucifer[m]> "You shouldn't need to check..." <- Where can I see the format of the MB db?
4:17 AM
lucifer[m]

https://musicbrainz.org/doc/MusicBrainz_Databas...
4:18 AM
Alternatively you can connect to the database on wolf using ssh port forwarding and browse the database with your choice of tool.
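The port-forwarding step can be sketched as building an `ssh -L` command. The `user@wolf` destination, the local port, and the assumption that Postgres listens on port 5432 on `localhost` as seen from wolf are placeholders; if the database runs inside a container on wolf, the forwarded host and port may differ.

```python
import shlex
from typing import List


def ssh_tunnel_command(
    remote: str,
    local_port: int = 5432,
    db_host: str = "localhost",
    db_port: int = 5432,
) -> List[str]:
    """Build an ssh command forwarding local_port to db_host:db_port on `remote`.

    -N: run no remote command, only keep the tunnel open.
    -L: local port forwarding, written as local_port:db_host:db_port,
        where db_host is resolved from the remote machine's point of view.
    """
    return ["ssh", "-N", "-L", f"{local_port}:{db_host}:{db_port}", remote]


cmd = ssh_tunnel_command("user@wolf", local_port=15432)  # hypothetical user/host
print(shlex.join(cmd))  # ssh -N -L 15432:localhost:5432 user@wolf
```

While the tunnel is open, psql or any GUI client can connect to `localhost:15432` on your own machine.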
4:18 AM
holycow23[m]

But are all the tables present cause I could locate... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
4:18 AM
lucifer[m]

This is not MB db.
4:18 AM
This is the list of metadata files imported into spark
4:18 AM
holycow23[m]

sorry this is the metadata part right?
4:18 AM
Yeah my bad
4:19 AM
So, I gotta use this right?
4:19 AM
No need for MB db?
4:19 AM
lucifer[m]

Yes this data already exists and you can use it as needed
4:20 AM
If there is some metadata that doesn't exist in these files, then you would need to write queries to create these files by reading data from MB db.
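One generic way to turn an MB db query into a file Spark can read is Spark's built-in JDBC data source followed by a parquet write. This is a sketch, not necessarily how the LB codebase does it; the connection URL, credentials, query and output path are hypothetical, and the Postgres JDBC driver jar must be on Spark's classpath.

```python
from typing import Any, Dict


def jdbc_options(url: str, query: str, user: str, password: str) -> Dict[str, str]:
    """Options for Spark's JDBC source (the "query" option needs Spark 2.4+)."""
    return {
        "url": url,          # e.g. jdbc:postgresql://host:port/dbname
        "query": query,      # SQL executed by Postgres, not by Spark
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def export_mb_query(spark: Any, options: Dict[str, str], out_path: str) -> None:
    """Run a query against MB db and save the result as parquet for Spark."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(out_path)
```

Note that the query string here is sent to Postgres, so it uses Postgres syntax; only queries on the resulting parquet data go through Spark SQL.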
4:20 AM
I'll update the setup on wolf later today to add release_metadata_cache to spark.
4:22 AM
holycow23[m]

lucifer[m]: I didn't get this?
4:24 AM
lucifer[m]

There is one more metadata file available in production that is missing from your local setup because I only added it to sample dumps last week.
4:24 AM
saumon has quit
4:24 AM
I'll update your spark setup to add that file.
4:37 AM
julian45[m]

reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
4:41 AM
not urgent, just a few thoughts i had while heading towards bed :)
5:11 AM
holycow23[m]

lucifer: I wrote this small script... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
5:14 AM
dabeglavins60721 joined the channel
5:18 AM
dabeglavins6072 has quit
5:53 AM
Kladky joined the channel
6:01 AM
reosarevok[m]

<julian45[m]> "reosarevok: a while back we..." <- > <@julian45:julian45.net> reosarevok: a while back we talked about the continued need to mass mail auto-editors for election notifications, even in a post SSO implementation future...... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
7:04 AM
saumon joined the channel
9:06 AM
Maxr1998 joined the channel
9:07 AM
Maxr1998_ has quit
9:30 AM
mayhem[m]

<mayhem[m]> "lucifer: labs.api is running..." <- Did you take a look to see if anything was amiss with the data?
9:36 AM
dabeglavins60721 has quit
10:03 AM
lucifer[m]

mayhem: missed that message yesterday, will take a look now.
10:04 AM
<holycow23[m]> "lucifer: I wrote this small..." <- you can assume it will work fine in production without limiting, we have bigger queries that work fine there. do you still run out of memory with --driver-memory 8g?
10:11 AM
pite_ has quit
10:12 AM
pite joined the channel
12:01 PM
holycow23[m]

Yes I did run out of memory
12:04 PM
mayhem[m] uploaded an image: (23KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/QxFQIEVftryIukGiTwUJvmZA/image.png >
12:04 PM
mayhem[m]

lucifer: my LB instance is throwing this error on login
12:05 PM
keys verified, so without an error message, I am unsure how to proceed.
12:08 PM
lucifer[m]

mayhem: client id as well?
12:10 PM
to confirm: do the OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET in your config match the client on https://musicbrainz.org/new-oauth2/client/list ?
12:12 PM
holycow23[m]

<lucifer[m]> "you can assume it will work fine..." <- I did run out of memory. Also, does this type of querying work, or do I need SQL queries? SQL is what I wrote in the mock queries in the proposal, so I will either have to use that or just pandas filtering.
12:13 PM
lucifer[m]

holycow23: that type of querying works, but i think for consistency's sake it's best to use SQL queries only.
12:13 PM
holycow23[m]

lucifer[m]: Okay that's what I thought of too but how do I test those?
12:14 PM
lucifer[m]

you can test those by passing the query to spark.sql(query)
12:14 PM
that returns a dataframe.
12:14 PM
example: https://gist.github.com/amCap1712/ecef51789766c...
12:17 PM
for running out of memory, i'll take a look at it. there are different kinds of memory configurations in spark and it's possible another one needs to be increased to avoid the issue.
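For reference, a few standard Spark memory properties assembled into a `pyspark` command line. The values are illustrative guesses, and which setting actually needs raising depends on where the failure happens. Passing them on the command line matters for `spark.driver.memory`, since the driver JVM is already running by the time application code could set it. If Spark runs in local mode, executors live inside the driver JVM, so the driver setting is the one that counts.

```python
from typing import Dict, List

# Illustrative values only; tune to the machine and the failure you see.
MEMORY_CONF: Dict[str, str] = {
    # Heap of the driver JVM (equivalent to the --driver-memory flag).
    "spark.driver.memory": "8g",
    # Cap on total serialized results collected to the driver per action
    # (e.g. collect() or toPandas()); "0" means unlimited.
    "spark.driver.maxResultSize": "4g",
    # Heap of each executor JVM (ignored in local mode).
    "spark.executor.memory": "8g",
}


def pyspark_args(conf: Dict[str, str]) -> List[str]:
    """Turn a config dict into `pyspark --conf key=value ...` arguments."""
    args = ["pyspark"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```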
12:19 PM
mayhem[m]

<lucifer[m]> "to confirm the OAUTH_CLIENT_ID..." <- yes, both match
12:20 PM
lucifer[m]

i'll try to reproduce the issue and fix it
12:21 PM
mayhem[m]

let me know if you need help.
12:21 PM
holycow23[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/NwWeXkNYsOPcWVKEASzafweS
12:21 PM
holycow23[m]

This script ran quite well to map the songs with the genre
12:24 PM
outsidecontext[m]

reosarevok: is the tagger link fix for taglookup supposed to be deployed on beta? Because I still get the issue there
12:24 PM
pite has quit
12:24 PM
pite joined the channel
12:25 PM
reosarevok[m]

Hmm. I think so? Let me double check
12:26 PM
outsidecontext[m]

clicking on any tagger link on https://beta.musicbrainz.org/taglookup/index?ta... still makes the browser navigate instead of making an XHR request
12:26 PM
reosarevok[m]

Does it?
12:26 PM
It no longer opens a localhost tab for me at least...
12:27 PM
(and I get the same error in the console as on search)
12:27 PM
reosarevok[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/UWGxJbpzaowymWnFeQTNlXwt
12:28 PM
outsidecontext[m]

it does for me (well, it is the same tab for me, but it navigates to http://127.0.0.1:8001)
12:29 PM
ok, sorry. was a cache issue. cleared the cache and now it works
12:30 PM
holycow23[m]

<lucifer[m]> "for running out of memory, i'..." <- I just wrote a query for count of listens per genre grouped by user, that worked well without any limits
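A sketch of what a listens-per-genre-per-user query could look like as a plain SQL string for `spark.sql()`. The view and column names (`listens`, `recording_genre`, `recording_mbid`, `genre`) are hypothetical. Because the query only uses SQL common to SQLite and Spark SQL, its shape can be sanity-checked locally on toy data before sending it to Spark; anything using Spark-specific functions would need a real Spark session instead.

```python
import sqlite3
from typing import List, Tuple


def listens_per_genre_query(listens_view: str, genre_view: str) -> str:
    """Count listens per (user, genre). Table/column names are hypothetical."""
    return f"""
        SELECT l.user_id, g.genre, COUNT(*) AS listen_count
          FROM {listens_view} l
          JOIN {genre_view} g
            ON l.recording_mbid = g.recording_mbid
      GROUP BY l.user_id, g.genre
    """


def check_with_sqlite(query: str) -> List[Tuple[int, str, int]]:
    """Run the query on tiny in-memory toy data and return sorted rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE listens (user_id INTEGER, recording_mbid TEXT);
        CREATE TABLE recording_genre (recording_mbid TEXT, genre TEXT);
        INSERT INTO listens VALUES (1, 'a'), (1, 'a'), (1, 'b'), (2, 'b');
        INSERT INTO recording_genre VALUES ('a', 'rock'), ('b', 'jazz'), ('b', 'rock');
    """)
    rows = sorted(con.execute(query).fetchall())
    con.close()
    return rows


rows = check_with_sqlite(listens_per_genre_query("listens", "recording_genre"))
print(rows)  # [(1, 'jazz', 1), (1, 'rock', 3), (2, 'jazz', 1), (2, 'rock', 1)]
```

One design consequence worth noting: a listen of a recording tagged with two genres is counted once for each genre, which may or may not be what the chart should show.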
12:31 PM
lucifer[m]

cool sounds good.
12:33 PM
ijc has quit
12:33 PM
ijc joined the channel
12:38 PM
fettuccinae[m]

hey lucifer, can you please review this PR https://github.com/metabrainz/metabrainz.org/pu...
12:40 PM
rayyan_seliya123

<lucifer[m]> "rayyan_seliya123, suvid, m...." <- hey lucifer, gentle reminder: can you please review this commit https://github.com/metabrainz/listenbrainz-serv... as we discussed, so I can move ahead with whatever is pending?
12:44 PM
holycow23[m]

<lucifer[m]> "cool sounds good." <- Would it be possible for you to give me suggestions on how to move forward after this?
12:44 PM
Since it's quite production-oriented, with a base table, an aggregate table and cron jobs
12:47 PM
lucifer[m]

mayhem: similar recordings should load faster now.
12:47 PM
holycow23: is the query for your stat ready?
12:48 PM
rayyan_seliya123: will do
12:48 PM
holycow23[m]

yeah
12:48 PM
lucifer[m]

@fettuccinae:matrix.org: yes i had reviewed it earlier today, forgot to approve. done now
12:48 PM
rayyan_seliya123

lucifer[m]: Okk 👍
12:48 PM
holycow23[m] uploaded an image: (28KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/RvmjipvvbPoocOZxGaPHmoLi/image.png >
12:48 PM
fettuccinae[m]

lucifer[m]: Thanks.
12:49 PM
holycow23[m]

The output is exactly what we need for the genre activity chart
12:50 PM
lucifer[m]

cool, take a look at: https://github.com/metabrainz/listenbrainz-serv... and create a similar class for your stat.
12:51 PM
this will be used to execute your query and generate the results.
12:52 PM
holycow23[m]

Okay will look into it
12:52 PM
mayhem[m]

lucifer: was this dataset processed with the Beatles fix in place?
12:52 PM
lucifer[m]

mayhem: nope.
12:53 PM
holycow23[m]

lucifer[m]: Actually I did go through this in the early days but how do I test this?
12:53 PM
lucifer[m]

mayhem: i don't have the link to those video recordings with the lfm guys. can you share it again?
12:53 PM
mayhem[m]

lucifer[m]: the data looks really nice, from the spot checks I've made. but artists like the beatles are featuring quite prominently in some results. so I would very much love to see this fixed for all of our similar data sets.
12:54 PM
lucifer[m]

fwiw, it might not be easily applicable here, since to the best of my recollection their suggestion was to scale items in the collaborative filtering model.
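To make "scaling items" concrete: one common way to dampen very popular items (such as the Beatles) is to down-weight each item's counts by a function of how many users listen to it before training, similar in spirit to IDF weighting. This is a generic technique sketched for illustration; it is not necessarily what the last.fm team suggested, and the `log(1 + users)` divisor is an arbitrary choice.

```python
import math
from typing import Dict, Tuple


def scale_by_popularity(
    counts: Dict[Tuple[str, str], int],
) -> Dict[Tuple[str, str], float]:
    """Divide each (user, item) count by log(1 + number of users of the item).

    Items heard by many users end up with smaller weights relative to
    rarer items, so they dominate the collaborative filtering input less.
    """
    users_per_item: Dict[str, int] = {}
    for (_user, item) in counts:
        users_per_item[item] = users_per_item.get(item, 0) + 1
    return {
        (user, item): count / math.log1p(users_per_item[item])
        for (user, item), count in counts.items()
    }


counts = {("u1", "beatles"): 10, ("u2", "beatles"): 10,
          ("u3", "beatles"): 10, ("u1", "obscure"): 10}
weights = scale_by_popularity(counts)
# Same raw count, but the widely heard artist now carries less weight.
```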
12:55 PM
mayhem[m]

ah, yes. ok, in that case, I think this is workable for a start. I can't see any problems from my spot checking, but eventually others might. so, let's keep our ability to regenerate this data alive for the time being.
12:59 PM
lucifer[m]

1. Stat Implementation: https://github.com/metabrainz/listenbrainz-serv.... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
12:59 PM
holycow23: above are the changes needed to add a new stat to spark.
1:00 PM
holycow23[m]

Okay will go through them thanks
1:00 PM
lucifer[m]

once all of this is in place, you will be able to run the command created in step 5 to send a request to the spark cluster (like we do for requesting existing stats or creating a new dump)
1:01 PM
for testing purposes, when your class in step 1 is ready, you can import it in pyspark and run it directly.
1:01 PM
the code will be similar to the function linked in step 2.
1:02 PM
once you have written step 2, you can just import that function and call it with the desired arguments to test steps 1 and 2 together, and so on.
1:03 PM
holycow23[m]

Okay, will go through them and if anything comes up I will get back to you
1:04 PM
lucifer[m] posted a file: Debugging.ipynb (3703KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/ulXxuIqCvyTUGaSAkIwhAMel >
1:05 PM
lucifer[m]

this is a notebook that i use for similar testing and debugging of the spark cluster. i'll try to clean it up later and share it with you, but for now you can look at the raw version and see if it helps.
1:06 PM
holycow23[m]

okay
1:08 PM
_BrainzGit

[listenbrainz-server] amCap1712 opened pull request #3302 (master…mlhd-labs-api): Add mlhd similar recordings to labs api https://github.com/metabrainz/listenbrainz-serv...
1:09 PM
lucifer[m]

monkey: hi! let me know if you can take a look at https://github.com/metabrainz/listenbrainz-serv... ? i tested it on test.lb and it seems to work fine fwiw.
1:10 PM
rayyan_seliya123: you can combine the changes from both PRs into a single one and close the other one.
1:13 PM
rayyan_seliya123

lucifer[m]: The one in which I added the SQL files/tables? Should I close that one and merge it into the one with the seeder file and indexer script?
1:13 PM
lucifer[m]

sure sounds good to me