Trying it now. I do get the same collation messages while the reindex is going on. Will monitor the next replication to see if that fixes it.
Sintharu joined the channel
Sintharu
Hi
adhawkins[m]
That seems to have done the trick bitmap. Maybe this step could be done automatically whenever a new version of the container is installed?
mayhem[m]
No, but outsidecontext might
ApeKattQuest joined the channel
Maxr1998_ joined the channel
Maxr1998 has quit
monkey[m]
<Sintharu> "Hi..." <- Hello Sintharu (IRC)
<mamanullah7[m]> "Hey monkey aerozol i needed a..." <- Overall looks good. I don't have much else to say other than what aerozol said: I expect the visual style would follow what we currently do on the connect services page. The LastFM one has extra text inputs for a good example.
And like LastFM, maybe there will be a need for an 'edit' button that is active only when you are connected to these services? Say if you want to change the URL for example.
mamanullah7[m]
<monkey[m]> "Overall looks good. I don't have..." <- okay i missed that edit one! i'll make sure to add this!
Thanks monkey aerozol, i'll take care of the suggestions and reach out to you for further review!
petitminion joined the channel
davic has quit
spynxic has quit
spynxic joined the channel
davic joined the channel
Sintharu has quit
mglubb[m]
Hi yvanzo . Just wanted to say that I'm happy with re-indexing now that I've applied the latest SIR updates and tuned its configuration. Seems to be in the same ballpark as it was before, in terms of time. Possibly a bit quicker. Thank you for your service!
davic has quit
Sintharu joined the channel
Sophist-UK joined the channel
davic joined the channel
Kladky has quit
Kladky joined the channel
Sintharu has quit
Sintharu joined the channel
holycow23[m] joined the channel
holycow23[m]
Hey lucifer, I was actually looking into the TimescaleDB to run a couple of queries over the listens and noticed that it doesn't have the `artist_mbid` or the `recording_mbid` attached to it, so how exactly do I fetch those locally for the listens?
<holycow23[m]> "Hey lucifer, I was actually..." <- Also needed some help with the working of stats, so could we get on a quick zoom call maybe?
petitminion has quit
petitminion joined the channel
yvanzo[m]
Hi mglubb, glad it works for your mirror, sharing your thanks with bitmap and lucifer who made it possible too.
lucifer: suggested small changes
lucifer[m]
yvanzo: just approved them, thanks!
<holycow23[m]> "Also needed some more assistance..." <- let's try to work it out over chat first and if it doesn't clear up we can do a call later.
holycow23[m]
lucifer[m]: Cool
yvanzo[m]
Great, on releasing sir then!
lucifer[m]
<holycow23[m]> "Hey lucifer, I was actually..." <- those will come from the mapping data, i'll put up a branch with sample dumps import tomorrow and then you can do a join to `mapping.mb_metadata_cache` table to get artist mbids for listens.
holycow23[m]
How often does the cron job for the stats run? I found the file running the cron at `/docker/services/cron/crontab`. Basically I wanna know: if weekly is chosen, does it run weekly, or does it run daily and update the individual time ranges at once?
julian45[m]
lucifer: you might already be aware, but just a heads up that the `stable` view of the sir docs (which seems to be the default when browsing to the RTD pages) doesn't yet have the documentation updates you've made, e.g., the [setup page](https://sir.readthedocs.io/en/stable/setup/index.html) still refers to `python2`
lucifer[m]
julian45: i am not sure if yvanzo has released the new version yet.
yvanzo[m]
On it…
lucifer[m]
just checked RTD dashboard, once the release is done stable should update automatically.
the link preview here is probably cached but it has updated to 4.0.1 now.
petitminion has quit
yvanzo[m]
I disabled link preview, personally.
lucifer[m]
ah okay
holycow23[m]
<holycow23[m]> "The cron job for the stats how..." <- also the `get_aggregate_query` is based on the listens table so, over a period of time won't it have all the listens for a period of time or do you filter the listens for a period and then generate the results?
outsidecontext[m]
<lucifer[m]> "mayhem: do you happen to have..." <- I still do. But I assume you will need OpenSubsonic support (for the MBIDs), as troi requires it?
I currently run the release version of funkwhale, but the opensubsonic support is not yet released.
lucifer[m]
<holycow23[m]> "also the `get_aggregate_query..." <- can you point me to the query on github?
yvanzo[m]
zas: Something went wrong with Solr backup, running it in 3min again.
<lucifer[m]> "can you point me to the query on..." <- You could refer to [this](https://github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz_spark/stats/incremental/user/listening_activity.py#L27)
lucifer[m]: And what is the frequency of the stats update, is it done daily?
lucifer[m]
yes
holycow23[m]
Okay
Now for example, if I need to do the era stats, it will be based on the release date so that also will be fetched from the dump right?
But this is the MB Dump right not the Spark dump
lucifer[m]
[@holycow23:matrix.org](https://matrix.to/#/@holycow23:matrix.org) not the MB dump but directly from the MB db, we already have that stat for year in music iiuc.
we have the postgres queries to retrieve the data from the MB db
in listenbrainz_spark/postgres.
holycow23[m]
Yes correct, the era one is already done in "Your Year in Music 20xx", but for testing it locally, I could use the MB Dump since I can't use the MB db directly
lucifer[m]
That data is cached in HDFS and refreshed daily before the stats cron job runs.
The json dump format is different from the database.
holycow23[m]
So, for development of the stats, how would I access the DB?
lucifer[m]
So it would not work. For this stat, I can export the data and you can import it in your local database.
For other stats, I can provide you with access to a full mb db replica. You can develop and test your queries on that and then I can export the data used by that query.
holycow23[m]
Cool, lemme know how to access the mb db replica
lucifer[m]
You can also connect your local spark cluster to a full mb db replica hosted on our servers. Or run a spark cluster on wolf. But that can be slower.
holycow23[m]
Could you guide me with connecting the local spark cluster with the full replica?
Also how do you write queries connecting two different databases, or do you just run individual queries on both?
lucifer[m]
The two databases as in?
holycow23[m]
Listens would be timescale_db and information regarding the songs would be mb_db
lucifer[m]
The listens are imported using dumps in spark
The additional metadata is brought in from the MB db
And then joined together and processed in spark
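A toy sketch of that flow, with plain Python standing in for Spark — the data shapes and names here are illustrative assumptions, not the real dump or MB db formats:

```python
from collections import Counter

# Toy stand-in for the Spark flow described above: listens come from
# dumps, metadata comes from the MB db, and stats are computed on the
# joined result. Data shapes are illustrative assumptions.
listens = [  # imported from dumps
    {"user": "alice", "recording_mbid": "rec-1"},
    {"user": "alice", "recording_mbid": "rec-2"},
    {"user": "alice", "recording_mbid": "rec-1"},
]
metadata = {  # brought in from the MB db
    "rec-1": {"artist": "Artist A"},
    "rec-2": {"artist": "Artist B"},
}

# Join listens to metadata, then aggregate listen counts per artist.
per_artist = Counter(
    metadata[l["recording_mbid"]]["artist"] for l in listens
)
print(per_artist.most_common())
```

In production the same join-then-aggregate step runs over Spark DataFrames rather than Python dicts.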
holycow23[m]
So you don't use the timescale_db?
lucifer[m]
No that is not used for statistics at all
holycow23[m]
aah got it
got it
lucifer[m]
It is only used directly for listens page.
All the stats, recommendations etc. are done in spark where listen data is imported from dumps.
holycow23[m]
Got it, so spark has the listens as well as the info related to recordings, so you run queries over spark and then update the stats
lucifer[m]
Yes.
holycow23[m]
Thank you so much
Also, by when will the dump be generated?
lucifer[m]
The sample data dump?
holycow23[m]
The spark dump
lucifer[m]
The spark dumps are generated daily for production
Do you mean the metadata like release dates etc?
holycow23[m]
Okay so how exactly would I proceed with my project since I will need the metadata with the listens
To run queries
lucifer[m]
Should be ready early next week.
holycow23[m]
Okay, is there anything that I can do before the dump is ready, wanted to start a little early actually
lucifer[m]
The data can be exported today but I need to add the import code in spark.
holycow23[m]
Okay
lucifer[m]
I think you can write the api and frontend side of things meanwhile.
Using some hardcoded dummy data for testing.
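For instance, the API side could be stubbed with a canned payload until the real Spark data is available — every field name below is made up for illustration, not the actual ListenBrainz API schema:

```python
# Hypothetical hardcoded stats payload for wiring up the API and
# frontend before the Spark-side data lands. Field names are
# illustrative assumptions, not the real ListenBrainz API schema.
DUMMY_ERA_STATS = {
    "user_id": "test-user",
    "range": "all_time",
    "eras": [
        {"decade": "1980s", "listen_count": 42},
        {"decade": "1990s", "listen_count": 77},
    ],
}

def get_era_stats(user_id: str) -> dict:
    """Return canned stats until the real data pipeline is ready."""
    return {**DUMMY_ERA_STATS, "user_id": user_id}

print(get_era_stats("alice")["user_id"])
```

The frontend can be developed against this shape, then the stub swapped for the real query once the dump import is in place.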
holycow23[m]
Okay
Thanks
petitminion joined the channel
_BrainzGit
[musicbrainz-server] 14mwiencek opened pull request #3546 (03production…mbs-14032): MBS-14032: Build temporary `release_first_release_date` table for MBS-13966 https://github.com/metabrainz/musicbrainz-serve...