These functions are completely same as the ones for artists
2020-05-21 14218, 2020
ishaanshah
The only function different is the main get_realeses function, i.e the SQL query
2020-05-21 14230, 2020
ishaanshah
get_releases*
2020-05-21 14258, 2020
iliekcomputers
Right
2020-05-21 14200, 2020
ishaanshah
Similar things have happened in spark reader
2020-05-21 14230, 2020
ishaanshah
I was wondering wether we should wait until we calculate all user stats
2020-05-21 14242, 2020
ishaanshah
And then look into generalisation
2020-05-21 14257, 2020
ishaanshah
Or start now itself?
2020-05-21 14237, 2020
iliekcomputers
I see you've most fixed it in the spark reader side
2020-05-21 14201, 2020
ishaanshah
I fixed the insert_db query
2020-05-21 14214, 2020
ishaanshah
But handlers are duplicated
2020-05-21 14230, 2020
iliekcomputers
I'd say try to deduplicate it on the spark cluster side.
2020-05-21 14245, 2020
iliekcomputers
The recording thing would be really small then
2020-05-21 14207, 2020
ishaanshah
Yeah, we should get all the data needed from the query itself
2020-05-21 14227, 2020
iliekcomputers
Yeah, this is actually pretty cool
2020-05-21 14237, 2020
ishaanshah
I guess we should have a hierarchy like this
2020-05-21 14211, 2020
ishaanshah
stats -> user_stats -> graph_type -> further_info
2020-05-21 14241, 2020
iliekcomputers
What's this hierarchy for?
2020-05-21 14248, 2020
ishaanshah
The query
2020-05-21 14222, 2020
ishaanshah
for example for weekly artist stats
2020-05-21 14225, 2020
iliekcomputers
Please elaborate
2020-05-21 14253, 2020
ishaanshah
query = stats.user.entity.artist.week
2020-05-21 14208, 2020
ishaanshah
for sitewide graphs when we add them
2020-05-21 14219, 2020
ishaanshah
query = stats.sitewide.artist.week
2020-05-21 14246, 2020
ishaanshah
stats.sitewide.entity.artist.week*
2020-05-21 14256, 2020
iliekcomputers
How does this help with code deduplication here?
2020-05-21 14246, 2020
ishaanshah
So in our case
2020-05-21 14210, 2020
ishaanshah
We see user, we send it to a generic user handler function
2020-05-21 14232, 2020
ishaanshah
which sees entity, and sends it the entity handler
2020-05-21 14250, 2020
ishaanshah
entity handler sees artist and week
2020-05-21 14214, 2020
ishaanshah
it calls the get_week('artist')
2020-05-21 14212, 2020
ishaanshah
Right now we only have user.entity
2020-05-21 14223, 2020
iliekcomputers
First impression, That seems a bit overkill to me
2020-05-21 14230, 2020
ishaanshah
But later we will have user.daily_activity
2020-05-21 14237, 2020
ishaanshah
user.listening_history
2020-05-21 14241, 2020
ishaanshah
and so on
2020-05-21 14206, 2020
iliekcomputers
There's only one layer of handlers right now, so it's easy to see what message calls what function
2020-05-21 14227, 2020
iliekcomputers
If we do this, more layers are added, things get harder to understand.
2020-05-21 14212, 2020
ishaanshah
Hmm, that is true
2020-05-21 14230, 2020
iliekcomputers
I was mostly thinking of doing something similar to the entity thing you've done on the insert function, bring the common code out to a generic entity function, making the handlers really small.
2020-05-21 14232, 2020
ishaanshah
for now we can get away with modifying the current handler ig
2020-05-21 14224, 2020
ishaanshah
Yeah so basically function will be get_entity_week(<entity_name>)
2020-05-21 14238, 2020
iliekcomputers
Yeah. Something like that.
2020-05-21 14212, 2020
iliekcomputers
I don't want to do too much processing of the query string because it's very hard to document it later.
2020-05-21 14236, 2020
ishaanshah
Yeah, that was my concern too
2020-05-21 14219, 2020
ishaanshah
Also right now some places artists is used and someplaces artist is used
2020-05-21 14219, 2020
iliekcomputers
Let's keep the query string -> handler mapping 1<>1 and direct for now.
2020-05-21 14252, 2020
iliekcomputers
ishaanshah: I don't have any preferences there, the db column is artist by design, change the rest as you see fit.
2020-05-21 14258, 2020
ishaanshah
That makes it hard to deduplicate code
2020-05-21 14224, 2020
ishaanshah
I was thinking of making everything to artist
2020-05-21 14235, 2020
ishaanshah
So that we don;t have to modify the schema
2020-05-21 14242, 2020
iliekcomputers
If that's the case, change it to one of them in a separate PR, and base the release PR off of that.
2020-05-21 14256, 2020
iliekcomputers
That way they both are easy to review.
2020-05-21 14229, 2020
ishaanshah
Yeah, I will do that
2020-05-21 14251, 2020
iliekcomputers
Cool, thanks!
2020-05-21 14227, 2020
iliekcomputers
I'm glad that release and recording won't take much time, now that we've put the work in for artists. That's very ideal. :)
2020-05-21 14242, 2020
ishaanshah
I am done with the API PR, overestimated this time 😆
2020-05-21 14235, 2020
iliekcomputers
Heh
2020-05-21 14248, 2020
iliekcomputers
I took a quick look, looks sane to me, a few small changes.
2020-05-21 14218, 2020
iliekcomputers
alastairp: I think data classes will fit perfectly for our stats json in postgres.
2020-05-21 14241, 2020
iliekcomputers
I'm gonna play around with it tomorrow and see is it works.
2020-05-21 14255, 2020
iliekcomputers
*if I can get it to work.
2020-05-21 14253, 2020
reosarevok
bitmap: those reports worked before though or?
2020-05-21 14250, 2020
bitmap
yeah, they must've been I guess (otherwise our daily script would've been getting stuck like now)
2020-05-21 14212, 2020
bitmap
there's a hard statement_timeout of 56s on the database
2020-05-21 14212, 2020
bitmap
but I could see if increasing the statement_timeout helps
2020-05-21 14226, 2020
reosarevok
When you said you failed at optimizing it, do you mean it didn't improve at all, or not enough to run?
2020-05-21 14233, 2020
reosarevok
Because any improvements might be useful
2020-05-21 14221, 2020
bitmap
well, none of the improvements I tried allowed it to complete within 60s, so I didn't bother checking beyond that
2020-05-21 14256, 2020
bitmap
RecordingTrackDifferentName completes with a 120s statement_timeout, but RecordingsSameNameDifferentArtistsSameName still doesn't
2020-05-21 14211, 2020
iliekcomputers
bitmap: you're up late! (or early)
2020-05-21 14239, 2020
bitmap
I had a nap earlier :)
2020-05-21 14249, 2020
bitmap
(but should prob go back to sleep soon)
2020-05-21 14205, 2020
reosarevok
Oh god
2020-05-21 14219, 2020
reosarevok
Of course, we only run it once a day
2020-05-21 14228, 2020
reosarevok
Can we set a different timeout for reports than we do for other things?
I'll try with 300s, it's kinda ridiculous beyond that though
2020-05-21 14227, 2020
reosarevok
Agreed
2020-05-21 14240, 2020
reosarevok
So we can set it only for the report run?
2020-05-21 14246, 2020
bitmap
yup
2020-05-21 14249, 2020
reosarevok
Just have the script up the timeout then bring it back to normal, right?
2020-05-21 14200, 2020
bitmap
if it's in a transaction, SET LOCAL statement_timeout = '300s';
2020-05-21 14240, 2020
reosarevok
then I guess we might as well
2020-05-21 14212, 2020
bitmap
can do for RecordingTrackDifferentName I guess
2020-05-21 14220, 2020
bitmap
we'll see if RecordingsSameNameDifferentArtistsSameName finishes yet
2020-05-21 14257, 2020
bitmap
'ERROR: canceling statement due to statement timeout' after 5 mins too
2020-05-21 14207, 2020
reosarevok
jeez
2020-05-21 14214, 2020
reosarevok
Ok, that *must* be possible to optimize
2020-05-21 14234, 2020
reosarevok
Recording and track are bigger tables than recording and artist...
2020-05-21 14225, 2020
reosarevok
I guess it's because all of the ac joins
2020-05-21 14240, 2020
bitmap
yeaj, though it's also joining recording twice (and artist twice, and artist_credit and artist_credit_name both twice...)
2020-05-21 14232, 2020
reosarevok
I guess changing the order of the FROM tables doesn't change anything?
2020-05-21 14240, 2020
reosarevok
I'm just not sure how this works under the hood
2020-05-21 14217, 2020
reosarevok
Intuitively it would make sense that checking for matching ACs first and only then comparing their recordings would be more efficient, but I think this stuff is smart enough to do things the easiest way
2020-05-21 14250, 2020
reosarevok
I guess the new collate isn't much slower than the previous one?