iliekcomputers: About the cron job for the incremental dump: if we are creating an incremental dump every day, then we should import it on the same day too
2020-09-10
ishaanshah
Otherwise we will never import that dump at all, because before the next day's import a newer incremental dump will be created, which will then get imported.
sumedh joined the channel
thomasross has quit
sumedh has quit
BrainzGit
[listenbrainz-server] ishaanshah opened pull request #1083 (master…ishaan/listening-activity-range-update): LB-690: Minor improvements to Listening Activity graph https://github.com/metabrainz/listenbrainz-server…
My lecturer dropped a bomb saying college starts on the 21st 😬😬😬
CatQuest
sep?
CatQuest
well school has already been in session here for 2 weeks :D
CatQuest
man am I glad I no longer have it
SomalRudra
my college started a week back
zas
It seems the changes that significantly reduced traffic between the gateways and the MB backend servers also had an impact on the web service (that wasn't obvious at the start)
bitmap, reosarevok: updated the blog post with yesterday's hotfixes, pushed a git tag and fixed the previous git tag message. Make sure to delete your local tag: git tag -d v-2020-09-07 # then fetch to get the new tag
nelgin
yvanzo, well... there has to be a better way to optimize the indexing. Maybe it's time to think of a different type of database? Ever thought about NoSQL?
yvanzo
nelgin: live indexing works perfectly in production; the issue is more about the setup for mirrors.
jesus2099 joined the channel
jesus2099
Rotab yvanzo bitmap CatQuest: Indeed I don't use Opera 12 any more. ;)
jesus2099
reosarevok > "Sigh. I wish GitHub was a bit better at showing where the only change is space"
yvanzo
nelgin: for example, there is only one Solr instance in musicbrainz-docker, whereas we have many nodes in production.
Yes, once you know it exists, you just have to find it back on the page. :)
jesus2099
(the diff settings button)
jesus2099
bitmap: I recently added a CAA ticket because it seems the CORS headers are missing when you use the release-group API. The release API CORS headers are OK, but if you use the release-group API, which forwards to the release API, then you get the error: has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the
If I understood correctly. I'm not saying I'm 100% sure something should be fixed.
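The failure jesus2099 describes is the standard browser CORS check. A minimal pure-Python sketch of that check, with a hypothetical `cors_allows` helper (not part of any MetaBrainz codebase), to show why a dropped header blocks the response:

```python
# Hypothetical helper illustrating the check a browser performs before
# exposing a cross-origin response: the Access-Control-Allow-Origin
# header must be present and match the requesting origin (or be "*").
def cors_allows(headers: dict, origin: str) -> bool:
    # Header names are case-insensitive, so normalise them first.
    normalised = {k.lower(): v for k, v in headers.items()}
    allow = normalised.get("access-control-allow-origin")
    return allow is not None and allow in ("*", origin)

# A response that sends the header passes; a redirect target that
# drops it (as described for the release-group path) is blocked.
print(cors_allows({"Access-Control-Allow-Origin": "*"}, "https://example.org"))  # True
print(cors_allows({"Content-Type": "image/jpeg"}, "https://example.org"))        # False
```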
shivam-kapila
pristine___: self.temporary_login
pristine___
shivam-kapila: tried. Still not working
iliekcomputers
ishaanshah: it'll import the full dump with the same id as the incremental dump, so we should be good.
iliekcomputers
The full dump command creates a full dump with the ID of the last incremental dump
iliekcomputers
What I am worried about is this: suppose incremental dump creation fails and we request an import of the newest incremental dump; we might end up importing the same incremental dump twice
ishaanshah
> it'll import the full dump with the same id as the incremental dump, so we should be good.
ishaanshah
I didn't get you
ishaanshah
rn, suppose we trigger a full dump on 1st
iliekcomputers
1st of month - incremental dump with Id x is created.
iliekcomputers
Later on the 1st - full dump with Id x is created
iliekcomputers
2 - full dump with Id x is imported
ishaanshah
2 - another incremental dump is created right
iliekcomputers
We don't need to import the incremental dump with Id x because the full dump will contain the same data
ishaanshah
but we aren't importing this one
iliekcomputers
Oh
iliekcomputers
Oof
iliekcomputers
My bad
iliekcomputers
Yeah, we should be importing that
ishaanshah
either we should skip generating that dump or import every day
iliekcomputers
Yeah, I guess we need to import every day.
jesus2099 has left the channel
iliekcomputers
We need some Id validation on the spark side as well
iliekcomputers
Right now it's all dependent on the cron job and brittle
ishaanshah
hmm, so the last imported id for incremental
ishaanshah
otherwise we might end up importing it twice
iliekcomputers
Spark should store the current Id somewhere, check if the dump it's importing is greater than the ID and then import
iliekcomputers
Yeah
ishaanshah
the id can be stored in HDFS only, I guess, 'cause we don't have Redis in the spark cluster
iliekcomputers
Hdfs makes sense to me.
iliekcomputers
Maybe make it a dataframe with history. (id, imported_timestamp, dump_type, dump_timestamp)
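The history table iliekcomputers sketches could back a simple dedup check on the Spark side. A minimal pure-Python sketch under those assumptions: the `ImportRecord` shape and the `should_import` helper are hypothetical (in the real cluster this would be a dataframe stored in HDFS), and the IDs are made up:

```python
from collections import namedtuple
from datetime import datetime

# One row of the hypothetical import-history table described above;
# in the real cluster this would live as a dataframe in HDFS.
ImportRecord = namedtuple("ImportRecord", "id imported_timestamp dump_type dump_timestamp")

def should_import(history, dump_id, dump_type="incremental"):
    """Import only dumps with an ID greater than the newest one already
    imported for this dump type, so a re-requested dump is skipped."""
    imported = [r.id for r in history if r.dump_type == dump_type]
    return dump_id > max(imported, default=0)

history = [
    ImportRecord(41, datetime(2020, 9, 9), "incremental", datetime(2020, 9, 9)),
    ImportRecord(42, datetime(2020, 9, 10), "incremental", datetime(2020, 9, 10)),
]
print(should_import(history, 42))  # False: already imported, don't import twice
print(should_import(history, 43))  # True: newer dump, safe to import
```

This removes the dependence on the cron job ordering: even if the same dump is requested twice, the ID check makes the second import a no-op.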
ishaanshah
cool, cool, I'll make a PR for it over this weekend...
iliekcomputers
Sounds good, thanks!
ishaanshah
btw, how much time did the import take?
iliekcomputers
Let's see how it runs in the meanwhile; it should still be stable this week
ishaanshah
like the copy part
iliekcomputers
ishaanshah: the copy command took ~20 min I think
iliekcomputers
It's not very scalable, but that's because of the way we store data in HDFS
ishaanshah
yep, I saw the ticket
iliekcomputers
I figure we'll have to partition the parquet files based on the listen submission timestamps
iliekcomputers
That way we can just add a new file to hdfs and we're done
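One way to picture that layout: derive a partition directory from each listen's submission timestamp, so importing a new dump only adds files under fresh partitions instead of rewriting existing parquet data. A hedged sketch with a hypothetical `partition_path` helper and an invented root path:

```python
from datetime import datetime, timezone

# Hypothetical hive-style partitioning scheme keyed on the listen
# submission timestamp; the root path is invented for illustration.
def partition_path(submitted_at: datetime, root: str = "/data/listens") -> str:
    return f"{root}/year={submitted_at.year}/month={submitted_at.month:02d}"

ts = datetime(2020, 9, 10, tzinfo=timezone.utc)
print(partition_path(ts))  # /data/listens/year=2020/month=09
```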
ruaok: hey. I have raised InternalServerError if labs.api.listenbrainz.org does not return 200 as the status code, or if there is any other problem with it. What do you think?
ruaok
hmmm.
ruaok
if labs returns a 400 error, then the problem is on the caller (your) side of things. Should that be an ISE?
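ruaok's distinction can be sketched as a small status-code classifier; the exception names below are hypothetical illustrations, not the actual ListenBrainz error classes:

```python
# Hypothetical mapping of the labs API status code to an error category,
# reflecting the point above: a 4xx means our request was malformed
# (caller bug), while only 5xx responses warrant a server-side error.
class BadUpstreamRequest(Exception):
    """Our request to labs.api.listenbrainz.org was malformed (caller bug)."""

class UpstreamServerError(Exception):
    """labs.api.listenbrainz.org itself failed; safe to surface as a 5xx."""

def classify(status_code: int):
    if 200 <= status_code < 300:
        return None  # success, nothing to raise
    if 400 <= status_code < 500:
        return BadUpstreamRequest
    return UpstreamServerError

print(classify(200))                           # None
print(classify(400) is BadUpstreamRequest)     # True
print(classify(503) is UpstreamServerError)    # True
```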
reosarevok
"External Server Error" :D
BrainzGit
[listenbrainz-server] mayhem merged pull request #1080 (master…distinct-similar-top-artist): [LB-703] Similar and top artist should be distinct for a user https://github.com/metabrainz/listenbrainz-server…