#metabrainz

/

12:42 PM
shivam-kapila

What do we use BiqQuery for currently?

2020-05-02 12313, 2020

12:46 PM
iliekcomputers

nothing, it's mostly a private archive right now

2020-05-02 12356, 2020

12:46 PM
shivam-kapila

Shouldn't we clean out it from LB repo or there is some reason its still there?

2020-05-02 12301, 2020

12:48 PM
sumedh joined the channel

2020-05-02 12348, 2020

12:48 PM
jmp_music joined the channel

2020-05-02 12327, 2020

12:50 PM
iliekcomputers

yeah, we should remove all the bigquery stats code

2020-05-02 12333, 2020

12:50 PM
iliekcomputers

the bigquery-writer still works

2020-05-02 12342, 2020

12:51 PM
shivam-kapila

Yes I meant the stats part

2020-05-02 12314, 2020

12:54 PM
iliekcomputers

i'll open a ticket, it's a bit non-trivial

2020-05-02 12339, 2020

12:54 PM
iliekcomputers

it's at least an hour or two of work

2020-05-02 12344, 2020

12:55 PM
iliekcomputers

LB-550

2020-05-02 12345, 2020

12:55 PM
BrainzBot

LB-550: Remove bigquery statistics code https://tickets.metabrainz.org/browse/LB-550

2020-05-02 12319, 2020

12:56 PM
iliekcomputers

shivam-kapila: i made your changes to the docs PR

2020-05-02 12304, 2020

12:59 PM
shivam-kapila

Yeah I had a look. Thanks :). Your comment about the develop.sh makes me inclined to think that we can have a separate file to wrap non docker-compose tasks.

2020-05-02 12304, 2020

13:00 PM
iliekcomputers

what tasks are you talking about ?

2020-05-02 12313, 2020

13:00 PM
shivam-kapila

The manage ones

2020-05-02 12322, 2020

13:00 PM
shivam-kapila

./develop.sh manage

2020-05-02 12325, 2020

13:00 PM
shivam-kapila

types

2020-05-02 12326, 2020

13:00 PM
iliekcomputers

manage.py is supposed to be that

2020-05-02 12340, 2020

13:00 PM
iliekcomputers

we're making a manage script over a manage script

2020-05-02 12343, 2020

13:00 PM
moufl has quit

2020-05-02 12346, 2020

13:00 PM
iliekcomputers

makes no sense

2020-05-02 12354, 2020

13:02 PM
moufl joined the channel

2020-05-02 12329, 2020

13:03 PM
shivam-kapila

Oh yes. Sorry. That was dumb :/

2020-05-02 12355, 2020

13:04 PM
iliekcomputers

nah, that's ok. i'm just trying to reduce complexity in the dev environment as much as possible.

2020-05-02 12358, 2020

13:04 PM
iliekcomputers

for example

2020-05-02 12315, 2020

13:05 PM
iliekcomputers

both develop.sh and spark_develop.sh have a `format_namenode` for some reason

2020-05-02 12346, 2020

13:05 PM
iliekcomputers

I don't think we need the `develop.sh npm` either, that automatically happens

2020-05-02 12313, 2020

13:06 PM
shivam-kapila

Yes we can remove that

2020-05-02 12346, 2020

13:06 PM
iliekcomputers

`psql` I'm okay with, but it's not documented anywhere

2020-05-02 12320, 2020

13:07 PM
iliekcomputers

`manage` i'm not so sure about

2020-05-02 12323, 2020

13:07 PM
shivam-kapila

> both develop.sh and spark_develop.sh have a `format_namenode` for some reason

2020-05-02 12323, 2020

13:07 PM
shivam-kapila

The one in develop.sh can be removed. Its useless. The one in spark_develop.sh accomplishes the same thing

2020-05-02 12341, 2020

13:07 PM
iliekcomputers

yes

2020-05-02 12309, 2020

13:08 PM
iliekcomputers

spark_develop.sh format namenode used to take a clusterId argument that i had no idea where to find

2020-05-02 12327, 2020

13:08 PM
iliekcomputers

removing it still worked

2020-05-02 12333, 2020

13:08 PM
iliekcomputers shrugs

2020-05-02 12349, 2020

13:08 PM
shivam-kapila

Yes it will

2020-05-02 12318, 2020

13:09 PM
iliekcomputers

we need to adopt a policy of only adding stuff that's necessary. the dev environment has gotten so complicated that it's easier to develop spark stuff and run it on production data.

2020-05-02 12331, 2020

13:09 PM
iliekcomputers

vs just developing locally

2020-05-02 12316, 2020

13:10 PM
shivam-kapila

clusterid was used to format the namenode of the same cluster s of datanode

2020-05-02 12321, 2020

13:10 PM
shivam-kapila

as*

2020-05-02 12308, 2020

13:11 PM
shivam-kapila

Actually if we don't provide clusterID then if we run format command twice then the link between datanode and namenode will end and they wont work

2020-05-02 12316, 2020

13:11 PM
shivam-kapila

ClusterID ws used for this reason

2020-05-02 12336, 2020

13:11 PM
iliekcomputers

what is the ID that I'm supposed to pass tho?

2020-05-02 12341, 2020

13:11 PM
iliekcomputers

where do I find it?

2020-05-02 12323, 2020

13:12 PM
iliekcomputers

it also doesn't error out well enough, I ran it two or three times and then was confused why it wasn't working, only realized that it needs some extra argument

2020-05-02 12316, 2020

13:13 PM
shivam-kapila

The clusterID part is somewhere in docs. Give me a sec

2020-05-02 12347, 2020

13:13 PM
iliekcomputers

LB-551

2020-05-02 12348, 2020

13:13 PM
BrainzBot

LB-551: Simplify the develop.sh scripts https://tickets.metabrainz.org/browse/LB-551

2020-05-02 12317, 2020

13:14 PM
shivam-kapila

iliekcomputers: https://github.com/metabrainz/listenbrainz-server…

2020-05-02 12329, 2020

13:14 PM
shivam-kapila

Here you see the cat command to fetch the ID

2020-05-02 12329, 2020

13:14 PM
SothoTalKer has quit

2020-05-02 12301, 2020

13:15 PM
shivam-kapila

3rd point

2020-05-02 12342, 2020

13:15 PM
iliekcomputers

we should remove hacking.md

2020-05-02 12349, 2020

13:15 PM
SothoTalKer joined the channel

2020-05-02 12352, 2020

13:15 PM
iliekcomputers

all of it should be on listenbrainz.readthedocs

2020-05-02 12349, 2020

13:16 PM
iliekcomputers

new devs will not read HACKING.md to find this stuff lol

2020-05-02 12311, 2020

13:18 PM
shivam-kapila

Seems like dev env things need a standard that also needs to be documented

2020-05-02 12342, 2020

13:18 PM
iliekcomputers

readthedocs is the standard

2020-05-02 12321, 2020

13:19 PM
iliekcomputers

for docs at least :D

2020-05-02 12333, 2020

13:20 PM
iliekcomputers

LB-552

2020-05-02 12333, 2020

13:20 PM
BrainzBot

LB-552: Should not need to format hadoop on first run https://tickets.metabrainz.org/browse/LB-552

2020-05-02 12303, 2020

13:21 PM
shivam-kapila

Its not a config problem

2020-05-02 12350, 2020

13:21 PM
shivam-kapila

Hadoop docs say that 1st time namenode needs to be formatted so it can obtain the clusterID of datanode after the format

2020-05-02 12313, 2020

13:22 PM
shivam-kapila

Although they also encourage to never format the namenode again unless its highly required

2020-05-02 12314, 2020

13:23 PM
shivam-kapila

Also HACKING.md has nothing much that isnt in docs. The main stuff thats different is format namenode one which I agree should be in spark-dev-env

2020-05-02 12328, 2020

13:23 PM
iliekcomputers

not in spark-dev-env

2020-05-02 12344, 2020

13:23 PM
iliekcomputers

the first format should just work (automatically or in a single command)

2020-05-02 12359, 2020

13:23 PM
shivam-kapila

automatic looks good

2020-05-02 12319, 2020

13:24 PM
iliekcomputers

the clusterId finding shit should be easy to do, if a user runs `./spark-develop.sh format` without a clusterID they should get an error that tells them exactly what to do

2020-05-02 12326, 2020

13:24 PM
iliekcomputers

where to find the ID and how to pass it

2020-05-02 12352, 2020

13:24 PM
iliekcomputers

the dev-env docs should be as simple as possible with the minimum number of steps possible

2020-05-02 12323, 2020

13:25 PM
iliekcomputers

otherwise, it's just a pita to read them and run so many steps

2020-05-02 12306, 2020

13:29 PM
shivam-kapila

Right now if they dont pass the clusterID I guess the format will still work. But if they run multiple times then the link breakage will occur. ClusterID is an optional param for format command

2020-05-02 12334, 2020

13:29 PM
iliekcomputers

shivam-kapila: that's not the case in current master

2020-05-02 12349, 2020

13:29 PM
iliekcomputers

if you run ./spark-develop.sh format on master, it won't work

2020-05-02 12304, 2020

13:30 PM
iliekcomputers

i fixed it in the PR by removing the clusterID argument

2020-05-02 12325, 2020

13:30 PM
iliekcomputers

is there a usecase for formatting hadoop after the first format?

2020-05-02 12340, 2020

13:30 PM
shivam-kapila

We dont need it mostly.

2020-05-02 12344, 2020

13:30 PM
iliekcomputers

i'm wondering if we can just remove the format command entirely, just run it the first time.

2020-05-02 12356, 2020

13:30 PM
iliekcomputers

if someone needs to format their data, they should use hdfs -rm

2020-05-02 12325, 2020

13:31 PM
shivam-kapila

We can remove that

2020-05-02 12349, 2020

13:31 PM
shivam-kapila

Then your current fix as you pushed into thr PR will work

2020-05-02 12311, 2020

13:32 PM
shivam-kapila

A note can be added that *Please run this command only once*

2020-05-02 12317, 2020

13:33 PM
iliekcomputers

ideally it would run automatically, and we can remove the entire step from the doc

2020-05-02 12330, 2020

13:33 PM
iliekcomputers

https://tickets.metabrainz.org/browse/LB-553

2020-05-02 12331, 2020

13:33 PM
BrainzBot

LB-553: Improve developer environment and docs

2020-05-02 12333, 2020

13:33 PM
iliekcomputers

opened an epic

2020-05-02 12339, 2020

13:33 PM
shivam-kapila

Also for multiple formats we can add the HACKING.md into FAQs replacing `./spark_develop.sh format` with `docker-compose ...........`

2020-05-02 12332, 2020

13:34 PM
shivam-kapila

So you mean to add the format without clusterID into `./spark_develop.sh build`?

2020-05-02 12355, 2020

13:34 PM
shivam-kapila

So it happens itself?

2020-05-02 12337, 2020

13:35 PM
iliekcomputers

wouldn't work with build

2020-05-02 12356, 2020

13:35 PM
iliekcomputers

i'll have to think about how to do it specifically, but that's an implementation detail

2020-05-02 12325, 2020

13:36 PM
shivam-kapila

Thats why I asked. If we build multiple times then format will occur multiple times and link between nodes will break

2020-05-02 12323, 2020

13:37 PM
iliekcomputers

yep

2020-05-02 12332, 2020

13:40 PM
iliekcomputers

the ticket about removing the command entirely (or removing it with hdfs -rm): https://tickets.metabrainz.org/browse/LB-552

2020-05-02 12333, 2020

13:40 PM
BrainzBot

LB-552: Should not need to format hadoop on first run

2020-05-02 12350, 2020

13:40 PM
iliekcomputers

wait

2020-05-02 12351, 2020

13:40 PM
iliekcomputers

https://tickets.metabrainz.org/browse/LB-555

2020-05-02 12351, 2020

13:40 PM
BrainzBot

LB-555: Improve the spark-develop.sh format command

2020-05-02 12339, 2020

13:42 PM
iliekcomputers

the epic has a nice list of tasks now, so at least we have a way forward now :)

2020-05-02 12343, 2020

13:42 PM
iliekcomputers

thanks shivam-kapila

2020-05-02 12336, 2020

13:43 PM
iliekcomputers

https://listenbrainz.readthedocs.io/en/production…

2020-05-02 12341, 2020

13:43 PM
iliekcomputers

this part shouldn't be in the setting up doc

2020-05-02 12306, 2020

13:45 PM
shivam-kapila

Oh yes.

2020-05-02 12357, 2020

13:45 PM
djwhitey joined the channel

2020-05-02 12301, 2020

13:47 PM
iliekcomputers

done in current PR

2020-05-02 12312, 2020

13:53 PM
shivam-kapila

iliekcomputers: I added a comment about a mistake I did in my docs PR. I just noticed it on readthedocs. Can you plz resolve that too

2020-05-02 12359, 2020

13:53 PM
iliekcomputers

yes.

2020-05-02 12306, 2020

13:54 PM
moufl has quit

2020-05-02 12352, 2020

13:55 PM
moufl joined the channel

2020-05-02 12303, 2020

13:56 PM
shivam-kapila

About LB-549: Doesnt that happen because the spotify reader fetches the latest 50 listens again after deleting all listens? P.S. Not urgent. This just caught my eye

2020-05-02 12304, 2020

13:56 PM
BrainzBot

LB-549: Listen Deletion Doesn't Work https://tickets.metabrainz.org/browse/LB-549

2020-05-02 12307, 2020

14:00 PM
iliekcomputers

yeah, that makes sense

2020-05-02 12326, 2020

14:00 PM
iliekcomputers

i'll reply to the jira saying that they should disconnect their spotify

2020-05-02 12334, 2020

14:00 PM
iliekcomputers

shivam-kapila: added the dump docs change

2020-05-02 12326, 2020

14:01 PM
shivam-kapila

Thanks :)

2020-05-02 12359, 2020

14:01 PM
shivam-kapila

We also need to reset user stats after we delete the listens. I will open a ticket for that

2020-05-02 12302, 2020

14:05 PM
iliekcomputers

shivam-kapila: yes please, thanks

2020-05-02 12308, 2020

14:05 PM
iliekcomputers

put them in the statistics epic

2020-05-02 12313, 2020

14:05 PM
iliekcomputers

zas: https://tickets.metabrainz.org/browse/LB-534

2020-05-02 12314, 2020

14:05 PM
BrainzBot

LB-534: In case of error 504 the API returns HTML instead of JSON

2020-05-02 12326, 2020

14:05 PM
iliekcomputers

I'm not sure if there's anyway around this

2020-05-02 12305, 2020

14:06 PM
iliekcomputers

ishaanshah[m]: https://tickets.metabrainz.org/projects/LB/issues…

2020-05-02 12305, 2020

14:06 PM
BrainzBot

LB-533: 400 Bad Request: Incorrect timestamp argument max_ts: null when browsing listens

2020-05-02 12355, 2020

14:06 PM
alastairp

iliekcomputers: not anyway around what?

2020-05-02 12317, 2020

14:07 PM
alastairp

535 or 533?

2020-05-02 12325, 2020

14:07 PM
shivam-kapila

LB-533: Oh I had fixed that. I was studying about frontend tests to add test for that

2020-05-02 12327, 2020

14:07 PM
iliekcomputers

533

2020-05-02 12334, 2020

14:07 PM
iliekcomputers

no sorry, 534

2020-05-02 12342, 2020

14:07 PM
iliekcomputers

the 504 comes from the gateways, not LB

2020-05-02 12342, 2020

14:07 PM
alastairp

534, we do that in AB

2020-05-02 12350, 2020

14:07 PM
iliekcomputers

oh

2020-05-02 12351, 2020

14:07 PM
alastairp

oh, right. 504

2020-05-02 12354, 2020

14:07 PM
alastairp

sorry, you're right

2020-05-02 12332, 2020

14:08 PM
iliekcomputers

shivam-kapila: can you just pull request the fix, i'll look into adding a test

2020-05-02 12355, 2020

14:08 PM
shivam-kapila

Ok I will do that. I will rebase it over master first

2020-05-02 12356, 2020

14:08 PM
alastairp

and it's true that other 5xx errors, we probably do return json. so we can't just say "don't bother parsing a 5xx error"

2020-05-02 12318, 2020

14:09 PM
iliekcomputers

alastairp: yeah.

2020-05-02 12330, 2020

14:09 PM
alastairp

but to do it from the frontend, that fe would have to inspect the path, and return a different error message depending on the value of the path

2020-05-02 12341, 2020

14:11 PM
alastairp

I'd definitely raise it as a sysadmin ticket to see if there's anything we can do about it

2020-05-02 12336, 2020

14:12 PM
shivam-kapila

iliekcomputers: LB-556

2020-05-02 12336, 2020

14:12 PM
BrainzBot

LB-556: Reset user stats when a user deletes their listens https://tickets.metabrainz.org/browse/LB-556

2020-05-02 12322, 2020

14:13 PM
iliekcomputers

MBH-538

2020-05-02 12323, 2020

14:13 PM
BrainzBot

MBH-538: Can we return JSON for ListenBrainz API Gateway Timeout requests? https://tickets.metabrainz.org/browse/MBH-538

2020-05-02 12341, 2020

14:13 PM
iliekcomputers

shivam-kapila: thanks!

2020-05-02 12341, 2020

14:14 PM
alastairp

iliekcomputers: nice

2020-05-02 12357, 2020

14:18 PM
moufl has quit

2020-05-02 12349, 2020

14:21 PM
moufl joined the channel