alastairp: i'm pretty sure that should be possible right now.
2020-07-30 21234, 2020
alastairp
so, I have an lb env set up now. next step is data. can you give a brief description of what data is available, and the process for loading it?
2020-07-30 21235, 2020
ruaok
please update your code to use this URL from now on.
2020-07-30 21241, 2020
alastairp
I'll fill in some missing docs if necessary
2020-07-30 21244, 2020
iliekcomputers
the listens data can get imported into spark.
2020-07-30 21249, 2020
alastairp
(take your time, this evening after work would be fine)
2020-07-30 21212, 2020
ishaanshah
ruaok: noice!
2020-07-30 21218, 2020
ishaanshah
I will update the code
2020-07-30 21257, 2020
ishaanshah
ruaok: we need msid->mbid for artists too
2020-07-30 21259, 2020
iliekcomputers
alastairp: i'd suggest going through the steps here once: https://listenbrainz.readthedocs.io/en/production…, if something doesn't work, we can fix the docs. but that should set you up with a valid data dump with listens in spark.
alastairp
iliekcomputers: what creates the metabrainz/hadoop-yarn, metabrainz/spark-master, and metabrainz/spark-worker images?
2020-07-30 21225, 2020
alastairp
ah, hadoop-cluster-docker
2020-07-30 21218, 2020
ruaok
shit. :(
2020-07-30 21256, 2020
ruaok
ishaanshah: I didn't know you needed the artist_msid lookup in production. that's the MessyBrainz mapping, which is a lot harder to put into production.
2020-07-30 21226, 2020
ruaok
for now, keep using the one on bono until I figure out what to do.
yvanzo
diru1100: is bio_tokenizer.pickle still needed? I don't see any reference to it in pr #2.
2020-07-30 21204, 2020
supersandro2000 has quit
2020-07-30 21208, 2020
alastairp
iliekcomputers:
2020-07-30 21209, 2020
alastairp
hadoop-master_1 | 2020-07-30 11:53:40,816 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /temp to /data/listenbrainz because destination's parent does not exist
2020-07-30 21214, 2020
alastairp
does this look familiar?
2020-07-30 21219, 2020
supersandro2000 joined the channel
2020-07-30 21243, 2020
iliekcomputers
ishaanshah: ^
2020-07-30 21257, 2020
alastairp
when running the spark data importer. I don't see any reference to /data in the docker-compose.spark file
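The WARN line above means the HDFS rename fails for the same reason a POSIX rename would: the destination's parent directory (`/data`) doesn't exist, and rename won't create it. A minimal local sketch of the same failure mode, using `os.rename` as a stand-in for the HDFS operation (paths are illustrative):

```python
import os
import tempfile

# Reproduce the failure locally: renaming into a directory whose
# parent does not exist fails, just like the HDFS rename of
# /temp to /data/listenbrainz when /data is missing.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "temp")
    os.mkdir(src)
    dst = os.path.join(root, "data", "listenbrainz")  # "data" never created

    try:
        os.rename(src, dst)
        moved_without_parent = True
    except FileNotFoundError:
        moved_without_parent = False

    # The fix: create the parent first, then rename succeeds.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    os.rename(src, dst)
    moved_after_mkdir = os.path.isdir(dst)

print(moved_without_parent, moved_after_mkdir)
```

On the cluster itself the equivalent fix would be creating the parent up front (e.g. `hdfs dfs -mkdir -p /data`) before the importer runs its rename.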
2020-07-30 21205, 2020
alastairp
thanks :)
2020-07-30 21201, 2020
diru1100
yvanzo: it's not needed. I have removed all pickle files in pr #2
yvanzo
diru1100: so does it still need to be in v-0.1 assets?
2020-07-30 21241, 2020
diru1100
yvanzo: it's needed if they want to generate data.
2020-07-30 21254, 2020
diru1100
Not needed to run the model
2020-07-30 21202, 2020
diru1100
We can remove all *_tokenizers actually
2020-07-30 21206, 2020
yvanzo
diru1100: It would probably be more useful to just explain how to use these tokenizer files as the goal is to allow reproducing tests and derivative works.
2020-07-30 21251, 2020
yvanzo
diru1100: For example, how can one use bio_tokenizer.pickle (step by step)?
2020-07-30 21252, 2020
diru1100
yvanzo: Yes, that should help. But in the dataset_generation notebook we aren't using the pickle file at all. We use the Keras Tokenizer class directly to do the job.
2020-07-30 21228, 2020
diru1100
I think it is kept to store the tokenizers once we use them in production, but since that's online, they might change.
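To illustrate the save/load round-trip yvanzo is asking about: a sketch using a tiny stand-in class with the same `fit_on_texts` / `texts_to_sequences` interface as the Keras Tokenizer (the class and file names here are illustrative; the real `bio_tokenizer.pickle` would be reloaded the same way with `pickle.load`):

```python
import pickle

class TinyTokenizer:
    """Stand-in with the same interface as the Keras Tokenizer
    (fit_on_texts / texts_to_sequences); illustrative only."""
    def __init__(self):
        self.word_index = {}

    def fit_on_texts(self, texts):
        for text in texts:
            for word in text.lower().split():
                self.word_index.setdefault(word, len(self.word_index) + 1)

    def texts_to_sequences(self, texts):
        return [[self.word_index[w] for w in t.lower().split()
                 if w in self.word_index] for t in texts]

# 1. Fit a tokenizer on the training corpus.
tok = TinyTokenizer()
tok.fit_on_texts(["the artist recorded the album"])

# 2. Persist it, as bio_tokenizer.pickle presumably was.
with open("tokenizer.pickle", "wb") as f:
    pickle.dump(tok, f)

# 3. Later (tests, derivative works): reload and reuse the exact
#    same vocabulary so sequences stay comparable across runs.
with open("tokenizer.pickle", "rb") as f:
    loaded = pickle.load(f)

print(loaded.texts_to_sequences(["the album"]))  # → [[1, 4]]
```

Shipping the fitted tokenizer alongside the model weights is what makes results reproducible: anyone re-tokenizing with a freshly fitted vocabulary would get different indices.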
Mr_Monkey
Hi ruaok! Do you have a few minutes to talk about LB's search_larger_time_range mechanism?
2020-07-30 21258, 2020
shivam-kapila
Mr_Monkey: hi. I have some idea about it. I may be able to help in case you are in a hurry
2020-07-30 21247, 2020
Mr_Monkey
Not in a hurry per se, no, but you can probably help me understand a bit better. In short, I'm trying to figure out what should be changed now that pagination is done in React