For stats. The codebase has not expanded much so it should be easy for you to understand how it all goes. Ping me if you get stuck anywhere.
2019-12-14 34804, 2019
pristine__
Rn, we are writing tests for labs, so that is one place where you can surely contribute
2019-12-14 34818, 2019
pristine__
I will open a ticket and you can have a look to start with labs :)
2019-12-14 34823, 2019
pristine__
Good luck :)
2019-12-14 34843, 2019
sarthak_jain
Yes, that will be great.
2019-12-14 34803, 2019
pristine__
I should be back after lunch :)
2019-12-14 34853, 2019
sarthak_jain
Okayy
2019-12-14 34822, 2019
sbvkrishna joined the channel
2019-12-14 34831, 2019
sbvkrishna
Mr_Monkey: Had a quick look at the Merge tool and it's pretty awesome! Couple small things - 1. We don't need the Merge queue on the Merge submit page right? (instead, add a single cancel-merge button?), 2. commented here- https://github.com/bookbrainz/bookbrainz-site/pul… .
2019-12-14 34813, 2019
rahul24 has quit
2019-12-14 34859, 2019
reosarevok
Mr_Monkey: please look into the video task again - if Freso is not available to check you'll need to just be the judge yourself
2019-12-14 34803, 2019
reosarevok
I can take a look if it helps?
2019-12-14 34825, 2019
pristine__
ruaok: lemme know when you are up to chat a lil
2019-12-14 34851, 2019
yvanzo
bitmap, reosarevok: can we hotfix beta/prod with 1329?
2019-12-14 34826, 2019
reosarevok
If you think it's urgent enough, yes :)
2019-12-14 34846, 2019
reosarevok
(it shows only names so it's not too awful, but still, we could)
2019-12-14 34848, 2019
yvanzo
(and only when it is shared with another editor, but still)
So rn, it is pretty easy to set up labs on a local machine: first set up LB-server, clone labs, and run develop.sh. But to run the recommendation engine we need data.
2019-12-14 34812, 2019
pristine__
We have already discussed uploading mappings
2019-12-14 34821, 2019
pristine__
and artist relation
2019-12-14 34858, 2019
pristine__
but we still need listens
2019-12-14 34835, 2019
pristine__
the dumps and incremental dumps are there on williams but I strongly feel that they are not ideal for running labs on a local machine
2019-12-14 34845, 2019
pristine__
because of the size
2019-12-14 34853, 2019
ruaok
too large?
2019-12-14 34814, 2019
pristine__
yes, GBs. Not possible to download
2019-12-14 34828, 2019
pristine__
and upload in HDFS. Lot of work
2019-12-14 34833, 2019
pristine__
Also
2019-12-14 34815, 2019
pristine__
We need recent listens to run the recommendation engine. recent as in till the date on which it is being run
2019-12-14 34830, 2019
pristine__
So we have two options
2019-12-14 34828, 2019
pristine__
smaller incremental dumps every monday/thursday or whatever the window is and download them as and when you require.
2019-12-14 34845, 2019
pristine__
The configs can be changed to fetch data other than the near future.
2019-12-14 34819, 2019
pristine__
but I see a problem, is it okay to every time connect to FTP and download the data and do stuff, can be time consuming.
2019-12-14 34846, 2019
pristine__
The other option is to generate local (fake) data to run the recommendation engine
2019-12-14 34857, 2019
pristine__
of course we need to write scripts for that
2019-12-14 34846, 2019
pristine__
for the first option we need to modify mapping and relation also (to keep the size small and have them intersect with the listens)
2019-12-14 34811, 2019
pristine__
what do you think?
2019-12-14 34829, 2019
pristine__
Rn things are not so clear, I understand but people are coming up for contribution so I guess we need to make a setup for them
2019-12-14 34808, 2019
ruaok ponders
2019-12-14 34839, 2019
pristine__
setup that is enough for them to make PRs
2019-12-14 34801, 2019
pristine__
the actual testing and everything will be done on leader and for that we need the big data dumps
2019-12-14 34813, 2019
pristine__
but not to run on local machine
2019-12-14 34853, 2019
ruaok
> but I see a problem, is it okay to every time connect to FTP and download the data and do stuff, can be time consuming.
2019-12-14 34800, 2019
ruaok
yes, we have the bandwidth for that.
2019-12-14 34807, 2019
sbvkrishna has quit
2019-12-14 34815, 2019
ruaok
the problem with fake data is that it is of limited usefulness.
2019-12-14 34838, 2019
ruaok
what if we, during data dump generation, generate a test data dump of a desirable size?
2019-12-14 34845, 2019
ruaok
from the latest data?
2019-12-14 34813, 2019
ruaok
say, take X listens from the Y users?
2019-12-14 34831, 2019
ruaok
because other projects have small data sets for testing too. what do you think of that?
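[editor's sketch] The "take X listens from the Y users" idea above could look roughly like the following. This is only a minimal illustration, not how the dump scripts work: the function name sample_listens and the field names user_name and listened_at are assumptions made up for the example, and listens are assumed to be plain dicts already loaded in memory.

```python
import random
from collections import defaultdict


def sample_listens(listens, num_users, listens_per_user, seed=0):
    """Keep the latest `listens_per_user` listens for `num_users` random users.

    `listens` is an iterable of dicts with the (hypothetical) keys
    "user_name" and "listened_at" (a sortable timestamp).
    """
    # Group listens by user so each user can be trimmed independently.
    by_user = defaultdict(list)
    for listen in listens:
        by_user[listen["user_name"]].append(listen)

    # Pick Y users; a fixed seed keeps the test dump reproducible.
    rng = random.Random(seed)
    users = sorted(by_user)
    chosen = rng.sample(users, min(num_users, len(users)))

    # For each chosen user keep only the X most recent listens.
    sample = []
    for user in sorted(chosen):
        latest = sorted(
            by_user[user], key=lambda l: l["listened_at"], reverse=True
        )
        sample.extend(latest[:listens_per_user])
    return sample
```

Taking the most recent listens per user matches the "from the latest data" suggestion; picking the most active users instead of random ones would be an equally reasonable choice.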
2019-12-14 34856, 2019
ruaok
also, did you see iliekcomputers suggestion of merging the -labs repos back into the main repos?
2019-12-14 34814, 2019
pristine__
> also, did you see iliekcomputers suggestion of merging the -labs repos back into the main repos?
2019-12-14 34819, 2019
ruaok
msb-labs > msb and lb-labs > lb ?
2019-12-14 34822, 2019
pristine__
No, I guess I missed it
2019-12-14 34834, 2019
pristine__
Okay.
2019-12-14 34838, 2019
ruaok
I fully agree with the msb case, but am still thinking about the lb case.
2019-12-14 34857, 2019
ruaok
that reduces the number of setup and management scripts that we duplicate.
2019-12-14 34858, 2019
pristine__
I will think about it too. Not sure rn
2019-12-14 34803, 2019
ruaok
ok, please do that.
2019-12-14 34844, 2019
ruaok
so, your desire to make things small and easily installable for newcomers is great.
2019-12-14 34815, 2019
pristine__
> what if we, during data dump generation, generate a test data dump of a desirable size?
2019-12-14 34820, 2019
pristine__
sounds cool
2019-12-14 34826, 2019
pristine__
We have to do the same for mapping
2019-12-14 34831, 2019
pristine__
and relations
2019-12-14 34840, 2019
pristine__
and ensure that they intersect
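[editor's sketch] One way to make a small mapping and relation set "intersect" with a sampled set of listens is to filter both down to what the listens actually reference. The sketch below is only an assumption about the data shapes: the function name intersect_with_listens and the keys recording_msid, artist_mbid, artist_mbid_0 and artist_mbid_1 are hypothetical and chosen for illustration.

```python
def intersect_with_listens(listens, mapping, relations):
    """Shrink `mapping` and `relations` to what the given listens need.

    Assumed (hypothetical) shapes:
      listens:   dicts with "recording_msid"
      mapping:   dicts with "recording_msid" and "artist_mbid"
      relations: dicts with "artist_mbid_0" and "artist_mbid_1"
    """
    # Only mapping rows for recordings that appear in the listens survive.
    msids = {listen["recording_msid"] for listen in listens}
    small_mapping = [m for m in mapping if m["recording_msid"] in msids]

    # Only relations between two artists that are still in the mapping survive.
    artists = {m["artist_mbid"] for m in small_mapping}
    small_relations = [
        r
        for r in relations
        if r["artist_mbid_0"] in artists and r["artist_mbid_1"] in artists
    ]
    return small_mapping, small_relations
```

Requiring both artists of a relation to be present keeps the test set closed; keeping relations where only one side matches would give a larger but less self-contained dump.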
2019-12-14 34842, 2019
ruaok
but I feel that we're doing a lot of meta-stuff, while we haven't gotten the core fully nailed down yet.
2019-12-14 34857, 2019
pristine__
true
2019-12-14 34807, 2019
Wassabi joined the channel
2019-12-14 34812, 2019
pristine__
At this point, it is really difficult for any third person to contribute because we need a lot of data to let the scripts fire
2019-12-14 34818, 2019
ruaok
what if we draw our goal to be "let's make things testable and get things working. then once we have stuff that we can see is working, we'll make it easier to install."
2019-12-14 34832, 2019
ruaok
yes, indeed.
2019-12-14 34847, 2019
pristine__
> what if we draw our goal to be "let's make things testable and get things working. then once we have stuff that we can see is working, we'll make it easier to install."
2019-12-14 34854, 2019
ruaok
but things are also so young still that we may not get many people interested in contributing on this code.
2019-12-14 34810, 2019
ruaok
we had one fellow mail support@ about working on some LB tickets.
2019-12-14 34824, 2019
ruaok
which is great, but not a lot of lb-labs interest yet.
2019-12-14 34837, 2019
pristine__
I understand but then I have a small periphery for newcomers to contribute
2019-12-14 34847, 2019
pristine__
Like they can write tests
2019-12-14 34806, 2019
pristine__
but wait a lil to run the actual recommendation engine. or stats or stuff
2019-12-14 34812, 2019
pristine__
I am fine with this anyway
2019-12-14 34823, 2019
pristine__
> but things are also so young still that we may not get many people interested in contributing on this code.
2019-12-14 34825, 2019
pristine__
agreed
2019-12-14 34848, 2019
pristine__
Unless they take the initiative to make things work out anyway :p
2019-12-14 34823, 2019
pristine__
But I keep telling people about labs whenever someone here asks me about GSoC :p
2019-12-14 34836, 2019
pristine__
> which is great, but not a lot of lb-labs interest yet.
2019-12-14 34800, 2019
pristine__
Yes. I have an eye on that.
2019-12-14 34852, 2019
pristine__
ruaok: So for the person who approached today, I think I can help him in setup and assign a ticket for writing tests
2019-12-14 34858, 2019
ruaok
so, not sure if we're in agreement here. how would you suggest we proceed in the next couple of weeks.
2019-12-14 34832, 2019
pristine__
I first of all would like to get the mapping PR merged with tests, have scripts for uploading mappings and relations done so that if anyone is willing to download huge data, the person can run the lil engine.
2019-12-14 34853, 2019
pristine__
rn, if you try to run, it will say data missing in HDFS.
2019-12-14 34838, 2019
pristine__
Then take on the task for making things easy for newcomers
2019-12-14 34845, 2019
pristine__
If it makes sense to you?
2019-12-14 34800, 2019
ruaok
yes, it does.
2019-12-14 34821, 2019
pristine__
(I know I am being a lil slow but uni people trouble a lot in final year and you cannot do anything since you need the degree)