#metabrainz


      • heyoni joined the channel
      • dragonzeron joined the channel
      • heyoni has quit
      • Leo_Verto
        amending unpushed commits is fine but don't do it for already pushed ones :P
      • naiveai
        yeah, will keep in mind. i spotted it too late >_<
      • heyoni joined the channel
      • come to think of it I'm going to have to sign all my commits too ....
      • Sophist-UK has quit
      • heyoni_ joined the channel
      • heyoni has quit
      • D4RK-PH0ENiX has quit
      • annebelleo joined the channel
      • D4RK-PH0ENiX joined the channel
      • heyoni joined the channel
      • annebelleo has quit
      • heyoni_ has quit
      • discopatrick has quit
      • LordSputnik: could you purge @coveralls comments from https://git.io/vbbR1? I'm about to do a bulk signing commit that would leave only one coveralls comment for all my commits
      • Slurpee joined the channel
      • then I'll do a proper writeup on the changes and submit
      • Slurpee has quit
      • xps2 has quit
      • warp_ joined the channel
      • rgunkar joined the channel
      • dragonzeron has quit
      • leon joined the channel
      • leon is now known as Guest94268
      • Leftmost
        naiveai, by and large you shouldn't ever have to do a force push, so if you find yourself doing that, double-check.
      • heyoni has quit
      • xps2 joined the channel
      • Guest94268
        Hey
      • I am trying to setup picard-website locally and following the install.md for linux
      • however when I am running pip install -r requirements.txt I am facing an error
      • Complete output from command python setup.py egg_info:
      • Traceback (most recent call last):
      • File "<string>", line 1, in <module>
      • File "/tmp/pip-build-g83s_q00/transifex-client/setup.py", line 14, in <module>
      • if long_description.startswith(BOM):
      • TypeError: startswith first arg must be str or a tuple of str, not bytes
      • Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-g83s_q00/transifex-client/
      • How to solve this error?
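The traceback above is the usual symptom of running a Python 2-only setup.py (here transifex-client's) under Python 3: the long description is read as str but compared against a bytes BOM. A minimal sketch of the same mismatch, assuming the BOM constant is codecs.BOM_UTF8 (the exact constant used in that setup.py is an assumption):

```python
import codecs

# Under Python 3, a file read in text mode yields str, while
# codecs.BOM_UTF8 is bytes, so str.startswith(bytes) raises exactly
# the TypeError shown in the traceback above.
long_description = "\ufeff# README read in text mode"  # illustrative content
BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

try:
    long_description.startswith(BOM)
except TypeError as exc:
    print(exc)  # startswith first arg must be str or a tuple of str, not bytes

# A Python 3-safe comparison (a hypothetical fix, not the package's code):
if long_description.startswith(BOM.decode("utf-8")):
    long_description = long_description[1:]
```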
      • iliekcomputers
        Which version of python are you using?
      • Are you using docker?
      • Guest94268
        I am using python 2.7.12
      • Using virtualenv
      • *python 3.5.2
      • naiveai
        Leftmost: i have some…bad git habits to say the least, so I do force pushes more often than I really should. i'm trying to shake off the habit
      • i feel like at this point I should just open a new PR without all the coveralls spam
      • lesson learned (i feel like i've been saying that a lot lately): git is literally magic, but magic should be used wisely.
      • LordSputnik, arthelon[m], Leftmost: made PR#174 with all commits signed, pushed all at once. submitted.
      • phew! now i won't bother you guys again.
      • naiveai has left the channel
      • naiveai joined the channel
      • Guest94268
        @iliekcomputers any help?
      • PROTechThor joined the channel
      • xps2 has quit
      • PROTechThor has quit
      • TehTotalPwnage joined the channel
      • drsaunder has quit
      • d4rkie joined the channel
      • D4RK-PH0ENiX has quit
      • D4RK-PH0ENiX joined the channel
      • d4rkie has quit
      • d4rkie joined the channel
      • D4RK-PH0ENiX has quit
      • D4RK-PH0ENiX joined the channel
      • d4rkie has quit
      • reosarevok
        zas: suffering some sort of server setbacks?
      • Stuff seems slow
      • MajorLurker has quit
      • iliekcomputers
        Guest94268: I'm not familiar with Picard website but it uses python 2.7 I think
      • MajorLurker joined the channel
      • zas
        reosarevok: i did a quick check, it doesn't look slower than usual
      • reosarevok
        I think now it is a bit better again - I was having issues with stuff taking so long to send that it broke scripts :/
      • discopatrick joined the channel
      • Guest94268 has quit
      • HSOWA joined the channel
      • KassOtsimine has quit
      • KassOtsimine joined the channel
      • KassOtsimine has quit
      • KassOtsimine joined the channel
      • HSOWA has quit
      • Bruker joined the channel
      • kuno has quit
      • warp_ is now known as kuno
      • kuno has quit
      • kuno joined the channel
      • Bruker has quit
      • HSOWA joined the channel
      • KassOtsimine has quit
      • Bruker joined the channel
      • KassOtsimine joined the channel
      • HSOWA has quit
      • Bruker has quit
      • Bruker joined the channel
      • kartikeya joined the channel
      • kartikeya has quit
      • kartikeya joined the channel
      • KassOtsimine has quit
      • KassOtsimine joined the channel
      • Bruker_ joined the channel
      • Bruker has quit
      • Bruker_ has quit
      • Bruker joined the channel
      • ruaok
        iliekcomputers: ping. see if we can appear in the same place at the same time today.
      • iliekcomputers
        Here right now
      • ruaok
        sweet.
      • ok, first off. I'm really sorry.
      • second, I remember what drama the freedb data dumps are.
      • they create one file on disk for every CD.
      • it chews up inodes and does an insane amount of file opening/closing.
      • the process is painfully slow and ideally one would do an in-memory stream decompress to not bog down the filesystem.
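As a rough illustration of the in-memory stream decompress ruaok mentions, assuming the freedb dump is a bz2-compressed tar (the filename and compression are assumptions), Python's tarfile can walk the archive sequentially without ever materialising its one-file-per-CD layout on disk:

```python
import tarfile

def iter_cd_entries(dump_path="freedb-complete.tar.bz2"):
    """Yield (name, bytes) for each per-CD entry without extracting to disk.

    "r|bz2" opens the archive as a forward-only stream, so no inodes are
    consumed and no per-entry files are opened or closed on the filesystem.
    """
    with tarfile.open(dump_path, mode="r|bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            if handle is not None:
                yield member.name, handle.read()
```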
      • rgunkar has quit
      • iliekcomputers
        And us creating a file for each user is similar?
      • ruaok
        exactly.
      • as much as I love the fact that one user can quickly find their own data, it makes consuming data dumps much harder for everyone.
      • iliekcomputers
        I see.
      • ruaok
        I think we should make it really simple then and not think too much.
      • iliekcomputers
        Tbh, there are only ~1200 users in LB right now.
      • ruaok
        correct.
      • but this becomes sub-optimal at 10k+, which isn't far out.
      • iliekcomputers
        Okay.
      • ruaok
        so, I am thinking that we should set some file size limit. say 10MB or maybe 100MB.
      • then we pump listens into txt files.
      • normal is to make one JSON document per line.
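A minimal sketch of that scheme: newline-delimited JSON documents pumped into monotonically numbered text files with a rough size target (the 10 MB figure and the file naming are illustrative, not decided here):

```python
import json

TARGET_BYTES = 10 * 1024 * 1024  # rough per-file target, not a hard cap

def dump_listens(listens, prefix="listens"):
    """Write one JSON document per line, rolling over to the next
    numbered file once the current one passes the size target."""
    number, written = 0, 0
    out = open("{}-{}.txt".format(prefix, number), "w")
    for listen in listens:
        line = json.dumps(listen) + "\n"
        out.write(line)
        written += len(line.encode("utf-8"))
        if written >= TARGET_BYTES:
            out.close()
            number, written = number + 1, 0
            out = open("{}-{}.txt".format(prefix, number), "w")
    out.close()
```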
      • iliekcomputers
        Text files which aren't username specific
      • ruaok
        correct.
      • iliekcomputers
        And each Json file contains usernames?
      • ruaok
        maybe just monotonically increasing numbers.
      • iliekcomputers
        *json doc
      • ruaok
        yes. I forget, does the JSON now contain usernames?
      • antlarr has quit
      • iliekcomputers
        I'm not sure but I don't think it does.
      • ruaok
        that would make it easy for us, but a bit harder for users since they would need to grep for their user names.
      • antlarr joined the channel
      • my natural instinct is to make it so that we create an enclosing JSON doc with the username and then output one per line.
      • but catcat would make that tough. having to load 1M+ listens into RAM to be able to parse it, it's also not very smart.
      • so, we need to find a simple way to delineate listens by users.
      • we could do that in the index file.
      • user -> filename, offset
      • user -> filename, offset, size
      • so that to find the data for a given user, they need to extract one file, seek to offset and read size bytes.
      • and it makes dumping the data very simple.
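On the consumer side, the index ruaok describes (user -> filename, offset, size) would let someone pull out just their own listens with a seek and a bounded read. A sketch of the lookup, assuming the index itself is stored as a JSON file (that format is an assumption):

```python
import json

def read_user_listens(index_path, user):
    """Return one user's listens using a user -> (filename, offset, size)
    index, without scanning the whole dump."""
    with open(index_path) as f:
        index = json.load(f)
    filename, offset, size = index[user]
    with open(filename, "rb") as dump:
        dump.seek(offset)
        chunk = dump.read(size)
    # The slice is newline-delimited JSON: one listen document per line.
    return [json.loads(line) for line in chunk.decode("utf-8").splitlines() if line]
```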
      • iliekcomputers
        Hmmm, sounds good, what if a user's listens take more than one file?
      • ruaok
        I think if we treat the filesize as a recommendation, not a hard limit, we should be good.
      • meaning that if we pick 10MB as the limit (unlikely) and a user has 12MB of listens, they take up one file.
      • iliekcomputers
        Ah okay
      • ruaok
      • and if a user is at 9MB and then we reach catcat, fine.
      • iliekcomputers
        catcat would still probably take many files?
      • ruaok
        I'm quite curious to dump catcat to a single file using your current setup.