how did you generate this? what import/reimport steps did you take?
2017-06-22 17346, 2017
ruaok
I'd love some help getting my talk title/desc up to scracth
2017-06-22 17353, 2017
ruaok
scratch, even.
2017-06-22 17312, 2017
reosarevok
ruaok: you namedrop me in an interview and don't tell me? :D
2017-06-22 17339, 2017
ruaok
reosarevok: hey you! I dropped your name to get some street cred. Did you hear???
2017-06-22 17341, 2017
ruaok
:-P
2017-06-22 17347, 2017
reosarevok
(just got a message from a guy I know "hey this post at the top of the subreddit I follow mentions you wtf" :D )
2017-06-22 17308, 2017
ruaok
lol
2017-06-22 17347, 2017
iliekcomputers
ruaok: I did a simple last.fm import, just added a print to influx_writer wherever it thought a listen to be a duplicate
2017-06-22 17304, 2017
ruaok
did you start with an empty DB?
2017-06-22 17319, 2017
iliekcomputers
yes
2017-06-22 17330, 2017
ruaok
oh. well, that is a problem, then.
2017-06-22 17349, 2017
ruaok
clearly there shouldn't be duplicates on a single import.
2017-06-22 17310, 2017
iliekcomputers
yeah, I dunno what the problem is, though
2017-06-22 17338, 2017
iliekcomputers goes to take a look
2017-06-22 17344, 2017
ruaok
do we have an edge condition somewhere
2017-06-22 17347, 2017
ruaok
?
2017-06-22 17353, 2017
ruaok
>= vs > ?
2017-06-22 17319, 2017
ruaok
which might cause listens at the end of a block to be duplicated/omitted?
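A minimal sketch of the kind of boundary bug being asked about here, not the actual scraper code. It assumes listens are fetched in fixed-size time blocks; the function names and the block layout are hypothetical. With both ends of a block inclusive, a listen that falls exactly on a boundary lands in two blocks; a half-open range avoids that.

```python
def blocks_half_open(timestamps, block_size):
    """Split timestamps into blocks [start, start + block_size)."""
    ordered = sorted(timestamps)
    blocks = []
    if not ordered:
        return blocks
    start = ordered[0]
    while start <= ordered[-1]:
        end = start + block_size
        # start is included, end is excluded: each listen lands in one block
        blocks.append([ts for ts in ordered if start <= ts < end])
        start = end
    return blocks


def blocks_closed(timestamps, block_size):
    """Same split, but with both ends inclusive (the suspected bug)."""
    ordered = sorted(timestamps)
    blocks = []
    if not ordered:
        return blocks
    start = ordered[0]
    while start <= ordered[-1]:
        end = start + block_size
        # end is included too, so a listen at `end` also opens the next block
        blocks.append([ts for ts in ordered if start <= ts <= end])
        start = end
    return blocks


def flatten(blocks):
    return [ts for block in blocks for ts in block]
```

Flattening the closed-range blocks for listens that sit exactly on block boundaries yields repeated timestamps, while the half-open version returns each listen once.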
2017-06-22 17339, 2017
alastairp
I failed to import 2
2017-06-22 17343, 2017
alastairp
out of 400 pages
2017-06-22 17359, 2017
iliekcomputers
could be, I haven't seen much of the scraper code in detail. Although, the data is a bit weird, I don't understand how a difference of 2 seconds between listens is possible. We couldn't be the ones doing that, because the last.fm api returns timestamps themselves, maybe the last.fm data has listens with differences less than 30 seconds.
2017-06-22 17301, 2017
alastairp
I would have been more suspicious if I had failed to import 400 ;)
2017-06-22 17328, 2017
alastairp
iliekcomputers: yeah, it would be worth doing a scrape of an API and checking the values of the timestamps
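One way to run that check, as a rough sketch: given the timestamps from a scrape (assumed here to be Unix epoch seconds already pulled out of the API response), list every pair of consecutive listens that are closer together than some threshold. The function name and the 30-second default are illustrative only; the default just mirrors the figure mentioned above.

```python
def find_close_listens(timestamps, min_gap=30):
    """Return pairs of consecutive listens less than min_gap seconds apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier < min_gap
    ]
```

An empty result would point at our own ingestion code; a non-empty one would suggest the odd gaps are already present in the source data.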
2017-06-22 17306, 2017
ruaok
alastairp: it failed to import more than 1000 for CatQuest
inb4 flame war: I've been using XFCE forever but I'm getting a bit tired of some of its issues. Out of the flavours Ubuntu does ship with, what's the one I should install when I reinstall it? :p
2017-06-22 17307, 2017
Galaverna joined the channel
2017-06-22 17338, 2017
alastairp
ferbncode: cool, I'll look this afternoon. thanks
2017-06-22 17356, 2017
alastairp
honestly, gnome3 is way better than it was 6 years ago
2017-06-22 17313, 2017
Galaverna has quit
2017-06-22 17336, 2017
alastairp
I don't think anything annoys me on a day-to-day basis, but I installed quite a number of extensions to make it better
2017-06-22 17326, 2017
agentsim joined the channel
2017-06-22 17323, 2017
ferbncode
alastairp: great, thanks :). I used gnome a year ago, it was a heavy one for me, but then I switched to i3wm, superlight :P
2017-06-22 17330, 2017
agentsim has quit
2017-06-22 17322, 2017
ruaok
weird feeling of the day: logging into quickbooks and having new companies appear there that I've never heard of or dealt with.
2017-06-22 17326, 2017
ruaok
I <3 you Quesito!
2017-06-22 17309, 2017
iliekcomputers
so the weird data is from lastfm itself
2017-06-22 17346, 2017
Quesito
Lol! Lots of new ones ruaok!
2017-06-22 17311, 2017
ruaok
Quesito: :)
2017-06-22 17345, 2017
ruaok
iliekcomputers: I kinda thought so. you and i had been scouring all aspects of data ingestion.
2017-06-22 17341, 2017
iliekcomputers
there is this one edge case that I think we might have missed in influx-writer, if there are duplicates in the same rabbitmq batch, it might not find out because the timestamps dict here doesn't contain that timestamp?
[listenbrainz-server] paramsingh opened pull request #204: LB-180: Account for duplicates in same RabbitMQ batch for influx-writer (master...influx-writer/same-batch-dup) https://git.io/vQL0v
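A minimal sketch of the edge case described above, not the code from the pull request. It assumes a batch is a list of dicts with a `listened_at` field for a single user, and that `existing_timestamps` holds the timestamps already stored; both names are hypothetical. The point is that each accepted listen is added to the lookup as it is processed, so a second copy later in the same batch is caught too.

```python
def filter_duplicates(batch, existing_timestamps):
    """Drop listens already stored or already seen earlier in this batch."""
    seen = set(existing_timestamps)
    unique = []
    for listen in batch:
        ts = listen["listened_at"]
        if ts in seen:
            # duplicate of a stored listen or of an earlier one in this batch
            continue
        # record it right away so later entries in the batch are compared to it
        seen.add(ts)
        unique.append(listen)
    return unique
```

Checking only against the stored timestamps, without the `seen.add(ts)` step, would let both copies of an in-batch duplicate through.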