That's basically what I am trying to achieve in my projects as well, just at a lower scale :)
mayhem
what we may need to do is one extra step:
- Convert the artist mbid to text and the recording mbid to text, then look up the text in our mapping and output that.
that is likely the best cleanup that can be done on that data.
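A minimal sketch of the cleanup step described above, assuming the MBID-to-name tables and the mapping are available as plain dictionaries. The names `normalize`, `clean_row`, `artist_names`, `recording_names` and `mapping` are hypothetical and do not come from the ListenBrainz code base; the real mapping lookup may normalize text differently.

```python
import unicodedata


def normalize(text):
    """Lowercase and drop accents, spaces and punctuation so near-identical names match."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def clean_row(artist_mbid, recording_mbid, artist_names, recording_names, mapping):
    """Resolve both MBIDs to text, then look the text pair up in the mapping.

    Returns the mapped value, or None if either MBID is unknown or the
    text pair is not in the mapping (assumption: such rows are dropped).
    """
    artist = artist_names.get(artist_mbid)
    recording = recording_names.get(recording_mbid)
    if artist is None or recording is None:
        return None
    return mapping.get((normalize(artist), normalize(recording)))
```

Keying the mapping on normalized text means that two different MBIDs pointing at the same artist and recording name collapse onto one output row, which is the point of the cleanup.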
PrathameshG34
interestingly enough, I think I might have already written some Python code for that
mayhem
PrathameshG34: good. I think it would make sense to carry forth with your projects using MB/LB data, rather than last.fm data. but that is just my take. :)
PrathameshG34
Oh yes, definitely!
That's exactly what I want to do. Last.fm data is pretty poor, so I am trying to aggregate data from as many feature-rich sources as possible (including MB metadata, and Spotify data for stuff that's not on MB)
mayhem
what is the goal of your project?
and will the results be open source?
PrathameshG34
Yes, I'll try my best to put it all back into the MB database.
mayhem
cool, but what is your desired data outcome?
PrathameshG34
I listen to music like a maniac, so I wanted to analyze everything about it.
Lastfm was the obvious choice because it combined my streaming history from all sources. Then the data was pathetic, so I looked up MB, etc. Now I am just trying to aggregate all that stuff together and give back to this community in the process since it's exactly what I wished to create when I didn't know about it.
My desired outcome is to create a process that takes in just a few fields about a stream (title, artist, album, and MBID). Then crawl the web to find as much metadata about it as possible
mayhem
ok, sounds like our goals align well.
PrathameshG34
I wish to later use that data for advanced analytics and stuff. Provide insights that no other platform currently provides, interesting connections and visualizations (like unofficial collabs between artists in form of production credits, etc).
Just a LOT of metadata about everything, in one convenient place
mayhem
though adding scraped data to MB might be tricky -- that likely won't be able to be done in an automated manner...
PrathameshG34
mayhem: [emoji]
lucifer
mayhem: I had tried 10, 15, 25. It selected 10 for iterations, which is what we are using already.
mayhem
PrathameshG34: yep, those are our goals with listenbrainz.
lucifer: ok, then I think we should set the range 5 - 15 so that its current best lies in the middle of the range.
let me re-read about the alpha factor again.
lucifer
[emoji]
PrathameshG34
mayhem: right, I am aware that the metabrainz data addition goes through a lot of scrutiny for obvious reasons, so I am not expecting much from it at this very moment, but hopefully we could create pipelines for it further down the line :))
mayhem
good good
"alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0)."
sigh
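For context on the quoted definition: in the usual implicit-feedback ALS formulation, alpha scales how much an observed count raises confidence above a baseline of 1. The sketch below assumes that standard formula (confidence = 1 + alpha * count); the function names are hypothetical and this is not code from the ListenBrainz recommender.

```python
def preference(listen_count):
    """Binary preference: 1.0 if the user listened at all, else 0.0."""
    return 1.0 if listen_count > 0 else 0.0


def confidence(listen_count, alpha=1.0):
    """Confidence in the preference: baseline 1.0, raised by alpha per listen."""
    return 1.0 + alpha * listen_count
```

With a larger alpha, a user's heavily played recordings dominate the loss more strongly relative to recordings they never played.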
PrathameshG34
BTW, I'd love to get started with the MLHD and do some EDA with it. However, I am facing some problems downloading it. I'll try again and let you guys know if I face any issues with it again.
I'll be right back
mayhem
k.
we also have a common dev machine in a data center that we could probably give you an account on.
it has gobs of bandwidth
lucifer: let's just repeat our process for the alpha parameter: look at what is picked, pick a new range that gives it more space, re-run, evaluate, adjust.
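The re-centering process described above can be sketched roughly as follows. This is an illustration only: `recenter`, `tune` and `evaluate` are hypothetical names, and the sketch assumes `evaluate` returns an error metric where lower is better.

```python
def recenter(best, spread):
    """Return a (low, high) range with `best` in the middle."""
    return best - spread, best + spread


def tune(evaluate, values, spread, max_rounds=5):
    """Pick the best candidate, re-center the range on it, and re-run.

    Stops as soon as the winner is no longer on the edge of the range,
    i.e. it has room on both sides.
    """
    for _ in range(max_rounds):
        best = min(values, key=evaluate)
        if min(values) < best < max(values):
            break
        low, high = recenter(best, spread)
        values = [low, best, high]
    return best, values
```

For the iterations parameter discussed earlier, `recenter(10, 5)` gives the range 5 to 15, which puts the currently selected value of 10 in the middle.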
lucifer
mayhem: it appears that, unlike other params, we can't request multiple alpha values in one job. I'll modify the code tomorrow so that we can test it.
I hope waiting for some time will solve this issue?
mayhem
I wonder if this has been ongoing for a while.
PrathameshG
Yea, I wasn't able to access the dataset yesterday either
Are there any mirrors for this?
BrainzGit
[bookbrainz-site] dependabot[bot] opened pull request #806 (master…dependabot/npm_and_yarn/babel/runtime-7.17.7): chore(deps): bump @babel/runtime from 7.16.3 to 7.17.7 https://github.com/metabrainz/bookbrainz-site/p...
[bookbrainz-site] dependabot[bot] closed pull request #795 (master…dependabot/npm_and_yarn/babel/runtime-7.17.2): chore(deps): bump @babel/runtime from 7.16.3 to 7.17.2 https://github.com/metabrainz/bookbrainz-site/p...
alastairp
PrathameshG: I emailed the author of the dataset (an old workmate of mine) to see if he knows what the error means. If it doesn't get resolved, I think I have a full copy somewhere, I'll have a look
PrathameshG
alastairp: Thanks a lot! Really appreciated.
BrainzGit
[bookbrainz-site] MonkeyDo merged pull request #800 (master…edition-initial-search-matching-EG): fix(entity-editor): Edition: search for EditionGroups with same name https://github.com/metabrainz/bookbrainz-site/p...
[bookbrainz-site] MonkeyDo merged pull request #801 (master…search-initial-pre-filled-name): feat(entity-editor): search for duplicates when pre-filling name https://github.com/metabrainz/bookbrainz-site/p...
Shubh has quit
[musicbrainz-server] reosarevok opened pull request #2454 (master…MBS-12252): MBS-12252 / MBS-12253 / MBS-12254 / MBS-12255: Genre-related schema additions for consistency https://github.com/metabrainz/musicbrainz-serve...
I stopped sir-prod again. The load on pink increased a lot and message ingestion is somehow stuck; the number of messages in the queue is increasing again
yvanzo: I tried again. sir ingests a few messages, then it gets stuck and the load increases, so I stopped it. Have a look when you can.
aerozol
Been getting a bug lately where the edit note screen no longer gives a summary/preview of all the edits. Seems to happen more with importers (e.g. atisket, discogs). Anyone else?
I could test by editing without any userscripts enabled for a while... but that sounds awful :D