That's basically what I am trying to achieve in my projects as well, just at a lower scale :)
2022-03-14 07327, 2022
mayhem
what we may need to do is one extra step:
2022-03-14 07358, 2022
mayhem
- Convert the artist mbid to text and the recording mbid to text, then lookup the text in our mapping and output that.
2022-03-14 07307, 2022
mayhem
that is likely the best cleanup that can be done on that data.
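A minimal sketch of the extra step described above, assuming three in-memory lookup tables. The table names, the `normalize` helper and every ID and name in them are hypothetical placeholders, not the real MB/LB mapping schema:

```python
# Hypothetical lookup tables; real data would come from MB dumps and the mapping.
ARTIST_NAMES = {"artist-mbid-1": "Example Artist"}
RECORDING_NAMES = {"recording-mbid-1": "Example  Song"}
# (artist text, recording text) -> (canonical artist MBID, canonical recording MBID)
MAPPING = {
    ("example artist", "example song"): ("canonical-artist-1", "canonical-recording-1"),
}


def normalize(text):
    """Lowercase and collapse whitespace so text keys compare reliably."""
    return " ".join(text.casefold().split())


def clean_row(artist_mbid, recording_mbid):
    """Convert both MBIDs to text, then look that text up in the mapping.

    Returns the mapped (artist MBID, recording MBID) pair, or None when
    either MBID is unknown or the text pair is not in the mapping.
    """
    artist = ARTIST_NAMES.get(artist_mbid)
    recording = RECORDING_NAMES.get(recording_mbid)
    if artist is None or recording is None:
        return None
    return MAPPING.get((normalize(artist), normalize(recording)))
```

Rows that come back as `None` would simply be dropped or set aside, which is about as much cleanup as this approach can offer.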
2022-03-14 07333, 2022
PrathameshG34
interestingly enough, I think I might have already written some Python code for that
2022-03-14 07348, 2022
mayhem
PrathameshG34: good. I think it would make sense to carry forth with your projects using MB/LB data, rather than last.fm data. but that is just my take. :)
2022-03-14 07313, 2022
PrathameshG34
Oh yes, definitely!
2022-03-14 07314, 2022
PrathameshG34
That's exactly what I want to do. Lastfm data is pretty poor, so I am trying to aggregate data from as many feature-rich sources as possible (including MB metadata, and spotify data for stuff that's not on MB)
2022-03-14 07348, 2022
mayhem
what is the goal of your project?
2022-03-14 07355, 2022
mayhem
and will the results be open source?
2022-03-14 07330, 2022
PrathameshG34
Yes, I'll try my best to put it all back into the MB database.
2022-03-14 07314, 2022
mayhem
cool, but what is your desired data outcome?
2022-03-14 07304, 2022
PrathameshG34
I listen to music like a maniac, so I wanted to analyze everything about it.
2022-03-14 07304, 2022
PrathameshG34
Lastfm was the obvious choice because it combined my streaming history from all sources. Then the data was pathetic, so I looked up MB, etc. Now I am just trying to aggregate all that stuff together and give back to this community in the process since it's exactly what I wished to create when I didn't know about it.
2022-03-14 07305, 2022
PrathameshG34
My desired outcome is to create a process that takes in just a few fields about a stream (title, artist, album, and MBID). Then crawl the web to find as much metadata about it as possible
2022-03-14 07304, 2022
mayhem
ok, sounds like our goals align well.
2022-03-14 07321, 2022
PrathameshG34
I wish to later use that data for advanced analytics and stuff. Provide insights that no other platform currently provides, interesting connections and visualizations (like unofficial collabs between artists in form of production credits, etc).
2022-03-14 07321, 2022
PrathameshG34
Just a LOT of metadata about everything, in one convenient place
2022-03-14 07337, 2022
mayhem
though adding scraped data to MB might be tricky -- that likely won't be able to be done in an automated manner...
2022-03-14 07337, 2022
PrathameshG34
mayhem: 🤝
2022-03-14 07337, 2022
lucifer
mayhem: i had tried 10, 15, 25. it selected 10 for iterations, so what we are using already.
2022-03-14 07306, 2022
mayhem
PrathameshG34: yep, those are our goals with listenbrainz.
2022-03-14 07351, 2022
mayhem
lucifer: ok, then I think we should set the range 5 - 15 so that its current best lies in the middle of the range.
2022-03-14 07302, 2022
mayhem
let me re-read about the alpha factor again.
2022-03-14 07307, 2022
lucifer
👍
2022-03-14 07309, 2022
PrathameshG34
mayhem: right, I am aware that the metabrainz data addition goes through a lot of scrutiny for obvious reasons, so I am not expecting much from it at this very moment, but hopefully we could create pipelines for it further down the line :))
2022-03-14 07325, 2022
mayhem
good good
2022-03-14 07318, 2022
mayhem
"alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations (defaults to 1.0)."
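For context, in the implicit-feedback formulation this variant of ALS is usually traced back to (Hu, Koren and Volinsky), alpha scales how quickly confidence grows with the observed count: confidence = 1 + alpha * count. A small sketch of that formula only; the function name is made up here, and whether the library in use applies exactly this form has not been checked:

```python
def confidence(listen_count, alpha=1.0):
    """Confidence weight for one (user, item) observation.

    Follows c = 1 + alpha * r from the implicit-feedback ALS paper.
    A count of 0 still gets the baseline confidence of 1.
    """
    return 1.0 + alpha * listen_count


# With alpha = 1.0, five listens weigh six times as much as no listens;
# with alpha = 40.0 the same five listens weigh 201 times as much.
low_alpha = confidence(5, alpha=1.0)
high_alpha = confidence(5, alpha=40.0)
```

So a larger alpha makes the model trust repeated listens much more strongly relative to unobserved pairs.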
2022-03-14 07323, 2022
mayhem
sigh
2022-03-14 07347, 2022
PrathameshG34
BTW, I'd love to get started with the MLHD and do some EDA with it. However I am facing some problems downloading it. I'll try again and let you guys know if I face any issues with it again 👍
2022-03-14 07354, 2022
PrathameshG34
I'll be right back
2022-03-14 07311, 2022
mayhem
k.
2022-03-14 07332, 2022
mayhem
we also have a common dev machine in a data center that we could probably give you an account on.
2022-03-14 07343, 2022
mayhem
it has gobs of bandwidth
2022-03-14 07332, 2022
mayhem
lucifer: let's just repeat our process for the alpha parameter. look at what is picked, pick a new range that gives it more space. re-run, evaluate, adjust.
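The "pick a new range around what was picked" step can be sketched as below. This is only an illustration of the idea, with a hypothetical helper name and an evenly spaced grid assumed; it reproduces the 5 - 15 range suggested earlier for iterations, where the selected value was 10:

```python
def recenter(best, half_width, num_points=3):
    """Return evenly spaced candidates centered on the current best value.

    best       -- value the previous run selected
    half_width -- how far the new range extends on each side of best
    num_points -- number of candidates to try (at least 2)
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    low = best - half_width
    step = 2 * half_width / (num_points - 1)
    return [low + i * step for i in range(num_points)]


# Iterations: 10 was selected, so the next range is 5 - 15 with 10 in the middle.
next_iterations = recenter(best=10, half_width=5)
```

If the next run picks a value at the edge of the new range, the same call with that value as `best` shifts the range again.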
2022-03-14 07317, 2022
lucifer
mayhem: it appears that, unlike other params, we can't request multiple alpha values in one job. i'll modify the code tomorrow so that we can test it.
I hope waiting for some time will solve this issue?
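Until the job accepts several alpha values at once, one workaround is to run one job per value and compare the scores afterwards. A rough sketch, assuming a hypothetical `train_and_score(alpha)` callable that runs a single job and returns a validation error where lower is better; `fake_job` is a stand-in, not the real training code:

```python
def sweep_alpha(train_and_score, alphas):
    """Run one job per alpha and return (best_alpha, scores).

    scores maps each alpha to the value returned by train_and_score;
    the best alpha is the one with the lowest score.
    """
    scores = {alpha: train_and_score(alpha) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


def fake_job(alpha):
    """Stand-in for a real training job: error is smallest at alpha = 3."""
    return (alpha - 3.0) ** 2


best, all_scores = sweep_alpha(fake_job, [1.0, 3.0, 5.0])
```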
2022-03-14 07319, 2022
mayhem
I wonder if this has been ongoing for a while.
2022-03-14 07313, 2022
PrathameshG
Yea, I wasn't able to access the dataset yesterday either
2022-03-14 07320, 2022
PrathameshG
Are there any mirrors for this?
2022-03-14 07302, 2022
BrainzGit
[bookbrainz-site] dependabot[bot] opened pull request #806 (master…dependabot/npm_and_yarn/babel/runtime-7.17.7): chore(deps): bump @babel/runtime from 7.16.3 to 7.17.7 https://github.com/metabrainz/bookbrainz-site/pul…
2022-03-14 07306, 2022
BrainzGit
[bookbrainz-site] dependabot[bot] closed pull request #795 (master…dependabot/npm_and_yarn/babel/runtime-7.17.2): chore(deps): bump @babel/runtime from 7.16.3 to 7.17.2 https://github.com/metabrainz/bookbrainz-site/pul…
2022-03-14 07349, 2022
alastairp
PrathameshG: I emailed the author of the dataset (an old workmate of mine) to see if he knows what the error means. If it doesn't get resolved, I think I have a full copy somewhere, I'll have a look
2022-03-14 07323, 2022
PrathameshG
alastairp: Thanks a lot! Really appreciated.
2022-03-14 07312, 2022
BrainzGit
[bookbrainz-site] MonkeyDo merged pull request #800 (master…edition-initial-search-matching-EG): fix(entity-editor): Edition: search for EditionGroups with same name https://github.com/metabrainz/bookbrainz-site/pul…
2022-03-14 07320, 2022
BrainzGit
[bookbrainz-site] MonkeyDo merged pull request #801 (master…search-initial-pre-filled-name): feat(entity-editor): search for duplicates when pre-filling name https://github.com/metabrainz/bookbrainz-site/pul…
2022-03-14 07320, 2022
Shubh has quit
2022-03-14 07333, 2022
BrainzGit
[musicbrainz-server] reosarevok opened pull request #2454 (master…MBS-12252): MBS-12252 / MBS-12253 / MBS-12254 / MBS-12255: Genre-related schema additions for consistency https://github.com/metabrainz/musicbrainz-server/…
I stopped sir-prod again, the load on pink increased a lot and message ingestion is somehow stuck, the number of messages in the queue is increasing again
yvanzo: I tried again, sir ingests a few messages, then it gets stuck, and load increases, I stopped it, have a look when you can.
2022-03-14 07333, 2022
aerozol
Been getting a bug lately where the edit note screen no longer gives a summary/preview of all the edits. Seems to happen more with importers (e.g. atisket, discogs). Anyone else?
2022-03-14 07312, 2022
aerozol
I could test by editing without any userscripts enabled for a while... but that sounds awful :D