Tumgik
#melesigenes
matrookfromizmir · 4 years
Mat-Rook The Vagabond Dog of Izmir
Chapter 42-Argos
For all chapters: https://matrookfromizmir.tumblr.com/tagged/chapter/chrono
eurekakinginc · 5 years
"[D] How to deal with a classification problem of a big mbalanced dataset?"- Detail: I have a dataset of 8 million unique members, approximately 800 million records. Of those 8 million members I have a positive sample of about 25000. It's a binary classification problem. I would like to not simply downsample although the downsampled RF performs pretty well. The data is on a Hadoop cluster. I only have access to it via a Zeppelin notebook with PySpark. It's a pain in the ass to get approval for packages installed. PySpark is even in python 2.7 and I don't really use Python 2. What should I do? The notebook is in a VM that's not connected to the worldwideweb. I would have to rewrite solutions like SMOTE if I wanted to use it. I found a package but it takes like a week for approval and I only have two more weeks for the project. I wanted to use a balanced or weighted random forest but I don't see a native spark.ml implemention. I'm also kind of new to spark.Any tips or advice on how to proceed? Would highly appreciate.. Caption by melesigenes. Posted By: www.eurekaking.com