feat: add ml/cluster/strided/dkmeansld
#9703
base: develop
This PR is WIP.
* @param {integer} offsetInit - initial index.
* @returns {Result} results object
*/
function dkmeansld( M, N, k, replicates, metric, maxIter, tol, X, strideX1, strideX2, offsetX, init, strideInit1, strideInit2, strideInit3, offsetInit ) { // eslint-disable-line max-len
I realized after looking at this and the internal array allocation that, similar to https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/stats/strided/dztest, we'll want an out parameter for a pre-allocated results object. In the C implementation, we don't want to be dynamically allocating memory.
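For illustration, a minimal sketch of the out-parameter pattern described above. The `sumSquares` routine and the plain results object are hypothetical and are not part of this PR; the point is only that the caller allocates the results object and the routine writes into it without allocating anything itself.

```javascript
'use strict';

/**
* Hypothetical routine: sums the squared distances from each strided element
* to a fixed center and writes the result to a caller-provided results object.
*/
function sumSquares( N, x, strideX, offsetX, center, out ) {
	var ix = offsetX;
	var s = 0.0;
	var d;
	var i;
	for ( i = 0; i < N; i++ ) {
		d = x[ ix ] - center;
		s += d * d;
		ix += strideX;
	}
	// Write into the pre-allocated object rather than creating a new one:
	out.inertia = s;
	out.samples = N;
	return out;
}

// The caller owns the allocation:
var data = new Float64Array( [ 1.0, 2.0, 3.0 ] );
var results = { 'inertia': 0.0, 'samples': 0 };
var returned = sumSquares( 3, data, 1, 0, 2.0, results );
// returned === results
```

Returning the same object that was passed in keeps the call chainable while leaving memory management to the caller, which is what a C implementation would need.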
I am also realizing that we'll want a samples (number of samples, M) and a features (number of features, N) property on the results object. Having these values will allow consumers to properly read the init, centroids, and statistics linear memory buffers, independent of the original function invocation.
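As a rough sketch of why these properties help, the snippet below reads one centroid out of a linear buffer using only the results object plus strides and an offset. The `readCentroid` helper is hypothetical, and the layout (a `k`-by-`features` row-major buffer) is an assumption made for the example, not something this PR has settled.

```javascript
'use strict';

/**
* Hypothetical helper: copies the centroid at index `i` out of a linear buffer.
* Assumes `sc1` steps between centroids and `sc2` steps between features.
*/
function readCentroid( res, c, sc1, sc2, oc, i ) {
	var v = [];
	var j;
	for ( j = 0; j < res.features; j++ ) {
		v.push( c[ oc + ( i*sc1 ) + ( j*sc2 ) ] );
	}
	return v;
}

// Illustrative results object and a 2-by-3 row-major centroid buffer:
var info = { 'k': 2, 'samples': 4, 'features': 3 };
var buf = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ] );

var second = readCentroid( info, buf, info.features, 1, 0, 1 );
// second => [ 4.0, 5.0, 6.0 ]
```

Without `features` on the results object, a consumer holding only the buffer would have no way to know where one centroid ends and the next begins.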
Current thinking is the following signature:
function dkmeansld( M, N, k, replicates, metric, maxIter, tol, x, strideX1, strideX2, offsetX, init, strideInit1, strideInit2, strideInit3, offsetInit, centroids, strideC1, strideC2, offsetC, statistics, strideS1, strideS2, offsetS, labels, strideL1, offsetL, out )
where centroids, statistics, and labels are output arrays and out is a "results" object with the following fields:
- replicates: number of times the data was clustered with different initial centroids.
- replicate: index of the initial centroids which produced the best result.
- metric: metric name.
- iterations: number of iterations for best results.
- method: algorithm name (e.g., 'lloyd').
- inertia: sum of squared distances to the closest centroid for all samples.
- k: number of clusters.
- samples: number of samples (i.e., M).
- features: number of features (i.e., N).
Similar to ztest, several of the parameter values should be copied over to what will be an uninitialized results object, such as M, N, k.
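A possible shape for that copy step, sketched under assumptions: `initResults` is a hypothetical helper name, the field names follow the list above, and the `'euclidean'` and `'lloyd'` values in the usage are illustrative only.

```javascript
'use strict';

/**
* Hypothetical helper: copies invocation parameters onto an uninitialized
* results object so that it can be interpreted independently of the call.
*/
function initResults( M, N, k, replicates, metric, method, out ) {
	out.samples = M;
	out.features = N;
	out.k = k;
	out.replicates = replicates;
	out.metric = metric;
	out.method = method;
	return out;
}

// Usage with illustrative values:
var summary = initResults( 100, 4, 3, 5, 'euclidean', 'lloyd', {} );
// summary.samples => 100
// summary.features => 4
```

The remaining fields (`replicate`, `iterations`, `inertia`) would be filled in once clustering has finished, since they depend on the best replicate.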
Signature with abbreviated parameter names:
function dkmeansld( M, N, k, rep, metric, maxIter, tol, x, sx1, sx2, ox, init, si1, si2, si3, oi, c, sc1, sc2, oc, stats, ss1, ss2, os, labels, sl, ol, out )
Resolves None.
Description
This pull request:
- adds ml/cluster/strided/dkmeansld.
Related Issues
This pull request has the following related issues:
Questions
No.
Other
No.
Checklist
AI Assistance
If you answered "yes" above, how did you use AI assistance?
Disclosure
None.
@stdlib-js/reviewers