@nakul-krishnakumar
Member

Resolves None.

Description

What is the purpose of this pull request?

This pull request:

  • Adds ml/cluster/strided/dkmeansld.

Related Issues

Does this pull request have any related issues?

This pull request has the following related issues:

  • None.

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

AI Assistance

When authoring the changes proposed in this PR, did you use any kind of AI assistance?

  • Yes
  • No

If you answered "yes" above, how did you use AI assistance?

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Disclosure

If you answered "yes" to using AI assistance, please provide a short disclosure indicating how you used AI assistance. This helps reviewers determine how much scrutiny to apply when reviewing your contribution. Example disclosures: "This PR was written primarily by Claude Code." or "I consulted ChatGPT to understand the codebase, but the proposed changes were fully authored manually by myself.".

None.


@stdlib-js/reviewers

@stdlib-bot added the "Needs Review" label (a pull request which needs code review) Jan 12, 2026
@nakul-krishnakumar marked this pull request as draft January 12, 2026 06:43
@stdlib-bot removed the "Needs Review" label Jan 12, 2026
@nakul-krishnakumar
Member Author

This PR is WIP.

@nakul-krishnakumar
Member Author

Expected API Design:
[image: expected API design]

* @param {integer} offsetInit - initial index.
* @returns {Result} results object
*/
function dkmeansld( M, N, k, replicates, metric, maxIter, tol, X, strideX1, strideX2, offsetX, init, strideInit1, strideInit2, strideInit3, offsetInit ) { // eslint-disable-line max-len
Member

@kgryte kgryte Jan 12, 2026

I realized after looking at this and the internal array allocation that, similar to https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/stats/strided/dztest, we'll want an out parameter for a pre-allocated results object. In the C implementation, we don't want to be dynamically allocating memory.

Member

@kgryte kgryte Jan 12, 2026

I am also realizing that we'll want a samples (number of samples, M) and a features (number of features, N) property on the results object. Having these values will allow consumers to properly read the init, centroids, and statistics linear memory buffers, independent of the original function invocation.
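As a rough illustration of why these two properties help, the sketch below reads one centroid value from a linear buffer using only the shape stored on a results object. The helper name `getCentroidValue`, the plain-object stand-in for the results object, and the row-major layout are all assumptions made for this example; they are not part of the PR.

```javascript
'use strict';

// Hypothetical helper (not part of the PR): returns the value of feature `j`
// for centroid `i` from a strided linear buffer.
function getCentroidValue( centroids, strideC1, strideC2, offsetC, i, j ) {
	return centroids[ offsetC + ( i*strideC1 ) + ( j*strideC2 ) ];
}

// Assumed stand-in for a results object describing 2 clusters and 3 features:
var results = {
	'k': 2,
	'features': 3
};

// Assumed row-major buffer holding the 2-by-3 centroid matrix:
var centroids = new Float64Array( [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 ] );

// Strides derived from the results object alone (row-major assumption):
var strideC1 = results.features;
var strideC2 = 1;

// Read the third feature of the second centroid:
var v = getCentroidValue( centroids, strideC1, strideC2, 0, 1, 2 );
console.log( v );
```

Without `features` on the results object, a consumer holding only the buffer would have no way to tell where one centroid ends and the next begins.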

Member

@kgryte kgryte Jan 12, 2026

Current thinking is the following signature:

function dkmeansld( M, N, k, replicates, metric, maxIter, tol, x, strideX1, strideX2, offsetX, init, strideInit1, strideInit2, strideInit3, offsetInit, centroids, strideC1, strideC2, offsetC, statistics, strideS1, strideS2, offsetS, labels, strideL1, offsetL, out )

where centroids, statistics, and labels are output arrays and out is a "results" object with the following fields:

  • replicates: number of times the data was clustered with different initial centroids.
  • replicate: index of the initial centroids which produced the best result.
  • metric: metric name.
  • iterations: number of iterations for best results.
  • method: algorithm name (e.g., 'lloyd').
  • inertia: sum of squared distances to the closest centroid for all samples.
  • k: number of clusters.
  • samples: number of samples (i.e., M).
  • features: number of features (i.e., N).

Similar to ztest, several of the parameter values (e.g., M, N, and k) should be copied over to the results object, which should be assumed to be uninitialized when passed in.
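A minimal sketch of that copy step follows, assuming a plain object as a stand-in for the pre-allocated results object. The helper name `setParameters` is hypothetical, and the remaining fields (`replicate`, `iterations`, `inertia`) are left for the clustering routine itself to fill in.

```javascript
'use strict';

// Hypothetical helper (not part of the PR): copies input parameters to a
// pre-allocated results object without allocating any new memory.
function setParameters( out, M, N, k, replicates, metric ) {
	out.samples = M;
	out.features = N;
	out.k = k;
	out.replicates = replicates;
	out.metric = metric;
	return out;
}

// Assumed stand-in for an uninitialized, pre-allocated results object:
var out = {
	'replicates': 0,
	'replicate': 0,
	'metric': '',
	'iterations': 0,
	'method': '',
	'inertia': 0.0,
	'k': 0,
	'samples': 0,
	'features': 0
};

// Illustrative values only:
setParameters( out, 100, 4, 3, 5, 'euclidean' );
console.log( out.samples, out.features, out.k );
```

Because the caller owns `out`, the same object can be reused across invocations, which matches the goal of avoiding dynamic allocation in a C implementation.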

Member

Signature with abbreviated parameter names:

function dkmeansld( M, N, k, rep, metric, maxIter, tol, x, sx1, sx2, ox, init, si1, si2, si3, oi, c, sc1, sc2, oc, stats, ss1, ss2, os, labels, sl, ol, out )
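To make the abbreviated stride and offset names concrete, here is a hedged sketch of how a caller might pre-allocate the buffers and derive row-major strides. The shapes are assumptions inferred from the number of strides in the signature (x as M-by-N, init as rep-by-k-by-N, c as k-by-N, labels of length M), and the choice of `Int32Array` for labels is likewise an assumption. The statistics buffer is omitted because its shape is not stated in this thread.

```javascript
'use strict';

// Illustrative problem size:
var M = 6;   // number of samples
var N = 2;   // number of features
var k = 3;   // number of clusters
var rep = 4; // number of replicates

// Pre-allocated buffers (shapes are assumptions; see above):
var x = new Float64Array( M*N );
var init = new Float64Array( rep*k*N );
var c = new Float64Array( k*N );
var labels = new Int32Array( M );

// Row-major strides and zero offsets for each buffer:
var sx1 = N;
var sx2 = 1;
var ox = 0;

var si1 = k*N;
var si2 = N;
var si3 = 1;
var oi = 0;

var sc1 = N;
var sc2 = 1;
var oc = 0;

var sl = 1;
var ol = 0;

// Sanity check: the last logical element of `init` maps to the last buffer index.
var last = oi + ( (rep-1)*si1 ) + ( (k-1)*si2 ) + ( (N-1)*si3 );
console.log( last === init.length-1 );
```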
