MLE-27018 update cosine, cosineDistance and vectorScore to match codegen #1896

Merged
stevebio merged 1 commit into develop from feature/mle-27018-change-vecscore-cosine-cosinedistance on Feb 6, 2026

@stevebio stevebio commented Feb 6, 2026

The Javadoc for the cosine function now reads simply "return the cosine of the angle between two vectors". This is the same quantity as cosine similarity; the wording changes only to match codegen. The test now enforces the correct range of return values ([-1,1]) to avoid confusion.

The Javadoc for cosineDistance now states explicitly "returns the cosine distance between two vectors", again to match codegen. The test asserts that 1 - cosine(v1,v2) == cosineDistance(v1,v2) (within a floating-point delta) so that the relationship is documented in the test itself.

The vectorScore methods rename the similarity parameter to distance to match codegen. Two new overloads take an additional weight param for the ANN portion of the final hybrid score.
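
For readers who want the cosine / cosineDistance relationship spelled out, here is a minimal plain-Java sketch. It does not use the MarkLogic client API; `CosineSketch` and its two methods are hypothetical names used only for illustration, and the distance is assumed to be defined as 1 - cosine, as the test in this PR asserts.

```java
// Local illustration only: plain-Java stand-ins, not the MarkLogic client API.
final class CosineSketch {

    /** Cosine of the angle between two non-zero vectors of equal length; lies in [-1, 1] up to rounding. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance, assumed here to be 1 - cosine; lies in [0, 2] up to rounding. */
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosine(a, b);
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 2.0, 3.0};
        double[] v2 = {-2.0, 0.5, 4.0};
        System.out.println("cosine = " + cosine(v1, v2));
        System.out.println("cosineDistance = " + cosineDistance(v1, v2));
    }
}
```

Identical directions give a cosine of 1 and a distance of 0, while opposite directions give a cosine of -1 and a distance of 2, which is why the two range checks in the test differ.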

github-actions bot commented Feb 6, 2026

Copyright Validation Results
Total: 3 | Passed: 3 | Failed: 0 | Skipped: 0 | at: 2026-02-06 18:39:08 UTC | commit: d7c6ad5

✅ Valid Files

  • marklogic-client-api/src/main/java/com/marklogic/client/expression/VecExpr.java
  • marklogic-client-api/src/main/java/com/marklogic/client/impl/VecExprImpl.java
  • marklogic-client-api/src/test/java/com/marklogic/client/test/rows/VectorTest.java

✅ All files have valid copyright headers!

@rjrudin rjrudin left a comment

LGTM!

@stevebio stevebio marked this pull request as ready for review February 6, 2026 21:56
@stevebio stevebio requested a review from BillFarber as a code owner February 6, 2026 21:56
Copilot AI review requested due to automatic review settings February 6, 2026 21:56
@stevebio stevebio merged commit 52634bb into develop Feb 6, 2026
4 checks passed
@stevebio stevebio deleted the feature/mle-27018-change-vecscore-cosine-cosinedistance branch February 6, 2026 21:57

Copilot AI left a comment

Pull request overview

Updates vector function documentation and APIs to align with codegen semantics (cosine/cosineDistance wording and vectorScore “distance” terminology) and strengthens tests around expected value ranges and relationships.

Changes:

  • Updated VecExpr Javadocs for cosine/cosineDistance and reworded vectorScore docs/parameter naming from similarity → distance.
  • Added vectorScore overloads that include an additional weight parameter and implemented them in VecExprImpl.
  • Improved tests to enforce cosine range [-1, 1], validate cosineDistance identity, and cover the new vectorScore overloads.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 15 comments.

Files and descriptions:

  • marklogic-client-api/src/test/java/com/marklogic/client/test/rows/VectorTest.java: Expands vector function assertions and adds tests for new vectorScore overloads.
  • marklogic-client-api/src/main/java/com/marklogic/client/impl/VecExprImpl.java: Renames vectorScore params to distance and adds 4-arg overloads delegating to vec:vector-score.
  • marklogic-client-api/src/main/java/com/marklogic/client/expression/VecExpr.java: Updates public API documentation and signatures to match codegen terminology and new overloads.

* Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
- * @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+ * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)

Copilot AI Feb 6, 2026

The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.

* <p>
* Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)

Copilot AI Feb 6, 2026

The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.

* <p>
* Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
* @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)

Copilot AI Feb 6, 2026

The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.

* Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function.
* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
- * @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+ * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)

Copilot AI Feb 6, 2026

The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.

Suggested change
- * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+ * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to vec:cosine-distance() or vec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)

* @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>)
- * @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
- * @param similarityWeight The weight of the vector similarity on the score. The default value is 0.1. If 0.0 is passed in, vector similarity has no effect. If passed a value less than 0.0 or greater than 1.0, throw VEC-VECTORSCORE. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)
+ * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>)

Copilot AI Feb 6, 2026

The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.

public ServerExpression vectorScore(ServerExpression score, ServerExpression distance, ServerExpression distanceWeight);
/**
- * A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application.
+ * A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.

Copilot AI Feb 6, 2026

The vectorScore Javadoc repeats a very long, single-line explanation multiple times, which is hard to read in generated docs. Consider restructuring it with proper Javadoc formatting (e.g., separate paragraphs and an HTML list for the annScore/weight bullet points) and referencing the formula once to reduce repetition across overloads.
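
One possible shape for that restructuring, together with the blending formula quoted in the Javadoc, is sketched below. This is only an illustration: `HybridScoreSketch` and `blend` are hypothetical names, and annScore is taken as an input because the quoted text does not spell out how the server derives it from distance and distanceWeight.

```java
// Illustration only: a local helper, not part of VecExpr or the MarkLogic server.
final class HybridScoreSketch {

    /**
     * Blends a vector-derived score with a cts score.
     *
     * <p>Formula, as quoted in the Javadoc under review:
     * {@code score = weight * annScore + (1 - weight) * ctsScore}.</p>
     *
     * <ul>
     *   <li>{@code annScore}: derived from the distance and distanceWeight;
     *       passed in directly here because its derivation is not given.</li>
     *   <li>{@code weight}: contribution of annScore relative to ctsScore;
     *       0.5 balances both equally.</li>
     * </ul>
     *
     * @param weight   share of the final score given to annScore
     * @param annScore vector-derived score (an input in this sketch)
     * @param ctsScore traditional cts score
     * @return the blended score
     */
    static double blend(double weight, double annScore, double ctsScore) {
        return weight * annScore + (1.0 - weight) * ctsScore;
    }

    public static void main(String[] args) {
        // With weight 0.5, both inputs contribute equally.
        System.out.println(blend(0.5, 0.8, 0.2));
        // With weight 0.0, only the cts score remains.
        System.out.println(blend(0.0, 0.8, 0.2));
    }
}
```

Writing the formula once in a paragraph and the two terms as list items keeps the generated docs readable, and the other overloads could point to it with a `{@link}` instead of repeating the text.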

public ServerExpression vectorScore(ServerExpression score, double distance, double distanceWeight, double weight);
/**
- * A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application.
+ * A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results.

Copilot AI Feb 6, 2026

The vectorScore Javadoc repeats a very long, single-line explanation multiple times, which is hard to read in generated docs. Consider restructuring it with proper Javadoc formatting (e.g., separate paragraphs and an HTML list for the annScore/weight bullet points) and referencing the formula once to reduce repetition across overloads.

Comment on lines +271 to +273
.bind(op.as("vectorScore1", op.vec.vectorScore(op.xs.unsignedInt(100), 0.3, 0.5, 0.5)))
.bind(op.as("vectorScore2", op.vec.vectorScore(op.xs.unsignedInt(100), 0.3, 0.8, 0.7)))
.bind(op.as("vectorScore3", op.vec.vectorScore(op.xs.unsignedInt(100), 0.3, 0.5, 0.3)));

Copilot AI Feb 6, 2026

The assertion message for score1 vs score2 claims only distanceWeight differs, but vectorScore2 changes both distanceWeight (0.8) and weight (0.7). This makes the test less precise and harder to diagnose if it fails. Consider changing vectorScore2 to vary only one parameter at a time (e.g., keep weight the same as vectorScore1 when testing distanceWeight differences).
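
A rough sketch of the one-parameter-at-a-time idea follows. It cannot call the server, so `standInScore` is a made-up local stand-in: its annScore shape is an assumption chosen only so that a larger distanceWeight lowers annScore for the same distance, and it is not the real vec:vector-score computation. The argument triples mirror the test, with the second case changed to (0.3, 0.8, 0.5) so that only distanceWeight differs from the baseline.

```java
// Stand-in only: real scoring happens server-side in vec:vector-score.
final class OneParamAtATimeSketch {

    /** Hypothetical scoring function used purely to illustrate the test design. */
    static double standInScore(double ctsScore, double distance, double distanceWeight, double weight) {
        // Assumed placeholder: decreases as distanceWeight * distance grows.
        double annScore = 1.0 / (1.0 + distanceWeight * distance);
        return weight * annScore + (1.0 - weight) * ctsScore;
    }

    public static void main(String[] args) {
        double cts = 0.6;
        double distance = 0.3;

        // Baseline: distanceWeight = 0.5, weight = 0.5
        double baseline = standInScore(cts, distance, 0.5, 0.5);
        // Vary only distanceWeight (0.5 -> 0.8); weight stays at 0.5
        double onlyDistanceWeight = standInScore(cts, distance, 0.8, 0.5);
        // Vary only weight (0.5 -> 0.3); distanceWeight stays at 0.5
        double onlyWeight = standInScore(cts, distance, 0.5, 0.3);

        System.out.println("baseline           = " + baseline);
        System.out.println("onlyDistanceWeight = " + onlyDistanceWeight);
        System.out.println("onlyWeight         = " + onlyWeight);
    }
}
```

With this layout, each assertion message ("Different distanceWeight values..." and "Different weight values...") describes exactly the one argument that changed, so a failure points at a single parameter.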

Comment on lines +289 to +291
// Different weight parameters should produce different scores
assertNotEquals(score1, score2, "Different distanceWeight values should produce different scores");
assertNotEquals(score1, score3, "Different weight values should produce different scores");

Copilot AI Feb 6, 2026

The assertion message for score1 vs score2 claims only distanceWeight differs, but vectorScore2 changes both distanceWeight (0.8) and weight (0.7). This makes the test less precise and harder to diagnose if it fails. Consider changing vectorScore2 to vary only one parameter at a time (e.g., keep weight the same as vectorScore1 when testing distanceWeight differences).

assertTrue((cosine >= -1) && (cosine <= 1), "Cosine must be between -1 and 1, got: " + cosine);

double cosineDistanceEmbedding = row.getDouble("cosineDistanceEmbedding");
assertTrue(cosineDistanceEmbedding >= 0 && cosineDistanceEmbedding <= 2, "Cosine distance must be between 0 and 2, got: " + cosineDistanceEmbedding);

Copilot AI Feb 6, 2026

This line has inconsistent indentation (mixed tabs/spaces) compared to surrounding assertions, which can cause noisy diffs and reduce readability. Please align indentation with the rest of the method.

Suggested change
assertTrue(cosineDistanceEmbedding >= 0 && cosineDistanceEmbedding <= 2, "Cosine distance must be between 0 and 2, got: " + cosineDistanceEmbedding);
assertTrue(cosineDistanceEmbedding >= 0 && cosineDistanceEmbedding <= 2, "Cosine distance must be between 0 and 2, got: " + cosineDistanceEmbedding);
