The documentation for the cosine function now reads just "return the cosine of the angle between two vectors". This is the same thing as cosine similarity, but the text is changed to match codegen. The test now enforces the correct range of return values ([-1, 1]) to avoid confusion. The Javadoc for cosineDistance now states explicitly "returns the cosine distance between two vectors" to match codegen. The test also checks explicitly that 1 - cosine(v1, v2) == cosineDistance(v1, v2) (within a floating-point delta) to document the relationship. The vectorScore methods are updated: the similarity parameter is renamed to distance to match codegen. Two new methods are added that take an additional weight parameter for the ANN portion of the final hybrid score.
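For reference, the relationship the test documents can be sketched in plain Java without a MarkLogic connection. This is only an illustration under stated assumptions: `CosineSketch`, `cosine`, and `cosineDistance` are hypothetical local helpers written for this example, not the `VecExpr` API or the server implementation, and the inputs are assumed to be non-zero vectors of equal length.

```java
final class CosineSketch {
    // Hypothetical helper: cosine of the angle between two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Hypothetical helper: cosine distance defined as 1 - cosine, so it ranges over [0, 2].
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosine(a, b);
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 2.0, 3.0};
        double[] v2 = {-2.0, 0.5, 4.0};
        double cos = cosine(v1, v2);
        double dist = cosineDistance(v1, v2);
        System.out.println("cosine = " + cos + ", cosineDistance = " + dist);
        // Compare with a delta rather than ==, as the test in this PR does.
        System.out.println("identity holds: " + (Math.abs((1.0 - cos) - dist) < 1e-9));
    }
}
```

Comparing with a small delta rather than exact equality mirrors the "within a floating-point delta" wording above, since the two values may be computed along different code paths.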
Copyright Validation Results
✅ Valid Files
✅ All files have valid copyright headers!
Pull request overview
Updates vector function documentation and APIs to align with codegen semantics (cosine/cosineDistance wording and vectorScore “distance” terminology) and strengthens tests around expected value ranges and relationships.
Changes:
- Updated VecExpr Javadocs for cosine/cosineDistance and reworded vectorScore docs/parameter naming from similarity → distance.
- Added vectorScore overloads that include an additional weight parameter and implemented them in VecExprImpl.
- Improved tests to enforce cosine range [-1, 1], validate cosineDistance identity, and cover the new vectorScore overloads.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| marklogic-client-api/src/test/java/com/marklogic/client/test/rows/VectorTest.java | Expands vector function assertions and adds tests for new vectorScore overloads. |
| marklogic-client-api/src/main/java/com/marklogic/client/impl/VecExprImpl.java | Renames vectorScore params to distance and adds 4-arg overloads delegating to vec:vector-score. |
| marklogic-client-api/src/main/java/com/marklogic/client/expression/VecExpr.java | Updates public API documentation and signatures to match codegen terminology and new overloads. |
| * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function. | ||
| * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>) | ||
| * @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) | ||
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) |
The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.
| * <p> | ||
| * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function. | ||
| * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>) | ||
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) |
The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.
| * <p> | ||
| * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function. | ||
| * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>) | ||
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) |
The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.
| * Provides a client interface to the <a href="http://docs.marklogic.com/vec:vector-score" target="mlserverdoc">vec:vector-score</a> server function. | ||
| * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>) | ||
| * @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) | ||
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) |
The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) | |
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to vec:cosine-distance() or vec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) |
| * @param score The cts:score of the matching document. (of <a href="{@docRoot}/doc-files/types/xs_unsignedInt.html">xs:unsignedInt</a>) | ||
| * @param similarity The similarity between the vector in the matching document and the query vector. The result of a call to ovec:cosine(). In the case that the vectors are normalized, pass ovec:dot-product(). Note that vec:euclidean-distance() should not be used here. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) | ||
| * @param similarityWeight The weight of the vector similarity on the score. The default value is 0.1. If 0.0 is passed in, vector similarity has no effect. If passed a value less than 0.0 or greater than 1.0, throw VEC-VECTORSCORE. (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) | ||
| * @param distance The distance between the vector in the matching document and the query vector. Examples, the result of a call to ovec:cosine-distance() or ovec:euclidean-distance(). (of <a href="{@docRoot}/doc-files/types/xs_double.html">xs:double</a>) |
The Javadoc refers to ovec:cosine-distance() / ovec:euclidean-distance(), but this interface is for vec:* functions and links to vec:vector-score. This looks like a documentation typo; update the examples to use the correct function prefix (e.g., vec:cosine-distance() / vec:euclidean-distance()) for consistency and to avoid confusing API users.
| public ServerExpression vectorScore(ServerExpression score, ServerExpression distance, ServerExpression distanceWeight); | ||
| /** | ||
| * A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application. | ||
| * A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results. |
The vectorScore Javadoc repeats a very long, single-line explanation multiple times, which is hard to read in generated docs. Consider restructuring it with proper Javadoc formatting (e.g., separate paragraphs and an HTML list for the annScore/weight bullet points) and referencing the formula once to reduce repetition across overloads.
| public ServerExpression vectorScore(ServerExpression score, double distance, double distanceWeight, double weight); | ||
| /** | ||
| * A helper function that returns a hybrid score using a cts score and a vector similarity calculation result. You can tune the effect of the vector similarity on the score using the similarityWeight option. The ideal value for similarityWeight depends on your application. | ||
| * A helper function that returns a hybrid score using a cts score and a vector distance calculation result. You can tune the effect of the vector distance on the score using the distanceWeight option. The ideal value for distanceWeight depends on your application. The hybrid score is calculated using the formula: score = weight * annScore + (1 - weight) * ctsScore. - annScore is derived from the distance and distanceWeight, where a larger distanceWeight reduces the annScore for the same distance. - weight determines the contribution of the annScore and ctsScore to the final score. A weight of 0.5 balances both equally. This formula allows you to combine traditional cts scoring with vector-based distance scoring, providing a flexible way to rank results. |
The vectorScore Javadoc repeats a very long, single-line explanation multiple times, which is hard to read in generated docs. Consider restructuring it with proper Javadoc formatting (e.g., separate paragraphs and an HTML list for the annScore/weight bullet points) and referencing the formula once to reduce repetition across overloads.
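One possible shape for the restructured Javadoc, shown on a small local method so the formula is stated once with paragraphs and an HTML list. This is a sketch only: `HybridScoreSketch` and `hybridScore` are hypothetical names, the method takes `annScore` as an input because the diff does not spell out how it is derived from distance and distanceWeight, and the [0, 1] range check on `weight` is an assumption rather than documented server behavior.

```java
final class HybridScoreSketch {
    /**
     * Combines a cts score with an ANN-derived score into a hybrid score.
     * <p>
     * The formula, as quoted in the Javadoc under review, is:
     * {@code score = weight * annScore + (1 - weight) * ctsScore}
     * <ul>
     *   <li>{@code annScore} is derived from the distance and distanceWeight; a larger
     *       distanceWeight reduces the annScore for the same distance.</li>
     *   <li>{@code weight} sets the contribution of annScore versus ctsScore;
     *       0.5 balances both equally.</li>
     * </ul>
     *
     * @param ctsScore the cts score of the matching document
     * @param annScore the score already derived from distance and distanceWeight
     * @param weight   the contribution of annScore (assumed here to lie in [0, 1])
     * @return the hybrid score
     */
    static double hybridScore(double ctsScore, double annScore, double weight) {
        // Assumption for this sketch: reject weights outside [0, 1].
        if (weight < 0.0 || weight > 1.0) {
            throw new IllegalArgumentException("weight must be in [0, 1], got: " + weight);
        }
        return weight * annScore + (1.0 - weight) * ctsScore;
    }

    public static void main(String[] args) {
        System.out.println(hybridScore(100.0, 0.8, 0.5));
    }
}
```

The other overloads could then point at this description with a `{@link}` or `@see` tag instead of repeating the full paragraph.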
| .bind(op.as("vectorScore1", op.vec.vectorScore(op.xs.unsignedInt(100), 0.3, 0.5, 0.5))) | ||
| .bind(op.as("vectorScore2", op.vec.vectorScore(op.xs.unsignedInt(100), 0.3, 0.8, 0.7))) | ||
| .bind(op.as("vectorScore3", op.vec.vectorScore(op.xs.unsignedInt(100), 0.3, 0.5, 0.3))); |
The assertion message for score1 vs score2 claims only distanceWeight differs, but vectorScore2 changes both distanceWeight (0.8) and weight (0.7). This makes the test less precise and harder to diagnose if it fails. Consider changing vectorScore2 to vary only one parameter at a time (e.g., keep weight the same as vectorScore1 when testing distanceWeight differences).
| // Different weight parameters should produce different scores | ||
| assertNotEquals(score1, score2, "Different distanceWeight values should produce different scores"); | ||
| assertNotEquals(score1, score3, "Different weight values should produce different scores"); |
The assertion message for score1 vs score2 claims only distanceWeight differs, but vectorScore2 changes both distanceWeight (0.8) and weight (0.7). This makes the test less precise and harder to diagnose if it fails. Consider changing vectorScore2 to vary only one parameter at a time (e.g., keep weight the same as vectorScore1 when testing distanceWeight differences).
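The one-parameter-at-a-time idea can be illustrated with a local stand-in, so that each comparison isolates a single argument. Everything here is hypothetical: `annScore` uses an invented form, `1 / (1 + distanceWeight * distance)`, chosen only because it decreases as distanceWeight grows, and `vectorScore` below is a plain Java function, not `op.vec.vectorScore` or the server's calculation.

```java
final class OneAtATimeSketch {
    // HYPOTHETICAL stand-in, not the server formula: decreases as distanceWeight grows.
    static double annScore(double distance, double distanceWeight) {
        return 1.0 / (1.0 + distanceWeight * distance);
    }

    // Combines the stand-in annScore with the cts score using the formula quoted in the Javadoc.
    static double vectorScore(double ctsScore, double distance, double distanceWeight, double weight) {
        return weight * annScore(distance, distanceWeight) + (1.0 - weight) * ctsScore;
    }

    public static void main(String[] args) {
        // Baseline, then two variants that each change exactly one argument.
        double baseline = vectorScore(100.0, 0.3, 0.5, 0.5);
        double onlyDistanceWeightChanged = vectorScore(100.0, 0.3, 0.8, 0.5);
        double onlyWeightChanged = vectorScore(100.0, 0.3, 0.5, 0.3);

        System.out.println("baseline                  = " + baseline);
        System.out.println("distanceWeight 0.5 -> 0.8 = " + onlyDistanceWeightChanged);
        System.out.println("weight 0.5 -> 0.3         = " + onlyWeightChanged);
    }
}
```

In the real test this would correspond to keeping the last argument of vectorScore2 at 0.5, so that a failure of the first assertion can only be attributed to distanceWeight.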
| assertTrue((cosine >= -1) && (cosine <= 1), "Cosine must be between -1 and 1, got: " + cosine); | ||
| double cosineDistanceEmbedding = row.getDouble("cosineDistanceEmbedding"); | ||
| assertTrue(cosineDistanceEmbedding >= 0 && cosineDistanceEmbedding <= 2, "Cosine distance must be between 0 and 2, got: " + cosineDistanceEmbedding); |
This line has inconsistent indentation (mixed tabs/spaces) compared to surrounding assertions, which can cause noisy diffs and reduce readability. Please align indentation with the rest of the method.
| assertTrue(cosineDistanceEmbedding >= 0 && cosineDistanceEmbedding <= 2, "Cosine distance must be between 0 and 2, got: " + cosineDistanceEmbedding); | |
| assertTrue(cosineDistanceEmbedding >= 0 && cosineDistanceEmbedding <= 2, "Cosine distance must be between 0 and 2, got: " + cosineDistanceEmbedding); |