Jira task: https://scylladb.atlassian.net/browse/DRIVER-735
Jira epic: https://scylladb.atlassian.net/browse/DRIVER-729
Copied from Jira epic DRIVER-729:
This issue tracks client-library support for the Alternator vector search API.
Starting point / reference implementation: scylladb/alternator-client-java#87
This is distinct from DRIVER-112, which tracks CQL/native-driver vector support. Alternator vector search is a DynamoDB-compatible API extension, so standard AWS SDK models do not know the extra request/response fields and may reject or drop them unless the client library patches the model or intercepts raw JSON.
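Core/server references
- Limit <= 1000, KeyConditions rejection: scylladb/scylladb#29776 (alternator: add even more vector search features)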
Related vector-store references:
Server docs/tests:
Required client API surface
CreateTable.VectorIndexes
Support vector index definitions with IndexName, VectorAttribute, optional Projection, and optional SimilarityFunction.
VectorAttribute contains AttributeName and Dimensions. SimilarityFunction values are COSINE, EUCLIDEAN, and DOT_PRODUCT; COSINE is the server default.
Projection follows the DynamoDB secondary-index projection shape: ProjectionType plus optional NonKeyAttributes. Current merged server support is KEYS_ONLY. Projection=INCLUDE is being added by scylladb/scylladb#29959. Projection=ALL is not currently supported.
Useful client-side validation: Dimensions must be positive and the server maximum is 16000; IndexName uses DynamoDB-style table/index name rules; the vector attribute cannot be a table key or secondary-index key; duplicate vector index names and duplicate vector indexes on the same attribute are invalid.
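The validation rules above can be sketched as a small pre-flight check. This is only an illustration, not the reference implementation: the function name `validate_vector_indexes`, the `key_attribute_names` parameter, and the exact name pattern (the usual DynamoDB rule of 3 to 255 characters from `a-z`, `A-Z`, `0-9`, `_`, `-`, `.`) are assumptions and may differ from what the server enforces.

```python
import re

MAX_DIMENSIONS = 16000  # server maximum stated above
SIMILARITY_FUNCTIONS = {"COSINE", "EUCLIDEAN", "DOT_PRODUCT"}
# Assumption: the usual DynamoDB naming rule; the server may be stricter.
NAME_RE = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def validate_vector_indexes(vector_indexes, key_attribute_names):
    """Return a list of problems found in a CreateTable.VectorIndexes value.

    key_attribute_names: names used as table keys or secondary-index keys.
    """
    problems = []
    seen_names, seen_attributes = set(), set()
    for index in vector_indexes:
        name = index.get("IndexName", "")
        vector_attribute = index.get("VectorAttribute", {})
        attribute = vector_attribute.get("AttributeName")
        dimensions = vector_attribute.get("Dimensions")
        similarity = index.get("SimilarityFunction", "COSINE")  # server default

        if not NAME_RE.fullmatch(name):
            problems.append(f"{name!r}: invalid IndexName")
        if name in seen_names:
            problems.append(f"{name!r}: duplicate IndexName")
        if not attribute:
            problems.append(f"{name!r}: VectorAttribute.AttributeName is required")
        elif attribute in key_attribute_names:
            problems.append(f"{name!r}: {attribute!r} is a key attribute")
        elif attribute in seen_attributes:
            problems.append(f"{name!r}: {attribute!r} already has a vector index")
        if (isinstance(dimensions, bool) or not isinstance(dimensions, int)
                or not 1 <= dimensions <= MAX_DIMENSIONS):
            problems.append(f"{name!r}: Dimensions must be in 1..{MAX_DIMENSIONS}")
        if similarity not in SIMILARITY_FUNCTIONS:
            problems.append(f"{name!r}: unknown SimilarityFunction {similarity!r}")

        seen_names.add(name)
        if attribute:
            seen_attributes.add(attribute)
    return problems
```

Returning a list of problems rather than raising on the first one lets a client report every issue in a single error before the request is sent.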
UpdateTable.VectorIndexUpdates
Support VectorIndexUpdates with exactly one operation per request: Create or Delete. Create has the same shape as CreateTable.VectorIndexes. Delete contains IndexName. There is no vector-index Update operation.
Do not combine vector index updates with GSI updates in the same request. After scylladb/scylladb#29826, vector indexes and Alternator Streams can coexist on the same table, but stream status changes and vector index create/delete should not be combined in one UpdateTable request.
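A minimal sketch of the one-operation rule, working on plain request dictionaries. The helper names are hypothetical, and the sketch assumes that `VectorIndexUpdates` is a list with one element, mirroring the shape of `GlobalSecondaryIndexUpdates`; the text above does not spell out the exact JSON shape.

```python
def build_vector_index_update(table_name, create=None, delete_index_name=None):
    """Build an UpdateTable request body with exactly one vector index operation."""
    if (create is None) == (delete_index_name is None):
        raise ValueError("specify exactly one of create or delete_index_name")
    if create is not None:
        # Same shape as one entry of CreateTable.VectorIndexes.
        operation = {"Create": create}
    else:
        operation = {"Delete": {"IndexName": delete_index_name}}
    # Assumption: a one-element list, like GlobalSecondaryIndexUpdates.
    return {"TableName": table_name, "VectorIndexUpdates": [operation]}


def check_not_combined(request):
    """Raise if a vector index update is combined with GSI or stream changes."""
    if "VectorIndexUpdates" not in request:
        return
    if len(request["VectorIndexUpdates"]) != 1:
        raise ValueError("exactly one vector index operation per request")
    for other in ("GlobalSecondaryIndexUpdates", "StreamSpecification"):
        if other in request:
            raise ValueError(f"do not combine VectorIndexUpdates with {other}")
```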
DescribeTable.Table.VectorIndexes
Expose VectorIndexes from DescribeTable. Returned fields include IndexName, VectorAttribute, Projection, SimilarityFunction, IndexStatus, and Backfilling. Client libraries should provide or document a waiter/helper that waits for IndexStatus == ACTIVE before vector queries.
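One possible shape for such a waiter is sketched below. It is not tied to any SDK: `describe_table` is a hypothetical callable that returns the parsed DescribeTable response as a dictionary, and the timeout and polling defaults are arbitrary. The `sleep` and `clock` parameters exist only so the helper can be tested without real delays.

```python
import time


def wait_for_vector_index_active(describe_table, table_name, index_name,
                                 timeout_s=300.0, poll_s=2.0,
                                 sleep=time.sleep, clock=time.monotonic):
    """Poll DescribeTable until the named vector index reports ACTIVE.

    describe_table: callable(table_name) -> parsed DescribeTable response.
    Returns the matching VectorIndexes entry, or raises TimeoutError.
    """
    deadline = clock() + timeout_s
    while True:
        table = describe_table(table_name)["Table"]
        for index in table.get("VectorIndexes", []):
            if (index.get("IndexName") == index_name
                    and index.get("IndexStatus") == "ACTIVE"):
                return index
        if clock() >= deadline:
            raise TimeoutError(
                f"vector index {index_name!r} on {table_name!r} is not ACTIVE")
        sleep(poll_s)
```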
AttributeValue.FLOAT32VECTOR
Support the Alternator-specific AttributeValue member FLOAT32VECTOR, for example:
{"FLOAT32VECTOR": [0.1, -0.3, 0.7]}
Values are JSON numbers, not DynamoDB numeric strings. Each value must be finite and representable as a float32. The vector length must match the index dimensions. The server also accepts standard DynamoDB L-of-N vectors, but FLOAT32VECTOR is the compact, optimized representation.
Most AWS SDKs do not know FLOAT32VECTOR. A client library may need to patch the SDK service model or intercept serialized request/response JSON. If a local placeholder representation is used internally, it must be replaced with real FLOAT32VECTOR JSON before sending the request.
On reads, preserve enough information for callers to re-write the value as FLOAT32VECTOR. If the client converts it only to a normal L-of-N value, copying the item back will store it as a regular list instead of the compact vector type.
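The write-side checks and the read-side round trip could look roughly like this. Everything here is illustrative: `Float32Vector`, `to_float32vector`, `decode_attribute_value`, and `encode_value` are hypothetical names, and a real client would hook the same logic into its SDK's serializer rather than work on bare dictionaries.

```python
import math
import struct


class Float32Vector(tuple):
    """Marker type: remembers that a value arrived as FLOAT32VECTOR."""


def to_float32vector(values, dimensions=None):
    """Return {"FLOAT32VECTOR": [...]} after checking every component."""
    if dimensions is not None and len(values) != dimensions:
        raise ValueError(f"expected {dimensions} components, got {len(values)}")
    checked = []
    for value in values:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"not a number: {value!r}")
        try:
            # Round-trip through IEEE 754 binary32; overflow raises OverflowError.
            as_float32 = struct.unpack("<f", struct.pack("<f", value))[0]
        except OverflowError as exc:
            raise ValueError(f"does not fit float32: {value!r}") from exc
        if not math.isfinite(as_float32):
            raise ValueError(f"not finite: {value!r}")
        checked.append(value)  # stays a JSON number, never a numeric string
    return {"FLOAT32VECTOR": checked}


def decode_attribute_value(attribute_value):
    """Keep the vector type on reads instead of flattening to L-of-N."""
    if "FLOAT32VECTOR" in attribute_value:
        return Float32Vector(attribute_value["FLOAT32VECTOR"])
    return attribute_value


def encode_value(value):
    """Write a previously read vector back as FLOAT32VECTOR."""
    if isinstance(value, Float32Vector):
        return to_float32vector(value)
    return value
```

Because `decode_attribute_value` returns a distinct type, an item that is read and then written back keeps the compact vector representation.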
Query.VectorSearch
Support Query.VectorSearch with QueryVector and optional ReturnScores.
Query rules: IndexName is required and must name a vector index. Limit is required and must be positive. scylladb/scylladb#29776 caps Limit at 1000. VectorSearch.QueryVector is required and may be FLOAT32VECTOR or L-of-N. ConsistentRead=true is rejected. ExclusiveStartKey is rejected because there is no pagination. ScanIndexForward is rejected because results are always nearest/best first. Legacy QueryFilter is rejected.
ReturnScores values are NONE and SIMILARITY. NONE is the default. With SIMILARITY, the response includes Scores, a numeric array parallel to Items where Scores[i] belongs to Items[i]. ReturnScores=SIMILARITY is invalid with Select=COUNT.
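The query rules lend themselves to a client-side check before the request is sent. The sketch below is an assumption-laden illustration: the function names are hypothetical, `ReturnScores` is assumed to be nested inside `VectorSearch` next to `QueryVector`, and the `limit_cap` parameter is optional because the 1000 cap only applies to servers that include scylladb/scylladb#29776.

```python
def validate_vector_query(request, limit_cap=1000):
    """Return a list of problems with a Query request that uses VectorSearch."""
    problems = []
    if not request.get("IndexName"):
        problems.append("IndexName is required")
    limit = request.get("Limit")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        problems.append("Limit is required and must be positive")
    elif limit_cap is not None and limit > limit_cap:
        problems.append(f"Limit must not exceed {limit_cap}")
    vector_search = request.get("VectorSearch") or {}
    if "QueryVector" not in vector_search:
        problems.append("VectorSearch.QueryVector is required")
    if request.get("ConsistentRead") is True:
        problems.append("ConsistentRead=true is rejected")
    for rejected in ("ExclusiveStartKey", "ScanIndexForward", "QueryFilter"):
        if rejected in request:
            problems.append(f"{rejected} is rejected for vector search")
    return_scores = vector_search.get("ReturnScores", "NONE")  # NONE is the default
    if return_scores not in ("NONE", "SIMILARITY"):
        problems.append(f"unknown ReturnScores value {return_scores!r}")
    if return_scores == "SIMILARITY" and request.get("Select") == "COUNT":
        problems.append("ReturnScores=SIMILARITY is invalid with Select=COUNT")
    return problems


def pair_items_with_scores(response):
    """Zip Items with the parallel Scores array: Scores[i] belongs to Items[i]."""
    items, scores = response["Items"], response["Scores"]
    if len(items) != len(scores):
        raise ValueError("Scores and Items have different lengths")
    return list(zip(items, scores))
```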
Score semantics: higher is better. COSINE scores are in [0, 1] using (1 + cosine) / 2. EUCLIDEAN scores are in [0, 1] using 1 / (1 + distance). DOT_PRODUCT is unbounded and may be negative or greater than 1.
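For documentation or test fixtures, the score formulas can be reproduced locally. This sketch assumes that `distance` in the EUCLIDEAN formula is the plain (not squared) L2 distance, which the text above does not state explicitly, and it computes in double precision, so results may differ slightly from server scores computed on float32 data.

```python
import math


def dot_product_score(a, b):
    """DOT_PRODUCT: the raw dot product; unbounded, may be negative."""
    return sum(x * y for x, y in zip(a, b))


def cosine_score(a, b):
    """COSINE: (1 + cosine) / 2, which maps cosine in [-1, 1] onto [0, 1]."""
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return (1 + dot_product_score(a, b) / norms) / 2


def euclidean_score(a, b):
    """EUCLIDEAN: 1 / (1 + distance); assumes plain L2 distance."""
    return 1 / (1 + math.dist(a, b))
```

For example, identical directions give a COSINE score of 1.0 and opposite directions give 0.0, while two points at L2 distance 5 get a EUCLIDEAN score of 1/6.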
Standard Query options: Select, ProjectionExpression, AttributesToGet, and FilterExpression are supported. FilterExpression is a post-filter and can return fewer than Limit results. The current default Select behavior is ALL_PROJECTED_ATTRIBUTES. Select=COUNT returns counts and no Items.
Filtering caveat: current merged server behavior ignores KeyConditionExpression for vector search. scylladb/scylladb#29776 changes KeyConditionExpression into a pre-filter over projected attributes and rejects legacy KeyConditions. Supported pre-filter operators after that PR are =, <, <=, >, >=, IN, BETWEEN, and AND. OR, NOT, and <> are not supported. The pre-filter is limited to projected scalar attributes, currently S and N.
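A client could warn early about operators that the pre-filter does not support. The check below is a rough token scan, not an expression parser: the function name is hypothetical, it strips `#name` and `:value` placeholders before matching, and it only makes sense against servers that include scylladb/scylladb#29776.

```python
import re

# Tokens that the pre-filter rejects according to the list above.
UNSUPPORTED = re.compile(r"<>|\b(?:OR|NOT)\b", re.IGNORECASE)
# Expression placeholders such as #name and :value are not operators.
PLACEHOLDER = re.compile(r"[#:][A-Za-z0-9_]+")


def unsupported_prefilter_tokens(key_condition_expression):
    """Return unsupported operator tokens found in a KeyConditionExpression."""
    stripped = PLACEHOLDER.sub(" ", key_condition_expression)
    return [match.group(0).upper() for match in UNSUPPORTED.finditer(stripped)]
```

An empty result does not prove the expression is valid; the server remains the authority on what it accepts.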
Implementation guidance
- Preserve normal DynamoDB behavior for requests that do not use Alternator vector extensions.
- Prefer a typed, language-idiomatic API, but keep a raw escape hatch for unsupported or not-yet-modeled server fields.
- If the underlying SDK drops unknown response fields, extract VectorIndexes, Scores, and FLOAT32VECTOR before normal SDK parsing.
- Integration tests should skip cleanly when the server does not support vector indexes or when vector-store is disabled.
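The extraction step from the guidance above might be sketched as follows, operating on the raw response body before it reaches the SDK parser. The function name and the shape of the returned dictionary are hypothetical, and the sketch only looks for `VectorIndexes` under `Table` (the DescribeTable shape); how the raw body is intercepted depends on the SDK and is not shown.

```python
import json


def extract_vector_extensions(raw_body):
    """Collect Alternator vector fields from a raw JSON response body.

    Returns Scores, Table.VectorIndexes, and every FLOAT32VECTOR value keyed
    by its path in the document, so the values can be reattached after the
    SDK has parsed (and possibly dropped) them.
    """
    document = json.loads(raw_body)
    extras = {"Scores": document.get("Scores"),
              "VectorIndexes": None,
              "Float32Vectors": {}}
    table = document.get("Table")
    if isinstance(table, dict):
        extras["VectorIndexes"] = table.get("VectorIndexes")

    def walk(node, path):
        if isinstance(node, dict):
            vector = node.get("FLOAT32VECTOR")
            # An AttributeValue has exactly one member; a list value here means
            # the typed member, not an item attribute that shares the name.
            if len(node) == 1 and isinstance(vector, list):
                extras["Float32Vectors"][path] = vector
                return
            for key, value in node.items():
                walk(value, path + (key,))
        elif isinstance(node, list):
            for position, value in enumerate(node):
                walk(value, path + (position,))

    walk(document, ())
    return extras
```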
Acceptance criteria
- CreateTable.VectorIndexes can be serialized and sent.
- UpdateTable.VectorIndexUpdates create/delete can be serialized and sent.
- DescribeTable.Table.VectorIndexes can be parsed and exposed.
- FLOAT32VECTOR can be written and read without being lost or silently converted into an incompatible representation.
- Query.VectorSearch works with FLOAT32VECTOR and L-of-N query vectors.
- ReturnScores=SIMILARITY is exposed and Scores order matches Items order.
- Invalid combinations are covered: missing IndexName, missing Limit, missing QueryVector, ConsistentRead=true, ExclusiveStartKey, ScanIndexForward, and ReturnScores=SIMILARITY with Select=COUNT.
- Documentation/examples explain server-version caveats for KeyConditionExpression pre-filtering, Projection=INCLUDE, Limit <= 1000, and Streams coexistence.