Using Vector Search
Create the vector keyspace
Create the keyspace you want to use for your Vector Search table. This example uses cycling as the keyspace name:
CREATE KEYSPACE IF NOT EXISTS cycling
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
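Create the vector table
Create a new table in your keyspace that includes the comment_vector vector column. The following code creates a column that holds vectors of five float values: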
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
You can also alter an existing table to add a vector column:
ALTER TABLE cycling.comments_vs
ADD comment_vector VECTOR <FLOAT, 5>;
Create the vector index
Create a custom index using Storage-Attached Indexing (SAI):
CREATE INDEX IF NOT EXISTS ann_index
ON cycling.comments_vs(comment_vector) USING 'sai';
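For more information about SAI, see the Storage-Attached Indexing documentation.
An index can also be created with options that define the similarity function used to compare vectors, such as cosine, dot product, or Euclidean similarity.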
Load vector data into your database
Insert data into the table using the new vector type:
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
e7ae5cf3-d358-4d99-b900-85902fda9bb0,
'2017-02-14 12:43:20-0800',
'Raining too hard should have postponed',
'Alex',
[0.45, 0.09, 0.01, 0.2, 0.11]
);
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
e7ae5cf3-d358-4d99-b900-85902fda9bb0,
'2017-03-21 13:11:09.999-0800',
'Second rest stop was out of water',
'Alex',
[0.99, 0.5, 0.99, 0.1, 0.34]
);
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
e7ae5cf3-d358-4d99-b900-85902fda9bb0,
'2017-04-01 06:33:02.16-0800',
'LATE RIDERS SHOULD NOT DELAY THE START',
'Alex',
[0.9, 0.54, 0.12, 0.1, 0.95]
);
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
c7fceba0-c141-4207-9494-a29f9809de6f,
totimestamp(now()),
'The gift certificate for winning was the best',
'Amy',
[0.13, 0.8, 0.35, 0.17, 0.03]
);
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
c7fceba0-c141-4207-9494-a29f9809de6f,
'2017-02-17 12:43:20.234+0400',
'Glad you ran the race in the rain',
'Amy',
[0.3, 0.34, 0.2, 0.78, 0.25]
);
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
c7fceba0-c141-4207-9494-a29f9809de6f,
'2017-03-22 05:16:59.001+0400',
'Great snacks at all reststops',
'Amy',
[0.1, 0.4, 0.1, 0.52, 0.09]
);
INSERT INTO cycling.comments_vs (record_id, id, created_at, comment, commenter, comment_vector)
VALUES (
now(),
c7fceba0-c141-4207-9494-a29f9809de6f,
'2017-04-01 17:43:08.030+0400',
'Last climb was a killer',
'Amy',
[0.3, 0.75, 0.2, 0.2, 0.5]
);
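When the embedding values come from application code rather than being typed by hand, each vector must contain exactly five numbers to match the VECTOR <FLOAT, 5> column, and a vector of any other length is rejected by the database. The following Python sketch shows one way to check this before building a statement. The helper name to_cql_vector and its dimension parameter are hypothetical and not part of any driver.

```python
import math


def to_cql_vector(values, dimension=5):
    """Render a sequence of numbers as a CQL vector literal such as [0.45, 0.09].

    Hypothetical helper: raises ValueError if the length does not match the
    column's declared dimension or if any value is NaN or infinite.
    """
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(values)}")
    floats = [float(v) for v in values]
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector values must be finite numbers")
    return "[" + ", ".join(repr(v) for v in floats) + "]"


# Format the first sample vector from the INSERT statements above.
print(to_cql_vector([0.45, 0.09, 0.01, 0.2, 0.11]))
```

Where your driver supports binding vector values as statement parameters, that is generally preferable to assembling literals as strings; the sketch only illustrates the dimension check.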
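Query vector data with CQL
To query data using Vector Search, use a SELECT query that orders rows by their proximity to a given vector (ANN OF) and limits the number of results: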
SELECT * FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 3;
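As a rough mental model of what the query above asks for, the following Python sketch ranks the seven sample vectors exhaustively against the same query vector and keeps the three closest. It is not how SAI executes the query: an ANN index searches approximately and avoids scoring every row, so its results can differ from an exact ranking. Cosine similarity is assumed as the metric here, and the names ROWS, cosine, and exact_top_k are hypothetical.

```python
import math

# Sample rows copied from the INSERT statements above: (comment, comment_vector).
ROWS = [
    ("Raining too hard should have postponed", [0.45, 0.09, 0.01, 0.2, 0.11]),
    ("Second rest stop was out of water", [0.99, 0.5, 0.99, 0.1, 0.34]),
    ("LATE RIDERS SHOULD NOT DELAY THE START", [0.9, 0.54, 0.12, 0.1, 0.95]),
    ("The gift certificate for winning was the best", [0.13, 0.8, 0.35, 0.17, 0.03]),
    ("Glad you ran the race in the rain", [0.3, 0.34, 0.2, 0.78, 0.25]),
    ("Great snacks at all reststops", [0.1, 0.4, 0.1, 0.52, 0.09]),
    ("Last climb was a killer", [0.3, 0.75, 0.2, 0.2, 0.5]),
]


def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, rows, k):
    """Score every row against the query and return the k best comments."""
    ranked = sorted(rows, key=lambda row: cosine(query, row[1]), reverse=True)
    return [comment for comment, _ in ranked[:k]]


# Same query vector and limit as the CQL statement above.
top3 = exact_top_k([0.15, 0.1, 0.1, 0.35, 0.55], ROWS, 3)
for comment in top3:
    print(comment)
```

An exhaustive scan like this costs time proportional to the number of rows, which is the cost a vector index is meant to avoid on larger tables.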
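To include a similarity score in the results, call a similarity function in the SELECT list. The query below returns the single row nearest to the ANN OF vector, together with the cosine similarity between that row's comment_vector and the vector passed to similarity_cosine. The two vectors in this example differ, so the score is measured against the vector given to the function, not the ANN OF vector: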
SELECT comment, similarity_cosine(comment_vector, [0.2, 0.15, 0.3, 0.2, 0.05])
FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.1, 0.15, 0.3, 0.12, 0.05]
LIMIT 1;
The functions supported for this type of query are:
- similarity_dot_product
- similarity_cosine
- similarity_euclidean
Each takes the parameters (<vector_column>, <embedding_value>), both of which are vectors.
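To make the three measures concrete, the Python sketch below computes the underlying quantities directly: the dot product, the cosine of the angle between the vectors, and the Euclidean distance. These are the textbook definitions, written under the assumption that the CQL functions are based on them; the score a database returns may be a rescaled form of these values, so treat the numbers as illustrative and not as the exact output of the CQL functions. The function names are hypothetical.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by both lengths; ranges from -1 to 1 and ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


stored = [0.3, 0.34, 0.2, 0.78, 0.25]  # one comment_vector from the sample data
probe = [0.2, 0.15, 0.3, 0.2, 0.05]    # the vector passed to similarity_cosine above

print("dot product:", dot_product(stored, probe))
print("cosine:", cosine(stored, probe))
print("euclidean distance:", euclidean_distance(stored, probe))
```

Cosine ignores vector length, so it suits embeddings whose magnitudes vary, while the dot product and Euclidean distance are both sensitive to magnitude.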