使用 SAI 進行查詢
SAI 快速入門僅專注於根據非主鍵欄位定義多個索引(一個非常有用的功能)。讓我們透過一些範例來探索其他選項,說明如何對具有不同定義的 SAI 索引的表格執行查詢。
SAI 僅支援 |
向量搜尋
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX IF NOT EXISTS ann_index
ON cycling.comments_vs(comment_vector) USING 'sai';
使用 CQL 查詢向量資料
若要使用向量搜尋查詢資料,請使用 SELECT
查詢
SELECT * FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 3;
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] | Alex | 616e77e0-22a2-11ee-b99d-1f350647414a
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-02-17 08:43:20.234000+0000 | Glad you ran the race in the rain | [0.3, 0.34, 0.2, 0.78, 0.25] | Amy | 6170c1d0-22a2-11ee-b99d-1f350647414a
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-04-01 13:43:08.030000+0000 | Last climb was a killer | [0.3, 0.75, 0.2, 0.2, 0.5] | Amy | 62105d30-22a2-11ee-b99d-1f350647414a
限制必須為 1,000 或更少。 |
在結果中向右捲動會顯示與用於查詢的嵌入最相符的表格中的留言。
單一索引在欄位上比對
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
commenter
欄位不是此表格中的分割鍵,因此需要索引才能對其進行查詢。
查詢該欄位上的比對
SELECT * FROM cycling.comments_vs
WHERE commenter = 'Alex';
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] | Alex | 6d0cdaa0-272b-11ee-859f-b9098002fcac
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
使用選項在欄位上比對單一索引
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_cs_idx ON cycling.comments_vs (commenter)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};
大小寫敏感性
commenter
欄位不是此表格中的分割鍵,因此需要索引才能對其進行查詢。如果我們要將 commenter
作為大小寫敏感的值進行檢查,我們可以使用 case_sensitive
選項設定為 true
。
請注意,如果您在查詢中使用不適當的大小寫敏感值,則不會傳回任何結果
SELECT * FROM comments_vs WHERE commenter ='alex';
id | created_at | comment | comment_vector | commenter | record_id
----+------------+---------+----------------+-----------+-----------
(0 rows)
當我們將自行車選手名稱的大小寫轉換為與索引中的大小寫相符時,查詢會成功
複合分區金鑰欄位上的索引比對
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
CREATE INDEX race_name_idx
ON cycling.rank_by_year_and_name (race_name)
USING 'sai';
CREATE INDEX race_year_idx
ON cycling.rank_by_year_and_name (race_year)
USING 'sai';
複合分區金鑰在表格中有多個欄位定義分區。通常,您需要在分區金鑰中指定所有欄位,才能使用 WHERE
子句查詢表格。但是,SAI 索引可以定義一個索引,使用表格複合分區金鑰中的單一欄位。如果您需要根據單一欄位進行查詢,則可以在複合分區金鑰中的每個欄位上建立 SAI 索引。
SAI 索引還允許您在不使用低效率的 ALLOW FILTERING
指令的情況下查詢表格。 ALLOW FILTERING
指令需要掃描表格中的所有分區,導致效能不佳。
race_year
和 race_name
欄位組成 cycling.rank_by_year_and_name
表格的複合分區金鑰。
查詢 race_name
欄位的比對
SELECT * FROM cycling.rank_by_year_and_name
WHERE race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Thomas LEBAS
查詢 race_year
欄位的比對
SELECT * FROM cycling.rank_by_year_and_name
WHERE race_year = 2014;
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT
2014 | 4th Tour of Beijing | 2 | Daniel MARTIN
2014 | 4th Tour of Beijing | 3 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES
使用 AND 比對多個索引
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
為表格建立多個索引,以示範如何查詢多個欄位的比對。
查詢多個欄位的比對,且兩個欄位都必須比對
SELECT * FROM cycling.comments_vs
WHERE
created_at='2017-03-21 21:11:09.999000+0000'
AND commenter = 'Alex';
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
使用 OR 比對多個索引
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
為表格建立多個索引,以示範如何查詢多個欄位的比對。
查詢一個或另一個欄位的比對
SELECT * FROM cycling.comments_vs
WHERE
created_at='2017-03-21 21:11:09.999000+0000'
OR created_at='2017-03-22 01:16:59.001000+0000';
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | [0.1, 0.4, 0.1, 0.52, 0.09] | Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac
使用 IN 比對多個索引
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
為表格建立多個索引,以示範如何查詢多個欄位的比對。
查詢與值清單中的欄位值相符
SELECT * FROM cycling.comments_vs
WHERE created_at IN
('2017-03-21 21:11:09.999000+0000'
,'2017-03-22 01:16:59.001000+0000');
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | [0.1, 0.4, 0.1, 0.52, 0.09] | Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac
使用者定義的類型
SAI 可以索引使用者定義的類型 (UDT) 或 UDT 清單。此範例顯示如何索引 UDT 清單。
此範例使用下列使用者定義的類型 (UDT)、表格和索引
CREATE TYPE IF NOT EXISTS cycling.race (
race_title text,
race_date timestamp,
race_time text
);
CREATE TABLE IF NOT EXISTS cycling.cyclist_races (
id UUID PRIMARY KEY,
lastname text,
firstname text,
races list<FROZEN <race>>
);
CREATE INDEX races_idx
ON cycling.cyclist_races (races)
USING 'sai';
索引建立在 cycling.cyclist_races
表格中 UDT 清單欄位 races
上。
使用 CONTAINS
從 races
清單欄位查詢
SELECT * FROM cycling.cyclist_races
WHERE races CONTAINS {
race_title:'Rabobank 7-Dorpenomloop Aalburg',
race_date:'2015-05-09',
race_time:'02:58:33'};
id | firstname | lastname | races
--------------------------------------+-----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | [{race_title: 'Rabobank 7-Dorpenomloop Aalburg', race_date: '2015-05-09 00:00:00.000000+0000', race_time: '02:58:33'}, {race_title: 'Ronde van Gelderland', race_date: '2015-04-19 00:00:00.000000+0000', race_time: '03:22:23'}]
(1 rows)
使用集合的 SAI 索引
SAI 支援 map
、list
和 set
類型的集合。集合讓您能將資料分組並儲存在欄位中。
在關聯式資料庫中,例如使用者的多個電子郵件地址,會透過 (例如) user
表格和 email
表格之間的多對一關聯關係來達成分組。Apache Cassandra 會將使用者的電子郵件地址儲存在 user
表格中的集合欄位中,以避免兩個表格之間的關聯。每個集合會指定所儲存資料的資料類型。
如果要收集儲存的資料有限,則集合會很合適。如果資料有無限成長的潛力,例如每秒註冊的訊息傳送或感測器事件,則不要使用集合。請改用具有複合主鍵的表格,其中資料儲存在分群欄位中。
在具有 SAI 索引的資料庫表格的 CQL 查詢中,
|
使用 set 類型
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
id UUID PRIMARY KEY,
lastname text,
teams set<text>
);
CREATE INDEX teams_idx
ON cycling.cyclist_career_teams (teams)
USING 'sai';
索引建立在 cyclist_career_teams
表格中 set 欄位 teams
上。
使用 CONTAINS
從 set 欄位 teams
查詢
SELECT * FROM cycling.cyclist_career_teams
WHERE teams CONTAINS 'Rabobank-Liv Giant';
id | lastname | teams
--------------------------------------+----------+------------------------------------------------------------------------------------------------------
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | VOS | {'Nederland bloeit', 'Rabobank Women Team', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}
使用 list 類型
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
year int,
month int,
events list<text>,
PRIMARY KEY (year, month)
);
CREATE INDEX events_idx
ON cycling.upcoming_calendar (events)
USING 'sai';
索引建立在 upcoming_calendar
表格中 list 欄位 events
上。
使用 CONTAINS
從 list 欄位 events
查詢
SELECT * FROM cycling.upcoming_calendar
WHERE events CONTAINS 'Criterium du Dauphine';
year | month | events
------+-------+-----------------------------------------------
2015 | 6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']
稍複雜的查詢會選取包含特定事件或特定月份日期的列
使用 map 類型
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
id uuid PRIMARY KEY,
firstname text,
lastname text,
teams map<int, text>
);
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );
CREATE INDEX IF NOT EXISTS team_year_entries_idx
ON cycling.cyclist_teams ( ENTRIES (teams) );
CREATE INDEX IF NOT EXISTS team_year_values_idx
ON cycling.cyclist_teams ( VALUES (teams) );
在 cyclist_career_teams
表格中,建立在 map 欄位 teams
上的索引會針對欄位資料的鍵、值和完整項目。
使用 map teams
欄位的 KEYS
進行查詢
SELECT * FROM cyclist_teams WHERE teams CONTAINS KEY 2014;
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | {2014: 'Rabobank-Liv Woman Cycling Team', 2015: 'Rabobank-Liv Woman Cycling Team'}
查詢 map teams
欄位的值,請注意只包含關鍵字 CONTAINS
SELECT * FROM cyclist_teams WHERE teams CONTAINS 'Team Garmin - Cervelo';
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
查詢 map teams
欄位的項目,請注意 WHERE
子句的差異
SELECT * FROM cyclist_teams
WHERE
teams[2014] = 'Boels:Dolmans Cycling Team'
AND teams[2015] = 'Boels:Dolmans Cycling Team';
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
這個範例會尋找在 map teams
欄位中存在兩個項目的列。
如需更多資訊,請參閱