使用儲存器附加索引 (SAI)
建立 SAI 索引
若要建立 SAI 索引,您必須定義索引名稱、表格名稱和要建立索引的欄位名稱。
若要建立簡單的 SAI 索引
CREATE INDEX lastname_sai_idx ON cycling.cyclist_semi_pro (lastname)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};
CREATE INDEX age_sai_idx ON cycling.cyclist_semi_pro (age)
USING 'sai';
CREATE INDEX country_sai_idx ON cycling.cyclist_semi_pro (country)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'};
CREATE INDEX registration_sai_idx ON cycling.cyclist_semi_pro (registration)
USING 'sai';
對於大多數 SAI 索引,欄位名稱定義在 CREATE INDEX
陳述式中,該陳述式也使用 USING 'sai'
。SAI 索引選項定義在 WITH OPTIONS
子句中。case_sensitive
選項設為 false
以允許不區分大小寫的搜尋。normalize
選項設為 true
以允許搜尋針對 Unicode 字元正規化。ascii_only
選項設為 true
以允許搜尋僅限於 ASCII 字元。
map
集合資料類型是一個例外,如 以下範例 所示。
map
集合在 SAI 索引中
Map 集合的格式與其他 SAI 索引不同
// Create an index on a map key to find all cyclist/team combos for a year
// tag::keysidx[]
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );
// end::keysidx[]
// Create an index on a map key to find all cyclist/team combos for a year
// tag::valuesidx[]
CREATE INDEX IF NOT EXISTS team_year_values_idx
ON cycling.cyclist_teams ( VALUES (teams) );
// end::valuesidx[]
// Create an index on a map key to find all cyclist/team combos for a year
// tag::entriesidx[]
CREATE INDEX IF NOT EXISTS team_year_entries_idx
ON cycling.cyclist_teams ( ENTRIES (teams) );
// end::entriesidx[]
向量搜尋的相似性函數
此範例使用下列表格
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
若要檢查 comment_vector
是否有設定特定的相似性函數,請使用 similarity-function
選項,並設定為其中一種受支援的相似性函數:DOT_PRODUCT、COSINE 或 EUCLIDEAN。預設的相似性函數為 COSINE。
此索引在 comment_vector
欄位上建立索引,並將相似性函數設定為 DOT_PRODUCT
CREATE INDEX sim_comments_idx
ON cycling.comments_vs (comment_vector)
USING 'sai'
WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT'};
其他資源
請參閱 建立自訂索引,以取得有關建立 SAI 索引的詳細資訊。
變更 SAI 索引
無法變更 SAI 索引。如果您需要修改 SAI 索引,您必須移除目前的索引、建立新的索引,並重新建置循環。
-
移除索引
DROP INDEX IF EXISTS cycling.lastname_sai_idx;
-
建立新索引
CREATE INDEX lastname_sai_idx ON cycling.cyclist_semi_pro (lastname) USING 'sai' WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; CREATE INDEX age_sai_idx ON cycling.cyclist_semi_pro (age) USING 'sai'; CREATE INDEX country_sai_idx ON cycling.cyclist_semi_pro (country) USING 'sai' WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; CREATE INDEX registration_sai_idx ON cycling.cyclist_semi_pro (registration) USING 'sai';
使用 SAI 查詢
SAI 快速入門 僅專注於根據非主鍵欄位定義多個索引(非常有用的功能)。讓我們透過一些範例來探索其他選項,瞭解如何對具有不同定義的 SAI 索引的表格執行查詢。
SAI 僅支援 |
向量搜尋
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX IF NOT EXISTS ann_index
ON cycling.comments_vs(comment_vector) USING 'sai';
使用 CQL 查詢向量資料
若要使用向量搜尋查詢資料,請使用 SELECT
查詢
SELECT * FROM cycling.comments_vs
ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
LIMIT 3;
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] | Alex | 616e77e0-22a2-11ee-b99d-1f350647414a
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-02-17 08:43:20.234000+0000 | Glad you ran the race in the rain | [0.3, 0.34, 0.2, 0.78, 0.25] | Amy | 6170c1d0-22a2-11ee-b99d-1f350647414a
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-04-01 13:43:08.030000+0000 | Last climb was a killer | [0.3, 0.75, 0.2, 0.2, 0.5] | Amy | 62105d30-22a2-11ee-b99d-1f350647414a
限制必須為 1,000 或更少。 |
在結果中向右捲動,會顯示與用於查詢的嵌入最相符的表格中的留言。
在欄位上進行單一索引比對
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
此表格中的欄位 commenter
並非分割鍵,因此需要建立索引才能查詢。
查詢該欄位上的比對
SELECT * FROM cycling.comments_vs
WHERE commenter = 'Alex';
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] | Alex | 6d0cdaa0-272b-11ee-859f-b9098002fcac
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
使用選項對欄位進行單一索引比對
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_cs_idx ON cycling.comments_vs (commenter)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};
大小寫敏感度
此表格中的欄位 commenter
並非分割鍵,因此需要建立索引才能查詢。如果我們要將 commenter
作為大小寫敏感的值進行檢查,可以使用 case_sensitive
選項並將其設定為 true
。
請注意,如果您在查詢中使用了不適當的大小寫敏感值,則不會傳回任何結果
SELECT * FROM comments_vs WHERE commenter ='alex';
id | created_at | comment | comment_vector | commenter | record_id
----+------------+---------+----------------+-----------+-----------
(0 rows)
當我們將自行車選手名字的大小寫轉換為與索引中相同的大小寫時,查詢便會成功
對複合分割鍵欄位進行索引比對
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
race_year int,
race_name text,
cyclist_name text,
rank int,
PRIMARY KEY ((race_year, race_name), rank)
);
CREATE INDEX race_name_idx
ON cycling.rank_by_year_and_name (race_name)
USING 'sai';
CREATE INDEX race_year_idx
ON cycling.rank_by_year_and_name (race_year)
USING 'sai';
複合分割鍵在表格中有多個欄位定義分割區。通常,您需要在分割鍵中指定所有欄位,才能使用 WHERE
子句查詢表格。但是,SAI 索引可以定義一個索引,並使用表格複合分割鍵中的單一欄位。如果您需要根據單一欄位進行查詢,則可以在複合分割鍵中的每個欄位建立 SAI 索引。
SAI 索引也允許您查詢表格,而無需使用低效率的 ALLOW FILTERING
指令。ALLOW FILTERING
指令需要掃描表格中的所有分割區,這會導致效能不佳。
race_year
和 race_name
欄位構成 cycling.rank_by_year_and_name
表格的複合分割鍵。
查詢 race_name
欄位上的比對
SELECT * FROM cycling.rank_by_year_and_name
WHERE race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Benjamin PRADES
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Adam PHELAN
2015 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Thomas LEBAS
查詢 race_year
欄位上的比對
SELECT * FROM cycling.rank_by_year_and_name
WHERE race_year = 2014;
race_year | race_name | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
2014 | 4th Tour of Beijing | 1 | Phillippe GILBERT
2014 | 4th Tour of Beijing | 2 | Daniel MARTIN
2014 | 4th Tour of Beijing | 3 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 1 | Daniel MARTIN
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 2 | Johan Esteban CHAVES
2014 | Tour of Japan - Stage 4 - Minami > Shinshu | 3 | Benjamin PRADES
使用 AND 比對多個索引
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
為表格建立多個索引,以示範如何查詢多個欄位上的比對。
查詢多個欄位上的比對,且兩個欄位都必須比對
SELECT * FROM cycling.comments_vs
WHERE
created_at='2017-03-21 21:11:09.999000+0000'
AND commenter = 'Alex';
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
多個索引與 OR 搭配
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
為表格建立多個索引,以示範如何查詢多個欄位上的比對。
查詢一個欄位或另一個欄位的相符項
SELECT * FROM cycling.comments_vs
WHERE
created_at='2017-03-21 21:11:09.999000+0000'
OR created_at='2017-03-22 01:16:59.001000+0000';
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | [0.1, 0.4, 0.1, 0.52, 0.09] | Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac
多個索引與 IN 搭配
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.comments_vs (
record_id timeuuid,
id uuid,
commenter text,
comment text,
comment_vector VECTOR <FLOAT, 5>,
created_at timestamp,
PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
ON cycling.comments_vs (commenter)
USING 'sai';
CREATE INDEX created_at_idx
ON cycling.comments_vs (created_at)
USING 'sai';
CREATE INDEX ann_index
ON cycling.comments_vs (comment_vector)
USING 'sai';
為表格建立多個索引,以示範如何查詢多個欄位上的比對。
查詢與值清單中欄位值的相符項
SELECT * FROM cycling.comments_vs
WHERE created_at IN
('2017-03-21 21:11:09.999000+0000'
,'2017-03-22 01:16:59.001000+0000');
id | created_at | comment | comment_vector | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] | Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 | Great snacks at all reststops | [0.1, 0.4, 0.1, 0.52, 0.09] | Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac
使用者定義的類型
SAI 可以索引使用者定義的類型 (UDT) 或 UDT 清單。此範例顯示如何索引 UDT 清單。
此範例使用下列使用者定義的類型 (UDT)、表格和索引
CREATE TYPE IF NOT EXISTS cycling.race (
race_title text,
race_date timestamp,
race_time text
);
CREATE TABLE IF NOT EXISTS cycling.cyclist_races (
id UUID PRIMARY KEY,
lastname text,
firstname text,
races list<FROZEN <race>>
);
CREATE INDEX races_idx
ON cycling.cyclist_races (races)
USING 'sai';
在 cycling.cyclist_races
表格中,於 UDT 清單欄位 races
上建立索引。
使用 CONTAINS
從清單 races
欄位查詢
SELECT * FROM cycling.cyclist_races
WHERE races CONTAINS {
race_title:'Rabobank 7-Dorpenomloop Aalburg',
race_date:'2015-05-09',
race_time:'02:58:33'};
id | firstname | lastname | races
--------------------------------------+-----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | [{race_title: 'Rabobank 7-Dorpenomloop Aalburg', race_date: '2015-05-09 00:00:00.000000+0000', race_time: '02:58:33'}, {race_title: 'Ronde van Gelderland', race_date: '2015-04-19 00:00:00.000000+0000', race_time: '03:22:23'}]
(1 rows)
使用集合的 SAI 索引
SAI 支援 map
、list
和 set
類型的集合。集合讓您在一個欄位中分組並儲存資料。
在關聯式資料庫中,會透過 (例如) user
表格和 email
表格之間的多對一連接關係,來達成使用者多個電子郵件地址這類的分組。Apache Cassandra 會將使用者的電子郵件地址儲存在 user
表格的集合欄位中,以避免兩個表格之間的連接。每個集合會指定所儲存資料的資料類型。
如果集合儲存的資料有限,那麼集合就適用。如果資料有無限的成長潛力,例如每秒註冊的訊息已傳送或感測器事件,請勿使用集合。請改用具有複合主鍵的表格,其中資料儲存在分群欄位中。
在具有 SAI 索引的資料庫表格的 CQL 查詢中,
|
使用 set 類型
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
id UUID PRIMARY KEY,
lastname text,
teams set<text>
);
CREATE INDEX teams_idx
ON cycling.cyclist_career_teams (teams)
USING 'sai';
在 cyclist_career_teams
表格中,於 set 欄位 teams
上建立索引。
使用 CONTAINS
從 teams
欄查詢集合
SELECT * FROM cycling.cyclist_career_teams
WHERE teams CONTAINS 'Rabobank-Liv Giant';
id | lastname | teams
--------------------------------------+----------+------------------------------------------------------------------------------------------------------
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | VOS | {'Nederland bloeit', 'Rabobank Women Team', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}
使用清單類型
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
year int,
month int,
events list<text>,
PRIMARY KEY (year, month)
);
CREATE INDEX events_idx
ON cycling.upcoming_calendar (events)
USING 'sai';
在 upcoming_calendar
表格中,清單欄 events
上建立索引。
使用 CONTAINS
從清單 events
欄查詢
SELECT * FROM cycling.upcoming_calendar
WHERE events CONTAINS 'Criterium du Dauphine';
year | month | events
------+-------+-----------------------------------------------
2015 | 6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']
稍複雜的查詢會選取包含特定事件或特定月份日期的列
使用映射類型
此範例使用下列表格和索引
CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
id uuid PRIMARY KEY,
firstname text,
lastname text,
teams map<int, text>
);
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );
CREATE INDEX IF NOT EXISTS team_year_entries_idx
ON cycling.cyclist_teams ( ENTRIES (teams) );
CREATE INDEX IF NOT EXISTS team_year_values_idx
ON cycling.cyclist_teams ( VALUES (teams) );
在 cyclist_career_teams
表格中,映射欄 teams
上建立的索引會針對欄資料的鍵、值和完整條目。
使用 KEYS
從映射 teams
欄查詢
SELECT * FROM cyclist_teams WHERE teams CONTAINS KEY 2014;
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 | Marianne | VOS | {2014: 'Rabobank-Liv Woman Cycling Team', 2015: 'Rabobank-Liv Woman Cycling Team'}
從映射 teams
欄查詢值,請注意只包含關鍵字 CONTAINS
SELECT * FROM cyclist_teams WHERE teams CONTAINS 'Team Garmin - Cervelo';
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
從映射 teams
欄查詢條目,請注意 WHERE
子句的差異
SELECT * FROM cyclist_teams
WHERE
teams[2014] = 'Boels:Dolmans Cycling Team'
AND teams[2015] = 'Boels:Dolmans Cycling Team';
id | firstname | lastname | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
此範例會尋找 teams
欄中存在兩個條目的列。
如需更多資訊,請參閱