Cassandra 文件

版本

您正在檢視預發行版本的說明文件。

使用 SAI 進行查詢

SAI 快速入門僅專注於根據非主鍵欄位定義多個索引(一個非常有用的功能)。讓我們透過一些範例來探索其他選項,說明如何對具有不同定義的 SAI 索引的表格執行查詢。

SAI 僅支援 SELECT 查詢,但不支援 UPDATEDELETE 查詢。

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX IF NOT EXISTS ann_index
  ON cycling.comments_vs(comment_vector) USING 'sai';

使用 CQL 查詢向量資料

若要使用向量搜尋查詢資料,請使用 SELECT 查詢

  • CQL

  • 結果

SELECT * FROM cycling.comments_vs
    ORDER BY comment_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
    LIMIT 3;
 id                                   | created_at                      | comment                                | comment_vector               | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] |      Alex | 616e77e0-22a2-11ee-b99d-1f350647414a
 c7fceba0-c141-4207-9494-a29f9809de6f | 2017-02-17 08:43:20.234000+0000 |      Glad you ran the race in the rain | [0.3, 0.34, 0.2, 0.78, 0.25] |       Amy | 6170c1d0-22a2-11ee-b99d-1f350647414a
 c7fceba0-c141-4207-9494-a29f9809de6f | 2017-04-01 13:43:08.030000+0000 |                Last climb was a killer |   [0.3, 0.75, 0.2, 0.2, 0.5] |       Amy | 62105d30-22a2-11ee-b99d-1f350647414a

限制必須為 1,000 或更少。

在結果中向右捲動會顯示與用於查詢的嵌入最相符的表格中的留言。

單一索引在欄位上比對

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
    ON cycling.comments_vs (commenter)
    USING 'sai';
CREATE INDEX created_at_idx
    ON cycling.comments_vs (created_at)
    USING 'sai';
CREATE INDEX ann_index
    ON cycling.comments_vs (comment_vector)
    USING 'sai';

commenter 欄位不是此表格中的分割鍵,因此需要索引才能對其進行查詢。

查詢該欄位上的比對

  • 查詢

  • 結果

    SELECT * FROM cycling.comments_vs
        WHERE commenter = 'Alex';
 id                                   | created_at                      | comment                                | comment_vector               | commenter | record_id
--------------------------------------+---------------------------------+----------------------------------------+------------------------------+-----------+--------------------------------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-04-01 14:33:02.160000+0000 | LATE RIDERS SHOULD NOT DELAY THE START | [0.9, 0.54, 0.12, 0.1, 0.95] |      Alex | 6d0cdaa0-272b-11ee-859f-b9098002fcac
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 |      Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] |      Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
索引失敗

請注意,如果您在建立索引之前嘗試此查詢,將會發生失敗

  • 查詢

  • 結果

    SELECT * FROM cycling.comments_vs
        WHERE commenter = 'Alex';
InvalidRequest: Error from server: code=2200
[Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.
If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

使用選項在欄位上比對單一索引

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_cs_idx ON cycling.comments_vs (commenter)
USING 'sai'
WITH OPTIONS = {'case_sensitive': 'true', 'normalize': 'true', 'ascii': 'true'};

大小寫敏感性

commenter 欄位不是此表格中的分割鍵,因此需要索引才能對其進行查詢。如果我們要將 commenter 作為大小寫敏感的值進行檢查,我們可以使用 case_sensitive 選項設定為 true

請注意,如果您在查詢中使用不適當的大小寫敏感值,則不會傳回任何結果

  • 查詢

  • 結果

SELECT * FROM comments_vs WHERE commenter ='alex';
 id | created_at | comment | comment_vector | commenter | record_id
----+------------+---------+----------------+-----------+-----------

(0 rows)

當我們將自行車選手名稱的大小寫轉換為與索引中的大小寫相符時,查詢會成功

  • 查詢

  • 結果

SELECT comment,commenter FROM comments_vs WHERE commenter ='Alex';
 comment                                | commenter
----------------------------------------+-----------
 LATE RIDERS SHOULD NOT DELAY THE START |      Alex
      Second rest stop was out of water |      Alex
(2 rows)

複合分區金鑰欄位上的索引比對

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);
CREATE INDEX race_name_idx
    ON cycling.rank_by_year_and_name (race_name)
    USING 'sai';
CREATE INDEX race_year_idx
    ON cycling.rank_by_year_and_name (race_year)
    USING 'sai';

複合分區金鑰在表格中有多個欄位定義分區。通常,您需要在分區金鑰中指定所有欄位,才能使用 WHERE 子句查詢表格。但是,SAI 索引可以定義一個索引,使用表格複合分區金鑰中的單一欄位。如果您需要根據單一欄位進行查詢,則可以在複合分區金鑰中的每個欄位上建立 SAI 索引。

SAI 索引還允許您在不使用低效率的 ALLOW FILTERING 指令的情況下查詢表格。 ALLOW FILTERING 指令需要掃描表格中的所有分區,導致效能不佳。

race_yearrace_name 欄位組成 cycling.rank_by_year_and_name 表格的複合分區金鑰。

查詢 race_name 欄位的比對

  • 查詢

  • 結果

    SELECT * FROM cycling.rank_by_year_and_name
        WHERE race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu';
 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |      Benjamin PRADES
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 |          Adam PHELAN
      2015 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |         Thomas LEBAS

查詢 race_year 欄位的比對

  • 查詢

  • 結果

    SELECT * FROM cycling.rank_by_year_and_name
        WHERE race_year = 2014;
 race_year | race_name                                  | rank | cyclist_name
-----------+--------------------------------------------+------+----------------------
      2014 |                        4th Tour of Beijing |    1 |    Phillippe GILBERT
      2014 |                        4th Tour of Beijing |    2 |        Daniel MARTIN
      2014 |                        4th Tour of Beijing |    3 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    1 |        Daniel MARTIN
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    2 | Johan Esteban CHAVES
      2014 | Tour of Japan - Stage 4 - Minami > Shinshu |    3 |      Benjamin PRADES

使用 AND 比對多個索引

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
    ON cycling.comments_vs (commenter)
    USING 'sai';
CREATE INDEX created_at_idx
    ON cycling.comments_vs (created_at)
    USING 'sai';
CREATE INDEX ann_index
    ON cycling.comments_vs (comment_vector)
    USING 'sai';

為表格建立多個索引,以示範如何查詢多個欄位的比對。

查詢多個欄位的比對,且兩個欄位都必須比對

  • 查詢

  • 結果

SELECT * FROM cycling.comments_vs
    WHERE
    created_at='2017-03-21 21:11:09.999000+0000'
    AND commenter = 'Alex';
 id                                   | created_at                      | comment                           | comment_vector               | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] |      Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac

使用 OR 比對多個索引

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
    ON cycling.comments_vs (commenter)
    USING 'sai';
CREATE INDEX created_at_idx
    ON cycling.comments_vs (created_at)
    USING 'sai';
CREATE INDEX ann_index
    ON cycling.comments_vs (comment_vector)
    USING 'sai';

為表格建立多個索引,以示範如何查詢多個欄位的比對。

查詢一個或另一個欄位的比對

  • 查詢

  • 結果

SELECT * FROM cycling.comments_vs
    WHERE
    created_at='2017-03-21 21:11:09.999000+0000'
    OR created_at='2017-03-22 01:16:59.001000+0000';
 id                                   | created_at                      | comment                           | comment_vector               | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] |      Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
 c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 |     Great snacks at all reststops |  [0.1, 0.4, 0.1, 0.52, 0.09] |       Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac

使用 IN 比對多個索引

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.comments_vs (
  record_id timeuuid,
  id uuid,
  commenter text,
  comment text,
  comment_vector VECTOR <FLOAT, 5>,
  created_at timestamp,
  PRIMARY KEY (id, created_at)
)
WITH CLUSTERING ORDER BY (created_at DESC);
CREATE INDEX commenter_idx
    ON cycling.comments_vs (commenter)
    USING 'sai';
CREATE INDEX created_at_idx
    ON cycling.comments_vs (created_at)
    USING 'sai';
CREATE INDEX ann_index
    ON cycling.comments_vs (comment_vector)
    USING 'sai';

為表格建立多個索引,以示範如何查詢多個欄位的比對。

查詢與值清單中的欄位值相符

  • 查詢

  • 結果

SELECT * FROM cycling.comments_vs
    WHERE created_at IN
    ('2017-03-21 21:11:09.999000+0000'
    ,'2017-03-22 01:16:59.001000+0000');
 id                                   | created_at                      | comment                           | comment_vector               | commenter | record_id
--------------------------------------+---------------------------------+-----------------------------------+------------------------------+-----------+--------------------------------------
 e7ae5cf3-d358-4d99-b900-85902fda9bb0 | 2017-03-21 21:11:09.999000+0000 | Second rest stop was out of water | [0.99, 0.5, 0.99, 0.1, 0.34] |      Alex | 6d0b7b10-272b-11ee-859f-b9098002fcac
 c7fceba0-c141-4207-9494-a29f9809de6f | 2017-03-22 01:16:59.001000+0000 |     Great snacks at all reststops |  [0.1, 0.4, 0.1, 0.52, 0.09] |       Amy | 6d0fc0d0-272b-11ee-859f-b9098002fcac

使用者定義的類型

SAI 可以索引使用者定義的類型 (UDT) 或 UDT 清單。此範例顯示如何索引 UDT 清單。

此範例使用下列使用者定義的類型 (UDT)、表格和索引

CREATE TYPE IF NOT EXISTS cycling.race (
  race_title text,
  race_date timestamp,
  race_time text
);
CREATE TABLE IF NOT EXISTS cycling.cyclist_races (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text,
  races list<FROZEN <race>>
);
CREATE INDEX races_idx
    ON cycling.cyclist_races (races)
    USING 'sai';

索引建立在 cycling.cyclist_races 表格中 UDT 清單欄位 races 上。

使用 CONTAINSraces 清單欄位查詢

  • CQL

  • 結果

SELECT * FROM cycling.cyclist_races
    WHERE races CONTAINS {
       race_title:'Rabobank 7-Dorpenomloop Aalburg',
       race_date:'2015-05-09',
       race_time:'02:58:33'};
 id                                   | firstname | lastname | races
--------------------------------------+-----------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |  Marianne |      VOS | [{race_title: 'Rabobank 7-Dorpenomloop Aalburg', race_date: '2015-05-09 00:00:00.000000+0000', race_time: '02:58:33'}, {race_title: 'Ronde van Gelderland', race_date: '2015-04-19 00:00:00.000000+0000', race_time: '03:22:23'}]

(1 rows)

使用集合的 SAI 索引

SAI 支援 maplistset 類型的集合。集合讓您能將資料分組並儲存在欄位中。

在關聯式資料庫中,例如使用者的多個電子郵件地址,會透過 (例如) user 表格和 email 表格之間的多對一關聯關係來達成分組。Apache Cassandra 會將使用者的電子郵件地址儲存在 user 表格中的集合欄位中,以避免兩個表格之間的關聯。每個集合會指定所儲存資料的資料類型。

如果要收集儲存的資料有限,則集合會很合適。如果資料有無限成長的潛力,例如每秒註冊的訊息傳送或感測器事件,則不要使用集合。請改用具有複合主鍵的表格,其中資料儲存在分群欄位中。

在具有 SAI 索引的資料庫表格的 CQL 查詢中,CONTAINS 子句獲得支援,且特定於

  • 具有 keysvaluesentries 的 SAI 集合對應

  • 具有 listset 類型的 SAI 集合

使用 set 類型

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.cyclist_career_teams (
  id UUID PRIMARY KEY,
  lastname text,
  teams set<text>
);
CREATE INDEX teams_idx
    ON cycling.cyclist_career_teams (teams)
    USING 'sai';

索引建立在 cyclist_career_teams 表格中 set 欄位 teams 上。

使用 CONTAINS 從 set 欄位 teams 查詢

  • CQL

  • 結果

SELECT * FROM cycling.cyclist_career_teams
   WHERE teams CONTAINS 'Rabobank-Liv Giant';
 id                                   | lastname | teams
--------------------------------------+----------+------------------------------------------------------------------------------------------------------
 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |      VOS | {'Nederland bloeit', 'Rabobank Women Team', 'Rabobank-Liv Giant', 'Rabobank-Liv Woman Cycling Team'}

使用 list 類型

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.upcoming_calendar (
  year int,
  month int,
  events list<text>,
  PRIMARY KEY (year, month)
);
CREATE INDEX events_idx
    ON cycling.upcoming_calendar (events)
    USING 'sai';

索引建立在 upcoming_calendar 表格中 list 欄位 events 上。

使用 CONTAINS 從 list 欄位 events 查詢

  • CQL

  • 結果

SELECT * FROM cycling.upcoming_calendar
   WHERE events CONTAINS 'Criterium du Dauphine';
 year | month | events
------+-------+-----------------------------------------------
 2015 |     6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']

稍複雜的查詢會選取包含特定事件或特定月份日期的列

  • CQL

  • 結果

SELECT * FROM cycling.upcoming_calendar
    WHERE events CONTAINS 'Criterium du Dauphine'
          OR month = 7;
 year | month | events
------+-------+-----------------------------------------------
 2015 |     6 | ['Criterium du Dauphine', 'Tour de Sui\nsse']
 2015 |     7 |                            ['Tour de France']

使用 map 類型

此範例使用下列表格和索引

CREATE TABLE IF NOT EXISTS cycling.cyclist_teams (
  id uuid PRIMARY KEY,
  firstname text,
  lastname text,
  teams map<int, text>
);
CREATE INDEX IF NOT EXISTS team_year_keys_idx
ON cycling.cyclist_teams ( KEYS (teams) );
CREATE INDEX IF NOT EXISTS team_year_entries_idx
ON cycling.cyclist_teams ( ENTRIES (teams) );
CREATE INDEX IF NOT EXISTS team_year_values_idx
ON cycling.cyclist_teams ( VALUES (teams) );

cyclist_career_teams 表格中,建立在 map 欄位 teams 上的索引會針對欄位資料的鍵、值和完整項目。

使用 map teams 欄位的 KEYS 進行查詢

  • CQL

  • 結果

SELECT * FROM cyclist_teams WHERE teams CONTAINS KEY 2014;
 id                                   | firstname | lastname   | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}
 5b6962dd-3f90-4c93-8f61-eabfa4a803e2 |  Marianne |        VOS |                                                                                          {2014: 'Rabobank-Liv Woman Cycling Team', 2015: 'Rabobank-Liv Woman Cycling Team'}

查詢 map teams 欄位的值,請注意只包含關鍵字 CONTAINS

  • CQL

  • 結果

SELECT * FROM cyclist_teams WHERE teams CONTAINS 'Team Garmin - Cervelo';
 id                                   | firstname | lastname   | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

查詢 map teams 欄位的項目,請注意 WHERE 子句的差異

  • CQL

  • 結果

SELECT * FROM cyclist_teams
WHERE
    teams[2014] = 'Boels:Dolmans Cycling Team'
    AND teams[2015] = 'Boels:Dolmans Cycling Team';
 id                                   | firstname | lastname   | teams
--------------------------------------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 cb07baad-eac8-4f65-b28a-bddc06a0de23 | Elizabeth | ARMITSTEAD | {2011: 'Team Garmin - Cervelo', 2012: 'AA Drink - Leontien.nl', 2013: 'Boels:Dolmans Cycling Team', 2014: 'Boels:Dolmans Cycling Team', 2015: 'Boels:Dolmans Cycling Team'}

這個範例會尋找在 map teams 欄位中存在兩個項目的列。

如需更多資訊,請參閱