1580 Commits

Author SHA1 Message Date
536953ec6c tests: deflake TestV3WatchRestoreSnapshotUnsync
TestV3WatchRestoreSnapshotUnsync sets up a three-member cluster.
After the leader is elected and before any client update requests are
served, each member's log is at index 8: 3 x ConfChange +
3 x ClusterMemberAttrSet + 1 x ClusterVersionSet.

Based on the config (SnapshotCount: 10, CatchUpCount: 5), the client must
send enough update requests to trigger a snapshot at least twice.

T1: L(snapshot-index: 11, compacted-index:  6) F_m0(index: 8)
T2: L(snapshot-index: 22, compacted-index: 17) F_m0(index: 8, out of date)

After member0 recovers from the network partition, it rejects the
leader's request and returns the hint (index:8, term:x). If that happens
after the second snapshot, the leader finds that index:8 is out of date
and is forced to transfer a snapshot.

However, the client files only 15 update requests, and the leader does
not finish the second snapshot in time. Since the latest compacted-index
is still 6, the leader can replicate index:9 to member0 instead of
sending a snapshot.
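
The arithmetic behind T1 and T2 can be sketched in Go. This is an
illustration only, not etcd or raft code: `compactIndex` and
`needsSnapshot` are hypothetical helpers, and the rule "a follower needs
a snapshot once its match index falls below the compacted index" is a
simplifying assumption about how the leader decides.

```go
package main

import "fmt"

// compactIndex mirrors the numbers in T1/T2: after a snapshot at
// snapshotIndex, the leader keeps the last catchUpCount entries and
// compacts everything before them. Hypothetical helper, not an etcd API.
func compactIndex(snapshotIndex, catchUpCount uint64) uint64 {
	if snapshotIndex <= catchUpCount {
		return 0
	}
	return snapshotIndex - catchUpCount
}

// needsSnapshot reports whether a follower whose log matches the leader's
// up to matchIndex can no longer be caught up from the leader's log.
// Simplifying assumption: entries below compacted are gone, so the
// follower needs a snapshot when matchIndex < compacted.
func needsSnapshot(matchIndex, compacted uint64) bool {
	return matchIndex < compacted
}

func main() {
	const catchUpCount = 5
	const followerIndex = 8 // member0's index while partitioned

	for i, snapshotIndex := range []uint64{11, 22} {
		c := compactIndex(snapshotIndex, catchUpCount)
		fmt.Printf("T%d: compacted-index=%d needs-snapshot=%v\n",
			i+1, c, needsSnapshot(followerIndex, c))
	}
}
```

Under these assumptions, only the second compaction (compacted-index 17)
leaves member0 at index 8 without a log-based way to catch up.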

```bash
cd tests/integration
CLUSTER_DEBUG=true go test -v -count=1 -run TestV3WatchRestoreSnapshotUnsync ./
...

INFO    m2.raft 3da8ba707f1a21a4 became leader at term 2        {"member": "m2"}
...
INFO    m2      triggering snapshot     {"member": "m2", "local-member-id": "3da8ba707f1a21a4", "local-member-applied-index": 22, "local-member-snapshot-index": 11, "local-member-snapshot-count": 10, "snapshot-forced": false}
...

cluster.go:1359: network partition between: 99626fe5001fde8b <-> 1c964119da6db036
cluster.go:1359: network partition between: 99626fe5001fde8b <-> 3da8ba707f1a21a4
cluster.go:416: WaitMembersForLeader

INFO    m0.raft 99626fe5001fde8b became follower at term 2      {"member": "m0"}
INFO    m0.raft raft.node: 99626fe5001fde8b elected leader 3da8ba707f1a21a4 at term 2   {"member": "m0"}
DEBUG   m2.raft 3da8ba707f1a21a4 received MsgAppResp(rejected, hint: (index 8, term 2)) from 99626fe5001fde8b for index 23      {"member": "m2"}
DEBUG   m2.raft 3da8ba707f1a21a4 decreased progress of 99626fe5001fde8b to [StateReplicate match=8 next=9 inflight=15]  {"member": "m2"}

DEBUG   m0      Applying entries        {"member": "m0", "num-entries": 15}
DEBUG   m0      Applying entry  {"member": "m0", "index": 9, "term": 2, "type": "EntryNormal"}

....

INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}
INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}
```

To fix this issue, the patch uses a log monitor to watch for "compacted
Raft log" and expects two members to have compacted their logs twice.
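
The idea of waiting on compaction log lines can be sketched as follows.
This is not the patch's implementation: `countCompactions` and
`membersCompactedAtLeast` are hypothetical helpers, the sample lines are
illustrative input modeled on the excerpt above, and the parsing assumes
the member name is the second whitespace-separated field.

```go
package main

import (
	"fmt"
	"strings"
)

// countCompactions returns, per member, how many lines contain the
// "compacted Raft logs" message. Assumes the layout from the excerpt:
// level, member name, message, JSON fields.
func countCompactions(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if !strings.Contains(line, "compacted Raft logs") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		counts[fields[1]]++
	}
	return counts
}

// membersCompactedAtLeast counts members with at least want compactions.
func membersCompactedAtLeast(counts map[string]int, want int) int {
	n := 0
	for _, c := range counts {
		if c >= want {
			n++
		}
	}
	return n
}

func main() {
	// Illustrative input only; not captured from a real run.
	logs := []string{
		`INFO    m1      compacted Raft logs     {"member": "m1", "compact-index": 6}`,
		`INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 6}`,
		`INFO    m2      saved snapshot  {"member": "m2", "snapshot-index": 22}`,
		`INFO    m1      compacted Raft logs     {"member": "m1", "compact-index": 17}`,
		`INFO    m2      compacted Raft logs     {"member": "m2", "compact-index": 17}`,
	}
	counts := countCompactions(logs)
	ready := membersCompactedAtLeast(counts, 2) >= 2
	fmt.Println("safe to heal the partition:", ready)
}
```

A test built this way would only reconnect member0 once `ready` is true,
so the leader has already compacted past index 8.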

Fixes: #15545

Signed-off-by: Wei Fu <fuweid89@gmail.com>
2023-04-10 22:27:58 +08:00
7153a8f2f4 Merge pull request #15646 from serathius/robustness-readme-watch-issue
tests/robustness: Document analysing watch issue
2023-04-07 23:45:42 +02:00
a5a5862e0b tests: Make using etcdctl explicit in e2e tests
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-06 13:29:37 +02:00
8b1cd036ff security: remove password after authenticating the user
fix https://nvd.nist.gov/vuln/detail/CVE-2021-28235

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-06 17:11:54 +08:00
801bb4c6df test: add an e2e test to reproduce https://nvd.nist.gov/vuln/detail/CVE-2021-28235
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-06 16:47:31 +08:00
2d0d3c3fdf security: bump go to 1.19.8 to fix four CVEs
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-06 13:38:58 +08:00
2d9aeec91f Merge pull request #15645 from serathius/tests-cleanup-alternative-binaries
tests/framework: Cleanup alternative binaries in e2e tests
2023-04-06 07:33:17 +02:00
540d012e5e tests/robustness: Ensure that etcdctl binary is provided
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-05 23:04:20 +02:00
1e41d95ab2 tests/robustness: Document analysing watch issue
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-05 22:40:47 +02:00
651873cf7b tests/framework: Cleanup alternative binaries in e2e tests
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-05 15:32:31 +02:00
42a2643df9 tests/robustness: Reproduce issue #15220
This issue is somewhat easily reproduced simply by bombarding the
server with requests for progress notifications, which eventually
leads to one being delivered ahead of the payload message. This is
then caught by the watch response validation code previously added by
Marek Siarkowicz.

Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
2023-04-05 11:23:02 +01:00
af25936fb7 tests/integration: Demonstrate manual progress notification race
This will fail basically every time, as the progress notification
request catches the watcher in an unsynchronised state.

Signed-off-by: Peter Wortmann <peter.wortmann@skao.int>
2023-04-05 11:19:07 +01:00
5bae6b1e44 tests/robustness: Detect trigger timeout and exit
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-04 15:23:58 +02:00
1227754284 Cancel watch if cluster not healthy before or after injecting failpoints.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-04-04 13:58:17 +02:00
6582e349db tests: Enforce timeout on failpoints
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-04 12:25:07 +02:00
523f235c82 Merge pull request #15603 from serathius/robustness-finish-with-success
tests: Ensure that operation history finishes with successful request
2023-04-04 12:03:36 +02:00
32acc662c9 Merge pull request #15638 from ahrtr/dependency_20230404
Bump some dependencies
2023-04-04 17:11:26 +08:00
6a5d326519 tests: Ensure that operation history finishes with successful request
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-04 09:40:17 +02:00
5e0119eadc Merge pull request #15636 from lavacat/main-test-watch-delay
tests: increase maxWatchDelay to prevent flaky TestWatchDelay*
2023-04-04 09:38:03 +02:00
138fae6246 Merge pull request #15632 from serathius/fix-comparing-etcd-version
tests: Fix comparing etcd version
2023-04-04 09:34:55 +02:00
8b6bf90c0d Merge pull request #15580 from chaochn47/fix_flaking_auth_member_remove_test
fix flaking auth member remove test
2023-04-04 09:34:16 +02:00
4fab20aa75 Merge pull request #15618 from serathius/robustness-fix-periodic-etcd-version
tests: Fix building incorrect etcd version and make switch strict
2023-04-04 09:30:20 +02:00
072c5cb5da dependency: bump google.golang.org/protobuf from 1.28.1 to 1.30.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-04 15:28:09 +08:00
56284d5dfe dependency: bump github.com/golang/protobuf from 1.5.2 to 1.5.3
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-04 15:21:22 +08:00
0c66fc9f29 dependency: bump go.uber.org/multierr from 1.9.0 to 1.11.0
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-04 15:15:32 +08:00
757910958e tests: increase maxWatchDelay to prevent flaky TestWatchDelay*
value is selected empirically after spot checking some logs of flaky workflows

fixes: https://github.com/etcd-io/etcd/issues/15634
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
2023-04-03 21:49:36 -07:00
caed563e08 fix flaking auth member remove test
Signed-off-by: Chao Chen <chaochn@amazon.com>
2023-04-03 17:41:08 -07:00
69afcd1960 tests: Fix comparing etcd version
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-03 21:13:36 +02:00
6f4e5f316e Merge pull request #15592 from serathius/cleanup-endpoints
tests: Cleanup endpoints
2023-04-03 16:00:44 +02:00
9c72ecb1f9 tests: Fix building incorrect etcd version and make switch strict
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-03 15:06:10 +02:00
e57dcd5ceb test: fix typo in robustness test
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2023-04-03 18:46:32 +08:00
0cbd56e8b6 tests: Cleanup endpoints
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-03 12:18:54 +02:00
7c7f636aea Merge pull request #15615 from serathius/robustness-snapshot-older-version
tests/robustness: Support running snapshot tests on older versions
2023-04-03 12:13:01 +02:00
029315f57e tests/robustness: Support running snapshot tests on older versions
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-03 10:43:06 +02:00
4da39e4b1e Merge pull request #15294 from mitake/range-check
server/auth: disallow creating empty permission ranges
2023-04-03 09:03:50 +09:00
03214c0239 Revert "tests/robustness: Disable testing network blackhole until #15595 is fixed"
This reverts commit 013e25fab9.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-01 16:32:20 +02:00
71ba0873e3 tests/robustness: Encrypt peer traffic to prevent proxy manipulating packets
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-04-01 16:31:53 +02:00
4529f01876 Merge pull request #15601 from serathius/robustness-disable-blackhole
tests/robustness: Disable testing network blackhole until #15595 is fixed
2023-03-31 15:04:24 +02:00
013e25fab9 tests/robustness: Disable testing network blackhole until #15595 is fixed
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-31 13:55:58 +02:00
be7be34800 client: Hide v2 client package
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-31 10:26:11 +02:00
e11a32366e Merge pull request #15544 from jmhbnz/remove_e2e_calc
Remove e2e from coverage calculation
2023-03-30 16:26:36 +02:00
0bd0b6b0b5 Merge pull request #15446 from serathius/separate-grpc-server
Allow user to separate http and grpc server
2023-03-30 11:52:25 +02:00
870d478844 Merge e2e spawn files.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-03-30 22:38:00 +13:00
4340cbb4aa Merge pull request #15575 from serathius/ensure-watch
tests: Ensure watch catches all events generated in traffic
2023-03-30 10:28:22 +02:00
65add8cec4 tests: Test separate http port connection multiplexing
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:49:45 +02:00
bf12179a5a server: Add --listen-client-http-urls flag to allow running grpc server separate from http server
The difference in load configuration for the watch delay tests shows how
large the impact is. Even with the random write scheduler, grpc under the
http server can only handle 500 KB with a 2 second delay. On the other hand,
a separate grpc server easily hits 10, 100 or even 1000 MB within 100 milliseconds.

The priority write scheduler that was used in most previous releases
is far worse than the random one.

Tests are configured to use only 5 MB to avoid flakes and to avoid taking
too long to fill etcd.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2023-03-30 09:49:45 +02:00
5faad23812 Merge branch 'main' into remove_e2e_calc
2023-03-30 16:46:31 +13:00
4b87bb1852 Remove coverage implementation for ctl_v3_watch test.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-03-30 15:44:17 +13:00
3c40a68d09 Remove nocov flags for e2e tests.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-03-30 15:37:09 +13:00
1b125300bb Remove nocov implementation for e2e spawn.
Signed-off-by: James Blair <mail@jamesblair.net>
2023-03-30 15:17:53 +13:00