Commit Graph

1042 Commits

Author SHA1 Message Date
cdf4228673 Merge pull request #14308 from dusk125/main
server/etcdmain: add configurable cipher list to gRPC proxy listener
2022-09-16 12:15:08 -04:00
b7ba0542f6 Merge pull request #14422 from kkkkun/remove-redundant-code
remove redundant log messages
2022-09-16 12:18:43 +08:00
c4582aaaee remove redundant log messages
Signed-off-by: kkkkun <scuzk373x@gmail.com>
2022-09-16 11:45:43 +08:00
3b585e94fc mvcc: Remove unused revisions and change comment rev to modified
Signed-off-by: Hongfei Huang <853885165@qq.com>
2022-09-14 23:36:54 +08:00
1d95b82b19 Merge pull request #14421 from vsvastey/usr/vsvastey/open-with-max-index-test-fix
testing: fix TestOpenWithMaxIndex cleanup
2022-09-08 06:41:36 +08:00
a4f140c9fa testing: fix TestOpenWithMaxIndex cleanup
A WAL object was closed by defer, but the WAL was rewritten afterwards,
so the deferred call closed the already-closed WAL and not the new one. This
caused a data race between writing the file and cleaning up the temporary test
directory, which led to a non-deterministic failure.

Fixes #14332

Signed-off-by: Vladimir Sokolov <vsvastey@gmail.com>
2022-09-07 21:30:16 +03:00
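
The defer pitfall described in a4f140c9fa can be reproduced in a few lines of Go. The sketch below is not the etcd test code: `resource` is a hypothetical stand-in for a WAL handle, and `buggyCleanup` / `fixedCleanup` are illustrative names. It relies on the language rule that the receiver of a deferred method call is evaluated when the `defer` statement executes, not when the function returns.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a WAL handle; it only records
// whether Close was called on it.
type resource struct {
	name   string
	closed bool
}

func (r *resource) Close() { r.closed = true }

// buggyCleanup mirrors the bug: the deferred call is bound to the first
// resource, so the replacement assigned to w later is never closed.
func buggyCleanup() (first, second *resource) {
	w := &resource{name: "first"}
	first = w
	defer w.Close() // receiver is evaluated here and points at "first"

	w.Close() // explicit close before the handle is recreated
	w = &resource{name: "second"}
	second = w
	return first, second
}

// fixedCleanup defers a closure instead, which reads w only when the
// function returns and therefore closes the replacement.
func fixedCleanup() (first, second *resource) {
	w := &resource{name: "first"}
	first = w
	defer func() { w.Close() }()

	w.Close()
	w = &resource{name: "second"}
	second = w
	return first, second
}

func main() {
	_, b := buggyCleanup()
	_, f := fixedCleanup()
	fmt.Println("buggy: second closed =", b.closed)
	fmt.Println("fixed: second closed =", f.closed)
}
```

In the buggy variant the second resource is still open after the function returns, which is the state that lets a later directory cleanup race with a pending write.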
3dc5348d94 Merge pull request #14419 from ahrtr/alarm_list_ci
Move consistent_index forward when executing alarmList operation
2022-09-06 03:50:58 +08:00
cc840336f0 move consistent_index forward when executing alarmList operation
The alarm list is the only exception that doesn't move consistent_index
forward. The reproduction steps are as simple as,

```
etcd --snapshot-count=5 &
for i in {1..6}; do etcdctl alarm list; done
kill -9 <etcd_pid>
etcd
```

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-09-05 10:05:55 +08:00
2a10049e47 fix the potential data loss for clusters with only one member
For a cluster with only one member, raft always sends identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually partially) the
applying workflow.

When the client receives the response, it doesn't mean etcd has already
successfully saved the data, including BoltDB and WAL, because:
   1. etcd commits the boltDB transaction periodically instead of on each request;
   2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, data may be lost when etcd crashes
immediately after responding to the client and before the boltDB and WAL
successfully save the data to disk.
Note that this issue can only happen for clusters with only one member.

For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it is replicated to a majority of members.
When the client receives the response, it means the data must have been applied.
It further means the data must have been committed.
Note: for clusters with multiple members, the raft will never send identical
unstable entries and committed entries to etcdserver.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-30 15:29:20 +08:00
c52108942b Merge branch 'main' into main
2022-08-29 12:07:27 -04:00
08a9d1da07 chore: remove duplicate word in comments
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-08-27 13:39:48 +08:00
9c8326bb50 Merge pull request #14358 from vivekpatani/main
server/auth: refresh cache on each NewAuthStore
2022-08-27 11:13:59 +09:00
dc4b810195 Merge pull request #14388 from niconorsk/fix/notify-systemd-when-cluster-ready-times-out
etcdmain: Honour ExperimentalWaitClusterReadyTimeout in startEtcd
2022-08-26 18:34:52 +08:00
e15bdd9df1 etcdmain: Honour ExperimentalWaitClusterReadyTimeout in startEtcd
When quorum could not be reached, we waited forever and never sent
the systemd notify message. As a result, systemd would eventually time out
and restart the etcd process, which would likely put the unhealthy cluster
in an even worse state.

Improves #13785

Signed-off-by: Nicolai Moore <niconorsk@gmail.com>
2022-08-26 18:06:50 +10:00
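
The behaviour described in e15bdd9df1, waiting for readiness only up to a timeout and then proceeding to notify the supervisor, can be sketched with a `select`. This is an illustration under assumptions, not the etcd implementation: `waitReady`, `simulate`, and the 50 ms timeout are hypothetical names and values.

```go
package main

import (
	"fmt"
	"time"
)

// waitReady blocks until ready is closed or the timeout elapses and
// reports whether readiness was reached in time.
func waitReady(ready <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-ready:
		return true
	case <-time.After(timeout):
		return false
	}
}

// simulate models a cluster that either reaches quorum immediately or
// never does, and returns the result of the bounded wait.
func simulate(reachesQuorum bool) bool {
	ready := make(chan struct{})
	if reachesQuorum {
		close(ready)
	}
	return waitReady(ready, 50*time.Millisecond)
}

func main() {
	if !simulate(false) {
		fmt.Println("timed out waiting for the cluster; notifying the supervisor anyway")
	}
	// In both cases a real program would send its readiness notification
	// here instead of blocking forever.
}
```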
be58a2539c Added client-auto-sync-interval argument to the grpc-proxy
Signed-off-by: Vitalii Levitskii <vitalii@uber.com>
2022-08-25 15:33:38 +03:00
ae608da7e6 server,test: refresh cache on each NewAuthStore
- permissions were incorrectly loaded on restarts.
- https://github.com/etcd-io/etcd/issues/14355

Signed-off-by: vivekpatani <9080894+vivekpatani@users.noreply.github.com>
2022-08-23 20:11:47 -07:00
1851316519 Merge pull request #14266 from stefanmonkey/feat/grpc-logging
Add logging grpc request and response content with grpc-proxy mode
2022-08-19 05:52:16 +08:00
27ffd7e1cf Merge pull request #14351 from spacewander/rca
chore: log when an invalid watch request is received
2022-08-18 09:45:29 +08:00
74506738b8 Refactor the keepAliveListener and keepAliveConn
Only `net.TCPConn` supports `SetKeepAlive` and `SetKeepAlivePeriod`
by default, so if you want to wrap multiple layers of net.Listener,
the `keepaliveListener` should be the one which is closest to the
original `net.Listener` implementation, namely `TCPListener`.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-18 04:24:05 +08:00
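
The layering constraint from 74506738b8 is easier to see in code. The sketch below is a simplified, hypothetical `keepAliveListener` rather than the refactored etcd type: it embeds `*net.TCPListener` directly so that `Accept` can still call `SetKeepAlive` and `SetKeepAlivePeriod` on the `*net.TCPConn` before any outer wrapper hides it. `acceptOne` and the 30-second period are assumptions for the demo.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// keepAliveListener sits directly on top of the TCP listener, so the
// accepted connection is still a *net.TCPConn when keepalive is set.
type keepAliveListener struct {
	*net.TCPListener
	period time.Duration
}

func (l keepAliveListener) Accept() (net.Conn, error) {
	c, err := l.AcceptTCP()
	if err != nil {
		return nil, err
	}
	if err := c.SetKeepAlive(true); err != nil {
		c.Close()
		return nil, err
	}
	if err := c.SetKeepAlivePeriod(l.period); err != nil {
		c.Close()
		return nil, err
	}
	return c, nil
}

// acceptOne opens a loopback listener, connects to it once, and returns
// the dynamic type of the accepted connection.
func acceptOne() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	kl := keepAliveListener{TCPListener: ln.(*net.TCPListener), period: 30 * time.Second}

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer client.Close()

	conn, err := kl.Accept()
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return fmt.Sprintf("%T", conn), nil
}

func main() {
	typ, err := acceptOne()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("accepted connection type:", typ)
}
```

Further wrappers (for example TLS or connection limiting) would then wrap `keepAliveListener`, not the other way around.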
76a5902efa server/etcdmain: add configurable cipher list to gRPC proxy listener
Signed-off-by: Allen Ray <alray@redhat.com>
2022-08-17 10:56:27 -04:00
508ce517e0 update according to the review
Signed-off-by: spacewander <spacewanderlzx@gmail.com>
2022-08-17 09:25:37 +08:00
bebefd8b80 chore: log when an invalid watch request is received
As protobuf doesn't have required fields, a user may send an empty
WatchRequest by mistake. Currently, etcd ignores the invalid request
and keeps the stream open. If we don't reject the invalid request by
closing the stream, it would be better to leave a log there.

This commit also fixes a typo in the comment.

Signed-off-by: spacewander <spacewanderlzx@gmail.com>
2022-08-16 11:33:01 +08:00
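
The situation in bebefd8b80 can be modelled without protobuf. In the sketch below, `watchRequest`, `createReq`, `cancelReq`, and `handle` are hypothetical stand-ins and not the generated etcd types: the request is a union in which nothing forces the sender to set any member, so the handler needs a default branch that at least logs the empty case.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical stand-ins for the members of a watch request union.
type createReq struct{ key string }
type cancelReq struct{ watchID int64 }

// watchRequest models a oneof: at most one field is set, and an empty
// value is perfectly valid on the wire.
type watchRequest struct {
	create *createReq
	cancel *cancelReq
}

// handle reports which action was taken. An empty request is logged and
// ignored, and the stream would be left open.
func handle(req watchRequest) string {
	switch {
	case req.create != nil:
		return "create"
	case req.cancel != nil:
		return "cancel"
	default:
		log.Printf("ignoring watch request with no union member set")
		return "ignored"
	}
}

func main() {
	fmt.Println(handle(watchRequest{create: &createReq{key: "foo"}}))
	fmt.Println(handle(watchRequest{}))
}
```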
fff5d00ccf Merge pull request #14149 from lavacat/main-txn-panic
server: don't panic in readonly serializable txn
2022-08-14 05:41:57 +08:00
cb5f358b5f Merge pull request #14330 from chaochn47/auth_test_logging
logging RoleGrantPermission key and range end
2022-08-11 06:39:56 +08:00
ccd4efc3b3 logging RoleGrantPermission key and range end
Signed-off-by: Chao Chen <chaochn@amazon.com>
2022-08-10 14:51:25 -07:00
6a04f7fbd6 With go-grpc-middleware, add grpc_zap to log gRPC request and response content in grpc-proxy mode
In our test environment, it can be very useful for debugging who deleted etcd's keys in grpc-proxy mode.
inspired by https://github.com/grpc-ecosystem/go-grpc-middleware

Signed-off-by: stefanbo <stefan_bo@163.com>

Update flag name

1. add changelog
2. update flag name to `experimental-enable-grpc-debug`

Signed-off-by: stefan bo <stefan_bo@163.com>

Update CHANGELOG-3.6.md

Signed-off-by: stefan bo <stefan_bo@163.com>

change flag name

Signed-off-by: stefan bo <stefan_bo@163.com>
2022-08-10 16:10:05 +08:00
43bb9d5c22 server: don't panic in readonly serializable txn
Problem: We pass grpc context down to applier in readonly serializable txn.
This context can be cancelled for example due to timeout.
This will trigger panic inside applyTxn

Solution: Only panic for transactions with write operations

fixes https://github.com/etcd-io/etcd/issues/14110

Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
2022-08-09 00:46:50 -07:00
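
The rule adopted in 43bb9d5c22, returning an error for a failed read-only transaction and panicking only when writes are involved, can be sketched as follows. This is a simplified model, not the etcd `applyTxn`: the `txn` type, `runCancelled`, and the up-front context check (standing in for a cancellation noticed part-way through the apply) are assumptions.

```go
package main

import (
	"context"
	"fmt"
)

// txn is a hypothetical transaction descriptor.
type txn struct{ hasWrites bool }

// applyTxn returns the context error for read-only transactions, since
// nothing was modified. For transactions with writes it panics, because
// a partially applied write cannot simply be abandoned.
func applyTxn(ctx context.Context, t txn) error {
	if err := ctx.Err(); err != nil {
		if t.hasWrites {
			panic("write txn failed part-way: " + err.Error())
		}
		return err
	}
	return nil
}

// runCancelled applies a txn under an already-cancelled context and
// reports whether applyTxn panicked.
func runCancelled(hasWrites bool) (panicked bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	err = applyTxn(ctx, txn{hasWrites: hasWrites})
	return panicked, err
}

func main() {
	p, err := runCancelled(false)
	fmt.Println("read-only:", p, err)
	p, err = runCancelled(true)
	fmt.Println("with writes:", p, err)
}
```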
649babaf4a Merge pull request #14276 from yuzhiquan/add-alarm-metrics
Add alarms metrics for server
2022-08-09 05:06:35 +08:00
871d8fdaf1 Merge pull request #14300 from ahrtr/enhance_wal_find_error
Enhance the WAL file related error
2022-08-05 20:35:48 -04:00
4c13767881 etcdserver: add alarms metrics for server
Signed-off-by: yuzhiquanlong <yuzhiquanlong@gmail.com>
2022-08-03 09:33:02 +08:00
ae36a577d7 Merge pull request #14286 from VladSaioc/bugfix-goroutine-leak
Fixed goroutine leak in server/etcdserver/raft_test.go
2022-08-03 06:02:54 +08:00
3dd7d3f9af enhance the WAL file related error
The `ErrFileNotFound` was used for three cases:
1. There are no WAL files at all (probably due to no read permission);
2. There are no WAL files matching the snapshot index;
3. The WAL file seqs do not increase continuously.

Seeing the same `ErrFileNotFound` error in all three cases makes debugging
harder, so in this PR, a different error is returned for each case above.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-08-03 05:37:30 +08:00
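
The three cases listed in 3dd7d3f9af map naturally onto three sentinel errors. The sketch below illustrates that idea only: the error names, the `walFile` type, and `selectFiles` are hypothetical and do not reproduce the etcd WAL package. It assumes the files are already sorted by their first index.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, one per failure case.
var (
	errNoWALFiles       = errors.New("no WAL files found")
	errNoMatchingIndex  = errors.New("no WAL file matches the snapshot index")
	errSeqNotContinuous = errors.New("WAL file sequence numbers are not continuous")
)

// walFile describes one file by its sequence number and first index.
type walFile struct{ seq, index uint64 }

// selectFiles returns the files needed to replay from snapIndex, starting
// at the last file whose first index is not greater than snapIndex.
func selectFiles(files []walFile, snapIndex uint64) ([]walFile, error) {
	if len(files) == 0 {
		return nil, errNoWALFiles
	}
	start := -1
	for i, f := range files {
		if f.index <= snapIndex {
			start = i
		}
	}
	if start < 0 {
		return nil, errNoMatchingIndex
	}
	for i := start + 1; i < len(files); i++ {
		if files[i].seq != files[i-1].seq+1 {
			return nil, errSeqNotContinuous
		}
	}
	return files[start:], nil
}

func main() {
	files := []walFile{{seq: 0, index: 0}, {seq: 1, index: 10}, {seq: 3, index: 20}}
	_, err := selectFiles(files, 5)
	fmt.Println(err)
}
```

Callers can then tell the cases apart with `errors.Is` instead of parsing a shared message.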
6cded3d94c Fixed goroutine leak in server/etcdserver/raft_test.go
Signed-off-by: VladSaioc <vladsaioc10@gmail.com>
2022-08-02 23:22:55 +02:00
4f0e92d94c Merge pull request #14262 from mind1949/update-server-etcdserver-raft
server/etcdserver: check whether raftNode has stopped
2022-08-02 05:58:30 +08:00
ff56da7745 rafthttp: test transport multiple transport removes
Unit test to verify that multiple transport removes do not create an
issue.

Signed-off-by: Austin Benoit <22805659+AustinBenoit@users.noreply.github.com>
2022-07-28 18:23:17 -04:00
fae4650834 Merge pull request #14280 from falser101/fix/zjf
fix: code cleanup
2022-07-27 07:20:43 +08:00
c26d7f5389 fix: code cleanup
Signed-off-by: jianfei.zhang <jianfei.zhang@daocloud.io>
2022-07-26 22:07:22 +08:00
bb7e4653c8 tests: Fix member id in CORRUPT alarm
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-26 15:55:22 +02:00
d44bbff278 server: Make corruption check optional and period configurable
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-26 09:31:15 +02:00
6697fca97d server: Implement compaction hash checking
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-26 09:31:14 +02:00
f3bd535747 server/etcdserver: fix test
Signed-off-by: mind1949 <lianjie1949@gmail.com>
2022-07-25 23:20:21 +08:00
c58ec9fe13 server: Refactor compaction checker
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-25 13:59:30 +02:00
264498258b tests: Move CorruptBBolt to testutil
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
2022-07-25 13:59:30 +02:00
2b0596f859 server/etcdserver: check if raftNode has been stopped
Signed-off-by: mind1949 <lianjie1949@gmail.com>
2022-07-23 08:49:07 +08:00
9c4fe12a88 Merge pull request #14169 from ahrtr/http_max_stream_20220628
Support configuring `MaxConcurrentStreams` for http2
2022-07-12 17:38:43 +08:00
e586dc19df lease: Rename Poll to Peek in the LeaseExpiredNotifier
`Poll` implies that the element is removed from the heap, so `Peek` is the
more appropriate name for reading the top of the heap without removing it.

Signed-off-by: SimFG <1142838399@qq.com>
2022-07-07 16:25:26 +08:00
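
The naming distinction behind e586dc19df can be shown with the standard `container/heap` package. The sketch below is not the `LeaseExpiredNotifier` code: `expiryHeap`, `peek`, `poll`, and `newHeap` are illustrative names, and the heap stores plain `int64` expiry times.

```go
package main

import (
	"container/heap"
	"fmt"
)

// expiryHeap is a min-heap of expiry times.
type expiryHeap []int64

func (h expiryHeap) Len() int            { return len(h) }
func (h expiryHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h expiryHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *expiryHeap) Push(x interface{}) { *h = append(*h, x.(int64)) }
func (h *expiryHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// peek returns the smallest element and leaves the heap unchanged.
func peek(h *expiryHeap) (int64, bool) {
	if h.Len() == 0 {
		return 0, false
	}
	return (*h)[0], true
}

// poll removes and returns the smallest element.
func poll(h *expiryHeap) (int64, bool) {
	if h.Len() == 0 {
		return 0, false
	}
	return heap.Pop(h).(int64), true
}

// newHeap builds a heap from the given values.
func newHeap(vals ...int64) *expiryHeap {
	h := &expiryHeap{}
	for _, v := range vals {
		heap.Push(h, v)
	}
	return h
}

func main() {
	h := newHeap(30, 10, 20)
	v, _ := peek(h)
	fmt.Println("peek:", v, "len:", h.Len())
	v, _ = poll(h)
	fmt.Println("poll:", v, "len:", h.Len())
}
```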
1a6fe4dbc6 update the comment for MaxConcurrentStreams to clearly state it's the max value for each client.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-07 04:51:20 +08:00
053ba95ed5 set max concurrent streams to the http2 server
The default max stream is 250 in http2. When there are more than
250 streams, the client side may be blocked until some previous
streams are released. So we need to support configuring a larger
`MaxConcurrentStreams`.

Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-06 03:43:46 +08:00
6220174687 support custom grpc.MaxConcurrentStreams
There has been no update on the original PR (see below) for more than 2
weeks. So Benjamin(@ahrtr) continues to work on the PR. The first
step is to rebase the PR, because there are lots of conflicts with
the main branch.

The changes to go.mod and go.sum are reverted, because they are not needed.
The e2e test cases are also reverted, because they are not correct.

```
https://github.com/etcd-io/etcd/pull/14081
```

Signed-off-by: nic-chen <chenjunxu6@gmail.com>
Signed-off-by: Benjamin Wang <wachao@vmware.com>
2022-07-06 03:43:46 +08:00
1749a07a20 Merge pull request #14172 from SimFG/snap_log
snap: Delete the nil judgment of the log object
2022-07-04 23:57:15 +02:00