Commit Graph

156 Commits

Author SHA1 Message Date
fe35b5130e Fix code scanning alert: This log write receives unsanitized user input 2022-04-19 13:49:08 +02:00
3152dc8174 contrib/raftexample: Save snapshot and WAL before hard state
Update raftexample to save the snapshot file and WAL snapshot entry
before hardstate to ensure the snapshot exists during recovery.
Otherwise if there is a failure after storing the hard state there may
be reference to a non-existent snapshot.
This PR introduces the fix from #10219 to the raftexample.
2022-04-11 23:44:54 +00:00
5cf6ba48de added a unit test for the method processMessages 2022-03-08 09:38:23 +08:00
793218ed2b update the confstate before sending snapshot
When there is a `raftpb.EntryConfChange` after creating the snapshot,
then the confState included in the snapshot is out of date. so We need
to update the confState before sending a snapshot to a follower.
2022-03-07 12:18:29 +08:00
72c33d8b05 contrib/mixin: Generate rules, fix tests
* Add Makefile
* Make tests runnable
* Add generated rule manifest file

Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2022-02-10 16:17:03 +01:00
7460379bad contrib/mixin: add missing summary to alerts
to avoid alert messages being templated with undefined values lets
set summary for alerts that are currently missing one
2022-01-19 19:55:40 +01:00
2a151c8982 *: move from io/ioutil to io and os packages
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-10-28 00:05:28 +08:00
5991da1534 Merge pull request #13388 from grafana/mixin-rate-interval
contrib/mixin: Update dashboard promql to use $__rate_interval.
2021-10-21 08:14:25 -04:00
fead3be933 Grafana datasource template should be labelled 'Data Source'.
Signed-off-by: Tom Wilkie <tom@grafana.com>
2021-10-20 13:42:43 +01:00
aef9131c81 contrib/mixin/mixin.libsonnet: Include gRPC method in alert description
This makes it easier for admin to determine the alert issue.
2021-10-15 15:10:52 +02:00
0eb72bde2c contrib/mixin: omit Defragment method from etcdGRPCRequestsSlow
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
2021-10-08 16:21:46 -04:00
98427d2bed contrib/mixin: Update dashboard queries to use $__rate_interval
A global query variable was introduced in Grafana 7.2 which is "almost always right" for `rate`, `irate`, and `increase` function calls in promql.
2021-10-04 15:02:11 -07:00
b448daa698 Merge pull request #13275 from lilic/add-peer-dashboard
contrib/mixin/mixin.libsonnet: Add dashboard for peer round trip time
2021-08-05 08:27:38 -04:00
55b697c528 contrib/mixin/mixin.libsonnet: Add dashboard for peer round trip time
This helps users debug firing alerts.
2021-08-05 13:15:34 +02:00
44b8ae145b etcdserver: Move datadir and wal to storage package 2021-08-03 12:47:37 +02:00
7885f2a951 Mixin: Support configuring cluster label 2021-07-29 17:54:14 +02:00
85f7b3c406 contrib/mixin/mixin.libsonnet: Unify alerting description 2021-07-16 15:25:53 +02:00
36bb8d293c Use method const in package http instead of literal 2021-07-08 20:00:03 +08:00
f00231951d contrib/mixin/mixin.libsonnet: Adjust gRPC failed requests
OK is not the only one that is allowed, this before also captured
context canceled, NotFound, and other non error requests.
2021-06-21 11:47:53 +02:00
ffea1537d4 ClientV3 tests use integration.NewClient that configures proper logger. 2021-04-29 18:18:34 +02:00
562d645ac9 Fix the mixin.
Signed-off-by: Tom Wilkie <tom@grafana.com>
2021-04-13 19:38:55 +01:00
bad0b4d513 Merge pull request #12823 from mtulio/chore/dash-var-refresh
chore/dash-var-refresh: change default refresh to 2(time range)
2021-04-08 15:14:53 +02:00
aeeecc06cf fix/dash-var-refresh: add const and description 2021-04-08 10:12:41 -03:00
816d332d81 Merge pull request #12830 from ptabor/20210405-split-pkg
Split client/pkg as dedicated low-dependencies module for client
2021-04-08 00:48:41 +02:00
3bb7acc8cf Migrate dependencies pkg/foo -> client/pkg/foo 2021-04-07 00:38:47 +02:00
2ba69de281 Contrib lock example 2021-04-06 15:21:01 -04:00
d2bc5343fb chore/dash-var-refresh: change default refresh to 2(time range) 2021-04-01 00:06:57 -03:00
fce0c192eb Regenerate protos. 2021-03-25 00:31:44 +01:00
3976d68ed3 raftExample: Allow closing raftexample node when snapshotting.
Fix race that made the raftExample test fail.
2021-02-26 08:56:12 +01:00
5ae3f879c9 raftexample: Return an appropriate applyDoneC 2021-02-24 21:28:18 +09:00
cb0d256a18 raftexample: Add test for adding new node to existing cluster 2021-02-22 13:44:33 +09:00
1b1be43d65 raftexample: New joined node have to start with RestartNode 2021-02-22 09:45:44 +09:00
cc2b039817 raftexample: Explicitly notify all committed entries are applied 2021-02-19 19:26:36 +09:00
2d25f7f3da raftexample: Implement ReportUnreachable and ReportSnapshot 2021-02-17 11:59:32 +09:00
1302e1edb2 raftexample: Save snapshot file before writing to wal 2021-02-16 13:30:15 +09:00
1395a1a795 Migrate back mixin to contrib/
The mixin was moved out together with documentation.
This broke kube-prometheous: https://github.com/etcd-io/etcd/issues/12685#issuecomment-777264143
2021-02-11 09:28:30 +01:00
e8ba375032 Merge pull request #11889 from mrkm4ntr/example-recover-from-snap
raftexample: Fix recovery from snapshot
2021-02-10 12:18:30 +01:00
be2167ebab Wait until all committed entries are applied
To take a snapshot
2021-02-05 19:05:41 +09:00
cb14cdd774 raftexample: Fix recovery from snapshot
* If there is a snapshot, HTTP server won't start.
* Resotring form snapshot occurs after replaying WAL.
* When taking a snapshot, the last change is not applied to the state machine yet.
2021-02-05 09:34:34 +09:00
b5d11723d1 Merge pull request #12393 from viviyww/contrib-doc
contrib: del systemd/etcd2-backup-coreos in docs
2021-02-01 15:33:44 +01:00
4af159a30a Merge pull request #12259 from alvistack/master-aio_graceful_reboot
`etcd.service`: Define explicit dependencies of systemd etcd service
2021-01-31 23:58:12 +01:00
b0e2c70c71 contrib/systemd: add a sysusers entry
This adds a sysusers.d file, in order to create a system user/group
which matches the one used by the service unit.

Ref: https://www.freedesktop.org/software/systemd/man/sysusers.d.html
2020-12-09 13:59:46 +00:00
aaf423e962 server: Update imports.
find -name '*.go' | xargs sed -i --follow-symlinks 's|etcd/v3/|etcd/server/v3/|g'
2020-10-26 13:02:32 +01:00
45b007b8b4 contrib,clientv3: Move contrib/recipies to clientv3/experimental/recipies/...
Recipies is set of patterns / primitives implementation on top of clientv3.
It's used by integration tests. It shouldn't be considered "server" code.
2020-10-22 11:10:07 +02:00
e33c6dd9df client/v3: Rename of imports 2020-10-20 10:13:06 +02:00
e62417297d *: Rename of imports of raft (as its now a module)
% find -name '*.go' -o -name '*.md' -o -name '*.sh' | xargs sed -i --follow-symlinks 's|etcd/v3/raft|etcd/raft/v3|g'
2020-10-16 13:58:18 +02:00
5d3609e3cf contrib: del systemd/etcd2-backup-coreos in docs
del systemd/etcd2-back-coreos in docs
2020-10-14 09:49:56 +08:00
de55bb6331 pkg: Rename imports after making 'pkg' a module
find -name '*.go' | xargs sed --follow-symlinks -i 's|go.etcd.io/etcd/v3/pkg/|go.etcd.io/etcd/pkg/v3/|g'
go fmt ./...
2020-10-13 00:09:27 +02:00
28f2b07623 *: Update references to code moved to the api/ dir.
Follow up to file-moves done in the previous commit.

The commit contains purely mechanical consequences of execution (apart
of scripts/genproto.sh):

  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/etcdserver/api/v3rpc/rpctypes|v3/api/v3rpc/rpctypes|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/version|v3/api/version|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/mvcc/mvccpb|v3/api/mvccpb|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/etcdserver/etcdserverpb|v3/api/etcdserverpb|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/etcdserver/api/membership/membershippb|v3/api/membershippb|g'
  % find ./ -name '*.go'  | xargs sed --follow-symlinks -i 's|v3/auth/authpb|v3/api/authpb|g'

  % find ./ -name '*.proto' -o -name '*.md'  | xargs -L 1 sed --follow-symlinks -i 's|/mvcc/mvccpb/kv.proto|/api/mvccpb/kv.proto|g'
  % find ./ -name '*.proto' -o -name '*.md'  | xargs -L 1 sed --follow-symlinks -i 's|/auth/authpb/auth.proto|/api/authpb/auth.proto|g'
  % find ./ -name '*.proto' -o -name '*.md'  | xargs -L 1 sed --follow-symlinks -i 's|/etcdserver/api/membership/membershippb/membership.proto|/api/membershippb/membership.proto|g'

  I also modified manually paths in scripts/genproto.sh.

  % go fmt ./...
2020-10-06 11:56:16 +02:00
17ceed9b47 etcd.service: Support Graceful Reboot for AIO Node
Currently our sample systemd service file `contrib/systemd/etcd.service`
have startup/shutdown dependency as below:

    [Unit]
    After=network.target

For some rare condition, e.g. bare matel deployment with slow network
startup, IP could not be assigned e arly enough before etcd default
`ETCD_HEARTBEAT_INTERVAL="100"` and `ETCD_ELECTION_TIMEOUT="1000"` get
timeouted, after graceful system reboot.

This cause etcd false negative classify itself use unhealthy, therefore
stop rejoining the remaining online cluster members.

This PR introduce:

  - `etcd.service`: Ensure startup after `network-online.target` and
    `time-sync.target`, so effective network connectivity and synced
    time is available.

The logic is concept proof by
<https://github.com/alvistack/ansible-role-etcd/tree/develop>; also
works as expected with Ceph + Kubernetes deployment by
<https://github.com/alvistack/ansible-collection-kubernetes/tree/develop>.
No more deadlock happened during graceful system reboot, both AIO
single/multiple node with loopback mount.

Also see:

  - <https://github.com/ceph/ceph/pull/36776>
  - <https://github.com/etcd-io/etcd/pull/12259>
  - <https://github.com/cri-o/cri-o/pull/4128>
  - <https://github.com/kubernetes/release/pull/1504>

Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com>
2020-09-17 16:59:12 +08:00