This is the minimal set of package updates I get after:
taking packages from release-3.5 with golang.org/x/crypto@v0.17.0
slowly downgrading package by package to get minimal changes
running make to make sure other dependencies don't change
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Assume #16916 as baseline. The E2E takes `1395.082s`.
* https://github.com/etcd-io/etcd/pull/16988
It introduced `TestAuthority` which takes `18.39s`.
And after https://github.com/etcd-io/etcd/pull/16997, it takes `50.05s`.
* https://github.com/etcd-io/etcd/pull/16995
It introduced `TestInPlaceRecovery` which takes `17.37s`.
* https://github.com/etcd-io/etcd/pull/17144
- New `TestHTTPHealthHandler` takes `29.9s`
- New `TestHTTPLivezReadyzHandler` takes `35.20s`
* https://github.com/etcd-io/etcd/pull/17173
- New `TestMemberReplace` takes `7.55s`.
Ideally, it should increase `140.07s`. It's not larger than `1800s`
timeout value.
However, we run E2E cases 3 times. By default, we run E2E cases with
`-cpu 1,2,4`. That means that we run 3 times.
`1395.082s` + `140.07s * 3` = `1815.292s` > `1800s`
```bash
$ go help testflag
-count n
Run each test, benchmark, and fuzz seed n times (default 1).
If -cpu is set, run n times for each GOMAXPROCS value.
Examples are always run once. -count does not apply to
fuzz tests matched by -fuzz.
```
I don't think we should run E2E with different GOMAXPROCS value. All the
`TestXYZ` are used to control etcd process and we don't set GOMAXPROCS
env to etcd process.
Set `CPU=4` to align with main and release/3.5.
Closes: #17241
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Add two separate probes, one for liveness and one for readiness. The liveness probe would check that the local individual node is up and running, or else restart the node, while the readiness probe would check that the cluster is ready to serve traffic. This would make etcd health-check fully Kubernetes API complient.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Disable following redirects from peer HTTP communication on the client's side.
Etcd server may run into SSRF (Server-side request forgery) when adding a new
member. If users provide a malicious peer URL, the existing etcd members may be
redirected to another unexpected internal URL when getting the new member's
version.
Signed-off-by: Ivan Valdes <ivan@vald.es>
To keep etcd projects up to date with the latest patch releases &
incorporate the latest security updates.
Signed-off-by: arjunmalhotra1 <am2cj@virginia.edu>
fix the unexpected blocking when using Barrier.Wait(), e.g.
NewBarrier(client, a).Wait() will block if key a is not existed but a0 is existed, but it should return immediately.
Signed-off-by: James Blair <mail@jamesblair.net>
This is not yet implementation, just API and tests to be filled
with implementation in next CLs,
tracked by: https://github.com/etcd-io/etcd/issues/12652
We propose here 3 packages:
- clientv3/naming/endpoints ->
That is abstraction layer over etcd that allows to write, read &
watch Endpoints information. It's independent from GRPC API. It hides
the storage details.
- clientv3/naming/endpoints/internal ->
That contains the grpc's compatible Update class to preserve the
internal JSON mashalling format.
- clientv3/naming/resolver ->
That implements the GRPC resolver API, such that etcd can be
used for connection.Dial in grpc.
Please see the grpc_naming.md document changes & grpcproxy/cluster.go
new integration, to see how the new abstractions work.
Signed-off-by: Chao Chen <chaochn@amazon.com>
The PageWriter has cache buffer so that it doesn't call the Writer until
the cache is almost full. Since the data's length is random, the pending
bytes should be always less than cache buffer size, instead of page
size.
Fix: #16255
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit fddd1add52)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
From the name of func 'UniqueURLsFromFlag', we can tell that UniqueURLs'uss
should not have duplicates. The current implemention of UniqueURLs'Set
has a bug to make it unique. This PR fixes it.
Signed-off-by: Jes Cok <xigua67damn@gmail.com>
This contains a slight refactoring to expose enough information
to write meaningful tests for auth applier v3.
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
Mitigates #15993 by not checking each key individually for permission
when auth is entirely disabled or admin user is calling the method.
Backport of #16005
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
Progress notifications requested using ProgressRequest were sent
directly using the ctrlStream, which means that they could race
against watch responses in the watchStream.
This would especially happen when the stream was not synced - e.g. if
you requested a progress notification on a freshly created unsynced
watcher, the notification would typically arrive indicating a revision
for which not all watch responses had been sent.
This changes the behaviour so that v3rpc always goes through the watch
stream, using a new RequestProgressAll function that closely matches
the behaviour of the v3rpc code - i.e.
1. Generate a message with WatchId -1, indicating the revision for
*all* watchers in the stream
2. Guarantee that a response is (eventually) sent
The latter might require us to defer the response until all watchers
are synced, which is likely as it should be. Note that we do *not*
guarantee that the number of progress notifications matches the number
of requests, only that eventually at least one gets sent.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Difference in load configuration for watch delay tests show how huge the
impact is. Even with random write scheduler grpc under http
server can only handle 500 KB with 2 seconds delay. On the other hand,
separate grpc server easily hits 10, 100 or even 1000 MB within 100 miliseconds.
Priority write scheduler that was used in most previous releases
is far worse than random one.
Tests configured to only 5 MB to avoid flakes and taking too long to fill
etcd.
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Changes invocation from `go test -timeout 30m -v -cpu 1,2,4 '' -v
--count 1 go.etcd.io/etcd/tests/e2e` to `go test -timeout 30m -v -cpu 1,2,4 -v --count 1 go.etcd.io/etcd/tests/e2e` (removes '').
Those braces caused tests to also run in root package.
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Problem: during restore in watchableStore.Restore, synced watchers are moved to unsynced.
minRev will be behind since it's not updated when watcher stays synced.
Solution: update minRev
fixes: https://github.com/etcd-io/etcd/issues/15271
Signed-off-by: Bogdan Kanivets <bkanivets@apple.com>
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
There are two goroutines accessing the `gs` grpc server var. Before
insecure `gs` server start, the `gs` can be changed to secure server and
then the client will fail to connect to etcd with insecure request. It
is data-race. We should use argument for reference in the new goroutine.
fix: #15495
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit a9988e2625)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Historic capnslog timestamps are in microsecond resolution. We need to match that when we migrate to the zap logger.
Signed-off-by: James Blair <mail@jamesblair.net>
Backport https://github.com/etcd-io/etcd/pull/15095 to 3.4.
When promoting a learner, we need to wait until the leader's applied ID
catches up to the commitId. Afterwards, check whether the learner ID
exist or not, and return `membership.ErrIDNotFound` directly in the API
if the member ID not found, to avoid the request being unnecessarily
delivered to raft.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The issue is caused by hand-crafted protobuf message. The runtime.errorBody
defines two protobuf fields with same number. We need to upgrade the
version to fix it. Otherwise, the client side won't receive any errors
from server side because of panic.
```
mismatching field: runtime.errorBody.error, want runtime.errorBody.message
```
It can fix the cases
PASSES="build grpcproxy" CPU=4 RACE=true ./test -run TestV3CurlLeaseRevokeNoTLS
The original error is like:
```
v3_curl_lease_test.go:109: testV3CurlLeaseRevoke: prefix (/v3) endpoint (/kv/lease/revoke): error (read /dev/ptmx: input/output error (expected "etcdserver: requested lease not found", got ["curl: (52) Empty reply from server\r\n"])), wanted etcdserver: requested lease not found
v3_curl_lease_test.go:109: testV3CurlLeaseRevoke: prefix (/v3beta) endpoint (/kv/lease/revoke): error (read /dev/ptmx: input/output error (expected "etcdserver: requested lease not found", got ["curl: (52) Empty reply from server\r\n"])), wanted etcdserver: requested lease not found
```
The `Empty reply from server` is caused by panic and server recover it
but it doesn't have chance to reply to client.
Signed-off-by: Wei Fu <fuweid89@gmail.com>
When users use the TLS CommonName based authentication, the
authTokenBundle is always nil. But it's possible for the clients
to get `rpctypes.ErrAuthOldRevision` response when the clients
concurrently modify auth data (e.g, addUser, deleteUser etc.).
In this case, there is no need to refresh the token; instead the
clients just need to retry the operations (e.g. Put, Delete etc).
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The client retries connection without backoff when the server is gone
after the watch stream is established. This results in high CPU usage
in the client process. This change introduces backoff when the stream is
failed and unavailable.
Signed-off-by: Hisanobu Tomari <posco.grubb@gmail.com>
If one of the nodes in the cluster has lost a dns record,
restarting the second node will break it.
This PR makes an attempt to add a comparison without using a resolver,
which allows to protect cluster from dns errors and does not break
the current logic of comparing urls in the URLStringsEqual function.
You can read more in the issue #7798Fixes#7798
Signed-off-by: Prasad Chandrasekaran <prasadc@vmware.com>
Due to a duplicate call of clientConfigFromCmd, the move-leader command
would fail with "conflicting environment variable is shadowed by corresponding command-line flag".
Also in scenarios where no command-line flag was supplied.
Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
A WAL object was closed by defer, however the WAL was rewritten afterwards,
so defer closed already closed WAL but not the new one. It caused a data
race between writing file and cleaning up a temporary test directory,
which led to a non-deterministic bug.
Fixes#14332
Signed-off-by: Vladimir Sokolov <vsvastey@gmail.com>
For a cluster with only one member, the raft always send identical
unstable entries and committed entries to etcdserver, and etcd
responds to the client once it finishes (actually partially) the
applying workflow.
When the client receives the response, it doesn't mean etcd has already
successfully saved the data, including BoltDB and WAL, because:
1. etcd commits the boltDB transaction periodically instead of on each request;
2. etcd saves WAL entries in parallel with applying the committed entries.
Accordingly, it may run into a situation of data loss when the etcd crashes
immediately after responding to the client and before the boltDB and WAL
successfully save the data to disk.
Note that this issue can only happen for clusters with only one member.
For clusters with multiple members, it isn't an issue, because etcd will
not commit & apply the data before it being replicated to majority members.
When the client receives the response, it means the data must have been applied.
It further means the data must have been committed.
Note: for clusters with multiple members, the raft will never send identical
unstable entries and committed entries to etcdserver.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
In v3.5 it is assumed that the logger should not be nil, however it is
still a case in v3.4. The PR targeted to v3.5 was backported to 3.4 and
that's why it's possible to get panic on nil logger in 3.4. This commit
fixed this issue.
Fixes#14402
Signed-off-by: Vladimir Sokolov <vsvastey@gmail.com>
The latest vesion v0.1.12 was just released On Jul 27, 2022,
and it is causing issue (see below) on the govet check,
```
govet_shadow' started at Sun Jul 31 23:23:27 PDT 2022
go get: upgraded golang.org/x/net v0.0.0-20211112202133-69e39bad7dc2 => v0.0.0-20220722155237-a158d28d115b
go get: upgraded golang.org/x/sys v0.0.0-20211019181941-9d821ace8654 => v0.0.0-20220722155257-8c9f86f7a55f
go get: upgraded golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135 => v0.1.12
/root/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_metrics.go:7:2: missing go.sum entry for module providing package golang.org/x/net/context (imported by go.etcd.io/etcd/etcdserver/etcdserverpb); to add:
go get go.etcd.io/etcd/etcdserver/etcdserverpb
/root/go/pkg/mod/google.golang.org/grpc@v1.26.0/internal/transport/controlbuf.go:28:2: missing go.sum entry for module providing package golang.org/x/net/http2 (imported by go.etcd.io/etcd/embed); to add:
go get go.etcd.io/etcd/embed
/root/go/pkg/mod/google.golang.org/grpc@v1.26.0/internal/transport/controlbuf.go:29:2: missing go.sum entry for module providing package golang.org/x/net/http2/hpack (imported by github.com/soheilhy/cmux); to add:
go get github.com/soheilhy/cmux@v0.1.4
/root/go/pkg/mod/google.golang.org/grpc@v1.26.0/server.go:36:2: missing go.sum entry for module providing package golang.org/x/net/trace (imported by go.etcd.io/etcd/embed); to add:
go get go.etcd.io/etcd/embed
```
It isn't good to always to use the latest version. Instead, we should
lock down the version, and v0.1.11 was confirmed to be working.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
To avoid inconsistant behavior during cluster upgrade we are feature
gating persistance behind cluster version. This should ensure that
all cluster members are upgraded to v3.6 before changing behavior.
To allow backporting this fix to v3.5 we are also introducing flag
--experimental-enable-lease-checkpoint-persist that will allow for
smooth upgrade in v3.5 clusters with this feature enabled.
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
This is a backport of https://github.com/etcd-io/etcd/pull/13435 and is
part of the work for 3.4.20
https://github.com/etcd-io/etcd/issues/14232.
The original change had a second commit that modifies a changelog file.
The 3.4 branch does not include any changelog file, so that part was not
cherry-picked.
Local Testing:
- `make build`
- `make test`
Both succeed.
Signed-off-by: Ramsés Morales <ramses@gmail.com>
Current checkpointing mechanism is buggy. New checkpoints for any lease
are scheduled only until the first leader change. Added fix for that
and a test that will check it.
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
To improve debuggability of `agreement among raft nodes before
linearized reading`, we added some tracing inside
`linearizableReadLoop`.
This will allow us to know the timing of `s.r.ReadIndex` vs
`s.applyWait.Wait(rs.Index)`.
Signed-off-by: Chao Chen <chaochn@amazon.com>
This change is to ensure that all members returned during the client's
AutoSync are started and are not learners, which are not valid
etcd members to make requests to.
Signed-off-by: Chris Ayoub <cayoub@hubspot.com>
Cherry pick https://github.com/etcd-io/etcd/pull/13932 to 3.4.
When etcdserver receives a LeaseRenew request, it may be still in
progress of processing the LeaseGrantRequest on exact the same
leaseID. Accordingly it may return a TTL=0 to client due to the
leaseID not found error. So the leader should wait for the appliedID
to be available before processing client requests.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Fix the following error in integration pipeline,
```
=== RUN TestTLSReloadCopy
v3_grpc_test.go:1754: tls: failed to find any PEM data in key input
v3_grpc_test.go:1754: tls: private key does not match public key
v3_grpc_test.go:1754: tls: private key does not match public key
v3_grpc_test.go:1754: tls: private key does not match public key
```
Refer to https://github.com/etcd-io/etcd/runs/7123775361?check_suite_focus=true
Signed-off-by: Benjamin Wang <wachao@vmware.com>
We shouldn't fail the grpc-server (completely) by a not implemented RPC.
Failing whole server by remote request is anti-pattern and security
risk.
Refer to https://github.com/etcd-io/etcd/runs/7034342964?check_suite_focus=true#step:5:2284
```
=== RUN TestWatchRequestProgress/1-watcher
panic: not implemented
goroutine 83024 [running]:
go.etcd.io/etcd/proxy/grpcproxy.(*watchProxyStream).recvLoop(0xc009232f00, 0x4a73e1, 0xc00e2406e0)
/home/runner/work/etcd/etcd/proxy/grpcproxy/watch.go:265 +0xbf2
go.etcd.io/etcd/proxy/grpcproxy.(*watchProxy).Watch.func1(0xc0038a3bc0, 0xc009232f00)
/home/runner/work/etcd/etcd/proxy/grpcproxy/watch.go:125 +0x70
created by go.etcd.io/etcd/proxy/grpcproxy.(*watchProxy).Watch
/home/runner/work/etcd/etcd/proxy/grpcproxy/watch.go:123 +0x73b
FAIL go.etcd.io/etcd/clientv3/integration 222.813s
FAIL
```
Signed-off-by: Benjamin Wang <wachao@vmware.com>
The first bug fix is to resolve the race condition between goroutine
and channel on the same leases to be revoked. It's a classic mistake
in using Golang channel + goroutine. Please refer to
https://go.dev/doc/effective_go#channels
The second bug fix is to resolve the issue that etcd lessor may
continue to schedule checkpoint after stepping down the leader role.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
Items resolved:
1. fix the vet error: possible misuse of reflect.SliceHeader;
2. fix the vet error: call to (*T).Fatal from a non-test goroutine;
3. bump package golang.org/x/crypto, net and sys;
4. bump boltdb from 1.3.3 to 1.3.6;
5. remove the vendor directory;
6. remove go 1.12.17 and 1.15.15, add go 1.16.15 into pipeline;
7. bump go version to 1.16 in go.mod;
8. fix the issue: compile: version go1.16.15 does not match go tool version go1.17.11,
refer to https://github.com/actions/setup-go/issues/107;
9. fix data race on compactMainRev and watcherGauge;
10. fix test failure for TestLeasingTxnOwnerGet in cluster_proxy mode.
Signed-off-by: Benjamin Wang <wachao@vmware.com>
- retry_interceptor_test causes:
clientv3/naming/grpc.go:25:2: module google.golang.org/grpc@latest found (v1.46.0),
but does not contain package google.golang.org/grpc/naming
https://github.com/etcd-io/etcd/issues/12124
To fix a bug in the retry logic caused when the auth token is cleared after receiving `ErrInvalidAuthToken` from the server and the subsequent call to `getToken` also fails due to some reason (eg. context deadline exceeded).
This leaves the client without a token and the retry will continue to fail with `ErrUserEmpty` unless the token is refreshed.
Problem: Defrag was implemented before custom bolt options were added.
Currently defrag doesn't restore backend options.
For example BackendFreelistType will be unset after defrag.
Solution: save bolt db options and use them in defrag.
Update go to 1.15.15 which is the latest of 1.15 because linux-amd64-fmt fails with go 1.15.13.
Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
Ensure the client which access etcd via grpc-gateway won't
be limited by the MaxCallRecvMsgSize. Here we choose the same
default value of etcdcli as grpc-gateway's MaxCallRecvMsgSize.
Fix https://github.com/etcd-io/etcd/issues/12576
grpc proxy opens additional 2 watching channels. The metric is shared
between etcd-server & grpc_proxy, so all assertions on number of open
watch channels need to take in consideration the additional "2"
channels.
Cherry-pick of 60e44286fa from master branch does not work due to
missing `tls.CipherSuites()` function. We work around by using go build
tags for both the building and tests.
In go1.13, the TLS13 is enablled by default, and as per go1.13 release notes :
TLS 1.3 cipher suites are not configurable. All supported cipher suites are safe,
and if PreferServerCipherSuites is set in Config the preference order is based
on the available hardware.
Fixing the test case for go1.13 by limiting the TLS version to TLS12
There are situations where we don't wish to fsync but we do want to
write the data.
Typically this occurs in clusters where fsync latency (often the
result of firmware) transiently spikes. For Kubernetes clusters this
causes (many) elections which have knock-on effects such that the API
server will transiently fail causing other components fail in turn.
By writing the data (buffered and asynchronously flushed, so in most
situations the write is fast) and avoiding the fsync we no longer
trigger this situation and opportunistically write out the data.
Anecdotally:
Because the fsync is missing there is the argument that certain
types of failure events will cause data corruption or loss, in
testing this wasn't seen. If this was to occur the expectation is
the member can be readded to a cluster or worst-case restored from a
robust persisted snapshot.
The etcd members are deployed across isolated racks with different
power feeds. An instantaneous failure of all of them simultaneously
is unlikely.
Testing was usually of the form:
* create (Kubernetes) etcd write-churn by creating replicasets of
some 1000s of pods
* break/fail the leader
Failure testing included:
* hard node power-off events
* disk removal
* orderly reboots/shutdown
In all cases when the node recovered it was able to rejoin the
cluster and synchronize.
This fixes etcd being unable to send any message longer than 64 KB as
a notification over the websocket. This was because the older version
of grpc-websocket-proxy was used and WithMaxRespBodyBufferSize option
wasn't set.
To fix a panic that happens when trying to get ids of etcd members in
force new cluster mode, the issue happen if the cluster previously had
etcd learner nodes added to the cluster
Fixes#12285
Similar counts are exposed via Prometheus.
This adds the one that are perceived by etcd server.
e.g.
os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
os.MkdirAll creates directory before umask so make sure that a desired
permission is set after creating a directory with MkdirAll. Use the
existing TouchDirAll function which checks for permission if dir is already
exist and when create a new dir.
Currently, watch cancel requests are only sent to the server after a
message comes through on a watch where the client has cancelled. This
means that cancelled watches that don't receive any new messages are
never cancelled; they persist for the lifetime of the client stream.
This has negative connotations for locking applications where a watch
may observe a key which might never change again after cancellation,
leading to many accumulating watches on the server.
By cancelling proactively, in most cases we simply move the cancel
request to happen earlier, and additionally we solve the case where the
cancel request would never be sent.
Fixes#9416
Heavy inspiration drawn from the solutions proposed there.
When an etcd instance attempts to perform service discovery, if a
cluster size with negative value is provided, the etcd instance
will panic without recovery because of
This makes it possible to run an etcd node for testing and development
without placing lots of load on the file system.
Fixes#11930.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
* clientv3: fix grpc-go(v1.27.0) incompatible changes to balancer/resolver.
* vendor: upgrade gRPC Go to v1.24.0
Picking up some performance improvements and bug fixes.
https://github.com/grpc/grpc-go/releases/tag/v1.24.0
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
* vendor: update gRPC Go to v1.26.0 (#11522)
* GO111MODULE=on go mod vendor
* GO111MODULE=on go mod vendor go 1.14
Bump travis 2
Co-authored-by: EDDYCJY <313687982@qq.com>
Co-authored-by: Gyuho Lee <leegyuho@amazon.com>
Co-authored-by: Yuchen Zhou <yczhou@google.com>
When promoting a learner member we should not count already a voting
member, but take only into account the number of existing voting
members and their current status (started, unstarted) when taking the
decision whether a learner member can be promoted.
Before this change, it was impossible to grow from a quorum N to a N+1
through promoting a learning member.
Fixes: #11633
In case of URLs that are synonyms, the current lexicographic sorting
and compare of the URLs fails with frustrating errors. Make sure to do
a full comparison between every set of PeerURLs before failing.
Fixes#11013
In the Background section, the document describes various challenges for cluster membership change.
Added section header for each case described for better readability.
When a leader removes itself, it will retain its leadership but not
accept new proposals, making the range effectively stuck until manual
intervention triggers a campaign event.
This commit documents the behavior. It does not correct it yet.
Verifiy the behavior in various v1 and v2 conf change operations.
This also includes various fixups, notably it adds protection
against transitioning in and out of new configs when this is not
permissible.
There are more threads to pull, but those are left for future commits.
When the leader applied a new configuration that added voters, it would
not immediately probe these voters, delaying when they would be caught
up.
I noticed this while writing an interaction-driven test, which has now
been cleaned up and completed.
It is a helper case to attach a debugger to when a problem needs
to be investigated in a longer test file. In such a case, add the
following stanza immediately before the interesting behavior starts:
_breakpoint:
----
ok
and set a breakpoint on the _breakpoint case.
Initializing at LastIndex+1 meant that new peers would not be probed
immediately when they appeared in the leader's config, which delays
their getting caught up.
It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.
This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.
In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).
The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in
https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263
which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".
Additionally, this framework is intentionally not kept internal to the
raft package.. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
While writing interaction tests for joint configuration changes, I
realized that this wasn't working yet - restoring had no notion of
the joint configuration and was simply dropping it on the floor.
This commit introduces a helper `confchange.Restore` which takes
a `ConfState` and initializes a `Tracker` from it.
This is then used both in `(*raft).restore` as well as in `newRaft`.
This is helpful for upcoming testing work which allows datadriven
testing of the interaction of multiple nodes. This testing requires
determinism to work correctly.
Useful for deciding when to terminate the unhealthy follower.
If the follower is receiving a leader snapshot, operator may wait.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
add 1-size buffer for `errc` to avoid deadlock of child goroutine
add a local variable to a void data race in `err`
when `case <-stream.Context().Done():` is taken
It turns out that that learners must be allowed to cast votes.
This seems counter- intuitive but is necessary in the situation in which
a learner has been promoted (i.e. is now a voter) but has not learned
about this yet.
For example, consider a group in which id=1 is a learner and id=2 and
id=3 are voters. A configuration change promoting 1 can be committed on
the quorum `{2,3}` without the config change being appended to the
learner's log. If the leader (say 2) fails, there are de facto two
voters remaining. Only 3 can win an election (due to its log containing
all committed entries), but to do so it will need 1 to vote. But 1
considers itself a learner and will continue to do so until 3 has
stepped up as leader, replicates the conf change to 1, and 1 applies it.
Ultimately, by receiving a request to vote, the learner realizes that
the candidate believes it to be a voter, and that it should act
accordingly. The candidate's config may be stale, too; but in that case
it won't win the election, at least in the absence of the bug discussed
in:
https://github.com/etcd-io/etcd/issues/7625#issuecomment-488798263.
An etcd member being down is an important failure state - while
normal admin operations may cause transient outages to rotate,
when any member is down the cluster is operating in a degraded
fashion. Add an alert that records when any members are down
so that administrators know whether the next failure is fatal.
The rule is more complicated than `up{...} == 0` because not all
failure modes for etcd may have an `up{...}` entry for each member.
For instance, a Kubernetes service in front of an etcd cluster
might only have 2 endpoints recorded in `up` because the third
pod is evicted by the kubelet - the cluster is degraded but
`count(up{...})` would not return the full quorum size. Instead,
use network peer send failures as a failure detector and attempt
to return the max of down services or failing peers. We may
undercount the number of total failures, but we will at least
alert that a member is down.
Make log level configurable, and deprecate "debug" flag in v3.5.
And adds more warnings on flags that's being deprecated in v3.5.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Overwrite authority when it's IP.
When user dials with "grpc.WithDialer", "grpc.DialContext" "cc.parsedTarget"
update only happens once. This is problematic, because when TLS is enabled,
retries happen through "grpc.WithDialer" with static "cc.parsedTarget" from
the initial dial call.
If the server authenticates by IP addresses, we want to set a new endpoint as
a new authority. Otherwise
"transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.168.121.180, not 192.168.223.156"
when the new dial target is "192.168.121.180" whose certificate host name is also "192.168.121.180"
but client tries to authenticate with previously set "cc.parsedTarget" field "192.168.223.156"
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
This fixes an index out-of-bounds panic caused when using the embed
package and the zap logger. When a TLS handshake error is logged, the
slice for cert ip addresses is allocated with capacity but no length, so
subsequent index access causes the panic, and doesn't surface the TLS
handshake error to the user.
Fixes#10932
I changed `(*RawNode).Ready`'s behavior in #10892 in a problematic way.
Previously, `Ready()` would create and immediately "accept" a Ready
(i.e. commit the app to actually handling it). In #10892, Ready() became
a pure read-only operation and the "accepting" was moved to
`Advance(rd)`. As a result it was illegal to use the RawNode in certain
ways while the Ready was being handled. Failure to do so would result in
dropped messages (and perhaps worse). For example, with the following
operations
1. `rd := rawNode.Ready()`
2. `rawNode.Step(someMsg)`
3. `rawNode.Advance(rd)`
`someMsg` would be dropped, because `Advance()` would clear out the
outgoing messages thinking that they had all been handled by the client.
I mistakenly assumed that this restriction had existed prior, but this
is incorrect.
I noticed this while trying to pick up the above PR in CockroachDB,
where it caused unit test failures, precisely due to the above example.
This PR reestablishes the previous behavior (result of `Ready()` must
be handled by the app) and adds a regression test.
While I was there, I carried out a few small clarifying refactors.
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).
pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.
ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
It has a data race between the test's call to `reduceUncommittedSize`
and a corresponding call during Ready handling in `(*node).run()`.
The corresponding RawNode test still verifies the functionality, so
instead of fixing the test we can remove it.
We are worried about breaking backwards compatibility for any
application out there that may have relied on the old behavior. Their
RawNode invocation would have been broken by the removal of the peers
argument so it would not have changed silently; an associated comment
tells callers how to fix it.
This is the first (maybe not last) step in cleaning up the bootstrap
code around StartNode.
Initializing a Raft group for the first time is awkward, since a
configuration has to be pulled from thin air. The way this is solved
today is unclean: The app is supposed to pass peers to StartNode(),
we add configuration changes for them to the log, immediately pretend
that they are applied, but actually leave them unapplied (to give the
app a chance to observe them, though if the app did decide to not apply
them things would really go off the rails), and then return control to
the app. The app will then process the initial Readys and as a result
the configuration will be persisted to disk; restarts of the node then
use RestartNode which doesn't take any peers.
The code that did this lived awkwardly in two places fairly deep down
the callstack, though it was really only necessary in StartNode(). This
commit refactors things to make this more obvious: only StartNode does
this dance now. In particular, RawNode does not support this at all any
more; it expects the app to set up its Storage correctly.
Future work may provide helpers to make this "preseeding" of the Storage
more user-friendly. It isn't entirely straightforward to do so since
the Storage interface doesn't provide the right accessors for this
purpose. Briefly speaking, we want to make sure that a non-bootstrapped
node can never catch up via the log so that we can implicitly use one
of the "skipped" log entries to represent the configuration change into
the bootstrap configuration. This is an invasive change that affects
all consumers of raft, and it is of lower urgency since the code (post
this commit) already encapsulates the complexity sufficiently.
It has always bugged me that any new feature essentially needed to be
tested twice due to the two ways in which apps can use raft (`*node` and
`*RawNode`). Due to upcoming testing work for joint consensus, now is a
good time to rectify this somewhat.
This commit removes most logic from `(*node).run` and uses `*RawNode`
internally. This simplifies the logic and also lead (via debugging) to
some insight on how the semantics of the approaches differ, which is now
documented in the comments.
Now that a Config is also added to the full status, the old name
did not convey the intention, which was to get a Status without
an associated allocation.
Recent refactoring to the String() method of `Progress` hit an NPE
because we return nil Inflights as part of the Raft status. Just
fix this at the source and properly populate the Raft status instead
of teaching String() to ignore nil. A real Progress always has a
non-nil Inflights.
This commit introduces machinery to safely apply joint consensus
configuration changes to Raft.
The main contribution is the new package, `confchange`, which offers
the primitives `Simple`, `EnterJoint`, and `LeaveJoint`.
The first two take a list of configuration changes. `Simple` only
declares success if these configuration changes (applied atomically)
change the set of voters by at most one (i.e. it's fine to add or
remove any number of learners, but change only one voter). `EnterJoint`
makes the configuration joint and then applies the changes to it, in
preparation of the caller returning later and transitioning out of the
joint config into the final desired configuration via `LeaveJoint()`.
This commit streamlines the conversion between voters and learners, which
is now generally allowed whenever the above conditions are upheld (i.e.
it's not possible to demote a voter and add a new voter in the context
of a Simple configuration change, but it is possible via EnterJoint).
Previously, we had the artificial restriction that a voter could not be
demoted to a learner, but had to be removed first.
Even though demoting a learner is generally less useful than promoting
a learner (the latter is used to catch up future voters), demotions
could see use in improved handling of temporary node unavailability,
where it is desired to remove voting power from a down node, but to
preserve its data should it return.
An additional change that was made in this commit is to prevent the use
of empty commit quorums, which was previously possible but for no good
reason; this:
Closes#10884.
The work left to do in a future PR is to actually expose joint
configurations to the applications using Raft. This will entail mostly
API design and the addition of suitable testing, which to be carried
out ergonomically is likely to motivate a larger refactor.
Touches #7625.
I had been dealing with these intermittent failures for a while and
finally figured out why. The real solution is making genproto.sh less
ugly but that won't happen for a while.
In order to cover message can well be received when a node is paused, this commit sends message async using goroutine and random sleep. This change makes recvms is possible to cache message during node.pause is triggered.
This run *should* certainly pass, but it's consistently the one that
fails with a regularity that essentially blocks the CI pipeline.
Someone needs to take a look at #10700, but in the meantime, the show
must go on.
Etcd currently supports validating peers based on their TLS certificate's
CN field. The current best practice for creation and validation of TLS
certs is to use the Subject Alternative Name (SAN) fields instead, so that
a certificate might be issued with a unique CN and its logical
identities in the SANs.
This commit extends the peer validation logic to use Go's
`(*"crypto/x509".Certificate).ValidateHostname` function for name
validation, which allows SANs to be used for peer access control.
In addition, it allows name validation to be enabled on clients as well.
This is used when running Etcd behind an authenticating proxy, or as
an internal component in a larger system (like a Kubernetes master).
Make it less verbose by omitting the values for the steady state.
Also rearrange the order so that information that is typically more
relevant is printed first.
At the time of writing, we don't allow configuration changes to change
voters to learners directly, but note that a snapshot may compress
multiple changes to the configuration into one: the voter could have
been removed, then readded as a learner and the snapshot reflects both
changes. In that case, a voter receives a snapshot telling it that it is
now a learner. In fact, the node has to accept that snapshot, or it is
permanently cut off from the Raft log.
I think this just wasn't realized in the original work, but this is just
my guess since there generally is very little rationale on the various
decisions made. I also generally haven't been able to figure out whether
the decision to prevent voters from becoming learners without first
having been removed was motivated by some particular concern, or if it
just wasn't deemed necessary. I suspect it is the latter because
demoting a voter seems perfectly safe.
See https://github.com/etcd-io/etcd/pull/8751#issuecomment-342028091.
This is helpful to quickly print the configuration log messages without
having to specify Voters and Learners separately.
It will also come in handy for joint quorums because it allows holding
on to voters and learners as a unit, which is useful for unit testing.
Put all the logic related to applying a configuration change in one
place in preparation for adding joint consensus.
This inspired various TODOs.
I had to rewrite TestSnapshotSucceedViaAppResp since it was relying
on a snapshot applied to the leader, which is now prevented.
Description: w.mu is locked at line 385 and unlocked at line 396. Among 5 return statements in this function, 4 are below line 396 but there is 1 return at line 387.
Fix: Add w.mu.Unlock() before that return at line 387.
Mechanically extract `progressTracker`, `Progress`, and `inflights`
to their own package named `tracker`. Add lots of comments in the
progress, and take the opportunity to rename and clarify various
fields.
Every other test build times out due to the 20 minute test timeout. I
doesn't seem like tests are actually hanging, it's more that 20 minutes
just isn't enough to run the tests any more.
To ease a future transition into joint quorums, this commit removes the
previous "ad-hoc" majority-based quorum and vote computations with that
introduced in the `raft/quorum` package.
More specifically, the progressTracker now uses a quorum.JointConfig for
which the "second" majority quorum is always empty; in this case the
quorum behaves like the one quorum.MajorityConfig that is actually
present. Or, more briefly, this change is a no-op, but it will take the
busywork out of actually starting to make use of joint quorums in the
future.
On a side node, I suspect that this might've fixed a bug regarding the
read index though I haven't been able to explicitly come up with a
counter-example. The problem was that the acks collected for the read
index weren't taking into account membership changes, so they'd run the
danger of using acks from nodes since removed to claim that a quorum of
acks had been received. There's a chance that there isn't a
counter-example (the only guarantee extracted from the "quorum" is that
there isn't another leader, but even if there's another leader all that
matters is that that leader doesn't have a divergent history from the
stale leader in the hypothetical counter-example), but either way there
is morally a bug here that is now fixed because VoteCommitted doesn't
care about votes from members that are not voters known to the currently
active configuration.
Instead of having disjoint mappings of ID to *Progress for voters and
learners, use a map[id]struct{} for each and share a map of *Progress
among them.
This is easier to handle when joint quorums are introduced, at which
point a node may be a voting member of two quorums.
The quorum package contains logic to reason about committed indexes as
well as vote outcomes for both majority and joint quorums. The package
is oblivious to the existence of learner replicas.
The plan is to hook this up to etcd/raft in subsequent commits.
We were already taking some precautions against learners campaigning,
but there was no safeguard against an explicit call to `Campaign()`.
The newly added test also verifies that leadership transfers to
learners are ignored.
This will keep them from diverging to much. In fact we should remove
some of the obvious differences that have crept in over time so that
what remains is structural. This isn't done in this commit since
it amounts to a change in the public API; we should lump this in
when we break the public API the next time.
Update to Go 1.12.5 testing. Remove deprecated unused and gosimple
pacakges, and mask staticcheck 1006. Also, fix unconvert errors related
to unnecessary type conversions and following staticcheck errors:
- remove redundant return statements
- use for range instead of for select
- use time.Since instead of time.Now().Sub
- omit comparison to bool constant
- replace T.Fatal and T.Fatalf in tests with T.Error and T.Fatalf respectively because the goroutine calls T.Fatal must be called in the same goroutine as the test
- fix error strings that should not be capitalized
- use sort.Strings(...) instead of sort.Sort(sort.StringSlice(...))
- use he status code of Canceled instead of grpc.ErrClientConnClosing which is deprecated
- use use status.Errorf instead of grpc.Errorf which is deprecated
Related #10528#10438
related: https://github.com/etcd-io/etcd/pull/10668
Provide a ref to the issue and PR management doc from the README,
similar to other references we have provided in the README.
This commit adds a feature for creating a user without password. The
purpose of the feature is reducing attack surface by configuring bad
passwords (CN based auth will be allowed for the user).
The feature can be used with `--no-password` of `etcdctl user add`
command.
Fix https://github.com/coreos/etcd/issues/9590
Update TestMemberPromote to include both learner not-ready and learner
ready test cases.
Removed unit test TestPromoteMember, it requires underlying raft node to
be started and running. The member promote is covered by the integration
test.
Be more explicit in document and command line usage message that if a
config file is provided, other command line flags and environment
variables will be ignored.
Improve readability of ETCD_CONFIG_FILE env variable parsing part
by adding comments and using flags.FlagToEnv function.
Signed-off-by: Andrey Abramov <st5pub@yandex.ru>
This doesn't completely eliminate access to prs.nodes, but that's not
really necessary. This commit uses the existing APIs in a few more
places where it's convenient, and also sprinkles some assertions.
The Progress maps contain both the active configuration and information
about the replication status. By pulling it into its own component, this
becomes easier to unit test and also clarifies the code, which will see
changes as etcd-io/etcd#7625 is addressed.
More functionality will move into `prs` in self-contained follow-up commits.
I no longer have the time to maintain etcd as my career takes me to a different direction; Hence, I think it is appropriate to remove myself from the maintainer responsibility.
Working on etcd project was one of the most challenging and rewarding experiences I ever had. Thanks @xiang90 @gyuho @heyitsanthony @philips
We may not want to suggest to contact CoreOS now. We could remove this
section but consiering the nature of the subject, discussion with the project
maintainers probably a good idea if someone doesn't find it comfortable
to report an issue right away.
Adding integration test TestTransferLeadershipWithLearner, which ensures
that TransferLeadership does not timeout due to learner is automatically
picked by leader as transferee.
1. Maintenance API MoveLeader() returns ErrBadLeaderTransferee if
transferee does not exist or is raft learner.
2. etcdserver TransferLeadership() only choose voting member as
transferee.
Added IsLearner field to etcdserver internal Member type. Routed
learner MemberAdd request from server API to raft. Apply learner
MemberAdd result to server after the request is passed through Raft.
- Added isLearner flag to MemberAddRequest in Cluster API.
- Added isLearner field to StatusResponse in Maintenance API.
- Added MemberPromote rpc to Cluster API.
I would like to propose a formal guide for issue triage and PR management.
This should help us keep open issues and PRs under a desirable numbers.
For example, keep issues under 100. These guidelines should specially help
manage and close issues and PRs that are inactive in a timely manner.
To remove the dependency on ghodss/yaml. Replaced this dependency with sigs.k8s.io/yaml.
This wil help to remove the ghodss/yaml dependency from main kubernetes repository.
xref: https://github.com/kubernetes/kubernetes/issues/77024
The documentation mentions fio as a tool to benchmark disks to assess
whether they are fast enough for etcd. But doing that is far from trivial,
because fio is very flexible and complex to use, and the user must make sure
that the workload fio generates mirrors the I/O workload of its etcd cluster
closely enough. This commit adds links to a blog post with an example of how
to do that.
Appending to an empty slice twice could (and often did) result in
multiple allocations. This was wasteful. We can avoid this by performing
a single allocation with the correct size and copying into it.
`raftpb.Entry.String` takes a pointer receiver, so calling it
on a loop variable was causing the variable to escape. Removing
the `.String()` call was enough to avoid the allocation, but
this also avoids a memory copy and prevents similar bugs.
This was responsible for 11.63% of total allocations in an
experiment with https://github.com/nvanbenschoten/raft-toy.
By boxing a heap-allocated slice header instead of the slice
header on the stack, we can avoid an allocation when passing
through the sort.Interface interface.
This was responsible for 26.61% of total allocations in an
experiment with https://github.com/nvanbenschoten/raft-toy.
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy were not returning metrics of the etcd members servers but of the proxy itself.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Add a bunch of tests on fundamental operation of Create/Set/Delete.
Context: I want to use this package to run the discovery service against
the v3 storage backend. The limited functionality already implemented
should be sufficient to do this.
https://github.com/coreos/discovery.etcd.io/issues/52
Leader should check message sender after receiving MsgReadIndex, even
when raft quorum is 1. The message could be sent from learner node, and
leader should respond.
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.
This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().
This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.
Fixes: https://github.com/etcd-io/etcd/issues/10588
Add documentation to clarify that when writing TXN commands, multi-line values should be written using "\n" and not a literal newline (as in other commands).
Fixes#10169
Closing of watch by client will cancel the watch grpc stream and
can produce a context canceled error. However, since client
simply wanted to close the watcher the error can create confusion
that something went wrong instead of a successful close. Ensure
that Close do not return error.
Fixed#10340
Currently, when we access the URL https://www.grpc.io, it is redirected to https://www.grpc.io automatically.
So this commit aims to replace **HTTP** to **HTTPs** for security.
Currently alert messages state that we detect issue
within the last 1 hour, although we check
for last 15min and wait for 15min for this alert to keep firing.
This fix changes the message to be 30minutes.
This commit lets grpc gateway return a correct error to clients.
Even if a client has a cert with non empty CN, current gateway returns
an error like below:
```
$ curl --cacert ./integration/fixtures/ca.crt --cert ./integration/fixtures/server.crt --key ./integration/fixtures/server.key.insecure https://localhost:2379/v3/kv/put -X POST -d '{"key": "fromcurl", "value": "test"}'
{"error":"etcdserver: user name is empty","code":3}
```
This is because etcd ignores CN from gateway connection.
The error will be like this:
```
$ curl --cacert ./integration/fixtures/ca.crt --cert ./integration/fixtures/server.crt --key ./integration/fixtures/server.key.insecure https://localhost:2379/v3/kv/put -X POST -d '{"key": "fromcurl", "value": "test"}'
CommonName of client sending a request against gateway will be ignored and not used as expected
```
The error will be returned if the server is enabling auth and gRPC
gateway.
Don't panic if command is given in interactive mode, give a nice error
message instead.
Before:
$ ./bin/etcdctl watch -i
<hit return>
panic: runtime error: index out of range
goroutine 1 [running]:
etcdctl/ctlv3/command.watchInteractiveFunc(...)
etcd/etcdctl/ctlv3/command/watch_command.go:104 ...
After:
$ ./bin/etcdctl watch -i
<hit return>
Invalid command: (watch and progress supported)
foo
Invalid command foo (only support watch)
The documentation recommends using `etcdctl` to verify that etcd has
been built correctly, but the `go get` command provided does not install
it. This changes it so that it does, and also clarifies terminology
('using `go get`', referring to the command used to install it, rather
than 'using `$GOPATH`').
When a follower requires a snapshot and the snapshot is sent at the
committed (and last) index in an otherwise idle Raft group, the follower
would previously remain in ProgressStateProbe even though it had been
caught up completely.
In a busy Raft group this wouldn't be an issue since the next round of
MsgApp would update the state, but in an idle group there's nothing
that rectifies the status (since there's nothing to append or update).
The reason this matters is that the state is exposed through
`RaftStatus()`. Concretely, in CockroachDB, we use the Raft status to
make sharding decisions (since it's dangerous to make rapid changes
to a fragile Raft group), and had to work around this problem[1].
[1]: 91b11dae41/pkg/storage/split_delay_helper.go (L138-L141)
The leader perpetually kept itself in ProgressStateProbe even though of
course it has perfect knowledge of its log. This wasn't usually an issue
because it also doesn't care about its own Progress, but it's better to
keep this data correctly maintained, especially since this is part of
raft.Status and thus becomes visible to applications using the Raft
library.
(Concretely, in CockroachDB we use the Progress to inform log
truncations).
Prior to this change, MaxSizePerMsg was used both to cap the total byte size of
entries in messages as well as the total byte size of entries passed through
CommittedEntries in the Ready struct. This change adds a new Config parameter
MaxCommittedSizePerReady which defaults to MaxSizePerMsg and contols the second
of above descibed settings.
Once chk(ai) fails with auth.ErrAuthOldRevision it will always do,
regardless how many times you retry. So the error is better be returned
to fail the pending request and make the client re-authenticate.
etcd currently generates a few UUID style identifiers using approaches like `fmt.Sprintf("client-%s", strconv.FormatInt(time.Now().UnixNano(), 36))`.
But these can collide on machine architectures with larger timestamp steps (see https://github.com/etcd-io/etcd/issues/10035).
The previous code was using the proto-generated `Size()` method to
track the size of an incoming proposal at the leader. This includes
the Index and Term, which were mutated after the call to `Size()`
when appending to the log. Additionally, it was not taking into
account that an ignored configuration change would ignore the
original proposal and append an empty entry instead.
As a result, a fully committed Raft group could end up with a non-
zero tracked uncommitted Raft log counter that would eventually hit
the ceiling and drop all future proposals indiscriminately. It would
also immediately imply that proposals exceeding the threshold alone
would get refused (as the "first uncommitted proposal" gets special
treatment and is always allowed in).
Track only the size of the payload actually appended to the Raft log
instead.
For context, see:
https://github.com/cockroachdb/cockroach/issues/31618#issuecomment-431374938
Currently the EtcdInsufficientMembers alert fires, when more than (X/2)-1
instances are unavailable. This fixes it to fire at the correct limit of (X-1)/2
unavailable instances and $value now contains the number of available instances
instead of unavailable ones. Added unit test for EtcdInsufficientMembers alert.
The suggested pattern for Raft proposals is that they be retried
periodically until they succeed. This turns out to be an issue
when a leader cannot commit entries because the leader will continue
to append re-proposed entries to its log without committing anything.
This can result in the uncommitted tail of a leader's log growing
without bound until it is able to commit entries.
This change add a safeguard to protect against this case where a
leader's log can grow without bound during loss of quorum scenarios.
It does so by introducing a new, optional ``MaxUncommittedEntriesSize
configuration. This config limits the max aggregate size of uncommitted
entries that may be appended to a leader's log. Once this limit
is exceeded, proposals will begin to return ErrProposalDropped
errors.
See cockroachdb/cockroach#27772
This PR adds another probing routine to monitor the connection
for Raft message transports. Previously, we only monitored
snapshot transports.
In our production cluster, we found one TCP connection had >8-sec
latencies to a remote peer, but "etcd_network_peer_round_trip_time_seconds"
metrics shows <1-sec latency distribution, which means etcd server
was not sampling enough while such latency spikes happen
outside of snapshot pipeline connection.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Convenient invariant:
- if werr == nil then lock is supposed to be locked at the moment.
While we could not be confident in stronger invariant ('is exactly locked'),
it were inconvenient that previous code could return `werr == nil` after
Mutex.Unlock.
It could happen when ctx is canceled/timeouted exactly after waitDeletes
successfully returned werr == nil and before `<-ctx.Done()` checked.
While such situation is very rare, it is still possible.
fixes#10111
We should allow etcd client use all of the available keepalive
client parameters as documented in this link,
https://godoc.org/google.golang.org/grpc/keepalive#ClientParameters
Currently in the etcd, by default PermitWithoutStream is set to
false, and user has no way to override it.
On the server side, we explicitely setting EnforcementPolicy
PermitWithoutStream to false and don't provide option to override it
to user but on the client side we should allow this option as
provided by the grpc.
When building tools on Win,it shows `.\main.go:68:12: assignment mismatch: 2 variables but 1 values`.The reason is the return variables not match the calling from `main.go` and i try to fix it.
Maintain existing error message for not-enough-args
Add "too many args" if too many args
Add more helpful error message if v2 syntax was used
New output:
```
sauer@host:~/dev/etcd$ ./bin/etcdctl --endpoints http://localhost:5001 member add
Error: member name not provided.
sauer@host:~/dev/etcd$ ./bin/etcdctl --endpoints http://localhost:5001 member add node2 node2
Error: too many arguments
sauer@host:~/dev/etcd$ ./bin/etcdctl --endpoints http://localhost:5001 member add node2 http://localhost:6002
Error: too many arguments, did you mean "--peer-urls http://localhost:6002"
sauer@host:~/dev/etcd$ ./bin/etcdctl --endpoints http://localhost:5001 member add http://localhost:6002 node2
Error: too many arguments, did you mean "--peer-urls http://localhost:6002"
```
The previous logic was erroneously setting Ready.MustSync to true when
the hard state had not changed because we were comparing an empty hard
state to the previous hard state. In combination with another misfeature
in CockroachDB (unnecessary writing of empty batches), this was causing
a steady stream of synchronous writes to disk.
In #9982, a mechanism to limit the size of `CommittedEntries` was
introduced. The way this mechanism worked was that it would load
applicable entries (passing the max size hint) and would emit a
`HardState` whose commit index was truncated to match the limitation
applied to the entries. Unfortunately, this was subtly incorrect
when the user-provided `Entries` implementation didn't exactly
match what Raft uses internally. Depending on whether a `Node` or
a `RawNode` was used, this would either lead to regressing the
HardState's commit index or outright forgetting to apply entries,
respectively.
Asking implementers to precisely match the Raft size limitation
semantics was considered but looks like a bad idea as it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
paradigm (that the previous code tried to work around) that the
client will always apply all the way to the commit index, which
isn't true when commit entries are paginated.
See [1] for more on the discovery of this bug (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit hits).
[1]: https://github.com/cockroachdb/cockroach/issues/28918#issuecomment-418174448
Go 1.11 now marks len(channel) over being-closed channel
as racey operation, fix tests by receiving from channel first
and then check the length of channel.
```
WARNING: DATA RACE
Write at 0x00c000e872c0 by goroutine 198:
runtime.closechan()
/usr/local/go/src/runtime/chan.go:327 +0x0
go.etcd.io/etcd/clientv3.(*lessor).closeRequireLeader()
/Users/leegyuho/go/src/go.etcd.io/etcd/clientv3/lease.go:379 +0x748
go.etcd.io/etcd/clientv3.(*lessor).recvKeepAliveLoop()
/Users/leegyuho/go/src/go.etcd.io/etcd/clientv3/lease.go:455 +0x3a5
Previous read at 0x00c000e872c0 by goroutine 27:
go.etcd.io/etcd/clientv3/integration.TestLeaseWithRequireLeader()
/Users/leegyuho/go/src/go.etcd.io/etcd/clientv3/integration/lease_test.go:828 +0x810
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
```
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Several RPC references in README with memberlist only hyperlinked,
and since it's linked with line numbers it's not pointing to a
desired ref now. Remove the link to make it consistent and correct.
"read index" doesn't tell much about the root cause.
Most likely, the local follower node is having slow
network, thus timing out waiting to receive read
index response from leader.
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
This change allows Watch users to retrieve the cancel reason when a
watch is canceled by the server. Additionally WatchResponses with
closeErr set now have Canceled set true which is in line with the
documentation for the Canceled field.
Now returns errors from checkPermissionForWatch() via the CancelReason
field. This allows clients to understand why the watch was canceled.
Additionally, this change protects a watch from starting and that
otherwise might hang indefinitely.
The error message is different to the condition it checks for. This PR
modifies the error message accordingly.
Closes#9717.
Signed-off-by: Kostas Christidis <kostas@christidis.io>
2018-07-26 11:14:15 -04:00
2100 changed files with 42563 additions and 509257 deletions
See [code changes](https://github.com/coreos/etcd/compare/v3.0.15...v3.0.16) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.4*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.14...v3.0.15) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Fixed
- Fix cancel watch request with wrong range end.
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.13...v3.0.14) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Added
- v3 `etcdctl migrate` command now supports `--no-ttl` flag to discard keys on transform.
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.12...v3.0.13) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.11...v3.0.12) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.10...v3.0.11) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Added
- Server returns previous key-value (optional)
-`clientv3.WithPrevKV` option
- v3 etcdctl `put,watch,del --prev-kv` flag
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.9...v3.0.10) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.8...v3.0.9) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Added
- Warn on domain names on listen URLs (v3.2 will reject domain names).
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.7...v3.0.8) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Other
- Allow only IP addresses in listen URLs (domain names are rejected).
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.6...v3.0.7) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Other
- SRV records only allow A records (RFC 2052).
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.5...v3.0.6) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.4...v3.0.5) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Other
- SRV records (e.g., infra1.example.com) must match the discovery domain (i.e., example.com) if no custom certificate authority is given.
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.3...v3.0.4) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Added
- v2 `etcdctl ls` command now supports `--output=json`.
- Add /var/lib/etcd directory to etcd official Docker image.
### Other
- v2 auth can now use common name from TLS certificate when `--client-cert-auth` is enabled.
### Go
- Compile with [*Go 1.6.3*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.2...v3.0.3) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Other
- Revert Dockerfile to use `CMD`, instead of `ENTRYPOINT`, to support `etcdctl` run.
- Docker commands for v3.0.2 won't work without specifying executable binary paths.
- v3 etcdctl default endpoints are now `127.0.0.1:2379`.
### Go
- Compile with [*Go 1.6.2*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.1...v3.0.2) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Other
- Dockerfile uses `ENTRYPOINT`, instead of `CMD`, to run etcd without binary path specified.
### Go
- Compile with [*Go 1.6.2*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.0...v3.0.1) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.2*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v2.3.0...v3.0.0) and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_0.md).**
### Go
- Compile with [*Go 1.6.2*](https://golang.org/doc/devel/release.html#go1.6).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.18...v3.1.19) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Improved
- Improve [Raft Read Index timeout warning messages](https://github.com/coreos/etcd/pull/9897).
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-1) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
### client v3
- Fix [lease keepalive interval updates when response queue is full](https://github.com/coreos/etcd/pull/9952).
- If `<-chan *clientv3LeaseKeepAliveResponse` from `clientv3.Lease.KeepAlive` was never consumed or channel is full, client was [sending keepalive request every 500ms](https://github.com/coreos/etcd/issues/9911) instead of expected rate of every "TTL / 3" duration.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.17...v3.1.18) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-1) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
See [code changes](https://github.com/coreos/etcd/compare/v3.1.16...v3.1.17) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
See [code changes](https://github.com/coreos/etcd/compare/v3.1.15...v3.1.16) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd server
- Fix [`mvcc` server panic from restore operation](https://github.com/coreos/etcd/pull/9775).
- Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, **etcd server panicked** during snapshot restore operation.
- Now, this server-side panic has been fixed.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.14...v3.1.15) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd server
- Purge old [`*.snap.db` snapshot files](https://github.com/coreos/etcd/pull/7967).
- Previously, etcd did not respect `--max-snapshots` flag to purge old `*.snap.db` files.
- Now, etcd purges old `*.snap.db` files to keep maximum `--max-snapshots` number of files on disk.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.13...v3.1.14) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-1) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Add [`--initial-election-tick-advance`](https://github.com/coreos/etcd/pull/9591) flag to configure initial election tick fast-forward.
- By default, `--initial-election-tick-advance=true`, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
- This benefits the case of larger election ticks. For instance, cross datacenter deployment may require longer election timeout of 10-second. If true, local node does not need wait up to 10-second. Instead, forwards its election ticks to 8-second, and have only 2-second left before leader election.
- Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
- However, when network from leader to rejoining follower is congested, and the follower does not receive leader heartbeat within left election ticks, disruptive election has to happen thus affecting cluster availabilities.
- Now, this can be disabled by setting `--initial-election-tick-advance=false`.
- Disabling this would slow down initial bootstrap process for cross datacenter deployments. Make tradeoffs by configuring `--initial-election-tick-advance` at the cost of slow initial bootstrap.
See [code changes](https://github.com/coreos/etcd/compare/v3.1.12...v3.1.13) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Improved
- Adjust [election timeout on server restart](https://github.com/coreos/etcd/pull/9415) to reduce [disruptive rejoining servers](https://github.com/coreos/etcd/issues/9333).
- Previously, etcd fast-forwards election ticks on server start, with only one tick left for leader election. This is to speed up start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross datacenter deployments with larger election timeouts. However, it was affecting cluster availability if the last tick elapses before leader contacts the restarted node.
- Now, when etcd restarts, it adjusts election ticks with more than one tick left, thus more time for leader to prevent disruptive restart.
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-1) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
See [code changes](https://github.com/coreos/etcd/compare/v3.1.11...v3.1.12) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
- A node gets network partitioned with a watcher on a future revision, and falls behind receiving a leader snapshot after partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced since sync watchers might have become stale during network partition. And reset synced watcher group to restart watcher routines. Previously, there was a bug when moving from synced watcher group to unsynced, thus client would miss events when the watcher was requested to the network-partitioned node.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.10...v3.1.11) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd server
- [#8411](https://github.com/coreos/etcd/issues/8411),[#8806](https://github.com/coreos/etcd/pull/8806) backport "mvcc: sending events after restore"
See [code changes](https://github.com/coreos/etcd/compare/v3.1.9...v3.1.10) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Added
- Tag docker images with minor versions.
- e.g. `docker pull quay.io/coreos/etcd:v3.1` to fetch latest v3.1 versions.
### Go
- Compile with [*Go 1.8.3*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.8...v3.1.9) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd server
- Allow v2 snapshot over 512MB.
### Go
- Compile with [*Go 1.7.6*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.7...v3.1.8) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.6...v3.1.7) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.5...v3.1.6) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd server
- Fill in Auth API response header.
- Remove auth check in Status API.
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.4...v3.1.5) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd server
- Fix raft memory leak issue.
- Fix Windows file path issues.
### Other
- Add `/etc/nsswitch.conf` file to alpine-based Docker image.
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.3...v3.1.4) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.2...v3.1.3) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd gateway
- Fix `etcd gateway` schema handling in DNS discovery.
- Fix sd_notify behaviors in `gateway`, `grpc-proxy`.
### gRPC Proxy
- Fix sd_notify behaviors in `gateway`, `grpc-proxy`.
### Other
- Use machine default host when advertise URLs are default values(`localhost:2379,2380`) AND if listen URL is `0.0.0.0`.
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.1...v3.1.2) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### etcd gateway
- Fix `etcd gateway` with multiple endpoints.
### Other
- Use IPv4 default host, by default (when IPv4 and IPv6 are available).
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.0...v3.1.1) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
### Go
- Compile with [*Go 1.7.5*](https://golang.org/doc/devel/release.html#go1.7).
See [code changes](https://github.com/coreos/etcd/compare/v3.0.0...v3.1.0) and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.1 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_1.md).**
- Deprecated following gRPC metrics in favor of [go-grpc-prometheus](https://github.com/grpc-ecosystem/go-grpc-prometheus).
-`etcd_grpc_requests_total`
-`etcd_grpc_requests_failed_total`
-`etcd_grpc_active_streams`
-`etcd_grpc_unary_requests_duration_seconds`
### Dependency
- Upgrade [`github.com/ugorji/go/codec`](https://github.com/ugorji/go) to [**`ugorji/go@9c7f9b7`**](https://github.com/ugorji/go/commit/9c7f9b7a2bc3a520f7c7b30b34b7f85f47fe27b6), and [regenerate v2 `client`](https://github.com/coreos/etcd/pull/6945).
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- SRV records (e.g., infra1.example.com) must match the discovery domain (i.e., example.com) if no custom certificate authority is given.
-`TLSConfig.ServerName` is ignored with user-provided certificates for backwards compatibility; to be deprecated.
- For example, `etcd --discovery-srv=example.com` will only authenticate peers/clients when the provided certs have root domain `example.com` as an entry in Subject Alternative Name (SAN) field.
### etcd server
- Automatic leadership transfer when leader steps down.
- etcd flags
-`--strict-reconfig-check` flag is set by default.
- Add `--log-output` flag.
- Add `--metrics` flag.
- etcd uses default route IP if advertise URL is not given.
- Cluster rejects removing members if quorum will be lost.
- Discovery now has upper limit for waiting on retries.
- Warn on binding listeners through domain names; to be deprecated.
- v3.0 and v3.1 with `--auto-compaction-retention=10` run periodic compaction on v3 key-value store for every 10-hour.
- Compactor only supports periodic compaction.
- Compactor records latest revisions every 5-minute, until it reaches the first compaction period (e.g. 10-hour).
- In order to retain key-value history of last compaction period, it uses the last revision that was fetched before compaction period, from the revision records that were collected every 5-minute.
- When `--auto-compaction-retention=10`, compactor uses revision 100 for compact revision where revision 100 is the latest revision fetched from 10 hours ago.
- If compaction succeeds or requested revision has already been compacted, it resets period timer and starts over with new historical revision records (e.g. restart revision collect and compact for the next 10-hour period).
- If compaction fails, it retries in 5 minutes.
### client v3
- Add `SetEndpoints` method; update endpoints at runtime.
- Add `Sync` method; auto-update endpoints at runtime.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.23...v3.2.24) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Improved
- Improve [Raft Read Index timeout warning messages](https://github.com/coreos/etcd/pull/9897).
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_server_quota_backend_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
### gRPC Proxy
- Add [flags for specifying TLS for connecting to proxy](https://github.com/coreos/etcd/pull/9894):
- Add [`grpc-proxy start --metrics-addr` flag for specifying a separate metrics listen address](https://github.com/coreos/etcd/pull/9894).
### client v3
- Fix [lease keepalive interval updates when response queue is full](https://github.com/coreos/etcd/pull/9952).
- If `<-chan *clientv3LeaseKeepAliveResponse` from `clientv3.Lease.KeepAlive` was never consumed or channel is full, client was [sending keepalive request every 500ms](https://github.com/coreos/etcd/issues/9911) instead of expected rate of every "TTL / 3" duration.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.22...v3.2.23) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
See [code changes](https://github.com/coreos/etcd/compare/v3.2.21...v3.2.22) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Security, Authentication
- Support TLS cipher suite whitelisting.
- To block [weak cipher suites](https://github.com/coreos/etcd/issues/8320).
- TLS handshake fails when client hello is requested with invalid cipher suites.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.20...v3.2.21) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Fix [auth storage panic when simple token provider is disabled](https://github.com/coreos/etcd/pull/8695).
- Fix [`mvcc` server panic from restore operation](https://github.com/coreos/etcd/pull/9775).
- Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, **etcd server panicked** during snapshot restore operation.
- Now, this server-side panic has been fixed.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.19...v3.2.20) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Purge old [`*.snap.db` snapshot files](https://github.com/coreos/etcd/pull/7967).
- Previously, etcd did not respect `--max-snapshots` flag to purge old `*.snap.db` files.
- Now, etcd purges old `*.snap.db` files to keep maximum `--max-snapshots` number of files on disk.
### Go
- Compile with [*Go 1.8.7*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.18...v3.2.19) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Fix [TLS reload](https://github.com/coreos/etcd/pull/9570) when [certificate SAN field only includes IP addresses but no domain names](https://github.com/coreos/etcd/issues/9541).
- In Go, server calls `(*tls.Config).GetCertificate` for TLS reload if and only if server's `(*tls.Config).Certificates` field is not empty, or `(*tls.ClientHelloInfo).ServerName` is not empty with a valid SNI from the client. Previously, etcd always populates `(*tls.Config).Certificates` on the initial client TLS handshake, as non-empty. Thus, client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger `(*tls.Config).GetCertificate` to reload TLS assets.
- However, a certificate whose SAN field does [not include any domain names but only IP addresses](https://github.com/coreos/etcd/issues/9541) would request `*tls.ClientHelloInfo` with an empty `ServerName` field, thus failing to trigger the TLS reload on initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
- Now, `(*tls.Config).Certificates` is created empty on initial TLS client handshake, first to trigger `(*tls.Config).GetCertificate`, and then to populate rest of the certificates on every new TLS connection, even when client SNI is empty (e.g. cert only includes IPs).
### etcd server
- Add [`etcd --initial-election-tick-advance`](https://github.com/coreos/etcd/pull/9591) flag to configure initial election tick fast-forward.
- By default, `etcd --initial-election-tick-advance=true`, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
- This benefits the case of larger election ticks. For instance, cross datacenter deployment may require longer election timeout of 10-second. If true, local node does not need wait up to 10-second. Instead, forwards its election ticks to 8-second, and have only 2-second left before leader election.
- Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
- However, when network from leader to rejoining follower is congested, and the follower does not receive leader heartbeat within left election ticks, disruptive election has to happen thus affecting cluster availabilities.
- Now, this can be disabled by setting `--initial-election-tick-advance=false`.
- Disabling this would slow down initial bootstrap process for cross datacenter deployments. Make tradeoffs by configuring `--initial-election-tick-advance` at the cost of slow initial bootstrap.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.17...v3.2.18) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Improved
- Adjust [election timeout on server restart](https://github.com/coreos/etcd/pull/9415) to reduce [disruptive rejoining servers](https://github.com/coreos/etcd/issues/9333).
- Previously, etcd fast-forwards election ticks on server start, with only one tick left for leader election. This is to speed up start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross datacenter deployments with larger election timeouts. However, it was affecting cluster availability if the last tick elapses before leader contacts the restarted node.
- Now, when etcd restarts, it adjusts election ticks with more than one tick left, thus more time for leader to prevent disruptive restart.
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.16...v3.2.17) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Fix [server panic on invalid Election Proclaim/Resign HTTP(S) requests](https://github.com/coreos/etcd/pull/9379).
- Previously, wrong-formatted HTTP requests to Election API could trigger panic in etcd server.
- e.g. `curl -L http://localhost:2379/v3/election/proclaim -X POST -d '{"value":""}'`, `curl -L http://localhost:2379/v3/election/resign -X POST -d '{"value":""}'`.
- Prevent [overflow by large `TTL` values for `Lease` `Grant`](https://github.com/coreos/etcd/pull/9399).
-`TTL` parameter to `Grant` request is unit of second.
- Leases with too large `TTL` values exceeding `math.MaxInt64` [expire in unexpected ways](https://github.com/coreos/etcd/issues/9374).
- Server now returns `rpctypes.ErrLeaseTTLTooLarge` to client, when the requested `TTL` is larger than *9,000,000,000 seconds* (which is >285 years).
- Again, etcd `Lease` is meant for short-periodic keepalives or sessions, in the range of seconds or minutes. Not for hours or days!
- Enable etcd server [`raft.Config.CheckQuorum` when starting with `ForceNewCluster`](https://github.com/coreos/etcd/pull/9347).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.15...v3.2.16) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
- A node gets network partitioned with a watcher on a future revision, and falls behind receiving a leader snapshot after partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced since sync watchers might have become stale during network partition. And reset synced watcher group to restart watcher routines. Previously, there was a bug when moving from synced watcher group to unsynced, thus client would miss events when the watcher was requested to the network-partitioned node.
### Go
- Compile with [*Go 1.8.5*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.14...v3.2.15) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Prevent [server panic from member update/add](https://github.com/coreos/etcd/pull/9174) with [wrong scheme URLs](https://github.com/coreos/etcd/issues/9173).
- Log [user context cancel errors on stream APIs in debug level with TLS](https://github.com/coreos/etcd/pull/9178).
### Go
- Compile with [*Go 1.8.5*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.13...v3.2.14) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Improved
- Log [user context cancel errors on stream APIs in debug level](https://github.com/coreos/etcd/pull/9105).
### etcd server
- Fix [`mvcc/backend.defragdb` nil-pointer dereference on create bucket failure](https://github.com/coreos/etcd/pull/9119).
### Go
- Compile with [*Go 1.8.5*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.12...v3.2.13) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Remove [verbose error messages on stream cancel and gRPC info-level logs](https://github.com/coreos/etcd/pull/9080) in server-side.
- Fix [gRPC server panic on `GracefulStop` TLS-enabled server](https://github.com/coreos/etcd/pull/8987).
### Go
- Compile with [*Go 1.8.5*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.11...v3.2.12) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Dependency
- Upgrade [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases/tag) from [**`v1.7.4`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.4) to [**`v1.7.5`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.5).
- Upgrade [`github.com/grpc-ecosystem/grpc-gateway`](https://github.com/grpc-ecosystem/grpc-gateway/releases) from [**`v1.3`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.3) to [**`v1.3.0`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.3.0).
### etcd server
- Fix [error message of `Revision` compactor](https://github.com/coreos/etcd/pull/8999) in server-side.
### client v3
- Add [`MaxCallSendMsgSize` and `MaxCallRecvMsgSize`](https://github.com/coreos/etcd/pull/9047) fields to [`clientv3.Config`](https://godoc.org/github.com/coreos/etcd/clientv3#Config).
- Fix [exceeded response size limit error in client-side](https://github.com/coreos/etcd/issues/9043).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.10...v3.2.11) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Dependency
- Upgrade [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases/tag) from [**`v1.7.3`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.3) to [**`v1.7.4`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.4).
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- Log [more details on TLS handshake failures](https://github.com/coreos/etcd/pull/8952/files).
### client v3
- Fix racey grpc-go's server handler transport `WriteStatus` call to prevent [TLS-enabled etcd server crash](https://github.com/coreos/etcd/issues/8904).
- Add [gRPC RPC failure warnings](https://github.com/coreos/etcd/pull/8939) to help debug such issues in the future.
### Documentation
- Remove `--listen-metrics-urls` flag in monitoring document (non-released in `v3.2.x`, planned for `v3.3.x`).
### Go
- Compile with [*Go 1.8.5*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.9...v3.2.10) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Dependency
- Upgrade [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases/tag) from [**`v1.2.1`**](https://github.com/grpc/grpc-go/releases/tag/v1.2.1) to [**`v1.7.3`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.3).
- Upgrade [`github.com/grpc-ecosystem/grpc-gateway`](https://github.com/grpc-ecosystem/grpc-gateway/releases) from [**`v1.2.0`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.2.0) to [**`v1.3`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.3).
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- Revert [discovery SRV auth `ServerName` with `*.{ROOT_DOMAIN}`](https://github.com/coreos/etcd/pull/8651) to support non-wildcard subject alternative names in the certs (see [issue #8445](https://github.com/coreos/etcd/issues/8445) for more contexts).
- For instance, `etcd --discovery-srv=etcd.local` will only authenticate peers/clients when the provided certs have root domain `etcd.local` (**not `*.etcd.local`**) as an entry in Subject Alternative Name (SAN) field.
### etcd server
- Replace backend key-value database `boltdb/bolt` with [`coreos/bbolt`](https://github.com/coreos/bbolt/releases) to address [backend database size issue](https://github.com/coreos/etcd/issues/8009).
### client v3
- Rewrite balancer to handle [network partitions](https://github.com/coreos/etcd/issues/8711).
### Go
- Compile with [*Go 1.8.5*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.8...v3.2.9) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- Update `golang.org/x/crypto/bcrypt` (see [golang/crypto@6c586e1](https://github.com/golang/crypto/commit/6c586e17d90a7d08bbbc4069984180dce3b04117)).
- Fix discovery SRV bootstrapping to [authenticate `ServerName` with `*.{ROOT_DOMAIN}`](https://github.com/coreos/etcd/pull/8651), in order to support sub-domain wildcard matching (see [issue #8445](https://github.com/coreos/etcd/issues/8445) for more contexts).
- For instance, `etcd --discovery-srv=etcd.local` will only authenticate peers/clients when the provided certs have root domain `*.etcd.local` as an entry in Subject Alternative Name (SAN) field.
### Go
- Compile with [*Go 1.8.4*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.7...v3.2.8) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### client v2
- Fix v2 client failover to next endpoint on mutable operation.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.6...v3.2.7) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Security, Authentication
- Fix [server-side auth so concurrent auth operations do not return old revision error](https://github.com/coreos/etcd/pull/8306).
### client v3
- Fix [`concurrency/stm` Put with serializable snapshot](https://github.com/coreos/etcd/pull/8439).
- Use store revision from first fetch to resolve write conflicts instead of modified revision.
### Go
- Compile with [*Go 1.8.3*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.5...v3.2.6) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Fix watch restore from snapshot.
- Fix multiple URLs for `--listen-peer-urls` flag.
- Add `--enable-pprof` flag to etcd configuration file format.
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.4...v3.2.5) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcdctl v3
- Return non-zero exit code on unhealthy `endpoint health`.
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- [Server supports reverse-lookup on wildcard DNS `SAN`](https://github.com/coreos/etcd/pull/8281). For instance, if peer cert contains only DNS names (no IP addresses) in Subject Alternative Name (SAN) field, server first reverse-lookups the remote IP address to get a list of names mapping to that address (e.g. `nslookup IPADDR`). Then accepts the connection if those names have a matching name with peer cert's DNS names (either by exact or wildcard match). If none is matched, server forward-lookups each DNS entry in peer cert (e.g. look up `example.default.svc` when the entry is `*.example.default.svc`), and accepts connection only when the host's resolved addresses have the matching IP address with the peer's remote IP address. For example, peer B's CSR (with `cfssl`) SAN field is `["*.example.default.svc", "*.example.default.svc.cluster.local"]` when peer B's remote IP address is `10.138.0.2`. When peer B tries to join the cluster, peer A reverse-lookup the IP `10.138.0.2` to get the list of host names. And either exact or wildcard match the host names with peer B's cert DNS names in Subject Alternative Name (SAN) field. If none of reverse/forward lookups worked, it returns an error `"tls: "10.138.0.2" does not match any of DNSNames ["*.example.default.svc","*.example.default.svc.cluster.local"]`. See [issue#8268](https://github.com/coreos/etcd/issues/8268) for more detail.
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Fix unreachable `/metrics` endpoint when `--enable-v2=false`.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.3...v3.2.4) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Do not block on active client stream when stopping server
### gRPC proxy
- Fix gRPC proxy Snapshot RPC error handling
### Go
- Compile with [*Go 1.8.3*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.2...v3.2.3) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### client v3
- Let clients establish unlimited streams
### Other
- Tag docker images with minor versions
- e.g. `docker pull quay.io/coreos/etcd:v3.2` to fetch latest v3.2 versions
### Go
- Compile with [*Go 1.8.3*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.1...v3.2.2) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Improved
- Rate-limit lease revoke on expiration.
- Extend leases on promote to avoid queueing effect on lease expiration.
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- [Server accepts connections if IP matches, without checking DNS entries](https://github.com/coreos/etcd/pull/8223). For instance, if peer cert contains IP addresses and DNS names in Subject Alternative Name (SAN) field, and the remote IP address matches one of those IP addresses, server just accepts connection without further checking the DNS names. For example, peer B's CSR (with `cfssl`) SAN field is `["invalid.domain", "10.138.0.2"]` when peer B's remote IP address is `10.138.0.2` and `invalid.domain` is a invalid host. When peer B tries to join the cluster, peer A successfully authenticates B, since Subject Alternative Name (SAN) field has a valid matching IP address. See [issue#8206](https://github.com/coreos/etcd/issues/8206) for more detail.
### etcd server
- Accept connection with matched IP SAN but no DNS match.
- Don't check DNS entries in certs if there's a matching IP.
### gRPC gateway
- Use user-provided listen address to connect to gRPC gateway.
See [code changes](https://github.com/coreos/etcd/compare/v3.2.0...v3.2.1) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### etcd server
- Fix backend database in-memory index corruption issue on restore (only 3.2.0 is affected).
### gRPC gateway
- Fix Txn marshaling.
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Fix backend database size debugging metrics.
### Go
- Compile with [*Go 1.8.3*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.1.0...v3.2.0) and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.2 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_2.md).**
### Improved
- Improve backend read concurrency.
### Breaking Changes
- Increased [`--snapshot-count` default value from 10,000 to 100,000](https://github.com/coreos/etcd/pull/7160).
- Higher snapshot count means it holds Raft entries in memory for longer before discarding old entries.
- It is a trade-off between less frequent snapshotting and [higher memory usage](https://github.com/kubernetes/kubernetes/issues/60589#issuecomment-371977156).
- User lower `--snapshot-count` value for lower memory usage.
- User higher `--snapshot-count` value for better availabilities of slow followers (less frequent snapshots from leader).
-`clientv3.Lease.TimeToLive` returns `LeaseTimeToLiveResponse.TTL == -1` on lease not found.
-`clientv3.NewFromConfigFile` is moved to `clientv3/yaml.NewConfig`.
-`embed.Etcd.Peers` field is now `[]*peerListener`.
- Rejects domains names for `--listen-peer-urls` and `--listen-client-urls` (3.1 only prints out warnings), since [domain name is invalid for network interface binding](https://github.com/coreos/etcd/issues/6336).
### Dependency
- Upgrade [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases) from [**`v1.0.4`**](https://github.com/grpc/grpc-go/releases/tag/v1.0.4) to [**`v1.2.1`**](https://github.com/grpc/grpc-go/releases/tag/v1.2.1).
- Upgrade [`github.com/grpc-ecosystem/grpc-gateway`](https://github.com/grpc-ecosystem/grpc-gateway/releases) to [**`v1.2.0`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.2.0).
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-2) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- [TLS certificates get reloaded on every client connection](https://github.com/coreos/etcd/pull/7829). This is useful when replacing expiry certs without stopping etcd servers; it can be done by overwriting old certs with new ones. Refreshing certs for every connection should not have too much overhead, but can be improved in the future, with caching layer. Example tests can be found [here](https://github.com/coreos/etcd/blob/b041ce5d514a4b4aaeefbffb008f0c7570a18986/integration/v3_grpc_test.go#L1601-L1757).
- [Server denies incoming peer certs with wrong IP `SAN`](https://github.com/coreos/etcd/pull/7687). For instance, if peer cert contains any IP addresses in Subject Alternative Name (SAN) field, server authenticates a peer only when the remote IP address matches one of those IP addresses. This is to prevent unauthorized endpoints from joining the cluster. For example, peer B's CSR (with `cfssl`) SAN field is `["*.example.default.svc", "*.example.default.svc.cluster.local", "10.138.0.27"]` when peer B's actual IP address is `10.138.0.2`, not `10.138.0.27`. When peer B tries to join the cluster, peer A will reject B with the error `x509: certificate is valid for 10.138.0.27, not 10.138.0.2`, because B's remote IP address does not match the one in Subject Alternative Name (SAN) field.
- [Server resolves TLS `DNSNames` when checking `SAN`](https://github.com/coreos/etcd/pull/7767). For instance, if peer cert contains only DNS names (no IP addresses) in Subject Alternative Name (SAN) field, server authenticates a peer only when forward-lookups (`dig b.com`) on those DNS names have matching IP with the remote IP address. For example, peer B's CSR (with `cfssl`) SAN field is `["b.com"]` when peer B's remote IP address is `10.138.0.2`. When peer B tries to join the cluster, peer A looks up the incoming host `b.com` to get the list of IP addresses (e.g. `dig b.com`). And rejects B if the list does not contain the IP `10.138.0.2`, with the error `tls: 10.138.0.2 does not match any of DNSNames ["b.com"]`.
- Auth support JWT token.
### etcd server
- RPCs
- Add Election, Lock service.
- Native client `etcdserver/api/v3client`
- client "embedded" in the server.
- Logging, monitoring
- Server warns large snapshot operations.
- Add `etcd --enable-v2` flag to enable v2 API server.
- Compactor continues to record latest revisions every 5-minute.
- For every hour, it uses the last revision that was fetched before compaction period, from the revision records that were collected every 5-minute.
- That is, for every hour, compactor discards historical data created before compaction period.
- The retention window of compaction period moves to next hour.
- For instance, when hourly writes are 100 and `--auto-compaction-retention=10`, v3.1 compacts revision 1000, 2000, and 3000 for every 10-hour, while v3.2 compacts revision 1000, 1100, and 1200 for every 1-hour.
- If compaction succeeds or requested revision has already been compacted, it resets period timer and removes used compacted revision from historical revision records (e.g. start next revision collect and compaction from previously collected revisions).
- If compaction fails, it retries in 5 minutes.
- Allow snapshot over 512MB.
### client v3
- STM prefetching.
- Add namespace feature.
- Add `ErrOldCluster` with server version checking.
- Translate `WithPrefix()` into `WithFromKey()` for empty key.
### etcdctl v3
- Add `check perf` command.
- Add `etcdctl --from-key` flag to role grant-permission command.
-`lock` command takes an optional command to execute.
### gRPC Proxy
- Proxy endpoint discovery.
- Namespaces.
- Coalesce lease requests.
### etcd gateway
- Support [DNS SRV priority](https://github.com/coreos/etcd/pull/7882) for [smart proxy routing](https://github.com/coreos/etcd/issues/4378).
### Other
- v3 client
- concurrency package's elections updated to match RPC interfaces.
- let client dial endpoints not in the balancer.
- Release
- Annotate acbuild with supports-systemd-notify.
- Add `nsswitch.conf` to Docker container image.
- Add ppc64le, arm64(experimental) builds.
### Go
- Compile with [*Go 1.8.3*](https://golang.org/doc/devel/release.html#go1.8).
See [code changes](https://github.com/coreos/etcd/compare/v3.3.8...v3.3.9) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
### Improved
- Improve [Raft Read Index timeout warning messages](https://github.com/coreos/etcd/pull/9897).
### Security, Authentication
- Compile with [*Go 1.10.3*](https://golang.org/doc/devel/release.html#go1.10) to support [crypto/x509 "Name Constraints"](https://github.com/coreos/etcd/issues/9912).
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-3) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
### client v3
- Fix [lease keepalive interval updates when response queue is full](https://github.com/coreos/etcd/pull/9952).
- If `<-chan *clientv3LeaseKeepAliveResponse` from `clientv3.Lease.KeepAlive` was never consumed or channel is full, client was [sending keepalive request every 500ms](https://github.com/coreos/etcd/issues/9911) instead of expected rate of every "TTL / 3" duration.
### Go
- Compile with [*Go 1.10.3*](https://golang.org/doc/devel/release.html#go1.10).
See [code changes](https://github.com/coreos/etcd/compare/v3.3.7...v3.3.8) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
See [code changes](https://github.com/coreos/etcd/compare/v3.3.6...v3.3.7) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
### Security, Authentication
- Support TLS cipher suite whitelisting.
- To block [weak cipher suites](https://github.com/coreos/etcd/issues/8320).
- TLS handshake fails when client hello is requested with invalid cipher suites.
See [code changes](https://github.com/coreos/etcd/compare/v3.3.5...v3.3.6) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
- Previously, when auth token is an empty string, it returns [`failed to initialize the etcd server: auth: invalid auth options` error](https://github.com/coreos/etcd/issues/9349).
- Fix [auth storage panic on server lease revoke routine with JWT token](https://github.com/coreos/etcd/issues/9695).
- Fix [`mvcc` server panic from restore operation](https://github.com/coreos/etcd/pull/9775).
- Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, **etcd server panicked** during snapshot restore operation.
- Now, this server-side panic has been fixed.
### Go
- Compile with [*Go 1.9.6*](https://golang.org/doc/devel/release.html#go1.9).
See [code changes](https://github.com/coreos/etcd/compare/v3.3.4...v3.3.5) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
See [code changes](https://github.com/coreos/etcd/compare/v3.3.3...v3.3.4) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-3) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Fix [race conditions in v2 server stat collecting](https://github.com/coreos/etcd/pull/9562).
### Security, Authentication
- Fix [TLS reload](https://github.com/coreos/etcd/pull/9570) when [certificate SAN field only includes IP addresses but no domain names](https://github.com/coreos/etcd/issues/9541).
- In Go, server calls `(*tls.Config).GetCertificate` for TLS reload if and only if server's `(*tls.Config).Certificates` field is not empty, or `(*tls.ClientHelloInfo).ServerName` is not empty with a valid SNI from the client. Previously, etcd always populates `(*tls.Config).Certificates` on the initial client TLS handshake, as non-empty. Thus, client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger `(*tls.Config).GetCertificate` to reload TLS assets.
- However, a certificate whose SAN field does [not include any domain names but only IP addresses](https://github.com/coreos/etcd/issues/9541) would request `*tls.ClientHelloInfo` with an empty `ServerName` field, thus failing to trigger the TLS reload on initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
- Now, `(*tls.Config).Certificates` is created empty on initial TLS client handshake, first to trigger `(*tls.Config).GetCertificate`, and then to populate rest of the certificates on every new TLS connection, even when client SNI is empty (e.g. cert only includes IPs).
### etcd server
- Add [`etcd --initial-election-tick-advance`](https://github.com/coreos/etcd/pull/9591) flag to configure initial election tick fast-forward.
- By default, `etcd --initial-election-tick-advance=true`, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
- This benefits the case of larger election ticks. For instance, cross datacenter deployment may require longer election timeout of 10-second. If true, local node does not need wait up to 10-second. Instead, forwards its election ticks to 8-second, and have only 2-second left before leader election.
- Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
- However, when network from leader to rejoining follower is congested, and the follower does not receive leader heartbeat within left election ticks, disruptive election has to happen thus affecting cluster availabilities.
- Now, this can be disabled by setting `--initial-election-tick-advance=false`.
- Disabling this would slow down initial bootstrap process for cross datacenter deployments. Make tradeoffs by configuring `etcd --initial-election-tick-advance` at the cost of slow initial bootstrap.
See [code changes](https://github.com/coreos/etcd/compare/v3.3.2...v3.3.3) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
### Improved
- Adjust [election timeout on server restart](https://github.com/coreos/etcd/pull/9415) to reduce [disruptive rejoining servers](https://github.com/coreos/etcd/issues/9333).
- Previously, etcd fast-forwards election ticks on server start, with only one tick left for leader election. This is to speed up start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross datacenter deployments with larger election timeouts. However, it was affecting cluster availability if the last tick elapses before leader contacts the restarted node.
- Now, when etcd restarts, it adjusts election ticks with more than one tick left, thus more time for leader to prevent disruptive restart.
- e.g. `etcd --auto-compaction-mode=revision --auto-compaction-retention=1000` automatically `Compact` on `"latest revision" - 1000` every 5-minute (when latest revision is 30000, compact on revision 29000).
- e.g. Previously, `etcd --auto-compaction-mode=periodic --auto-compaction-retention=72h` automatically `Compact` with 72-hour retention windown for every 7.2-hour. **Now, `Compact` happens, for every 1-hour but still with 72-hour retention window.**
- e.g. Previously, `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` automatically `Compact` with 30-minute retention windown for every 3-minute. **Now, `Compact` happens, for every 30-minute but still with 30-minute retention window.**
- Periodic compactor keeps recording latest revisions for every compaction period when given period is less than 1-hour, or for every 1-hour when given compaction period is greater than 1-hour (e.g. 1-hour when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h`).
- For every compaction period or 1-hour, compactor uses the last revision that was fetched before compaction period, to discard historical data.
- The retention window of compaction period moves for every given compaction period or hour.
- For instance, when hourly writes are 100 and `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h`, `v3.2.x`, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 2400, 2640, and 2880 for every 2.4-hour, while `v3.3.3`*or later* compacts revision 2400, 2500, 2600 for every 1-hour.
- Futhermore, when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` and writes per minute are about 1000, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 30000, 33000, and 36000, for every 3-minute, while `v3.3.3`*or later* compacts revision 30000, 60000, and 90000, for every 30-minute.
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-3) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
See [code changes](https://github.com/coreos/etcd/compare/v3.3.1...v3.3.2) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
### etcd server
- Fix [server panic on invalid Election Proclaim/Resign HTTP(S) requests](https://github.com/coreos/etcd/pull/9379).
- Previously, wrong-formatted HTTP requests to Election API could trigger panic in etcd server.
- e.g. `curl -L http://localhost:2379/v3/election/proclaim -X POST -d '{"value":""}'`, `curl -L http://localhost:2379/v3/election/resign -X POST -d '{"value":""}'`.
See [code changes](https://github.com/coreos/etcd/compare/v3.3.0...v3.3.1) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
### Improved
- Add [warnings on requests taking too long](https://github.com/coreos/etcd/pull/9288).
- e.g. `etcdserver: read-only range request "key:\"\\000\" range_end:\"\\000\" " took too long [3.389041388s] to execute`
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
- A node gets network partitioned with a watcher on a future revision, and falls behind receiving a leader snapshot after partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced since sync watchers might have become stale during network partition. And reset synced watcher group to restart watcher routines. Previously, there was a bug when moving from synced watcher group to unsynced, thus client would miss events when the watcher was requested to the network-partitioned node.
### Go
- Compile with [*Go 1.9.4*](https://golang.org/doc/devel/release.html#go1.9).
See [code changes](https://github.com/coreos/etcd/compare/v3.2.0...v3.3.0) and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.3 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md).**
- [v3.3.0-rc.4](https://github.com/coreos/etcd/releases/tag/v3.3.0-rc.4) (2018-01-22), see [code changes](https://github.com/coreos/etcd/compare/v3.3.0-rc.3...v3.3.0-rc.4).
- [v3.3.0-rc.3](https://github.com/coreos/etcd/releases/tag/v3.3.0-rc.3) (2018-01-17), see [code changes](https://github.com/coreos/etcd/compare/v3.3.0-rc.2...v3.3.0-rc.3).
- [v3.3.0-rc.2](https://github.com/coreos/etcd/releases/tag/v3.3.0-rc.2) (2018-01-11), see [code changes](https://github.com/coreos/etcd/compare/v3.3.0-rc.1...v3.3.0-rc.2).
- [v3.3.0-rc.1](https://github.com/coreos/etcd/releases/tag/v3.3.0-rc.1) (2018-01-02), see [code changes](https://github.com/coreos/etcd/compare/v3.3.0-rc.0...v3.3.0-rc.1).
- [v3.3.0-rc.0](https://github.com/coreos/etcd/releases/tag/v3.3.0-rc.0) (2017-12-20), see [code changes](https://github.com/coreos/etcd/compare/v3.2.0...v3.3.0-rc.0).
### Improved
- Use [`coreos/bbolt`](https://github.com/coreos/bbolt/releases) to replace [`boltdb/bolt`](https://github.com/boltdb/bolt#project-status).
- Fix [etcd database size grows until `mvcc: database space exceeded`](https://github.com/coreos/etcd/issues/8009).
- [Support database size larger than 8GiB](https://github.com/coreos/etcd/pull/7525) (8GiB is now a suggested maximum size for normal environments)
- [Reduce memory allocation](https://github.com/coreos/etcd/pull/8428) on [Range operations](https://github.com/coreos/etcd/pull/8475).
- [Rate limit](https://github.com/coreos/etcd/pull/8099) and [randomize](https://github.com/coreos/etcd/pull/8101) lease revoke on restart or leader elections.
- Prevent [spikes in Raft proposal rate](https://github.com/coreos/etcd/issues/8096).
- Support `clientv3` balancer failover under [network faults/partitions](https://github.com/coreos/etcd/issues/8711).
- Better warning on [mismatched `etcd --initial-cluster`](https://github.com/coreos/etcd/pull/8083) flag.
- etcd compares `etcd --initial-advertise-peer-urls` against corresponding `etcd --initial-cluster` URLs with forward-lookup.
- If resolved IP addresses of `etcd --initial-advertise-peer-urls` and `etcd --initial-cluster` do not match (e.g. [due to DNS error](https://github.com/coreos/etcd/pull/9210)), etcd will exit with errors.
- v3.2 error: `etcd --initial-cluster must include s1=https://s1.test:2380 given --initial-advertise-peer-urls=https://s1.test:2380`.
- v3.3 error: `failed to resolve https://s1.test:2380 to match --initial-cluster=s1=https://s1.test:2380 (failed to resolve "https://s1.test:2380" (error ...))`.
### Breaking Changes
- Require [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases) [**`v1.7.4`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.4) or [**`v1.7.5`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.5).
- Deprecate `grpclog.Logger`, upgrade to [`grpclog.LoggerV2`](https://github.com/coreos/etcd/pull/8533).
- Deprecate [`grpc.ErrClientConnTimeout`](https://github.com/coreos/etcd/pull/8505) errors in `clientv3`.
- Use [`MaxRecvMsgSize` and `MaxSendMsgSize`](https://github.com/coreos/etcd/pull/8437) to limit message size, in etcd server.
- Translate [gRPC status error in v3 client `Snapshot` API](https://github.com/coreos/etcd/pull/9038).
- v3 `etcdctl` [`lease timetolive LEASE_ID`](https://github.com/coreos/etcd/issues/9028) on expired lease now prints [`"lease LEASE_ID already expired"`](https://github.com/coreos/etcd/pull/9047).
-<=3.2 prints `"lease LEASE_ID granted with TTL(0s), remaining(-1s)"`.
- Replace [gRPC gateway](https://github.com/grpc-ecosystem/grpc-gateway) endpoint `/v3alpha` with [`/v3beta`](https://github.com/coreos/etcd/pull/8880).
- To deprecate [`/v3alpha`](https://github.com/coreos/etcd/issues/8125) in v3.4.
- In v3.3, `curl -L http://localhost:2379/v3alpha/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` still works as a fallback to `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'`, but `curl -L http://localhost:2379/v3alpha/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` won't work in v3.4. Use `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
- Change `etcd --auto-compaction-retention` flag to [accept string values](https://github.com/coreos/etcd/pull/8563) with [finer granularity](https://github.com/coreos/etcd/issues/8503).
- Now that `etcd --auto-compaction-retention` accepts string values, etcd configuration YAML file `auto-compaction-retention` field must be changed to `string` type.
- Previously, `--config-file etcd.config.yaml` can have `auto-compaction-retention: 24` field, now must be `auto-compaction-retention: "24"` or `auto-compaction-retention: "24h"`.
- If configured as `etcd --auto-compaction-mode periodic --auto-compaction-retention "24h"`, the time duration value for `etcd --auto-compaction-retention` flag must be valid for [`time.ParseDuration`](https://golang.org/pkg/time/#ParseDuration) function in Go.
### Dependency
- Upgrade [`boltdb/bolt`](https://github.com/boltdb/bolt#project-status) from [**`v1.3.0`**](https://github.com/boltdb/bolt/releases/tag/v1.3.0) to [`coreos/bbolt`](https://github.com/coreos/bbolt/releases) [**`v1.3.1-coreos.6`**](https://github.com/coreos/bbolt/releases/tag/v1.3.1-coreos.6).
- Upgrade [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases) from [**`v1.2.1`**](https://github.com/grpc/grpc-go/releases/tag/v1.2.1) to [**`v1.7.5`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.5).
- Upgrade [`github.com/ugorji/go/codec`](https://github.com/ugorji/go) to [**`v1.1`**](https://github.com/ugorji/go/releases/tag/v1.1), and [regenerate v2 `client`](https://github.com/coreos/etcd/pull/8721).
- Upgrade [`github.com/ugorji/go/codec`](https://github.com/ugorji/go) to [**`ugorji/go@54210f4e0`**](https://github.com/ugorji/go/commit/54210f4e076c57f351166f0ed60e67d3fca57a36), and [regenerate v2 `client`](https://github.com/coreos/etcd/pull/8574).
- Upgrade [`github.com/grpc-ecosystem/grpc-gateway`](https://github.com/grpc-ecosystem/grpc-gateway/releases) from [**`v1.2.2`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.2.2) to [**`v1.3.0`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.3.0).
- Upgrade [`golang.org/x/crypto/bcrypt`](https://github.com/golang/crypto) to [**`golang/crypto@6c586e17d`**](https://github.com/golang/crypto/commit/6c586e17d90a7d08bbbc4069984180dce3b04117).
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#v3-3) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Add [`etcd --listen-metrics-urls`](https://github.com/coreos/etcd/pull/8242) flag for additional `/metrics` and `/health` endpoints.
- Useful for [bypassing critical APIs when monitoring etcd](https://github.com/coreos/etcd/issues/8060).
- Initialize gRPC server [metrics with zero values](https://github.com/coreos/etcd/pull/8878).
- Fix [range/put/delete operation metrics](https://github.com/coreos/etcd/pull/8054) with transaction.
-`etcd_debugging_mvcc_range_total`
-`etcd_debugging_mvcc_put_total`
-`etcd_debugging_mvcc_delete_total`
-`etcd_debugging_mvcc_txn_total`
- Fix [`etcd_debugging_mvcc_keys_total`](https://github.com/coreos/etcd/pull/8390) on restore.
- Fix [`etcd_debugging_mvcc_db_total_size_in_bytes`](https://github.com/coreos/etcd/pull/8120) on restore.
- Also change to [`prometheus.NewGaugeFunc`](https://github.com/coreos/etcd/pull/8150).
### Security, Authentication
See [security doc](https://github.com/coreos/etcd/blob/master/Documentation/op-guide/security.md) for more details.
- Add [CRL based connection rejection](https://github.com/coreos/etcd/pull/8124) to manage [revoked certs](https://github.com/coreos/etcd/issues/4034).
- [Server accepts connections if IP matches, without checking DNS entries](https://github.com/coreos/etcd/pull/8223). For instance, if peer cert contains IP addresses and DNS names in Subject Alternative Name (SAN) field, and the remote IP address matches one of those IP addresses, server just accepts connection without further checking the DNS names.
- [Server supports reverse-lookup on wildcard DNS `SAN`](https://github.com/coreos/etcd/pull/8281). For instance, if peer cert contains only DNS names (no IP addresses) in Subject Alternative Name (SAN) field, server first reverse-lookups the remote IP address to get a list of names mapping to that address (e.g. `nslookup IPADDR`). Then accepts the connection if those names have a matching name with peer cert's DNS names (either by exact or wildcard match). If none is matched, server forward-lookups each DNS entry in peer cert (e.g. look up `example.default.svc` when the entry is `*.example.default.svc`), and accepts connection only when the host's resolved addresses have the matching IP address with the peer's remote IP address.
- To support [CommonName(CN) based auth](https://github.com/coreos/etcd/issues/8262) for inter peer connection.
- [Swap priority](https://github.com/coreos/etcd/pull/8594) of cert CommonName(CN) and username + password.
- To address ["username and password specified in the request should take priority over CN in the cert"](https://github.com/coreos/etcd/issues/8584).
- Protect [lease revoke with auth](https://github.com/coreos/etcd/pull/8031).
- Provide user's role on [auth permission error](https://github.com/coreos/etcd/pull/8164).
- Fix [auth store panic with disabled token](https://github.com/coreos/etcd/pull/8695).
### etcd server
- Add [`etcd --experimental-initial-corrupt-check`](https://github.com/coreos/etcd/pull/8554) flag to [check cluster database hashes before serving client/peer traffic](https://github.com/coreos/etcd/issues/8313).
-`etcd --experimental-initial-corrupt-check=false` by default.
- v3.4 will enable `--initial-corrupt-check=true` by default.
- Add [`etcd --experimental-corrupt-check-time`](https://github.com/coreos/etcd/pull/8420) flag to [raise corrupt alarm monitoring](https://github.com/coreos/etcd/issues/7125).
-`etcd --experimental-corrupt-check-time=0s` disabled by default.
- Add [`etcd --experimental-enable-v2v3`](https://github.com/coreos/etcd/pull/8407) flag to [emulate v2 API with v3](https://github.com/coreos/etcd/issues/6925).
-`etcd --experimental-enable-v2v3=false` by default.
- Add [`etcd --max-txn-ops`](https://github.com/coreos/etcd/pull/7976) flag to [configure maximum number operations in transaction](https://github.com/coreos/etcd/issues/7826).
- Add [`etcd --max-request-bytes`](https://github.com/coreos/etcd/pull/7968) flag to [configure maximum client request size](https://github.com/coreos/etcd/issues/7923).
- If not configured, it defaults to 1.5 MiB.
- Add [`etcd --client-crl-file`, `--peer-crl-file`](https://github.com/coreos/etcd/pull/8124) flags for [Certificate revocation list](https://github.com/coreos/etcd/issues/4034).
- Add [`etcd --peer-cert-allowed-cn`](https://github.com/coreos/etcd/pull/8616) flag to support [CN-based auth for inter-peer connection](https://github.com/coreos/etcd/issues/8262).
- Add [`etcd --listen-metrics-urls`](https://github.com/coreos/etcd/pull/8242) flag for additional `/metrics` and `/health` endpoints.
- Support [additional (non) TLS `/metrics` endpoints for a TLS-enabled cluster](https://github.com/coreos/etcd/pull/8282).
- e.g. `etcd --listen-metrics-urls=https://localhost:2378,http://localhost:9379` to serve `/metrics` and `/health` on secure port 2378 and insecure port 9379.
- Useful for [bypassing critical APIs when monitoring etcd](https://github.com/coreos/etcd/issues/8060).
- Add [`etcd --auto-compaction-mode`](https://github.com/coreos/etcd/pull/8123) flag to [support revision-based compaction](https://github.com/coreos/etcd/issues/8098).
- Change `etcd --auto-compaction-retention` flag to [accept string values](https://github.com/coreos/etcd/pull/8563) with [finer granularity](https://github.com/coreos/etcd/issues/8503).
- Now that `etcd --auto-compaction-retention` accepts string values, etcd configuration YAML file `auto-compaction-retention` field must be changed to `string` type.
- Previously, `etcd --config-file etcd.config.yaml` can have `auto-compaction-retention: 24` field, now must be `auto-compaction-retention: "24"` or `auto-compaction-retention: "24h"`.
- If configured as `--auto-compaction-mode periodic --auto-compaction-retention "24h"`, the time duration value for `etcd --auto-compaction-retention` flag must be valid for [`time.ParseDuration`](https://golang.org/pkg/time/#ParseDuration) function in Go.
- e.g. `etcd --auto-compaction-mode=revision --auto-compaction-retention=1000` automatically `Compact` on `"latest revision" - 1000` every 5-minute (when latest revision is 30000, compact on revision 29000).
- e.g. `etcd --auto-compaction-mode=periodic --auto-compaction-retention=72h` automatically `Compact` with 72-hour retention windown, for every 7.2-hour.
- e.g. `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` automatically `Compact` with 30-minute retention windown, for every 3-minute.
- Periodic compactor continues to record latest revisions for every 1/10 of given compaction period (e.g. 1-hour when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=10h`).
- For every 1/10 of given compaction period, compactor uses the last revision that was fetched before compaction period, to discard historical data.
- The retention window of compaction period moves for every 1/10 of given compaction period.
- For instance, when hourly writes are 100 and `--auto-compaction-retention=10`, v3.1 compacts revision 1000, 2000, and 3000 for every 10-hour, while v3.2.x, v3.3.0, v3.3.1, and v3.3.2 compact revision 1000, 1100, and 1200 for every 1-hour. Futhermore, when writes per minute are 1000, v3.3.0, v3.3.1, and v3.3.2 with `--auto-compaction-mode=periodic --auto-compaction-retention=30m` compact revision 30000, 33000, and 36000, for every 3-minute with more finer granularity.
- Whether compaction succeeds or not, this process repeats for every 1/10 of given compaction period. If compaction succeeds, it just removes compacted revision from historical revision records.
- Serve [`/health` endpoint as unhealthy](https://github.com/coreos/etcd/pull/8272) when [alarm (e.g. `NOSPACE`) is raised or there's no leader](https://github.com/coreos/etcd/issues/8207).
- Define [`etcdhttp.Health`](https://godoc.org/github.com/coreos/etcd/etcdserver/api/etcdhttp#Health) struct with JSON encoder.
- Note that `"health"` field is [`string` type, not `bool`](https://github.com/coreos/etcd/pull/9143).
- e.g. `{"health":"false"}`, `{"health":"true"}`
- [Remove `"errors"` field](https://github.com/coreos/etcd/pull/9162) since `v3.3.0-rc.3` (did exist only in `v3.3.0-rc.0`, `v3.3.0-rc.1`, `v3.3.0-rc.2`).
- Move [logging setup to embed package](https://github.com/coreos/etcd/pull/8810)
- Disable gRPC server info-level logs by default (can be enabled with `etcd --debug` flag).
- Use [monotonic time in Go 1.9](https://github.com/coreos/etcd/pull/8507) for `lease` package.
- Warn on [empty hosts in advertise URLs](https://github.com/coreos/etcd/pull/8384).
- e.g. `etcd --advertise-client-urls=http://:2379`.
- Warn on [shadowed environment variables](https://github.com/coreos/etcd/pull/8385).
- Address [error on shadowed environment variables](https://github.com/coreos/etcd/issues/8380).
- etcd v3.4 will exit on this error.
### API
- Support [ranges in transaction comparisons](https://github.com/coreos/etcd/pull/8025) for [disconnected linearized reads](https://github.com/coreos/etcd/issues/7924).
- Add [nested transactions](https://github.com/coreos/etcd/pull/8102) to extend [proxy use cases](https://github.com/coreos/etcd/issues/7857).
- Add [lease comparison target in transaction](https://github.com/coreos/etcd/pull/8324).
- Add [hash by revision](https://github.com/coreos/etcd/pull/8263) for [better corruption checking against boltdb](https://github.com/coreos/etcd/issues/8016).
### client v3
- Add [health balancer](https://github.com/coreos/etcd/pull/8545) to fix [watch API hangs](https://github.com/coreos/etcd/issues/7247), improve [endpoint switch under network faults](https://github.com/coreos/etcd/issues/7941).
- [Refactor balancer](https://github.com/coreos/etcd/pull/8840) and add [client-side keepalive pings](https://github.com/coreos/etcd/pull/8199) to handle [network partitions](https://github.com/coreos/etcd/issues/8711).
- Add [`MaxCallSendMsgSize` and `MaxCallRecvMsgSize`](https://github.com/coreos/etcd/pull/9047) fields to [`clientv3.Config`](https://godoc.org/github.com/coreos/etcd/clientv3#Config).
- Fix [exceeded response size limit error in client-side](https://github.com/coreos/etcd/issues/9043).
- In previous versions(v3.2.10, v3.2.11), client response size was limited to only 4 MiB.
-`MaxCallSendMsgSize` default value is 2 MiB, if not configured.
-`MaxCallRecvMsgSize` default value is `math.MaxInt32`, if not configured.
- Accept [`Compare_LEASE`](https://github.com/coreos/etcd/pull/8324) in [`clientv3.Compare`](https://godoc.org/github.com/coreos/etcd/clientv3#Compare).
- Add [`LeaseValue` helper](https://github.com/coreos/etcd/pull/8488) to `Cmp``LeaseID` values in `Txn`.
- Add [`MoveLeader`](https://github.com/coreos/etcd/pull/8153) to `Maintenance`.
- Add [`HashKV`](https://github.com/coreos/etcd/pull/8351) to `Maintenance`.
- Add [`Leases`](https://github.com/coreos/etcd/pull/8358) to `Lease`.
- Add [`clientv3/ordering`](https://github.com/coreos/etcd/pull/8092) for enforce [ordering in serialized requests](https://github.com/coreos/etcd/issues/7623).
- Support [`etcdctl watch [key] [range_end] -- [exec-command…]`](https://github.com/coreos/etcd/pull/8919), equivalent to [v2 `etcdctl exec-watch`](https://github.com/coreos/etcd/issues/8814).
- Make `etcdctl watch -- [exec-command]` set environmental variables [`ETCD_WATCH_REVISION`, `ETCD_WATCH_EVENT_TYPE`, `ETCD_WATCH_KEY`, `ETCD_WATCH_VALUE`](https://github.com/coreos/etcd/pull/9142) for each event.
- Support [`etcdctl watch` with environmental variables `ETCDCTL_WATCH_KEY` and `ETCDCTL_WATCH_RANGE_END`](https://github.com/coreos/etcd/pull/9142).
- Enable [`clientv3.WithRequireLeader(context.Context)` for `watch`](https://github.com/coreos/etcd/pull/8672) command.
- Print [`"del"` instead of `"delete"`](https://github.com/coreos/etcd/pull/8297) in `txn` interactive mode.
- Print [`ETCD_INITIAL_ADVERTISE_PEER_URLS` in `member add`](https://github.com/coreos/etcd/pull/8332).
### etcdctl v3
- Handle [empty key permission](https://github.com/coreos/etcd/pull/8514) in `etcdctl`.
- Add [`grpc-proxy start --max-send-bytes`](https://github.com/coreos/etcd/pull/9250) flag to [configure maximum client request size](https://github.com/coreos/etcd/issues/7923).
- Add [`grpc-proxy start --max-recv-bytes`](https://github.com/coreos/etcd/pull/9250) flag to [configure maximum client request size](https://github.com/coreos/etcd/issues/7923).
- Fix [Snapshot API error handling](https://github.com/coreos/etcd/commit/dbd16d52fbf81e5fd806d21ff5e9148d5bf203ab).
- Fix [KV API `PrevKv` flag handling](https://github.com/coreos/etcd/pull/8366).
- Fix [KV API `KeysOnly` flag handling](https://github.com/coreos/etcd/pull/8552).
### gRPC gateway
- Replace [gRPC gateway](https://github.com/grpc-ecosystem/grpc-gateway) endpoint `/v3alpha` with [`/v3beta`](https://github.com/coreos/etcd/pull/8880).
- To deprecate [`/v3alpha`](https://github.com/coreos/etcd/issues/8125) in v3.4.
- In v3.3, `curl -L http://localhost:2379/v3alpha/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` still works as a fallback to `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'`, but `curl -L http://localhost:2379/v3alpha/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` won't work in v3.4. Use `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
- Support ["authorization" token](https://github.com/coreos/etcd/pull/7999).
- Support [websocket for bi-directional streams](https://github.com/coreos/etcd/pull/8257).
- Fix [`Watch` API with gRPC gateway](https://github.com/coreos/etcd/issues/8237).
- Upgrade gRPC gateway to [v1.3.0](https://github.com/coreos/etcd/issues/8838).
### etcd server
- Fix [backend database in-memory index corruption](https://github.com/coreos/etcd/pull/8127) issue on restore (only 3.2.0 is affected).
- Fix [watch restore from snapshot](https://github.com/coreos/etcd/pull/8427).
- Fix [`mvcc/backend.defragdb` nil-pointer dereference on create bucket failure](https://github.com/coreos/etcd/pull/9119).
- Fix [server crash](https://github.com/coreos/etcd/pull/8010) on [invalid transaction request from gRPC gateway](https://github.com/coreos/etcd/issues/7889).
- Prevent [server panic from member update/add](https://github.com/coreos/etcd/pull/9174) with [wrong scheme URLs](https://github.com/coreos/etcd/issues/9173).
- Make [peer dial timeout longer](https://github.com/coreos/etcd/pull/8599).
- See [coreos/etcd-operator#1300](https://github.com/coreos/etcd-operator/issues/1300) for more detail.
- Make server [wait up to request time-out](https://github.com/coreos/etcd/pull/8267) with [pending RPCs](https://github.com/coreos/etcd/issues/8224).
- Fix [`grpc.Server` panic on `GracefulStop`](https://github.com/coreos/etcd/pull/8987) with [TLS-enabled server](https://github.com/coreos/etcd/issues/8916).
Previous change logs can be found at [CHANGELOG-3.3](https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md).
## v3.4.0 (TBD 2018-09)
See [code changes](https://github.com/coreos/etcd/compare/v3.3.0...v3.4.0) and [v3.4 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_4.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.4 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_4.md).**
### Improved
- Rewrite [client balancer](https://github.com/coreos/etcd/pull/9860) with [new gRPC balancer interface](https://github.com/coreos/etcd/issues/9106).
- Add [backoff on watch retries on transient errors](https://github.com/coreos/etcd/pull/9840).
- Add [jitter to watch progress notify](https://github.com/coreos/etcd/pull/9278) to prevent [spikes in `etcd_network_client_grpc_sent_bytes_total`](https://github.com/coreos/etcd/issues/9246).
- e.g. `read-only range request "key:\"/a\" range_end:\"/b\" " with result "range_response_count:3 size:96" took too long (97.966µs) to execute`.
- Redact [request value field](https://github.com/coreos/etcd/pull/9822).
- Provide [response size](https://github.com/coreos/etcd/pull/9826).
- Improve [TLS setup error logging](https://github.com/coreos/etcd/pull/9518) to help debug [TLS-enabled cluster configuring issues](https://github.com/coreos/etcd/issues/9400).
- Improve [long-running concurrent read transactions under light write workloads](https://github.com/coreos/etcd/pull/9296).
- Previously, periodic commit on pending writes blocks incoming read transactions, even if there is no pending write.
- Now, periodic commit operation does not block concurrent read transactions, thus improves long-running read transaction performance.
- Improve [Raft Read Index timeout warning messages](https://github.com/coreos/etcd/pull/9897).
- Adjust [election timeout on server restart](https://github.com/coreos/etcd/pull/9415) to reduce [disruptive rejoining servers](https://github.com/coreos/etcd/issues/9333).
- Previously, etcd fast-forwards election ticks on server start, with only one tick left for leader election. This is to speed up start phase, without having to wait until all election ticks elapse. Advancing election ticks is useful for cross datacenter deployments with larger election timeouts. However, it was affecting cluster availability if the last tick elapses before leader contacts the restarted node.
- Now, when etcd restarts, it adjusts election ticks with more than one tick left, thus more time for leader to prevent disruptive restart.
- Add [Raft Pre-Vote feature](https://github.com/coreos/etcd/pull/9352) to reduce [disruptive rejoining servers](https://github.com/coreos/etcd/issues/9333).
- For instance, a flaky(or rejoining) member may drop in and out, and start campaign. This member will end up with a higher term, and ignore all incoming messages with lower term. In this case, a new leader eventually need to get elected, thus disruptive to cluster availability. Raft implements Pre-Vote phase to prevent this kind of disruptions. If enabled, Raft runs an additional phase of election to check if pre-candidate can get enough votes to win an election.
- e.g. `etcd --auto-compaction-mode=revision --auto-compaction-retention=1000` automatically `Compact` on `"latest revision" - 1000` every 5-minute (when latest revision is 30000, compact on revision 29000).
- e.g. Previously, `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h` automatically `Compact` with 24-hour retention windown for every 2.4-hour. Now, `Compact` happens for every 1-hour.
- e.g. Previously, `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` automatically `Compact` with 30-minute retention windown for every 3-minute. Now, `Compact` happens for every 30-minute.
- Periodic compactor keeps recording latest revisions for every compaction period when given period is less than 1-hour, or for every 1-hour when given compaction period is greater than 1-hour (e.g. 1-hour when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h`).
- For every compaction period or 1-hour, compactor uses the last revision that was fetched before compaction period, to discard historical data.
- The retention window of compaction period moves for every given compaction period or hour.
- For instance, when hourly writes are 100 and `etcd --auto-compaction-mode=periodic --auto-compaction-retention=24h`, `v3.2.x`, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 2400, 2640, and 2880 for every 2.4-hour, while `v3.3.3`*or later* compacts revision 2400, 2500, 2600 for every 1-hour.
- Futhermore, when `etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m` and writes per minute are about 1000, `v3.3.0`, `v3.3.1`, and `v3.3.2` compact revision 30000, 33000, and 36000, for every 3-minute, while `v3.3.3`*or later* compacts revision 30000, 60000, and 90000, for every 30-minute.
- Make [Lease `Lookup` non-blocking with concurrent `Grant`/`Revoke`](https://github.com/coreos/etcd/pull/9229).
- Make etcd server return `raft.ErrProposalDropped` on internal Raft proposal drop in [v3 applier](https://github.com/coreos/etcd/pull/9549) and [v2 applier](https://github.com/coreos/etcd/pull/9558).
- e.g. a node is removed from cluster, or [`raftpb.MsgProp` arrives at current leader while there is an ongoing leadership transfer](https://github.com/coreos/etcd/issues/8975).
- Add [`snapshot`](https://github.com/coreos/etcd/pull/9118) package for easier snapshot workflow (see [`godoc.org/github.com/etcd/clientv3/snapshot`](https://godoc.org/github.com/coreos/etcd/clientv3/snapshot) for more).
- Improve [functional tester](https://github.com/coreos/etcd/tree/master/functional) coverage: [proxy layer to run network fault tests in CI](https://github.com/coreos/etcd/pull/9081), [TLS is enabled both for server and client](https://github.com/coreos/etcd/pull/9534), [liveness mode](https://github.com/coreos/etcd/issues/9230), [shuffle test sequence](https://github.com/coreos/etcd/issues/9381), [membership reconfiguration failure cases](https://github.com/coreos/etcd/pull/9564), [disastrous quorum loss and snapshot recover from a seed member](https://github.com/coreos/etcd/pull/9565), [embedded etcd](https://github.com/coreos/etcd/pull/9572).
- Improve [index compaction blocking](https://github.com/coreos/etcd/pull/9511) by using a copy on write clone to avoid holding the lock for the traversal of the entire index.
- Update [JWT methods](https://github.com/coreos/etcd/pull/9883) to allow for use of any supported signature method/algorithm.
- Add [Lease checkpointing](https://github.com/coreos/etcd/pull/9924) to persist remaining TTLs to the consensus log periodically so that long lived leases progress toward expiry in the presence of leader elections and server restarts.
### Breaking Changes
- Make [`ETCDCTL_API=3 etcdctl` default](https://github.com/coreos/etcd/issues/9600).
- Now, `etcdctl set foo bar` must be `ETCDCTL_API=2 etcdctl set foo bar`.
- Now, `ETCDCTL_API=3 etcdctl put foo bar` could be just `etcdctl put foo bar`.
- **Remove `etcd --ca-file` flag**, instead [use `etcd --trusted-ca-file`](https://github.com/coreos/etcd/pull/9470) (`etcd --ca-file` flag has been marked deprecated since v2.1).
- **Remove `etcd --peer-ca-file` flag**, instead [use `etcd --peer-trusted-ca-file`](https://github.com/coreos/etcd/pull/9470) (`etcd --peer-ca-file` flag has been marked deprecated since v2.1).
- **Remove `pkg/transport.TLSInfo.CAFile` field**, instead [use `pkg/transport.TLSInfo.TrustedCAFile`](https://github.com/coreos/etcd/pull/9470) (`CAFile` field has been marked deprecated since v2.1).
- e.g. exit with error on `--advertise-client-urls=http://:2379`.
- e.g. exit with error on `--initial-advertise-peer-urls=http://:2380`.
- Exit on [shadowed environment variables](https://github.com/coreos/etcd/pull/9382).
- Address [error on shadowed environment variables](https://github.com/coreos/etcd/issues/8380).
- e.g. exit with error on `ETCD_NAME=abc etcd --name=def`.
- e.g. exit with error on `ETCD_INITIAL_CLUSTER_TOKEN=abc etcd --initial-cluster-token=def`.
- e.g. exit with error on `ETCDCTL_ENDPOINTS=abc.com ETCDCTL_API=3 etcdctl endpoint health --endpoints=def.com`.
- Change [`etcdserverpb.AuthRoleRevokePermissionRequest/key,range_end` fields type from `string` to `bytes`](https://github.com/coreos/etcd/pull/9433).
- Rename [`etcd_debugging_mvcc_db_total_size_in_bytes` Prometheus metric to `etcd_mvcc_db_total_size_in_bytes`](https://github.com/coreos/etcd/pull/9819).
- Rename `etcdserver.ServerConfig.SnapCount` field to `etcdserver.ServerConfig.SnapshotCount`, to be consistent with the flag name `etcd --snapshot-count`.
- Rename `embed.Config.SnapCount` field to [`embed.Config.SnapshotCount`](https://github.com/coreos/etcd/pull/9745), to be consistent with the flag name `etcd --snapshot-count`.
- Change [`embed.Config.CorsInfo` in `*cors.CORSInfo` type to `embed.Config.CORS` in `map[string]struct{}` type](https://github.com/coreos/etcd/pull/9490).
- Now logger is set up automatically based on [`embed.Config.Logger`, `embed.Config.LogOutputs`, `embed.Config.Debug` fields](https://github.com/coreos/etcd/pull/9572).
- Rename [`etcd --log-output` to `etcd --log-outputs`](https://github.com/coreos/etcd/pull/9624) to support multiple log outputs.
- **`etcd --log-output`** will be deprecated in v3.5.
- Rename [**`embed.Config.LogOutput`** to **`embed.Config.LogOutputs`**](https://github.com/coreos/etcd/pull/9624) to support multiple log outputs.
- Change [**`embed.Config.LogOutputs`** type from `string` to `[]string`](https://github.com/coreos/etcd/pull/9579) to support multiple log outputs.
- Now that `etcd --log-outputs` accepts multiple writers, etcd configuration YAML file `log-outputs` field must be changed to `[]string` type.
- Previously, `etcd --config-file etcd.config.yaml` can have `log-outputs: default` field, now must be `log-outputs: [default]`.
- Change v3 `etcdctl snapshot` exit codes with [`snapshot` package](https://github.com/coreos/etcd/pull/9118/commits/df689f4280e1cce4b9d61300be13ca604d41670a).
- Exit on error with exit code 1 (no more exit code 5 or 6 on `snapshot save/restore` commands).
- Migrate dependency management tool from `glide` to [`golang/dep`](https://github.com/coreos/etcd/pull/9155).
-<= 3.3 puts `vendor` directory under `cmd/vendor` directory to [prevent conflicting transitive dependencies](https://github.com/coreos/etcd/issues/4913).
- 3.4 moves `cmd/vendor` directory to `vendor` at repository root.
- Remove recursive symlinks in `cmd` directory.
- Now `go get/install/build` on `etcd` packages (e.g. `clientv3`, `tools/benchmark`) enforce builds with etcd `vendor` directory.
- Replace [gRPC gateway](https://github.com/grpc-ecosystem/grpc-gateway) endpoint `/v3beta` with [`/v3`](https://github.com/coreos/etcd/pull/9298).
- To deprecate [`/v3beta`](https://github.com/coreos/etcd/issues/9189) in v3.5.
- In v3.4, `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` still works as a fallback to `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'`, but `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` won't work in v3.5. Use `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
- Change [`wal` package function signatures](https://github.com/coreos/etcd/pull/9572) to support [structured logger and logging to file](https://github.com/coreos/etcd/issues/9438) in server-side.
- Change [`etcd --experimental-enable-v2v3`](TODO) flag to `etcd --enable-v2v3`; v2 storage emulation is now stable.
- Move internal packages to `etcdserver`.
-`"github.com/coreos/etcd/alarm"` to `"github.com/coreos/etcd/etcdserver/api/v3alarm"`.
-`"github.com/coreos/etcd/compactor"` to `"github.com/coreos/etcd/etcdserver/api/v3compactor"`.
-`"github.com/coreos/etcd/discovery"` to `"github.com/coreos/etcd/etcdserver/api/v2discovery"`.
-`"github.com/coreos/etcd/etcdserver/auth"` to `"github.com/coreos/etcd/etcdserver/api/v2auth"`.
-`"github.com/coreos/etcd/etcdserver/membership"` to `"github.com/coreos/etcd/etcdserver/api/membership"`.
-`"github.com/coreos/etcd/etcdserver/stats"` to `"github.com/coreos/etcd/etcdserver/api/v2stats"`.
-`"github.com/coreos/etcd/error"` to `"github.com/coreos/etcd/etcdserver/api/v2error"`.
-`"github.com/coreos/etcd/rafthttp"` to `"github.com/coreos/etcd/etcdserver/api/rafthttp"`.
-`"github.com/coreos/etcd/snap"` to `"github.com/coreos/etcd/etcdserver/api/snap"`.
-`"github.com/coreos/etcd/store"` to `"github.com/coreos/etcd/etcdserver/api/v2store"`.
- Change [snapshot file permissions](https://github.com/coreos/etcd/pull/9977): On Linux, the snapshot file changes from readable by all (mode 0644) to readable by the user only (mode 0600).
### Dependency
- Upgrade [`google.golang.org/grpc`](https://github.com/grpc/grpc-go/releases) from [**`v1.7.5`**](https://github.com/grpc/grpc-go/releases/tag/v1.7.5) to [**`v1.13.0`**](https://github.com/grpc/grpc-go/releases/tag/v1.13.0).
- Upgrade [`github.com/golang/protobuf`](https://github.com/golang/protobuf/releases) from [**`golang/protobuf@1e59b77b5`**](https://github.com/golang/protobuf/commit/1e59b77b52bf8e4b449a57e6f79f21226d571845) to [**`v1.1.0`**](https://github.com/golang/protobuf/releases/tag/v1.1.0).
- Upgrade [`golang.org/x/crypto`](https://github.com/golang/crypto) from [**`crypto@9419663f5`**](https://github.com/golang/crypto/commit/9419663f5a44be8b34ca85f08abc5fe1be11f8a3) to [**`crypto@8ac0e0d97`**](https://github.com/golang/crypto/commit/8ac0e0d97ce45cd83d1d7243c060cb8461dda5e9).
- Upgrade [`golang.org/x/net`](https://github.com/golang/net) from [**`net@66aacef3d`**](https://github.com/golang/net/commit/66aacef3dd8a676686c7ae3716979581e8b03c47) to [**`net@db08ff08e`**](https://github.com/golang/net/commit/db08ff08e8622530d9ed3a0e8ac279f6d4c02196).
- Upgrade [`golang.org/x/sys`](https://github.com/golang/sys) from [**`sys@ebfc5b463`**](https://github.com/golang/sys/commit/ebfc5b4631820b793c9010c87fd8fef0f39eb082) to [**`sys@56ede360e`**](https://github.com/golang/sys/commit/56ede360ec1c541828fb88741b3f1049406d28f5).
- Upgrade [`golang.org/x/text`](https://github.com/golang/text) from [**`text@b19bf474d`**](https://github.com/golang/text/commit/b19bf474d317b857955b12035d2c5acb57ce8b01) to [**`text@f21a4dfb5`**](https://github.com/golang/text/commit/f21a4dfb5e38f5895301dc265a8def02365cc3d0).
- Upgrade [`golang.org/x/time`](https://github.com/golang/time) from [**`time@c06e80d93`**](https://github.com/golang/time/commit/c06e80d9300e4443158a03817b8a8cb37d230320) to [**`time@fbb02b229`**](https://github.com/golang/time/commit/fbb02b2291d28baffd63558aa44b4b56f178d650).
- Upgrade [`github.com/golang/protobuf`](https://github.com/golang/protobuf/releases) from [**`golang/protobuf@1e59b77b5`**](https://github.com/golang/protobuf/commit/1e59b77b52bf8e4b449a57e6f79f21226d571845) to [**`v1.1.0`**](https://github.com/golang/protobuf/releases/tag/v1.1.0).
- Upgrade [`gopkg.in/yaml.v2`](https://github.com/go-yaml/yaml/releases) from [**`yaml@cd8b52f82`**](https://github.com/go-yaml/yaml/commit/cd8b52f8269e0feb286dfeef29f8fe4d5b397e0b) to [**`yaml@5420a8b67`**](https://github.com/go-yaml/yaml/commit/5420a8b6744d3b0345ab293f6fcba19c978f1183).
- Upgrade [`github.com/dgrijalva/jwt-go`](https://github.com/dgrijalva/jwt-go/releases) from [**`v3.0.0`**](https://github.com/dgrijalva/jwt-go/releases/tag/v3.0.0) to [**`v3.2.0`**](https://github.com/dgrijalva/jwt-go/releases/tag/v3.2.0).
- Upgrade [`github.com/ugorji/go/codec`](https://github.com/ugorji/go/releases) to [**`v1.1.1`**](https://github.com/ugorji/go/releases/tag/v1.1.1), and [regenerate v2 `client`](https://github.com/coreos/etcd/pull/9494).
- Upgrade [`github.com/soheilhy/cmux`](https://github.com/soheilhy/cmux/releases) from [**`v0.1.3`**](https://github.com/soheilhy/cmux/releases/tag/v0.1.3) to [**`v0.1.4`**](https://github.com/soheilhy/cmux/releases/tag/v0.1.4).
- Upgrade [`github.com/google/btree`](https://github.com/google/btree/releases) from [**`google/btree@925471ac9`**](https://github.com/google/btree/commit/925471ac9e2131377a91e1595defec898166fe49) to [**`google/btree@e89373fe6`**](https://github.com/google/btree/commit/e89373fe6b4a7413d7acd6da1725b83ef713e6e4).
- Upgrade [`github.com/spf13/cobra`](https://github.com/spf13/cobra/releases) from [**`spf13/cobra@1c44ec8d3`**](https://github.com/spf13/cobra/commit/1c44ec8d3f1552cac48999f9306da23c4d8a288b) to [**`v0.0.3`**](https://github.com/spf13/cobra/releases/tag/v0.0.3).
- Upgrade [`github.com/spf13/pflag`](https://github.com/spf13/pflag/releases) from [**`v1.0.0`**](https://github.com/spf13/pflag/releases/tag/v1.0.0) to [**`spf13/pflag@1ce0cc6db`**](https://github.com/spf13/pflag/commit/1ce0cc6db4029d97571db82f85092fccedb572ce).
- Upgrade [`github.com/coreos/go-systemd`](https://github.com/coreos/go-systemd/releases) from [**`v15`**](https://github.com/coreos/go-systemd/releases/tag/v15) to [**`v17`**](https://github.com/coreos/go-systemd/releases/tag/v17).
- Upgrade [`github.com/prometheus/client_golang`](https://github.com/prometheus/client_golang/releases) from [**``prometheus/client_golang@5cec1d042``**](https://github.com/prometheus/client_golang/commit/5cec1d0429b02e4323e042eb04dafdb079ddf568) to [**`v0.8.0`**](https://github.com/prometheus/client_golang/releases/tag/v0.8.0).
- Upgrade [`github.com/grpc-ecosystem/go-grpc-prometheus`](https://github.com/grpc-ecosystem/go-grpc-prometheus/releases) from [**``grpc-ecosystem/go-grpc-prometheus@0dafe0d49``**](https://github.com/grpc-ecosystem/go-grpc-prometheus/commit/0dafe0d496ea71181bf2dd039e7e3f44b6bd11a7) to [**`v1.2.0`**](https://github.com/grpc-ecosystem/go-grpc-prometheus/releases/tag/v1.2.0).
- Upgrade [`github.com/grpc-ecosystem/grpc-gateway`](https://github.com/grpc-ecosystem/grpc-gateway/releases) from [**`v1.3.1`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.3.1) to [**`v1.4.1`**](https://github.com/grpc-ecosystem/grpc-gateway/releases/tag/v1.4.1).
### Metrics, Monitoring
See [List of metrics](https://etcd.readthedocs.io/en/latest/operate.html#latest) for all metrics per release.
Note that any `etcd_debugging_*` metrics are experimental and subject to change.
- Let's say `"7339c4e5e833c029"` server `/metrics` returns `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="729934363faa4a24"} 1` and `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="b548c2511513015"} 1`. This indicates that the local node `"7339c4e5e833c029"` currently has two active remote peers `"729934363faa4a24"` and `"b548c2511513015"` in a 3-node cluster. If the node `"b548c2511513015"` is down, the local node `"7339c4e5e833c029"` will show `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="729934363faa4a24"} 1` and `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="b548c2511513015"} 0`.
- If a remote peer `"b548c2511513015"` is down, the local node `"7339c4e5e833c029"` server `/metrics` would return `etcd_network_disconnected_peers_total{Local="7339c4e5e833c029",Remote="b548c2511513015"} 1`, while active peer metrics will show `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="729934363faa4a24"} 1` and `etcd_network_active_peers{Local="7339c4e5e833c029",Remote="b548c2511513015"} 0`.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Use it with `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`.
-`etcd_server_quota_backend_bytes 2.147483648e+09` means current quota size is 2 GB.
-`etcd_mvcc_db_total_size_in_bytes 20480` means current physically allocated DB size is 20 KB.
-`etcd_mvcc_db_total_size_in_use_in_bytes 16384` means future DB size if defragment operation is complete.
-`etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes` is the number of bytes that can be saved on disk with defragment operation.
- Add [`etcd --host-whitelist`](https://github.com/coreos/etcd/pull/9372) flag, [`etcdserver.Config.HostWhitelist`](https://github.com/coreos/etcd/pull/9372), and [`embed.Config.HostWhitelist`](https://github.com/coreos/etcd/pull/9372), to prevent ["DNS Rebinding"](https://en.wikipedia.org/wiki/DNS_rebinding) attack.
- Any website can simply create an authorized DNS name, and direct DNS to `"localhost"` (or any other address). Then, all HTTP endpoints of etcd server listening on `"localhost"` becomes accessible, thus vulnerable to [DNS rebinding attacks (CVE-2018-5702)](https://bugs.chromium.org/p/project-zero/issues/detail?id=1447#c2).
- Client origin enforce policy works as follow:
- If client connection is secure via HTTPS, allow any hostnames..
- If client connection is not secure and `"HostWhitelist"` is not empty, only allow HTTP requests whose Host field is listed in whitelist.
- By default, `"HostWhitelist"` is `"*"`, which means insecure server allows all client HTTP requests.
- Note that the client origin policy is enforced whether authentication is enabled or not, for tighter controls.
- When specifying hostnames, loopback addresses are not added automatically. To allow loopback interfaces, add them to whitelist manually (e.g. `"localhost"`, `"127.0.0.1"`, etc.).
- e.g. `etcd --host-whitelist example.com`, then the server will reject all HTTP requests whose Host field is not `example.com` (also rejects requests to `"localhost"`).
- Support [`etcd --cors`](https://github.com/coreos/etcd/pull/9490) in v3 HTTP requests (gRPC gateway).
- Support [`ttl` field for `etcd` Authentication JWT token](https://github.com/coreos/etcd/pull/8302).
- e.g. `etcd --auth-token jwt,pub-key=<pub key path>,priv-key=<priv key path>,sign-method=<sign method>,ttl=5m`.
- Allow empty token provider in [`etcdserver.ServerConfig.AuthToken`](https://github.com/coreos/etcd/pull/9369).
- Fix [TLS reload](https://github.com/coreos/etcd/pull/9570) when [certificate SAN field only includes IP addresses but no domain names](https://github.com/coreos/etcd/issues/9541).
- In Go, server calls `(*tls.Config).GetCertificate` for TLS reload if and only if server's `(*tls.Config).Certificates` field is not empty, or `(*tls.ClientHelloInfo).ServerName` is not empty with a valid SNI from the client. Previously, etcd always populates `(*tls.Config).Certificates` on the initial client TLS handshake, as non-empty. Thus, client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger `(*tls.Config).GetCertificate` to reload TLS assets.
- However, a certificate whose SAN field does [not include any domain names but only IP addresses](https://github.com/coreos/etcd/issues/9541) would request `*tls.ClientHelloInfo` with an empty `ServerName` field, thus failing to trigger the TLS reload on initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
- Now, `(*tls.Config).Certificates` is created empty on initial TLS client handshake, first to trigger `(*tls.Config).GetCertificate`, and then to populate rest of the certificates on every new TLS connection, even when client SNI is empty (e.g. cert only includes IPs).
### etcd server
- Add [`etcd --initial-election-tick-advance`](https://github.com/coreos/etcd/pull/9591) flag to configure initial election tick fast-forward.
- By default, `etcd --initial-election-tick-advance=true`, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
- This benefits the case of larger election ticks. For instance, cross datacenter deployment may require longer election timeout of 10-second. If true, local node does not need wait up to 10-second. Instead, forwards its election ticks to 8-second, and have only 2-second left before leader election.
- Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
- However, when network from leader to rejoining follower is congested, and the follower does not receive leader heartbeat within left election ticks, disruptive election has to happen thus affecting cluster availabilities.
- Now, this can be disabled by setting `etcd --initial-election-tick-advance=false`.
- Disabling this would slow down initial bootstrap process for cross datacenter deployments. Make tradeoffs by configuring `etcd --initial-election-tick-advance` at the cost of slow initial bootstrap.
- Add [`etcd --pre-vote`](https://github.com/coreos/etcd/pull/9352) flag to enable to run an additional Raft election phase.
- For instance, a flaky(or rejoining) member may drop in and out, and start campaign. This member will end up with a higher term, and ignore all incoming messages with lower term. In this case, a new leader eventually need to get elected, thus disruptive to cluster availability. Raft implements Pre-Vote phase to prevent this kind of disruptions. If enabled, Raft runs an additional phase of election to check if pre-candidate can get enough votes to win an election.
-`etcd --pre-vote=false` by default.
- v3.5 will enable `etcd --pre-vote=true` by default.
- [`etcd --initial-corrupt-check`](TODO) flag is now stable (`etcd --experimental-initial-corrupt-check`haisbeen deprecated).
-`etcd --initial-corrupt-check=true` by default, to check cluster database hashes before serving client/peer traffic.
- [`etcd --corrupt-check-time`](TODO) flag is now stable (`etcd --experimental-corrupt-check-time`haisbeen deprecated).
-`etcd --corrupt-check-time=12h` by default, to check cluster database hashes for every 12-hour.
- [`etcd --enable-v2v3`](TODO) flag is now stable.
-`etcd --experimental-enable-v2v3` has been deprecated.
-`etcd --enable-v2=true --enable-v2v3=''` by default, to enable v2 API server that is backed by **v2 store**.
-`etcd --enable-v2=true --enable-v2v3=/aaa` to enable v2 API server that is backed by **v3 storage**.
-`etcd --enable-v2=false --enable-v2v3=''` to disable v2 API server.
-`etcd --enable-v2=false --enable-v2v3=/aaa` to disable v2 API server. TODO: error?
- Automatically [create parent directory if it does not exist](https://github.com/coreos/etcd/pull/9626) (fix [issue#9609](https://github.com/coreos/etcd/issues/9609)).
- v4.0 will configure `etcd --enable-v2=true --enable-v2v3=/aaa` to enable v2 API server that is backed by **v3 storage**.
- Add [`etcd --discovery-srv-name`](https://github.com/coreos/etcd/pull/8690) flag to support custom DNS SRV name with discovery.
- If not given, etcd queries `_etcd-server-ssl._tcp.[YOUR_HOST]` and `_etcd-server._tcp.[YOUR_HOST]`.
- If `etcd --discovery-srv-name="foo"`, then query `_etcd-server-ssl-foo._tcp.[YOUR_HOST]` and `_etcd-server-foo._tcp.[YOUR_HOST]`.
- Useful for operating multiple etcd clusters under the same domain.
- Support TLS cipher suite whitelisting.
- To block [weak cipher suites](https://github.com/coreos/etcd/issues/8320).
- TLS handshake fails when client hello is requested with invalid cipher suites.
- Support [`etcd --cors`](https://github.com/coreos/etcd/pull/9490) in v3 HTTP requests (gRPC gateway).
- Rename [`etcd --log-output` to `etcd --log-outputs`](https://github.com/coreos/etcd/pull/9624) to support multiple log outputs.
- **`etcd --log-output` will be deprecated in v3.5**.
- Add [`etcd --logger`](https://github.com/coreos/etcd/pull/9572) flag to support [structured logger and multiple log outputs](https://github.com/coreos/etcd/issues/9438) in server-side.
- **`etcd --logger=capnslog` will be deprecated in v3.5**.
- Main motivation is to promote automated etcd monitoring, rather than looking back server logs when it starts breaking. Future development will make etcd log as few as possible, and make etcd easier to monitor with metrics and alerts.
-`etcd --logger=capnslog --log-outputs=default` is the default setting and same as previous etcd server logging format.
-`etcd --logger=zap --log-outputs=default` is not supported when `etcd --logger=zap`.
- Instead, use `etcd --logger=zap --log-outputs=stderr`.
- Or, use `etcd --logger=zap --log-outputs=systemd/journal` to send logs to the local systemd journal.
- Previously, if etcd parent process ID (PPID) is 1 (e.g. run with systemd), `etcd --logger=capnslog --log-outputs=default` redirects server logs to local systemd journal. And if write to journald fails, it writes to `os.Stderr` as a fallback.
- However, even with PPID 1, it can fail to dial systemd journal (e.g. run embedded etcd with Docker container). Then, [every single log write will fail](https://github.com/coreos/etcd/pull/9729) and fall back to `os.Stderr`, which is inefficient.
- To avoid this problem, systemd journal logging must be configured manually.
-`etcd --logger=zap --log-outputs=stderr` will log server operations in [JSON-encoded format](https://godoc.org/go.uber.org/zap#NewProductionEncoderConfig) and writes logs to `os.Stderr`. Use this to override journald log redirects.
-`etcd --logger=zap --log-outputs=stdout` will log server operations in [JSON-encoded format](https://godoc.org/go.uber.org/zap#NewProductionEncoderConfig) and writes logs to `os.Stdout` Use this to override journald log redirects.
-`etcd --logger=zap --log-outputs=a.log` will log server operations in [JSON-encoded format](https://godoc.org/go.uber.org/zap#NewProductionEncoderConfig) and writes logs to the specified file `a.log`.
-`etcd --logger=zap --log-outputs=a.log,b.log,c.log,stdout` [writes server logs to multiple files `a.log`, `b.log` and `c.log` at the same time](https://github.com/coreos/etcd/pull/9579) and outputs to `os.Stderr`, in [JSON-encoded format](https://godoc.org/go.uber.org/zap#NewProductionEncoderConfig).
-`etcd --logger=zap --log-outputs=/dev/null` will discard all server logs.
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
- A node gets network partitioned with a watcher on a future revision, and falls behind receiving a leader snapshot after partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced since sync watchers might have become stale during network partition. And reset synced watcher group to restart watcher routines. Previously, there was a bug when moving from synced watcher group to unsynced, thus client would miss events when the watcher was requested to the network-partitioned node.
- Fix [`mvcc` server panic from restore operation](https://github.com/coreos/etcd/pull/9775).
- Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, **etcd server panicked** during snapshot restore operation.
- Now, this server-side panic has been fixed.
- Fix [server panic on invalid Election Proclaim/Resign HTTP(S) requests](https://github.com/coreos/etcd/pull/9379).
- Previously, wrong-formatted HTTP requests to Election API could trigger panic in etcd server.
- e.g. `curl -L http://localhost:2379/v3/election/proclaim -X POST -d '{"value":""}'`, `curl -L http://localhost:2379/v3/election/resign -X POST -d '{"value":""}'`.
- Previously, `etcd --auto-compaction-mode revision --auto-compaction-retention 1` was [translated to revision retention 3600000000000](https://github.com/coreos/etcd/issues/9337).
- Now, `etcd --auto-compaction-mode revision --auto-compaction-retention 1` is correctly parsed as revision retention 1.
- Prevent [overflow by large `TTL` values for `Lease` `Grant`](https://github.com/coreos/etcd/pull/9399).
-`TTL` parameter to `Grant` request is unit of second.
- Leases with too large `TTL` values exceeding `math.MaxInt64` [expire in unexpected ways](https://github.com/coreos/etcd/issues/9374).
- Server now returns `rpctypes.ErrLeaseTTLTooLarge` to client, when the requested `TTL` is larger than *9,000,000,000 seconds* (which is >285 years).
- Again, etcd `Lease` is meant for short-periodic keepalives or sessions, in the range of seconds or minutes. Not for hours or days!
- Enable etcd server [`raft.Config.CheckQuorum` when starting with `ForceNewCluster`](https://github.com/coreos/etcd/pull/9347).
- Allow [non-WAL files in `etcd --wal-dir` directory](https://github.com/coreos/etcd/pull/9743).
- Previously, existing files such as [`lost+found`](https://github.com/coreos/etcd/issues/7287) in WAL directory prevent etcd server boot.
- Now, WAL directory that contains only `lost+found` or a file that's not suffixed with `.wal` is considered non-initialized.
### API
- Add [`snapshot`](https://github.com/coreos/etcd/pull/9118) package for snapshot restore/save operations (see [`godoc.org/github.com/etcd/clientv3/snapshot`](https://godoc.org/github.com/coreos/etcd/clientv3/snapshot) for more).
- Add [`watch_id` field to `etcdserverpb.WatchCreateRequest`](https://github.com/coreos/etcd/pull/9065) to allow user-provided watch ID to `mvcc`.
- Corresponding `watch_id` is returned via `etcdserverpb.WatchResponse`, if any.
- Add [`fragment` field to `etcdserverpb.WatchCreateRequest`](https://github.com/coreos/etcd/pull/9291) to request etcd server to [split watch events](https://github.com/coreos/etcd/issues/9294) when the total size of events exceeds `etcd --max-request-bytes` flag value plus gRPC-overhead 512 bytes.
- The default server-side request bytes limit is `embed.DefaultMaxRequestBytes` which is 1.5 MiB plus gRPC-overhead 512 bytes.
- If watch response events exceed this server-side request limit and watch request is created with `fragment` field `true`, the server will split watch events into a set of chunks, each of which is a subset of watch events below server-side request limit.
- Useful when client-side has limited bandwidths.
- For example, watch response contains 10 events, where each event is 1 MiB. And server `etcd --max-request-bytes` flag value is 1 MiB. Then, server will send 10 separate fragmented events to the client.
- For example, watch response contains 5 events, where each event is 2 MiB. And server `etcd --max-request-bytes` flag value is 1 MiB and `clientv3.Config.MaxCallRecvMsgSize` is 1 MiB. Then, server will try to send 5 separate fragmented events to the client, and the client will error with `"code = ResourceExhausted desc = grpc: received message larger than max (...)"`.
- Client must implement fragmented watch event merge (which `clientv3` does in etcd v3.4).
- Add [`raftAppliedIndex` field to `etcdserverpb.StatusResponse`](https://github.com/coreos/etcd/pull/9176) for current Raft applied index.
- Add [`errors` field to `etcdserverpb.StatusResponse`](https://github.com/coreos/etcd/pull/9206) for server-side error.
- e.g. `"etcdserver: no leader", "NOSPACE", "CORRUPT"`
- Add [`dbSizeInUse` field to `etcdserverpb.StatusResponse`](https://github.com/coreos/etcd/pull/9256) for actual DB size after compaction.
- To manually trigger broadcasting watch progress event (empty watch response with latest header) to all associated watch streams.
- Think of it as `WithProgressNotify` that can be triggered manually.
Note: **v3.5 will deprecate `etcd --log-package-levels` flag for `capnslog`**; `etcd --logger=zap --log-outputs=stderr` will the default. **v3.5 will deprecate `[CLIENT-URL]/config/local/log` endpoint.**
### Package `embed`
- Add [`embed.Config.CipherSuites`](https://github.com/coreos/etcd/pull/9801) to specify a list of supported cipher suites for TLS handshake between client/server and peers.
- If empty, Go auto-populates the list.
- Both `embed.Config.ClientTLSInfo.CipherSuites` and `embed.Config.CipherSuites` cannot be non-empty at the same time.
- If not empty, specify either `embed.Config.ClientTLSInfo.CipherSuites` or `embed.Config.CipherSuites`.
- Add [`embed.Config.InitialElectionTickAdvance`](https://github.com/coreos/etcd/pull/9591) to enable/disable initial election tick fast-forward.
-`embed.NewConfig()` would return `*embed.Config` with `InitialElectionTickAdvance` as true by default.
- Define [`embed.CompactorModePeriodic`](https://godoc.org/github.com/coreos/etcd/embed#pkg-variables) for `compactor.ModePeriodic`.
- Define [`embed.CompactorModeRevision`](https://godoc.org/github.com/coreos/etcd/embed#pkg-variables) for `compactor.ModeRevision`.
- Change [`embed.Config.CorsInfo` in `*cors.CORSInfo` type to `embed.Config.CORS` in `map[string]struct{}` type](https://github.com/coreos/etcd/pull/9490).
- Now logger is set up automatically based on [`embed.Config.Logger`, `embed.Config.LogOutputs`, `embed.Config.Debug` fields](https://github.com/coreos/etcd/pull/9572).
- Add [`embed.Config.Logger`](https://github.com/coreos/etcd/pull/9518) to support [structured logger `zap`](https://github.com/uber-go/zap) in server-side.
- Rename `embed.Config.SnapCount` field to [`embed.Config.SnapshotCount`](https://github.com/coreos/etcd/pull/9745), to be consistent with the flag name `etcd --snapshot-count`.
- Rename [**`embed.Config.LogOutput`** to **`embed.Config.LogOutputs`**](https://github.com/coreos/etcd/pull/9624) to support multiple log outputs.
- Change [**`embed.Config.LogOutputs`** type from `string` to `[]string`](https://github.com/coreos/etcd/pull/9579) to support multiple log outputs.
### Package `integration`
- Add [`CLUSTER_DEBUG` to enable test cluster logging](https://github.com/coreos/etcd/pull/9678).
- Deprecated `capnslog` in integration tests.
### client v3
- Add [`WithFragment` `OpOption`](https://github.com/coreos/etcd/pull/9291) to support [watch events fragmentation](https://github.com/coreos/etcd/issues/9294) when the total size of events exceeds `etcd --max-request-bytes` flag value plus gRPC-overhead 512 bytes.
- Watch fragmentation is disabled by default.
- The default server-side request bytes limit is `embed.DefaultMaxRequestBytes` which is 1.5 MiB plus gRPC-overhead 512 bytes.
- If watch response events exceed this server-side request limit and watch request is created with `fragment` field `true`, the server will split watch events into a set of chunks, each of which is a subset of watch events below server-side request limit.
- Useful when client-side has limited bandwidths.
- For example, watch response contains 10 events, where each event is 1 MiB. And server `etcd --max-request-bytes` flag value is 1 MiB. Then, server will send 10 separate fragmented events to the client.
- For example, watch response contains 5 events, where each event is 2 MiB. And server `etcd --max-request-bytes` flag value is 1 MiB and `clientv3.Config.MaxCallRecvMsgSize` is 1 MiB. Then, server will try to send 5 separate fragmented events to the client, and the client will error with `"code = ResourceExhausted desc = grpc: received message larger than max (...)"`.
- To manually trigger broadcasting watch progress event (empty watch response with latest header) to all associated watch streams.
- Think of it as `WithProgressNotify` that can be triggered manually.
- Fix [lease keepalive interval updates when response queue is full](https://github.com/coreos/etcd/pull/9952).
- If `<-chan *clientv3LeaseKeepAliveResponse` from `clientv3.Lease.KeepAlive` was never consumed or channel is full, client was [sending keepalive request every 500ms](https://github.com/coreos/etcd/issues/9911) instead of expected rate of every "TTL / 3" duration.
- Change [snapshot file permissions](https://github.com/coreos/etcd/pull/9977): On Linux, the snapshot file changes from readable by all (mode 0644) to readable by the user only (mode 0600).
### etcdctl v3
- Make [`ETCDCTL_API=3 etcdctl` default](https://github.com/coreos/etcd/issues/9600).
- Now, `etcdctl set foo bar` must be `ETCDCTL_API=2 etcdctl set foo bar`.
- Now, `ETCDCTL_API=3 etcdctl put foo bar` could be just `etcdctl put foo bar`.
- Fix [`etcdctl move-leader` command for TLS-enabled endpoints](https://github.com/coreos/etcd/pull/9807).
- Add [`progress` command to `etcdctl watch --interactive`](https://github.com/coreos/etcd/pull/9869).
- To manually trigger broadcasting watch progress event (empty watch response with latest header) to all associated watch streams.
- Think of it as `WithProgressNotify` that can be triggered manually.
### gRPC proxy
- Fix [etcd server panic from restore operation](https://github.com/coreos/etcd/pull/9775).
- Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, **etcd server panicked** during snapshot restore operation.
- Especially, gRPC proxy was affected, since it detects a leader loss with a key `"proxy-namespace__lostleader"` and a watch revision `"int64(math.MaxInt64 - 2)"`.
- Now, this server-side panic has been fixed.
### gRPC gateway
- Replace [gRPC gateway](https://github.com/grpc-ecosystem/grpc-gateway) endpoint `/v3beta` with [`/v3`](https://github.com/coreos/etcd/pull/9298).
- To deprecate [`/v3beta`](https://github.com/coreos/etcd/issues/9189) in v3.5.
- In v3.4, `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` still works as a fallback to `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'`, but `curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` won't work in v3.5. Use `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
- Add API endpoints [`/{v3beta,v3}/lease/leases, /{v3beta,v3}/lease/revoke, /{v3beta,v3}/lease/timetolive`](https://github.com/coreos/etcd/pull/9450).
- To deprecate [`/{v3beta,v3}/kv/lease/leases, /{v3beta,v3}/kv/lease/revoke, /{v3beta,v3}/kv/lease/timetolive`](https://github.com/coreos/etcd/issues/9430) in v3.5.
- Support [`etcd --cors`](https://github.com/coreos/etcd/pull/9490) in v3 HTTP requests (gRPC gateway).
### Package `raft`
- Fix [deadlock during PreVote migration process](https://github.com/coreos/etcd/pull/8525).
- Now [`(r *raft) Step` returns `raft.ErrProposalDropped`](https://github.com/coreos/etcd/pull/9137) if a proposal has been ignored.
- e.g. a node is removed from cluster, or [`raftpb.MsgProp` arrives at current leader while there is an ongoing leadership transfer](https://github.com/coreos/etcd/issues/8975).
- Improve [Raft `becomeLeader` and `stepLeader`](https://github.com/coreos/etcd/pull/9073) by keeping track of latest `pb.EntryConfChange` index.
- Previously record `pendingConf` boolean field scanning the entire tail of the log, which can delay hearbeat send.
- Fix [missing learner nodes on `(n *node) ApplyConfChange`](https://github.com/coreos/etcd/pull/9116).
### Tooling
- Add [`etcd-dump-logs --entry-type`](https://github.com/coreos/etcd/pull/9628) flag to support WAL log filtering by entry type.
- Add [`etcd-dump-logs --stream-decoder`](https://github.com/coreos/etcd/pull/9790) flag to support custom decoder.
### Go
- Require *Go 1.10+*.
- Compile with [*Go 1.10.3*](https://golang.org/doc/devel/release.html#go1.10).
Previous change logs can be found at [CHANGELOG-3.4](https://github.com/coreos/etcd/blob/master/CHANGELOG-3.4.md).
## v3.5.0 (TBD 2018-12)
See [code changes](https://github.com/coreos/etcd/compare/v3.4.0...v3.5.0) and [v3.5 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_5.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v3.5 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_3_5.md).**
### Breaking Changes
- [gRPC gateway](https://github.com/grpc-ecosystem/grpc-gateway) only supports [`/v3`](TODO) endpoint.
-`curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` does work in v3.5. Use `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
- **`etcd --log-output` flag has been deprecated.** Use **`etcd --log-outputs`** instead.
- **`etcd --logger=zap --log-outputs=stderr`** is now the default.
- **`etcd --logger=capnslog` flag has been deprecated.**
- **`etcd --logger=zap --log-outputs=default` flag value is not supported.**.
- Instead, use `etcd --logger=zap --log-outputs=stderr`.
- Or, use `etcd --logger=zap --log-outputs=systemd/journal` to send logs to the local systemd journal.
- Previously, if etcd parent process ID (PPID) is 1 (e.g. run with systemd), `etcd --logger=capnslog --log-outputs=default` redirects server logs to local systemd journal. And if write to journald fails, it writes to `os.Stderr` as a fallback.
- However, even with PPID 1, it can fail to dial systemd journal (e.g. run embedded etcd with Docker container). Then, [every single log write will fail](https://github.com/coreos/etcd/pull/9729) and fall back to `os.Stderr`, which is inefficient.
- To avoid this problem, systemd journal logging must be configured manually.
- **`etcd --log-outputs=stderr`** is now the default.
- **`etcd --log-package-levels` flag for `capnslog` has been deprecated.** Now, **`etcd --logger=zap --log-outputs=stderr`** is the default.
- **`[CLIENT-URL]/config/local/log` endpoint has been deprecated, as is `etcd --log-package-levels` flag.**
-`curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` does work in v3.5. Use `curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'` instead.
Previous change logs can be found at [CHANGELOG-3.x](https://github.com/coreos/etcd/blob/master/CHANGELOG-3.x.md).
Planned breaking changes.
## v4.0.0 (TBD)
See [code changes](https://github.com/coreos/etcd/compare/v3.3.0...v4.0.0) and [v4.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_4_0.md) for any breaking changes. **Again, before running upgrades from any previous release, please make sure to read change logs below and [v4.0 upgrade guide](https://github.com/coreos/etcd/blob/master/Documentation/upgrades/upgrade_4_0.md).**
### Breaking Changes
- [Secure etcd by default](https://github.com/coreos/etcd/issues/9475)?
- Change `/health` endpoint output.
- Previously, `{"health":"true"}`.
- Now, `{"health":true}`.
- Breaks [Kubernetes `kubectl get componentstatuses` command](https://github.com/kubernetes/kubernetes/issues/58240).
- Deprecate [`etcd --proxy*`](TODO) flags; **no more v2 proxy**.
- Deprecate [v2 storage backend](https://github.com/coreos/etcd/issues/9232); **no more v2 store**.
- v2 API is still supported via [v2 emulation](TODO).
- Deprecate [`etcdctl backup`](TODO) command.
-`clientv3.Client.KeepAlive(ctx context.Context, id LeaseID) (<-chan *LeaseKeepAliveResponse, error)` is now [`clientv4.Client.KeepAlive(ctx context.Context, id LeaseID) <-chan *LeaseKeepAliveResponse`](TODO).
- Similar to `Watch`, [`KeepAlive` does not return errors](https://github.com/coreos/etcd/issues/7488).
- If there's an unknown server error, kill all open channels and create a new stream on the next `KeepAlive` call.
- Rename `github.com/coreos/client` to `github.com/coreos/clientv2`.
@ -14,7 +14,7 @@ etcd is Apache 2.0 licensed and accepts contributions via GitHub pull requests.
## Reporting bugs and creating issues
Reporting bugs is one of the best ways to contribute. However, a good bug report has some very specific qualities, so please read over our short document on [reporting bugs](https://github.com/coreos/etcd/blob/master/Documentation/reporting_bugs.md) before submitting a bug report. This document might contain links to known issues, another good reason to take a look there before reporting a bug.
Reporting bugs is one of the best ways to contribute. However, a good bug report has some very specific qualities, so please read over our short document on [reporting bugs](https://github.com/etcd-io/etcd/blob/master/Documentation/reporting_bugs.md) before submitting a bug report. This document might contain links to known issues, another good reason to take a look there before reporting a bug.
## Contribution flow
@ -24,7 +24,7 @@ This is a rough outline of what a contributor's workflow looks like:
- Make commits of logical units.
- Make sure commit messages are in the proper format (see below).
- Push changes in a topic branch to a personal fork of the repository.
- Submit a pull request to coreos/etcd.
- Submit a pull request to etcd-io/etcd.
- The PR must receive a LGTM from two maintainers found in the MAINTAINERS file.
Thanks for contributing!
@ -42,9 +42,13 @@ questions: what changed and why. The subject line should feature the what and
the body of the commit should describe the why.
```
scripts: add the test-cluster command
etcdserver: add grpc interceptor to log info on incoming requests
this uses tmux to setup a test cluster that can easily be killed and started for debugging.
To improve debuggability of etcd v3. Added a grpc interceptor to log
info on incoming requests to etcd server. The log output includes
remote client info, request content (with value field redacted), request
handling latency, response size, etc. Uses zap logger if available,
otherwise uses capnslog.
Fixes #38
```
@ -52,11 +56,38 @@ Fixes #38
The format can be described more formally as follows:
```
<subsystem>: <what changed>
<package>: <what changed>
<BLANK LINE>
<why this change was made>
<BLANK LINE>
<footer>
```
The first line is the subject and should be no longer than 70 characters, the second line is always blank, and other lines should be wrapped at 80 characters. This allows the message to be easier to read on GitHub as well as in various git tools.
The first line is the subject and should be no longer than 70 characters, the second
line is always blank, and other lines should be wrapped at 80 characters. This allows
the message to be easier to read on GitHub as well as in various git tools.
### Pull request across multiple files and packages
If multiple files in a package are changed in a pull request for example:
```
etcdserver/config.go
etcdserver/corrupt.go
```
At the end of the review process if multiple commits exist for a single package they
should be squashed/rebased into a single commit before being merged.
```
etcdserver: <what changed>
[..]
```
If a pull request spans many packages these commits should be squashed/rebased into a single
commit using message with a more generic `*:` prefix.
*NOTE*: The watch features are under active development, and their memory usage may change as that development progresses. We do not expect it to significantly increase beyond the figures stated below.
Two components of etcd storage consume physical memory. The etcd process allocates an *in-memory index* to speed key lookup. The process's *page cache*, managed by the operating system, stores recently-accessed data from disk for quick re-use.
etcd v3 uses [gRPC][grpc] for its messaging protocol. The etcd project includes a gRPC-based [Go client][go-client] and a command line utility, [etcdctl][etcdctl], for communicating with an etcd cluster through gRPC. For languages with no gRPC support, etcd provides a JSON [gRPC gateway][grpc-gateway]. This gateway serves a RESTful proxy that translates HTTP/JSON requests into gRPC messages.
@ -18,6 +19,8 @@ gRPC gateway endpoint has changed since etcd v3.3:
- etcd v3.5 or later uses only `[CLIENT-URL]/v3/*`.
- **`[CLIENT-URL]/v3beta/*` is deprecated**.
gRPC-gateway does not support authentication using TLS Common Name.
### Put and get keys
Use the `/v3/kv/range` and `/v3/kv/put` services to read and write keys:
@ -35,6 +35,7 @@ This is a generated documentation. Please read the proto files for more.
| MemberRemove | MemberRemoveRequest | MemberRemoveResponse | MemberRemove removes an existing member from the cluster. |
| MemberUpdate | MemberUpdateRequest | MemberUpdateResponse | MemberUpdate updates the member configuration. |
| MemberList | MemberListRequest | MemberListResponse | MemberList lists all the members in the cluster. |
| MemberPromote | MemberPromoteRequest | MemberPromoteResponse | MemberPromote promotes a member from raft learner (non-voting) to raft voting member. |
@ -245,6 +246,7 @@ Empty field.
| ----- | ----------- | ---- |
| name | | string |
| password | | string |
| options | | authpb.UserAddOptions |
@ -607,6 +609,7 @@ Empty field.
| name | name is the human-readable name of the member. If the member is not started, the name will be an empty string. | string |
| peerURLs | peerURLs is the list of URLs the member exposes to the cluster for communication. | (slice of) string |
| clientURLs | clientURLs is the list of URLs the member exposes to clients for communication. If the member is not started, clientURLs will be empty. | (slice of) string |
| isLearner | isLearner indicates if the member is raft learner. | bool |
@ -615,6 +618,7 @@ Empty field.
| Field | Description | Type |
| ----- | ----------- | ---- |
| peerURLs | peerURLs is the list of URLs the added member will use to communicate with the cluster. | (slice of) string |
| isLearner | isLearner indicates if the added member is raft learner. | bool |
For the most part, the etcd project is stable, but we are still moving fast! We believe in the release fast philosophy. We want to get early feedback on features still in development and stabilizing. Thus, there are, and will be more, experimental features and APIs. We plan to improve these features based on the early feedback from the community, or abandon them if there is little interest, in the next few releases. Please do not rely on any experimental features or APIs in production environment.
## The current experimental API/features are:
- [KV ordering](https://godoc.org/github.com/coreos/etcd/clientv3/ordering) wrapper. When an etcd client switches endpoints, responses to serializable reads may go backward in time if the new endpoint is lagging behind the rest of the cluster. The ordering wrapper caches the current cluster revision from response headers. If a response revision is less than the cached revision, the client selects another endpoint and reissues the read. Enable in grpcproxy with `--experimental-serializable-ordering`.
- [KV ordering](https://godoc.org/github.com/etcd-io/etcd/clientv3/ordering) wrapper. When an etcd client switches endpoints, responses to serializable reads may go backward in time if the new endpoint is lagging behind the rest of the cluster. The ordering wrapper caches the current cluster revision from response headers. If a response revision is less than the cached revision, the client selects another endpoint and reissues the read. Enable in grpcproxy with `--experimental-serializable-ordering`.
etcd provides a gRPC resolver to support an alternative name system that fetches endpoints from etcd for discovering gRPC services. The underlying mechanism is based on watching updates to keys prefixed with the service name.
## Using etcd discovery with go-grpc
The etcd client provides a gRPC resolver for resolving gRPC endpoints with an etcd backend. The resolver is initialized with an etcd client and given a target for resolution:
The etcd client provides a gRPC resolver for resolving gRPC endpoints with an etcd backend. The resolver is initialized with an etcd client:
The etcd resolver treats all keys under the prefix of the resolution target following a "/" (e.g., "my-service/") with JSON-encoded go-grpc `naming.Update` values as potential service endpoints. Endpoints are added to the service by creating new keys and removed from the service by deleting keys.
The etcd resolver treats all keys under the prefix of the resolution target following a "/" (e.g., "foo/bar/my-service/")
with JSON-encoded (historically go-grpc `naming.Update`) values as potential service endpoints.
Endpoints are added to the service by creating new keys and removed from the service by deleting keys.
### Adding an endpoint
New endpoints can be added to the service through `etcdctl`:
```sh
ETCDCTL_API=3 etcdctl put my-service/1.2.3.4 '{"Addr":"1.2.3.4","Metadata":"..."}'
ETCDCTL_API=3 etcdctl put foo/bar/my-service/1.2.3.4 '{"Addr":"1.2.3.4","Metadata":"..."}'
```
The etcd client's `GRPCResolver.Update` method can also register new endpoints with a key matching the `Addr`:
The etcd client's `endpoints.Manager` method can also register new endpoints with a key matching the `Addr`:
Users mostly interact with etcd by putting or getting the value of a key. This section describes how to do that by using etcdctl, a command line tool for interacting with etcd server. The concepts described here should apply to the gRPC APIs or client library APIs.
By default, etcdctl talks to the etcd server with the v2 API for backward compatibility. For etcdctl to speak to etcd using the v3 API, the API version must be set to version 3 via the `ETCDCTL_API` environment variable. However note that any key that was created using the v2 API will not be able to be queried via the v3 API. A v3 API ```etcdctl get``` of a v2 key will exit with 0 and no key data, this is the expected behaviour.
The API version used by etcdctl to speak to etcd may be set to version `2` or `3` via the `ETCDCTL_API` environment variable. By default, etcdctl on master (3.4) uses the v3 API and earlier versions (3.3 and earlier) default to the v2 API.
Note that any key that was created using the v2 API will not be able to be queried via the v3 API. A v3 API ```etcdctl get``` of a v2 key will exit with 0 and no key data, this is the expected behaviour.
For testing and development deployments, the quickest and easiest way is to configure a local cluster. For a production deployment, refer to the [clustering][clustering] section.
@ -21,14 +23,7 @@ The running etcd member listens on `localhost:2379` for client requests.
Use `etcdctl` to interact with the running cluster:
1.Configure the environment to have `ETCDCTL_API=3` so `etcdctl` uses the etcd API version 3 instead of defaulting to version 2.
```
# use API version 3
$ export ETCDCTL_API=3
```
2. Store an example key-value pair in the cluster:
1.Store an example key-value pair in the cluster:
```
$ ./etcdctl put foo bar
@ -37,7 +32,7 @@ Use `etcdctl` to interact with the running cluster:
If OK is printed, storing key-value pair is successful.
3. Retrieve the value of `foo`:
2. Retrieve the value of `foo`:
```
$ ./etcdctl get foo
@ -70,14 +65,7 @@ A `Procfile` at the base of the etcd git repository is provided to easily config
Use `etcdctl` to interact with the running cluster:
1. Configure the environment to have `ETCDCTL_API=3` so `etcdctl` uses the etcd API version 3 instead of defaulting to version 2.
```
# use API version 3
$ export ETCDCTL_API=3
```
2. Print the list of members:
1. Print the list of members:
```
$ etcdctl --write-out=table --endpoints=localhost:2379 member list
@ -94,7 +82,7 @@ Use `etcdctl` to interact with the running cluster:
etcd uses the [capnslog][capnslog] library for logging application output categorized into *levels*. A log message's level is determined according to these conventions:
- Create new stable branch through `git push origin ${VERSION_MAJOR}.${VERSION_MINOR}` if this is a major stable release. This assumes `origin` corresponds to "https://github.com/coreos/etcd".
- Bump [hardcoded Version in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L30) to the version `${VERSION}+git`.
- Create new stable branch through `git push origin ${VERSION_MAJOR}.${VERSION_MINOR}` if this is a major stable release. This assumes `origin` corresponds to "https://github.com/etcd-io/etcd".
- Bump [hardcoded Version in the repository](https://github.com/etcd-io/etcd/blob/master/version/version.go#L30) to the version `${VERSION}+git`.
@ -9,3 +9,17 @@ Instructions for use are the same as the [kubernetes-mixin](https://github.com/k
## Background
* For more information about monitoring mixins, see this [design doc](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/edit#).
## Testing alerts
Make sure to have [jsonnet](https://jsonnet.org/) and [gojsontoyaml](https://github.com/brancz/gojsontoyaml) installed.
First compile the mixin to a YAML file, which the promtool will read:
message: 'Etcd cluster "{{ $labels.job }}": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}.',
message: 'etcd cluster "{{ $labels.job }}": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}.',
@ -22,7 +24,7 @@ A member's advertised peer URLs come from `--initial-advertise-peer-urls` on ini
### System requirements
Since etcd writes data to disk, SSD is highly recommended. To prevent performance degradation or unintentionally overloading the key-value store, etcd enforces a configurable storage size quota set to 2GB by default. To avoid swapping or running out of memory, the machine should have at least as much RAM to cover the quota. 8GB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it. At CoreOS, an etcd cluster is usually deployed on dedicated CoreOS Container Linux machines with dual-core processors, 2GB of RAM, and 80GB of SSD *at the very least*. **Note that performance is intrinsically workload dependent; please test before production deployment**. See [hardware][hardware-setup] for more recommendations.
Since etcd writes data to disk, its performance strongly depends on disk performance. For this reason, SSD is highly recommended. To assess whether a disk is fast enough for etcd, one possibility is using a disk benchmarking tool such as [fio][fio]. For an example on how to do that, read [here][fio-blog-post]. To prevent performance degradation or unintentionally overloading the key-value store, etcd enforces a configurable storage size quota set to 2GB by default. To avoid swapping or running out of memory, the machine should have at least as much RAM to cover the quota. 8GB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it. At CoreOS, an etcd cluster is usually deployed on dedicated CoreOS Container Linux machines with dual-core processors, 2GB of RAM, and 80GB of SSD *at the very least*. **Note that performance is intrinsically workload dependent; please test before production deployment**. See [hardware][hardware-setup] for more recommendations.
Most stable production environment is Linux operating system with amd64 architecture; see [supported platform][supported-platform] for more.
@ -70,7 +72,7 @@ etcdctl provides a `snapshot` command to create backups. See [backup][backup] fo
When replacing an etcd node, it's important to remove the member first and then add its replacement.
etcd employs distributed consensus based on a quorum model; (n+1)/2 members, a majority, must agree on a proposal before it can be committed to the cluster. These proposals include key-value updates and membership changes. This model totally avoids any possibility of split brain inconsistency. The downside is permanent quorum loss is catastrophic.
etcd employs distributed consensus based on a quorum model; (n/2)+1 members, a majority, must agree on a proposal before it can be committed to the cluster. These proposals include key-value updates and membership changes. This model totally avoids any possibility of split brain inconsistency. The downside is permanent quorum loss is catastrophic.
How this applies to membership: If a 3-member cluster has 1 downed member, it can still make forward progress because the quorum is 2 and 2 members are still live. However, adding a new member to a 3-member cluster will increase the quorum to 3 because 3 votes are required for a majority of 4 members. Since the quorum increased, this extra member buys nothing in terms of fault tolerance; the cluster is still one node failure away from being unrecoverable.
@ -106,7 +108,7 @@ To recover from the low space quota alarm:
This is gRPC-side warning when a server receives a TCP RST flag with client-side streams being prematurely closed. For example, a client closes its connection, while gRPC server has not yet processed all HTTP/2 frames in the TCP queue. Some data may have been lost in server side, but it is ok so long as client connection has already been closed.
Only [old versions of gRPC](https://github.com/grpc/grpc-go/issues/1362) log this. etcd [>=v3.2.13 by default log this with DEBUG level](https://github.com/coreos/etcd/pull/9080), thus only visible with `--debug` flag enabled.
Only [old versions of gRPC](https://github.com/grpc/grpc-go/issues/1362) log this. etcd [>=v3.2.13 by default log this with DEBUG level](https://github.com/etcd-io/etcd/pull/9080), thus only visible with `--debug` flag enabled.
## Performance
@ -130,7 +132,7 @@ If none of the above suggestions clear the warnings, please [open an issue][new_
etcd uses a leader-based consensus protocol for consistent data replication and log execution. Cluster members elect a single leader, all other members become followers. The elected leader must periodically send heartbeats to its followers to maintain its leadership. Followers infer leader failure if no heartbeats are received within an election interval and trigger an election. If a leader doesn’t send its heartbeats in time but is still running, the election is spurious and likely caused by insufficient resources. To catch these soft failures, if the leader skips two heartbeat intervals, etcd will warn it failed to send a heartbeat on time.
Usually this issue is caused by a slow disk. Before the leader sends heartbeats attached with metadata, it may need to persist the metadata to disk. The disk could be experiencing contention among etcd and other applications, or the disk is too simply slow (e.g., a shared virtualized disk). To rule out a slow disk from causing this warning, monitor [wal_fsync_duration_seconds][wal_fsync_duration_seconds] (p99 duration should be less than 10ms) to confirm the disk is reasonably fast. If the disk is too slow, assigning a dedicated disk to etcd or using faster disk will typically solve the problem.
Usually this issue is caused by a slow disk. Before the leader sends heartbeats attached with metadata, it may need to persist the metadata to disk. The disk could be experiencing contention among etcd and other applications, or the disk is too simply slow (e.g., a shared virtualized disk). To rule out a slow disk from causing this warning, monitor [wal_fsync_duration_seconds][wal_fsync_duration_seconds] (p99 duration should be less than 10ms) to confirm the disk is reasonably fast. If the disk is too slow, assigning a dedicated disk to etcd or using faster disk will typically solve the problem. To tell whether a disk is fast enough for etcd, a benchmarking tool such as [fio][fio] can be used. Read [here][fio-blog-post] for an example.
The second most common cause is CPU starvation. If monitoring of the machine’s CPU usage shows heavy utilization, there may not be enough compute capacity for etcd. Moving etcd to dedicated machine, increasing process resource isolation with cgroups, or renicing the etcd server process into a higher priority can usually solve the problem.
@ -147,15 +149,17 @@ etcd sends a snapshot of its complete key-value store to refresh slow followers
This document is meant to give an overview of the etcd3 API's central design. It is by no means all encompassing, but intended to focus on the basic ideas needed to understand etcd without the distraction of less common API calls. All etcd3 API's are defined in [gRPC services][grpc-service], which categorize remote procedure calls (RPCs) understood by the etcd server. A full listing of all etcd RPCs are documented in markdown in the [gRPC API listing][grpc-api].
etcd is a consistent and durable key value store with [mini-transaction][txn] support. The key value store is exposed through the KV APIs. etcd tries to ensure the strongest consistency and durability guarantees for a distributed system. This specification enumerates the KV API guarantees made by etcd.
etcd is designed to reliably store infrequently updated data and provide reliable watch queries. etcd exposes previous versions of key-value pairs to support inexpensive snapshots and watch history events (“time travel queries”). A persistent, multi-version, concurrency-control data model is a good fit for these use cases.
@ -26,7 +28,7 @@ The metadata for auth should also be stored and managed in the storage controlle
The authentication mechanism in the etcd v2 protocol has a tricky part because the metadata consistency should work as in the above, but does not: each permission check is processed by the etcd member that receives the client request (etcdserver/api/v2http/client.go), including follower members. Therefore, it's possible the check may be based on stale metadata.
This staleness means that auth configuration cannot be reflected as soon as operators execute etcdctl. Therefore there is no way to know how long the stale metadata is active. Practically, the configuration change is reflected immediately after the command execution. However, in some cases of heavy load, the inconsistent state can be prolonged and it might result in counter-intuitive situations for users and developers. It requires a workaround like this: https://github.com/coreos/etcd/pull/4317#issuecomment-179037582
This staleness means that auth configuration cannot be reflected as soon as operators execute etcdctl. Therefore there is no way to know how long the stale metadata is active. Practically, the configuration change is reflected immediately after the command execution. However, in some cases of heavy load, the inconsistent state can be prolonged and it might result in counter-intuitive situations for users and developers. It requires a workaround like this: https://github.com/etcd-io/etcd/pull/4317#issuecomment-179037582
### Inconsistent permissions are unsafe for linearized requests
*Gyuho Lee (github.com/gyuho, Amazon Web Services, Inc.), Joe Betz (github.com/jpbetz, Google Inc.)*
Introduction
============
etcd server has proven its robustness with years of failure injection testing. Most complex application logic is already handled by etcd server and its data stores (e.g. cluster membership is transparent to clients, with Raft-layer forwarding proposals to leader). Although server components are correct, its composition with client requires a different set of intricate protocols to guarantee its correctness and high availability under faulty conditions. Ideally, etcd server provides one logical cluster view of many physical machines, and client implements automatic failover between replicas. This documents client architectural decisions and its implementation details.
Glossary
========
*clientv3*: etcd Official Go client for etcd v3 API.
*clientv3-grpc1.0*: Official client implementation, with [`grpc-go v1.0.x`](https://github.com/grpc/grpc-go/releases/tag/v1.0.0), which is used in latest etcd v3.1.
*clientv3-grpc1.7*: Official client implementation, with [`grpc-go v1.7.x`](https://github.com/grpc/grpc-go/releases/tag/v1.7.0), which is used in latest etcd v3.2 and v3.3.
*clientv3-grpc1.23*: Official client implementation, with [`grpc-go v1.23.x`](https://github.com/grpc/grpc-go/releases/tag/v1.23.0), which is used in latest etcd v3.4.
*Balancer*: etcd client load balancer that implements retry and failover mechanism. etcd client should automatically balance loads between multiple endpoints.
*Endpoints*: A list of etcd server endpoints that clients can connect to. Typically, 3 or 5 client URLs of an etcd cluster.
*Pinned endpoint*: When configured with multiple endpoints, <= v3.3 client balancer chooses only one endpoint to establish a TCP connection, in order to conserve total open connections to etcd cluster. In v3.4, balancer round-robins pinned endpoints for every request, thus distributing loads more evenly.
*Client Connection*: TCP connection that has been established to an etcd server, via gRPC Dial.
*Sub Connection*: gRPC SubConn interface. Each sub-connection contains a list of addresses. Balancer creates a SubConn from a list of resolved addresses. gRPC ClientConn can map to multiple SubConn (e.g. example.com resolves to `10.10.10.1` and `10.10.10.2` of two sub-connections). etcd v3.4 balancer employs internal resolver to establish one sub-connection for each endpoint.
*Transient disconnect*: When gRPC server returns a status error of [`code Unavailable`](https://godoc.org/google.golang.org/grpc/codes#Code).
Client Requirements
===================
*Correctness*. Requests may fail in the presence of server faults. However, it never violates consistency guarantees: global ordering properties, never write corrupted data, at-most once semantics for mutable operations, watch never observes partial events, and so on.
*Liveness*. Servers may fail or disconnect briefly. Clients should make progress in either way. Clients should [never deadlock](https://github.com/etcd-io/etcd/issues/8980) waiting for a server to come back from offline, unless configured to do so. Ideally, clients detect unavailable servers with HTTP/2 ping and failover to other nodes with clear error messages.
*Effectiveness*. Clients should operate effectively with minimum resources: previous TCP connections should be [gracefully closed](https://github.com/etcd-io/etcd/issues/9212) after endpoint switch. Failover mechanism should effectively predict the next replica to connect, without wastefully retrying on failed nodes.
*Portability*. Official client should be clearly documented and its implementation be applicable to other language bindings. Error handling between different language bindings should be consistent. Since etcd is fully committed to gRPC, implementation should be closely aligned with gRPC long-term design goals (e.g. pluggable retry policy should be compatible with [gRPC retry](https://github.com/grpc/proposal/blob/master/A6-client-retries.md)). Upgrades between two client versions should be non-disruptive.
Client Overview
===============
etcd client implements the following components:
* balancer that establishes gRPC connections to an etcd cluster,
* API client that sends RPCs to an etcd server, and
* error handler that decides whether to retry a failed request or switch endpoints.
Languages may differ in how to establish an initial connection (e.g. configure TLS), how to encode and send Protocol Buffer messages to server, how to handle stream RPCs, and so on. However, errors returned from etcd server will be the same. So should be error handling and retry policy.
For example, etcd server may return `"rpc error: code = Unavailable desc = etcdserver: request timed out"`, which is transient error that expects retries. Or return `rpc error: code = InvalidArgument desc = etcdserver: key is not provided`, which means request was invalid and should not be retried. Go client can parse errors with `google.golang.org/grpc/status.FromError`, and Java client with `io.grpc.Status.fromThrowable`.
clientv3-grpc1.0: Balancer Overview
-----------------------------------
`clientv3-grpc1.0` maintains multiple TCP connections when configured with multiple etcd endpoints. Then pick one address and use it to send all client requests. The pinned address is maintained until the client object is closed (see *Figure 1*). When the client receives an error, it randomly picks another and retries.
`clientv3-grpc1.0` opening multiple TCP connections may provide faster balancer failover but requires more resources. The balancer does not understand node’s health status or cluster membership. So, it is possible that balancer gets stuck with one failed or partitioned node.
clientv3-grpc1.7: Balancer Overview
------------------------------------
`clientv3-grpc1.7` maintains only one TCP connection to a chosen etcd server. When given multiple cluster endpoints, a client first tries to connect to them all. As soon as one connection is up, balancer pins the address, closing others (see *Figure 2*). The pinned address is to be maintained until the client object is closed. An error, from server or client network fault, is sent to client error handler (see *Figure 3*).
The client error handler takes an error from gRPC server, and decides whether to retry on the same endpoint, or to switch to other addresses, based on the error code and message (see *Figure 4* and *Figure 5*).
Stream RPCs, such as Watch and KeepAlive, are often requested with no timeouts. Instead, client can send periodic HTTP/2 pings to check the status of a pinned endpoint; if the server does not respond to the ping, balancer switches to other endpoints (see *Figure 6*).
`clientv3-grpc1.7` balancer sends HTTP/2 keepalives to detect disconnects from streaming requests. It is a simple gRPC server ping mechanism and does not reason about cluster membership, thus unable to detect network partitions. Since partitioned gRPC server can still respond to client pings, balancer may get stuck with a partitioned node. Ideally, keepalive ping detects partition and triggers endpoint switch, before request time-out (see [etcd#8673](https://github.com/etcd-io/etcd/issues/8673) and *Figure 7*).
`clientv3-grpc1.7` balancer maintains a list of unhealthy endpoints. Disconnected addresses are added to “unhealthy” list, and considered unavailable until after wait duration, which is hard coded as dial timeout with default value 5-second. Balancer can have false positives on which endpoints are unhealthy. For instance, endpoint A may come back right after being blacklisted, but still unusable for next 5 seconds (see *Figure 8*).
`clientv3-grpc1.0` suffered the same problems above.
Upstream gRPC Go had already migrated to new balancer interface. For example, `clientv3-grpc1.7` underlying balancer implementation uses new gRPC balancer and tries to be consistent with old balancer behaviors. While its compatibility has been maintained reasonably well, etcd client still [suffered from subtle breaking changes](https://github.com/grpc/grpc-go/issues/1649). Furthermore, gRPC maintainer recommends to [not rely on the old balancer interface](https://github.com/grpc/grpc-go/issues/1942#issuecomment-375368665). In general, to get better support from upstream, it is best to be in sync with latest gRPC releases. And new features, such as retry policy, may not be backported to gRPC 1.7 branch. Thus, both etcd server and client must migrate to latest gRPC versions.
clientv3-grpc1.23: Balancer Overview
------------------------------------
`clientv3-grpc1.7` is so tightly coupled with old gRPC interface, that every single gRPC dependency upgrade broke client behavior. Majority of development and debugging efforts were devoted to fixing those client behavior changes. As a result, its implementation has become overly complicated with bad assumptions on server connectivities.
The primary goal of `clientv3-grpc1.23` is to simplify balancer failover logic; rather than maintaining a list of unhealthy endpoints, which may be stale, simply roundrobin to the next endpoint whenever client gets disconnected from the current endpoint. It does not assume endpoint status. Thus, no more complicated status tracking is needed (see *Figure 8* and above). Upgrading to `clientv3-grpc1.23` should be no issue; all changes were internal while keeping all the backward compatibilities.
Internally, when given multiple endpoints, `clientv3-grpc1.23` creates multiple sub-connections (one sub-connection per each endpoint), while `clientv3-grpc1.7` creates only one connection to a pinned endpoint (see *Figure 9*). For instance, in 5-node cluster, `clientv3-grpc1.23` balancer would require 5 TCP connections, while `clientv3-grpc1.7` only requires one. By preserving the pool of TCP connections, `clientv3-grpc1.23` may consume more resources but provide more flexible load balancer with better failover performance. The default balancing policy is round robin but can be easily extended to support other types of balancers (e.g. power of two, pick leader, etc.). `clientv3-grpc1.23` uses gRPC resolver group and implements balancer picker policy, in order to delegate complex balancing work to upstream gRPC. On the other hand, `clientv3-grpc1.7` manually handles each gRPC connection and balancer failover, which complicates the implementation. `clientv3-grpc1.23` implements retry in the gRPC interceptor chain that automatically handles gRPC internal errors and enables more advanced retry policies like backoff, while `clientv3-grpc1.7` manually interprets gRPC errors for retries.
Improvements can be made by caching the status of each endpoint. For instance, balancer can ping each server in advance to maintain a list of healthy candidates, and use this information when doing round-robin. Or when disconnected, balancer can prioritize healthy endpoints. This may complicate the balancer implementation, thus can be addressed in later versions.
Client-side keepalive ping still does not reason about network partitions. Streaming request may get stuck with a partitioned node. Advanced health checking service need to be implemented to understand the cluster membership (see [etcd#8673](https://github.com/etcd-io/etcd/issues/8673) for more detail).
Currently, retry logic is handled manually as an interceptor. This may be simplified via [official gRPC retries](https://github.com/grpc/proposal/blob/master/A6-client-retries.md).
*Gyuho Lee (github.com/gyuho, Amazon Web Services, Inc.), Joe Betz (github.com/jpbetz, Google Inc.)*
Background
==========
Membership reconfiguration has been one of the biggest operational challenges. Let’s review common challenges.
### 1. New Cluster member overloads Leader
A newly joined etcd member starts with no data, thus demanding more updates from leader until it catches up with leader’s logs. Then leader’s network is more likely to be overloaded, blocking or dropping leader heartbeats to followers. In such case, a follower may election-timeout to start a new leader election. That is, a cluster with a new member is more vulnerable to leader election. Both leader election and the subsequent update propagation to the new member are prone to causing periods of cluster unavailability (see *Figure 1*).
What if network partition happens? It depends on leader partition. If the leader still maintains the active quorum, the cluster would continue to operate (see *Figure 2*).
What if the leader becomes isolated from the rest of the cluster? Leader monitors progress of each follower. When leader loses connectivity from the quorum, it reverts back to follower which will affect the cluster availability (see *Figure 3*).
When a new node is added to 3 node cluster, the cluster size becomes 4 and the quorum size becomes 3. What if a new node had joined the cluster, and then network partition happens? It depends on which partition the new member gets located after partition.
#### 2.2 Cluster Split 3+1
If the new node happens to be located in the same partition as leader’s, the leader still maintains the active quorum of 3. No leadership election happens, and no cluster availability gets affected (see *Figure 4*).
If the cluster is 2-and-2 partitioned, then neither of partition maintains the quorum of 3. In this case, leadership election happens (see *Figure 5*).
What if network partition happens first, and then a new member gets added? A partitioned 3-node cluster already has one disconnected follower. When a new member is added, the quorum changes from 2 to 3. Now, this cluster has only 2 active nodes out 4, thus losing quorum and starting a new leadership election (see *Figure 6*).
Since member add operation can change the size of quorum, it is always recommended to “member remove” first to replace an unhealthy node.
Adding a new member to a 1-node cluster changes the quorum size to 2, immediately causing a leader election when the previous leader finds out quorum is not active. This is because “member add” operation is a 2-step process where user needs to apply “member add” command first, and then starts the new node process (see *Figure 7*).
An even worse case is when an added member is misconfigured. Membership reconfiguration is a two-step process: “etcdctl member add” and starting an etcd server process with the given peer URL. That is, “member add” command is applied regardless of URL, even when the URL value is invalid. If the first step is applied with invalid URLs, the second step cannot even start the new etcd. Once the cluster loses quorum, there is no way to revert the membership change (see *Figure 8*).
Same applies to a multi-node cluster. For example, the cluster has two members down (one is failed, the other is misconfigured) and two members up, but now it requires at least 3 votes to change the cluster membership (see *Figure 9*).
As seen above, a simple misconfiguration can fail the whole cluster into an inoperative state. In such case, an operator need manually recreate the cluster with `etcd --force-new-cluster` flag. As etcd has become a mission-critical service for Kubernetes, even the slightest outage may have significant impact on users. What can we better to make etcd such operations easier? Among other things, leader election is most critical to cluster availability: Can we make membership reconfiguration less disruptive by not changing the size of quorum? Can a new node be idle, only requesting the minimum updates from leader, until it catches up? Can membership misconfiguration be always reversible and handled in a more secure way (wrong member add command run should never fail the cluster)? Should an user worry about network topology when adding a new member? Can member add API work regardless of the location of nodes and ongoing network partitions?
Raft Learner
============
In order to mitigate such availability gaps in the previous section, [Raft §4.2.1](https://github.com/ongardie/dissertation/blob/master/stanford.pdf) introduces a new node state “Learner”, which joins the cluster as a **non-voting member** until it catches up to leader’s logs.
Features in v3.4
----------------
An operator should do the minimum amount of work possible to add a new learner node. `member add --learner` command to add a new learner, which joins cluster as a non-voting member but still receives all data from leader (see *Figure 10*).
When a learner has caught up with leader’s progress, the learner can be promoted to a voting member using `member promote` API, which then counts towards the quorum (see *Figure 11*).
etcd server validates promote request to ensure its operational safety. Only after its log has caught up to leader’s can learner be promoted to a voting member (see *Figure 12*).
Learner only serves as a standby node until promoted: Leadership cannot be transferred to learner. Learner rejects client reads and writes (client balancer should not route requests to learner). Which means learner does not need issue Read Index requests to leader. Such limitation simplifies the initial learner implementation in v3.4 release (see *Figure 13*).
In addition, etcd limits the total number of learners that a cluster can have, and avoids overloading the leader with log replication. Learner never promotes itself. While etcd provides learner status information and safety checks, cluster operator must make the final decision whether to promote learner or not.
Features in v3.5
----------------
*Make learner state only and default*: Defaulting a new member state to learner will greatly improve membership reconfiguration safety, because learner does not change the size of quorum. Misconfiguration will always be reversible without losing the quorum.
*Make voting-member promotion fully automatic*: Once a learner catches up to leader’s logs, a cluster can automatically promote the learner. etcd requires certain thresholds to be defined by the user, and once the requirements are satisfied, learner promotes itself to a voting member. From a user’s perspective, “member add” command would work the same way as today but with greater safety provided by learner feature.
*Make learner standby failover node*: A learner joins as a standby node, and gets automatically promoted when the cluster availability is affected.
*Make learner read-only*: A learner can serve as a read-only node that never gets promoted. In a weak consistency mode, learner only receives data from leader and never process writes. Serving reads locally without consensus overhead would greatly decrease the workloads to leader but may serve stale data. In a strong consistency mode, learner requests read index from leader to serve latest data, but still rejects writes.
Learner vs. Mirror Maker
========================
etcd implements “mirror maker” using watch API to continuously relay key creates and updates to a separate cluster. Mirroring usually has low latency overhead once it completes initial synchronization. Learner and mirroring overlap in that both can be used to replicate existing data for read-only. However, mirroring does not guarantee linearizability. During network disconnects, previous key-values might have been discarded, and clients are expected to verify watch responses for correct ordering. Thus, there is no ordering guarantee in mirror. Use mirror for minimum latency (e.g. cross data center) at the costs of consistency. Use learner to retain all historical data and its ordering.
Appendix: Learner Implementation in v3.4
========================================
*Expose "Learner" node type to "MemberAdd" API.*
etcd client adds a flag to “MemberAdd” API for learner node. And etcd server handler applies membership change entry with `pb.ConfChangeAddLearnerNode` type. Once the command has been applied, a server joins the cluster with `etcd --initial-cluster-state=existing` flag. This learner node can neither vote nor count as quorum.
etcd server must not transfer leadership to learner, since it may still lag behind and does not count as quorum. etcd server limits the number of learners that cluster can have to one: the more learners we have, the more data the leader has to propagate. Clients may talk to learner node, but learner rejects all requests other than serializable read and member status API. This is for simplicity of initial implementation. In the future, learner can be extended as a read-only server that continuously mirrors cluster data. Client balancer must provide helper function to exclude learner node endpoint. Otherwise, request sent to learner may fail. Client sync member call should factor into learner node type. So should client endpoints update call.
`MemberList` and `MemberStatus` responses should indicate which node is learner.
*Add "MemberPromote" API.*
Internally in Raft, second `MemberAdd` call to learner node promotes it to a voting member. Leader maintains the progress of each follower and learner. If learner has not completed its snapshot message, reject promote request. Only accept promote request if and only if: The learner node is in a healthy state. The learner is in sync with leader or the delta is within the threshold (e.g. the number of entries to replicate to learner is less than 1/10 of snapshot count, which means it is less likely that even after promotion leader would not need send snapshot to the learner). All these logic are hard-coded in `etcdserver` package and not configurable.
Reference
=========
- Original github issue: [etcd#9161](https://github.com/etcd-io/etcd/issues/9161)
- Use case: [etcd#3715](https://github.com/etcd-io/etcd/issues/3715)
- Use case: [etcd#8888](https://github.com/etcd-io/etcd/issues/8888)
- Use case: [etcd#10114](https://github.com/etcd-io/etcd/issues/10114)
The name "etcd" originated from two ideas, the unix "/etc" folder and "d"istributed systems. The "/etc" folder is a place to store configuration data for a single system whereas etcd stores configuration information for large scale distributed systems. Hence, a "d"istributed "/etc" is "etcd".
@ -76,18 +78,18 @@ In theory, it’s possible to build these primitives atop any storage systems pr
For distributed coordination, choosing etcd can help prevent operational headaches and save engineering effort.
etcd uses [Prometheus][prometheus] for metrics reporting. The metrics can be used for real-time monitoring and debugging. etcd does not persist its metrics; if a member restarts, the metrics will be reset.
@ -99,7 +101,7 @@ Abnormally high snapshot duration (`snapshot_save_total_duration_seconds`) indic
## Prometheus supplied metrics
The Prometheus client library provides a number of metrics under the `go` and `process` namespaces. There are a few that are particlarly interesting.
The Prometheus client library provides a number of metrics under the `go` and `process` namespaces. There are a few that are particularly interesting.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.