Commit Graph

16282 Commits

Author SHA1 Message Date
edaac6e2a9 CHANGELOG: add v3.3.24 release dates
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-18 09:30:27 -07:00
e37b28bd28 CHANGELOG: add v3.4.11
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-18 09:23:24 -07:00
0526f461e1 Doc: Update curl command to fix 400 Bad Request (#11911) 2020-08-16 16:12:39 -07:00
0e4ba37b6c Merge pull request #12193 from mitake/integration_test
test: avoid non existing package for integration test
2020-08-15 23:26:34 -07:00
d35933c351 Merge pull request #12221 from wenjiaswe/changelog_12215
CHANGELOG: update from 12215
2020-08-14 13:29:40 -07:00
32982ef469 CHANGELOG: update from 12215
Change-Id: I17e076554a56c95fabd95af111eccd8d7409966b
2020-08-14 13:18:02 -07:00
92f9e6eba2 Merge pull request #12216 from jingyih/experimental_flag_for_watch_notify_interval
*: add experimental flag for watch notify interval
2020-08-14 12:47:43 -07:00
799b16c2d1 CHANGELOG: update for PR12216 2020-08-14 12:06:38 -07:00
9a698476bf *: add experimental flag for watch notify interval 2020-08-14 12:01:00 -07:00
06f89cc4f8 Merge pull request #12212 from gyuho/logger
*: upgrade zap logger to 1.15, replace global logger
2020-08-13 09:46:44 -07:00
93cf449205 Merge pull request #12214 from gyuho/fd
*: optimize runtime.FDUsage + add OS level FD metrics
2020-08-12 18:37:05 -07:00
5678779665 CHANGELOG: update
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 10:32:27 -07:00
421df2ecbb etcdserver: add OS level FD metrics
Similar counts are exposed via Prometheus.
This adds the ones that are perceived by the etcd server.

e.g.

os_fd_limit 120000
os_fd_used 14
process_cpu_seconds_total 0.31
process_max_fds 120000
process_open_fds 17

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 10:32:27 -07:00
53fdcdc5a2 pkg/runtime: optimize FDUsage by removing sort
No need to sort when we just want the counts.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 10:32:24 -07:00
d8ed233791 CHANGELOG: update
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 09:50:00 -07:00
7eac6bd497 *: upgrade zap logger to 1.15, replace global logger
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2020-08-12 09:50:00 -07:00
ed27d9d2de Merge pull request #12198 from ptabor/20200803-int-to-string-test-fix
etcdserver, wal: Fix tests unintended CASTing of int->String.
2020-08-11 21:35:20 -07:00
8c44d25f2a Merge pull request #12211 from tangcong/ignore-errcompacted
etcdserver: ignore ErrCompacted error
2020-08-11 21:34:38 -07:00
fe36be2251 Merge pull request #12195 from tangcong/optimize-healthcheck
*: check health by using v3 range request and its corresponding timeout
2020-08-11 21:32:44 -07:00
8a4c7751d8 CHANGELOG: update for 12195 2020-08-12 08:10:13 +08:00
18adf55c92 Merge pull request #12199 from ptabor/20200803-expect-replace-fix
tests/e2e: Update github.com/creack/pty v1.1.7 -> v1.1.11
2020-08-11 11:59:12 -07:00
844091dda3 Merge pull request #12206 from ptabor/20200807-setLoggingDataRace
integration: Fix flakes due to .setupLogging race.
2020-08-11 11:58:47 -07:00
afa0e8196c etcdserver: ignore mvcc.ErrCompacted error 2020-08-11 23:45:20 +08:00
cd25d6c06e Merge pull request #12130 from ptabor/master
functional/tester: Update cluster_test.go to reflect functional.yaml
2020-08-09 02:14:31 +08:00
9d182c2a70 clientv3/integration: Fix flaky TestGetTokenWithoutAuth (#12200)
The test is very flaky on Travis.

It seems that since (https://github.com/etcd-io/etcd/issues/7724) the
client is expected to simply ignore whether the server is in AuthDisabled
mode, even if the user supplies credentials.

The tests used to:
  * use a very large cluster (10 nodes)
  * set a very low timeout (1 sec)

Such a setup led to frequent deadlineExceed errors or the following failures:

    === RUN   TestGetTokenWithoutAuth
    {"level":"warn","ts":"2020-08-04T16:50:48.686+0200","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-35573307-1ee5-441b-acc7-d073f0bd7de5/localhost:69820396562031027440","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: leader changed"}
        user_test.go:151: other errors:etcdserver: leader changed
    --- FAIL: TestGetTokenWithoutAuth (10.91s)
2020-08-07 13:32:32 -07:00
830618e44d ./integration: Fix flakes due to .setupLogging race.
The source of the problem was that multiple tests were creating
their clusters (and some of them were setting the global grpclog).
If a test ran after another test that had created an HttpServer
(and so accessed grpclog), this was reported as a race.

Tested with:
  go test ./clientv3/. -v "--run=(Example).*" --count=2
  go test ./clientv3/. -v "--run=(Test).*" --count=2
  go test ./integration/embed/. -v "--run=(Test).*" --count=2
2020-08-07 13:54:41 +02:00
f395f82a75 Merge pull request #12202 from spzala/auditchangelog
CHANGELOG: update with added audit report
2020-08-05 08:56:16 -07:00
eafd374309 CHANGELOG: update with added audit report
Update the changelog for recently added audit report. Also mention the
report in the security readme.
2020-08-04 22:44:48 -04:00
d29af0f22b Merge pull request #12201 from spzala/audit
Add audit report
2020-08-04 20:25:24 -04:00
4edd2679f0 doc: add audit report
Adding audit report.
2020-08-04 19:07:18 -04:00
00de56a4a4 etcdserver, wal: Fix tests that were performing unintended casting of int to String.
Fixes the following problems during "./etcd# go test ./..."

> go.etcd.io/etcd/v3/etcdserver/api/v2store_test
etcdserver/api/v2store/store_test.go:847:24: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)

> go.etcd.io/etcd/v3/wal
wal/wal_test.go:242:68: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
2020-08-04 20:28:41 +02:00
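
A small Go sketch of the pitfall that the vet message above describes. The function names asRune and digits are made up for this example and do not come from the etcd tests: converting an integer to a string through a rune yields the single character with that code point, while strconv.Itoa (or fmt.Sprint) yields the decimal digits.

```go
package main

import (
	"fmt"
	"strconv"
)

// asRune (hypothetical) makes the one-rune conversion explicit.
// A plain string(n) on an int has the same result and is what vet flags.
func asRune(n int) string {
	return string(rune(n))
}

// digits (hypothetical) formats n as its decimal representation.
func digits(n int) string {
	return strconv.Itoa(n)
}

func main() {
	fmt.Println(asRune(65)) // prints "A"
	fmt.Println(digits(65)) // prints "65"
}
```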
b151a47d1b tests/e2e: Update github.com/creack/pty v1.1.7 -> v1.1.11
The fix is needed to mitigate the consequences of
https://github.com/golang/go/issues/29458 "golang breaking change", which
causes the following test failures on the etcd end:

--- FAIL: TestCtlV2Set (0.00s)
    ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetQuorum (0.00s)
    ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
--- FAIL: TestCtlV2SetClientTLS (0.00s)
    ctl_v2_test.go:552: could not start etcd process cluster (fork/exec ../../bin/etcd: Setctty set but Ctty not valid in child)
2020-08-04 16:12:12 +02:00
36f07cb591 functional/tester: Update cluster_test.go to reflect current content of functional.yaml.
Before the change:

.../etcd/functional% go test -v ./...
...
?   	go.etcd.io/etcd/v3/functional/runner	[no test files]
=== RUN   Test_read
{"level":"info","ts":1596018672.2852418,"caller":"tester/cluster_read_config.go:36","msg":"opened configuration file","path":"../../functional.yaml"}
    Test_read: cluster_test.go:269: expected &{lg:<nil> agentConns:[] agentClients:[] agentStreams:[] agentRequests:[] testerHTTPServer:<nil> Members:[EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:19027" FailpointHTTPAddr:"http://127.0.0.1:7381" BaseDir:"/tmp/etcd-functional-1" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:1379" Etcd:<Name:"s1" DataDir:"/tmp/etcd-functional-1/etcd.data" WALDir:"/tmp/etcd-functional-1/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:1379" AdvertiseClientURLs:"https://127.0.0.1:1379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:1380" AdvertisePeerURLs:"https://127.0.0.1:1381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:10000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-1/etcd.log" > SnapshotPath:"/tmp/etcd-functional-1.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:29027" FailpointHTTPAddr:"http://127.0.0.1:7382" BaseDir:"/tmp/etcd-functional-2" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:2379" Etcd:<Name:"s2" DataDir:"/tmp/etcd-functional-2/etcd.data" WALDir:"/tmp/etcd-functional-2/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:2379" AdvertiseClientURLs:"https://127.0.0.1:2379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:2380" AdvertisePeerURLs:"https://127.0.0.1:2381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:10000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-2/etcd.log" > SnapshotPath:"/tmp/etcd-functional-2.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:39027" 
FailpointHTTPAddr:"http://127.0.0.1:7383" BaseDir:"/tmp/etcd-functional-3" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:3379" Etcd:<Name:"s3" DataDir:"/tmp/etcd-functional-3/etcd.data" WALDir:"/tmp/etcd-functional-3/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:3379" AdvertiseClientURLs:"https://127.0.0.1:3379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:3380" AdvertisePeerURLs:"https://127.0.0.1:3381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:10000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-3/etcd.log" > SnapshotPath:"/tmp/etcd-functional-3.snapshot.db" ] Tester:DataDir:"/tmp/etcd-tester-data" Network:"tcp" Addr:"127.0.0.1:9028" DelayLatencyMs:5000 DelayLatencyMsRv:500 UpdatedDelayLatencyMs:5000 RoundLimit:1 ExitOnCaseFail:true EnablePprof:true CaseDelayMs:7000 CaseShuffle:true Cases:"SIGTERM_ONE_FOLLOWER" Cases:"SIGTERM_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_LEADER" Cases:"SIGTERM_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_QUORUM" Cases:"SIGTERM_ALL" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_QUORUM" Cases:"BLACKHOLE_PEER_PORT_TX_RX_ALL" Cases:"DELAY_PEER_PORT_TX_RX_LEADER" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER" Cases:"DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"DELAY_PEER_PORT_TX_RX_ALL" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_ALL" Cases:"NO_FAIL_WITH_STRESS" Cases:"NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS" 
FailpointCommands:"panic(\"etcd-tester\")" RunnerExecPath:"./bin/etcd-runner" Stressers:<Type:"KV_WRITE_SMALL" Weight:0.35 > Stressers:<Type:"KV_WRITE_LARGE" Weight:0.002 > Stressers:<Type:"KV_READ_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_READ_RANGE" Weight:0.07 > Stressers:<Type:"KV_DELETE_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_DELETE_RANGE" Weight:0.07 > Stressers:<Type:"KV_TXN_WRITE_DELETE" Weight:0.35 > Stressers:<Type:"LEASE" > Checkers:"KV_HASH" Checkers:"LEASE_EXPIRE" StressKeySize:100 StressKeySizeLarge:32769 StressKeySuffixRange:250000 StressKeySuffixRangeTxn:100 StressKeyTxnOps:10 StressClients:100 StressQPS:2000  cases:[] rateLimiter:<nil> stresser:<nil> checkers:[] currentRevision:0 rd:0 cs:0}, got &{lg:<nil> agentConns:[] agentClients:[] agentStreams:[] agentRequests:[] testerHTTPServer:<nil> Members:[EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:19027" FailpointHTTPAddr:"http://127.0.0.1:7381" BaseDir:"/tmp/etcd-functional-1" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:1379" Etcd:<Name:"s1" DataDir:"/tmp/etcd-functional-1/etcd.data" WALDir:"/tmp/etcd-functional-1/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:1379" AdvertiseClientURLs:"https://127.0.0.1:1379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:1380" AdvertisePeerURLs:"https://127.0.0.1:1381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:2000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-1/etcd.log" LogLevel:"info" > SnapshotPath:"/tmp/etcd-functional-1.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:29027" FailpointHTTPAddr:"http://127.0.0.1:7382" BaseDir:"/tmp/etcd-functional-2" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:2379" Etcd:<Name:"s2" DataDir:"/tmp/etcd-functional-2/etcd.data" 
WALDir:"/tmp/etcd-functional-2/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:2379" AdvertiseClientURLs:"https://127.0.0.1:2379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:2380" AdvertisePeerURLs:"https://127.0.0.1:2381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:2000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-2/etcd.log" LogLevel:"info" > SnapshotPath:"/tmp/etcd-functional-2.snapshot.db"  EtcdExec:"./bin/etcd" AgentAddr:"127.0.0.1:39027" FailpointHTTPAddr:"http://127.0.0.1:7383" BaseDir:"/tmp/etcd-functional-3" EtcdPeerProxy:true EtcdClientEndpoint:"127.0.0.1:3379" Etcd:<Name:"s3" DataDir:"/tmp/etcd-functional-3/etcd.data" WALDir:"/tmp/etcd-functional-3/etcd.data/member/wal" HeartbeatIntervalMs:100 ElectionTimeoutMs:1000 ListenClientURLs:"https://127.0.0.1:3379" AdvertiseClientURLs:"https://127.0.0.1:3379" ClientAutoTLS:true ListenPeerURLs:"https://127.0.0.1:3380" AdvertisePeerURLs:"https://127.0.0.1:3381" PeerAutoTLS:true InitialCluster:"s1=https://127.0.0.1:1381,s2=https://127.0.0.1:2381,s3=https://127.0.0.1:3381" InitialClusterState:"new" InitialClusterToken:"tkn" SnapshotCount:2000 QuotaBackendBytes:10740000000 PreVote:true InitialCorruptCheck:true Logger:"zap" LogOutputs:"/tmp/etcd-functional-3/etcd.log" LogLevel:"info" > SnapshotPath:"/tmp/etcd-functional-3.snapshot.db" ] Tester:DataDir:"/tmp/etcd-tester-data" Network:"tcp" Addr:"127.0.0.1:9028" DelayLatencyMs:5000 DelayLatencyMsRv:500 UpdatedDelayLatencyMs:5000 RoundLimit:1 ExitOnCaseFail:true EnablePprof:true CaseDelayMs:7000 CaseShuffle:true Cases:"SIGTERM_ONE_FOLLOWER" Cases:"SIGTERM_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_LEADER" Cases:"SIGTERM_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"SIGTERM_QUORUM" Cases:"SIGTERM_ALL" 
Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER" Cases:"SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER" Cases:"BLACKHOLE_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"BLACKHOLE_PEER_PORT_TX_RX_QUORUM" Cases:"BLACKHOLE_PEER_PORT_TX_RX_ALL" Cases:"DELAY_PEER_PORT_TX_RX_LEADER" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER" Cases:"DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT" Cases:"DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_QUORUM" Cases:"DELAY_PEER_PORT_TX_RX_ALL" Cases:"RANDOM_DELAY_PEER_PORT_TX_RX_ALL" Cases:"NO_FAIL_WITH_STRESS" Cases:"NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS" FailpointCommands:"panic(\"etcd-tester\")" RunnerExecPath:"./bin/etcd-runner" Stressers:<Type:"KV_WRITE_SMALL" Weight:0.35 > Stressers:<Type:"KV_WRITE_LARGE" Weight:0.002 > Stressers:<Type:"KV_READ_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_READ_RANGE" Weight:0.07 > Stressers:<Type:"KV_DELETE_ONE_KEY" Weight:0.07 > Stressers:<Type:"KV_DELETE_RANGE" Weight:0.07 > Stressers:<Type:"KV_TXN_WRITE_DELETE" Weight:0.35 > Stressers:<Type:"LEASE" > Checkers:"KV_HASH" Checkers:"LEASE_EXPIRE" StressKeySize:100 StressKeySizeLarge:32769 StressKeySuffixRange:250000 StressKeySuffixRangeTxn:100 StressKeyTxnOps:10 StressClients:100 StressQPS:2000  cases:[] rateLimiter:<nil> stresser:<nil> checkers:[] currentRevision:0 rd:0 cs:0}
--- FAIL: Test_read (0.00s)
FAIL
FAIL	go.etcd.io/etcd/v3/functional/tester	0.050s
...

After:
go test -v ../...

...
=== RUN   Test_read
{"level":"info","ts":1596018561.408186,"caller":"tester/cluster_read_config.go:36","msg":"opened configuration file","path":"../../functional.yaml"}
{"level":"info","ts":1596018561.408739,"caller":"tester/cluster_shuffle.go:35","msg":"shuffled test failure cases","total":22}
{"level":"info","ts":1596018561.4087567,"caller":"tester/cluster_shuffle.go:35","msg":"shuffled test failure cases","total":22}
--- PASS: Test_read (0.00s)
PASS

...
2020-08-04 15:47:41 +02:00
25220a0287 *: check health by using v3 range request and its corresponding timeout 2020-08-04 00:34:18 +08:00
76539cee57 test: avoid non existing package for integration test 2020-08-03 00:13:47 +09:00
1af6d61a1c Merge pull request #12177 from ironcladlou/etcdmembersdown-tweak
Documentation: Further improve etcdMembersDown alert
2020-07-31 15:06:52 -04:00
cd3df73944 Documentation: Further improve etcdMembersDown alert
The default window for the etcdMembersDown network failure
rate function was recently changed to 1 minute. While this helps detect an etcd
recovery more quickly, it depends on scrape intervals of <= 15s to collect
sufficient data points for the rate function. In practice, an interval of >= 30s
is more typical, which causes the rate function to be less accurate.

This patch increases the window to 2m, which is a compromise between the
original value of 3m and the 1m change introduced with 2aa5684, and should
accommodate more typical scrape intervals.

To offset the window change and to further improve the chance that the alert
will only fire when etcd is truly dead, this patch changes the `for` clause from
3m to 10m. The rationale is as follows:

1. There can be significant variance in durations following a reboot before etcd
is scraped and detected as available.

2. A conservative trigger like 10m seems less likely to produce a false alarm in
the face of such variance.

3. In this alerting situation, if the outage is real, it seems unlikely that an
additional 7 minutes of delay before (for example) paging somebody will make a
significant impact on the overall response.
2020-07-31 09:26:46 -04:00
cc564110bd clientv3: remove excessive watch cancel logging (#12187) 2020-07-29 14:58:53 -07:00
6c81b20ec8 rafthttp: fix streamHandle outgoingConn peerID (#12179) 2020-07-28 14:41:10 -07:00
bc67babee8 package adt: rename the filename to be consistent with the package name (#12170) 2020-07-28 14:40:34 -07:00
9006d8d4f9 Documentation/learning/lock/client: Add defer Unlock (#11802) 2020-07-26 11:22:19 -07:00
51de68ddac 12126: snapshot: corrupted in Embedded server (#12129) 2020-07-26 11:14:46 -07:00
26b89fd418 raft: don't campaign with pending snapshot (#12163)
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-26 00:04:46 -07:00
c9a5889915 Documentation/etcd-mixin: Reformulate alerting rules to use without rather than by (#12122)
* etcd-mixin: Reformulate alerting rules to use `without` rather than `by`

With aggregations using `by`, all additional target labels that a user
might have configured, are aggregated away. However, those target
labels are useful for e.g. alert routing. With this commit, nothing
should change for vanilla job/instance target labels, but whoever has
more target labels can now still make use of them.

Signed-off-by: beorn7 <beorn@grafana.com>

* etcd-mixin: Parametrize instance labels to aggregate away

Signed-off-by: beorn7 <beorn@grafana.com>
2020-07-23 16:02:26 -07:00
d0e4fe56a5 raft: check pending conf change before campaign (#12134)
* raft: check conf change before campaign

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

* raft: extract hup function

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

* raft: check pending conf change for transferleader

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-22 17:04:48 -07:00
772dfbfe35 wal: Fix format error avoid using reflect.DeepEqual with errors (#12118) 2020-07-20 19:49:57 -07:00
e9d16c2b62 etcdserverpb: add TestInvalidGoTypeIntPanic test (#12116)
Tested conditions that cause
panic: invalid Go type int for field k8s_io.kubernetes.vendor.go_etcd_io.etcd.etcdserver.etcdserverpb.loggablePutRequest.value_size

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2020-07-20 19:18:52 -07:00
7f27697df9 v3client: implement clientv3.Auth interface (#12140) 2020-07-20 16:53:25 -07:00
cc656718fa raft: correct pendingConfIndex check for AutoLeave (#12137)
Close #12136

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2020-07-20 16:49:22 -07:00
93637b1779 raft: bug fix (#12123)
We need to test the case where the configuration set is changed, but there is a typo.

None

Signed-off-by: accelsao <bebe8277@gmail.com>
2020-07-20 16:30:17 -07:00