Compare commits

264 Commits

Author SHA1 Message Date
92e3895214 *: bump to v2.0.13 2015-06-25 14:10:15 -07:00
b12a52b0fd etcdmain: fix the check in fallback-to-proxy case
The check requires advertise-client-urls to be set whenever listen-client-urls
is set, which breaks the fallback-to-proxy case. Loosen the check to fix
it.
2015-06-25 14:07:45 -07:00
9fa4002787 *: bump to v2.0.12+git 2015-06-16 14:20:25 -07:00
5686c33e4b *: bump to v2.0.12 2015-06-16 14:19:37 -07:00
6fd2dfdebc etcdmain: fix that advertise-client-urls is required in proxy mode
etcd proxy doesn't need to set advertise-client-urls because the flag is
not used.
2015-06-16 14:18:01 -07:00
896ce1668c build: default git sha to GitNotFound in case git fails 2015-06-16 14:15:48 -07:00
0520b4cd24 proxy: Reuse a bytes buffer as proxy request body.
The call to transport.RoundTrip closes the request body regardless of
the value of request.Close. This causes subsequent calls to RoundTrip
using the same request body to fail.

Fixes #2895
2015-06-16 14:13:15 -07:00
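A minimal sketch of the idea behind 0520b4cd24, not the actual proxy code: the body is read into a byte slice once, and every attempt gets a fresh reader over those bytes, so a body closed by one RoundTrip call cannot break the next attempt. The names `forward`, `roundTripFunc` and `demo`, the host names, and the fake transport are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// roundTripFunc adapts a plain function to http.RoundTripper (illustrative helper).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// forward (hypothetical) buffers the request body once, then tries each host
// in order with a fresh reader over the same bytes.
func forward(rt http.RoundTripper, req *http.Request, hosts []string) (*http.Response, error) {
	var buf []byte
	if req.Body != nil {
		b, err := io.ReadAll(req.Body)
		req.Body.Close()
		if err != nil {
			return nil, err
		}
		buf = b
	}
	lastErr := errors.New("no hosts given")
	for _, h := range hosts {
		attempt := req.Clone(req.Context())
		attempt.URL.Host = h
		attempt.Body = io.NopCloser(bytes.NewReader(buf)) // new reader per attempt
		resp, err := rt.RoundTrip(attempt)
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// demo runs forward against a fake transport that always consumes and closes
// the body, and fails for the first host.
func demo() (string, error) {
	rt := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		body, _ := io.ReadAll(r.Body)
		r.Body.Close() // the body is closed whether or not the call succeeds
		if r.URL.Host == "down.example:2379" {
			return nil, errors.New("connection refused")
		}
		reply := r.URL.Host + " got " + string(body)
		return &http.Response{
			StatusCode: http.StatusOK,
			Body:       io.NopCloser(strings.NewReader(reply)),
		}, nil
	})
	req, err := http.NewRequest(http.MethodPut, "http://placeholder/v2/keys/foo", strings.NewReader("value=bar"))
	if err != nil {
		return "", err
	}
	resp, err := forward(rt, req, []string{"down.example:2379", "up.example:2379"})
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	return string(out), err
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

The second attempt still sees the full body even though the first attempt consumed and closed its copy.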
6ee6f72c48 etcdmain: increase maxIdleConnsPerHost in proxy transport
This PR sets maxIdleConnsPerHost to 128 so that the proxy can handle 128
concurrent requests smoothly over the long term.
If the number of concurrent requests exceeds this value, the proxy has to
create a new connection for each request beyond the limit, which is bad
because creating connections consumes resources and may exhaust your
ephemeral ports.
2015-06-16 14:10:58 -07:00
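For illustration only, this is roughly how such a limit is expressed with Go's standard `http.Transport`. The constructor name `newProxyTransport` and the constant are assumptions, not the code from 6ee6f72c48. Without this setting, net/http keeps only `http.DefaultMaxIdleConnsPerHost` (2) idle connections per host.

```go
package main

import (
	"fmt"
	"net/http"
)

// proxyMaxIdleConnsPerHost mirrors the value named in the commit message.
const proxyMaxIdleConnsPerHost = 128

// newProxyTransport is a hypothetical constructor: it returns a transport
// that keeps up to 128 idle connections per backend host for reuse.
func newProxyTransport() *http.Transport {
	return &http.Transport{
		MaxIdleConnsPerHost: proxyMaxIdleConnsPerHost,
	}
}

func main() {
	tr := newProxyTransport()
	fmt.Println("idle connections kept per host:", tr.MaxIdleConnsPerHost)
}
```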
b4dd519a63 raft: fix raft node start bug
The raft node should set the initial prev hard state to empty.
Otherwise it will not send the first hard coded state to the application
until the state changes again.

This commit fixes the issue. It introduces a small overhead: the same
state might be sent to the application twice when restarting.
But this is fine.
2015-06-16 14:10:41 -07:00
a98fff84e7 etcdctl/cluster_health: improve output if failed to get leader stats
When failing to get leader stats, it said 'cluster is unhealthy' before.
This is confusing when it cannot get stats because advertised client urls
are set wrong and the cluster is healthy.
2015-06-16 14:07:45 -07:00
973cfbebda *: make dial timeout configurable
Dial timeout is set shorter because
1. etcd is supposed to work in a good environment, and the new value is long
enough
2. a shorter dial timeout makes dials fail faster, which is good for
performance

Conflicts:
	etcdmain/etcd.go
2015-06-16 14:06:18 -07:00
00d1d34cf8 Merge pull request #2832 from yichengq/stream-2.1
not print unhelpful info when connecting to etcd 2.1
2015-05-27 13:36:38 -07:00
fcf81fd6bf *: bump to v2.0.11+git 2015-05-15 13:55:13 -07:00
0678329cd6 *: bump to v2.0.11 2015-05-15 13:54:32 -07:00
9a0e0c2eae etcdmain: better error msg when detected duplicate id in discovery
Conflicts:
	etcdmain/etcd.go
2015-05-15 13:47:02 -07:00
3e4d57c37d pkg/fileutil: add plan9 lockfile support 2015-05-15 13:35:51 -07:00
d30e764b2d version: added more version information
added more version information output to aid debugging
print etcd Version, Git SHA, Go runtime version, OS
and architecture

Fixes #2560

Conflicts:
	version/version.go
2015-05-15 12:34:33 -07:00
b5b7c78f1b docs: proxy needs accessible advertise client urls
Users cannot use the proxy unless -advertise-client-urls is set correctly.
Mention this explicitly in the doc to help them avoid the wrong
settings.
2015-05-15 12:32:58 -07:00
ee1c07c3d4 proxy: Fix connection leak when client disconnect
Established connections were leaked when a client disconnected before
proxyreq completed. This happens all the time for wait=true requests.
2015-05-15 12:32:49 -07:00
67c5d4dfd2 etcdmain: advertise-client-urls must be set if listen-client-urls is set
Before this PR, people could set listen-client-urls without setting
advertise-client-urls, leaving advertise-client-urls at its default
localhost value. Client libraries that sync the cluster info then
fetch the wrong advertise-client-urls and cannot connect to the cluster.
This PR avoids this case and provides better UX.

On the other hand, this change is safe because people always want to set
advertise-client-urls if listen-client-urls is set. The default localhost
advertise url cannot be accessed from the outside, so the flag should always be
set except when etcd is bootstrapped with no flags.

Conflicts:
	etcdmain/etcd.go
2015-05-15 12:32:35 -07:00
3afcbd6f83 docs: clarify the disaster recovery guide
A bit was missing from the documentation on disaster recovery, the reset
of the advertised peer urls for the node recovered from backup. Without
that, any subsequent server joining the cluster would not be able to
speak to the first node.
2015-05-15 12:30:48 -07:00
8fed61b2eb client: 410 is a valid response for member.Remove
When removing a member, etcdserver might return 410, which indicates that
the member has already been removed. To the client, 410 is a valid response since
the client might retry internally.
2015-05-15 12:30:37 -07:00
c8d386e18c pkg/fileutil: add filelock support for solaris 2015-05-15 11:30:40 -07:00
2b6a44b7b0 raft: fix typo in raftlog
fix typo in String() method of raftlog which will misorder
the "committed" and "unstable.offset" output.
2015-05-15 11:30:32 -07:00
8069d08b96 etcdserver: init server stats before passing it as argument
It is more reasonable to init the variable before passing it as an
argument.

It fixes a bug that etcdserver may panic on server stats when processing
a message from rafthttp streamReader before server stats is initialized
in server.Start().
2015-05-15 11:30:23 -07:00
5074235254 rafthttp: stop printing log when attaching stream with the same term
There is no need to print log when attaching stream with the same
term because the stream is installed back immediately.

This happens a lot when etcd 2.1 connects to etcd 2.0, so we make
the change.
2015-05-14 21:52:59 -07:00
f59bddd74b rafthttp: not log endpoint unsupport error
The error happens a lot when running 2.0 together with 2.1, and is
totally unhelpful.
2015-05-07 14:21:15 -07:00
58f035844c Merge pull request #2753 from yichengq/fix-remove-panic
backport #2701 to release-2.0 branch
2015-04-28 21:17:55 -07:00
f83774b4cd integration: add tests around the membership change issues 2015-04-24 13:49:17 -07:00
12c32137a8 rafthttp: add AddRemote
Add remotes to rafthttp, which help newly joined members catch up with the
progress of the cluster. It supports basic message sending to a remote, and
has no stream connection for simplicity. Remotes will not be used
after the latest peers have been added into rafthttp.

Conflicts:
	rafthttp/pipeline.go
	rafthttp/transport.go
2015-04-24 13:37:16 -07:00
fce4cf4dc8 Revert "etcdserver: fix cluster fallback recovery"
This reverts commit cff005777a.
2015-04-24 13:06:43 -07:00
06a72b2702 *: bump to v2.0.10+git 2015-04-22 15:21:59 -07:00
fbaef05885 *: bump to v2.0.10 2015-04-22 15:21:38 -07:00
31a94d28e3 etcdctl: add extended as output format
extended wasn't documented in the help as one of the output formats, fix
this!

Conflicts:
	etcdctl/main.go
2015-04-22 15:11:06 -07:00
88660a303f snap: load should only return ErrNoSnapshot
If there is no available snapshot, load should return
ErrNoSnapshot. etcdserver might recover from that error
if it still has complete WAL files.
2015-04-22 15:09:38 -07:00
53c74dbd0b etcdserver: prevExist=true + condition is compareAndSwap
PrevExist indicates the key should exist. Condition compares with
an existing key. So PrevExist+condition = CompareAndSwap not Update.
2015-04-22 15:09:28 -07:00
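The rule in 53c74dbd0b can be sketched as a small decision function. Only the "prevExist=true plus a condition means CompareAndSwap" branch comes from the commit message. The function name `putAction`, its parameters, and the other branches are assumptions made to give that rule some context, not the etcdserver code.

```go
package main

import "fmt"

// putAction (hypothetical) names the store operation a PUT maps to.
// prevExist is nil when the client did not send the parameter.
func putAction(prevExist *bool, prevValue string, prevIndex uint64) string {
	hasCondition := prevValue != "" || prevIndex > 0
	switch {
	case prevExist != nil && !*prevExist:
		return "Create" // assumed: the key must not exist yet
	case hasCondition:
		// Covers prevExist=true + condition: the condition compares against
		// an existing key, so this is a CompareAndSwap, not an Update.
		return "CompareAndSwap"
	case prevExist != nil:
		return "Update" // assumed: prevExist=true with no condition
	default:
		return "Set" // assumed: no prevExist and no condition
	}
}

func main() {
	yes := true
	fmt.Println(putAction(&yes, "old", 0)) // CompareAndSwap
	fmt.Println(putAction(&yes, "", 0))    // Update
}
```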
8a8af60fad etcdctl: backup tool should use the new layout 2015-04-22 15:09:15 -07:00
7de19fefe8 etcdserver: fix minor bug in EtcdServer.send
It does not seem to be anything serious.
After peers are deleted, the log may output:
"etcdserver: send message to unknown receiver %s"
2015-04-22 15:09:04 -07:00
7750f387b0 wal: better log msg 2015-04-22 15:08:50 -07:00
e33ab24442 wal: never leave a corrupted wal file
If the process dies during wal.cut(), it might leave a corrupted wal
file. This commit solves the problem by creating a temp wal file first,
then atomically renaming it to a wal file when we are sure it is valid.

Conflicts:
	wal/wal.go
2015-04-22 15:08:42 -07:00
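The temp-file-then-rename pattern described in e33ab24442 can be sketched with the standard library as follows. This is a generic illustration under assumptions, not the wal package's code: `writeFileAtomic`, `demo`, and the file name are hypothetical, and the sketch does not fsync the parent directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic (hypothetical) writes data to a temp file in the same
// directory, syncs it, and only then renames it to the final path. A crash
// before the rename leaves a stray temp file, never a half-written target.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	// Cleans up the temp file on failure; fails harmlessly after a successful rename.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes one file into a scratch directory and reads it back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "wal-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "segment.wal")
	if err := writeFileAtomic(path, []byte("entry-1")); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the final step atomic on POSIX systems.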
fce2c1eeaf discovery: drop trailing . from srv target 2015-04-22 15:06:20 -07:00
6a3bb93305 discovery: add a test case for srv
During srv discovery, it should try to match local member with
resolved addr and return unresolved hostnames for the cluster.

Conflicts:
	discovery/srv_test.go
2015-04-22 15:06:03 -07:00
21455d2f3b *: stop using resolved tcp addr
We started to resolve hosts into tcp addrs since we generate a
tcp based initial-cluster during srv discovery. However, it
creates problems around tls and cluster verification. The
srv discovery only needs to use the resolved tcp addr to
find the local node. It does not have to resolve everything
and use the resolved addrs.

This fixes #2488 and #2226
2015-04-22 14:59:07 -07:00
51bb4220c5 Clarify that it is the proxy doing the shuffle. 2015-04-22 14:58:54 -07:00
d8c506923f proxy: shuffle endpoints
Shuffle endpoints to avoid being "stuck" to a single cluster member.
2015-04-22 14:58:40 -07:00
5d778f85ca *: bump to v2.0.9+git 2015-04-07 15:18:50 -07:00
02697ca725 *: bump to v2.0.9 2015-04-07 15:18:29 -07:00
bd693c7069 etcdctl: refactor message in import command 2015-04-07 15:16:13 -07:00
52c90cdcfb etcdctl: import hidden keys 2015-04-07 14:49:40 -07:00
a88b22ac0a store: fix watcher removal 2015-04-07 14:46:10 -07:00
e93f8b8a12 *: bump to v2.0.8+git 2015-03-31 14:29:38 -07:00
86e616c6e9 *: bump to v2.0.8 2015-03-31 14:29:13 -07:00
5ae55a2c0d etcdctl: fix import typos 2015-03-31 13:48:18 -07:00
62ce6eef7b etcdctl: main routine of import command should wait for goroutine existing 2015-03-31 13:26:15 -07:00
7df4f5c804 build: do not build internal debugging tool
We are still playing around with the dump-log tool.
Stop building it publicly until we are happy with its
ux and functionality.
2015-03-31 13:26:05 -07:00
461c24e899 etcdct: adopt new client port by default
etcdserver uses both 4001 and 2379 for serving client requests by
default. etcdctl supports both ports by default.
2015-03-31 13:25:56 -07:00
6d90d03bf0 etcdctl: add migratesnap command 2015-03-31 13:25:39 -07:00
9995e80a2c Revert "etcdhttp: add internalVersion"
This reverts commit a77bf97c14.

Conflicts:
	version/version.go
2015-03-31 13:25:22 -07:00
229405f113 *: remove upgrading related stuff 2015-03-31 13:24:28 -07:00
b3f2a998d4 docs: add clarity about the 1000 events history
When talking about missing events on a particular key, the 1000 event history
limit can be understood as being per key, instead of etcd-wide events. Make it
clear that it is across all etcd keys.
2015-03-31 13:24:19 -07:00
8436e901e9 etcdserver: loose member validation for joining existing cluster 2015-03-31 13:24:07 -07:00
c03f5cb941 *: bump to v2.0.7+git 2015-03-24 23:14:38 -07:00
0cb90e4bea *: bump to v2.0.7 2015-03-24 23:07:57 -07:00
df83b1b34e wal: fix missing import 2015-03-24 23:00:04 -07:00
f2bef04009 wal: releastTo should work with large release index 2015-03-24 22:51:02 -07:00
02198336f6 version: not return err NotExist in Detect 2015-03-24 22:50:44 -07:00
0c9a226e0e etcdserver: print out extra files in data dir instead of erroring 2015-03-24 22:50:33 -07:00
5bd1d420bb etcdserver: add join-existing check 2015-03-24 22:49:41 -07:00
a1cb5cb768 etcdmain: print error when non-flag args remain 2015-03-24 22:49:31 -07:00
acba49fe81 *: bump to v2.0.6+git 2015-03-23 14:05:08 -07:00
e3c902228b *: bump to v2.0.6 2015-03-23 13:52:00 -07:00
52a2d143d2 migrate: remove starter code
It has been moved to github.com/coreos/etcd-starter.
2015-03-21 11:15:26 -07:00
f53d550a79 store: fixed clone error for store stats. 2015-03-21 11:14:06 -07:00
63b799b891 migrate: detect version 2.0.1
Without this code a second start will crash:

```
$ ./bin/etcd -name foobar --data-dir=foobar
2015/03/18 18:06:28 starter: detect etcd version 2.0.1 in foobar
2015/03/18 18:06:28 starter: unhandled etcd version in foobar
panic: starter: unhandled etcd version in foobar

goroutine 1 [running]:
log.Panicf(0x594770, 0x25, 0x208927c70, 0x1, 0x1)
	/usr/local/go/src/log/log.go:314 +0xd0
github.com/coreos/etcd/migrate/starter.checkInternalVersion(0x20889a480, 0x0, 0x0)
	/Users/philips/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/migrate/starter/starter.go:160 +0xf2f
github.com/coreos/etcd/migrate/starter.StartDesiredVersion(0x20884a010, 0x3, 0x3)
	/Users/philips/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/migrate/starter/starter.go:77 +0x2a9
main.main()
	/Users/philips/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/main.go:46 +0x25e

goroutine 9 [syscall]:
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
	/usr/local/go/src/os/signal/signal_unix.go:27 +0x35
```
2015-03-21 11:13:55 -07:00
697883fb8c etcdmain: let user provide a name w/o initial-cluster update
Currently this doesn't work if a user wants to try out a single machine
cluster but change the name for whatever reason. This is because the
name is always "default" and the

```
./bin/etcd -name 'baz'
```

This solves our problem on CoreOS where the default is `ETCD_NAME=%m`.
2015-03-21 11:13:42 -07:00
f794f87f26 Documentation: fixup grammar around the unsafe flags 2015-03-21 11:13:28 -07:00
0847986d4a etcdmain: identify data dir type 2015-03-21 11:12:18 -07:00
9ea80c6ac1 raft: fix godoc about starting a node 2015-03-21 11:11:21 -07:00
02fb648abf etcdmain: verify heartbeat and election flag 2015-03-21 11:11:09 -07:00
4c9e1686b1 pkg/flags: Add support for IPv6 addresses
Support IPv6 address for ETCD_ADDR and ETCD_PEER_ADDR

pkg/flags: Support IPv6 address for ETCD_ADDR and ETCD_PEER_ADDR

pkg/flags: tests for IPv6 addr and bind-addr flags

pkg/flags: IPAddressPort.Host: do not enclose IPv6 address in square brackets

pkg/flags: set default bind address to [::] instead of 0.0.0.0

pkg/flags: we don't need fmt any more

also, one minor fix: net.JoinHostPort takes string as a port value

pkg/flags: fix ipv6 tests

pkg/flags: test both IPv4 and IPv6 addresses in TestIPAddressPortString

etcdmain: test: use [::] instead of 0.0.0.0
2015-03-21 11:05:20 -07:00
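Two notes in 4c9e1686b1 (keep the IPv6 host without square brackets, and pass the port to net.JoinHostPort as a string) fit together as in the sketch below. The helper name `hostPort` is an assumption. `net.JoinHostPort` and `strconv.Itoa` are standard library calls.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// hostPort (hypothetical helper) takes a bare IP, without brackets, and lets
// net.JoinHostPort add brackets when the host is an IPv6 literal.
// JoinHostPort takes the port as a string, hence strconv.Itoa.
func hostPort(ip string, port int) string {
	return net.JoinHostPort(ip, strconv.Itoa(port))
}

func main() {
	fmt.Println(hostPort("::", 2379))        // [::]:2379
	fmt.Println(hostPort("127.0.0.1", 4001)) // 127.0.0.1:4001
}
```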
0fb9362c5c *: bump to v2.0.5+git 2015-03-11 17:00:51 -07:00
9481945228 *: bump to v2.0.5 2015-03-11 11:33:43 -07:00
e13b09e4d9 wal: fix ReleaseLockTo
ReleaseLockTo should not release the lock on the WAL
segment that is right before the given index. When
restarting etcd, etcd needs to read from the WAL segment
that has a smaller index than the snapshot index.

The correct behavior is that ReleaseLockTo releases
the locks w is holding so that w only holds one lock
that has an index smaller than the given index.
2015-03-10 09:45:46 -07:00
78e0149f41 raft: do not reset vote if term is not changed
raft MUST keep the voting information for the same term. reset
should not reset vote if term is not changed.
2015-03-10 09:42:45 -07:00
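The invariant in 78e0149f41 can be sketched like this. The `node` type and its fields are a stand-in for illustration, not the raft package's struct. The point is that the vote is cleared only when the term actually changes.

```go
package main

import "fmt"

// none marks "no vote cast" (assumed sentinel for this sketch).
const none uint64 = 0

// node is a hypothetical stand-in holding just the state the rule concerns.
type node struct {
	term uint64
	vote uint64
}

// reset keeps the vote when called again for the same term, so a node
// cannot vote twice in one term; the vote is cleared only on a new term.
func (n *node) reset(term uint64) {
	if n.term != term {
		n.term = term
		n.vote = none
	}
}

func main() {
	n := &node{term: 3, vote: 2}
	n.reset(3)
	fmt.Println(n.vote) // 2: same term, vote kept
	n.reset(4)
	fmt.Println(n.vote) // 0: new term, vote cleared
}
```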
4c86ab4868 pkg/transport: fix downgrade https to http bug in transport
If the TLS config is empty, etcd downgrades https to http without a warning.
This commit avoids the downgrade and stops etcd from bootstrapping if it cannot
listen on TLS.
2015-03-10 09:39:01 -07:00
59327bab47 pkg/transport: set the maxIdleConnsPerHost to -1
For transports that use timeout connections, we set
maxIdleConnsPerHost to -1. The default transport does not clear
the timeout on connections it marks as idle, so connections
with a timeout cannot be reused.
2015-03-10 09:38:39 -07:00
62ed1ebf03 Documentation: fix "Missing infra1="
Documentation: fix "Missing infra1="
2015-03-10 09:38:27 -07:00
cea3448438 *: bump to v2.0.4+git 2015-02-27 12:25:50 -08:00
1a2c6d3f2f *: bump to v2.0.4 2015-02-26 22:01:24 -08:00
ecf7c27697 Merge pull request #2374 from wellbehavedsoftware/fix-2373
etcdtcl: fix etcdctl cluster-health ignores SSL settings
2015-02-25 07:44:10 -08:00
05ecdbc617 etcdtcl: fix etcdctl cluster-health ignores SSL settings
etcdctl reconnects to the leader, but was not picking up ssl settings in this
case, which causes it to show unhealthy when this is not the case.

Fixes #2373
2015-02-25 13:19:07 +01:00
6648b7e302 Merge pull request #2363 from yichengq/329
migrate/starter: fix v2 data dir checking
2015-02-24 22:44:10 -08:00
194105e02c Merge pull request #2369 from jonsyu1/master
Documentation fixes for proxy
2015-02-24 21:39:20 -08:00
31bfffaa48 Documentation: standardize on url over URL
url and URL both appear in this doc. Choose url due to higher frequency
2015-02-24 16:26:27 -05:00
1fbaf9dbb7 Documentation: fix discovery flag for proxy docs
It seems that the -discovery flag used to be -discovery-url. Updated this to use
the currently documented and supported -discovery flag.
2015-02-24 16:25:18 -05:00
3fd9136740 migrate/starter: fix v2 data dir checking 2015-02-24 11:47:56 -08:00
a560c52815 Merge pull request #2354 from xiang90/wait_time
pkg/wait: add WaitTime
2015-02-23 14:29:39 -08:00
53d20a8a29 pkg/wait: add WaitTime
WaitTime waits on deadline instead of id.
2015-02-23 14:26:42 -08:00
4b72095bd3 Merge pull request #2350 from jonsyu1/master
Fixed sample command flags in proxy docs
2015-02-23 09:19:15 -08:00
28e150e50e Documentation: fix sample command flags for proxy
The docs mention the listen-client-urls flag, but the examples use
client-listen-urls, which is an invalid flag.
2015-02-23 11:15:42 -05:00
4d0472029a Merge pull request #2348 from yichengq/326
etcdserver: fix cluster fallback recovery
2015-02-21 12:16:08 -08:00
e54fdfd9cc Merge pull request #2349 from yichengq/327
rafthttp: fix panic on receiving empty ents
2015-02-20 15:15:43 -08:00
ca390560f9 rafthttp: fix panic on receiving empty ents
2.0 rc may send empty ents. Fix it for backward compatibility.
2015-02-20 15:07:27 -08:00
cff005777a etcdserver: fix cluster fallback recovery
Cluster and transport may recover to old states when a new node joins
the cluster. Record the cluster's last modified index to avoid this.
2015-02-20 14:30:00 -08:00
d57e07dcde Merge pull request #2347 from bdarnell/fix-nyet-test
Fix test for existence of go-nyet.
2015-02-20 14:07:55 -05:00
79bc3f4774 Fix test for existence of go-nyet.
When the file is not found, `which` returns an empty string,
which passes the -f test. `command -v` is the most portable alternative
to `which` per
http://stackoverflow.com/questions/592620/check-if-a-program-exists-from-a-bash-script/677212#677212
2015-02-20 14:02:43 -05:00
d2b0dd2419 Merge pull request #2345 from bdarnell/normal-entry-formatter
Only use the EntryFormatter for normal entries.
2015-02-20 11:00:12 -08:00
b53dc0826e Only use the EntryFormatter for normal entries.
ConfChange entries also have a Data field but the application-supplied
formatter won't know what to do with them.
2015-02-20 13:51:14 -05:00
0ea2173a7e Merge pull request #2343 from xiang90/fix_kill
osutil: pid 1 should exit directly instead of trying to kill itself
2015-02-20 09:01:49 -08:00
7ae94f2bf0 osutil: pid 1 should exit directly instead of trying to kill itself 2015-02-19 20:27:50 -08:00
4228c703a7 Merge pull request #2341 from yichengq/326
migrate/starter: fix flag parsing
2015-02-19 11:02:07 -08:00
10629c40e1 migrate/starter: fix flag parsing 2015-02-18 23:47:52 -08:00
e2928cd97a Merge pull request #2242 from barakmich/acl_doc
docs: Add v2 ACL RFC
2015-02-18 23:31:26 -08:00
40365c4f8d docs: add Security RFC
docs: Add v2 ACL RFC

Add workflow, fix terminology, make the API JSON, and general cleanup

fixes from xiang90s comments

add permissions struct

update regarding glob matches

rename file
2015-02-18 14:34:00 -05:00
88994f9ec8 Merge pull request #2335 from xiang90/dump-tool
tool: dump tool supports index
2015-02-18 09:35:49 -08:00
d6f8a30f7c tool: dump tool supports index 2015-02-18 09:13:47 -08:00
7c65857283 Merge pull request #2327 from barakmich/remove_shadowing
*: remove shadowing of variables from etcd and add travis test
2015-02-17 17:46:41 -05:00
92dca0af0f *: remove shadowing of variables from etcd and add travis test
We've been bitten by this enough times that I wrote a tool so that
it never happens again.
2015-02-17 16:31:42 -05:00
0a5707420b Merge pull request #2326 from yichengq/325
migrate/functional: fix `go build` failure
2015-02-17 10:46:39 -08:00
90b06f874d migrate/functional: fix go build failure 2015-02-17 10:35:30 -08:00
66199afb25 Merge pull request #2322 from kelseyhightower/add-etcd-docker-guide
doc: add etcd docker guide
2015-02-16 12:43:17 -08:00
217a1f0730 doc: add etcd docker guide
Fixes #2253
2015-02-16 11:44:41 -08:00
def62071f0 Merge pull request #2320 from xiang90/fix_error
etcdserver: fix error message when valide the discovery cluster
2015-02-16 09:53:24 -08:00
beb44ef6ba etcdserver: fix error message when valide the discovery cluster 2015-02-16 09:53:01 -08:00
d1ed54b734 Merge pull request #2317 from zhangbaitong/master
docs:small fix
2015-02-16 08:28:37 -08:00
518eb9fa2f docs:small fix
Signed-off-by: zhangbaitong <zhangbaitong@163.com>
2015-02-16 17:54:24 +08:00
73e67628d9 Merge pull request #2313 from xiang90/cluster_mu
etcdserver: move the mutex before what it guards
2015-02-14 23:05:53 -08:00
04bd06d20b etcdserver: move the mutex before what it guards 2015-02-14 22:26:12 -08:00
29f05bb217 Merge pull request #2307 from xiang90/refactor_cluster
etcdserver: getOtherPeerURLs -> getRemotePeerURLs
2015-02-14 20:59:38 -08:00
c5ca1218f3 etcdserver: GetClusterFromPeers -> GetClusterFromRemotePeers 2015-02-13 19:05:29 -08:00
f7540912d6 etcdserver: getOtherPeerURLs -> getRemotePeerURLs 2015-02-13 18:56:45 -08:00
0fcbadc10b Merge pull request #2305 from xiang90/fix_win
osutil: fix win build
2015-02-13 16:39:07 -08:00
e44dc0f3fe osutil: fix win build 2015-02-13 16:33:39 -08:00
4d728cc8c4 *: bump to v2.0.3 2015-02-13 15:27:24 -08:00
f7998bb2db Merge pull request #2304 from xiang90/fix_discovery_validation
etcdserver: validate discovery cluster
2015-02-13 14:41:09 -08:00
cfa7ab6074 etcdserver: validate discovery cluster 2015-02-13 14:32:24 -08:00
b59390c9c3 Merge pull request #2293 from barakmich/etcd_underscore
migrate: stop deleting _etcd
2015-02-13 17:10:14 -05:00
fdebf2b109 fix parent references 2015-02-13 16:54:15 -05:00
e9f4be498d migrate: decrease memory usage (only duplicate machines) 2015-02-13 15:26:54 -05:00
6d9d7b4497 Merge pull request #2302 from xiang90/fix_travis
integration: wait for slow travis
2015-02-13 11:49:37 -08:00
163ea3f5c5 integration: wait for slow travis 2015-02-13 11:41:03 -08:00
ea1e54b2a1 Merge pull request #2291 from ArtfulCoder/master
Added go build flag '-installsuffix cgo' to create a static library for etcd and etcdctl
2015-02-13 11:23:03 -08:00
b31109cfd7 Merge pull request #2290 from xiang90/fix_transport
etcdserver: recover transport when recovering from a snapshot
2015-02-13 10:23:29 -08:00
7a909c3950 Merge pull request #2282 from matishsiao/patch-1
add etcd-console tool to tools list
2015-02-13 10:20:31 -08:00
c16cc3a6a3 etcdserver: recover transport when recovering from a snapshot 2015-02-13 10:16:28 -08:00
d7840b75c3 Merge pull request #2301 from xiang90/fix_snap
integration: fix test
2015-02-13 10:03:45 -08:00
aed2c82e44 integration: fix test 2015-02-13 10:02:42 -08:00
39ee85470f Merge pull request #2300 from xiang90/fix_snap
etcdserver: fix snapshot
2015-02-13 09:56:19 -08:00
fbc4c8efb5 etcdserver: fix snapshot 2015-02-13 09:54:25 -08:00
12999ba083 Merge pull request #2298 from barakmich/issue2295
etcdserver: Unmask the snapshotter. Fixes #2295
2015-02-13 09:38:58 -08:00
a0e3bc9cbd etcdserver: Unmask the snapshotter. Fixes #2295 2015-02-13 11:56:00 -05:00
b06e43b803 Merge pull request #2289 from fabxc/feature/graceful_shutdown
main: shutdown gracefully.
2015-02-13 07:34:07 -08:00
8bf795dc3c etcdmain/osutil: shutdown gracefully, interrupt handling
The functionality in pkg/osutil ensures that all interrupt handlers finish
and the process kills itself with the proper signal.
A test for interrupt handling is added.
The server shuts down gracefully by stopping on interrupt (Issue #2277).
2015-02-13 10:28:53 +01:00
02c52f175f migrate: stop deleting etcd 2015-02-12 19:35:33 -05:00
daf1a913bb Merge pull request #2287 from Amit-PivotalLabs/master
rafthttp/transport.go: Fix nil pointer dereference in RemovePeer
2015-02-12 14:49:12 -08:00
317e57a8a8 rafthttp: Panic informatively when removing unknown peer ID 2015-02-12 14:43:44 -08:00
5c0d3889f8 Added go build flag '-installsuffix cgo' to create a static library. This is needed when go 1.4 is used to build. 2015-02-12 14:08:02 -08:00
a71184424a *: bump to v2.0.2+git 2015-02-12 11:41:48 -08:00
409daceb73 *: bump to v2.0.2 2015-02-12 11:14:50 -08:00
c6cc276ef0 Merge pull request #2286 from barakmich/fix_migrations
etcdserver: Canonicalize migrations
2015-02-12 12:53:33 -05:00
cd50f0e058 etcdserver: Create MemberDir() and base {Snap,WAL}Dir() thereon. Audit DataDir. 2015-02-12 12:45:19 -05:00
fade9b6065 etcdserver: Refactor 2.0.1 directory rename into a proper migration
fix all instances

fix detection test
2015-02-12 11:53:19 -05:00
590205b8c0 Merge pull request #2284 from xiang90/cleanup
Cleanup
2015-02-11 16:21:10 -08:00
163f0f09f6 etcdserver: cleanup cluster_util 2015-02-11 16:20:38 -08:00
20497f1f85 etcdserver: move remote cluster retrive to cluster_util.go 2015-02-11 14:03:14 -08:00
4a0887ef7a Merge pull request #2283 from xiang90/etcd-dump
etcd dump
2015-02-11 11:24:05 -08:00
161b1d2e2e tools: etcd-dump-logs tool support dump from a given snapshot file 2015-02-11 10:50:04 -08:00
71bed48916 snap: add Read function 2015-02-11 10:21:19 -08:00
fe1d9565c2 *: bump to 2.0.1 2015-02-10 20:19:35 -08:00
fd90ec6c26 add etcd-console tool to tools list
Add the etcd-console tool to the tools list for reference.
2015-02-11 10:43:21 +08:00
a81e147d8f Merge pull request #2281 from robszumski/docs-migrate
docs: add diagram and restructure for clarity
2015-02-10 17:45:24 -08:00
24b953a55d docs: add diagram and restructure for clarity 2015-02-10 17:34:23 -08:00
54bef0d2cd Merge pull request #2233 from yichengq/315
docs: add allow_legacy_mode.md
2015-02-10 15:46:52 -08:00
d0677a24dd docs: add allow_legacy_mode.md 2015-02-10 15:46:26 -08:00
bdc8cc1f54 Merge pull request #2278 from xiang90/ctl
etcdctl: add default peerurl for upgrade subcmd
2015-02-10 15:35:04 -08:00
b036c384a5 Merge pull request #2280 from gabesullice/typo-fix
documentation: fix typo in Documentation/clustering.md
2015-02-10 15:24:17 -08:00
df2a689d1c documentation: fix typo in Documentation/clustering.md
just an extra space needed to be removed.

Fixes #2279
2015-02-10 16:18:51 -07:00
f97a263a95 etcdctl: add default peerurl for upgrade subcmd 2015-02-10 15:13:12 -08:00
96ea0ff45c Merge pull request #2274 from xiang90/fix_stats
rafthttp: remove follower from leaderstats when it is removed from the c...
2015-02-10 11:27:29 -08:00
58112c4d2d rafthttp: remove follower from leaderstats when it is removed from the cluster 2015-02-10 11:22:33 -08:00
d74e74d320 Merge pull request #2261 from yichengq/322
migrate: fix setting commit index from snapshot
2015-02-10 09:57:24 -08:00
9834875d35 Merge pull request #2271 from yichengq/323
etcdmain: infer bind addr from addr in v1 flagset
2015-02-10 09:53:37 -08:00
9460b6efda Merge pull request #2267 from xiang90/fix_snapconf
etcdserver: save confstate when apply new snapshot
2015-02-10 09:44:10 -08:00
57dd8c18cc etcdmain: infer bind addr from addr in v1 flagset 2015-02-10 09:42:10 -08:00
9ec8ea47c8 Merge pull request #2272 from yichengq/324
rafthttp: not send 0-entry MsgApp using stream
2015-02-10 09:40:32 -08:00
6e1aecfc6f etcdserver: save confstate when apply new snapshot 2015-02-10 07:31:25 -08:00
96fde55a0f rafthttp: not send 0-entry MsgApp using stream
It is not sent out because it is useless to let remote raft step the
message.
Moreover, MsgApp stream reader can always assume that the length
of entries sent is > 0.
2015-02-10 00:02:22 -08:00
84dac75ed5 Merge pull request #2213 from yichengq/317
migrate/starter: fix --version output
2015-02-09 23:18:35 -08:00
1481ef9a5e Merge pull request #2264 from xiang90/pause
rafttest: support node pause
2015-02-09 18:54:56 -08:00
fa66055f66 rafttest: drop isPaused 2015-02-09 18:52:34 -08:00
085b608de9 rafttest: support node pause 2015-02-09 16:26:43 -08:00
3c9c4c4afa Merge pull request #2249 from xiang90/rttest
raftest: wait for network sending
2015-02-09 15:55:30 -08:00
279b216f9a raftest: wait for network sending 2015-02-09 15:52:16 -08:00
8788c74b48 Merge pull request #2263 from xiang90/cread
Documentation: document kv api change
2015-02-09 15:46:45 -08:00
8d663078bf Documentation: document kv api change 2015-02-09 15:35:15 -08:00
0242faa838 Merge pull request #2257 from yichengq/321
docs: fix stats response in api.md
2015-02-09 14:46:57 -08:00
9c850b7182 Merge pull request #2259 from xiang90/healthy
etcdctl: support healthy checking
2015-02-09 14:38:43 -08:00
db88d9764c migrate: fix setting commit index from snapshot 2015-02-09 14:38:38 -08:00
7bbdad9068 etcdctl: support healthy checking 2015-02-09 14:35:24 -08:00
af00536d71 Merge pull request #2252 from xiang90/raftdelay
rafttest: add network delay
2015-02-09 13:14:32 -08:00
c990099008 docs: fix stats response in api.md 2015-02-09 11:48:54 -08:00
65cd0051fe rafttest: add network delay 2015-02-06 15:01:07 -08:00
c94db98177 Merge pull request #2250 from xiang90/raftnt
rafttest: add network drop
2015-02-06 12:40:55 -08:00
d423946fa4 rafttest: add network drop 2015-02-06 10:50:55 -08:00
e2feafc741 Merge pull request #2241 from peterrosell/correct_defaults_in_tuning
Correct defaults for heartbeat and election
2015-02-06 07:41:47 -08:00
c8b5d47f24 Documentation: Correct defaults for heartbeat and election
Defaults for heartbeat-interval and election-timeout are updated according to the configuration documentation.
2015-02-06 10:13:57 +01:00
d71be31e68 Merge pull request #2245 from xiang90/fix_store
store: fix modifiedindex in node clone
2015-02-05 22:42:07 -08:00
9776e6d082 store: fix modifiedindex in node clone 2015-02-05 22:26:52 -08:00
766e0ad901 Merge pull request #2236 from philips/remove-becomes
Small grammar fixes
2015-02-05 11:26:16 -08:00
a387e2a989 Merge pull request #2232 from yichengq/319
fix the problem of StoreKeysPrefix key in store
2015-02-05 07:58:49 -08:00
26dc5904a5 Merge pull request #2235 from yichengq/318
migrate/functional: add Upgrade TLS V1 cluster test
2015-02-04 21:56:42 -08:00
136e0b6e26 migrate/functional: add Upgrade TLS V1 cluster test 2015-02-04 21:49:42 -08:00
599e821309 etcdctl/upgrade: use peer flags for peer transport 2015-02-04 21:49:42 -08:00
1ce7f6e0d0 migrate/starter: fix --version output 2015-02-04 21:28:56 -08:00
860a8c8717 Documentation: grammar fixup in admin guide
Rephrase to avoid "becomes".
2015-02-04 21:28:43 -08:00
a4c4027dc7 rafthttp: becomes -> became in log line
Simple grammar fix.
2015-02-04 21:28:23 -08:00
3ac0298bd0 store: set readonly to pre-defined namespaces 2015-02-04 16:47:08 -08:00
f13c7872d5 etcdserver: register pre-defined namespaces in store 2015-02-04 16:33:40 -08:00
38038e476a Merge pull request #2230 from yichengq/315
pkg/osutil: add Unsetenv
2015-02-04 10:43:43 -08:00
871e92ef73 pkg/osutil: add Unsetenv
go1.4 doesn't support static linking well, so we stay on go1.3 for a while.
Implement Unsetenv the go1.3 way.
2015-02-04 10:29:20 -08:00
58cb9a3b76 Merge pull request #2222 from yichengq/315
migrate/starter: unset discovery when setting initial-cluster
2015-02-03 23:27:43 -08:00
a0f8aa1add travis: use latest go tool repo 2015-02-03 23:23:02 -08:00
5c6ce0c18d travis: use go1.4 for new feature os.Unsetenv() 2015-02-03 23:11:08 -08:00
378fa46b7d Merge pull request #2224 from xiang90/raftt
rafttest: separate network interface and network
2015-02-03 23:11:01 -08:00
83edf0d862 rafttest: separate network interface and network 2015-02-03 22:50:27 -08:00
d0205519a8 migrate/starter: unset discovery when setting initial-cluster 2015-02-03 18:29:52 -08:00
fca9805f84 Merge pull request #2221 from yichengq/315
migrate/starter: fix default version dir
2015-02-03 14:57:27 -08:00
f109020b94 migrate/starter: fix default version dir 2015-02-03 14:56:26 -08:00
81d7eaf17f Merge pull request #2205 from yichengq/315
migrate: support standby mode upgrade
2015-02-03 11:07:49 -08:00
2d081bd3b9 migrate: support standby mode upgrade 2015-02-03 10:59:43 -08:00
f2f2adc663 migrate/functional: always run tests on CoreOS image 2015-02-03 10:59:43 -08:00
92b329fdb9 etcdmain: use symlink instead of link for v0.4 files
link doesn't support directory.
2015-02-03 10:59:43 -08:00
00eaf165a8 Merge pull request #2212 from xiang90/rt
raftest: add restart and related simple test
2015-02-03 10:15:23 -08:00
b147a6328d raftest: add restart and related simple test 2015-02-03 10:08:52 -08:00
afb14a3e7a Merge pull request #2210 from yichengq/316
etcdmain: use /member subdir to save member data
2015-02-02 17:06:30 -08:00
ce1d7a9fa9 etcdmain: use /member subdir to save member data 2015-02-02 17:01:19 -08:00
470be16c04 Merge pull request #2209 from xiang90/fix_proxy
etcd: fix proxy
2015-02-02 15:02:25 -08:00
fbabcedcc9 etcd: fix proxy
1. move proxy datadir to /proxy subdir.
2. delay update proxy's cluster after validation.
2015-02-02 14:58:45 -08:00
d16c5e1e81 Merge pull request #2203 from xiang90/raft_test
raft: add raft test suite
2015-02-01 14:57:09 -08:00
d65af21b73 raft: add raft test suite 2015-02-01 14:53:22 -08:00
bdcae31638 Merge pull request #2202 from xiang90/proxy
etcd: fix proxy updating
2015-01-30 17:00:07 -08:00
ae9f54c132 etcd: fix proxy updating 2015-01-30 16:56:41 -08:00
a3d0097908 Merge pull request #2201 from barakmich/member_suggestion
etcdctl: give more helpful suggestions on removal
2015-01-30 19:51:22 -05:00
37e8d608b3 add documentation link and describe the 404/500 errors better 2015-01-30 19:41:44 -05:00
c66176b538 etcdctl: give more helpful suggestions on removal 2015-01-30 19:23:19 -05:00
b6936a0079 docs: fix broken link 2015-01-30 15:37:26 -08:00
9961d5ca2b Merge pull request #2198 from xiang90/proxy
Proxy
2015-01-30 15:24:13 -08:00
dc7374c488 etcd: persist proxy cluster to disk 2015-01-30 15:18:26 -08:00
87a8ebd222 docs: expand description of -initial-cluster-state 2015-01-30 14:14:51 -08:00
27e5b9a394 docs: clarify reconfig options 2015-01-30 14:14:28 -08:00
f5afe3cc34 Fixed typo in API documentation. 2015-01-30 14:14:18 -08:00
3ee7a265f6 README: remove doozer and zookeeper mentions
doozer in particular is rather confusing to mention since the project
hasn't been worked on in years. While we are at it it might simplify
people's understanding if we remove zookeeper too.
2015-01-30 14:13:47 -08:00
d1f9f2f1b7 scripts: remove 2.0 Documentation from build-release
2.0 docs have been merged into the Documentation folder now.
2015-01-30 14:13:25 -08:00
894f1aadce Merge pull request #2199 from xiang90/coreos
mian: detects coreos
2015-01-30 12:10:23 -08:00
fce80136e3 main: detects coreos 2015-01-30 12:10:05 -08:00
ebf9daff74 Merge pull request #2190 from yichengq/308
migrate: support start desired version
2015-01-30 11:47:22 -08:00
ec5a6e8beb migrate: support start desired version 2015-01-30 00:35:53 -08:00
0945e487e7 docs: fix static clustering example
When using very similar flags to our examples, the cluster doesn't bootstrap due to mismatched protocols (`http` vs `https`) in the `-initial-advertise-peer-urls` and `initial-cluster` list:

```
./etcd -name infra0 -initial-advertise-peer-urls https://127.0.0.1:2380 \
>   -listen-peer-urls https://127.0.0.1:2380 \
>   -initial-cluster-token etcd-cluster-1 \
>   -initial-cluster infra0=http://127.0.0.1:2380,infra1=http://127.0.0.1:2381,infra2=http://127.0.0.1:2382 \
>   -initial-cluster-state new
2015/01/29 10:32:16 no data-dir provided, using default data-dir ./infra0.etcd
2015/01/29 10:32:16 etcd: listening for peers on https://127.0.0.1:2380
2015/01/29 10:32:16 etcd: listening for client requests on http://localhost:2379
2015/01/29 10:32:16 etcd: listening for client requests on http://localhost:4001
2015/01/29 10:32:16 etcd: stopping listening for client requests on http://localhost:4001
2015/01/29 10:32:16 etcd: stopping listening for client requests on http://localhost:2379
2015/01/29 10:32:16 etcd: stopping listening for peers on https://127.0.0.1:2380
2015/01/29 10:32:16 etcd: infra0 has different advertised URLs in the cluster and advertised peer URLs list
```
2015-01-29 13:44:13 -08:00
a65556abe2 Merge pull request #2189 from yichengq/314
support disaster recovery from rc1 data dir
2015-01-29 13:38:00 -08:00
e966e565c4 etcdctl/backup_command: handle datadir with missed snapshot mark
This helps to recover from the data dir created in v2.0.0-rc1.
2015-01-29 13:32:59 -08:00
7840d49ae0 etcdserver: not add self to transporter based on local ID
If this is decided by local name, it comes to trouble if the name is
duplicate in the cluster.
2015-01-29 12:35:47 -08:00
d0af96d558 etcdctl/backup_command: save snapshot mark in new wal 2015-01-29 12:35:39 -08:00
fd0c0c9263 Merge pull request #2185 from xiang90/fix_tls_keepalive
Fix tls keepalive
2015-01-29 10:36:11 -08:00
4960324876 pkg/transport: fix tlskeepalive 2015-01-29 09:42:48 -08:00
110 changed files with 3864 additions and 761 deletions

View File

@ -1,11 +1,12 @@
language: go
sudo: false
go:
- 1.3
- 1.4
install:
- go get code.google.com/p/go.tools/cmd/cover
- go get code.google.com/p/go.tools/cmd/vet
- go get golang.org/x/tools/cmd/cover
- go get golang.org/x/tools/cmd/vet
- go get github.com/barakmich/go-nyet
script:
- INTEGRATION=y ./test

View File

@ -36,7 +36,7 @@ This can protect you from cluster corruption in case of mis-configuration becaus
#### Optimal Cluster Size
The recommended etcd cluster size is 3, 5 or 7, which is decided by the fault tolerance requirement. A 7-member cluster can provide enough fault tolerance in most cases. While larger cluster provides better fault tolerance, its write performance becomes lower since data needs to be replicated to more machines.
The recommended etcd cluster size is 3, 5 or 7, which is decided by the fault tolerance requirement. A 7-member cluster can provide enough fault tolerance in most cases. While larger cluster provides better fault tolerance the write performance reduces since data needs to be replicated to more machines.
#### Fault Tolerance Table
@ -55,9 +55,13 @@ It is recommended to have an odd number of members in a cluster. Having an odd c
As you can see, adding another member to bring the size of cluster up to an odd size is always worth it. During a network partition, an odd number of members also guarantees that there will almost always be a majority of the cluster that can continue to operate and be the source of truth when the partition ends.
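The arithmetic behind this recommendation can be sketched as follows. This is a minimal illustration, assuming the usual majority rule (a cluster of N members needs N/2 + 1 members, rounded down, to keep operating). The function names are hypothetical and are not part of etcd.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def failure_tolerance(cluster_size: int) -> int:
    """Number of members that can fail while a majority is still available."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    # Print cluster size, majority and failure tolerance side by side.
    for size in range(1, 10):
        print(size, majority(size), failure_tolerance(size))
```

Going from an odd size to the next even size raises the majority but not the failure tolerance, which is why the even sizes add no resilience.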
#### Changing Cluster Size
After your cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md), which allows the cluster to be modified without downtime. The `etcdctl` tool has `member list`, `member add` and `member remove` commands to complete this process.
### Member Migration
When there is a scheduled machine maintenance or retirement, you might want to migrate an etcd member to another machine without losing the data and changing the member ID.
When there is a scheduled machine maintenance or retirement, you might want to migrate an etcd member to another machine without losing the data and changing the member ID.
The data directory contains all the data to recover a member to its point-in-time state. To migrate a member:
@ -98,7 +102,7 @@ $ sudo systemctl stop etcd
#### Copy the data directory of the now-idle member to the new machine
```
$ tar -cvzf node1.etcd.tar.gz /var/lib/etcd/node1.etcd
$ tar -cvzf node1.etcd.tar.gz /var/lib/etcd/node1.etcd
```
```
@ -177,7 +181,9 @@ Once you have verified that etcd has started successfully, shut it down and move
#### Restoring the cluster
Now that the node is running successfully, you can add more nodes to the cluster and restore resiliency. See the [runtime configuration](runtime-configuration.md) guide for more details.
Now that the node is running successfully, you should [change its advertised peer URLs](other_apis.md#change-the-peer-urls-of-a-member), as the `--force-new-cluster` has set the peer URL to the default (listening on localhost).
You can then add more nodes to the cluster and restore resiliency. See the [runtime configuration](runtime-configuration.md) guide for more details.
### Client Request Timeout

View File

@ -287,7 +287,7 @@ curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=7'
The watch command returns immediately with the same response as previously.
**Note**: etcd only keeps the responses of the most recent 1000 events.
**Note**: etcd only keeps the responses of the most recent 1000 events across all etcd keys.
It is recommended to send the response to another thread to process immediately
instead of blocking the watch while processing the result.
@ -305,7 +305,7 @@ We get the index is outdated response, since we miss the 1000 events kept in etc
{"errorCode":401,"message":"The event in requested index is outdated and cleared","cause":"the requested history has been cleared [1003/7]","index":2002}
```
To start watch, frist we need to fetch the current state of key `/foo` and the etcdIndex.
To start watch, first we need to fetch the current state of key `/foo` and the etcdIndex.
```sh
curl 'http://127.0.0.1:2379/v2/keys/foo' -vv
```
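A client can recognize the outdated-index error before deciding to re-fetch the key. The sketch below assumes the error body has the same shape as the 401 example shown earlier; `history_cleared` is a hypothetical helper name, not an etcd client function.

```python
import json

# errorCode carried by the "index is outdated and cleared" response.
EVENT_INDEX_CLEARED = 401


def history_cleared(body: str) -> bool:
    """Return True if the JSON body is the outdated-index error."""
    reply = json.loads(body)
    return reply.get("errorCode") == EVENT_INDEX_CLEARED


if __name__ == "__main__":
    sample = (
        '{"errorCode":401,'
        '"message":"The event in requested index is outdated and cleared",'
        '"cause":"the requested history has been cleared [1003/7]",'
        '"index":2002}'
    )
    # When this is True, fetch the current state of the key again
    # and restart the watch from there.
    print(history_cleared(sample))
```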
@ -913,19 +913,35 @@ curl http://127.0.0.1:2379/v2/stats/leader
```json
{
"id": "2c7d3e0b8627375b",
"leaderInfo": {
"leader": "8a69d5f6b7814500",
"startTime": "2014-10-24T13:15:51.184719899-07:00",
"uptime": "7m17.859616962s"
"followers": {
"6e3bd23ae5f1eae0": {
"counts": {
"fail": 0,
"success": 745
},
"latency": {
"average": 0.017039507382550306,
"current": 0.000138,
"maximum": 1.007649,
"minimum": 0,
"standardDeviation": 0.05289178277920594
}
},
"a8266ecf031671f3": {
"counts": {
"fail": 0,
"success": 735
},
"latency": {
"average": 0.012124141496598642,
"current": 0.000559,
"maximum": 0.791547,
"minimum": 0,
"standardDeviation": 0.04187900156583733
}
}
},
"name": "infra1",
"recvAppendRequestCnt": 3949,
"recvBandwidthRate": 561.5729321100841,
"recvPkgRate": 9.008227977383449,
"sendAppendRequestCnt": 0,
"startTime": "2014-10-24T13:15:50.070369454-07:00",
"state": "StateFollower"
"leader": "924e2e83e93f2560"
}
```
@ -979,19 +995,19 @@ curl http://127.0.0.1:2379/v2/stats/self
```json
{
"id": "eca0338f4ea31566",
"id": "924e2e83e93f2560",
"leaderInfo": {
"leader": "8a69d5f6b7814500",
"startTime": "2014-10-24T13:15:51.186620747-07:00",
"uptime": "10m47.012122091s"
"leader": "924e2e83e93f2560",
"startTime": "2015-02-09T11:38:30.177534688-08:00",
"uptime": "9m33.891343412s"
},
"name": "node3",
"recvAppendRequestCnt": 5835,
"recvBandwidthRate": 584.1485698657176,
"recvPkgRate": 9.17390765395709,
"sendAppendRequestCnt": 0,
"startTime": "2014-10-24T13:15:50.072007085-07:00",
"state": "StateFollower"
"name": "infra3",
"recvAppendRequestCnt": 0,
"sendAppendRequestCnt": 6535,
"sendBandwidthRate": 824.1758351191694,
"sendPkgRate": 11.111234716807138,
"startTime": "2015-02-09T11:38:28.972034204-08:00",
"state": "StateLeader"
}
```

View File

@ -27,6 +27,25 @@ https://github.com/coreos/etcd/blob/master/Documentation/configuration.md.
[migrationtooldoc]: https://github.com/coreos/etcd/blob/master/Documentation/0_4_migration_tool.md
#### Key-Value API
##### Read consistency flag
The consistent flag for read operations is removed in etcd 2.0.0. Normal read operations provide the same consistency guarantees as 0.4.6 read operations with the consistent flag set.
The read consistency guarantees are:
The consistent read guarantees sequential consistency within one client that talks to one etcd server. Reads and writes from one client to one etcd member should be observed in order. If a client successfully writes a value to an etcd server, it should be able to read the value back from that server immediately.
Each etcd member proxies the request to the leader and only returns the result to the user after the result is applied on the local member. Thus, after the write succeeds, the user is guaranteed to see the value on the member it sent the request to.
Reads do not provide linearizability. If you want a linearizable read, you need to set the quorum option to true.
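As an illustration, a read URL with the quorum option set could be built like this. It is a minimal sketch that assumes the option is passed as a `quorum=true` query parameter on the v2 keys endpoint; `key_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def key_url(endpoint: str, key: str, quorum: bool = False) -> str:
    """Build a v2 keys URL, adding quorum=true when requested."""
    url = endpoint.rstrip("/") + "/v2/keys/" + key.lstrip("/")
    if quorum:
        # Assumption: the quorum option is a plain query parameter.
        url += "?" + urlencode({"quorum": "true"})
    return url


if __name__ == "__main__":
    print(key_url("http://127.0.0.1:2379", "/foo", quorum=True))
```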
**Previous behavior**
We added an option for a consistent read in the old version of etcd because etcd 0.x redirects write requests to the leader. When the user gets the result back from the leader, the member the request was originally sent to might not have applied the write yet. With the consistent flag set to true, the client always sends read requests to the leader, so a client can see its last write when consistent=true is enabled. There are no ordering guarantees among different clients.
#### Standby
etcd 0.4's standby mode has been deprecated. [Proxy mode][proxymode] is introduced to solve a subset of problems standby was solving.

View File

@ -4,7 +4,9 @@
Starting an etcd cluster statically requires that each member knows another in the cluster. In a number of cases, you might not know the IPs of your cluster members ahead of time. In these cases, you can bootstrap an etcd cluster with the help of a discovery service.
This guide willcover the following mechanisms for bootstrapping an etcd cluster:
Once an etcd cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md).
This guide will cover the following mechanisms for bootstrapping an etcd cluster:
* [Static](#static)
* [etcd Discovery](#etcd-discovery)
@ -28,7 +30,7 @@ ETCD_INITIAL_CLUSTER_STATE=new
```
```
-initial-cluster infra0=http://10.0.1.10:2380,http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
```
@ -39,22 +41,22 @@ If you are spinning up multiple clusters (or creating and destroying a single cl
On each machine you would start etcd with these flags:
```
$ etcd -name infra0 -initial-advertise-peer-urls https://10.0.1.10:2380 \
-listen-peer-urls https://10.0.1.10:2380 \
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
```
```
$ etcd -name infra1 -initial-advertise-peer-urls https://10.0.1.11:2380 \
-listen-peer-urls https://10.0.1.11:2380 \
$ etcd -name infra1 -initial-advertise-peer-urls http://10.0.1.11:2380 \
-listen-peer-urls http://10.0.1.11:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
```
```
$ etcd -name infra2 -initial-advertise-peer-urls https://10.0.1.12:2380 \
-listen-peer-urls https://10.0.1.12:2380 \
$ etcd -name infra2 -initial-advertise-peer-urls http://10.0.1.12:2380 \
-listen-peer-urls http://10.0.1.12:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new

View File

@ -68,9 +68,11 @@ To start etcd automatically using custom settings at startup in Linux, using a [
+ default: "default=http://localhost:2380,default=http://localhost:7001"
##### -initial-cluster-state
+ Initial cluster state ("new" or "existing").
+ Initial cluster state ("new" or "existing"). Set to `new` for all members present during initial static or DNS bootstrapping. If this option is set to `existing`, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.
+ default: "new"
[static bootstrap]: clustering.md#static
##### -initial-cluster-token
+ Initial cluster token for the etcd cluster during bootstrap.
+ default: "etcd-cluster"
@ -133,7 +135,9 @@ The security flags help to [build a secure etcd cluster][security].
### Unsafe Flags
Be CAUTIOUS to use unsafe flags because it will break the guarantee given by consensus protocol. For example, it may panic if other members in the cluster are still alive. Follow the instructions when using these falgs.
Please be CAUTIOUS when using unsafe flags because it will break the guarantees given by the consensus protocol.
For example, it may panic if other members in the cluster are still alive.
Follow the instructions when using these flags.
##### -force-new-cluster
+ Force to create a new one-member cluster. It commits configuration changes in force to remove all existing members in the cluster and add itself. It needs to be set to [restore a backup][restore].

View File

@ -0,0 +1,88 @@
# Running etcd under Docker
The following guide will show you how to run etcd under Docker using the [static bootstrap process](clustering.md#static).
## Running etcd in standalone mode
In order to expose the etcd API to clients outside of the Docker host, you'll need to use the host IP address when configuring etcd.
```
export HostIP="192.168.12.50"
```
The following `docker run` command will expose the etcd client API over ports 4001 and 2379, and expose the peer port over 2380.
```
docker run -d -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.3 \
-name etcd0 \
-advertise-client-urls http://${HostIP}:2379,http://${HostIP}:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://${HostIP}:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://${HostIP}:2380 \
-initial-cluster-state new
```
Configure etcd clients to use the Docker host IP and one of the listening ports from above.
```
etcdctl -C http://192.168.12.50:2379 member list
```
```
etcdctl -C http://192.168.12.50:4001 member list
```
## Running a 3 node etcd cluster
Using Docker to set up a multi-node cluster is very similar to the standalone mode configuration.
The main difference is the value used for the `-initial-cluster` flag, which must contain the peer urls for each etcd member in the cluster.
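Because every member must be started with the same `-initial-cluster` value, it can help to generate the string once. The following is a minimal sketch, assuming plain `http` peers on port 2380 as in this guide; `initial_cluster` is a hypothetical helper, not part of etcd.

```python
def initial_cluster(peers: dict, port: int = 2380) -> str:
    """Join name=peer-url pairs into an -initial-cluster value."""
    return ",".join(
        f"{name}=http://{ip}:{port}" for name, ip in peers.items()
    )


if __name__ == "__main__":
    members = {
        "etcd0": "192.168.12.50",
        "etcd1": "192.168.12.51",
        "etcd2": "192.168.12.52",
    }
    print(initial_cluster(members))
```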
### etcd0
```
docker run -d -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.3 \
-name etcd0 \
-advertise-client-urls http://192.168.12.50:2379,http://192.168.12.50:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://192.168.12.50:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://192.168.12.50:2380,etcd1=http://192.168.12.51:2380,etcd2=http://192.168.12.52:2380 \
-initial-cluster-state new
```
### etcd1
```
docker run -d -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.3 \
-name etcd1 \
-advertise-client-urls http://192.168.12.51:2379,http://192.168.12.51:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://192.168.12.51:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://192.168.12.50:2380,etcd1=http://192.168.12.51:2380,etcd2=http://192.168.12.52:2380 \
-initial-cluster-state new
```
### etcd2
```
docker run -d -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.3 \
-name etcd2 \
-advertise-client-urls http://192.168.12.52:2379,http://192.168.12.52:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://192.168.12.52:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://192.168.12.50:2380,etcd1=http://192.168.12.51:2380,etcd2=http://192.168.12.52:2380 \
-initial-cluster-state new
```
Once the cluster has been bootstrapped etcd clients can be configured with a list of etcd members:
```
etcdctl -C http://192.168.12.50:2379,http://192.168.12.51:2379,http://192.168.12.52:2379 member list
```

View File

@ -8,6 +8,7 @@
- [etcd-fs](https://github.com/xetorthio/etcd-fs) - FUSE filesystem for etcd
- [etcd-browser](https://github.com/henszey/etcd-browser) - A web-based key/value editor for etcd using AngularJS
- [etcd-lock](https://github.com/datawisesystems/etcd-lock) - A lock implementation for etcd
- [etcd-console](https://github.com/matishsiao/etcd-console) - A web-based key/value editor for etcd using PHP
**Go libraries**

View File

@ -99,7 +99,7 @@ curl http://10.0.0.10:2379/v2/members/272e204152 -XDELETE
## Change the peer urls of a member
Change the peer urls of a given mamber. The member ID must be a hex-encoded uint64. Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
Change the peer urls of a given member. The member ID must be a hex-encoded uint64. Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
If the POST body is malformed an HTTP 400 will be returned. If the member does not exist in the cluster an HTTP 404 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.

View File

@ -4,29 +4,33 @@ etcd can now run as a transparent proxy. Running etcd as a proxy allows for easi
etcd currently supports two proxy modes: `readwrite` and `readonly`. The default mode is `readwrite`, which forwards both read and write requests to the etcd cluster. A `readonly` etcd proxy only forwards read requests to the etcd cluster, and returns `HTTP 501` to all write requests.
The proxy will shuffle the list of cluster members periodically to avoid sending all connections to a single member.
The member list used by the proxy consists of all client URLs advertised within the cluster, as specified in each member's `-advertise-client-urls` flag. If this flag is set incorrectly, requests sent to the proxy are forwarded to the wrong addresses and then fail. The fix for this problem is to restart the etcd member with the correct `-advertise-client-urls` flag. After the client URLs list in the proxy is recalculated, which happens every 30 seconds, requests will be forwarded correctly.
### Using an etcd proxy
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery-url`).
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery`).
To start a readwrite proxy, set `-proxy on`; To start a readonly proxy, set `-proxy readonly`.
The proxy will be listening on `listen-client-urls` and forward requests to the etcd cluster discovered from in `initial-cluster` or `discovery url`.
The proxy will listen on `listen-client-urls` and forward requests to the etcd cluster discovered from `initial-cluster` or the `discovery` url.
#### Start an etcd proxy with a static configuration
To start a proxy that will connect to a statically defined etcd cluster, specify the `initial-cluster` flag:
```
etcd -proxy on -client-listen-urls 127.0.0.1:8080 -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
etcd -proxy on -listen-client-urls 127.0.0.1:8080 -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
```
#### Start an etcd proxy with the discovery service
If you bootstrap an etcd cluster using the [discovery service][discovery-service], you can also start the proxy with the same `discovery-url`.
If you bootstrap an etcd cluster using the [discovery service][discovery-service], you can also start the proxy with the same `discovery`.
To start a proxy using the discovery service, specify the `discovery-url` flag. The proxy will wait until the etcd cluster defined at the `discovery-url` finishes bootstrapping, and then start to forward the requests.
To start a proxy using the discovery service, specify the `discovery` flag. The proxy will wait until the etcd cluster defined at the `discovery` url finishes bootstrapping, and then start to forward the requests.
```
etcd -proxy on -client-listen-urls 127.0.0.1:8080 -discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
etcd -proxy on -listen-client-urls 127.0.0.1:8080 -discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
#### Fallback to proxy mode with discovery service
If you bootstrap a etcd cluster using [discovery service][discovery-service] with more than the expected number of etcd members, the extra etcd processes will fall back to being `readwrite` proxies by default. They will forward the requests to the cluster as described above. For example, if you create a discovery url with `size=5`, and start ten etcd processes using that same discovery URL, the result will be a cluster with five etcd members and five proxies. Note that this behaviour can be disabled with the `proxy-fallback` flag.
If you bootstrap a etcd cluster using [discovery service][discovery-service] with more than the expected number of etcd members, the extra etcd processes will fall back to being `readwrite` proxies by default. They will forward the requests to the cluster as described above. For example, if you create a discovery url with `size=5`, and start ten etcd processes using that same discovery url, the result will be a cluster with five etcd members and five proxies. Note that this behaviour can be disabled with the `proxy-fallback` flag.
[discovery-service]: https://github.com/coreos/etcd/blob/master/Documentation/clustering.md#discovery

View File

@ -0,0 +1,470 @@
# v2 Auth and Security
## etcd Resources
There are three types of resources in etcd
1. user resources: users and roles in the user store
2. key-value resources: key-value pairs in the key-value store
3. settings resources: security settings, auth settings, and dynamic etcd cluster settings (election/heartbeat)
### User Resources
#### Users
A user is an identity to be authenticated. Each user can have multiple roles. The user has a capability on the resource if one of the roles has that capability.
The special static `root` user has a ROOT role. (Caps for visual aid throughout)
#### Role
Each role has exactly one associated Permission List. A permission list exists for each permission on key-value resources. A role with `manage` permission of a key-value resource can grant/revoke capability of that key-value to other roles.
The special static ROOT role has full permissions on all key-value resources, as well as the permission to manage user resources and settings resources. Only the ROOT role has the permission to manage user resources and modify settings resources.
#### Permissions
There are two types of permissions, `read` and `write`. All management stems from the ROOT user.
A Permission List is a list of allowed patterns for that particular permission (read or write). Only ALLOW prefixes are supported (incidentally, this is what Amazon S3 does). DENY is more complicated and is TBD.
### Key-Value Resources
A key-value resource is a key-value pair in the store. Given a list of matching patterns, permission for any given key in a request is granted if any of the patterns in the list match.
The glob match rules are as follows:
* `*` and `\` are special characters, representing "greedy match" and "escape" respectively.
* As a corollary, `\*` and `\\` are the corresponding literal matches.
* All other bytes match exactly their bytes, starting always from the *first byte*. (For regex fans, `re.match` in Python)
* Examples:
* `/foo` matches only the single key/directory of `/foo`
* `/foo*` matches the prefix `/foo`, and all subdirectories/keys
* `/foo/*/bar` matches the keys bar in any (recursive) subdirectory of `/foo`.
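The rules above can be sketched in a few lines. This is only an illustration of the described behaviour, not etcd's implementation: it assumes that `*` may span `/` separators (as the recursive-subdirectory example suggests) and that a pattern must cover the whole key (as the `/foo` example suggests), so it uses a full match rather than `re.match`. The function names are hypothetical.

```python
import re


def compile_pattern(pattern):
    """Translate a permission pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escape: the next character is matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            # Greedy match of any run of bytes, including "/".
            parts.append(".*")
            i += 1
        else:
            # Assumption: a trailing lone backslash is treated as a literal.
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def key_allowed(patterns, key):
    """Grant access if any pattern in the list matches the whole key."""
    return any(compile_pattern(p).fullmatch(key) for p in patterns)


if __name__ == "__main__":
    print(key_allowed(["/foo*"], "/foo/bar"))
    print(key_allowed(["/foo"], "/foo/bar"))
```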
### Settings Resources
Specific settings for the cluster as a whole. This can include adding and removing cluster members, enabling or disabling security, replacing certificates, and any other dynamic configuration by the administrator.
## v2 Auth
### Basic Auth
We only support [Basic Auth](http://en.wikipedia.org/wiki/Basic_access_authentication) for the first version. Clients need to attach the Basic Auth credentials to the HTTP `Authorization` header.
### Authorization field for operations
Added to requests to /v2/keys, /v2/security
Add code 403 Forbidden to the set of responses from the v2 API
Authorization: Basic {encoded string}
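For reference, the `{encoded string}` in standard Basic Auth is the base64 encoding of `user:password`. A minimal sketch of building the header value follows; the user name and password are made-up examples and `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the value for the HTTP Authorization header."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


if __name__ == "__main__":
    # Example credentials only; substitute real ones.
    print("Authorization:", basic_auth_header("alice", "secret"))
```

Basic Auth only encodes the credentials and does not encrypt them, so they are readable by anyone who can observe the request unless the connection itself is protected.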
### Future Work
Other types of auth can be considered for the future (e.g. signed certs, public keys), and the `Authorization:` header allows for other such types.
### Things out of Scope for etcd Permissions
* Pluggable AUTH backends like LDAP (other Authorization tokens generated by LDAP et al may be a possibility)
* Very fine-grained access controls (eg: users modifying keys outside work hours)
## API endpoints
An Error JSON corresponds to:
{
"name": "ErrErrorName",
"description" : "The longer helpful description of the error."
}
#### Users
The User JSON object is formed as follows:
```
{
"user": "userName",
"password": "password",
"roles": [
"role1",
"role2"
],
"grant": [],
"revoke": [],
"lastModified": "2006-01-02Z04:05:07"
}
```
Password is only passed when necessary. Last Modified is set by the server and ignored in all client posts.
**Get a list of users**
GET/HEAD /v2/security/users
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
403 Forbidden
200 Headers:
ETag: "<hash of list of users>"
Content-type: application/json
200 Body:
{
"users": ["alice", "bob", "eve"]
}
**Get User Details**
GET/HEAD /v2/security/users/alice
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
200 Headers:
ETag: "users/alice:<lastModified>"
Content-type: application/json
200 Body:
{
"user" : "alice",
"roles" : ["fleet", "etcd"],
"lastModified": "2015-02-05Z18:00:00"
}
**Create A User**
A user can be created with initial roles, if filled in. However, no roles are required; only the username and password fields are.
PUT /v2/security/users/charlie
Sent Headers:
Authorization: Basic <BasicAuthString>
Put Body:
JSON struct, above, matching the appropriate name and with starting roles.
Possible Status Codes:
200 OK
403 Forbidden
409 Conflict (if exists)
200 Headers:
ETag: "users/charlie:<tzNow>"
200 Body: (empty)
**Remove A User**
DELETE /v2/security/users/charlie
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
200 Headers:
200 Body: (empty)
**Grant a Role(s) to a User**
PUT /v2/security/users/charlie/grant
Sent Headers:
Authorization: Basic <BasicAuthString>
Put Body:
{ "grantRoles" : ["fleet", "etcd"], (extra JSON data for checking OK) }
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
409 Conflict
200 Headers:
ETag: "users/charlie:<tzNow>"
200 Body:
JSON user struct, updated. "roles" now contains the grants, and "grantRoles" is empty. If there is an error in the set of roles to be added, for example, a non-existent role, then 409 is returned, with an error JSON stating why.
**Revoke a Role(s) from a User**
PUT /v2/security/users/charlie/revoke
Sent Headers:
Authorization: Basic <BasicAuthString>
Put Body:
{ "revokeRoles" : ["fleet"], (extra JSON data for checking OK) }
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
409 Conflict
200 Headers:
ETag: "users/charlie:<tzNow>"
200 Body:
JSON user struct, updated. "roles" now doesn't contain the roles, and "revokeRoles" is empty. If there is an error in the set of roles to be removed, for example, a non-existent role, then 409 is returned, with an error JSON stating why.
**Change password**
PUT /v2/security/users/charlie/password
Sent Headers:
Authorization: Basic <BasicAuthString>
Put Body:
{"user": "charlie", "password": "newCharliePassword"}
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
200 Headers:
ETag: "users/charlie:<tzNow>"
200 Body:
JSON user struct, updated
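As a rough illustration of the user endpoints above, here is a minimal Go sketch that builds (but does not send) the create-user request. The `userBody` type, the `newCreateUserRequest` helper, and the base URL are assumptions made for this example; only the path, the method, the Basic auth header, and the JSON keys come from this document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// userBody carries the JSON keys used in the examples of this document.
// The Go type itself is hypothetical.
type userBody struct {
	User     string   `json:"user"`
	Password string   `json:"password"`
	Roles    []string `json:"roles,omitempty"` // optional: no roles are required
}

// newCreateUserRequest builds PUT /v2/security/users/<name> with Basic auth.
// It only constructs the request; sending it is left to the caller.
func newCreateUserRequest(baseURL, adminUser, adminPass string, u userBody) (*http.Request, error) {
	payload, err := json.Marshal(u)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("PUT", baseURL+"/v2/security/users/"+u.User, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(adminUser, adminPass)
	return req, nil
}

func main() {
	// The address is an assumed local client URL.
	req, err := newCreateUserRequest("http://127.0.0.1:2379", "root", "root",
		userBody{User: "charlie", Password: "charliePW", Roles: []string{"fleet"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

The grant, revoke, and password endpoints follow the same shape, differing only in the path suffix and the body.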
#### Roles
A full role structure may look like this. A Permission List structure is used for the "permissions", "grant", and "revoke" keys.
```
{
"role" : "fleet",
"permissions" : {
"kv": {
"read" : [ "/fleet/" ],
"write": [ "/fleet/" ]
}
},
"grant" : {"kv": {...}},
"revoke": {"kv": {...}},
"members" : ["alice", "bob"],
"lastModified": "2015-02-05Z18:00:00"
}
```
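For readers who want to handle this structure in code, the following Go sketch decodes it. The type names and the `decodeRole` helper are hypothetical; the JSON keys are the ones shown above, and `lastModified` is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rwPermission is a Permission List for one resource: read and write prefixes.
type rwPermission struct {
	Read  []string `json:"read"`
	Write []string `json:"write"`
}

// permissions groups Permission Lists by resource; only "kv" appears in this document.
type permissions struct {
	KV rwPermission `json:"kv"`
}

// role mirrors the full role structure, minus lastModified.
type role struct {
	Role        string       `json:"role"`
	Permissions permissions  `json:"permissions"`
	Grant       *permissions `json:"grant,omitempty"`
	Revoke      *permissions `json:"revoke,omitempty"`
	Members     []string     `json:"members,omitempty"`
}

// decodeRole parses a role document into the struct above.
func decodeRole(data []byte) (role, error) {
	var r role
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	doc := `{"role":"fleet","permissions":{"kv":{"read":["/fleet/"],"write":["/fleet/"]}},"members":["alice","bob"]}`
	r, err := decodeRole([]byte(doc))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Role, r.Permissions.KV.Read, r.Members)
}
```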
**Get a list of Roles**
GET/HEAD /v2/security/roles
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
403 Forbidden
200 Headers:
ETag: "<hash of list of roles>"
Content-type: application/json
200 Body:
{
"roles": ["fleet", "etcd", "quay"]
}
**Get Role Details**
GET/HEAD /v2/security/roles/fleet
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
200 Headers:
ETag: "roles/fleet:<lastModified>"
Content-type: application/json
200 Body:
{
"role" : "fleet",
"read": {
"prefixesAllowed": ["/fleet/"]
},
"write": {
"prefixesAllowed": ["/fleet/"]
},
"members" : ["alice", "bob"], // Reverse map optional?
"lastModified": "2015-02-05Z18:00:00"
}
**Create A Role**
PUT /v2/security/roles/rocket
Sent Headers:
Authorization: Basic <BasicAuthString>
Put Body:
Initial desired JSON state, complete with prefixes and
Possible Status Codes:
201 Created
403 Forbidden
404 Not Found
409 Conflict (if exists)
200 Headers:
ETag: "roles/rocket:<tzNow>"
200 Body:
JSON state of the role
**Remove A Role**
DELETE /v2/security/roles/rocket
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
200 Headers:
200 Body: (empty)
**Update a Role's Permission List for {read,write}ing**
PUT /v2/security/roles/rocket/update
Sent Headers:
Authorization: Basic <BasicAuthString>
Put Body:
{
"role" : "rocket",
"grant": {
"kv": {
"read" : [ "/rocket/"]
}
},
"revoke": {
"kv": {
"read" : [ "/fleet/"]
}
}
}
Possible Status Codes:
200 OK
403 Forbidden
404 Not Found
200 Headers:
ETag: "roles/rocket:<tzNow>"
200 Body:
JSON state of the role, with the "grant" and "revoke" lists emptied and the deltas applied to "permissions".
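One way to picture the update is as a set operation on each prefix list. The Go sketch below is an assumption about the semantics, not the server's implementation: it adds the granted prefixes, removes the revoked ones, and lets a revoke win if a prefix appears in both lists. The `applyDelta` name is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// applyDelta returns current plus grant minus revoke, de-duplicated and sorted.
// Revokes are applied last, so a prefix listed in both grant and revoke is removed.
func applyDelta(current, grant, revoke []string) []string {
	set := make(map[string]bool)
	for _, p := range current {
		set[p] = true
	}
	for _, p := range grant {
		set[p] = true
	}
	for _, p := range revoke {
		delete(set, p)
	}
	out := make([]string, 0, len(set))
	for p := range set {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Mirrors the request body above: grant read on /rocket/, revoke read on /fleet/.
	read := applyDelta([]string{"/fleet/"}, []string{"/rocket/"}, []string{"/fleet/"})
	fmt.Println(read)
}
```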
#### TBD Management modification
## Example Workflow
Let's walk through an example to show two tenants (applications, in our case) using etcd permissions.
### Enable security
//TODO(barakmich): Maybe this is dynamic? I don't like the idea of rebooting when we don't have to.
#### Default ROOT
etcd always has a ROOT when started with security enabled. The default username is `root`, and the password is `root`.
// TODO(barakmich): if the enabling is dynamic, perhaps that'd be a good time to set a password? Thus obviating the next section.
### Change root's password
```
PUT /v2/security/users/root/password
Headers:
Authorization: Basic <root:root>
Put Body:
{"user" : "root", "password": "betterRootPW!"}
```
//TODO(barakmich): How do you recover the root password? *This* may require a flag and a restart. `--disable-permissions`
### Create Roles for the Applications
Create the rocket role fully specified:
```
PUT /v2/security/roles/rocket
Headers:
Authorization: Basic <root:betterRootPW!>
Body:
{
"role" : "rocket",
"permissions" : {
"kv": {
"read": [
"/rocket/"
],
"write": [
"/rocket/"
]
}
}
}
```
But let's make fleet just a basic role for now:
```
PUT /v2/security/roles/fleet
Headers:
Authorization: Basic <root:betterRootPW!>
Body:
{
"role" : "fleet"
}
```
### Optional: Add some permissions to the roles
Well, we finally figured out where we want fleet to live. Let's fix it.
(Note that we avoided this in the rocket case. So this step is optional.)
```
PUT /v2/security/roles/fleet/update
Headers:
Authorization: Basic <root:betterRootPW!>
Put Body:
{
"role" : "fleet",
"grant" : {
"kv" : {
"read": [
"/fleet/"
]
}
}
}
```
### Create Users
As before, let's create rocketuser with its role in a single request and set up fleetuser in separate steps.
```
PUT /v2/security/users/rocketuser
Headers:
Authorization: Basic <root:betterRootPW!>
Body:
{"user" : "rocketuser", "password" : "rocketpw", "roles" : ["rocket"]}
```
```
PUT /v2/security/users/fleetuser
Headers:
Authorization: Basic <root:betterRootPW!>
Body:
{"user" : "fleetuser", "password" : "fleetpw"}
```
### Optional: Grant Roles to Users
Likewise, let's explicitly grant fleetuser access.
```
PUT /v2/security/users/fleetuser/grant
Headers:
Authorization: Basic <root:betterRootPW!>
Body:
{"user": "fleetuser", "grant": ["fleet"]}
```
#### Start to use fleetuser and rocketuser
For example:
```
PUT /v2/keys/rocket/RocketData
Headers:
Authorization: Basic <rocketuser:rocketpw>
```
Reads and writes outside the prefixes granted will fail with a 403 Forbidden.
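The access rule can be sketched as a simple prefix match. This is an illustration under the assumption that a key is permitted when it starts with any granted prefix; the `allowed` helper is hypothetical and is not taken from the etcd source.

```go
package main

import (
	"fmt"
	"strings"
)

// allowed reports whether key falls under at least one granted prefix.
func allowed(prefixes []string, key string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return false
}

func main() {
	rocketWrite := []string{"/rocket/"}
	// PUT /v2/keys/rocket/RocketData addresses the key /rocket/RocketData.
	fmt.Println(allowed(rocketWrite, "/rocket/RocketData"))
	// A key under /fleet/ is outside rocketuser's prefixes and would be rejected.
	fmt.Println(allowed(rocketWrite, "/fleet/units"))
}
```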


@ -8,40 +8,40 @@ Reconfiguration requests can only be processed when the majority of the clus
## Reconfiguration Use Cases
Let us walk through the four use cases for re-configuring a cluster: replacing a member, increasing or decreasing cluster size, and restarting a cluster from a majority failure.
Let us walk through some common reasons for reconfiguring a cluster. Most of these just involve combinations of adding or removing a member, which are explained below under [Cluster Reconfiguration Operations](#cluster-reconfiguration-operations).
### Replace a Non-recoverable Member
### Cycle or Upgrade Multiple Machines
The most common use case of cluster reconfiguration is to replace a member because of a permanent failure of the existing member: for example, hardware failure or data directory corruption.
It is important to replace failed members as soon as the failure is detected.
If etcd falls below a simple majority of members it can no longer accept writes: e.g. in a 3 member cluster the loss of two members will cause writes to fail and the cluster to stop operating.
If you need to move multiple members of your cluster due to planned maintenance (hardware upgrades, network downtime, etc.), it is recommended to modify members one at a time.
If you want to migrate a running member to another machine, please refer to the [member migration section][member migration].
It is safe to remove the leader; however, there is a brief period of downtime while the election process takes place. If your cluster holds more than 50MB, it is recommended to [migrate the member's data directory][member migration].
[member migration]: https://github.com/coreos/etcd/blob/master/Documentation/admin_guide.md#member-migration
[member migration]: admin_guide.md#member-migration
### Increase Cluster Size
### Change the Cluster Size
To make your cluster more resilient to machine failure you can increase the size of the cluster.
For example, if the cluster consists of three machines, it can tolerate one failure.
If we increase the cluster size to five, it can tolerate two machine failures.
Increasing the cluster size can enhance [failure tolerance][fault tolerance table] and provide better read performance. Since clients can read from any member, increasing the number of members increases the overall read throughput.
Increasing the cluster size can also provide better read performance.
When a client accesses etcd, the normal read gets the data from the local copy of each member (members always share the same view of the cluster at the same index, which is guaranteed by the sequential consistency of etcd).
Since clients can read from any member, increasing the number of members thus increases overall read throughput.
Decreasing the cluster size can improve the write performance of a cluster, with a trade-off of decreased resilience. Writes into the cluster are replicated to a majority of members of the cluster before being considered committed. Decreasing the cluster size lowers the majority, and each write is committed more quickly.
### Decrease Cluster Size
[fault tolerance table]: admin_guide.md#fault-tolerance-table
To improve the write performance of a cluster, you might want to trade off resilience by removing members.
etcd replicates the data to the majority of members of the cluster before committing the write.
Decreasing the cluster size means the etcd cluster has to do less work for each write, thus increasing the write performance.
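The majority arithmetic behind these size trade-offs can be written down in a few lines. This is a small sketch with hypothetical helper names; it reproduces the figures quoted above (three members tolerate one failure, five tolerate two).

```go
package main

import "fmt"

// majority returns how many members must agree in a cluster of n members.
func majority(n int) int {
	return n/2 + 1
}

// failureTolerance returns how many members can be lost while a majority remains.
func failureTolerance(n int) int {
	return n - majority(n)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("size=%d majority=%d tolerates=%d\n", n, majority(n), failureTolerance(n))
	}
}
```

Note that a two-member cluster needs both members to agree, which is why growing a 1-node cluster stalls until the second member has started.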
### Replace A Failed Machine
If a machine fails due to hardware failure, data directory corruption, or some other fatal situation, it should be replaced as soon as possible. Machines that have failed but haven't been removed adversely affect your quorum and reduce the tolerance for an additional failure.
To replace the machine, follow the instructions for [removing the member][remove member] from the cluster, and then [add a new member][add member] in its place. If your cluster holds more than 50MB, it is recommended to [migrate the failed member's data directory][member migration] if you can still access it.
[remove member]: #remove-a-member
[add member]: #add-a-new-member
### Restart Cluster from Majority Failure
If the majority of your cluster is lost, then you need to take manual action in order to recover safely.
The basic steps in the recovery process include creating a new cluster using the old data, forcing a single member to act as the leader, and finally using runtime configuration to add members to this new cluster.
The basic steps in the recovery process include [creating a new cluster using the old data][disaster recovery], forcing a single member to act as the leader, and finally using runtime configuration to [add new members][add member] to this new cluster one at a time.
TODO: https://github.com/coreos/etcd/issues/1242
[add member]: #add-a-new-member
[disaster recovery]: admin_guide.md#disaster-recovery
## Cluster Reconfiguration Operations
@ -61,7 +61,7 @@ If you want to use the member API directly you can find the documentation [here]
### Remove a Member
First, we need to find the target member:
First, we need to find the target member's ID. You can list all members with `etcdctl`:
```
$ etcdctl member list
@ -84,27 +84,27 @@ The target member will stop itself at this point and print out the removal in th
etcd: this member has been permanently removed from the cluster. Exiting.
```
Removal of the leader is safe, but the cluster will be out of progress for a period of election timeout because it needs to elect the new leader.
It is safe to remove the leader; however, the cluster will be inactive while a new leader is elected. This duration is normally the period of election timeout plus the voting process.
### Add a Member
### Add a New Member
Adding a member is a two step process:
* Add the new member to the cluster via the [members API](https://github.com/coreos/etcd/blob/master/Documentation/other_apis.md#post-v2members) or the `etcdctl member add` command.
* Start the member with the correct configuration.
* Start the new member with the new cluster configuration, including a list of the updated members (existing members + the new member).
Using `etcdctl` let's add the new member to the cluster:
Using `etcdctl` let's add the new member to the cluster by specifying its [name](configuration.md#-name) and [advertised peer URLs](configuration.md#-initial-advertise-peer-urls):
```
$ etcdctl member add infra3 http://10.0.1.13:2380
added member 9bf1b35fc7761a23 to cluster
ETCD_NAME="infra3"
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE=existing
```
> Notice that infra3 was added to the cluster using its advertised peer URL.
`etcdctl` has informed the cluster about the new member and printed out the environment variables needed to successfully start it.
Now start the new etcd process with the relevant flags for the new member:
```
@ -116,8 +116,8 @@ $ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://1
The new member will run as a part of the cluster and immediately begin catching up with the rest of the cluster.
If you are adding multiple members the best practice is to configure the new member, then start the process, then configure the next, and so on.
A common case is increasing a cluster from 1 to 3: if you add one member to a 1-node cluster, the cluster cannot make progress before the new member starts because it needs two members as majority to agree on the consensus.
If you are adding multiple members the best practice is to configure a single member at a time and verify it starts correctly before adding more new members.
If you add a new member to a 1-node cluster, the cluster cannot make progress before the new member starts because it needs two members as majority to agree on the consensus. You will only see this behavior between the time `etcdctl member add` informs the cluster about the new member and the new member successfully establishing a connection to the existing one.
#### Error Cases


@ -11,11 +11,11 @@ The underlying distributed consensus protocol relies on two separate time parame
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader will notify followers that it is still the leader.
etcd batches commands together for higher throughput so this heartbeat interval is also a delay for how long it takes for commands to be committed.
By default, etcd uses a `50ms` heartbeat interval.
By default, etcd uses a `100ms` heartbeat interval.
The second parameter is the *Election Timeout*.
This timeout is how long a follower node will go without hearing a heartbeat before attempting to become leader itself.
By default, etcd uses a `200ms` election timeout.
By default, etcd uses a `1000ms` election timeout.
Adjusting these values is a trade off.
Lowering the heartbeat interval will cause individual commands to be committed faster but it will lower the overall throughput of etcd.

Godeps/Godeps.json

@ -1,6 +1,6 @@
{
"ImportPath": "github.com/coreos/etcd",
"GoVersion": "go1.3.1",
"GoVersion": "go1.4.1",
"Packages": [
"./..."
],


@ -2,4 +2,4 @@
etcd1: bin/etcd -name infra1 -listen-client-urls http://localhost:4001 -advertise-client-urls http://localhost:4001 -listen-peer-urls http://localhost:7001 -initial-advertise-peer-urls http://localhost:7001 -initial-cluster-token etcd-cluster-1 -initial-cluster 'infra1=http://localhost:7001,infra2=http://localhost:7002,infra3=http://localhost:7003' -initial-cluster-state new
etcd2: bin/etcd -name infra2 -listen-client-urls http://localhost:4002 -advertise-client-urls http://localhost:4002 -listen-peer-urls http://localhost:7002 -initial-advertise-peer-urls http://localhost:7002 -initial-cluster-token etcd-cluster-1 -initial-cluster 'infra1=http://localhost:7001,infra2=http://localhost:7002,infra3=http://localhost:7003' -initial-cluster-state new
etcd3: bin/etcd -name infra3 -listen-client-urls http://localhost:4003 -advertise-client-urls http://localhost:4003 -listen-peer-urls http://localhost:7003 -initial-advertise-peer-urls http://localhost:7003 -initial-cluster-token etcd-cluster-1 -initial-cluster 'infra1=http://localhost:7001,infra2=http://localhost:7002,infra3=http://localhost:7003' -initial-cluster-state new
proxy: bin/etcd -proxy=on -bind-addr 127.0.0.1:8080 -initial-cluster 'infra1=http://localhost:7001,infra2=http://localhost:7002,infra3=http://localhost:7003'
proxy: bin/etcd -name proxy1 -proxy=on -bind-addr 127.0.0.1:8080 -initial-cluster 'infra1=http://localhost:7001,infra2=http://localhost:7002,infra3=http://localhost:7003'


@ -5,8 +5,7 @@
![etcd Logo](logos/etcd-horizontal-color.png)
A highly-available key value store for shared configuration and service discovery.
etcd is inspired by [Apache ZooKeeper][zookeeper] and [doozer][doozer], with a focus on being:
etcd is a distributed, consistent key value store for shared configuration and service discovery with a focus on being:
* *Simple*: curl'able user facing API (HTTP+JSON)
* *Secure*: optional SSL client cert authentication

build

@ -11,8 +11,8 @@ ln -s ${PWD} $GOPATH/src/${REPO_PATH}
eval $(go env)
GIT_SHA=`git rev-parse --short HEAD || echo "GitNotFound"`
# Static compilation is useful when etcd is run in a container
CGO_ENABLED=0 go build -a -ldflags '-s' -o bin/etcd ${REPO_PATH}
CGO_ENABLED=0 go build -a -ldflags '-s' -o bin/etcdctl ${REPO_PATH}/etcdctl
go build -o bin/etcd-migrate ${REPO_PATH}/tools/etcd-migrate
go build -o bin/etcd-dump-logs ${REPO_PATH}/tools/etcd-dump-logs
CGO_ENABLED=0 go build -a -installsuffix cgo -ldflags "-s -X ${REPO_PATH}/version.GitSHA ${GIT_SHA}" -o bin/etcd ${REPO_PATH}
CGO_ENABLED=0 go build -a -installsuffix cgo -ldflags "-s" -o bin/etcdctl ${REPO_PATH}/etcdctl


@ -89,7 +89,7 @@ func TestV2KeysURLHelper(t *testing.T) {
func TestGetAction(t *testing.T) {
ep := url.URL{Scheme: "http", Host: "example.com/v2/keys"}
wantURL := &url.URL{
baseWantURL := &url.URL{
Scheme: "http",
Host: "example.com",
Path: "/v2/keys/foo/bar",
@ -117,7 +117,7 @@ func TestGetAction(t *testing.T) {
}
got := *f.HTTPRequest(ep)
wantURL := wantURL
wantURL := baseWantURL
wantURL.RawQuery = tt.wantQuery
err := assertResponse(got, wantURL, wantHeader, nil)
@ -129,7 +129,7 @@ func TestGetAction(t *testing.T) {
func TestWaitAction(t *testing.T) {
ep := url.URL{Scheme: "http", Host: "example.com/v2/keys"}
wantURL := &url.URL{
baseWantURL := &url.URL{
Scheme: "http",
Host: "example.com",
Path: "/v2/keys/foo/bar",
@ -166,7 +166,7 @@ func TestWaitAction(t *testing.T) {
}
got := *f.HTTPRequest(ep)
wantURL := wantURL
wantURL := baseWantURL
wantURL.RawQuery = tt.wantQuery
err := assertResponse(got, wantURL, wantHeader, nil)


@ -105,7 +105,7 @@ func (m *httpMembersAPI) Remove(ctx context.Context, memberID string) error {
return err
}
return assertStatusCode(resp.StatusCode, http.StatusNoContent)
return assertStatusCode(resp.StatusCode, http.StatusNoContent, http.StatusGone)
}
type membersAPIActionList struct{}
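The change above passes a second acceptable status to `assertStatusCode`, so a member that is already gone (410) no longer counts as a failed removal. The body of that helper is not part of this diff; the sketch below is only an assumption of how a variadic check of this kind could look, and its error message is invented for the example.

```go
package main

import (
	"fmt"
	"net/http"
)

// assertStatusCode returns nil if got equals any of the wanted codes.
// This is an illustrative stand-in, not the helper from the etcd client package.
func assertStatusCode(got int, want ...int) error {
	for _, w := range want {
		if got == w {
			return nil
		}
	}
	return fmt.Errorf("unexpected status code %d", got)
}

func main() {
	// Both 204 No Content and 410 Gone are treated as a successful removal.
	fmt.Println(assertStatusCode(http.StatusGone, http.StatusNoContent, http.StatusGone))
	fmt.Println(assertStatusCode(http.StatusInternalServerError, http.StatusNoContent, http.StatusGone))
}
```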


@ -193,14 +193,14 @@ func TestCheckCluster(t *testing.T) {
})
}
c := &clientWithResp{rs: rs}
d := discovery{cluster: cluster, id: 1, c: c}
dBase := discovery{cluster: cluster, id: 1, c: c}
cRetry := &clientWithRetry{failTimes: 3}
cRetry.rs = rs
fc := clockwork.NewFakeClock()
dRetry := discovery{cluster: cluster, id: 1, c: cRetry, clock: fc}
for _, d := range []discovery{d, dRetry} {
for _, d := range []discovery{dBase, dRetry} {
go func() {
for i := uint(1); i <= maxRetryInTest; i++ {
fc.BlockUntil(1)
@ -263,7 +263,7 @@ func TestWaitNodes(t *testing.T) {
for i, tt := range tests {
// Basic case
c := &clientWithResp{nil, &watcherWithResp{tt.rs}}
d := &discovery{cluster: "1000", c: c}
dBase := &discovery{cluster: "1000", c: c}
// Retry case
retryScanResp := make([]*client.Response, 0)
@ -291,7 +291,7 @@ func TestWaitNodes(t *testing.T) {
clock: fc,
}
for _, d := range []*discovery{d, dRetry} {
for _, d := range []*discovery{dBase, dRetry} {
go func() {
for i := uint(1); i <= maxRetryInTest; i++ {
fc.BlockUntil(1)


@ -25,7 +25,8 @@ import (
var (
// indirection for testing
lookupSRV = net.LookupSRV
lookupSRV = net.LookupSRV
resolveTCPAddr = net.ResolveTCPAddr
)
// TODO(barakmich): Currently ignores priority and weight (as they don't make as much sense for a bootstrap)
@ -38,7 +39,7 @@ func SRVGetCluster(name, dns string, defaultToken string, apurls types.URLs) (st
// First, resolve the apurls
for _, url := range apurls {
tcpAddr, err := net.ResolveTCPAddr("tcp", url.Host)
tcpAddr, err := resolveTCPAddr("tcp", url.Host)
if err != nil {
log.Printf("discovery: Couldn't resolve host %s during SRV discovery", url.Host)
return "", "", err
@ -52,8 +53,9 @@ func SRVGetCluster(name, dns string, defaultToken string, apurls types.URLs) (st
return err
}
for _, srv := range addrs {
host := net.JoinHostPort(srv.Target, fmt.Sprintf("%d", srv.Port))
tcpAddr, err := net.ResolveTCPAddr("tcp", host)
target := strings.TrimSuffix(srv.Target, ".")
host := net.JoinHostPort(target, fmt.Sprintf("%d", srv.Port))
tcpAddr, err := resolveTCPAddr("tcp", host)
if err != nil {
log.Printf("discovery: Couldn't resolve host %s during SRV discovery", host)
continue
@ -68,8 +70,8 @@ func SRVGetCluster(name, dns string, defaultToken string, apurls types.URLs) (st
n = fmt.Sprintf("%d", tempName)
tempName += 1
}
stringParts = append(stringParts, fmt.Sprintf("%s=%s%s", n, prefix, tcpAddr.String()))
log.Printf("discovery: Got bootstrap from DNS for %s at host %s to %s%s", service, host, prefix, tcpAddr.String())
stringParts = append(stringParts, fmt.Sprintf("%s=%s%s", n, prefix, host))
log.Printf("discovery: Got bootstrap from DNS for %s at %s%s", service, prefix, host)
}
return nil
}
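The key behavioral change in this hunk is that the bootstrap string now carries the SRV target hostname (with the trailing root dot trimmed) instead of the resolved IP address. A standalone sketch of that host construction follows; `srvHostPort` is a hypothetical name, while the two calls it wraps are the ones used in the diff.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// srvHostPort turns an SRV target and port into a host:port string,
// dropping the trailing dot that fully qualified DNS names carry.
func srvHostPort(target string, port uint16) string {
	trimmed := strings.TrimSuffix(target, ".")
	return net.JoinHostPort(trimmed, fmt.Sprintf("%d", port))
}

func main() {
	fmt.Println(srvHostPort("1.example.com.", 2480))
	fmt.Println(srvHostPort("10.0.0.1", 7001))
}
```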


@ -23,19 +23,26 @@ import (
)
func TestSRVGetCluster(t *testing.T) {
defer func() { lookupSRV = net.LookupSRV }()
defer func() {
lookupSRV = net.LookupSRV
resolveTCPAddr = net.ResolveTCPAddr
}()
name := "dnsClusterTest"
tests := []struct {
withSSL []*net.SRV
withoutSSL []*net.SRV
urls []string
expected string
dns map[string]string
expected string
}{
{
[]*net.SRV{},
[]*net.SRV{},
nil,
nil,
"",
},
{
@ -46,6 +53,8 @@ func TestSRVGetCluster(t *testing.T) {
},
[]*net.SRV{},
nil,
nil,
"0=https://10.0.0.1:2480,1=https://10.0.0.2:2480,2=https://10.0.0.3:2480",
},
{
@ -58,6 +67,7 @@ func TestSRVGetCluster(t *testing.T) {
&net.SRV{Target: "10.0.0.1", Port: 7001},
},
nil,
nil,
"0=https://10.0.0.1:2480,1=https://10.0.0.2:2480,2=https://10.0.0.3:2480,3=http://10.0.0.1:7001",
},
{
@ -70,8 +80,22 @@ func TestSRVGetCluster(t *testing.T) {
&net.SRV{Target: "10.0.0.1", Port: 7001},
},
[]string{"https://10.0.0.1:2480"},
nil,
"dnsClusterTest=https://10.0.0.1:2480,0=https://10.0.0.2:2480,1=https://10.0.0.3:2480,2=http://10.0.0.1:7001",
},
// matching local member with resolved addr and return unresolved hostnames
{
[]*net.SRV{
&net.SRV{Target: "1.example.com.", Port: 2480},
&net.SRV{Target: "2.example.com.", Port: 2480},
&net.SRV{Target: "3.example.com.", Port: 2480},
},
nil,
[]string{"https://10.0.0.1:2480"},
map[string]string{"1.example.com:2480": "10.0.0.1:2480", "2.example.com:2480": "10.0.0.2:2480", "3.example.com:2480": "10.0.0.3:2480"},
"dnsClusterTest=https://1.example.com:2480,0=https://2.example.com:2480,1=https://3.example.com:2480",
},
}
for i, tt := range tests {
@ -84,6 +108,12 @@ func TestSRVGetCluster(t *testing.T) {
}
return "", nil, errors.New("Unknown service in mock")
}
resolveTCPAddr = func(network, addr string) (*net.TCPAddr, error) {
if tt.dns == nil || tt.dns[addr] == "" {
return net.ResolveTCPAddr(network, addr)
}
return net.ResolveTCPAddr(network, tt.dns[addr])
}
urls := testutil.MustNewURLs(t, tt.urls)
str, token, err := SRVGetCluster(name, "example.com", "token", urls)
if err != nil {


@ -15,6 +15,7 @@
package command
import (
"fmt"
"log"
"os"
"path"
@ -43,10 +44,10 @@ func NewBackupCommand() cli.Command {
// handleBackup handles a request that intends to do a backup.
func handleBackup(c *cli.Context) {
srcSnap := path.Join(c.String("data-dir"), "snap")
destSnap := path.Join(c.String("backup-dir"), "snap")
srcWAL := path.Join(c.String("data-dir"), "wal")
destWAL := path.Join(c.String("backup-dir"), "wal")
srcSnap := path.Join(c.String("data-dir"), "member", "snap")
destSnap := path.Join(c.String("backup-dir"), "member", "snap")
srcWAL := path.Join(c.String("data-dir"), "member", "wal")
destWAL := path.Join(c.String("backup-dir"), "member", "wal")
if err := os.MkdirAll(destSnap, 0700); err != nil {
log.Fatalf("failed creating backup snapshot dir %v: %v", destSnap, err)
@ -71,7 +72,12 @@ func handleBackup(c *cli.Context) {
}
defer w.Close()
wmetadata, state, ents, err := w.ReadAll()
if err != nil {
switch err {
case nil:
case wal.ErrSnapshotNotFound:
fmt.Printf("Failed to find the matching snapshot record %+v in wal %v.\n", walsnap, srcWAL)
fmt.Printf("etcdctl will add it back. Start auto fixing...\n")
default:
log.Fatal(err)
}
var metadata etcdserverpb.Metadata
@ -88,4 +94,7 @@ func handleBackup(c *cli.Context) {
if err := neww.Save(state, ents); err != nil {
log.Fatal(err)
}
if err := neww.SaveSnapshot(walsnap); err != nil {
log.Fatal(err)
}
}


@ -0,0 +1,142 @@
package command
import (
"encoding/json"
"errors"
"fmt"
"net/http"
"os"
"sort"
"strings"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/coreos/go-etcd/etcd"
"github.com/coreos/etcd/etcdserver/stats"
)
func NewClusterHealthCommand() cli.Command {
return cli.Command{
Name: "cluster-health",
Usage: "check the health of the etcd cluster",
Flags: []cli.Flag{},
Action: handleClusterHealth,
}
}
func handleClusterHealth(c *cli.Context) {
endpoints, err := getEndpoints(c)
if err != nil {
handleError(ErrorFromEtcd, err)
}
tr, err := getTransport(c)
if err != nil {
handleError(ErrorFromEtcd, err)
}
client := etcd.NewClient(endpoints)
client.SetTransport(tr)
if c.GlobalBool("debug") {
go dumpCURL(client)
}
if ok := client.SyncCluster(); !ok {
handleError(FailedToConnectToHost, errors.New("cannot sync with the cluster using endpoints "+strings.Join(endpoints, ", ")))
}
// do we have a leader?
cl := client.GetCluster()
ep, ls0, err := getLeaderStats(tr, cl)
if err != nil {
fmt.Println("cluster may be unhealthy: failed to connect", cl)
os.Exit(1)
}
// is raft stable and making progress?
client = etcd.NewClient([]string{ep})
client.SetTransport(tr)
resp, err := client.Get("/", false, false)
if err != nil {
fmt.Println("cluster is unhealthy")
os.Exit(1)
}
rt0, ri0 := resp.RaftTerm, resp.RaftIndex
time.Sleep(time.Second)
resp, err = client.Get("/", false, false)
if err != nil {
fmt.Println("cluster is unhealthy")
os.Exit(1)
}
rt1, ri1 := resp.RaftTerm, resp.RaftIndex
if rt0 != rt1 {
fmt.Println("cluster is unhealthy")
os.Exit(1)
}
if ri1 == ri0 {
fmt.Println("cluster is unhealthy")
os.Exit(1)
}
// are all the members making progress?
_, ls1, err := getLeaderStats(tr, []string{ep})
if err != nil {
fmt.Println("cluster is unhealthy")
os.Exit(1)
}
fmt.Println("cluster is healthy")
// self is healthy
var prints []string
prints = append(prints, fmt.Sprintf("member %s is healthy\n", ls1.Leader))
for name, fs0 := range ls0.Followers {
fs1, ok := ls1.Followers[name]
if !ok {
fmt.Println("Cluster configuration changed during health checking. Please retry.")
os.Exit(1)
}
if fs1.Counts.Success <= fs0.Counts.Success {
prints = append(prints, fmt.Sprintf("member %s is unhealthy\n", name))
} else {
prints = append(prints, fmt.Sprintf("member %s is healthy\n", name))
}
}
sort.Strings(prints)
for _, p := range prints {
fmt.Print(p)
}
os.Exit(0)
}
func getLeaderStats(tr *http.Transport, endpoints []string) (string, *stats.LeaderStats, error) {
// go-etcd does not support cluster stats, use http client for now
// TODO: use new etcd client with new member/stats endpoint
httpclient := http.Client{
Transport: tr,
}
for _, ep := range endpoints {
resp, err := httpclient.Get(ep + "/v2/stats/leader")
if err != nil {
continue
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
continue
}
ls := &stats.LeaderStats{}
d := json.NewDecoder(resp.Body)
err = d.Decode(ls)
if err != nil {
continue
}
return ep, ls, nil
}
return "", nil, errors.New("no leader")
}
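The decision logic in `handleClusterHealth` reduces to two comparisons between samples taken one second apart. The sketch below restates them as pure functions so they can be read in isolation; the type and function names are hypothetical, and `uint64` is an assumed type for the counters.

```go
package main

import "fmt"

// raftSample holds the raft term and index observed in one response.
type raftSample struct {
	Term  uint64
	Index uint64
}

// raftProgressed mirrors the cluster-level check: the term must be unchanged
// (no election in between) and the index must have moved.
func raftProgressed(before, after raftSample) bool {
	return before.Term == after.Term && before.Index != after.Index
}

// followerHealthy mirrors the per-member check: the leader's count of
// successful sends to that follower must have increased.
func followerHealthy(successBefore, successAfter uint64) bool {
	return successAfter > successBefore
}

func main() {
	fmt.Println(raftProgressed(raftSample{Term: 2, Index: 100}, raftSample{Term: 2, Index: 104}))
	fmt.Println(raftProgressed(raftSample{Term: 2, Index: 100}, raftSample{Term: 3, Index: 104}))
	fmt.Println(followerHealthy(40, 40))
}
```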


@ -0,0 +1,128 @@
package command
import (
"errors"
"fmt"
"io/ioutil"
"log"
"os"
"strings"
"sync"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/coreos/go-etcd/etcd"
"github.com/coreos/etcd/store"
)
type set struct {
key string
value string
ttl int64
}
func NewImportSnapCommand() cli.Command {
return cli.Command{
Name: "import",
Usage: "import a snapshot to a cluster",
Flags: []cli.Flag{
cli.StringFlag{Name: "snap", Value: "", Usage: "Path to the valid etcd 0.4.x snapshot."},
cli.StringSliceFlag{Name: "hidden", Value: new(cli.StringSlice), Usage: "Hidden key spaces to import from snapshot"},
cli.IntFlag{Name: "c", Value: 10, Usage: "Number of concurrent clients to import the data"},
},
Action: handleImportSnap,
}
}
func handleImportSnap(c *cli.Context) {
d, err := ioutil.ReadFile(c.String("snap"))
if err != nil {
if c.String("snap") == "" {
fmt.Printf("no snapshot file provided (use --snap)\n")
} else {
fmt.Printf("cannot read snapshot file %s\n", c.String("snap"))
}
os.Exit(1)
}
st := store.New()
err = st.Recovery(d)
if err != nil {
fmt.Printf("cannot recover the snapshot file: %v\n", err)
os.Exit(1)
}
endpoints, err := getEndpoints(c)
if err != nil {
handleError(ErrorFromEtcd, err)
}
tr, err := getTransport(c)
if err != nil {
handleError(ErrorFromEtcd, err)
}
wg := &sync.WaitGroup{}
setc := make(chan set)
concurrent := c.Int("c")
fmt.Printf("starting to import snapshot %s with %d clients\n", c.String("snap"), concurrent)
for i := 0; i < concurrent; i++ {
client := etcd.NewClient(endpoints)
client.SetTransport(tr)
if c.GlobalBool("debug") {
go dumpCURL(client)
}
if ok := client.SyncCluster(); !ok {
handleError(FailedToConnectToHost, errors.New("cannot sync with the cluster using endpoints "+strings.Join(endpoints, ", ")))
}
wg.Add(1)
go runSet(client, setc, wg)
}
all, err := st.Get("/", true, true)
if err != nil {
handleError(ErrorFromEtcd, err)
}
n := copyKeys(all.Node, setc)
hiddens := c.StringSlice("hidden")
for _, h := range hiddens {
allh, err := st.Get(h, true, true)
if err != nil {
handleError(ErrorFromEtcd, err)
}
n += copyKeys(allh.Node, setc)
}
close(setc)
wg.Wait()
fmt.Printf("finished importing %d keys\n", n)
}
func copyKeys(n *store.NodeExtern, setc chan set) int {
num := 0
if !n.Dir {
setc <- set{n.Key, *n.Value, n.TTL}
return 1
}
log.Println("entering dir:", n.Key)
for _, nn := range n.Nodes {
sub := copyKeys(nn, setc)
num += sub
}
return num
}
func runSet(c *etcd.Client, setc chan set, wg *sync.WaitGroup) {
for s := range setc {
log.Println("copying key:", s.key)
if s.ttl != 0 && s.ttl < 300 {
log.Printf("extending key %s's ttl to 300 seconds", s.key)
s.ttl = 5 * 60
}
_, err := c.Set(s.key, s.value, uint64(s.ttl))
if err != nil {
log.Fatalf("failed to copy key: %v\n", err)
}
}
wg.Done()
}
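Two ideas in this file are worth isolating: the TTL floor applied in `runSet` (a non-zero TTL below 300 seconds is raised to 300) and the fan-out of work over a channel to a fixed number of workers. The sketch below reproduces both without an etcd client; `normalizeTTL`, `processAll`, and `minTTL` are hypothetical names, and `processAll` assumes at least one worker (with zero workers the unbuffered send would block forever).

```go
package main

import (
	"fmt"
	"sync"
)

// minTTL is the floor, in seconds, applied to short-lived keys during import.
const minTTL = 300

// normalizeTTL leaves 0 (no expiry) untouched and raises short TTLs to minTTL.
func normalizeTTL(ttl int64) int64 {
	if ttl != 0 && ttl < minTTL {
		return minTTL
	}
	return ttl
}

// processAll feeds ttls through a channel to a pool of workers and collects
// the normalized values. Result order is not guaranteed.
func processAll(ttls []int64, workers int) []int64 {
	in := make(chan int64)
	var (
		mu  sync.Mutex
		out []int64
		wg  sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range in {
				n := normalizeTTL(t)
				mu.Lock()
				out = append(out, n)
				mu.Unlock()
			}
		}()
	}
	for _, t := range ttls {
		in <- t
	}
	close(in) // lets the workers' range loops finish
	wg.Wait()
	return out
}

func main() {
	fmt.Println(len(processAll([]int64{0, 10, 600}, 2)))
}
```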


@ -134,10 +134,10 @@ func actionMemberAdd(c *cli.Context) {
}
conf := []string{}
for _, m := range members {
for _, u := range m.PeerURLs {
n := m.Name
if m.ID == newID {
for _, memb := range members {
for _, u := range memb.PeerURLs {
n := memb.Name
if memb.ID == newID {
n = newName
}
conf = append(conf, fmt.Sprintf("%s=%s", n, u))
@ -156,16 +156,42 @@ func actionMemberRemove(c *cli.Context) {
fmt.Fprintln(os.Stderr, "Provide a single member ID")
os.Exit(1)
}
removalID := args[0]
mAPI := mustNewMembersAPI(c)
mID := args[0]
ctx, cancel := context.WithTimeout(context.Background(), client.DefaultRequestTimeout)
err := mAPI.Remove(ctx, mID)
cancel()
// Get the list of members.
listctx, listCancel := context.WithTimeout(context.Background(), client.DefaultRequestTimeout)
members, err := mAPI.List(listctx)
listCancel()
if err != nil {
fmt.Fprintln(os.Stderr, err.Error())
fmt.Fprintln(os.Stderr, "Error while verifying ID against known members:", err.Error())
os.Exit(1)
}
// Sanity check the input.
foundID := false
for _, m := range members {
if m.ID == removalID {
foundID = true
}
if m.Name == removalID {
// Note that, so long as it's not ambiguous, we *could* do the right thing by name here.
fmt.Fprintf(os.Stderr, "Found a member named %s; if this is correct, please use its ID, eg:\n\tetcdctl member remove %s\n", m.Name, m.ID)
fmt.Fprintf(os.Stderr, "For more details, read the documentation at https://github.com/coreos/etcd/blob/master/Documentation/runtime-configuration.md#remove-a-member\n\n")
}
}
if !foundID {
fmt.Fprintf(os.Stderr, "Couldn't find a member in the cluster with an ID of %s.\n", removalID)
os.Exit(1)
}
fmt.Printf("Removed member %s from cluster\n", mID)
// Actually attempt to remove the member.
ctx, removeCancel := context.WithTimeout(context.Background(), client.DefaultRequestTimeout)
err = mAPI.Remove(ctx, removalID)
removeCancel()
if err != nil {
fmt.Fprintf(os.Stderr, "Received an error trying to remove member %s: %s\n", removalID, err.Error())
os.Exit(1)
}
fmt.Printf("Removed member %s from cluster\n", removalID)
}
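The new pre-check in `actionMemberRemove` can be summarized as: accept the argument only if it equals a known member ID, and if it equals a member name instead, point the user at the matching ID. A reduced sketch of that lookup follows, with a hypothetical `member` type and `resolveRemoval` helper standing in for the client types.

```go
package main

import "fmt"

// member holds just the two fields the check needs.
type member struct {
	ID   string
	Name string
}

// resolveRemoval reports whether input is a known member ID. If input matches
// a member name instead, hintID carries that member's ID for the hint message.
func resolveRemoval(members []member, input string) (found bool, hintID string) {
	for _, m := range members {
		if m.ID == input {
			found = true
		}
		if m.Name == input {
			hintID = m.ID
		}
	}
	return found, hintID
}

func main() {
	members := []member{{ID: "9bf1b35fc7761a23", Name: "infra3"}}
	found, hint := resolveRemoval(members, "infra3")
	fmt.Println(found, hint)
}
```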


@ -1,69 +0,0 @@
/*
Copyright 2015 CoreOS, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package command
import (
"fmt"
"log"
"net/http"
"os"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli"
)
func UpgradeCommand() cli.Command {
return cli.Command{
Name: "upgrade",
Usage: "upgrade an old version etcd cluster to a new version",
Flags: []cli.Flag{
cli.StringFlag{Name: "old-version", Value: "1", Usage: "Old internal version"},
cli.StringFlag{Name: "new-version", Value: "2", Usage: "New internal version"},
cli.StringFlag{Name: "peer-url", Value: "", Usage: "An etcd peer url string"},
},
Action: handleUpgrade,
}
}
func handleUpgrade(c *cli.Context) {
if c.String("old-version") != "1" {
fmt.Printf("Do not support upgrade from version %s\n", c.String("old-version"))
os.Exit(1)
}
if c.String("new-version") != "2" {
fmt.Printf("Do not support upgrade to version %s\n", c.String("new-version"))
os.Exit(1)
}
t, err := getTransport(c)
if err != nil {
log.Fatal(err)
}
client := http.Client{Transport: t}
resp, err := client.Get(c.String("peer-url") + "/v2/admin/next-internal-version")
if err != nil {
fmt.Printf("Failed to send upgrade request to %s: %v\n", c.String("peer-url"), err)
return
}
if resp.StatusCode == http.StatusOK {
fmt.Println("Cluster will start upgrading from internal version 1 to 2 in 10 seconds.")
return
}
if resp.StatusCode == http.StatusNotFound {
fmt.Println("Cluster cannot upgrade to 2: version is not 0.4.7")
return
}
fmt.Printf("Failed to send upgrade request to %s: bad status code %d\n", c.String("peer-url"), resp.StatusCode)
}


@ -65,7 +65,7 @@ func getPeersFlagValue(c *cli.Context) []string {
// If we still don't have peers, use a default
if peerstr == "" {
peerstr = "127.0.0.1:4001"
peerstr = "127.0.0.1:4001,127.0.0.1:2379"
}
return strings.Split(peerstr, ",")


@ -31,7 +31,7 @@ func main() {
app.Flags = []cli.Flag{
cli.BoolFlag{Name: "debug", Usage: "output cURL commands which can be used to reproduce the request"},
cli.BoolFlag{Name: "no-sync", Usage: "don't synchronize cluster information before sending request"},
cli.StringFlag{Name: "output, o", Value: "simple", Usage: "output response in the given format (`simple` or `json`)"},
cli.StringFlag{Name: "output, o", Value: "simple", Usage: "output response in the given format (`simple`, `extended` or `json`)"},
cli.StringFlag{Name: "peers, C", Value: "", Usage: "a comma-delimited list of machine addresses in the cluster (default: \"127.0.0.1:4001\")"},
cli.StringFlag{Name: "cert-file", Value: "", Usage: "identify HTTPS client using this SSL certificate file"},
cli.StringFlag{Name: "key-file", Value: "", Usage: "identify HTTPS client using this SSL key file"},
@ -39,6 +39,7 @@ func main() {
}
app.Commands = []cli.Command{
command.NewBackupCommand(),
command.NewClusterHealthCommand(),
command.NewMakeCommand(),
command.NewMakeDirCommand(),
command.NewRemoveCommand(),
@ -52,7 +53,7 @@ func main() {
command.NewWatchCommand(),
command.NewExecWatchCommand(),
command.NewMemberCommand(),
command.UpgradeCommand(),
command.NewImportSnapCommand(),
}
app.Run(os.Args)


@ -15,18 +15,17 @@
package etcdmain
import (
"errors"
"flag"
"fmt"
"log"
"net/url"
"os"
"runtime"
"strings"
"github.com/coreos/etcd/etcdserver"
"github.com/coreos/etcd/pkg/cors"
"github.com/coreos/etcd/pkg/flags"
"github.com/coreos/etcd/pkg/netutil"
"github.com/coreos/etcd/pkg/transport"
"github.com/coreos/etcd/version"
)
@ -41,6 +40,8 @@ const (
clusterStateFlagNew = "new"
clusterStateFlagExisting = "existing"
defaultName = "default"
)
var (
@ -62,6 +63,7 @@ var (
ErrConflictBootstrapFlags = fmt.Errorf("multiple discovery or bootstrap flags are set. " +
"Choose one of \"initial-cluster\", \"discovery\" or \"discovery-srv\"")
errUnsetAdvertiseClientURLsFlag = fmt.Errorf("-advertise-client-urls is required when -listen-client-urls is set explicitly")
)
type config struct {
@ -137,7 +139,7 @@ func NewConfig() *config {
fs.Var(flags.NewURLsValue("http://localhost:2379,http://localhost:4001"), "listen-client-urls", "List of URLs to listen on for client traffic")
fs.UintVar(&cfg.maxSnapFiles, "max-snapshots", defaultMaxSnapshots, "Maximum number of snapshot files to retain (0 is unlimited)")
fs.UintVar(&cfg.maxWalFiles, "max-wals", defaultMaxWALs, "Maximum number of wal files to retain (0 is unlimited)")
fs.StringVar(&cfg.name, "name", "default", "Unique human-readable name for this node")
fs.StringVar(&cfg.name, "name", defaultName, "Unique human-readable name for this node")
fs.Uint64Var(&cfg.snapCount, "snapshot-count", etcdserver.DefaultSnapCount, "Number of committed transactions to trigger a snapshot")
fs.UintVar(&cfg.TickMs, "heartbeat-interval", 100, "Time (in milliseconds) of a heartbeat interval.")
fs.UintVar(&cfg.ElectionMs, "election-timeout", 1000, "Time (in milliseconds) for an election to timeout.")
@ -153,7 +155,7 @@ func NewConfig() *config {
}
fs.StringVar(&cfg.dproxy, "discovery-proxy", "", "HTTP proxy to use for traffic to discovery service")
fs.StringVar(&cfg.dnsCluster, "discovery-srv", "", "DNS domain used to bootstrap initial cluster")
fs.StringVar(&cfg.initialCluster, "initial-cluster", "default=http://localhost:2380,default=http://localhost:7001", "Initial cluster configuration for bootstrapping")
fs.StringVar(&cfg.initialCluster, "initial-cluster", initialClusterFromName(defaultName), "Initial cluster configuration for bootstrapping")
fs.StringVar(&cfg.initialClusterToken, "initial-cluster-token", "etcd-cluster", "Initial cluster token for the etcd cluster during bootstrap")
fs.Var(cfg.clusterState, "initial-cluster-state", "Initial cluster configuration for bootstrapping")
if err := cfg.clusterState.Set(clusterStateFlagNew); err != nil {
@ -206,9 +208,15 @@ func (cfg *config) Parse(arguments []string) error {
default:
os.Exit(2)
}
if len(cfg.FlagSet.Args()) != 0 {
return fmt.Errorf("'%s' is not a valid flag", cfg.FlagSet.Arg(0))
}
if cfg.printVersion {
fmt.Println("etcd version", version.Version)
fmt.Printf("etcd Version: %s\n", version.Version)
fmt.Printf("Git SHA: %s\n", version.GitSHA)
fmt.Printf("Go Version: %s\n", runtime.Version())
fmt.Printf("Go OS/Arch: %s/%s\n", runtime.GOOS, runtime.GOARCH)
os.Exit(0)
}
@ -231,6 +239,9 @@ func (cfg *config) Parse(arguments []string) error {
return ErrConflictBootstrapFlags
}
flags.SetBindAddrFromAddr(cfg.FlagSet, "peer-bind-addr", "peer-addr")
flags.SetBindAddrFromAddr(cfg.FlagSet, "bind-addr", "addr")
cfg.lpurls, err = flags.URLsFromFlags(cfg.FlagSet, "listen-peer-urls", "peer-bind-addr", cfg.peerTLSInfo)
if err != nil {
return err
@ -248,15 +259,29 @@ func (cfg *config) Parse(arguments []string) error {
return err
}
if err := cfg.resolveUrls(); err != nil {
return errors.New("cannot resolve DNS hostnames.")
// when etcd runs in member mode user needs to set -advertise-client-urls if -listen-client-urls is set.
// TODO(yichengq): check this for joining through discovery service case
mayFallbackToProxy := flags.IsSet(cfg.FlagSet, "discovery") && cfg.fallback.String() == fallbackFlagProxy
mayBeProxy := cfg.proxy.String() != proxyFlagOff || mayFallbackToProxy
if !mayBeProxy {
if flags.IsSet(cfg.FlagSet, "listen-client-urls") && !flags.IsSet(cfg.FlagSet, "advertise-client-urls") {
return errUnsetAdvertiseClientURLsFlag
}
}
if 5*cfg.TickMs > cfg.ElectionMs {
return fmt.Errorf("-election-timeout[%vms] should be at least 5 times -heartbeat-interval[%vms]", cfg.ElectionMs, cfg.TickMs)
}
return nil
}
func (cfg *config) resolveUrls() error {
return netutil.ResolveTCPAddrs(cfg.lpurls, cfg.apurls, cfg.lcurls, cfg.acurls)
func initialClusterFromName(name string) string {
n := name
if name == "" {
n = defaultName
}
return fmt.Sprintf("%s=http://localhost:2380,%s=http://localhost:7001", n, n)
}
func (cfg config) isNewCluster() bool { return cfg.clusterState.String() == clusterStateFlagNew }
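
The advertise-client-urls rule added to Parse above can be sketched on its own as follows. This is a minimal illustration under stated assumptions: `flagState` and `checkAdvertise` are hypothetical names, and the string `"off"` for a disabled proxy is assumed here (the diff only shows the `proxyFlagOff` constant, not its value).

```go
package main

import (
	"errors"
	"fmt"
)

var errUnsetAdvertiseClientURLs = errors.New("-advertise-client-urls is required when -listen-client-urls is set explicitly")

// flagState is a hypothetical summary of the parsed command-line flags.
type flagState struct {
	listenClientSet    bool   // -listen-client-urls given explicitly
	advertiseClientSet bool   // -advertise-client-urls given explicitly
	proxy              string // assumed values: "off", "on", "readonly"
	discoverySet       bool   // -discovery given explicitly
	fallback           string // "proxy" or "exit"
}

// checkAdvertise applies the rule only when the process cannot end up as a
// proxy, since a proxy does not use the advertised client URLs.
func checkAdvertise(f flagState) error {
	mayFallbackToProxy := f.discoverySet && f.fallback == "proxy"
	mayBeProxy := f.proxy != "off" || mayFallbackToProxy
	if !mayBeProxy && f.listenClientSet && !f.advertiseClientSet {
		return errUnsetAdvertiseClientURLs
	}
	return nil
}

func main() {
	member := flagState{listenClientSet: true, proxy: "off", fallback: "proxy"}
	fmt.Println("member:", checkAdvertise(member))

	proxy := flagState{listenClientSet: true, proxy: "on", fallback: "proxy"}
	fmt.Println("proxy:", checkAdvertise(proxy))
}
```

The cases mirror the table in TestConfigParsingMissedAdvertiseClientURLsFlag: a plain member fails the check, while proxy mode and discovery with the default proxy fallback pass.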


@ -29,6 +29,8 @@ func TestConfigParsingMemberFlags(t *testing.T) {
"-snapshot-count=10",
"-listen-peer-urls=http://localhost:8000,https://localhost:8001",
"-listen-client-urls=http://localhost:7000,https://localhost:7001",
// it should be set if -listen-client-urls is set
"-advertise-client-urls=http://localhost:7000,https://localhost:7001",
}
wcfg := &config{
dir: "testdir",
@ -151,6 +153,35 @@ func TestConfigParsingOtherFlags(t *testing.T) {
}
}
func TestConfigParsingV1Flags(t *testing.T) {
args := []string{
"-peer-addr=127.0.0.1:7001",
"-addr=127.0.0.1:4001",
}
wcfg := NewConfig()
wcfg.lpurls = []url.URL{{Scheme: "http", Host: "[::]:7001"}}
wcfg.apurls = []url.URL{{Scheme: "http", Host: "127.0.0.1:7001"}}
wcfg.lcurls = []url.URL{{Scheme: "http", Host: "[::]:4001"}}
wcfg.acurls = []url.URL{{Scheme: "http", Host: "127.0.0.1:4001"}}
cfg := NewConfig()
if err := cfg.Parse(args); err != nil {
t.Fatal(err)
}
if !reflect.DeepEqual(cfg.lpurls, wcfg.lpurls) {
t.Errorf("listen peer urls = %+v, want %+v", cfg.lpurls, wcfg.lpurls)
}
if !reflect.DeepEqual(cfg.apurls, wcfg.apurls) {
t.Errorf("advertise peer urls = %+v, want %+v", cfg.apurls, wcfg.apurls)
}
if !reflect.DeepEqual(cfg.lcurls, wcfg.lcurls) {
t.Errorf("listen client urls = %+v, want %+v", cfg.lcurls, wcfg.lcurls)
}
if !reflect.DeepEqual(cfg.acurls, wcfg.acurls) {
t.Errorf("advertise client urls = %+v, want %+v", cfg.acurls, wcfg.acurls)
}
}
func TestConfigParsingConflictClusteringFlags(t *testing.T) {
conflictArgs := [][]string{
[]string{
@ -181,6 +212,71 @@ func TestConfigParsingConflictClusteringFlags(t *testing.T) {
}
}
func TestConfigParsingMissedAdvertiseClientURLsFlag(t *testing.T) {
tests := []struct {
args []string
werr error
}{
{
[]string{
"-initial-cluster=infra1=http://127.0.0.1:2380",
"-listen-client-urls=http://127.0.0.1:2379",
},
errUnsetAdvertiseClientURLsFlag,
},
{
[]string{
"-discovery-srv=example.com",
"-listen-client-urls=http://127.0.0.1:2379",
},
errUnsetAdvertiseClientURLsFlag,
},
{
[]string{
"-discovery=http://example.com/abc",
"-discovery-fallback=exit",
"-listen-client-urls=http://127.0.0.1:2379",
},
errUnsetAdvertiseClientURLsFlag,
},
{
[]string{
"-listen-client-urls=http://127.0.0.1:2379",
},
errUnsetAdvertiseClientURLsFlag,
},
{
[]string{
"-discovery=http://example.com/abc",
"-listen-client-urls=http://127.0.0.1:2379",
},
nil,
},
{
[]string{
"-proxy=on",
"-listen-client-urls=http://127.0.0.1:2379",
},
nil,
},
{
[]string{
"-proxy=readonly",
"-listen-client-urls=http://127.0.0.1:2379",
},
nil,
},
}
for i, tt := range tests {
cfg := NewConfig()
err := cfg.Parse(tt.args)
if err != tt.werr {
t.Errorf("%d: err = %v, want %v", i, err, tt.werr)
}
}
}
func TestConfigIsNewCluster(t *testing.T) {
tests := []struct {
state string


@ -15,11 +15,15 @@
package etcdmain
import (
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net"
"net/http"
"os"
"path"
"reflect"
"strings"
"time"
@ -28,28 +32,55 @@ import (
"github.com/coreos/etcd/etcdserver/etcdhttp"
"github.com/coreos/etcd/pkg/cors"
"github.com/coreos/etcd/pkg/fileutil"
"github.com/coreos/etcd/pkg/osutil"
"github.com/coreos/etcd/pkg/transport"
"github.com/coreos/etcd/pkg/types"
"github.com/coreos/etcd/proxy"
"github.com/coreos/etcd/rafthttp"
)
type dirType string
const (
// the owner can make/remove files inside the directory
privateDirMode = 0700
)
var (
dirMember = dirType("member")
dirProxy = dirType("proxy")
dirEmpty = dirType("empty")
)
func Main() {
cfg := NewConfig()
err := cfg.Parse(os.Args[1:])
if err != nil {
log.Printf("etcd: error verifying flags, %v", err)
os.Exit(2)
log.Printf("error verifying flags, %v. See 'etcd -help'.", err)
switch err {
case errUnsetAdvertiseClientURLsFlag:
log.Printf("When listening on specific address(es), this etcd process must advertise accessible url(s) to each connected client.")
}
os.Exit(1)
}
var stopped <-chan struct{}
shouldProxy := cfg.isProxy()
if cfg.name != defaultName && cfg.initialCluster == initialClusterFromName(defaultName) {
cfg.initialCluster = initialClusterFromName(cfg.name)
}
if cfg.dir == "" {
cfg.dir = fmt.Sprintf("%v.etcd", cfg.name)
log.Printf("etcd: no data-dir provided, using default data-dir ./%s", cfg.dir)
}
which := identifyDataDirOrDie(cfg.dir)
if which != dirEmpty {
log.Printf("etcd: already initialized as %v before, starting as etcd %v...", which, which)
}
shouldProxy := cfg.isProxy() || which == dirProxy
if !shouldProxy {
stopped, err = startEtcd(cfg)
if err == discovery.ErrFullCluster && cfg.shouldFallbackToProxy() {
@ -63,14 +94,19 @@ func Main() {
if err != nil {
switch err {
case discovery.ErrDuplicateID:
log.Fatalf("etcd: member %s has previously registered with discovery service (%s), but the data-dir (%s) on disk cannot be found.",
cfg.name, cfg.durl, cfg.dir)
log.Printf("member %q has previously registered with discovery service token (%s).", cfg.name, cfg.durl)
log.Printf("But etcd could not find valid cluster configuration in the given data dir (%s).", cfg.dir)
log.Printf("Please check the given data dir path if the previous bootstrap succeeded")
log.Printf("or use a new discovery token if the previous bootstrap failed.")
default:
log.Fatalf("etcd: %v", err)
}
}
osutil.HandleInterrupts()
<-stopped
osutil.Exit(0)
}
// startEtcd launches the etcd server and HTTP handlers for client/server communication.
@ -80,18 +116,7 @@ func startEtcd(cfg *config) (<-chan struct{}, error) {
return nil, fmt.Errorf("error setting up initial cluster: %v", err)
}
if cfg.dir == "" {
cfg.dir = fmt.Sprintf("%v.etcd", cfg.name)
log.Printf("no data-dir provided, using default data-dir ./%s", cfg.dir)
}
if err := os.MkdirAll(cfg.dir, privateDirMode); err != nil {
return nil, fmt.Errorf("cannot create data directory: %v", err)
}
if err := fileutil.IsDirWriteable(cfg.dir); err != nil {
return nil, fmt.Errorf("cannot write to data directory: %v", err)
}
pt, err := transport.NewTimeoutTransport(cfg.peerTLSInfo, rafthttp.ConnReadTimeout, rafthttp.ConnWriteTimeout)
pt, err := transport.NewTimeoutTransport(cfg.peerTLSInfo, rafthttp.DialTimeout, rafthttp.ConnReadTimeout, rafthttp.ConnWriteTimeout)
if err != nil {
return nil, err
}
@ -163,6 +188,7 @@ func startEtcd(cfg *config) (<-chan struct{}, error) {
return nil, err
}
s.Start()
osutil.RegisterInterruptHandler(s.Stop)
if cfg.corsInfo.String() != "" {
log.Printf("etcd: cors = %s", cfg.corsInfo)
@ -207,6 +233,7 @@ func startProxy(cfg *config) error {
}
pt, err := transport.NewTransport(cfg.clientTLSInfo)
if err != nil {
return err
}
pt.MaxIdleConnsPerHost = proxy.DefaultMaxIdleConnsPerHost
@ -216,15 +243,67 @@ func startProxy(cfg *config) error {
return err
}
// TODO(jonboulle): update peerURLs dynamically (i.e. when updating
// clientURLs) instead of just using the initial fixed list here
peerURLs := cls.PeerURLs()
cfg.dir = path.Join(cfg.dir, "proxy")
err = os.MkdirAll(cfg.dir, 0700)
if err != nil {
return err
}
var peerURLs []string
clusterfile := path.Join(cfg.dir, "cluster")
b, err := ioutil.ReadFile(clusterfile)
switch {
case err == nil:
urls := struct{ PeerURLs []string }{}
err := json.Unmarshal(b, &urls)
if err != nil {
return err
}
peerURLs = urls.PeerURLs
log.Printf("proxy: using peer urls %v from cluster file ./%s", peerURLs, clusterfile)
case os.IsNotExist(err):
peerURLs = cls.PeerURLs()
log.Printf("proxy: using peer urls %v ", peerURLs)
default:
return err
}
uf := func() []string {
cls, err := etcdserver.GetClusterFromPeers(peerURLs, tr)
gcls, err := etcdserver.GetClusterFromRemotePeers(peerURLs, tr)
// TODO: remove the 2nd check when we fix GetClusterFromPeers
// GetClusterFromPeers should not return nil error with an invalid empty cluster
if err != nil {
log.Printf("proxy: %v", err)
return []string{}
}
if len(gcls.Members()) == 0 {
return cls.ClientURLs()
}
cls = gcls
urls := struct{ PeerURLs []string }{cls.PeerURLs()}
b, err := json.Marshal(urls)
if err != nil {
log.Printf("proxy: error on marshal peer urls %s", err)
return cls.ClientURLs()
}
err = ioutil.WriteFile(clusterfile+".bak", b, 0600)
if err != nil {
log.Printf("proxy: error on writing urls %s", err)
return cls.ClientURLs()
}
err = os.Rename(clusterfile+".bak", clusterfile)
if err != nil {
log.Printf("proxy: error on updating clusterfile %s", err)
return cls.ClientURLs()
}
if !reflect.DeepEqual(cls.PeerURLs(), peerURLs) {
log.Printf("proxy: updated peer urls in cluster file from %v to %v", peerURLs, cls.PeerURLs())
}
peerURLs = cls.PeerURLs()
return cls.ClientURLs()
}
ph := proxy.NewHandler(pt, uf)
@ -282,3 +361,38 @@ func genClusterString(name string, urls types.URLs) string {
}
return strings.Join(addrs, ",")
}
// identifyDataDirOrDie returns the type of the data dir.
// Dies if the datadir is invalid.
func identifyDataDirOrDie(dir string) dirType {
names, err := fileutil.ReadDir(dir)
if err != nil {
if os.IsNotExist(err) {
return dirEmpty
}
log.Fatalf("etcd: error listing data dir %s: %v", dir, err)
}
var m, p bool
for _, name := range names {
switch dirType(name) {
case dirMember:
m = true
case dirProxy:
p = true
default:
log.Printf("etcd: found invalid file/dir %s under data dir %s (Ignore this if you are upgrading etcd)", name, dir)
}
}
if m && p {
log.Fatal("etcd: invalid datadir. Both member and proxy directories exist.")
}
if m {
return dirMember
}
if p {
return dirProxy
}
return dirEmpty
}
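
The data-dir classification above can be tried outside etcd with the sketch below. It is an approximation, not the etcd implementation: it uses the standard library's `os.ReadDir` (Go 1.16 or later) in place of `fileutil.ReadDir`, and returns an error instead of calling `log.Fatal`. The name `identifyDataDir` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

type dirType string

const (
	dirMember dirType = "member"
	dirProxy  dirType = "proxy"
	dirEmpty  dirType = "empty"
)

// identifyDataDir classifies dir by the subdirectories it contains.
// A missing directory counts as empty; having both "member" and "proxy"
// is reported as an error.
func identifyDataDir(dir string) (dirType, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		if os.IsNotExist(err) {
			return dirEmpty, nil
		}
		return "", err
	}
	var m, p bool
	for _, e := range entries {
		switch dirType(e.Name()) {
		case dirMember:
			m = true
		case dirProxy:
			p = true
		}
	}
	switch {
	case m && p:
		return "", errors.New("invalid data dir: both member and proxy directories exist")
	case m:
		return dirMember, nil
	case p:
		return dirProxy, nil
	}
	return dirEmpty, nil
}

func main() {
	dir, err := os.MkdirTemp("", "datadir")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	which, err := identifyDataDir(dir)
	fmt.Println(which, err)

	if err := os.Mkdir(filepath.Join(dir, "proxy"), 0700); err != nil {
		panic(err)
	}
	which, err = identifyDataDir(dir)
	fmt.Println(which, err)
}
```

Because the result feeds `shouldProxy`, a process that once fell back to proxy mode keeps starting as a proxy on later runs with the same data dir.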


@ -56,7 +56,8 @@ clustering flags:
--initial-cluster-token 'etcd-cluster'
initial cluster token for the etcd cluster during bootstrap.
--advertise-client-urls 'http://localhost:2379,http://localhost:4001'
list of this member's client URLs to advertise to the rest of the cluster.
list of this member's client URLs to advertise to the public.
The client URLs advertised should be accessible to machines that talk to etcd cluster. etcd client libraries parse these URLs to connect to the cluster.
--discovery ''
discovery URL used to bootstrap the cluster.
--discovery-fallback 'proxy'
@ -91,8 +92,8 @@ security flags:
unsafe flags:
Please be CAUTIOUS to use unsafe flags because it will break the guarantee given
by consensus protocol.
Please be CAUTIOUS when using unsafe flags because it will break the guarantees
given by the consensus protocol.
--force-new-cluster 'false'
force to create a new one-member cluster.


@ -56,14 +56,15 @@ type ClusterInfo interface {
// Cluster is a list of Members that belong to the same raft cluster
type Cluster struct {
id types.ID
token string
members map[types.ID]*Member
id types.ID
token string
store store.Store
sync.Mutex // guards members and removed map
members map[types.ID]*Member
// removed contains the ids of removed members in the cluster.
// removed id cannot be reused.
removed map[types.ID]bool
store store.Store
sync.Mutex
}
// NewClusterFromString returns a Cluster instantiated from the given cluster token
@ -346,6 +347,20 @@ func (c *Cluster) UpdateRaftAttributes(id types.ID, raftAttr RaftAttributes) {
c.members[id].RaftAttributes = raftAttr
}
// Validate ensures that there are no identical URLs in the cluster peer list
func (c *Cluster) Validate() error {
urlMap := make(map[string]bool)
for _, m := range c.Members() {
for _, url := range m.PeerURLs {
if urlMap[url] {
return fmt.Errorf("duplicate url %v in cluster config", url)
}
urlMap[url] = true
}
}
return nil
}
func membersFromStore(st store.Store) (map[types.ID]*Member, map[types.ID]bool) {
members := make(map[types.ID]*Member)
removed := make(map[types.ID]bool)
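
The duplicate-URL check performed by the Validate method in this hunk reduces to a single pass with a set. The sketch below shows that pass on plain data; the `member` type and `validatePeerURLs` function are hypothetical simplifications of the Cluster types.

```go
package main

import "fmt"

// member is a hypothetical, reduced form of the cluster member type.
type member struct {
	Name     string
	PeerURLs []string
}

// validatePeerURLs returns an error for the first peer URL that appears
// more than once across all members.
func validatePeerURLs(members []member) error {
	seen := make(map[string]bool)
	for _, m := range members {
		for _, u := range m.PeerURLs {
			if seen[u] {
				return fmt.Errorf("duplicate url %v in cluster config", u)
			}
			seen[u] = true
		}
	}
	return nil
}

func main() {
	members := []member{
		{Name: "node1", PeerURLs: []string{"http://localhost:7001"}},
		{Name: "node2", PeerURLs: []string{"http://localhost:7001", "http://localhost:7002"}},
	}
	fmt.Println(validatePeerURLs(members))
}
```

The comparison is on the URL strings as written, so two spellings of the same address would not be flagged by this sketch.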

etcdserver/cluster_util.go

@ -0,0 +1,108 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package etcdserver
import (
"encoding/json"
"fmt"
"io/ioutil"
"log"
"net/http"
"sort"
"time"
"github.com/coreos/etcd/pkg/types"
)
// isMemberBootstrapped tries to check if the given member has been bootstrapped
// in the given cluster.
func isMemberBootstrapped(cl *Cluster, member string, tr *http.Transport) bool {
rcl, err := getClusterFromRemotePeers(getRemotePeerURLs(cl, member), false, tr)
if err != nil {
return false
}
id := cl.MemberByName(member).ID
m := rcl.Member(id)
if m == nil {
return false
}
if len(m.ClientURLs) > 0 {
return true
}
return false
}
// GetClusterFromRemotePeers takes a set of URLs representing etcd peers, and
// attempts to construct a Cluster by accessing the members endpoint on one of
// these URLs. The first URL to provide a response is used. If no URLs provide
// a response, or a Cluster cannot be successfully created from a received
// response, an error is returned.
func GetClusterFromRemotePeers(urls []string, tr *http.Transport) (*Cluster, error) {
return getClusterFromRemotePeers(urls, true, tr)
}
// If logerr is true, it prints out more error messages.
func getClusterFromRemotePeers(urls []string, logerr bool, tr *http.Transport) (*Cluster, error) {
cc := &http.Client{
Transport: tr,
Timeout: time.Second,
}
for _, u := range urls {
resp, err := cc.Get(u + "/members")
if err != nil {
if logerr {
log.Printf("etcdserver: could not get cluster response from %s: %v", u, err)
}
continue
}
b, err := ioutil.ReadAll(resp.Body)
if err != nil {
if logerr {
log.Printf("etcdserver: could not read the body of cluster response: %v", err)
}
continue
}
var membs []*Member
if err := json.Unmarshal(b, &membs); err != nil {
if logerr {
log.Printf("etcdserver: could not unmarshal cluster response: %v", err)
}
continue
}
id, err := types.IDFromString(resp.Header.Get("X-Etcd-Cluster-ID"))
if err != nil {
if logerr {
log.Printf("etcdserver: could not parse the cluster ID from cluster res: %v", err)
}
continue
}
return NewClusterFromMembers("", id, membs), nil
}
return nil, fmt.Errorf("etcdserver: could not retrieve cluster information from the given urls")
}
// getRemotePeerURLs returns peer urls of remote members in the cluster. The
// returned list is sorted in ascending lexicographical order.
func getRemotePeerURLs(cl ClusterInfo, local string) []string {
us := make([]string, 0)
for _, m := range cl.Members() {
if m.Name == local {
continue
}
us = append(us, m.PeerURLs...)
}
sort.Strings(us)
return us
}
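
As a small illustration of getRemotePeerURLs above, the sketch below applies the same filter-and-sort to plain data. The `member` type and the sample addresses are hypothetical; the real function takes a ClusterInfo.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a hypothetical, reduced form of the cluster member type.
type member struct {
	Name     string
	PeerURLs []string
}

// remotePeerURLs collects the peer URLs of every member except local and
// returns them in ascending lexicographical order.
func remotePeerURLs(members []member, local string) []string {
	us := make([]string, 0)
	for _, m := range members {
		if m.Name == local {
			continue
		}
		us = append(us, m.PeerURLs...)
	}
	sort.Strings(us)
	return us
}

func main() {
	members := []member{
		{Name: "node1", PeerURLs: []string{"http://10.0.0.1:2380"}},
		{Name: "node3", PeerURLs: []string{"http://10.0.0.3:2380"}},
		{Name: "node2", PeerURLs: []string{"http://10.0.0.2:2380"}},
	}
	fmt.Println(remotePeerURLs(members, "node1"))
}
```

Sorting makes the result independent of the order in which members are listed, which keeps the order of contacted peers stable from run to run.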


@ -46,9 +46,43 @@ type ServerConfig struct {
ElectionTicks int
}
// VerifyBootstrapConfig sanity-checks the initial config and returns an error
// for things that should never happen.
func (c *ServerConfig) VerifyBootstrapConfig() error {
// VerifyBootstrap sanity-checks the initial config for the bootstrap case
// and returns an error for things that should never happen.
func (c *ServerConfig) VerifyBootstrap() error {
if err := c.verifyLocalMember(true); err != nil {
return err
}
if err := c.Cluster.Validate(); err != nil {
return err
}
if c.Cluster.String() == "" && c.DiscoveryURL == "" {
return fmt.Errorf("initial cluster unset and no discovery URL found")
}
return nil
}
// VerifyJoinExisting sanity-checks the initial config for join existing cluster
// case and returns an error for things that should never happen.
func (c *ServerConfig) VerifyJoinExisting() error {
// No need for strict checking since the member has already announced its
// peer URLs to the cluster before starting and does not have to set
// them in the configuration again.
if err := c.verifyLocalMember(false); err != nil {
return err
}
if err := c.Cluster.Validate(); err != nil {
return err
}
if c.DiscoveryURL != "" {
return fmt.Errorf("discovery URL should not be set when joining existing initial cluster")
}
return nil
}
// verifyLocalMember verifies the configured member is in configured
// cluster. If strict is set, it also verifies the configured member
// has the same peer urls as configured advertised peer urls.
func (c *ServerConfig) verifyLocalMember(strict bool) error {
m := c.Cluster.MemberByName(c.Name)
// Make sure the cluster at least contains the local server.
if m == nil {
@ -58,34 +92,23 @@ func (c *ServerConfig) VerifyBootstrapConfig() error {
return fmt.Errorf("cannot use %x as member id", raft.None)
}
if c.DiscoveryURL == "" && !c.NewCluster {
return fmt.Errorf("initial cluster state unset and no wal or discovery URL found")
}
// No identical IPs in the cluster peer list
urlMap := make(map[string]bool)
for _, m := range c.Cluster.Members() {
for _, url := range m.PeerURLs {
if urlMap[url] {
return fmt.Errorf("duplicate url %v in cluster config", url)
}
urlMap[url] = true
}
}
// Advertised peer URLs must match those in the cluster peer list
// TODO: Remove URLStringsEqual after improvement of using hostnames #2150 #2123
apurls := c.PeerURLs.StringSlice()
sort.Strings(apurls)
if !netutil.URLStringsEqual(apurls, m.PeerURLs) {
return fmt.Errorf("%s has different advertised URLs in the cluster and advertised peer URLs list", c.Name)
if strict {
if !netutil.URLStringsEqual(apurls, m.PeerURLs) {
return fmt.Errorf("%s has different advertised URLs in the cluster and advertised peer URLs list", c.Name)
}
}
return nil
}
func (c *ServerConfig) WALDir() string { return path.Join(c.DataDir, "wal") }
func (c *ServerConfig) MemberDir() string { return path.Join(c.DataDir, "member") }
func (c *ServerConfig) SnapDir() string { return path.Join(c.DataDir, "snap") }
func (c *ServerConfig) WALDir() string { return path.Join(c.MemberDir(), "wal") }
func (c *ServerConfig) SnapDir() string { return path.Join(c.MemberDir(), "snap") }
func (c *ServerConfig) ShouldDiscover() bool { return c.DiscoveryURL != "" }
@ -99,6 +122,7 @@ func (c *ServerConfig) print(initial bool) {
log.Println("etcdserver: force new cluster")
}
log.Printf("etcdserver: data dir = %s", c.DataDir)
log.Printf("etcdserver: member dir = %s", c.MemberDir())
log.Printf("etcdserver: heartbeat = %dms", c.TickMs)
log.Printf("etcdserver: election = %dms", c.ElectionTicks*int(c.TickMs))
log.Printf("etcdserver: snapshot count = %d", c.SnapCount)
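
The directory layout after this change nests the WAL and snapshot directories under a member directory. The sketch below reproduces only the path helpers; `serverConfig` is a hypothetical reduction of ServerConfig with just the DataDir field.

```go
package main

import (
	"fmt"
	"path"
)

// serverConfig is a hypothetical reduction of ServerConfig.
type serverConfig struct {
	DataDir string
}

// MemberDir holds everything that belongs to a cluster member.
func (c *serverConfig) MemberDir() string { return path.Join(c.DataDir, "member") }

// WALDir and SnapDir now live under MemberDir rather than directly under DataDir.
func (c *serverConfig) WALDir() string  { return path.Join(c.MemberDir(), "wal") }
func (c *serverConfig) SnapDir() string { return path.Join(c.MemberDir(), "snap") }

func main() {
	c := &serverConfig{DataDir: "/var/lib/etc"}
	fmt.Println(c.MemberDir())
	fmt.Println(c.WALDir())
	fmt.Println(c.SnapDir())
}
```

Deriving WALDir and SnapDir from MemberDir keeps the three paths consistent if the member directory name ever changes.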


@ -22,6 +22,9 @@ import (
)
func mustNewURLs(t *testing.T, urls []string) []url.URL {
if len(urls) == 0 {
return nil
}
u, err := types.NewURLs(urls)
if err != nil {
t.Fatalf("error creating new URLs from %q: %v", urls, err)
@ -29,77 +32,101 @@ func mustNewURLs(t *testing.T, urls []string) []url.URL {
return u
}
func TestBootstrapConfigVerify(t *testing.T) {
func TestConfigVerifyBootstrapWithoutClusterAndDiscoveryURLFail(t *testing.T) {
cluster, err := NewClusterFromString("", "")
if err != nil {
t.Fatalf("NewClusterFromString error: %v", err)
}
c := &ServerConfig{
Name: "node1",
DiscoveryURL: "",
Cluster: cluster,
}
if err := c.VerifyBootstrap(); err == nil {
t.Errorf("err = nil, want not nil")
}
}
func TestConfigVerifyExistingWithDiscoveryURLFail(t *testing.T) {
cluster, err := NewClusterFromString("", "node1=http://127.0.0.1:2380")
if err != nil {
t.Fatalf("NewClusterFromString error: %v", err)
}
c := &ServerConfig{
Name: "node1",
DiscoveryURL: "http://127.0.0.1:4001/abcdefg",
PeerURLs: mustNewURLs(t, []string{"http://127.0.0.1:2380"}),
Cluster: cluster,
NewCluster: false,
}
if err := c.VerifyJoinExisting(); err == nil {
t.Errorf("err = nil, want not nil")
}
}
func TestConfigVerifyLocalMember(t *testing.T) {
tests := []struct {
clusterSetting string
newclst bool
apurls []string
disc string
strict bool
shouldError bool
}{
{
// Node must exist in cluster
"",
true,
nil,
"",
true,
true,
},
{
// Cannot have duplicate URLs in cluster config
"node1=http://localhost:7001,node2=http://localhost:7001,node2=http://localhost:7002",
true,
nil,
"",
true,
},
{
// Node defined, ClusterState OK
// Initial cluster set
"node1=http://localhost:7001,node2=http://localhost:7002",
true,
[]string{"http://localhost:7001"},
"",
true,
false,
},
{
// Node defined, discovery OK
"node1=http://localhost:7001",
false,
[]string{"http://localhost:7001"},
"http://discovery",
false,
},
{
// Cannot have ClusterState!=new && !discovery
"node1=http://localhost:7001",
false,
nil,
"",
// Default initial cluster
"node1=http://localhost:2380,node1=http://localhost:7001",
[]string{"http://localhost:2380", "http://localhost:7001"},
true,
false,
},
{
// Advertised peer URLs must match those in cluster-state
"node1=http://localhost:7001",
true,
[]string{"http://localhost:12345"},
"",
true,
true,
},
{
// Advertised peer URLs must match those in cluster-state
"node1=http://localhost:7001,node1=http://localhost:12345",
true,
[]string{"http://localhost:12345"},
"",
true,
true,
},
{
// Advertised peer URLs must match those in cluster-state
"node1=http://localhost:7001",
[]string{},
true,
true,
},
{
// do not care about the urls if strict is not set
"node1=http://localhost:7001",
[]string{},
false,
false,
},
}
for i, tt := range tests {
@ -108,15 +135,13 @@ func TestBootstrapConfigVerify(t *testing.T) {
t.Fatalf("#%d: Got unexpected error: %v", i, err)
}
cfg := ServerConfig{
Name: "node1",
DiscoveryURL: tt.disc,
Cluster: cluster,
NewCluster: tt.newclst,
Name: "node1",
Cluster: cluster,
}
if tt.apurls != nil {
cfg.PeerURLs = mustNewURLs(t, tt.apurls)
}
err = cfg.VerifyBootstrapConfig()
err = cfg.verifyLocalMember(tt.strict)
if (err == nil) && tt.shouldError {
t.Errorf("%#v", *cluster)
t.Errorf("#%d: Got no error where one was expected", i)
@ -129,8 +154,8 @@ func TestBootstrapConfigVerify(t *testing.T) {
func TestSnapDir(t *testing.T) {
tests := map[string]string{
"/": "/snap",
"/var/lib/etc": "/var/lib/etc/snap",
"/": "/member/snap",
"/var/lib/etc": "/var/lib/etc/member/snap",
}
for dd, w := range tests {
cfg := ServerConfig{
@ -144,8 +169,8 @@ func TestSnapDir(t *testing.T) {
func TestWALDir(t *testing.T) {
tests := map[string]string{
"/": "/wal",
"/var/lib/etc": "/var/lib/etc/wal",
"/": "/member/wal",
"/var/lib/etc": "/var/lib/etc/member/wal",
}
for dd, w := range tests {
cfg := ServerConfig{


@ -119,7 +119,6 @@ func (h *keysHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
writeError(w, err)
return
}
switch {
case resp.Event != nil:
if err := writeKeyEvent(w, resp.Event, h.timer); err != nil {
@ -334,7 +333,7 @@ func serveVersion(w http.ResponseWriter, r *http.Request) {
if !allowMethod(w, r.Method, "GET") {
return
}
fmt.Fprintf(w, `{"releaseVersion":"%s","internalVersion":"%s"}`, version.Version, version.InternalVersion)
w.Write([]byte("etcd " + version.Version))
}
// parseKeyRequest converts a received http.Request on keysPrefix to


@ -1064,13 +1064,13 @@ func TestServeMembersFail(t *testing.T) {
func TestWriteEvent(t *testing.T) {
// nil event should not panic
rw := httptest.NewRecorder()
writeKeyEvent(rw, nil, dummyRaftTimer{})
h := rw.Header()
rec := httptest.NewRecorder()
writeKeyEvent(rec, nil, dummyRaftTimer{})
h := rec.Header()
if len(h) > 0 {
t.Fatalf("unexpected non-empty headers: %#v", h)
}
b := rw.Body.String()
b := rec.Body.String()
if len(b) > 0 {
t.Fatalf("unexpected non-empty body: %q", b)
}
@ -1327,7 +1327,7 @@ func TestServeVersion(t *testing.T) {
if rw.Code != http.StatusOK {
t.Errorf("code=%d, want %d", rw.Code, http.StatusOK)
}
w := fmt.Sprintf(`{"releaseVersion":"%s","internalVersion":"%s"}`, version.Version, version.InternalVersion)
w := fmt.Sprintf("etcd %s", version.Version)
if g := rw.Body.String(); g != w {
t.Fatalf("body = %q, want %q", g, w)
}
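
The plain-text version response tested above can be exercised with the standard library alone. The sketch below is an approximation: the `version` constant and the `versionResponse` helper are hypothetical, and the method check is written inline instead of through etcd's allowMethod helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// version is a placeholder value for illustration.
const version = "2.0.13"

// serveVersion writes the version as plain text and only accepts GET.
func serveVersion(w http.ResponseWriter, r *http.Request) {
	if r.Method != "GET" {
		w.Header().Set("Allow", "GET")
		http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
		return
	}
	w.Write([]byte("etcd " + version))
}

// versionResponse sends one request to serveVersion through a recorder and
// returns the status code and body.
func versionResponse(method string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/version", nil)
	serveVersion(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := versionResponse("GET")
	fmt.Println(code, body)
}
```

A client that previously decoded the JSON form of this response would need to read the body as a plain string instead.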


@ -76,13 +76,13 @@ func (fs *errServer) UpdateMember(ctx context.Context, m etcdserver.Member) erro
func TestWriteError(t *testing.T) {
// nil error should not panic
rw := httptest.NewRecorder()
writeError(rw, nil)
h := rw.Header()
rec := httptest.NewRecorder()
writeError(rec, nil)
h := rec.Header()
if len(h) > 0 {
t.Fatalf("unexpected non-empty headers: %#v", h)
}
b := rw.Body.String()
b := rec.Body.String()
if len(b) > 0 {
t.Fatalf("unexpected non-empty body: %q", b)
}

@ -18,13 +18,11 @@ import (
"encoding/json"
"expvar"
"fmt"
"io/ioutil"
"log"
"math/rand"
"net/http"
"path"
"regexp"
"sort"
"sync/atomic"
"time"
@ -44,6 +42,7 @@ import (
"github.com/coreos/etcd/rafthttp"
"github.com/coreos/etcd/snap"
"github.com/coreos/etcd/store"
"github.com/coreos/etcd/version"
"github.com/coreos/etcd/wal"
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/net/context"
@ -141,60 +140,72 @@ type EtcdServer struct {
// NewServer creates a new EtcdServer from the supplied configuration. The
// configuration is considered static for the lifetime of the EtcdServer.
func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
st := store.New()
st := store.New(StoreAdminPrefix, StoreKeysPrefix)
var w *wal.WAL
var n raft.Node
var s *raft.MemoryStorage
var id types.ID
walVersion, err := wal.DetectVersion(cfg.DataDir)
// Run the migrations.
dataVer, err := version.DetectDataDir(cfg.DataDir)
if err != nil {
return nil, err
}
if walVersion == wal.WALUnknown {
return nil, fmt.Errorf("unknown wal version in data dir %s", cfg.DataDir)
if err := upgradeDataDir(cfg.DataDir, cfg.Name, dataVer); err != nil {
return nil, err
}
haveWAL := walVersion != wal.WALNotExist
haveWAL := wal.Exist(cfg.WALDir())
ss := snap.New(cfg.SnapDir())
var remotes []*Member
switch {
case !haveWAL && !cfg.NewCluster:
us := getOtherPeerURLs(cfg.Cluster, cfg.Name)
existingCluster, err := GetClusterFromPeers(us, cfg.Transport)
if err := cfg.VerifyJoinExisting(); err != nil {
return nil, err
}
existingCluster, err := GetClusterFromRemotePeers(getRemotePeerURLs(cfg.Cluster, cfg.Name), cfg.Transport)
if err != nil {
return nil, fmt.Errorf("cannot fetch cluster info from peer urls: %v", err)
}
if err := ValidateClusterAndAssignIDs(cfg.Cluster, existingCluster); err != nil {
return nil, fmt.Errorf("error validating peerURLs %s: %v", existingCluster, err)
}
remotes = existingCluster.Members()
cfg.Cluster.SetID(existingCluster.id)
cfg.Cluster.SetStore(st)
cfg.Print()
id, n, s, w = startNode(cfg, nil)
case !haveWAL && cfg.NewCluster:
if err := cfg.VerifyBootstrapConfig(); err != nil {
if err := cfg.VerifyBootstrap(); err != nil {
return nil, err
}
m := cfg.Cluster.MemberByName(cfg.Name)
if isBootstrapped(cfg) {
if isMemberBootstrapped(cfg.Cluster, cfg.Name, cfg.Transport) {
return nil, fmt.Errorf("member %s has already been bootstrapped", m.ID)
}
if cfg.ShouldDiscover() {
s, err := discovery.JoinCluster(cfg.DiscoveryURL, cfg.DiscoveryProxy, m.ID, cfg.Cluster.String())
str, err := discovery.JoinCluster(cfg.DiscoveryURL, cfg.DiscoveryProxy, m.ID, cfg.Cluster.String())
if err != nil {
return nil, err
}
if cfg.Cluster, err = NewClusterFromString(cfg.Cluster.token, s); err != nil {
if cfg.Cluster, err = NewClusterFromString(cfg.Cluster.token, str); err != nil {
return nil, err
}
if err := cfg.Cluster.Validate(); err != nil {
return nil, fmt.Errorf("bad discovery cluster: %v", err)
}
}
cfg.Cluster.SetStore(st)
cfg.PrintWithInitial()
id, n, s, w = startNode(cfg, cfg.Cluster.MemberIDs())
case haveWAL:
if walVersion != wal.WALv0_5 {
if err := upgradeWAL(cfg, walVersion); err != nil {
return nil, err
}
if err := fileutil.IsDirWriteable(cfg.DataDir); err != nil {
return nil, fmt.Errorf("cannot write to data directory: %v", err)
}
if err := fileutil.IsDirWriteable(cfg.MemberDir()); err != nil {
return nil, fmt.Errorf("cannot write to member directory: %v", err)
}
if cfg.ShouldDiscover() {
@ -228,6 +239,7 @@ func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
Name: cfg.Name,
ID: id.String(),
}
sstats.Initialize()
lstats := stats.NewLeaderStats(id.String())
srv := &EtcdServer{
@ -250,10 +262,16 @@ func NewServer(cfg *ServerConfig) (*EtcdServer, error) {
reqIDGen: idutil.NewGenerator(uint8(id), time.Now()),
}
// TODO: move transport initialization near the definition of remote
tr := rafthttp.NewTransporter(cfg.Transport, id, cfg.Cluster.ID(), srv, srv.errorc, sstats, lstats)
// add all the remote members into sendhub
// add all remotes into transport
for _, m := range remotes {
if m.ID != id {
tr.AddRemote(m.ID, m.PeerURLs)
}
}
for _, m := range cfg.Cluster.Members() {
if m.Name != cfg.Name {
if m.ID != id {
tr.AddPeer(m.ID, m.PeerURLs)
}
}
@ -282,7 +300,6 @@ func (s *EtcdServer) start() {
s.w = wait.New()
s.done = make(chan struct{})
s.stop = make(chan struct{})
s.stats.Initialize()
// TODO: if this is an empty log, writes all peer infos
// into the first entry
go s.run()
@ -386,7 +403,18 @@ func (s *EtcdServer) run() {
log.Panicf("recovery store error: %v", err)
}
s.Cluster.Recover()
// recover raft transport
s.r.transport.RemoveAllPeers()
for _, m := range s.Cluster.Members() {
if m.ID == s.ID() {
continue
}
s.r.transport.AddPeer(m.ID, m.PeerURLs)
}
appliedi = rd.Snapshot.Metadata.Index
confState = rd.Snapshot.Metadata.ConfState
log.Printf("etcdserver: recovered from incoming snapshot at index %d", snapi)
}
// TODO(bmizerany): do this in the background, but take
@ -646,9 +674,9 @@ func (s *EtcdServer) publish(retryInterval time.Duration) {
}
func (s *EtcdServer) send(ms []raftpb.Message) {
for _, m := range ms {
if !s.Cluster.IsIDRemoved(types.ID(m.To)) {
m.To = 0
for i, _ := range ms {
if s.Cluster.IsIDRemoved(types.ID(ms[i].To)) {
ms[i].To = 0
}
}
s.r.transport.Send(ms)
@ -698,7 +726,11 @@ func (s *EtcdServer) applyRequest(r pb.Request) Response {
switch {
case existsSet:
if exists {
return f(s.store.Update(r.Path, r.Val, expr))
if r.PrevIndex == 0 && r.PrevValue == "" {
return f(s.store.Update(r.Path, r.Val, expr))
} else {
return f(s.store.CompareAndSwap(r.Path, r.PrevValue, r.PrevIndex, r.Val, expr))
}
}
return f(s.store.Create(r.Path, r.Dir, r.Val, false, expr))
case r.PrevIndex > 0 || r.PrevValue != "":
@ -820,88 +852,3 @@ func (s *EtcdServer) snapshot(snapi uint64, confState *raftpb.ConfState) {
func (s *EtcdServer) PauseSending() { s.r.pauseSending() }
func (s *EtcdServer) ResumeSending() { s.r.resumeSending() }
// isBootstrapped tries to check if the given member has been bootstrapped
// in the given cluster.
func isBootstrapped(cfg *ServerConfig) bool {
cl := cfg.Cluster
member := cfg.Name
us := getOtherPeerURLs(cl, member)
rcl, err := getClusterFromPeers(us, false, cfg.Transport)
if err != nil {
return false
}
id := cl.MemberByName(member).ID
m := rcl.Member(id)
if m == nil {
return false
}
if len(m.ClientURLs) > 0 {
return true
}
return false
}
// GetClusterFromPeers takes a set of URLs representing etcd peers, and
// attempts to construct a Cluster by accessing the members endpoint on one of
// these URLs. The first URL to provide a response is used. If no URLs provide
// a response, or a Cluster cannot be successfully created from a received
// response, an error is returned.
func GetClusterFromPeers(urls []string, tr *http.Transport) (*Cluster, error) {
return getClusterFromPeers(urls, true, tr)
}
// If logerr is true, it prints out more error messages.
func getClusterFromPeers(urls []string, logerr bool, tr *http.Transport) (*Cluster, error) {
cc := &http.Client{
Transport: tr,
Timeout: time.Second,
}
for _, u := range urls {
resp, err := cc.Get(u + "/members")
if err != nil {
if logerr {
log.Printf("etcdserver: could not get cluster response from %s: %v", u, err)
}
continue
}
b, err := ioutil.ReadAll(resp.Body)
if err != nil {
if logerr {
log.Printf("etcdserver: could not read the body of cluster response: %v", err)
}
continue
}
var membs []*Member
if err := json.Unmarshal(b, &membs); err != nil {
if logerr {
log.Printf("etcdserver: could not unmarshal cluster response: %v", err)
}
continue
}
id, err := types.IDFromString(resp.Header.Get("X-Etcd-Cluster-ID"))
if err != nil {
if logerr {
log.Printf("etcdserver: could not parse the cluster ID from cluster res: %v", err)
}
continue
}
return NewClusterFromMembers("", id, membs), nil
}
return nil, fmt.Errorf("etcdserver: could not retrieve cluster information from the given urls")
}
// getOtherPeerURLs returns peer urls of other members in the cluster. The
// returned list is sorted in ascending lexicographical order.
func getOtherPeerURLs(cl ClusterInfo, self string) []string {
us := make([]string, 0)
for _, m := range cl.Members() {
if m.Name == self {
continue
}
us = append(us, m.PeerURLs...)
}
sort.Strings(us)
return us
}
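The `send` hunk above changes two things at once: it drops the negation on `IsIDRemoved`, and it switches from `for _, m := range ms` to indexing with `ms[i]`. The second change matters because a range value variable is a copy of the slice element. The following is a minimal sketch of that effect only; `message`, `clearByValue`, and `clearByIndex` are hypothetical names for illustration and are not part of etcd.

```go
package main

import "fmt"

// message is a hypothetical stand-in for raftpb.Message; only To matters here.
type message struct {
	To uint64
}

// clearByValue uses the old loop shape: m is a copy of each element,
// so the assignment never reaches the slice.
func clearByValue(ms []message, removed map[uint64]bool) {
	for _, m := range ms {
		if removed[m.To] {
			m.To = 0 // modifies the copy only
		}
	}
}

// clearByIndex uses the new loop shape: ms[i] addresses the element itself.
func clearByIndex(ms []message, removed map[uint64]bool) {
	for i := range ms {
		if removed[ms[i].To] {
			ms[i].To = 0
		}
	}
}

func main() {
	removed := map[uint64]bool{2: true}

	a := []message{{To: 1}, {To: 2}}
	clearByValue(a, removed)
	fmt.Println(a) // the slice is unchanged

	b := []message{{To: 1}, {To: 2}}
	clearByIndex(b, removed)
	fmt.Println(b) // the message addressed to the removed ID now has To == 0
}
```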

@ -235,20 +235,18 @@ func TestApplyRequest(t *testing.T) {
},
},
},
// PUT with PrevExist=true *and* PrevIndex set ==> Update
// TODO(jonboulle): is this expected?!
// PUT with PrevExist=true *and* PrevIndex set ==> CompareAndSwap
{
pb.Request{Method: "PUT", ID: 1, PrevExist: pbutil.Boolp(true), PrevIndex: 1},
Response{Event: &store.Event{}},
[]testutil.Action{
{
Name: "Update",
Params: []interface{}{"", "", time.Time{}},
Name: "CompareAndSwap",
Params: []interface{}{"", "", uint64(1), "", time.Time{}},
},
},
},
// PUT with PrevExist=false *and* PrevIndex set ==> Create
// TODO(jonboulle): is this expected?!
{
pb.Request{Method: "PUT", ID: 1, PrevExist: pbutil.Boolp(false), PrevIndex: 1},
Response{Event: &store.Event{}},
@ -1027,8 +1025,8 @@ func TestPublish(t *testing.T) {
t.Errorf("method = %s, want PUT", r.Method)
}
wm := Member{ID: 1, Attributes: Attributes{Name: "node1", ClientURLs: []string{"http://a", "http://b"}}}
if w := path.Join(memberStoreKey(wm.ID), attributesSuffix); r.Path != w {
t.Errorf("path = %s, want %s", r.Path, w)
if wpath := path.Join(memberStoreKey(wm.ID), attributesSuffix); r.Path != wpath {
t.Errorf("path = %s, want %s", r.Path, wpath)
}
var gattr Attributes
if err := json.Unmarshal([]byte(r.Val), &gattr); err != nil {
@ -1072,8 +1070,8 @@ func TestPublishRetry(t *testing.T) {
action := n.Action()
// multiple Proposes
if n := len(action); n < 2 {
t.Errorf("len(action) = %d, want >= 2", n)
if cnt := len(action); cnt < 2 {
t.Errorf("len(action) = %d, want >= 2", cnt)
}
}
@ -1135,7 +1133,7 @@ func TestGetOtherPeerURLs(t *testing.T) {
}
for i, tt := range tests {
cl := NewClusterFromMembers("", types.ID(0), tt.membs)
urls := getOtherPeerURLs(cl, tt.self)
urls := getRemotePeerURLs(cl, tt.self)
if !reflect.DeepEqual(urls, tt.wurls) {
t.Errorf("#%d: urls = %+v, want %+v", i, urls, tt.wurls)
}
@ -1391,8 +1389,10 @@ type nopTransporter struct{}
func (s *nopTransporter) Handler() http.Handler { return nil }
func (s *nopTransporter) Send(m []raftpb.Message) {}
func (s *nopTransporter) AddRemote(id types.ID, us []string) {}
func (s *nopTransporter) AddPeer(id types.ID, us []string) {}
func (s *nopTransporter) RemovePeer(id types.ID) {}
func (s *nopTransporter) RemoveAllPeers() {}
func (s *nopTransporter) UpdatePeer(id types.ID, us []string) {}
func (s *nopTransporter) Stop() {}
func (s *nopTransporter) Pause() {}

@ -16,6 +16,8 @@ package etcdserver
import (
"log"
"os"
"path"
pb "github.com/coreos/etcd/etcdserver/etcdserverpb"
"github.com/coreos/etcd/migrate"
@ -23,6 +25,7 @@ import (
"github.com/coreos/etcd/pkg/types"
"github.com/coreos/etcd/raft/raftpb"
"github.com/coreos/etcd/snap"
"github.com/coreos/etcd/version"
"github.com/coreos/etcd/wal"
"github.com/coreos/etcd/wal/walpb"
)
@ -91,14 +94,47 @@ func readWAL(waldir string, snap walpb.Snapshot) (w *wal.WAL, id, cid types.ID,
// upgradeDataDir converts an older version of the etcdServer data to the newest version.
// It must ensure that, after upgrading, the most recent version is present.
func upgradeWAL(cfg *ServerConfig, ver wal.WalVersion) error {
if ver == wal.WALv0_4 {
func upgradeDataDir(baseDataDir string, name string, ver version.DataDirVersion) error {
switch ver {
case version.DataDir0_4:
log.Print("etcdserver: converting v0.4 log to v2.0")
err := migrate.Migrate4To2(cfg.DataDir, cfg.Name)
err := migrate.Migrate4To2(baseDataDir, name)
if err != nil {
log.Fatalf("etcdserver: failed migrating data-dir: %v", err)
return err
}
fallthrough
case version.DataDir2_0:
err := makeMemberDir(baseDataDir)
if err != nil {
return err
}
fallthrough
case version.DataDir2_0_1:
fallthrough
default:
log.Printf("etcdserver: datadir is valid for the 2.0.1 format")
}
return nil
}
func makeMemberDir(dir string) error {
membdir := path.Join(dir, "member")
_, err := os.Stat(membdir)
switch {
case err == nil:
return nil
case !os.IsNotExist(err):
return err
}
if err := os.MkdirAll(membdir, 0700); err != nil {
return err
}
names := []string{"snap", "wal"}
for _, name := range names {
if err := os.Rename(path.Join(dir, name), path.Join(membdir, name)); err != nil {
return err
}
}
return nil
}
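`upgradeDataDir` above chains its migration steps with `fallthrough`, so a data directory in an older format runs every later step in order. A minimal sketch of that control flow follows; `dataDirVersion`, the constants, and `upgradeSteps` are hypothetical stand-ins, and the step names are only labels, not calls into etcd.

```go
package main

import "fmt"

// dataDirVersion is a hypothetical stand-in for version.DataDirVersion.
type dataDirVersion int

const (
	v0_4 dataDirVersion = iota
	v2_0
	v2_0_1
)

// upgradeSteps lists the steps that would run for a given starting version.
// Each case falls through to the next, so older versions run more steps.
func upgradeSteps(ver dataDirVersion) []string {
	var steps []string
	switch ver {
	case v0_4:
		steps = append(steps, "migrate 0.4 log to 2.0")
		fallthrough
	case v2_0:
		steps = append(steps, "move snap and wal under member/")
		fallthrough
	case v2_0_1:
		// already in the newest layout; nothing to do
	}
	return steps
}

func main() {
	for _, v := range []dataDirVersion{v0_4, v2_0, v2_0_1} {
		fmt.Println(upgradeSteps(v))
	}
}
```

Adding a later format under this scheme would mean adding one more case at the bottom and a `fallthrough` in the case before it.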

@ -170,6 +170,46 @@ func TestForceNewCluster(t *testing.T) {
clusterMustProgress(t, c.Members[:1])
}
// Ensure we can remove a member then add a new one back immediately.
func TestIssue2681(t *testing.T) {
defer afterTest(t)
c := NewCluster(t, 5)
c.Launch(t)
defer c.Terminate(t)
c.RemoveMember(t, uint64(c.Members[4].s.ID()))
c.waitLeader(t, c.Members)
c.AddMember(t)
c.waitLeader(t, c.Members)
clusterMustProgress(t, c.Members)
}
// Ensure we can remove a member after a snapshot then add a new one back.
func TestIssue2746(t *testing.T) {
defer afterTest(t)
c := NewCluster(t, 5)
for _, m := range c.Members {
m.SnapCount = 10
}
c.Launch(t)
defer c.Terminate(t)
// force a snapshot
for i := 0; i < 20; i++ {
clusterMustProgress(t, c.Members)
}
c.RemoveMember(t, uint64(c.Members[4].s.ID()))
c.waitLeader(t, c.Members)
c.AddMember(t)
c.waitLeader(t, c.Members)
clusterMustProgress(t, c.Members)
}
// clusterMustProgress ensures that the cluster can make progress. It first
// creates a random key, then checks that the new key can be read from all
// client URLs of the cluster.
@ -186,13 +226,13 @@ func clusterMustProgress(t *testing.T, membs []*member) {
for i, m := range membs {
u := m.URL()
cc := mustNewHTTPClient(t, []string{u})
kapi := client.NewKeysAPI(cc)
ctx, cancel := context.WithTimeout(context.Background(), requestTimeout)
if _, err := kapi.Watch(key, resp.Node.ModifiedIndex).Next(ctx); err != nil {
mcc := mustNewHTTPClient(t, []string{u})
mkapi := client.NewKeysAPI(mcc)
mctx, mcancel := context.WithTimeout(context.Background(), requestTimeout)
if _, err := mkapi.Watch(key, resp.Node.ModifiedIndex).Next(mctx); err != nil {
t.Fatalf("#%d: watch on %s error: %v", i, u, err)
}
cancel()
mcancel()
}
}
@ -547,6 +587,24 @@ func (m *member) Launch() error {
return nil
}
func (m *member) WaitOK(t *testing.T) {
cc := mustNewHTTPClient(t, []string{m.URL()})
kapi := client.NewKeysAPI(cc)
for {
ctx, cancel := context.WithTimeout(context.Background(), requestTimeout)
_, err := kapi.Get(ctx, "/")
if err != nil {
time.Sleep(tickDuration)
continue
}
cancel()
break
}
for m.s.Leader() == 0 {
time.Sleep(tickDuration)
}
}
func (m *member) URL() string { return m.ClientURLs[0].String() }
func (m *member) Pause() {
@ -605,7 +663,7 @@ func mustNewHTTPClient(t *testing.T, eps []string) client.HTTPClient {
}
func mustNewTransport(t *testing.T) *http.Transport {
tr, err := transport.NewTimeoutTransport(transport.TLSInfo{}, rafthttp.ConnReadTimeout, rafthttp.ConnWriteTimeout)
tr, err := transport.NewTimeoutTransport(transport.TLSInfo{}, rafthttp.DialTimeout, rafthttp.ConnReadTimeout, rafthttp.ConnWriteTimeout)
if err != nil {
t.Fatal(err)
}

@ -15,9 +15,14 @@
package integration
import (
"fmt"
"io/ioutil"
"os"
"reflect"
"testing"
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/net/context"
"github.com/coreos/etcd/client"
)
func TestPauseMember(t *testing.T) {
@ -74,3 +79,44 @@ func TestLaunchDuplicateMemberShouldFail(t *testing.T) {
t.Errorf("unexpected successful launch")
}
}
func TestSnapshotAndRestartMember(t *testing.T) {
defer afterTest(t)
m := mustNewMember(t, "snapAndRestartTest")
m.SnapCount = 100
m.Launch()
defer m.Terminate(t)
m.WaitOK(t)
resps := make([]*client.Response, 120)
var err error
for i := 0; i < 120; i++ {
cc := mustNewHTTPClient(t, []string{m.URL()})
kapi := client.NewKeysAPI(cc)
ctx, cancel := context.WithTimeout(context.Background(), requestTimeout)
key := fmt.Sprintf("foo%d", i)
resps[i], err = kapi.Create(ctx, "/"+key, "bar", -1)
if err != nil {
t.Fatalf("#%d: create on %s error: %v", i, m.URL(), err)
}
cancel()
}
m.Stop(t)
m.Restart(t)
for i := 0; i < 120; i++ {
cc := mustNewHTTPClient(t, []string{m.URL()})
kapi := client.NewKeysAPI(cc)
ctx, cancel := context.WithTimeout(context.Background(), requestTimeout)
key := fmt.Sprintf("foo%d", i)
resp, err := kapi.Get(ctx, "/"+key)
if err != nil {
t.Fatalf("#%d: get on %s error: %v", i, m.URL(), err)
}
cancel()
if !reflect.DeepEqual(resp.Node, resps[i].Node) {
t.Errorf("#%d: node = %v, want %v", i, resp.Node, resps[i].Node)
}
}
}

@ -327,21 +327,21 @@ func TestV2Delete(t *testing.T) {
v := url.Values{}
v.Set("value", "XXX")
resp, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo"), v)
r, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
resp, err = tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/emptydir?dir=true"), v)
r.Body.Close()
r, err = tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/emptydir?dir=true"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
resp, err = tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foodir/bar?dir=true"), v)
r.Body.Close()
r, err = tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foodir/bar?dir=true"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
r.Body.Close()
tests := []struct {
relativeURL string
@ -423,17 +423,17 @@ func TestV2CAD(t *testing.T) {
v := url.Values{}
v.Set("value", "XXX")
resp, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo"), v)
r, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
r.Body.Close()
resp, err = tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foovalue"), v)
r, err = tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foovalue"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
r.Body.Close()
tests := []struct {
relativeURL string
@ -582,11 +582,11 @@ func TestV2Get(t *testing.T) {
v := url.Values{}
v.Set("value", "XXX")
resp, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo/bar/zar"), v)
r, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo/bar/zar"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
r.Body.Close()
tests := []struct {
relativeURL string
@ -676,11 +676,11 @@ func TestV2QuorumGet(t *testing.T) {
v := url.Values{}
v.Set("value", "XXX")
resp, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo/bar/zar?quorum=true"), v)
r, err := tc.PutForm(fmt.Sprintf("%s%s", u, "/v2/keys/foo/bar/zar?quorum=true"), v)
if err != nil {
t.Error(err)
}
resp.Body.Close()
r.Body.Close()
tests := []struct {
relativeURL string

@ -23,9 +23,7 @@
package main
import (
"github.com/coreos/etcd/etcdmain"
)
import "github.com/coreos/etcd/etcdmain"
func main() {
etcdmain.Main()

@ -103,7 +103,7 @@ func Migrate4To2(dataDir string, name string) error {
st2 := cfg4.HardState2()
// If we've got the most recent snapshot, we can use its committed index. Still likely less than the current actual index, but worth it for the replay.
if snap2 != nil {
if snap2 != nil && st2.Commit < snap2.Metadata.Index {
st2.Commit = snap2.Metadata.Index
}
@ -175,8 +175,8 @@ func GuessNodeID(nodes map[string]uint64, snap4 *Snapshot4, cfg *Config4, name s
delete(snapNodes, p.Name)
}
if len(snapNodes) == 1 {
for name, id := range nodes {
log.Printf("Autodetected from snapshot: name %s", name)
for nodename, id := range nodes {
log.Printf("Autodetected from snapshot: name %s", nodename)
return id
}
}
@ -186,8 +186,8 @@ func GuessNodeID(nodes map[string]uint64, snap4 *Snapshot4, cfg *Config4, name s
delete(nodes, p.Name)
}
if len(nodes) == 1 {
for name, id := range nodes {
log.Printf("Autodetected name %s", name)
for nodename, id := range nodes {
log.Printf("Autodetected name %s", nodename)
return id
}
}

@ -43,7 +43,7 @@ type Snapshot4 struct {
} `json:"peers"`
}
type sstore struct {
type Store4 struct {
Root *node
CurrentIndex uint64
CurrentVersion int
@ -63,6 +63,24 @@ type node struct {
Children map[string]*node // for directory
}
func deepCopyNode(n *node, parent *node) *node {
out := &node{
Path: n.Path,
CreatedIndex: n.CreatedIndex,
ModifiedIndex: n.ModifiedIndex,
Parent: parent,
ExpireTime: n.ExpireTime,
ACL: n.ACL,
Value: n.Value,
Children: make(map[string]*node),
}
for k, v := range n.Children {
out.Children[k] = deepCopyNode(v, out)
}
return out
}
func replacePathNames(n *node, s1, s2 string) {
n.Path = path.Clean(strings.Replace(n.Path, s1, s2, 1))
for _, c := range n.Children {
@ -87,9 +105,23 @@ func pullNodesFromEtcd(n *node) map[string]uint64 {
return out
}
func fixEtcd(n *node) {
n.Path = "/0"
machines := n.Children["machines"]
func fixEtcd(etcdref *node) *node {
n := &node{
Path: "/0",
CreatedIndex: etcdref.CreatedIndex,
ModifiedIndex: etcdref.ModifiedIndex,
ExpireTime: etcdref.ExpireTime,
ACL: etcdref.ACL,
Children: make(map[string]*node),
}
var machines *node
if machineOrig, ok := etcdref.Children["machines"]; ok {
machines = deepCopyNode(machineOrig, n)
}
if machines == nil {
return n
}
n.Children["members"] = &node{
Path: "/0/members",
CreatedIndex: machines.CreatedIndex,
@ -97,6 +129,7 @@ func fixEtcd(n *node) {
ExpireTime: machines.ExpireTime,
ACL: machines.ACL,
Children: make(map[string]*node),
Parent: n,
}
for name, c := range machines.Children {
q, err := url.ParseQuery(c.Value)
@ -121,29 +154,32 @@ func fixEtcd(n *node) {
ModifiedIndex: c.ModifiedIndex,
ExpireTime: c.ExpireTime,
ACL: c.ACL,
Children: map[string]*node{
"attributes": &node{
Path: path.Join("/0/members", m.ID.String(), "attributes"),
CreatedIndex: c.CreatedIndex,
ModifiedIndex: c.ModifiedIndex,
ExpireTime: c.ExpireTime,
ACL: c.ACL,
Value: string(attrBytes),
},
"raftAttributes": &node{
Path: path.Join("/0/members", m.ID.String(), "raftAttributes"),
CreatedIndex: c.CreatedIndex,
ModifiedIndex: c.ModifiedIndex,
ExpireTime: c.ExpireTime,
ACL: c.ACL,
Value: string(raftBytes),
},
},
Children: make(map[string]*node),
Parent: n.Children["members"],
}
attrs := &node{
Path: path.Join("/0/members", m.ID.String(), "attributes"),
CreatedIndex: c.CreatedIndex,
ModifiedIndex: c.ModifiedIndex,
ExpireTime: c.ExpireTime,
ACL: c.ACL,
Value: string(attrBytes),
Parent: newNode,
}
newNode.Children["attributes"] = attrs
raftAttrs := &node{
Path: path.Join("/0/members", m.ID.String(), "raftAttributes"),
CreatedIndex: c.CreatedIndex,
ModifiedIndex: c.ModifiedIndex,
ExpireTime: c.ExpireTime,
ACL: c.ACL,
Value: string(raftBytes),
Parent: newNode,
}
newNode.Children["raftAttributes"] = raftAttrs
n.Children["members"].Children[m.ID.String()] = newNode
}
delete(n.Children, "machines")
return n
}
func mangleRoot(n *node) *node {
@ -157,15 +193,15 @@ func mangleRoot(n *node) *node {
}
newRoot.Children["1"] = n
etcd := n.Children["_etcd"]
delete(n.Children, "_etcd")
replacePathNames(n, "/", "/1/")
fixEtcd(etcd)
newRoot.Children["0"] = etcd
newZero := fixEtcd(etcd)
newZero.Parent = newRoot
newRoot.Children["0"] = newZero
return newRoot
}
func (s *Snapshot4) GetNodesFromStore() map[string]uint64 {
st := &sstore{}
st := &Store4{}
if err := json.Unmarshal(s.State, st); err != nil {
log.Fatal("Couldn't unmarshal snapshot")
}
@ -174,7 +210,7 @@ func (s *Snapshot4) GetNodesFromStore() map[string]uint64 {
}
func (s *Snapshot4) Snapshot2() *raftpb.Snapshot {
st := &sstore{}
st := &Store4{}
if err := json.Unmarshal(s.State, st); err != nil {
log.Fatal("Couldn't unmarshal snapshot")
}
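The `deepCopyNode` helper introduced in this file copies a subtree while pointing every copied child's `Parent` at the new copy, which lets `fixEtcd` build a fresh tree instead of mutating the snapshot's nodes. A minimal sketch of the same idea on a trimmed-down type follows; this `node` has fewer fields than the real one and `deepCopy` is a hypothetical name.

```go
package main

import "fmt"

// node is a trimmed-down, hypothetical version of the migrate package's node.
type node struct {
	Path     string
	Value    string
	Parent   *node
	Children map[string]*node
}

// deepCopy returns a copy of n whose Parent is the given parent and whose
// children are copied recursively with the new node as their parent.
func deepCopy(n, parent *node) *node {
	out := &node{
		Path:     n.Path,
		Value:    n.Value,
		Parent:   parent,
		Children: make(map[string]*node),
	}
	for k, c := range n.Children {
		out.Children[k] = deepCopy(c, out)
	}
	return out
}

func main() {
	root := &node{Path: "/", Children: map[string]*node{}}
	root.Children["machines"] = &node{Path: "/machines", Parent: root}

	cp := deepCopy(root, nil)
	cp.Children["machines"].Path = "/members"

	fmt.Println(root.Children["machines"].Path)        // the original is untouched
	fmt.Println(cp.Children["machines"].Parent == cp) // the copy's parent links stay inside the copy
}
```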

migrate/standby.go

@ -0,0 +1,70 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package migrate
import (
"bytes"
"encoding/json"
"fmt"
"os"
)
type StandbyInfo4 struct {
Running bool
Cluster []*MachineMessage
SyncInterval float64
}
// MachineMessage represents information about a peer or standby in the registry.
type MachineMessage struct {
Name string `json:"name"`
State string `json:"state"`
ClientURL string `json:"clientURL"`
PeerURL string `json:"peerURL"`
}
func (si *StandbyInfo4) ClientURLs() []string {
var urls []string
for _, m := range si.Cluster {
urls = append(urls, m.ClientURL)
}
return urls
}
func (si *StandbyInfo4) InitialCluster() string {
b := &bytes.Buffer{}
first := true
for _, m := range si.Cluster {
if !first {
fmt.Fprintf(b, ",")
}
first = false
fmt.Fprintf(b, "%s=%s", m.Name, m.PeerURL)
}
return b.String()
}
func DecodeStandbyInfo4FromFile(path string) (*StandbyInfo4, error) {
var info StandbyInfo4
file, err := os.OpenFile(path, os.O_RDONLY, 0600)
if err != nil {
return nil, err
}
defer file.Close()
if err = json.NewDecoder(file).Decode(&info); err != nil {
return nil, err
}
return &info, nil
}
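`InitialCluster` above builds a comma-separated list of `name=peerURL` pairs with a buffer and a `first` flag. An equivalent formulation with `strings.Join` is sketched below for comparison; `machine` and `initialCluster` are hypothetical names, and the URLs are example values only.

```go
package main

import (
	"fmt"
	"strings"
)

// machine is a hypothetical stand-in for the two MachineMessage fields used here.
type machine struct {
	Name    string
	PeerURL string
}

// initialCluster joins name=peerURL pairs with commas, with no trailing comma.
func initialCluster(ms []machine) string {
	pairs := make([]string, 0, len(ms))
	for _, m := range ms {
		pairs = append(pairs, m.Name+"="+m.PeerURL)
	}
	return strings.Join(pairs, ",")
}

func main() {
	ms := []machine{
		{Name: "node1", PeerURL: "http://10.0.0.1:7001"},
		{Name: "node2", PeerURL: "http://10.0.0.2:7001"},
	}
	fmt.Println(initialCluster(ms))
}
```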

pkg/coreos/coreos.go

@ -0,0 +1,27 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package coreos
import (
"io/ioutil"
"strings"
)
func IsCoreOS() bool {
b, err := ioutil.ReadFile("/usr/lib/os-release")
if err != nil {
return false
}
return strings.Contains(string(b), "ID=coreos")
}

@ -0,0 +1,90 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package fileutil
import (
"errors"
"os"
"syscall"
"time"
)
var (
ErrLocked = errors.New("file already locked")
)
type Lock interface {
Name() string
TryLock() error
Lock() error
Unlock() error
Destroy() error
}
type lock struct {
fname string
file *os.File
}
func (l *lock) Name() string {
return l.fname
}
// TryLock acquires exclusivity on the lock without blocking
func (l *lock) TryLock() error {
err := os.Chmod(l.fname, syscall.DMEXCL|0600)
if err != nil {
return err
}
f, err := os.Open(l.fname)
if err != nil {
return ErrLocked
}
l.file = f
return nil
}
// Lock acquires exclusivity on the lock with blocking
func (l *lock) Lock() error {
err := os.Chmod(l.fname, syscall.DMEXCL|0600)
if err != nil {
return err
}
for {
f, err := os.Open(l.fname)
if err == nil {
l.file = f
return nil
}
time.Sleep(10 * time.Millisecond)
}
}
// Unlock unlocks the lock
func (l *lock) Unlock() error {
return l.file.Close()
}
func (l *lock) Destroy() error {
return nil
}
func NewLock(file string) (Lock, error) {
l := &lock{fname: file}
return l, nil
}

@ -0,0 +1,98 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// +build solaris
package fileutil
import (
"errors"
"os"
"syscall"
)
var (
ErrLocked = errors.New("file already locked")
)
type Lock interface {
Name() string
TryLock() error
Lock() error
Unlock() error
Destroy() error
}
type lock struct {
fd int
file *os.File
}
func (l *lock) Name() string {
return l.file.Name()
}
// TryLock acquires exclusivity on the lock without blocking
func (l *lock) TryLock() error {
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Pid = 0
lock.Type = syscall.F_WRLCK
lock.Whence = 0
err := syscall.FcntlFlock(uintptr(l.fd), syscall.F_SETLK, &lock)
if err != nil && err == syscall.EAGAIN {
return ErrLocked
}
return err
}
// Lock acquires exclusivity on the lock without blocking
func (l *lock) Lock() error {
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Type = syscall.F_WRLCK
lock.Whence = 0
lock.Pid = 0
return syscall.FcntlFlock(uintptr(l.fd), syscall.F_SETLK, &lock)
}
// Unlock unlocks the lock
func (l *lock) Unlock() error {
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Type = syscall.F_UNLCK
lock.Whence = 0
err := syscall.FcntlFlock(uintptr(l.fd), syscall.F_SETLK, &lock)
if err != nil && err == syscall.EAGAIN {
return ErrLocked
}
return err
}
func (l *lock) Destroy() error {
return l.file.Close()
}
func NewLock(file string) (Lock, error) {
f, err := os.OpenFile(file, os.O_WRONLY, 0600)
if err != nil {
return nil, err
}
l := &lock{int(f.Fd()), f}
return l, nil
}

@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
// +build !windows,!plan9
// +build !windows,!plan9,!solaris
package fileutil

@ -85,6 +85,21 @@ func SetFlagsFromEnv(fs *flag.FlagSet) error {
return err
}
// SetBindAddrFromAddr sets the value of bindAddr flag from the value
// of addr flag. Both flags' Value must be of type IPAddressPort. If the
// bindAddr flag is unset and the addr flag is set, it will set bindAddr to
// [::]:port of addr. Otherwise, it keeps the original values.
func SetBindAddrFromAddr(fs *flag.FlagSet, bindAddrFlagName, addrFlagName string) {
if IsSet(fs, bindAddrFlagName) || !IsSet(fs, addrFlagName) {
return
}
addr := *fs.Lookup(addrFlagName).Value.(*IPAddressPort)
addr.IP = "::"
if err := fs.Set(bindAddrFlagName, addr.String()); err != nil {
log.Panicf("etcdmain: unexpected flags set error: %v", err)
}
}
// URLsFromFlags decides what URLs should be using two different flags
// as datasources. The first flag's Value must be of type URLs, while
// the second must be of type IPAddressPort. If both of these flags
@ -119,3 +134,13 @@ func URLsFromFlags(fs *flag.FlagSet, urlsFlagName string, addrFlagName string, t
return []url.URL(*fs.Lookup(urlsFlagName).Value.(*URLsValue)), nil
}
func IsSet(fs *flag.FlagSet, name string) bool {
set := false
fs.Visit(func(f *flag.Flag) {
if f.Name == name {
set = true
}
})
return set
}

@ -81,6 +81,54 @@ func TestSetFlagsFromEnvBad(t *testing.T) {
}
}
func TestSetBindAddrFromAddr(t *testing.T) {
tests := []struct {
args []string
waddr *IPAddressPort
}{
// no flags set
{
args: []string{},
waddr: &IPAddressPort{},
},
// addr flag set
{
args: []string{"-addr=192.0.3.17:4001"},
waddr: &IPAddressPort{IP: "::", Port: 4001},
},
// bindAddr flag set
{
args: []string{"-bind-addr=127.0.0.1:4001"},
waddr: &IPAddressPort{IP: "127.0.0.1", Port: 4001},
},
// both addr flags set
{
args: []string{"-bind-addr=127.0.0.1:4001", "-addr=192.0.3.17:4001"},
waddr: &IPAddressPort{IP: "127.0.0.1", Port: 4001},
},
// both addr flags set, IPv6
{
args: []string{"-bind-addr=[2001:db8::4:9]:4001", "-addr=[2001:db8::4:f0]:4001"},
waddr: &IPAddressPort{IP: "2001:db8::4:9", Port: 4001},
},
}
for i, tt := range tests {
fs := flag.NewFlagSet("test", flag.PanicOnError)
fs.Var(&IPAddressPort{}, "addr", "")
bindAddr := &IPAddressPort{}
fs.Var(bindAddr, "bind-addr", "")
if err := fs.Parse(tt.args); err != nil {
t.Errorf("#%d: failed to parse flags: %v", i, err)
continue
}
SetBindAddrFromAddr(fs, "bind-addr", "addr")
if !reflect.DeepEqual(bindAddr, tt.waddr) {
t.Errorf("#%d: bindAddr = %+v, want %+v", i, bindAddr, tt.waddr)
}
}
}
func TestURLsFromFlags(t *testing.T) {
tests := []struct {
args []string

View File

@ -16,7 +16,6 @@ package flags
import (
"errors"
"fmt"
"net"
"strconv"
"strings"
@ -32,26 +31,26 @@ type IPAddressPort struct {
func (a *IPAddressPort) Set(arg string) error {
arg = strings.TrimSpace(arg)
parts := strings.SplitN(arg, ":", 2)
if len(parts) != 2 {
return errors.New("bad format in address specification")
host, portStr, err := net.SplitHostPort(arg)
if err != nil {
return err
}
if net.ParseIP(parts[0]) == nil {
if net.ParseIP(host) == nil {
return errors.New("bad IP in address specification")
}
port, err := strconv.Atoi(parts[1])
port, err := strconv.Atoi(portStr)
if err != nil {
return errors.New("bad port in address specification")
}
a.IP = parts[0]
a.IP = host
a.Port = port
return nil
}
func (a *IPAddressPort) String() string {
return fmt.Sprintf("%s:%d", a.IP, a.Port)
return net.JoinHostPort(a.IP, strconv.Itoa(a.Port))
}
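
Reviewer note: the switch from strings.SplitN and fmt.Sprintf to net.SplitHostPort and net.JoinHostPort is what makes bracketed IPv6 literals round-trip. The sketch below restates the parsing and formatting as free functions; parseIPPort and formatIPPort are hypothetical names, not identifiers from this change.

```go
package main

import (
	"errors"
	"net"
	"strconv"
)

// parseIPPort splits "host:port" or "[v6host]:port" and validates both parts.
func parseIPPort(arg string) (string, int, error) {
	host, portStr, err := net.SplitHostPort(arg)
	if err != nil {
		return "", 0, err
	}
	if net.ParseIP(host) == nil {
		return "", 0, errors.New("bad IP in address specification")
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, errors.New("bad port in address specification")
	}
	return host, port, nil
}

// formatIPPort joins the parts again; JoinHostPort adds brackets when the
// host contains a colon, so IPv6 addresses are formatted unambiguously.
func formatIPPort(ip string, port int) string {
	return net.JoinHostPort(ip, strconv.Itoa(port))
}
```

An unbracketed IPv6 literal such as 2001:db8::1 is rejected by SplitHostPort, which matches the new entries in the fail list of TestIPAddressPortSet.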

View File

@ -22,6 +22,7 @@ func TestIPAddressPortSet(t *testing.T) {
pass := []string{
"1.2.3.4:8080",
"10.1.1.1:80",
"[2001:db8::1]:8080",
}
fail := []string{
@ -40,6 +41,8 @@ func TestIPAddressPortSet(t *testing.T) {
"234#$",
"file://foo/bar",
"http://hello",
"2001:db8::1",
"2001:db8::1:1",
}
for i, tt := range pass {
@ -58,14 +61,20 @@ func TestIPAddressPortSet(t *testing.T) {
}
func TestIPAddressPortString(t *testing.T) {
f := &IPAddressPort{}
if err := f.Set("127.0.0.1:4001"); err != nil {
t.Fatalf("unexpected error: %v", err)
addresses := []string{
"[2001:db8::1:1234]:4001",
"127.0.0.1:4001",
}
for i, tt := range addresses {
f := &IPAddressPort{}
if err := f.Set(tt); err != nil {
t.Errorf("#%d: unexpected error: %v", i, err)
}
want := "127.0.0.1:4001"
got := f.String()
if want != got {
t.Fatalf("IPAddressPort.String() value should be %q, got %q", want, got)
want := tt
got := f.String()
if want != got {
t.Errorf("#%d: IPAddressPort.String() value should be %q, got %q", i, want, got)
}
}
}

View File

@ -0,0 +1,81 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// +build !windows,!plan9
// InterruptHandler is a function that is called on receiving a
// SIGTERM or SIGINT signal.
package osutil
import (
"log"
"os"
"os/signal"
"sync"
"syscall"
)
type InterruptHandler func()
var (
interruptRegisterMu, interruptExitMu sync.Mutex
// interruptHandlers holds all registered InterruptHandlers in the order
// in which they will be executed.
interruptHandlers = []InterruptHandler{}
)
// RegisterInterruptHandler registers a new InterruptHandler. Handlers registered
// after interrupt handling was initiated will not be executed.
func RegisterInterruptHandler(h InterruptHandler) {
interruptRegisterMu.Lock()
defer interruptRegisterMu.Unlock()
interruptHandlers = append(interruptHandlers, h)
}
// HandleInterrupts calls the handler functions on receiving a SIGINT or SIGTERM.
func HandleInterrupts() {
notifier := make(chan os.Signal, 1)
signal.Notify(notifier, syscall.SIGINT, syscall.SIGTERM)
go func() {
sig := <-notifier
interruptRegisterMu.Lock()
ihs := make([]InterruptHandler, len(interruptHandlers))
copy(ihs, interruptHandlers)
interruptRegisterMu.Unlock()
interruptExitMu.Lock()
log.Printf("received %v signal, shutting down...", sig)
for _, h := range ihs {
h()
}
signal.Stop(notifier)
pid := syscall.Getpid()
// exit directly if it is the "init" process, since the kernel will not help to kill pid 1.
if pid == 1 {
os.Exit(0)
}
syscall.Kill(pid, sig.(syscall.Signal))
}()
}
// Exit relays to os.Exit if no interrupt handlers are running, blocks otherwise.
func Exit(code int) {
interruptExitMu.Lock()
os.Exit(code)
}
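
Reviewer note: HandleInterrupts copies the handler slice under interruptRegisterMu and runs the copy after releasing the lock, so handlers registered later are not executed. The sketch below isolates that snapshot-then-run pattern without any signal handling; handlerRegistry and its methods are hypothetical names used only for illustration.

```go
package main

import "sync"

// handlerRegistry is a signal-free stand-in for the package-level
// handler list and its mutex.
type handlerRegistry struct {
	mu       sync.Mutex
	handlers []func()
}

// register appends a handler; handlers run in registration order.
func (r *handlerRegistry) register(h func()) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.handlers = append(r.handlers, h)
}

// runAll snapshots the handlers under the lock, releases the lock, and
// then runs the snapshot. Handlers registered after the snapshot was
// taken are not part of this run.
func (r *handlerRegistry) runAll() {
	r.mu.Lock()
	snapshot := make([]func(), len(r.handlers))
	copy(snapshot, r.handlers)
	r.mu.Unlock()
	for _, h := range snapshot {
		h()
	}
}
```

Because the lock is not held while handlers run, a handler that itself calls register does not deadlock; its registration simply arrives too late for the current run.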

View File

@ -0,0 +1,32 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// +build windows
package osutil
import "os"
type InterruptHandler func()
// RegisterInterruptHandler is a no-op on windows
func RegisterInterruptHandler(h InterruptHandler) {}
// HandleInterrupts is a no-op on windows
func HandleInterrupts() {}
// Exit calls os.Exit
func Exit(code int) {
os.Exit(code)
}

35
pkg/osutil/osutil.go Normal file
View File

@ -0,0 +1,35 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package osutil
import (
"os"
"strings"
)
func Unsetenv(key string) error {
envs := os.Environ()
os.Clearenv()
for _, e := range envs {
strs := strings.SplitN(e, "=", 2)
if strs[0] == key {
continue
}
if err := os.Setenv(strs[0], strs[1]); err != nil {
return err
}
}
return nil
}

88
pkg/osutil/osutil_test.go Normal file
View File

@ -0,0 +1,88 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package osutil
import (
"os"
"os/signal"
"reflect"
"syscall"
"testing"
"time"
)
func TestUnsetenv(t *testing.T) {
tests := []string{
"data",
"space data",
"equal=data",
}
for i, tt := range tests {
key := "ETCD_UNSETENV_TEST"
if os.Getenv(key) != "" {
t.Fatalf("#%d: cannot get empty %s", i, key)
}
env := os.Environ()
if err := os.Setenv(key, tt); err != nil {
t.Fatalf("#%d: cannot set %s: %v", i, key, err)
}
if err := Unsetenv(key); err != nil {
t.Errorf("#%d: unsetenv %s error: %v", i, key, err)
}
if g := os.Environ(); !reflect.DeepEqual(g, env) {
t.Errorf("#%d: env = %+v, want %+v", i, g, env)
}
}
}
func waitSig(t *testing.T, c <-chan os.Signal, sig os.Signal) {
select {
case s := <-c:
if s != sig {
t.Fatalf("signal was %v, want %v", s, sig)
}
case <-time.After(1 * time.Second):
t.Fatalf("timeout waiting for %v", sig)
}
}
func TestHandleInterrupts(t *testing.T) {
for _, sig := range []syscall.Signal{syscall.SIGINT, syscall.SIGTERM} {
n := 1
RegisterInterruptHandler(func() { n++ })
RegisterInterruptHandler(func() { n *= 2 })
c := make(chan os.Signal, 2)
signal.Notify(c, sig)
HandleInterrupts()
syscall.Kill(syscall.Getpid(), sig)
// we should receive the signal once from our own kill and
// a second time from HandleInterrupts
waitSig(t, c, sig)
waitSig(t, c, sig)
if n == 3 {
t.Fatalf("interrupt handlers were called in wrong order")
}
if n != 4 {
t.Fatalf("interrupt handlers were not called properly")
}
// reset interrupt handlers
interruptHandlers = interruptHandlers[:0]
interruptExitMu.Unlock()
}
}

View File

@ -15,6 +15,7 @@
package transport
import (
"crypto/tls"
"net"
"time"
)
@ -22,18 +23,26 @@ import (
// NewKeepAliveListener returns a listener that listens on the given address.
// http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
func NewKeepAliveListener(addr string, scheme string, info TLSInfo) (net.Listener, error) {
ln, err := NewListener(addr, scheme, info)
l, err := net.Listen("tcp", addr)
if err != nil {
return nil, err
}
if !info.Empty() && scheme == "https" {
cfg, err := info.ServerConfig()
if err != nil {
return nil, err
}
return newTLSKeepaliveListener(l, cfg), nil
}
return &keepaliveListener{
Listener: ln,
Listener: l,
}, nil
}
type keepaliveListener struct {
net.Listener
}
type keepaliveListener struct{ net.Listener }
func (kln *keepaliveListener) Accept() (net.Conn, error) {
c, err := kln.Listener.Accept()
@ -48,3 +57,37 @@ func (kln *keepaliveListener) Accept() (net.Conn, error) {
tcpc.SetKeepAlivePeriod(30 * time.Second)
return tcpc, nil
}
// A tlsKeepaliveListener implements a network listener (net.Listener) for TLS connections.
type tlsKeepaliveListener struct {
net.Listener
config *tls.Config
}
// Accept waits for and returns the next incoming TLS connection.
// The returned connection c is a *tls.Conn.
func (l *tlsKeepaliveListener) Accept() (c net.Conn, err error) {
c, err = l.Listener.Accept()
if err != nil {
return
}
tcpc := c.(*net.TCPConn)
// detection time: tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
// default on linux: 30 + 8 * 30
// default on osx: 30 + 8 * 75
tcpc.SetKeepAlive(true)
tcpc.SetKeepAlivePeriod(30 * time.Second)
c = tls.Server(c, l.config)
return
}
// newTLSKeepaliveListener creates a Listener which accepts connections from an inner
// Listener, enables TCP keepalive on them, and wraps each connection with tls.Server.
// The configuration config must be non-nil and must have
// at least one certificate.
func newTLSKeepaliveListener(inner net.Listener, config *tls.Config) net.Listener {
l := &tlsKeepaliveListener{}
l.Listener = inner
l.config = config
return l
}

View File

@ -15,7 +15,9 @@
package transport
import (
"crypto/tls"
"net/http"
"os"
"testing"
)
@ -34,4 +36,29 @@ func TestNewKeepAliveListener(t *testing.T) {
t.Fatalf("unexpected Accept error: %v", err)
}
conn.Close()
ln.Close()
// tls
tmp, err := createTempFile([]byte("XXX"))
if err != nil {
t.Fatalf("unable to create tmpfile: %v", err)
}
defer os.Remove(tmp)
tlsInfo := TLSInfo{CertFile: tmp, KeyFile: tmp}
tlsInfo.parseFunc = fakeCertificateParserFunc(tls.Certificate{}, nil)
tlsln, err := NewKeepAliveListener("127.0.0.1:0", "https", tlsInfo)
if err != nil {
t.Fatalf("unexpected NewKeepAliveListener error: %v", err)
}
go http.Get("https://" + tlsln.Addr().String())
conn, err = tlsln.Accept()
if err != nil {
t.Fatalf("unexpected Accept error: %v", err)
}
if _, ok := conn.(*tls.Conn); !ok {
t.Errorf("failed to accept *tls.Conn")
}
conn.Close()
tlsln.Close()
}

View File

@ -31,7 +31,10 @@ func NewListener(addr string, scheme string, info TLSInfo) (net.Listener, error)
return nil, err
}
if !info.Empty() && scheme == "https" {
if scheme == "https" {
if info.Empty() {
return nil, fmt.Errorf("cannot listen on TLS for %s: KeyFile and CertFile are not presented", scheme+"://"+addr)
}
cfg, err := info.ServerConfig()
if err != nil {
return nil, err

View File

@ -70,6 +70,13 @@ func TestNewListenerTLSInfo(t *testing.T) {
}
}
func TestNewListenerTLSEmptyInfo(t *testing.T) {
_, err := NewListener("127.0.0.1:0", "https", TLSInfo{})
if err == nil {
t.Errorf("err = nil, want not presented error")
}
}
func TestNewListenerTLSInfoNonexist(t *testing.T) {
tlsInfo := TLSInfo{CertFile: "@badname", KeyFile: "@badname"}
_, err := NewListener("127.0.0.1:0", "https", tlsInfo)

View File

@ -23,14 +23,17 @@ import (
// NewTimeoutTransport returns a transport created using the given TLS info.
// If read/write on the created connection blocks longer than its time limit,
// it will return timeout error.
func NewTimeoutTransport(info TLSInfo, rdtimeoutd, wtimeoutd time.Duration) (*http.Transport, error) {
func NewTimeoutTransport(info TLSInfo, dialtimeoutd, rdtimeoutd, wtimeoutd time.Duration) (*http.Transport, error) {
tr, err := NewTransport(info)
if err != nil {
return nil, err
}
// A connection with read/write timeouts will time out soon after it becomes idle,
// so it should not be put back into the http transport as an idle connection for reuse.
tr.MaxIdleConnsPerHost = -1
tr.Dial = (&rwTimeoutDialer{
Dialer: net.Dialer{
Timeout: 30 * time.Second,
Timeout: dialtimeoutd,
KeepAlive: 30 * time.Second,
},
rdtimeoutd: rdtimeoutd,

View File

@ -15,6 +15,8 @@
package transport
import (
"bytes"
"io/ioutil"
"net/http"
"net/http/httptest"
"testing"
@ -24,11 +26,16 @@ import (
// TestNewTimeoutTransport tests that NewTimeoutTransport returns a transport
// that can dial out timeout connections.
func TestNewTimeoutTransport(t *testing.T) {
tr, err := NewTimeoutTransport(TLSInfo{}, time.Hour, time.Hour)
tr, err := NewTimeoutTransport(TLSInfo{}, time.Hour, time.Hour, time.Hour)
if err != nil {
t.Fatalf("unexpected NewTimeoutTransport error: %v", err)
}
srv := httptest.NewServer(http.NotFoundHandler())
remoteAddr := func(w http.ResponseWriter, r *http.Request) {
w.Write([]byte(r.RemoteAddr))
}
srv := httptest.NewServer(http.HandlerFunc(remoteAddr))
defer srv.Close()
conn, err := tr.Dial("tcp", srv.Listener.Addr().String())
if err != nil {
@ -46,4 +53,33 @@ func TestNewTimeoutTransport(t *testing.T) {
if tconn.wtimeoutd != time.Hour {
t.Errorf("write timeout = %s, want %s", tconn.wtimeoutd, time.Hour)
}
// ensure that a timeout connection is not reused
req, err := http.NewRequest("GET", srv.URL, nil)
if err != nil {
t.Fatalf("unexpected err %v", err)
}
resp, err := tr.RoundTrip(req)
if err != nil {
t.Fatalf("unexpected err %v", err)
}
addr0, err := ioutil.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
t.Fatalf("unexpected err %v", err)
}
resp, err = tr.RoundTrip(req)
if err != nil {
t.Fatalf("unexpected err %v", err)
}
addr1, err := ioutil.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
t.Fatalf("unexpected err %v", err)
}
if bytes.Equal(addr0, addr1) {
t.Errorf("addr0 = %s addr1= %s, want not equal", string(addr0), string(addr1))
}
}

62
pkg/wait/wait_time.go Normal file
View File

@ -0,0 +1,62 @@
/*
Copyright 2015 CoreOS, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package wait
import (
"sync"
"time"
)
type WaitTime interface {
// Wait returns a chan that waits on the given deadline.
// The chan will be triggered when Trigger is called with a
// deadline that is later than the one it is waiting for.
// The given deadline MUST be unique. The deadline should be
// retrieved by calling time.Now() in most cases.
Wait(deadline time.Time) <-chan struct{}
// Trigger triggers all the waiting chans with an earlier deadline.
Trigger(deadline time.Time)
}
type timeList struct {
l sync.Mutex
m map[int64]chan struct{}
}
func NewTimeList() *timeList {
return &timeList{m: make(map[int64]chan struct{})}
}
func (tl *timeList) Wait(deadline time.Time) <-chan struct{} {
tl.l.Lock()
defer tl.l.Unlock()
ch := make(chan struct{}, 1)
// The given deadline SHOULD be unique.
tl.m[deadline.UnixNano()] = ch
return ch
}
func (tl *timeList) Trigger(deadline time.Time) {
tl.l.Lock()
defer tl.l.Unlock()
for t, ch := range tl.m {
if t < deadline.UnixNano() {
delete(tl.m, t)
close(ch)
}
}
}
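
Reviewer note: Trigger uses a strict comparison, so a waiter registered with deadline d is released only by a Trigger whose deadline is strictly later than d. The sketch below is a condensed copy of the type with fixed timestamps to make that boundary visible; the lower-case names and the at and fired helpers are hypothetical, and the channels are unbuffered here because they are only ever closed.

```go
package main

import (
	"sync"
	"time"
)

type timeList struct {
	mu sync.Mutex
	m  map[int64]chan struct{}
}

func newTimeList() *timeList {
	return &timeList{m: make(map[int64]chan struct{})}
}

// wait registers a channel under the deadline; deadlines must be unique.
func (tl *timeList) wait(deadline time.Time) <-chan struct{} {
	tl.mu.Lock()
	defer tl.mu.Unlock()
	ch := make(chan struct{})
	tl.m[deadline.UnixNano()] = ch
	return ch
}

// trigger closes every channel whose deadline is strictly earlier.
func (tl *timeList) trigger(deadline time.Time) {
	tl.mu.Lock()
	defer tl.mu.Unlock()
	for t, ch := range tl.m {
		if t < deadline.UnixNano() {
			delete(tl.m, t)
			close(ch)
		}
	}
}

// at builds a deterministic timestamp for the demonstration.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// fired reports, without blocking, whether ch has been closed.
func fired(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}
```

Two waiters that share the same UnixNano value would overwrite each other in the map, which is why the interface comment requires unique deadlines.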

View File

@ -0,0 +1,85 @@
/*
Copyright 2015 CoreOS, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package wait
import (
"testing"
"time"
)
func TestWaitTime(t *testing.T) {
wt := NewTimeList()
ch1 := wt.Wait(time.Now())
t1 := time.Now()
wt.Trigger(t1)
select {
case <-ch1:
case <-time.After(10 * time.Millisecond):
t.Fatalf("cannot receive from ch as expected")
}
ch2 := wt.Wait(time.Now())
t2 := time.Now()
wt.Trigger(t1)
select {
case <-ch2:
t.Fatalf("unexpected to receive from ch")
case <-time.After(10 * time.Millisecond):
}
wt.Trigger(t2)
select {
case <-ch2:
case <-time.After(10 * time.Millisecond):
t.Fatalf("cannot receive from ch as expected")
}
}
func TestWaitTestStress(t *testing.T) {
chs := make([]<-chan struct{}, 0)
wt := NewTimeList()
for i := 0; i < 10000; i++ {
chs = append(chs, wt.Wait(time.Now()))
}
wt.Trigger(time.Now())
for _, ch := range chs {
select {
case <-ch:
case <-time.After(10 * time.Millisecond):
t.Fatalf("cannot receive from ch as expected")
}
}
}
func BenchmarkWaitTime(b *testing.B) {
t := time.Now()
wt := NewTimeList()
for i := 0; i < b.N; i++ {
wt.Wait(t)
}
}
func BenchmarkTriggerAnd10KWaitTime(b *testing.B) {
for i := 0; i < b.N; i++ {
t := time.Now()
wt := NewTimeList()
for j := 0; j < 10000; j++ {
wt.Wait(t)
}
wt.Trigger(time.Now())
}
}

View File

@ -16,6 +16,7 @@ package proxy
import (
"log"
"math/rand"
"net/url"
"sync"
"time"
@ -65,6 +66,13 @@ func (d *director) refresh() {
}
endpoints = append(endpoints, newEndpoint(*uu))
}
// shuffle array to avoid connections being "stuck" to a single endpoint
for i := range endpoints {
j := rand.Intn(i + 1)
endpoints[i], endpoints[j] = endpoints[j], endpoints[i]
}
d.ep = endpoints
}
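
Reviewer note: the loop added to refresh is the "inside-out" form of the Fisher-Yates shuffle: at step i the element is swapped with a uniformly chosen position in [0, i]. A minimal sketch on a string slice follows; shuffled and its seed parameter are hypothetical, introduced so the example is deterministic.

```go
package main

import "math/rand"

// shuffled returns a shuffled copy of in, using the same swap loop as
// the diff. A private, seeded source keeps the result reproducible.
func shuffled(seed int64, in []string) []string {
	r := rand.New(rand.NewSource(seed))
	out := make([]string, len(in))
	copy(out, in)
	for i := range out {
		j := r.Intn(i + 1) // j is uniform over [0, i]
		out[i], out[j] = out[j], out[i]
	}
	return out
}
```

The shuffle only reorders: every endpoint appears exactly once in the result, so the director still tries all endpoints, just starting from a different one after each refresh.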

View File

@ -18,6 +18,17 @@ import (
"net/http"
)
const (
// DefaultMaxIdleConnsPerHost indicates the default maximal idle connections
// maintained between proxy and each member. We set it to 128 to
// let proxy handle 128 concurrent requests in long term smoothly.
// If the number of concurrent requests is bigger than this value,
// proxy needs to create one new connection when handling each request in
// the delta, which is bad because the creation consumes resource and
// may eat up ephemeral ports.
DefaultMaxIdleConnsPerHost = 128
)
// GetProxyURLs is a function which should return the current set of URLs to
// which client requests should be proxied. This function will be queried
// periodically by the proxy Handler to refresh the set of available

View File

@ -15,8 +15,10 @@
package proxy
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"log"
"net"
"net/http"
@ -55,6 +57,21 @@ func (p *reverseProxy) ServeHTTP(rw http.ResponseWriter, clientreq *http.Request
proxyreq := new(http.Request)
*proxyreq = *clientreq
var (
proxybody []byte
err error
)
if clientreq.Body != nil {
proxybody, err = ioutil.ReadAll(clientreq.Body)
if err != nil {
msg := fmt.Sprintf("proxy: failed to read request body: %v", err)
e := httptypes.NewHTTPError(http.StatusInternalServerError, msg)
e.WriteTo(rw)
return
}
}
// deep-copy the headers, as these will be modified below
proxyreq.Header = make(http.Header)
copyHeader(proxyreq.Header, clientreq.Header)
@ -73,10 +90,31 @@ func (p *reverseProxy) ServeHTTP(rw http.ResponseWriter, clientreq *http.Request
return
}
completeCh := make(chan bool, 1)
closeNotifier, ok := rw.(http.CloseNotifier)
if ok {
go func() {
select {
case <-closeNotifier.CloseNotify():
tp, ok := p.transport.(*http.Transport)
if ok {
tp.CancelRequest(proxyreq)
}
case <-completeCh:
}
}()
defer func() {
completeCh <- true
}()
}
var res *http.Response
var err error
for _, ep := range endpoints {
if proxybody != nil {
proxyreq.Body = ioutil.NopCloser(bytes.NewBuffer(proxybody))
}
redirectRequest(proxyreq, ep.URL)
res, err = p.transport.RoundTrip(proxyreq)
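
Reviewer note: because RoundTrip closes the request body, the change buffers the client body once and gives every attempt a fresh reader over the same bytes. The sketch below shows that idea in isolation; replayableBody and drainTwice are hypothetical helpers and not part of the proxy package.

```go
package main

import (
	"bytes"
	"io"
	"io/ioutil"
	"strings"
)

// replayableBody reads body fully and returns a factory that yields a
// new ReadCloser over the buffered bytes on every call.
func replayableBody(body io.Reader) (func() io.ReadCloser, error) {
	buf, err := ioutil.ReadAll(body)
	if err != nil {
		return nil, err
	}
	return func() io.ReadCloser {
		return ioutil.NopCloser(bytes.NewReader(buf))
	}, nil
}

// drainTwice simulates two attempts that each consume and close the body.
func drainTwice(payload string) (first, second string, err error) {
	newBody, err := replayableBody(strings.NewReader(payload))
	if err != nil {
		return "", "", err
	}
	read := func() (string, error) {
		rc := newBody()
		defer rc.Close()
		b, err := ioutil.ReadAll(rc)
		return string(b), err
	}
	if first, err = read(); err != nil {
		return "", "", err
	}
	second, err = read()
	return first, second, err
}
```

The cost of this approach is that the whole request body is held in memory for the duration of the request.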

View File

@ -18,7 +18,7 @@ Package raft provides an implementation of the raft consensus algorithm.
The primary object in raft is a Node. You either start a Node from scratch
using raft.StartNode or start a Node from some initial state using raft.RestartNode.
storage := raft.NewMemoryStorage()
n := raft.StartNode(0x01, []int64{0x02, 0x03}, 3, 1, storage)
n := raft.StartNode(0x01, []raft.Peer{{ID: 0x02}, {ID: 0x03}}, 3, 1, storage)
Now that you are holding onto a Node you have a few responsibilities:

View File

@ -65,7 +65,7 @@ func newLog(storage Storage) *raftLog {
}
func (l *raftLog) String() string {
return fmt.Sprintf("committed=%d, applied=%d, unstable.offset=%d, len(unstable.Entries)=%d", l.unstable.offset, l.committed, l.applied, len(l.unstable.entries))
return fmt.Sprintf("committed=%d, applied=%d, unstable.offset=%d, len(unstable.Entries)=%d", l.committed, l.applied, l.unstable.offset, len(l.unstable.entries))
}
// maybeAppend returns (0, false) if the entries cannot be appended. Otherwise,

View File

@ -296,15 +296,15 @@ func TestCompactionSideEffects(t *testing.T) {
t.Errorf("lastIndex = %d, want %d", raftLog.lastIndex(), lastIndex)
}
for i := offset; i <= raftLog.lastIndex(); i++ {
if raftLog.term(i) != i {
t.Errorf("term(%d) = %d, want %d", i, raftLog.term(i), i)
for j := offset; j <= raftLog.lastIndex(); j++ {
if raftLog.term(j) != j {
t.Errorf("term(%d) = %d, want %d", j, raftLog.term(j), j)
}
}
for i := offset; i <= raftLog.lastIndex(); i++ {
if !raftLog.matchTerm(i, i) {
t.Errorf("matchTerm(%d) = false, want true", i)
for j := offset; j <= raftLog.lastIndex(); j++ {
if !raftLog.matchTerm(j, j) {
t.Errorf("matchTerm(%d) = false, want true", j)
}
}
@ -354,9 +354,9 @@ func TestNextEnts(t *testing.T) {
raftLog.maybeCommit(5, 1)
raftLog.appliedTo(tt.applied)
ents := raftLog.nextEnts()
if !reflect.DeepEqual(ents, tt.wents) {
t.Errorf("#%d: ents = %+v, want %+v", i, ents, tt.wents)
nents := raftLog.nextEnts()
if !reflect.DeepEqual(nents, tt.wents) {
t.Errorf("#%d: nents = %+v, want %+v", i, nents, tt.wents)
}
}
}
@ -649,10 +649,10 @@ func TestTerm(t *testing.T) {
{offset + num, 0},
}
for i, tt := range tests {
for j, tt := range tests {
term := l.term(tt.index)
if !reflect.DeepEqual(term, tt.w) {
t.Errorf("#%d: at = %d, want %d", i, term, tt.w)
t.Errorf("#%d: at = %d, want %d", j, term, tt.w)
}
}
}
@ -712,18 +712,18 @@ func TestSlice(t *testing.T) {
{offset + num, offset + num + 1, nil, true},
}
for i, tt := range tests {
for j, tt := range tests {
func() {
defer func() {
if r := recover(); r != nil {
if !tt.wpanic {
t.Errorf("%d: panic = %v, want %v: %v", i, true, false, r)
t.Errorf("%d: panic = %v, want %v: %v", j, true, false, r)
}
}
}()
g := l.slice(tt.from, tt.to)
if !reflect.DeepEqual(g, tt.w) {
t.Errorf("#%d: from %d to %d = %v, want %v", i, tt.from, tt.to, g, tt.w)
t.Errorf("#%d: from %d to %d = %v, want %v", j, tt.from, tt.to, g, tt.w)
}
}()
}

View File

@ -233,7 +233,7 @@ func (n *node) run(r *raft) {
lead := None
prevSoftSt := r.softState()
prevHardSt := r.HardState
prevHardSt := emptyState
for {
if advancec != nil {

View File

@ -304,7 +304,7 @@ func TestNodeStart(t *testing.T) {
wants := []Ready{
{
SoftState: &SoftState{Lead: 1, RaftState: StateLeader},
HardState: raftpb.HardState{Term: 2, Commit: 2},
HardState: raftpb.HardState{Term: 2, Commit: 2, Vote: 1},
Entries: []raftpb.Entry{
{Type: raftpb.EntryConfChange, Term: 1, Index: 1, Data: ccdata},
{Term: 2, Index: 2},
@ -315,7 +315,7 @@ func TestNodeStart(t *testing.T) {
},
},
{
HardState: raftpb.HardState{Term: 2, Commit: 3},
HardState: raftpb.HardState{Term: 2, Commit: 3, Vote: 1},
Entries: []raftpb.Entry{{Term: 2, Index: 3, Data: []byte("foo")}},
CommittedEntries: []raftpb.Entry{{Term: 2, Index: 3, Data: []byte("foo")}},
},
@ -332,10 +332,10 @@ func TestNodeStart(t *testing.T) {
}
n.Propose(ctx, []byte("foo"))
if g := <-n.Ready(); !reflect.DeepEqual(g, wants[1]) {
t.Errorf("#%d: g = %+v,\n w %+v", 2, g, wants[1])
if g2 := <-n.Ready(); !reflect.DeepEqual(g2, wants[1]) {
t.Errorf("#%d: g = %+v,\n w %+v", 2, g2, wants[1])
} else {
storage.Append(g.Entries)
storage.Append(g2.Entries)
n.Advance()
}
@ -354,7 +354,7 @@ func TestNodeRestart(t *testing.T) {
st := raftpb.HardState{Term: 1, Commit: 1}
want := Ready{
HardState: emptyState,
HardState: st,
// commit up to the commit index in st
CommittedEntries: entries[:st.Commit],
}
@ -389,7 +389,7 @@ func TestNodeRestartFromSnapshot(t *testing.T) {
st := raftpb.HardState{Term: 1, Commit: 3}
want := Ready{
HardState: emptyState,
HardState: st,
// commit up to the commit index in st
CommittedEntries: entries,
}

View File

@ -306,9 +306,11 @@ func (r *raft) maybeCommit() bool {
}
func (r *raft) reset(term uint64) {
r.Term = term
if r.Term != term {
r.Term = term
r.Vote = None
}
r.lead = None
r.Vote = None
r.elapsed = 0
r.votes = make(map[uint64]bool)
for i := range r.prs {
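
Reviewer note: after this change reset clears the vote only when the term actually changes, so a node that resets within the same term keeps the vote it already cast. The sketch below models just those three fields; the state type and the none constant are hypothetical simplifications of the raft struct.

```go
package main

// none marks the absence of a leader or vote in this sketch.
const none uint64 = 0

// state holds only the fields touched by the part of reset shown here.
type state struct {
	term, vote, lead uint64
}

// reset forgets the leader unconditionally, but forgets the vote only
// when moving to a different term.
func (s *state) reset(term uint64) {
	if s.term != term {
		s.term = term
		s.vote = none
	}
	s.lead = none
}
```

Keeping the vote within a term matters because granting a second vote in the same term could let two candidates both collect a majority.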

View File

@ -774,7 +774,7 @@ func TestVoteRequest(t *testing.T) {
{[]pb.Entry{{Term: 1, Index: 1}}, 2},
{[]pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}}, 3},
}
for i, tt := range tests {
for j, tt := range tests {
r := newRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage(), 0)
r.Step(pb.Message{
From: 2, To: 1, Type: pb.MsgApp, Term: tt.wterm - 1, LogTerm: 0, Index: 0, Entries: tt.ents,
@ -788,7 +788,7 @@ func TestVoteRequest(t *testing.T) {
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
if len(msgs) != 2 {
t.Fatalf("#%d: len(msg) = %d, want %d", i, len(msgs), 2)
t.Fatalf("#%d: len(msg) = %d, want %d", j, len(msgs), 2)
}
for i, m := range msgs {
if m.Type != pb.MsgVote {

View File

@ -510,7 +510,7 @@ func TestOldMessages(t *testing.T) {
// commit a new entry
tt.send(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
l := &raftLog{
ilog := &raftLog{
storage: &MemoryStorage{
ents: []pb.Entry{
{}, {Data: nil, Term: 1, Index: 1},
@ -521,7 +521,7 @@ func TestOldMessages(t *testing.T) {
unstable: unstable{offset: 5},
committed: 4,
}
base := ltoa(l)
base := ltoa(ilog)
for i, p := range tt.peers {
if sm, ok := p.(*raft); ok {
l := ltoa(sm.raftLog)
@ -548,7 +548,7 @@ func TestProposal(t *testing.T) {
{newNetwork(nil, nopStepper, nopStepper, nil, nil), true},
}
for i, tt := range tests {
for j, tt := range tests {
send := func(m pb.Message) {
defer func() {
// only recover if we expect it to panic so
@ -556,7 +556,7 @@ func TestProposal(t *testing.T) {
if !tt.success {
e := recover()
if e != nil {
t.Logf("#%d: err: %s", i, e)
t.Logf("#%d: err: %s", j, e)
}
}
}()
@ -591,7 +591,7 @@ func TestProposal(t *testing.T) {
}
sm := tt.network.peers[1].(*raft)
if g := sm.Term; g != 1 {
t.Errorf("#%d: term = %d, want %d", i, g, 1)
t.Errorf("#%d: term = %d, want %d", j, g, 1)
}
}
}
@ -603,7 +603,7 @@ func TestProposalByProxy(t *testing.T) {
newNetwork(nil, nil, nopStepper),
}
for i, tt := range tests {
for j, tt := range tests {
// promote 0 the leader
tt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
@ -629,7 +629,7 @@ func TestProposalByProxy(t *testing.T) {
}
sm := tt.peers[1].(*raft)
if g := sm.Term; g != 1 {
t.Errorf("#%d: term = %d, want %d", i, g, 1)
t.Errorf("#%d: term = %d, want %d", j, g, 1)
}
}
}
@ -1601,8 +1601,8 @@ func newNetwork(peers ...Interface) *network {
npeers := make(map[uint64]Interface, size)
nstorage := make(map[uint64]*MemoryStorage, size)
for i, p := range peers {
id := peerAddrs[i]
for j, p := range peers {
id := peerAddrs[j]
switch v := p.(type) {
case nil:
nstorage[id] = NewMemoryStorage()

157
raft/rafttest/network.go Normal file
View File

@ -0,0 +1,157 @@
package rafttest
import (
"math/rand"
"sync"
"time"
"github.com/coreos/etcd/raft/raftpb"
)
// a network interface
type iface interface {
send(m raftpb.Message)
recv() chan raftpb.Message
disconnect()
connect()
}
// a network
type network interface {
// drop message at given rate (1.0 drops all messages)
drop(from, to uint64, rate float64)
// delay messages for a random duration in [0, d) at the given rate (1.0 delays all messages)
// do we need rate here?
delay(from, to uint64, d time.Duration, rate float64)
disconnect(id uint64)
connect(id uint64)
// heal heals the network
heal()
}
type raftNetwork struct {
mu sync.Mutex
disconnected map[uint64]bool
dropmap map[conn]float64
delaymap map[conn]delay
recvQueues map[uint64]chan raftpb.Message
}
type conn struct {
from, to uint64
}
type delay struct {
d time.Duration
rate float64
}
func newRaftNetwork(nodes ...uint64) *raftNetwork {
pn := &raftNetwork{
recvQueues: make(map[uint64]chan raftpb.Message),
dropmap: make(map[conn]float64),
delaymap: make(map[conn]delay),
disconnected: make(map[uint64]bool),
}
for _, n := range nodes {
pn.recvQueues[n] = make(chan raftpb.Message, 1024)
}
return pn
}
func (rn *raftNetwork) nodeNetwork(id uint64) iface {
return &nodeNetwork{id: id, raftNetwork: rn}
}
func (rn *raftNetwork) send(m raftpb.Message) {
rn.mu.Lock()
to := rn.recvQueues[m.To]
if rn.disconnected[m.To] {
to = nil
}
drop := rn.dropmap[conn{m.From, m.To}]
delay := rn.delaymap[conn{m.From, m.To}]
rn.mu.Unlock()
if to == nil {
return
}
if drop != 0 && rand.Float64() < drop {
return
}
// TODO: shall we delay without blocking the send call?
if delay.d != 0 && rand.Float64() < delay.rate {
rd := rand.Int63n(int64(delay.d))
time.Sleep(time.Duration(rd))
}
select {
case to <- m:
default:
// drop messages when the receiver queue is full.
}
}
func (rn *raftNetwork) recvFrom(from uint64) chan raftpb.Message {
rn.mu.Lock()
fromc := rn.recvQueues[from]
if rn.disconnected[from] {
fromc = nil
}
rn.mu.Unlock()
return fromc
}
func (rn *raftNetwork) drop(from, to uint64, rate float64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.dropmap[conn{from, to}] = rate
}
func (rn *raftNetwork) delay(from, to uint64, d time.Duration, rate float64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.delaymap[conn{from, to}] = delay{d, rate}
}
func (rn *raftNetwork) heal() {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.dropmap = make(map[conn]float64)
rn.delaymap = make(map[conn]delay)
}
func (rn *raftNetwork) disconnect(id uint64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.disconnected[id] = true
}
func (rn *raftNetwork) connect(id uint64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.disconnected[id] = false
}
type nodeNetwork struct {
id uint64
*raftNetwork
}
func (nt *nodeNetwork) connect() {
nt.raftNetwork.connect(nt.id)
}
func (nt *nodeNetwork) disconnect() {
nt.raftNetwork.disconnect(nt.id)
}
func (nt *nodeNetwork) send(m raftpb.Message) {
nt.raftNetwork.send(m)
}
func (nt *nodeNetwork) recv() chan raftpb.Message {
return nt.recvFrom(nt.id)
}
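
Reviewer note: raftNetwork.send decides per message whether to drop it by comparing a uniform random number with the configured rate for the (from, to) pair. The sketch below isolates that decision; lossyNet, link, deliver, and countDelivered are hypothetical names, and the seeded source exists only to make the example reproducible. Unlike raftNetwork, this sketch is not safe for concurrent use.

```go
package main

import "math/rand"

// link identifies a directed connection between two node IDs.
type link struct{ from, to uint64 }

// lossyNet keeps a drop rate per directed link.
type lossyNet struct {
	rng     *rand.Rand
	dropmap map[link]float64
}

func newLossyNet(seed int64) *lossyNet {
	return &lossyNet{
		rng:     rand.New(rand.NewSource(seed)),
		dropmap: make(map[link]float64),
	}
}

// drop sets the drop rate for a link; 1.0 drops every message.
func (n *lossyNet) drop(from, to uint64, rate float64) {
	n.dropmap[link{from, to}] = rate
}

// deliver reports whether one message on the link would get through.
// Float64 returns a value in [0, 1), so a rate of 1.0 always drops.
func (n *lossyNet) deliver(from, to uint64) bool {
	rate := n.dropmap[link{from, to}]
	return !(rate != 0 && n.rng.Float64() < rate)
}

// countDelivered simulates sent messages and counts the survivors.
func (n *lossyNet) countDelivered(from, to uint64, sent int) int {
	delivered := 0
	for i := 0; i < sent; i++ {
		if n.deliver(from, to) {
			delivered++
		}
	}
	return delivered
}
```

Rates are per direction: setting a rate for (1, 2) leaves messages from 2 to 1 unaffected.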

View File

@ -0,0 +1,58 @@
package rafttest
import (
"testing"
"time"
"github.com/coreos/etcd/raft/raftpb"
)
func TestNetworkDrop(t *testing.T) {
// drop around 10% of the messages
sent := 1000
droprate := 0.1
nt := newRaftNetwork(1, 2)
nt.drop(1, 2, droprate)
for i := 0; i < sent; i++ {
nt.send(raftpb.Message{From: 1, To: 2})
}
c := nt.recvFrom(2)
received := 0
done := false
for !done {
select {
case <-c:
received++
default:
done = true
}
}
drop := sent - received
if drop > int((droprate+0.1)*float64(sent)) || drop < int((droprate-0.1)*float64(sent)) {
t.Errorf("drop = %d, want around %d", drop, int(droprate*float64(sent)))
}
}
func TestNetworkDelay(t *testing.T) {
sent := 1000
delay := time.Millisecond
delayrate := 0.1
nt := newRaftNetwork(1, 2)
nt.delay(1, 2, delay, delayrate)
var total time.Duration
for i := 0; i < sent; i++ {
s := time.Now()
nt.send(raftpb.Message{From: 1, To: 2})
total += time.Since(s)
}
w := time.Duration(float64(sent)*delayrate/2) * delay
// there is considerable overhead in the send call, since it generates random numbers.
if total < w+10*delay {
t.Errorf("total = %v, want > %v", total, w)
}
}
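
Note on the expected total in TestNetworkDelay: the division by 2 reads as the mean of a delay drawn uniformly between zero and `delay`. That distribution is an assumption here, since `send` is not shown in this excerpt. A small sketch of the arithmetic, with `expectedDelay` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"time"
)

// expectedDelay estimates the total injected latency when a fraction `rate`
// of `sent` messages is delayed by a random duration in [0, d).
// The uniform distribution (mean d/2) is an assumption about send.
func expectedDelay(sent int, rate float64, d time.Duration) time.Duration {
	return time.Duration(float64(sent)*rate/2) * d
}

func main() {
	// 1000 messages, 10% delayed, up to 1ms each.
	fmt.Println(expectedDelay(1000, 0.1, time.Millisecond))
}
```

With the values used in the test (1000 messages, rate 0.1, 1ms), this gives 50 delayed-message equivalents of 1ms each.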

114
raft/rafttest/node.go Normal file

@ -0,0 +1,114 @@
package rafttest
import (
"log"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/net/context"
"github.com/coreos/etcd/raft"
"github.com/coreos/etcd/raft/raftpb"
)
type node struct {
raft.Node
id uint64
iface iface
stopc chan struct{}
pausec chan bool
// stable
storage *raft.MemoryStorage
state raftpb.HardState
}
func startNode(id uint64, peers []raft.Peer, iface iface) *node {
st := raft.NewMemoryStorage()
rn := raft.StartNode(id, peers, 10, 1, st)
n := &node{
Node: rn,
id: id,
storage: st,
iface: iface,
pausec: make(chan bool),
}
n.start()
return n
}
func (n *node) start() {
n.stopc = make(chan struct{})
ticker := time.Tick(5 * time.Millisecond)
go func() {
for {
select {
case <-ticker:
n.Tick()
case rd := <-n.Ready():
if !raft.IsEmptyHardState(rd.HardState) {
n.state = rd.HardState
n.storage.SetHardState(n.state)
}
n.storage.Append(rd.Entries)
// TODO: make send async, more like real world...
for _, m := range rd.Messages {
n.iface.send(m)
}
n.Advance()
case m := <-n.iface.recv():
n.Step(context.TODO(), m)
case <-n.stopc:
n.Stop()
log.Printf("raft.%d: stop", n.id)
n.Node = nil
close(n.stopc)
return
case p := <-n.pausec:
recvms := make([]raftpb.Message, 0)
for p {
select {
case m := <-n.iface.recv():
recvms = append(recvms, m)
case p = <-n.pausec:
}
}
// step all pending messages
for _, m := range recvms {
n.Step(context.TODO(), m)
}
}
}
}()
}
// stop stops the node. Stopping an already stopped node might panic.
// All in-memory state of the node is discarded.
// All stable state MUST remain unchanged.
func (n *node) stop() {
n.iface.disconnect()
n.stopc <- struct{}{}
// wait for the shutdown
<-n.stopc
}
// restart restarts the node. Restarting a node that is still running
// blocks and might affect future stop operations.
func (n *node) restart() {
// wait for the shutdown
<-n.stopc
n.Node = raft.RestartNode(n.id, 10, 1, n.storage, 0)
n.start()
n.iface.connect()
}
// pause pauses the node.
// The paused node buffers the received messages and steps
// all of them when it resumes.
func (n *node) pause() {
n.pausec <- true
}
// resume resumes the paused node.
func (n *node) resume() {
n.pausec <- false
}
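
Note on pause/resume: the loop above buffers incoming messages while paused and steps them once a `false` arrives on `pausec`. The pattern can be isolated as in the sketch below. The `worker` type and its fields are hypothetical, and integers stand in for `raftpb.Message`.

```go
package main

import "fmt"

// worker is a hypothetical miniature of the rafttest node loop.
type worker struct {
	in        chan int
	pausec    chan bool
	stopc     chan struct{}
	done      chan struct{}
	processed []int
	replayed  int // number of messages that were buffered while paused
}

func startWorker() *worker {
	w := &worker{
		in:     make(chan int),
		pausec: make(chan bool),
		stopc:  make(chan struct{}),
		done:   make(chan struct{}),
	}
	go w.run()
	return w
}

func (w *worker) run() {
	defer close(w.done)
	for {
		select {
		case m := <-w.in:
			w.processed = append(w.processed, m)
		case p := <-w.pausec:
			var pending []int
			// Stay here until a false arrives; only buffer in the meantime.
			for p {
				select {
				case m := <-w.in:
					pending = append(pending, m)
				case p = <-w.pausec:
				}
			}
			// Replay everything that arrived during the pause, in order.
			w.processed = append(w.processed, pending...)
			w.replayed += len(pending)
		case <-w.stopc:
			return
		}
	}
}

// stop shuts the loop down and returns everything it processed.
func (w *worker) stop() []int {
	w.stopc <- struct{}{}
	<-w.done
	return w.processed
}

func main() {
	w := startWorker()
	w.in <- 1
	w.pausec <- true
	w.in <- 2
	w.in <- 3
	w.pausec <- false
	w.in <- 4
	fmt.Println(w.stop(), w.replayed)
}
```

As in the original loop, a stop request sent while the worker is paused would block until it is resumed, because the inner select does not listen on the stop channel.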

112
raft/rafttest/node_test.go Normal file

@ -0,0 +1,112 @@
package rafttest
import (
"testing"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/net/context"
"github.com/coreos/etcd/raft"
)
func TestBasicProgress(t *testing.T) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}, {4, nil}, {5, nil}}
nt := newRaftNetwork(1, 2, 3, 4, 5)
nodes := make([]*node, 0)
for i := 1; i <= 5; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
time.Sleep(50 * time.Millisecond)
for i := 0; i < 1000; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
time.Sleep(100 * time.Millisecond)
for _, n := range nodes {
n.stop()
if n.state.Commit != 1006 {
t.Errorf("commit = %d, want = 1006", n.state.Commit)
}
}
}
func TestRestart(t *testing.T) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}, {4, nil}, {5, nil}}
nt := newRaftNetwork(1, 2, 3, 4, 5)
nodes := make([]*node, 0)
for i := 1; i <= 5; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
time.Sleep(50 * time.Millisecond)
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[1].stop()
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[2].stop()
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[2].restart()
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[1].restart()
// give some time for nodes to catch up with the raft leader
time.Sleep(300 * time.Millisecond)
for _, n := range nodes {
n.stop()
if n.state.Commit != 1206 {
t.Errorf("commit = %d, want = 1206", n.state.Commit)
}
}
}
func TestPause(t *testing.T) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}, {4, nil}, {5, nil}}
nt := newRaftNetwork(1, 2, 3, 4, 5)
nodes := make([]*node, 0)
for i := 1; i <= 5; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
time.Sleep(50 * time.Millisecond)
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[1].pause()
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[2].pause()
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[2].resume()
for i := 0; i < 300; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[1].resume()
// give some time for nodes to catch up with the raft leader
time.Sleep(300 * time.Millisecond)
for _, n := range nodes {
n.stop()
if n.state.Commit != 1206 {
t.Errorf("commit = %d, want = 1206", n.state.Commit)
}
}
}
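
Note on the expected commit indexes (1006 and 1206): these appear to be the number of proposals plus six bookkeeping entries. The sketch below assumes that bootstrapping appends one configuration-change entry per peer and that the elected leader appends one empty entry, with no re-election during the test. `expectedCommit` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// expectedCommit returns the commit index a healthy cluster should reach.
// Assumptions: one conf-change entry per bootstrapped peer, one empty entry
// appended by the first leader, and no further elections.
func expectedCommit(peers, proposals int) uint64 {
	const leaderNoop = 1
	return uint64(peers + leaderNoop + proposals)
}

func main() {
	fmt.Println(expectedCommit(5, 1000)) // TestBasicProgress
	fmt.Println(expectedCommit(5, 1200)) // TestRestart and TestPause (4 x 300)
}
```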


@ -84,10 +84,10 @@ func DescribeMessage(m pb.Message, f EntryFormatter) string {
// Entry for debugging.
func DescribeEntry(e pb.Entry, f EntryFormatter) string {
var formatted string
if f == nil {
formatted = fmt.Sprintf("%q", e.Data)
} else {
if e.Type == pb.EntryNormal && f != nil {
formatted = f(e.Data)
} else {
formatted = fmt.Sprintf("%q", e.Data)
}
return fmt.Sprintf("%d/%d %s %s", e.Term, e.Index, e.Type, formatted)
}
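
Note on the DescribeEntry change: the custom formatter is now applied only to normal entries, and every other entry type falls back to `%q`. A self-contained sketch of that rule follows; the `entry`, `entryType` and `describe` names are local stand-ins for the `pb` types, not the real API.

```go
package main

import "fmt"

type entryType int

const (
	entryNormal entryType = iota
	entryConfChange
)

// entry is a hypothetical stand-in for pb.Entry.
type entry struct {
	Term, Index uint64
	Type        entryType
	Data        []byte
}

type entryFormatter func([]byte) string

// describe uses f only for normal entries; other entry types carry data the
// application formatter is not expected to understand, so they are quoted.
func describe(e entry, f entryFormatter) string {
	var formatted string
	if e.Type == entryNormal && f != nil {
		formatted = f(e.Data)
	} else {
		formatted = fmt.Sprintf("%q", e.Data)
	}
	return fmt.Sprintf("%d/%d %d %s", e.Term, e.Index, e.Type, formatted)
}

func main() {
	f := func(b []byte) string { return "<" + string(b) + ">" }
	fmt.Println(describe(entry{Term: 1, Index: 2, Type: entryNormal, Data: []byte("x")}, f))
	fmt.Println(describe(entry{Term: 1, Index: 3, Type: entryConfChange, Data: []byte("x")}, f))
}
```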


@ -54,7 +54,9 @@ func (er *entryReader) readEntries() ([]raftpb.Entry, error) {
}
er.ents.Add()
}
er.lastIndex.Set(int64(ents[l-1].Index))
if l > 0 {
er.lastIndex.Set(int64(ents[l-1].Index))
}
return ents, nil
}
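
Note on the `l > 0` guard: indexing `ents[l-1]` on an empty slice panics, which is what the added check avoids. The same idea as a standalone sketch, with `entry` and `lastIndex` as hypothetical names:

```go
package main

import "fmt"

// entry is a hypothetical stand-in for raftpb.Entry.
type entry struct{ Index uint64 }

// lastIndex returns the index of the final entry, or ok=false when the
// batch is empty and there is nothing to report.
func lastIndex(ents []entry) (idx uint64, ok bool) {
	if len(ents) == 0 {
		return 0, false
	}
	return ents[len(ents)-1].Index, true
}

func main() {
	fmt.Println(lastIndex(nil))
	fmt.Println(lastIndex([]entry{{Index: 7}, {Index: 8}}))
}
```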


@ -121,7 +121,6 @@ func (h *streamHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
fromStr := strings.TrimPrefix(r.URL.Path, RaftStreamPrefix+"/")
from, err := types.IDFromString(fromStr)
if err != nil {
log.Printf("rafthttp: path %s cannot be parsed", fromStr)
http.Error(w, "invalid path", http.StatusNotFound)
return
}


@ -40,6 +40,7 @@ const (
appRespBatchMs = 50
propBatchMs = 10
DialTimeout = time.Second
ConnReadTimeout = 5 * time.Second
ConnWriteTimeout = 5 * time.Second
)
@ -196,19 +197,19 @@ func (p *peer) handle() {
p.errored = err
}
if p.active {
log.Printf("sender: the connection with %s becomes inactive", p.id)
log.Printf("sender: the connection with %s became inactive", p.id)
p.active = false
}
if m.Type == raftpb.MsgApp {
if m.Type == raftpb.MsgApp && p.fs != nil {
p.fs.Fail()
}
} else {
if !p.active {
log.Printf("sender: the connection with %s becomes active", p.id)
log.Printf("sender: the connection with %s became active", p.id)
p.active = true
p.errored = nil
}
if m.Type == raftpb.MsgApp {
if m.Type == raftpb.MsgApp && p.fs != nil {
p.fs.Succ(end.Sub(start))
}
}

42
rafthttp/remote.go Normal file
View File

@ -0,0 +1,42 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package rafthttp
import (
"net/http"
"github.com/coreos/etcd/pkg/types"
"github.com/coreos/etcd/raft/raftpb"
)
type remote struct {
id types.ID
peer *peer
}
func startRemote(tr http.RoundTripper, u string, local, to, cid types.ID, r Raft, errorc chan error) *remote {
return &remote{
id: to,
peer: NewPeer(tr, u, to, cid, r, nil, errorc),
}
}
func (g *remote) Send(m raftpb.Message) {
g.peer.send(m)
}
func (g *remote) Stop() {
g.peer.Stop()
}


@ -76,8 +76,11 @@ func (s *stream) attach(sw *streamWriter) error {
// ignore lower-term streaming request
if sw.term < s.w.term {
return fmt.Errorf("cannot attach out of date stream server [%d / %d]", sw.term, s.w.term)
} else if sw.term == s.w.term {
s.w.stopWithoutLog()
} else {
s.w.stop()
}
s.w.stop()
}
s.w = sw
return nil
@ -151,21 +154,23 @@ type WriteFlusher interface {
// TODO: replace fs with stream stats
type streamWriter struct {
to types.ID
term uint64
fs *stats.FollowerStats
q chan []raftpb.Entry
done chan struct{}
to types.ID
term uint64
fs *stats.FollowerStats
q chan []raftpb.Entry
done chan struct{}
printLog bool
}
// newStreamWriter returns a new, unstarted stream writer.
// The caller should call stop when finished, to shut it down.
func newStreamWriter(to types.ID, term uint64) *streamWriter {
s := &streamWriter{
to: to,
term: term,
q: make(chan []raftpb.Entry, streamBufSize),
done: make(chan struct{}),
to: to,
term: term,
q: make(chan []raftpb.Entry, streamBufSize),
done: make(chan struct{}),
printLog: true,
}
return s
}
@ -188,12 +193,20 @@ func (s *streamWriter) send(ents []raftpb.Entry) error {
func (s *streamWriter) handle(w WriteFlusher) {
defer func() {
close(s.done)
log.Printf("rafthttp: server streaming to %s at term %d has been stopped", s.to, s.term)
if s.printLog {
log.Printf("rafthttp: server streaming to %s at term %d has been stopped", s.to, s.term)
}
}()
ew := newEntryWriter(w, s.to)
defer ew.stop()
for ents := range s.q {
// Commit in MsgApp is not recovered on the receiving side, so
// zero-entry append messages are of no use to the raft state machine.
// Drop them here.
if len(ents) == 0 {
continue
}
start := time.Now()
if err := ew.writeEntries(ents); err != nil {
log.Printf("rafthttp: encountered error writing to server log stream: %v", err)
@ -209,6 +222,11 @@ func (s *streamWriter) stop() {
<-s.done
}
func (s *streamWriter) stopWithoutLog() {
s.printLog = false
s.stop()
}
func (s *streamWriter) stopNotify() <-chan struct{} { return s.done }
// TODO: move the raft interface out of the reader.
@ -289,9 +307,6 @@ func (s *streamReader) handle(r io.Reader) {
}
return
}
// Considering Commit in MsgApp is not recovered, zero-entry appendEntry
// messages have no use to raft state machine. Drop it here because
// we don't have easy way to recover its Index easily.
if len(ents) == 0 {
continue
}


@ -35,8 +35,15 @@ type Raft interface {
type Transporter interface {
Handler() http.Handler
Send(m []raftpb.Message)
// AddRemote adds a remote with the given peer urls into the transport.
// A remote helps a newly joined member catch up with the progress of the
// cluster, and will not be used after that.
// It is the caller's responsibility to ensure the urls are all valid,
// or it panics.
AddRemote(id types.ID, urls []string)
AddPeer(id types.ID, urls []string)
RemovePeer(id types.ID)
RemoveAllPeers()
UpdatePeer(id types.ID, urls []string)
Stop()
}
@ -49,9 +56,10 @@ type transport struct {
serverStats *stats.ServerStats
leaderStats *stats.LeaderStats
mu sync.RWMutex // protect the peer map
peers map[types.ID]*peer // remote peers
errorc chan error
mu sync.RWMutex // protect the remote and peer map
remotes map[types.ID]*remote // remotes map that helps newly joined member to catch up
peers map[types.ID]*peer // peers map
errorc chan error
}
func NewTransporter(rt http.RoundTripper, id, cid types.ID, r Raft, errorc chan error, ss *stats.ServerStats, ls *stats.LeaderStats) Transporter {
@ -62,6 +70,7 @@ func NewTransporter(rt http.RoundTripper, id, cid types.ID, r Raft, errorc chan
raft: r,
serverStats: ss,
leaderStats: ls,
remotes: make(map[types.ID]*remote),
peers: make(map[types.ID]*peer),
errorc: errorc,
}
@ -89,21 +98,30 @@ func (t *transport) Send(msgs []raftpb.Message) {
continue
}
to := types.ID(m.To)
p, ok := t.peers[to]
if !ok {
log.Printf("etcdserver: send message to unknown receiver %s", to)
if ok {
if m.Type == raftpb.MsgApp {
t.serverStats.SendAppendReq(m.Size())
}
p.Send(m)
continue
}
if m.Type == raftpb.MsgApp {
t.serverStats.SendAppendReq(m.Size())
g, ok := t.remotes[to]
if ok {
g.Send(m)
continue
}
p.Send(m)
log.Printf("etcdserver: send message to unknown receiver %s", to)
}
}
func (t *transport) Stop() {
for _, r := range t.remotes {
r.Stop()
}
for _, p := range t.peers {
p.Stop()
}
@ -112,6 +130,21 @@ func (t *transport) Stop() {
}
}
func (t *transport) AddRemote(id types.ID, us []string) {
t.mu.Lock()
defer t.mu.Unlock()
if _, ok := t.remotes[id]; ok {
return
}
peerURL := us[0]
u, err := url.Parse(peerURL)
if err != nil {
log.Panicf("unexpected peer url %s", peerURL)
}
u.Path = path.Join(u.Path, RaftPrefix)
t.remotes[id] = startRemote(t.roundTripper, u.String(), t.id, id, t.clusterID, t.raft, t.errorc)
}
func (t *transport) AddPeer(id types.ID, urls []string) {
t.mu.Lock()
defer t.mu.Unlock()
@ -132,8 +165,26 @@ func (t *transport) AddPeer(id types.ID, urls []string) {
func (t *transport) RemovePeer(id types.ID) {
t.mu.Lock()
defer t.mu.Unlock()
t.peers[id].Stop()
t.removePeer(id)
}
func (t *transport) RemoveAllPeers() {
t.mu.Lock()
defer t.mu.Unlock()
for id := range t.peers {
t.removePeer(id)
}
}
// the caller of this function must hold the peers mutex.
func (t *transport) removePeer(id types.ID) {
if peer, ok := t.peers[id]; ok {
peer.Stop()
} else {
log.Panicf("rafthttp: unexpected removal of unknown peer '%d'", id)
}
delete(t.peers, id)
delete(t.leaderStats.Followers, id.String())
}
func (t *transport) UpdatePeer(id types.ID, urls []string) {
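
Note on the new Send path: a message is routed to a regular peer first, then to a remote, and only logged as unknown when neither map has the receiver. A minimal sketch of that lookup order follows. The `router` type and `route` method are hypothetical, and plain strings stand in for the peer and remote senders.

```go
package main

import "fmt"

// router is a hypothetical reduction of the transport's two lookup maps.
type router struct {
	peers   map[uint64]string
	remotes map[uint64]string
}

// route returns the sender to use for a receiver ID. Peers take priority;
// remotes are a fallback for members that are still catching up.
func (r *router) route(to uint64) (string, bool) {
	if addr, ok := r.peers[to]; ok {
		return "peer:" + addr, true
	}
	if addr, ok := r.remotes[to]; ok {
		return "remote:" + addr, true
	}
	return "", false
}

func main() {
	r := &router{
		peers:   map[uint64]string{1: "10.0.0.1"},
		remotes: map[uint64]string{1: "10.0.0.9", 2: "10.0.0.2"},
	}
	for _, id := range []uint64{1, 2, 3} {
		dst, ok := r.route(id)
		fmt.Println(id, dst, ok)
	}
}
```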


@ -42,7 +42,7 @@ function package {
cp etcd/README.md ${target}/README.md
cp etcd/etcdctl/README.md ${target}/README-etcdctl.md
cp -R etcd/Documentation/2.0 ${target}/Documentation
cp -R etcd/Documentation ${target}/Documentation
}
function main {


@ -82,29 +82,32 @@ func (s *Snapshotter) Load() (*raftpb.Snapshot, error) {
break
}
}
return snap, err
if err != nil {
return nil, ErrNoSnapshot
}
return snap, nil
}
func loadSnap(dir, name string) (*raftpb.Snapshot, error) {
var err error
var b []byte
fpath := path.Join(dir, name)
defer func() {
if err != nil {
renameBroken(fpath)
}
}()
b, err = ioutil.ReadFile(fpath)
snap, err := Read(fpath)
if err != nil {
log.Printf("snap: snapshotter cannot read file %v: %v", name, err)
renameBroken(fpath)
}
return snap, err
}
// Read reads the snapshot named by snapname and returns the snapshot.
func Read(snapname string) (*raftpb.Snapshot, error) {
b, err := ioutil.ReadFile(snapname)
if err != nil {
log.Printf("snap: snapshotter cannot read file %v: %v", snapname, err)
return nil, err
}
var serializedSnap snappb.Snapshot
if err = serializedSnap.Unmarshal(b); err != nil {
log.Printf("snap: corrupted snapshot file %v: %v", name, err)
log.Printf("snap: corrupted snapshot file %v: %v", snapname, err)
return nil, err
}
@ -115,13 +118,13 @@ func loadSnap(dir, name string) (*raftpb.Snapshot, error) {
crc := crc32.Update(0, crcTable, serializedSnap.Data)
if crc != serializedSnap.Crc {
log.Printf("snap: corrupted snapshot file %v: crc mismatch", name)
log.Printf("snap: corrupted snapshot file %v: crc mismatch", snapname)
return nil, ErrCRCMismatch
}
var snap raftpb.Snapshot
if err = snap.Unmarshal(serializedSnap.Data); err != nil {
log.Printf("snap: corrupted snapshot file %v: %v", name, err)
log.Printf("snap: corrupted snapshot file %v: %v", snapname, err)
return nil, err
}
return &snap, nil
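
Note on the CRC check in Read: the stored checksum is compared against one recomputed over the serialized data. A standalone sketch using the standard `hash/crc32` package follows. The Castagnoli polynomial is an assumption here, since the definition of `crcTable` is not part of this excerpt, and `checksum`, `verify` and `errCRCMismatch` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
)

var errCRCMismatch = errors.New("crc mismatch")

// Assumed polynomial; the snap package's own table is not shown here.
var crcTable = crc32.MakeTable(crc32.Castagnoli)

// checksum computes the CRC that would be stored next to the data.
func checksum(data []byte) uint32 {
	return crc32.Update(0, crcTable, data)
}

// verify recomputes the checksum and compares it with the stored value.
func verify(data []byte, want uint32) error {
	if checksum(data) != want {
		return errCRCMismatch
	}
	return nil
}

func main() {
	data := []byte("snapshot")
	sum := checksum(data)
	fmt.Println(verify(data, sum))
	fmt.Println(verify([]byte("snapshoT"), sum))
}
```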


@ -76,7 +76,7 @@ func TestBadCRC(t *testing.T) {
// fake a crc mismatch
crcTable = crc32.MakeTable(crc32.Koopman)
_, err = ss.Load()
_, err = Read(path.Join(dir, fmt.Sprintf("%016x-%016x.snap", 1, 1)))
if err == nil || err != ErrCRCMismatch {
t.Errorf("err = %v, want %v", err, ErrCRCMismatch)
}
@ -182,7 +182,7 @@ func TestNoSnapshot(t *testing.T) {
defer os.RemoveAll(dir)
ss := New(dir)
_, err = ss.Load()
if err == nil || err != ErrNoSnapshot {
if err != ErrNoSnapshot {
t.Errorf("err = %v, want %v", err, ErrNoSnapshot)
}
}
@ -195,14 +195,35 @@ func TestEmptySnapshot(t *testing.T) {
}
defer os.RemoveAll(dir)
err = ioutil.WriteFile(path.Join(dir, "1.snap"), []byte("shit"), 0x700)
err = ioutil.WriteFile(path.Join(dir, "1.snap"), []byte(""), 0700)
if err != nil {
t.Fatal(err)
}
_, err = Read(path.Join(dir, "1.snap"))
if err != ErrEmptySnapshot {
t.Errorf("err = %v, want %v", err, ErrEmptySnapshot)
}
}
// TestAllSnapshotBroken ensures the snapshotter returns
// ErrNoSnapshot if all the snapshots are broken.
func TestAllSnapshotBroken(t *testing.T) {
dir := path.Join(os.TempDir(), "snapshot")
err := os.Mkdir(dir, 0700)
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(dir)
err = ioutil.WriteFile(path.Join(dir, "1.snap"), []byte("bad"), 0700)
if err != nil {
t.Fatal(err)
}
ss := New(dir)
_, err = ss.Load()
if err == nil || err != ErrEmptySnapshot {
t.Errorf("err = %v, want %v", err, ErrEmptySnapshot)
if err != ErrNoSnapshot {
t.Errorf("err = %v, want %v", err, ErrNoSnapshot)
}
}
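
Note on TestAllSnapshotBroken: after this change, Load hides the individual read errors and reports ErrNoSnapshot when no candidate can be read. The fallback can be sketched as below. `loadNewest`, `errNoSnapshot` and `errBroken` are hypothetical names, and the `read` callback stands in for reading and validating one file.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errNoSnapshot = errors.New("no available snapshot")
	errBroken     = errors.New("broken snapshot")
)

// loadNewest tries the candidates in order (newest first) and returns the
// first one that reads cleanly. Individual failures are not surfaced; the
// caller only learns that nothing usable was found.
func loadNewest(names []string, read func(string) ([]byte, error)) ([]byte, error) {
	for _, name := range names {
		if b, err := read(name); err == nil {
			return b, nil
		}
	}
	return nil, errNoSnapshot
}

func main() {
	read := func(name string) ([]byte, error) {
		if name == "1.snap" {
			return []byte("ok"), nil
		}
		return nil, errBroken
	}
	b, err := loadNewest([]string{"3.snap", "2.snap", "1.snap"}, read)
	fmt.Println(string(b), err)
	_, err = loadNewest([]string{"3.snap", "2.snap"}, read)
	fmt.Println(err)
}
```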


@ -88,8 +88,8 @@ func TestFullEventQueue(t *testing.T) {
// Add
for i := 0; i < 1000; i++ {
e := newEvent(Create, "/foo", uint64(i), uint64(i))
eh.addEvent(e)
ce := newEvent(Create, "/foo", uint64(i), uint64(i))
eh.addEvent(ce)
e, err := eh.scan("/foo", true, uint64(i-1))
if i > 0 {
if e == nil || err != nil {


@ -51,10 +51,10 @@ func TestHeapUpdate(t *testing.T) {
// add from later expire time to earlier expire time
// the path is equal to ttl from now
for i, n := range kvs {
for i := range kvs {
path := fmt.Sprintf("%v", 10-i)
m := time.Duration(10 - i)
n = newKV(nil, path, path, 0, nil, "", time.Now().Add(time.Second*m))
n := newKV(nil, path, path, 0, nil, "", time.Now().Add(time.Second*m))
kvs[i] = n
h.push(n)
}


@ -369,10 +369,13 @@ func (n *node) Compare(prevValue string, prevIndex uint64) (ok bool, which int)
// If the node is a key-value pair, it will clone the pair.
func (n *node) Clone() *node {
if !n.IsDir() {
return newKV(n.store, n.Path, n.Value, n.CreatedIndex, n.Parent, n.ACL, n.ExpireTime)
newkv := newKV(n.store, n.Path, n.Value, n.CreatedIndex, n.Parent, n.ACL, n.ExpireTime)
newkv.ModifiedIndex = n.ModifiedIndex
return newkv
}
clone := newDir(n.store, n.Path, n.CreatedIndex, n.Parent, n.ACL, n.ExpireTime)
clone.ModifiedIndex = n.ModifiedIndex
for key, child := range n.Children {
clone.Children[key] = child.Clone()


@ -78,10 +78,24 @@ func newStats() *Stats {
}
func (s *Stats) clone() *Stats {
return &Stats{s.GetSuccess, s.GetFail, s.SetSuccess, s.SetFail,
s.DeleteSuccess, s.DeleteFail, s.UpdateSuccess, s.UpdateFail, s.CreateSuccess,
s.CreateFail, s.CompareAndSwapSuccess, s.CompareAndSwapFail,
s.CompareAndDeleteSuccess, s.CompareAndDeleteFail, s.Watchers, s.ExpireCount}
return &Stats{
GetSuccess: s.GetSuccess,
GetFail: s.GetFail,
SetSuccess: s.SetSuccess,
SetFail: s.SetFail,
DeleteSuccess: s.DeleteSuccess,
DeleteFail: s.DeleteFail,
UpdateSuccess: s.UpdateSuccess,
UpdateFail: s.UpdateFail,
CreateSuccess: s.CreateSuccess,
CreateFail: s.CreateFail,
CompareAndSwapSuccess: s.CompareAndSwapSuccess,
CompareAndSwapFail: s.CompareAndSwapFail,
CompareAndDeleteSuccess: s.CompareAndDeleteSuccess,
CompareAndDeleteFail: s.CompareAndDeleteFail,
ExpireCount: s.ExpireCount,
Watchers: s.Watchers,
}
}
func (s *Stats) toJson() []byte {


@ -25,6 +25,7 @@ import (
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/jonboulle/clockwork"
etcdErr "github.com/coreos/etcd/error"
"github.com/coreos/etcd/pkg/types"
)
// The default version to set when the store is first initialized.
@ -68,21 +69,27 @@ type store struct {
ttlKeyHeap *ttlKeyHeap // needs to be recovered manually
worldLock sync.RWMutex // stop the world lock
clock clockwork.Clock
readonlySet types.Set
}
func New() Store {
s := newStore()
// The given namespaces will be created as initial directories in the returned store.
func New(namespaces ...string) Store {
s := newStore(namespaces...)
s.clock = clockwork.NewRealClock()
return s
}
func newStore() *store {
func newStore(namespaces ...string) *store {
s := new(store)
s.CurrentVersion = defaultVersion
s.Root = newDir(s, "/", s.CurrentIndex, nil, "", Permanent)
for _, namespace := range namespaces {
s.Root.Add(newDir(s, namespace, s.CurrentIndex, s.Root, "", Permanent))
}
s.Stats = newStats()
s.WatcherHub = newWatchHub(1000)
s.ttlKeyHeap = newTtlKeyHeap()
s.readonlySet = types.NewUnsafeSet(append(namespaces, "/")...)
return s
}
@ -203,7 +210,7 @@ func (s *store) CompareAndSwap(nodePath string, prevValue string, prevIndex uint
nodePath = path.Clean(path.Join("/", nodePath))
// we do not allow the user to change "/"
if nodePath == "/" {
if s.readonlySet.Contains(nodePath) {
return nil, etcdErr.NewError(etcdErr.EcodeRootROnly, "/", s.CurrentIndex)
}
@ -258,7 +265,7 @@ func (s *store) Delete(nodePath string, dir, recursive bool) (*Event, error) {
nodePath = path.Clean(path.Join("/", nodePath))
// we do not allow the user to change "/"
if nodePath == "/" {
if s.readonlySet.Contains(nodePath) {
return nil, etcdErr.NewError(etcdErr.EcodeRootROnly, "/", s.CurrentIndex)
}
@ -401,7 +408,7 @@ func (s *store) Update(nodePath string, newValue string, expireTime time.Time) (
nodePath = path.Clean(path.Join("/", nodePath))
// we do not allow the user to change "/"
if nodePath == "/" {
if s.readonlySet.Contains(nodePath) {
return nil, etcdErr.NewError(etcdErr.EcodeRootROnly, "/", s.CurrentIndex)
}
@ -461,7 +468,7 @@ func (s *store) internalCreate(nodePath string, dir bool, value string, unique,
nodePath = path.Clean(path.Join("/", nodePath))
// we do not allow the user to change "/"
if nodePath == "/" {
if s.readonlySet.Contains(nodePath) {
return nil, etcdErr.NewError(etcdErr.EcodeRootROnly, "/", currIndex)
}
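
Note on readonlySet: the root check is generalized from a comparison with "/" to a set lookup, so the initial namespace directories are protected as well. A standalone sketch with a plain map follows. `readonlySet`, `newReadonlySet` and `contains` are hypothetical names rather than the `types.Set` API, and cleaning the namespaces on insertion is an extra step the original does not perform.

```go
package main

import (
	"fmt"
	"path"
)

// readonlySet holds the cleaned paths that must not be modified.
type readonlySet map[string]struct{}

// newReadonlySet always protects "/" plus the given namespaces.
func newReadonlySet(namespaces ...string) readonlySet {
	s := readonlySet{"/": {}}
	for _, ns := range namespaces {
		s[path.Clean(path.Join("/", ns))] = struct{}{}
	}
	return s
}

// contains normalizes the path the same way the store methods do
// before checking membership.
func (s readonlySet) contains(nodePath string) bool {
	_, ok := s[path.Clean(path.Join("/", nodePath))]
	return ok
}

func main() {
	s := newReadonlySet("/0", "/1")
	for _, p := range []string{"/", "/0", "1/", "/0/foo"} {
		fmt.Println(p, s.contains(p))
	}
}
```

Only the namespace directories themselves are read-only; keys beneath them, such as `/0/foo`, remain writable.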

Some files were not shown because too many files have changed in this diff.