Compare commits

432 Commits

Author SHA1 Message Date
e4561dd8cf *: bump to v2.2.0 2015-09-10 10:02:45 -07:00
6e7725cd51 Merge pull request #3478 from endocode/kayrus/typo_fix
doc: member id typo fixed
2015-09-10 00:11:26 -07:00
37392ad223 doc: member id typo fixed 2015-09-10 08:47:45 +02:00
9b032c6a00 Merge pull request #3473 from MrLawes/master
doc: fix bad url in using a directory TTL section
2015-09-09 18:57:09 -07:00
1c058e9706 doc: fix bad url in using a directory TTL section 2015-09-10 09:23:10 +08:00
f3085d2ea4 Merge pull request #3459 from yichengq/release-doc
docs/dev: add release doc
2015-09-09 17:46:10 -07:00
b70e6fc677 docs/dev: add release doc
It documents the standard way to release etcd today. Maintainers should
follow this doc to cut a release, and update it promptly to reflect the
current process.
2015-09-09 16:42:31 -07:00
c34cf04c27 Merge pull request #3448 from yichengq/release-script
scripts: add release.sh
2015-09-09 13:54:15 -07:00
bdd8774169 Merge pull request #3204 from endocode/kayrus/recovery
Improved "disaster restore" doc, added "member update" command descri…
2015-09-09 12:23:51 -07:00
19ad634673 doc: improved "disaster restore" doc, added "member update" command description 2015-09-09 20:07:31 +02:00
7d4cd7c76a scripts: add release.sh
It could build all binaries and images for the given version.
2015-09-09 09:50:41 -07:00
af0474f2e3 Merge pull request #3465 from raoofm/patch-1
etcdmain: Proxy doesn't specify - listening on http or https
2015-09-08 14:38:55 -07:00
2de1c36061 etcdmain: Proxy doesn't specify - listening on http or https

Fixes #3464
2015-09-08 17:19:23 -04:00
ccdd10c757 Merge pull request #3463 from yichengq/update-roadmap
roadmap: remove 2.2 section
2015-09-08 13:55:50 -07:00
c837f0526f roadmap: remove 2.2 section
We have finished all of them.
2015-09-08 13:43:39 -07:00
d8e6e217fd Merge pull request #3461 from xiang90/doc
doc: remove one limitation in upgrade doc
2015-09-08 13:29:43 -07:00
3689ea3071 doc: remove one limitation in upgrade doc 2015-09-08 13:28:23 -07:00
a44da0b62a Merge pull request #3451 from raoofm/patch-1
discovery: log error only if both ssl and non-ssl srv lookups fail
2015-09-06 20:54:43 -07:00
9a2809f0b5 discovery: log error only if both ssl and non-ssl srv lookups fail
Earlier we were logging as soon as one of the lookups failed.

Fixes #3414
2015-09-06 23:44:19 -04:00
184337568d scripts/build-docker: build docker in image-docker dir
The docker build command will use whatever directory contains the
Dockerfile as the build context (including all of its subdirectories).
And the <src> path of ADD must be inside the context of the build.
So change it to build in a specific directory, which keeps the build clean and fast.
2015-09-06 00:17:41 -07:00
15d1db9bf8 scripts/build-aci: support BINARYDIR and BUILDDIR
This makes it more configurable, and is ready for overall release script.
2015-09-06 00:17:41 -07:00
6b70fa72fe scripts: build-release -> build-binary
This makes the purpose of the script clearer, and always uses
bash to run the script because it relies on bash-specific syntax.
2015-09-06 00:16:51 -07:00
cf6cb82caa scripts/build-docker: stop creating scratch image
Scratch image has become docker's reserved image.
2015-09-06 00:16:08 -07:00
a1b01c266a scripts/build-aci: fix the way to check executability
Otherwise it may treat a runnable command as non-executable.
2015-09-06 00:15:31 -07:00
b9646b5734 Merge pull request #3447 from xiang90/txn
etcdctlv3: fix txn command
2015-09-05 18:21:11 -07:00
1532f7585b etcdctlv3: fix txn command 2015-09-05 16:08:15 -07:00
dab0871acb Merge pull request #3446 from xiang90/v3
etcdserver: refactor v3demo do
2015-09-05 15:41:00 -07:00
95d5556445 etcdserver: refactor v3demo do 2015-09-05 15:31:28 -07:00
d5ab71a4e8 Merge pull request #3445 from xiang90/api_doc
doc: add monitoring section to admin doc
2015-09-05 08:27:11 -07:00
13b3c64c10 doc: add monitoring section to admin doc 2015-09-05 08:25:35 -07:00
51d0630a8e Merge pull request #3440 from yichengq/memory-bench
docs/benchmark: add 2.2.0-rc memory usage benchmark
2015-09-04 20:23:56 -07:00
91b5b247e9 docs/benchmark: add 2.2.0-rc memory usage benchmark
It records the memory usage for different average value sizes, and
records the data size limitation.
2015-09-04 18:27:49 -07:00
106d918dd5 Merge pull request #3444 from xiang90/doc
etcdctl: suggest endpoint over peer
2015-09-04 13:22:03 -07:00
322aab133d etcdctl: suggest endpoint over peer 2015-09-04 13:16:33 -07:00
9fa05ad8a0 Merge pull request #3443 from xiang90/test
test: now raft has no shadow issue
2015-09-04 11:31:44 -07:00
39580479b5 Merge pull request #3442 from xiang90/b
etcdctl: prepare for health endpoint change
2015-09-04 11:30:44 -07:00
a6e67a6dec test: now raft has no shadow issue
We can test raft pkg now!
2015-09-04 10:52:14 -07:00
778f8d8fea Merge pull request #3434 from xiang90/index_revision
*: v3api index->revision
2015-09-04 10:48:59 -07:00
3f18ded10a *: v3api index->revision 2015-09-04 10:41:20 -07:00
5a5f15de39 Merge pull request #3438 from yichengq/storage-test
storage: add mock tests for store struct
2015-09-04 10:26:08 -07:00
04539c6240 etcdctl: prepare for health endpoint change
We made a mistake on the health endpoint by returning a string "true".
We have to make etcdctl work with the next version of etcd, which
will correct the mistake on the server side.

It is too late to change the server side right now since we already
released a version of etcdctl that only understands "true".
2015-09-04 10:20:24 -07:00
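A client that has to work with both the current and the corrected server could accept either form of the health value. The sketch below assumes the response is a JSON object with a `health` field holding either the string "true" or a JSON boolean; the `parseHealth` helper and the boolean form of the corrected response are illustrative assumptions, not etcdctl's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseHealth reports whether a health response body indicates a healthy
// member. It accepts both {"health": "true"} and {"health": true}.
// The helper name and the boolean variant are assumptions for illustration.
func parseHealth(body []byte) (bool, error) {
	var resp struct {
		Health interface{} `json:"health"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	switch v := resp.Health.(type) {
	case string:
		return v == "true", nil
	case bool:
		return v, nil
	default:
		return false, fmt.Errorf("unexpected health value: %v", resp.Health)
	}
}

func main() {
	for _, body := range []string{`{"health": "true"}`, `{"health": true}`} {
		ok, err := parseHealth([]byte(body))
		fmt.Println(ok, err)
	}
}
```

Decoding into `interface{}` keeps the client tolerant of either server version without a second request.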
215f27c2f5 storage: add mock tests for store struct 2015-09-04 08:53:49 -07:00
8ca76a789b Merge pull request #3439 from akolb1/godep_all_fixes
Godep: fixed missing dependencies
2015-09-03 22:20:48 -07:00
2782418923 Godep: fixed missing dependencies 2015-09-04 04:51:44 +00:00
5ae2eb4731 storage: avoid one extra round of wait
It could exit early if it knows that there are no more keys.
2015-09-03 19:12:27 -07:00
9175df7c71 storage: correct revision for range when deleteRange
to make it logically reasonable.
2015-09-03 19:12:27 -07:00
797a4796d9 storage: remove check for DELETE type KeyValue
kvindex always returns kvs that exist at given revision, so there is no
need to check for whether the KeyValue range from backend is DELETE type.
2015-09-03 19:12:27 -07:00
00e31f13a6 storage: remove unnecessary rev parameter 2015-09-03 19:12:27 -07:00
2f2b084ab5 Merge pull request #3436 from xiang90/remove_consistent_token
*: replace consistent token with revision in v3 api
2015-09-03 17:16:07 -07:00
254d641ff9 Merge pull request #3429 from xiang90/upgrade_doc
doc: add upgrade to 2.2 doc
2015-09-03 15:47:10 -07:00
2ac9af4924 *: replace consistent token with revision in v3 api 2015-09-03 15:41:33 -07:00
243fe519a9 Merge pull request #3435 from xiang90/gogoproto
*: update gogoproto
2015-09-03 15:35:48 -07:00
ef7cf058a2 *: update gogoproto 2015-09-03 15:32:25 -07:00
356aba7595 doc: add upgrade to 2.2 doc 2015-09-03 11:48:30 -07:00
ae2b43b588 Merge pull request #3433 from tamird/proto-import-path
*: regenerate proto to use local import path
2015-09-03 10:52:37 -07:00
45390b9fb8 *: regenerate proto to use local import path
Using Go-style import paths in protos is not idiomatic. Normally, this
detail would be internal to etcd, but the path from which gogoproto
is imported affects downstream consumers (e.g. cockroachdb).

In cockroach, we want to avoid including `$GOPATH/src` in our protoc
include path for various reasons. This patch puts etcd on the same
convention, which allows this for cockroach.

More information: https://github.com/cockroachdb/cockroach/pull/2339#discussion_r38663417

This commit also regenerates all the protos, which seem to have
drifted a tiny bit.
2015-09-03 13:38:28 -04:00
84d1527df6 Merge pull request #3432 from coreos/robszumski-patch-1
docs: insert whitespace
2015-09-03 09:56:57 -07:00
49e7e6eb9f docs: insert whitespace
Fixes the rendering of this page on https://coreos.com/etcd/docs/2.1.0/proxy.html
2015-09-03 09:50:07 -07:00
1eaf169057 Merge pull request #3395 from yichengq/backend-test
storage/backend: add unit tests for backend and batchTx
2015-09-03 07:23:38 -07:00
44fd734038 storage/backend: add unit tests for backend and batchTx 2015-09-02 16:57:13 -07:00
16e9e4b3d5 Merge pull request #3412 from yichengq/etcdctl-sync
etcdctl: better logging for sync process
2015-09-02 16:49:00 -07:00
8e040efed9 etcdctl: log more about sync process
Users may not even know that etcdctl is syncing, or that it failed during
the sync process. So we add more logs for the sync process.
2015-09-02 16:10:25 -07:00
3a8db488ca Merge pull request #3415 from yichengq/better-err
etcdctl/command: print more details about ErrNoEndpoint
2015-09-02 10:11:45 -07:00
41cc16481f Merge pull request #3418 from AdoHe/fix_build_script_error
build: fix build error on ubuntu
2015-09-01 22:44:23 -07:00
9665cda7c1 build: fix build error on ubuntu 2015-09-02 13:28:55 +08:00
484a115813 Merge pull request #3424 from akolb1/bolt_solaris1
Godeps: boltdb dependency missing solaris support
2015-09-01 16:19:23 -07:00
ecbc44fb63 Godeps: boltdb dependency missing solaris support 2015-09-01 23:17:36 +00:00
423e3bbbd8 etcdctl/cluster_health: provide better message for empty client urls
It skips sync when initializing the client, and prints an unreachable message and
points to note when checking the health of etcd members one by one.
2015-09-01 14:42:19 -07:00
aa0c8fea55 Merge pull request #3321 from yichengq/doc-tls-setup
docs/security: link cfssl example
2015-09-01 14:28:40 -07:00
6caae58814 docs/security: recommend cfssl instead of etcd-ca
This provides a more general and stable way for users to set up a TLS cluster.
2015-09-01 14:07:26 -07:00
d412eaa3a2 Merge pull request #3308 from yichengq/go-codec
Use ugorji codec for unmarshalling key responses in client
2015-09-01 14:04:38 -07:00
53b8175d3f Merge pull request #3421 from xiang90/3411
etcdmain: proxy does not need to belong to the discovered cluster
2015-09-01 13:49:31 -07:00
7957677cf2 etcdmain: proxy does not need to belong to the discovered cluster 2015-09-01 11:24:02 -07:00
a94118893c Merge pull request #3413 from xiang90/snapshot_dir
*: support wal dir
2015-09-01 10:03:50 -07:00
d94e712d91 *: support wal dir 2015-09-01 09:54:27 -07:00
85b6c51a23 Merge pull request #3420 from yichengq/wait-more
storage: extend timeout to wait for put complete
2015-09-01 09:25:46 -07:00
a21166c3aa storage: extend timeout to wait for put complete
travis is sometimes slow, and it could fail to complete the put in 10ms.
2015-09-01 09:03:03 -07:00
8ac981e1ee Merge pull request #3416 from yichengq/get-cluster-timeout
etcdserver: add timeout param on getClusterFromRemotePeers
2015-09-01 09:00:19 -07:00
f3bfcb9dee etcdserver: add timeout param on getClusterFromRemotePeers
It sets 10s timeout for public GetClusterFromRemotePeers.

This helps the following cases work well in high-latency scenarios:

1. proxy sync members from the cluster
2. newly-joined member sync members from the cluster

Besides 10s request timeout, the request is also controlled by dial
timeout and read connection timeout.
2015-09-01 08:49:01 -07:00
1fabc48968 Merge pull request #3404 from bdarnell/multinode-propose-panic
raft: A removed node can no longer be leader.
2015-08-31 20:06:34 -07:00
4f20e01f60 raft: Ignore proposals if not a current member.
Fixes another panic in MultiNode.Propose.
2015-08-31 20:31:14 -04:00
c2caa4ae3b etcdctl/command: print more details about ErrNoEndpoint
This commit prints more details when ErrNoEndpoint is returned while syncing
with the cluster. This helps users understand what happened.
2015-08-31 16:28:43 -07:00
4b9b0cbcc1 storage: add newBackend and newBatchTx
This is for ease of testing.
2015-08-31 13:25:10 -07:00
57b39aca4e Merge pull request #3403 from xiang90/doc
doc: add 0.4.9 to 2.2 migration guide
2015-08-31 11:28:25 -07:00
3c1f80bdff Merge pull request #3401 from xiang90/more_metrics
more on storage metrics
2015-08-31 09:55:29 -07:00
406bb6749e doc: add 0.4.9 to 2.2 migration guide 2015-08-31 09:55:12 -07:00
bc71aab07a Merge pull request #3409 from xiang90/fix_force_new
etcdserver: ignore confChangeUpdateNode in getIDs
2015-08-31 09:44:10 -07:00
1bcaa9f4a1 etcdserver: ignore confChangeUpdateNode in getIDs 2015-08-31 09:36:39 -07:00
aaa7dfc14d Merge pull request #3407 from MSamman/fix-build-warning
build: fixed build warning
2015-08-31 07:47:23 -07:00
dd4317db43 build: fixed build warning
to clear warning and ensure git sha linkage works in the future

Fixes #3406
2015-08-30 15:05:56 -07:00
b9632e0f8d storage: register txnCounter 2015-08-28 15:17:16 -07:00
dd443be41b storage: report total number of keys 2015-08-28 15:16:53 -07:00
d2cb732c7b test: activate test on storage/backend 2015-08-28 13:52:31 -07:00
054fab84ee storage/backend: remove startc var
This makes start logic cleaner.
2015-08-28 13:52:31 -07:00
fca98c9071 Merge pull request #3398 from xiang90/storage_metrics
storage: add initial metrics for kv
2015-08-28 13:50:44 -07:00
b5838edb93 storage: add initial metrics for kv 2015-08-28 13:41:42 -07:00
6cbaaa715c Merge pull request #3396 from bdarnell/multinode-propose-panic
raft: Fix a nil-pointer panic in MultiNode.Propose.
2015-08-28 12:34:49 -07:00
cba7c6a180 *: bump to v2.2.0-rc.0+git 2015-08-28 10:26:56 -07:00
dc3e027288 *: bump to v2.2.0-rc.0 2015-08-28 10:26:32 -07:00
b40e077047 Merge pull request #3388 from sckott/docfix-tuning
fix docs, change tuning link in api.md from section to file
2015-08-28 09:23:58 -07:00
05924b330a raft: Fix a nil-pointer panic in MultiNode.Propose. 2015-08-28 11:17:59 +02:00
f04884f74d storage/backend: fix off-by-one error for pending var
Otherwise it may not commit until batchLimit + 1.
2015-08-27 22:51:32 -07:00
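One common shape of this kind of off-by-one is sketched below. The names `pending` and `batchLimit` come from the commit message; the `batcher` type and its methods are hypothetical stand-ins, not the actual storage/backend code.

```go
package main

import "fmt"

// batcher is a hypothetical stand-in for a batching transaction.
type batcher struct {
	batchLimit int // commit once this many writes are pending
	pending    int // writes recorded since the last commit
	commits    int // number of commits performed
}

// put records one pending write and commits once pending reaches batchLimit.
// Comparing with `>` instead of `>=` would delay the commit until
// batchLimit + 1 writes have accumulated.
func (b *batcher) put() {
	b.pending++
	if b.pending >= b.batchLimit {
		b.commits++
		b.pending = 0
	}
}

func main() {
	b := &batcher{batchLimit: 3}
	for i := 0; i < 3; i++ {
		b.put()
	}
	fmt.Println("commits:", b.commits, "pending:", b.pending)
}
```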
7ed929fb3d storage/backend: fix limit not taking effect in range 2015-08-27 22:51:32 -07:00
37d9354aa2 Merge pull request #3394 from yichengq/bench-2.2
adjust file and README in docs/benchmark
2015-08-27 21:09:39 -07:00
9d78d84270 Merge pull request #3390 from xiang90/ctl_peer
etcdctl: suggest endpoint over peers flag
2015-08-27 21:03:39 -07:00
8d8033df55 etcdctl: suggest endpoint over peers flag 2015-08-27 18:52:17 -07:00
753a079700 docs/benchmark: add benchmark result links in README 2015-08-27 17:08:49 -07:00
425afa66ea docs/benchmarks: update bench version for more accuracy 2015-08-27 17:08:30 -07:00
f68e4a1a5d Merge pull request #3392 from yichengq/bench-2.2
docs/benchmark: update etcd 2.2 bench
2015-08-27 16:58:04 -07:00
605f0ce730 docs/benchmark: update etcd 2.2 bench
This benchmark is for etcd 2.2 rc after fixing several performance
regression bugs.
2015-08-27 16:52:55 -07:00
b0192118dd doc: change tuning link in api.md from section to file 2015-08-27 15:04:07 -07:00
1124a06860 Merge pull request #3387 from yichengq/fix-quorum
doc: correct calculation of fault tolerance of an etcd cluster in adm…
2015-08-27 14:48:39 -07:00
bc2b8856d7 doc: correct calculation of fault tolerance of an etcd cluster in admin_guide.md
2015-08-27 14:30:12 -07:00
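For reference, the usual majority-quorum arithmetic behind fault tolerance can be sketched as follows. The function names are illustrative; the formulas assume a cluster of n voting members that needs a strict majority to make progress.

```go
package main

import "fmt"

// quorum returns the number of members that must agree in a cluster of n
// members: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail while a majority of the
// n-member cluster is still available.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("size=%d quorum=%d tolerance=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

Under these formulas an even-sized cluster tolerates no more failures than the odd-sized cluster one member smaller.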
df83af944b Merge pull request #3384 from yichengq/fix-shadow
test: use go vet shadow feature instead of go-nyet
2015-08-27 14:27:57 -07:00
92cd24d5bd *: fix govet shadow check failure 2015-08-27 14:15:30 -07:00
b2d33e6dcb Merge pull request #3382 from xiang90/env
pkg/flags: print out env usage information
2015-08-27 13:36:55 -07:00
ccdb850e1e test: use go vet shadow feature instead of go-nyet
Use official support instead of home-made one.
2015-08-27 13:29:12 -07:00
4ac4648b5b Merge pull request #3383 from cognusion/fixes2
Test Fixes: Take 2
2015-08-27 13:22:19 -07:00
327632014e cors: Removed new(?) header from test, resolving failure
"X-Content-Type-Options" was being autoadded, but none of the
test maps took it into account. I saw that "Content-Type" was
also being deleted, so I figured that was the best solution
for this as well.
2015-08-27 15:23:14 -04:00
19a28c8efd storage: Fixed backend test
./backend_test.go:23: multiple-value batchTx.UnsafeRange() in single-value context
2015-08-27 15:20:29 -04:00
32372e1d70 raft: Fixed a test misassumption
network_test.go:56: total = 59.22354ms, want > 50ms
59 is > 50, but the equation added 10 to the right side
2015-08-27 15:15:34 -04:00
c8f5e03b75 pkg/flags: print out env usage information 2015-08-27 12:08:31 -07:00
25c87f13fd Merge pull request #3354 from mx2323/faq
add faq documentation
2015-08-26 16:36:04 -07:00
8f3ea5ebed doc: add faq documentation 2015-08-26 16:34:52 -07:00
59a5a7e309 Merge pull request #3368 from yichengq/storage-test
add unit tests for storage
2015-08-26 15:32:02 -07:00
0d38c13990 storage: use temp path to handle test file 2015-08-26 15:01:41 -07:00
2d01eb4e11 storage: add tests for kvstore_compaction 2015-08-26 15:01:13 -07:00
f38778160d Merge pull request #3376 from yichengq/connection-down
etcdserver: specify request timeout error due to connection down
2015-08-26 13:09:30 -07:00
0813139140 storage: add more tests for index 2015-08-26 12:53:30 -07:00
3723f01b48 storage: add more unit tests for keyIndex 2015-08-26 12:53:30 -07:00
ad8a291dc1 storage: return error when tombstone on new generation
It is not allowed to put tombstone on an empty generation.
2015-08-26 12:53:30 -07:00
ffa87f9678 storage: fix the comment in generation.walk 2015-08-26 12:53:30 -07:00
8f6bf029f8 etcdserver: specify request timeout error due to connection lost
It specifies that a request timeout error was possibly caused by a lost
connection, and prints a clearer log message for the user.

It handles two cases:
1. the leader cannot connect to a majority of the cluster.
2. the connection between follower and leader is down for a while,
and it loses proposals.

log format:
```
20:04:19 etcd3 | 2015-08-25 20:04:19.368126 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
20:04:19 etcd3 | 2015-08-25 20:04:19.368227 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
```
2015-08-26 12:38:37 -07:00
76db9747f8 Merge pull request #3377 from yichengq/tls-info-string
pkg/transport: print ClientCertAuth in TLSInfo.String()
2015-08-25 22:45:10 -07:00
45bb88069b Merge pull request #3378 from yichengq/set-late
etcdmain: check error before assigning peer transport
2015-08-25 22:38:36 -07:00
58455a2ae4 etcdmain: check error before assigning peer transport
Or it may panic when new transport fails, e.g., TLS info is invalid.
2015-08-25 22:04:26 -07:00
57e88465bf pkg/transport: print ClientCertAuth in TLSInfo.String()
It is good to print it in debug output:

```
21:56:12 etcd1 | 2015-08-25 21:56:12.162406 I | etcdmain: peerTLS: cert
= certs/etcd1.pem, key = certs/etcd1-key.pem, ca = , trusted-ca =
certs/ca.pem, client-cert-auth = true
```
2015-08-25 21:53:52 -07:00
6250fed8a8 Merge pull request #3096 from philips/tls-info-debug
pkg/transport: include debug output for trusted-ca
2015-08-25 20:08:19 -07:00
008f988f6b Merge pull request #3375 from xiang90/doc
doc: add env variable name to configuration.md
2015-08-25 14:48:35 -07:00
2b58da1699 Merge pull request #3374 from yichengq/gomaxprocs
etcdmain: change default GOMAXPROCS when compiling in go1.5
2015-08-25 14:48:00 -07:00
35a0459cc8 doc: add env variable name to configuration.md 2015-08-25 14:35:15 -07:00
32ab3f6931 Merge pull request #3372 from xiang90/doc
improve clustering.md doc
2015-08-25 14:04:30 -07:00
c30c85898e doc: add explanation for client urls 2015-08-25 13:46:27 -07:00
2ac9a329ab etcdmain: stop setting GOMAXPROCS explicitly
We always want GOMAXPROCS to be parsed the way go parses it. With go 1.4, we
wanted to expose the GOMAXPROCS value, so we set GOMAXPROCS explicitly the
way go 1.4 does and printed it out.

But this becomes a problem because go 1.5 changes the way GOMAXPROCS is set.

Fix the problem by no longer setting GOMAXPROCS and reading its value directly.

Due to this change, the default GOMAXPROCS is the number of CPUs available
when compiling with go 1.5, which matches how go 1.5 works:
https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit

This is a behavior change in etcd 2.2.
2015-08-25 13:38:16 -07:00
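Reading the value without setting it can be done with `runtime.GOMAXPROCS(0)`, which returns the current setting and leaves it unchanged. The sketch below shows that pattern; the `currentMaxProcs` helper name is illustrative, not etcd's code.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentMaxProcs reports the current GOMAXPROCS setting.
// Passing 0 queries the value without changing it.
func currentMaxProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", currentMaxProcs(), runtime.NumCPU())
}
```

The value is then whatever the Go runtime chose, including any `GOMAXPROCS` environment variable it honored at startup.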
a4285ef5c9 Merge pull request #3367 from MSamman/master
etcdserver: handle malformed basic auth
2015-08-25 13:12:48 -07:00
e2e002f94e etcdserver: handle malformed basic auth
return insufficient credentials if basic auth header is malformed

Fixes #3280
2015-08-25 12:37:24 -07:00
7bd558b2e0 Merge pull request #3373 from ecnahc515/add_report_bugs_contributing
Contributing: Link to reporting bugs doc
2015-08-25 12:17:06 -07:00
ad843341a9 Contributing: Link to reporting bugs doc 2015-08-25 12:15:03 -07:00
f56c5455f3 doc: mention reconfiguration design in clustering.md 2015-08-25 11:22:08 -07:00
986f354694 Merge pull request #3371 from xiang90/bolt
Godeps: update bolt dependency
2015-08-25 11:17:14 -07:00
e8f40b0412 storage/backend: add commitAndStop
After the upgrade of boltdb, db.Close waits for all txns to finish.
CommitAndStop commits the current txn and stops creating new ones.
2015-08-25 10:57:25 -07:00
8738a88fae Godeps: update bolt dependency 2015-08-25 10:39:29 -07:00
2d06f6b371 Merge pull request #3362 from yichengq/rafthttp-cancel
rafthttp: always cancel in-flight request when stop streamReader
2015-08-25 09:26:46 -07:00
61a75b3d48 rafthttp: always cancel in-flight request when pipeline.send
This fits the way for go1.5 to cancel request.
2015-08-25 09:07:49 -07:00
27b9963959 client: always cancel in-flight request when do request
This fits the way for go1.5 to cancel request.
2015-08-25 09:04:58 -07:00
ece39c9462 proxy: always cancel in-flight request
This fits the way for go1.5 to cancel request.
2015-08-25 08:59:59 -07:00
6fc638673c rafthttp: return err if stopped before setting cancel in dial()
The original workflow may fail to cancel if stop() cancels the finished
request just before dial() assigns a new cancel. This commit checks
streamReader status before setting cancel to avoid this problem.

It was tested on travis 300 times. go 1.5 always worked, while
go 1.4 failed to stop once.
2015-08-25 08:59:12 -07:00
fc95ec0cc6 rafthttp: always cancel in-flight request when stop streamReader
This problem is totally fixed at 1.5.

go1.5 adds a Request.Cancel channel, which allows for "race free"
cancellation
(8b4278ffb7).
Our implementation relies on it to always cancel in-flight request.
2015-08-25 08:54:13 -07:00
0132b091d2 Merge pull request #3360 from yichengq/bench-3
*: add initial read benchmark for etcd v3
2015-08-25 07:58:30 -07:00
3632a1b9b1 *: add initial read benchmark for etcd v3
It includes the initial read benchmark for etcd v3.

This is the first step to give some rough thoughts. I haven't dug
deeper to answer some questions, including why its performance is not
better than HTTP + json, and why one put causes a performance drop.
2015-08-25 07:50:18 -07:00
e3ef1d363a Merge pull request #3366 from xiang90/v3_proto
update v3 proto and doc
2015-08-24 11:22:29 -07:00
0cb45aee64 rfc: update v3 proto 2015-08-24 11:00:51 -07:00
1cccbb5ebd etcdserverpb: add comments for compaction 2015-08-24 10:52:54 -07:00
3a60d490d1 storagepb: fix comment location 2015-08-24 10:42:16 -07:00
4a5b94478e etcdserverpb: update comment for txn request 2015-08-24 10:40:05 -07:00
98ceb3cdbd etcdserverpb: add more field into rangeResponse 2015-08-24 10:33:20 -07:00
c7f10ed975 Merge pull request #3361 from yichengq/no-log
integration: only print critical log
2015-08-24 09:44:13 -07:00
3702be476b integration: only print critical log
This limits the logs printed out in integration tests, so the output is not
flooded and fatal logs are easier to read in travis.
2015-08-23 21:22:21 -07:00
514c4371a9 Merge pull request #3359 from yichengq/storage-test
functional tests for storage package and some related fixes
2015-08-23 21:12:36 -07:00
1e2b0acf6d test: activate test for storage package 2015-08-23 20:59:06 -07:00
9c0c314425 storage: add functional tests for the package
It adds and reorganizes tests to construct functional tests.
2015-08-23 20:59:06 -07:00
9960651c3f storage: let range work in the process of txn
range should work in the process of txn to help check the status during the
txn.
2015-08-23 20:59:06 -07:00
6d97dcaf3f storage: ensure that desired compaction is persisted
It needs to persist the desired compaction, so it won't forget the compaction
if it crashes later.
2015-08-23 20:59:06 -07:00
353f10ca2b storage: reject to compact on future rev
Compaction on future rev is unreasonable.
2015-08-23 20:59:06 -07:00
47b243be5d storage: let TxnDeleteRange return rev if no error
If it doesn't return error, it should return valid rev.
2015-08-23 20:59:06 -07:00
62f7481b19 storage: keyIndex.get returns err when key is tombstoned
Before this commit, it would return a wrong create index and mod index.

It lets findGeneration return error when rev is at the gap of two
generations. This leads to the change of compact() code.
2015-08-23 20:59:02 -07:00
3b2fa9f1de storage: fix TestKeyIndexCompact
It failed to pass before.
2015-08-23 17:22:49 -07:00
97b211c8ba Merge pull request #3357 from ccding/master
go vet
2015-08-22 10:29:29 -07:00
c09b667d57 *: fix go vet reported issues 2015-08-22 12:19:02 -05:00
044b23c3ca Merge pull request #3356 from xiang90/travis
*: test gofmt with -s and fix reported issues
2015-08-21 18:59:51 -07:00
6b23a8131f *: test gofmt with -s and fix reported issues 2015-08-21 18:52:16 -07:00
301b7f57c0 Merge pull request #3355 from yichengq/health-var
etcdctl/cluster_health: set health var when checked healthy
2015-08-21 15:37:15 -07:00
224755855d etcdctl/cluster_health: set health var when checked healthy
This was a typo.
2015-08-21 15:27:35 -07:00
84b614c508 Merge pull request #3342 from xiang90/travis
travis: test for go 1.5 build
2015-08-21 14:49:00 -07:00
1dcc145aef client: fix test 2015-08-21 14:36:29 -07:00
8c0610d4f5 Merge pull request #3352 from yichengq/fix-name-url
fix that etcd fails to start if using both IP and hostname when discovery srv
2015-08-21 12:38:38 -07:00
3c1e6b54b3 pkg/netutil: stop resolving in place
It helps to copy out a and b, and not modify the original a and b.
2015-08-21 12:09:17 -07:00
1c334979cd pkg/netutil: not introduce empty url when converting
It should not make slices with length and append elements at the same
time.
2015-08-21 12:08:17 -07:00
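The pitfall described here is easy to reproduce in isolation. In the sketch below, `convertBuggy` and `convert` are hypothetical helpers (not the pkg/netutil code) that copy a slice of URL strings; the buggy version allocates with a length and then appends, so the result starts with empty entries.

```go
package main

import "fmt"

// convertBuggy allocates len(in) zero-valued elements and then appends,
// so the result has len(in) empty strings followed by the real values.
func convertBuggy(in []string) []string {
	out := make([]string, len(in))
	for _, s := range in {
		out = append(out, s)
	}
	return out
}

// convert allocates zero length with capacity len(in), so append fills the
// slice from the start and no empty entries are introduced.
func convert(in []string) []string {
	out := make([]string, 0, len(in))
	for _, s := range in {
		out = append(out, s)
	}
	return out
}

func main() {
	urls := []string{"http://10.0.0.1:2380", "http://10.0.0.2:2380"}
	fmt.Printf("%q\n", convertBuggy(urls))
	fmt.Printf("%q\n", convert(urls))
}
```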
7b871aab41 pkg/netutil: not export resolve and urlsEqual functions
They are only used in this package, so there is no need to export them.
2015-08-21 11:58:37 -07:00
b1192e5c48 pkg/netutil: fix false negative comparison
Sort the resolved URLs before DeepEqual, so it will not compare URLs
that may be out of order due to resolution.
2015-08-21 10:15:08 -07:00
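The idea of sorting before an order-sensitive comparison can be sketched as follows. `sameURLs` is a hypothetical helper operating on plain strings rather than the parsed URLs the package works with; it sorts copies so the caller's slices are not reordered.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// sameURLs reports whether a and b contain the same strings regardless of
// order. It sorts copies, so the inputs are left untouched.
func sameURLs(a, b []string) bool {
	ac := append([]string(nil), a...)
	bc := append([]string(nil), b...)
	sort.Strings(ac)
	sort.Strings(bc)
	return reflect.DeepEqual(ac, bc)
}

func main() {
	a := []string{"http://10.0.0.2:2380", "http://10.0.0.1:2380"}
	b := []string{"http://10.0.0.1:2380", "http://10.0.0.2:2380"}
	fmt.Println(sameURLs(a, b))
}
```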
72462a72fb etcdserver: remove TODO to delete URLStringsEqual
Discovery SRV supports comparing IP addresses with domain names,
so we need the URLStringsEqual function.
2015-08-21 09:52:17 -07:00
8ea3d157c5 Revert "Revert "Treat URLs have same IP address as same""
This reverts commit 3153e635d5.

Conflicts:
	etcdserver/config.go
2015-08-21 09:41:13 -07:00
07af0b3e5b Merge pull request #3346 from xiang90/auth_skip
etcdserver/auth: cache auth enable result
2015-08-20 23:32:29 -07:00
11a689d063 etcdserver/auth: cache auth enable result 2015-08-20 23:05:00 -07:00
e8e507b29b Merge pull request #3348 from xiang90/l
use limited listener from golang
2015-08-20 22:44:51 -07:00
ff37cc455c pkg/transport: remove home-grown limitedListener 2015-08-20 20:03:27 -07:00
92634356c1 *: use limitedListener from golang 2015-08-20 20:02:35 -07:00
da9a12b97c Merge pull request #3344 from xiang90/startup_version
etcdmain: print out version information on startup
2015-08-20 15:10:25 -07:00
6b77c146ec etcdmain: print out version information on startup 2015-08-20 14:50:16 -07:00
31395d257c travis: test for go 1.5 build 2015-08-20 11:39:41 -07:00
7cf9770e12 Merge pull request #3340 from xiang90/fix_perallocate
pkg/fileutil: treat not support error as nil error in preallocate
2015-08-20 11:38:03 -07:00
3ca5482251 pkg/fileutil: treat not support error as nil error in preallocate 2015-08-20 11:15:02 -07:00
4a6d6b0052 Merge pull request #3338 from spacejam/master
Reversion->Revision
2015-08-20 10:16:31 -07:00
acd7a92f03 storage: reversion -> revision 2015-08-20 08:39:07 -07:00
e1dfcec0ab Merge pull request #3327 from yichengq/bench-2.2
docs/benchmarks: add benchmark result for 2.2
2015-08-20 00:18:32 -07:00
807de81172 docs/benchmarks: add benchmark result for 2.2
And it analyzes the reason for performance changes.
2015-08-19 23:59:33 -07:00
795e962403 Merge pull request #3334 from mitake/snap-marsharing-prometheus
snap: export durations of marshalling cost during snapshot save
2015-08-19 20:59:04 -07:00
7a6d33620f snap: export durations of marshalling cost during snapshot save
Currently, the total duration of snapshot saving is exported for
prometheus. For more detailed analysis, this commit lets etcd export
durations of marshalling for prometheus.
2015-08-20 12:47:07 +09:00
46a2ae77a1 hack/benchmark: add script for benchmark
This is for etcd benchmark.
2015-08-19 20:37:27 -07:00
b0303e948c Merge pull request #3323 from xiang90/cl_health
etcdctl: use health endpoint to greatly simplify health checking
2015-08-19 17:15:52 -07:00
568d1c6783 etcdctl: use health endpoint to greatly simplify health checking 2015-08-19 11:47:08 -07:00
60387dc408 Merge pull request #3320 from yichengq/doc-rtt
docs: document how to set heartbeat interval and election timeout
2015-08-19 11:08:05 -07:00
28b61acd9e Merge pull request #3324 from xiang90/raft_logging
raft: downgrade the logging around snapshot to debugf
2015-08-18 17:18:08 -07:00
d01b6cd639 Merge pull request #3326 from elimisteve/master
client: fixed typo in WatcherOptions docs
2015-08-18 16:49:43 -07:00
952827157a client: fixed typo in WatcherOptions docs
specifices -> specifies
2015-08-18 16:43:09 -07:00
b3d2a621ab Merge pull request #3325 from elimisteve/master
client: spelling error in docs (occured -> occurred)
2015-08-18 16:35:13 -07:00
69fc796926 client: spelling error in docs (occured -> occurred) 2015-08-18 16:26:52 -07:00
50c1db3fbf raft: downgrade the logging around snapshot to debugf
Snapshot-related logging is spammy when the leader is trying to
sync a failed peer.
2015-08-18 15:43:53 -07:00
7082d3a765 docs: document how to set heartbeat interval and election timeout
It gives more details about how to set heartbeat interval and election
timeout correctly based on RTT.
2015-08-18 13:54:44 -07:00
28cec1128d Merge pull request #3322 from philips/use-proxy-as-default-endpoint
Procfile: use proxy as default
2015-08-18 12:38:51 -07:00
087061e434 Merge pull request #3303 from yichengq/auth-path
use canonical path for auth
2015-08-18 12:06:48 -07:00
4778d780a8 pkg/pathutil: change copyright for path.go
The file only contains the function that is borrowed from std http lib,
so we use their copyright.
2015-08-18 11:48:22 -07:00
9106675fd4 Procfile: use proxy as default
I think it makes sense to make the proxy listen on the default port so
we can give the proxy more testing by default. Also, this should make it
easy to kill a single etcd member and test that etcdctl still works,
etc.

However, I have hit a bug: the proxy takes several seconds
2015-08-18 09:42:13 -07:00
fab3feab66 etcdctl/role: reject non-canonical permission path
A non-canonical permission path is useless because the path received
by auth is always canonical, since our ServeMux always
redirects requests to the canonical path.

This helps users to detect path permission setting error early.

Ref: http://godoc.org/net/http#ServeMux
2015-08-18 08:59:53 -07:00
b5ec7f543a client: use canonical url path in request
The main change is that it keeps the trailing slash. This helps
auth feature to judge path permission accurately.
2015-08-18 08:59:48 -07:00
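The two commits above both rely on a canonical form of the URL path that keeps a trailing slash. A minimal sketch of such a normalization, modeled on the cleaning approach used by the standard library's ServeMux, is shown below; `canonicalPath` and `isCanonical` are illustrative names, not the etcd functions.

```go
package main

import (
	"fmt"
	"path"
)

// canonicalPath returns p with duplicate slashes and "." / ".." elements
// removed, rooted at "/", and with a trailing slash preserved if p had one.
func canonicalPath(p string) string {
	if p == "" {
		return "/"
	}
	if p[0] != '/' {
		p = "/" + p
	}
	np := path.Clean(p)
	// path.Clean drops the trailing slash (except for the root); restore it.
	if p[len(p)-1] == '/' && np != "/" {
		np += "/"
	}
	return np
}

// isCanonical reports whether p is already in canonical form, which is the
// kind of check a tool could use to reject a non-canonical permission path.
func isCanonical(p string) bool {
	return canonicalPath(p) == p
}

func main() {
	for _, p := range []string{"/foo/", "/foo//bar", "/a/../b"} {
		fmt.Printf("%q -> %q canonical=%v\n", p, canonicalPath(p), isCanonical(p))
	}
}
```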
927d5f3d26 Merge pull request #3301 from yichengq/ca-file
etcdmain: update -ca-file description
2015-08-17 23:36:33 -07:00
c0747a7b8b etcdmain: update -ca-file description
so people could deprecate old flags and use new flags much easier.
2015-08-17 22:36:04 -07:00
bcb4d5d53e Merge pull request #3311 from yichengq/request-timeout
extend hardcoded timeout for globally-deployed etcd cluster
2015-08-17 17:00:24 -07:00
dfc6b4436f Merge pull request #3315 from xiang90/key_err
etcdhttp:write etcderror for all errors in keyhandler
2015-08-17 16:54:12 -07:00
ffae601af5 etcdmain: calculate dial timeout for peer transport
This helps peer communication in globally-deployed cluster.
2015-08-17 16:52:53 -07:00
1375ef8985 etcdserver: remove getVersion timeout
The request can still time out because we have set dial timeout and
read/write timeout. It increases timeout expectation from 1s to 5s,
but it makes it workable in a globally-deployed cluster.
2015-08-17 16:50:40 -07:00
c7fbc01ef1 Merge pull request #3314 from sebschrader/proxy-loop
Warn about proxy loops with incorrect advertise-client-urls
2015-08-17 16:04:00 -07:00
d487cf6b63 etcdhttp:write etcderror for all errors in keyhandler 2015-08-17 15:51:29 -07:00
f70950ff93 docs: warn about proxy loops with incorrect advertise-client-urls 2015-08-18 00:42:48 +02:00
c530385d6d Merge pull request #3313 from yichengq/internal-timeout
etcdserver: use ReqTimeout only
2015-08-17 15:05:46 -07:00
af6d1d3d95 Merge pull request #3310 from xiang90/http_err
*: key handler should write auth error as etcd error
2015-08-17 14:57:19 -07:00
2d5b95c49f etcdserver: use ReqTimeout only
We cannot infer the RTT value from the heartbeat interval, so CommitTimeout
is invalid. Remove it and use ReqTimeout instead.
2015-08-17 14:54:25 -07:00
87f061bab2 *: key handler should write auth error as etcd error 2015-08-17 14:45:45 -07:00
ba3a9b5f92 Merge pull request #3309 from xiang90/enforce
etcdserver: add version enforcement when setting cluster version
2015-08-17 12:41:04 -07:00
15e03d801f etcdserver: add version enforcement when setting cluster version 2015-08-17 11:12:39 -07:00
f615f9a999 Merge pull request #3305 from xiang90/c_v
*: only print out major.minor version for cluster version
2015-08-17 09:40:01 -07:00
7083828ae3 Godeps: import github.com/ugorji/go/codec 2015-08-16 18:13:44 -07:00
a364af72af client: use ugorji/go/codec to unmarshal key response
This change speeds up response unmarshal ~2x:

```
BenchmarkSmallResponseUnmarshal	   20000	     75243 ns/op
BenchmarkManySmallResponseUnmarshal	     200	   6629661 ns/op
BenchmarkMediumResponseUnmarshal	    1000	   1359041 ns/op
BenchmarkLargeResponseUnmarshal	      20	  61600978 ns/op
```
2015-08-16 18:08:54 -07:00
95d100e957 client: add response unmarshal benchmark
The benchmark result:

```
BenchmarkSmallResponseUnmarshal	  10000	   164524 ns/op
BenchmarkManySmallResponseUnmarshal	    100	 13916636 ns/op
BenchmarkMediumResponseUnmarshal	   1000	  1974295 ns/op
BenchmarkLargeResponseUnmarshal	     20	 80462001 ns/op
ok		github.com/coreos/etcd/client	7.777s
```
2015-08-16 16:44:50 -07:00
d95c7d8a94 Merge pull request #3307 from ian-kelling/master
documentation: fix misspelled word
2015-08-15 18:53:58 -07:00
8dd44465c3 documentation: fix misspelled word 2015-08-15 17:56:17 -07:00
f199a484af *: only print out major.minor version for cluster version 2015-08-15 08:30:06 -07:00
bbcb38189c Merge pull request #3302 from xiang90/v
etcdserver: better version detection log output
2015-08-14 16:14:55 -07:00
0076ab154b etcdserver: better version detection log output
Fix https://github.com/coreos/etcd/issues/3288
2015-08-14 16:08:33 -07:00
dd56b7e05e Merge pull request #3299 from xiang90/txn
initial support for txn
2015-08-14 16:05:16 -07:00
5cd109949a etcdctl: support txn 2015-08-14 15:58:38 -07:00
9233fff48f etcdserver: support txn 2015-08-14 11:45:31 -07:00
46865fa5a5 etcdserverpb: update proto 2015-08-14 11:45:07 -07:00
d448593bbc Merge pull request #3295 from yichengq/err-example
client: fix clusterError typo in README
2015-08-14 09:35:31 -07:00
5eed141d54 client: fix clusterError typo in README
It helps users to use client better.
2015-08-13 16:38:41 -07:00
fefb273389 *: bump to v2.2.0-alpha.1+git 2015-08-13 16:01:31 -07:00
201bb4b3d8 *: bump to v2.2.0-alpha.1 2015-08-13 16:01:09 -07:00
3cc4957d98 Merge pull request #3293 from yichengq/improve-err
etcdserver: improve error message when timeout due to leader fail
2015-08-13 15:58:48 -07:00
c229e6e655 etcdserver: improve error message when timeout due to leader fail 2015-08-13 15:46:21 -07:00
394894e03e Merge pull request #3291 from yichengq/auth-cap
etcdhttp: add auth capability in 2.2
2015-08-13 15:01:59 -07:00
ceb27b1c48 etcdhttp: add auth capability in 2.2 2015-08-13 14:49:10 -07:00
a17288558e Merge pull request #3289 from yichengq/marshal
etcdserver: go back to marshal request in 2.1 way
2015-08-13 14:20:24 -07:00
334bdd1c26 Merge pull request #3153 from gtank/tls-setup
hack: TLS setup using cfssl
2015-08-13 13:53:14 -07:00
959feb70d1 Merge pull request #3275 from xiang90/sort
improve in order key generation
2015-08-13 13:51:19 -07:00
a7b9bff939 store: add 0 as padding for better lexicographic sorting. 2015-08-13 13:42:37 -07:00
0fdb77aea2 etcdserver: go back to marshal request in 2.1 way
It fixes the problem that 2.1 cannot roll upgrade to 2.2 smoothly
because 2.1 cannot understand the bytes marshalled at 2.2.
2015-08-13 13:41:52 -07:00
003d096138 Merge pull request #3286 from yichengq/fit-2.2
*: update MinClusterVersion and supportedStream map
2015-08-13 13:31:37 -07:00
c9cca6a93b *: update MinClusterVersion and supportedStream map 2015-08-13 13:05:14 -07:00
846b1fdbcd Merge pull request #3287 from xiang90/update_roadmap
Update roadmap
2015-08-13 13:00:01 -07:00
329647ab62 roadmap: update roadmap 2015-08-13 12:56:23 -07:00
6a64051245 roadmap: remove 2.1 milestone 2015-08-13 12:51:58 -07:00
80005af5b2 Merge pull request #3285 from yichengq/bump-capnslog
godeps: bump capnslog to 42a8c3b1a6f917bb8346ef738f32712a7ca0ede7
2015-08-13 11:49:38 -07:00
d66ede7186 godeps: bump capnslog to 42a8c3b1a6f917bb8346ef738f32712a7ca0ede7 2015-08-13 11:32:45 -07:00
a46943548a *: bump to v2.2.0-alpha.0+git 2015-08-13 10:21:36 -07:00
ab5a69cb18 *: bump to v2.2.0-alpha.0 2015-08-13 10:20:05 -07:00
976ce93539 Merge pull request #3277 from yichengq/better-log
etcdserver: specify timeout caused by leader election
2015-08-12 17:02:27 -07:00
27170e67b9 etcdserver: specify timeout caused by leader election
Before this PR, the timeout caused by leader election returns:

```
14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected
response error (etcdserver: request timed out)
```

After this PR:

```
15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver:
request timed out, possibly due to leader down
```
2015-08-12 16:53:18 -07:00
ddfe343e77 Merge pull request #3271 from yichengq/doc-discovery
docs: add discovery protocol doc
2015-08-12 13:51:32 -07:00
a45f0ede56 docs: add discovery protocol doc
This document talks about the technical details of discovery service
protocol. It helps users to learn about how discovery service works and
what behavior to expect.
2015-08-12 13:15:21 -07:00
7bd9d9aede Merge pull request #3273 from polvi/kube-hack
add etcd on k8s example
2015-08-12 22:13:15 +03:00
cfb3522b63 add etcd on k8s example 2015-08-12 22:12:00 +03:00
f468d8b51a Merge pull request #3270 from xiang90/better_err
Better error message for etcdctl
2015-08-12 10:27:42 -07:00
7e04a79fb4 etcdctl: print out better error information 2015-08-12 10:09:56 -07:00
5d06d4ec44 client: print url as string 2015-08-12 10:09:40 -07:00
e894756144 Merge pull request #3190 from yichengq/adjust-prop-timeout
etcdserver: adjust proposal timeout based on config
2015-08-12 09:41:25 -07:00
c3d4d11402 etcdhttp: adjust request timeout based on config
It uses heartbeat interval and election timeout to estimate the
expected request timeout.

This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
2015-08-12 09:22:59 -07:00
18ecc297bc Merge pull request #3254 from es-chow/log-group
set groupID in multinode as log context so it can be logged
2015-08-12 08:05:50 -07:00
cc362ccdad raft: set logger to raft so log context such as multinode groupID can be logged 2015-08-12 22:56:00 +08:00
5a91937367 etcdserver: adjust commit timeout based on config
It uses heartbeat interval and election timeout to estimate the
commit timeout for internal requests.

This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
2015-08-11 21:09:03 -07:00
042afcf2a3 Merge pull request #3266 from yichengq/client-readme
client: clean up README
2015-08-11 16:21:13 -07:00
7d618c46ad client: clean up README
Address rob's comments about sentences in README.
2015-08-11 15:33:56 -07:00
18a1c95f22 Merge pull request #3263 from xiang90/ctl_tr
etcdctl: add per request timeout
2015-08-11 14:17:12 -07:00
dceacacd49 Merge pull request #3194 from yichengq/client-readme
client: add README
2015-08-11 13:35:54 -07:00
e36c499d0f etcdctl: add per request timeout 2015-08-11 13:33:50 -07:00
8a7cf56e13 client: add README
It describes some basic usage and caveat of etcd/client package.

Write it together with Xiang.
2015-08-11 12:07:24 -07:00
83efc08137 Merge pull request #3262 from yichengq/client-deadline
client: return context.DeadlineExceeded instead of ClusterError
2015-08-11 10:42:29 -07:00
a1ef699aeb client: return context.DeadlineExceeded instead of ClusterError
This is done to match user expectation to see context.DeadlineExceeded
when it reaches deadline.
2015-08-11 10:18:38 -07:00
1fe52e1ec3 Merge pull request #3245 from yichengq/client_timeout
client: set timeout for each request
2015-08-11 10:10:42 -07:00
f4c29a5f55 client: support to set timeout for each request
Add HeaderTimeout field in Config, so users could set timeout for each request.
Before this, one hung request could block the call for a long time. After
this, if the network is good, the user can set a short timeout and expect
the API call to try the next available endpoint quickly.
2015-08-11 10:01:05 -07:00
a718329ad3 Merge pull request #3248 from xiang90/v3
initial v3 demo
2015-08-10 13:59:03 -07:00
fb5e1ac548 Merge pull request #3256 from xiang90/update_log
update logger
2015-08-10 13:54:28 -07:00
6c58333969 etcdmain: use default formatter
The default formatter would use syslog style when running
under init system, and would use pretty format otherwise.
2015-08-10 13:38:22 -07:00
48e36bbb84 Godep: update capnslog dependency 2015-08-10 13:38:00 -07:00
b0ea4ab3b1 doc: link to v3 api doc 2015-08-10 11:22:55 -07:00
c32919e6d1 *: rename v3etcdctl to etcdctlv3 2015-08-10 11:21:37 -07:00
c1e0b19f9f *: better flag 2015-08-10 09:53:17 -07:00
48b1cd54f3 Merge pull request #3243 from xiang90/conf
doc: add runtime reconfiguration design doc
2015-08-09 10:56:51 -07:00
89bf5824c2 Merge pull request #3159 from sofuture/master
use /usr/bin/env to find bash
2015-08-09 10:56:12 -07:00
601801ced5 doc: add runtime reconfiguration design doc 2015-08-09 10:55:34 -07:00
45f3a0c547 Merge pull request #3249 from philips/get-etcd-running-under-arm64
Get etcd running under arm64
2015-08-08 20:32:33 -07:00
1239e1ce6f test, scripts: use /usr/bin/env to find bash
use /usr/bin/env to find bash

add set -e back into scripts it was removed from
2015-08-08 20:52:53 -06:00
1b894c6b0b test: race detector doesn't work on armv7l
Test fails without this fix on armv7l:

    go test: -race is only supported on linux/amd64, freebsd/amd64, darwin/amd64 and windows/amd64
2015-08-08 18:11:41 -07:00
fb1951204c etcdserver: move atomics to make etcd work on arm64
Follow the simple rule in the atomic package:

"On both ARM and x86-32, it is the caller's responsibility to arrange
for 64-bit alignment of 64-bit words accessed atomically. The first word
in a global variable or in an allocated struct or slice can be relied
upon to be 64-bit aligned."

Tested on a system with /proc/cpuinfo reporting:

processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3
tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc0d
CPU revision    : 1
2015-08-08 18:11:41 -07:00
9ff7075ce8 etcdserver: use v3server interface 2015-08-08 10:39:04 -07:00
523567bcc7 v3etcdctl: initial v3 ctl support 2015-08-08 05:58:58 -07:00
f004b4dac7 *: etcdserver supports v3 demo 2015-08-08 05:58:29 -07:00
82afadbcc6 etcdserverpb: update proto 2015-08-08 05:31:35 -07:00
668a8a8367 Merge pull request #3242 from xiang90/typo
*: fix typos vaild->valid
2015-08-07 10:58:39 -07:00
845c51fedd *: fix typos vaild->valid 2015-08-07 10:57:11 -07:00
f0a5874473 Merge pull request #3241 from yichengq/sync-pin
client: Sync() pin the endpoint when member list doesn't change
2015-08-07 10:24:29 -07:00
0ab16db728 client: Sync() pin the endpoint when member list doesn't change
This helps client to pin the same endpoint as long as cluster doesn't change.
2015-08-07 10:08:28 -07:00
d7adcc3e65 Merge pull request #3239 from xiang90/improve_probing
rafthttp: use customized transport for probing
2015-08-07 09:37:32 -07:00
b6580a9591 rafthttp: use customized transport for probing
We need to support TLS verification when probing.
2015-08-06 16:20:44 -07:00
d2363afd52 Merge pull request #3240 from xiang90/fix_log
etcdmain: fix path printing
2015-08-06 15:56:14 -07:00
f03f048232 Merge pull request #3184 from yichengq/fast-bootstrap
etcdserver: tick ElectionTicks before starting when bootstrap new cluster
2015-08-06 15:54:40 -07:00
1b572ae2dd etcdmain: fix path printing 2015-08-06 15:53:24 -07:00
21f5b885f2 etcdserver: fast election timeout when bootstrap cluster
The behavior accelerates the first leader election,
so the cluster can elect its leader quickly. Technically, it
helps to reduce the `electionMs - heartbeatMs` wait time for the first leader election.

Main usage:
1. Quick start for the local cluster when setting a little longer
election timeout
2. Quick start for the global cluster, which sets election timeout to
its maximum 50s.
2015-08-06 15:44:26 -07:00
a637e86372 Merge pull request #3220 from yichengq/fix-auth-check
etcdhttp: fix access check for multiple roles in auth
2015-08-06 15:09:04 -07:00
b9c6b64d61 Merge pull request #3216 from yichengq/cancel-err
client: return context canceled error correctly
2015-08-06 15:04:49 -07:00
b965c4b415 Merge pull request #3217 from yichengq/update-migrate-example
update commands used in admin_guide.md
2015-08-06 15:00:04 -07:00
78af793338 client: return context canceled error correctly
If the body is closed to stop watching, it will ignore the error from
reading body and return context error.

Before this PR, the cancel when watching always returns error `read tcp
127.0.0.1:57824: use of closed network connection`. After this PR, it
will return expected context canceled error.
2015-08-06 14:52:04 -07:00
b04bb3e0ea Merge pull request #3229 from xiang90/f_cerr
client: return context.Canceled error when user cancels the request
2015-08-06 14:41:19 -07:00
25ad71fbac Merge pull request #3225 from yichengq/client-record-err
client: return correct error for 50x response
2015-08-06 14:40:38 -07:00
7314310aed Merge pull request #3233 from xiang90/srv_discovery
better dns discovery error and doc
2015-08-06 14:35:22 -07:00
cfeaf3d172 client: return correct error for 50x response
etcd always returns 500/503 response when it may have no leader.
So we should log the other 50x response in a normal way.

This helps to log correctly when discovery meets 504 error. Before this
PR, it logs like this:

```
18:31:58 etcd2 | 2015/08/4 18:31:58 discovery: error #0: client: etcd
member https://discovery.etcd.io has no leader
18:31:58 etcd2 | 2015/08/4 18:31:58 discovery: waiting for other nodes:
error connecting to https://discovery.etcd.io, retrying in 4s
```

After this PR:

```
22:20:25 etcd2 | 2015/08/4 22:20:25 discovery: error #0: client: etcd
member https://discovery.etcd.io returns server error [Gateway Timeout]
22:20:25 etcd2 | 2015/08/4 22:20:25 discovery: waiting for other nodes:
error connecting to https://discovery.etcd.io, retrying in 4s
```
2015-08-06 14:25:03 -07:00
e9f05e8959 doc: explain srv error 2015-08-06 14:24:58 -07:00
2c2249dadc Merge pull request #3219 from yichengq/limit-listener
etcdmain: stop accepting client conns when it reaches limit
2015-08-06 12:17:49 -07:00
97923ca3fc etcdmain: close client conns when it exceeds limit
This solves the problem that etcd may fatal because its critical path
cannot get file descriptor resource when the number of clients is too
big. The PR lets the client listener close client connections
immediately after they are accepted when
the file descriptor usage in the process reaches some pre-set limit, so
it ensures that the internal critical path could always get file
descriptor when it needs.

When there are tons of clients connecting to the server, the original
behavior is like this:

```
2015/08/4 16:42:08 etcdserver: cannot monitor file descriptor usage
(open /proc/self/fd: too many open files)
2015/08/4 16:42:33 etcdserver: failed to purge snap file open
default2.etcd/member/snap: too many open files
[halted]
```

Current behavior is like this:

```
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:26 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:27 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 etcdserver: 80% of the file descriptor limit is
used [used = 873, limit = 1024]
```

It is only available on Linux today because pkg/runtime only has Linux
support.
2015-08-06 12:03:20 -07:00
203e0f178b etcdmain: better error for srv discovery failure 2015-08-06 11:38:53 -07:00
01c286ccb6 Merge pull request #3231 from xiang90/fallocate
pkg/fileutil: support preallocate
2015-08-06 10:25:28 -07:00
39a4b6a5e5 pkg/fileutil: support preallocate 2015-08-06 10:10:58 -07:00
9a8607fce1 Merge pull request #3187 from yichengq/client-keep-sync
client: add KeepSync function
2015-08-06 00:16:28 -07:00
c53b3016ae client: add AutoSync function
AutoSync provides a way for the client to sync the member list from the
etcd cluster automatically.
2015-08-05 13:22:56 -07:00
807a6f209e docs/admin_guide: decouple example from CoreOS specific details
This makes the example commands general, while keeping it easy to
understand. It also fixes some name mismatch.
2015-08-05 11:33:46 -07:00
f38187bbdb client: return context.Canceled error when user cancels the request 2015-08-05 09:52:30 -07:00
ff0b8723c7 Merge pull request #2688 from xiang90/versioning
etcdserver: internal request union
2015-08-05 09:27:32 -07:00
58503817ec etcdserver: internal request union 2015-08-05 07:47:10 -07:00
487639b2d8 Merge pull request #3222 from mitake/wal-log-error
wal: log errors in wal.Close()
2015-08-04 23:19:45 -07:00
9cbeffc720 Merge pull request #3224 from xiang90/fix_ls
etcdctl: ls takes / as default key arg
2015-08-04 23:15:29 -07:00
ba76e27875 wal: log errors in wal.Close()
This patch adds error logging in wal.Close() if unlocking and
destroying fail. Though it is hard to handle the errors, logging
would be helpful for troubleshooting.
2015-08-05 15:03:45 +09:00
9527a97720 etcdctl: ls takes / as default key arg 2015-08-04 22:56:55 -07:00
718a42f408 Merge pull request #3210 from xiang90/probing
monitoring connectivity between peers
2015-08-04 16:56:31 -07:00
18169e896c etcdhttp: fix access check for multiple roles in auth
Check access for multiple roles should go through all roles.
2015-08-04 14:31:07 -07:00
0650170a1b Merge pull request #3196 from eyakubovich/fix-watch-timeout
client: handle watch timing out elegantly
2015-08-04 13:52:42 -07:00
1e048b5c24 rafthttp: cleanup prober when stopping the transport 2015-08-04 17:42:51 +08:00
709718ed97 godeps: update probing pkg 2015-08-04 17:40:39 +08:00
0fc764200d rafthttp: monitor connection 2015-08-04 17:39:40 +08:00
ff5c3469c1 Merge pull request #3197 from xiang90/health
etcdctl: cluster-health supports forever flag
2015-08-03 20:48:06 -07:00
6312e22b1d client: handle empty watch responses elegantly
Even though current etcd does not time out
watches, the client could be running against
an old etcd version or the server may close
polling connection for other reasons.
This patch ignores successful (as in 200)
responses with empty bodies instead
of producing JSON errors.
2015-08-03 11:47:21 -07:00
306085db5f Godeps: add probing dependency 2015-08-03 09:07:43 +08:00
f7f00b0af6 etcdctl: cluster-health supports forever flag
cluster-health command supports checking the cluster health
forever.
2015-08-01 22:29:08 +08:00
3da1df2648 Merge pull request #3207 from xiang90/rm_migration
*: remove migration related stuff from 2.2
2015-08-01 19:47:17 +08:00
2b8abeb093 *: remove migration related stuff from 2.2 2015-08-01 19:37:20 +08:00
eee1c8b8ee Merge pull request #3200 from xiang90/d_doc
doc: unique names must be specified when using public discovery service
2015-08-01 07:34:25 +08:00
8bd9554338 Merge pull request #3202 from yichengq/fix-etcdctl-watch
etcdctl: fix watch -after-index parsing
2015-07-31 14:41:45 -07:00
4a89b3f8f3 Merge pull request #3116 from offscale/master
build: implemented build shell-script for Windows
2015-07-31 11:55:42 -07:00
05b2d06788 Merge pull request #3199 from xiang90/sdnotify
etcdmain: support sdnotify for readiness
2015-07-31 19:04:35 +08:00
4a0d8ee4bd build: implemented build shell-script for Windows 2015-07-31 17:43:47 +10:00
0cbac56fa2 etcdmain: support sdnotify for readiness 2015-07-31 13:33:18 +08:00
beeecc32b0 doc: unique names must be specified when using public discovery service 2015-07-31 09:12:44 +08:00
c1c5c7c99c Merge pull request #3091 from barakmich/client_auth_cov
etcdhttp: Improve test coverage surrounding auth
2015-07-30 17:00:49 -04:00
dd1a8fe330 etcdhttp: Improve test coverage surrounding auth 2015-07-30 14:21:08 -04:00
147885078c etcdctl: fix watch -after-index parsing
It uses -after-index incorrectly now:

```
$ ./bin/etcdctl --debug watch -after-index 31 foo
Cluster-Endpoints: http://localhost:2379, http://localhost:4001
cURL Command: curl -X GET
http://localhost:2379/v2/keys/foo?recursive=false&wait=true&waitIndex=33
```

After this PR:

```
$ ./bin/etcdctl --debug watch -after-index 31 foo
Cluster-Endpoints: http://localhost:2379, http://localhost:4001
cURL Command: curl -X GET
http://localhost:2379/v2/keys/foo?recursive=false&wait=true&waitIndex=32
```
2015-07-30 11:15:43 -07:00
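
The fix above amounts to watching from the index immediately after the one given. A minimal sketch of that mapping, assuming a hypothetical helper `watchURL` (this is not etcdctl's actual code):

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// watchURL is a hypothetical helper that builds a v2 watch URL. A watch
// "after index N" waits for the first event at index N+1.
func watchURL(endpoint, key string, afterIndex uint64) string {
	q := url.Values{}
	q.Set("recursive", "false")
	q.Set("wait", "true")
	q.Set("waitIndex", strconv.FormatUint(afterIndex+1, 10))
	// Values.Encode sorts parameters by key.
	return endpoint + "/v2/keys/" + key + "?" + q.Encode()
}

func main() {
	fmt.Println(watchURL("http://localhost:2379", "foo", 31))
}
```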
219ed1695b Merge pull request #3178 from yichengq/refactor-cluster-health
etcdctl: refactor the way to check cluster health
2015-07-29 18:16:26 -07:00
80b794dccc Merge pull request #3185 from xiang90/add_debug_endpoint
etcdhttp: add config/local/debug endpoint
2015-07-30 08:46:07 +08:00
4e31df2c2b etcdhttp: add config/local/log endpoint
PUT on the endpoint sets the GlobalDebugLevel to json level value.
The action overwrites the original log level setting from
users. We need to write a doc to warn about this.
2015-07-30 08:35:01 +08:00
e62a3b8a62 Merge pull request #2891 from glensc/patch-1
build: use posix shell
2015-07-29 17:15:57 -07:00
ff945c7404 Merge pull request #3181 from xiang90/2.2-client-error
client: return cluster error if the etcd cluster is not available
2015-07-30 08:08:09 +08:00
f1aaa7a9e3 etcdctl: refactor the way to check cluster health
This method uses raft status exposed at /debug/varz to determine the
health of the cluster. It uses whether commit index increases to
determine the cluster health, and uses whether match index increases to
determine the member health.

This could fix the bug #2711 that fails to detect follower is unhealthy
because it doesn't rely on whether message in long-polling connection is sent.

This health check is stricter than the old one, and reflects
whether followers are healthy in the view of the leader. One
example is that if a follower is receiving a snapshot, it will turn
out to be unhealthy because it doesn't move forward.

`etcdctl cluster-health` will reflect the healthy view in the raft level,
while connectivity checks reflects the healthy view in transport level.
2015-07-29 17:06:55 -07:00
a47e661fff discovery: print out detailed cluster error 2015-07-29 23:06:57 +08:00
5fa8652241 client: return cluster error if the etcd cluster is not available
Add a new ClusterError type. It contains all encountered errors and
returns ClusterNotAvailable as the error string.
2015-07-29 22:55:15 +08:00
6b8b507312 Merge pull request #3176 from yichengq/reject-high-election
etcdmain: reject unreasonably high values of -election-timeout
2015-07-28 10:33:58 -07:00
ec214030d0 etcdmain: reject unreasonably high values of -election-timeout
This helps users to detect setting problem early.
2015-07-28 10:07:57 -07:00
edfec45bf5 hack: TLS setup using cfssl
this demonstrates basic TLS setup with cfssl. it's much easier than other
available tools.
2015-07-27 14:51:17 -07:00
7831a30e46 Merge pull request #3180 from shafreeck/master
Update libraries-and-tools.md
2015-07-27 14:45:31 -07:00
6184e271a4 Merge pull request #3164 from yichengq/pin-endpoint
client: pin itself to an endpoint that given
2015-07-27 14:35:51 -07:00
6fc9dbfe56 Merge pull request #3114 from yichengq/clean-raft-init
etcdserver: clean up start and stop logic of raft
2015-07-27 14:19:25 -07:00
ea2347a40f client: pin itself to an endpoint that given
1. When reset endpoints, client will choose a random endpoint to pin.
2. If the pinned endpoint is healthy, client will keep using it.
3. If the pinned endpoint becomes unhealthy, client will attempt other
endpoints and update its pin.
2015-07-27 13:36:53 -07:00
7696dd3280 etcdserver: clean up start and stop logic of raft
kill TODO and make it more readable.
2015-07-27 13:24:26 -07:00
5e3dc31e6f Merge pull request #3150 from gouyang/master
pkg/mflag: add modified flag package
2015-07-24 15:26:07 -07:00
a7eef376b7 Merge pull request #3183 from xiang90/txn
*: tnx -> txn
2015-07-25 01:48:06 +08:00
53a77fa519 *: tnx -> txn 2015-07-24 23:21:09 +08:00
c9769ee966 etcdmain: Don't print flags when flag parse error
At present it prints the whole usage and all flags, which leaves the actual
error message hidden two screens above.

Fixes #3141

Signed-off-by: Guohua Ouyang <gouyang@redhat.com>
2015-07-24 21:29:21 +08:00
e75446ca27 docs: add cetcd into libraries-and-tools.md 2015-07-24 12:08:39 +00:00
b407f72766 Merge pull request #3166 from yichengq/publish-timeout
etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout
2015-07-23 10:30:41 -07:00
b7892b20c1 etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout
This makes code more readable and reasonable.
2015-07-23 10:09:28 -07:00
58bc617dd0 Merge pull request #3175 from xiang90/2.2-ctl-bug
etcdctl: fix exec watch command
2015-07-23 14:37:38 +08:00
448ca20cdc etcdctl: fix exec watch command
The previous flag parsing has a small issue. It uses
`recursive == true` and `after-index == 0` to determine
if user specifies the sub flags. This is incorrect since
user can specify `after-index = 0`. Then the flag parsing
would be confused.

This commit explicitly find the `--` in the remaining args
and determine the key and cmdArgs accordingly.
2015-07-23 13:13:15 +08:00
43f4b99d52 Merge pull request #3174 from xiang90/2.2_submit_bug
doc: add reporting bug doc
2015-07-23 13:08:35 +08:00
1b5e41e3f4 doc: add reporting bug doc 2015-07-23 12:55:38 +08:00
93002caca5 Merge pull request #3165 from yichengq/client-quorum
client: add Quorum option in getOption
2015-07-22 16:54:14 -07:00
b20b87893f client: add Quorum option in getOption 2015-07-22 15:19:34 -07:00
6be02ff5ec etcdmain: fix initialization conflict
Fix #3142

Ignore flags if etcd is already initialized.
2015-07-21 12:53:21 -07:00
24db661401 etcdmain: warn when listening on HTTP if TLS is set
If the user sets TLS info, this implies that they want to listen on TLS.
If etcd finds that the listen URLs still use the HTTP scheme, it prints a
warning to notify the user about a possibly wrong setting.
2015-07-21 12:53:21 -07:00
604709cad7 etcdctl: update -peers to default to use schema
Change its default value from `127.0.0.1:4001,127.0.0.1:2379` to
`http://127.0.0.1:4001,http://127.0.0.1:2379`

Adding HTTP schema makes its format consistent with etcd's xxx-urls
flags.
2015-07-21 12:53:21 -07:00
d9c27138fa discovery: return bad discovery endpoint error 2015-07-21 12:53:21 -07:00
d2dac0fe59 client: consume json error and return ErrInvalidJSON
The default JSON error is not very readable. We let client
consume the error and return a more understandable error in
the context of etcd.

Fix #3120
2015-07-21 12:53:21 -07:00
6317abf7e4 pkg/transport: fix HTTPS downgrade bug for keepalive listener
If TLS config is empty, etcd downgrades keepalive listener from HTTPS to
HTTP without warning. This results in HTTPS downgrade bug for client urls.
The commit returns error if it cannot listen on TLS.
2015-07-21 12:53:21 -07:00
43437e21f9 etcdctl: added domain discovery flag
provided a domain, will look up SRV records for etcd endpoints

Fixes #2636
2015-07-21 12:53:21 -07:00
dc3f7f5d90 *: detect duplicate name for discovery bootstrap 2015-07-21 12:53:20 -07:00
b8279b3591 types: add len func for urlmaps 2015-07-21 12:53:20 -07:00
ee82ee05b4 etcdctl: support member update command 2015-07-21 12:53:20 -07:00
6e3769d39e client: add member update 2015-07-21 12:53:20 -07:00
9f9661f513 etcdctl: print out key and action when watching recursively 2015-07-21 12:53:20 -07:00
87ef0f0b3e godep: remove go-etcd dependency 2015-07-21 12:53:20 -07:00
071ad9f72b etcdctl: health use etcd/client 2015-07-21 12:53:20 -07:00
0b1ddce889 etcdctl: import snap use etcd/client 2015-07-21 12:53:20 -07:00
adeb101e04 etcdctl: remove old stuff 2015-07-21 12:53:20 -07:00
759c156e3e etcdctl: exec_watch use etcd/client 2015-07-21 12:53:20 -07:00
5b01b3877f etcdctl: watch use etcd/client 2015-07-21 12:53:20 -07:00
b20c06348d etcdctl: ls use etcd/client 2015-07-21 12:53:19 -07:00
ae1669de26 etcdctl: updatedir use etcd/client 2015-07-21 12:53:19 -07:00
f12ae45c6a etcdctl: update use etcd/client 2015-07-21 12:53:19 -07:00
58b19a7c1e etcdctl: rmdir use etcd/client 2015-07-21 12:53:19 -07:00
9d7a8dd2b0 etcdctl: mk use etcd/client 2015-07-21 12:53:19 -07:00
61befc7ce6 etcdctl: minor cleanup 2015-07-21 12:53:19 -07:00
e3fcc450cf etcdctl: make rm use etcd/client 2015-07-21 12:53:19 -07:00
9d9c3a7180 etcdctl: make setdir/mkdir use etcd/client 2015-07-21 12:53:19 -07:00
db4b18aee3 etcdctl: make set command use etcd/client 2015-07-21 12:53:19 -07:00
e9478ba630 etcdctl: make get command use etcd/client 2015-07-21 12:53:19 -07:00
09b9c30beb pkg/transport: include debug output for trusted-ca
since --peer-ca-file is deprecated we need to update the debug output

before:

```
$ etcd ... --peer-cert-file infra1.crt -peer-key-file
 infra1.key.insecure -peer-trusted-ca-file ca.crt --client-cert-auth
etcdmain: peerTLS: cert = infra1.crt, key = infra1.key.insecure, ca =
```

after:

```
$ etcd ... --peer-cert-file infra1.crt -peer-key-file
 infra1.key.insecure -peer-trusted-ca-file ca.crt --client-cert-auth
etcdmain: peerTLS: cert = infra1.crt, key = infra1.key.insecure, ca = , trusted-ca = ca.crt
```
2015-07-04 14:28:18 -07:00
77c3613d94 build: use posix shell 2015-05-30 09:34:54 +03:00
511 changed files with 112883 additions and 13423 deletions

View File

@ -2,6 +2,7 @@ language: go
sudo: false
go:
- 1.4
- 1.5
install:
- go get github.com/barakmich/go-nyet

View File

@ -12,6 +12,14 @@ etcd is Apache 2.0 licensed and accepts contributions via GitHub pull requests.
- Fork the repository on GitHub
- Read the README.md for build instructions
## Reporting Bugs and Creating Issues
Reporting bugs is one of the best ways to contribute. However, a good bug report
has some very specific qualities, so please read over our short document on
[reporting bugs](https://github.com/coreos/etcd/blob/master/Documentation/reporting_bugs.md)
before you submit your bug report. This document might contain links to known
issues, another good reason to take a look there before reporting your bug.
## Contribution flow
This is a rough outline of what a contributor's workflow looks like:

View File

@ -0,0 +1,31 @@
## Snapshot Migration
You can migrate a snapshot of your data from a v0.4.9+ cluster into a new etcd 2.2 cluster using a snapshot migration. After snapshot migration, the etcd indexes of your data will change. Many etcd applications rely on these indexes to behave correctly. This operation should only be done while all etcd applications are stopped.
To get started get the newest data snapshot from the 0.4.9+ cluster:
```
curl http://cluster.example.com:4001/v2/migration/snapshot > backup.snap
```
Now, import the snapshot into your new cluster:
```
etcdctl --endpoint new_cluster.example.com import --snap backup.snap
```
If you have a large amount of data, you can specify more concurrent workers to copy data in parallel by using the `-c` flag.
If you have hidden keys to copy, pass the `--hidden` flag.
And the data will quickly copy into the new cluster:
```
entering dir: /
entering dir: /foo
entering dir: /foo/bar
copying key: /foo/bar/1 1
entering dir: /
entering dir: /foo2
entering dir: /foo2/bar2
copying key: /foo2/bar2/2 2
```

View File

@ -8,14 +8,17 @@ When first started, etcd stores its configuration into a data directory specifie
Configuration is stored in the write ahead log and includes: the local member ID, cluster ID, and initial cluster configuration.
The write ahead log and snapshot files are used during member operation and to recover after a restart.
If a members data directory is ever lost or corrupted then the user should remove the etcd member from the cluster via the [members API][members-api].
Having a dedicated disk to store wal files can improve the throughput and stabilize the cluster.
It is highly recommended to dedicate a wal disk and set `--wal-dir` to point to a directory on that device for a production cluster deployment.
If a member's data directory is ever lost or corrupted then the user should [remove][remove-a-member] the etcd member from the cluster using the `etcdctl` tool.
A user should avoid restarting an etcd member with a data directory from an out-of-date backup.
Using an out-of-date data directory can lead to inconsistency as the member had agreed to store information via raft then re-joins saying it needs that information again.
For maximum safety, if an etcd member suffers any sort of data corruption or loss, it must be removed from the cluster.
Once removed the member can be re-added with an empty data directory.
[members-api]: other_apis.md#members-api
[remove-a-member]: runtime-configuration.md#remove-a-member
#### Contents
@ -24,6 +27,8 @@ The data directory has two sub-directories in it:
1. wal: write ahead log files are stored here. For details see the [wal package documentation][wal-pkg]
2. snap: log snapshots are stored here. For details see the [snap package documentation][snap-pkg]
If `--wal-dir` flag is set, etcd will write the write ahead log files to the specified directory instead of data directory.
[wal-pkg]: http://godoc.org/github.com/coreos/etcd/wal
[snap-pkg]: http://godoc.org/github.com/coreos/etcd/snap
@ -34,6 +39,74 @@ The data directory has two sub-directories in it:
If you are spinning up multiple clusters for testing it is recommended that you specify a unique initial-cluster-token for the different clusters.
This can protect you from cluster corruption in case of mis-configuration because two members started with different cluster tokens will refuse members from each other.
#### Monitoring
It is important to monitor your production etcd cluster for health information and runtime metrics.
##### Health Monitoring
At the lowest level, etcd exposes health information via HTTP at `/health` in JSON format. If it returns `{"health": "true"}`, then the cluster is healthy. Please note that the `/health` endpoint is still experimental as of etcd 2.2.
```
$ curl -L http://127.0.0.1:2379/health
{"health": "true"}
```
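A monitoring script can interpret the `/health` response body as follows. This is a minimal sketch: the helper name `is_healthy` is hypothetical, and it assumes the body has the shape shown above, treating anything else as unhealthy.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only if the /health body reports "true".

    Anything that is not valid JSON, or that lacks the expected
    {"health": "true"} shape, is treated as unhealthy.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("health") == "true"


print(is_healthy('{"health": "true"}'))
```

Note that the value is the string `"true"`, not a JSON boolean, so the comparison is against a string.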
You can also use etcdctl to check the cluster-wide health information. It will contact all the members of the cluster and collect the health information for you.
```
$ ./etcdctl cluster-health
member 8211f1d0f64f3269 is healthy: got healthy result from http://127.0.0.1:12379
member 91bc3c398fb3c146 is healthy: got healthy result from http://127.0.0.1:22379
member fd422379fda50e48 is healthy: got healthy result from http://127.0.0.1:32379
cluster is healthy
```
##### Runtime Metrics
etcd uses [Prometheus](http://prometheus.io/) for metrics reporting in the server. You can read more through the runtime metrics [doc](metrics.md).
#### Debugging
Debugging a distributed system can be difficult. etcd provides several ways to make debugging
easier.
##### Enabling Debug Logging
When you want to debug etcd without stopping it, you can enable debug logging at runtime.
etcd exposes logging configuration at `/config/local/log`.
```
$ curl http://127.0.0.1:2379/config/local/log -XPUT -d '{"Level":"DEBUG"}'
$ # debug logging enabled
$
$ curl http://127.0.0.1:2379/config/local/log -XPUT -d '{"Level":"INFO"}'
$ # debug logging disabled
```
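The same toggle can be prepared from a script. The sketch below only builds the request and does not send it. The helper name `log_level_request` is hypothetical, and the endpoint path and `Level` field are taken from the curl example above.

```python
import json
import urllib.request


def log_level_request(client_url: str, level: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets the log level.

    client_url is a member's client URL such as http://127.0.0.1:2379.
    level is passed through unchanged; the curl example uses DEBUG and INFO.
    """
    body = json.dumps({"Level": level}).encode("utf-8")
    url = client_url.rstrip("/") + "/config/local/log"
    return urllib.request.Request(url, data=body, method="PUT")


req = log_level_request("http://127.0.0.1:2379", "DEBUG")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a running member.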
##### Debugging Variables
Debug variables are exposed for real-time debugging purposes. Developers who are familiar with etcd can utilize these variables to debug unexpected behavior. etcd exposes debug variables via HTTP at `/debug/vars` in JSON format. The debug variables contain
`cmdline`, `file_descriptor_limit`, `memstats` and `raft.status`.
`cmdline` is the command line arguments passed into etcd.
`file_descriptor_limit` is the max number of file descriptors etcd can utilize.
`memstats` is well explained [here](http://golang.org/pkg/runtime/#MemStats).
`raft.status` is useful when you want to debug low level raft issues if you are familiar with raft internals. In most cases, you do not need to check `raft.status`.
```json
{
"cmdline": ["./etcd"],
"file_descriptor_limit": 0,
"memstats": {"Alloc":4105744,"TotalAlloc":42337320,"Sys":12560632,"...":"..."},
"raft.status": {"id":"ce2a822cea30bfca","term":5,"vote":"ce2a822cea30bfca","commit":23509,"lead":"ce2a822cea30bfca","raftState":"StateLeader","progress":{"ce2a822cea30bfca":{"match":23509,"next":23510,"state":"ProgressStateProbe"}}}
}
```
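A script can pull the few `raft.status` fields that are usually of interest out of the `/debug/vars` payload. This is a minimal sketch: the helper name `summarize_raft_status` is hypothetical, and the field names are the ones visible in the sample output above. Treating a member as leader when `raftState` is `StateLeader` and `lead` equals its own `id` is an assumption based on that sample.

```python
import json


def summarize_raft_status(debug_vars: str) -> dict:
    """Extract a small summary from a /debug/vars JSON document."""
    status = json.loads(debug_vars)["raft.status"]
    return {
        "id": status["id"],
        "term": status["term"],
        "commit": status["commit"],
        # Assumption: a leader reports itself in both fields.
        "is_leader": status["raftState"] == "StateLeader"
        and status["lead"] == status["id"],
    }


sample = json.dumps({
    "cmdline": ["./etcd"],
    "raft.status": {
        "id": "ce2a822cea30bfca", "term": 5, "vote": "ce2a822cea30bfca",
        "commit": 23509, "lead": "ce2a822cea30bfca",
        "raftState": "StateLeader",
    },
})
print(summarize_raft_status(sample))
```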
#### Optimal Cluster Size
The recommended etcd cluster size is 3, 5 or 7, which is decided by the fault tolerance requirement. A 7-member cluster can provide enough fault tolerance in most cases. While a larger cluster provides better fault tolerance, write performance decreases because data needs to be replicated to more machines.
@ -57,7 +130,7 @@ As you can see, adding another member to bring the size of cluster up to an odd
#### Changing Cluster Size
After your cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md), which allows the cluster to be modified without downtime. The `etcdctl` tool has a `member list`, `member add` and `member remove` commands to complete this process.
After your cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md#cluster-reconfiguration-operations), which allows the cluster to be modified without downtime. The `etcdctl` tool has a `member list`, `member add` and `member remove` commands to complete this process.
### Member Migration
@ -67,7 +140,7 @@ The data directory contains all the data to recover a member to its point-in-tim
* Stop the member process
* Copy the data directory of the now-idle member to the new machine
* Update the peer URLs for that member to reflect the new machine according to the [member api] [change peer url]
* Update the peer URLs for that member to reflect the new machine according to the [runtime configuration] [change peer url]
* Start etcd on the new machine, using the same configuration and the copy of the data directory
This example will walk you through the process of migrating the infra1 member to a new machine:
@ -78,11 +151,11 @@ This example will walk you through the process of migrating the infra1 member to
|infra1|10.0.1.11:2380|
|infra2|10.0.1.12:2380|
```
```sh
$ export ETCDCTL_PEERS=http://10.0.1.10:2379,http://10.0.1.11:2379,http://10.0.1.12:2379
```
```
```sh
$ etcdctl member list
84194f7c5edd8b37: name=infra0 peerURLs=http://10.0.1.10:2380 clientURLs=http://127.0.0.1:2379,http://10.0.1.10:2379
b4db3bf5e495e255: name=infra1 peerURLs=http://10.0.1.11:2380 clientURLs=http://127.0.0.1:2379,http://10.0.1.11:2379
@ -91,53 +164,59 @@ bc1083c870280d44: name=infra2 peerURLs=http://10.0.1.12:2380 clientURLs=http://1
#### Stop the member etcd process
```
$ ssh core@10.0.1.11
```sh
$ ssh 10.0.1.11
```
```
$ sudo systemctl stop etcd
```sh
$ kill `pgrep etcd`
```
#### Copy the data directory of the now-idle member to the new machine
```
$ tar -cvzf node1.etcd.tar.gz /var/lib/etcd/node1.etcd
$ tar -cvzf infra1.etcd.tar.gz %data_dir%
```
```
$ scp node1.etcd.tar.gz core@10.0.1.13:~/
```sh
$ scp infra1.etcd.tar.gz 10.0.1.13:~/
```
#### Update the peer URLs for that member to reflect the new machine
```
```sh
$ curl http://10.0.1.10:2379/v2/members/b4db3bf5e495e255 -XPUT \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.1.13:2380"]}'
```
Or use `etcdctl member update` command
```sh
$ etcdctl member update b4db3bf5e495e255 http://10.0.1.13:2380
```
#### Start etcd on the new machine, using the same configuration and the copy of the data directory
```sh
$ ssh 10.0.1.13
```
$ ssh core@10.0.1.13
```sh
$ tar -xzvf infra1.etcd.tar.gz -C %data_dir%
```
```
$ tar -xzvf node1.etcd.tar.gz -C /var/lib/etcd
```
```
etcd -name node1 \
etcd -name infra1 \
-listen-peer-urls http://10.0.1.13:2380 \
-listen-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379
```
[change peer url]: other_apis.md#change-the-peer-urls-of-a-member
[change peer url]: runtime-configuration.md#update-a-member
### Disaster Recovery
etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N/2)-1_ permanent failures (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would be normally impossible for the cluster to restore quorum and continue functioning.
etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N-1)/2_ permanent failures (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would be normally impossible for the cluster to restore quorum and continue functioning.
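The _(N-1)/2_ bound can be checked with a few lines of arithmetic. This is a small sketch with hypothetical function names; it assumes the division rounds down and that quorum means a strict majority of members.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Permanent failures a cluster can absorb while keeping quorum."""
    return (members - 1) // 2


for n in (3, 4, 5, 7):
    print(n, quorum(n), failure_tolerance(n))
```

A three-member cluster tolerates one permanent failure under this bound, which is why losing two members at once leaves it without quorum.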
To recover from such scenarios, etcd provides functionality to backup and restore the datastore and recreate the cluster without data loss.
@ -149,8 +228,8 @@ The first step of the recovery is to backup the data directory on a functioning
```sh
etcdctl backup \
--data-dir /var/lib/etcd \
--backup-dir /tmp/etcd_backup
--data-dir %data_dir% \
--backup-dir %backup_data_dir%
```
This command will rewrite some of the metadata contained in the backup (specifically, the node ID and cluster ID), which means that the node will lose its former identity. In order to recreate a cluster from the backup, you will need to start a new, single-node cluster. The metadata is rewritten to prevent the new node from inadvertently being joined onto an existing cluster.
@ -161,7 +240,7 @@ To restore a backup using the procedure created above, start etcd with the `-for
```sh
etcd \
-data-dir=/tmp/etcd_backup \
-data-dir=%backup_data_dir% \
-force-new-cluster \
...
```
@ -172,18 +251,18 @@ Once you have verified that etcd has started successfully, shut it down and move
```sh
pkill etcd
rm -fr /var/lib/etcd
mv /tmp/etcd_backup /var/lib/etcd
rm -fr %data_dir%
mv %backup_data_dir% %data_dir%
etcd \
-data-dir=/var/lib/etcd \
-data-dir=%data_dir% \
...
```
#### Restoring the cluster
Now that the node is running successfully, you should [change its advertised peer URLs](other_apis.md#change-the-peer-urls-of-a-member), as the `--force-new-cluster` has set the peer URL to the default (listening on localhost).
Now that the node is running successfully, you should [change its advertised peer URLs](runtime-configuration.md#update-a-member), as the `--force-new-cluster` has set the peer URL to the default (listening on localhost).
You can then add more nodes to the cluster and restore resiliency. See the [runtime configuration](runtime-configuration.md) guide for more details.
You can then add more nodes to the cluster and restore resiliency. See the [add a new member](runtime-configuration.md#add-a-new-member) guide for more details. **NB:** If you are trying to restore your cluster using old failed etcd nodes, please make sure you have stopped old etcd instances and removed their old data directories specified by the data-dir configuration parameter.
### Client Request Timeout

View File

@ -82,7 +82,7 @@ X-Raft-Term: 1
- `X-Raft-Index` is similar to the etcd index but is for the underlying raft protocol
- `X-Raft-Term` is an integer that will increase whenever an etcd master election happens in the cluster. If this number is increasing rapidly, you may need to tune the election timeout. See the [tuning][tuning] section for details.
[tuning]: #tuning
[tuning]: tuning.md
### Get the value of a key
@ -356,6 +356,13 @@ So the first watch after the get should be:
curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=2008'
```
#### Connection being closed prematurely
The server may close a long polling connection before emitting any events.
This can happen due to a timeout or the server being shut down.
Since the HTTP header is sent immediately upon accepting the connection, the response will be seen as empty: `200 OK` and empty body.
The clients should be prepared to deal with this scenario and retry the watch.
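One way a client might handle this is to treat an empty body as "no event yet" and issue the watch again. The sketch below is illustrative only: `watch_with_retry` and its parameters are hypothetical, and `fetch` stands in for whatever function performs the long-polling GET and returns the response body.

```python
import time
from typing import Callable, Optional


def watch_with_retry(
    fetch: Callable[[], str],
    max_attempts: int = 5,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Call fetch until it returns a non-empty body.

    An empty body models the case where the server closed the
    long-polling connection before emitting an event. Returns None
    if every attempt came back empty.
    """
    for _ in range(max_attempts):
        body = fetch()
        if body.strip():
            return body
        time.sleep(delay_seconds)
    return None


# Simulated server: two premature closes, then a real event.
responses = iter(["", "", '{"action": "set"}'])
print(watch_with_retry(lambda: next(responses)))
```

A real client would normally also keep passing the same `waitIndex` on each retry so that no event is missed between attempts.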
### Atomically Creating In-Order Keys
Using `POST` on a directory, you can create keys with key names that are created in-order.
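The example responses in this section show key names as 20-digit, zero-padded indexes. The sketch below reproduces that formatting and shows one practical effect: padded names sort lexically in the same order as their numeric indexes. The helper name `in_order_key` is hypothetical, and the fixed width of 20 is inferred from the sample responses rather than from a stated rule.

```python
def in_order_key(directory: str, index: int) -> str:
    """Format an in-order key name with a 20-digit zero-padded index."""
    return "%s/%020d" % (directory, index)


unpadded = sorted(["/queue/6", "/queue/29"])
padded = sorted([in_order_key("/queue", 6), in_order_key("/queue", 29)])

# Plain string sorting puts "29" before "6" without padding.
print(unpadded)
# With padding, string order matches creation order.
print(padded)
```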
@ -373,7 +380,7 @@ curl http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job1
"action": "create",
"node": {
"createdIndex": 6,
"key": "/queue/6",
"key": "/queue/00000000000000000006",
"modifiedIndex": 6,
"value": "Job1"
}
@ -392,7 +399,7 @@ curl http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job2
"action": "create",
"node": {
"createdIndex": 29,
"key": "/queue/29",
"key": "/queue/00000000000000000029",
"modifiedIndex": 29,
"value": "Job2"
}
@ -416,13 +423,13 @@ curl -s 'http://127.0.0.1:2379/v2/keys/queue?recursive=true&sorted=true'
"nodes": [
{
"createdIndex": 2,
"key": "/queue/2",
"key": "/queue/00000000000000000002",
"modifiedIndex": 2,
"value": "Job1"
},
{
"createdIndex": 3,
"key": "/queue/3",
"key": "/queue/00000000000000000003",
"modifiedIndex": 3,
"value": "Job2"
}
@ -465,7 +472,7 @@ curl http://127.0.0.1:2379/v2/keys/dir -XPUT -d ttl=30 -d dir=true -d prevExist=
Keys that are under this directory work as usual, but when the directory expires, a watcher on a key under the directory will get an expire event:
```sh
curl 'http://127.0.0.1:2379/v2/keys/dir/asdf?wait=true'
curl 'http://127.0.0.1:2379/v2/keys/dir?wait=true'
```
```json

View File

@ -24,49 +24,6 @@ https://github.com/coreos/etcd/blob/master/Documentation/configuration.md.
The default data dir location has changed from {$hostname}.etcd to {name}.etcd.
## Data Directory Migration
The disk format within the data directory changed with etcd 2.0.
If you run etcd 2.0 on an etcd 0.4 data directory it will automatically migrate the data and start.
You will want to coordinate this upgrade by walking through each of your machines in the cluster, stopping etcd 0.4 and then starting etcd 2.0.
If you would rather manually do the migration, to test it out first in another environment, you can use the [migration tool doc][migrationtooldoc].
[migrationtooldoc]: https://github.com/coreos/etcd/blob/master/tools/etcd-migrate/README.md
## Snapshot Migration
If you are only interested in the data in etcd you can migrate a snapshot of your data from a v0.4.9+ cluster into a new etcd 2.0 cluster using a snapshot migration.
The advantage of this method is that you are directly dumping only the etcd data so you can run your old and new cluster side-by-side, snapshot the data, import it and then point your applications at this cluster.
The disadvantage is that the etcd indexes of your data will change which may confuse applications that use etcd.
To get started get the newest data snapshot from the 0.4.9+ cluster:
```
curl http://cluster.example.com:4001/v2/migration/snapshot > backup.snap
```
Now, import the snapshot into your new cluster:
```
etcdctl -C new_cluster.example.com import --snap backup.snap
```
If you have a large amount of data, you can specify more concurrent works to copy data in parallel by using `-c` flag.
If you have hidden keys to copy, you can use `--hidden` flag to specify.
And the data will quickly copy into the new cluster:
```
entering dir: /
entering dir: /foo
entering dir: /foo/bar
copying key: /foo/bar/1 1
entering dir: /
entering dir: /foo2
entering dir: /foo2/bar2
copying key: /foo2/bar2/2 2
```
## Key-Value API
### Read consistency flag

View File

@ -2,4 +2,12 @@
etcd benchmarks will be published regularly and tracked for each release below:
- [etcd v2.1.0](etcd-2-1-0-benchmarks.md)
- [etcd v2.1.0-alpha](./etcd-2-1-0-alpha-benchmarks.md)
- [etcd v2.2.0-rc](./etcd-2-2-0-rc-benchmarks.md)
- [etcd v3 demo](./etcd-3-demo-benchmarks.md)
# Memory Usage Benchmarks
It records expected memory usage in different scenarios.
- [etcd v2.2.0-rc](./etcd-2-2-0-rc-memory-benchmarks.md)

View File

@ -6,7 +6,7 @@ GCE n1-highcpu-2 machine type
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
- etcd version 2.1.0
- etcd version 2.1.0 alpha
## etcd Cluster

View File

@ -0,0 +1,67 @@
## Physical machines
GCE n1-highcpu-2 machine type
- 1x dedicated local SSD mounted under /var/lib/etcd
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
## etcd Cluster
3 etcd 2.2.0-rc members, each runs on a single machine.
Detailed versions:
```
etcd Version: 2.2.0-alpha.1+git
Git SHA: 59a5a7e
Go Version: go1.4.2
Go OS/Arch: linux/amd64
```
Also, we use 3 etcd 2.1.0 alpha-stage members to form a cluster that provides baseline performance. etcd's commit head is at [c7146bd5](https://github.com/coreos/etcd/commits/c7146bd5f2c73716091262edc638401bb8229144), which is the same as the one that we use in [etcd 2.1 benchmark](./etcd-2-1-0-benchmarks.md).
## Testing
Bootstrap another machine and use benchmark tool [boom](https://github.com/rakyll/boom) to send requests to each etcd member. Check [here](../../hack/benchmark/) for instructions.
## Performance
### reading one single key
| key size in bytes | number of clients | target etcd server | read QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|--------------------|----------|---------------|
| 64 | 1 | leader only | 2804 (-5%) | 0.4 (+0%) |
| 64 | 64 | leader only | 17816 (+0%) | 5.7 (-6%) |
| 64 | 256 | leader only | 18667 (-6%) | 20.4 (+2%) |
| 256 | 1 | leader only | 2181 (-15%) | 0.5 (+25%) |
| 256 | 64 | leader only | 17435 (-7%) | 6.0 (+9%) |
| 256 | 256 | leader only | 18180 (-8%) | 21.3 (+3%) |
| 64 | 64 | all servers | 46965 (-4%) | 2.1 (+0%) |
| 64 | 256 | all servers | 55286 (-6%) | 7.4 (+6%) |
| 256 | 64 | all servers | 46603 (-6%) | 2.1 (+5%) |
| 256 | 256 | all servers | 55291 (-6%) | 7.3 (+4%) |
### writing one single key
| key size in bytes | number of clients | target etcd server | write QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|--------------------|-----------|---------------|
| 64 | 1 | leader only | 76 (+22%) | 19.4 (-15%) |
| 64 | 64 | leader only | 2461 (+45%) | 31.8 (-32%) |
| 64 | 256 | leader only | 4275 (+1%) | 69.6 (-10%) |
| 256 | 1 | leader only | 64 (+20%) | 16.7 (-30%) |
| 256 | 64 | leader only | 2385 (+30%) | 31.5 (-19%) |
| 256 | 256 | leader only | 4353 (-3%) | 74.0 (+9%) |
| 64 | 64 | all servers | 2005 (+81%) | 49.8 (-55%) |
| 64 | 256 | all servers | 4868 (+35%) | 81.5 (-40%) |
| 256 | 64 | all servers | 1925 (+72%) | 47.7 (-59%) |
| 256 | 256 | all servers | 4975 (+36%) | 70.3 (-36%) |
### performance changes explanation
- read QPS in most scenarios is decreased by 5~8%. The reason is that etcd records store metrics for each store operation. These metrics are important for monitoring and debugging, so this is acceptable.
- write QPS to leader is increased by 20~30%. This is because we decouple raft main loop and entry apply loop, which avoids them blocking each other.
- write QPS to all servers is increased by 30~80% because follower could receive latest commit index earlier and commit proposals faster.

View File

@ -0,0 +1,47 @@
## Physical machine
GCE n1-standard-2 machine type
- 1x dedicated local SSD mounted under /var/lib/etcd
- 1x dedicated slow disk for the OS
- 7.5 GB memory
- 2x CPUs
## etcd
```
etcd Version: 2.2.0-rc.0+git
Git SHA: 103cb5c
Go Version: go1.5
Go OS/Arch: linux/amd64
```
## Testing
Start 3-member etcd cluster, each of which uses 2 cores.
The length of each key name is always 64 bytes, which is a reasonable estimate of the average key length.
## Memory Maximal Usage
- etcd may use maximal memory if one follower is dead and the leader keeps sending snapshots.
- `max RSS` is the maximal memory usage recorded in 3 runs.
| value bytes | key number | data size(MB) | max RSS(MB) | max RSS/data rate on leader |
|-------------|-------------|---------------|-------------|-----------------------------|
| 128 | 50000 | 6 | 433 | 72x |
| 128 | 100000 | 12 | 659 | 54x |
| 128 | 200000 | 24 | 1466 | 61x |
| 1024 | 50000 | 48 | 1253 | 26x |
| 1024 | 100000 | 96 | 2344 | 24x |
| 1024 | 200000 | 192 | 4361 | 22x |
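The last column can be reproduced from the two columns before it. The sketch below assumes the ratio is max RSS divided by data size with the fraction dropped, which matches every row of the table above; the function name is hypothetical.

```python
# (value bytes, key number, data size in MB, max RSS in MB), from the table.
ROWS = [
    (128, 50000, 6, 433),
    (128, 100000, 12, 659),
    (128, 200000, 24, 1466),
    (1024, 50000, 48, 1253),
    (1024, 100000, 96, 2344),
    (1024, 200000, 192, 4361),
]


def rss_to_data_ratio(data_mb: int, max_rss_mb: int) -> int:
    """Max RSS expressed as a whole multiple of the data size."""
    return max_rss_mb // data_mb


for value_bytes, keys, data_mb, rss_mb in ROWS:
    print(value_bytes, keys, "%dx" % rss_to_data_ratio(data_mb, rss_mb))
```

Read this way, the overhead per megabyte of data is noticeably higher for the small 128-byte values than for the 1024-byte values.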
## Data Size Threshold
- When etcd reaches the data size threshold, it may easily trigger leader elections and drop some proposals.
- In most cases, the etcd cluster should work smoothly if it doesn't hit the threshold. If it doesn't work well due to insufficient resources, you need to decrease its data size.
| value bytes | key number limitation | suggested data size threshold(MB) | consumed RSS(MB) |
|-------------|-----------------------|-----------------------------------|------------------|
| 128 | 400K | 48 | 2400 |
| 1024 | 300K | 292 | 6500 |

View File

@ -0,0 +1,40 @@
## Physical machines
GCE n1-highcpu-2 machine type
- 1x dedicated local SSD mounted under /var/lib/etcd
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
- etcd version 2.2.0
## etcd Cluster
1 etcd member running in v3 demo mode
## Testing
Use [etcd v3 benchmark tool](../../hack/v3benchmark/).
## Performance
### reading one single key
| key size in bytes | number of clients | read QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|----------|---------------|
| 256 | 1 | 2716 | 0.4 |
| 256 | 64 | 16623 | 6.1 |
| 256 | 256 | 16622 | 21.7 |
The performance is nearly the same as the one with empty server handler.
### reading one single key after putting
| key size in bytes | number of clients | read QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|----------|---------------|
| 256 | 1 | 2269 | 0.5 |
| 256 | 64 | 13582 | 8.6 |
| 256 | 256 | 13262 | 47.5 |
The performance with an empty server handler is not affected by one put, so the
performance degradation is likely caused by the storage package.

View File

@ -4,7 +4,7 @@
Starting an etcd cluster statically requires that each member knows another in the cluster. In a number of cases, you might not know the IPs of your cluster members ahead of time. In these cases, you can bootstrap an etcd cluster with the help of a discovery service.
Once an etcd cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md).
Once an etcd cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md). To better understand the design behind runtime reconfiguration, we suggest you read [this](runtime-reconf-design.md).
This guide will cover the following mechanisms for bootstrapping an etcd cluster:
@ -38,6 +38,8 @@ Note that the URLs specified in `initial-cluster` are the _advertised peer URLs_
If you are spinning up multiple clusters (or creating and destroying a single cluster) with same configuration for testing purpose, it is highly recommended that you specify a unique `initial-cluster-token` for the different clusters. By doing this, etcd can generate unique cluster IDs and member IDs for the clusters even if they otherwise have the exact same configuration. This can protect you from cross-cluster-interaction, which might corrupt your clusters.
etcd listens on [`listen-client-urls`](configuration.md#-listen-client-urls) to accept client traffic. Each etcd member advertises the URLs specified in [`advertise-client-urls`](configuration.md#-advertise-client-urls) to other members, proxies, and clients. Please make sure the `advertise-client-urls` are reachable from intended clients. A common mistake is setting `advertise-client-urls` to localhost or leaving it at the default when you want remote clients to reach etcd.
On each machine you would start etcd with these flags:
```
@ -122,6 +124,8 @@ There two methods that can be used for discovery:
### etcd Discovery
To better understand the design of the discovery service protocol, we suggest you read [this](./discovery_protocol.md).
#### Lifetime of a Discovery URL
A discovery URL identifies a unique etcd cluster. Instead of reusing a discovery URL, you should always create discovery URLs for new clusters.
@ -144,6 +148,8 @@ If you bootstrap an etcd cluster using discovery service with more than the expe
The URL you will use in this case will be `https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83` and the etcd members will use the `https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83` directory for registration as they start.
Each member must have a different name flag specified. Otherwise, discovery will fail due to duplicate names.
Now we start etcd with those relevant flags for each member:
```
@ -194,6 +200,8 @@ ETCD_DISCOVERY=https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573d
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
Each member must have a different name flag specified. Otherwise, discovery will fail due to duplicate names.
Now we start etcd with those relevant flags for each member:
```
@ -296,6 +304,8 @@ infra2.example.com. 300 IN A 10.0.1.12
etcd cluster members can listen on domain names or IP address, the bootstrap process will resolve DNS A records.
The resolved address in `-initial-advertise-peer-urls` *must match* one of the resolved addresses in the SRV targets. The etcd member reads the resolved address to find out if it belongs to the cluster defined in the SRV records.
```
$ etcd -name infra0 \
-discovery-srv example.com \
@ -372,6 +382,10 @@ DNS SRV records can also be used to configure the list of peers for an etcd serv
$ etcd --proxy on -discovery-srv example.com
```
#### Error Cases
You might see an error like `cannot find local etcd $name from SRV records.` That means the etcd member failed to find itself in the cluster defined in the SRV records. The resolved address in `-initial-advertise-peer-urls` *must match* one of the resolved addresses in the SRV targets.
# 0.4 to 2.0+ Migration Guide
In etcd 2.0 we introduced the ability to listen on more than one address and to advertise multiple addresses. This makes using etcd easier when you have complex networking, such as private and public networks on various cloud providers.

View File

@ -13,45 +13,64 @@ To start etcd automatically using custom settings at startup in Linux, using a [
##### -name
+ Human-readable name for this member.
+ default: "default"
+ env variable: ETCD_NAME
+ This value is referenced as this node's own entries listed in the `-initial-cluster` flag (Ex: `default=http://localhost:2380` or `default=http://localhost:2380,default=http://localhost:7001`). This needs to match the key used in the flag if you're using [static bootstrapping](clustering.md#static).
##### -data-dir
+ Path to the data directory.
+ default: "${name}.etcd"
+ env variable: ETCD_DATA_DIR
##### -wal-dir
+ Path to the dedicated wal directory. If this flag is set, etcd will write the WAL files to the walDir rather than the dataDir. This allows a dedicated disk to be used, and helps avoid IO competition between logging and other IO operations.
+ default: ""
+ env variable: ETCD_WAL_DIR
##### -snapshot-count
+ Number of committed transactions to trigger a snapshot to disk.
+ default: "10000"
+ env variable: ETCD_SNAPSHOT_COUNT
##### -heartbeat-interval
+ Time (in milliseconds) of a heartbeat interval.
+ default: "100"
+ env variable: ETCD_HEARTBEAT_INTERVAL
##### -election-timeout
+ Time (in milliseconds) for an election to timeout.
+ Time (in milliseconds) for an election to timeout. See [Documentation/tuning.md](tuning.md#time-parameters) for details.
+ default: "1000"
+ env variable: ETCD_ELECTION_TIMEOUT
##### -listen-peer-urls
+ List of URLs to listen on for peer traffic.
+ List of URLs to listen on for peer traffic. This flag tells the etcd to accept incoming requests from its peers on the specified scheme://IP:port combinations. Scheme can be either http or https. If 0.0.0.0 is specified as the IP, etcd listens to the given port on all interfaces. If an IP address is given as well as a port, etcd will listen on the given port and interface. Multiple URLs may be used to specify a number of addresses and ports to listen on. The etcd will respond to requests from any of the listed addresses and ports.
+ default: "http://localhost:2380,http://localhost:7001"
+ env variable: ETCD_LISTEN_PEER_URLS
+ example: "http://10.0.0.1:2380"
+ invalid example: "http://example.com:2380" (domain name is invalid for binding)
##### -listen-client-urls
+ List of URLs to listen on for client traffic.
+ List of URLs to listen on for client traffic. This flag tells the etcd to accept incoming requests from the clients on the specified scheme://IP:port combinations. Scheme can be either http or https. If 0.0.0.0 is specified as the IP, etcd listens to the given port on all interfaces. If an IP address is given as well as a port, etcd will listen on the given port and interface. Multiple URLs may be used to specify a number of addresses and ports to listen on. The etcd will respond to requests from any of the listed addresses and ports.
+ default: "http://localhost:2379,http://localhost:4001"
+ env variable: ETCD_LISTEN_CLIENT_URLS
+ example: "http://10.0.0.1:2379"
+ invalid example: "http://example.com:2379" (domain name is invalid for binding)
##### -max-snapshots
+ Maximum number of snapshot files to retain (0 is unlimited)
+ default: 5
+ env variable: ETCD_MAX_SNAPSHOTS
+ The default for users on Windows is unlimited, and manual purging down to 5 (or your preference for safety) is recommended.
##### -max-wals
+ Maximum number of wal files to retain (0 is unlimited)
+ default: 5
+ env variable: ETCD_MAX_WALS
+ The default for users on Windows is unlimited, and manual purging down to 5 (or your preference for safety) is recommended.
##### -cors
+ Comma-separated white list of origins for CORS (cross-origin resource sharing).
+ default: none
+ env variable: ETCD_CORS
### Clustering Flags
@ -61,43 +80,55 @@ To start etcd automatically using custom settings at startup in Linux, using a [
##### -initial-advertise-peer-urls
+ List of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster. At least one must be routable to all cluster members.
+ List of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster. At least one must be routable to all cluster members. These URLs can contain domain names.
+ default: "http://localhost:2380,http://localhost:7001"
+ env variable: ETCD_INITIAL_ADVERTISE_PEER_URLS
+ example: "http://example.com:2380, http://10.0.0.1:2380"
##### -initial-cluster
+ Initial cluster configuration for bootstrapping.
+ default: "default=http://localhost:2380,default=http://localhost:7001"
+ env variable: ETCD_INITIAL_CLUSTER
+ The key is the value of the `-name` flag for each node provided. The default uses `default` for the key because this is the default for the `-name` flag.
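
To make the key/value structure of `-initial-cluster` concrete, here is a small sketch that groups the peer URLs by member name. The function name is hypothetical and the parsing is deliberately simple; it does not reproduce etcd's own parser.

```python
def parse_initial_cluster(value):
    """Group the peer URLs of an -initial-cluster value by member name."""
    members = {}
    for entry in value.split(","):
        # Each entry has the form <name>=<peer URL>; split on the first "=".
        name, sep, url = entry.strip().partition("=")
        if not sep or not name or not url:
            raise ValueError("malformed entry: %r" % entry)
        members.setdefault(name, []).append(url)
    return members
```

With the default value, both URLs end up under the single key `default`, which matches the default of the `-name` flag.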
##### -initial-cluster-state
+ Initial cluster state ("new" or "existing"). Set to `new` for all members present during initial static or DNS bootstrapping. If this option is set to `existing`, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.
+ default: "new"
+ env variable: ETCD_INITIAL_CLUSTER_STATE
[static bootstrap]: clustering.md#static
##### -initial-cluster-token
+ Initial cluster token for the etcd cluster during bootstrap.
+ default: "etcd-cluster"
+ env variable: ETCD_INITIAL_CLUSTER_TOKEN
##### -advertise-client-urls
+ List of this member's client URLs to advertise to the rest of the cluster.
+ List of this member's client URLs to advertise to the rest of the cluster. These URLs can contain domain names.
+ default: "http://localhost:2379,http://localhost:4001"
+ env variable: ETCD_ADVERTISE_CLIENT_URLS
+ example: "http://example.com:2379, http://10.0.0.1:2379"
+ Be careful if you are advertising URLs such as http://localhost:2379 from a cluster member and are using the proxy feature of etcd. This will cause loops, because the proxy will be forwarding requests to itself until its resources (memory, file descriptors) are eventually depleted.
##### -discovery
+ Discovery URL used to bootstrap the cluster.
+ default: none
+ env variable: ETCD_DISCOVERY
##### -discovery-srv
+ DNS srv domain used to bootstrap the cluster.
+ default: none
+ env variable: ETCD_DISCOVERY_SRV
##### -discovery-fallback
+ Expected behavior ("exit" or "proxy") when the discovery service fails.
+ default: "proxy"
+ env variable: ETCD_DISCOVERY_FALLBACK
##### -discovery-proxy
+ HTTP proxy to use for traffic to discovery service.
+ default: none
+ env variable: ETCD_DISCOVERY_PROXY
### Proxy Flags
@ -106,81 +137,99 @@ To start etcd automatically using custom settings at startup in Linux, using a [
##### -proxy
+ Proxy mode setting ("off", "readonly" or "on").
+ default: "off"
+ env variable: ETCD_PROXY
##### -proxy-failure-wait
+ Time (in milliseconds) an endpoint will be held in a failed state before being reconsidered for proxied requests.
+ default: 5000
+ env variable: ETCD_PROXY_FAILURE_WAIT
##### -proxy-refresh-interval
+ Time (in milliseconds) of the endpoints refresh interval.
+ default: 30000
+ env variable: ETCD_PROXY_REFRESH_INTERVAL
##### -proxy-dial-timeout
+ Time (in milliseconds) for a dial to timeout or 0 to disable the timeout
+ default: 1000
+ env variable: ETCD_PROXY_DIAL_TIMEOUT
##### -proxy-write-timeout
+ Time (in milliseconds) for a write to timeout or 0 to disable the timeout.
+ default: 5000
+ env variable: ETCD_PROXY_WRITE_TIMEOUT
##### -proxy-read-timeout
+ Time (in milliseconds) for a read to timeout or 0 to disable the timeout.
+ Don't change this value if you use watches because they are using long polling requests.
+ default: 0
+ env variable: ETCD_PROXY_READ_TIMEOUT
### Security Flags
The security flags help to [build a secure etcd cluster][security].
##### -ca-file [DEPRECATED]
+ Path to the client server TLS CA file.
+ Path to the client server TLS CA file. `-ca-file ca.crt` could be replaced by `-trusted-ca-file ca.crt -client-cert-auth` and etcd will perform the same.
+ default: none
+ env variable: ETCD_CA_FILE
##### -cert-file
+ Path to the client server TLS cert file.
+ default: none
+ env variable: ETCD_CERT_FILE
##### -key-file
+ Path to the client server TLS key file.
+ default: none
+ env variable: ETCD_KEY_FILE
##### -client-cert-auth
+ Enable client cert authentication.
+ default: false
+ env variable: ETCD_CLIENT_CERT_AUTH
##### -trusted-ca-file
+ Path to the client server TLS trusted CA key file.
+ default: none
+ env variable: ETCD_TRUSTED_CA_FILE
##### -peer-ca-file [DEPRECATED]
+ Path to the peer server TLS CA file.
+ Path to the peer server TLS CA file. `-peer-ca-file ca.crt` could be replaced by `-peer-trusted-ca-file ca.crt -peer-client-cert-auth` and etcd will perform the same.
+ default: none
+ env variable: ETCD_PEER_CA_FILE
##### -peer-cert-file
+ Path to the peer server TLS cert file.
+ default: none
+ env variable: ETCD_PEER_CERT_FILE
##### -peer-key-file
+ Path to the peer server TLS key file.
+ default: none
+ env variable: ETCD_PEER_KEY_FILE
##### -peer-client-cert-auth
+ Enable peer client cert authentication.
+ default: false
+ env variable: ETCD_PEER_CLIENT_CERT_AUTH
##### -peer-trusted-ca-file
+ Path to the peer server TLS trusted CA file.
+ default: none
+ env variable: ETCD_PEER_TRUSTED_CA_FILE
### Logging Flags
##### -debug
+ Drop the default log level to DEBUG for all subpackages.
+ default: false (INFO for all packages)
+ env variable: ETCD_DEBUG
##### -log-package-levels
+ Set individual etcd subpackages to specific log levels. An example being `etcdserver=WARNING,security=DEBUG`
+ default: none (INFO for all packages)
+ env variable: ETCD_LOG_PACKAGE_LEVELS
### Unsafe Flags
@ -192,6 +241,14 @@ Follow the instructions when using these flags.
##### -force-new-cluster
+ Force the creation of a new one-member cluster. It forcibly commits configuration changes that remove all existing members from the cluster and add itself. It needs to be set to [restore a backup][restore].
+ default: false
+ env variable: ETCD_FORCE_NEW_CLUSTER
### Experimental Flags
##### -experimental-v3demo
+ Enable experimental [v3 demo API](rfc/v3api.proto).
+ default: false
+ env variable: ETCD_EXPERIMENTAL_V3DEMO
### Miscellaneous Flags


@ -0,0 +1,109 @@
# etcd release guide
This guide describes how to release a new version of etcd.
The procedure includes some manual steps for sanity checking but it can probably be further scripted. Please keep this document up-to-date if you want to make changes to the release process.
## Prepare Release
Set the desired version as an environment variable for the following steps. Here is an example for releasing 2.1.3:
```
export VERSION=v2.1.3
export PREV_VERSION=v2.1.2
```
All release version numbers follow the format of [semantic versioning 2.0.0](http://semver.org/).
### Major, Minor Version Release, or its Pre-release
- Ensure the relevant milestone on GitHub is complete. All referenced issues should be closed, or moved elsewhere.
- Remove this release from [roadmap](https://github.com/coreos/etcd/blob/master/ROADMAP.md), if necessary.
- Ensure the latest upgrade documentation is available.
- Bump [hardcoded MinClusterVersion in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L29), if necessary.
- Add feature capability maps for the new version, if necessary.
### Patch Version Release
- Discuss which commits should be backported to the patch release. The commits should not include merge commits.
- Cherry-pick these commits into the stable branch, starting from the oldest one.
## Write Release Note
- Write an introduction for the new release. For example, which major bugs were fixed, which new features were introduced or which performance improvements were made.
- Write the changelog since the last release. The changelog should be straightforward and easy for the end-user to understand.
- Put `[GH XXXX]` at the head of each change line to reference the Pull Request that introduces the change, and make it a link to that Pull Request.
## Tag Version
- Bump [hardcoded Version in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L30) to the latest version `${VERSION}`.
- Ensure all tests on the CI system pass.
- Manually check that etcd builds on Linux, Darwin and Windows.
- Manually check that upgrading an etcd cluster from the previous minor version works well.
- Manually check new features work well.
- Add a signed tag through `git tag -s ${VERSION}`.
- Sanity check tag correctness through `git show tags/$VERSION`.
- Push the tag to GitHub through `git push origin tags/$VERSION`. This assumes `origin` corresponds to "https://github.com/coreos/etcd".
## Build Release Binaries and Images
- Ensure `actool` is available, or install it through `go get github.com/appc/spec/actool`.
- Ensure `docker` is available.
Run release script in root directory:
```
./scripts/release.sh ${VERSION}
```
It generates all release binaries and images under directory ./release.
## Sign Binaries and Images
Choose appropriate private key to sign the generated binaries and images.
The following commands are used to sign a public release:
```
cd release
# personal GPG is okay for now
for i in etcd-*{.zip,.tar.gz}; do gpg --sign ${i}; done
# use `CoreOS ACI Builder <release@coreos.com>` secret key
gpg -u 88182190 -a --output etcd-${VERSION}-linux-amd64.aci.asc --detach-sig etcd-${VERSION}-linux-amd64.aci
```
## Publish Release Page in GitHub
- Set release title as the version name.
- Follow the format of previous release pages.
- Attach the generated binaries, aci image and signatures.
- Select whether it is a pre-release.
- Publish the release!
## Publish Docker Image in Quay.io
- Push docker image:
```
docker login quay.io
docker push quay.io/coreos/etcd:${VERSION}
```
- Add `latest` tag to the new image on [quay.io](https://quay.io/repository/coreos/etcd?tag=latest&tab=tags) if this is a stable release.
## Announce to etcd-dev Googlegroup
- Follow the format of [previous release emails](https://groups.google.com/forum/#!forum/etcd-dev).
- Make sure to include a list of authors that contributed since the previous release - something like the following might be handy:
```
git log ...${PREV_VERSION} --pretty=format:"%an" | sort | uniq | tr '\n' ',' | sed -e 's#,#, #g' -e 's#, $##'
```
- Send email to etcd-dev@googlegroups.com
## Post Release
- Create new stable branch through `git push origin ${VERSION_MAJOR}.${VERSION_MINOR}` if this is a major stable release. This assumes `origin` corresponds to "https://github.com/coreos/etcd".
- Bump [hardcoded Version in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L30) to the version `${VERSION}+git`.
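
The Post Release step refers to `${VERSION_MAJOR}` and `${VERSION_MINOR}`, which are not set earlier in this guide. As a sketch only, the components can be derived from a tag such as `v2.1.3`. The helper names and the three-way classification are illustrative assumptions, not part of `scripts/release.sh`.

```python
import re

# Matches tags such as "v2.1.3" or "v2.2.0-rc.0" (pre-release suffix optional).
_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def split_version(tag):
    """Split a release tag into (major, minor, patch, pre_release_or_None)."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError("not a release tag: %r" % tag)
    major, minor, patch = (int(g) for g in match.groups()[:3])
    return major, minor, patch, match.group(4)


def release_kind(tag):
    """Classify a tag so the matching checklist in this guide can be followed."""
    _, _, patch, pre_release = split_version(tag)
    if pre_release is not None:
        return "pre-release"
    # Assumption: a non-zero patch number means a patch version release.
    return "patch" if patch > 0 else "major-or-minor"
```

For example, `split_version("v2.1.3")` yields the major and minor numbers needed for the stable branch name.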


@ -0,0 +1,109 @@
# Discovery Service Protocol
The discovery service protocol helps a new etcd member discover all other members during the cluster bootstrap phase using a shared discovery URL.
The discovery service protocol is _only_ used in the cluster bootstrap phase, and cannot be used for runtime reconfiguration or cluster monitoring.
The protocol uses a new discovery token to bootstrap one _unique_ etcd cluster. Remember that one discovery token can represent only one etcd cluster. Once the discovery protocol has started on a token, even if it fails halfway, the token must not be used to bootstrap another etcd cluster.
The rest of this article will walk through the discovery process with examples that correspond to a self-hosted discovery cluster. The public discovery service, discovery.etcd.io, functions the same way, but with a layer of polish to abstract away ugly URLs, generate UUIDs automatically, and provide some protections against excessive requests. At its core, the public discovery service still uses an etcd cluster as the data store as described in this document.
## The Protocol Workflow
The idea of the discovery protocol is to use an internal etcd cluster to coordinate the bootstrap of a new cluster. First, all new members interact with the discovery service and help to generate the expected member list. Then each new member bootstraps its server using this list, which serves the same purpose as the `-initial-cluster` flag.
In the following example workflow, we will list each step of protocol in curl format for ease of understanding.
By convention the etcd discovery protocol uses the key prefix `_etcd/registry`. If `http://example.com` hosts an etcd cluster for the discovery service, the full URL to the discovery keyspace will be `http://example.com/v2/keys/_etcd/registry`. We will use this as the URL prefix in the example.
### Creating a New Discovery Token
Generate a unique token that will identify the new cluster. This will be used as a unique prefix in discovery keyspace in the following steps. An easy way to do this is to use `uuidgen`:
```
UUID=$(uuidgen)
```
### Specifying the Expected Cluster Size
You need to specify the expected cluster size for this discovery token. The size is used by the discovery service to know when it has found all members that will initially form the cluster.
```
curl -X PUT http://example.com/v2/keys/_etcd/registry/${UUID}/_config/size -d value=${cluster_size}
```
Usually the cluster size is 3, 5 or 7. Check [optimal cluster size](admin_guide.md#optimal-cluster-size) for more details.
### Bringing up etcd Processes
Now that you have your discovery URL, you can use it as the `-discovery` flag and bring up etcd processes. Every etcd process will follow these next few steps internally if given a `-discovery` flag.
### Registering itself
The first thing an etcd process does is register itself under the discovery URL as a member. This is done by creating its member ID as a key under the discovery URL.
```
curl -X PUT http://example.com/v2/keys/_etcd/registry/${UUID}/${member_id}?prevExist=false -d value="${member_name}=${member_peer_url_1}&${member_name}=${member_peer_url_2}"
```
### Checking the Status
It checks the expected cluster size and registration status in discovery URL, and decides what the next action is.
```
curl -X GET http://example.com/v2/keys/_etcd/registry/${UUID}/_config/size
curl -X GET http://example.com/v2/keys/_etcd/registry/${UUID}
```
If there are not yet enough registered members, it will wait for the remaining members to appear.
If the number of registered members has reached or exceeds the expected size N, it treats the first N registered members as the member list for the cluster. If the member itself is in the member list, the discovery procedure succeeds and it fetches all peers through the member list. If it is not in the member list, the discovery procedure fails because the cluster is already full.
In the etcd implementation, the member may check the cluster status even before registering itself, so it can fail quickly if the cluster is already full.
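
The decision made in this step can be sketched as a small function. This is an illustration of the rules described above rather than etcd's actual code: the function name, the dictionary layout and the use of a `created_index` field to define "first registered" are assumptions.

```python
def check_discovery_status(registered, expected_size, self_id):
    """Decide the next action for a member during discovery.

    registered: list of {"id": ..., "created_index": ...} entries.
    Returns ("wait", []), ("ok", member_ids) or ("full", []).
    """
    # Assumption: registration order is given by an increasing created_index.
    ordered = sorted(registered, key=lambda m: m["created_index"])
    if len(ordered) < expected_size:
        return ("wait", [])  # not enough members yet, keep waiting
    member_ids = [m["id"] for m in ordered[:expected_size]]
    if self_id in member_ids:
        return ("ok", member_ids)  # this member is part of the cluster
    return ("full", [])  # the first N slots are taken by other members
```

A member that gets `"wait"` would go on to watch the discovery key, as described in the next section.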
### Waiting for All Members
The wait process is described in detail [here](https://github.com/coreos/etcd/blob/master/Documentation/api.md#waiting-for-a-change).
```
curl -X GET "http://example.com/v2/keys/_etcd/registry/${UUID}?wait=true&waitIndex=${current_etcd_index}"
```
It keeps waiting until all members have been found.
## Public Discovery Service
CoreOS Inc. hosts a public discovery service at https://discovery.etcd.io/ , which provides some nice features for ease of use.
### Mask Key Prefix
The public discovery service redirects `https://discovery.etcd.io/${UUID}` to the key under `/v2/keys/_etcd/registry` in the etcd cluster behind it. This masks the registry key prefix and keeps the discovery URL short and readable.
### Get new token
```
GET /new
Sent query:
size=${cluster_size}
Possible status codes:
200 OK
400 Bad Request
200 Body:
generated discovery url
```
The generation process in the service follows the steps from [Creating a New Discovery Token](#creating-a-new-discovery-token) to [Specifying the Expected Cluster Size](#specifying-the-expected-cluster-size).
### Check Discovery Status
```
GET /${UUID}
```
You can check the status for this discovery token, including the machines that have been registered, by requesting the value of the UUID.
### Open-source repository
The repository is located at https://github.com/coreos/discovery.etcd.io. You could use it to build your own public discovery service.

80
Documentation/faq.md Normal file

@ -0,0 +1,80 @@
# FAQ
## 1) How come I can read an old version of the data when a majority of the members are down?
In situations where a client connects to a minority, etcd
by default favors availability over consistency. This means that even though
data might be “out of date”, it is still better to return something versus
nothing.
In order to confirm that a read is up to date with a majority of the cluster,
the client can use the `quorum=true` parameter on reads of keys. This means
that a majority of the cluster is checked on reads before returning the data,
otherwise the read will timeout and fail.
## 2) With quorum=false, doesn't this mean that if my client switched the member it was connected to, it could experience a logical ordering where the cluster goes backwards in time?
Yes, but this could be handled at the etcd client implementation via
remembering the last seen index. The “index” is the cluster's single
irrevocable sequence of the entire modification history. The client could
remember the last seen index, and determine via comparing the index returned on
the GET whether or not the state of the key-value pair is before or after its
last seen state.
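
A minimal sketch of the "remember the last seen index" idea follows. The class name is hypothetical, and how the index is obtained from a response (for example from a response header) is left to the client and treated as an assumption here.

```python
class IndexTracker:
    """Remember the highest index seen so stale reads can be detected."""

    def __init__(self):
        self.last_seen = 0

    def observe(self, index):
        """Record the index of a response.

        Returns True if the response is at least as new as anything seen
        before, or False if it comes from a member that is behind.
        """
        if index < self.last_seen:
            return False  # the cluster appears to have gone back in time
        self.last_seen = index
        return True
```

A client could discard or retry any read for which `observe` returns `False`, for instance after switching to a member that lags behind.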
## 3) What happens if a watch is registered on a minority member?
The watch will stay untriggered, even as modifications are occurring in the
majority quorum. This is an open issue, and is being addressed in v3. There are
multiple ways to work around the watch trigger not firing.
1) build a signaling mechanism independent of etcd. This could be as simple as
a “pulse” to the client to reissue a GET with quorum=true for the most recent
version of the data.
2) poll on the `/v2/keys` endpoint and check that the raft-index is increasing every
timeout.
## 4) What is a proxy used for?
A proxy is a redirection server to the etcd cluster. The proxy handles the
redirection of a client to the current configuration of the etcd cluster. A
typical use case is to start a proxy on a machine, and on first boot up of the
proxy specify both the `--proxy` flag and the `--initial-cluster` flag.
From there, any etcdctl client that starts up automatically speaks to the local
proxy and the proxy redirects operations to the current configuration of the
cluster it was originally paired with.
In the v2 spec of etcd, proxies cannot be promoted to members of the cluster.
They also cannot be promoted to followers or at any point become part of the
replication of the etcd cluster itself.
## 5) How is cluster membership and health handled in etcd v2?
The design goal of etcd is that reconfiguration is simply an API, and health
monitoring and addition/removal of members is up to the individual application
and their integration with the reconfiguration API.
Thus, a member that is down, even infinitely, will never be automatically
removed from the etcd cluster member list.
This makes sense because it's usually an application level / administrative
action to determine whether a reconfiguration should happen based on health.
For more information, refer to [Documentation/runtime-reconfiguration.md].
## 6) How does --peers work with etcdctl?
The `--peers` flag can specify any number of etcd cluster members in a comma
separated list. This list might be a subset, equal to, or more than the actual
etcd cluster member list itself.
If only one peer is specified via the `--peers` flag, the etcdctl discovers the
rest of the cluster via the member list of that one peer, and then it randomly
chooses a member to use. Again, the client can use the `quorum=true` flag on
reads, which will always fail when using a member in the minority.
If peers from multiple clusters are specified via the `--peers` flag, etcdctl
will randomly choose a peer, and the request will simply get routed to one of
the clusters. This is probably not what you want.


@ -45,6 +45,7 @@
**C libraries**
- [jdarcy/etcd-api](https://github.com/jdarcy/etcd-api) - Supports v2
- [shafreeck/cetcd](https://github.com/shafreeck/cetcd) - Supports v2
**C++ libraries**
- [edwardcapriolo/etcdcpp](https://github.com/edwardcapriolo/etcdcpp) - Supports v2


@ -1,12 +1,12 @@
## Proxy
etcd can now run as a transparent proxy. Running etcd as a proxy allows for easily discovery of etcd within your infrastructure, since it can run on each machine as a local service. In this mode, etcd acts as a reverse proxy and forwards client requests to an active etcd cluster. The etcd proxy does not participant in the consensus replication of the etcd cluster, thus it neither increases the resilience nor decreases the write performance of the etcd cluster.
etcd can now run as a transparent proxy. Running etcd as a proxy allows for easy discovery of etcd within your infrastructure, since it can run on each machine as a local service. In this mode, etcd acts as a reverse proxy and forwards client requests to an active etcd cluster. The etcd proxy does not participate in the consensus replication of the etcd cluster, thus it neither increases the resilience nor decreases the write performance of the etcd cluster.
etcd currently supports two proxy modes: `readwrite` and `readonly`. The default mode is `readwrite`, which forwards both read and write requests to the etcd cluster. A `readonly` etcd proxy only forwards read requests to the etcd cluster, and returns `HTTP 501` to all write requests.
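
The difference between the two modes can be sketched as a single decision function. This is an illustration only: the function name is hypothetical, and treating `GET` as the only read method is an assumption rather than a statement about etcd's implementation.

```python
# Assumption: only GET counts as a read request.
READ_METHODS = {"GET"}


def proxy_decision(mode, method):
    """Return ("forward", None) or ("reject", 501) for a proxied request."""
    if mode not in ("readwrite", "readonly"):
        raise ValueError("unknown proxy mode: %r" % mode)
    if mode == "readonly" and method.upper() not in READ_METHODS:
        return ("reject", 501)  # HTTP 501 for writes through a readonly proxy
    return ("forward", None)
```

A `readwrite` proxy forwards every request, while a `readonly` proxy answers write requests itself with `HTTP 501`.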
The proxy will shuffle the list of cluster members periodically to avoid sending all connections to a single member.
The member list used by proxy consists of all client URLs advertised within the cluster, as specified in each members' `-advertise-client-urls` flag. If this flag is set incorrectly, requests sent to the proxy are forwarded to wrong addresses and then fail. The fix for this problem is to restart etcd member with correct `-advertise-client-urls` flag. After client URLs list in proxy is recalculated, which happens every 30 seconds, requests will be forwarded correctly.
The member list used by proxy consists of all client URLs advertised within the cluster, as specified in each members' `-advertise-client-urls` flag. If this flag is set incorrectly, requests sent to the proxy are forwarded to wrong addresses and then fail. Including URLs in the `-advertise-client-urls` flag that point to the proxy itself, e.g. http://localhost:2379, is even more problematic as it will cause loops, because the proxy keeps trying to forward requests to itself until its resources (memory, file descriptors) are eventually depleted. The fix for this problem is to restart etcd member with correct `-advertise-client-urls` flag. After client URLs list in proxy is recalculated, which happens every 30 seconds, requests will be forwarded correctly.
### Using an etcd proxy
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery`).
@ -17,6 +17,7 @@ The proxy will be listening on `listen-client-urls` and forward requests to the
#### Start an etcd proxy with a static configuration
To start a proxy that will connect to a statically defined etcd cluster, specify the `initial-cluster` flag:
```
etcd -proxy on -listen-client-urls http://127.0.0.1:8080 -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
```


@ -0,0 +1,43 @@
## Reporting Bugs
If you find bugs or documentation mistakes in the etcd project, please let us know by [opening an issue](https://github.com/coreos/etcd/issues/new). We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that one does not already exist.
To make your bug report accurate and easy to understand, please try to create bug reports that are:
- Specific. Include as much detail as possible: which version, what environment, what configuration, etc. You can also attach the etcd log (the starting log with the etcd configuration is especially important).
- Reproducible. Include the steps to reproduce the problem. We understand some issues might be hard to reproduce; in that case, please include the steps that might lead to the problem. You can also attach the affected etcd data dir and stack trace to the bug report.
- Isolated. Please try to isolate and reproduce the bug with minimum dependencies. It would significantly slow down the speed to fix a bug if too many dependencies are involved in a bug report. Debugging external systems that rely on etcd is out of scope, but we are happy to point you in the right direction or help you interact with etcd in the correct manner.
- Unique. Do not duplicate an existing bug report.
- Scoped. One bug per report. Do not follow up with another bug inside one report.
You might also want to read [Elika Etemad's article on filing good bug reports](http://fantasai.inkedblade.net/style/talks/filing-good-bugs/) before creating a bug report.
We might ask you for further information to locate a bug. A duplicated bug report will be closed.
## Frequently Asked Questions
### How to get stack trace
``` bash
$ kill -QUIT $PID
```
### How to get etcd version
``` bash
$ etcd --version
```
### How to get etcd configuration and log when it runs as systemd service etcd2.service
``` bash
$ sudo systemctl cat etcd2
$ sudo journalctl -u etcd2
```
Due to an upstream systemd bug, journald may miss the last few log lines when its process exits. If journalctl tells you that etcd stopped without a fatal or panic message, you could try `sudo journalctl -f -t etcd2` to get the full log.


@ -14,14 +14,14 @@
- more efficient/ low cost keep alive
- a logical group of TTL keys
5. Replace CAS/CAD with multi-object Tnx
5. Replace CAS/CAD with multi-object Txn
- MUCH MORE powerful and flexible
6. Support efficient watching with multiple ranges
7. RPC API supports the completed set of APIs.
- more efficient than JSON/HTTP
- additional tnx/lease support
- additional txn/lease support
8. HTTP API supports a subset of APIs.
- easy for people to try out etcd
@ -42,7 +42,7 @@ Put( PutRequest { key = foo, value = bar } )
PutResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 1,
revision = 1,
raft_term = 0x1,
}
```
@ -54,14 +54,14 @@ Get ( RangeRequest { key = foo } )
RangeResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 1,
revision = 1,
raft_term = 0x1,
kvs = {
{
key = foo,
value = bar,
create_index = 1,
mod_index = 1,
create_revision = 1,
mod_revision = 1,
version = 1;
},
},
@ -75,35 +75,35 @@ Range ( RangeRequest { key = foo, end_key = foo80, limit = 30 } )
RangeResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 100,
revision = 100,
raft_term = 0x1,
kvs = {
{
key = foo0,
value = bar0,
create_index = 1,
mod_index = 1,
create_revision = 1,
mod_revision = 1,
version = 1;
},
...,
{
key = foo30,
value = bar30,
create_index = 30,
mod_index = 30,
create_revision = 30,
mod_revision = 30,
version = 1;
},
},
}
```
#### Finish a tnx (assume we have foo0=bar0, foo1=bar1)
#### Finish a txn (assume we have foo0=bar0, foo1=bar1)
```
Tnx(TnxRequest {
// mod_index of foo0 is equal to 1, mod_index of foo1 is greater than 1
Txn(TxnRequest {
// mod_revision of foo0 is equal to 1, mod_revision of foo1 is greater than 1
compare = {
{compareType = equal, key = foo0, mod_index = 1},
{compareType = greater, key = foo1, mod_index = 1}}
{compareType = equal, key = foo0, mod_revision = 1},
{compareType = greater, key = foo1, mod_revision = 1}}
},
// if the comparison succeeds, put foo2 = bar2
success = {PutRequest { key = foo2, value = success }},
@ -111,10 +111,10 @@ Tnx(TnxRequest {
failure = {PutRequest { key = foo2, value = failure }},
)
TnxResponse {
TxnResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 3,
revision = 3,
raft_term = 0x1,
succeeded = true,
responses = {
@ -122,7 +122,7 @@ TnxResponse {
{
cluster_id = 0x1000,
member_id = 0x1,
index = 3,
revision = 3,
raft_term = 0x1,
}
}
@ -135,8 +135,8 @@ TnxResponse {
Watch( WatchRequest{
key = foo,
end_key = fop, // prefix foo
start_index = 20,
end_index = 10000,
start_revision = 20,
end_revision = 10000,
// server decided notification frequency
progress_notification = true,
}
@ -147,14 +147,14 @@ Watch( WatchRequest{
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 3,
revision = 3,
raft_term = 0x1,
event_type = put,
kv = {
key = foo0,
value = bar0,
create_index = 1,
mod_index = 1,
create_revision = 1,
mod_revision = 1,
version = 1;
},
}
@ -164,7 +164,7 @@ WatchResponse {
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 2000,
revision = 2000,
raft_term = 0x1,
// nil event as notification
}
@ -175,14 +175,14 @@ WatchResponse {
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 3000,
revision = 3000,
raft_term = 0x1,
event_type = put,
kv = {
key = foo0,
value = bar3000,
create_index = 1,
mod_index = 3000,
create_revision = 1,
mod_revision = 3000,
version = 2;
},
}


@ -6,19 +6,19 @@ service etcd {
rpc Range(RangeRequest) returns (RangeResponse) {}
// Put puts the given key into the store.
// A put request increases the index of the store,
// A put request increases the revision of the store,
// and generates one event in the event history.
rpc Put(PutRequest) returns (PutResponse) {}
// Delete deletes the given range from the store.
// A delete request increase the index of the store,
// A delete request increases the revision of the store,
// and generates one event in the event history.
rpc DeleteRange(DeleteRangeRequest) returns (DeleteRangeResponse) {}
// Tnx processes all the requests in one transaction.
// A tnx request increases the index of the store,
// and generates events with the same index in the event history.
rpc Tnx(TnxRequest) returns (TnxResponse) {}
// Txn processes all the requests in one transaction.
// A txn request increases the revision of the store,
// and generates events with the same revision in the event history.
rpc Txn(TxnRequest) returns (TxnResponse) {}
// Watch watches the events happening or happened in etcd. Both input and output
// are stream. One watch rpc can watch for multiple ranges and get a stream of
@ -41,10 +41,10 @@ service etcd {
// LeaseAttach attaches keys with a lease.
rpc LeaseAttach(LeaseAttachRequest) returns (LeaseAttachResponse) {}
// LeaseTnx likes Tnx. It has two addition success and failure LeaseAttachRequest list.
// If the Tnx is successful, then the success list will be executed. Or the failure list
// LeaseTxn is like Txn. It has two additional LeaseAttachRequest lists: success and failure.
// If the Txn is successful, then the success list will be executed. Otherwise the failure list
// will be executed.
rpc LeaseTnx(LeaseTnxRequest) returns (LeaseTnxResponse) {}
rpc LeaseTxn(LeaseTxnRequest) returns (LeaseTxnResponse) {}
// KeepAlive keeps the lease alive.
rpc LeaseKeepAlive(stream LeaseKeepAliveRequest) returns (stream LeaseKeepAliveResponse) {}
@ -52,51 +52,54 @@ service etcd {
message ResponseHeader {
// an error type message?
optional string error = 1;
optional uint64 cluster_id = 2;
optional uint64 member_id = 3;
// index of the store when the request was applied.
optional int64 index = 4;
string error = 1;
uint64 cluster_id = 2;
uint64 member_id = 3;
// revision of the store when the request was applied.
int64 revision = 4;
// term of raft when the request was applied.
optional uint64 raft_term = 5;
uint64 raft_term = 5;
}
message RangeRequest {
// if the range_end is not given, the request returns the key.
optional bytes key = 1;
bytes key = 1;
// if the range_end is given, it gets the keys in range [key, range_end).
optional bytes range_end = 2;
bytes range_end = 2;
// limit the number of keys returned.
optional int64 limit = 3;
// the response will be consistent with previous request with same token if the token is
// given and is vaild.
optional bytes consistent_token = 4;
int64 limit = 3;
// range over the store at the given revision.
// if revision is less or equal to zero, range over the newest store.
// if the revision has been compacted, ErrCompaction will be returned in
// response.
int64 revision = 4;
}
message RangeResponse {
optional ResponseHeader header = 1;
repeated KeyValue kvs = 2;
optional bytes consistent_token = 3;
ResponseHeader header = 1;
repeated storagepb.KeyValue kvs = 2;
// more indicates if there are more keys to return in the requested range.
bool more = 3;
}
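The range semantics described in the RangeRequest and RangeResponse comments above (a half-open interval `[key, range_end)`, an optional limit, and a `more` flag) can be sketched as follows. This is only an illustration over an in-memory map, not etcd's storage code; the function name `rangeKeys` and the choice to treat a limit of zero as "no limit" are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeKeys is a hypothetical helper. It returns the keys k with
// key <= k < rangeEnd in sorted order. An empty rangeEnd selects only the
// single key. A positive limit caps the result (assumption: zero means
// "no limit"); more reports whether matching keys were left out.
func rangeKeys(store map[string]string, key, rangeEnd string, limit int) (keys []string, more bool) {
	if rangeEnd == "" {
		if _, ok := store[key]; ok {
			return []string{key}, false
		}
		return nil, false
	}
	for k := range store {
		if k >= key && k < rangeEnd {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if limit > 0 && len(keys) > limit {
		return keys[:limit], true
	}
	return keys, false
}

func main() {
	store := map[string]string{"a": "1", "b": "2", "c": "3", "d": "4"}
	keys, more := rangeKeys(store, "a", "d", 2)
	fmt.Println(keys, more)
}
```

A client that sees `more` set would typically issue a follow-up request starting after the last returned key.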
message PutRequest {
optional bytes key = 1;
optional bytes value = 2;
bytes key = 1;
bytes value = 2;
}
message PutResponse {
optional ResponseHeader header = 1;
ResponseHeader header = 1;
}
message DeleteRangeRequest {
// if the range_end is not given, the request deletes the key.
optional bytes key = 1;
bytes key = 1;
// if the range_end is given, it deletes the keys in range [key, range_end).
optional bytes range_end = 2;
bytes range_end = 2;
}
message DeleteRangeResponse {
optional ResponseHeader header = 1;
ResponseHeader header = 1;
}
message RequestUnion {
@ -109,38 +112,44 @@ message RequestUnion {
message ResponseUnion {
oneof response {
RangeResponse reponse_range = 1;
RangeResponse response_range = 1;
PutResponse response_put = 2;
DeleteRangeResponse response_delete_range = 3;
}
}
message Compare {
enum CompareType {
enum CompareResult {
EQUAL = 0;
GREATER = 1;
LESS = 2;
}
optional CompareType type = 1;
enum CompareTarget {
VERSION = 0;
CREATE = 1;
MOD = 2;
VALUE = 3;
}
CompareResult result = 1;
CompareTarget target = 2;
// key path
optional bytes key = 2;
oneof target {
bytes key = 3;
oneof target_union {
// version of the given key
int64 version = 3;
// create index of the given key
int64 create_index = 4;
// last modified index of the given key
int64 mod_index = 5;
int64 version = 4;
// create revision of the given key
int64 create_revision = 5;
// last modified revision of the given key
int64 mod_revision = 6;
// value of the given key
bytes value = 6;
bytes value = 7;
}
}
// First all the compare requests are processed.
// If all the compare succeed, all the success
// requests will be processed.
// Or all the failure requests will be processed and
// all the errors in the comparison will be returned.
// If the comparisons succeed, then the success requests will be processed in order,
// and the response will contain their respective responses in order.
// If the comparisons fail, then the failure requests will be processed in order,
// and the response will contain their respective responses in order.
// From google paxosdb paper:
// Our implementation hinges around a powerful primitive which we call MultiOp. All other database
@ -157,44 +166,44 @@ message Compare {
// if guard evaluates to
// true.
// 3. A list of database operations called f op. Like t op, but executed if guard evaluates to false.
message TnxRequest {
message TxnRequest {
repeated Compare compare = 1;
repeated RequestUnion success = 2;
repeated RequestUnion failure = 3;
}
message TnxResponse {
optional ResponseHeader header = 1;
optional bool succeeded = 2;
message TxnResponse {
ResponseHeader header = 1;
bool succeeded = 2;
repeated ResponseUnion responses = 3;
}
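The compare-then-branch behaviour that the TxnRequest comments describe can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions, not the etcd implementation: the types `store`, `compare`, and `put` are hypothetical, guards only test value equality, and the only operation is a put.

```go
package main

import "fmt"

// store is a toy in-memory map standing in for the key-value store.
type store map[string]string

// compare is a hypothetical equality guard: key must currently hold want.
type compare struct{ key, want string }

// put is a hypothetical write operation.
type put struct{ key, value string }

// txn evaluates every guard first. If all of them hold, it applies the
// success operations in order; otherwise it applies the failure operations
// in order. It reports which branch ran.
func txn(s store, guards []compare, success, failure []put) bool {
	ok := true
	for _, g := range guards {
		if s[g.key] != g.want {
			ok = false
			break
		}
	}
	ops := failure
	if ok {
		ops = success
	}
	for _, op := range ops {
		s[op.key] = op.value
	}
	return ok
}

func main() {
	s := store{"leader": "node1"}
	guards := []compare{{"leader", "node1"}}
	ok := txn(s, guards, []put{{"leader", "node2"}}, []put{{"conflict", "true"}})
	fmt.Println(ok, s["leader"])
}
```

In the real API the guards can also target the version, create revision, or mod revision of a key, and all operations in one transaction share a single revision.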
message KeyValue {
optional bytes key = 1;
// mod_index is the last modified index of the key.
optional int64 create_index = 2;
optional int64 mod_index = 3;
bytes key = 1;
int64 create_revision = 2;
// mod_revision is the last modified revision of the key.
int64 mod_revision = 3;
// version is the version of the key. A deletion resets
// the version to zero and any modification of the key
// increases its version.
optional int64 version = 4;
optional bytes value = 5;
int64 version = 4;
bytes value = 5;
}
message WatchRangeRequest {
// if the range_end is not given, the request returns the key.
optional bytes key = 1;
bytes key = 1;
// if the range_end is given, it gets the keys in range [key, range_end).
optional bytes range_end = 2;
// start_index is an optional index (including) to watch from. No start_index is "now".
optional int64 start_index = 3;
// end_index is an optional index (excluding) to end watch. No end_index is "forever".
optional int64 end_index = 4;
optional bool progress_notification = 5;
bytes range_end = 2;
// start_revision is an optional revision (including) to watch from. No start_revision is "now".
int64 start_revision = 3;
// end_revision is an optional revision (excluding) to end watch. No end_revision is "forever".
int64 end_revision = 4;
bool progress_notification = 5;
}
message WatchRangeResponse {
optional ResponseHeader header = 1;
ResponseHeader header = 1;
repeated Event events = 2;
}
@ -204,69 +213,73 @@ message Event {
DELETE = 1;
EXPIRE = 2;
}
optional EventType event_type = 1;
EventType event_type = 1;
// a put event contains the current key-value
// a delete/expire event contains the previous
// key-value
optional KeyValue kv = 2;
KeyValue kv = 2;
}
// Compaction compacts the kv store up to the given revision (inclusive).
// It removes the old versions of a key. It keeps the newest version of
// the key even if its latest modification revision is smaller than the given
// revision.
message CompactionRequest {
optional int64 index = 1;
int64 revision = 1;
}
message CompactionResponse {
optional ResponseHeader header = 1;
ResponseHeader header = 1;
}
message LeaseCreateRequest {
// advisory ttl in seconds
optional int64 ttl = 1;
int64 ttl = 1;
}
message LeaseCreateResponse {
optional ResponseHeader header = 1;
optional int64 lease_id = 2;
ResponseHeader header = 1;
int64 lease_id = 2;
// server decided ttl in second
optional int64 ttl = 3;
optional string error = 4;
int64 ttl = 3;
string error = 4;
}
message LeaseRevokeRequest {
optional int64 lease_id = 1;
int64 lease_id = 1;
}
message LeaseRevokeResponse {
optional ResponseHeader header = 1;
ResponseHeader header = 1;
}
message LeaseTnxRequest {
optional TnxRequest request = 1;
message LeaseTxnRequest {
TxnRequest request = 1;
repeated LeaseAttachRequest success = 2;
repeated LeaseAttachRequest failure = 3;
}
message LeaseTnxResponse {
optional ResponseHeader header = 1;
optional TnxResponse response = 2;
message LeaseTxnResponse {
ResponseHeader header = 1;
TxnResponse response = 2;
repeated LeaseAttachResponse attach_responses = 3;
}
message LeaseAttachRequest {
optional int64 lease_id = 1;
optional bytes key = 2;
int64 lease_id = 1;
bytes key = 2;
}
message LeaseAttachResponse {
optional ResponseHeader header = 1;
ResponseHeader header = 1;
}
message LeaseKeepAliveRequest {
optional int64 lease_id = 1;
int64 lease_id = 1;
}
message LeaseKeepAliveResponse {
optional ResponseHeader header = 1;
optional int64 lease_id = 2;
optional int64 ttl = 3;
ResponseHeader header = 1;
int64 lease_id = 2;
int64 ttl = 3;
}

View File

@ -4,6 +4,8 @@ etcd comes with support for incremental runtime reconfiguration, which allows us
Reconfiguration requests can only be processed when the majority of the cluster members are functioning. It is **highly recommended** to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not be able to make progress and will need to [restart from majority failure][majority failure].
To better understand the design behind runtime reconfiguration, we suggest you read [this](runtime-reconf-design.md).
[majority failure]: #restart-cluster-from-majority-failure
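The majority arithmetic behind this recommendation can be made concrete with a few lines of Go. This is a minimal sketch; the helper names `quorum` and `tolerated` are chosen for illustration and are not part of etcd.

```go
package main

import "fmt"

// quorum returns how many members must be functioning for a cluster of
// size n to form a majority.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many member failures a cluster of size n can
// survive while still having a majority.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

For two members the quorum is two and no failure is tolerated, which is why a failure while removing a member from a two member cluster can leave the cluster unable to make progress.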
## Reconfiguration Use Cases
@ -37,7 +39,7 @@ To replace the machine, follow the instructions for [removing the member][remove
### Restart Cluster from Majority Failure
If the majority of your cluster is lost, then you need to take manual action in order to recover safely.
If the majority of your cluster is lost or all of your nodes have changed IP addresses, then you need to take manual action in order to recover safely.
The basic steps in the recovery process include [creating a new cluster using the old data][disaster recovery], forcing a single member to act as the leader, and finally using runtime configuration to [add new members][add member] to this new cluster one at a time.
[add member]: #add-a-new-member
@ -52,28 +54,38 @@ This is essentially the same requirement as for any other write to etcd.
All changes to the cluster are done one at a time:
To replace a single member you will make an add then a remove operation
To increase from 3 to 5 members you will make two add operations
To decrease from 5 to 3 you will make two remove operations
* To update a single member's peerURLs you will make an update operation
* To replace a single member you will make an add then a remove operation
* To increase from 3 to 5 members you will make two add operations
* To decrease from 5 to 3 you will make two remove operations
All of these examples will use the `etcdctl` command line tool that ships with etcd.
If you want to use the member API directly you can find the documentation [here](other_apis.md).
### Remove a Member
### Update a Member
First, we need to find the target member's ID. You can list all members with `etcdctl`:
To update a member's IP address (peerURLs), first find the target member's ID. You can list all members with `etcdctl`:
```
```sh
$ etcdctl member list
6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:7002 clientURLs=http://127.0.0.1:4002
924e2e83e93f2560: name=node3 peerURLs=http://localhost:7003 clientURLs=http://127.0.0.1:4003
a8266ecf031671f3: name=node1 peerURLs=http://localhost:7001 clientURLs=http://127.0.0.1:4001
6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:23802 clientURLs=http://127.0.0.1:23792
924e2e83e93f2560: name=node3 peerURLs=http://localhost:23803 clientURLs=http://127.0.0.1:23793
a8266ecf031671f3: name=node1 peerURLs=http://localhost:23801 clientURLs=http://127.0.0.1:23791
```
In this example, let's `update` the member with ID a8266ecf031671f3 and change its peerURLs value to http://10.0.1.10:2380:
```sh
$ etcdctl member update a8266ecf031671f3 http://10.0.1.10:2380
Updated member with ID a8266ecf031671f3 in cluster
```
### Remove a Member
Let us say the member ID we want to remove is a8266ecf031671f3.
We then use the `remove` command to perform the removal:
```
```sh
$ etcdctl member remove a8266ecf031671f3
Removed member a8266ecf031671f3 from cluster
```
@ -95,7 +107,7 @@ Adding a member is a two step process:
Using `etcdctl` let's add the new member to the cluster by specifying its [name](configuration.md#-name) and [advertised peer URLs](configuration.md#-initial-advertise-peer-urls):
```
```sh
$ etcdctl member add infra3 http://10.0.1.13:2380
added member 9bf1b35fc7761a23 to cluster
@ -107,11 +119,11 @@ ETCD_INITIAL_CLUSTER_STATE=existing
`etcdctl` has informed the cluster about the new member and printed out the environment variables needed to successfully start it.
Now start the new etcd process with the relevant flags for the new member:
```
```sh
$ export ETCD_NAME="infra3"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://10.0.1.13:2379 -listen-peer-urls http://10.0.1.13:2380 -initial-advertise-peer-urls http://10.0.1.13:2380
$ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://10.0.1.13:2379 -listen-peer-urls http://10.0.1.13:2380 -initial-advertise-peer-urls http://10.0.1.13:2380 -data-dir %data_dir%
```
The new member will run as a part of the cluster and immediately begin catching up with the rest of the cluster.
@ -124,7 +136,7 @@ If you add a new member to a 1-node cluster, the cluster cannot make progress be
In the following case we have not included our new host in the list of enumerated nodes.
If this is a new cluster, the node must be added to the list of initial cluster members.
```
```sh
$ etcd -name infra3 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state existing
@ -134,7 +146,7 @@ exit 1
In this case we give a different address (10.0.1.14:2380) to the one that we used to join the cluster (10.0.1.13:2380).
```
```sh
$ etcd -name infra4 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra4=http://10.0.1.14:2380 \
-initial-cluster-state existing
@ -144,7 +156,7 @@ exit 1
When we start etcd using the data directory of a removed member, etcd will exit automatically if it connects to any alive member in the cluster:
```
```sh
$ etcd
etcd: this member has been permanently removed from the cluster. Exiting.
exit 1

View File

@ -0,0 +1,47 @@
### Design of Runtime Reconfiguration
Runtime reconfiguration is one of the hardest and most error prone features in a distributed system, especially in a consensus based system like etcd.
Read on to learn about the design of etcd's runtime reconfiguration commands and how we tackled these problems.
### Two Phase Config Changes Keep you Safe
In etcd, every runtime reconfiguration has to go through [two phases](Documentation/runtime-configuration.md#add-a-new-member) for safety reasons. For example, to add a member you need to first inform the cluster of the new configuration and then start the new member.
Phase 1 - Inform cluster of new configuration
To add a member into an etcd cluster, you need to make an API call to request that a new member be added to the cluster. This is the only way to add a new member into an existing cluster. The API call returns when the cluster agrees on the configuration change.
Phase 2 - Start new member
To join the etcd member into the existing cluster, you need to specify the correct `initial-cluster` and set `initial-cluster-state` to `existing`. When the member starts, it will contact the existing cluster first and verify that the current cluster configuration matches the expected one specified in `initial-cluster`. When the new member successfully starts, you know your cluster has reached the expected configuration.
By splitting the process into two discrete phases, users are forced to be explicit regarding cluster membership changes. This actually gives users more flexibility and makes things easier to reason about. For example, if there is an attempt to add a new member with the same ID as an existing member in an etcd cluster, the action will fail immediately during phase one without impacting the running cluster. Similar protection is provided to prevent adding new members by mistake. If a new etcd member attempts to join the cluster before the cluster has accepted the configuration change, it will not be accepted by the cluster.
Without the explicit workflow around cluster membership etcd would be vulnerable to unexpected cluster membership changes. For example, if etcd is running under an init system such as systemd, etcd would be restarted after being removed via the membership API, and attempt to rejoin the cluster on startup. This cycle would continue every time a member is removed via the API and systemd is set to restart etcd after failing, which is unexpected.
We think runtime reconfiguration should be an infrequent operation. We made the decision to keep it explicit and user-driven to ensure configuration safety and keep your cluster always running smoothly under your control.
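The two phases can be pictured with a toy model. The sketch below is not etcd's code: the `cluster` type and its methods are hypothetical, member names stand in for member IDs, and agreement among members is reduced to updating a single map. It only shows why a join attempt is rejected until phase one has completed.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a toy model of the agreed membership (name -> peer URL).
type cluster struct {
	members map[string]string
}

// addMember models phase 1: record the new member in the agreed
// configuration. It fails immediately if the name is already taken.
func (c *cluster) addMember(name, peerURL string) error {
	if _, ok := c.members[name]; ok {
		return errors.New("member already exists: " + name)
	}
	c.members[name] = peerURL
	return nil
}

// join models phase 2: a starting member presents the configuration it
// expects (its initial-cluster) and is rejected unless it is a known member
// and the expected configuration matches the agreed one.
func (c *cluster) join(name string, expected map[string]string) error {
	if _, ok := c.members[name]; !ok {
		return errors.New("member is not in the cluster configuration: " + name)
	}
	if len(expected) != len(c.members) {
		return errors.New("initial-cluster does not match the cluster configuration")
	}
	for n, url := range c.members {
		if expected[n] != url {
			return errors.New("initial-cluster does not match the cluster configuration")
		}
	}
	return nil
}

func main() {
	c := &cluster{members: map[string]string{"infra0": "http://10.0.1.10:2380"}}
	expected := map[string]string{
		"infra0": "http://10.0.1.10:2380",
		"infra1": "http://10.0.1.11:2380",
	}
	fmt.Println("join before add:", c.join("infra1", expected))
	fmt.Println("add:", c.addMember("infra1", "http://10.0.1.11:2380"))
	fmt.Println("join after add:", c.join("infra1", expected))
}
```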
### Permanent Loss of Quorum Requires New Cluster
If a cluster permanently loses a majority of its members, a new cluster will need to be started from an old data directory to recover the previous state.
It is entirely possible to recover by forcibly removing the failed members from the existing cluster. However, we decided not to support this method since it bypasses the normal consensus committing phase, which is unsafe. If the member to remove is not actually dead, or if you force the removal of different members through different members in the same cluster, you will end up with diverged clusters that share the same clusterID. This is very dangerous and hard to debug or fix afterwards.
If you have a correct deployment, the possibility of permanent majority loss is very low. But it is a severe enough problem that it is worth special care. We strongly suggest that you read the [disaster recovery documentation](admin_guide.md#disaster-recovery) and prepare for permanent majority loss before you put etcd into production.
### Do Not Use Public Discovery Service For Runtime Reconfiguration
The public discovery service should only be used for bootstrapping a cluster. To join a member into an existing cluster, you should use the runtime reconfiguration API.
The discovery service is designed for bootstrapping an etcd cluster in a cloud environment, when you do not know the IP addresses of all the members beforehand. After you successfully bootstrap a cluster, the IP addresses of all the members are known. Technically, you should not need the discovery service any more.
It may seem that using the public discovery service is a convenient way to do runtime reconfiguration; after all, the discovery service already has all the cluster configuration information. However, relying on the public discovery service brings problems:
1. It introduces an external dependency for the entire life-cycle of your cluster, not just bootstrap time. If there is a network issue between your cluster and the public discovery service, your cluster will suffer from it.
2. The public discovery service must reflect the correct runtime configuration of your cluster during its life-cycle. It has to provide a security mechanism to prevent bad actions, which is hard.
3. The public discovery service has to keep tens of thousands of cluster configurations. Our public discovery service backend is not ready for that workload.
If you want to have a discovery service that supports runtime reconfiguration, the best choice is to build your own private one.

View File

@ -4,7 +4,7 @@ etcd supports SSL/TLS as well as authentication through client certificates, bot
To get up and running you first need to have a CA certificate and a signed key pair for one member. It is recommended to create and sign a new key pair for every member in a cluster.
For convenience the [etcd-ca](https://github.com/coreos/etcd-ca) tool provides an easy interface to certificate generation, alternatively this site provides a good reference on how to generate self-signed key pairs:
For convenience, the [cfssl](https://github.com/cloudflare/cfssl) tool provides an easy interface to certificate generation, and we provide a full example using the tool [here](../hack/tls-setup). Alternatively, this site provides a good reference on how to generate self-signed key pairs:
http://www.g-loaded.eu/2005/11/10/be-your-own-ca/

View File

@ -10,7 +10,7 @@ The network isn't the only source of latency. Each request and response may be i
The underlying distributed consensus protocol relies on two separate time parameters to ensure that nodes can handoff leadership if one stalls or goes offline.
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader will notify followers that it is still the leader.
etcd batches commands together for higher throughput so this heartbeat interval is also a delay for how long it takes for commands to be committed.
As a best practice, this parameter should be set to around the round-trip time between members.
By default, etcd uses a `100ms` heartbeat interval.
The second parameter is the *Election Timeout*.
@ -18,16 +18,22 @@ This timeout is how long a follower node will go without hearing a heartbeat bef
By default, etcd uses a `1000ms` election timeout.
Adjusting these values is a trade off.
Lowering the heartbeat interval will cause individual commands to be committed faster but it will lower the overall throughput of etcd.
If your etcd instances have low utilization then lowering the heartbeat interval can improve your command response time.
The value of heartbeat interval is recommended to be around the maximum of average round-trip time (RTT) between members, normally around 0.5-1.5x the round-trip time.
If heartbeat interval is too low, etcd will send unnecessary messages that increase the usage of CPU and network resources.
On the other hand, a heartbeat interval that is too high leads to a high election timeout, and a higher election timeout means it takes longer to detect a leader failure.
The easiest way to measure round-trip time (RTT) is to use [PING utility](https://en.wikipedia.org/wiki/Ping_(networking_utility)).
The election timeout should be set based on the heartbeat interval and your network ping time between nodes.
Election timeouts should be at least 10 times your ping time so it can account for variance in your network.
For example, if the ping time between your nodes is 10ms then you should have at least a 100ms election timeout.
The election timeout should be set based on the heartbeat interval and average round-trip time between members.
Election timeouts must be at least 10 times the round-trip time so they can account for variance in your network.
For example, if the round-trip time between your members is 10ms then you should have at least a 100ms election timeout.
The upper limit of the election timeout is 50000ms (50s), which should only be used when deploying a globally distributed etcd cluster. The reasoning is as follows. First, 5s is taken as the upper limit of the average global round-trip time: a reasonable round-trip time for the continental United States is 130ms, and the time between the US and Japan is around 350-400ms, but because packets can be delayed considerably under poor network conditions, 5s is a safe value. Then, because the election timeout should be an order of magnitude larger than the broadcast time, 50s becomes its maximum.
You should also set your election timeout to at least 5 to 10 times your heartbeat interval to account for variance in leader replication.
For a heartbeat interval of 50ms you should set your election timeout to at least 250ms - 500ms.
The heartbeat interval and election timeout value should be the same for all members in one cluster. Setting different values for etcd members may disrupt cluster stability.
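The rules of thumb above can be turned into a small calculation. The following sketch is only an illustration: the function `suggest` is hypothetical, it picks a heartbeat interval equal to the measured round-trip time (inside the 0.5-1.5x guideline) and the minimum election timeout of 10 times the round-trip time, and it rejects inputs that would exceed the 50000ms limit.

```go
package main

import (
	"errors"
	"fmt"
)

// maxElectionMs is the upper limit of the election timeout stated above.
const maxElectionMs = 50000

// suggest returns a heartbeat interval and a minimum election timeout, both
// in milliseconds, for a measured average round-trip time in milliseconds.
func suggest(rttMs int) (heartbeatMs, electionMs int, err error) {
	if rttMs <= 0 {
		return 0, 0, errors.New("round-trip time must be positive")
	}
	heartbeatMs = rttMs     // within the 0.5-1.5x round-trip time guideline
	electionMs = 10 * rttMs // at least 10 times the round-trip time
	if electionMs > maxElectionMs {
		return 0, 0, errors.New("round-trip time too high for the 50000ms election timeout limit")
	}
	return heartbeatMs, electionMs, nil
}

func main() {
	heartbeat, election, err := suggest(10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("heartbeat interval: %dms, election timeout: at least %dms\n", heartbeat, election)
}
```

Whatever values you settle on, apply the same pair to every member of the cluster.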
You can override the default values on the command line:
```sh

View File

@ -0,0 +1,128 @@
## Upgrade etcd from 2.1 to 2.2
In the general case, upgrading from etcd 2.1 to 2.2 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v2.1 processes and replace them with etcd v2.2 processes
- after you are running all v2.2 processes, new features in v2.2 are available to the cluster
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
### Upgrade Checklists
#### Upgrade Requirement
To upgrade an existing etcd deployment to 2.2, you must be running 2.1. If you're running a version of etcd before 2.1, you must upgrade to [2.1](https://github.com/coreos/etcd/releases/tag/v2.1.2) before upgrading to 2.2.
Also, to ensure a smooth rolling upgrade, your running cluster must be healthy. You can check the health of the cluster by using the `etcdctl cluster-health` command.
#### Preparedness
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
You might also want to [backup your data directory](admin_guide.md#backing-up-the-datastore) for a potential [downgrade](#downgrade).
#### Mixed Versions
While upgrading, an etcd cluster supports mixed versions of etcd members. The cluster is only considered upgraded once all its members are upgraded to 2.2.
Internally, etcd members negotiate with each other to determine the overall etcd cluster version, which controls the reported cluster version and the supported features.
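One way to picture the outcome of this negotiation is to take the lowest version among the members, which matches the rule that the cluster only counts as upgraded once every member runs 2.2. The sketch below rests on that assumption and is not etcd's negotiation code; the `ver` type and `clusterVersion` function are hypothetical.

```go
package main

import "fmt"

// ver is a major.minor version pair.
type ver struct{ major, minor int }

// less reports whether a is an older version than b.
func less(a, b ver) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	return a.minor < b.minor
}

// clusterVersion returns the lowest version among the members (assumed
// model of the negotiated cluster version). ok is false for an empty list.
func clusterVersion(members []ver) (v ver, ok bool) {
	if len(members) == 0 {
		return ver{}, false
	}
	lowest := members[0]
	for _, m := range members[1:] {
		if less(m, lowest) {
			lowest = m
		}
	}
	return lowest, true
}

func main() {
	mixed := []ver{{2, 2}, {2, 1}, {2, 2}}
	v, _ := clusterVersion(mixed)
	fmt.Printf("mixed cluster reports %d.%d\n", v.major, v.minor)
}
```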
#### Limitations
If you have a data size larger than 100MB you should contact us before upgrading, so we can make sure the upgrades work smoothly.
Every etcd 2.2 member periodically performs health checking across the cluster. etcd 2.1 members do not support health checking. During the upgrade, etcd 2.2 members will log warnings about the unhealthy state of etcd 2.1 members. You can ignore these warnings.
#### Downgrade
If all members have been upgraded to v2.2, the cluster will be upgraded to v2.2, and downgrade is **not possible**. If any member is still v2.1, the cluster will remain in v2.1, and you can go back to using the v2.1 binary.
Please [back up the data directory](admin_guide.md#backing-up-the-datastore) of every etcd member if you want to be able to downgrade the cluster even after it has been upgraded.
### Upgrade Procedure
In this example, we upgrade a three-member v2.1 cluster running on a local machine.
#### 1. Check upgrade requirements.
```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
$ curl http://localhost:4001/version
{"etcdserver":"2.1.x","etcdcluster":"2.1.0"}
```
#### 2. Stop the existing etcd process
You will see similar error logging from other etcd processes in your cluster. This is normal, since you just shut down a member and the connection is broken.
```
2015/09/2 09:48:35 etcdserver: failed to reach the peerURL(http://localhost:12380) of member a8266ecf031671f3 (Get http://localhost:12380/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:35 etcdserver: cannot get the version of member a8266ecf031671f3 (Get http://localhost:12380/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:35 rafthttp: failed to write a8266ecf031671f3 on stream Message (write tcp 127.0.0.1:32380->127.0.0.1:64394: write: broken pipe)
2015/09/2 09:48:35 rafthttp: failed to write a8266ecf031671f3 on pipeline (dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:40 etcdserver: failed to reach the peerURL(http://localhost:7001) of member a8266ecf031671f3 (Get http://localhost:7001/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:40 etcdserver: cannot get the version of member a8266ecf031671f3 (Get http://localhost:12380/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:40 rafthttp: failed to heartbeat a8266ecf031671f3 on stream MsgApp v2 (write tcp 127.0.0.1:32380->127.0.0.1:64393: write: broken pipe)
```
You will see logging output like this from members that have not yet been upgraded, due to the mixed version cluster. You can ignore this while upgrading.
```
2015/09/2 09:48:45 etcdserver: the etcd version 2.1.2+git is not up-to-date
2015/09/2 09:48:45 etcdserver: member a8266ecf031671f3 has a higher version &{2.2.0-rc.0+git 2.1.0}
```
You will also see logging output like this from the newly upgraded member, since etcd 2.1 members do not support health checking. You can ignore this while upgrading.
```
2015-09-02 09:55:42.691384 W | rafthttp: the connection to peer 6e3bd23ae5f1eae0 is unhealthy
2015-09-02 09:55:42.705626 W | rafthttp: the connection to peer 924e2e83e93f2560 is unhealthy
```
You could [backup your data directory](https://github.com/coreos/etcd/blob/7f7e2cc79d9c5c342a6eb1e48c386b0223cf934e/Documentation/admin_guide.md#backing-up-the-datastore) for data safety.
```
$ etcdctl backup \
--data-dir /var/lib/etcd \
--backup-dir /tmp/etcd_backup
```
#### 3. Drop-in etcd v2.2 binary and start the new etcd process
Now, you can start the etcd v2.2 binary with the previous configuration.
You will see the etcd start and publish its information to the cluster.
```
2015-09-02 09:56:46.117609 I | etcdserver: published {Name:infra2 ClientURLs:[http://localhost:22380]} to cluster e9c7614f68f35fb2
```
You can verify that the cluster becomes healthy:
```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
```
#### 4. Repeat step 2 to step 3 for all other members
#### 5. Finish
When all members are upgraded, you will see the cluster is upgraded to 2.2 successfully:
```
2015-09-02 09:56:54.896848 N | etcdserver: updated the cluster version from 2.1 to 2.2
```
```
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.2.x","etcdcluster":"2.2.0"}
```
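If you want to script this final check rather than read the output by eye, the JSON returned by the `/version` endpoint can be decoded directly. The following is a minimal sketch: the field names come from the sample response above, while the `versionInfo` type and the `parseVersion` and `upgraded` helpers are hypothetical. It works on a response body you have already fetched, for example with `curl`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// versionInfo mirrors the two fields shown in the sample /version response.
type versionInfo struct {
	Server  string `json:"etcdserver"`
	Cluster string `json:"etcdcluster"`
}

// parseVersion decodes a /version response body.
func parseVersion(body string) (versionInfo, error) {
	var v versionInfo
	err := json.Unmarshal([]byte(body), &v)
	return v, err
}

// upgraded reports whether the cluster version is a 2.2 release.
func upgraded(v versionInfo) bool {
	return strings.HasPrefix(v.Cluster, "2.2.")
}

func main() {
	v, err := parseVersion(`{"etcdserver":"2.2.x","etcdcluster":"2.2.0"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cluster upgraded to 2.2:", upgraded(v))
}
```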

48
Godeps/Godeps.json generated
View File

@ -1,6 +1,6 @@
{
"ImportPath": "github.com/coreos/etcd",
"GoVersion": "go1.4.1",
"GoVersion": "go1.4.2",
"Packages": [
"./..."
],
@ -16,12 +16,12 @@
},
{
"ImportPath": "github.com/bgentry/speakeasy",
"Rev": "5dfe43257d1f86b96484e760f2f0c4e2559089c7"
"Rev": "36e9cfdd690967f4f690c6edcc9ffacd006014a0"
},
{
"ImportPath": "github.com/boltdb/bolt",
"Comment": "v1.0-71-g71f28ea",
"Rev": "71f28eaecbebd00604d87bb1de0dae8fcfa54bbd"
"Comment": "v1.0-119-g90fef38",
"Rev": "90fef389f98027ca55594edd7dbd6e7f3926fdad"
},
{
"ImportPath": "github.com/bradfitz/http2",
@ -32,18 +32,28 @@
"Comment": "1.2.0-26-gf7ebb76",
"Rev": "f7ebb761e83e21225d1d8954fde853bf8edd46c4"
},
{
"ImportPath": "github.com/coreos/go-etcd/etcd",
"Comment": "v2.0.0-13-g4cceaf7",
"Rev": "4cceaf7283b76f27c4a732b20730dcdb61053bf5"
},
{
"ImportPath": "github.com/coreos/go-semver/semver",
"Rev": "568e959cd89871e61434c1143528d9162da89ef2"
},
{
"ImportPath": "github.com/coreos/go-systemd/daemon",
"Comment": "v3-6-gcea488b",
"Rev": "cea488b4e6855fee89b6c22a811e3c5baca861b6"
},
{
"ImportPath": "github.com/coreos/go-systemd/journal",
"Comment": "v3-6-gcea488b",
"Rev": "cea488b4e6855fee89b6c22a811e3c5baca861b6"
},
{
"ImportPath": "github.com/coreos/go-systemd/util",
"Comment": "v3-6-gcea488b",
"Rev": "cea488b4e6855fee89b6c22a811e3c5baca861b6"
},
{
"ImportPath": "github.com/coreos/pkg/capnslog",
"Rev": "99f6e6b8f8ea30b0f82769c1411691c44a66d015"
"Rev": "42a8c3b1a6f917bb8346ef738f32712a7ca0ede7"
},
{
"ImportPath": "github.com/gogo/protobuf/proto",
@ -93,13 +103,21 @@
"ImportPath": "github.com/prometheus/procfs",
"Rev": "ee2372b58cee877abe07cde670d04d3b3bac5ee6"
},
{
"ImportPath": "github.com/rakyll/pb",
"Rev": "dc507ad06b7462501281bb4691ee43f0b1d1ec37"
},
{
"ImportPath": "github.com/stretchr/testify/assert",
"Rev": "9cc77fa25329013ce07362c7742952ff887361f2"
},
{
"ImportPath": "github.com/ugorji/go/codec",
"Rev": "821cda7e48749cacf7cad2c6ed01e96457ca7e9d"
"Rev": "5abd4e96a45c386928ed2ca2a7ef63e2533e18ec"
},
{
"ImportPath": "github.com/xiang90/probing",
"Rev": "6a0cc1ae81b4cc11db5e491e030e4b98fba79c19"
},
{
"ImportPath": "golang.org/x/crypto/bcrypt",
@ -113,10 +131,18 @@
"ImportPath": "golang.org/x/net/context",
"Rev": "7dbad50ab5b31073856416cdcfeb2796d682f844"
},
{
"ImportPath": "golang.org/x/net/netutil",
"Rev": "7dbad50ab5b31073856416cdcfeb2796d682f844"
},
{
"ImportPath": "golang.org/x/oauth2",
"Rev": "3046bc76d6dfd7d3707f6640f85e42d9c4050f50"
},
{
"ImportPath": "golang.org/x/sys/unix",
"Rev": "9c60d1c508f5134d1ca726b4641db998f2523357"
},
{
"ImportPath": "google.golang.org/cloud/compute/metadata",
"Rev": "f20d6dcccb44ed49de45ae3703312cb46e627db1"

View File

@ -4,7 +4,7 @@
// Original code is based on code by RogerV in the golang-nuts thread:
// https://groups.google.com/group/golang-nuts/browse_thread/thread/40cc41e9d9fc9247
// +build darwin freebsd linux netbsd openbsd
// +build darwin freebsd linux netbsd openbsd solaris
package speakeasy
@ -19,9 +19,8 @@ import (
const sttyArg0 = "/bin/stty"
var (
sttyArgvEOff []string = []string{"stty", "-echo"}
sttyArgvEOn []string = []string{"stty", "echo"}
ws syscall.WaitStatus = 0
sttyArgvEOff = []string{"stty", "-echo"}
sttyArgvEOn = []string{"stty", "echo"}
)
// getPassword gets input hidden from the terminal from a user. This is
@ -47,10 +46,11 @@ func getPassword() (password string, err error) {
}
// Turn on the terminal echo and stop listening for signals.
defer signal.Stop(sig)
defer close(brk)
defer echoOn(fd)
syscall.Wait4(pid, &ws, 0, nil)
syscall.Wait4(pid, nil, 0, nil)
line, err := readline()
if err == nil {
@ -76,7 +76,7 @@ func echoOn(fd []uintptr) {
// Turn on the terminal echo.
pid, e := syscall.ForkExec(sttyArg0, sttyArgvEOn, &syscall.ProcAttr{Dir: "", Files: fd})
if e == nil {
syscall.Wait4(pid, &ws, 0, nil)
syscall.Wait4(pid, nil, 0, nil)
}
}

View File

@ -1,3 +1,4 @@
*.prof
*.test
*.swp
/bin/

View File

@ -87,6 +87,11 @@ are not thread safe. To work with data in multiple goroutines you must start
a transaction for each one or use locking to ensure only one goroutine accesses
a transaction at a time. Creating a transaction from the `DB` is thread safe.
Read-only transactions and read-write transactions should not depend on one
another and generally shouldn't be opened simultaneously in the same goroutine.
This can cause a deadlock as the read-write transaction needs to periodically
re-map the data file but it cannot do so while a read-only transaction is open.
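The hazard described above can be modeled with a plain `sync.RWMutex`. This is only a sketch under an assumption: an open read-only transaction is treated as a held read lock, and a re-map is treated as needing the write lock. The name `mmapLock` and the function `remapBlockedByReader` are hypothetical and are not part of Bolt's API. `TryLock` (Go 1.18+) is used so the sketch reports the conflict instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// remapBlockedByReader models the deadlock hazard with a plain RWMutex.
// Assumption: a read-only transaction holds a read lock for its lifetime,
// and a re-map of the data file needs the corresponding write lock.
func remapBlockedByReader() (blockedWhileOpen, okAfterClose bool) {
	var mmapLock sync.RWMutex // hypothetical stand-in for the lock guarding the mmap

	mmapLock.RLock() // the "read-only transaction" is opened

	// A writer in the same goroutine would block forever here with Lock();
	// TryLock reports the conflict instead of blocking.
	if mmapLock.TryLock() {
		mmapLock.Unlock()
	} else {
		blockedWhileOpen = true
	}

	mmapLock.RUnlock() // the "read-only transaction" is closed

	// With no reader open, the write lock is available again.
	if mmapLock.TryLock() {
		okAfterClose = true
		mmapLock.Unlock()
	}
	return blockedWhileOpen, okAfterClose
}

func main() {
	blocked, ok := remapBlockedByReader()
	fmt.Println("re-map blocked while reader is open:", blocked)
	fmt.Println("re-map possible after reader closes:", ok)
}
```

In practice this suggests closing the read-only transaction before starting a read-write one in the same goroutine, or running them in separate goroutines.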
#### Read-write transactions
@ -446,6 +451,21 @@ It's also useful to pipe these stats to a service such as statsd for monitoring
or to provide an HTTP endpoint that will perform a fixed-length sample.
### Read-Only Mode
Sometimes it is useful to create a shared, read-only Bolt database. To do this,
set the `Options.ReadOnly` flag when opening your database. Read-only mode
uses a shared lock to allow multiple processes to read from the database but
it will block any processes from opening the database in read-write mode.
```go
db, err := bolt.Open("my.db", 0666, &bolt.Options{ReadOnly: true})
if err != nil {
log.Fatal(err)
}
```
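The shared-lock behavior described above can be observed directly with `flock(2)`. The following is a minimal sketch, not Bolt's code: it assumes a platform where `syscall.Flock` is available (for example Linux or Darwin), and the helpers `tryLock` and `demoSharedLocks` are hypothetical names introduced for illustration. It takes two shared locks on separately opened descriptors, then shows that a non-blocking exclusive lock is refused while they are held.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// tryLock opens path on its own descriptor and attempts a non-blocking
// flock, shared or exclusive. The caller closes the returned file.
func tryLock(path string, exclusive bool) (*os.File, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	how := syscall.LOCK_SH
	if exclusive {
		how = syscall.LOCK_EX
	}
	if err := syscall.Flock(int(f.Fd()), how|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// demoSharedLocks reports whether two shared locks can coexist and whether
// an exclusive lock is refused while they are held.
func demoSharedLocks() (sharedOK, exclusiveRefused bool, err error) {
	tmp, err := os.CreateTemp("", "flock-sketch-*")
	if err != nil {
		return false, false, err
	}
	path := tmp.Name()
	tmp.Close()
	defer os.Remove(path)

	// Two independent "read-only" holders.
	r1, err1 := tryLock(path, false)
	r2, err2 := tryLock(path, false)
	if r1 != nil {
		defer r1.Close()
	}
	if r2 != nil {
		defer r2.Close()
	}
	sharedOK = err1 == nil && err2 == nil

	// A "read-write" holder must wait; with LOCK_NB it fails immediately.
	w, errW := tryLock(path, true)
	if w != nil {
		w.Close()
	}
	exclusiveRefused = errW == syscall.EWOULDBLOCK
	return sharedOK, exclusiveRefused, nil
}

func main() {
	shared, refused, err := demoSharedLocks()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("two shared locks held at once:", shared)
	fmt.Println("exclusive lock refused meanwhile:", refused)
}
```

Because `flock` locks belong to the open file description, separately opened descriptors conflict even inside one process, which is why the sketch can run in a single program.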
## Resources
For more information on getting started with Bolt, check out the following articles:
@ -550,6 +570,11 @@ Here are a few things to note when evaluating and using Bolt:
However, this is expected and the OS will release memory as needed. Bolt can
handle databases much larger than the available physical RAM.
* The data structures in the Bolt database are memory mapped so the data file
will be endian specific. This means that you cannot copy a Bolt file from a
little endian machine to a big endian machine and have it work. For most
users this is not a concern since most modern CPUs are little endian.
* Because of the way pages are laid out on disk, Bolt cannot truncate data files
and return free pages back to the disk. Instead, Bolt maintains a free list
of unused pages within its data file. These free pages can be reused by later
@ -567,7 +592,7 @@ Here are a few things to note when evaluating and using Bolt:
Below is a list of public, open source projects that use Bolt:
* [Operation Go: A Routine Mission](http://gocode.io) - An online programming game for Golang using Bolt for user accounts and a leaderboard.
* [Bazil](https://github.com/bazillion/bazil) - A file system that lets your data reside where it is most convenient for it to reside.
* [Bazil](https://bazil.org/) - A file system that lets your data reside where it is most convenient for it to reside.
* [DVID](https://github.com/janelia-flyem/dvid) - Added Bolt as optional storage engine and testing it against Basho-tuned leveldb.
* [Skybox Analytics](https://github.com/skybox/skybox) - A standalone funnel analysis tool for web analytics.
* [Scuttlebutt](https://github.com/benbjohnson/scuttlebutt) - Uses Bolt to store and process all Twitter mentions of GitHub projects.
@ -587,5 +612,10 @@ Below is a list of public, open source projects that use Bolt:
* [SkyDB](https://github.com/skydb/sky) - Behavioral analytics database.
* [Seaweed File System](https://github.com/chrislusf/weed-fs) - Highly scalable distributed key~file system with O(1) disk read.
* [InfluxDB](http://influxdb.com) - Scalable datastore for metrics, events, and real-time analytics.
* [Freehold](http://tshannon.bitbucket.org/freehold/) - An open, secure, and lightweight platform for your files and data.
* [Prometheus Annotation Server](https://github.com/oliver006/prom_annotation_server) - Annotation server for PromDash & Prometheus service monitoring system.
* [Consul](https://github.com/hashicorp/consul) - Consul is service discovery and configuration made easy. Distributed, highly available, and datacenter-aware.
* [Kala](https://github.com/ajvb/kala) - Kala is a modern job scheduler optimized to run on a single node. It is persistent, JSON over HTTP API, ISO 8601 duration notation, and dependent jobs.
* [drive](https://github.com/odeke-em/drive) - drive is an unofficial Google Drive command line client for \*NIX operating systems.
If you are using Bolt in a project please send a pull request to add it to the list.

View File

@ -20,6 +20,9 @@ import (
// take permanent effect only after a successful return is seen in
// caller.
//
// The maximum batch size and delay can be adjusted with DB.MaxBatchSize
// and DB.MaxBatchDelay, respectively.
//
// Batch is only useful when there are multiple goroutines calling it.
func (db *DB) Batch(fn func(*Tx) error) error {
errCh := make(chan error, 1)

View File

@ -1,4 +1,4 @@
// +build !windows,!plan9
// +build !windows,!plan9,!solaris
package bolt
@ -11,7 +11,7 @@ import (
)
// flock acquires an advisory lock on a file descriptor.
func flock(f *os.File, timeout time.Duration) error {
func flock(f *os.File, exclusive bool, timeout time.Duration) error {
var t time.Time
for {
// If we're beyond our timeout then return an error.
@ -21,9 +21,13 @@ func flock(f *os.File, timeout time.Duration) error {
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
flag := syscall.LOCK_SH
if exclusive {
flag = syscall.LOCK_EX
}
// Otherwise attempt to obtain the requested lock (shared or exclusive).
err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
err := syscall.Flock(int(f.Fd()), flag|syscall.LOCK_NB)
if err == nil {
return nil
} else if err != syscall.EWOULDBLOCK {
@ -44,11 +48,13 @@ func funlock(f *os.File) error {
func mmap(db *DB, sz int) error {
// Truncate and fsync to ensure file size metadata is flushed.
// https://github.com/boltdb/bolt/issues/284
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("file resize error: %s", err)
}
if err := db.file.Sync(); err != nil {
return fmt.Errorf("file sync error: %s", err)
if !db.NoGrowSync && !db.readOnly {
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("file resize error: %s", err)
}
if err := db.file.Sync(); err != nil {
return fmt.Errorf("file sync error: %s", err)
}
}
// Map the data file to memory.
@ -57,6 +63,11 @@ func mmap(db *DB, sz int) error {
return err
}
// Advise the kernel that the mmap is accessed randomly.
if err := madvise(b, syscall.MADV_RANDOM); err != nil {
return fmt.Errorf("madvise: %s", err)
}
// Save the original byte slice and convert to a byte array pointer.
db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0]))
@ -78,3 +89,12 @@ func munmap(db *DB) error {
db.datasz = 0
return err
}
// NOTE: This function is copied from stdlib because it is not available on darwin.
func madvise(b []byte, advice int) (err error) {
_, _, e1 := syscall.Syscall(syscall.SYS_MADVISE, uintptr(unsafe.Pointer(&b[0])), uintptr(len(b)), uintptr(advice))
if e1 != 0 {
err = e1
}
return
}

View File

@ -0,0 +1,100 @@
package bolt
import (
"fmt"
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/sys/unix"
"os"
"syscall"
"time"
"unsafe"
)
// flock acquires an advisory lock on a file descriptor.
func flock(f *os.File, exclusive bool, timeout time.Duration) error {
var t time.Time
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Pid = 0
lock.Whence = 0
if exclusive {
lock.Type = syscall.F_WRLCK
} else {
lock.Type = syscall.F_RDLCK
}
err := syscall.FcntlFlock(f.Fd(), syscall.F_SETLK, &lock)
if err == nil {
return nil
} else if err != syscall.EAGAIN {
return err
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
}
}
// funlock releases an advisory lock on a file descriptor.
func funlock(f *os.File) error {
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Type = syscall.F_UNLCK
lock.Whence = 0
return syscall.FcntlFlock(uintptr(f.Fd()), syscall.F_SETLK, &lock)
}
// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
// Truncate and fsync to ensure file size metadata is flushed.
// https://github.com/boltdb/bolt/issues/284
if !db.NoGrowSync && !db.readOnly {
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("file resize error: %s", err)
}
if err := db.file.Sync(); err != nil {
return fmt.Errorf("file sync error: %s", err)
}
}
// Map the data file to memory.
b, err := unix.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED)
if err != nil {
return err
}
// Advise the kernel that the mmap is accessed randomly.
if err := unix.Madvise(b, syscall.MADV_RANDOM); err != nil {
return fmt.Errorf("madvise: %s", err)
}
// Save the original byte slice and convert to a byte array pointer.
db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0]))
db.datasz = sz
return nil
}
// munmap unmaps a DB's data file from memory.
func munmap(db *DB) error {
// Ignore the unmap if we have no mapped data.
if db.dataref == nil {
return nil
}
// Unmap using the original byte slice.
err := unix.Munmap(db.dataref)
db.dataref = nil
db.data = nil
db.datasz = 0
return err
}

View File

@ -16,7 +16,7 @@ func fdatasync(db *DB) error {
}
// flock acquires an advisory lock on a file descriptor.
func flock(f *os.File, _ time.Duration) error {
func flock(f *os.File, _ bool, _ time.Duration) error {
return nil
}
@ -28,9 +28,11 @@ func funlock(f *os.File) error {
// mmap memory maps a DB's data file.
// Based on: https://github.com/edsrzf/mmap-go
func mmap(db *DB, sz int) error {
// Truncate the database to the size of the mmap.
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("truncate: %s", err)
if !db.readOnly {
// Truncate the database to the size of the mmap.
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("truncate: %s", err)
}
}
// Open a file mapping handle.

View File

@ -640,6 +640,22 @@ func TestBucket_Put_KeyTooLarge(t *testing.T) {
})
}
// Ensure that an error is returned when inserting a value that's too large.
func TestBucket_Put_ValueTooLarge(t *testing.T) {
if os.Getenv("DRONE") == "true" {
t.Skip("not enough RAM for test")
}
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
err := tx.Bucket([]byte("widgets")).Put([]byte("foo"), make([]byte, bolt.MaxValueSize+1))
equals(t, err, bolt.ErrValueTooLarge)
return nil
})
}
// Ensure a bucket can calculate stats.
func TestBucket_Stats(t *testing.T) {
db := NewTestDB()

View File

@ -344,7 +344,7 @@ func (cmd *DumpCommand) Run(args ...string) error {
for i, pageID := range pageIDs {
// Print a separator.
if i > 0 {
fmt.Fprintln(cmd.Stdout, "===============================================\n")
fmt.Fprintln(cmd.Stdout, "===============================================")
}
// Print page to stdout.
@ -465,7 +465,7 @@ func (cmd *PageCommand) Run(args ...string) error {
for i, pageID := range pageIDs {
// Print a separator.
if i > 0 {
fmt.Fprintln(cmd.Stdout, "===============================================\n")
fmt.Fprintln(cmd.Stdout, "===============================================")
}
// Retrieve page info and page size.
@ -917,7 +917,7 @@ func (cmd *BenchCommand) Run(args ...string) error {
// Write to the database.
var results BenchResults
if err := cmd.runWrites(db, options, &results); err != nil {
return fmt.Errorf("write: ", err)
return fmt.Errorf("write: %v", err)
}
// Read from the database.

View File

@ -55,6 +55,14 @@ type DB struct {
// THIS IS UNSAFE. PLEASE USE WITH CAUTION.
NoSync bool
// When true, skips the truncate call when growing the database.
// Setting this to true is only safe on non-ext3/ext4 systems.
// Skipping truncation avoids preallocation of hard drive space and
// bypasses a truncate() and fsync() syscall on remapping.
//
// https://github.com/boltdb/bolt/issues/284
NoGrowSync bool
// MaxBatchSize is the maximum size of a batch. Default value is
// copied from DefaultMaxBatchSize in Open.
//
@ -96,6 +104,10 @@ type DB struct {
ops struct {
writeAt func(b []byte, off int64) (n int, err error)
}
// Read only mode.
// When true, Update() and Begin(true) return ErrDatabaseReadOnly immediately.
readOnly bool
}
// Path returns the path to currently open database file.
@ -123,24 +135,34 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
if options == nil {
options = DefaultOptions
}
db.NoGrowSync = options.NoGrowSync
// Set default values for later DB operations.
db.MaxBatchSize = DefaultMaxBatchSize
db.MaxBatchDelay = DefaultMaxBatchDelay
flag := os.O_RDWR
if options.ReadOnly {
flag = os.O_RDONLY
db.readOnly = true
}
// Open data file and separate sync handler for metadata writes.
db.path = path
var err error
if db.file, err = os.OpenFile(db.path, os.O_RDWR|os.O_CREATE, mode); err != nil {
if db.file, err = os.OpenFile(db.path, flag|os.O_CREATE, mode); err != nil {
_ = db.close()
return nil, err
}
// Lock file so that other processes using Bolt cannot use the database
// at the same time. This would cause corruption since the two processes
// would write meta pages and free pages separately.
if err := flock(db.file, options.Timeout); err != nil {
// Lock file so that other processes using Bolt in read-write mode cannot
// use the database at the same time. This would cause corruption since
// the two processes would write meta pages and free pages separately.
// The database file is locked exclusively (only one process can grab the lock)
// if !options.ReadOnly.
// The database file is locked using the shared lock (more than one process may
// hold a lock at the same time) otherwise (options.ReadOnly is set).
if err := flock(db.file, !db.readOnly, options.Timeout); err != nil {
_ = db.close()
return nil, err
}
@ -247,8 +269,8 @@ func (db *DB) munmap() error {
// of the database. The minimum size is 32KB and doubles until it reaches 1GB.
// Returns an error if the new mmap size is greater than the max allowed.
func (db *DB) mmapSize(size int) (int, error) {
// Double the size from 1MB until 1GB.
for i := uint(20); i <= 30; i++ {
// Double the size from 32KB until 1GB.
for i := uint(15); i <= 30; i++ {
if size <= 1<<i {
return 1 << i, nil
}
@ -329,8 +351,15 @@ func (db *DB) init() error {
// Close releases all database resources.
// All transactions must be closed before closing the database.
func (db *DB) Close() error {
db.rwlock.Lock()
defer db.rwlock.Unlock()
db.metalock.Lock()
defer db.metalock.Unlock()
db.mmaplock.RLock()
defer db.mmaplock.RUnlock()
return db.close()
}
@ -350,8 +379,11 @@ func (db *DB) close() error {
// Close file handles.
if db.file != nil {
// Unlock the file.
_ = funlock(db.file)
// No need to unlock read-only file.
if !db.readOnly {
// Unlock the file.
_ = funlock(db.file)
}
// Close the file descriptor.
if err := db.file.Close(); err != nil {
@ -369,6 +401,11 @@ func (db *DB) close() error {
// will cause the calls to block and be serialized until the current write
// transaction finishes.
//
// Transactions should not be dependent on one another. Opening a read
// transaction and a write transaction in the same goroutine can cause the
// writer to deadlock because the database periodically needs to re-mmap itself
// as it grows and it cannot do that while a read transaction is open.
//
// IMPORTANT: You must close read-only transactions after you are finished or
// else the database will not reclaim old pages.
func (db *DB) Begin(writable bool) (*Tx, error) {
@ -417,6 +454,11 @@ func (db *DB) beginTx() (*Tx, error) {
}
func (db *DB) beginRWTx() (*Tx, error) {
// If the database was opened with Options.ReadOnly, return an error.
if db.readOnly {
return nil, ErrDatabaseReadOnly
}
// Obtain writer lock. This is released by the transaction when it closes.
// This enforces only one writer transaction at a time.
db.rwlock.Lock()
@ -547,6 +589,12 @@ func (db *DB) View(fn func(*Tx) error) error {
return nil
}
// Sync executes fdatasync() against the database file handle.
//
// This is not necessary under normal operation, however, if you use NoSync
// then it allows you to force the database file to sync against the disk.
func (db *DB) Sync() error { return fdatasync(db) }
// Stats retrieves ongoing performance stats for the database.
// This is only updated when a transaction closes.
func (db *DB) Stats() Stats {
@ -607,18 +655,30 @@ func (db *DB) allocate(count int) (*page, error) {
return p, nil
}
func (db *DB) IsReadOnly() bool {
return db.readOnly
}
// Options represents the options that can be set when opening a database.
type Options struct {
// Timeout is the amount of time to wait to obtain a file lock.
// When set to zero it will wait indefinitely. This option is only
// available on Darwin and Linux.
Timeout time.Duration
// Sets the DB.NoGrowSync flag before memory mapping the file.
NoGrowSync bool
// Open database in read-only mode. Uses flock(..., LOCK_SH|LOCK_NB) to
// grab a shared lock (UNIX).
ReadOnly bool
}
// DefaultOptions represent the options used if nil options are passed into Open().
// No timeout is used which will cause Bolt to wait indefinitely for a lock.
var DefaultOptions = &Options{
Timeout: 0,
Timeout: 0,
NoGrowSync: false,
}
// Stats represents statistics about the database.
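The `mmapSize` change in this file lowers the smallest mapping from 1MB to 32KB. The sketch below reproduces only the doubling loop visible in the hunk. What the real function does for sizes above 1GB is not shown in this diff, so the sketch just reports "not covered" in that case. The function name `mmapSizeUpTo1GB` is hypothetical.

```go
package main

import "fmt"

// mmapSizeUpTo1GB returns the smallest power of two between 32KB (1<<15)
// and 1GB (1<<30) that can hold size. The boolean is false when size is
// larger than 1GB, a case this sketch deliberately does not handle.
func mmapSizeUpTo1GB(size int) (int, bool) {
	for i := uint(15); i <= 30; i++ {
		if size <= 1<<i {
			return 1 << i, true
		}
	}
	return 0, false
}

func main() {
	for _, size := range []int{1, 32 * 1024, 32*1024 + 1, 5 << 20, 1 << 30} {
		sz, ok := mmapSizeUpTo1GB(size)
		fmt.Printf("size=%d -> mmap=%d covered=%v\n", size, sz, ok)
	}
}
```

With the previous lower bound of `1<<20`, even an empty database would have been mapped at 1MB; starting at `1<<15` keeps small databases small.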

View File

@ -42,6 +42,9 @@ func TestOpen_Timeout(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("timeout not supported on windows")
}
if runtime.GOOS == "solaris" {
t.Skip("solaris fcntl locks don't support intra-process locking")
}
path := tempfile()
defer os.Remove(path)
@ -66,6 +69,9 @@ func TestOpen_Wait(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("timeout not supported on windows")
}
if runtime.GOOS == "solaris" {
t.Skip("solaris fcntl locks don't support intra-process locking")
}
path := tempfile()
defer os.Remove(path)
@ -224,6 +230,80 @@ func TestDB_Open_FileTooSmall(t *testing.T) {
equals(t, errors.New("file size too small"), err)
}
// Ensure that a database can be opened in read-only mode by multiple processes
// and that a database can not be opened in read-write mode and in read-only
// mode at the same time.
func TestOpen_ReadOnly(t *testing.T) {
if runtime.GOOS == "solaris" {
t.Skip("solaris fcntl locks don't support intra-process locking")
}
bucket, key, value := []byte(`bucket`), []byte(`key`), []byte(`value`)
path := tempfile()
defer os.Remove(path)
// Open in read-write mode.
db, err := bolt.Open(path, 0666, nil)
ok(t, db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket(bucket)
if err != nil {
return err
}
return b.Put(key, value)
}))
assert(t, db != nil, "")
assert(t, !db.IsReadOnly(), "")
ok(t, err)
ok(t, db.Close())
// Open in read-only mode.
db0, err := bolt.Open(path, 0666, &bolt.Options{ReadOnly: true})
ok(t, err)
defer db0.Close()
// Opening in read-write mode should return an error.
_, err = bolt.Open(path, 0666, &bolt.Options{Timeout: time.Millisecond * 100})
assert(t, err != nil, "")
// And again (in read-only mode).
db1, err := bolt.Open(path, 0666, &bolt.Options{ReadOnly: true})
ok(t, err)
defer db1.Close()
// Verify both read-only databases are accessible.
for _, db := range []*bolt.DB{db0, db1} {
// Verify it is indeed in read-only mode.
assert(t, db.IsReadOnly(), "")
// Read-only databases should not allow updates.
assert(t,
bolt.ErrDatabaseReadOnly == db.Update(func(*bolt.Tx) error {
panic(`should never get here`)
}),
"")
// Read-only databases should not allow beginning writable txns.
_, err = db.Begin(true)
assert(t, bolt.ErrDatabaseReadOnly == err, "")
// Verify the data.
ok(t, db.View(func(tx *bolt.Tx) error {
b := tx.Bucket(bucket)
if b == nil {
return fmt.Errorf("expected bucket `%s`", string(bucket))
}
got := string(b.Get(key))
expected := string(value)
if got != expected {
return fmt.Errorf("expected `%s`, got `%s`", expected, got)
}
return nil
}))
}
}
// TODO(benbjohnson): Test corruption at every byte of the first two pages.
// Ensure that a database cannot open a transaction when it's not open.
@ -254,6 +334,49 @@ func TestDB_BeginRW_Closed(t *testing.T) {
assert(t, tx == nil, "")
}
func TestDB_Close_PendingTx_RW(t *testing.T) { testDB_Close_PendingTx(t, true) }
func TestDB_Close_PendingTx_RO(t *testing.T) { testDB_Close_PendingTx(t, false) }
// Ensure that a database cannot close while transactions are open.
func testDB_Close_PendingTx(t *testing.T, writable bool) {
db := NewTestDB()
defer db.Close()
// Start transaction.
tx, err := db.Begin(true)
if err != nil {
t.Fatal(err)
}
// Close the database in a separate goroutine.
done := make(chan struct{})
go func() {
db.Close()
close(done)
}()
// Ensure database hasn't closed.
time.Sleep(100 * time.Millisecond)
select {
case <-done:
t.Fatal("database closed too early")
default:
}
// Commit transaction.
if err := tx.Commit(); err != nil {
t.Fatal(err)
}
// Ensure database closed now.
time.Sleep(100 * time.Millisecond)
select {
case <-done:
default:
t.Fatal("database did not close")
}
}
// Ensure a database can provide a transactional block.
func TestDB_Update(t *testing.T) {
db := NewTestDB()
@ -678,7 +801,7 @@ func (db *TestDB) PrintStats() {
// MustCheck runs a consistency check on the database and panics if any errors are found.
func (db *TestDB) MustCheck() {
db.View(func(tx *bolt.Tx) error {
db.Update(func(tx *bolt.Tx) error {
// Collect all the errors.
var errors []error
for err := range tx.Check() {

View File

@ -36,6 +36,10 @@ var (
// ErrTxClosed is returned when committing or rolling back a transaction
// that has already been committed or rolled back.
ErrTxClosed = errors.New("tx closed")
// ErrDatabaseReadOnly is returned when a mutating transaction is started on a
// read-only database.
ErrDatabaseReadOnly = errors.New("database is in read-only mode")
)
// These errors can occur when putting or deleting a value or a bucket.

View File

@ -48,15 +48,14 @@ func (f *freelist) pending_count() int {
// all returns a list of all free ids and all pending ids in one sorted list.
func (f *freelist) all() []pgid {
ids := make([]pgid, len(f.ids))
copy(ids, f.ids)
m := make(pgids, 0)
for _, list := range f.pending {
ids = append(ids, list...)
m = append(m, list...)
}
sort.Sort(pgids(ids))
return ids
sort.Sort(m)
return pgids(f.ids).merge(m)
}
// allocate returns the starting page id of a contiguous list of pages of a given size.
@ -127,15 +126,17 @@ func (f *freelist) free(txid txid, p *page) {
// release moves all page ids for a transaction id (or older) to the freelist.
func (f *freelist) release(txid txid) {
m := make(pgids, 0)
for tid, ids := range f.pending {
if tid <= txid {
// Move transaction's pending pages to the available freelist.
// Don't remove from the cache since the page is still free.
f.ids = append(f.ids, ids...)
m = append(m, ids...)
delete(f.pending, tid)
}
}
sort.Sort(pgids(f.ids))
sort.Sort(m)
f.ids = pgids(f.ids).merge(m)
}
// rollback removes the pages from a given pending tx.
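The change above stops re-sorting the whole free list in `all()` and `release()`. Instead it sorts only the pending ids and merges them into the already sorted list. The sketch below shows the idea with a textbook two-pointer merge. That is a simpler stand-in, not the `sort.Search`-based `pgids.merge` added in this diff, and `mergeSorted` is a hypothetical name.

```go
package main

import "fmt"

// mergeSorted returns the sorted union of two already sorted slices in
// O(len(a)+len(b)) time, without re-sorting the combined result.
func mergeSorted(a, b []uint64) []uint64 {
	merged := make([]uint64, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			merged = append(merged, a[i])
			i++
		} else {
			merged = append(merged, b[j])
			j++
		}
	}
	// At most one of these tails is non-empty.
	merged = append(merged, a[i:]...)
	merged = append(merged, b[j:]...)
	return merged
}

func main() {
	free := []uint64{4, 5, 6, 10, 11, 12, 13, 27} // existing sorted free ids
	pending := []uint64{1, 3, 8, 9, 25, 30}       // newly released ids, sorted
	fmt.Println(mergeSorted(free, pending))
}
```

When the pending list is much smaller than the free list, as in the benchmark in this diff (`len(ids) / 400`), sorting only the small list and merging avoids repeatedly sorting a large slice that is already in order.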

View File

@ -1,7 +1,9 @@
package bolt
import (
"math/rand"
"reflect"
"sort"
"testing"
"unsafe"
)
@ -127,3 +129,28 @@ func TestFreelist_write(t *testing.T) {
t.Fatalf("exp=%v; got=%v", exp, f2.ids)
}
}
func Benchmark_FreelistRelease10K(b *testing.B) { benchmark_FreelistRelease(b, 10000) }
func Benchmark_FreelistRelease100K(b *testing.B) { benchmark_FreelistRelease(b, 100000) }
func Benchmark_FreelistRelease1000K(b *testing.B) { benchmark_FreelistRelease(b, 1000000) }
func Benchmark_FreelistRelease10000K(b *testing.B) { benchmark_FreelistRelease(b, 10000000) }
func benchmark_FreelistRelease(b *testing.B, size int) {
ids := randomPgids(size)
pending := randomPgids(len(ids) / 400)
b.ResetTimer()
for i := 0; i < b.N; i++ {
f := &freelist{ids: ids, pending: map[txid][]pgid{1: pending}}
f.release(1)
}
}
func randomPgids(n int) []pgid {
rand.Seed(42)
pgids := make(pgids, n)
for i := range pgids {
pgids[i] = pgid(rand.Int63())
}
sort.Sort(pgids)
return pgids
}

View File

@ -221,11 +221,20 @@ func (n *node) write(p *page) {
_assert(elem.pgid != p.id, "write: circular dependency occurred")
}
// If the length of key+value is larger than the max allocation size
// then we need to reallocate the byte array pointer.
//
// See: https://github.com/boltdb/bolt/pull/335
klen, vlen := len(item.key), len(item.value)
if len(b) < klen+vlen {
b = (*[maxAllocSize]byte)(unsafe.Pointer(&b[0]))[:]
}
// Write data for the element to the end of the page.
copy(b[0:], item.key)
b = b[len(item.key):]
b = b[klen:]
copy(b[0:], item.value)
b = b[len(item.value):]
b = b[vlen:]
}
// DEBUG ONLY: n.dump()

View File

@ -3,6 +3,7 @@ package bolt
import (
"fmt"
"os"
"sort"
"unsafe"
)
@ -96,7 +97,7 @@ type branchPageElement struct {
// key returns a byte slice of the node key.
func (n *branchPageElement) key() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return buf[n.pos : n.pos+n.ksize]
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}
// leafPageElement represents a node on a leaf page.
@ -110,13 +111,13 @@ type leafPageElement struct {
// key returns a byte slice of the node key.
func (n *leafPageElement) key() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return buf[n.pos : n.pos+n.ksize]
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}
// value returns a byte slice of the node value.
func (n *leafPageElement) value() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return buf[n.pos+n.ksize : n.pos+n.ksize+n.vsize]
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos+n.ksize]))[:n.vsize]
}
// PageInfo represents human readable information about a page.
@ -132,3 +133,40 @@ type pgids []pgid
func (s pgids) Len() int { return len(s) }
func (s pgids) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }
// merge returns the sorted union of a and b.
func (a pgids) merge(b pgids) pgids {
// Return the opposite slice if one is nil.
if len(a) == 0 {
return b
} else if len(b) == 0 {
return a
}
// Create a list to hold all elements from both lists.
merged := make(pgids, 0, len(a)+len(b))
// Assign lead to the slice with a lower starting value, follow to the higher value.
lead, follow := a, b
if b[0] < a[0] {
lead, follow = b, a
}
// Continue while there are elements in the lead.
for len(lead) > 0 {
// Merge largest prefix of lead that is ahead of follow[0].
n := sort.Search(len(lead), func(i int) bool { return lead[i] > follow[0] })
merged = append(merged, lead[:n]...)
if n >= len(lead) {
break
}
// Swap lead and follow.
lead, follow = follow, lead[n:]
}
// Append what's left in follow.
merged = append(merged, follow...)
return merged
}

View File

@ -1,7 +1,10 @@
package bolt
import (
"reflect"
"sort"
"testing"
"testing/quick"
)
// Ensure that the page type can be returned in human readable format.
@ -27,3 +30,43 @@ func TestPage_typ(t *testing.T) {
func TestPage_dump(t *testing.T) {
(&page{id: 256}).hexdump(16)
}
func TestPgids_merge(t *testing.T) {
a := pgids{4, 5, 6, 10, 11, 12, 13, 27}
b := pgids{1, 3, 8, 9, 25, 30}
c := a.merge(b)
if !reflect.DeepEqual(c, pgids{1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 25, 27, 30}) {
t.Errorf("mismatch: %v", c)
}
a = pgids{4, 5, 6, 10, 11, 12, 13, 27, 35, 36}
b = pgids{8, 9, 25, 30}
c = a.merge(b)
if !reflect.DeepEqual(c, pgids{4, 5, 6, 8, 9, 10, 11, 12, 13, 25, 27, 30, 35, 36}) {
t.Errorf("mismatch: %v", c)
}
}
func TestPgids_merge_quick(t *testing.T) {
if err := quick.Check(func(a, b pgids) bool {
// Sort incoming lists.
sort.Sort(a)
sort.Sort(b)
// Merge the two lists together.
got := a.merge(b)
// The expected value should be the two lists combined and sorted.
exp := append(a, b...)
sort.Sort(exp)
if !reflect.DeepEqual(exp, got) {
t.Errorf("\nexp=%+v\ngot=%+v\n", exp, got)
return false
}
return true
}, nil); err != nil {
t.Fatal(err)
}
}

View File

@ -127,7 +127,8 @@ func (tx *Tx) OnCommit(fn func()) {
}
// Commit writes all changes to disk and updates the meta page.
// Returns an error if a disk write error occurs.
// Returns an error if a disk write error occurs, or if Commit is
// called on a read-only transaction.
func (tx *Tx) Commit() error {
_assert(!tx.managed, "managed tx commit not allowed")
if tx.db == nil {
@ -203,7 +204,8 @@ func (tx *Tx) Commit() error {
return nil
}
// Rollback closes the transaction and ignores all previous updates.
// Rollback closes the transaction and ignores all previous updates. Read-only
// transactions must be rolled back and not committed.
func (tx *Tx) Rollback() error {
_assert(!tx.managed, "managed tx rollback not allowed")
if tx.db == nil {
@ -421,15 +423,39 @@ func (tx *Tx) write() error {
// Write pages to disk in order.
for _, p := range pages {
size := (int(p.overflow) + 1) * tx.db.pageSize
buf := (*[maxAllocSize]byte)(unsafe.Pointer(p))[:size]
offset := int64(p.id) * int64(tx.db.pageSize)
if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
return err
}
// Update statistics.
tx.stats.Write++
// Write out page in "max allocation" sized chunks.
ptr := (*[maxAllocSize]byte)(unsafe.Pointer(p))
for {
// Limit our write to our max allocation size.
sz := size
if sz > maxAllocSize-1 {
sz = maxAllocSize - 1
}
// Write chunk to disk.
buf := ptr[:sz]
if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
return err
}
// Update statistics.
tx.stats.Write++
// Exit inner for loop if we've written all the chunks.
size -= sz
if size == 0 {
break
}
// Otherwise move offset forward and move pointer to next chunk.
offset += int64(sz)
ptr = (*[maxAllocSize]byte)(unsafe.Pointer(&ptr[sz]))
}
}
// Ignore file sync if flag is set on DB.
if !tx.db.NoSync || IgnoreNoSync {
if err := fdatasync(tx.db); err != nil {
return err
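The new loop in `write()` splits one large page write into pieces no bigger than the maximum allocation size. The arithmetic can be sketched on its own, without any unsafe pointers or file I/O. The names `chunk` and `splitWrite` are hypothetical, and `maxChunk` stands in for `maxAllocSize - 1`.

```go
package main

import "fmt"

// chunk describes one write: where it starts in the file and how long it is.
type chunk struct {
	Offset int64
	Len    int
}

// splitWrite breaks a write of size bytes starting at offset into pieces of
// at most maxChunk bytes, advancing the offset after each piece.
func splitWrite(offset int64, size, maxChunk int) []chunk {
	var chunks []chunk
	for size > 0 {
		// Limit this write to the maximum chunk size.
		sz := size
		if sz > maxChunk {
			sz = maxChunk
		}
		chunks = append(chunks, chunk{Offset: offset, Len: sz})

		// Move forward past the bytes just "written".
		size -= sz
		offset += int64(sz)
	}
	return chunks
}

func main() {
	// A 10-byte write at offset 100 with a 4-byte limit becomes three writes.
	for _, c := range splitWrite(100, 10, 4) {
		fmt.Printf("write %d bytes at offset %d\n", c.Len, c.Offset)
	}
}
```

Each chunk corresponds to one `writeAt` call in the diff, which is also why `tx.stats.Write` is now incremented once per chunk rather than once per page.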

View File

@ -252,6 +252,38 @@ func TestTx_DeleteBucket_NotFound(t *testing.T) {
})
}
// Ensure that no error is returned when a tx.ForEach function does not return
// an error.
func TestTx_ForEach_NoError(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
equals(t, nil, tx.ForEach(func(name []byte, b *bolt.Bucket) error {
return nil
}))
return nil
})
}
// Ensure that an error is returned when a tx.ForEach function returns an error.
func TestTx_ForEach_WithError(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
err := errors.New("foo")
equals(t, err, tx.ForEach(func(name []byte, b *bolt.Bucket) error {
return err
}))
return nil
})
}
// Ensure that Tx commit handlers are called after a transaction successfully commits.
func TestTx_OnCommit(t *testing.T) {
var x int

View File

@ -1,23 +0,0 @@
package etcd
// Add a new directory with a random etcd-generated key under the given path.
func (c *Client) AddChildDir(key string, ttl uint64) (*Response, error) {
raw, err := c.post(key, "", ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// Add a new file with a random etcd-generated key under the given path.
func (c *Client) AddChild(key string, value string, ttl uint64) (*Response, error) {
raw, err := c.post(key, value, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}

View File

@ -1,73 +0,0 @@
package etcd
import "testing"
func TestAddChild(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("fooDir", true)
c.Delete("nonexistentDir", true)
}()
c.CreateDir("fooDir", 5)
_, err := c.AddChild("fooDir", "v0", 5)
if err != nil {
t.Fatal(err)
}
_, err = c.AddChild("fooDir", "v1", 5)
if err != nil {
t.Fatal(err)
}
resp, err := c.Get("fooDir", true, false)
if err != nil {
t.Fatal(err)
}
// The child with v0 should precede the child with v1 because it was added
// earlier, so it should have a lower key.
if !(len(resp.Node.Nodes) == 2 && (resp.Node.Nodes[0].Value == "v0" && resp.Node.Nodes[1].Value == "v1")) {
t.Fatalf("AddChild 1 failed. There should be two children whose values are v0 and v1, respectively."+
" The response was: %#v", resp)
}
// Creating a child under a nonexistent directory should succeed.
// The directory should be created.
resp, err = c.AddChild("nonexistentDir", "foo", 5)
if err != nil {
t.Fatal(err)
}
}
func TestAddChildDir(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("fooDir", true)
c.Delete("nonexistentDir", true)
}()
c.CreateDir("fooDir", 5)
_, err := c.AddChildDir("fooDir", 5)
if err != nil {
t.Fatal(err)
}
_, err = c.AddChildDir("fooDir", 5)
if err != nil {
t.Fatal(err)
}
resp, err := c.Get("fooDir", true, false)
if err != nil {
t.Fatal(err)
}
// Both children are directories created without a value, so each of them
// should be present and empty.
if !(len(resp.Node.Nodes) == 2 && (len(resp.Node.Nodes[0].Nodes) == 0 && len(resp.Node.Nodes[1].Nodes) == 0)) {
t.Fatalf("AddChildDir 1 failed. There should be two children, both of them empty directories."+
" The response was: %#v", resp)
}
// Creating a child under a nonexistent directory should succeed.
// The directory should be created.
resp, err = c.AddChildDir("nonexistentDir", 5)
if err != nil {
t.Fatal(err)
}
}

View File

@ -1,490 +0,0 @@
package etcd
import (
"crypto/tls"
"crypto/x509"
"encoding/json"
"errors"
"io"
"io/ioutil"
"math/rand"
"net"
"net/http"
"net/url"
"os"
"path"
"strings"
"time"
)
// See SetConsistency for how to use these constants.
const (
// Using strings rather than iota because the consistency level
// could be persisted to disk, so it'd be better to use
// human-readable values.
STRONG_CONSISTENCY = "STRONG"
WEAK_CONSISTENCY = "WEAK"
)
const (
defaultBufferSize = 10
)
func init() {
rand.Seed(time.Now().UnixNano())
}
type Config struct {
CertFile string `json:"certFile"`
KeyFile string `json:"keyFile"`
CaCertFile []string `json:"caCertFiles"`
DialTimeout time.Duration `json:"timeout"`
Consistency string `json:"consistency"`
}
type credentials struct {
username string
password string
}
type Client struct {
config Config `json:"config"`
cluster *Cluster `json:"cluster"`
httpClient *http.Client
credentials *credentials
transport *http.Transport
persistence io.Writer
cURLch chan string
// CheckRetry can be used to control the policy for failed requests
// and modify the cluster if needed.
// The client calls it before sending requests again, and
// stops retrying if CheckRetry returns some error. The cases that
// this function needs to handle include no response and unexpected
// http status code of response.
// If CheckRetry is nil, client will call the default one
// `DefaultCheckRetry`.
// Argument cluster is the etcd.Cluster object that these requests have been made on.
// Argument numReqs is the number of http.Requests that have been made so far.
// Argument lastResp is the http.Response from the last request.
// Argument err is the reason for the failure.
CheckRetry func(cluster *Cluster, numReqs int,
lastResp http.Response, err error) error
}
// NewClient creates a basic client that is configured to be used
// with the given machine list.
func NewClient(machines []string) *Client {
config := Config{
// default timeout is one second
DialTimeout: time.Second,
Consistency: WEAK_CONSISTENCY,
}
client := &Client{
cluster: NewCluster(machines),
config: config,
}
client.initHTTPClient()
client.saveConfig()
return client
}
// NewTLSClient creates a basic client with TLS configuration.
func NewTLSClient(machines []string, cert, key, caCert string) (*Client, error) {
// overwrite the default machine to use https
if len(machines) == 0 {
machines = []string{"https://127.0.0.1:4001"}
}
config := Config{
// default timeout is one second
DialTimeout: time.Second,
Consistency: WEAK_CONSISTENCY,
CertFile: cert,
KeyFile: key,
CaCertFile: make([]string, 0),
}
client := &Client{
cluster: NewCluster(machines),
config: config,
}
err := client.initHTTPSClient(cert, key)
if err != nil {
return nil, err
}
// Do not drop a failure to load the root CA.
if err = client.AddRootCA(caCert); err != nil {
return nil, err
}
client.saveConfig()
return client, nil
}
// NewClientFromFile creates a client from a given file path.
// The given file is expected to use the JSON format.
func NewClientFromFile(fpath string) (*Client, error) {
fi, err := os.Open(fpath)
if err != nil {
return nil, err
}
defer func() {
if err := fi.Close(); err != nil {
panic(err)
}
}()
return NewClientFromReader(fi)
}
// NewClientFromReader creates a Client configured from a given reader.
// The configuration is expected to use the JSON format.
func NewClientFromReader(reader io.Reader) (*Client, error) {
c := new(Client)
b, err := ioutil.ReadAll(reader)
if err != nil {
return nil, err
}
err = json.Unmarshal(b, c)
if err != nil {
return nil, err
}
if c.config.CertFile == "" {
c.initHTTPClient()
} else {
err = c.initHTTPSClient(c.config.CertFile, c.config.KeyFile)
}
if err != nil {
return nil, err
}
for _, caCert := range c.config.CaCertFile {
if err := c.AddRootCA(caCert); err != nil {
return nil, err
}
}
return c, nil
}
// Override the Client's HTTP Transport object
func (c *Client) SetTransport(tr *http.Transport) {
c.httpClient.Transport = tr
c.transport = tr
}
func (c *Client) SetCredentials(username, password string) {
c.credentials = &credentials{username, password}
}
func (c *Client) Close() {
c.transport.DisableKeepAlives = true
c.transport.CloseIdleConnections()
}
// initHTTPClient initializes an HTTP client for the etcd client.
func (c *Client) initHTTPClient() {
c.transport = &http.Transport{
Dial: c.dial,
TLSClientConfig: &tls.Config{
InsecureSkipVerify: true,
},
}
c.httpClient = &http.Client{Transport: c.transport}
}
// initHTTPSClient initializes an HTTPS client for the etcd client.
func (c *Client) initHTTPSClient(cert, key string) error {
if cert == "" || key == "" {
return errors.New("Require both cert and key path")
}
tlsCert, err := tls.LoadX509KeyPair(cert, key)
if err != nil {
return err
}
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{tlsCert},
InsecureSkipVerify: true,
}
tr := &http.Transport{
TLSClientConfig: tlsConfig,
Dial: c.dial,
}
// Keep a reference to the transport so that Close does not dereference nil.
c.transport = tr
c.httpClient = &http.Client{Transport: tr}
return nil
}
// SetPersistence sets a writer to which the config will be
// written every time it's changed.
func (c *Client) SetPersistence(writer io.Writer) {
c.persistence = writer
}
// SetConsistency changes the consistency level of the client.
//
// When consistency is set to STRONG_CONSISTENCY, all requests,
// including GET, are sent to the leader. This means that, assuming
// the absence of leader failures, GET requests are guaranteed to see
// the changes made by previous requests.
//
// When consistency is set to WEAK_CONSISTENCY, other requests
// are still sent to the leader, but GET requests are sent to a
// random server from the server pool. This reduces the read
// load on the leader, but it's not guaranteed that the GET requests
// will see changes made by previous requests (they might have not
// yet been committed on non-leader servers).
func (c *Client) SetConsistency(consistency string) error {
if !(consistency == STRONG_CONSISTENCY || consistency == WEAK_CONSISTENCY) {
return errors.New("The argument must be either STRONG_CONSISTENCY or WEAK_CONSISTENCY.")
}
c.config.Consistency = consistency
return nil
}
// Sets the DialTimeout value
func (c *Client) SetDialTimeout(d time.Duration) {
c.config.DialTimeout = d
}
// AddRootCA adds a root CA cert for the etcd client
func (c *Client) AddRootCA(caCert string) error {
if c.httpClient == nil {
return errors.New("Client has not been initialized yet!")
}
certBytes, err := ioutil.ReadFile(caCert)
if err != nil {
return err
}
tr, ok := c.httpClient.Transport.(*http.Transport)
if !ok {
panic("AddRootCA(): Transport type assert should not fail")
}
if tr.TLSClientConfig.RootCAs == nil {
caCertPool := x509.NewCertPool()
ok = caCertPool.AppendCertsFromPEM(certBytes)
if ok {
tr.TLSClientConfig.RootCAs = caCertPool
}
tr.TLSClientConfig.InsecureSkipVerify = false
} else {
ok = tr.TLSClientConfig.RootCAs.AppendCertsFromPEM(certBytes)
}
if !ok {
err = errors.New("Unable to load caCert")
}
c.config.CaCertFile = append(c.config.CaCertFile, caCert)
c.saveConfig()
return err
}
// SetCluster updates cluster information using the given machine list.
func (c *Client) SetCluster(machines []string) bool {
success := c.internalSyncCluster(machines)
return success
}
func (c *Client) GetCluster() []string {
return c.cluster.Machines
}
// SyncCluster updates the cluster information using the internal machine list.
// If no members are found, the internal machine list is left untouched.
func (c *Client) SyncCluster() bool {
return c.internalSyncCluster(c.cluster.Machines)
}
// internalSyncCluster syncs cluster information using the given machine list.
func (c *Client) internalSyncCluster(machines []string) bool {
// comma-separated list of machines in the cluster.
members := ""
for _, machine := range machines {
httpPath := c.createHttpPath(machine, path.Join(version, "members"))
resp, err := c.httpClient.Get(httpPath)
if err != nil {
// try another machine in the cluster
continue
}
if resp.StatusCode != http.StatusOK { // fall back to the old endpoint
// Close the unsuccessful response body before retrying.
resp.Body.Close()
httpPath := c.createHttpPath(machine, path.Join(version, "machines"))
resp, err := c.httpClient.Get(httpPath)
if err != nil {
// try another machine in the cluster
continue
}
b, err := ioutil.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
// try another machine in the cluster
continue
}
members = string(b)
} else {
b, err := ioutil.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
// try another machine in the cluster
continue
}
var mCollection memberCollection
if err := json.Unmarshal(b, &mCollection); err != nil {
// try another machine
continue
}
urls := make([]string, 0)
for _, m := range mCollection {
urls = append(urls, m.ClientURLs...)
}
members = strings.Join(urls, ",")
}
// We should never do an empty cluster update.
if members == "" {
continue
}
// update Machines List
c.cluster.updateFromStr(members)
logger.Debug("sync.machines ", c.cluster.Machines)
c.saveConfig()
return true
}
return false
}
// createHttpPath creates a complete HTTP URL.
// serverName should contain both the host name and a port number, if any.
func (c *Client) createHttpPath(serverName string, _path string) string {
u, err := url.Parse(serverName)
if err != nil {
panic(err)
}
u.Path = path.Join(u.Path, _path)
if u.Scheme == "" {
u.Scheme = "http"
}
return u.String()
}
// dial attempts to open a TCP connection to the provided address, explicitly
// enabling keep-alives with a one-second interval.
func (c *Client) dial(network, addr string) (net.Conn, error) {
conn, err := net.DialTimeout(network, addr, c.config.DialTimeout)
if err != nil {
return nil, err
}
tcpConn, ok := conn.(*net.TCPConn)
if !ok {
return nil, errors.New("Failed type-assertion of net.Conn as *net.TCPConn")
}
// Keep TCP alive to check whether or not the remote machine is down
if err = tcpConn.SetKeepAlive(true); err != nil {
return nil, err
}
if err = tcpConn.SetKeepAlivePeriod(time.Second); err != nil {
return nil, err
}
return tcpConn, nil
}
func (c *Client) OpenCURL() {
c.cURLch = make(chan string, defaultBufferSize)
}
func (c *Client) CloseCURL() {
c.cURLch = nil
}
func (c *Client) sendCURL(command string) {
go func() {
select {
case c.cURLch <- command:
default:
}
}()
}
func (c *Client) RecvCURL() string {
return <-c.cURLch
}
// saveConfig saves the current config using c.persistence.
func (c *Client) saveConfig() error {
if c.persistence != nil {
b, err := json.Marshal(c)
if err != nil {
return err
}
_, err = c.persistence.Write(b)
if err != nil {
return err
}
}
return nil
}
// MarshalJSON implements the json.Marshaler interface
// as defined by the standard JSON package.
func (c *Client) MarshalJSON() ([]byte, error) {
b, err := json.Marshal(struct {
Config Config `json:"config"`
Cluster *Cluster `json:"cluster"`
}{
Config: c.config,
Cluster: c.cluster,
})
if err != nil {
return nil, err
}
return b, nil
}
// UnmarshalJSON implements the json.Unmarshaler interface
// as defined by the standard JSON package.
func (c *Client) UnmarshalJSON(b []byte) error {
temp := struct {
Config Config `json:"config"`
Cluster *Cluster `json:"cluster"`
}{}
err := json.Unmarshal(b, &temp)
if err != nil {
return err
}
c.cluster = temp.Cluster
c.config = temp.Config
return nil
}
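
The URL handling in createHttpPath above can be tried in isolation. The following is a minimal sketch, not the library's code: `joinEndpoint` is a hypothetical name, the endpoint addresses are examples only, and unlike the original it returns the parse error instead of panicking.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// joinEndpoint is a hypothetical stand-in for createHttpPath: it parses the
// server address, appends a relative path, and defaults the scheme to http.
// Assumption: returning the error is friendlier than panicking on bad input.
func joinEndpoint(server, rel string) (string, error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", err
	}
	// path.Join collapses duplicate slashes, so a trailing "/" on the
	// server address does not produce "//" in the result.
	u.Path = path.Join(u.Path, rel)
	if u.Scheme == "" {
		u.Scheme = "http"
	}
	return u.String(), nil
}

func main() {
	full, err := joinEndpoint("http://127.0.0.1:4001/", "v2/members")
	if err != nil {
		fmt.Println("bad endpoint:", err)
		return
	}
	fmt.Println(full)
}
```

Addresses given without a scheme, such as a bare host:port pair, may be rejected by url.Parse depending on the Go version, so callers should probably pass full URLs.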

View File

@ -1,108 +0,0 @@
package etcd
import (
"encoding/json"
"fmt"
"net"
"net/url"
"os"
"testing"
)
// To pass this test, we need to create a cluster of 3 machines
// The server should be listening on localhost:4001, 4002, 4003
func TestSync(t *testing.T) {
fmt.Println("Make sure there are three nodes at 0.0.0.0:4001-4003")
// Explicit trailing slash to ensure this doesn't reproduce:
// https://github.com/coreos/go-etcd/issues/82
c := NewClient([]string{"http://127.0.0.1:4001/"})
success := c.SyncCluster()
if !success {
t.Fatal("cannot sync machines")
}
for _, m := range c.GetCluster() {
u, err := url.Parse(m)
if err != nil {
t.Fatal(err)
}
if u.Scheme != "http" {
t.Fatal("scheme must be http")
}
host, _, err := net.SplitHostPort(u.Host)
if err != nil {
t.Fatal(err)
}
if host != "localhost" {
t.Fatal("Host must be localhost")
}
}
badMachines := []string{"abc", "edef"}
success = c.SetCluster(badMachines)
if success {
t.Fatal("should not sync on bad machines")
}
goodMachines := []string{"127.0.0.1:4002"}
success = c.SetCluster(goodMachines)
if !success {
t.Fatal("cannot sync machines")
} else {
fmt.Println(c.cluster.Machines)
}
}
func TestPersistence(t *testing.T) {
c := NewClient(nil)
c.SyncCluster()
fo, err := os.Create("config.json")
if err != nil {
t.Fatal(err)
}
defer func() {
if err := fo.Close(); err != nil {
panic(err)
}
}()
c.SetPersistence(fo)
err = c.saveConfig()
if err != nil {
t.Fatal(err)
}
c2, err := NewClientFromFile("config.json")
if err != nil {
t.Fatal(err)
}
// Verify that the two clients have the same config
b1, _ := json.Marshal(c)
b2, _ := json.Marshal(c2)
if string(b1) != string(b2) {
t.Fatalf("The two configs should be equal!")
}
}
func TestClientRetry(t *testing.T) {
c := NewClient([]string{"http://strange", "http://127.0.0.1:4001"})
// use first endpoint as the picked url
c.cluster.picked = 0
if _, err := c.Set("foo", "bar", 5); err != nil {
t.Fatal(err)
}
if _, err := c.Delete("foo", true); err != nil {
t.Fatal(err)
}
}

View File

@ -1,37 +0,0 @@
package etcd
import (
"math/rand"
"strings"
)
type Cluster struct {
Leader string `json:"leader"`
Machines []string `json:"machines"`
picked int
}
func NewCluster(machines []string) *Cluster {
// if an empty slice was sent in then just assume HTTP 4001 on localhost
if len(machines) == 0 {
machines = []string{"http://127.0.0.1:4001"}
}
// default leader and machines
return &Cluster{
Leader: "",
Machines: machines,
picked: rand.Intn(len(machines)),
}
}
func (cl *Cluster) failure() { cl.picked = rand.Intn(len(cl.Machines)) }
func (cl *Cluster) pick() string { return cl.Machines[cl.picked] }
func (cl *Cluster) updateFromStr(machines string) {
cl.Machines = strings.Split(machines, ",")
for i := range cl.Machines {
cl.Machines[i] = strings.TrimSpace(cl.Machines[i])
}
cl.picked = rand.Intn(len(cl.Machines))
}
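
The splitting step in updateFromStr above can be sketched on its own. This is an illustration under assumptions: `parseMachines` is a hypothetical helper, and it skips empty entries, which the original does not do (the original keeps whatever strings.Split returns).

```go
package main

import (
	"fmt"
	"strings"
)

// parseMachines is a hypothetical helper that turns a comma-separated
// machine list into a slice, trimming whitespace around each entry.
// Assumption: empty entries are dropped so the result never contains "".
func parseMachines(list string) []string {
	var machines []string
	for _, m := range strings.Split(list, ",") {
		if m = strings.TrimSpace(m); m != "" {
			machines = append(machines, m)
		}
	}
	return machines
}

func main() {
	fmt.Println(parseMachines("http://127.0.0.1:4001, http://127.0.0.1:4002"))
}
```

Dropping empty entries matters because picking a random index with rand.Intn panics when the slice is empty, so a caller would still need to check the length before picking.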

View File

@ -1,34 +0,0 @@
package etcd
import "fmt"
func (c *Client) CompareAndDelete(key string, prevValue string, prevIndex uint64) (*Response, error) {
raw, err := c.RawCompareAndDelete(key, prevValue, prevIndex)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
func (c *Client) RawCompareAndDelete(key string, prevValue string, prevIndex uint64) (*RawResponse, error) {
if prevValue == "" && prevIndex == 0 {
return nil, fmt.Errorf("You must give either prevValue or prevIndex.")
}
options := Options{}
if prevValue != "" {
options["prevValue"] = prevValue
}
if prevIndex != 0 {
options["prevIndex"] = prevIndex
}
raw, err := c.delete(key, options)
if err != nil {
return nil, err
}
return raw, err
}

View File

@ -1,46 +0,0 @@
package etcd
import (
"testing"
)
func TestCompareAndDelete(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
}()
c.Set("foo", "bar", 5)
// This should succeed because it gives a correct prevValue
resp, err := c.CompareAndDelete("foo", "bar", 0)
if err != nil {
t.Fatal(err)
}
if !(resp.PrevNode.Value == "bar" && resp.PrevNode.Key == "/foo" && resp.PrevNode.TTL == 5) {
t.Fatalf("CompareAndDelete 1 prevNode failed: %#v", resp)
}
resp, _ = c.Set("foo", "bar", 5)
// This should fail because it gives an incorrect prevValue
_, err = c.CompareAndDelete("foo", "xxx", 0)
if err == nil {
t.Fatalf("CompareAndDelete 2 should have failed. The response is: %#v", resp)
}
// This should succeed because it gives a correct prevIndex
resp, err = c.CompareAndDelete("foo", "", resp.Node.ModifiedIndex)
if err != nil {
t.Fatal(err)
}
if !(resp.PrevNode.Value == "bar" && resp.PrevNode.Key == "/foo" && resp.PrevNode.TTL == 5) {
t.Fatalf("CompareAndDelete 3 prevNode failed: %#v", resp)
}
c.Set("foo", "bar", 5)
// This should fail because it gives an incorrect prevIndex
resp, err = c.CompareAndDelete("foo", "", 29817514)
if err == nil {
t.Fatalf("CompareAndDelete 4 should have failed. The response is: %#v", resp)
}
}

View File

@ -1,36 +0,0 @@
package etcd
import "fmt"
func (c *Client) CompareAndSwap(key string, value string, ttl uint64,
prevValue string, prevIndex uint64) (*Response, error) {
raw, err := c.RawCompareAndSwap(key, value, ttl, prevValue, prevIndex)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
func (c *Client) RawCompareAndSwap(key string, value string, ttl uint64,
prevValue string, prevIndex uint64) (*RawResponse, error) {
if prevValue == "" && prevIndex == 0 {
return nil, fmt.Errorf("You must give either prevValue or prevIndex.")
}
options := Options{}
if prevValue != "" {
options["prevValue"] = prevValue
}
if prevIndex != 0 {
options["prevIndex"] = prevIndex
}
raw, err := c.put(key, value, ttl, options)
if err != nil {
return nil, err
}
return raw, err
}

View File

@ -1,57 +0,0 @@
package etcd
import (
"testing"
)
func TestCompareAndSwap(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
}()
c.Set("foo", "bar", 5)
// This should succeed
resp, err := c.CompareAndSwap("foo", "bar2", 5, "bar", 0)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Value == "bar2" && resp.Node.Key == "/foo" && resp.Node.TTL == 5) {
t.Fatalf("CompareAndSwap 1 failed: %#v", resp)
}
if !(resp.PrevNode.Value == "bar" && resp.PrevNode.Key == "/foo" && resp.PrevNode.TTL == 5) {
t.Fatalf("CompareAndSwap 1 prevNode failed: %#v", resp)
}
// This should fail because it gives an incorrect prevValue
resp, err = c.CompareAndSwap("foo", "bar3", 5, "xxx", 0)
if err == nil {
t.Fatalf("CompareAndSwap 2 should have failed. The response is: %#v", resp)
}
resp, err = c.Set("foo", "bar", 5)
if err != nil {
t.Fatal(err)
}
// This should succeed
resp, err = c.CompareAndSwap("foo", "bar2", 5, "", resp.Node.ModifiedIndex)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Value == "bar2" && resp.Node.Key == "/foo" && resp.Node.TTL == 5) {
t.Fatalf("CompareAndSwap 3 failed: %#v", resp)
}
if !(resp.PrevNode.Value == "bar" && resp.PrevNode.Key == "/foo" && resp.PrevNode.TTL == 5) {
t.Fatalf("CompareAndSwap 3 prevNode failed: %#v", resp)
}
// This should fail because it gives an incorrect prevIndex
resp, err = c.CompareAndSwap("foo", "bar3", 5, "", 29817514)
if err == nil {
t.Fatalf("CompareAndSwap 4 should have failed. The response is: %#v", resp)
}
}

View File

@ -1 +0,0 @@
{"config":{"certFile":"","keyFile":"","caCertFiles":null,"timeout":1000000000,"consistency":"STRONG"},"cluster":{"leader":"http://127.0.0.1:4001","machines":["http://127.0.0.1:4001"]}}

View File

@ -1,55 +0,0 @@
package etcd
import (
"fmt"
"io/ioutil"
"log"
"strings"
)
var logger *etcdLogger
func SetLogger(l *log.Logger) {
logger = &etcdLogger{l}
}
func GetLogger() *log.Logger {
return logger.log
}
type etcdLogger struct {
log *log.Logger
}
func (p *etcdLogger) Debug(args ...interface{}) {
msg := "DEBUG: " + fmt.Sprint(args...)
p.log.Println(msg)
}
func (p *etcdLogger) Debugf(f string, args ...interface{}) {
msg := "DEBUG: " + fmt.Sprintf(f, args...)
// Append newline if necessary
if !strings.HasSuffix(msg, "\n") {
msg = msg + "\n"
}
p.log.Print(msg)
}
func (p *etcdLogger) Warning(args ...interface{}) {
msg := "WARNING: " + fmt.Sprint(args...)
p.log.Println(msg)
}
func (p *etcdLogger) Warningf(f string, args ...interface{}) {
msg := "WARNING: " + fmt.Sprintf(f, args...)
// Append newline if necessary
if !strings.HasSuffix(msg, "\n") {
msg = msg + "\n"
}
p.log.Print(msg)
}
func init() {
// The default logger discards all output; call SetLogger to enable logging.
SetLogger(log.New(ioutil.Discard, "go-etcd", log.LstdFlags))
}

View File

@ -1,28 +0,0 @@
package etcd
import (
"testing"
)
type Foo struct{}
type Bar struct {
one string
two int
}
// Tests that logs don't panic with arbitrary interfaces
func TestDebug(t *testing.T) {
f := &Foo{}
b := &Bar{"asfd", 3}
for _, test := range []interface{}{
1234,
"asdf",
f,
b,
} {
logger.Debug(test)
logger.Debugf("something, %s", test)
logger.Warning(test)
logger.Warningf("something, %s", test)
}
}

View File

@ -1,40 +0,0 @@
package etcd
// Delete deletes the given key.
//
// When recursive is set to false, if the key points to a
// directory the method will fail.
//
// When recursive is set to true, if the key points to a file,
// the file will be deleted; if the key points to a directory,
// then everything under the directory (including all child directories)
// will be deleted.
func (c *Client) Delete(key string, recursive bool) (*Response, error) {
raw, err := c.RawDelete(key, recursive, false)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// DeleteDir deletes an empty directory or a key-value pair.
func (c *Client) DeleteDir(key string) (*Response, error) {
raw, err := c.RawDelete(key, false, true)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
func (c *Client) RawDelete(key string, recursive bool, dir bool) (*RawResponse, error) {
ops := Options{
"recursive": recursive,
"dir": dir,
}
return c.delete(key, ops)
}

View File

@ -1,81 +0,0 @@
package etcd
import (
"testing"
)
func TestDelete(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
}()
c.Set("foo", "bar", 5)
resp, err := c.Delete("foo", false)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Value == "") {
t.Fatalf("Delete failed with %s", resp.Node.Value)
}
if !(resp.PrevNode.Value == "bar") {
t.Fatalf("Delete PrevNode failed with %s", resp.PrevNode.Value)
}
resp, err = c.Delete("foo", false)
if err == nil {
t.Fatalf("Delete should have failed because the key foo did not exist. "+
"The response was: %v", resp)
}
}
func TestDeleteAll(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
c.Delete("fooDir", true)
}()
c.SetDir("foo", 5)
// test delete an empty dir
resp, err := c.DeleteDir("foo")
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Value == "") {
t.Fatalf("DeleteAll 1 failed: %#v", resp)
}
if !(resp.PrevNode.Dir == true && resp.PrevNode.Value == "") {
t.Fatalf("DeleteAll 1 PrevNode failed: %#v", resp)
}
c.CreateDir("fooDir", 5)
c.Set("fooDir/foo", "bar", 5)
_, err = c.DeleteDir("fooDir")
if err == nil {
t.Fatal("should not be able to delete a non-empty dir with DeleteDir")
}
resp, err = c.Delete("fooDir", true)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Value == "") {
t.Fatalf("DeleteAll 2 failed: %#v", resp)
}
if !(resp.PrevNode.Dir == true && resp.PrevNode.Value == "") {
t.Fatalf("DeleteAll 2 PrevNode failed: %#v", resp)
}
resp, err = c.Delete("foo", true)
if err == nil {
t.Fatalf("DeleteAll should have failed because the key foo did not exist. "+
"The response was: %v", resp)
}
}

View File

@ -1,49 +0,0 @@
package etcd
import (
"encoding/json"
"fmt"
)
const (
ErrCodeEtcdNotReachable = 501
ErrCodeUnhandledHTTPStatus = 502
)
var (
errorMap = map[int]string{
ErrCodeEtcdNotReachable: "All the given peers are not reachable",
}
)
type EtcdError struct {
ErrorCode int `json:"errorCode"`
Message string `json:"message"`
Cause string `json:"cause,omitempty"`
Index uint64 `json:"index"`
}
func (e EtcdError) Error() string {
return fmt.Sprintf("%v: %v (%v) [%v]", e.ErrorCode, e.Message, e.Cause, e.Index)
}
func newError(errorCode int, cause string, index uint64) *EtcdError {
return &EtcdError{
ErrorCode: errorCode,
Message: errorMap[errorCode],
Cause: cause,
Index: index,
}
}
func handleError(b []byte) error {
etcdErr := new(EtcdError)
err := json.Unmarshal(b, etcdErr)
if err != nil {
logger.Warningf("cannot unmarshal etcd error: %v", err)
return err
}
return etcdErr
}
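
The decoding done by handleError above can be exercised without a running server. The sketch below is self-contained and rests on assumptions: `apiError` and `parseError` are hypothetical names that mirror the shape of EtcdError, and the JSON body is a made-up sample rather than captured server output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError is a hypothetical type with the same JSON fields as EtcdError.
type apiError struct {
	ErrorCode int    `json:"errorCode"`
	Message   string `json:"message"`
	Cause     string `json:"cause,omitempty"`
	Index     uint64 `json:"index"`
}

// Error makes *apiError usable wherever an error is expected.
func (e *apiError) Error() string {
	return fmt.Sprintf("%d: %s (%s) [%d]", e.ErrorCode, e.Message, e.Cause, e.Index)
}

// parseError decodes an error body. It returns the decoded *apiError on
// success, or the JSON decoding error if the body is not valid JSON.
func parseError(body []byte) error {
	e := new(apiError)
	if err := json.Unmarshal(body, e); err != nil {
		return err
	}
	return e
}

func main() {
	// Illustrative body only; the field values are assumptions.
	body := []byte(`{"errorCode":100,"message":"Key not found","cause":"/foo","index":7}`)
	err := parseError(body)
	if e, ok := err.(*apiError); ok {
		fmt.Println("server-side error code:", e.ErrorCode, "cause:", e.Cause)
	} else {
		fmt.Println("could not decode body:", err)
	}
}
```

A caller can use the type assertion shown in main to tell an error reported by the server apart from a transport or decoding failure.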

View File

@ -1,32 +0,0 @@
package etcd
// Get gets the file or directory associated with the given key.
// If the key points to a directory, files and directories under
// it will be returned in sorted or unsorted order, depending on
// the sort flag.
// If recursive is set to false, contents under child directories
// will not be returned.
// If recursive is set to true, all the contents will be returned.
func (c *Client) Get(key string, sort, recursive bool) (*Response, error) {
raw, err := c.RawGet(key, sort, recursive)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
func (c *Client) RawGet(key string, sort, recursive bool) (*RawResponse, error) {
var q bool
if c.config.Consistency == STRONG_CONSISTENCY {
q = true
}
ops := Options{
"recursive": recursive,
"sorted": sort,
"quorum": q,
}
return c.get(key, ops)
}

View File

@ -1,131 +0,0 @@
package etcd
import (
"reflect"
"testing"
)
// cleanNode scrubs Expiration, ModifiedIndex and CreatedIndex of a node.
func cleanNode(n *Node) {
n.Expiration = nil
n.ModifiedIndex = 0
n.CreatedIndex = 0
}
// cleanResult scrubs a result object two levels deep of Expiration,
// ModifiedIndex and CreatedIndex.
func cleanResult(result *Response) {
// TODO(philips): make this recursive.
cleanNode(result.Node)
for i := range result.Node.Nodes {
cleanNode(result.Node.Nodes[i])
for j := range result.Node.Nodes[i].Nodes {
cleanNode(result.Node.Nodes[i].Nodes[j])
}
}
}
func TestGet(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
}()
c.Set("foo", "bar", 5)
result, err := c.Get("foo", false, false)
if err != nil {
t.Fatal(err)
}
if result.Node.Key != "/foo" || result.Node.Value != "bar" {
t.Fatalf("Get failed with %s %s %v", result.Node.Key, result.Node.Value, result.Node.TTL)
}
result, err = c.Get("goo", false, false)
if err == nil {
t.Fatalf("should not be able to get a nonexistent key")
}
}
func TestGetAll(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("fooDir", true)
}()
c.CreateDir("fooDir", 5)
c.Set("fooDir/k0", "v0", 5)
c.Set("fooDir/k1", "v1", 5)
// Return kv-pairs in sorted order
result, err := c.Get("fooDir", true, false)
if err != nil {
t.Fatal(err)
}
expected := Nodes{
&Node{
Key: "/fooDir/k0",
Value: "v0",
TTL: 5,
},
&Node{
Key: "/fooDir/k1",
Value: "v1",
TTL: 5,
},
}
cleanResult(result)
if !reflect.DeepEqual(result.Node.Nodes, expected) {
t.Fatalf("(actual) %v != (expected) %v", result.Node.Nodes, expected)
}
// Test the `recursive` option
c.CreateDir("fooDir/childDir", 5)
c.Set("fooDir/childDir/k2", "v2", 5)
// Return kv-pairs in sorted order
result, err = c.Get("fooDir", true, true)
// Check the error before touching result; it is cleaned once below.
if err != nil {
t.Fatal(err)
}
expected = Nodes{
&Node{
Key: "/fooDir/childDir",
Dir: true,
Nodes: Nodes{
&Node{
Key: "/fooDir/childDir/k2",
Value: "v2",
TTL: 5,
},
},
TTL: 5,
},
&Node{
Key: "/fooDir/k0",
Value: "v0",
TTL: 5,
},
&Node{
Key: "/fooDir/k1",
Value: "v1",
TTL: 5,
},
}
cleanResult(result)
if !reflect.DeepEqual(result.Node.Nodes, expected) {
t.Fatalf("(actual) %v != (expected) %v", result.Node.Nodes, expected)
}
}

View File

@ -1,30 +0,0 @@
package etcd
import "encoding/json"
type Member struct {
ID string `json:"id"`
Name string `json:"name"`
PeerURLs []string `json:"peerURLs"`
ClientURLs []string `json:"clientURLs"`
}
type memberCollection []Member
func (c *memberCollection) UnmarshalJSON(data []byte) error {
d := struct {
Members []Member
}{}
if err := json.Unmarshal(data, &d); err != nil {
return err
}
if d.Members == nil {
*c = make([]Member, 0)
return nil
}
*c = d.Members
return nil
}

View File

@ -1,71 +0,0 @@
package etcd
import (
"encoding/json"
"reflect"
"testing"
)
func TestMemberCollectionUnmarshal(t *testing.T) {
tests := []struct {
body []byte
want memberCollection
}{
{
body: []byte(`{"members":[]}`),
want: memberCollection([]Member{}),
},
{
body: []byte(`{"members":[{"id":"2745e2525fce8fe","peerURLs":["http://127.0.0.1:7003"],"name":"node3","clientURLs":["http://127.0.0.1:4003"]},{"id":"42134f434382925","peerURLs":["http://127.0.0.1:2380","http://127.0.0.1:7001"],"name":"node1","clientURLs":["http://127.0.0.1:2379","http://127.0.0.1:4001"]},{"id":"94088180e21eb87b","peerURLs":["http://127.0.0.1:7002"],"name":"node2","clientURLs":["http://127.0.0.1:4002"]}]}`),
want: memberCollection(
[]Member{
{
ID: "2745e2525fce8fe",
Name: "node3",
PeerURLs: []string{
"http://127.0.0.1:7003",
},
ClientURLs: []string{
"http://127.0.0.1:4003",
},
},
{
ID: "42134f434382925",
Name: "node1",
PeerURLs: []string{
"http://127.0.0.1:2380",
"http://127.0.0.1:7001",
},
ClientURLs: []string{
"http://127.0.0.1:2379",
"http://127.0.0.1:4001",
},
},
{
ID: "94088180e21eb87b",
Name: "node2",
PeerURLs: []string{
"http://127.0.0.1:7002",
},
ClientURLs: []string{
"http://127.0.0.1:4002",
},
},
},
),
},
}
for i, tt := range tests {
var got memberCollection
err := json.Unmarshal(tt.body, &got)
if err != nil {
t.Errorf("#%d: unexpected error: %v", i, err)
continue
}
if !reflect.DeepEqual(tt.want, got) {
t.Errorf("#%d: incorrect output: want=%#v, got=%#v", i, tt.want, got)
}
}
}

View File

@ -1,72 +0,0 @@
package etcd
import (
"fmt"
"net/url"
"reflect"
)
type Options map[string]interface{}
// An internally-used data structure that represents a mapping
// between valid options and their kinds
type validOptions map[string]reflect.Kind
// Valid options for GET, PUT, POST, DELETE
// Using CAPITALIZED_UNDERSCORE to emphasize that these
// values are meant to be used as constants.
var (
VALID_GET_OPTIONS = validOptions{
"recursive": reflect.Bool,
"quorum": reflect.Bool,
"sorted": reflect.Bool,
"wait": reflect.Bool,
"waitIndex": reflect.Uint64,
}
VALID_PUT_OPTIONS = validOptions{
"prevValue": reflect.String,
"prevIndex": reflect.Uint64,
"prevExist": reflect.Bool,
"dir": reflect.Bool,
}
VALID_POST_OPTIONS = validOptions{}
VALID_DELETE_OPTIONS = validOptions{
"recursive": reflect.Bool,
"dir": reflect.Bool,
"prevValue": reflect.String,
"prevIndex": reflect.Uint64,
}
)
// Convert options to a string of URL query parameters
func (ops Options) toParameters(validOps validOptions) (string, error) {
p := "?"
values := url.Values{}
if ops == nil {
return "", nil
}
for k, v := range ops {
// Check if the given option is valid (that it exists)
kind := validOps[k]
if kind == reflect.Invalid {
return "", fmt.Errorf("Invalid option: %v", k)
}
// Check if the given option is of the valid type
t := reflect.TypeOf(v)
if kind != t.Kind() {
return "", fmt.Errorf("Option %s should be of %v kind, not of %v kind.",
k, kind, t.Kind())
}
values.Set(k, fmt.Sprintf("%v", v))
}
p += values.Encode()
return p, nil
}
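
The kind check in toParameters above can be sketched as a standalone program. This is a minimal illustration under assumptions: `options`, `validGet`, and `encode` are hypothetical names, the option table is a subset of the original, and unlike the original the sketch guards against nil values, for which reflect.TypeOf returns nil.

```go
package main

import (
	"fmt"
	"net/url"
	"reflect"
)

// options is a hypothetical stand-in for the Options map.
type options map[string]interface{}

// validGet lists the accepted option names and the kind each must have.
var validGet = map[string]reflect.Kind{
	"recursive": reflect.Bool,
	"sorted":    reflect.Bool,
	"waitIndex": reflect.Uint64,
}

// encode validates each option against the table and returns the encoded
// query string (without the leading "?").
func encode(ops options, valid map[string]reflect.Kind) (string, error) {
	values := url.Values{}
	for k, v := range ops {
		kind, ok := valid[k]
		if !ok {
			return "", fmt.Errorf("invalid option: %v", k)
		}
		// reflect.TypeOf(nil) is nil, so reject nil before calling Kind.
		if v == nil || reflect.TypeOf(v).Kind() != kind {
			return "", fmt.Errorf("option %s should be of %v kind", k, kind)
		}
		values.Set(k, fmt.Sprintf("%v", v))
	}
	// Encode sorts by key, so the output is deterministic.
	return values.Encode(), nil
}

func main() {
	q, err := encode(options{"recursive": true, "sorted": false}, validGet)
	fmt.Println(q, err)

	// An untyped constant 5 is an int, not a uint64, so this is rejected.
	_, err = encode(options{"waitIndex": 5}, validGet)
	fmt.Println(err)
}
```

The second call shows why callers have to pass `uint64(5)` rather than a bare literal: the check compares kinds exactly and performs no numeric conversion.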

View File

@ -1,403 +0,0 @@
package etcd
import (
"errors"
"fmt"
"io"
"io/ioutil"
"net/http"
"net/url"
"path"
"strings"
"sync"
"time"
)
// Errors introduced by handling requests
var (
ErrRequestCancelled = errors.New("sending request is cancelled")
)
type RawRequest struct {
Method string
RelativePath string
Values url.Values
Cancel <-chan bool
}
// NewRawRequest returns a new RawRequest
func NewRawRequest(method, relativePath string, values url.Values, cancel <-chan bool) *RawRequest {
return &RawRequest{
Method: method,
RelativePath: relativePath,
Values: values,
Cancel: cancel,
}
}
// getCancelable issues a cancelable GET request
func (c *Client) getCancelable(key string, options Options,
cancel <-chan bool) (*RawResponse, error) {
logger.Debugf("get %s [%s]", key, c.cluster.pick())
p := keyToPath(key)
str, err := options.toParameters(VALID_GET_OPTIONS)
if err != nil {
return nil, err
}
p += str
req := NewRawRequest("GET", p, nil, cancel)
resp, err := c.SendRequest(req)
if err != nil {
return nil, err
}
return resp, nil
}
// get issues a GET request
func (c *Client) get(key string, options Options) (*RawResponse, error) {
return c.getCancelable(key, options, nil)
}
// put issues a PUT request
func (c *Client) put(key string, value string, ttl uint64,
options Options) (*RawResponse, error) {
logger.Debugf("put %s, %s, ttl: %d, [%s]", key, value, ttl, c.cluster.pick())
p := keyToPath(key)
str, err := options.toParameters(VALID_PUT_OPTIONS)
if err != nil {
return nil, err
}
p += str
req := NewRawRequest("PUT", p, buildValues(value, ttl), nil)
resp, err := c.SendRequest(req)
if err != nil {
return nil, err
}
return resp, nil
}
// post issues a POST request
func (c *Client) post(key string, value string, ttl uint64) (*RawResponse, error) {
logger.Debugf("post %s, %s, ttl: %d, [%s]", key, value, ttl, c.cluster.pick())
p := keyToPath(key)
req := NewRawRequest("POST", p, buildValues(value, ttl), nil)
resp, err := c.SendRequest(req)
if err != nil {
return nil, err
}
return resp, nil
}
// delete issues a DELETE request
func (c *Client) delete(key string, options Options) (*RawResponse, error) {
logger.Debugf("delete %s [%s]", key, c.cluster.pick())
p := keyToPath(key)
str, err := options.toParameters(VALID_DELETE_OPTIONS)
if err != nil {
return nil, err
}
p += str
req := NewRawRequest("DELETE", p, nil, nil)
resp, err := c.SendRequest(req)
if err != nil {
return nil, err
}
return resp, nil
}
// SendRequest sends a HTTP request and returns a Response as defined by etcd
func (c *Client) SendRequest(rr *RawRequest) (*RawResponse, error) {
var req *http.Request
var resp *http.Response
var httpPath string
var err error
var respBody []byte
var numReqs = 1
checkRetry := c.CheckRetry
if checkRetry == nil {
checkRetry = DefaultCheckRetry
}
cancelled := make(chan bool, 1)
reqLock := new(sync.Mutex)
if rr.Cancel != nil {
cancelRoutine := make(chan bool)
defer close(cancelRoutine)
go func() {
select {
case <-rr.Cancel:
cancelled <- true
logger.Debug("send.request is cancelled")
case <-cancelRoutine:
return
}
// Keep canceling the request until this goroutine is stopped,
// because we cannot tell whether a cancellation succeeded.
for {
reqLock.Lock()
c.httpClient.Transport.(*http.Transport).CancelRequest(req)
reqLock.Unlock()
select {
case <-time.After(100 * time.Millisecond):
case <-cancelRoutine:
return
}
}
}()
}
// If we connect to a follower and consistency is required, retry until
// we connect to a leader
sleep := 25 * time.Millisecond
maxSleep := time.Second
for attempt := 0; ; attempt++ {
if attempt > 0 {
select {
case <-cancelled:
return nil, ErrRequestCancelled
case <-time.After(sleep):
sleep = sleep * 2
if sleep > maxSleep {
sleep = maxSleep
}
}
}
logger.Debug("Connecting to etcd: attempt ", attempt+1, " for ", rr.RelativePath)
// get httpPath if not set
if httpPath == "" {
httpPath = c.getHttpPath(rr.RelativePath)
}
// Return a cURL command if curlChan is set
if c.cURLch != nil {
command := fmt.Sprintf("curl -X %s %s", rr.Method, httpPath)
for key, value := range rr.Values {
command += fmt.Sprintf(" -d %s=%s", key, value[0])
}
if c.credentials != nil {
command += fmt.Sprintf(" -u %s", c.credentials.username)
}
c.sendCURL(command)
}
logger.Debug("send.request.to ", httpPath, " | method ", rr.Method)
req, err := func() (*http.Request, error) {
reqLock.Lock()
defer reqLock.Unlock()
if rr.Values == nil {
if req, err = http.NewRequest(rr.Method, httpPath, nil); err != nil {
return nil, err
}
} else {
body := strings.NewReader(rr.Values.Encode())
if req, err = http.NewRequest(rr.Method, httpPath, body); err != nil {
return nil, err
}
req.Header.Set("Content-Type",
"application/x-www-form-urlencoded; param=value")
}
return req, nil
}()
if err != nil {
return nil, err
}
if c.credentials != nil {
req.SetBasicAuth(c.credentials.username, c.credentials.password)
}
resp, err = c.httpClient.Do(req)
// clear previous httpPath
httpPath = ""
defer func() {
if resp != nil {
resp.Body.Close()
}
}()
// If the request was cancelled, return ErrRequestCancelled directly
select {
case <-cancelled:
return nil, ErrRequestCancelled
default:
}
numReqs++
// network error, change a machine!
if err != nil {
logger.Debug("network error: ", err.Error())
lastResp := http.Response{}
if checkErr := checkRetry(c.cluster, numReqs, lastResp, err); checkErr != nil {
return nil, checkErr
}
c.cluster.failure()
continue
}
// if there is no error, it should receive response
logger.Debug("recv.response.from ", httpPath)
if validHttpStatusCode[resp.StatusCode] {
// try to read the response body and break the loop
respBody, err = ioutil.ReadAll(resp.Body)
if err == nil {
logger.Debug("recv.success ", httpPath)
break
}
// ReadAll error may be caused due to cancel request
select {
case <-cancelled:
return nil, ErrRequestCancelled
default:
}
if err == io.ErrUnexpectedEOF {
// underlying connection was closed prematurely, probably by timeout
// TODO: empty body or unexpectedEOF can cause http.Transport to get hosed;
// this allows the client to detect that and take evasive action. Need
// to revisit once code.google.com/p/go/issues/detail?id=8648 gets fixed.
respBody = []byte{}
break
}
}
if resp.StatusCode == http.StatusTemporaryRedirect {
u, err := resp.Location()
if err != nil {
logger.Warning(err)
} else {
// set httpPath for following redirection
httpPath = u.String()
}
resp.Body.Close()
continue
}
if checkErr := checkRetry(c.cluster, numReqs, *resp,
errors.New("Unexpected HTTP status code")); checkErr != nil {
return nil, checkErr
}
resp.Body.Close()
}
r := &RawResponse{
StatusCode: resp.StatusCode,
Body: respBody,
Header: resp.Header,
}
return r, nil
}
// DefaultCheckRetry defines the retrying behaviour for bad HTTP requests
// If we have retried 2 * machine number, stop retrying.
// If status code is InternalServerError, sleep for 200ms.
func DefaultCheckRetry(cluster *Cluster, numReqs int, lastResp http.Response,
err error) error {
if numReqs > 2*len(cluster.Machines) {
errStr := fmt.Sprintf("failed to propose on members %v twice [last error: %v]", cluster.Machines, err)
return newError(ErrCodeEtcdNotReachable, errStr, 0)
}
if isEmptyResponse(lastResp) {
// always retry if it failed to get response from one machine
return nil
}
if !shouldRetry(lastResp) {
body := []byte("nil")
if lastResp.Body != nil {
if b, err := ioutil.ReadAll(lastResp.Body); err == nil {
body = b
}
}
errStr := fmt.Sprintf("unhandled http status [%s] with body [%s]", http.StatusText(lastResp.StatusCode), body)
return newError(ErrCodeUnhandledHTTPStatus, errStr, 0)
}
// sleep some time and expect leader election finish
time.Sleep(time.Millisecond * 200)
logger.Warning("bad response status code", lastResp.StatusCode)
return nil
}
func isEmptyResponse(r http.Response) bool { return r.StatusCode == 0 }
// shouldRetry returns whether the response deserves a retry.
func shouldRetry(r http.Response) bool {
// TODO: only retry when the cluster is in leader election
// We cannot do it exactly because etcd doesn't support it well.
return r.StatusCode == http.StatusInternalServerError
}
func (c *Client) getHttpPath(s ...string) string {
fullPath := c.cluster.pick() + "/" + version
for _, seg := range s {
fullPath = fullPath + "/" + seg
}
return fullPath
}
// buildValues builds a url.Values map according to the given value and ttl
func buildValues(value string, ttl uint64) url.Values {
v := url.Values{}
if value != "" {
v.Set("value", value)
}
if ttl > 0 {
v.Set("ttl", fmt.Sprintf("%v", ttl))
}
return v
}
// convert key string to http path exclude version, including URL escaping
// for example: key[foo] -> path[keys/foo]
// key[/%z] -> path[keys/%25z]
// key[/] -> path[keys/]
func keyToPath(key string) string {
// URL-escape our key, except for slashes
p := strings.Replace(url.QueryEscape(path.Join("keys", key)), "%2F", "/", -1)
// corner case: if key is "/" or "//" etc.,
// path.Join will remove the trailing "/",
// so we need to add it back
if p == "keys" {
p = "keys/"
}
return p
}
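The retry loop in `SendRequest` above sleeps before every attempt after the first, doubling the delay from 25ms up to a one-second cap. As a rough sketch of that schedule only (the helper name `backoffSchedule` and the `defaultDelays` variable are assumptions made for illustration, and the extra 200ms sleep inside `DefaultCheckRetry` is not modelled):

```go
package main

import "time"

// backoffSchedule returns the delays slept before each of the given number
// of retries: start, 2*start, 4*start, ... with every value capped at max.
func backoffSchedule(start, max time.Duration, retries int) []time.Duration {
	delays := make([]time.Duration, 0, retries)
	sleep := start
	for i := 0; i < retries; i++ {
		delays = append(delays, sleep)
		sleep *= 2
		if sleep > max {
			sleep = max
		}
	}
	return delays
}

// defaultDelays uses the same constants as the loop above (25ms, 1s).
var defaultDelays = backoffSchedule(25*time.Millisecond, time.Second, 8)
```

With these constants the delay reaches the cap on the seventh retry and stays there.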


@ -1,22 +0,0 @@
package etcd
import "testing"
func TestKeyToPath(t *testing.T) {
tests := []struct {
key string
wpath string
}{
{"", "keys/"},
{"foo", "keys/foo"},
{"foo/bar", "keys/foo/bar"},
{"%z", "keys/%25z"},
{"/", "keys/"},
}
for i, tt := range tests {
path := keyToPath(tt.key)
if path != tt.wpath {
t.Errorf("#%d: path = %s, want %s", i, path, tt.wpath)
}
}
}

File diff suppressed because it is too large


@ -1,93 +0,0 @@
package etcd
//go:generate codecgen -o response.generated.go response.go
import (
"net/http"
"strconv"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/ugorji/go/codec"
)
const (
rawResponse = iota
normalResponse
)
type responseType int
type RawResponse struct {
StatusCode int
Body []byte
Header http.Header
}
var (
validHttpStatusCode = map[int]bool{
http.StatusCreated: true,
http.StatusOK: true,
http.StatusBadRequest: true,
http.StatusNotFound: true,
http.StatusPreconditionFailed: true,
http.StatusForbidden: true,
http.StatusUnauthorized: true,
}
)
// Unmarshal parses RawResponse and stores the result in Response
func (rr *RawResponse) Unmarshal() (*Response, error) {
if rr.StatusCode != http.StatusOK && rr.StatusCode != http.StatusCreated {
return nil, handleError(rr.Body)
}
resp := new(Response)
err := codec.NewDecoderBytes(rr.Body, new(codec.JsonHandle)).Decode(resp)
if err != nil {
return nil, err
}
// attach index and term to response
resp.EtcdIndex, _ = strconv.ParseUint(rr.Header.Get("X-Etcd-Index"), 10, 64)
resp.RaftIndex, _ = strconv.ParseUint(rr.Header.Get("X-Raft-Index"), 10, 64)
resp.RaftTerm, _ = strconv.ParseUint(rr.Header.Get("X-Raft-Term"), 10, 64)
return resp, nil
}
type Response struct {
Action string `json:"action"`
Node *Node `json:"node"`
PrevNode *Node `json:"prevNode,omitempty"`
EtcdIndex uint64 `json:"etcdIndex"`
RaftIndex uint64 `json:"raftIndex"`
RaftTerm uint64 `json:"raftTerm"`
}
type Node struct {
Key string `json:"key,omitempty"`
Value string `json:"value,omitempty"`
Dir bool `json:"dir,omitempty"`
Expiration *time.Time `json:"expiration,omitempty"`
TTL int64 `json:"ttl,omitempty"`
Nodes Nodes `json:"nodes,omitempty"`
ModifiedIndex uint64 `json:"modifiedIndex,omitempty"`
CreatedIndex uint64 `json:"createdIndex,omitempty"`
}
type Nodes []*Node
// interfaces for sorting
func (ns Nodes) Len() int {
return len(ns)
}
func (ns Nodes) Less(i, j int) bool {
return ns[i].Key < ns[j].Key
}
func (ns Nodes) Swap(i, j int) {
ns[i], ns[j] = ns[j], ns[i]
}
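The three methods above are what `sort.Interface` requires, so a `Nodes` value can be handed straight to `sort.Sort`. A minimal sketch follows, using a trimmed, hypothetical `node` type in place of the real `Node` so that it compiles on its own:

```go
package main

import "sort"

// node is a hypothetical stand-in that keeps only the field used for ordering.
type node struct {
	Key string
}

type nodes []*node

// Len, Less, and Swap together satisfy sort.Interface.
func (ns nodes) Len() int           { return len(ns) }
func (ns nodes) Less(i, j int) bool { return ns[i].Key < ns[j].Key }
func (ns nodes) Swap(i, j int)      { ns[i], ns[j] = ns[j], ns[i] }

// sortedKeys sorts ns in place by key and returns the keys in that order.
func sortedKeys(ns nodes) []string {
	sort.Sort(ns)
	keys := make([]string, len(ns))
	for i, n := range ns {
		keys[i] = n.Key
	}
	return keys
}
```

The comparison is a plain byte-wise string comparison, so keys such as `/queue/10` sort before `/queue/2` unless they are zero-padded.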


@ -1,42 +0,0 @@
package etcd
import (
"fmt"
"testing"
)
func TestSetCurlChan(t *testing.T) {
c := NewClient(nil)
c.OpenCURL()
defer func() {
c.Delete("foo", true)
}()
_, err := c.Set("foo", "bar", 5)
if err != nil {
t.Fatal(err)
}
expected := fmt.Sprintf("curl -X PUT %s/v2/keys/foo -d value=bar -d ttl=5",
c.cluster.pick())
actual := c.RecvCURL()
if expected != actual {
t.Fatalf(`Command "%s" is not equal to expected value "%s"`,
actual, expected)
}
c.SetConsistency(STRONG_CONSISTENCY)
_, err = c.Get("foo", false, false)
if err != nil {
t.Fatal(err)
}
expected = fmt.Sprintf("curl -X GET %s/v2/keys/foo?quorum=true&recursive=false&sorted=false",
c.cluster.pick())
actual = c.RecvCURL()
if expected != actual {
t.Fatalf(`Command "%s" is not equal to expected value "%s"`,
actual, expected)
}
}


@ -1,137 +0,0 @@
package etcd
// Set sets the given key to the given value.
// It will create a new key value pair or replace the old one.
// It will not replace an existing directory.
func (c *Client) Set(key string, value string, ttl uint64) (*Response, error) {
raw, err := c.RawSet(key, value, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// SetDir sets the given key to a directory.
// It will create a new directory or replace the old key value pair by a directory.
// It will not replace an existing directory.
func (c *Client) SetDir(key string, ttl uint64) (*Response, error) {
raw, err := c.RawSetDir(key, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// CreateDir creates a directory. It succeeds only if
// the given key does not yet exist.
func (c *Client) CreateDir(key string, ttl uint64) (*Response, error) {
raw, err := c.RawCreateDir(key, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// UpdateDir updates the given directory. It succeeds only if the
// given key already exists.
func (c *Client) UpdateDir(key string, ttl uint64) (*Response, error) {
raw, err := c.RawUpdateDir(key, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// Create creates a file with the given value under the given key. It succeeds
// only if the given key does not yet exist.
func (c *Client) Create(key string, value string, ttl uint64) (*Response, error) {
raw, err := c.RawCreate(key, value, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// CreateInOrder creates a file with a key that's guaranteed to be higher than other
// keys in the given directory. It is useful for creating queues.
func (c *Client) CreateInOrder(dir string, value string, ttl uint64) (*Response, error) {
raw, err := c.RawCreateInOrder(dir, value, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
// Update updates the given key to the given value. It succeeds only if the
// given key already exists.
func (c *Client) Update(key string, value string, ttl uint64) (*Response, error) {
raw, err := c.RawUpdate(key, value, ttl)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
func (c *Client) RawUpdateDir(key string, ttl uint64) (*RawResponse, error) {
ops := Options{
"prevExist": true,
"dir": true,
}
return c.put(key, "", ttl, ops)
}
func (c *Client) RawCreateDir(key string, ttl uint64) (*RawResponse, error) {
ops := Options{
"prevExist": false,
"dir": true,
}
return c.put(key, "", ttl, ops)
}
func (c *Client) RawSet(key string, value string, ttl uint64) (*RawResponse, error) {
return c.put(key, value, ttl, nil)
}
func (c *Client) RawSetDir(key string, ttl uint64) (*RawResponse, error) {
ops := Options{
"dir": true,
}
return c.put(key, "", ttl, ops)
}
func (c *Client) RawUpdate(key string, value string, ttl uint64) (*RawResponse, error) {
ops := Options{
"prevExist": true,
}
return c.put(key, value, ttl, ops)
}
func (c *Client) RawCreate(key string, value string, ttl uint64) (*RawResponse, error) {
ops := Options{
"prevExist": false,
}
return c.put(key, value, ttl, ops)
}
func (c *Client) RawCreateInOrder(dir string, value string, ttl uint64) (*RawResponse, error) {
return c.post(dir, value, ttl)
}


@ -1,241 +0,0 @@
package etcd
import (
"testing"
)
func TestSet(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
}()
resp, err := c.Set("foo", "bar", 5)
if err != nil {
t.Fatal(err)
}
if resp.Node.Key != "/foo" || resp.Node.Value != "bar" || resp.Node.TTL != 5 {
t.Fatalf("Set 1 failed: %#v", resp)
}
if resp.PrevNode != nil {
t.Fatalf("Set 1 PrevNode failed: %#v", resp)
}
resp, err = c.Set("foo", "bar2", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/foo" && resp.Node.Value == "bar2" && resp.Node.TTL == 5) {
t.Fatalf("Set 2 failed: %#v", resp)
}
if resp.PrevNode.Key != "/foo" || resp.PrevNode.Value != "bar" || resp.Node.TTL != 5 {
t.Fatalf("Set 2 PrevNode failed: %#v", resp)
}
}
func TestUpdate(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
c.Delete("nonexistent", true)
}()
resp, err := c.Set("foo", "bar", 5)
if err != nil {
t.Fatal(err)
}
// This should succeed.
resp, err = c.Update("foo", "wakawaka", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Action == "update" && resp.Node.Key == "/foo" && resp.Node.TTL == 5) {
t.Fatalf("Update 1 failed: %#v", resp)
}
if !(resp.PrevNode.Key == "/foo" && resp.PrevNode.Value == "bar" && resp.Node.TTL == 5) {
t.Fatalf("Update 1 prevValue failed: %#v", resp)
}
// This should fail because the key does not exist.
resp, err = c.Update("nonexistent", "whatever", 5)
if err == nil {
t.Fatalf("The key %v did not exist, so the update should have failed. "+
"The response was: %#v", resp.Node.Key, resp)
}
}
func TestCreate(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("newKey", true)
}()
newKey := "/newKey"
newValue := "/newValue"
// This should succeed
resp, err := c.Create(newKey, newValue, 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Action == "create" && resp.Node.Key == newKey &&
resp.Node.Value == newValue && resp.Node.TTL == 5) {
t.Fatalf("Create 1 failed: %#v", resp)
}
if resp.PrevNode != nil {
t.Fatalf("Create 1 PrevNode failed: %#v", resp)
}
// This should fail, because the key is already there
resp, err = c.Create(newKey, newValue, 5)
if err == nil {
t.Fatalf("The key %v did exist, so the creation should have failed. "+
"The response was: %#v", resp.Node.Key, resp)
}
}
func TestCreateInOrder(t *testing.T) {
c := NewClient(nil)
dir := "/queue"
defer func() {
c.DeleteDir(dir)
}()
var firstKey, secondKey string
resp, err := c.CreateInOrder(dir, "1", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Action == "create" && resp.Node.Value == "1" && resp.Node.TTL == 5) {
t.Fatalf("Create 1 failed: %#v", resp)
}
firstKey = resp.Node.Key
resp, err = c.CreateInOrder(dir, "2", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Action == "create" && resp.Node.Value == "2" && resp.Node.TTL == 5) {
t.Fatalf("Create 2 failed: %#v", resp)
}
secondKey = resp.Node.Key
if firstKey >= secondKey {
t.Fatalf("Expected first key to be less than second key, but %s is not less than %s",
firstKey, secondKey)
}
}
func TestSetDir(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("foo", true)
c.Delete("fooDir", true)
}()
resp, err := c.CreateDir("fooDir", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/fooDir" && resp.Node.Value == "" && resp.Node.TTL == 5) {
t.Fatalf("SetDir 1 failed: %#v", resp)
}
if resp.PrevNode != nil {
t.Fatalf("SetDir 1 PrevNode failed: %#v", resp)
}
// This should fail because /fooDir already points to a directory
resp, err = c.CreateDir("/fooDir", 5)
if err == nil {
t.Fatalf("fooDir already points to a directory, so CreateDir should have failed. "+
"The response was: %#v", resp)
}
_, err = c.Set("foo", "bar", 5)
if err != nil {
t.Fatal(err)
}
// This should succeed
// It should replace the key
resp, err = c.SetDir("foo", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/foo" && resp.Node.Value == "" && resp.Node.TTL == 5) {
t.Fatalf("SetDir 2 failed: %#v", resp)
}
if !(resp.PrevNode.Key == "/foo" && resp.PrevNode.Value == "bar" && resp.PrevNode.TTL == 5) {
t.Fatalf("SetDir 2 PrevNode failed: %#v", resp)
}
}
func TestUpdateDir(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("fooDir", true)
}()
resp, err := c.CreateDir("fooDir", 5)
if err != nil {
t.Fatal(err)
}
// This should succeed.
resp, err = c.UpdateDir("fooDir", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Action == "update" && resp.Node.Key == "/fooDir" &&
resp.Node.Value == "" && resp.Node.TTL == 5) {
t.Fatalf("UpdateDir 1 failed: %#v", resp)
}
if !(resp.PrevNode.Key == "/fooDir" && resp.PrevNode.Dir == true && resp.PrevNode.TTL == 5) {
t.Fatalf("UpdateDir 1 PrevNode failed: %#v", resp)
}
// This should fail because the key does not exist.
resp, err = c.UpdateDir("nonexistentDir", 5)
if err == nil {
t.Fatalf("The key %v did not exist, so the update should have failed. "+
"The response was: %#v", resp.Node.Key, resp)
}
}
func TestCreateDir(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("fooDir", true)
}()
// This should succeed
resp, err := c.CreateDir("fooDir", 5)
if err != nil {
t.Fatal(err)
}
if !(resp.Action == "create" && resp.Node.Key == "/fooDir" &&
resp.Node.Value == "" && resp.Node.TTL == 5) {
t.Fatalf("CreateDir 1 failed: %#v", resp)
}
if resp.PrevNode != nil {
t.Fatalf("CreateDir 1 PrevNode failed: %#v", resp)
}
// This should fail, because the key is already there
resp, err = c.CreateDir("fooDir", 5)
if err == nil {
t.Fatalf("The key %v did exist, so the creation should have failed. "+
"The response was: %#v", resp.Node.Key, resp)
}
}


@ -1,6 +0,0 @@
package etcd
const (
version = "v2"
packageVersion = "v2.0.0+git"
)


@ -1,103 +0,0 @@
package etcd
import (
"errors"
)
// Errors introduced by the Watch command.
var (
ErrWatchStoppedByUser = errors.New("Watch stopped by the user via stop channel")
)
// If recursive is set to true the watch returns the first change under the given
// prefix since the given index.
//
// If recursive is set to false the watch returns the first change to the given key
// since the given index.
//
// To watch for the latest change, set waitIndex = 0.
//
// If a receiver channel is given, it will be a long-term watch. Watch blocks while
// sending each response to the channel. Once the response has been received, it goes
// on to watch that prefix again. If a stop channel is given, the client can end the
// long-term watch using the stop channel.
func (c *Client) Watch(prefix string, waitIndex uint64, recursive bool,
receiver chan *Response, stop chan bool) (*Response, error) {
logger.Debugf("watch %s [%s]", prefix, c.cluster.Leader)
if receiver == nil {
raw, err := c.watchOnce(prefix, waitIndex, recursive, stop)
if err != nil {
return nil, err
}
return raw.Unmarshal()
}
defer close(receiver)
for {
raw, err := c.watchOnce(prefix, waitIndex, recursive, stop)
if err != nil {
return nil, err
}
resp, err := raw.Unmarshal()
if err != nil {
return nil, err
}
waitIndex = resp.Node.ModifiedIndex + 1
receiver <- resp
}
}
func (c *Client) RawWatch(prefix string, waitIndex uint64, recursive bool,
receiver chan *RawResponse, stop chan bool) (*RawResponse, error) {
logger.Debugf("rawWatch %s [%s]", prefix, c.cluster.Leader)
if receiver == nil {
return c.watchOnce(prefix, waitIndex, recursive, stop)
}
for {
raw, err := c.watchOnce(prefix, waitIndex, recursive, stop)
if err != nil {
return nil, err
}
resp, err := raw.Unmarshal()
if err != nil {
return nil, err
}
waitIndex = resp.Node.ModifiedIndex + 1
receiver <- raw
}
}
// helper func
// return when there is change under the given prefix
func (c *Client) watchOnce(key string, waitIndex uint64, recursive bool, stop chan bool) (*RawResponse, error) {
options := Options{
"wait": true,
}
if waitIndex > 0 {
options["waitIndex"] = waitIndex
}
if recursive {
options["recursive"] = true
}
resp, err := c.getCancelable(key, options, stop)
if err == ErrRequestCancelled {
return nil, ErrWatchStoppedByUser
}
return resp, err
}
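The long-term watch above resumes each poll from `ModifiedIndex + 1` so that no change is delivered twice. The sketch below shows only that index bookkeeping. It is built on assumptions: `event`, `watchFunc`, `watchAll`, `replay`, and `errStopped` are hypothetical names, and `replay` stands in for the HTTP long poll by serving events from a fixed in-memory log.

```go
package main

import "errors"

// errStopped plays the role of a "watch stopped" error in this sketch.
var errStopped = errors.New("watch stopped")

// event is a trimmed, hypothetical stand-in for a watch response.
type event struct {
	ModifiedIndex uint64
	Value         string
}

// watchFunc performs one blocking poll for the first change at or after waitIndex.
type watchFunc func(waitIndex uint64) (event, error)

// watchAll polls repeatedly, advancing waitIndex past each delivered event,
// and closes out when the poll function returns an error.
func watchAll(once watchFunc, waitIndex uint64, out chan<- event) error {
	defer close(out)
	for {
		ev, err := once(waitIndex)
		if err != nil {
			return err
		}
		waitIndex = ev.ModifiedIndex + 1
		out <- ev
	}
}

// replay returns a watchFunc backed by a fixed, index-ordered log. It reports
// errStopped once the log holds nothing at or after waitIndex.
func replay(log []event) watchFunc {
	return func(waitIndex uint64) (event, error) {
		for _, ev := range log {
			if ev.ModifiedIndex >= waitIndex {
				return ev, nil
			}
		}
		return event{}, errStopped
	}
}
```

Without the `+ 1`, the second poll would return the same event again and the loop would never make progress.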


@ -1,119 +0,0 @@
package etcd
import (
"fmt"
"runtime"
"testing"
"time"
)
func TestWatch(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("watch_foo", true)
}()
go setHelper("watch_foo", "bar", c)
resp, err := c.Watch("watch_foo", 0, false, nil, nil)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/watch_foo" && resp.Node.Value == "bar") {
t.Fatalf("Watch 1 failed: %#v", resp)
}
go setHelper("watch_foo", "bar", c)
resp, err = c.Watch("watch_foo", resp.Node.ModifiedIndex+1, false, nil, nil)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/watch_foo" && resp.Node.Value == "bar") {
t.Fatalf("Watch 2 failed: %#v", resp)
}
routineNum := runtime.NumGoroutine()
ch := make(chan *Response, 10)
stop := make(chan bool, 1)
go setLoop("watch_foo", "bar", c)
go receiver(ch, stop)
_, err = c.Watch("watch_foo", 0, false, ch, stop)
if err != ErrWatchStoppedByUser {
t.Fatalf("Watch returned a non-user stop error")
}
if newRoutineNum := runtime.NumGoroutine(); newRoutineNum != routineNum {
t.Fatalf("Routine numbers differ after watch stop: %v, %v", routineNum, newRoutineNum)
}
}
func TestWatchAll(t *testing.T) {
c := NewClient(nil)
defer func() {
c.Delete("watch_foo", true)
}()
go setHelper("watch_foo/foo", "bar", c)
resp, err := c.Watch("watch_foo", 0, true, nil, nil)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/watch_foo/foo" && resp.Node.Value == "bar") {
t.Fatalf("WatchAll 1 failed: %#v", resp)
}
go setHelper("watch_foo/foo", "bar", c)
resp, err = c.Watch("watch_foo", resp.Node.ModifiedIndex+1, true, nil, nil)
if err != nil {
t.Fatal(err)
}
if !(resp.Node.Key == "/watch_foo/foo" && resp.Node.Value == "bar") {
t.Fatalf("WatchAll 2 failed: %#v", resp)
}
ch := make(chan *Response, 10)
stop := make(chan bool, 1)
routineNum := runtime.NumGoroutine()
go setLoop("watch_foo/foo", "bar", c)
go receiver(ch, stop)
_, err = c.Watch("watch_foo", 0, true, ch, stop)
if err != ErrWatchStoppedByUser {
t.Fatalf("Watch returned a non-user stop error")
}
if newRoutineNum := runtime.NumGoroutine(); newRoutineNum != routineNum {
t.Fatalf("Routine numbers differ after watch stop: %v, %v", routineNum, newRoutineNum)
}
}
func setHelper(key, value string, c *Client) {
time.Sleep(time.Second)
c.Set(key, value, 100)
}
func setLoop(key, value string, c *Client) {
time.Sleep(time.Second)
for i := 0; i < 10; i++ {
newValue := fmt.Sprintf("%s_%v", value, i)
c.Set(key, newValue, 100)
time.Sleep(time.Second / 10)
}
}
func receiver(c chan *Response, stop chan bool) {
for i := 0; i < 10; i++ {
<-c
}
stop <- true
}


@ -0,0 +1,31 @@
// Code forked from Docker project
package daemon
import (
"errors"
"net"
"os"
)
var SdNotifyNoSocket = errors.New("No socket")
// SdNotify sends a message to the init daemon. It is common to ignore the error.
func SdNotify(state string) error {
socketAddr := &net.UnixAddr{
Name: os.Getenv("NOTIFY_SOCKET"),
Net: "unixgram",
}
if socketAddr.Name == "" {
return SdNotifyNoSocket
}
conn, err := net.DialUnix(socketAddr.Net, nil, socketAddr)
if err != nil {
return err
}
defer conn.Close()
_, err = conn.Write([]byte(state))
return err
}
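`SdNotify` above reads the socket path from `NOTIFY_SOCKET` and writes the state string as a single datagram. The sketch below repeats that flow with the path passed in explicitly, which makes it easier to exercise without systemd. `notifyAt` and `errNoSocket` are hypothetical names, not part of the package above.

```go
package main

import (
	"errors"
	"net"
)

// errNoSocket is returned when no socket path was supplied.
var errNoSocket = errors.New("no notification socket")

// notifyAt sends state as one datagram to the unix datagram socket at path.
func notifyAt(path, state string) error {
	if path == "" {
		return errNoSocket
	}
	addr := &net.UnixAddr{Name: path, Net: "unixgram"}
	conn, err := net.DialUnix(addr.Net, nil, addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err
}
```

A service would typically call something like `notifyAt(os.Getenv("NOTIFY_SOCKET"), "READY=1")` once startup has finished, and treat any error as non-fatal.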


@ -0,0 +1,166 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package journal provides write bindings to the systemd journal
package journal
import (
"bytes"
"encoding/binary"
"errors"
"fmt"
"io"
"io/ioutil"
"net"
"os"
"strconv"
"strings"
"syscall"
)
// Priority of a journal message
type Priority int
const (
PriEmerg Priority = iota
PriAlert
PriCrit
PriErr
PriWarning
PriNotice
PriInfo
PriDebug
)
var conn net.Conn
func init() {
var err error
conn, err = net.Dial("unixgram", "/run/systemd/journal/socket")
if err != nil {
conn = nil
}
}
// Enabled returns true iff the systemd journal is available for logging
func Enabled() bool {
return conn != nil
}
// Send a message to the systemd journal. vars is a map of journald fields to
// values. Fields must be composed of uppercase letters, numbers, and
// underscores, but must not start with an underscore. Within these
// restrictions, any arbitrary field name may be used. Some names have special
// significance: see the journalctl documentation
// (http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html)
// for more details. vars may be nil.
func Send(message string, priority Priority, vars map[string]string) error {
if conn == nil {
return journalError("could not connect to journald socket")
}
data := new(bytes.Buffer)
appendVariable(data, "PRIORITY", strconv.Itoa(int(priority)))
appendVariable(data, "MESSAGE", message)
for k, v := range vars {
appendVariable(data, k, v)
}
_, err := io.Copy(conn, data)
if err != nil && isSocketSpaceError(err) {
file, err := tempFd()
if err != nil {
return journalError(err.Error())
}
_, err = io.Copy(file, data)
if err != nil {
return journalError(err.Error())
}
rights := syscall.UnixRights(int(file.Fd()))
/* this connection should always be a UnixConn, but better safe than sorry */
unixConn, ok := conn.(*net.UnixConn)
if !ok {
return journalError("can't send file through non-Unix connection")
}
unixConn.WriteMsgUnix([]byte{}, rights, nil)
} else if err != nil {
return journalError(err.Error())
}
return nil
}
func appendVariable(w io.Writer, name, value string) {
if !validVarName(name) {
journalError("variable name contains invalid character, ignoring")
return
}
if strings.ContainsRune(value, '\n') {
/* When the value contains a newline, we write:
* - the variable name, followed by a newline
* - the size (in 64bit little endian format)
* - the data, followed by a newline
*/
fmt.Fprintln(w, name)
binary.Write(w, binary.LittleEndian, uint64(len(value)))
fmt.Fprintln(w, value)
} else {
/* just write the variable and value all on one line */
fmt.Fprintf(w, "%s=%s\n", name, value)
}
}
func validVarName(name string) bool {
/* The variable name must be in uppercase and consist only of characters,
* numbers and underscores, and may not begin with an underscore. (from the docs)
*/
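// An empty name is invalid; checking it first also avoids indexing past the end.
valid := name != "" && name[0] != '_'
for _, c := range name {
// Parenthesize the character test so that one bad character cannot be
// masked by a later good one (&& binds tighter than ||).
valid = valid && (('A' <= c && c <= 'Z') || ('0' <= c && c <= '9') || c == '_')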
}
return valid
}
func isSocketSpaceError(err error) bool {
opErr, ok := err.(*net.OpError)
if !ok {
return false
}
sysErr, ok := opErr.Err.(syscall.Errno)
if !ok {
return false
}
return sysErr == syscall.EMSGSIZE || sysErr == syscall.ENOBUFS
}
func tempFd() (*os.File, error) {
file, err := ioutil.TempFile("/dev/shm/", "journal.XXXXX")
if err != nil {
return nil, err
}
err = syscall.Unlink(file.Name())
if err != nil {
return nil, err
}
return file, nil
}
func journalError(s string) error {
s = "journal error: " + s
fmt.Fprintln(os.Stderr, s)
return errors.New(s)
}
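The two encodings written by `appendVariable` above (a plain `NAME=value` line, or a length-prefixed form when the value contains a newline) can be seen byte for byte in a small standalone sketch. `validFieldName` and `encodeField` are hypothetical names. Unlike the code above, the sketch rejects empty names and drops invalid fields instead of writing them.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"strings"
)

// validFieldName reports whether name uses only A-Z, 0-9 and '_',
// is non-empty, and does not start with an underscore.
func validFieldName(name string) bool {
	if name == "" || name[0] == '_' {
		return false
	}
	for _, c := range name {
		if !(('A' <= c && c <= 'Z') || ('0' <= c && c <= '9') || c == '_') {
			return false
		}
	}
	return true
}

// encodeField returns the wire form of one journal field, or nil if the
// name is invalid.
func encodeField(name, value string) []byte {
	if !validFieldName(name) {
		return nil
	}
	var buf bytes.Buffer
	if strings.ContainsRune(value, '\n') {
		// Name, newline, 64-bit little-endian length, data, newline.
		buf.WriteString(name)
		buf.WriteByte('\n')
		binary.Write(&buf, binary.LittleEndian, uint64(len(value)))
		buf.WriteString(value)
		buf.WriteByte('\n')
	} else {
		buf.WriteString(name + "=" + value + "\n")
	}
	return buf.Bytes()
}
```

For a three-byte value such as `"a\nb"`, the length prefix is the byte `0x03` followed by seven zero bytes.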


@ -0,0 +1,33 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package util contains utility functions related to systemd that applications
// can use to check things like whether systemd is running.
package util
import (
"os"
)
// IsRunningSystemd checks whether the host was booted with systemd as its init
// system. This functions similarly to systemd's `sd_booted(3)`: internally, it
// checks whether /run/systemd/system/ exists and is a directory.
// http://www.freedesktop.org/software/systemd/man/sd_booted.html
func IsRunningSystemd() bool {
fi, err := os.Lstat("/run/systemd/system")
if err != nil {
return false
}
return fi.IsDir()
}


@ -17,7 +17,6 @@ package main
import (
"flag"
oldlog "log"
"os"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/coreos/pkg/capnslog"
)
@ -32,7 +31,6 @@ func init() {
func main() {
rl := capnslog.MustRepoLogger("github.com/coreos/pkg/capnslog/cmd")
capnslog.SetFormatter(capnslog.NewStringFormatter(os.Stderr))
// We can parse the log level configs from the command line
flag.Parse()


@ -18,6 +18,7 @@ import (
"bufio"
"fmt"
"io"
"runtime"
"strings"
"time"
)
@ -38,26 +39,68 @@ type StringFormatter struct {
}
func (s *StringFormatter) Format(pkg string, l LogLevel, i int, entries ...interface{}) {
-now := time.Now()
-y, m, d := now.Date()
-h, min, sec := now.Clock()
-s.w.WriteString(fmt.Sprintf("%d/%02d/%d %02d:%02d:%02d ", y, m, d, h, min, sec))
-s.writeEntries(pkg, l, i, entries...)
+now := time.Now().UTC()
+s.w.WriteString(now.Format(time.RFC3339))
+s.w.WriteByte(' ')
+writeEntries(s.w, pkg, l, i, entries...)
+s.Flush()
}
-func (s *StringFormatter) writeEntries(pkg string, _ LogLevel, _ int, entries ...interface{}) {
+func writeEntries(w *bufio.Writer, pkg string, _ LogLevel, _ int, entries ...interface{}) {
if pkg != "" {
-s.w.WriteString(pkg + ": ")
+w.WriteString(pkg + ": ")
}
str := fmt.Sprint(entries...)
endsInNL := strings.HasSuffix(str, "\n")
-s.w.WriteString(str)
+w.WriteString(str)
if !endsInNL {
-s.w.WriteString("\n")
+w.WriteString("\n")
}
-s.Flush()
}
func (s *StringFormatter) Flush() {
s.w.Flush()
}
func NewPrettyFormatter(w io.Writer, debug bool) Formatter {
return &PrettyFormatter{
w: bufio.NewWriter(w),
debug: debug,
}
}
type PrettyFormatter struct {
w *bufio.Writer
debug bool
}
func (c *PrettyFormatter) Format(pkg string, l LogLevel, depth int, entries ...interface{}) {
now := time.Now()
ts := now.Format("2006-01-02 15:04:05")
c.w.WriteString(ts)
us := now.Nanosecond() / 1000 // microseconds within the current second
c.w.WriteString(fmt.Sprintf(".%06d", us))
if c.debug {
_, file, line, ok := runtime.Caller(depth) // It's always the same number of frames to the user's call.
if !ok {
file = "???"
line = 1
} else {
slash := strings.LastIndex(file, "/")
if slash >= 0 {
file = file[slash+1:]
}
}
if line < 0 {
line = 0 // not a real line number
}
c.w.WriteString(fmt.Sprintf(" [%s:%d]", file, line))
}
c.w.WriteString(fmt.Sprint(" ", l.Char(), " | "))
writeEntries(c.w, pkg, l, depth, entries...)
c.Flush()
}
func (c *PrettyFormatter) Flush() {
c.w.Flush()
}


@ -44,7 +44,7 @@ func (g GlogFormatter) Format(pkg string, level LogLevel, depth int, entries ...
func GlogHeader(level LogLevel, depth int) []byte {
// Lmmdd hh:mm:ss.uuuuuu threadid file:line]
-now := time.Now()
+now := time.Now().UTC()
_, file, line, ok := runtime.Caller(depth) // It's always the same number of frames to the user's call.
if !ok {
file = "???"
@ -73,6 +73,7 @@ func GlogHeader(level LogLevel, depth int) []byte {
twoDigits(buf, second)
buf.WriteByte('.')
buf.WriteString(strconv.Itoa(now.Nanosecond() / 1000))
buf.WriteByte('Z')
buf.WriteByte(' ')
buf.WriteString(strconv.Itoa(pid))
buf.WriteByte(' ')


@ -0,0 +1,49 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// +build !windows
package capnslog
import (
"io"
"os"
"syscall"
)
// Here's where the opinionation comes in. We need some sensible defaults,
// especially after taking over the log package. Your project (whatever it may
// be) may see things differently. That's okay; there should be no defaults in
// the main package that cannot be controlled or overridden programmatically,
// otherwise it's a bug. Overriding them means creating your own init_log.go file
// much like this one.
func init() {
initHijack()
// Go `log` package uses os.Stderr.
SetFormatter(NewDefaultFormatter(os.Stderr))
SetGlobalLogLevel(INFO)
}
func NewDefaultFormatter(out io.Writer) Formatter {
if syscall.Getppid() == 1 {
// We're running under init, which may be systemd.
f, err := NewJournaldFormatter()
if err == nil {
return f
}
}
return NewPrettyFormatter(out, false)
}


@ -0,0 +1,25 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package capnslog
import "os"
func init() {
initHijack()
// Go `log` package uses os.Stderr.
SetFormatter(NewPrettyFormatter(os.Stderr, false))
SetGlobalLogLevel(INFO)
}


@ -0,0 +1,66 @@
// Copyright 2015 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// +build !windows
package capnslog
import (
"errors"
"fmt"
"os"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/coreos/go-systemd/journal"
)
func NewJournaldFormatter() (Formatter, error) {
if !journal.Enabled() {
return nil, errors.New("No systemd detected")
}
return &journaldFormatter{}, nil
}
type journaldFormatter struct{}
func (j *journaldFormatter) Format(pkg string, l LogLevel, _ int, entries ...interface{}) {
var pri journal.Priority
switch l {
case CRITICAL:
pri = journal.PriCrit
case ERROR:
pri = journal.PriErr
case WARNING:
pri = journal.PriWarning
case NOTICE:
pri = journal.PriNotice
case INFO:
pri = journal.PriInfo
case DEBUG:
pri = journal.PriDebug
case TRACE:
pri = journal.PriDebug
default:
panic("Unhandled loglevel")
}
msg := fmt.Sprint(entries...)
tags := map[string]string{
"PACKAGE": pkg,
}
err := journal.Send(msg, pri, tags)
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
}
func (j *journaldFormatter) Flush() {}


@ -18,7 +18,7 @@ import (
"log"
)
-func init() {
+func initHijack() {
pkg := NewPackageLogger("log", "")
w := packageWriter{pkg}
log.SetFlags(0)


@ -24,7 +24,7 @@ type PackageLogger struct {
level LogLevel
}
-const calldepth = 3
+const calldepth = 2
func (p *PackageLogger) internalLog(depth int, inLevel LogLevel, entries ...interface{}) {
if inLevel != CRITICAL && p.level < inLevel {

Godeps/_workspace/src/github.com/rakyll/pb/LICENSE

@ -0,0 +1,12 @@
Copyright (c) 2012, Sergey Cherepanov
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Godeps/_workspace/src/github.com/rakyll/pb/README.md

@ -0,0 +1,92 @@
## Terminal progress bar for Go
Simple progress bar for console programs.
### Installation
```
go get github.com/cheggaaa/pb
```
### Usage
```Go
package main
import (
"github.com/cheggaaa/pb"
"time"
)
func main() {
count := 100000
bar := pb.StartNew(count)
for i := 0; i < count; i++ {
bar.Increment()
time.Sleep(time.Millisecond)
}
bar.FinishPrint("The End!")
}
```
Result will be like this:
```
> go run test.go
37158 / 100000 [================>_______________________________] 37.16% 1m11s
```
More functions?
```Go
// create bar
bar := pb.New(count)
// refresh info every second (default 200ms)
bar.SetRefreshRate(time.Second)
// show percents (by default already true)
bar.ShowPercent = true
// show bar (by default already true)
bar.ShowBar = true
// hide counters
bar.ShowCounters = false
// show "time left"
bar.ShowTimeLeft = true
// show average speed
bar.ShowSpeed = true
// convert output to readable format (like KB, MB)
bar.SetUnits(pb.U_BYTES)
// and start
bar.Start()
```
Want to track the progress of I/O operations?
```Go
// create and start bar
bar := pb.New(myDataLen).SetUnits(pb.U_BYTES)
bar.Start()
// my io.Reader
r := myReader
// my io.Writer
w := myWriter
// create multi writer
writer := io.MultiWriter(w, bar)
// and copy
io.Copy(writer, r)
// see example/copy/copy.go for an advanced example
```
Don't like the looks?
```Go
bar.Format("<.- >")
```
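The format string is read as exactly five characters, in this order: bar start, filled cell, head of the filled run, empty cell, bar end. Below is a minimal sketch of how such a string could be turned into a bar, using only the standard library. `renderBar` is a hypothetical helper written for this illustration, not a function exported by the package, and it leaves out the counters, percentage and terminal-width handling.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar is a hypothetical helper: it draws a bar of `size` cells with
// `filled` of them completed, using a five-character format string
// (start, filled, head, empty, end). It returns "" for an invalid format.
func renderBar(format string, filled, size int) string {
	parts := strings.Split(format, "") // one entry per character
	if len(parts) != 5 || size <= 0 {
		return ""
	}
	start, current, head, empty, end := parts[0], parts[1], parts[2], parts[3], parts[4]

	// Clamp the filled count to the drawable range.
	if filled < 0 {
		filled = 0
	}
	if filled > size {
		filled = size
	}

	var b strings.Builder
	b.WriteString(start)
	switch {
	case filled == size:
		// A complete bar has no head character.
		b.WriteString(strings.Repeat(current, filled))
	case filled > 0:
		// The last filled cell is drawn as the head.
		b.WriteString(strings.Repeat(current, filled-1) + head)
	}
	b.WriteString(strings.Repeat(empty, size-filled))
	b.WriteString(end)
	return b.String()
}

func main() {
	fmt.Println(renderBar("[=>-]", 4, 10)) // [===>------]
	fmt.Println(renderBar("<.- >", 4, 10))
}
```

A format string of any other length is ignored here, which mirrors how `Format` leaves the bar unchanged when it does not receive five characters.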


@ -0,0 +1,83 @@
package main
import (
"github.com/cheggaaa/pb"
"os"
"fmt"
"io"
"time"
"strings"
"net/http"
"strconv"
)
func main() {
// check args
if len(os.Args) < 3 {
printUsage()
return
}
sourceName, destName := os.Args[1], os.Args[2]
// check source
var source io.Reader
var sourceSize int64
if strings.HasPrefix(sourceName, "http://") {
// open as url
resp, err := http.Get(sourceName)
if err != nil {
fmt.Printf("Can't get %s: %v\n", sourceName, err)
return
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
fmt.Printf("Server returned a non-200 status: %v\n", resp.Status)
return
}
i, _ := strconv.Atoi(resp.Header.Get("Content-Length"))
sourceSize = int64(i)
source = resp.Body
} else {
// open as file
s, err := os.Open(sourceName)
if err != nil {
fmt.Printf("Can't open %s: %v\n", sourceName, err)
return
}
defer s.Close()
// get source size
sourceStat, err := s.Stat()
if err != nil {
fmt.Printf("Can't stat %s: %v\n", sourceName, err)
return
}
sourceSize = sourceStat.Size()
source = s
}
// create dest
dest, err := os.Create(destName)
if err != nil {
fmt.Printf("Can't create %s: %v\n", destName, err)
return
}
defer dest.Close()
// create bar
bar := pb.New(int(sourceSize)).SetUnits(pb.U_BYTES).SetRefreshRate(time.Millisecond * 10)
bar.ShowSpeed = true
bar.Start()
// create multi writer
writer := io.MultiWriter(dest, bar)
// and copy
io.Copy(writer, source)
bar.Finish()
}
func printUsage() {
fmt.Println("copy [source file or url] [dest file]")
}


@ -0,0 +1,30 @@
package main
import (
"github.com/cheggaaa/pb"
"time"
)
func main() {
count := 5000
bar := pb.New(count)
// show percents (by default already true)
bar.ShowPercent = true
// show bar (by default already true)
bar.ShowBar = true
// show counters (by default already true)
bar.ShowCounters = true
// show "time left"
bar.ShowTimeLeft = true
// and start
bar.Start()
for i := 0; i < count; i++ {
bar.Increment()
time.Sleep(time.Millisecond)
}
bar.FinishPrint("The End!")
}

Godeps/_workspace/src/github.com/rakyll/pb/format.go

@ -0,0 +1,44 @@
package pb
import (
"fmt"
"strings"
"strconv"
)
const (
// Default: plain numbers, no unit handling
U_NO = 0
// Format values as bytes (B, KB, MB, etc.)
U_BYTES = 1
)
// Format integer
func Format(i int64, units int) string {
switch units {
case U_BYTES:
return FormatBytes(i)
}
// by default just convert to string
return strconv.FormatInt(i, 10)
}
// Convert bytes to a human-readable string, e.g. 2.00 MB, 64.20 KB, 52 B
func FormatBytes(i int64) (result string) {
switch {
case i > (1024 * 1024 * 1024 * 1024):
result = fmt.Sprintf("%#.02f TB", float64(i)/1024/1024/1024/1024)
case i > (1024 * 1024 * 1024):
result = fmt.Sprintf("%#.02f GB", float64(i)/1024/1024/1024)
case i > (1024 * 1024):
result = fmt.Sprintf("%#.02f MB", float64(i)/1024/1024)
case i > 1024:
result = fmt.Sprintf("%#.02f KB", float64(i)/1024)
default:
result = fmt.Sprintf("%d B", i)
}
result = strings.Trim(result, " ")
return
}

Godeps/_workspace/src/github.com/rakyll/pb/pb.go

@ -0,0 +1,267 @@
package pb
import (
"fmt"
"io"
"math"
"strings"
"sync/atomic"
"time"
)
const (
// Default refresh rate - 200ms
DEFAULT_REFRESH_RATE = time.Millisecond * 200
FORMAT = "[=>-]"
)
// DEPRECATED
// These variables are kept for backward compatibility only and no longer have
// any effect. Use pb.Format and pb.SetRefreshRate instead.
var (
DefaultRefreshRate = DEFAULT_REFRESH_RATE
BarStart, BarEnd, Empty, Current, CurrentN string
)
// Create new progress bar object
func New(total int) (pb *ProgressBar) {
pb = &ProgressBar{
Total: int64(total),
RefreshRate: DEFAULT_REFRESH_RATE,
ShowPercent: true,
ShowCounters: true,
ShowBar: true,
ShowTimeLeft: true,
}
pb.Format(FORMAT)
return
}
// Create new object and start
func StartNew(total int) (pb *ProgressBar) {
pb = New(total)
pb.Start()
return
}
// Callback for custom output
// For example:
// bar.Callback = func(s string) {
// mySuperPrint(s)
// }
//
type Callback func(out string)
type ProgressBar struct {
current int64 // current must be first member of struct (https://code.google.com/p/go/issues/detail?id=5278)
Total int64
RefreshRate time.Duration
ShowPercent, ShowCounters bool
ShowSpeed, ShowTimeLeft, ShowBar bool
Output io.Writer
Callback Callback
NotPrint bool
Units int
isFinish bool
startTime time.Time
BarStart string
BarEnd string
Empty string
Current string
CurrentN string
}
// Start print
func (pb *ProgressBar) Start() {
pb.startTime = time.Now()
if pb.Total == 0 {
pb.ShowBar = false
pb.ShowTimeLeft = false
pb.ShowPercent = false
}
go pb.writer()
}
// Increment current value
func (pb *ProgressBar) Increment() int {
return pb.Add(1)
}
// Set current value
func (pb *ProgressBar) Set(current int) {
atomic.StoreInt64(&pb.current, int64(current))
}
// Add to current value
func (pb *ProgressBar) Add(add int) int {
return int(atomic.AddInt64(&pb.current, int64(add)))
}
// Set custom format for bar
// Example: bar.Format("[=>_]")
func (pb *ProgressBar) Format(format string) (bar *ProgressBar) {
bar = pb
formatEntries := strings.Split(format, "")
if len(formatEntries) != 5 {
return
}
pb.BarStart = formatEntries[0]
pb.BarEnd = formatEntries[4]
pb.Empty = formatEntries[3]
pb.Current = formatEntries[1]
pb.CurrentN = formatEntries[2]
return
}
// Set bar refresh rate
func (pb *ProgressBar) SetRefreshRate(rate time.Duration) (bar *ProgressBar) {
bar = pb
pb.RefreshRate = rate
return
}
// Set units
// bar.SetUnits(U_NO) - by default
// bar.SetUnits(U_BYTES) - for Mb, Kb, etc
func (pb *ProgressBar) SetUnits(units int) (bar *ProgressBar) {
bar = pb
switch units {
case U_NO, U_BYTES:
pb.Units = units
}
return
}
// End print
func (pb *ProgressBar) Finish() {
pb.isFinish = true
pb.write(atomic.LoadInt64(&pb.current))
if !pb.NotPrint {
fmt.Println()
}
}
// End print and write string 'str'
func (pb *ProgressBar) FinishPrint(str string) {
pb.Finish()
fmt.Println(str)
}
// implement io.Writer
func (pb *ProgressBar) Write(p []byte) (n int, err error) {
n = len(p)
pb.Add(n)
return
}
// implement io.Reader
func (pb *ProgressBar) Read(p []byte) (n int, err error) {
n = len(p)
pb.Add(n)
return
}
func (pb *ProgressBar) write(current int64) {
width, _ := terminalWidth()
var percentBox, countersBox, timeLeftBox, speedBox, barBox, end, out string
// percents
if pb.ShowPercent {
percent := float64(current) / (float64(pb.Total) / float64(100))
percentBox = fmt.Sprintf(" %#.02f %% ", percent)
}
// counters
if pb.ShowCounters {
if pb.Total > 0 {
countersBox = fmt.Sprintf("%s / %s ", Format(current, pb.Units), Format(pb.Total, pb.Units))
} else {
countersBox = Format(current, pb.Units) + " "
}
}
// time left
if pb.ShowTimeLeft && current > 0 {
fromStart := time.Now().Sub(pb.startTime)
perEntry := fromStart / time.Duration(current)
left := time.Duration(pb.Total-current) * perEntry
left = (left / time.Second) * time.Second
if left > 0 {
timeLeftBox = left.String()
}
}
// speed
if pb.ShowSpeed && current > 0 {
fromStart := time.Now().Sub(pb.startTime)
speed := float64(current) / (float64(fromStart) / float64(time.Second))
speedBox = Format(int64(speed), pb.Units) + "/s "
}
// bar
if pb.ShowBar {
size := width - len(countersBox+pb.BarStart+pb.BarEnd+percentBox+timeLeftBox+speedBox)
if size > 0 {
curCount := int(math.Ceil((float64(current) / float64(pb.Total)) * float64(size)))
emptCount := size - curCount
barBox = pb.BarStart
if emptCount < 0 {
emptCount = 0
}
if curCount > size {
curCount = size
}
if emptCount <= 0 {
barBox += strings.Repeat(pb.Current, curCount)
} else if curCount > 0 {
barBox += strings.Repeat(pb.Current, curCount-1) + pb.CurrentN
}
barBox += strings.Repeat(pb.Empty, emptCount) + pb.BarEnd
}
}
// check len
out = countersBox + barBox + percentBox + speedBox + timeLeftBox
if len(out) < width {
end = strings.Repeat(" ", width-len(out))
}
out = countersBox + barBox + percentBox + speedBox + timeLeftBox
// and print!
switch {
case pb.Output != nil:
fmt.Fprint(pb.Output, out+end)
case pb.Callback != nil:
pb.Callback(out + end)
case !pb.NotPrint:
fmt.Print("\r" + out + end)
}
}
func (pb *ProgressBar) writer() {
var c, oc int64
oc = -1
for {
if pb.isFinish {
break
}
c = atomic.LoadInt64(&pb.current)
if c != oc {
pb.write(c)
oc = c
}
time.Sleep(pb.RefreshRate)
}
}
type window struct {
Row uint16
Col uint16
Xpixel uint16
Ypixel uint16
}

Godeps/_workspace/src/github.com/rakyll/pb/pb_nix.go

@ -0,0 +1,35 @@
// +build linux darwin freebsd
package pb
import (
"runtime"
"syscall"
"unsafe"
)
const (
TIOCGWINSZ = 0x5413
TIOCGWINSZ_OSX = 1074295912
)
func bold(str string) string {
return "\033[1m" + str + "\033[0m"
}
func terminalWidth() (int, error) {
w := new(window)
tio := syscall.TIOCGWINSZ
if runtime.GOOS == "darwin" {
tio = TIOCGWINSZ_OSX
}
res, _, err := syscall.Syscall(syscall.SYS_IOCTL,
uintptr(syscall.Stdin),
uintptr(tio),
uintptr(unsafe.Pointer(w)),
)
if int(res) == -1 {
return 0, err
}
return int(w.Col), nil
}

Godeps/_workspace/src/github.com/rakyll/pb/pb_win.go

@ -0,0 +1,16 @@
// +build windows
package pb
import (
"github.com/olekukonko/ts"
)
func bold(str string) string {
return str
}
func terminalWidth() (int, error) {
size, err := ts.GetSize()
return size.Col(), err
}


@ -1,5 +1,5 @@
// Copyright (c) 2012-2015 Ugorji Nwoke. All rights reserved.
-// Use of this source code is governed by a BSD-style license found in the LICENSE file.
+// Use of this source code is governed by a MIT license found in the LICENSE file.
/*
High Performance, Feature-Rich Idiomatic Go codec/encoding library for


@ -1,5 +1,5 @@
// Copyright (c) 2012-2015 Ugorji Nwoke. All rights reserved.
-// Use of this source code is governed by a BSD-style license found in the LICENSE file.
+// Use of this source code is governed by a MIT license found in the LICENSE file.
package codec


@ -1,5 +1,5 @@
// Copyright (c) 2012-2015 Ugorji Nwoke. All rights reserved.
-// Use of this source code is governed by a BSD-style license found in the LICENSE file.
+// Use of this source code is governed by a MIT license found in the LICENSE file.
package codec
@ -98,7 +98,7 @@ func (e *cborEncDriver) encUint(v uint64, bd byte) {
} else if v <= math.MaxUint32 {
e.w.writen1(bd + 0x1a)
bigenHelper{e.x[:4], e.w}.writeUint32(uint32(v))
-} else if v <= math.MaxUint64 {
+} else { // if v <= math.MaxUint64 {
e.w.writen1(bd + 0x1b)
bigenHelper{e.x[:8], e.w}.writeUint64(v)
}


@ -1,5 +1,5 @@
// Copyright (c) 2012-2015 Ugorji Nwoke. All rights reserved.
-// Use of this source code is governed by a BSD-style license found in the LICENSE file.
+// Use of this source code is governed by a MIT license found in the LICENSE file.
package codec


@ -1,5 +1,5 @@
// Copyright (c) 2012-2015 Ugorji Nwoke. All rights reserved.
-// Use of this source code is governed by a BSD-style license found in the LICENSE file.
+// Use of this source code is governed by a MIT license found in the LICENSE file.
package codec

Some files were not shown because too many files have changed in this diff.